First of all, let's dispell the illusion that some people have that DXT1 is "pretty good". DXT1 is fucking awful. It compresses
at 1.33333 bits per byte (4 bits per pixel). That's very large as far as image compressors are concerned. For typical images,
around 4.0 bpb is true lossless, around 1.5 is perceptually lossless, and around 0.5 is "very good". In fact wavelet compressors
can get as low as 0.1 bpb and acheive about the same quality as DXT1. Despite this I've heard smart people saying that "DXT1 is
pretty good". Yes, it is a convenient fixed size block, and yes the decoder is extremely fast and simple, but as far as quality
is concerned it is not even close. At 1.33333 it should be near lossless.
Here are some numbers on various DXT1 compressors. The numbers in the table are the RMSE (sqrt of the L2 error). The far right
column is a wavelet compressor for comparison; it's not the best wavelet compressor in the world by a long shot, it's "cbwave" which
is very old and which I designed for speed, not maximum quality. In any case it gives you an idea how far off DXT1 is.
(BTW I always try to show results in RMSE because it is linear in pixel magnitude, unlike MSE or PSNR which is a very weird nonlinear scale).
More discussion after the table...
(ADD: the "ryg" results shown here have a bug and should be ignored; see DXTC Post Summary )
| file | Squish opt | Squish | CB opt | CB | ryg | D3DX8 | FastDXT | | cbwave
|
| kodim01.bmp | 8.2808 | 8.3553 | 8.352 | 8.6924 | 9.374 | 9.8466 | 9.9565 | | 2.6068
|
| kodim02.bmp | 6.1086 | 6.2876 | 6.1287 | 6.3025 | 7.52 | 7.4308 | 8.456 | | 1.6973
|
| kodim03.bmp | 4.7804 | 4.9181 | 4.8312 | 5.0225 | 5.855 | 6.094 | 6.4839 | | 1.3405
|
| kodim04.bmp | 5.6913 | 5.8116 | 5.7285 | 5.9394 | 6.9408 | 7.1032 | 7.3189 | | 1.8076
|
| kodim05.bmp | 9.6472 | 9.7223 | 9.707 | 10.112 | 10.8934 | 11.273 | 12.0156 | | 2.9739
|
| kodim06.bmp | 7.1472 | 7.2171 | 7.1777 | 7.44 | 8.1005 | 8.5195 | 8.6202 | | 2.0132
|
| kodim07.bmp | 5.7804 | 5.8834 | 5.8379 | 6.0583 | 6.8153 | 7.2182 | 7.372 | | 1.4645
|
| kodim08.bmp | 10.2391 | 10.3212 | 10.346 | 10.747 | 11.3992 | 11.8703 | 12.2668 | | 3.2936
|
| kodim09.bmp | 5.2871 | 5.3659 | 5.3306 | 5.5234 | 5.9884 | 6.5332 | 6.6716 | | 1.6269
|
| kodim10.bmp | 5.2415 | 5.3366 | 5.2777 | 5.4633 | 5.9377 | 6.4601 | 6.4592 | | 1.7459
|
| kodim11.bmp | 6.7261 | 6.8206 | 6.7643 | 7.0216 | 7.8221 | 8.1056 | 8.2492 | | 1.8411
|
| kodim12.bmp | 4.7911 | 4.8718 | 4.8204 | 4.9863 | 5.6651 | 6.005 | 6.0748 | | 1.5161
|
| kodim13.bmp | 10.8676 | 10.9428 | 10.925 | 11.4237 | 12.402 | 12.7139 | 12.9978 | | 4.1355
|
| kodim14.bmp | 8.3034 | 8.3883 | 8.3398 | 8.6722 | 9.4258 | 9.896 | 10.8481 | | 2.4191
|
| kodim15.bmp | 5.8233 | 5.9525 | 5.8568 | 6.0862 | 6.6749 | 7.3085 | 7.4932 | | 1.6236
|
| kodim16.bmp | 5.0593 | 5.1629 | 5.0863 | 5.2851 | 5.8093 | 6.3361 | 6.1592 | | 1.546
|
| kodim17.bmp | 5.5019 | 5.6127 | 5.5313 | 5.7358 | 6.4975 | 6.7395 | 6.8989 | | 1.7166
|
| kodim18.bmp | 7.9879 | 8.0897 | 8.0192 | 8.3716 | 9.7744 | 9.5357 | 9.7857 | | 2.9802
|
| kodim19.bmp | 6.5715 | 6.652 | 6.6692 | 6.91 | 8.0128 | 7.9229 | 8.0096 | | 2.0518
|
| kodim20.bmp | 5.4533 | 5.5303 | 5.4895 | 5.6864 | 6.3457 | 6.4878 | 6.8629 | | 1.5359
|
| kodim21.bmp | 7.1318 | 7.2045 | 7.1724 | 7.4582 | 8.1637 | 8.4703 | 8.6508 | | 2.0659
|
| kodim22.bmp | 6.43 | 6.5127 | 6.4644 | 6.7137 | 7.8264 | 8.0046 | 7.9488 | | 2.2574
|
| kodim23.bmp | 4.8995 | 5.0098 | 4.9244 | 5.0906 | 5.6989 | 6.3057 | 6.888 | | 1.3954
|
| kodim24.bmp | 8.4271 | 8.5274 | 8.4699 | 8.8564 | 9.3906 | 9.9389 | 10.5156 | | 2.4977
|
| clegg.bmp | 14.9733 | 15.2566 | 15.1755 | 15.7168 | 16.3563 | 21.471 | 32.7192 | | 10.5426
|
| FRYMIRE.bmp | 10.7184 | 12.541 | 12.132 | 12.8278 | 12.989 | 16.7308 | 28.9283 | | 6.2394
|
| LENA.bmp | 7.138 | 7.2346 | 7.1763 | 7.4264 | 8.1203 | 8.742 | 9.5143 | | 4.288
|
| MONARCH.bmp | 6.5526 | 6.6292 | 6.5949 | 6.846 | 7.5162 | 8.1053 | 8.6993 | | 1.6911
|
| PEPPERS.bmp | 6.3966 | 6.5208 | 6.4557 | 6.677 | 7.3618 | 8.1855 | 8.8893 | | 2.3022
|
| SAIL.bmp | 8.3233 | 8.3903 | 8.3598 | 8.6627 | 9.8685 | 9.7838 | 10.5673 | | 2.9003
|
| SERRANO.bmp | 6.3508 | 6.757 | 6.8385 | 7.9064 | 7.5303 | 9.0549 | 18.3631 | | 4.6489
|
| TULIPS.bmp | 7.5768 | 7.656 | 7.6146 | 7.8786 | 8.4084 | 9.3817 | 10.5873 | | 2.2228
|
Back to comparing the DXT1 encoders.
BTW the test set here is the Kodak image set plus the Waterloo Bragzone image set. The Kodak set is all photographs that are pretty noisy, and there's
not a huge difference in the coders.
The Bragzone image set has some synthetic images with things like gradients which are harder to compress well, and there you can really dramatically see
the bad encoders fall apart. In particular if you look at the results on "clegg" and "frymire" and "serrano" you can see how bad the "FastDXT" coder is.
The "Squish" in the table is the iterative cluster fit with uniform weighting. All coders work on linear RGB error; and the MSE is mean per pixel not per component.
The "CB" encoder is a simple 2-means fit. I seed the means by doing the PCA to find the best fit line. I put a plane perpendicular to that line through
the average and take all the points on each half to be the two clusters, average the cluster to seed the means, and then iteratively refine the means by
reclustering. Once I have the 2-means I do a simple search to find the best 565 DXT end points to find the two means. There are 3 cases to try :
1. put c0 and c1 = the two means
2. put c2 and c3 = the two means (so c0 and c1 are pushed out)
3. make (c0,c2) straddle mean 1 and (c3,c1) straddle mean 2 - this is best for gaussian clusters around the mean
A 2-means like this is slightly better than doing a pca "range fit" like Simon's fast method or the "ryg" method. If the data was all Gaussian noise, they
would be equivalent, but of course it's not. You often get blocks that have a bunch of pixels at the low end which are all exactly the same color (
for example, all perfect black), and then
a spread of a bunch of junk at the high end (like some orange, some yellow, etc.). You want to put one end point exactly on perfectly black and put the other
endpoint at the center of the cluster on the high end.
"CB opt" and "Squish opt" take the results of the CB and Squish compressors and then improve them by iterative search. Simon Brown on his page mentions
something about doing greedy endpoint refinement but claims it "doesn't converge because the indeces change". That's silly, of course it converges.
To do a greedy search : try wiggling one or both end points in 565 space. Find new best index fit for the new end points. Measure new error. If the
error is lower, take the step.
Of course that works and it does converge and it's pretty simple and improves the encoding after Squish.
In fact you can do even better by doing simulated annealing instead of a plain greedy search. We should know by now that any time we have a greedy
search like this where we can measure the error and it can get stuck in local minima, we can improve it with something like simulated annealing. I use 256
steps of annealing with a sinusoidal decreasing temperature pattern. I randomly pick a way to wiggle the two end points (you need to consider wiggling both,
not just single wiggles). I try the wiggle and measure the error delta. Negative errors (improvements) are always taken, positive errors are taken
probabilitistically based on the temperature.
This is what was done in the "opt" coders above.
Most of the time the annealing search just makes a small improvement, but once in a while (such as on "frymire" and "serrano") it really finds something
remarkable and makes a big difference.
Simon's cluster fit alg is very sweet. I didn't really understand it at first
from the description of the code, so I'm going to redescribe it for myself here.
The basic idea is you find an assignment of indices, and from that solve a best fit to give you the end points for those indices. If you think about all the
colors in the block living in 3x3 color space, assigning indices is like giving labels to each point. All the points with the same label are a "cluster".
Then each cluster is fit with either an end point or one of the interpolated points.
Once you know the clusters, putting the best two end points is a simple linear optimization. You can solve the least squares analytically, it's not like a
matrix iteration least squares problem or anything.
So the trick is how to decide the indices. If you tried them all, it would be 4^16 ways, which is way too many. So what Simon does is create a linear ordering
of the points using the PCA axis, then try all clusterings that can be created by planes perpendicular to that linear axis. That's equivalent to just trying all
groupings of the indeces sorted by their dot along the PCA axis. That is, the clusters are
[0,i) , [i,j) [j,k) [k,16)
for all i,j,k
which is a manageable amount to search, and gives you most of the interesting clusterings that you care about. Something that might improve Squish is tring a
few different axes and picking the best.
BTW this end point optimization is very approximate. One issue is that it assumes the best index for each point doesn't change. It
also of course just uses real number arithmetic to make the 1/3 points, not the actual integer rounding that's done. Those factors
are actually pretty big.