Here are some numbers on various DXT1 compressors. The numbers in the table are the RMSE (sqrt of the L2 error). The far right column is a wavelet compressor for comparison; it's not the best wavelet compressor in the world by a long shot, it's "cbwave" which is very old and which I designed for speed, not maximum quality. In any case it gives you an idea how far off DXT1 is. (BTW I always try to show results in RMSE because it is linear in pixel magnitude, unlike MSE or PSNR which is a very weird nonlinear scale). More discussion after the table...

(ADD: the "ryg" results shown here have a bug and should be ignored; see DXTC Post Summary )

file | Squish opt | Squish | CB opt | CB | ryg | D3DX8 | FastDXT | cbwave | |

kodim01.bmp | 8.2808 | 8.3553 | 8.352 | 8.6924 | 9.374 | 9.8466 | 9.9565 | 2.6068 | |

kodim02.bmp | 6.1086 | 6.2876 | 6.1287 | 6.3025 | 7.52 | 7.4308 | 8.456 | 1.6973 | |

kodim03.bmp | 4.7804 | 4.9181 | 4.8312 | 5.0225 | 5.855 | 6.094 | 6.4839 | 1.3405 | |

kodim04.bmp | 5.6913 | 5.8116 | 5.7285 | 5.9394 | 6.9408 | 7.1032 | 7.3189 | 1.8076 | |

kodim05.bmp | 9.6472 | 9.7223 | 9.707 | 10.112 | 10.8934 | 11.273 | 12.0156 | 2.9739 | |

kodim06.bmp | 7.1472 | 7.2171 | 7.1777 | 7.44 | 8.1005 | 8.5195 | 8.6202 | 2.0132 | |

kodim07.bmp | 5.7804 | 5.8834 | 5.8379 | 6.0583 | 6.8153 | 7.2182 | 7.372 | 1.4645 | |

kodim08.bmp | 10.2391 | 10.3212 | 10.346 | 10.747 | 11.3992 | 11.8703 | 12.2668 | 3.2936 | |

kodim09.bmp | 5.2871 | 5.3659 | 5.3306 | 5.5234 | 5.9884 | 6.5332 | 6.6716 | 1.6269 | |

kodim10.bmp | 5.2415 | 5.3366 | 5.2777 | 5.4633 | 5.9377 | 6.4601 | 6.4592 | 1.7459 | |

kodim11.bmp | 6.7261 | 6.8206 | 6.7643 | 7.0216 | 7.8221 | 8.1056 | 8.2492 | 1.8411 | |

kodim12.bmp | 4.7911 | 4.8718 | 4.8204 | 4.9863 | 5.6651 | 6.005 | 6.0748 | 1.5161 | |

kodim13.bmp | 10.8676 | 10.9428 | 10.925 | 11.4237 | 12.402 | 12.7139 | 12.9978 | 4.1355 | |

kodim14.bmp | 8.3034 | 8.3883 | 8.3398 | 8.6722 | 9.4258 | 9.896 | 10.8481 | 2.4191 | |

kodim15.bmp | 5.8233 | 5.9525 | 5.8568 | 6.0862 | 6.6749 | 7.3085 | 7.4932 | 1.6236 | |

kodim16.bmp | 5.0593 | 5.1629 | 5.0863 | 5.2851 | 5.8093 | 6.3361 | 6.1592 | 1.546 | |

kodim17.bmp | 5.5019 | 5.6127 | 5.5313 | 5.7358 | 6.4975 | 6.7395 | 6.8989 | 1.7166 | |

kodim18.bmp | 7.9879 | 8.0897 | 8.0192 | 8.3716 | 9.7744 | 9.5357 | 9.7857 | 2.9802 | |

kodim19.bmp | 6.5715 | 6.652 | 6.6692 | 6.91 | 8.0128 | 7.9229 | 8.0096 | 2.0518 | |

kodim20.bmp | 5.4533 | 5.5303 | 5.4895 | 5.6864 | 6.3457 | 6.4878 | 6.8629 | 1.5359 | |

kodim21.bmp | 7.1318 | 7.2045 | 7.1724 | 7.4582 | 8.1637 | 8.4703 | 8.6508 | 2.0659 | |

kodim22.bmp | 6.43 | 6.5127 | 6.4644 | 6.7137 | 7.8264 | 8.0046 | 7.9488 | 2.2574 | |

kodim23.bmp | 4.8995 | 5.0098 | 4.9244 | 5.0906 | 5.6989 | 6.3057 | 6.888 | 1.3954 | |

kodim24.bmp | 8.4271 | 8.5274 | 8.4699 | 8.8564 | 9.3906 | 9.9389 | 10.5156 | 2.4977 | |

clegg.bmp | 14.9733 | 15.2566 | 15.1755 | 15.7168 | 16.3563 | 21.471 | 32.7192 | 10.5426 | |

FRYMIRE.bmp | 10.7184 | 12.541 | 12.132 | 12.8278 | 12.989 | 16.7308 | 28.9283 | 6.2394 | |

LENA.bmp | 7.138 | 7.2346 | 7.1763 | 7.4264 | 8.1203 | 8.742 | 9.5143 | 4.288 | |

MONARCH.bmp | 6.5526 | 6.6292 | 6.5949 | 6.846 | 7.5162 | 8.1053 | 8.6993 | 1.6911 | |

PEPPERS.bmp | 6.3966 | 6.5208 | 6.4557 | 6.677 | 7.3618 | 8.1855 | 8.8893 | 2.3022 | |

SAIL.bmp | 8.3233 | 8.3903 | 8.3598 | 8.6627 | 9.8685 | 9.7838 | 10.5673 | 2.9003 | |

SERRANO.bmp | 6.3508 | 6.757 | 6.8385 | 7.9064 | 7.5303 | 9.0549 | 18.3631 | 4.6489 | |

TULIPS.bmp | 7.5768 | 7.656 | 7.6146 | 7.8786 | 8.4084 | 9.3817 | 10.5873 | 2.2228 |

Back to comparing the DXT1 encoders. BTW the test set here is the Kodak image set plus the Waterloo Bragzone image set. The Kodak set is all photographs that are pretty noisy, and there's not a huge difference in the coders. The Bragzone image set has some synthetic images with things like gradients which are harder to compress well, and there you can really dramatically see the bad encoders fall apart. In particular if you look at the results on "clegg" and "frymire" and "serrano" you can see how bad the "FastDXT" coder is.

The "Squish" in the table is the iterative cluster fit with uniform weighting. All coders work on linear RGB error; and the MSE is mean per pixel not per component.

The "CB" encoder is a simple 2-means fit. I seed the means by doing the PCA to find the best fit line. I put a plane perpendicular to that line through the average and take all the points on each half to be the two clusters, average the cluster to seed the means, and then iteratively refine the means by reclustering. Once I have the 2-means I do a simple search to find the best 565 DXT end points to find the two means. There are 3 cases to try :

1. put c0 and c1 = the two means 2. put c2 and c3 = the two means (so c0 and c1 are pushed out) 3. make (c0,c2) straddle mean 1 and (c3,c1) straddle mean 2 - this is best for gaussian clusters around the mean

A 2-means like this is slightly better than doing a pca "range fit" like Simon's fast method or the "ryg" method. If the data was all Gaussian noise, they would be equivalent, but of course it's not. You often get blocks that have a bunch of pixels at the low end which are all exactly the same color ( for example, all perfect black), and then a spread of a bunch of junk at the high end (like some orange, some yellow, etc.). You want to put one end point exactly on perfectly black and put the other endpoint at the center of the cluster on the high end.

"CB opt" and "Squish opt" take the results of the CB and Squish compressors and then improve them by iterative search. Simon Brown on his page mentions something about doing greedy endpoint refinement but claims it "doesn't converge because the indeces change". That's silly, of course it converges.

To do a greedy search : try wiggling one or both end points in 565 space. Find new best index fit for the new end points. Measure new error. If the error is lower, take the step.

Of course that works and it does converge and it's pretty simple and improves the encoding after Squish.

In fact you can do even better by doing simulated annealing instead of a plain greedy search. We should know by now that any time we have a greedy search like this where we can measure the error and it can get stuck in local minima, we can improve it with something like simulated annealing. I use 256 steps of annealing with a sinusoidal decreasing temperature pattern. I randomly pick a way to wiggle the two end points (you need to consider wiggling both, not just single wiggles). I try the wiggle and measure the error delta. Negative errors (improvements) are always taken, positive errors are taken probabilitistically based on the temperature. This is what was done in the "opt" coders above.

Most of the time the annealing search just makes a small improvement, but once in a while (such as on "frymire" and "serrano") it really finds something remarkable and makes a big difference.

Simon's cluster fit alg is very sweet. I didn't really understand it at first from the description of the code, so I'm going to redescribe it for myself here.

The basic idea is you find an assignment of indices, and from that solve a best fit to give you the end points for those indices. If you think about all the colors in the block living in 3x3 color space, assigning indices is like giving labels to each point. All the points with the same label are a "cluster". Then each cluster is fit with either an end point or one of the interpolated points.

Once you know the clusters, putting the best two end points is a simple linear optimization. You can solve the least squares analytically, it's not like a matrix iteration least squares problem or anything.

So the trick is how to decide the indices. If you tried them all, it would be 4^16 ways, which is way too many. So what Simon does is create a linear ordering of the points using the PCA axis, then try all clusterings that can be created by planes perpendicular to that linear axis. That's equivalent to just trying all groupings of the indeces sorted by their dot along the PCA axis. That is, the clusters are

[0,i) , [i,j) [j,k) [k,16) for all i,j,kwhich is a manageable amount to search, and gives you most of the interesting clusterings that you care about. Something that might improve Squish is tring a few different axes and picking the best.

BTW this end point optimization is very approximate. One issue is that it assumes the best index for each point doesn't change. It also of course just uses real number arithmetic to make the 1/3 points, not the actual integer rounding that's done. Those factors are actually pretty big.

## 4 comments:

I think my "DXT compression in CUDA" article has a fairly good explanation of the squish method:

http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/dxtc/doc/cuda_dxtc.pdf

Have you tried using the axis returned by 2-means fit as the starting point for squish?

Do you have any thoughts about how to compress the Alpha blocks? I currently use an iterative end point optimization:

- Compute the endpoints based on the best fit line or bounding box

- Assign indices for these endpoints.

- Compute end points that minimize error for this index assignment.

- Repeat second step until the algorithm converges.

No, I haven't tried any other axes for Squish.

Actually one of the ways you could definitely beat "Squish opt" is just by randomly trying a few axes to start squish with.

I haven't really thought about alpha blocks at all yet.

Yes, but that would multiply the compression time by the number of trials. It would be nice if it were possible to pick a better axis without spending too much time.

BTW, these are the results on NVTT with normal and fast compression options:

'kodim01.png' 8.3178 8.8283

'kodim02.png' 6.1999 6.5642

'kodim03.png' 4.8679 5.199

'kodim04.png' 5.7507 6.0604

'kodim05.png' 9.6922 10.1898

'kodim06.png' 7.1862 7.5565

'kodim07.png' 5.8445 6.1761

'kodim08.png' 10.2765 10.8359

'kodim09.png' 5.3387 5.6826

'kodim10.png' 5.2946 5.6115

'kodim11.png' 6.7792 7.1448

'kodim12.png' 4.8437 5.1419

'kodim13.png' 10.9068 11.5045

'kodim14.png' 8.3499 8.7971

'kodim15.png' 5.8989 6.1742

'kodim16.png' 5.1197 5.4364

'kodim17.png' 5.5792 5.8778

'kodim18.png' 8.0427 8.4399

'kodim19.png' 6.6181 7.0284

'kodim20.png' 5.4907 5.7841

'kodim21.png' 7.18 7.5852

'kodim22.png' 6.4762 6.8052

'kodim23.png' 4.967 5.2579

'kodim24.png' 8.4657 8.8632

'clegg.tif' 15.0331 16.114

'frymire.tif' 12.1675 13.317

'lena.tif' 7.0679 7.3861

'monarch.tif' 6.6051 6.915

'sail.tif' 8.3798 8.7851

'serrano.tif' 6.5715 8.4293

'tulips.tif' 7.6531 8.0117

Average: 7.3214 7.7904

It looks like my squish version is slightly better. Hmm... I thought that my fast compressor was equivalent to ryg's....

Post a Comment