One thing I noticed is that the lsqr fit really doesn't do much other than shift an end point by one. That is, the end points are in 565 already, you can do this nice optimization in floats, but when you quantize back to 565 you pretty much always hit the point you started with or at most a step of 1.
So the new "CB" modes are :
CB 1 = just 4 means then lsqr fit , faster than Squish, a bit slower than ryg. Quality is roughly competitive with Squish, but they have different strengths and weakness, so taking the best of the two might be reasonable. Squish never beats "CB 1" by a lot, but "CB 1" kills it on the weird "frymire" image.
CB 2 = CB 1 followed by simulated annealing.
CB 3 = Very slow heavy optimization. This is an attempt to see what "optimal" would get you. It tries "CB 2" and then it also tries using all 16 colors in the block as end points, so 16*16 trials, does the lsqr optimization on each trial, and then anneals the best of those. There are still various other things to try that might find a better result, but this is already pretty crazy slow. This is too slow to use in real production, the idea is simply to get an idea of how close "CB 2" is to optimal. Of course "CB 3" could still be far off optimal, I'm only conjecturing that it's close.
One of the interesting things to look at is the curve of diminishing returns from CB1 to 2 to 3. In most cases there's only a small improvement from 2 to 3, but there are exceptions, mainly in the weird degenerate images. kodim02 is one case (this is a photo but it's almost all red), and frymire of course. That meets expectations. On noisy natural images, the cluster of colors is pretty close to Gaussian noise, which works well with the PCA single line fit and the least-squares contiuous distribution approximation. On weird images with degenerate cases there can be stranger optimal solutions (for example : putting one of the color end points outside of the volume of original colors, so that one of the interpolated 1/3 colors can hit a certain value more precisely).
ADDENDUM : you might validly object and ask why the annealing is not getting closer to the global optimum. There are some approximations in the annealing that are hurting. For one thing I only try wiggling the ends by 1 step in 565. Then I don't really run it for very long, so it doesn't have a chance to make big walks and get to really different solutions. All it can really do is local optimizations with small steps to tunnel past local barriers to find better minima - it's not trying huge steps to other parts of the solution space. Theoretically if I ran a much longer annealing schedule with more time spent at higher temperatures it would do a better job of finding the global minimum. But I'm happy with this approach - the annealing is just an improved local optimization that steps bast small barriers, and to find drastically different global solutions I have to seed the trial differently.
The new numbers :
|file||CB 1||CB 2||CB 3||Squish opt||Squish||ryg||D3DX8||FastDXT|