09-11-08 - 2

So I did my first project for RAD, which was using the PCA to find the KLT for color conversions. We're also going to be doing optimal channel decorrelation for audio, and in fact it can be used for other things, such as coordinates in geometry; basically any set of correlated data channels is interesting to consider.

A comprehensive list of (decent) color conversions for image compression :

General note : color conversions which involve floats should & can be used for lossy image compression, but they require a whole pipeline made for floats; many color conversions also scale the components, so again your quantization system must be aware of that. Lossless color conversions that go int->int are easier to analyze. They generally use lifting of some sort.

YUV or YCbCr like in JPEG ; this is lossy and crappy.

KLT : matrix multiply by the PCA of the color planes of the image.
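A minimal sketch of the per-image KLT, assuming NumPy and an H x W x 3 array; the function names here are made up for illustration and this is a float->float version, not the tuned thing I actually shipped :

```python
import numpy as np

def klt_from_image(rgb):
    """Return (basis, mean, variances) for an H x W x 3 image array.

    The rows of `basis` are the principal axes of the pixel colors,
    sorted from largest to smallest variance.
    """
    pixels = rgb.reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels - mean, rowvar=False)      # 3x3 color covariance
    variances, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(variances)[::-1]            # re-sort to descending
    basis = vecs[:, order].T
    return basis, mean, variances[order]

def klt_forward(rgb, basis, mean):
    """Project each pixel onto the principal axes (float output)."""
    pixels = rgb.reshape(-1, 3).astype(np.float64)
    return ((pixels - mean) @ basis.T).reshape(rgb.shape)

def klt_inverse(coeffs, basis, mean):
    """Undo klt_forward; the basis is orthonormal so its transpose inverts it."""
    flat = coeffs.reshape(-1, 3)
    return (flat @ basis + mean).reshape(coeffs.shape)
```

Note the basis and mean have to be sent to the decoder (or the decoder has to use a fixed matrix, as in the next entry).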

fixed KLT : constant KLT matrix optimized on the Kodak image set; you can find this in the FastVDO H.264 paper.

DCT3 : 3-sample DCT. Whoah, when I thought of this it gave me a huge "duh" moment. It's actually a very very good float->float matrix color conversion. Implemented as just a 3x3 matrix multiply. In fact there's a theoretical result that the DCT approaches the optimal decorrelating transform for Gaussian data with order-1 (first-order Markov) correlation as that correlation gets close to 1, and hey, color is very close to that.
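For reference, here's a sketch of that 3x3 matrix, built as the orthonormal 3-point DCT-II. Applying it to the channels in R,G,B order is my assumption for this example (other orderings give a different transform), and the helper names are made up :

```python
import math

def dct3_matrix():
    """Orthonormal 3-point DCT-II as a list of three rows."""
    n = 3
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def apply3(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def transpose3(m):
    """Transpose of a 3x3 matrix; for an orthonormal matrix this is the inverse."""
    return [[m[c][r] for c in range(3)] for r in range(3)]
```

With R,G,B ordering the rows come out proportional to (R+G+B), (R-B) and (R-2G+B), so it looks a lot like a scaled YCoCg, which goes some way to explaining why it does well.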

YCoCg : both lossless and lossy versions. See Malvar papers at Microsoft Research or his H.264 submission. BTW this is equivalent to doing Haar[R,B] then Haar[G,B] (the Haar acts in place like {x,y} <- { (x+y)/2, y-x })
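The lossless version (usually called YCoCg-R) can be sketched as two lifting steps: a difference/average on R and B, then the same thing pairing G with the R/B average left by the first step. The function names are just for this example :

```python
def ycocg_r_forward(r, g, b):
    """Lossless int->int YCoCg-R via lifting."""
    co = r - b
    t = b + (co >> 1)      # floor average of R and B
    cg = g - t
    y = t + (cg >> 1)      # floor average of G and t
    return y, co, cg

def ycocg_r_inverse(y, co, cg):
    """Exact inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Y stays in the input range but Co and Cg each need one extra bit, since they're differences. The round trip is exact because each lifting step is undone with the same shifted value it was built from.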

CREW aka RCT aka J2K-lossless : older lossless transform; pretty much always worse than YCoCg
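For comparison, a sketch of the reversible color transform in the form JPEG 2000 uses for its lossless path (function names made up for this example) :

```python
def rct_forward(r, g, b):
    """JPEG 2000 style reversible color transform, int->int."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4)
    cb = b - g
    cr = r - g
    return y, cb, cr

def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward."""
    g = y - ((cb + cr) >> 2)
    r = cr + g
    b = cb + g
    return r, g, b
```

Both chroma channels here are plain differences against G, rather than the difference-then-average structure of YCoCg.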

FastVDO : another lossless transform proposed for H.264 ; matches the KLT slightly better than YCoCg , but actually usually codes worse than YCoCg.

This last one leads me to a general issue that was somewhat confounding :

Decorrelation is not the same as real world coding performance. That is, the most decorrelating color transform (the KLT) is not the best for coding in most cases. In fact, the KLT was quite poor. I did come up with some heuristic tricks to make a pseudo-KLT that does code quite well.

There's a theoretical measure of "coding gain" and the KLT maximizes that, but when run through real coders it falls down. I'm not sure at this point exactly what's happening. I have some theories. One issue is that the original RGB is not a Gaussian float, it's integers, so things don't behave smoothly; for example, long long ago I wrote on here about how D(Q) is not smooth in the real world, that is the distortion for a given quantizer does not increase monotonically with Q; it has special peaks when Q hits rational numbers, because those values map ints to ints better. All the theoretical literature on rate-distortion is almost garbage because D(Q) and R(Q) are so non-smooth in the real world. My other theory is that the oblique rotations the KLT sometimes takes are essentially making the bottom bit random, which hurts the spatial prediction of later coding stages.
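For concreteness, the usual textbook definition of coding gain for an orthonormal transform is the arithmetic mean of the output channel variances divided by their geometric mean. A small sketch (the function names are mine; the formula assumes the high-resolution Gaussian model, which is exactly the assumption that seems to break down above) :

```python
import math

def coding_gain(variances):
    """Arithmetic mean over geometric mean of per-channel variances.

    Assumes an orthonormal transform and strictly positive variances.
    """
    n = len(variances)
    arith = sum(variances) / n
    geo = math.exp(sum(math.log(v) for v in variances) / n)
    return arith / geo

def coding_gain_db(variances):
    """Same measure expressed in decibels."""
    return 10.0 * math.log10(coding_gain(variances))
```

Equal variances give a gain of 1 (0 dB); the more the transform packs the energy into one channel, the higher the number. Feeding it the eigenvalues of the color covariance gives the KLT's figure.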

One interesting case for games is compressing images with an alpha channel. In that case, the alpha channel can often be predicted from a linear combination of RGB and stay lossless. That linear model fits many alpha channels very well, so they end up packed into only a few bytes.
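A sketch of what that alpha prediction could look like, assuming NumPy and an H x W x 4 RGBA array. The least-squares fit, the constant bias term and the function names are my choices for this example, not a description of any particular codec :

```python
import numpy as np

def fit_alpha_predictor(rgba):
    """Least-squares weights (wr, wg, wb, bias) predicting alpha from RGB."""
    px = rgba.reshape(-1, 4).astype(np.float64)
    design = np.column_stack([px[:, :3], np.ones(len(px))])
    weights, *_ = np.linalg.lstsq(design, px[:, 3], rcond=None)
    return weights

def predict_alpha(rgb, weights):
    """Integer alpha prediction for an H x W x 3 array."""
    px = rgb.reshape(-1, 3).astype(np.float64)
    pred = px @ weights[:3] + weights[3]
    pred = np.clip(np.rint(pred), 0, 255).astype(np.int64)
    return pred.reshape(rgb.shape[:-1])

def alpha_residual(rgba, weights):
    """What actually gets coded: true alpha minus the prediction."""
    alpha = rgba[..., 3].astype(np.int64)
    return alpha - predict_alpha(rgba[..., :3], weights)
```

The decoder rebuilds alpha as prediction plus residual, so it stays lossless as long as both sides compute the same prediction; a real codec would probably want fixed-point weights to guarantee that. When the model fits well the residual is nearly all zeros and costs almost nothing to code.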
