02-10-10 - Some little image notes

1. Code stream structure implies a perceptual model. We often say that uniform quantization is optimal for RMSE but not for perceptual quality, and we think of JPEG-style quantization matrices that crush high frequencies as being better for human-visual perceptual quality. I want to note and remind myself that the coding structure alone targets perceptual quality, even if you are using uniform quantizers. (Obviously there are gross ways in which this is true, such as subsampling chroma, but I'm not talking about that.)

1.A. One way is just with coding order. In something like a DCT with zig-zag scan, we are assuming there will be more zeros in the high frequencies. Then when you use something like an RLE coder or End of Block codes, or even just a context coder that will correlate zeros to zeros, the result is that you will want to crush values in the high frequencies when you do RDO or TQ (rate distortion optimization and trellis quantization). This is sort of subtle and important; RDO and TQ will pretty much always kill high frequency detail, not because you told them anything about the HVS or any weighting, but just because that is where they can get the most rate back for a given increase in distortion - and this is just because of the way the code structure is organized (in concert with the statistics of the data). The same thing happens with wavelet coders and something like a zerotree - the coding structure is not only capturing correlation, it's also implying that we think high frequencies are less important and thus are where you should crush things. These are perceptual coders.
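To make that concrete, here is a toy sketch in Python. The bit costs are a made-up model (a flag bit per coefficient, a gamma-style magnitude code, and a single End of Block code that covers all trailing zeros), not any real codec's entropy coder, and the function names are just for illustration. It measures how many bits you get back by zeroing one coefficient of a zig-zag ordered block.

```python
def coeff_bits(v):
    """Toy cost in bits of one quantized coefficient (hypothetical model)."""
    if v == 0:
        return 1.0  # just a "this one is zero" flag
    # nonzero flag + sign bit + gamma-style magnitude code
    return 2.0 + 2 * (abs(v).bit_length() - 1) + 1


def block_bits(q):
    """Toy cost of a zig-zag ordered block.

    Coefficients up to the last nonzero are coded one by one; everything
    after that is covered by a single 1-bit End of Block code.
    """
    last = max((i for i, v in enumerate(q) if v != 0), default=-1)
    return sum(coeff_bits(v) for v in q[:last + 1]) + 1.0


def rate_saved_by_zeroing(q, i):
    """Bits recovered if coefficient i is forced to zero."""
    z = list(q)
    z[i] = 0
    return block_bits(q) - block_bits(z)


# Zig-zag order: low frequencies first, high frequencies last.
block = [12, -5, 3, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

print(rate_saved_by_zeroing(block, 3))   # a +1 in the low frequencies
print(rate_saved_by_zeroing(block, 10))  # the last +1, in the high frequencies
```

Both candidates are a value of 1 going to 0, so with a uniform quantizer they cost the same distortion. In this toy model the high frequency one gives back three times as many bits, because killing it also moves the End of Block earlier and makes the zeros in front of it free. Nothing in the code knows about the HVS; the bias comes entirely from the code structure.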

1.B. Any coder that makes decisions using a distortion metric (such as any Lagrange RD based coder) is making perceptual decisions according to that distortion metric. Even if the sub-modes are not overtly "perceptual", if the decision is based on some distortion other than MSE you can have a very perceptual coder.
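A minimal sketch of that idea, assuming the usual Lagrangian cost J = D + lambda * R. The candidate modes, bit counts and weights below are invented for illustration; the only point is that the same decision loop picks a different mode when you swap the distortion function.

```python
def mse(orig, recon):
    """Plain mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)


def weighted_mse(weights):
    """Build a distortion function that weights each sample's squared error.

    The weights stand in for whatever non-MSE metric you like; they are a
    hypothetical example, not a real perceptual model.
    """
    def distortion(orig, recon):
        total = sum(w * (a - b) ** 2 for w, a, b in zip(weights, orig, recon))
        return total / len(orig)
    return distortion


def rd_choose(orig, candidates, lam, distortion):
    """Return the name of the candidate minimizing J = D + lam * R."""
    best = min(candidates,
               key=lambda c: distortion(orig, c["recon"]) + lam * c["bits"])
    return best["name"]


orig = [8, 8, 8, 8]
candidates = [
    {"name": "cheap", "recon": [8, 8, 8, 4], "bits": 2},  # drops one sample's detail
    {"name": "exact", "recon": [8, 8, 8, 8], "bits": 7},  # spends bits to keep it
]

print(rd_choose(orig, candidates, 1.0, mse))
print(rd_choose(orig, candidates, 1.0, weighted_mse([1, 1, 1, 4])))
```

Under plain MSE the cheap mode wins (J = 4 + 2 against J = 0 + 7). Under the weighted metric the error on the last sample costs four times as much, so the exact mode wins. Neither mode is "perceptual" by itself; the metric is what makes the coder's choices perceptual.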

2. Chroma. It's widely just assumed that "chroma is less important" and that "subsampling is a good way to capture this". I think that those contentions are a bit off. What is true is that subsampling chroma is *okay* on *most* images, and it gives you a nice speedup and sometimes a memory use reduction (half as many samples to code). But if you don't care about speed or memory use, it's not at all clear that you should be subsampling chroma for human visual perceptual gain.

It is true that we see high frequencies of chroma worse than we see high frequencies of luma. But we are still pretty good at locating a hard edge, for example. What is true is that a half-tone printed image in red or blue will appear similar to the original at a closer distance than one in green.

One funny thing with JPEG for example is that the quantization matrices are already smacking the fuck out of the high frequencies, and then they do it even harder for chroma. It's also worth noting that there are two major ways you can address the importance of chroma: one is by killing high frequencies in some way (quantization matrices or subsampling); the other is how finely the DC value of the chroma should be coded, e.g. how should the chroma planes be scaled vs. the luma plane (this is equivalent to asking: should the quantizers be the same?).
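The equivalence between scaling a plane and changing its quantizer is easy to check with a plain uniform quantizer. This is a sketch with made-up function names and numbers: scaling a chroma sample by `scale` before quantizing with step `q` reconstructs the same values as leaving the sample alone and quantizing with step `q / scale`.

```python
def code_scaled_plane(x, scale, q):
    """Scale the sample, quantize with step q, then undo both on decode."""
    index = round(x * scale / q)
    return index * q / scale


def code_with_own_quantizer(x, q_eff):
    """Leave the sample alone and quantize with its own step size."""
    index = round(x / q_eff)
    return index * q_eff


# Halving chroma and using the luma step of 4 acts like a chroma step of 8.
samples = range(-64, 65)
same = all(code_scaled_plane(x, 0.5, 4.0) == code_with_own_quantizer(x, 8.0)
           for x in samples)
print(same)
```

The powers of two here keep the floating point arithmetic exact; with arbitrary scales the two paths agree up to rounding error. Either way, "how should chroma be scaled against luma" and "what chroma quantizer should I use" are the same knob.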

1 comment:

ryg said...

On 1): Another interesting thing in the same vein is the choice of binarization scheme when you're using binary arithmetic coding. The obvious effect is that it determines your prior model as long as you don't have sufficient statistics (and that way you could do "perceptual preconditioning"); but there are also secondary effects, because the binarization also makes some contexts a lot easier to capture than others (context within a single binarized symbol is trivial to include). Together with the prior expectation and RD optimization, this forms a positive feedback loop (you tend to favor small motion vectors because they initially take fewer bits, so they're estimated as more likely, so they take even fewer bits, and so on) - it's self-reinforcing in a way. That's not just in the beginning - at least some of the bits are going to be pretty random, so the shorter binarizations tend to be cheaper to code than the long ones. Whatever binarization you pick actually biases your coding choices towards its own distribution, even after the model has been trained on your data.
