Context on Chroma Subsampling

Poked by Guetzli into thinking about pros/cons of chroma subsampling -

Chroma downsampling (as in standard JPEG YCbCr 420) is a big ugly hammer. It just throws away a ton of bits of information. That's pretty much always a bad thing in compression.

Now, preferring to throw away chroma detail before luma detail is valid and good. So if you are not chroma subsampling, then you need a perceptually optimizing encoder that knows to give fewer bits to high frequency chroma. You have much more control doing this through bit allocation than you do by just smashing the chroma planes. (for example, on blocks where there is almost no luma signal, then you might keep more of the high frequency chroma, on blocks with luma masking you can throw away lots of the high chroma AC bits - you just have way more precise control).

The chroma subsample is just a convenient way to get decent perceptual tradeoffs in a *non* optimizing encoder.

Chroma subsample is of course an R-D choice. It throws away signal, giving a certain disortion, in exchange it saves you some rate. This particular move is a good R-D choice at some tradeoff zone. Generally at high bit rate, it's a bad move. In most encoders, it becomes a good move at some lower quality. (in JPEG the tradeoff point is usually somewhere around 85). Measuring this D in RMSE is easy, but measuring it perceptually is rather tricky (when luma masking is present it may be near zero D perceptually, but without luma masking it can be much worse).

There are other issues.

In non-subsampled world, the approximate important weights for YCbCr are something like {0.7,0.13,0.17} . If you do subsample, then the chroma weights per-pixel need to go up by 4X , in which case they become pretty close to all being the same.

Many people mistakenly say the "eye does not see blue levels well". Not true, the eye can see the overall level of blue perfectly well, just as well as red or green. (eg for images where the whole thing is a single solid color). What the eye has is very poor spatial resolution in blue.

One of the issues is that chroma subsample is a nice win for *speed*. It gives you 4X less pixels to work on in two of your planes, half as many pixels overall. This means that subsampled chroma images are almost 2X faster to decode.

I used to be anti-chroma-subsample in my early years. For example in wavelets it's much neater to keep all your color planes full res, but send your bitplanes in [YUV] order. That way when you truncate the bottom bit planes, you drop the highest frequency chroma first. But then I realized that the 2X speedup from chroma subsample was nothing to sneeze at, and in general I'm now pro-chroma-subsample.

Another reminder : if you don't chroma subsample, then you may as well do a KLT on the color planes, rather than just use YUV or whatever. (maybe even KLT per region). The advantage of using standard YUV is that the chroma are known to be perceptually okay to downsample (you can't downsample the two minor components of the KLT transformed planes because you have no guarantee that they are of a type that the eye can't perceive high frequency data).

You can obviously construct adversarial images where the detail is all in chroma (the whole image has a constant luma). In that case chroma downsampling looks really bad and is perceptually a big mistake.

Chroma-from-luma in the decoder fixes all the color fringing that people associate with JPEG, but obviously it doesn't help in the adversarial cases where there is no luma detail to boost the chroma with.

I should also note while I'm at it that there are many codecs out there that just have bugs and/or mistakes in their downsamplers and/or upsamplers that cause this operation to produce way more error than it should.

ADD : Won sent me an email with an interesting idea I'd never thought about. Rather than just jumping between not downsampling chroma and doing 2x2 downsample, you could take more progressive steps, such as going to a checkerboard of chroma (half as many pixels) or a Bayer pattern. It's probably too complex to support these well and make good encoder decisions in practice, but they're interesting in theory.

No comments:

old rants