First a correction : what I said about downsampling there is mostly wrong. I made the classic amateur's blunder of testing on too small a data set and drawing conclusions from it. I'm a little embarrassed to make that mistake, but hey, this is a blog, not a research journal. Any expectations of rigor are unfounded. For example, this is one of the test images I ran on that convinced me that downsample was bad :
aikmi -i7 qtable ; CoCg optimized joint for min SCIELAB

downsample :
262,144 -> 32,823 = 1.001 bpb = 7.986 to 1 (per pixel)
Q : 11.0000 , Co scale = Cg scale = 1.525
bits DC : 19636|5151|3832 , bits AC : 175319|38483|19879
bits DC = 10.9% , bits AC = 89.1%
bits Y = 74.3% , bits CoCg = 25.7%
rmse : 7.3420 , psnr : 30.8485
ssim : 0.9134 , perc : 73.3109%
scielab rmse : 2.200

no downsample :
262,144 -> 32,679 = 0.997 bpb = 8.021 to 1 (per pixel)
Q : 12.0000 , Co scale = Cg scale = 0.625
bits DC : 19185|13535|9817 , bits AC : 160116|39407|19091
bits DC = 16.3% , bits AC = 83.7%
bits Y = 68.7% , bits CoCg = 31.3%
rmse : 6.9877 , psnr : 31.2781
ssim : 0.9111 , perc : 72.9532%
scielab rmse : 1.980

You can see that downsample is just much worse in every way, including severely worse in SCIELAB, which doesn't even care about chroma differences as much as luma. In this particular image there are a lot of high-detail color bits, and the downsampled version looks significantly worse ; it's easy to pick out visually.
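The "Co scale = Cg scale" values in the stats above are multipliers applied to the chroma planes before quantization. For reference, here is a minimal sketch of the standard lossless YCoCg-R transform those planes come from - this is the textbook formulation, not necessarily the exact code behind the stats :

```python
# Standard lossless YCoCg-R transform (integer lifting form).
# The "Co scale / Cg scale" in the stats is an extra multiplier
# applied to the co/cg outputs before quantization.

def rgb_to_ycocg(r, g, b):
    # forward transform : exactly invertible integer lifting steps
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # inverse transform : undo the lifting steps in reverse order
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every step is a reversible lifting step, the round trip is exact for any integer RGB input.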
However, in general this is not true, and in fact downsample is often a small win.
Without further ado I present lots of stats :
|    |i0 Cg=1 Co=1|i0 Cg=0.6 Co=0.575|i7 Cg=0.6 Co=0.575|i4/i7 opt per image|i7 CoCg optimized independently per image|i7 CoCg optimized jointly per image, downsampled|
|file|rmse|scielab|rmse|scielab|rmse|scielab|rmse|scielab|Co|Cg|rmse|scielab|Co/Cg|rmse|scielab|
explanation :

output bit rate 1 bpb in all cases
parameters are optimized to minimize E = ( 2 * SCIELAB + 1 * RMSE )
RMSE is on RGB ; SCIELAB is a perceptual color-difference metric
i0 = flat quantization matrix
i7 = tweaked perceptual quantization matrix to minimize E
i4/i7 = optimized blend of flat to perceptual matrices

The table reads roughly left to right in terms of decreasing perceptual error.

"i0 Cg=1 Co=1" : flat q-matrix, standard lossless YCoCg transform without extra scaling
"i0 Cg=0.6 Co=0.575" : optimize CoCg scale for E ; interestingly this also helps RMSE
"i7 Cg=0.6 Co=0.575" : non-flat constant Q-matrix ; hurts RMSE a bit, helps SCIELAB a lot
"i4/i7 opt per image" : per-image non-flat Q-matrix ; not a big difference
"i7 CoCg optimized independently per image" : independently optimize Co and Cg for each image
"i7 CoCg optimized jointly per image downsampled" : downsample test, CoCg optimized with Co=Cg
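To make the i0 vs. i7 distinction concrete : a quantization matrix assigns one quantizer step per DCT coefficient slot, scaled by the global quality Q. A toy sketch, where the i7-style ramp values are purely illustrative (not the actual tuned matrix from these tests) :

```python
import numpy as np

def quantize(block, qmatrix, Q):
    # quantize + dequantize an 8x8 coefficient block :
    # per-slot step = qmatrix entry * global quality Q
    step = qmatrix * Q
    return np.round(block / step) * step

# i0 : flat matrix, every coefficient gets the same step
i0 = np.ones((8, 8))

# i7-style matrix (illustrative ramp, NOT the post's tuned values) :
# coarser steps at higher spatial frequencies, which RMSE notices
# but SCIELAB mostly doesn't
u, v = np.meshgrid(np.arange(8), np.arange(8))
i7_like = 1.0 + 0.25 * (u + v)
```

With a flat matrix, R/D optimizing Q trades rate against plain MSE uniformly ; the non-flat matrix shifts error into high frequencies where it's perceptually cheaper.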
On the full kodak set, downsampling is a slight net win. There are a few cases (kodim03,kodim23) where it hurts a lot like I saw before, but in most cases it is a slight win or close to neutral. The conclusion is that given the speed benefit, you should downsample. However there are occasional cases where it will hurt a lot.
I think most of the results are pretty intuitive and not extremely dramatic.
It's a little non-intuitive what exactly is going on with the per-image customized chroma scales. Your first thought might be "well, those images have different colors in them, so the color space scale is adapting to the color content of the image". That's not so. For one thing, more or less content of a certain color doesn't mean you need a different color space - it just means that that band of the color space will get more energy, and thus more bits. e.g. an image that has lots of "Co"-component colors will simply have more energy in the Co plane - that doesn't mean scaling Co either up or down will help it.
If you think about the scaling another way it's more obvious what's going on. Scaling the color planes is equivalent to using different quantizers per plane. Optimizing the scalings is equivalent to doing an R/D optimization of the quantizer of each plane. Thus we see what the scaling is doing : it's taking bits away from hard to code planes and moving them to easier to code planes (in an R/D slope sense).
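The scale/quantizer equivalence is easy to check numerically. A toy sketch (not the actual coder) : scaling a plane by s and quantizing with step q gives the same result, after undoing the scale, as quantizing the unscaled plane with step q/s.

```python
def quant(x, step):
    # scalar quantize + dequantize
    return round(x / step) * step

x, s, q = 7.3, 0.625, 2.0   # sample value, chroma scale, quantizer step

# code path with chroma scaling : scale, quantize, unscale on decode
scaled_then_quant = quant(x * s, q) / s

# equivalent code path : just use a different quantizer for that plane
direct = quant(x, q / s)

assert abs(scaled_then_quant - direct) < 1e-9
```

So "optimize the chroma scales" and "R/D optimize a per-plane quantizer" are the same knob, just written differently.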
In particular, when I visually inspected some of the more extreme cases (cases where the per-image optimized scales were a big win vs. a constant overall scale, such as kodim10) what I found was that the optimized scalings were taking bits *away* from the dominant colors. One very obvious case was on photos of the ocean. The ocean is mostly one color and is very hard to code (expensive in an R/D sense) because it's all choppy and random. The optimized scaling took bits away from the ocean and moved them to other colors that had more R/D payoff.
(BTW rambling a bit : I've noticed that x264 Psy VAQ tends to do the same kind of thing - it takes bits away from areas that are a really noisy mess, such as water, and moves them to areas that have smooth patterns and edges. Intuitively, you can guess that if an area is a mess and just really hard to code, then you should just say "fuck it" and starve it for bits, even if MSE R/D tells you it wants bits. I think also that improving an area from an RMSE of 4 to 2 is better than improving from 10 to 7, even though it's less of a distortion win. Visually there's a big difference when an area goes from "looks good" to "looks noisy", but not much of a difference when an area goes from "looks bad" to "looks really bad".)
So this is in fact not really a surprising result. We know already that heavy R/D bit allocation can do wonders for lossy compressors. There are lots more areas to explore - optimization of every coefficient in the quantization matrix, optimization of the color transform, optimization of the transform basis functions, etc. etc. - and in each case you need to be clever about the way you encode the extra rate-control side information.
ADDENDUM : I thought I should write up what I think are the useful takeaway conclusions :
1. It is crucial to apply the right kind of scaling to Co/Cg (or chroma more generally) depending on whether you downsample or not. In particular, the way most people just turn downsample on or off and don't compensate by scaling chroma is a mistake (i.e. not a fair comparison), because their scaling will be tuned for one or the other.
2. Downsample vs. no-downsample is pretty close to neutral. If you downsample for speed, that's probably fine. There are rare cases where it does hurt a whole lot though.
3. Using a non-flat Q matrix does in fact help perceptual quality significantly. And it doesn't hurt RGB RMSE nearly as much as it helps SCIELAB (helps SCIELAB by 10.35 % , hurts RMSE by 1.58 % ).
4. It does appear acceptable to use global tweaked values all the time rather than custom tweaking to each image. Custom tweaks do give you another bit of benefit, but it's not huge, thus not worth the very slow optimization step. (see DCTune, e.g.)
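Takeaway 1 can be illustrated with a toy chroma path. This is a sketch only - the scale constants are just the per-image tuned values from the example at the top of the post (1.525 with downsample, 0.625 without), not universal constants :

```python
import numpy as np

def downsample2x(plane):
    # 2x2 box filter : the usual chroma subsample
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(plane):
    # nearest-neighbor reconstruction
    return plane.repeat(2, axis=0).repeat(2, axis=1)

# hypothetical per-mode chroma scales (values from the example image
# in this post ; in general they must be retuned per configuration)
CHROMA_SCALE = {"downsampled": 1.525, "full_res": 0.625}

co = np.arange(16.0).reshape(4, 4)          # toy Co plane
s = CHROMA_SCALE["downsampled"]
small = downsample2x(co * s)                # scale, then subsample
# ... quantize and code 'small' here ; on decode, invert the scale :
back = upsample2x(small) / s
```

The point is that `small` is what the quantizer sees, so toggling the downsample without retuning the scale changes the effective chroma quantizer and makes any downsample vs. no-downsample comparison unfair.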