2/23/2010

02-23-10 - Image Compresson - Color , ScieLab

The last time I wrote about anything technical it was to comment on image coding perceptual targets and chroma . Let's get into that a bit more.

There are these standard weapons available to us : 1. Colorspace transform (lossy or lossless) , 2. Relative scaling of color channels, 3. Downsampling , 4. Non-flat quantization matrices.

Many image compressors use some combination of these. For example, JPEG uses YCbCr colorspace, which has a built-in down scaling of the chroma channels, also optionally downsamples chroma, and also usually uses a very high-frequency-killing quantization matrix. The result is that chroma is attacked in many ways - the DC accuracy is destroyed by the scaling in the color conversion as well as the [0] entry of the quantization matrix, and high frequency info is killed both by downsampling and the high entries in the quantization matrix.

But is this good? Obviously it's all bad in terms of RMSE (* not completely true, but close enough), so we need something that approximates the human eye's less sensitie chroma receptors.

For a long time I put off this question because it seemed the only way to attack it was by showing a ton of images to test subjects and asking "is this better?". (Furthermore, there's the ugly problem that any perceptual metric is heavy tied to viewing conditions, and without knowing the viewing conditions you may be optimizing for the wrong thing). But maybe I found a solution.

Let me be clear briefly that I am here only trying to address the issue of how the human eye sees chroma vs luma. This is not a full "psychovisual perceptual metric" which would have to account for the brain identifying areas of noise vs. areas of smoothness, repeated patterns, linear ramps, etc. Basically the only thing I'm trying to capture here is the importance of luma bits vs. chroma bits.

Well, it turns out there's this thing from color research called SCIELAB . You may be familiar with "CIE LAB" aka the "Lab color space" which is considered to be pretty close to "perceptually uniform" , that is 1 unit of distance between two Lab colors has the same perceptual error importance no matter what the two colors are. Well SCIELAB is the extension of CIELAB to images (not just single colors). You can read the paper at that link (or see links below), but the basic thing it does is very simple :

SCIELAB takes the image and transforms it to "opponent color" (luma, red-green, and blue-yellow) , which is roughly the color space that eyes use to see light (rods see luma, cones see chroma) (note that here we are transforming "pixel values" into real light values, so we have to make an assumption about the brightness and color calibration of the viewing device). In opponent color space, each channel is filtered. The filter used for each channel represents the angular resolution that a rod or cone has. Basically this is a gaussian whose sdev is proportional to the angular resolution in that channel. This depends on the DPI of the viewing device and the viewing distance (eg. how many pixels fit into one degree at the eye). The gaussian is narrow for luma, indicating good precision, and wider for chroma. The filter also has a wide negative lobe around the center peak, which captures the fact that we see values as relative to their neighborhood - eg. 100 on a background of 10 looks brighter than 100 on a background of 50.

The gaussian filters represent the probability of a photon from a given pixel hitting and activating a rod or cone. The wider filters for chroma indicate that a half-toned image in red-green or blue-yellow will be indistiguishable from the original at a much shorter distance than a half-toned luma image.

One you do this filtering, you transform back to CIELAB and then you can just do a normal MSE to create a "delta E". (CIE also defines a more elaborate more uniform "delta E" metric for LAB , but for our purposes the plain L2 distance is very close and much simpler). The result is a "SCIELAB delta E" metric that is analytic and can be used in place of MSE for comparing images. Having this SCIELAB metric now lets us try various things and it tells us whether they are perceptually better or not (in terms of optical perception of color, anyway).

So far as I know this has never been used in the mainstream image compression literature ; the only place I found it was this Stanford school project tech report : Direction-Adaptive Partitioned Block Transform for Color Image Coding . This paper is pretty interesting; they aren't actually doing anything with the DA-PBT , they're just evaluating color spaces and how to do color coding starting with a grayscale image compressor.

Let's go through the EE398 paper in detail.

First they use YCbCr because they claim it produces better scielab results than RGB. True enough, but there were a lot of other color spaces to try. Furthermore, they don't mention this, but they are using the JPEG style YCbCr, which has a built in 0.5 scaling of the chroma channels (chroma should have a range of [-256,256] but JPEG offsets and scales to put it back into [0,256]) - they have effectively killed the chroma precision by using YCBCr.

They then look at whether sub-sampling helps or not. They find it to be roughly neutral - but when you try subsampling or not subsampling you should also try optimizing all other free options (scaling of the chroma channels, quantization matrix).

The most interesting part to me is "Rate Allocation". They try giving different fractions of the bit budget to Y or CbCr. They find that optimal delta E almost always occurs somewhere around Y bits = 66% of the total , that is the bit ratios are like [4:1:1]. In order to acheive this ratio they had to use small quantization step sizes for CbCr than Y, but that is an anomaly because of the fact that the YCbCr they use has killed the chroma - if you use a non-scaling YCbCr you would find that the chroma quantization values should be *larger* than luma to acheive the 66% bit allocation. (note that using different quantization values on each channel is equivalent to scaling the channels relative to each other).

They also found that using non-uniform quantization matrices (ala JPEG) hurt. I believe this was just an anomaly of their flawed testing methodology.

This paper was the most serious study of color in image compression that I've ever seen, but is still flawed in some simple ways that we can fix. The big problem is that they make the classic blunder of many people working in compression of optimizing parameters one by one. That is, say you have a compressor with options {A,B,C}. The blunderer finds the optimal value for option A and holds that fixed, then the optimal for B, then the optimal for C. They then try out some experimental new mode for step A, and their tests show it doesn't help - but they failed to retry every option for B and C in the new mode for A. eg. for example something like downsampling might hurt if you're using YCbCr, but say you use some other color space, or scale your colors in some way, or whatever, then downsampling might help and the result of doing all those steps together may be the best configuration.

Let's go back through it carefully :

First of all, the color conversion. Let me note that we use the color conversion in image compression for really two separate purposes which are mixed up. One use is for decorrelation (or energy compaction if you prefer) - this helps compression even for lossless mode. The second is for perceptual separation of chroma from luma so that we can smack the chroma around. Obviously here we need a color transform which gives us {luma/chroma} separation - that is, we cannot use something like the KLT which doesn't necessarilly have a perceptual "luma" axis.

From my earlier color studies, I found that YCoCg produces good results, usually within 1% of the best color transform on each image, so we'll just use that. But we will be careful and use a float <-> float YCoCg which doesn't scale any of the channels.

We will then scale Y relative to CoCg. This scaling is equivalent to variable quantizers and is (one of the ways) how we will control the bit allocation to Y vs. Chroma. This scaling gives you a difference in "value resolution" , it doesn't kill high frequencies.

You can then optionally downsample chroma. Note that in naive tests I have found in the past that downsampling chroma sometimes helps visual quality; and in fact in some cases it even helps MSE measured on the RGB data. I now know that that was just an anomaly due to the fact that I wasn't considering chroma scaling. That is, downsampling was just a crude way of allocating fewer bits to chroma, which does in fact sometimes help, but if you also have the ability to change the chroma bit allocation by relative scaling of the channels, the advantage of downsampling vanishes.

I optimized the scaling of CoCg relative to Y on lots of images. Obviously the true optimum value is highly image dependent (you could compute this per image and store it with the image of course), but in most cases a scale near 0.7 is optimal if you are not downsampling, and a scale near 1.1 is close to optimal when downsampling ( 1.0 is not bad when downsampling ). When not downsampling, the optimal bit allocation is usually in the area of Y ~= 66% of the bits, as seen in the EE398 paper. When downsampling, the optimal bit allocation tends to be closer to Y = 80% of the bits. Downsampling generally hurts RGB MSE and SCIELAB delta E, but I find it sometimes helps RGB SSIM.

Obviously downsampling is resulting in more bits being used on luma, which means you'll have sharper edges and better preservation of texture and a visual appearance of more "detail", at the cost of the color values being far off. By my own examination, I often will find that if I just stare at the image made from downsampled chroma it looks "better" - eg. I see more edge detail, and it has less of that obvious appearance of being compressed, eg. less ringing artifacts, halos, stair-steps, etc. However, when I switch back and forth between the original and the compressed, the version made from downsampled chroma shows obvious color errors. The version made from non-downsampled chroma obviously has much better color preservation, but appears generally blurrier, has more block artifacts, etc. The non-downsampled version wins according to "delta E" , but by my eyes I can't really clearly say one is better than the other, they're just different errors.

The last tool we have is a non-uniform quantization matrix. NUQM lets us give more bits to the low frequencies vs. the high frequencies. Generally NUQM hurts MSE, but it might help "delta E" , because SCIELAB accounts for the "fuzziness" of human visual (insensitivity to high frequency pattern). To test this, what we need to try is various different NUQM's for both luma and chroma, as well as optimizing the relative scaling value in each case. I haven't completed this yet, but early results show that NUQM's do in fact help delta E. Note that I'm not talking about doing a per-image optimal NUQM like "dctopt" does or something, just finding something like the JPEG style skewed matrix to use globally.

Some numbers for example :


On a 512x512 color image of a face , at 1.0 bits per pixel , 
optimizing quality at constant bit rate


baseline : delta E = 2.2933

not downsampled , optimal CoCg scale = 0.625 : delta E = 2.155  (bits Y = 72%)

    downsampled , optimal CoCg scale = 1.188 : delta E = 2.381  (bits Y = 80%)

best NUQM and scaling (no downsampling) : delta E = 1.899  (bits Y = 61%)

( JPEG delta E = 2.7339 )

One thing I notice that NUQM does obviously is give a lot more bits to the DC's. In this case :


not downsampled, same cases as previous
UQM = uniform quantization matrix

 UQM , bits DC = 15.4% , Q = 14.0  , delta E = 2.155 , bits Y = 72%

NUQM , bits DC = 19.4% , Q =  7.25 , delta E = 1.899 , bits Y = 61%

Here Q is the quantizer of the DC component of Y - in the UQM case all Q's are the same (though the Q for chroma is effectively scaled). In the NUQM case the higher frequency AC components get much higher Q's. We can see from the above that because of NUQM, the quantizer for the DC can be much lower at the same bit rate.

Personal visual inspection indicates that the NUQM images just have much more "JPEG-like" artifacts. That is, they generally look more speckly. They obviously preserve flat areas and simple ramps somewhat better. The tradeoff is much worse ringing artifacts and destruction of high frequency detail like fine edges. (in my case the lower Q from NUQM also means a much weaker deblocking filter is used which may be part of the reason for more speckly appearance).

In any case, NUQM clearly helps delta E due to the ability to take bits away from the high frequency chroma data - much better than just scaling and downsampling can.

This is all very interesting and promising, but we have to ask ourselves at some point - how much do we trust this "scielab delta E" ? eg. by optimizing for this metric are we actually making better results? More and more I am convinced that the biggest thing missing from data compression is a better image quality metric (and then once you have that, you need to go back to basics and re-test all your assumptions against it in the correct way).

Color links :

Working Space Comparison sRGB vs. Adobe RGB 1998
Welcome to IEEE Xplore 2.0 Using SCIELAB for image and video quality evaluation
Video compression's quantum leap - 12112003 - EDN
Useful Color Equations
Useful Color Data
Standard illuminant - Wikipedia, the free encyclopedia
SpringerLink - Book Chapter
S-CIELAB Matlab implementation
References related to S-CIELAB
Lab color space - Wikipedia, the free encyclopedia
IEEE Xplore - Login
help - sRGB versus Adobe RGB (1998)
efg's Chromaticity Diagrams Lab Report
CIECAM02 - Wikipedia, the free encyclopedia
Chromatic Adaptation
Brian A. Wandell -- Reference Page
Ask a Color Scientist!
A top down description of S-CIELAB and CIEDE2000. Garrett M. Johnson. 2003; Color Research & Application - Wiley InterScienc
A proposal for the modification of s-CIELAB

4 comments:

cbloom said...

BTW I should note here that I've been noticing something else funny, which is that there are really two different ways to rate "perceptual quality".

One is by "how close is this to the original". (eg. toggling with the original and assigning a "distance")

Another is simple "how good does this look" (without comparing to an original). (eg. just looking at dest and assigning a "quality")

I think that scielab delta E is pretty good for the first, but not for the second. Similarly, SSIM seems to be pretty strong for the second but not so much for the first. I think that the x264 SATD might be more of the second than the first.

Anonymous said...

Did you a/b downsampled vs. non-downsampled chroma with the exact same bit allocation? Your description of the comparison sounds like it was of a version that was changing the Y bits (that is, you seem to be describing the comparison larlge in terms of the effect on Y, but I may be misunderstanding).

The other thing about chroma downsampling is that it's done for speed. If you can make it look just as good, the argument goes the other way: if we're going to crush our chroma high-frequencies a lot anyway, we might as well downsample the chroma because then that helps our decompression speed by halving the total decoded samples and looks just as good.

cbloom said...

"Did you a/b downsampled vs. non-downsampled chroma with the exact same bit allocation?"

No, in each case I let it optimize the bit allocation to minimize delta E.

"The other thing about chroma downsampling is that it's done for speed."

Yeah, I should've mentioned that. Obviously if it's at all close you want to downsample. The EE398 paper has a silly line where they say like "downsampling is almost neutral so we don't do it". Wait, if it's neutral of course you DO do it. But I'm finding that it's not actually very close to neutral if you believe delta-E.

BTW I believe there's a possibility for a much higher quality chroma upsample (using the luma information) that would change this perhaps.

won3d said...

You could do luma-guided upsampling of chroma (in-loop or not). I imagine it is similar but simpler than the RAW image demosaicing. I guess you suggest this, already.

What about higher levels of chroma downsampling? Or downsampling the two opponent color channels differently amounts I would guess that the red-green opponent channel is much more sensitive to downsampling than orange-blue -- consider how much the V channel is compressed in NTSC. You could probably get away with more than 2x there.

From a perceptual standpoint, even though you are less sensitive to orange-blue spatial details, it is actually more sensitive to gradations, so you might not want to quantize it as much. During a CES I went to, the Sony people were using blue gradients to demonstrate the lack of banding on their HDMI 1.3 Deep Color TVs. At least, I think I am remembering this correctly...

old rants