10-15-10 - Image Comparison Part 6 - cbwave

"cbwave" is my ancient wavelet coder from my wavelet video proof of concept. It's much simpler than JPEG 2000 and not "modern" in any way. But I tacked a bunch of color space options onto it for testing at RAD so I thought that would be interesting to see :

cbwave's various colorspaces :

log rmse :

ms-ssim-scielab :

notes :

RMSE : Obviously no color transform is very bad. Other than that, KLT is surprisingly bad at high bit rate (something I noted in a post long ago). The other color spaces are roughly identical. This coder has the best RMSE behavior of any we've seen yet. This is why wavelets were so exciting when they first came out - this coder is incredibly simple, there's no RDO or optimizing at all, it doesn't do wavelet packets or bit planes, or anything, and yet it beats PAQ-JPEG (on rmse anyway).

MS-SSIM-SCIELAB : and here we see the disappointment of wavelets. The great RMSE behavior doesn't carry over to the perceptual metric. The best color space by far is the old "YUV" from JPEG, which has largely fallen out of favor. But we see that maybe that was foolish.

cbwave also has an option for downsampling chroma, but it's no good - it's just box downsample and box upsample, so these graphs are posted as an example of what bad chroma up/down sampling can do to you : (note that the problem only appears at high bit rates - at low bit rates the bad chroma sampling has almost no effect)
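To make the box-filter issue concrete, here is a minimal 1D sketch of a 2:1 box downsample followed by a box upsample. It is not cbwave's actual code; the function names and the test ramp are made up for illustration. The round trip leaves a fixed error even with no quantization at all, which is one plausible reading of why the damage only shows at high bit rates: the floor is hidden until the coding error drops below it.

```python
def box_downsample(row):
    """2:1 box filter: average each pair of samples (assumes even length)."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row), 2)]

def box_upsample(row):
    """Box upsample: replicate each sample twice."""
    out = []
    for v in row:
        out.extend([v, v])
    return out

def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

# Hypothetical chroma row: a smooth ramp, the easiest possible case.
chroma = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
roundtrip = box_upsample(box_downsample(chroma))
err = rmse(chroma, roundtrip)  # nonzero even though nothing was quantized
```

Even on a plain ramp the reconstruction turns into stair steps, so every sample is off by half the local slope.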

log rmse :

ms-ssim-scielab :

cbwave is a fixed pyramid structure wavelet doing daub97 horizontally and cdf22 vertically; the coder is a value coder (not bitplane) for speed. Some obvious things to improve it : fix the chroma subsample, try optimal weighting of color planes for perceptual quality, try daub97 vertical, try optimal per-image wavelet shapes, wavelet packets, directional wavelets, perceptual RDO, etc.
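For readers who haven't seen it, the cdf22 transform (also known as the 5/3 wavelet) is usually written as two lifting steps. The sketch below is the textbook float form for one level on a 1D signal, with mirrored edges; it is an assumption about the general technique, not a copy of what cbwave does, and the function names are invented here.

```python
def cdf22_forward(x):
    """One level of the CDF(2,2) wavelet via lifting (float, even length)."""
    even, odd = x[0::2], x[1::2]
    n = len(even)
    # Predict: detail = odd sample minus the average of its two even
    # neighbours (mirror at the right edge).
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]
    # Update: smooth = even sample plus a quarter of the neighbouring
    # details (mirror at the left edge).
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n)]
    return s, d

def cdf22_inverse(s, d):
    """Undo the lifting steps in reverse order."""
    n = len(s)
    even = [s[i] - 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n)]
    odd = [d[i] + 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out

signal = [float(i) for i in range(8)]      # linear ramp
smooth, detail = cdf22_forward(signal)     # detail is zero away from the edge
restored = cdf22_inverse(smooth, detail)
```

A linear ramp produces zero detail coefficients except where the mirrored edge breaks the ramp, which is the whole appeal: smooth regions cost almost nothing to code.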

ASIDE : I'd like to try DLI or ADCTC , but neither of them support color, so I'm afraid they're out.

CAVEAT : again this is just one test image, so don't draw too many conclusions about what color space is best.

ADDENDUM : results on "moses.bmp" , a 1600x1600 with difficult texture like "barb" :

Again YUV is definitely best, KLT is definitely worst, and the others are right on top of each other.


ryg said...

How do you do the color-space conversions? Do you convert to 8-bit integer RGB first and go from there to S-CIELAB or do you keep float intermediates? Every discrete rounding step in the pipeline introduces a bit of round-off error, and it keeps adding up. You need to use more than 8 bits per channel for your decoded images or this rounding error biases your comparison. Easiest solution (well, not easiest really, but least likely to develop unforeseen systematic biases) overall is to do the YUV->RGB in float and store floating-point RGB values. If you don't want to use floats, at least use more than 8 bits per color channel.

The same thing goes for other rounding steps throughout the codec. Don't fully descale post-DCT, keep at least 2 bits extra and only remove them at the very end (usually during Y'CbCr->RGB). Similarly, consider keeping your YCbCr/YUV coefficients in more than 8 bits (usually 16 bits signed for SW implementations) and only clamp once at the end. If you use an IJG-style Triangle (aka bilinear) upsampling filter, do the multiplies but hold the divides/shifts (if you're working in 16 bits, you have enough leftover bits to do this). You can fold the divides into your one descaling shift at the end.

When you finally do the shift, rounding is important. At the very least, add a rounding bias of 0.5. Even better, use some unbiased tie-breaking rule like round-to-nearest-even (not really an issue if you use FP internally or have lots of extra bits, but it's significant if you're working in fixed point and only have 2-4 extra bits). Alternatively (and easier during development), just do everything in float and only round to int at the very end. Same stuff goes for the encoder. Scale your RGB->YCbCr matrix up by a small power of 2, use more than 8 bits internally for YCbCr coefs and only remove the extra scale factor during quantization. Doing all this properly can give you an improvement of 1dB PSNR for very little effort indeed, depending on the image (it's not really visible, but RMSE/PSNR is hypersensitive to this).
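The three descaling choices mentioned above (plain shift, shift with a 0.5 bias, round-to-nearest-even) can be compared with a small fixed-point sketch. This is only an illustration with made-up helper names, assuming 2 extra fractional bits; it measures the average signed error each rule introduces over a run of consecutive values.

```python
def descale_truncate(v, shift):
    """Plain arithmetic shift: always rounds toward minus infinity."""
    return v >> shift

def descale_round(v, shift):
    """Add a bias of 0.5 (in fixed point) before shifting; needs shift >= 1."""
    return (v + (1 << (shift - 1))) >> shift

def descale_round_even(v, shift):
    """Round to nearest, breaking exact ties toward the even result."""
    half = 1 << (shift - 1)
    q = v >> shift                 # floor of the quotient
    r = v & ((1 << shift) - 1)     # non-negative remainder
    if r > half or (r == half and (q & 1)):
        q += 1
    return q

def mean_bias(descale, shift, count=1024):
    """Average signed error of a descale rule over 0 .. count-1."""
    scale = 1 << shift
    return sum(descale(v, shift) - v / scale for v in range(count)) / count

SHIFT = 2  # assumed: only 2 extra bits of precision
bias_truncate = mean_bias(descale_truncate, SHIFT)
bias_round = mean_bias(descale_round, SHIFT)
bias_even = mean_bias(descale_round_even, SHIFT)
```

With so few extra bits, exact ties are a quarter of all inputs, so the 0.5 bias still leaves a systematic offset that round-to-nearest-even removes. With many extra bits ties become rare and the difference fades, which matches the point made in the comment.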

S-CIELAB first does RGB->XYZ, and CIE Y happens to be very closely aligned with YCbCr Y (and less so with the other transforms). If you treat chroma differently from luma, that means all other bases will smudge some of their chroma noise into CIE Y (and from there into L*) while JPEG YCbCr won't. IOW, given your choice of metric, it's no surprise that YUV comes out looking as good as it does.

cbloom said...

My code is all 100% float so I don't have any of those problems.

I assume that some of the other people suffer from those problems.

The problem with the cbwave downsample is just that it uses box filters.

cbloom said...

"that means all other bases will smudge some of their chroma noise into CIE Y (and from there into L*) while JPEG YCbCr won't. IOW, given your choice of metric, it's no surprise that YUV comes out looking as good as it does."

I'm not sure I buy this argument.

While my metric is in fact LAB, it's *float* LAB, and it's basically just a rotation of the color matrix. Rotation doesn't change L2 so it shouldn't be a large effect on error.

Obviously if you were downsampling chroma, then having your axes aligned should make a big difference, but the YUV basis wins pretty big even without downsampling - in the non-downsampled case the color channels are all treated the same way.

Something a little more complex is going on. I believe it must be something about the preferential directions of the discretization grid.

ryg said...

"While my metric is in fact LAB, it's *float* LAB, and it's basically just a rotation of the color matrix. Rotation doesn't change L2 so it shouldn't be a large effect on error."
Wait a minute. First off, you're not usually coding in linear RGB, but YUV derived from nonlinear RGB with gamma. First step when converting to XYZ (which is always linear) is to convert that to linear RGB first (there goes linearity of the whole transform). The RGB->XYZ matrix is *not* orthogonal and hence doesn't preserve the L2 metric (or related quantities). And CIELAB takes the CIE XYZ coefficients and applies some more nonlinear transforms on top (the remapping function, which does cube roots for the higher values and is linear in the lower parts of the range).

The remapping function partially cancels out with the gamma curve, but still, it's definitely not just a rotation.
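The non-orthogonality is easy to check numerically. The sketch below assumes the standard linear sRGB (D65) to XYZ matrix, which may not be the exact primaries used in the test, and pushes two unit-length RGB errors through it. If the matrix were a rotation both would come out with length 1.

```python
import math

# Assumed: linear sRGB (D65) -> CIE XYZ. The post does not say which
# RGB primaries the metric uses.
RGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

def mat_vec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def length(v):
    """Euclidean (L2) length."""
    return math.sqrt(sum(x * x for x in v))

# Two errors with the same L2 length in linear RGB.
len_red = length(mat_vec(RGB_TO_XYZ, [1.0, 0.0, 0.0]))
len_green = length(mat_vec(RGB_TO_XYZ, [0.0, 1.0, 0.0]))

# Gram matrix M^T M: this would be the identity for an orthogonal matrix.
gram = [[sum(RGB_TO_XYZ[k][i] * RGB_TO_XYZ[k][j] for k in range(3))
         for j in range(3)] for i in range(3)]
```

The green error comes out well over 1.5 times as long as the red one in XYZ, and the off-diagonal Gram entries are far from zero, before the degamma and the cube-root remapping are even considered.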

cbloom said...

Yeah okay, that was wrong. RGB-YUV is pretty close to a rotation, though it's not actually orthonormal either. And RGB-LAB has a degamma, but then it has a sort of regamma, but of course it doesn't preserve L2's; the whole *point* of it is to not preserve L2's, it's supposed to make the distances more perceptually uniform.

But it still doesn't make sense to me. It's not like the YUV space data is passed directly to the error metric - it goes back into integer RGB to get written out to a BMP. I figure that step is appropriate because in practice to use it we convert to 24 bit RGB to display on the screen, so that step should be included in the error metric.

Why for example is YUV so much better than KLT-FixedY which uses the same Y but chooses its own chroma axes?

There are two different issues with colorspace choice in a lossy compressor. One is how it decorrelates the data and simply puts it in a more compressible form. The other is how it rotates (and shears and scales) the quantization grid.
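The second issue can be made a little more tangible. The sketch below assumes that "YUV" means the JFIF-style YCbCr used by JPEG, and looks at the columns of the inverse matrix: each column is the RGB displacement caused by one quantization step along Y, Cb or Cr. The names are invented for the example.

```python
import math

# Assumed: JFIF-style YCbCr -> RGB. Column i is the RGB move produced by
# a unit step in channel i (Y, Cb, Cr).
YCBCR_TO_RGB = [
    [1.0,  0.0,       1.402],
    [1.0, -0.344136, -0.714136],
    [1.0,  1.772,     0.0],
]

def column(m, c):
    """Extract column c of a 3x3 matrix."""
    return [m[r][c] for r in range(3)]

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

steps = {name: column(YCBCR_TO_RGB, i)
         for i, name in enumerate(("Y", "Cb", "Cr"))}
lengths = {name: math.sqrt(dot(v, v)) for name, v in steps.items()}

# Cosine of the angle between the Y step and each chroma step in RGB.
# Both would be 0 if the grid were only rotated.
cos_y_cb = dot(steps["Y"], steps["Cb"]) / (lengths["Y"] * lengths["Cb"])
cos_y_cr = dot(steps["Y"], steps["Cr"]) / (lengths["Y"] * lengths["Cr"])
```

The three steps have different lengths in RGB and are not perpendicular to each other, so the same quantizer step size in every channel gives a skewed, unevenly scaled lattice once it is mapped back to RGB.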

I dunno, I have to think about it a bit more.

cbloom said...

I've been thinking about this a bit.

The question is, why is it such a big advantage for cbwave to use YUV, which is more similar to the measurement basis LAB than the other color spaces?

First of all let's be clear about what's NOT going on :

1. cbwave is not downsampling chroma or in any way taking bits away from chroma.

In coders that *do* downsample chroma, obviously it is a big advantage to have your color axes aligned with the measurement axes. This is because you are killing chroma data, and if your concept of chroma is rotated from the measurement basis, then the metric will believe that you are incorrectly killing some useful bits.

2. You might think it's always best to work in the measurement basis, but obviously that's not true - see the RGB colorspace results for RMSE for example. The advantage of a good decorrelating colorspace is much more valuable.

My guess is that the largest effect is the orientation of the quantization axis. And in particular, the luma axis since it's the most important one.

Maybe I'll post a picture because it's hard to explain with words.
