9/08/2008

09-08-08 - 2

Some things naive wavelet coders get wrong:

1. Extra truncations that just throw away data. One way to say this is that they do math in floats, convert to int, and then repeat that several times, losing precision at every conversion. Often they do their math "in ints" and right-shift away the extra bits, which is the same mistake. The easiest way to get this right is to read your input image into floats and do your color transform and your wavelet transform all in floats. Then when you quantize, quantize float->int and do your coding on ints. (Of course you could also just code the floats: instead of using bit masks for bit-plane coding, you check if value >= thresh.)
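As a rough illustration of the point above, here is a toy sketch. The two-level pairwise average is only a stand-in for a real wavelet low-pass, and the function names are made up for this example. It compares truncating after every level with staying in floats and converting to int once at the end.

```python
def average4_int_shifts(values):
    """Two levels of pairwise averaging, right-shifting (truncating) at each level."""
    a, b, c, d = values
    left = (a + b) >> 1
    right = (c + d) >> 1
    return (left + right) >> 1


def average4_float(values):
    """The same two levels, but kept in floats with no intermediate truncation."""
    a, b, c, d = (float(v) for v in values)
    left = (a + b) / 2.0
    right = (c + d) / 2.0
    return (left + right) / 2.0


def round_once(x):
    """Single float->int conversion at the end (assumes x >= 0)."""
    return int(x + 0.5)


pixels = [1, 2, 2, 2]
print(average4_int_shifts(pixels))         # truncated at every level
print(round_once(average4_float(pixels)))  # converted to int only once
```

The true average here is 1.75. The shift version drops a fractional bit at the first level and can never get it back, while the float version only loses precision at the one final conversion.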

2. Quantizing wrong. Quantizing is pretty simple if you think about it in terms of buckets and medians. You're taking a certain range of values and labeling it with an int. When you dequantize back to floats, you should restore to the middle of the range. (Actually you should restore to the most likely value in the range, which might not be the middle in a typical skewed distribution, but that's a very minor detail.) Many wavelet quantizers intentionally use an ESZZ (Extra Size Zero Zone) quantizer, which is an artificial way of getting more zeros. That is generally a win because the coder likes lots of zeros, and when you have zeros you don't need to send sign bits. But an ESZZ is wrong on the DC LL band! One thing people frequently get wrong is handling negatives; you have to be careful about dividing negatives and doing float-to-int conversions on them. I find it easiest to take the sign off, do my math on the magnitude, then put the sign back.
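The bucket view can be sketched like this. It is a minimal example, not any particular coder's quantizer; the function names and the `extra_zero_zone` flag are invented for illustration. The sign is stripped first, the math is done on the magnitude, and dequantization restores to the middle of the bucket.

```python
def quantize(value, step, extra_zero_zone=True):
    """Map a float to an int bucket label, handling the sign separately."""
    magnitude = abs(value)
    if extra_zero_zone:
        # Buckets are [q*step, (q+1)*step), so the zero bucket spans (-step, step).
        q = int(magnitude / step)
    else:
        # Buckets are centered on q*step, so the zero bucket spans (-step/2, step/2).
        q = int(magnitude / step + 0.5)
    return -q if value < 0 else q


def dequantize(q, step, extra_zero_zone=True):
    """Restore to the middle of the bucket that q labels."""
    if q == 0:
        return 0.0
    magnitude = abs(q)
    if extra_zero_zone:
        restored = (magnitude + 0.5) * step  # middle of [q*step, (q+1)*step)
    else:
        restored = magnitude * step          # bucket is already centered here
    return -restored if q < 0 else restored


print(quantize(-7.3, 2.0), dequantize(quantize(-7.3, 2.0), 2.0))
print(quantize(1.9, 2.0), quantize(-1.9, 2.0))  # both land in the wide zero bucket
```

Because the math is done on `abs(value)`, positive and negative inputs are treated symmetrically. The classic bug of adding 0.5 to a negative number before truncating never comes up.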

3. Not accounting for norm scaling. If all your transforms were orthonormal you wouldn't have to do this, but most wavelet transforms are not. What that means is that a value of 1 in the LL band and a value of 1 in the HH band do not have the same L2 norm after untransforming. That means you can't quantize them with the same step size, or use the same bitplane truncation. Quantizers need to be scaled per band, as do any rate-distortion functions.
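To make the norm issue concrete, here is a small sketch using an un-normalized 1D Haar pair (low = average, high = difference). The transform is chosen only because it is easy to follow and is not necessarily what any given coder uses. It untransforms a unit value in each band, measures the L2 norm, and scales the quantizer step accordingly. Dividing the base step by the norm is one common way to equalize the error contribution; the helper names are hypothetical.

```python
import math


def inverse_haar_pair(low, high):
    """Inverse of low = (a + b) / 2, high = a - b (not orthonormal)."""
    return (low + high / 2.0, low - high / 2.0)


def band_norm(band):
    """L2 norm of what a coefficient of 1 in the given band becomes after untransforming."""
    low, high = (1.0, 0.0) if band == "L" else (0.0, 1.0)
    a, b = inverse_haar_pair(low, high)
    return math.sqrt(a * a + b * b)


def scaled_step(base_step, band):
    """Quantizer step scaled so equal label errors cost equal L2 error in the image."""
    return base_step / band_norm(band)


for band in ("L", "H"):
    print(band, band_norm(band), scaled_step(1.0, band))
```

With this transform an error of 1 in the low band costs twice as much squared error in the output as an error of 1 in the high band, so the low band gets the finer step.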

4. Not sending the most important bits first (assuming you want an "embedded", aka truncatable, stream, which you do). Obviously when you are planning to chop off the end of your data you need to sort it by importance. To really get this right you need to be able to shuffle in bits from different bands. You want to send the Y LL first, then the U LL, V LL, then the Y LH, etc., but not necessarily in a fixed order; in some images you might want to send the whole Y band before you send any U. That's global-level stuff that even value-based coders should do. In the extreme case you can do things like the bit-plane coder EBCOT, which splits the bits of a row into chunks to try to get the most important bits first in the stream. Note that each bit of the compressed stream always has the same amount of information; what we want to do is put the bits that have the biggest L2 norm significance first, that is to say, bits that wind up affecting a large region of the image with large value changes.
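One simple way to picture the ordering is the sketch below. It is far cruder than what EBCOT actually does; the band norms are made-up numbers and `embedded_order` is a hypothetical helper. Each (band, bitplane) chunk is weighted by the band's L2 norm times the value of that bitplane, and chunks are emitted in descending order of weight.

```python
def embedded_order(band_norms, num_bitplanes):
    """Return (band, bitplane) chunks sorted from most to least L2 significance."""
    chunks = []
    for band, norm in band_norms.items():
        for plane in range(num_bitplanes):
            # A bit in this plane changes the coefficient by 2**plane,
            # which changes the image by roughly norm * 2**plane.
            weight = norm * (1 << plane)
            chunks.append((weight, band, plane))
    chunks.sort(key=lambda chunk: -chunk[0])  # stable sort, biggest weight first
    return [(band, plane) for _, band, plane in chunks]


# Assumed, illustrative norms: LL matters more per unit value than LH.
order = embedded_order({"LL": 4.0, "LH": 1.5}, num_bitplanes=3)
for band, plane in order:
    print(band, plane)
```

The resulting order interleaves the bands: the top LH bitplane is sent before the bottom LL bitplane, so truncating the stream anywhere drops the least significant chunks first.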
