Comments on cbloom rants: 02-10-09 - Fixed Block Size Embedded DCT Coder

2009-02-12T17:36:00.000-08:00

Oh, yeah, right, that makes sense.

2009-02-12T14:45:00.000-08:00

"Do either of the coders actually do this? They seem not to."

No, with RMSE order you pretty much always know that higher bits are more important than lower bits in any other sample. (That might not always be true, but you would need very strong predictions.) I mentioned it because in the general case it's something you would want to explore, but I haven't really done so.

"It seems like you could actually just apply the jpeg quantizer tables exactly here and then go in regular bitplane order."

You certainly could apply those tables if you want that perceptual scaling (it would hurt RMSE), but you would still need to reorder the samples to make the stream decently embedded.

"Another unrelated question: It seems weird to me that the DCT coefficients are spatially correlated"

Yeah, it's not actually spatial correlation - it's "octave" correlation between the subbands like a wavelet tree.

That is, the coefficient for frequency [k,m] is very similar to the coefficient at [2k,2m].

See Xiong et al., "A DCT-Based Embedded Image Coder".
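A minimal sketch of how that octave relation can be turned into a wavelet-style tree over one 8x8 DCT block. The function name `children` and the exact parent-to-child mapping are assumptions for illustration (the mapping follows the usual zerotree-style convention, where [k,m] points at the 2x2 group starting at [2k,2m]); it is not taken from the post's CodeTree coder.

```python
def children(k, m, n=8):
    """Return the child positions of DCT coefficient (k, m) in an n x n block.

    Hypothetical helper: assumes the common tree layout where (k, m) is the
    parent of the 2x2 group starting at (2k, 2m).
    """
    if (k, m) == (0, 0):
        # DC is special-cased: its children are the three lowest AC terms.
        return [(0, 1), (1, 0), (1, 1)]
    kids = [(2 * k + dk, 2 * m + dm) for dk in (0, 1) for dm in (0, 1)]
    # Positions that fall outside the block mean (k, m) is a leaf.
    return [(a, b) for (a, b) in kids if a < n and b < n]


if __name__ == "__main__":
    print(children(1, 1))  # the 2x2 group one octave up from (1, 1)
    print(children(4, 4))  # leaf: no children inside an 8x8 block
```

With this layout every AC coefficient has exactly one parent, so a coder can condition a child's significance on whether its parent was already significant.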

2009-02-12T14:02:00.000-08:00

"though there is the possibility of sending the top bitplanes of some coefficients before the top bitplanes of other coefficients"

Do either of the coders actually do this? They seem not to.

It seems like you could actually just apply the jpeg quantizer tables exactly here and then go in regular bitplane order. In other words, divide by the quantizers but keep fixed point results (or, equivalently, multiply by 1024/quantizer[i] or whatever). This ends up de-emphasizing the bits of the higher frequency coefficients, shifting them down to line up with lower bits of the lower-frequency components. Oh, maybe this is what you meant by "non-uniform DCT scaling". And like you say I guess if you're using RMSE as your metric that stuff wouldn't help.
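A rough sketch of that idea, assuming a fixed-point scale of 1024 as in the sentence above. The names `scale_fixed_point` and `bitplanes` and the two sample quantizer values are made up for illustration and are not from either coder in the post.

```python
def scale_fixed_point(coeffs, quant, one=1024):
    """Divide each coefficient by its quantizer, keeping fractional bits.

    `one` is the fixed-point scale (1024 here, an assumed value).
    """
    out = []
    for c, q in zip(coeffs, quant):
        mag = (abs(c) * one) // q  # scale the magnitude, then restore the sign
        out.append(-mag if c < 0 else mag)
    return out


def bitplanes(values):
    """Yield (plane, bits) from the top magnitude bitplane down to plane 0."""
    top = max(abs(v) for v in values).bit_length()
    for plane in range(top - 1, -1, -1):
        yield plane, [(abs(v) >> plane) & 1 for v in values]


if __name__ == "__main__":
    # Two equal coefficients: one low frequency (small quantizer), one high.
    scaled = scale_fixed_point([100, 100], [16, 99])
    for plane, bits in bitplanes(scaled):
        print(plane, bits)
```

In this example the coefficient with the larger quantizer first shows a 1 bit two planes later than the other, which is the "shifting down" described above. The order of samples within each plane is left untouched here.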

Another unrelated question: It seems weird to me that the DCT coefficients are spatially correlated (which I think you're saying for CodeTree), since I thought one of the things frequency-ish transforms were supposed to do was produce spikier output data?

2009-02-10T20:05:00.000-08:00

"Isn't that implicit in how much you have to reconstruct? If you failed to transmit a lot of the bits, the block must be noisy, so use a noisy profile. If you transmitted most of the bits, it's smooth, but any noise you're adding in will be in the highest frequencies, so the block will still be smooth?"

Yeah, that's a good thought, that may be an easy way to exploit it. I haven't really looked into it much. Obviously that happens implicitly to some extent.

2009-02-10T19:55:00.000-08:00

"Y is usually considered perceptually vastly more important. What if you tried two bits of shift (arbitrary shift?) - could that give even better quality?"

I was measuring quality with straight RGB RMSE. Obviously if you believe there is a perceptual difference, then you should scale Y up more, or you could subsample Co or Cg or whatever. Also it may improve perceptual quality to have non-uniform DCT scaling. I didn't want to get into any of that because it makes it hard to compare to other algorithms reliably, but yes in practice you would probably want at least the option of doing those things.
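For concreteness, here is a sketch of a color transform with an adjustable Y shift. It assumes the lossless lifting form of YCoCg (often called YCoCg-R), which may not be the exact variant used in the post; `y_shift` is a hypothetical parameter, where 1 matches the "shift Y left by one bit" choice and 2 would be the two-bit experiment asked about.

```python
def rgb_to_ycocg(r, g, b, y_shift=1):
    """Forward lifting YCoCg transform, with Y scaled up by 2**y_shift."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    # Shifting Y left makes its bitplanes arrive earlier than those of Co/Cg.
    return y << y_shift, co, cg


def ycocg_to_rgb(y, co, cg, y_shift=1):
    """Inverse of rgb_to_ycocg; exact when no bits have been dropped."""
    y >>= y_shift
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycocg(255, 0, 128))
    print(ycocg_to_rgb(*rgb_to_ycocg(255, 0, 128)))
```

Whether a larger shift helps depends on the metric: under straight RGB RMSE it would be expected to hurt, and any perceptual gain would have to be judged separately.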

2009-02-10T19:34:00.000-08:00

"if you can tell that the block is pretty noisy, then the grainy random restoration is probably better; if the block looks pretty smooth, then restoring to zero is probably better."

Isn't that implicit in how much you have to reconstruct? If you failed to transmit a lot of the bits, the block must be noisy, so use a noisy profile. If you transmitted most of the bits, it's smooth, but any noise you're adding in will be in the highest frequencies, so the block will still be smooth?
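One way to read that suggestion as code, purely as a sketch: pick the restoration of an untransmitted coefficient from the number of bitplanes that never arrived. The function `restore`, the `noisy_threshold` cutoff, and the midpoint rule for nonzero values are all assumptions for illustration, not anything the post specifies.

```python
import random


def restore(value, missing_bits, noisy_threshold=3, rng=random):
    """Fill in the low `missing_bits` bits of a truncated coefficient.

    `value` holds only the transmitted high bits (low bits are zero).
    `noisy_threshold` is a hypothetical cutoff: at or above it the block is
    treated as noisy and zeros get a random restoration.
    """
    step = 1 << missing_bits
    if value != 0:
        # Known-significant coefficient: restore to the middle of its bucket.
        half = step >> 1
        return value + half if value > 0 else value - half
    if missing_bits >= noisy_threshold:
        # Many bits missing: grainy random restoration inside the dead zone.
        return rng.randint(-(step - 1), step - 1)
    # Few bits missing: the block is likely smooth, so restore to zero.
    return 0


if __name__ == "__main__":
    print(restore(8, 3), restore(-8, 3), restore(0, 2))
    print(restore(0, 4, rng=random.Random(1)))
```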

2009-02-10T19:15:00.000-08:00

"That shifts the way the bit planes relate to RGB errors. We could remember this and step through the planes differently, but it's easier just to shift Y left by one bit."

Y is usually considered perceptually vastly more important. What if you tried two bits of shift (arbitrary shift?) - could that give even better quality?