07-07-09 - Small Image Compression Notes , Part 2

Deblocking survey :

There are a few different ways to come at the problem theoretically.

One is to work on post-decode data in spatial domain. These approaches basically work by explicitly trying to detect block edges and just filter them. This is the approach, for example, of the H264 "in loop deblocking filter" which there is a lot of literature on. See for example "Adaptive Deblocking Filter" by List, Joch, et.al. For an example of the filter-based approach on the 8x8 DCT case see "DCT-Based Image Compression using Wavelet-Based Algorithm with Efficient Deblocking Filter" by Yan and Chen. (BTW the JPEG standard contains a "block smoother" which basically predicts AC1 as a linear function from neighboring block coefficients. This is okay for the specific case of smooth images and very high quantization, but is generally not awesome and is an ancient technique. Ignore.)

A more hardcore version of the filtering approach is "Combined Frequency and Spatial Domain Algorithm for the Removal of Blocking Artifacts" which does adaptively-offset and adaptively-directed gaussian filters ; this is sort of like the image denoising stuff that creates pixel gradient flow vectors - the filters are local gradient adaptive so they don't go across real edges. This appears to perform quite well but is very expensive.

The other general approach is a more abstract maximum-likelihood idea. You received a lossy compressed image I. You know the original image was one of the many which when compressed produces I. You want to output the one that was most likely the true original image. This is a maximum likelihood problem, and requires some a-priori model of what you think "natural" images look like. In particular, for the case of quantized DCT coefficients, you have a quantized DCT coefficient C ; instead of just reproducing Q*C you can reproduce anything in the range { Q*C - Q/2 , Q*C + Q/2 } , and you should choose the thing in that range that makes the "best" image.

"Optimal JPEG Decoding" (1998) by Jung, Antonini, Barlaud takes this approach directly. Their results are not awesome though; presumably because their prior is not good. A more modern version of the same idea is "Block Artifact Reduction Using a Transform-Domain Markov Random Field Model" by Li and Delp which uses a better model for image likelihood, but is in the same vein of doing a brute force search in the allowed coefficient space to find the maximum-likelihood reproduction.

A related method that was popular for a while is "Projection onto Convex Sets". This is basically just a method of satisfying simple convex constraints in an optimization. Here our constraint is that the quantized coefficient stay the same, that is, repro in { Q*C - Q/2 , Q*C + Q/2 } . You then apply some target function, such as you want smoothness or something, and take iterative steps towards that goal and project onto the constraints one by one. There are a lot more details to this, I haven't paid too much attention to it because these are all crazy expensive and I want something realtime.

"Blocking Artifact Detection and Reduction in Compressed Data" by Triantafyllidis etal (2002) is in the same vein but simpler and more analytical. It again worse directly in DCT space on coefficients within their quantization range, but it directly solves for the ideal reconstruction value as a function of neighbors based on minimization of specific simple deblocking metric. You wind up with just some equations for how to modify each coefficient in terms of neighbor coefficients. While the paper is good, I think one of their base assumptions - that the frequencies can be dealt with independently - is not sound, and most other people do not make that assumption.

"Derivation of Prediction Equations for Blocking Effect Reduction" by Gopal Lakhani and Norman Zhong (1999) is an older, simpler still version of the Triantafyllidis paper. They only correct the first few coefficients and solve for optimal reconstruction to minimize MSDS (mean squared difference of slopes). You can actually look at the equations here and they're very intuitively obviously right. For example, the first AC coefficient should be corrected using the difference of the neighboring DC coefficients. In case you don't see that that's obviously right, if you have DC's like [8],[16],[24] after dequantization at Q=8, and your AC's all got quantized to zero, obviously the original image most likely had a smooth slope, so the first AC in the middle block should be predicted to be the linear interpolation.

An interesting one I found that's related to the stuff I tried with smooth reconstruction of the DC band is : "Improvement of DCT-based Compression Algorithms Using Poisson�s Equation" by Yamatani and Saito (2006) .

BTW a related issue that often comes up is the incorrectness of center dequantization of AC coefficients. I've written about this before and lots of these papers mention it; the best full note on it is : "Biased Reconstruction for JPEG Decoding" by Price.

The very modern stuff has gotten quite arcane. People now are doing things like directional overcomplete wavelets on the reproduced image; with this they can detect both block artifacts and also ringing and other quantized transform artifacts. They then use maximum-likelihood markov models to guess what the source image was that produced this output. This stuff is extremely complex and I haven't really followed it because it's nowhere near realtime, but probably the best solution for offline very high quality JPEG decoders.

An interesting outlier is John Costella's Unblock . It's based on a clever simple idea that I've never seen anywhere else. Unblock is based on the assumption that pixels near the block boundaries come from the same model as pixels in the centers of blocks. That sounds obvious but it's quite profound. It means that pixels near the edges of blocks should have the same statistics as pixels in the centers (in the maximum likelihood lingo, this is a prior we can use to choose an optimal output). In particular, it's useful because in the DCT the interior pixels are much more accurate than the edge pixels. What Unblock does is looks at the statistics of the decompressed interior pixels and assumes those are our goal, and then it forces the pixels near the edge to match the statistics of the interior. The corrections are applied as wide smooth filters.


Autodidactic Asphyxiation said...

So Sean's idea of using the midpoint of quantization regions, and the papers' constraint of staying within the quantization regions, during decompression seem to be motivated to stay within what could be considered an up-to-spec JPEG decompressor. It doesn't really look like Unblock does that.

How desirable is it to stay within the quantization regions? It seems like you have to for quantitatively accurate reconstruction. Perhaps it is possible to try to match block boundaries with block center statistics by manipulating the AC coefficients instead of the post-decoded colors.

Also, I would suppose that knowing about the deblocking filter in advance would help compression somewhat, a la h.263/4's in-look deblocking. Of course, I have no idea how such things actually would work.

cbloom said...

"It doesn't really look like Unblock does that."

Yeah, Unblock is just a spatial filter in the end so it doesn't know about maintaining the AC coefficients. (of course you could use Unblock and then DCT the result and do POCS to iteratively enforce that constraint)

"How desirable is it to stay within the quantization regions?"

Obviously it's theoretically appealing because it gives you an output image that actually compresses to the same stream you decoded. In practice that aspect is not actually important - we just want something that looks good.

The crucial thing that it does is ensure that you aren't doing much harm.

"Also, I would suppose that knowing about the deblocking filter in advance would help compression somewhat,"

Yes, actually my encoder implicitly does this because my Lagrange search finds the optimal lambda including deblocking.

Autodidactic Asphyxiation said...

"Yes, actually my encoder implicitly does this because my Lagrange search finds the optimal lambda including deblocking."

For your custom image format, or for JPEG?

cbloom said...

Custom, but you could easily enough do the same thing for JPEG.

I've been thinking you could make a pretty damn good format using the JPEG bitstream and an optimizing encoder, but you would need a custom decoder to get the full benefit so there's not really a lot of point to using the JPEG bitstream. (you need a custom decoder to do your deblocking as well better dequantization)

Autodidactic Asphyxiation said...

"I've been thinking you could make a pretty damn good format using the JPEG bitstream and an optimizing encoder, but you would need a custom decoder to get the full benefit so there's not really a lot of point to using the JPEG bitstream."

Except you get backwards compatibility with all those JPEG decoders out there (like the one I'm typing in now)? I wonder how bad the naive/legacy decoder would be for something that presumed deblocking and a special quantizer?

astrange said...

I really recommend not reading papers and reading the x264 source instead. It's much more practical.

You'd also find out the real reason H.264 is so good for intra - it's not really because of DCT, since it doesn't even use DCT but just a cheap approximation, but more because of the large set of predictors, DC hadamard, and especially the use of CABAC over VLC coding. And x264's major use of adaptive quantization helps too.

cbloom said...

"but more because of the large set of predictors"

There are only 9 right, and they're just block gradient predictors. It is somewhat interesting and surprising how much that does seem to help. I did some experiments with doing more advanced versions of that.

"DC hadamard"

That's only for sets of four 4x4 blocks right?

"and especially the use of CABAC over VLC coding"

Well, that's silly, compared to what? CABAC is obviously very poor compared to a real coder like ECECOW. I keep finding papers that say how great CABAC is compared to VLC, but it's hard for me to tell if it's actually good compared to a real arithmetic coder.

"And x264's major use of adaptive quantization helps too."

This is something I'm very interested in. I wonder how much adaptive quantization and trellis-truncation helps.

old rants