10-08-10 - Optimal Baseline JPEG

One of the things we are missing is a really good modern JPEG encoder/decoder. I mentioned most of this in the WebP post, but I thought it was important enough to repeat. This would be a great project if someone wants to do it; I'd like to. I think it's actually important, not just as a fair comparison between modern coders like x264 and good old JPEG, but also because it would actually be useful to people who care about JPEG images (eg. a common use case is that you have some old JPEG and you want to decode it as well as possible).

Using normal JPEG code streams, but trying to make the encoder & decoder as good as possible, you should do something like :

Encoder :

  • RDO based structure; eg. the encoder is given lambda and finds the optimal R/D point. Unfortunately this has to be iterative because of the Huffman codes: decisions in one pass affect the Huffman codes for the next pass.

  • A good perceptual metric to target. Maybe SSIM or x264's funny SATD activity thing, or something else.

  • Trellis quantization; the JPEG-huff code block structure lends itself to trellis state optimization pretty directly.

  • Better chroma subsample (aware of the up-filter).

  • Quant matrix optimization for perceptual metric.
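
To make the lambda idea concrete, here is a minimal sketch of a per-coefficient R/D decision. It is not trellis quantization: each coefficient is decided greedily, ignoring zero runs and end-of-block. The rate model is an assumption, too: a flat `symbol_bits` cost stands in for the real Huffman code length, which an actual encoder would read from the current table and then re-iterate. The names `rd_quantize` and `magnitude_bits` are hypothetical.

```python
def magnitude_bits(level):
    """Extra magnitude bits JPEG spends on a nonzero level (its size category)."""
    return abs(level).bit_length()


def rd_quantize(coef, q, lam, symbol_bits=4.0):
    """Pick the quantized level minimizing D + lam * R for one DCT coefficient.

    coef        : unquantized coefficient
    q           : quantizer step for this coefficient
    lam         : Lagrange multiplier trading distortion against rate
    symbol_bits : ASSUMED flat cost of the Huffman symbol (placeholder only)
    """
    nearest = int(round(coef / q))

    # Candidates: the nearest level, one step toward zero, and zero itself.
    candidates = {nearest, 0}
    if nearest > 0:
        candidates.add(nearest - 1)
    elif nearest < 0:
        candidates.add(nearest + 1)

    best_level, best_cost = 0, float("inf")
    for level in sorted(candidates, key=abs):
        dist = (coef - level * q) ** 2
        # Simplification: a zero is treated as free; really it extends a run.
        rate = 0.0 if level == 0 else symbol_bits + magnitude_bits(level)
        cost = dist + lam * rate
        if cost < best_cost:
            best_level, best_cost = level, cost
    return best_level
```

With `lam = 0` this reduces to plain rounding; raising `lam` pushes small coefficients to zero, which is where most of the rate savings come from.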

Decoder :

  • Deblocking filter, or maybe the "Unblock" histogram non-filter approach or some combination.

  • Luma-aided chroma upsample

  • Expectation-in-bucket instead of center-of-bucket dequantization.

  • Noise reinjection, perhaps predicting where some of the zeros in the DCT should in fact be small non-zeros.

  • Shape-aware deringing; similar to camera denoisers, there's a lot of work on this in the literature.
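
The expectation-in-bucket item can be sketched in a few lines. This assumes the AC coefficients follow a Laplacian distribution with scale `b`, which is a common modeling choice but an assumption here; a real decoder would have to estimate `b` per frequency from the decoded data. The function name `laplacian_dequantize` is hypothetical. Instead of restoring to `level * q`, it restores to the expected value of the coefficient inside the quantization bucket.

```python
import math


def laplacian_dequantize(level, q, b):
    """Restore a quantized level to the expected value within its bucket.

    level : quantized integer level
    q     : quantizer step
    b     : ASSUMED Laplacian scale of the unquantized coefficients
    """
    if level == 0:
        return 0.0  # the zero bucket is symmetric, so its expectation is 0

    w = float(q)
    near_edge = (abs(level) - 0.5) * w  # bucket edge closest to zero

    # Mean of an exponential density truncated to [near_edge, near_edge + w].
    centroid = near_edge + b - w / math.expm1(w / b)
    return math.copysign(centroid, level)
```

For example, with `q = 8` and `b = 4`, level 1 restores to about 6.75 rather than 8, because the density is heavier on the side of the bucket nearer zero. As `b` grows the density flattens and the result approaches the usual bucket center.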

1 comment:

ryg said...

Encoder-wise, there's some (fairly old) papers that at least implement basic RDO and iterative optimization, e.g. this: http://www.ece.umassd.edu/FACULTY/acosta/ICASSP/Icassp_1995/pdf/ic952331.pdf (There's also http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4379276 which is considerably more recent but behind a pay wall; the abstract suspiciously sounds like the authors did exactly the same stuff as in the 1995 paper without being aware of it).

JPEG is different from most audio/video formats in that the vast majority of JPEG-supporting apps use the same encoder/decoder (the IJG JPEG lib). Getting a half-decent implementation of one feature into the IJG lib is orders of magnitude more valuable than an excellent implementation of the same feature in a new library (or standalone encoder/decoder with source).
