01-12-11 - ImDiff Sample Run and JXR test

This is the output of a hands-off fully automatic run :

(on lena 512x512 RGB ) :

I was disturbed by how badly JPEG-XR was showing, so I went and got the reference implementation from the ISO/ITU standardization committee and built it. It's here.

They provide VC2010 projects, which is annoying, but it built relatively easily in 2005.

Unfortunately, they just give you a bunch of options and not much guidance on how to get the best quality for a given bit rate. Dear encoder writers : you should always provide a mode that gives "best rmse" or "best visual quality" for a given bit rate - possibly by optimizing your options. They also only load TIF and PNM ; dear encoder writers : you should prefer BMP, TGA and PNG. TIF is an abortion of an over-complex format (case in point : JXR actually writes invalid TIFs from its decoder - the tags are not sorted correctly).
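The TIF complaint is easy to check mechanically: the TIFF 6.0 spec requires the entries in an IFD to be sorted in ascending tag order. A minimal sketch of a checker (pure stdlib; only reads the first IFD, and assumes a well-formed header otherwise):

```python
import struct

def tiff_tags_sorted(data: bytes) -> bool:
    # Byte-order mark: "II" = little-endian, "MM" = big-endian
    if data[:2] == b"II":
        end = "<"
    elif data[:2] == b"MM":
        end = ">"
    else:
        raise ValueError("not a TIFF")
    magic, ifd_off = struct.unpack(end + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic")
    # First IFD: 2-byte entry count, then 12-byte entries; tag is the
    # first 2 bytes of each entry.
    (count,) = struct.unpack(end + "H", data[ifd_off:ifd_off + 2])
    tags = [struct.unpack(end + "H",
                          data[ifd_off + 2 + 12 * i: ifd_off + 4 + 12 * i])[0]
            for i in range(count)]
    return tags == sorted(tags)
```

Run against a TIF written by the JXR reference decoder, this should return False if the tag-ordering bug described above is present.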

There are two ways to control bit-rate, either -F to throw away bit levels or -q to quantize. I tried both and found no difference in quality (except that -F mucks you up at high bit rate). Ideally the encoder would choose the optimal mix of -F and -q for R/D. I used their -d option to set UV quant from Y quant.
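The "best quality for a given bit rate" mode wished for above is essentially a search over the quantizer. A minimal sketch: bisect -q until the compressed size fits a byte budget. Here `encode_size` is a hypothetical callback mapping a quant value to output size in bytes (in practice it would shell out to the reference encoder with -q and stat the result); it assumes size decreases monotonically as quant increases, which holds for typical coders.

```python
def find_quant(encode_size, target_bytes, lo=1, hi=255):
    """Return the smallest quant whose output fits in target_bytes.

    encode_size : callable, quant -> compressed size in bytes
                  (hypothetical; e.g. run the encoder and stat the file)
    """
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if encode_size(mid) <= target_bytes:
            best = mid          # fits the budget; try a finer quant
            hi = mid - 1
        else:
            lo = mid + 1        # too big; quantize harder
    return best
```

A real version would also search over the -F / -q mix and pick whichever hits the budget with the lowest distortion, which is exactly the R/D optimization the format's tools don't do.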

There are three colorspace options - y420,y422,y444. I tried them all. With no further ado :

Conclusions :

This JXR encoder is very slightly better than the MS one I was using previously, but doesn't differ significantly. It appears the one I was using previously was in YUV444 color space. Obviously Y444 gives you better RMSE behavior at high bitrate, but hurts perceptual scores.

Clearly the JXR encoders need some work. The good RMSE performance tells us it is not well perceptually optimized. However, even if it were perceptually optimized, it is unlikely it would be competitive with the good coders. For example, Kakadu already matches it for RMSE, but kills it on all other metrics.
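For reference, the RMSE numbers being compared here map directly onto the more familiar PSNR dB scale (for 8-bit data, peak = 255):

```python
import math

def psnr_from_rmse(rmse, peak=255.0):
    # Standard relation: PSNR (dB) = 20 * log10(peak / RMSE)
    return 20.0 * math.log10(peak / rmse)
```

So a coder whose RMSE is 10% lower than another's is about 0.9 dB better in PSNR, which is why "matches it for RMSE" and "matches it for PSNR" are the same statement.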

BTW you may be asking "cbloom, why is it that plain old JPEG (jpg_h) tests so well for you, when other people have said that it's terrible?". Well, there are three main reasons. #1 is they use screwed up encoders like Photoshop that put thumbnails or huge headers in the JPEG. #2, and probably the main reason, is that they test at -3 or even -4 logbpp, where jpg_h falls off the quality cliff because of the old-fashioned huffman back end. #3 is that they view the JPEG at some resolution other than 1:1 (under magnification or minification); any image format that is perceptually optimized must be encoded at the viewing resolution.
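To make the logbpp scale concrete: it's just log2 of bits per pixel, computed from the compressed size and the image dimensions.

```python
import math

def bpp(file_bytes, width, height):
    # bits per pixel = total compressed bits / pixel count
    return 8.0 * file_bytes / (width * height)

def logbpp(file_bytes, width, height):
    return math.log2(bpp(file_bytes, width, height))
```

For the 512x512 lena used here, logbpp of -3 means 0.125 bpp, i.e. a 4096-byte file; logbpp of -4 is a 2048-byte file, deep in the territory where huffman-coded JPEG falls apart.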

One of the ways you can get into that super-low-bitrate domain where JPEG falls apart is by using images that are excessively high resolution for your display, so that you are always scaling them down for display. The solution of course is to scale them down to the viewing resolution *before* encoding. (eg. lots of images on the web are actually 1600x1200 images, encoded at JPEG Q=20 or something very low, and then displayed on a web page at a size of 400x300 ; you would obviously get much better results by using a 400x300 image to begin with and encoding at higher quality).
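The "downscale before encoding" step is trivial; sketched here as a plain box filter on a grayscale pixel array so it's self-contained (a real pipeline would use a proper resampler like Lanczos in an image library):

```python
def box_downscale(pix, w, h, factor):
    """Average factor x factor blocks of a row-major w x h grayscale image."""
    ow, oh = w // factor, h // factor
    out = []
    for oy in range(oh):
        for ox in range(ow):
            s = sum(pix[(oy * factor + dy) * w + (ox * factor + dx)]
                    for dy in range(factor) for dx in range(factor))
            out.append(s // (factor * factor))
    return out, ow, oh
```

A 1600x1200 source downscaled by 4 gives the 400x300 image of the example, which can then be encoded at a sane quality setting instead of Q=20.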


ryg said...

1. Neither the HD Photo SDK nor the reference en/decoders do any RDO (as far as I can tell)
2. It always uses flat quantization matrices for AC coeffs (the format doesn't allow anything else), so it's effectively always optimizing for RMSE/PSNR. An encoder can change quantization per-macroblock, that's it.
3. Didn't find any informal descriptions of the JXR entropy coder, just the spec (and the only way I'd wade through that and figure out what it actually does is if someone paid me to do it). It looks better than Huffman-coded JPEG. Not sure how it would compare to H.264 CAVLC. Definitely worse than CABAC or a competent arith coder.

ryg said...

Short version: The transform is optimized for PSNR not visual quality (all it really does is trade clearly visible blocks with sharp edges for clearly visible blocks with smoothed edges), and the rest of the format likewise.

A better encoder would help, but the forced flat quant and 4x4 blocks seem like a liability.

cbloom said...

1. Yeah, so it seems. I don't understand how they get off making all these claims about their format without ever writing a proper encoder.

2. Indeed.

3. I read through the whole thing in detail at some point, though I forget most of it. The back end actually seems decent; it's got some questionable over-complex stuff like the dynamic coefficient reordering, but it's about as good as WebP's, for example.

I'm sure if they had a decent R/D encoder and a perceptual quantization matrix they would be okay.

old rants