1. Don't test JPEG at quality levels below 40; it doesn't work down there.
2. Don't compare against "JPEG" when you're using a bad encoder with baseline Huffman-only coding. At least use IJG with -optimize and -progressive (see the sketch after this list).
3. Don't test on a 2400-line image when you will be viewing it on a 1200-line monitor. Always view your tests at native pixel res; that's what the compressors are designed for.
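(For what it's worth, here's a minimal sketch of an encode done the fair way; Pillow is my assumption for tooling here, the actual tests used the IJG command line:)

    # Sketch: encode a "fair" JPEG - optimized Huffman tables and
    # progressive scan, the equivalent of IJG -optimize -progressive.
    # Pillow usage and filenames are assumptions, not what the tests ran.
    from PIL import Image

    img = Image.open("input.png")
    img.save("output.jpg",
             quality=75,         # stay well above 40 for a valid test
             optimize=True,      # like IJG -optimize
             progressive=True)   # like IJG -progressive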
So anyway, I found a "PDI-Target" image that's 2297 x 3600 in a JPEG ( here among other places ). I scaled it to 766x1200 so it would fit on my monitor. I ran Hipix "Good" to set the target file size - it used 117050 bytes, which is 1.019 bpp, a reasonable target for compression. Then I ran JPEG and Kakadu to try to get the same file sizes.
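(If you want to reproduce the size matching, it's just a bisection on quality; a quick sketch, with Pillow and the file name as my assumptions:)

    # Sketch: hit a target file size by bisecting on JPEG quality.
    # Assumes size grows with quality, which is close enough in practice.
    # 117050 bytes * 8 / (766 * 1200 pixels) = ~1.019 bits per pixel.
    import io
    from PIL import Image

    def encode_size(img, q):
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q,
                 optimize=True, progressive=True)
        return buf.tell()

    def match_size(img, target_bytes):
        lo, hi = 1, 95
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if encode_size(img, mid) <= target_bytes:
                lo = mid            # mid fits under target; try higher
            else:
                hi = mid - 1
        return lo

    img = Image.open("pdi_766x1200.png")      # hypothetical filename
    q = match_size(img, 117050)               # the Hipix "Good" size
    print(q, encode_size(img, q) * 8.0 / (766 * 1200), "bpp")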
Here are the images and you can look at them with your own eyes :
PDI 766 x 1200 original
JPEG Arith , 116818 bytes
JPEG PackJPG , 116457 bytes
Kakadu JPEG 2000 , 117120 bytes
Hipix , 117050 bytes
Note : do NOT zoom in when comparing them! Also, it's easier to see differences in A/B toggle tests, so before you post a comment, please download the above and do A/B toggle testing with something like ACDSee in full-screen mode.
(BTW I'm using PackJPG now instead of PAQ as the "modern entropy backend of JPEG" ; PackJPG is very good, it's fast, and it doesn't have the bug that PAQ has on some small files; it usually compresses slightly larger than PAQ, but pretty close. Also, I've switched JPEG-Huff to progressive, as it helps slightly (it doesn't help JPEG-ari or JPEG-pack).)
My subjective conclusion :
Overall I think Hipix is the worst quality. It's the only one that badly screws up parts of the faces, messes up some large-scale DC, and just generally loses tons of detail. JPEG preserves detail and sharpness way better than any other, and is only really bad in one way - ringing artifacts; it has lots of ringing artifacts. Kakadu is amazingly free of ringing (some early JPEG2000 coders suffered from bad ringing), but it also just blurs the hell out of the image. If you look at the Kakadu output without comparing to the original, it looks pretty nice and artifact-free, but compared to the original it looks like a gaussian blur has been run on the whole image.
Basically Kakadu has the least visually annoying "artifacts" , but at the cost of pretty severe blurring and a general chunky, blobby look everywhere. JPEG is great except for ringing artifacts. Hipix is somewhere between the two (it both blurs and rings) but is just not a good middle ground; it's worse than an average of the two.
Some portions. The order in these images is :
[ original , hipix, kakadu, jpeg pack, jpeg ari ]
Fruit :
Bad chunkiness in the hipix image. Kakadu also has nasty edges on the apple. JPEG looks like the winner, despite some ringing around the lemon.
Leather satchel thingy :
Hipix and Kakadu both completely destroy the detail of the leather texture, lots of blurring.
Black baby's hair :
This one might be the clearest win for JPEG. Excellent preservation of the detail in both JPEGs. Kakadu and Hipix both blur the bang wisps to hell. Hipix also creates a bad overall change of color and brightness; this is easiest to see by toggling the original vs the hipix version.
Sunflower :
Note how blurry Kakadu is, especially the nasty chunky blurs on the lower stem area and the curve of the leaf on the right. Some bad ringing in JPEG around the stem and leaves.
Gear and circuit board :
Hipix and Kakadu again just toss out detail on the gear like crazy. Kakadu blurs the circuit board to all hell. The JPEGs actually add detail to the circuit board that shouldn't be there by ringing ;)
Hand and corn :
Hipix stands out here by completely screwing up the back of the hand, throwing away all detail, changing the overall luma, and adding weird chunkies. The JPEGs as usual do great with detail - the back of the hand is best on the JPEGs - but the lower edge and the fingers show some bad ringing.
CDs :
Again Hipix stands out as the only one that makes the rainbow patterns all chunky. Kakadu does okay on the interior rainbows but ruins the edges of the left CD with blurry chunks. The JPEG does well except for some ringing on the inside circular edge of the CD.
Robots :
JPEG ringing is really bad on these - notice the black disease all over the robot's body, and the chroma distortion on the robot's left hand. Hipix makes the diagonal edge in the lower left all chunky and has a little ringing. Kakadu is probably best here.
Color Boxes :
JPEG is the only one that does really badly on these, creating ringing ghosts in the colors from the black bars. Hipix does very well on this type of "graphic arts" material (just as WebP does BTW), so if you are doing graphic-design type images it might be a win there (though I'm guessing x264 probably does that better, or you know, you could just use PNG). ( Color boxes shows [ original , hipix, kakadu, jpeg-xr, jpeg pack, jpeg ari ] )
Some charts for good measure :
Kakadu is by far the best numeric performer. Its one big fault is making everything blurry. Since our perceptual metric so far does not have any measure of detail preservation, Kakadu gets away with it (SSIM doesn't do much for us here).
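(For reference, a minimal sketch of how such an SSIM number gets computed; scikit-image is my assumption for tooling, and the filenames are made up. The point stands either way - SSIM has no explicit detail-preservation term, so a smooth blur scores well:)

    # Sketch: score a decoded image against the original with SSIM.
    # Note SSIM happily rewards Kakadu's smooth blur - it has no
    # explicit penalty for thrown-away high-frequency detail.
    import numpy as np
    from PIL import Image
    from skimage.metrics import structural_similarity

    orig = np.asarray(Image.open("pdi_original.png").convert("L"))
    test = np.asarray(Image.open("pdi_decoded.png").convert("L"))
    print("SSIM:", structural_similarity(orig, test))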
You can really see the way JPEG works from these test sets. If you take any of them and zoom up a lot, the JPEG just looks horrible. But at correct pixel size, they look great. This is because JPEG is intentionally allowing errors that are just under the threshold of visibility.
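(Concretely, the "just under the threshold of visibility" part is baked into the quantization table; a sketch, using the standard luminance table from the spec:)

    # Sketch: each 8x8 block's DCT coefficients get divided by a
    # perceptually-derived step size and rounded. The standard
    # luminance table (JPEG spec Annex K) below has coarse steps at
    # high frequencies, where the eye can't see the error at native
    # resolution - zooming in is exactly what makes it visible.
    import numpy as np

    Q_LUMA = np.array([
        [16, 11, 10, 16,  24,  40,  51,  61],
        [12, 12, 14, 19,  26,  58,  60,  55],
        [14, 13, 16, 24,  40,  57,  69,  56],
        [14, 17, 22, 29,  51,  87,  80,  62],
        [18, 22, 37, 56,  68, 109, 103,  77],
        [24, 35, 55, 64,  81, 104, 113,  92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103,  99]])

    def quantize(dct_block, scale=1.0):
        # small high-frequency coefficients round to zero
        return np.rint(dct_block / (Q_LUMA * scale)).astype(int)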
In normal viewing conditions, JPEG is just great. One usage in which it is not great is video game textures, because those often get sheared, zoomed, colored, etc., which breaks the JPEG perceptual model; that means they may show much larger visible artifacts than other compressors.
What are some valid complaints about JPEG ?
1. Yes, there are a lot of bad encoders out there, and the average JPEG that's out on the net is probably pretty far from optimal. In the WebP recompression project, you could easily have replaced the WebP step with just re-jpegging the JPEGs properly. (This includes people using grossly wrong quality settings, or not down-scaling images that will be shown very small on the web page.)
2. It falls apart at very low quality. If for some reason you really need super low bit rates, JPEG is not for you. (However, the common test that people do of very large images at very low bit rates is not a valid test, nor is cranking down the quality to "see the difference").
3. JPEG needs to be viewed at native res with the original pixel intensities. The whole way it works is based on the human-optical model, so if your image will be stretched or shown in some weird way, JPEG is not for you.
4. It does create a lot of ringing. This is sort of an inherent tradeoff in signal processing - when you represent a signal with a truncated basis set, you can either get smoothing or ringing. JPEG is way over on the ringing side rather than smoothing; it might be slightly more ideal to be able to get somewhere in between.
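You can see that tradeoff directly by reconstructing a hard edge from a truncated basis; a little sketch, with scipy as my choice of tooling:

    # Sketch: Gibbs ringing from a truncated DCT basis, vs. smoothing
    # from a tapered one. JPEG's hard quantization of high frequencies
    # behaves like the truncation case.
    import numpy as np
    from scipy.fft import dct, idct

    step = np.zeros(64)
    step[32:] = 1.0                      # a hard edge
    coefs = dct(step, norm="ortho")

    kept = coefs.copy()
    kept[16:] = 0.0                      # hard truncation: keep 16 of 64
    ringy = idct(kept, norm="ortho")     # overshoots near the edge

    taper = coefs * np.exp(-np.arange(64.0) ** 2 / (2 * 8.0 ** 2))
    smooth = idct(taper, norm="ortho")   # blurred edge, no ringing

    print("overshoot, truncated:", ringy.max() - 1.0)
    print("overshoot, tapered  :", smooth.max() - 1.0)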
3 comments:
Since ringing is the worst issue in JPEG, it's worth mentioning...
You've talked about this before (and posted links to various attempts at it), but since a naive JPEG decompressor constructs an image block that could have compressed to the end result (ignoring the chroma upsampling, which I don't think obeys this rule), we can reduce the ringing by figuring out how to decompress the blocks to favor a least-ringing source image.
The problem is how to do so well and computationally efficiently.
Yeah. Obviously there are deringing filters similar to deblocking filters that you can just run on the output (basically they come down to running a strong smoothing filter like median and restricting it to the area near very hard edges).
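A sketch of that kind of post-filter (the edge threshold and dilation radius here are guesses, not a tuned filter):

    # Sketch: naive deringing post-filter - median filter the image,
    # but only keep the smoothed pixels next to very hard edges,
    # where ringing lives.
    import numpy as np
    from scipy.ndimage import median_filter, sobel, binary_dilation

    def dering(img):                     # img: 2D float array
        grad = np.hypot(sobel(img, axis=0), sobel(img, axis=1))
        edges = grad > 0.5 * grad.max()  # "very hard" edges
        near = binary_dilation(edges, iterations=3) & ~edges
        out = img.copy()
        out[near] = median_filter(img, size=3)[near]
        return out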
But you raise an interesting idea. One approach would be to modify the DCT basis functions as you decode a block.
That is, when I see a value of 1 for the [4,6] coefficient, what should I add into my 8x8 block? I could just add in the normal DCT_46 basis, or I could look at what coefficients I have already and add in the most likely shape that would have produced a 46 coefficient given the prior information in the block.
In particular for the case of ringing, if you see a strong edge in the low/mid frequency coefficients, then it is likely that the high frequency contribution is only right on that edge, not all over the block.
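Purely as a conceptual sketch of what I mean (the masking heuristic is made up, and this is not a working decoder):

    # Conceptual sketch only: reconstruct the low/mid frequencies,
    # locate the edge from their gradient, then window each
    # high-frequency basis shape so it only contributes near the edge.
    import numpy as np
    from scipy.fft import idctn

    def decode_block_dering(coefs):      # coefs: 8x8 float DCT coefficients
        low = coefs.copy()
        low[4:, :] = 0.0
        low[:, 4:] = 0.0
        base = idctn(low, norm="ortho")  # low/mid frequency picture

        gy, gx = np.gradient(base)
        mask = np.hypot(gx, gy)
        mask = mask / mask.max() if mask.max() > 0 else np.ones_like(mask)

        out = base.copy()
        for u in range(8):
            for v in range(8):
                if u < 4 and v < 4:
                    continue             # already in 'base'
                one = np.zeros((8, 8))
                one[u, v] = coefs[u, v]
                out += idctn(one, norm="ortho") * mask  # edge-localized
        return out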
Another thing to consider with this kind of synthetic test image is that anything that is able to do any kind of R/D optimization within the image is going to (in some sense) behave differently in such an artificial scenario than it would in a "real one"--even a real one that includes some of those same elements.
Like, Garrett-Glaser often mentions that adaptive quantization is huge because it lets you do R/D. So imagine a DCT format that only allows exactly two different quantization choices per block within one image: such a thing can do a "better job" on images (including the synthetic one), but the kind of good job it can do depends on the content, and if your content aggressively exercises a broader spectrum of "needs" than a "plausible" real image would, then it's kind of unfair to such a coder.
None of which is saying that looking at this is totally bogus, it's more of a justification for why my image compression corpus doesn't have this kind of crazy test in it. (It probably should, actually, just it should be downweighted.)