4/09/2012

04-09-12 - Old Image Comparison Post Gathering

Perceptual Metrics, imdiff, and such. Don't think I ever did an "index post" so here it is :

01-18-11 - Hadamard
01-17-11 - ImDiff Release
01-12-11 - ImDiff Sample Run and JXR test
01-10-11 - Perceptual Results - PDI
01-10-11 - Perceptual Results - mysoup
01-10-11 - Perceptual Results - Moses
01-10-11 - Perceptual Metrics
01-10-11 - Perceptual Metrics Warmup - x264 Settin...
01-10-11 - Perceptual Metrics Warmup - JPEG Settin...
12-11-10 - Perceptual Notes of the Day
12-09-10 - Rank Lookup Error
12-09-10 - Perceptual vs TID
12-06-10 - More Perceptual Notes
12-02-10 - Perceptual Metric Rambles of the Day
11-18-10 - Bleh and TID2008
11-16-10 - A review of some perceptual metrics
11-08-10 - 709 vs 601
11-05-10 - Brief note on Perceptual Metric Mistakes
10-30-10 - Detail Preservation in Images
10-27-10 - Image Comparison - JPEG-XR
10-26-10 - Image Comparison - Hipix vs PDI
10-22-10 - Some notes on Chroma Sampling
10-18-10 - How to make a Perceptual Database
10-16-10 - Image Comparison Part 9 - Kakadu JPEG2000
10-16-10 - Image Comparison Part 11 - Some Notes on the Tests
10-16-10 - Image Comparison Part 10 - x264 Retry
10-15-10 - Image Comparison Part 8 - Hipix
10-15-10 - Image Comparison Part 7 - WebP
10-15-10 - Image Comparison Part 6 - cbwave
10-14-10 - Image Comparison Part 5 - RAD VideoTest
10-14-10 - Image Comparison Part 4 - JPEG vs NewDCT
10-14-10 - Image Comparison Part 3 - JPEG vs AIC
10-14-10 - Image Comparison Part 2
10-12-10 - Image Comparison Part 1
You never really drew any final conclusions publicly. Was there a bottom line?
I wish the final perceptual results had included jpeg-huffman as well; while I understand your goal was to compare new ones to the "state-of-the-art" jpeg, I'm curious to also know which ones even beat plain-jane jpeg, and by how much (given that some of us continue to use it, since we have easy-to-use free decoders and such).
But there *are* jpeg_h results in the final runs :
In fact, see the last post with results :
ImDiff Sample Run and JXR test
http://cbloomrants.blogspot.com/2011/01/01-12-11-imdiff-sample-run-and-jxr-test.html
Well, ok, yes, but only on Lena, not on the three previous "Perceptual Results" posts.
Yeah, fair enough, I never did the "here's a bunch of metrics for a bunch of compressors on a bunch of images".
The problem is it's information overload; any one metric only shows you part of the picture, and you really want to look at 4 metrics to see what's going on.
Yeah, it is a hard problem, I realize.
Your "axes" are: images, metrics, compressors, and compression level.
The last one is the only one that's continuous, so you tended to make that the x axis, but maybe that's not ideal.
Also, you evaluated the metrics for their correlation to perceived quality, so you could just pick the top metric and stick to that.
So you could make, say, a bar graph made of clusters of bars: each cluster is an image, each bar is a particular metric for that image, and the height of the bar is the quality under that metric. Make three bar graphs: one for log bpp -1, one for log bpp 0, and one for log bpp 1.
I dunno, that's probably still terrible too.
Whoops, I typo'd "metric" several times where I meant "compressor", since, as I said, this is assuming you just use the top metric.
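As a rough sketch of that grouped-bar idea (purely illustrative; the image names, compressor names, and quality numbers below are made-up placeholders, not actual imdiff results), it could look something like this in matplotlib:

# Grouped-bar sketch: one cluster per image, one bar per compressor,
# bar height = score under the single chosen perceptual metric at a
# fixed rate (here log2 bpp = 0, i.e. 1 bpp).  One such plot per rate.
# All names and values are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt

images      = ["PDI", "mysoup", "Moses"]
compressors = ["jpeg_h", "jpeg2000", "x264", "webp", "jpegxr"]
# quality[i][c] = placeholder metric score for image i, compressor c
quality = np.array([
    [0.80, 0.85, 0.88, 0.86, 0.84],
    [0.75, 0.82, 0.86, 0.83, 0.80],
    [0.78, 0.84, 0.87, 0.85, 0.82],
])

width = 0.15                 # width of each bar within a cluster
x = np.arange(len(images))   # one cluster position per image

fig, ax = plt.subplots()
for c, name in enumerate(compressors):
    ax.bar(x + c * width, quality[:, c], width, label=name)

ax.set_xticks(x + width * (len(compressors) - 1) / 2)
ax.set_xticklabels(images)
ax.set_ylabel("quality (chosen perceptual metric)")
ax.set_title("log2 bpp = 0")
ax.legend()
plt.show()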
Nah, I don't just mean it's "information overload" = "it's a hard problem".
What I mean is there's a certain amount of expert analysis needed to make sense of the numbers. Hopefully somebody who reads the whole series of posts picks up a bit of the flavor of how I look at the results and learns to do it a bit themselves.
If I just put up a bunch of summary numbers, that tempts people to just go and look at those numbers without reading the more detailed analysis.
I believe it's one of those cases where putting simple numbers on things actually gives you worse information.
It's sort of like how a CIA analyst only passes on their expert summary, not the original source information, because people who are untrained in interpreting the source information can make foolish conclusions from it (like thinking Iraq was somehow involved in 9/11 LOL); it's a case where the individual facts can actually be misleading unless you keep the whole picture in mind.
It's sort of like giving schools or teachers a single numerical score; you've actually greatly *decreased* the depth of analysis by doing that.
The thing about the perceptual metrics is they can all be fooled in certain ways; you have to look at the behavior under a few metrics, and at many bit rates to get a holistic view of the nature of the coder.
Maybe I will write a summary because I have a few more things to say.