12-06-10 - More Perceptual Notes

To fit an objective synthetic measure to surveyed human MOS scores, the VQEG uses the form :

A1 + A2 * x + A3 * logistic( A4 * x + A5 )

I find much better fits from the simpler form :

B1 + B2 * x ^ B3

Also, as I've mentioned previously, I think the acos of SSIM is a much more linear measure of quality (SSIM is like a Pearson correlation, which is like a dot product, so acos makes it like an angle). This is borne out by fitting to TID2008 - the acos of SSIM fits better to the human scores (though fancy fit functions can hide this - this is most obvious in a linear fit).

But what's more, the VQEG form does not preserve the value of "no distortion", which I think is important. One way to do that would be to inject a bunch of "zero distortion" data points into your training set, but a better way is to use a fit form that ensures it explicitly. In particular, if you remap the measured MOS and your objective score such that 0 = perfect and larger = more distorted, then you can use a fit form like :

C1 * x + C2 * x ^ C3

(C3 >= 0) , so that 0 maps to 0 absolutely, and you still get very good fits (in fact, better than the "B" form with arbitrary intercept). (note that polynomial fit (ax+bx^2+cx^3) is very bad, this is way better).

A lot of people are using Kendall or Spearman rank correlation coefficients to measure perceptual metrics (per VQEG recommendation), but I think that's a mistake. The reason is that ranking is not what we really care about. What we want is a synthetic measure which correctly identifies "this is a big difference" vs. "this is a small difference". If it gets the rank of similar differences a bit wrong, that doesn't really matter, but it does show up as a big difference in Kendall/Spearman. In particular, the *value* difference of scores is what we care about. eg. if it should have been a 6 and you guess 6.1 that's no big deal, but if you guess it's a 9 that's very bad. eg. if your set of MOS scores to match is { 3, 5.9, 6, 6.1, 9 } , then getting the middle 3 in the wrong order is near irrelevant, but the Kendall & Spearman have no accounting for that, they are just about rank order.

The advantage of rank scores of course is that you don't have to do the functional fitting stuff above, which does add an extra bias, because some objective scores might work very well with that particular functional fit, while others might not.

No comments:

old rants