1/10/2011

01-10-11 - Perceptual Metrics

Almost done.

RMSE of fit vs. observed MOS data :


RMSE_RGB             : 1.052392
SCIELAB_RMSE         : 0.677143
SCIELAB_MyDelta      : 0.658017
MS_SSIM_Y            : 0.608917
MS_SSIM_IW_Y         : 0.555934
PSNRHVSM_Y           : 0.521825
PSNRHVST_Y           : 0.500940
PSNRHVST_YUV         : 0.480360
MyDctDelta_Y         : 0.476927
MyDctDelta_YUV       : 0.444007

BTW I don't actually use the raw RMSE as posted above. I bias by the sdev of the observed MOS data: a smaller sdev means the human observers agreed more closely on that score, so you care about those points more. See previous blog posts on this issue. The sdev-biased scores (which is what was posted in previous blog posts) are :


RMSE_RGB             : 1.165620
SCIELAB_RMSE         : 0.738835
SCIELAB_MyDelta      : 0.720852
MS_SSIM_Y            : 0.639153
MS_SSIM_IW_Y         : 0.563823
PSNRHVSM_Y           : 0.551926
PSNRHVST_Y           : 0.528873
PSNRHVST_YUV         : 0.515720
MyDctDelta_Y         : 0.490206
MyDctDelta_YUV       : 0.458081
Combo                : 0.436670 (*)

(* = ADDENDUM : I added "Combo", which is the best linear combo of SCIELAB_MyDelta + MS_SSIM_IW_Y + MyDctDelta_YUV. It's a static linear combo; obviously you could do better by going all Netflix-Prize-style, treating each metric as an "expert" and weighting the experts based on various decision attributes of the image. E.g. certain metrics will do better on certain types of images, so you weight them accordingly.)
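
A minimal sketch of what the sdev biasing and the static linear combo could look like. The exact weighting is not spelled out in this post, so the 1 / (sdev + eps) weight, the eps default, the normalization by the weight sum, and all function names are assumptions for illustration only:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain RMSE between fitted metric scores and observed MOS scores.
double rmse(const std::vector<double>& fit, const std::vector<double>& mos)
{
    assert(!fit.empty() && fit.size() == mos.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < fit.size(); ++i) {
        const double d = fit[i] - mos[i];
        sum += d * d;
    }
    return std::sqrt(sum / static_cast<double>(fit.size()));
}

// Sdev-biased RMSE: points with a smaller observed sdev count for more.
// ASSUMPTION: weight = 1 / (sdev + eps), normalized by the weight sum.
double sdevWeightedRmse(const std::vector<double>& fit,
                        const std::vector<double>& mos,
                        const std::vector<double>& sdev,
                        double eps = 0.1)
{
    assert(!fit.empty() && fit.size() == mos.size() && fit.size() == sdev.size());
    double sum = 0.0, weightSum = 0.0;
    for (std::size_t i = 0; i < fit.size(); ++i) {
        const double w = 1.0 / (sdev[i] + eps);
        const double d = fit[i] - mos[i];
        sum += w * d * d;
        weightSum += w;
    }
    return std::sqrt(sum / weightSum);
}

// Static linear combination of three fitted metric scores for one image.
// The weights are hypothetical; the post does not give the fitted values.
double linearCombo(double a, double b, double c, const double (&w)[3])
{
    return w[0] * a + w[1] * b + w[2] * c;
}
```

With equal sdevs the weighted score reduces to the plain RMSE; a large sdev on a badly fit point pulls the weighted score down, because that point counts for less.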

As a sanity check I made plots of each metric; the X axis is the human observed MOS score, the Y axis is the fitted metric.

Sanity is confirmed. (The RMSE_RGB plot has those horizontal lines because one of the distortion types is RGB random noise at a few fixed RMSE levels; you can see that for the same amount of RGB RMSE noise there is a variety of human MOS scores.)

ADDENDUM : if you haven't followed old posts, this is on the TID2008 database (without "exotics"). I really need to find another database to cross-check to make sure I haven't over-trained.

Some quick notes of what worked and what didn't work.


What worked :

Variance Masking of high-frequency detail

Variance Masking of DC deltas

PSNRHVS JPEG-style visibility thresholds

Using the right spatial scale for each piece of the metric
  (eg. what size window for local sdev, what spatial filter for DC delta)

Space-frequency subband energy preservation

Frequency subband weighting
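
For readers unfamiliar with variance masking, here is one simple form it can take: measure the local sdev of the Y plane in a window around each pixel, then scale the error down where the local sdev is high. This is only a sketch; the post does not give the actual formula, so the 1 / (1 + k * sdev) falloff, the constant k, the window radius, and the function names are all assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard deviation of luma in a (2*radius+1)^2 window centred on (cx, cy).
// The window is clamped to the image edges.
double localSdev(const std::vector<double>& luma, int width, int height,
                 int cx, int cy, int radius)
{
    assert(width > 0 && height > 0 && radius >= 0);
    assert(luma.size() == static_cast<std::size_t>(width) * height);
    assert(cx >= 0 && cx < width && cy >= 0 && cy < height);
    const int x0 = std::max(cx - radius, 0), x1 = std::min(cx + radius, width - 1);
    const int y0 = std::max(cy - radius, 0), y1 = std::min(cy + radius, height - 1);
    double sum = 0.0, sumSq = 0.0;
    int n = 0;
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            const double v = luma[static_cast<std::size_t>(y) * width + x];
            sum += v;
            sumSq += v * v;
            ++n;
        }
    }
    const double mean = sum / n;
    const double var = std::max(sumSq / n - mean * mean, 0.0); // guard rounding
    return std::sqrt(var);
}

// ASSUMPTION: errors are less visible in busy regions, modelled here as
// |delta| / (1 + k * sdev). Both the form and k are hypothetical.
double maskedError(double delta, double sdev, double k)
{
    return std::fabs(delta) / (1.0 + k * sdev);
}
```

The window radius is the "spatial scale" knob mentioned in the list above: the same masking function behaves quite differently with a 3x3 window than with a 17x17 one.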


What didn't work :

Luma Masking

LAB or any color space other than YUV, in most metrics

anything but "Y" as the most important part of the metric

Nonlinear mappings of signal and perception
  (other than the nonlinear mapping already in gamma correction)

4 comments:

ryg said...

Wow, that looks pretty damn sweet.

It's interesting that the MyDct plot has a few outliers where the visual quality is underestimated, but not where it's significantly overestimated. I wonder if that's just a blip or reproducible with other datasets.

How does the spatial band energy preservation work? Do you partition the DCT coefficients into a set of buckets of roughly similar frequency content and take a weighted L2 norm of the difference, or is it more complicated?

cbloom said...

"It's interesting that the MyDct plot has a few outliers where the visual quality is underestimated, "

Yeah, I was wondering what that is. If I was spending more time on this I would look at those images and see what it is about them that's different.

"Do you partition the DCT coefficients into a set of buckets of roughly similar frequency content and take a weighted L2 norm of the difference, or is it more complicated? "

Yep, pretty much just that. Though L1 is better than L2 and the exact composition and weighting of the groups matters. I described an older version of the idea here :

http://cbloomrants.blogspot.com/2010/10/10-30-10-detail-preservation-in-images.html
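
A rough sketch of the bucketing idea discussed in this exchange, for a single 8x8 DCT block. The composition of the groups (diagonal index u + v, split into four bands), skipping the DC coefficient, the per-band weights, and the choice to compare per-band L1 energies are all assumptions; the comment above only says that the exact composition and weighting matter:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr int kBlock = 8;     // 8x8 DCT block
constexpr int kNumBands = 4;  // hypothetical number of frequency buckets
using DctBlock = std::array<double, kBlock * kBlock>;
using BandArray = std::array<double, kNumBands>;

// ASSUMPTION: bucket coefficients by diagonal index u + v.
// Returns -1 for DC, which this sketch leaves out of the band energies.
int bandOf(int u, int v)
{
    assert(u >= 0 && u < kBlock && v >= 0 && v < kBlock);
    const int d = u + v;  // 0..14
    if (d == 0) return -1;
    if (d <= 2) return 0;
    if (d <= 5) return 1;
    if (d <= 9) return 2;
    return 3;
}

// L1 "energy" per band: sum of absolute coefficient values.
BandArray bandEnergyL1(const DctBlock& c)
{
    BandArray e{};  // zero-initialized
    for (int v = 0; v < kBlock; ++v) {
        for (int u = 0; u < kBlock; ++u) {
            const int b = bandOf(u, v);
            if (b >= 0) e[static_cast<std::size_t>(b)] += std::fabs(c[v * kBlock + u]);
        }
    }
    return e;
}

// Weighted L1 norm of the difference in band energies between two blocks.
double bandEnergyDelta(const DctBlock& orig, const DctBlock& dist,
                       const BandArray& weights)
{
    const BandArray eo = bandEnergyL1(orig);
    const BandArray ed = bandEnergyL1(dist);
    double sum = 0.0;
    for (std::size_t b = 0; b < eo.size(); ++b)
        sum += weights[b] * std::fabs(eo[b] - ed[b]);
    return sum;
}
```

Note that comparing band energies rather than individual coefficients means a sign flip or a shuffle of coefficients within one band scores as zero error. That is intentional in this reading of "energy preservation", but it also means a term like this would be combined with other error terms rather than used alone.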

bztdlinux said...

I have been unable to find any information regarding how to apply these metrics to YUV data - for example, what linear weights to apply to each plane. Although you have a YUV implementation of PSNR-HVS-M, the original Matlab source by the author only works in grayscale.

cbloom said...

@bztdlinux - yes, that is one of the issues that is just glossed over by most.

In some of my metrics I use this :

double planeWeights[3] = { 1.0, 0.879837, 0.412461 };

but it's more subtle than that. For example the variance masking usually just comes from the Y plane but affects all 3 planes.

On many of the test sets you can do well just measuring error in Y, because they don't stress chroma-only error well. Really we need much bigger & more varied test sets and good human ratings on them.
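
One simple way the plane weights quoted above could be applied is as a weighted average of per-plane mean squared errors. This is only a sketch: the Y, U, V ordering of the weights, applying them to MSE rather than to some other per-plane error, and the normalization by the weight sum are assumptions, and the function names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Planes = std::array<std::vector<double>, 3>;  // assumed order: Y, U, V

// Weights quoted in the comment above.
const double planeWeights[3] = { 1.0, 0.879837, 0.412461 };

// Mean squared error between two planes of equal size.
double planeMse(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(!a.empty() && a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum / static_cast<double>(a.size());
}

// ASSUMPTION: combine per-plane MSE linearly, normalized by the weight sum.
double weightedYuvRmse(const Planes& orig, const Planes& dist)
{
    double sum = 0.0, weightSum = 0.0;
    for (std::size_t p = 0; p < 3; ++p) {
        sum += planeWeights[p] * planeMse(orig[p], dist[p]);
        weightSum += planeWeights[p];
    }
    return std::sqrt(sum / weightSum);
}
```

This ignores the subtlety mentioned above: masking computed from the Y plane would also scale the U and V errors before they are combined.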
