Spearman rank correlations (it's just the Pearson correlation on sort ranks) :
mse.txt              0.66782
square_ms_ssim_y.txt 0.747733
scielabL1.txt        0.855355
scielabL2.txt        0.868207
vif.txt              0.87631
ms_ssim_bad_down.txt 0.880403
ms_ssim.txt          0.89391
iw_ms_ssim.txt       0.901085
wsnr.txt             0.90905
mydctdelta.txt       0.932998
my_psnr_hvs_m.txt    0.94082
mydctdeltanew.txt    0.944086
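To make the parenthetical above concrete, here's a minimal sketch (my own, in Python, with scipy used only as a cross-check) of computing Spearman as Pearson on the sort ranks; it agrees with scipy's spearmanr when there are no ties:

import numpy as np
from scipy.stats import pearsonr, spearmanr

def spearman_via_ranks(x, y):
    # rank of each element within its own vector (0-based, no tie averaging)
    rank = lambda v: np.argsort(np.argsort(np.asarray(v)))
    # Spearman = Pearson correlation of those ranks
    return pearsonr(rank(x), rank(y))[0]

# toy data : x = metric output, y = target scores
x = [0.62, 0.81, 0.40, 0.93, 0.70]
y = [3.1, 4.5, 2.2, 4.9, 3.8]
print(spearman_via_ranks(x, y), spearmanr(x, y)[0])  # identical here (no ties in this data)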
I just thought of a new way to check the fit for this kind of scenario which I think is pretty cool.
You have two vectors of values. One vector is the original which you wish to match, the other is output by your program to try to match the original vector. The problem is that your program outputs on some other scale, in different units that may involve some warping of the curve, and you don't know what that warping is. You wish to find how close your program is to the target without worrying about matching that curve.
Well, one way is "rank lookup error" :
given vectors "orig" and "test"

find "test_rank" such that r = test_rank[i] means item i is the r'th in sorted order in the vector test

find "sorted_orig" = orig, sorted

Sum{i} : err += square( orig[i] - sorted_orig[ test_rank[ i ] ] )

that is, the fit value for mapping from test's scale to orig is to find the sort index within test, and look up the value in the sorted list of originals.
Obviously this isn't quite ideal; it does handle ties and near-ties pretty well though (better than Spearman/Kendall, because you get less error contribution when you get the rank wrong for two items with very similar values). Most importantly it avoids all the fidgety function fitting stuff.
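Here's a minimal sketch of the rank lookup error in Python (the function name and the numpy-based ranking are my own choices; I take the square root of the mean squared error to get an RMSE, matching the numbers below):

import numpy as np

def rank_lookup_rmse(orig, test):
    # test_rank[i] = r means item i is the r'th (0-based) in sorted order in test
    test_rank = np.argsort(np.argsort(np.asarray(test, dtype=float)))

    # sorted_orig[r] = the r'th smallest original value
    orig = np.asarray(orig, dtype=float)
    sorted_orig = np.sort(orig)

    # map each test value to the original value at the same rank,
    # then compare to the true original value
    err = orig - sorted_orig[test_rank]
    return np.sqrt(np.mean(err * err))

This sketch assumes the test metric increases with the original scores; a lower-is-better metric (like raw MSE) would need its sign flipped first.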
Here are the scores with "rank lookup rmse" :
mse.txt              1.310499
square_ms_ssim_y.txt 1.090392
scielabL1.txt        0.853738
scielabL2.txt        0.835508
ms_ssim_bad_down.txt 0.716507
ms_ssim.txt          0.667152
wsnr.txt             0.643821
vif.txt              0.591508
iw_ms_ssim.txt       0.575474
mydctdelta.txt       0.548946
my_psnr_hvs_m.txt    0.543057
mydctdeltanew.txt    0.494918

If nothing else it's a good sanity check for the fitting stuff.
Also with rank-lookup-error you don't have to worry about transforms like acos for ssim or undo-psnr or anything else that's monotonic.
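As a quick illustration of that point (continuing the rank_lookup_rmse sketch above, with made-up numbers): any monotone increasing remapping of the test vector leaves its sort ranks, and therefore the rank lookup error, unchanged.

import numpy as np

orig = [3.1, 4.5, 2.2, 4.9, 3.8]       # target scores
test = [0.62, 0.81, 0.40, 0.93, 0.70]  # raw metric output

# a monotone increasing transform preserves sort order, so the ranks don't change
warped = [np.exp(10.0 * t) for t in test]

print(rank_lookup_rmse(orig, test))    # same value ...
print(rank_lookup_rmse(orig, warped))  # ... as this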
For comparison, these were the fit scores :
mse.txt              1.169794
square_ms_ssim_y.txt 1.005849
scielabL1.txt        0.819501
scielabL2.txt        0.808595
ms_ssim_bad_down.txt 0.690689
ms_ssim.txt          0.639193
wsnr.txt             0.632670
vif.txt              0.576114
iw_ms_ssim.txt       0.563879
my_psnr_hvs_m.txt    0.552151
mydctdelta.txt       0.548938
mydctdeltanew.txt    0.489756
mydctdeltanew is a new one; it uses mydctdelta for only the AC part of the DCT (excluding the DC, which is the gross luma difference), and then adds the gross luma difference back on using a form similar to IW-MS-SSIM (that is, using large filters and then using both variance masking and saliency boosting).
2 comments:
"Obviously this isn't quite ideal; it does handle ties and near-ties pretty well though (better than Spearman/Kendall, because you get less error contribution when you get the rank wrong of two items with very similar value)."
Squaring the error seems combinatorially perverse.
In general, wouldn't you ideally want to use a measure of the sequence's relative sortedness? Classical choices include the inversion number, the crossing number, the number and length of runs, the minimum number of adjacent transpositions to sort the sequence, etc.
These all reflect different aspects of relative sortedness you might want to capture. In your application, where it's just a basic sanity check, it probably doesn't matter.
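For reference, the simplest of those measures, the inversion number (the count of out-of-order pairs, which also equals the minimum number of adjacent transpositions needed to sort), can be counted directly; a minimal sketch with a helper name of my own:

def inversion_count(seq):
    # number of pairs (i, j) with i < j but seq[i] > seq[j];
    # zero for a sorted sequence, n*(n-1)/2 for a reversed one
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])

# applied to test's values taken in the order that sorts orig, this counts
# discordant pairs, i.e. the Kendall tau distance between the two orderings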
"In general, wouldn't you ideally want to use a measure of the sequence's relative sortedness?"
The point I've been trying to make is that that is not what I want to measure. Kendall and Spearman *are* measures of relative sortedness, and I'm arguing that since sorting is not what you care about, they can give the wrong picture of which metric is best.
It appears that "rank lookup mse" does in fact rate the metrics in an order much closer to the rating I actually care about (which is the ability to fit the scores).