10/14/2010

10-14-10 - Image Comparison Part 2

Well, I wanted to post JPEG vs x264 numbers, but there's a problem.

The first problem I had was that the headers out of x264 are very large. JPEG has about 240 bytes of header ; MP4 has about 1200 bytes of header for one frame of video only , the .x264 internal format has about 600 bytes of header. So I'm just doing a subtract to correct for that, but that's rather approximate and ugly since you can store side information in headers, etc. But whatever, that's not the big problem.
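The header subtraction described above can be sketched as follows. This is only an illustration of the arithmetic: the header sizes are the approximate figures quoted in this post, and the function name, image dimensions and file sizes are hypothetical.

```python
# Approximate header sizes in bytes, taken from the figures quoted in the post.
HEADER_BYTES = {"jpeg": 240, "mp4": 1200, "x264": 600}

def corrected_bpp(file_bytes, fmt, width, height):
    """Bits per pixel after subtracting the approximate header size."""
    payload = max(file_bytes - HEADER_BYTES[fmt], 0)
    return 8.0 * payload / (width * height)

if __name__ == "__main__":
    # Hypothetical 512x512 image and file sizes, for illustration only.
    for fmt, size in [("jpeg", 33008), ("mp4", 33968)]:
        print(fmt, corrected_bpp(size, fmt, 512, 512))
```

A constant subtraction like this is crude, since it treats any side information stored in the header as free.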

The big problem is that the "lossless" x264 is actually really bad (it's only lossless after color conversion). Here is the quality for lossless x264 :


rmse : 4.9177 , psnr : 34.3296
ssim : 0.9840 , perc : 88.5788%
scielab rmse : 3.140
scielab ssim : 0.9844 , angle : 88.7457%

I confirmed that this is caused by the chroma conversion & subsample : if I just make the y4m and then convert the y4m back to RGB, I get the exact same numbers. That's absolutely terrible. You don't usually see this because people show PSNR in the Y color plane only. For comparison, here's JPEG at 100% quality :

rmse : 2.1548 , psnr : 41.4967
ssim : 0.9928 , perc : 92.3420%
scielab rmse : 0.396
scielab ssim : 0.9997 , angle : 98.4714%

Now, JPEG q 100 isn't even lossless, but this is much more like what you'd expect. A base rmse of about 2.0 is what you get from the "lossy" old school chroma conversion that JPEG does.
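For reference, the rmse and psnr figures quoted in this post are consistent with the usual log mapping sketched below. The peak value of 256 is an assumption inferred from the quoted numbers (255 is the more common convention), not something the post states, and the function name is hypothetical.

```python
import math

def psnr_from_rmse(rmse, peak=256.0):
    """PSNR in dB for a given RMSE. peak=256 is inferred from the
    numbers quoted in the post; many tools use 255 instead."""
    return 20.0 * math.log10(peak / rmse)

if __name__ == "__main__":
    # RMSE values quoted in the post for "lossless" x264 and JPEG q 100.
    for rmse in (4.9177, 2.1548):
        print(rmse, round(psnr_from_rmse(rmse), 4))
```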

Unfortunately this large constant error ruins any attempt to measure x264's performance. You can see that the RMSE line for x264 is just offset way up :

x264 vs JPEG : log rmse vs log bpp

The slope is way better than JPEG, so I think that if the color conversion weren't throwing away so much information, x264 would come out much better.

So if anybody has a suggestion on how to run x264 without the destructive y4m color transform, I'd appreciate it. I believe the problem is that it's putting YUV back into bytes. I suspect that the color conversions and up/down sample are just not being done very well (I'm using ffmpeg) ; maybe there's some -highquality setting that I don't know about that would make them better?

In any case, we can still see a few things. JPEG behaves very badly below 0.5 bpp, while x264 degrades nicely into very low bit rates.

x264 vs JPEG : scielab SSIM :

And another point on graph scaling. SSIM is a dot product, and as such is nonlinear and distorting. In particular, in the functional range, SSIM values are usually between 0.980 and 0.990 , which is a tiny and confusing space.

One better way to map it is to turn the dot product into an angle (using acos). I then change the angle into a percent (100% = 0 degrees apart, 0% = 90 degrees apart). The "SSIM angle" plot looks like this :

scielab SSIM angle :

In particular, it's slightly easier to see the separation between normal JPEG and the "flatnosub" (RMSE tuned) JPEG in the SSIM angle plot than in the regular SSIM plot. In regular SSIM everybody goes into this asymptote to 1 together and it makes a mess. This scielab MS-SSIM is semi-perceptual so it rewards JPEG over jpeg-flat-nosub.
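The angle mapping described above can be sketched like this. The function name is hypothetical, and the clamp to [-1, 1] is an added safeguard for acos, not something the post specifies.

```python
import math

def ssim_angle_percent(ssim):
    """Map an SSIM value (treated as a dot product / cosine) to a percent:
    100% = 0 degrees apart, 0% = 90 degrees apart."""
    s = max(-1.0, min(1.0, ssim))   # keep acos in its domain
    angle = math.acos(s)            # radians, 0 .. pi
    return 100.0 * (1.0 - angle / (math.pi / 2.0))

if __name__ == "__main__":
    # The 0.980 .. 0.990 range spreads out over several percent.
    for s in (0.980, 0.9844, 0.990, 0.9997):
        print(s, ssim_angle_percent(s))
```

Because acos is steep near 1, the crowded top of the SSIM range gets stretched out, which is why the curves separate better in the angle plot.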

You should basically ignore the x264 results here because of the aforementioned problem with large constant error.

ADDENDUM : you can repro the x264 problem like this :


ffmpeg -i my_soup.avi -vcodec libx264 -fpre presets\libx264-lossless_slow.ffpreset test.mkv
ffmpeg -i test.mkv -vcodec png out.png

imdiff my_soup.bmp out.png

and the result is :

rmse : 5.7906 , psnr : 32.9104
ssim : 0.9783 , perc : 86.7096%
scielab rmse : 2.755
scielab ssim : 0.9928 , angle : 92.3736%

or without even getting x264 involved :

ffmpeg -i my_soup.avi -pix_fmt yuv420p my_soup.y4m
ffmpeg -i my_soup.y4m -vcodec png uny4m.png

same results, which shows that the problem is the yuv420 conversion
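To illustrate how a 4:2:0 round trip loses information even with no codec involved, here is a minimal sketch on a single 2x2 block. It assumes the full-range JFIF-style YCbCr matrix, box-filter chroma downsampling and replicated upsampling; that is not necessarily what ffmpeg does in the commands above, and all function names are hypothetical.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(x))))

def rgb_to_ycbcr(r, g, b):
    """Full-range (JFIF-style) RGB -> YCbCr, quantized to 8 bits."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)

def ycbcr_to_rgb(y, cb, cr):
    """Inverse of the transform above, quantized to 8 bits."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)

def roundtrip_420(block):
    """block: four (r, g, b) pixels forming one 2x2 block.
    Luma is kept per pixel; chroma is box-averaged to one sample
    and replicated back to all four pixels."""
    ycc = [rgb_to_ycbcr(*p) for p in block]
    cb = clamp8(sum(c[1] for c in ycc) / 4.0)
    cr = clamp8(sum(c[2] for c in ycc) / 4.0)
    return [ycbcr_to_rgb(c[0], cb, cr) for c in ycc]

def rmse(a, b):
    """Root mean squared error over all channels of two pixel lists."""
    sq = [(x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q)]
    return (sum(sq) / len(sq)) ** 0.5

if __name__ == "__main__":
    gray = [(128, 128, 128)] * 4
    edge = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
    print("flat gray rmse:", rmse(gray, roundtrip_420(gray)))
    print("red/blue edge rmse:", rmse(edge, roundtrip_420(edge)))
```

Flat regions survive almost untouched, while sharp chroma edges take a large error, so the total depends heavily on image content and on the quality of the down/up sampling filters.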

I confirmed that my_soup.avi is in fact a perfect lossless RGB copy of my_soup.bmp. Note that this result is even worse than the one I reported above. The one above was made by first converting the AVI to Y4M using Mencoder , so apparently that path is slightly higher quality than whatever ffmpeg is doing here.

ADDENDUM : I think ffmpeg/x264 use "broadcast standard" YUV in the 16-235 range instead of 0-255 , so that might be a large piece of the problem.
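As a rough check on what the 16-235 range costs by itself, here is a sketch that pushes every 8-bit luma value through a full -> limited -> full round trip. The scaling used is the standard 219/255 luma mapping; the function names are hypothetical, and this says nothing about what ffmpeg actually does internally.

```python
def full_to_limited(y):
    """Map full-range luma 0..255 to 'broadcast' range 16..235."""
    return int(round(16 + y * 219.0 / 255.0))

def limited_to_full(v):
    """Map 16..235 luma back to 0..255, clamped to 8 bits."""
    return max(0, min(255, int(round((v - 16) * 255.0 / 219.0))))

# Round-trip every possible 8-bit luma value.
roundtrip = [limited_to_full(full_to_limited(y)) for y in range(256)]
distinct_levels = len(set(roundtrip))
max_error = max(abs(y - r) for y, r in zip(range(256), roundtrip))

if __name__ == "__main__":
    print("distinct levels after round trip:", distinct_levels)
    print("max abs error:", max_error)
```

Only 220 of the 256 levels survive, but the per-sample luma error stays within one level, so range compression on its own is a fairly small effect. A mismatch where one step writes 16-235 and the other reads it as 0-255 would be far larger, though this sketch cannot show whether that happens in the ffmpeg path.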

latest attempt, still not good :


ffmpeg -f image2 -i my_soup.bmp -sws_flags +accurate_rnd+full_chroma_int -vcodec libx264 -fpre c:\progs\video_tools\ffmpeg-latest\presets\libx264-lossless_slow.ffpreset test.mkv
ffmpeg -i test.mkv -sws_flags +accurate_rnd+full_chroma_int -vcodec png uny4m_2.png
imdiff my_soup.bmp uny4m_2.png

4 comments:

ryg said...

"I think ffmpeg/x264 use "broadcast standard" YUV in the 16-235 range instead of 0-255 , so that might be a large piece of the problem."
Btw, what type of chroma subsampling do you use for JPEG/NewDCT? I'm not sure whether the IJG code defaults to 2:1 chroma subsampling in both horizontal and vertical direction, or just 2:1 horizontal with no vertical subsampling. The latter would make a big difference.

.y4m looks straightforward enough, can't you just write it out yourself using the JPEG color transform and a decent downsampler, and similarly have the video decoder output to .y4m and do the chroma upsampling/YCbCr->RGB yourself? That would get rid of a lot of possible problems.

cbloom said...

"Btw, what type of chroma subsampling do you use for JPEG/NewDCT?"

They both use 2x2 chroma subsampling, so-called "420"


".y4m looks straightforward enough, can't you just write it out yourself using the JPEG color transform and a decent downsampler, and similarly have the video decoder output to .y4m and do the chroma upsampling/YCbCr->RGB yourself? That would get rid of a lot of possible problems."

Yeah, I've been thinking about that for a while, but it's rather a lot of work just to test someone else's compressor.

I wish there was a standard float image format, because then we could separate the color transform part from the plane coding. There could be a driver that does the color convert and assigns a number of bits to each plane, and then you just run a gray scale plane coder on each one. It would make comparison much easier. But it has to be a float image format. And of course that presupposes that your only handling of color is in the transform, which is woefully primitive.

cbloom said...

BTW NewDCT downsamples with the 6-tap filter described here :

http://cbloomrants.blogspot.com/2009/06/06-17-09-inverse-box-sampling-part-2.html

cbloom said...

I've discovered there *is* a YUVJ in ffmpeg :

http://ffmpeg.org/doxygen/0.5/pixfmt_8h.html

however it doesn't seem to be actually supported by anything.

the Y4M output pipe refuses to take it, and so does libx264

.. apparently yuvj is deprecated, you're supposed to use YUV420 and set AVCOL_RANGE_JPEG somehow, but I can't figure out how to set that.

I think I've figured out how to make them do the up/down sample correctly. You specify :

-sws_flags +bicubic+accurate_rnd+full_chroma_int


Not sure if full_chroma_inp is also helpful
