10/15/2010

10-15-10 - Image Comparison Part 8 - Hipix

Hipix is a commercial lossy image tool. I hear it is based on H264 Intra, so I wanted to see if it was a better version of that idea (x264 and AIC both having let me down).

Well, at this point we should be unsurprised that it sucks balls :

log rmse :

ms-ssim-scielab :

One note : the rightmost data point from hipix is their "perfect" setting, which is very far from perfect. It's only a little over 2 bpp and the quality is shit. I feel bad for any sucker customers who are saving images as hipix "perfect" and thinking they are getting good quality.

I started to think, man, maybe my ms-ssim-scielab is just way off the mark? How can everyone be so bad? Any time your test is telling you things that are hard to believe, you need to reevaluate your test. So I went and looked at the images with my own eyes.

Yep, hipix is awful. JPEG just blows it away.

A sample from the hipix image closest to 0 on the x axis, and a JPEG of the same size : (HiPix is 230244 bytes, JPEG is 230794 bytes)

JPEG :

HiPix :

Note the complete destruction of the wood grain detail in the hipix, as well as introduction of blockiness and weird smudge shapes. Note the destruction of detail in the plate rim, and the ruining of the straight line edge of the black bowl.

BTW when you are evaluating perceptual quality, you should *NOT* zoom in! JPEG is optimized for human visual system artifact perceptibility at the given scale of the image. JPEG intentionally allows nasty artifacts that look bad when you zoom in, but not when you look at the image in its normal size.

Conclusion : Hipix needs to immediately release a "new and improved HiPix v2.0 that's way better than the last!" by just replacing it with JPEG.

Since they don't offer a command line app I won't be testing this on any more images.

ADDENDUM : Well I ran two points on Moses :

The two points are "High" and "Perfect" and perfect is way not perfect.

10-15-10 - Image Comparison Part 7 - WebP

I thought I wasn't going to be able to do this test, because damn Google has only released webpconv for Linux (or, you know, you can download Cygwin and build it yourself, WTFBBQ). But I found these :

WebP for .NET

webp.zip solution for VC

... both of which are actually broken. The .NET one just fails mysteriously on me. The webp.zip one has some broken Endian stuff, and even if you fix that, the BMP input & output is broken. So.. I ripped it out and replaced it with the cblib BMP in/out, and it seems to work.

(I didn't want to use the webp converter in ffmpeg because I've seen past evidence that ffmpeg doesn't do the color conversion and resampling right, and I wanted to use the Google-provided app to make sure that any bad quality was due only to them)

My build of WebP Win32 is here : webp.zip

Here are the results :

log rmse :

ms-ssim-scielab :

Now I am surprised right off the bat that the ms-ssim-scielab results are not terrible, but the rmse is not very good. I've read rumors in a few places that On2 tweaked WebP/WebM for RMSE/PSNR, so this is the opposite of what I expected.

Looking at the RMSE curve it's clear that there is a bad color conversion going on. Either too much loss in the color convert, or bad downsample code, something like that. Any time there is a broken base color space, you will see the whole error curve is a bit flatter than it should be and offset upwards in error.
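You can model that offset with a toy calculation (my assumption here: the color-convert loss is roughly independent of the codec's own error, so the MSEs add):

```python
import math

def measured_rmse(codec_rmse, floor_rmse):
    # independent error sources add in MSE, so a lossy color convert
    # puts a floor under the whole rmse curve
    return math.sqrt(codec_rmse ** 2 + floor_rmse ** 2)

# with a hypothetical 2.0-rmse color convert, a codec that would hit
# rmse 1.0 at high rate never gets below ~2.24 ; on a log plot the
# curve is offset up and flattened at the good end :
for r in [8.0, 4.0, 2.0, 1.0]:
    print(r, round(measured_rmse(r, 2.0), 3))
```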

The perceptual numbers are slightly worse than jpeg-huff through the "money zone" of -1 to 1. Like all modern coders it does have a flatter tail so wins at very low bit rate.

(BTW I think JPEG's shortcoming at very low bit rate is due to its very primitive DC coding, and lack of deblocking filter, but I'm not sure).

BTW I also found this pretty nice Goldfishy WebP comparison

Also some pretty good links on JPEG that I've stumbled on in the last few days :
jpeg wizard.txt
ImpulseAdventure - JPEG Quality Comparison
ImpulseAdventure - JPEG Quality and Quantization Tables for Digital Cameras, Photoshop
ImpulseAdventure - JPEG Compression and JPEG Quality

Here's how the WebP test was done :


webp_test.bat :
call dele s:\*.bmp
call dele s:\*.webp
Release\webp -output_dir s:\ -format webp -quality %1 r:\my_soup.bmp
Release\webp -output_dir s:\ -format bmp s:\my_soup.webp
namebysize s:\my_soup.bmp s:\my_soup.webp s:\webp_ .bmp
call mov s:\webp_*.bmp r:\webp_test\



md r:\webp_test
call dele r:\webp_test\*
call webp_test 5
call webp_test 10
call webp_test 15
call webp_test 20
call webp_test 25
call webp_test 30
call webp_test 40
call webp_test 50
call webp_test 60
call webp_test 65
call webp_test 70
call webp_test 75
call webp_test 80
call webp_test 85
call webp_test 90
call webp_test 95
call webp_test 100
call mov s:\webp_*.bmp r:\webp_test\
imdiff r:\my_soup.bmp r:\webp_test -cwebp_imdiff.csv
transposecsv webp_imdiff.csv webp_trans.csv

BTW the webpconv app is really annoying.

1. It fails out mysteriously in lots of places and just says "error loading" or something without telling you why.

2. It takes an "output_dir" option instead of an output file name. I guess that's nice for some uses, but you need an output file name option for people who are scripting. (you can fix this of course by making your batch rename the input file to "webp_in" or something, then you can rename the output at will)

3. It's got image format loaders for like 10 different formats, but they're all semi-broken. Don't do that. Just load one standard format (BMP is a good choice) and support it *well* , eg. be really compliant with variants of the bitstream, and let the user convert into that format using ImageMagick or something like that.

4. It won't write the output files if they already exist, and there's no "force overwrite" option. This one had me absolutely pulling out my hair as I kept running it with different options and the output files stayed the same. (you can fix this of course by making your batch delete the output first)

Despite all this negativity, I actually do think the WebP format might be okay if it had a good encoder.

ADDENDUM : WebP on Moses :

On "my_soup" it looked like WebP was at least close to competitive, but on Moses it really takes itself out of the running.

10-15-10 - Image Comparison Part 6 - cbwave

"cbwave" is my ancient wavelet coder from my wavelet video proof of concept. It's much simpler than JPEG 2000 and not "modern" in any way. But I tacked a bunch of color space options onto it for testing at RAD so I thought that would be interesting to see :

cbwaves various colorspaces :

log rmse :

ms-ssim-scielab :

notes :

RMSE : Obviously no color transform is very bad. Other than that, KLT is surprisingly bad at high bit rate (something I noted in a post long ago). The other color spaces are roughly identical. This coder has the best RMSE behavior of any we've seen yet. This is why wavelets were so exciting when they first came out - this coder is incredibly simple, there's no RDO or optimizing at all, it doesn't do wavelet packets or bit planes, or anything, and yet it beats PAQ-JPEG (on rmse anyway).

MS-SSIM-SCIELAB : and here we see the disappointment of wavelets. The great RMSE behavior doesn't carry over to the perceptual metric. The best color space by far is the old "YUV" from JPEG, which has largely fallen out of favor. But we see that maybe that was foolish.

cbwave also has an option for downsampling chroma, but it's no good - it's just box downsample and box upsample, so these graphs are posted as an example of what bad chroma up/down sampling can do to you : (note that the problem only appears at high bit rates - at low bit rates the bad chroma sampling has almost no effect)

log rmse :

ms-ssim-scielab :
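The failure mode is easy to show in a 1D sketch (this is my guess at the box filters cbwave is using, not its actual code) : averaging pairs and then repeating samples turns any gradient into a staircase, which is exactly the kind of error that only shows up once the rest of the coder is accurate, ie. at high bit rate.

```python
def box_down(x):
    # box downsample : average adjacent pairs
    return [(x[2*i] + x[2*i + 1]) / 2.0 for i in range(len(x) // 2)]

def box_up(y):
    # box upsample : just repeat each sample
    out = []
    for v in y:
        out += [v, v]
    return out

ramp = [0.0, 2.0, 4.0, 6.0]
print(box_up(box_down(ramp)))  # [1.0, 1.0, 5.0, 5.0] - a smooth ramp becomes steps
```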

cbwave is a fixed pyramid structure wavelet doing daub97 horizontally and cdf22 vertically; the coder is a value coder (not bitplane) for speed. Some obvious things to improve it : fix the chroma subsample, try optimal weighting of color planes for perceptual quality, try daub97 vertical, try optimal per-image wavelet shapes, wavelet packets, directional wavelets, perceptual RDO, etc.
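For reference, cdf22 is the 5/3 lifting scheme; here's a float sketch of one level (generic textbook lifting with mirrored boundaries, not cbwave's actual code):

```python
def cdf22_forward(x):
    # one level of the CDF 2,2 (LeGall 5/3) wavelet via lifting,
    # float version, mirrored boundary extension
    n = len(x)
    assert n % 2 == 0
    # predict : detail = odd sample minus average of neighboring evens
    d = []
    for i in range(n // 2):
        left = x[2*i]
        right = x[2*i + 2] if 2*i + 2 < n else x[2*i]  # mirror at edge
        d.append(x[2*i + 1] - (left + right) / 2.0)
    # update : smooth = even sample plus quarter of neighboring details
    s = []
    for i in range(n // 2):
        dl = d[i - 1] if i > 0 else d[i]  # mirror at edge
        s.append(x[2*i] + (dl + d[i]) / 4.0)
    return s, d

def cdf22_inverse(s, d):
    # undo the update, then the predict - exact reconstruction
    n = 2 * len(s)
    x = [0.0] * n
    for i in range(len(s)):
        dl = d[i - 1] if i > 0 else d[i]
        x[2*i] = s[i] - (dl + d[i]) / 4.0
    for i in range(len(d)):
        left = x[2*i]
        right = x[2*i + 2] if 2*i + 2 < n else x[2*i]
        x[2*i + 1] = d[i] + (left + right) / 2.0
    return x
```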

ASIDE : I'd like to try DLI or ADCTC , but neither of them support color, so I'm afraid they're out.

CAVEAT : again this is just one test image, so don't take too many conclusions about what color space is best.

ADDENDUM : results on "moses.bmp" , a 1600x1600 with difficult texture like "barb" :

Again YUV is definitely best, KLT is definitely worst, and the others are right on top of each other.

10/14/2010

10-14-10 - Xbox 360 vs my HTPC

I was looking at the Xbox 360 sitting on my HTPC, and it makes me really angry. For one thing, I have two perfectly good computers sitting right there that are totally redundant. Why is it so damn hard to play games on the PC?

But more than that, I was thinking my HTPC has got a dual-core AMD chip in it that was like $50, and it's got an ATI Dx10 part that was again about $50. It's got a nice stereo-cabinet-like case and it's very cool and quiet. I'm pretty sure I could make a console from off-the-shelf retail parts that would be faster than an Xbox 360 or PS3. Yeah yeah, the 360/PS3 are faster in theory if you just count flops or something, but the massive advantage of a proper PC CISC out-of-order core would make my homebrew console faster on most real world code.

Obviously those parts weren't so cheap when the 360 or PS3 were being developed. Also I guess the cost of all the little bits adds up : case, mobo, PSU, DVD drive, hard drive. Still, it's pretty upsetting that our consoles are these awful PowerPC chips with weird GPUs when a proper reasonable computer is so cheap.

10-14-10 - Image Comparison Part 5 - RAD VideoTest

VideoTest is my test video coder for RAD. It's based on "NewDCT" , in fact it has exactly the same DCT core, but it has a slightly better perceptual tuning, and it has a better RDO encoder.

log rmse :

scielab ms-ssim :

videotest vs. newdct is almost identical in rmse, but we did make a nice step up in perceptual measure.

I am finally beating JPEG ari in the perceptual measure, but it's a bit disturbing how much work I had to do! And of course PAQ JPEG still dominates.

The videotest I frame coder has pretty sophisticated RDO, but it's missing a lot of other things that modern coders have, it has no I predictors, no in-frame matches. It uses a little bit of a perceptual D measure for RDO, but not as well tweaked as x264 by a long shot.

videotest currently crashes for high bit rates; I remember I put some stupid fixed size buffer somewhere to get things working one day, and now I forget where it is :( So that's why there are no results for it above 2.0 bpp

10-14-10 - Image Comparison Part 4 - JPEG vs NewDCT

"NewDCT" is a test compressor that I did for RAD. Let's run it through our chart treatment.

For reference, I'll also include plain old JPEG huff , default settings, no PAQ.

log rmse :

scielab ms-ssim :

A few notes :

Note that the blue "jpeg" line is two different jpegs - the top one is jpegflatnosub , optimized for rmse, the bottom one is regular jpeg, optimized for perceptual metric. In contrast "newdct" is the same in both runs, which is semi-perceptual.

The first graph is mainly a demonstration of something terrible that people in literature and all over the net do all the time - they take standard jpeg_huff , which is designed for perceptual quality, and show PSNR/RMSE numbers for it. Obviously JPEG looks really bad when you do that and you say "it's easy to beat" , but you are wrong. It's terrible. Stop it.

In fact in the second graph we see that JPEG's perceptual optimization is so good that even shitty old jpeg_huff is competitive with my newdct above 1.0 bpp . Clearly I still have things to learn from JPEG.

I have no idea what's up with jpeg_paq going off the cliff for small file sizes; it becomes worse than jpeg_ari. Must be a problem in the PAQ jpeg stuff, or maybe an inherent weakness in PAQ on very small files that don't give it enough data to learn on.

Note that the three JPEG back ends always give us 3 horizontal points - they make the same output, only the file sizes are different. (I'm talking about the bottom chart; in the top chart there are two different jpegs and they make different output, as noted previously).

Below 0.50 bpp JPEG does in fact have a problem. All more modern coders will have a straighter R/D line than JPEG does, it starts to slope down very fast. But, images generally look so bad down there that it's rather irrelevant. I noted before that the "money zone" is -1 to 1 in log bpp, that's where images look pretty good and you're getting a good value per bit.

10-14-10 - Image Comparison Part 3 - JPEG vs AIC

Testing Bilsen's AIC (AIC is a subset of H264 Intra without the good encoder of x264) :

Bilsen's AIC doesn't have the crippling low quality colorspace problem of x264, but JPEG just kills it on both metrics. Note that I use jpegflatnosub for rmse and jpeg default options for the perceptual metric.

I think we've already debunked the claims that JPEG is "easy to beat" or "not competitive with modern codecs" or that the "H264 Intra Predictors are a big advantage". (granted AIC is not the best modern codec by a long shot).

I should fill in some more details before I go further.

All the tests so far have been on one image, I made it with my camera by taking a RAW photo and scaling & cropping it from 4000x3000 down to 1920x1200 to reduce noise and improve chroma resolution. The image is called "my_soup" (maybe I'll post it somewhere for download). I will at some point run some tests on a bunch of images, because it's a bad idea to test on just one.

As I said before, the JPEG I'm using is just IJG , but I am losslessly recompressing the JPEGs with PAQ. I also tried the old JPEG -arith , and I found it's about half way between jpeg-huff and jpeg-PAQ, so I believe this is roughly a fair way of making the JPEG entropy coder back end "modern". I haven't really tried to optimize the JPEG encoding at all, for example there might be better quant matrices, or better options to give to IJG, and obviously you could easily add an unblock on the outside, etc. Without any of that stuff, JPEG is already competitive.

I should also take this chance to state the caveat : MS-SSIM-SCIELAB is in no way a proof of visual superiority. It's the best analytic metric I have handy that is pretty close to visual quality, but the only test we have for real visual quality at the moment is to look at the output with your own eyes.

The jpeg results are made like this :


jpegtest.bat :

c:\util\jpeg8b\cjpeg -dct float -optimize -quality %2 -outfile %1.jpg %1
paq8o8 -6 %1.jpg
call d %1.jpg*
c:\util\jpeg8b\djpeg -dct float -bmp -dither none -outfile de.bmp %1.jpg
namebysize de.bmp %1.jpg.paq8o8 jpeg_test_ .bmp

jpegtests.bat :

call jpegtest %1 5
call jpegtest %1 10
call jpegtest %1 15
call jpegtest %1 20
call jpegtest %1 25
call jpegtest %1 30
call jpegtest %1 40
call jpegtest %1 50
call jpegtest %1 60
call jpegtest %1 65
call jpegtest %1 70
call jpegtest %1 75
call jpegtest %1 80
call jpegtest %1 85
call jpegtest %1 90
call jpegtest %1 95
call jpegtest %1 100
call dele jpg_test\*
call mov jpeg_test_* jpg_test\
imdiff %1 jpg_test -c
call zr imdiff.csv jpg_imdiff.csv
transposecsv jpg_imdiff.csv jpg_trans.csv

10-14-10 - Image Comparison Part 2

Well, I wanted to post JPEG vs x264 numbers, but there's a problem.

The first problem I had was that the headers out of x264 are very large. JPEG has about 240 bytes of header ; MP4 has about 1200 bytes of header for one frame of video only , the .x264 internal format has about 600 bytes of header. So I'm just doing a subtract to correct for that, but that's rather approximate and ugly since you can store side information in headers, etc. But whatever, that's not the big problem.
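The subtract correction is just this (my helper, not from any of these tools):

```python
def bpp_from_filesize(size_bytes, header_bytes, width, height):
    # subtract the container header before computing bits per pixel ;
    # crude, since a header can hide side information, but workable
    return (size_bytes - header_bytes) * 8.0 / (width * height)

# eg. the 230794-byte JPEG of the 1920x1200 test image, minus its
# ~240 byte header, is about 0.80 bpp :
print(round(bpp_from_filesize(230794, 240, 1920, 1200), 3))
```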

The big problem is that the "lossless" x264 is actually really bad (it's only lossless after color conversion). Here is the quality for lossless x264 :


rmse : 4.9177 , psnr : 34.3296
ssim : 0.9840 , perc : 88.5788%
scielab rmse : 3.140
scielab ssim : 0.9844 , angle : 88.7457%

I confirmed that this is caused by the chroma conversion & subsample by just making the y4m and then converting the y4m back to RGB, and I get the exact same numbers. You don't usually see this because people show PSNR in the Y color plane. That's absolutely terrible. For comparison, here's JPEG at 100% quality :

rmse : 2.1548 , psnr : 41.4967
ssim : 0.9928 , perc : 92.3420%
scielab rmse : 0.396
scielab ssim : 0.9997 , angle : 98.4714%

Now JPEG q 100 isn't even lossless, but this is way more like what you'd expect. A 2.0 base rmse is about what you get from the "lossy" old school chroma conversion that JPEG does.
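To see where that base error comes from, here's a sketch of the standard JFIF full-range conversion : just rounding the YCbCr result to bytes and converting back already moves pixel values around, before any subsampling or DCT loss. (standard coefficients; the roundtrip harness is mine)

```python
def rgb_to_ycbcr(r, g, b):
    # JFIF full-range RGB -> YCbCr
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b

def byte_roundtrip(r, g, b):
    # store YCbCr in bytes (round + clamp), then convert back
    y, cb, cr = (min(255, max(0, round(v))) for v in rgb_to_ycbcr(r, g, b))
    return ycbcr_to_rgb(y, cb, cr)

print(byte_roundtrip(255, 0, 0))  # close to (255, 0, 0) but not exact
```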

Unfortunately this large constant error ruins any attempt to measure x264's performance. You can see that the RMSE line for x264 is just offset way up :

x264 vs JPEG : log rmse vs log bpp

The slope is way better than JPEG so we think that if the color space wasn't throwing away so much information it would be much better.

So if anybody has a suggestion on how to run x264 without the destructive y4m color transform, I'd appreciate it. I believe the problem is that it's putting YUV back into bytes. I suspect that the color conversions and up/down sample are just not being done very well (I'm using ffmpeg) ; maybe there's some -highquality setting that I don't know about that would make them better?

In any case, we can still see a few things. JPEG behaves very badly below 0.5 bpp, while x264 degrades nicely into very low bit rates.

x264 vs JPEG : scielab SSIM :

And another point on graph scaling. SSIM is a dot product, and as such is nonlinear and distorting. In particular, in the functional range, SSIM values are usually between 0.980 and 0.990, which is a tiny and confusing space.

One better way to map it is to turn the dot product into an angle (using acos). I then change the angle into a percent (100% = 0 degrees apart, 0% = 90 degrees apart). The "SSIM angle" plot looks like this :

scielab SSIM angle :

In particular, it's slightly easier to see the separation between normal JPEG and the "flatnosub" (RMSE tuned) JPEG in the SSIM angle plot than in the regular SSIM plot. In regular SSIM everybody goes into this asymptote to 1 together and it makes a mess. This scielab MS-SSIM is semi-perceptual so it rewards JPEG over jpeg-flat-nosub.
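The mapping itself is one line of math; my implementation of the description above :

```python
import math

def ssim_angle_percent(ssim):
    # treat SSIM as a normalized dot product : acos gives an angle,
    # then map 0 degrees -> 100% , 90 degrees -> 0%
    ang = math.acos(max(-1.0, min(1.0, ssim)))
    return 100.0 * (1.0 - ang / (math.pi / 2.0))

# eg. scielab ssim 0.9844 -> about 88.74% , close to the
# 88.7457% angle reported for lossless x264 above
print(round(ssim_angle_percent(0.9844), 2))
```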

You should basically ignore the x264 results here because of the aforementioned problem with large constant error.

ADDENDUM : you can repro the x264 problem like this :


ffmpeg -i my_soup.avi -vcodec libx264 -fpre presets\libx264-lossless_slow.ffpreset test.mkv
ffmpeg -i test.mkv -vcodec png out.png

imdiff my_soup.bmp out.png

and the result is :

rmse : 5.7906 , psnr : 32.9104
ssim : 0.9783 , perc : 86.7096%
scielab rmse : 2.755
scielab ssim : 0.9928 , angle : 92.3736%

or without even getting x264 involved :

ffmpeg -i my_soup.avi -pix_fmt yuv420p my_soup.y4m
ffmpeg -i my_soup.y4m -vcodec png uny4m.png

same results, which shows that the problem is the yuv420 conversion.

I confirmed that my_soup.avi is in fact a perfect lossless RGB copy of my_soup.bmp ; Note that this result is even worse than the one I reported above. The one above was made by first converting the AVI to Y4M using Mencoder , so apparently that path is slightly higher quality than whatever ffmpeg is doing here.

ADDENDUM : I think ffmpeg/x264 use "broadcast standard" YUV in the 16-235 range instead of 0-255 , so that might be a large piece of the problem.
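If that's right, the damage is easy to see in a sketch (my assumption about the scaling - the standard 16-235 luma squeeze):

```python
def full_to_limited(y):
    # squeeze full range 0..255 into broadcast range 16..235
    return 16.0 + y * 219.0 / 255.0

def limited_to_full(y):
    return (y - 16.0) * 255.0 / 219.0

# only 220 levels to represent 256, so once you round to bytes some
# adjacent input levels collapse together :
print(round(full_to_limited(3)), round(full_to_limited(4)))  # both 19
```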

latest attempt, still not good :


ffmpeg -f image2 -i my_soup.bmp -sws_flags +accurate_rnd+full_chroma_int -vcodec libx264 -fpre c:\progs\video_tools\ffmpeg-latest\presets\libx264-lossless_slow.ffpreset test.mkv
ffmpeg -i test.mkv -sws_flags +accurate_rnd+full_chroma_int -vcodec png uny4m_2.png
imdiff my_soup.bmp uny4m_2.png

10/12/2010

10-12-10 - Image Comparison Part 1

First of all, let's talk about how we graph these things. For this comparo, I'm measuring RGB L2 error (aka MSE) using IJG JPEG. I'm comparing :

    jpeg : default settings (with -optimize)
    jpegflat : jpeg + flat quantization matrix
    jpegflatnosub : jpegflat + no subsample for chroma

what people usually plot is PSNR vs. bpp, which looks like this :

jpegs psnr vs bpp :

these psnr vs bpp graphs are total shit. I can't see anything. The area that you actually care about is around 0.5 - 1.5 bpp, and it's all crammed together. In particular it's impossible to tell whether jpeg or jpegflat is better, and I have no intuition for what the numbers mean. Please stop making these graphs right now.

(NOTE : bpp means bits per *pixel* not bits per byte; eg. uncompressed is 24 bpp)

IMO rmse is better than PSNR because it's more intuitive. 1 level of rmse is one pixel value, it's intuitive. Unfortunately, the plot is only slightly better :

jpegs rmse vs bpp :

What we want obviously is to expand the area around 1 bpp. The obvious thing should occur to you - our bpp scale is wrong. In particular, what we really care about are doublings of bpp - eg. 0.25 bpp, 0.5 bpp, 1.0 bpp, 2.0 bpp - the step from 0.25 to 0.50 is about the same importance as the step from 4.0 bpp to 8.0 bpp. Obviously we should use a log scale. A similar argument applies to rmse, so we have :

jpegs log rmse vs log bpp :

which is much clearer. It also is amazingly linear through the "money zone" of -1 to 1 (0.5 to 2.0 bpp) which is where JPEG performs well.
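The mapping for these charts is just log2 on both axes (my helper):

```python
import math

def to_log_point(bpp, rmse):
    # log2 so each doubling of rate (or error) is one unit on the axis
    return math.log2(bpp), math.log2(rmse)

# the "money zone" of -1 to 1 on the x axis is 0.5 to 2.0 bpp :
print(to_log_point(0.5, 4.0))   # (-1.0, 2.0)
print(to_log_point(2.0, 4.0))   # (1.0, 2.0)
```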

BTW note of course PSNR is a log scale as well, it's just log rmse flipped upside down and then offset all weird by some constant. I don't like it as much, but PSNR vs. log bpp is okay :

PSNR vs log bpp :
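The relation is simple enough to write down (peak = 255 for 8-bit images; the "weird constant" is 20*log10(255), about 48.13):

```python
import math

def psnr_from_rmse(rmse, peak=255.0):
    # psnr = 20*log10(peak) - 20*log10(rmse) :
    # just log rmse negated and offset by a constant
    return 20.0 * math.log10(peak / rmse)

print(round(psnr_from_rmse(2.55), 6))  # 40.0 : rmse of 1% of peak
```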

Conclusion :

Plots vs log bpp are the way to go.

If you are showing L2 errors for JPEG you need to be using a flat quantization matrix with no subsampling of chroma. (note that I didn't optimize the relative scaling of the planes, which might change the results or improve them further).

Next post I'll move on to some perceptual measurements.

(BTW the JPEG numbers posted here are all with PAQ ; see later posts for full details)

10/11/2010

10-11-10 - FFmpeg success

I've never gotten ffmpeg to work because I could never find a win32 build with everything integrated in it properly (for the win32 build to work easily you need mingw, libavcodec, etc. all statically built in). Anyway, I finally found one :

Automated FFmpeg Builds at arrozcru

And in particular it's got a mess of x264 presets included so you don't have to try to figure out that command line. In particular I can now do :


echo USAGE : [in] [speed] [out]
echo speed = medium,slow,fast,slower,etc.
%FFMPEG%\bin\ffmpeg -i %1 -threads 0 -acodec copy -vcodec libx264 -fpre %FFMPEG%\presets\libx264-%2.ffpreset %3

(my latest fucking video Babel problem is that god damn Youtube doesn't offer mp4 options for many videos anymore, they are all FLV, and my HTPC can't play FLV, so I have to convert youtubes friggle fracking frizzum). (BTW you can also change containers sometimes by using -sameq and not specifying anything else)

Some more reference :

ffmpeg.org FFmpeg FAQ
ffmpeg.org FFmpeg Documentation
VideoAudio Encoding Cheat Sheet for FFmpeg
Links - FFmpeg on Windows
FFmpeg x264 encoding guide robert.swain

BTW any time you are trying a format change pathway it's a good idea to check sync with something like this video

ADDENDUM : Well, as I predicted the Unicode Fuckitude on Windows command lines is coming true. FFmpeg won't open some videos with weird characters in file names, and Youtube in fact has a bunch of videos with weird chars in their names. The simplest solution is to run my Deunicode on your download directory periodically. Deunicode actually still allows 8-bit characters, I might make a -ascii option to restrict it even further to just 7 bit ascii. (8 bit characters work consistently on Windows but still have the annoying attribute that they may display differently on the CLI vs in Explorer).

BTW the problem is not really FFMpeg , it's the fucked up Windows Console CP thing. ( see previous ). The only way to make command line apps that work with unicode file names in Windows is to do the elaborate "GetUnicodeFileNameFromMatch" thing in cblib that I do. So far as I know I'm the only person in the whole world who actually does this (I guess not that many people actually use the command line in windows any more).
