10/18/2010

10-18-10 - How to make a Perceptual Database

Well, I'm vaguely considering making my own perceptual test database since you know it's been like 20 years since it was obvious we needed this and nobody's done. I'm going to brainstorm randomly about how it should be.

EDIT : hmm whaddaya know, I found one.

You gather something like 20 high quality images of different characteristics. The number of base images you should use depends on how many tests you can run - if you won't get a lot of tests don't use too many base images.

Create a bunch of distorted images in various ways. For each base image, you want to make something like 100 of these. You want distortions at something like 8 gross quality levels (something like "bit rates" 0.125 - 2.0 in log scale), and then a variety of distortions that look different at each quality level.

How you make these exactly is a question. You could of course run various compressors to make them, but that has some weird bias built in as testers are familiar with those compressors and their artifacts, so may have predispositions about how they view them. It might be valuable to also make some synthetically distorted images. Another idea would be to run images through multiple compressors. You could use some old/obscure compressors like fractals or VQ. One nice general way to make distortions is to fiddle with the coefficients in the transformed space, or to use transforms with synthesis filters that don't match the analysis (non-inverting transforms).

The test image resolution should be small enough that you can display two side by side without scaling. I propose that a good choice is 960 x 1080 , since then you can fit two side by side at 1920x1080 , which I believe is common enough that you can get a decent sample size. 960 divides by 64 evently but 1080 is actually kind of gross (it only divides up to 8), so 960x1024 might be better, or 960x960. That is annoyingly small for a modern image test, but I don't see a way around that.

There are a few different types of test possible :

Distorted pair testing :

The most basic test would be to show two distorted images side by side and say "which looks better". Simply have the use click one or the other, then show another pair. This lets testers go through a lot of images very quickly which will make the test data set larger. Obviously you randomize which images you show on the left or right.

To pick two distorted images to show which are useful to test against, you would choose two images which are roughly the same quality under some analytic metric such as MS-SSIM-SCIELAB. This maximizing the amount of information you are getting out of each test, because when you put up images where one is obviously better than another you aren't learning anything (* - but this a useful way to test the user for sanity, occasionally put up some image pairs that are purely randomly chosen, that way you get some comparisons where you know the answer and can test the viewer).

Single image no reference testing :

You just show a single distorted image and ask the viewer to rate its "quality" on a scale of 0-10. The original image is not shown.

Image toggle testing :

The distorted and original image are shown on top of each other and toggled automatically at N second intervals. The user rates it on a scale of 0-10.

Double image toggle testing :

Two different distorted images are chosen as in "distorted pair testing". Both are toggled against the original image. The user selects which one is better.

When somebody does the test, you want to record their IP or something so that you make sure the same person isn't doing it too many times, and to be able to associate all the numbers with one identity so that you can throw them out if they seem unreliable.

It seems like this should be easy to set up with the most minimal bit of web programming. You have to be able to host a lot of bandwidth because the images have to be uncompressed (PNG), and you have to provide the full set for download so people can learn from it.

Obviously once you have this data you try to make a synthetic measure that reproduces it. The binary "this is better than this" tests are easier to deal with than the numeric (0-10) ones - you can directly test against them. With the numeric tests you have to control for the bias of the rating on each image and the bias from each user (this is a lot like the Netflix Prize actually, you can see also papers on that).

10/16/2010

10-16-10 - Image Comparison Part 11 - Some Notes on the Tests

Let's stop for a second and talk about the tests we've been running. First of all, as noted in a comment, these are not the final tests. This is the preliminary round for me to work out my pipeline and make sure I'm running the compressors right, and to reduce the competitors to a smaller set. I'll run a final round on a large set of images and post a graph for each image.

What is this MS-SSIM-SCIELAB exactly? Is it a good perceptual metric?

SCIELAB (see 1 and 2 ) is a color transform that is "perceptually uniform" , eg. a delta of 1 has the same human visual importance at all locations in the color cube, and is also spatially filtered to account for the difference in human chroma spatial resolution. The filter is done in the opponent-color domain, and basically the luma gets a sharp filter and the chroma gets a wide filter. Follow the blog posts for more details.

SCIELAB is pretty good at accounting for one particular perceptual factor of error measurement - the difference in importance of luma vs. chroma and the difference in spatial resolution of luma vs. chroma. But it doesn't account for lots of other perceptual factors. Because of that, I recognize that using SCIELAB is somewhat distorting in the test results. What it does is give a bonus to compressors that are perceptually tuned for this particular issue, and doen't care about other issues. More on this later. (*1) (ADDENDUM : of course using SCIELAB also gives an advantage to people who use a colorspace similar to LAB, such as old JPEG YUV, and penalize people who use YCoCg which is not so close. Whether or not you consider this to be fair depends on how accurate you believe LAB to be as an approximation of how the eye sees).

MS-SSIM is multi-scale SSIM and I use it on the SCIELAB data. There are a few issues about this which are problematic.

How do you do SSIM for multi-component data?

It's not that clear. You can just do the SSIM on each component and then combine - probably with an arithmetic mean, but if you look at the SSIM a bit you might get other ideas. The basic factor in SSIM is a term like :


ss = 2 * x*y / (x*x + y*y )

If x and y are the same, this is 1.0 , the more different they are, the smaller it is. In fact this is a normalized dot product or a "triangle metric". That is :

ss = ( (x+y)^2 - x*x - y*y ) / (x*x + y*y )

if you pretend x and y are the two short edges of a triangle, this is the length of the long edge between them minus the length squared of each edge.

Now, this has just been scalar, but to go to multi-component SSIM you could easily imagine the right thing to do is to go to vector ops :


ss = 2 * Dot(x,y) / ( LenSqr(x) + LenSqr(y) )

That might in fact be a good way to do n-component SSIM, it's impossible to say without doing human rating tests to see if it's better or not.

Now while we're at it let's make a note on this basic piece of the SSIM.


ss = 2 * x*y / (x*x + y*y ) = 1 - (x-y)^2 /  ( x*x + y*y )

we can see that all its done is take the normal L2 MSE term , and scale it by the inverse magnitude of the values. The reason they do this is to make SSIM "scale independent" , that is if you replace x and y with sx and sy you get the same number out for SSIM. But in fact what it does is make errors in low values much more important than errors in high values.


ssim :

    delta from 1->3

    2*1*3 / ( 1*1 + 3*3 ) = 6 / 10 = 0.6

    delta from 250->252

    2*250*252 / ( 250*250 + 252*252 ) = 0.999968

Now certainly it is true that a delta of 2 at value 1 is more important than a delta of 2 at value 250 (so called "luma masking") - but is it really *this* much more important? In terms of error (1 - ssim), the different is 0.40 vs. 0.000032 , or 1250000 % greater. I think not. My conjecture is that this scale-independence aspect of SSIM is wrong, it's over-counting low value errors vs. high-value errors. (ADDENDUM : I should note that real SSIM implementations have a hacky constant term added to the numerator and denominator which reduce this exaggeration)

As usual in the SSIM papers they show that SSIM is better at detective true visual quality than a straw man opponent - pure RMSE. But what is in SSIM ? It's plain old MSE with a scaling to make low values count more, and it's got a local "detail" detection term in the form of block sdevs. So if you're going to make a fair comparison, you should test against something similar. You could easily do RMSE on scaled values (perhaps log-scale values, or simple sqrt-values) to make low value errors count more, and you could easily add a detail preservation term by measuring local activity and adding an RMSE-like term for that.

What's the point of even looking at the RMSE numbers? If we just care about perceptual quality, why not just post that?

Well, a few reasons. One, as noted previously, we don't completely trust our perceptual metric, so having the RMSE numbers provide a bit of a sanity check fallback for that. For another, it lets us sort of check on the perceptual tuning of the compressor. For example if we find something that does very well on RGB-RMSE but badly on the perceptual metric, that tells us that it has not been perceptually tuned; it might actually be an okay compressor if it has good RMSE results. Having multiple metrics and multiple bit rates and multiple test images sort of let you peer into the function of the compressor a bit.

What's the point of this whole process? Well there are a few purposes for me.

One is to work out a simple reproducable pipeline in which I can test my own compressors and get a better idea of whether they are reasonably competitive. You can't just compare against other people's published results because so many of the test are done on bad data, or without enough details to be reproducable. I'd also like to find a more perceptual metric that I can use.

Another reason is for me to actually test a lot of the claims that people bandy about without much support, like is H264-Intra really a very good still image coder? Is JPEG2000 really a lame duck that's not worth bothering with? Is JPEG woefully old and easy to beat? The answers to those and other questions are not so clear to me.

Finally, hopefully I will set up an easy to reproduce test method so that anyone at home can make these results, and then hopefully we will see other people around the web doing more responsible testing. Not bloody likely, I know, but you have to try.

(*1) : you can see this for example in the x264 -stillimage results, where they are targetting "perceptual error" in a way that I don't measure. There may be compressors for example which are successfully targetting some types of perceptual error and not targetting the color-perception issue, and I am unfairly showing them to be very poor.

However, just because this perceptual metric is not perfect doesn't mean we should just give up and use RMSE. You have to use the best thing you have available at the time.

Generally there are two classes of perceptual error which I am going to just brazenly coin terms for right now : occular and cognitive.

The old JPEG/ SCIELAB / DCTune type perceptual error optimization is pretty much all occular. That is, they are involved in studying the spatial resolution of rods vs. cones, the occular nerve signal masking of high contrast impulses, the thresholds of visibility of various DCT shapes, etc. It's sort of a raw measure of how the optical signal gets to the brain.

These days we are more interested in the "cognitive" issues. This is more about things like "this looks smudgey" or "there's ringing here" or "this straight line became non-straight" or "this human face got scrambled". It's more about the things that the active brain focuses on and notices in an image. If you have a good model for cognitive perception, you can actually make an image that is really screwed up in an absolute "occular" sense, but the brain will still say "looks good".

The nice thing about the occular perceptual optimization is that we can define it exactly and go study it and come up with a bunch of numbers. The cognitive stuff is way more fuzzy and hard to study and put into specific metrics and values.

Some not very related random links :

Perceptual Image Difference Utility
New cjpeg features
NASA Vision Group - Publications
JPEG 2000 Image Codecs Comparison
IJG swings again, and misses Hardwarebug
How-To Extract images from a video file using FFmpeg - Stream #0
Goldfishy Comparison WebP, JPEG and JPEG XR
DCTune 2.0 README

A brief note on this WebP vs. JPEG test :

Real world analysis of google�s webp versus jpg English Hard

First of all he uses a broken JPEG compressor (which is then later fixed). Second he's showing these huge images way scaled down, you have to dig around for a link to find them in their native sizes. He's using old JPEG-Huff without a post-unblock ; okay, that's fine if you want to compare against ancient JPEG, but you could easily test against JPEG-Arith with unblocking. But the real problem is the file sizes he compresses to. They're around 0.10 - 0.15 bpp ; to get the JPEGs down to that size he had to set "quality" to something like 15. That is way outside of the functional range of JPEG. It's abusing the format - the images are actually huge, then compressed down to a tiny number of bits, and then scaled down to display.

Despite that, it does demonstrate a case where WebP is definitely significantly better - smooth gradients with occasional edges. If WebP is competitive with JPEG on photographs but beats it solidly on digital images, that is a reasonable argument for its superiority.

10-16-10 - Image Comparison Part 9 - Kakadu JPEG2000

Kakadu JPEG2000 (v6.4) can be tuned for visual quality or for MSE (-no_weights) , so we run both :

my_soup :

Performance in general is excellent, and we can see that they did a good job with their visual tuning (according to this metric anyway). KakaduMSE is slightly worse that jpeg_paq through the [-1,1] zone, but the visually tuned one is significantly better.

moses :

Moses is one of those difficult "noisey / texturey" type of images (like "barb") that people historically say is bad for wavelets, and indeed that seems to be the case. While Kakadu still stomps on JPEG, it's not by nearly as much as on my_soup.

The old MSU test says that ACDSee and Lurawave are better than Kakadu (v4.5) so maybe I'll try those, but they're both annoyingly commercial.

10-16-10 - Image Comparison Part 10 - x264 Retry

Well I've had a little bit more success.

I still can't get x264 to do full range successfully; or maybe I did, but then I can't figure out how to make the decoder respect it.

I think the thing to do is make an AVISynth script containing something like :


AviSource("r:\my_soup.avi")
ConvertToYV12( matrix="pc.709")

The pc YV12's are supposed to be the full 0-255 ones, and then on the x264 command like you also do "--fullrange on --colormatrix bt709" , which are just info tags put into the stream, which theoretically the decoder is supposed to see so that it can do the inverse colorspace transform correctly, but that doesn't seem to be working. Sigh!

One difficulty I have is that a lot of programs don't handle these one frame videos right. MPlayer refuses to extract any frames from it, Media Player Classic won't show me the one frame. FFmpeg does succeed in outputting the one frame, so its what I'm using to decode right now.

Anyway these are the some of the links that don't actually provide an answer :

Convert - Avisynth
Re FFmpeg-user - ffmpeg & final cut pro best format question
new x264 feature VUI parameters - Doom9's Forum
MPlayer(1) manual page
Mark's video filters
libav-user - Conversion yuvj420P to RGB24
H.264AVC intra coding and JPEG 2000 comparison
H.264 I-frames for still images [Archive] - Doom9's Forum
FFmpeg-user - ffmpeg & final cut pro best format question
FFmpeg-user - Converting high-quality raw video to high-quality DVD video
FFmpeg-devel - MJPG decoder picture quality
FFmpeg libswscaleoptions.c Source File
YCbCr - Wikipedia, the free encyclopedia

log rmse :

ms-ssim-scielab :

There's still a large constant error you can see in the RMSE graph that I believe is due to the [16-235] problem.

It should not be surprising that --tune stillimage actually hurts in both our measures, because it is tuning for "psy" quality in ways that we don't measure here. In theory it is actually the best looking of the three.

NOTE : This is counting the sizes of the .x264 output including all headers, which are rather large.


1:

call x264 -o r:\t.mkv r:\my_soup.avs --preset veryslow --tune psnr %*
call ffmpeg -i t.mkv -sws_flags +bicubic+accurate_rnd+full_chroma_int -vcodec png tf.png

2:

call ffmpeg -sws_flags +bicubic+accurate_rnd+full_chroma_int+full_chroma_inp -i my_soup.avi -vcodec libx264 -fpre c:\progs\video_tools\ffmpeg-latest\presets\libx264-lossless_slow.ffpreset r:\t.mkv
call ffmpeg -sws_flags +bicubic+accurate_rnd+full_chroma_int+full_chroma_inp -i r:\t.mkv -vcodec png r:\tf.png

I think I'll just write my own Y4M converter, since having my own direct Y4M in/out would be useful for me outside of this stupid test.

ADDENDUM : well I did.

added to the chart now is x264 with my own y4m converter :

I actually was most of the way there with the improved ffmpeg software scaler flags. I was missing the main issue - the reason it gets so much worse than our jpegfnspaq line at high bit rate is because "fns" is short for "flat no sub" and the no sub is what gets you - all subsampled codecs get much worse than non-subsampled codecs at high bit rates.

Even using my own Y4M converter is a monstrous fucking pain in the ass, because god damn FFMPEG won't just pass through the YUVJ420P data raw from x264 out to a Y4M stream - it prints a pointless error and refuses to do it. That needs to be fixed god damn it. The workaround is to make it output to "rawvideo" with yuvj420p data, and then load that same raw video but just tell it it has yuv420p data in it to get it to write the y4m. So my test bat for x264 is now :


call dele r:\t.*
call dele r:\ttt.*
c:\src\y4m\x64\release\y4m.exe r:\my_soup.bmp r:\t.y4m
r:\x264.exe -o r:\t.mkv r:\t.y4m --fullrange on --preset veryslow --tune psnr %*
call ffmpeg.bat -i r:\t.mkv -f rawvideo r:\ttt.raw
call ffmpeg.bat -pix_fmt yuv420p -s 1920x1200 -f rawvideo -i r:\ttt.raw r:\ttt.y4m
c:\src\y4m\x64\release\y4m.exe r:\ttt.y4m r:\ttt.bmp
namebysize r:\ttt.bmp r:\t.mkv r:\xx_ .bmp

The big annoyance is that I have the resolution hard-coded in there to make rawvideo work, so I can't just run it on arbitrary images.

FYI my_y4m is currently just doing PC.601 "YUV" which is the JPEG YCbCr. I might add support for all the matrices so that it can be a fully functional y4m converter.

I was going to use the Y4M reader/write code from MJPEGTOOLS , but it looks like it's fucking GPL which is a cancerous toxic license, so I can't use it. (I don't actually mind having to release my source code, but it makes my code infected by GPL, which then makes my code unusable by 90% of the world).

10/15/2010

10-15-10 - Image Comparison Part 8 - Hipix

Hipix is a commercial lossy image tool. I hear it is based on H264 Intra so I wanted to see if it was a better version of that idea (x264 and AIC both having let me down).

Well, at this point we should be unsurprised that it sucks balls :

log rmse :

ms-ssim-scielab :

One note : the rightmost data point from hipix is their "perfect" setting, which is very far from perfect. It's only a little over 2 bpp and the quality is shit. I feel bad for any sucker customers who are saving images as hipix "perfect" and thinking they are getting good quality.

I started to think , man maybe my ms-ssim-scielab is just way off the mark? How can everyone be so bad? Any time your test is telling you things that are hard to believe, you need to reevaluate your test. So I went and looked at the images with my own eyes.

Yep, hipix is awful. JPEG just blows it away.

A sample from the hipix image closest to the 0 on the x axis , and a JPEG of the same size : (HiPix is 230244 bytes, JPEG is 230794 bytes)

JPEG :

HiPix :

Note the complete destruction of the wood grain detail in the hipix, as well as introduction of blockiness and weird smudge shapes. Note the destruction of detail in the plate rim, and the ruining of the straight line edge of the black bowl.

BTW when you are evaluating perceptual quality, you should *NOT* zoom in! JPEG is optimized for human visual system artifact perceptibility at the given scale of the image. JPEG intentionally allows nasty artifacts that look bad when you zoom in, but not when you look at the image in its normal size.

Conclusion : Hipix needs to immediately release a "new and improved HiPix v2.0 that's way better than the last!" by just replacing it with JPEG.

Since they don't offer a command line app I won't be testing this on any more images.

ADDENDUM : Well I ran two points on Moses :

The two points are "High" and "Perfect" and perfect is way not perfect.

10-15-10 - Image Comparison Part 7 - WebP

I thought I wasn't going to be able to do this test, because damn Google has only released webpconv for Linux (or you know, if you download Cygwin and built yourself WTFBBQ). But I found these :

WebP for .NET

webp.zip solution for VC

... both of which are actually broken. The .NET one just fails myseriously on me. The webp.zip one has some broken Endian stuff, and even if you fix that the BMP input & output is broken. So.. I ripped it out and relaced it with the cblib BMP in/out, and it seems to work.

(I didn't want to use the webp converter in ffmpeg because I've seen past evidence that ffmpeg doesn't do the color conversion and resampling right, and I wanted to use the Google-provided app to make sure that any bad quality was due only to them)

My build of WebP Win32 is here : webp.zip

Here are the results :

log rmse :

ms-ssim-scielab :

Now I am surprised right off the bat that the ms-ssim-scielab results are not terrible, but the rmse is not very good. I've read rumors in a few places that On2 tweaked WebP/WebM for RMSE/PSNR , so I expected different.

Looking at the RMSE curve it's clear that there is a bad color conversion going on. Either too much loss in the color convert, or bad downsample code, something like that. Any time there is a broken base color space, you will see the whole error curve is a bit flatter than it should be and offset upwards in error.

The perceptual numbers are slightly worse than jpeg-huff through the "money zone" of -1 to 1. Like all modern coders it does have a flatter tail so wins at very low bit rate.

(BTW I think JPEG's shortcoming at very low bit rate is due to its very primitive DC coding, and lack of deblocking filter, but I'm not sure).

BTW I also found this pretty nice Goldfishy WebP comparison

Also some pretty good links on JPEG that I've stumbled on in the last few days :
jpeg wizard.txt
ImpulseAdventure - JPEG Quality Comparison
ImpulseAdventure - JPEG Quality and Quantization Tables for Digital Cameras, Photoshop
ImpulseAdventure - JPEG Compression and JPEG Quality

Here's how the WebP test was done :


webp_test.bat :
call dele s:\*.bmp
call dele s:\*.webp
Release\webp -output_dir s:\ -format webp -quality %1 r:\my_soup.bmp
Release\webp -output_dir s:\ -format bmp s:\my_soup.webp
namebysize s:\my_soup.bmp s:\my_soup.webp s:\webp_ .bmp
call mov s:\webp_*.bmp r:\webp_test\



md r:\webp_test
call dele r:\webp_test\*
call webp_test 5
call webp_test 10
call webp_test 15
call webp_test 20
call webp_test 25
call webp_test 30
call webp_test 40
call webp_test 50
call webp_test 60
call webp_test 65
call webp_test 70
call webp_test 75
call webp_test 80
call webp_test 85
call webp_test 90
call webp_test 95
call webp_test 100
call mov s:\webp_*.bmp r:\webp_test\
imdiff r:\my_soup.bmp r:\webp_test -cwebp_imdiff.csv
transposecsv webp_imdiff.csv webp_trans.csv

BTW the webpconv app is really annoying.

1. It fails out mysteriously in lots of places and just says "error loading" or something without telling you why.

2. It takes an "output_dir" option instead of an output file name. I guess that's nice for some uses, but you need an output file name option for people who are scripting. (you can fix this of course by making your batch rename the input file to "webp_in" or something then you ran rename the output at will)

3. It's got image format loaders for like 10 different formats, but they're all semi-broken. Don't do that. Just load one standard format (BMP is good choice) and support it *well* , eg. be really compliant with variants of the bitstream, and let the user convert into that format using ImageMagick or something like that.

4. It won't write the output files if they already exist, and there's no "force overwrite" option. This one had me absolutely pulling out my hair as I kept running it with different options and the output files stayed the same. (you can fix this of course by making your batch delete the output first)

Despite all this negativity, I actually do think the WebP format might be okay if it had a good encoder.

ADDENDUM : WebP on Moses :

On "my_soup" it looked like WebP was at least close to competitive, but on Moses it really takes itself out of the running.

10-15-10 - Image Comparison Part 6 - cbwave

"cbwave" is my ancient wavelet coder from my wavelet video proof of concept. It's much simpler than JPEG 2000 and not "modern" in any way. But I tacked a bunch of color space options onto it for testing at RAD so I thought that would be interesting to see :

cbwaves various colorspaces :

log rmse :

ms-ssim-scielab :

notes :

RMSE : Obviously no color transform is very bad. Other than that, KLT is surprisingly bad at high bit rate (something I noted in a post long ago). The other color spaces are roughly identical. This coder has the best RMSE behavior of any we've seen yet. This is why wavelets were so exciting when they first came out - this coder is incredibly simple, there's no RDO or optimizing at all, it doesn't do wavelet packets or bit planes, or anything, and yet it beats PAQ-JPEG (on rmse anyway).

MS-SSIM-SCIELAB : and here we see the disappointment of wavelets. The great RMSE behavior doesn't carry over to the perceptual metric. The best color space by far is the old "YUV" from JPEG, which has largely fallen out of favor. But we see that maybe that was foolish.

cbwave also has an option for downsampling chroma, but it's no good - it's just box downsample and box upsample, so these graphs are posted as an example of what bad chroma up/down sampling can do to you : (note that the probem only appears at high bit rates - at low bit rates the bad chroma sampling has almost no effect)

log rmse :

ms-ssim-scielab :

cbwave is a fixed pyramid structure wavelet doing daub97 horizontally and cdf22 vertically; the coder is a value coder (not bitplane) for speed. Some obvious things to improve it : fix the chroma subsample, try optimal weighting of color planes for perceptual quality, try daub97 vertical, try optimal per-image wavelet shapes, wavelet packets, directional wavelets, perceptual RDO, etc.

ASIDE : I'd like to try DLI or ADCTC , but neither of them support color, so I'm afraid they're out.

CAVEAT : again this is just one test image, so don't take too many conclusions about what color space is best.

ADDENDUM : results on "moses.bmp" , a 1600x1600 with difficult texture like "barb" :

Again YUV is definitely best, KLT is definitely worst, and the others are right on top of each other.

10/14/2010

10-14-10 - Xbox 360 vs my HTPC

I was looking at the Xbox 360 sitting on my HTPC, and it makes me really angry. For one thing, I have two perfectly good computers sitting right there that are totally redundant. Why is it so damn hard to play games on the PC?

But more than that, I was thinking my HTPC has got a dual-core AMD chip in it that was like $50, it's got an ATI Dx10 part that was again about $50. It's got a nice stereo-cabinet like case and it's very cool and quiet. I'm pretty sure I could make a console from off the shelf retail parts that would be faster than an Xbox 360 or PS3. Yeah yeah the 360/PS3 are faster in theory if you just count flops or something, but the massive advantage of a proper PC CISC OO core would make my homebrew console faster on most real world code.

Obviously those parts weren't so cheap when the 360 or PS3 were being developed. Also I guess the cost of all the little bits adds up : case, mobo, PSU, DVD drive, hard drive. Still, it's pretty upsetting that our consoles are these awful PowerPC chips with weird GPUs when a proper reasonable computer is so cheap.

10-14-10 - Image Comparison Part 5 - RAD VideoTest

VideoTest is my test video coder for RAD. It's based on "NewDCT" , in fact it has exactly the same DCT core, but it has a sightly better perceptual tuning, and it has a better RDO encoder.

log rmse :

scielab ms-ssim :

videotest vs. newdct is almost identical in rmse, but we did make a nice step up in perceptual measure.

I am finally beating JPEG ari in the perceptual measure, but it's a bit disturbing how much work I had to do! And of course PAQ JPEG still dominates.

The videotest I frame coder has pretty sophisticated RDO, but it's missing a lot of other things that modern coders have, it has no I predictors, no in-frame matches. It uses a little bit of a perceptual D measure for RDO, but not as well tweaked as x264 by a long shot.

videotest currently crashes for high bit rates; I remember I put some stupid fixed size buffer somewhere to get things working one day, and now I forget where it is :( So that's why there are no results for it above 2.0 bpp

10-14-10 - Image Comparison Part 4 - JPEG vs NewDCT

"NewDCT" is a test compressor that I did for RAD. Let's run it through our chart treatment.

For reference, I'll also include plain old JPEG huff , default settings, no PAQ.

log rmse :

scielab ms-ssim :

A few notes :

Note that the blue "jpeg" line is two different jpegs - the top one is jpegflatnosub , optimized for rmse, the bottom one is regular jpeg, optimized for perceptual metric. In contrast "newdct" is the same in both runs, which is semi-perceptual.

The first graph is mainly a demonstration of something terrible that people in literature and all over the net do all the time - they take standard jpeg_huff , which is designed for perceptual quality, and show PSNR/RMSE numbers for it. Obviously JPEG looks really bad when you do that and you say "it's easy to beat" , but you are wrong. It's terrible. Stop it.

In fact in the second graph we see that JPEG's perceptual optimization is so good that even shitty old jpeg_huff is competitive with my newdct above 1.0 bpp . Clearly I still have things to learn from JPEG.

I have no idea what's up with jpeg_paq going off the cliff for small file sizes; it becomes worse than jpeg_ari. Must be a problem in the PAQ jpeg stuff, or maybe an inherent weaknesss in PAQ on very small files that don't give it enough data to learn on.

Note that the three JPEG back ends always give us 3 horiztonal points - they make the same output, only the file sizes are different. (I'm talking about the bottom chart, in the top chart there are two different jpegs and they make different output, as noted previously).

Below 0.50 bpp JPEG does in fact have a problem. All more modern coders will have a straighter R/D line than JPEG does, it starts to slope down very fast. But, images generally look so bad down there that it's rather irrelevant. I noted before that the "money zone" is -1 to 1 in log bpp, that's where images look pretty good and you're getting a good value per bit.

old rants