I did a little image compressor for RAD/Oodle. The goal was to make something with
quality comparable to a good modern wavelet coder, but using a block-based scheme so that it's more compact and simpler in memory use,
which makes it easy to stream through the SPU, use SIMD, and all that good stuff. We also wanted an internal floating-point core algorithm so
that it extends to HDR and arbitrary bit depths. I wrote about it before, see
here or
here. That's been done for a while, but
there were some interesting bits I never wrote about, so I thought I'd note them quickly :
1. I did Lagrange R-D optimization to do "trellis quantization" (see previous).
There are some nasty things about this though, and it's actually turned off by default. It usually gives you a pretty nice win in terms
of RMSE (because it's measuring "D" (distortion) in terms of MSE, so by design it optimizes that for a given rate), but I find in practice that it
actually hurts perceptual quality pretty often. By "perceptual" here I just mean my own eyeballing (because as I'll mention later, I found SSIM to
be pretty worthless). The problem is that the R-D trellis optimization is basically taking coefficients and slamming them to zero where the distortion
cost of doing that is not worth the rate it would take to have that coefficient. In practice what this does is take individual blocks and make them
very smooth. In some cases that's great, because it lets you put more bits where they're more important (for example on images of just human faces
it works great because it takes bits away from the interior patches of skin and gives those bits to the edges and eyes and such).
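To make that zeroing decision concrete, here's a minimal greedy sketch (not the actual Oodle code; a real trellis also accounts for how zeroing a coefficient changes run-length/EOB costs). bitsToCode() is a hypothetical rate model standing in for whatever the real entropy coder charges, and distortion is plain squared error, which is exactly why this can win on RMSE while looking worse perceptually :

```
// Sketch of per-coefficient Lagrangian zeroing, J = D + lambda*R.
double bitsToCode(int quantizedCoeff);  // assumed rate model, not shown

void zeroCheapCoefficients(int * qcoeff, const float * orig, int count,
                           float quantizer, float lambda)
{
    for (int i = 1; i < count; i++)  // skip DC at index 0
    {
        if (qcoeff[i] == 0)
            continue;

        // distortion added by slamming this coefficient to zero :
        float errKeep = orig[i] - qcoeff[i] * quantizer;
        float errZero = orig[i];  // a zero coefficient restores to 0
        float deltaD  = errZero * errZero - errKeep * errKeep;

        // rate saved by coding a zero instead :
        double deltaR = bitsToCode(qcoeff[i]) - bitsToCode(0);

        // zero it when the rate saved outweighs the distortion added :
        if (deltaD < lambda * deltaR)
            qcoeff[i] = 0;
    }
}
```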
One of the test images I use is the super high res PNG "moses.png" that I found here.
Moses is wearing a herringbone jacket. At low bit rates with R-D trellis enabled, what happens is the coder just starts tossing out entire blocks in
the jacket because they are so expensive in terms of rate. The problem with that is it's not uniform. Perceptually, the block that gets killed stands
out very strongly and looks awful.
Obviously this could be fixed by using a better measure of "D" in the R-D optimization. This is almost a mantra for me : when you design a very
aggressive optimizer and just let it run, you better be damn sure you are rating the target criterion correctly, or it will search off into
unexpected spaces and give you bad results (even though they optimize exactly the rating that you told it to optimize).
2. It seems DCT-based coders are better than wavelets on very noisy images (e.g. images with film grain, or just images with lots of energy in
high frequency, such as images of grasses, etc). This might not be true with fancy shape-adaptive wavelets and such, but with normal wavelets
the "prior" model is that the image has most of its energy in the smooth bands, and has important high frequency detail only in isolated areas
like edges. When you run a wavelet coder at low bit rate, the result is a very smoothed looking version of the image. That's good in most
cases, but on the "noisy" class of images, a good modern wavelet coder will actually look worse than JPEG. The reason (I'm guessing)
is that DCT coders have those high frequency pattern basis functions. They might get the detail wrong, but at least there's still detail.
In some cases it makes a big difference to specifically inject noise in the decoder. One way to do this is to do a noisy restore of the
quantization buckets. That is, coefficient J with quantizer Q would normally restore to Q*J. Instead we restore to something random in the
range [ Q*(J-0.5) , Q*(J+0.5) ]. This ensures that the noisy output would re-encode to the same bit stream the decoder saw. I wound up
not using this method for various reasons; instead I optionally inject noise directly in image space, using a simplified model of film
grain noise. The noise magnitude can be manually specified by the user, or you can have the encoder measure how noisy the original is,
compare to the baseline decoder output to see how much energy we lost, and have the noise injector restore that noise level.
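The quantization-bucket version of the idea is tiny; a sketch assuming a plain uniform quantizer (illustration only, not the actual codec's quantizer) :

```
#include <random>

// Noisy dequantization restore : coefficient index J with quantizer Q
// would normally restore to Q*J; instead restore to a uniformly random
// point in [ Q*(J-0.5), Q*(J+0.5) ), which re-quantizes to the same J.
float noisyRestore(int J, float Q, std::mt19937 & rng)
{
    std::uniform_real_distribution<float> dist(-0.5f, 0.5f);
    return Q * ((float)J + dist(rng));
}
```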
To really do this in a rigorous and sophisticated way you should really have location-variable noise levels, or even context-adaptive noise
levels. For example, given an image of a smooth sphere on a background of static, the injector should look at the local neighborhood and only add noise on the
static-y background. Exploring this kind of development is very difficult because any noise injection hurts RMSE a lot, and developing new
algorithms without any metric to rate them is a fool's errand. I find that in some cases reintroducing noise clearly looks better to my eye,
but there's no good metric that captures that.
3. As I mentioned in the earlier posts, lapping just doesn't seem to be the win. A good post-process unblocking filter gives you all the win
of lapping without the penalties. Another thing I noticed for the first time is that the JPEG perceptual quantization matrix actually has
a built-in bias against blocking artifacts. The key thing is that AC10 and AC01 (the simplest horizontal and vertical ramps) are
quantized *less* than the DC. That guarantees that if you have two adjacent blocks in a smooth gradient area and their DCs quantize to
one step apart, then you have at least one step of AC10 linear ramp available to bridge between them.
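For reference, the top-left corner of the standard JPEG luminance quantization table (Annex K of the JPEG spec) shows that bias directly :

```
// Top-left corner of the standard JPEG luminance quantization table
// (ITU-T T.81, Annex K) :
//
//        DC   AC01
//       AC10  AC11
static const int kJpegLumaQuantTopLeft[2][2] =
{
    { 16, 11 },   // DC step 16, first horizontal ramp step 11
    { 12, 12 },   // first vertical ramp step 12
};
// AC01 and AC10 are quantized *finer* than the DC, so two adjacent
// blocks whose DCs land one step apart in a smooth gradient still have
// at least one step of linear ramp available to bridge the seam.
```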
If you don't use the funny JPEG perceptual quantization matrix (which I don't think you should) then a good unblocking filter is crucial at
low bit rate. The unblocking filter was probably the single biggest perceptual improvement in the entire codec.
4. I also somewhat randomly found a tiny trick that's a huge improvement. We've long noticed that at high quantization you get this really
nasty chroma drift problem. The problem occurs when you have adjacent blocks with very similar colors, but not quite the same, and they sit
on different sides of a quantization boundary, so one block shifts down and the neighbor shifts up. For example with Quantizer = 100 you might
have two neighbors with values {49, 51} and they quantize to {0,1} which restores to {0,100} and becomes a huge step. This is just what
quantization does, but when you apply quantization separately to the channels of a color (RGB or YUV or whatever), when one of the channels
shifts like that, it causes a hue rotation. So rather than seeing a stair step, what you see is that a neighboring block has become a different
color.
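Worked through as a tiny standalone sketch (round-to-nearest uniform quantizer, illustrative values from the example above) :

```
#include <cmath>
#include <cstdio>

// Two neighboring blocks with nearly equal values land on opposite
// sides of a quantization boundary and restore a huge step apart.
int   quantize(float v, float Q) { return (int)std::lround(v / Q); }
float restore (int j,   float Q) { return j * Q; }

int main()
{
    const float Q = 100.0f;          // quantizer step
    float a = 49.0f, b = 51.0f;      // nearly identical neighbors

    int ja = quantize(a, Q);         // -> 0
    int jb = quantize(b, Q);         // -> 1

    std::printf("restored : %g vs %g\n", restore(ja, Q), restore(jb, Q)); // 0 vs 100

    // When this happens in only one channel of an RGB or YUV color,
    // the visible result is a hue shift, not just a luminance step.
    return 0;
}
```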
Now there are a lot of ideas you might have about how to address this. To really attack it thoroughly, you would need a stronger perceptual
error metric, in particular one which can measure non-local patterns, which is something we don't have. The ideal perceptual error metric
needs to be able to pick up on things like "this is a smooth gradient patch in the source, and the destination has a block that stands out
from the others".
Instead we came up with just a simple hack that works amazingly well. Basically what we do is adaptively resize the quantization of the DC
component, so that when you are in a smooth region ("smooth" meaning neighboring block DC's are similar to each other), then we use finer
quantization bucket sizes. This lets you more accurately represent smooth gradients and avoid the chroma shift. Obviously this hurts RMSE, so
it's hard to measure the behavior analytically, but to our eyes it looks amazingly better.
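A minimal sketch of how such an adaptive DC quantizer could look (made-up thresholds and scale factors, not the shipping code; it assumes the bucket size is derived from previously decoded neighbor DCs, so encoder and decoder stay in sync with no extra signaling) :

```
#include <algorithm>

// Pick a finer DC bucket size when the already-coded neighbor DCs are
// close together ("smooth" region); leave busy regions alone.
float adaptiveDcQuantizer(float baseQ, float leftDC, float upDC, float upleftDC)
{
    float lo = std::min(leftDC, std::min(upDC, upleftDC));
    float hi = std::max(leftDC, std::max(upDC, upleftDC));
    float spread = hi - lo;   // local DC activity

    if (spread < 0.5f * baseQ) return 0.25f * baseQ;  // very smooth
    if (spread < 2.0f * baseQ) return 0.5f  * baseQ;  // somewhat smooth
    return baseQ;                                     // busy, leave alone
}
```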
Of course while this is an exciting discovery it's also terrifying. It reminded me how bad our image quality metrics are, and the fact that
we're optimizing for these broken metrics means we are making broken algorithms. There's a whole plethora of possible things you could do
along this vein : various types of adaptive quantizer sizes, maybe log quantizers, maybe coarser quantizers in noisy parts of the image?
It's impossible to explore those ideas because we have no way to rate them.
As I mentioned previously, this experiment also convinced me that SSIM is just worthless. I know in the SSIM papers they show examples where it
is slightly better than RMSE at telling which image is higher quality, but in practice within the context of a DCT-based image coder I find it
almost never differs from RMSE; that is, if you do something like R-D optimized quantization of DCT coefficients with Distortion measured by
RMSE, you will produce an image that has almost exactly the same SSIM as if you did R-D with D measured by SSIM. If RMSE and SSIM were significantly
different, that would not be the case. I say this within the context of DCT-based image coding because obviously RMSE and SSIM can disagree a lot,
but that axis of freedom is not explored by DCT image coders. The main thing is that SSIM is really not measuring anything visually important at
all. A real visual metric needs to use global/neighborhood information, and knowledge of shapes and what is important about the image.
For example, changing a pixel that's part of a perfect edge is way more important than changing a pixel that's in some noise. Changing a block
from grey to pink is way worse than changing a block from green to blue-green, even if it's a smaller value change. etc. etc.
It seems to me there could very easily be massive improvements possible in perceptual quality, without any complexity increase,
that we just can't get to because we can't measure them.