cbloom rants

12/15/2008

12-15-08 - Denoising

I've been playing around with denoising images a tiny bit. There's a ton of papers on this and I've barely only scratched the surface, but it strikes me that the majority of the researches seem to be doing silly things that are ill-conceived.

Almost all of them work in the same basic way. They create a prediction of what the current pixel should be with a local smooth predictor, let's call that 'S'. Then they take the difference from the actual pixel value 'A'. If the difference is greater than a certain threshold, |S - A| > T , they replace the pixel value with 'S'.

That's just very wrong. It assumes that images can be modeled with a single-lobe Gaussian probability distribution. In fact, images are better modeled as a blend of several lobes with different means. That is, there is not one single smooth generator, but many, which are switched or blended between based on some state.

Any single-lobe predictor will incorrectly identify some valid image detail as noise.

I like to make it clear that the problem has two parts : deciding if a pixel is noise or not noise, and then filling in a replacement value if you decide that the pixel is noise.

My feeling is that the second part is actually not the important or difficult part. Something like a median filter or a bilateral filter is probably an okay way to fill in a value once you decide that a pixel is noise. But the first part is tricky and as I've said any simple weighted linear predictor is no good.

Now, ideally we would have a rich model for the statistical generation of images. But I wrote about that before when talking about Image Doubling (aka Super-Resolution), and we're still very far from that.

In the mean time, the best thing we have at the moment, I believe, is the heuristic modes of something like CALIC, or the Augural Image Zooming paper, or Pyramid Workshop or TMW. Basically these methods have 6 - 100 simple models of local image neighborhoods. For example the basic CALIC models are : {locally smooth}, {vertical edge}, {horizontal edge}, {45 degree edge}, {135 degree edge}, {local digital}, {pattern/texture}. The neighborhood is first classified to one of the heuristic models, and then a prediction is made using that model.

We can thus propose a simple heuristic noise detection algorithm :

```Bayesian Noise Detection :

N = current local neighborhood
A = actual pixel value

P(M|N) = probability of model M given neighborhood N
P(A|M) = probability of pixel A was generated by model M

let

P(A|N) = argmax{M} P(A|M) * P(M|N)

then classify A as noise if

P(A|N) `<` T

for some threshold T

(I don't specify how the P's are normalized because it just changes the scaling of T,
but they should be normalized in the same way for the whole image)

```

Note that a very crucial thing is that we are using the argmax on models, NOT the average on models. What we're saying is that if *any* of our heuristic local models had a high likelihood of generating the value A, then we do not consider it noise. The only values that are noise are ones which are unlikely under *all* models.

In a totally hand-wavey heuristic way, this is just saying that if a pixel is within threshold of being a locally smooth value, or an edge value, or a texture, etc. then it's not noise. If it fits none of those models within threshold, it's noise.

I started by looking at the Median Filter and the Bilateral Filter. There have been some cool recent papers on fast Median Filtering :
Constant Time Median Filter
Weiss Log(r) Median Filter
Fast Bilateral Filter ; Sylvain Paris and Fr�do Durand + good references
Siggraph Course on Bilateral Filtering

Those are all very worth reading even though I don't think it's actually super useful. The fast median filter approaches use cool ways of turning an operation over a sliding window into incremental operations that only process values getting added in and removed as you step the window. Median filter is a weird thing that works surprisingly well actually, but it does create a weird sort of Nagel-drawing type of look, with nothing but smooth gradients and hard edges. In fact it's a pretty good toon-shading process.

BTW the fast median filters seem useless for denoising, since they really only matter for large r (radius of the filter), and for denoising you really only want something like a 5x5 window, at which size a plain brute force median is faster.

Bilateral filter actually sort of magically does some of the heuristic cases that I've talked about it. Basically it makes a prediction using a filter weighted by distance and also value difference. So similar values contribute and disparate values don't. That actually does a sort of "feature selection". That is, if your pixel value is close to other pixel values in a vertical edge, then the bilateral filter will weight strongly on those other similar pixel values and ignore the other off-edge pixels. That's pretty good, and the results are in fact decent, but if you think about it more you see the bilateral filter is just a ghetto approximation of what you really want. Weighting based on pixel value difference is not the right way to blend models, it makes no distinction about the context of that value difference - eg. it doesn't care if the value difference comes from a smooth gradient or a sudden step. As others have noted, the Bilateral Filter makes the image converge towards piecewise-constant, which is obviously wrong. Getting towards piecewise linear would be better, piecewise bicubic would be better still - but even that is just the very easiest beginning of what the heuristic estimator can do.

NL Means is a denoising algorithm which is a bit closer to the right idea; he's definitely aware of the issues. However, the actual NL means method is poor. It relies on closely matching neighborhoods to form good predictions, which anyone who's worked in image compression or super-resolution knows is not a good approach. The problem is there are simply too many possible values in reasonably sized neighborhoods. That is, even for a moderately sized neighborhood like 5x5, you have 2^8^25 possible values = 2^200. No matter how much you train, the space is too sparse. It may seem from the NL Means formulation that you're weighting in various neighbors, but in reality in practice you only find a few neighbors that are reasonably close and they get almost all of the weight, and they might not even be very close. It's like doing K-means with 2^200 possible values - not good.

There's a lot of work on Wavelet Denoising which I haven't really read. There are some obvious appealing aspects of that. With wavelets you can almost turn an image into a sum of {smooth}+{edges}+{texture}+{noise} and then just kill the noise. But I don't really like the idea of working in wavelet domain, because you wind up affecting a lot of pixels. Most of the noise I care about comes from cameras, which means the noise is in fact isolated to 1 or 2 adjacent pixels. I also don't like all the research papers that want to use 9x9 or larger local windows. Real images are very local, their statistics change very fast, and pulling in shit from a 9x9 window is just going to mess them up. IMO a 5x5 window is the reasonable size for typical image resolutions of today.

BTW one thing I've noticed with my camera noise images is that the fucking camera JPEG makes the noise so much harder to remove. The noise looks like it's originally just true single pixel noise, but when it goes through the camera JPEG, that single-pixel peak is really unfriendly to the DCT, so it gets smeared out, and you wind up having noise lumps that look like little DCT shapes. To specifically denoise photos that have gone through JPEG you probably have to work on 8x8 blocks and work directly on the DCT coefficients. (also the Bayer pattern demosaic obviously spreads noise as well; ideally you'd get to work on the raw taps before jpeg, before the camera denoise, and before the Bayer demosaic).

ADDENDUM : a lot of the denoise people seem to be trying to perfect the Playboy Centerfold algorithm, that makes photos look extremely smoothed and airbrushed. Often if you're not sure a pixel is noise it's best to leave it alone. Also, all the methods that use a pixel-magnitude threshold value for noise are wrong. The threshold for noise needs to be context sensitive. That is, in smooth parts of the image, you might be able to say that a pixel is probably noise when it's off from expectation by only 1 or 2 pixel values. In chaotic textures parts of the image, a pixel might be off by 20 values or more and you still might not be able to say it's noise. The correct parameter to expose to the user is a *confidence*. That is, I want to do something like replace all pixels which the algorithm is >= 90% confident it can improve.

Another problem I've seen with the majority of the denoisers is that they create structures from noise. If you run them on just staticy junk, they will form local flat junks, or linear bits or weird feathery patterns. This is because even in random noise there will be little bits that have similar values so they become seeds to create structures. This is very bad, the weird structures that are formed by this "denoising" are much worse than the original static, which is pretty inoffensive to the eye.

Marc sent me the link to GREYCstoration a free denoiser based on the image manifold PDE research. I don't like that this technique is becoming known as "PDE" - PDE just means partial differential equation; in this case it's a diffusion equation, in particular a variant of anisotropic diffusion with different diffusion rates based on the local curvature - eg. it diffuses across smooth areas and not across edges. (that's actually an old basic technique, the new thing he does is the diffusion follows contour lines (but is actually 2d, just weighted) and works on all components). It looks pretty decent. It's actually more exciting to me for super-resolution, it looks like it does a pretty good job of image super-resolution.

3 comments:

Metatron said...

Noise is an artifact of inadequate super-sampling, the best denoiser is allways downscale the image by a factor big X.

Fighting noise is fighting too little camera-shots, too little stochastic renders, too little information.

When NASA made the Ultra Deep Field, they actually sampled each pixel 170 times, not spacially in that case (because it would not have solved the problem) but through repetition and averaging.

Indeed would that be one easy approach to not-inthefirst-noising in digital cameras, if they could just make multiple shots fast enough.

On the topic of do-afterthedamagehasbeendone-denoising I think lossy image compressors and denoising-algorithms are just two sides of the same medal (or model :), obviously any good denoising-algorithm is a good predictor for lossy image compression (fe. neural-networks), and vice versa. Both are and must also be imperfect.

cbloom said...

"Noise is an artifact of inadequate super-sampling, the best denoiser is allways downscale the image by a factor big X."

No, not at all, not by my definition. That certainly is an okay way to reduce noise, but it also removes tons of good information. I want algorithms that preserve as much of the original information as possible. (and in fact that is not even the best way to remove noise even if you are willing to give up lots of information).

Down-sampling with weights on the pixels where the weight is proportional to the confidence in the pixel's correctness is a pretty good denoiser if you are willing to give up some spatial information (eg. to compare a 15 MP noisy camera picture to 10 MP).

"lossy image compressors and denoising- algorithms are just two sides of the same medal"

Yes they are very similar. In general all these things are just trying to create an algorithmic model for the theoretical "image source". However, there are some major differences, as I tried to write about in my post and have also talked about in the past on Super-Resolution.

Part of the key difference comes from how the quality of the result is measured.

If you use normal image compression techniques for super-resolution or denoising you wind up making images that look very smoothed out. This is because the compressors want to pick the lowest entropy prediction, which is in the middle of the probability bump. That's not the same goal as a denoiser or super-resolution algorithm, which wants to pick the single maximum likelihood value.

This is a lot like the difference between median and average.

Metatron said...

Found some nice paper on the topic:

http://www.busim.ee.boun.edu.tr/~mihcak/publications/spl99.pdf