4/04/2011

04-04-11 - Yet more notes on filters

Topics for today :
N-way filters

symmetry of down & up

qualitative notes

comb sampling

1. N-way filters. So for a long time cblib has had good doublers and halvers, but for non-binary ratios I didn't have a good solution and I wasn't really sure what the solution should be. What I've been doing for a long time has been to use doublers/halvers to get the size close, then bilinear to get the rest of the way, but that is not right.

In fact the solution is quite trivial. You just have to go back to the original concept of the filters as continuous functions. This is how arbitrary float samplers work (see the Mitchell papers for example).

Rather than a discrete filter, you use the continuous filter impulse. You put a continuous filter shape at each pixel center, multiplied by the pixel value. Now you have a continuous function for your whole image by just summing all of these :

Image(u,v) = Sum[ all pixels ] Pixel[i,j] * Filter_func( u - i, v - j )

So to do an arbitrary ratio resize you just construct this continuous function Image, and then you sample it at all the fractional u,vs.

Now, because of the "filter inversion" principle that I wrote about before, rather than doing this by adding up impulse shapes, you can get the exact same output by constructing an impulse shape and convolving it with the source pixels once per output pixel. The impulse shape you make should be centered on the output pixel's position, which is fractional in general. This means you can't precompute any discrete filter taps.

So - this is cool, it all works. But it's pretty slow because you're evaluating the filter function many times, not just using discrete taps.

There is one special case where you can accelerate this : integer ratio magnifications or minifications. Powers of two obviously, but also 3x,5x, etc. can all be fast.

To do a 3x minification, you simply precompute a discrete filter that has a "center width" of 3 taps, and you apply it in steps of 3 to make each output pixel.

To do a 3x magnification, you need to precompute 3 discrete filters. The 3 filters will be applied at each source pixel location to produce 3 output pixels per source pixel. They correspond to the impulse shape offset by the correct subpixel amounts (for 3X the offets are -1/3,0,1/3). Note that this is just the same as the arbitrary ratio resize, we're just reusing the computation when the subpixel part repeats.

(in fact for any rational ratio resize, you could precompute the filters for the repeating sub-integer offsets ; eg. to resize by a ratio of 7/3 you would need 21 filters ; this can save a lot of work in some cases, but if your source and dest image sizes are relatively prime it doesn't save any work).

2. Symmetry of down & up . If you look at the way we actually implement minification and magnification, they seem very different, but they can be done with the same filter if you like.

That is, the way we actually implement them, as described above for 3X ratio for example :

Minify : make a filter with center width 3, convolve with source at every 3rd pixel to make 1 output

Magnify : make a filter with center width 1, convolve with source at every 1/3rd pixel to make 1 output

But we can do magnify another way, and use the exact same filter that we used for minify :

Magnify : make a filter with center with 3, multiply with each source pel and add into output

Magnify on the left , Minify on the right :

As noted many times previously, we don't actually implement magnify this way, but it's equivalent.

3. Qualitative notes. What do the filters actually look like, and which should you use ?

Linear filters suffer from an inherent trade-off. There is no perfect filter. (as noted previously, modern non-linear techniques are designed to get around this). With linear filters you are choosing along a spectrum :

blurry -> sharp / ringy

The filters that I've found useful, in order are :
[blurry end]

gauss - no window
gauss - cos window (aka "Hann")
gauss - blackman

Mitchell blurry (B=3/2)
Mitchell "compromise" (B=1/3)
Mitchell sharp (B=0)

sinc - blackman
sinc - cos
sinc - sinc (aka Lanczos)

[sharp end]

there are lots of other filters, but they are mostly off the "pareto frontier" ; that is, one of the filters above is just better.

Now, if there were never any ringing artifacts, you would always want sinc. In fact you would want sinc with no window at all. The output from sinc resizing is sometimes just *amazing* , it's so sharp and similar to the original. But unfortunately it's not reliable and sometimes creates nasty ringing. We try to limit that with the window function. Lanczos is just about the widest window you ever want to use with sinc. It produces very sharp output, but some ringing.

Note that sinc also has a very compelling theoretical basis : it reproduces the original pixels if you resize by a factor of 1.0 , it's the only (non-trivial) filter that does this. (* not true - see later posts on this topic where we construct interpolating versions of arbitrary filters)

If you are resizing in an interactive environment where the user can see the images, you should always start with the sharp filters like Lanczos, and the user can see if they produce unnacceptable artifacts and if so go for a blurrier filter. In an automated environment I would not use Lanczos because it is too likely to produce very nasty ringing artifacts.

The Mitchell "compromise" is a very good default choice in an automated environment. It can produce some ringing and some blurring, but it's not too horrible in either way. It's also reasonably compact.

The Gauss variants are generally more blurring than you need, but have the advantage that all their taps are positive. The windowed gaussians generally look much better than non-windowed, they are much sharper and look more like the original. They can produce some "chunkiness" (raster artifacts), that is under magnification they can make edges have visible stair steps. Usually this is preferrable to the extreme blurriness of the true non-windowed gaussian.

4. comb sampling

Something that bothers me about all this is the way I make the discrete filters from the continuous ones is by comb sampling. That is, I evaluate them just at the integer locations and call that my discrete filter.

I have some continuous mother filter function, f(t) , and I make the discrete filter by doing :

D[] = { f(-2), f(-1), f(0), f(1), f(2) }

and then I apply D to the pixels by just doing mult-adds.

But this is equivalent to convolving the continuous filter f(t) with the original image if the original pixels has dirac delta-functions at each pixel center.

That seems kind of whack. Don't I want to imagine that the original pixels have some size? Maybe the are squares (box impulses), or maybe they are bilinear (triangle hats), etc.

In that case, my discrete filter should be made by convolving the base pixel impulse with the continuous filter, that is :

D[] = { f * hat(-2) , f * hat(-1) , .. }

(* here means convolve)

Note that this doesn't really apply to resizing, because in resizing what we are doing when we apply a filter is we are representing the basic pixel impulses. This applies if I want to apply a filter to my pixels.

Like say I want to apply a Gaussian to an image. I shouldn't just evaluate a gaussian function at each pixel location - I should convolve pixel shapes with the gaussian.

Note that in most cases this only changes the discrete filter values slightly.

Also note that this is equivalent to up-sizing your image using the "hat" as the upsample filter, and then doing a normal comb discrete filter at the higher res.

15 comments:

won3d said...

"But this is equivalent to convolving the continuous filter f(t) with the original image if the original pixels has dirac delta-functions at each pixel center.

That seems kind of whack. Don't I want to imagine that the original pixels have some size? Maybe the are squares (box impulses), or maybe they are bilinear (triangle hats), etc."

This seems whack-ier to me. The "shape" of the pixel is determined at reconstruction time, i.e. when it gets displayed on the screen. If anything, you want to invert this by pre-convolving with the inverse of the pixel shape.

cbloom said...

Consider for example if I want to do some analysis on an image as it would be seen on a monitor.

Then the "hats" should be the monitor's pixel response functions.

And I should apply whatever filter I want to run on that output continuous light field.

won3d said...

Yeah, I misinterpreted when you would be doing this. But if you're going to do this at the analysis phase, you ought to invert it at the synthesis phase. I was assuming that you're creating a new image, but I see that you could be generating something something based on stats, like a quality score or something in which case the output is not another image.

cbloom said...

Yeah yeah, I think we agree.

There's a few subtle points :

1. Upon display on the monitor, some hat shape is implicitly multiplied per pixel.

2. Resizing using some filter is implicitly using that filter as the hat shape of the pixels.

So in both of those cases you should not do your own (and if you already did it, you should invert it).

But if you just want to compute some numerical score - like what's the gaussian convolution of this region of the image? - I believe you should pre-convolve the image with the pixel hat shape (which is equivalent to applying to the filter).

Nasty.

won3d said...
This comment has been removed by the author.
castano said...

> Note that this doesn't really apply to resizing

Can you elaborate more on that? In NVTT I tried representing the image pixels as boxes or bilinear functions and although I didn't do a very exhaustive testing, I think it worked quite well. In fact, the box-sampling version enabled by default.

ryg said...

"in fact for any rational ratio resize, you could precompute the filters for the repeating fractions"
Yeah but why bother being clever in the non-integer ratio case? Worst case you just compute+store filter taps for every destination row/column. That's between 1000 and 5000 filters you need to precompute and store for current expected image sizes, and your accesses to that table are always perfectly sequential.

And of course if you're walking through the image in raster order, you only need to precompute filter taps for the horizontal dimension and then recompute your vertical filter weights every row.

cbloom said...

"Yeah but why bother being clever in the non-integer ratio case? "

Well, it depends how expensive your filter function is to evaluate.

If your dest image is 3000 pixels wide, and your filter is quite wide, say 15 taps or so, that's 45,000 function evaluations.

Currently in my implementation that's a function-pointer-call for each evaluation, which is super slow, but of course you could rearrange to make that just one function pointer call that evaluates it many times.

But of course if your filter is a kaiser window with a bessel function or some nutty shit like that, it's still ungodly slow.

But yeah, I'm not actually going to implement the rational case, just noting it. I'm not going to remove my function pointer calls either cuz it would make the code so much uglier.

ryg said...

I retract the "extra complexity" bit btw, that was just BS, the whole "extra complexity" of doing the full rational over what I just described is one extra gcd computation :)

cbloom said...

"Can you elaborate more on that? In NVTT I tried representing the image pixels as boxes ..."

I've been thinking about this a bit more and can't quite get my head around it.

What I meant is that when you run a filter for resizing, that's like using that filter as as the reconstruction shape of the pixels, and then point-sampling at the new grid points.

eg. if I resize with a gaussian filter, that's the same as me saying my pixels are gaussian blobs, and then I point sample the continuous field in a few places.

Okay, but what if you wanted to say "my source pixels are hat blobs, and I want to resize to hat blobs at a different spacing" (where hat=box,linear,gaussian,whatever). It's a bit tricky.

Pinky's Brain said...

Shouldn't spline interpolation be on the list? Not a convolution kernel, but it is linear.

Did you see Hughes Hoppes big ass comparison of resamplers?

"Generalized sampling in computer graphics."

He has some videos for slow changes in translation. Mitchell (compromise) and Keys (aka Mitchell sharp aka Catmull-Rom splines) look quite poor. Lots of pumping between blurry and sharp.

cbloom said...

I hadn't seen that Hoppe paper, thanks.

"Shouldn't spline interpolation be on the list?"

Well, cubic filter is a type of cubic spline. Mitchell can be made into Catmull-Rom by choosing the parameter right. In fact the filter that I call "cubic" is B-spline Beta3, which is box convolved with itself 3 times.

You could also ask about things like "natural" splines, which would be a linear synthesis but a very complicated analysis.

cbloom said...

Though to be clear when I say "cubic filter" I'm not usually meaning solving for the control points to be interpolating which means that under nop the original image is not preserved.

Pinky's Brain said...

Yeah, sorry ... meant B-splines, they are natural splines. It's not as cheap as a convolution kernel, but Unser has found some cheap-ish ways of implementing them.

Jeff Larson said...

I stumbled across your post while researching lanczos, I'm trying recreate your constants for the filter -- but I must be missing something in the calculation, could I bother you to post some code or psuedocode for how you arrived at them?

old rants