04-06-11 - And yet more on filters

In the comments of the last post we talked a bit about reconstruction/resampling. I was a bit confounded, so I worked it out.

So, the situation is this. You have some discrete pixels, P_i. For reconstruction, each pixel value is multiplied by some continuous impulse function which I call a "hat", centered at the pixel center. (maybe this is the monitor's display response and the output value is light, or whatever). So the actual thing you care about is the continuous output :


Image(t) = Sum_i P_i * hat( t - i)

I'll be working in 1d here for simplicity but obviously for images it would be 2d. "hat" is something like a box, linear, cubic, gaussian, whatever you like, presumably symmetric, hopefully compact.
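
Here's that reconstruction as a quick numerical sketch (Python/numpy ; the tent hat and the made-up pixel values are just example choices) :

import numpy as np

def tent(t):
    # linear "tent" hat : h(0) = 1, support [-1,1]
    return np.maximum(0.0, 1.0 - np.abs(t))

P = np.array([0.2, 0.9, 0.4, 0.7, 0.1])   # some made-up discrete pixels at t = 0, 1, 2, ...

def image(t, P, hat=tent):
    # Image(t) = Sum_i P_i * hat(t - i)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return sum(P[i] * hat(t - i) for i in range(len(P)))

print(image([0.0, 0.5, 1.0], P))   # with the tent hat this is just linear interpolation of P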

Okay, so you wish to resample to a new set of discrete values Q_i , which may be on a different lattice spacing, and may also be reconstructed with a different basis function. So :


Image'(t) = Sum_j Q_j * hat'( t - r*j )

(where ' reads "prime"). In both cases the sum is over the integers in the image domain, and r is the ratio of lattice spacings.

So what you actually want to minimize is :


E = Integral_dt {  ( Image(t) - Image'(t) )^2 }

Well, to find the Q_j you just take the derivative with respect to each Q_j , set it to zero, rearrange a bit, and what you get is :


Sum_k H'_jk * Q_k = Sum_i R_ij * P_i

H'_jk = Integral_dt { hat'(t-r*j) * hat'(t-r*k) }

R_ij = Integral_dt { hat'(t-r*j) * hat(t-i) }

or obviously in matrix notation :

H' * Q = R * P
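
Here's a rough numerical sketch of solving that system (Python/numpy, with crude Riemann-sum integrals ; the tents, the doubling ratio, and the pixel values are just example choices - this is the same case worked through below) :

import numpy as np

t  = np.arange(-10.0, 20.0, 0.001)           # integration grid, wide enough to cover all the hats
dt = t[1] - t[0]

def tent(x, w=1.0):
    # tent hat with half-width w, peak 1
    return np.maximum(0.0, 1.0 - np.abs(x) / w)

r    = 0.5                                   # new lattice spacing (doubling the sample count)
hat  = lambda x: tent(x, 1.0)                # reconstruction hat on the old lattice
hatp = lambda x: tent(x, r)                  # hat' , scaled to the new lattice spacing

P = np.array([0.2, 0.9, 0.4, 0.7, 0.1])      # old pixels at t = 0, 1, ..., N-1
N = len(P)
M = int(round((N - 1) / r)) + 1              # new samples at t = r*j over the same range

Hp = np.array([[np.sum(hatp(t - r*j) * hatp(t - r*k)) * dt for k in range(M)] for j in range(M)])
R  = np.array([[np.sum(hatp(t - r*j) * hat(t - i)) * dt for i in range(N)] for j in range(M)])

Q = np.linalg.solve(Hp, R @ P)               # H' * Q = R * P
print(np.round(Q, 3))
# for this easy tent case, Q basically just reads off Image(t) at the
# new sample points (away from the edges) ; with other hats it would not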

"H" (or H') is the hat self-overlap matrix. It is band-diagonal if the hats are compact. It's symmetric, and actually

H_ij = H( |i-j| )

that is, it only depends on the absolute value of the difference :

H_ij = 
H_0 H_1 H_2 ...
H_1 H_0 H_1 H_2 ...
H_2 H_1 H_0 ...

and again in words, H_i is the amount that the hat overlaps with itself when offset by i lattice steps. (you could normalize the hats so that H_0 is always 1.0 ; I tend to normalize so that h(0) = 1, but it doesn't matter).

If "hat" is a box impulse, then H is the identity matrix (H_0 = 1, else = 0). If hat is the linear tent, then H_0 = 2/3 and H_1 = 1/6 , if hat is Mitchell1 (compromise) the terms are :


H_0 : 0.681481
H_1 : 0.176080
H_2 : -0.017284
H_3 : 0.000463
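
Those overlap numbers are just the integral of the hat against a copy of itself shifted by whole lattice steps ; here's a quick numerical check for the linear tent (Python/numpy ; presumably the Mitchell values above are the same integral with that kernel plugged in) :

import numpy as np

t  = np.arange(-5.0, 5.0, 0.0005)
dt = t[1] - t[0]

def tent(x):
    return np.maximum(0.0, 1.0 - np.abs(x))

for k in range(3):
    print(k, round(float(np.sum(tent(t) * tent(t - k)) * dt), 6))
    # 0 : ~0.666667 , 1 : ~0.166667 , 2 : 0  - the 2/3 and 1/6 quoted above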

While the H matrix is very simple, there doesn't seem to be a simple closed form inverse for this type of matrix. (is there?)

The R matrix is the "resampling matrix". It's the overlap of two different hat functions, on different spacings. We can sanity check the trivial case : if r = 1 and hat' = hat, then H = R, so Q = P is the solution, as it should be. R is also sparse and sort of "band diagonal", but not along the actual matrix diagonal ; the rows are shifted by steps of the resize ratio r.

Let's try a simple case. If the ratio = 2 (doubling), and hat and hat' are both linear tents, then :


H_0 = 2/3 and H_1 = 1/6 , and the R matrix has sparse rows made of :

...0..., 0.041667, 0.250000, 0.416667, 0.250000, 0.041667 , .... 0 ....

Computing H^-1 * R gives a matrix with rows like :

0 .. ,0.2500,0.5000,0.2500,0 ..

which is a really complicated way to get our triangle hat back, evaluated at half steps.
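
In code, the same thing (Python/numpy ; the row is taken from the middle to stay away from edge effects, and normalized to sum to 1, since the absolute scale depends on how you normalize the hats) :

import numpy as np

t  = np.arange(-10.0, 20.0, 0.001)
dt = t[1] - t[0]

def tent(x, w=1.0):
    return np.maximum(0.0, 1.0 - np.abs(x) / w)

r    = 0.5                                    # doubling
hat  = lambda x: tent(x, 1.0)
hatp = lambda x: tent(x, r)

N, M = 9, 17                                  # old pixels at integers, new samples at half steps
Hp = np.array([[np.sum(hatp(t - r*j) * hatp(t - r*k)) * dt for k in range(M)] for j in range(M)])
R  = np.array([[np.sum(hatp(t - r*j) * hat(t - i)) * dt for i in range(N)] for j in range(M)])

F = R.T @ np.linalg.inv(Hp)                   # one row per old pixel : its taps on the new samples
row = F[N // 2]
row = row / row.sum()
c = 2 * (N // 2)                              # new-sample index aligned with the middle old pixel
print(np.round(row[c - 2 : c + 3], 4))
# ~ [ 0.  0.25  0.5  0.25  0. ] : the triangle at half steps, matching the row quoted above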

But it's not always that trivial. For example :


if the hat and hat' are both cubic and r = 2, then :

H (self overlap) is :

0 : 0.479365
1 : 0.236310
2 : 0.023810
3 : 0.000198

R (resize overlap) is :

0 : 0.300893
1 : 0.229191
2 : 0.098016
3 : 0.020796
4 : 0.001538
5 : 0.000012

and H^-1 * R has rows of :

..0..,0.0625,0.2500,0.3750,0.2500,0.0625,..0..

which are actually the values of the standard *quadratic* filter evaluated at half steps.
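
Those self-overlap numbers can be reproduced if the "cubic" here is taken to be the cubic B-spline (that's an assumption on my part, but it does land on the quoted values ; note that normalization is integral = 1 rather than h(0) = 1). Python/numpy :

import numpy as np

t  = np.arange(-8.0, 8.0, 0.0005)
dt = t[1] - t[0]

def bspline3(x):
    # centered cubic B-spline : support [-2,2], integral 1
    ax = np.abs(x)
    return np.where(ax < 1.0, 2.0/3.0 - ax**2 + ax**3/2.0,
           np.where(ax < 2.0, (2.0 - ax)**3 / 6.0, 0.0))

for k in range(4):
    print(k, round(float(np.sum(bspline3(t) * bspline3(t - k)) * dt), 6))
    # ~0.479365 , 0.236310 , 0.023810 , 0.000198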

So the thing we get in the end is very much like a normal resampling filter ; it's just a *different* one than you would get by just evaluating the filter shape. (eg. the H^-1 * R for a Gaussian reconstruction hat is not a Gaussian).

As noted in the previous comments - if your goal is just to resize an image, then you should just choose the resize filter that looks good to your eyes. The only place where this stuff might be interesting is if you are trying to do something mathematical with the actual image reconstruction. Like maybe you're trying to resample from monitor pixels to rod/cone pixels, and you have some a-priori scientific information about what shape reconstruction functions each surface has, so your evaluation metric is not ad-hoc.

.. anyhoo, I'm sure this topic has been covered in academic papers so I'm going to leave it alone.

ADDENDUM : another example while I have my test app working.


Gaussian filter with sdev = 1/2, evaluated at half steps :

1.0000,0.6065,0.1353,0.0111,0.0003,...

The rows of (H^-1 * R) provide the resizing filter :

1.0000,0.5329,0.0655,0.0044,-0.0007,...

which comes from :

filter self overlap (H) :
0 : 0.886227
1 : 0.326025
2 : 0.016231
3 : 0.000109
4 : 0.000000
5 : 0.000000

filter resample overlap (R) :
0 : 1.120998
1 : 0.751427
2 : 0.226325
3 : 0.030630
4 : 0.001862


Let me restart from the beginning on the more general case :

Say you are given a continuous function f(t). You wish to find the discrete set of coefficients P_i such that under simple reconstruction by the hats h(t-i) , the L2 error is minimized (we are not doing simple sampling such that P_i = f(i)). That is :


the reconstruction F is :

F(t) = Sum_i P_i * h(t-i)

the error is :

E = Integral_dt { ( f(t) - F(t) ) ^2 }

take the derivative with respect to P_i and set it to zero, and you get :

Sum_j H_ij * P_j = Integral_dt { f(t) * h(t-i) }

where H is the same hat self-overlap matrix as before :

H_ij = h_i <conv> h_j

(with h_i(t) = h(t-i) , and <conv> means convolution, obviously)

or in terse notation :

H * P = f <conv> h

(H is a matrix, P is a vector )

rearranging you can also say :

P_j = f <conv> g

if

g_i(t) = Sum_j [ H^-1_ij * h(t-j) ]

what we have found is the complementary basis function for h. h (the hat) is like a "synthesis wavelet" and g is like an "analysis wavelet". That is, once you have the basis set g, simple convolution with the g's produces the coefficients which are optimal for reconstruction with h.
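
Here's a small sketch of that fit (Python/numpy ; the f and the hat are arbitrary example choices). It just solves H * P = Integral{ f * h_i } directly ; convolving f with the g's would give the same coefficients :

import numpy as np

t  = np.arange(-5.0, 15.0, 0.001)
dt = t[1] - t[0]

def tent(x):
    return np.maximum(0.0, 1.0 - np.abs(x))

def f(x):
    # some arbitrary continuous signal to fit
    return np.sin(0.7 * x) + 0.3 * np.cos(2.1 * x)

N = 11                                        # coefficients P_0 .. P_10 , hats at integer centers
H = np.array([[np.sum(tent(t - i) * tent(t - j)) * dt for j in range(N)] for i in range(N)])
b = np.array([np.sum(f(t) * tent(t - i)) * dt for i in range(N)])

P_fit    = np.linalg.solve(H, b)              # L2-optimal coefficients for tent reconstruction
P_sample = f(np.arange(N, dtype=float))       # naive point sampling, for comparison
print(np.round(P_fit - P_sample, 3))
# not the same thing : the fit compensates for the overlap of neighboring hats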

Note that typically H^-1 is not compact, which means g is not compact - it has significant nonzero value over the entire image.

Also note that if there is no edge to your data (it's on infinite domain), then the g's are just translations of each other, that is, g_i(t) = g_0(t-i) ; however on data with finite extent this is not the case (though the difference is compact and occurs only at the edges of the domain).
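
To see the non-compactness concretely, here's g for a hat in the middle of a finite run of tents (Python/numpy) :

import numpy as np

t  = np.arange(-2.0, 23.0, 0.001)
dt = t[1] - t[0]

def tent(x):
    return np.maximum(0.0, 1.0 - np.abs(x))

N = 21                                        # hats at t = 0 .. 20
H = np.array([[np.sum(tent(t - i) * tent(t - j)) * dt for j in range(N)] for i in range(N)])
Hinv = np.linalg.inv(H)

mid = N // 2
print(np.round(Hinv[mid, mid:mid + 6], 4))
# ~ [ 1.7321 -0.4641  0.1244 -0.0333  0.0089 -0.0024 ] : the coefficients of
# g_mid = Sum_j Hinv[mid,j] * tent(t - j) alternate in sign and decay geometrically,
# but they don't hit zero, so g is not compact even though the hat is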

It should be intuitively obvious what's going on here. If you want to find pixel P_i , you take your continuous signal and fit the hat at location i and subtract that out. But your neighbors' hats may also have overlapped into your domain, so you need to compensate for the amount of the signal that they are taking, and your hat overlaps into your neighbors' domains, so choosing the value of P_i isn't just about minimizing the error for that one choice, but also for your neighbors. Hence it becomes non-local, and very much like a deconvolution problem.
