A while ago I posed this problem to the world :
Say you are given the box-downsampled version of a signal (I may use "image" and "signal" interchangeably cuz I'm
sloppy). Box-downsampled means groups of N values in the original have been replaced by the average in that group
and then downsampled N:1. You wish to find an image which is the same resolution as the source and which, if box-downsampled
by N, exactly reproduces the low resolution signal you were given. This high resolution image you produce should be "smooth" and
close to the expected original signal.
Examples of this : say you're given a low mip and you wish to create a higher mip such that downsampling again
would exactly reproduce the low mip you were given. The particular case I mainly care about is when you are given
the DC coefficients of a JPEG, which are the averages on 8x8 blocks, and you wish to produce a high res image which
has the exact same average on 8x8 blocks.
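To make that concrete, here's a minimal sketch of the box downsample itself in 1D (numpy ; the name box_downsample is just mine, not part of any standard) :

    import numpy as np

    def box_downsample(x, N):
        # replace each group of N samples by its average, then keep one value per group
        assert len(x) % N == 0
        return x.reshape(-1, N).mean(axis=1)

The problem is then : given only box_downsample(f, N), produce a smooth g at the original resolution such that box_downsample(g, N) is identical to it.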
Obviously this is an under-constrained problem (for N > 1) because I haven't clearly spelled out "smooth" etc.
There are an infinity of signals that when downsampled produce the same low resolution version. Ideally I'd like
to have a way to upsample with a parameter for smoothness vs. ringing that I could play with.
(if you're nitpicky, I can constrain the problem precisely : the correlation of the output image and the original source image
should be maximized over the space of all real world source images (e.g. over the space of all images that
exist on the internet)).
Anyway, after trying a whole bunch of heuristic approaches which all failed (though Sean's iterative approach is actually
pretty good), I found the mathemagical solution, and I thought it was interesting, so here we go.
First of all, let's get clear on what "box downsample" means in a form we can use in math.
You have an original signal f(t) . We're going to pretend it's continuous because it's easier.
To make the "box downsample" what you do is apply a convolution with a rectangle that's N wide. Since I'm treating t as continuous
I'll just choose coordinates where N = 1. That is, "high res" pixels are 1/N apart in t, and "low res" pixels are 1 apart.
Convolution { f , g } (t) = Integral{ ds * f(s) * g(t - s) }
The convolution with rect gives you a smoothed signal, but it's still continuous. To get the samples of the low res image, you
multiply this by "comb". comb is a sum of dirac delta functions at all the integer coordinates.
F(t) = Convolve{ rect , f(t) }
low res = comb * F(t)
low res = Sum[n] L_n * delta_n
Okay ? We now have a series of low res coefficients L_n just at the integers.
This is what is given to us in our problem. We wish to try to guess what "f" was - the original high res signal. Well, now that
we've written it this way, it's obvious ! We just have to undo the comb filtering and undo the convolution with rect !
First to undo the comb filter - we know the answer to that. We are given discrete samples L_n and we wish to reproduce the smooth
signal F that they came from. That's just Shannon sampling theorem reconstruction. The smooth reconstruction is made by just
multiplying each sample by a sinc :
F(t) = Sum[n] L_n * sinc( t - n )
This is using the "normalized sinc" definition : sinc(x) = sin(pi x) / (pi x).
sinc(x) is 1.0 at x = 0 and 0.0 at all other integer x's and it oscillates around a lot.
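As code, that reconstruction is just a little sum (a sketch ; conveniently np.sinc is exactly this normalized sinc) :

    import numpy as np

    def shannon_reconstruct(L, t):
        # F(t) = Sum[n] L_n * sinc( t - n ) , evaluated at one point t
        n = np.arange(len(L))
        return np.sum(L * np.sinc(t - n))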
So this F(t) is our reconstruction of the rect-filtered original - not the original. We need to undo the rect filter. To do
that we rely on the Convolution Theorem : Convolution in Fourier domain is just multiplication. That is :
Fou{ Convolution { f , g } } = Fou{ f } * Fou{ g }
So in our case :
Fou{ F } = Fou{ Convolution { f , rect } } = Fou{ f } * Fou{ rect }
Fou{ f } = Fou{ F } / Fou{ rect }
Recall F(t) = Sum[n] L_n * sinc( t - n ) , so :
Fou{ f } = Sum[n] L_n * Fou{ sinc( t - n ) } / Fou{ rect }
Now we need some Fourier transform knowledge. The easiest way for me to find this stuff is just to do the integrals myself. Integrals
are really fun and easy. I won't copy them here because it sucks in ASCII so I'll leave it as an exercise to the reader. You can
easily figure out the Fourier translation principle :
Fou{ sinc( t - n ) } = e^(-2 pi i n v) * Fou{ sinc( t ) }
As well as the Fourier sinc / rect symmetry :
Fou{ rect(t) } = sinc( v )
Fou{ sinc(t) } = rect( v )
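If you don't feel like doing the integral, here's a quick numeric spot check of the rect/sinc pair (rect is even, so the transform reduces to a cosine integral) :

    import numpy as np
    from scipy.integrate import quad

    v = 0.7
    # Fou{ rect }(v) = Integral[-1/2,1/2] { dt * e^(-2 pi i v t) } ; the imaginary part cancels
    val, _ = quad(lambda t: np.cos(2.0 * np.pi * v * t), -0.5, 0.5)
    print(val, np.sinc(v))   # both ~ 0.36788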
All that means for us :
Fou{ f } = Sum[n] L_n * e^(-2 pi i n v) * rect(v) / sinc(v)
So we have the Fourier transform of our signal and all that's left is to do the inverse transform !
f(t) = Sum[n] L_n * Fou^-1{ e^(-2 pi i n v) * rect(v) / sinc(v) }
because of course the constants L_n pull out of the integral. Again you can easily prove a Fourier translation principle : the e^(-2 pi i n v) term
just acts to translate t by n, so we have :
f(t) = Sum[n] L_n * h(t - n)
h(t) = Fou^-1{ rect(v) / sinc(v) }
First of all, let's stop and see what we have here. h(t) is a function centered on zero and symmetric around zero - it's a reconstruction
shape. Our final output signal, f(t), is just the original low res coefficients multiplied by this h(t) shape translated to each integer
point n. That should make a lot of sense.
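As a sketch, evaluating the output is then just a weighted sum of translated copies of h (assuming some callable h, which we get to below) :

    def reconstruct(L, t, h):
        # f(t) = Sum[n] L_n * h( t - n ) ; h is any reconstruction kernel
        return sum(L[n] * h(t - n) for n in range(len(L)))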
What is h exactly? Well, again we just go ahead and do the Fourier integral. The thing is, "rect" just acts to truncate the infinite range
of the integral down to [-1/2, 1/2] , so :
h(t) = Integral[-1/2,1/2] { dv e^(2 pi i t v) / sinc(v) }
Since sinc is symmetric around zero, let's take the two halves of the range around zero and add them together :
h(t) = Integral[0,1/2] { dv ( e^(2 pi i t v) + e^(- 2 pi i t v) ) / sinc(v) }
h(t) = Integral[0,1/2] { dv 2 * cos ( 2 pi t v ) * pi * v / sin( pi v) }
(note we lost the c - sinc is now sin). Let's change variables to w = pi v :
h(t) = (2 / pi ) * Integral[ 0 , pi/2 ] { dw * w * cos( 2 t w ) / sin( w ) }
And... we're stuck. This is an integral function; it's a pretty neat form, and it sure smells like some kind of Bessel function or something
like that, but I can't find this exact form in my math books. (if anyone knows what this is, help me out).
(actually I think it's a type of elliptic integral).
One thing we can do with h(t) is prove that it is in fact exactly what we want. It has the box-unit property :
Integral[ N - 1/2 , N + 1/2 ] { h(t) dt } = 1.0 if N = 0 and 0.0 for all other integer N
That is, the 1.0 wide window box filter of h(t) centered on integers is exactly 1.0 on its own unit interval, and 0 on others. In
other words, h(t) reconstructs its own DC perfectly and doesn't affect any others.
(prove this by swapping the order of integration : the inner Integral[N-1/2,N+1/2]{ cos( 2 t w ) dt } equals cos( 2 N w ) * sin( w ) / w , which cancels the sin( w ) in the denominator, leaving (2/pi) * Integral[0,pi/2]{ cos( 2 N w ) dw } = sin( N * pi ) / (N * pi ) ).
While I can't find a way to simplify h(t) , I can just numerically integrate it. It looks like this :

[plot of h(t)]

You can see it sort of looks like sinc, but it isn't. The value at 0 is > 1. The height of the central peak vs.
the side peaks is more extreme than sinc's, and the first negative lobes are deeper than sinc's. It actually reminds me
of the appearance of a wavelet.
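That numerical integration is easy to sketch in Python (my sketch, assuming scipy ; note the integrand tends to 1 as w -> 0 since w/sin(w) -> 1, so there's no real singularity at the origin) :

    import numpy as np
    from scipy.integrate import quad

    def h(t):
        # h(t) = (2/pi) * Integral[0,pi/2] { dw * w * cos( 2 t w ) / sin( w ) }
        val, _ = quad(lambda w: w * np.cos(2.0 * t * w) / np.sin(w), 0.0, np.pi / 2)
        return (2.0 / np.pi) * val

    print(h(0.0))   # -> 1.166243...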
Actually the value h(0) is exactly 4 G / pi = 1.166243... , where "G" is Catalan's constant.
Anyway, this is all very amusing and it actually "works" in the sense that if you blow up a low-res image using
this h(t) basis shape, it does in fact make a high res image that is smooth and upon box-downsampling exactly
reproduces the low-res original.
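A quick numeric check of that claim (a sketch reusing the h() from above ; the nice part is that each translated h integrates to exactly 0 or 1 over each unit cell, so even a truncated sum reproduces the L_n exactly, up to integration tolerance) :

    import numpy as np
    from scipy.integrate import quad

    def h(t):
        val, _ = quad(lambda w: w * np.cos(2.0 * t * w) / np.sin(w), 0.0, np.pi / 2)
        return (2.0 / np.pi) * val

    L = np.array([0.3, 0.9, 0.5, 0.1, 0.7])       # made-up low res values

    def f(t):
        # f(t) = Sum[n] L_n * h( t - n )
        return sum(L[n] * h(t - n) for n in range(len(L)))

    for m in range(len(L)):
        # the unit-wide box average centered on integer m should give back L_m
        box, _ = quad(f, m - 0.5, m + 0.5, limit=200)
        print(L[m], box)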
It is, however, not actually useful. For one thing, it's computationally ridiculous. Of course you would
precompute the h(t) and store it in a table, but even then, the reach of h(t) is infinite, and it doesn't
get small until very large t (beyond the edges of any real image), so in practice every output pixel must be
a weighted sum of every single DC value in the low res image. Even without that problem, it's useless
because it's just too ringy on real data. Looking at the shape above it should be obvious it will ring
like crazy.
I believe these problems basically go back to the issue of using the ideal Shannon reconstruction when I
did the step of "undoing the comb". By using the sinc to reconstruct, I doomed myself to non-local effects and
ringing. The next obvious question is - can you do something other than sinc there? Why yes you can,
though you have to be careful.
Say we go back to the very beginning and make this reconstruction :
F(t) = Sum[n] L_n * B( t - n )
We're making F(t) which is our reconstruction of the smooth box-filter of the original. Now B(t) is some reconstruction
basis function (before we used sinc). In order to be a reconstruction, B(t) must be 1.0 at t = 0, and 0.0 at all other
integer t. Okay.
If we run through the math with general B, we get :
again :
f(t) = Sum[n] L_n * h(t - n)
but with :
h(t) = Fou^-1{ Fou{ B } / sinc(v) }
For example :
If B(t) = "triangle" , then F(t) is just the linear interpolation of the L_n
Fou{ triangle } = sinc^2 ( v)
h(t) = Fou^-1{ sinc^2 ( v) / sinc(v) } = Fou^-1{ sinc } = rect(t)
Our basis functions are rects ! This is the reconstruction where the L_n is just made constant over each
DC domain. If you think about it, that should be obvious : if you take the L_n and make them constant on
each domain, and then run a rectangle convolution over that - as you slide the rectangle window along, you get linear
interpolation, which is our F(t).
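In code that degenerate case is literally just np.repeat - hold each L_n constant over its own cell, and the box averages come back exactly :

    import numpy as np

    L = np.array([0.2, 0.7, 0.4])
    f = np.repeat(L, 4)                      # h = rect : constant over each DC domain
    print(f.reshape(-1, 4).mean(axis=1))     # -> [0.2 0.7 0.4] , exactly L again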
That's not useful, but maybe some other B(t) is. In particular I think the best line of approach is for B(t) to be
some kind of windowed sinc. Perhaps a Gaussian-windowed sinc. Any real world window I can think of leads to a Fourier
transform of B(t) that's too complex to do analytically, which means our only approach to finding h is to do a
double numerical integration, which is rather a disastrous thing to do, even for precomputing a table.
So I guess that's the next step, though I think this whole approach is a practical dead end and is now just a scientific
curiosity. I must say it was a lot of fun to actually bust out pencil and paper and do some math and real thinking. I really miss it.