07-01-08 - 6

I just found this post on lapped transforms on my machine; I can't find it on the internet any more so here it is. BTW since I wrote this there have been many papers written on lapped transforms that are pretty good.

From cbloom@texas.net Thu Jan 28 20:02:10 1999
Newsgroups: comp.compression
Subject: Re: the Lapped transform and JPEG

Some followup info on the Lapped transform:

We can think of it as a smoothly windowed
DCT.  The basis functions are :

Bnk(x) = Wn(x) C(k*x)

(n and k are subscripts, that's B_n_k)
where the C() is the normalized cosine
as in the DCT :

        C(f) = sqrt( 2 / L ) * cos( 2pi *f / L)

n is the interval number, x is the discrete
spatial coordinate, and k is the frequency.
For simplicity, let each interval be of width
(0-L), and then k runs 0 <= k < L

Then our transform is :

g(n,k) = Sum[x] Bnk(x) f(x)

For clarity, consider the special case where
Wn(x) = {       0 , x < n*L
                1 , n*L <= x < (n+1)*L
                0 , x >= (n+1)*L }
(that is, the step function on the nth interval).

Then g(n,k) is just the DCT of the input signal
f.  Back to our main line:

the reconstruction is :

F(x) = Sum[nk] Bnk(x) gnk = Sum[nky] Bnk(x) Bnk(y) f(y)

Obviously, we get reconstruction IFF :

Sum[nk] Bnk(x) Bnk(y) = delta(x-y)

So, what condition does this place on W ?

delta(x-y) = Sum[nk] Wn(x) C(kx) Wn(y) C(ky)

           = ( Sum[n] Wn(x) Wn(y) ) ( Sum[k] C(kx) C(ky) )

Now the final sum is just something we see all the time
in DCT, and we know

Sum[k] C(kx) C(ky) = delta(x-y)


delta(x-y) = ( Sum[n] Wn(x) Wn(y) ) delta(x-y)


        Sum[n] Wn(x) Wn(x) = 1

        Sum[n] Wn(x)^2 = 1

So, this is a very nice and simple constraint!
For example, our step function obviously satisfied it.
(in fact, if our W's don't overlap, then the step
function is the *only* solution! hence the DCT..)

On the other hand, let's allow our W's to overlap,
but only overlap with one neighbor.  That is:

        Wn(x) * W(n+m)(x) = 0  if abs(m) >= 2

In this case, the constraint becomes

        Wn(x)^2 + Wn+1(x)^2 = 1         

in their region of overlap.

There are lots of nice functions that satisfy this.

Besides our step function, the next simplest is a
triangle peaked at the center of our interval, and
goes to zero at the peaks of its neighbors :

        Wn(x) = sqrt(1 - abs( x - (n+.5)*L ))

The constrain is satisfied because :

1 - abs( x - (n+.5)*L ) + 1 - abs( x - (n+1+.5)*L) )
= 2 + (x - (n+.5)*L) - (x - (n+1+.5*L)) = 1 !

I'll post some references on more choices for W()
if people are interested.  In the end, you're going
to discretize W(), so it's really just a table of
values for the coder.

If the W's are symmetric, and are all just translations
of the mother function :

        Wn(x) = w( x - (n+.5)L )

        w(-x) = w(x)

Then the constraint reads

        w(x + L/2)^2 + w(x - L/2)^2 = 1         

        w(L/2 + x) = sqrt( 1 - w(L/2 - x)^2 )

In other words, if we specify w() in the range 0 to L/2 ,
then it is fully constrained in the range -L to L

If we choose L = 8 , we can take advantage of the
standard JPEG machinery.  In fact, we can do our
inverse transform in two steps :

        1. for each interval, do an 8->16 IDCT

        2. multiply the (16) un-DCT's valued by Wn(x)
           and add them to the main stream.

This is 16 more multiplies & memory accesses per
IDCT, which could probably be dramatically reduced
if you're more clever than I !

(we need 16, because we need our interval of 8, and
possible overlaps of 4 into each neighboring interval).

The advantage over normal JPEG is obvious : the
choice of W() is like modeling; if you throw away
or quantize your coefficients, then the shape of W()
starts to "show through".  (on the encoding side,
a good choice of W will concentrate the energy more).
In particular, the blocky artifacts at very-low
bitrates can be essentially removed.

No comments:

old rants