From cbloom@texas.net Thu Jan 28 20:02:10 1999 Newsgroups: comp.compression Subject: Re: the Lapped transform and JPEG Some followup info on the Lapped transform: We can think of it as a smoothly windowed DCT. The basis functions are : Bnk(x) = Wn(x) C(k*x) (n and k are subscripts, that's B_n_k) where the C() is the normalized cosine as in the DCT : C(f) = sqrt( 2 / L ) * cos( 2pi *f / L) n is the interval number, x is the discrete spatial coordinate, and k is the frequency. For simplicity, let each interval be of width (0-L), and then k runs 0 <= k < L Then our transform is : g(n,k) = Sum[x] Bnk(x) f(x) For clarity, consider the special case where Wn(x) = { 0 , x < n*L 1 , n*L <= x < (n+1)*L 0 , x >= (n+1)*L } (that is, the step function on the nth interval). Then g(n,k) is just the DCT of the input signal f. Back to our main line: the reconstruction is : F(x) = Sum[nk] Bnk(x) gnk = Sum[nky] Bnk(x) Bnk(y) f(y) Obviously, we get reconstruction IFF : Sum[nk] Bnk(x) Bnk(y) = delta(x-y) So, what condition does this place on W ? delta(x-y) = Sum[nk] Wn(x) C(kx) Wn(y) C(ky) = ( Sum[n] Wn(x) Wn(y) ) ( Sum[k] C(kx) C(ky) ) Now the final sum is just something we see all the time in DCT, and we know Sum[k] C(kx) C(ky) = delta(x-y) So: delta(x-y) = ( Sum[n] Wn(x) Wn(y) ) delta(x-y) or: Sum[n] Wn(x) Wn(x) = 1 or: Sum[n] Wn(x)^2 = 1 So, this is a very nice and simple constraint! For example, our step function obviously satisfied it. (in fact, if our W's don't overlap, then the step function is the *only* solution! hence the DCT..) On the other hand, let's allow our W's to overlap, but only overlap with one neighbor. That is: Wn(x) * W(n+m)(x) = 0 if abs(m) >= 2 In this case, the constraint becomes Wn(x)^2 + Wn+1(x)^2 = 1 in their region of overlap. There are lots of nice functions that satisfy this. Besides our step function, the next simplest is a triangle peaked at the center of our interval, and goes to zero at the peaks of its neighbors : Wn(x) = sqrt(1 - abs( x - (n+.5)*L )) The constrain is satisfied because : 1 - abs( x - (n+.5)*L ) + 1 - abs( x - (n+1+.5)*L) ) = 2 + (x - (n+.5)*L) - (x - (n+1+.5*L)) = 1 ! I'll post some references on more choices for W() if people are interested. In the end, you're going to discretize W(), so it's really just a table of values for the coder. If the W's are symmetric, and are all just translations of the mother function : Wn(x) = w( x - (n+.5)L ) w(-x) = w(x) Then the constraint reads w(x + L/2)^2 + w(x - L/2)^2 = 1 or w(L/2 + x) = sqrt( 1 - w(L/2 - x)^2 ) In other words, if we specify w() in the range 0 to L/2 , then it is fully constrained in the range -L to L If we choose L = 8 , we can take advantage of the standard JPEG machinery. In fact, we can do our inverse transform in two steps : 1. for each interval, do an 8->16 IDCT 2. multiply the (16) un-DCT's valued by Wn(x) and add them to the main stream. This is 16 more multiplies & memory accesses per IDCT, which could probably be dramatically reduced if you're more clever than I ! (we need 16, because we need our interval of 8, and possible overlaps of 4 into each neighboring interval). The advantage over normal JPEG is obvious : the choice of W() is like modeling; if you throw away or quantize your coefficients, then the shape of W() starts to "show through". (on the encoding side, a good choice of W will concentrate the energy more). In particular, the blocky artifacts at very-low bitrates can be essentially removed.

## No comments:

Post a Comment