09-08-08 - 3

There are two ways to think about the Haar transform. Haar takes two values {A,B} and makes the average and difference. Naively it seems like this might be impossible to do losslessly in integers because the average has a divide by two. It's crucial that you make average, not sum, because you don't want "bit expansion" (that is, requiring one more bit each time you do a Haar). BTW Haar aka S-transform.

One way is this magic sort of trick with where the LSB goes. You form

L = (A+B)/2 
H = B - A

(all divides floored on integers); how do I get back my original values? The trick is getting the bottom bit right, and the key is that when H is odd it means you need to set a bottom bit on either A or B; if H is positive you should put it on B, if negative put it on A. This means :

A = L + (1-H)/2
B = L + (H+1)/2

An easier way to see it is via the generic lifting/predictor mechanism. You can always create a transform that is reversible by taking a series of predictor steps. One predictor step is :

A -= P(B)
You predict A only using B, and subtract it off; you can invert it of course just by doing +=. To make the Haar transform this way you do :
B -= A;
A += B/2;
To invert you flip the signs and reverse the order :
A -= B/2;
B += A;

P() the lifting predictor can actually be any function, it need not be linear to be reversible, but if we restrict ourselves to linear transforms, then it can be written as a 2x2 matrix multiply :

A = [ 1 X ] [ A ]
B   [ 0 1 ] [ B ]
Any transform like this with 1's on the diagonal and half the matrix zero is a valid lifting transform. This is a "shear". One useful fact is that any rotation can be written as a shear. In particular :
[ c -s ] = [ 1   0 ] [ 1  -sc ] [ c  0  ]
[ s  c ]   [ s/c 1 ] [ 0   1  ] [ 0 1/c ]

t = (c-1)/s
[ c -s ] = [ 1 t ] [ 1 0 ] [ 1 t ]
[ s  c ]   [ 0 1 ] [ s 1 ] [ 0 1 ]
Form A uses a scale first which is not reversible, form B is pure shears. Form B is good except when the sine "s" becomes small, in which case the temp value "t" becomes very large. Even ignoring divergence, a large "t" value is very bad for accuracy and bit expansion. You don't really want to be doing lifting shears with multipliers much bigger than 2.0 ; fortunately, when "s" is small, "c" is very close to 1 (or -1), which means the scale term in form A is nearly identity. In that case we can just approximate it as exactly one and not do the scale step at all. If we drop the scale a more accurate lifting rotation is just :
near s = 0
[ c -s ] ~= [ 1 0 ] [ 1 -s ] (* -1 if c < 0)
[ s  c ]    [ s 1 ] [ 0  1 ]
Using this, any matrix that can be written as rotations (eg. any orthonormal matrix) can be decomposed into lossless integer lifting steps.

There's a beautiful paper by Srinivasan called Modulo Transforms - An Alternative to Lifting . I don't think it's actually very practical because division and mod are so expensive, but it is a very nice explanation of integer transforms. The description of the quantization lattice and volume expansion is the best I've seen.

Let's look at the Haar again. It starts as a 45 degree rotation :

[ sqrt(1/2)  -sqrt(1/2) ] =  sqrt(1/2) * [ 1 -1 ]
[ sqrt(1/2)   sqrt(1/2) ]                [ 1  1 ]
This is orthonormal, which means it's L2 preserving, and BTW it's what the KLT would tell you to do if your data is 100% correlated; that is, the fit is through x = y.

To do this on ints, the scaling factor sqrt(1/2) ruins int->int , so the easiest thing to do is just don't do the scale :

[ 1 -1 ]
[ 1  1 ]
BTW this is called the "Hadamard transform". It's int->int, but it's expanding. It uniformly expands the L2 norm by *2 (the determinant of the matrix is the expansion factor). On ints, this expansion means that half of the output values in the valid range are impossible. In particular the two outputs always have the same bottom bit. In the Srinivasan paper he has some nice pictures of how int transforms give you a lattice of valid values with lots of blanks. Obviously we can get rid of the volume expansion by quantizing the output space. This just means dividing one of the two outputs by 2.

This is where it gets subtle. We can divide either of the two outputs by 2. Both choices eliminate the volume expansion, but neither choice is L2 preserving, and the L2 norm of the two choices is different.

V(x) = < x^2 > is the variance of x assuming zero mean
< y > means average of y

The Hadamard expansion is :

V(a+b) + V(a-b) = 2 * ( V(a) + V(b) )

The normal Haar (quantizing the sum) gives variance :

V( (a+b)/2 ) + V( a-b ) = 5/4 * ( V(a) + V(b) ) - 3/2 * < a*b >

Quantizing the difference gives :

V(a+b) + V( (a-b)/2 ) = 5/4 * ( V(a) + V(b) ) + 3/2 * < a*b >
Note that quantizing the difference is the same as negating one of the values before doing the Haar.

You can see the Haar transform is not variance preserving. It reduces the variance if the correlation between the values is high enough, in particular :

Haar reduces variance if :

< a*b >  >=  (1/6) ( V(a) + V(b) )

This is interesting; the Haar may actually be better than the original 45 degree rotation it approximates. There is some implicit data modelling going on in the Haar; like most wavelets it is not norm preserving, not orthonormal, and it behaves well (reduces norm) when the input data fits the predicted pattern.

Addendum : I found another nice paper ( PL Haar PDF ), on something called the "PL Haar" (Piecewise Linear Haar) ; the PLHaar transform is itself interesting, but aside from that they go over the same points I'm talking about here re: the Haar and range expansion, etc. , and they have some nice pictures.

BTW you can see the implicit modeling I'm talking about in the PL Haar very neatly as well. By design, the PLHaar preserves Linf norm, not L2 norm. L2 norm is (roughly) what matters for coding. In the square domain, stuff in the corners has the max L2 norm, and stuff along the sides has the lower L2 norm. By doing the PL 45 degree rotation, PLHaar takes the large L2 norm stuff along the x=y lines and rotates them to the sides of the matrix, giving them the lower norm.

No comments:

old rants