cbloom rants: 10-02-11 - How to walk binary interval tree

I noted previously that SA3 uses a binary interval tree. It's obvious how that works but it always takes me a second to figure it out so let's write it down.

This is going to be all very similar to previous notes on cumulative probability trees (and followup ).

Say you have an array of 256 entries. You build a min tree. Each entry in the tree stores the minimum value over an interval. The top entry covers the range [0,255] , then you have [0,127][128,255] , etc.

If you're at index i and you want to find the next entry which has a value lower than yours. That is,


find j such that A[j] < A[i] and j > i

you just have to find an interval in the min tree whose min is < A[i]. To make the walk fast you want to step by the largest intervals possible. (once you find the interval that must contain your value you do a normal binary search within that interval)

You can't use the interval that overlaps you, because you only want to look at j > i.

It's easy to do this walk elegantly using the fact that the binary representation of integers is a sort of binary interval tree.

Say we start at i = 37 (00100101). We need to walk from 37 to 256. Obviously we want to use the [128,256) range to make a big step. And the [64,128). We can't use the [32,64) because we're inside that range - this corresponds to the top on bit of 37. We can use [48,64) and [40,48) because those bits are off. We can't use [36,40) but we can use [38,40) (and the bottom on bit corresponds to [37,38) which we are in).

Doing it backwards, you start from whatever index (such as i=37). Find the lowest on bit. That is the interval you can step by (in this case 1). So step by 1 to i=38. Stepping by the lowest bit always acts to clear that bit and push the lowest on bit higher up (sometimes more than 1 level). Now find the next lowest on bit. 38 = 00100110 , so step by 2 to 40 = 00101000 , now step by 8 to 48 = 00110000 , now step by 16 to 64 = 01000000. etc.

In pseudo code :


Start at index i

while ( i < end )
{
  int step = i & (-i); // isolate bottom bit
  // I'm looking at the range [i,i+step]
  int level = BitPos(step);
  check tree[level][i>>level];
  i += step;
}

Now this should all be pretty obvious, but here comes the juju.

I've written tree[][] as if it is layed out in the intuitive way, that is :


tree[0] has 256 entries for the 1-step ranges
tree[1] has 128 entries for 2-step ranges
...

The total is 512 entries which is O(N). But notice that tree[0] is only actually ever used for indexes that have the bottom bit on. So the half of them that have the bottom bit off are not needed. Then tree[1] is only used for entries that have the second bit on (but bottom bit off). So the tree[1] entries can actually slot right into the blanks of tree[0], and half the blanks are left over. And so on...

It's a Fenwick tree!

So our revised iteration is :


// searching :
Start at index i
(i starts at 1)

while ( i < end )
{
  int step = i & (-i); // isolate bottom bit
  // I'm looking at the range [i,i+step]
  check fenwick_tree[i];
  i += step;
}

// building :

for all i
{
  int step = i & (-i); // isolate bottom bit
  fenwick_tree[i] = min on range [i,i+step]
}

(in practice you need to build with a binary recursion; eg. level L is built from two entries of level L-1).

Note that to walk backwards you need the opposite entries. That is, at level 7 (steps of 128) you only need [128,256) to walk forward, never [0,128) because a value in that range can't take that step. To walk backwards, you only need [0,128) , never [128,256). So in fact to walk forward or backward you need all the entries. When we made the "Fenwick compaction" for the forward walk, we threw away half the values - those are exactly the values that need to be in the backward tree.

For the three bit case , range [0,8) , the trees are :


Forward :

+-------------------------------+
|              0-8              |
+-------------------------------+
|       ^       |      4-8      |
+---------------+---------------+
|   ^   |  2-4  |   ^   |  6-8  |
+---------------|---------------+
| ^ |1-2| ^ |3-4| ^ |5-6| ^ |7-8| 
+-------------------------------+

where ^ means go up a level to find your step
the bottom row is indexed [0,7] and 8 is off the end on the right
so eg if you start at 5 you do 5->6 then 6->8 and done

Backward :

+-------------------------------+
|              8-0              |
+-------------------------------+
|      4-0      |       ^       |
+---------------+---------------+
|  2-0  |   ^   |  6-4  |   ^   |
+---------------|---------------+
|1-0| ^ |3-2| ^ |5-4| ^ |7-6| ^ | 
+-------------------------------+

the bottom row is indexed [1,8] and 0 is off the end of the left
so eg if you start at 5 you go 5->4 then 4->0 and done

You collapse the indexing Fenwick style on both by pushing the values down the ^ arrows. It should be clear you can take the two tables and lay them on top of each other to get a full set.

cbloom rants

10/02/2011

10-02-11 - How to walk binary interval tree

No comments:

old rants