10-17-08 - 1

For static (or semi-static) data and very low memory use, one of the best data structures for lookup is just a sorted array. I've written about this a bunch in the past, so you can look back if you like, but basically you sort & then you binary search, and it has zero memory overhead for the lookup because you're just holding the data you had to hold anyway. Part of the reason it's good is that in the common case of small lookups, all your memory is right there in a row and it all comes into cache, and you never have to jump around pointer chasing. You can also easily do things like switch to a linear scan once your binary search gets down to a range of 8 elements or so.
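
A minimal sketch of that in Python (the names and the cutoff constant are mine, not anything standard):

```python
LINEAR_CUTOFF = 8  # below this range size, a straight scan beats more branching

def find(sorted_arr, key):
    """Index of key in sorted_arr, or -1: binary search that finishes
    with a linear scan once the remaining range is small."""
    lo, hi = 0, len(sorted_arr)
    while hi - lo > LINEAR_CUTOFF:
        mid = (lo + hi) // 2
        if sorted_arr[mid] <= key:
            lo = mid          # key is at mid or to the right
        else:
            hi = mid          # key is strictly left of mid
    for i in range(lo, hi):   # small range: linear, all in one cache line or two
        if sorted_arr[i] == key:
            return i
    return -1
```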

While that lookup is O(log N), it can be a little slow for big arrays, partly because log N gets big, but mainly because you jump all over in memory and cache miss at every step. Rather than using a plain sorted list, you could do a binary-tree style of reshuffle: first sort, then reorder the list to { N/2, N/4, 3N/4, N/8, 3N/8, 5N/8, 7N/8, ... } so that as you do a binary search the data you want next is right near you in cache. Of course this is just a binary tree with implicit child indexes; it's a reshuffling of the array, it's not creating any new nodes. Actually I guess the best layout is something like a van Emde Boas tree; it's intuitively obvious that you don't just want all the elements at the same depth next to each other, rather you want to gather subtrees near the leaves so they are consecutive chunks in memory. One problem I'm having is that I don't want to store indexes to kids, I want to implicitly index the tree, and it's not clear to me how to implicitly index a vEB tree.
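
That { N/2, N/4, 3N/4, ... } ordering is the BFS/heap layout (sometimes called an Eytzinger layout); a sketch in Python, assuming 0-based implicit children at 2k+1 and 2k+2 (my code, not from any library):

```python
def eytzinger(sorted_arr):
    """Reorder a sorted array so a search can walk implicit children."""
    n = len(sorted_arr)
    out = [None] * n
    it = iter(sorted_arr)
    def fill(k):
        if k < n:
            fill(2 * k + 1)    # left subtree first: in-order traversal
            out[k] = next(it)  # hands out keys in ascending order
            fill(2 * k + 2)
    fill(0)
    return out

def search(tree, key):
    """True if key is present; each step goes to index 2k+1 or 2k+2."""
    k, n = 0, len(tree)
    while k < n:
        if tree[k] == key:
            return True
        k = 2 * k + 1 if key < tree[k] else 2 * k + 2
    return False
```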

BTW this is sort of like doing a B-tree with 16 elements at each level and using a binary search to find your element within each node. So really you still just have a binary tree overall, it's just organized differently in memory. This is simple and in fact pretty good, but the problem is that the choice of # of elements per node is very strongly tied to your architecture's cache line size, which is evil for libraries. See Cache Oblivious Binary Trees.
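
To sketch that layout concretely (here with B = 4 keys per node rather than 16, nodes numbered in BFS order, tail padded with +inf; all names and details are mine, a sketch rather than a tuned implementation):

```python
INF = float('inf')
B = 4  # keys per node; a real implementation picks B to fill a cache line

def build_btree(sorted_keys):
    """Permute a sorted array into an implicit B-tree layout, BFS node order."""
    n = len(sorted_keys)
    nblocks = (n + B - 1) // B
    tree = [INF] * (nblocks * B)   # pad the tail with +inf

    def child(k, i):               # i-th child of node k, 0 <= i <= B
        return k * (B + 1) + i + 1

    t = 0
    def fill(k):                   # in-order traversal assigns keys ascending
        nonlocal t
        if k < nblocks:
            for i in range(B):
                fill(child(k, i))
                if t < n:
                    tree[k * B + i] = sorted_keys[t]
                    t += 1
            fill(child(k, B))
    fill(0)
    return tree

def lower_bound(tree, x):
    """Smallest key >= x, or None; scans one B-key node per level."""
    nblocks = len(tree) // B
    best, k = None, 0
    while k < nblocks:
        i = 0
        while i < B and tree[k * B + i] < x:
            i += 1
        if i < B:
            best = tree[k * B + i]
        k = k * (B + 1) + i + 1
    return None if best == INF else best
```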

There must be lots of research on this but I'm having trouble finding it. What I'm interested in is speeding this up simply, or adding a small acceleration structure: something that's O(sqrt(N)) in size, not O(N) like a full hash.

For example it seems obvious that you should be able to do a jump-ahead structure to get past half the depth. One obvious thing would be to hash the lookup key down to log(N)/2 bits, then make a separate sorted list for each of the hash values.
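
For integer keys, one cheap version of this is to use the top log2(N)/2 bits of the key as the "hash", so each bucket stays a contiguous slice of the one sorted array and the acceleration structure is just a table of O(sqrt(N)) bucket starts. A sketch under that assumption (fixed-width unsigned keys; names are mine):

```python
import bisect

def build_jump_table(sorted_keys, key_bits, table_bits):
    """Table of bucket start indexes, keyed by the top table_bits of a key.
    With table_bits ~ log2(N)/2 the table is O(sqrt(N)) entries."""
    shift = key_bits - table_bits
    size = 1 << table_bits
    starts = [len(sorted_keys)] * (size + 1)
    for i in reversed(range(len(sorted_keys))):
        starts[sorted_keys[i] >> shift] = i
    for h in reversed(range(size)):          # back-fill empty buckets so
        starts[h] = min(starts[h], starts[h + 1])  # starts[] is nondecreasing
    return starts

def lookup(sorted_keys, starts, key_bits, table_bits, key):
    """Jump to the key's bucket, then binary search only within it."""
    h = key >> (key_bits - table_bits)
    lo, hi = starts[h], starts[h + 1]
    i = bisect.bisect_left(sorted_keys, key, lo, hi)
    return i < hi and sorted_keys[i] == key
```

The jump by top bits replaces the first table_bits steps of the binary search, which is exactly the "get past half the depth" shortcut.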
