10-21-08 - 5

So I benchmarked khash from Attractive Chaos ; it's a little bit slow, but the memory use is very low. It's basically a plain old flat mod-prime hash table like the old Knuth days. It uses a form of linear rehash reprobing. It has a separate flag array for empties which is a poor decision IMO, that means separate cache misses, and you pretty much always have key values that you can use for EMPTY and DELETED. It's better to just have the key array. He also has the data in a seperate array which is again bad for cache, but I'm not using that since I'm just doing a "set" not a "map".

On the plus side, the memory use is quite low. One thing I hadn't really thought of is that using mod-prime hashes lets you size your hash much closer to the size that you really want. Like in one test I have 8950 words, which with a pow2 means I jump up to 16384 and waste a lot of space. With mod-prime I can have a table of primes and get close to the fill size I want, like for 75% I can use 11933 which is almost perfect.

The hash for strings in khash is a variant of a "Bernstein" hash which is very special-cased to ASCII. It's actually a very poor hash, but it works totally fine when you mod-prime it. mod-prime just like magically fixes crappy hashes and makes them okay. I tried putting some better hash functions in khash and they don't really make a difference. I saw the same thing with STLport hash which also uses a mod-prime - it's very insensitive to the hash function.

Anyway khash performs similarly to the STLport hash but with lower memory use, so that's cool. Basically a good old fashioned reprobing hash is totally still a good thing. Sean's "stb.h" has a very similar hash implementation that is better in some ways. STB doesn't have seperate flags, it uses the EMPTY and DELETED keys. STB uses pow-2 sizes, but I'm not sure that's a win, the mod-prime is pretty nice for these hashes.

Very large benchmark :

num words : 775481
num queries : 1602134

jumplist             : 0.13 seconds, 249.35 clocks, rate= 12.04 M/s
mem use: 10398156 = 13.41/per

strhash              : 0.14 seconds, 255.68 clocks, rate= 11.75 M/s
mem use: 12582912 = 16.23/per

std::hashset         : 0.17 seconds, 311.93 clocks, rate= 9.63 M/s

khash                : 0.16 seconds, 303.12 clocks, rate= 9.91 M/s
khash mem use : 6684696 = 8.62/per

sorted               : 0.43 seconds, 809.12 clocks, rate= 3.71 M/s
mem use: 6203848 = 8.00/per

implicit bt          : 0.33 seconds, 621.21 clocks, rate= 4.83 M/s
mem use: 8388608 = 10.82/per

Small benchmark :

num words : 8960
num queries : 1602134

jumplist             : 0.06 seconds, 113.15 clocks, rate= 26.54 M/s
mem use: 137220 = 15.31/per

strhash              : 0.07 seconds, 138.33 clocks, rate= 21.71 M/s
mem use: 196608 = 21.94/per

std::hashset         : 0.08 seconds, 150.36 clocks, rate= 19.97 M/s

khash                : 0.11 seconds, 197.32 clocks, rate= 15.22 M/s
khash mem use : 52232 = 5.83/per

sorted               : 0.20 seconds, 367.69 clocks, rate= 8.17 M/s
mem use: 71680 = 8.00/per

implicit bt          : 0.16 seconds, 294.28 clocks, rate= 10.21 M/s
mem use: 131072 = 14.63/per

Very Large benchmark is all the unique words of 4 chars or more in the 10^8 byte enwik8 data set.

No comments:

old rants