On the plus side, the memory use is quite low. One thing I hadn't really thought of is that using mod-prime hashes lets you size your hash much closer to the size that you really want. Like in one test I have 8950 words, which with a pow2 means I jump up to 16384 and waste a lot of space. With mod-prime I can have a table of primes and get close to the fill size I want, like for 75% I can use 11933 which is almost perfect.
The hash for strings in khash is a variant of a "Bernstein" hash which is very special-cased to ASCII. It's actually a very poor hash, but it works totally fine when you mod-prime it. mod-prime just like magically fixes crappy hashes and makes them okay. I tried putting some better hash functions in khash and they don't really make a difference. I saw the same thing with STLport hash which also uses a mod-prime - it's very insensitive to the hash function.
Anyway khash performs similarly to the STLport hash but with lower memory use, so that's cool. Basically a good old fashioned reprobing hash is totally still a good thing. Sean's "stb.h" has a very similar hash implementation that is better in some ways. STB doesn't have seperate flags, it uses the EMPTY and DELETED keys. STB uses pow-2 sizes, but I'm not sure that's a win, the mod-prime is pretty nice for these hashes.
Very large benchmark : num words : 775481 num queries : 1602134 jumplist : 0.13 seconds, 249.35 clocks, rate= 12.04 M/s mem use: 10398156 = 13.41/per strhash : 0.14 seconds, 255.68 clocks, rate= 11.75 M/s mem use: 12582912 = 16.23/per std::hashset : 0.17 seconds, 311.93 clocks, rate= 9.63 M/s khash : 0.16 seconds, 303.12 clocks, rate= 9.91 M/s khash mem use : 6684696 = 8.62/per sorted : 0.43 seconds, 809.12 clocks, rate= 3.71 M/s mem use: 6203848 = 8.00/per implicit bt : 0.33 seconds, 621.21 clocks, rate= 4.83 M/s mem use: 8388608 = 10.82/per Small benchmark : num words : 8960 num queries : 1602134 jumplist : 0.06 seconds, 113.15 clocks, rate= 26.54 M/s mem use: 137220 = 15.31/per strhash : 0.07 seconds, 138.33 clocks, rate= 21.71 M/s mem use: 196608 = 21.94/per std::hashset : 0.08 seconds, 150.36 clocks, rate= 19.97 M/s khash : 0.11 seconds, 197.32 clocks, rate= 15.22 M/s khash mem use : 52232 = 5.83/per sorted : 0.20 seconds, 367.69 clocks, rate= 8.17 M/s mem use: 71680 = 8.00/per implicit bt : 0.16 seconds, 294.28 clocks, rate= 10.21 M/s mem use: 131072 = 14.63/per
Very Large benchmark is all the unique words of 4 chars or more in the 10^8 byte enwik8 data set.