Testing on 10M arrays of average length 192 (random in [128,256]).
count : 10000000 totalBytes : 1920164768 clocks per byte : burtle : 1.658665 crc32 : 10.429893 adler32 : 1.396631 murmur : 1.110712 FNV : 2.520380So Adler is in fact decently fast, not as fast as Murmur but a bit faster than Burtle. (everything is crazy fast on my x64 lappy; the old post was on my work machine, everything is 2-3X faster on this beast; it's insane how much Core i7 can do per clock).
BTW I wasn't going to add Murmur and FNV to this test - I didn't test them before because they are really not "corruption detection" hashes, they are hashes for hash tables, in particular they don't really try to specifically gaurantee the one bit flips will change the hash or whatever it is that CRC's gaurantee, but after I saw how non-robust Adler was I figured I should add them to the test, and we will see that they do belong...
Now when I count collisions in the same way as before, a problem is evident :
collisions : rand32 : 11530 burtle : 0 crc32 : 11774 adler32 : 1969609 murmur : 11697 FNV : 11703note that as before, rand32 gives you a baseline on how many collisions a perfect 32 bit hash should give you - those collisions are just due to running into the limitted space of the 32 bit word. Burtle here is a 64 bit hash and never collides. (I think I screwed up my CRC a little bit, it's colliding more than it should. But anyhoo). Adler does *terribly*. But that's actually a known problem for short sequences.
How does it do on longer sequences ? On arrays of random length between 2k and 4k (average 3k) :
num hashes : 10000000 totalBytes : 30722620564 clocks per byte : burtle : 1.644675 crc32 : 11.638417 adler32 : 1.346784 murmur : 1.027105 FNV : 2.999243 collisions : rand32 : 11530 burtle : 0 crc32 : 11586 adler32 : 12335 murmur : 11781 FNV : 11653it's better, but still the worst of the group.
BTW I should note that the adler32 implementation does unrolling and rollup/rolldown and all that kind of stuff and none of the other ones do. So it's speed advantage is a bit unfair. All these sort of informal speed surveys should be taken with a grain of salt, since to really fairly compare them I would have to spend a few weeks on each one making sure I got it as fast as possible, and of course testing on various platforms. In particular FNV and Murmur use multiplies with is a no-go, but you could probably use shift and add to replace the multiplies, and you'd get something like Bob's "One at a Time" hash.
So I figured I'd test on what is more like my real usage scenario.
In the RAD LZH , I compress 16k data quanta, and check the CRC of each compressed chunk before decompressing. So compressed chunks are between 0 and 16k bytes. Since they are compressed they are near random bits. Corruption will take various forms, either complete stompage with random shite, or some bit flips, or tail or head stomps. Complete stompage has been tested in the above runs (it's the same as checking the collision rate for two unrelated sequences), so I tested incremental stomps.
I made random arrays between 256 and 16k bytes long. I then found the hash of that array, did some randomized incremental stomping, and took the hash after the changes. If the hashes were the same, it counts as a collision. The results are :
numTests : 13068402 burtle : 0 : 0.00000% crc32 : 0 : 0.00000% adler32 : 3 : 0.00002% murmur : 0 : 0.00000% FNV : 0 : 0.00000%Adler32 is the only one that fails to detect these incremental stomps. Granted the failure rate is pretty low (3/13068402) but that's not secure. Also, the hashes which are completely not designed for this (Murmur and FNV) do better. (BTW you might think the Adler32 failures are all on very short arrays; not quite, it does fail on a 256 byte case, then twice at 3840 bytes).
ADDENDUM : Ok I tested Fletcher32 too.
cycles : rand32 : 0.015727 burtle : 1.364066 crc32 : 4.527377 adler32 : 1.107550 fletcher32 : 0.697941 murmur : 0.976026 FNV : 2.439253 large buffers : num hashes : 10000000 totalBytes : 15361310411 rand32 : 11530 burtle64 : 0 crc32 : 11710 adler32 : 12891 fletcher32 : 11645 murmur : 11792 FNV : 11642 small buffers : num hashes : 10000000 totalBytes : 1920164768 rand32 : 11530 burtle64 : 0 crc32 : 11487 adler32 : 24377 fletcher32 : 11793 murmur : 11673 FNV : 11599 difficult small buffers : num hashes : 10000000 totalBytes : 1920164768 rand32 : 11530 burtle64 : 0 burtle32 : 11689 crc32 : 11774 adler32 : 1969609 fletcher32 : 11909 murmur : 11665 FNV : 11703Conclusion : Adler32 is very bad and unsafe. Fletcher32 looks perfectly solid and is very fast.
ADDENDUM 2 : a bit more testing. I re-ran the test of munging the array with incremental small changes of various types again. Running on lengths from 256 up to N, I get :
munge pattern 1 : length : 6400 numTests : 25069753 rand32 : 0 burtle64 : 0 burtle32 : 0 crc32 : 0 adler32 : 14 fletcher32 : 22 murmur : 0 FNV : 0 munge pattern 2 : length : 4096 numTests : 31322697 rand32 : 0 burtle64 : 0 burtle32 : 0 crc32 : 0 adler32 : 9 fletcher32 : 713 murmur : 0 FNV : 0
So I strike my conclusion that Fletcher is okay. Fletcher and Adler are both bad.
ADDENDUM 3 : Meh, it depends what kind of "corruption" you expect. The run above in which Fletcher is doing very badly includes some "munges" which tend to fill the array with lots of zeros, in which area it does very badly.
If you look at really true random noise type errors, and you always start your array full of random bits, and then you make random bit flips or random byte changes (between 1 and 7 of them), and then refill the array with rand, they perform as expected over a very large number of runs :
numTests : 27987536 rand32 : 3 : 0.00001% burtle64 : 2 : 0.00001% burtle32 : 2 : 0.00001% crc32 : 1 : 0.00000% adler32 : 1 : 0.00000% fletcher32 : 2 : 0.00001% murmur : 2 : 0.00001% FNV : 1 : 0.00000%