tag:blogger.com,1999:blog-5246987755651065286.post225376983424231765..comments2024-02-22T16:15:42.388-08:00Comments on cbloom rants: 08-21-10 - Adler32cbloomhttp://www.blogger.com/profile/10714564834899413045noreply@blogger.comBlogger22125tag:blogger.com,1999:blog-5246987755651065286.post-36385691762843251782010-11-19T11:54:52.396-08:002010-11-19T11:54:52.396-08:00"Hi Mr. Bloom,
"Conclusion: Fletcher32 ..."Hi Mr. Bloom,<br /><br />"Conclusion: Fletcher32 looks perfectly solid and is very fast."<br />"<br /><br />You didn't read the addenda !cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-27217832831873687232010-11-19T08:20:18.842-08:002010-11-19T08:20:18.842-08:00Hi Mr. Bloom,
"Conclusion: Fletcher32 looks ...Hi Mr. Bloom,<br /><br />"Conclusion: Fletcher32 looks perfectly solid and is very fast."<br /><br />Just check the collision table at:<br />http://encode.ru/threads/1160-Fastest-non-secure-hash-function!<br /><br />Regards,<br />Georgi 'Sanmayce'Georgi 'Sanmayce'https://www.blogger.com/profile/01558214040818712679noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-75576996037676198302010-08-25T00:11:17.133-07:002010-08-25T00:11:17.133-07:00To put it in perspective, if there is exactly 1 2^...To put it in perspective, if there is exactly 1 2^16 bit number which can be replaced by another specific 2^16 bit number without detection, then this possibility occurs on 1 in 2^32 possible random 16-bit munges.<br /><br />In fact there are two cases, since the swap can go in either direction, so the chance is 2^31.<br /><br />In the grand scheme of things that's not so bad (given that it's a 32-bit checksum). So a randomized version of it might be tolerable -- obviously having the 0x0000 for 0xffff error itself is untenable. But I agree at that point you might as well go with something else.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-75908988096299856752010-08-24T08:56:47.479-07:002010-08-24T08:56:47.479-07:00"So I had thought about how to deal with Flet..."So I had thought about how to deal with Fletcher's weird 00/FF problem. Maybe all you do is xor the input words with a counter or maybe some pseudorandom thing which would spread the naughtiness around a bit."<br /><br />Yeah I was thinking the same thing. I mean Fletcher does work okay as long as you don't hit the bad case. If you used an actually pseudorandom generator that changed as you went, it would be pretty hard to hit the bad case. But yeah then your complexity is back up to Burtle levels so fuck it.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-83983610155102874232010-08-23T22:08:15.181-07:002010-08-23T22:08:15.181-07:00So I had thought about how to deal with Fletcher&#...So I had thought about how to deal with Fletcher's weird 00/FF problem. Maybe all you do is xor the input words with a counter or maybe some pseudorandom thing which would spread the naughtiness around a bit.<br /><br />But yeah, screw it, just use Burtlewon3dhttps://www.blogger.com/profile/09787472194187459747noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-4915959997520730892010-08-23T15:18:41.157-07:002010-08-23T15:18:41.157-07:00BTW the reason why the Fletcher looks so bad in on...BTW the reason why the Fletcher looks so bad in one of those tests is because one of my Fuzz Munge modes is "stomp byte to 00" and another mode is "stomp byte to FF" , and those hit the bad case.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-77544443390545585822010-08-23T14:55:24.248-07:002010-08-23T14:55:24.248-07:00The munges are the same ones I used for fuzz testi...The munges are the same ones I used for fuzz testing :<br /><br />http://cbloomrants.blogspot.com/2010/08/08-19-10-fuzz-testing.html<br /><br />there's a variety of munge methods (single bit flip, stomp byte to zero, stomp byte to random, etc.).<br /><br />I pick a random munge method and check collision.<br /><br />I haven't done the study to see which munge types exactly it is that are causing problems for Adler and Fletcher. Maybe I'll do that for thoroughness.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-16989706925507306982010-08-23T14:51:20.351-07:002010-08-23T14:51:20.351-07:00Thanks for running these tests public.
What are t...Thanks for running these tests public.<br /><br />What are the munge patterns? What do the collisions look like?won3dhttps://www.blogger.com/profile/09787472194187459747noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-48870328394016254662010-08-23T12:59:20.613-07:002010-08-23T12:59:20.613-07:00I guess the issue is how easy is it for two change...<i>I guess the issue is how easy is it for two changes to cancel each other.</i><br /><br />I think even a single 1-bit change could be potentially invisible. The statistical goal is that every 1-bit change flip half the output bits, but there's no requirement that a 1-bit flip of ANY input produce flips in half the bits, hence no requirement they flip any visible bits. The problem is how much testing you'd have to do to find a bad case if you can't prove it, and a question of how meaningful the proof of such properties for adler/fletcher is if you're worried about larger stomps.<br /><br />Also, that thesis seemed to claim both that Fletcher32/Adler32 can detect any single error up to 16-bits wide (it talked in terms of k-bit-error, where I believe k in this case is 16), but also that they don't detect 0x0000 vs 0xffff, which seems kind of mystifying.<br /><br />But whatever the actual proven guarantee is, the question of whether such a proven guarantee buys you much for errors-found-in-practice compared to a hash like burtle/fnv is unclear.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-7491251975440931932010-08-23T11:54:13.993-07:002010-08-23T11:54:13.993-07:00Some addendums to original post.Some addendums to original post.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-63983829789986979432010-08-23T09:14:01.518-07:002010-08-23T09:14:01.518-07:00" The other thing to look at is 1-byte/1-b..." The other thing to look at is 1-byte/1-bit stomps. These are the sort of things that CRC and Fletcher/Adler are generally guaranteed to detect."<br /><br />Yeah, this is exactly what I did in the "So I figured I'd test on what is more like my real usage scenario." portion of the post.<br /><br />"The biggest concern would be something like Burtle that has more internal state than is visible in the hash,"<br /><br />Yeah but the Burtle one is rigorously designed so that all internal state maps to visible bits. In fact I think he specifically gaurantees that one bit flips show up. I guess the issue is how easy is it for two changes to cancel each other.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-40075255707939533432010-08-23T03:57:41.338-07:002010-08-23T03:57:41.338-07:00Oh wait, I found stuff later in that thesis where ...Oh wait, I found stuff later in that thesis where they underscore that Adler32 is worse than Fletcher32. Apparently historically, Adler32 was invented as an improvement of Fletcher16, unaware that you could just extend Fletcher as-is to 32.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-34791079963848967312010-08-23T03:56:32.662-07:002010-08-23T03:56:32.662-07:00The other thing to look at is 1-byte/1-bit stomps....The other thing to look at is 1-byte/1-bit stomps. These are the sort of things that CRC and Fletcher/Adler are generally guaranteed to detect. The hashes aren't normally analyzed on that front, but it wouldn't surprise me if they could be. The biggest concern would be something like Burtle that has more internal state than is visible in the hash, so you might be able to get cases where flipping a bit only affects the invisible state. (If the Burtle one you're looking at is like that.)<br /><br />Of course whether 1-bit/1-byte flips are plausible in your data I dunno.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-88341673809192326042010-08-23T03:48:37.798-07:002010-08-23T03:48:37.798-07:00Also, poking around wikipedia, it looks like commo...Also, poking around wikipedia, it looks like common understanding (like the thesis "The Effectiveness of Checksums for Embedded Networks") is that Adler32 and Fletcher32 are basically indistinguishable in terms of behavior (Adler32 slightly theoretically worse, but in graphs shown as essentially identical).<br /><br />I wonder what the deal is with that.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-63486740384193317832010-08-22T22:20:01.091-07:002010-08-22T22:20:01.091-07:00"I think PS3's are actually manufactured ..."I think PS3's are actually manufactured from the blood and tears of programmers."<br />And you don't even have to deal with the RSX! It only does everything... wrong.ryghttps://www.blogger.com/profile/03031635656201499907noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-27208265329258310692010-08-22T21:30:24.981-07:002010-08-22T21:30:24.981-07:00"But I mean adler/fletcher are vectorizable a..."But I mean adler/fletcher are vectorizable as a single stream"<br /><br />Oh yeah that's true. Though that may be true of some of the other mathematical hashes as well.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-73477651635598263862010-08-22T21:18:41.759-07:002010-08-22T21:18:41.759-07:00There is one huge advantage of Adler32 (or any byt...There is one huge advantage of Adler32 (or any byte-at-a-time hash) , which is that you can roll up to alignment, then run on 16-byte aligned pieces, then roll down the tail.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-29338822561515743742010-08-22T20:59:48.857-07:002010-08-22T20:59:48.857-07:00Oh, you totally commented on Fletcher's 0s v 1...Oh, you totally commented on Fletcher's 0s v 1s behavior in my original comment on the other thread.<br /><br />"I think all the hashes are trivially vectorizable by treating the data packet as 4 different streams and computing a hash for each stream in each lane, then mixing them at the end."<br /><br />Of course. But I mean adler/fletcher are vectorizable as a single stream.<br /><br />"I believe that does slightly compromise the quality of the hash though."<br /><br />It would amplify the issue if you have weaknesses in short strings (like adler, or fnv to some extent), but you should be able to compensate for this with a better final mix.<br /><br />But yeah, it is unlikely that hashing will be a significant part of your runtime or any of your critical paths. It's there if you ever feel like mental masturbation, neglecting your loved ones, etc.won3dhttps://www.blogger.com/profile/09787472194187459747noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-69269010297693645892010-08-22T20:30:58.537-07:002010-08-22T20:30:58.537-07:00I think all the hashes are trivially vectorizable ...I think all the hashes are trivially vectorizable by treating the data packet as 4 different streams and computing a hash for each stream in each lane, then mixing them at the end. I believe that does slightly compromise the quality of the hash though.<br /><br />I still think Burtle might well be the fastest on the SPU since it does DWORD lookups, and reading shit out of the array is going to be one of the slowest parts on the SPU. A huge speed difference on the SPU would also be whether I could align my words or not.<br /><br />If Bob could cook me up one that used 4 dwords instead of 3 I'm sure that would be the nuts (if my data could be 16-byte aligned).<br /><br />But I really don't feel like spending another month micro-optimizing some useless shit for this damn obsolete hunk of junk platform. I think PS3's are actually manufactured from the blood and tears of programmers.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-1851798351786261672010-08-22T20:16:42.508-07:002010-08-22T20:16:42.508-07:00I think "secure" was a poor choice of wo...I think "secure" was a poor choice of word, and lets leave it at that.<br /><br />Yeah, Fletcher has some annoying known collisions. IIRC, I think it confuses 0xFFFF with 0x0000. And adler and fletcher are both vectorizable so should be way faster than burtle or murmur.<br /><br />It is curious that Fletcher32 is obscure and Adler32 is not, considering Fletcher is the older. However, Fletcher16 had some serious problems, so people must have just tossed it aside.<br /><br />Also murmur is substantially faster if you interleave words 2x. See MurmurHash64B. You could also vectorize interleave murmur-like hashes using the parallel integer multiply instructions in SSE4. SSE2 has multiply, but it isn't particularly useful for murmur.won3dhttps://www.blogger.com/profile/09787472194187459747noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-9455091382625908382010-08-22T19:50:54.667-07:002010-08-22T19:50:54.667-07:00Eh, yeah, okay. I'm not making rigorous state...Eh, yeah, okay. I'm not making rigorous statements. I don't really know this field at all. But it seems like Adler32 was just a mistake. Fletcher32 is older, simpler, faster, and more robust.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-51867221510506264682010-08-22T19:27:39.097-07:002010-08-22T19:27:39.097-07:00I'm not trying to defend adler. But I'm no...I'm not trying to defend adler. But I'm not sure I buy this:<br /><br /><i>"Granted the failure rate is pretty low (3/13068402) but that's not secure."</i><br /><br />Why is 3/13,068,402 not secure, but 3/130,068,402 is secure? (Note, I pulled that number out of my butt, but it's compatible with the the 0/13,068,402 numbers.)<br /><br />Obviously one is more secure than the other, but it's not you postulated some binary test based on some minimum threshhold before you started.Anonymousnoreply@blogger.com