P = "1 - e^( - 20*19/(2*(2^31)) )" = 8.847564e-8
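As a sanity check, that expression is easy to reproduce (a quick sketch, keeping the text's formula verbatim, including its choice of denominator):

```python
import math

# Reproduce the expression from the text:
# P = 1 - e^( -20*19 / (2 * 2^31) )
P = 1.0 - math.exp(-20 * 19 / (2.0 * 2**31))
print(P)  # ~8.847564e-08
```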
Now if you have N names, what's the chance that *any* of them has a collision?
C = 1 - (1 - P)^N
So, for what N of objects is C = 50% ?
log(0.5) = N * log(1 - P)
N = log(0.5) / log(1 - P)
N = 7,834,327
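That N can be checked numerically (a quick sketch; P is the per-file collision probability from above):

```python
import math

P = 8.847564e-8  # per-file collision probability from above

# Solve 0.5 = 1 - (1 - P)^N for N:
N = math.log(0.5) / math.log(1.0 - P)
print(round(N))  # ~7.8 million
```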
You can have almost 8 million files before you get a 50% chance of any one of them having a collision.
For a more reasonable number like N = 100k , what's the chance of collision?
C = "1 - (1 - 8.847564*(10^-8))^100000" = 0.00881 = 0.881 %
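Same check for the 100k case (again just plugging P in):

```python
P = 8.847564e-8  # per-file collision probability from above

# Chance that any of N = 100,000 files sees a collision:
C = 1.0 - (1.0 - P) ** 100000
print(C)  # ~0.0088, i.e. about 0.88%
```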
These probabilities are very low, and since I have been pessimistic they're even lower in practice, but perhaps they are still too high. On any given project, a 0.881% chance of a problem is probably okay, but as a library designer I have to care about the chance that *any customer* has a problem, which puts me closer to the 7M-file number and means it is likely I would see one problem. Of course a collision is not a disaster. You just tell Oodle to refresh everything manually, and chances are that only happens to anybody once in the next few years.
BTW we used CRC32 to hash all the strings in Oddworld: Stranger's Wrath. We had around 100k unique strings, which is pretty close to the breaking point for CRC32 (unlike the above case, they all had to be unique against each other). Everything was fine until near the end of the project, when we got a few name collisions. Oh noes! Disaster! Not really. I just changed one of the names from "blue_rock.tga" to "blue_rock_1.tga" and it no longer collided. Yep, that was a crazy tough problem. (Of course I can't use solutions like that as a library designer.)
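That kind of collision is trivial to detect with a scan over the name table. A minimal sketch, using zlib's crc32 as a stand-in for whatever hash the engine actually used (the example names are illustrative):

```python
from zlib import crc32
from collections import defaultdict

def find_crc32_collisions(names):
    """Group names by CRC32 and return any buckets holding more than one name."""
    buckets = defaultdict(list)
    for name in names:
        buckets[crc32(name.encode())].append(name)
    return {h: ns for h, ns in buckets.items() if len(ns) > 1}

# The fix from the story: a colliding name just gets a suffix and rehashes
# to a different bucket.
names = ["blue_rock.tga", "red_rock.tga", "blue_rock_1.tga"]
print(find_crc32_collisions(names))  # empty dict: no collisions among these
```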