Probability of getting a duplicate value when calling GetHashCode() on strings

Large. (Sorry Jon!) The probability of getting a hash collision among short strings is extremely large. Given a set of only ten thousand distinct short strings drawn from common words, the probability of there being at least one collision in the set is approximately 1%. If you have eighty thousand strings, the probability of there … Read more

Can two different strings generate the same MD5 hash code?

For a set of even billions of assets, the chances of random collisions are negligibly small — nothing that you should worry about. Considering the birthday paradox, given a set of 2^64 (or 18,446,744,073,709,551,616) assets, the probability of a single MD5 collision within this set is 50%. At this scale, you’d probably beat Google in … Read more

Hash collision in git

Picking atoms on 10 Moons An SHA-1 hash is a 40 hex character string… that’s 4 bits per character times 40… 160 bits. Now we know 10 bits is approximately 1000 (1024 to be exact) meaning that there are 1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 … Read more

How would Git handle a SHA-1 collision on a blob?

I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.9~rc0+next.20151210 (Debian version). I basically just reduced the hash size from 160-bit to 4-bit by applying the following diff and rebuilding git: — git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c +++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c @@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou blk_SHA1_Update(ctx, padlen, … Read more

hash function in Python 3.3 returns different results between sessions

Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure. By offsetting the hash with a random seed (set once at startup) attackers can no longer predict what keys will collide. You can set a fixed seed or disable the … Read more