How to incrementally sample without replacement?

If you know in advance that you’re going to want to multiple samples without overlaps, easiest is to do random.shuffle() on list(range(100)) (Python 3 – can skip the list() in Python 2), then peel off slices as needed. s = list(range(100)) random.shuffle(s) first_sample = s[-10:] del s[-10:] second_sample = s[-10:] del s[-10:] # etc Else … Read more

Fastest primality test

The only deterministic, polynomial-time algorithm for primality testing I know of is the AKS primality test (http://en.wikipedia.org/wiki/AKS_primality_test). However, there are a lot of very good randomized primality tests that are fast and have extremely good probability of success. They usually work by finding whether the number is composite with exponentially good probability, so they’ll either … Read more

How do I assess the hash collision probability?

Equal hash means equal file, unless someone malicious is messing around with your files and injecting collisions. (this could be the case if they are downloading stuff from the internet) If that is the case go for a SHA2 based function. There are no accidental MD5 collisions, 1,47×10-29 is a really really really small number. … Read more

Cosmic Rays: what is the probability they will affect a program?

From Wikipedia: Studies by IBM in the 1990s suggest that computers typically experience about one cosmic-ray-induced error per 256 megabytes of RAM per month.[15] This means a probability of 3.7 × 10-9 per byte per month, or 1.4 × 10-15 per byte per second. If your program runs for 1 minute and occupies 20 MB … Read more

Python – Is a dictionary slow to find frequency of each character?

Performance comparison Note: time in the table doesn’t include the time it takes to load files. | approach | american-english, | big.txt, | time w.r.t. defaultdict | | | time, seconds | time, seconds | | |—————-+——————-+—————+————————-| | Counter | 0.451 | 3.367 | 3.6 | | setdefault | 0.348 | 2.320 | 2.5 | … Read more

Nth Combination

Note you can generate the sequence by recursively generating all combinations with the first element, then all combinations without. In both recursive cases, you drop the first element to get all combinations from n-1 elements. In Python: def combination(l, r): if r == 0: yield [] elif len(l) == r: yield l else: for c … Read more