## Generalized birthday paradox — keygenme3 by Dcoder

The birthday problem [0] asks what’s the probability that among people at least two of them have the same birthday. The “paradox” is that the answer is counterintuitive — in a group of 23, the probability is close to 50%.

To see that, let , where is the probability of encountering at least one pair of equal birthdays in a group of people. is the probability that in a group of , birthdays are pairwise different. Assuming there are 365 days in a year, — first person is free to have any birthday, the second one can pick any day from the remaining 364. By generalizing this reasoning, we arrive at: , for (for , ).

Above graph shows how quickly rises.

If we replace birthdays with values of a hash function , birthday paradox provides a way to find collisions for . Algorithm is quite simple:

**CHECK-IF-COLLISION-EXISTS(k, H):**

- , ,
- pick random (without repetition) and compute ,
- if , return
*YES*, - ,
- ,
- if goto 1, else return
*PROBABLY NOT*.

Algorithm runs in , but this isn’t really useful — we want complexity to be parametrized by probability of finding a collision. Formula for isn’t easily invertible, so we need to approximate it by something simpler. One easy to derive approximation is (see [0]), so .

We can extend this reasoning and conclude that if there are “days” and persons, then the probability that there are at least two of them with the same birthday is . Now, if we assume that , where is equal to the size of image of our hash function, then by inverting the formula for (see [1]), we get , where is the desired probability of collision. For , . This approximation shows that for (for example) MD5, CHECK-IF-COLLISION-EXISTS would loop around times (MD5 hashes are 128 bits long) and return YES with probability 0.5.

# Generalized birthday paradox

There are multiple generalizations of this problem, but we are interested in a result of David Wagner from 2002 [2]. Citing from the paper’s abstract:

*“We study a k-dimensional generalization of the birthday problem: given k lists of n-bit values, find some way to choose one element from each list so that the resulting k values xor to zero. For k = 2, this is just the extremely well-known birthday problem, which has a square-root time algorithm (…). In this paper, we show new algorithms for the case k > 2: we show a cube-root time algorithm for the case of k = 4 lists (…).”*

For details, see the paper [2]. Key observations from our point of view are:

- algorithm for k=4 runs in cube-root time,
- xor can be replaced by addition modulo a power of 2.

Armed with this knowledge, we can take a look at the problem from KeygenMe3 by Dcoder [5].

## Keygenme3 by Dcoder

Here’s a high-level pseudocode of the protection scheme:

sum = 0 name_hash = hash(name) for i in 0..15: x = (i<<8) | serial[i] sn_hash = hash(x) sum += sn_hash if(sum == name_hash) good else bad

Hash function used is SipHash [3], which is a lightweight PRF (pseudorandom function).

You might recognize the problem stated by Dcoder as the subset sum problem (SUM) [4]. SUM is NP-complete and it’s unknown if there is a polynomial time algorithm that solves it exactly. The best exact general-case algorithm has complexity , where n is the size of the set we can pick numbers from. In our case , which makes prohibitively large.

There’s a probabilistic algorithm (described in [2]) that solves SUM in time, where is equal to the bit size of numbers used. In our case bits, so . The algorithm takes 4 lists of numbers: , , , and returns a set of tuples , where . The only requirement is that we can ‘extend’ all lists freely. All details are clearly explained in [2], so I won’t repeat them here.

At first sight, there are few problems:

- we have 16 lists of numbers, instead of 4 (there are 16 bytes in a serial),
- each of our 16 lists contains exactly 256 numbers (each one of 16 positions can take one of 256 possible values),
- the target sum is (almost always) nonzero.

We can deal with 1 and 2 by observing that instead of working with 16 lists, we can work with 4 lists containing random 4-element sums, so will contain 4-sums of elements from , , , , 4-sums of elements from , , , , etc. Solving SUM for , .., , solves it for , .., , because if and , then , where .

Problem 3 is easily dealt with by observing that subtracting a constant from all elements of a single list is equivalent to solving SUM for nonzero sum. Indeed: implies . Here, is the target sum subtracted from list number 4.

Keygen sources are available here. 4sum.py is an easy to read implementation of the original algorithm from [2]. sum4.c is the actual keygen source. Should compile without any warnings with ‘make’.

0 – http://en.wikipedia.org/wiki/Birthday_problem

1 – http://en.wikipedia.org/wiki/Birthday_attack

2 – http://www.cs.berkeley.edu/~daw/papers/genbday-long.ps

3 – https://131002.net/siphash/

4 – http://en.wikipedia.org/wiki/Subset_sum_problem

5 – http://crackmes.us/read.py?id=611

hi, what tool do you use in order to draw a graph??

This is a graph from wikipedia page, so I’m not shure, but it looks like a gnuplot graph.