Given a class of size \(N\), what's the probability
of *two* students sharing the same birthday? A
classical statistics problem. The answer to that,
rather unintuitively is:
$$
1 - \frac{365!}{(365 - N)! \times 365^N}
$$

We can see that the probability of birthday-sharing rises very quickly with respect to \(N\). In fact you need 23 people to have more than 50% chance of sharing a birthday.

There is also such a problem in crypography. When expressed formally - given a hash function with a finite (and this is crucial) number of outputs \( H \), find a message/'image' \(x\), and a different one \(x'\), such that[1]: $$ \text{hash}(x) = \text{hash}(x') $$

We know that the attacker will eventually find \(x\) and \(x'\) because hash functions have finite outputs - if they produce 128-bits for example then the space is just \(2^{128}\); while the sizes of their inputs are usually larger than their outputs. For instance SHA1 has a block size of 512 bits while it's digest (output) is only 160 bits.

The birthday attack exploits the solution to the birthday problem- you can see where this is going; if we have a finite amount of output \(H\) then the probability of finding \(x'\) after \(N\) trials with unique inputs is: $$ 1 - \frac{H!}{(H - N)! \times H^N} $$

This brings out a host of problems- particularly in digital signatures. A common example of an 'attack' is the following:

- Mallory generates a lot of versions of a fair contract \(x\), e.g. by re-wording it, changes in punctuation etc, such that it will still be acceptable for Bob.
- Mallory also generates a lot of versions of a fraudulent contract \(x'\). She then finds the version of the fair and unfair contracts that have the same hash.
- She gives \(x\) to Bob, who then signs it. She then attaches the signature to \(x'\), therefore proving that Bob has signed \(x'\).

The Birthday Attack is a test on collision resistance and second-image resistance (how easy it is to find \(x'\), not just through brute forcing but e.g. using analysis of the algorithm).

To make such attacks uneconomical[2] the hash function should choose a large enough \(H\) and it is simple to see why — from our expression for the probability, \(H^N\) grows much faster than \(H!\).