Ninja: N-Dice Sums

The following article describes how to derive a (recursive) probability distribution for the sum of rolling N-dice (the fruit ninja slices and dices) with very rudimentary probability and statistics knowledge.

We are all familiar with the distribution for a fair, 6-sided die:

\(x\) 1 2 3 4 5 6
\(P(x)\) \(\frac{1}{6}\) \(\frac{1}{6}\) \(\frac{1}{6}\) \(\frac{1}{6}\) \(\frac{1}{6}\) \(\frac{1}{6}\)

What about that of a two dice? Those of you familiar with combinations will immediately jump to a solution. However we'll be using a more simplistic approach. Let \(X\) be the sum of the faces of the two dice, and \(P_2\) the probability function for the sum. Then what we want to find is: $$ P_2(X=x) = \sum{P(i) \times P(j)} $$

subject to \(i + j = x\), and \(i, j \in \{1,2,3 \ldots 6\}\). We can then use this definition to find the probability distribution. So to calculate the distribution for 2 dice: $$ \begin{aligned} P_2(2) &= P(1)P(1) = \frac{1}{36} \\ P_2(3) &= P(1)P(2) + P(2)P(1) = \frac{2}{36} \\ P_2(4) &= P(1)P(3) + P(2)P(2) + P(3)P(1) = \frac{3}{36} \\ &\ldots \end{aligned} $$

Simple enough. Now let's generalise our formula and say that, for \(N\) dice rolls the probability distribution over the set of outputs \(X\), we can write: $$ \begin{aligned} P_N(X=x) &= \sum{P_{N-1}(i) \times P_1(j)} \\ \text{where } &\, i + j = x, \\ \text{and } &\, j \in \{1,2,3, \ldots 6\} \end{aligned} $$

Which is the same as saying that the probability of the sum being \(x\) is the sum of the probability of the previous \(N-1\) dice rolls being \(i\) times (and) the probability of the Nth dice roll being \(j\), where \(i + j = x\). We'll now simplify the expression by noting that for any value of \(j\), \(P_1(j) = \frac{1}{6}\). $$ P_N(X=x) = \frac{1}{6}\sum^u_{i=l}{P_{N-1}(i)} $$

? Note: I picked this recursive form because it frees us from having to think about the combination of values required. For example if we want to build the value 4 from 3 dice rolls, we don't have to think about the combinations, i.e. {1,1,2}, {1,2,1}, {2,1,1}.

All that's left to do now is to find the upper and lower bounds, \(u\) and \(l\) respectively, for the sum. Now see that \(1 \leq j \leq 6\) needs to be satisfied because \(P_1\) only has non-zero values for 1 to 6. From \(i + j = x\): $$ 1 \leq x - i \leq 6 $$

After some rearranging and solving for \(i\), we arrive at the following pair of inequalities: $$ \begin{aligned} i &\geq x - 6 \\ i &\leq x - 1 \end{aligned} $$

At the same time, we know that \(u\) cannot be larger than \(6(N-1)\) because it is the maximum value of \(x\) for the previous distribution. Also, \(l\) cannot be smaller than \(N - 1\), the smallest value for the previous distribution. So we have, combining with our previous inequalities: $$ \begin{aligned} l &= \max(N - 1, x - 6) \\ u &= \min(x - 1, 6(N - 1)) \end{aligned} $$

Which is the same as saying that:

Putting it all together we have: $$ \begin{aligned} P_1(x) &= \frac{1}{6} \\ P_N(X=x) &= \frac{1}{6}\sum_{i=l}^{u}{P_{N-1}(i)} \\ \text{where }\, l &= \max(N - 1, x - 6), \\ u &= \min(x - 1, 6(N - 1)) \end{aligned} $$

Simulation

Ninja is the name for a project on Github modelling the probability distribution of the sum of N-dice. You can find it at eugene-eeo/ninja, and run the simulations yourself. The distributions are simulated using the same technique as highlighted above, namely: $$ \begin{aligned} P_N(X=x) &= \sum{P_{N-1}(i) \times P_1(j)} \\ \text{where } &\, i + j = x, \\ \text{and } &\, j \in \{1,2,3, \ldots 6\} \end{aligned} $$

The core of the simulation algorithm is:

from collections import defaultdict
from fractions import Fraction


def dice_sum(prev_dist=None):
    p = Fraction(1, 6) # 1/6
    if prev_dist is None:
        return {n: p for n in range(1, 6+1)}

    dist = defaultdict(Fraction)
    for i in prev_dist:
        for j in range(1, 6+1):
            dist[i+j] += prev_dist[i] * p
    return dist

It is generational and only requires the previous distribution to calculate the next. As such the distribution for the Nth roll can be calculated fairly quickly in roughly O(N) time.

Because we are using arbitrary precision arithmetic in the form of fractions and bignums in Python we can also store and compare the values exactly. You can find the results from both the simulation and computation.