A probabilistic way to discover the rainbow
Joscha Prochno and Michael Schmitz
Abstract “No two rainbows are the same. Neither are two packs of Skittles. Enjoy an odd mix!” Using an interpretation via spatial random walks, we quantify the probability that two randomly selected packs of Skittles candy are identical and determine the expected number of packs one has to purchase until the first match. We believe this problem to be appealing for middle and high school students as well as undergraduate students at university.
A slogan that you find on the back of a pack of Skittles candy says “No two rainbows are the same. Neither are two packs of Skittles. Enjoy an odd mix.” In the online blog [1] it is described how the blogger found two identical packs of Skittles, among 468 packs with a total of 27,740 Skittles. Meticulously collecting the data for this experiment was apparently triggered by some earlier calculations. More precisely, the blogger writes:

“A few months ago, we did some calculations on a cocktail napkin, so to speak, predicting that we should be able to find a pair of identical packs of Skittles with a reasonably – and perhaps surprisingly – small amount of effort.”

Whether performing this admittedly yummy experiment really only requires a “small amount of effort” or not, we chose to write this short article as intellectual candy for the mathematically inclined reader and model this experiment as a probabilistic one. This allows us to quantify the probability that two randomly selected packs of Skittles candy are identical and, in a next step, to determine the expected number of packs one has to purchase until the first match. The approach requires merely elementary probability theory and, as is typical for such a discrete problem, some combinatorial considerations.

We also believe this problem to be appealing for middle and high school students, as such an experiment can be repeated and subsequently be analysed by probabilistic tools. Here, adapting the precision of the arguments allows the level of difficulty to be varied. Towards the end of the article, more sophisticated tools, such as generating functions, are employed; these certainly exceed middle or high school level but are suitable for working with undergraduate university students.

In order to approach this question mathematically, we need to start with a suitable model.
Let us assume that each pack of Skittles contains the exact same number $n \in \mathbb{N}$ of Skittles (for brevity we sometimes say $n$-packs of Skittles) and that there are $d \in \mathbb{N}$ different colours. When filling a pack of Skittles we assume that we do it randomly and with a uniform distribution over the $d$ colours, i.e., for each colour $k \in \{1, \dots, d\}$ and each Skittle entering a pack, the probability that it has the colour $k$ is given by $1/d$. We shall say that two packs of Skittles are identical if for each colour they contain the same number of Skittles.

A key to solving the Skittles problem elegantly will be a reinterpretation that can be best understood when considering only $d = 2$ colours for the time being; we generalize the idea to an arbitrary number $d \le n$ of colours afterwards. We consider the filling process of a pack of $n$ Skittles with two possible colours, say red and green, and imagine it to be constructed by a planar random walk from the origin (since we start with an empty pack) on the integer lattice in the plane, where a step right adds a red sweet, and a step up adds a green sweet; each of the two possibilities is equally likely, having probability $1/2$, while the steps are independent. The pack is full on the line $x + y = n$, and the finishing point on that line corresponds to a pack of $x$ reds and $y$ greens. The following picture shows two lattice paths (dashed/solid) that both correspond to a pack of eleven Skittles with six reds and five greens.

Picture 1: Lattice paths represent Skittles packs.
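The lattice-path model can be sketched in a few lines of Python. This is only an illustration under the stated model assumptions (independent, uniformly distributed colours); the names `fill_pack`, `endpoint` and `identical` are hypothetical helpers introduced here, not notation from the article.

```python
import random
from collections import Counter

COLOURS = ("red", "green")  # illustrative labels for the case d = 2


def fill_pack(n, colours=COLOURS, rng=random):
    """Return the order in which n sweets enter the pack (one lattice path)."""
    return [rng.choice(colours) for _ in range(n)]


def endpoint(path, colours=COLOURS):
    """Map a path to its endpoint: the number of sweets of each colour."""
    counts = Counter(path)
    return tuple(counts[c] for c in colours)


def identical(path_a, path_b, colours=COLOURS):
    """Two packs are identical iff their paths share the same endpoint."""
    return endpoint(path_a, colours) == endpoint(path_b, colours)


# Two different paths that lead to the same pack of six reds and five greens.
dashed = ["red"] * 6 + ["green"] * 5
solid = ["green", "red"] * 5 + ["red"]
print(endpoint(dashed), endpoint(solid), identical(dashed, solid))
```

The two hand-written paths differ as sequences but share the endpoint (6, 5), which is exactly the situation of the dashed and solid paths in Picture 1.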
There are $\binom{n}{x}$ ($= \binom{n}{y}$) paths from the origin to a given point $(x, y)$ with $x + y = n$, as we can choose any of the $n$ total steps to be the $x$ steps to the right (or the $y$ steps up). Note that since $x + y = n$, indeed
$$\binom{n}{x} = \binom{n}{n-y} = \binom{n}{y},$$
where we used the identity $\binom{n}{k} = \binom{n}{n-k}$, which holds for all $n \in \mathbb{N}$ and $k \in \{0, 1, \dots, n\}$. Clearly, each path with $n$ steps has the probability $(1/2)^n$ and therefore such a random walk ends at $(x, y)$ with probability $\binom{n}{x}/2^n$.

Footnote 1: Actually the number of Skittles varies from pack to pack; in [1] it says that most studies suggest an average of about 60 candies per pack.
Footnote 2: There are five different colours/flavours.

Thus, two independent random walks both end at a given point $(x, y)$ with the probability $\binom{n}{x}^2/2^{2n}$. Summing over all possible $x \in \{0, 1, \dots, n\}$ gives the probability of the event $E_n$ that two independent random walks end at a mutual point with step distance $n$ from the origin, namely
$$P[E_n] = \frac{\sum_{x=0}^{n} \binom{n}{x}^2}{2^{2n}}. \tag{1}$$
Note that we have hereby solved the problem at hand, because $P[E_n]$ equals the probability that two independently chosen $n$-packs of Skittles with two possible colours are identical, although the cardinality of the event $E_n$ is not the number of pairs of identical $n$-packs with two possible colours. The trick is that we chose a model that takes order into account (by considering paths: each step represents a sweet entering a pack), although there is clearly no order in a pack of Skittles.

We can use a reinterpretation to simplify (1). To this end we consider the same kind of random walks with $2n$ instead of $n$ steps. The probability that such a walk ends at $(n, n)$ equals $\binom{2n}{n}/2^{2n}$, as $n$ steps to the right have to be chosen out of $2n$ steps. On the other hand, such a walk has to pass the line $x + y = n$ at some point $(x, y)$.
This means that amongst the first $n$ steps $x$ steps to the right have been made. Then, in order to end at $(n, n)$, amongst the remaining $n$ steps $n - x$ steps to the right have to be made (see Picture 2).

Picture 2: A walk with $2n$ steps passes a point $(x, y)$ with $x + y = n$.

Therefore, there are $\binom{n}{x} \cdot \binom{n}{n-x} = \binom{n}{x}^2$ paths that end at $(n, n)$ and pass through the point $(x, y)$. Summing over all $x$ tells us that there are $\sum_{x=0}^{n} \binom{n}{x}^2$ walks of step length $2n$ that end at $(n, n)$, and we obtain
$$\frac{\sum_{x=0}^{n} \binom{n}{x}^2}{2^{2n}} = P[E_n] = \frac{\binom{2n}{n}}{2^{2n}}.$$

Now we consider $n$-packs of Skittles with $d$ possible colours, and start to generalize our approach by tackling the case $d = 3$. That is, we consider a spatial random walk from the origin on the integer lattice in 3-dimensional space, where a step right adds a red sweet, a step forward adds a green sweet, and a step up adds a blue sweet. The pack is full on the plane $x + y + z = n$. Analogously to above we denote the event that two independently performed $n$-step random walks end at a mutual point by $E_n^3$ (or $E_n^d$ in general).

Picture 3: A 7-step spatial random walk and its orthogonal projection from above.
For a walk to end at a given point $(x, y, z)$ with $x + y + z = n$, $z$ steps up have to be made and we have $x + y = n - z$. This means that if we look at the situation directly from above (i.e., consider an orthogonal projection on the $xy$-plane), we have a random walk with $n - z$ steps in the plane that ends at the point $(x, y)$, so we are back in the situation considered earlier (see Picture 3). Thus, the number of pairs of walks that both perform exactly $z$ steps up and have the same endpoint equals $\binom{n}{z}^2 |E_{n-z}^2|$, and summing over all possible $z \in \{0, 1, \dots, n\}$ yields
$$|E_n^3| = \sum_{z=0}^{n} \binom{n}{z}^2 |E_{n-z}^2|. \tag{2}$$

Footnote 3: The argument with walks of $2n$ steps used for two colours is essentially the combinatorial proof of the Vandermonde identity $\binom{m_1 + m_2}{n} = \sum_{k=0}^{n} \binom{m_1}{k}\binom{m_2}{n-k}$ for the case $m_1 = m_2 = n$.
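Recursion (2) and the two-colour closed form $|E_n^2| = \binom{2n}{n}$ can be checked against a brute-force enumeration for small $n$. The following is a sketch only; the function names are hypothetical, and the enumeration over all $d^n$ walks is feasible only for small parameters.

```python
from collections import Counter
from itertools import product
from math import comb


def count_matching_pairs(n, d):
    """Brute force: count ordered pairs of n-step walks in d dimensions
    that share an endpoint, by enumerating all d**n walks."""
    endpoints = Counter()
    for walk in product(range(d), repeat=n):
        endpoints[tuple(walk.count(axis) for axis in range(d))] += 1
    # A pair of walks matches iff both walks hit the same endpoint.
    return sum(c * c for c in endpoints.values())


def pairs_two_colours(n):
    """Closed form for d = 2: the central binomial coefficient."""
    return comb(2 * n, n)


def pairs_three_colours(n):
    """Recursion (2): condition on the number z of steps up."""
    return sum(comb(n, z) ** 2 * pairs_two_colours(n - z) for z in range(n + 1))


for n in range(1, 7):
    assert count_matching_pairs(n, 2) == pairs_two_colours(n)
    assert count_matching_pairs(n, 3) == pairs_three_colours(n)
print([pairs_three_colours(n) for n in range(1, 6)])  # [3, 15, 93, 639, 4653]
```

Dividing these counts by $3^{2n}$ turns them into the probabilities $P[E_n^3]$.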
As each path has probability $(1/3)^n$, we obtain $P[E_n^3] = |E_n^3|/3^{2n}$. Formula (2) is a nice recursion, but if we pause for a moment, we see that the same considerations also provide a non-recursive expression. For a walk to end at $(x, y, z)$ we have $\binom{n}{z}$ choices for when to make the $z$ steps up, and out of the remaining $n - z$ steps we have $\binom{n-z}{x}$ choices when to make the $x$ steps to the right. Therefore, there are $\binom{n}{z}^2 \binom{n-z}{x}^2$ pairs of walks with $z$ steps up and $x$ steps to the right (and consequently $y$ steps forward). Summing over all possible $x, y, z$ now yields
$$|E_n^3| = \sum_{\substack{x+y+z=n \\ x,y,z \in \{0,1,\dots,n\}}} \binom{n}{z}^2 \binom{n-z}{x}^2.$$
Keeping the assumption $n - z - x = y$ in mind, we observe that
$$\binom{n}{z}\binom{n-z}{x} = \frac{n!}{(n-z)!\,z!} \cdot \frac{(n-z)!}{(n-z-x)!\,x!} = \frac{n!}{z!\,x!\,y!},$$
and recognize the multinomial coefficient $\binom{n}{x,y,z} := \frac{n!}{x!\,y!\,z!}$. Therefore, we may rewrite
$$|E_n^3| = \sum_{\substack{x+y+z=n \\ x,y,z \in \{0,1,\dots,n\}}} \binom{n}{x,y,z}^2.$$

Footnote 4: The general definition is $\binom{n}{x_1,\dots,x_d} := \frac{n!}{x_1! \cdots x_d!}$. A well known and nice interpretation of the multinomial coefficient is an alphabetical jumble, i.e., the number of distinct permutations of a word of length $n$ in which $d$ different letters occur and each letter $i$ occurs $x_i$ times. For instance, there are $\binom{8}{2,1,1,2,1,1}$ different permutations of the word SKITTLES.

It is now not so hard to generalize these ideas to $n$-packs of Skittles with an arbitrary number $d$ of possible colours. Filling a pack can be thought of as a spatial random walk from the origin on the integer lattice in $d$-dimensional space, where a step in $x_1$-direction adds a sweet of colour 1, a step in $x_2$-direction adds a sweet of colour 2, and so on. Now, the considerations for both the recursive and for the non-recursive formula are pretty much as above.

For the recursion we fix a number $k$ of steps that are made in direction $x_1$. There are $\binom{n}{k}$ possible choices for these $k$ steps. The remaining $n - k$ steps have to be carried out in the remaining $d - 1$ directions. Hence, there are $\binom{n}{k}^2 |E_{n-k}^{d-1}|$ pairs of walks that perform exactly $k$ steps in direction $x_1$ and end at a mutual point. Again, we sum over all $k$ and obtain
$$|E_n^d| = \sum_{k=0}^{n} \binom{n}{k}^2 |E_{n-k}^{d-1}|. \tag{3}$$

The non-recursive formula is also generalized in a straightforward manner. For a random walk to end at a given point $(k_1, k_2, \dots, k_d)$ in $d$-dimensional space with $k_1 + k_2 + \dots + k_d = n$, we have to choose $k_1$ steps in $x_1$-direction, $k_2$ steps in $x_2$-direction, and so on. There are $\binom{n}{k_1}$ possible choices for the $k_1$ steps in $x_1$-direction. From the remaining $n - k_1$ steps we have $\binom{n-k_1}{k_2}$ possibilities to choose $k_2$ steps in $x_2$-direction, and so on. Therefore, there are
$$\binom{n}{k_1} \cdot \binom{n-k_1}{k_2} \cdots \binom{n-[k_1+\dots+k_{d-1}]}{k_d} = \binom{n}{k_1, k_2, \dots, k_d}$$
paths with $n$ steps that end at $(k_1, k_2, \dots, k_d)$. Summing over all possible $k_1, \dots, k_d$ yields
$$|E_n^d| = \sum_{\substack{k_1+\dots+k_d=n \\ k_i \in \{0,\dots,n\}}} \binom{n}{k_1, \dots, k_d}^2. \tag{4}$$
As each path occurs with probability $(1/d)^n$, we obtain $P[E_n^d] = |E_n^d|/d^{2n}$.

Let us use our formulas to compute some values and see why it is nice to have both closed and recursive expressions. By (4) we have
$$|E_2^2| = \sum_{\substack{k_1+k_2=2 \\ k_i \in \{0,1,2\}}} \binom{2}{k_1,k_2}^2 = \binom{2}{0,2}^2 + \binom{2}{1,1}^2 + \binom{2}{2,0}^2 = 6.$$
That was easy, so we could try dealing with slightly larger numbers in another example, e.g.,
$$|E_3^3| = \sum_{\substack{k_1+k_2+k_3=3 \\ k_i \in \{0,1,2,3\}}} \binom{3}{k_1,k_2,k_3}^2.$$
We can arrange the sum $3 = 0 + 0 + 3$ in three and $3 = 0 + 1 + 2$ in six possible orders, while $3 = 1 + 1 + 1$ has only one possible order. Therefore, we obtain
$$|E_3^3| = 3\binom{3}{0,0,3}^2 + 6\binom{3}{0,1,2}^2 + \binom{3}{1,1,1}^2 = 3 + 6 \cdot 9 + 36 = 93.$$
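Both expressions are easy to put on a computer. The sketch below implements recursion (3) with memoization and the closed form (4) by enumerating all tuples $(k_1, \dots, k_d)$; the function names are hypothetical and chosen only for this illustration, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def pairs_recursive(n, d):
    """|E_n^d| via recursion (3); for d = 1 there is a single walk."""
    if d == 1:
        return 1
    return sum(comb(n, k) ** 2 * pairs_recursive(n - k, d - 1)
               for k in range(n + 1))


def compositions(n, d):
    """Yield all tuples (k_1, ..., k_d) of non-negative integers with sum n."""
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, d - 1):
            yield (first,) + rest


def multinomial(ks):
    """Multinomial coefficient n! / (k_1! ... k_d!) with n = sum(ks)."""
    result = factorial(sum(ks))
    for k in ks:
        result //= factorial(k)
    return result


def pairs_closed(n, d):
    """|E_n^d| via the non-recursive formula (4)."""
    return sum(multinomial(ks) ** 2 for ks in compositions(n, d))


def match_probability(n, d):
    """P[E_n^d] = |E_n^d| / d^(2n) as an exact fraction."""
    return Fraction(pairs_recursive(n, d), d ** (2 * n))


print(pairs_closed(2, 2), pairs_closed(3, 3))    # 6 93
print(pairs_recursive(3, 3) == pairs_closed(3, 3))  # True
```

The closed form has $\binom{n+d-1}{d-1}$ summands and quickly becomes slow, whereas `match_probability(60, 5)` via the recursion returns almost instantly for the pack size and number of colours suggested in [1].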
Considering the effort of such a hand computation (for $n = 3$ and $d = 3$ the sum already consists of ten summands!) we are lucky to have a recursion for determining (e.g., by means of a computer) the numbers $|E_n^d|$ for larger $n$ and $d$. Tables 1 and 2 show the values for $|E_n^d|$ and $P[E_n^d]$ (rounded to four digits) for $1 \le n, d \le 5$.

n \ d    1      2       3        4         5
1        1      2       3        4         5
2        1      6      15       28        45
3        1     20      93      256       545
4        1     70     639     2716      7885
5        1    252    4653    31504    127905

Table 1: $|E_n^d|$ for $1 \le n, d \le 5$.

n \ d    1    2         3         4         5
1        1    0.5       0.3333    0.25      0.2
2        1    0.375     0.1852    0.1094    0.072
3        1    0.3125    0.1276    0.0625    0.0349
4        1    0.2734    0.0974    0.0414    0.0202
5        1    0.2461    0.0788    0.0300    0.0131

Table 2: $P[E_n^d]$ for $1 \le n, d \le 5$.

Of course, we also want to know the probability that two randomly purchased packs of Skittles are identical assuming the realistic values of $d = 5$ colours and $n = 60$ sweets in each pack. This is $P[E_{60}^5] = 0.\ldots$

We now imagine that somebody purchases a pack of Skittles candy each day and compares it to all of the previously bought packs to see if it is identical to one of them (as said above, there are actually people who do such things). We now ask the following question:

How many packs must be bought on average until the first match appears, i.e., what is the expected number of purchased packs in this experiment?
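The purchasing experiment itself can be imitated on a computer. The following Monte Carlo sketch assumes the model from above (every pack holds exactly $n$ sweets, colours uniform and independent); all function names are hypothetical, and the averages it produces are estimates that depend on the seed and the number of trials.

```python
import random
from collections import Counter


def random_pack(n, d, rng):
    """Fill a pack with n sweets, each colour 0..d-1 equally likely;
    return the tuple of colour counts, which identifies the pack."""
    counts = Counter(rng.randrange(d) for _ in range(n))
    return tuple(counts[c] for c in range(d))


def purchases_until_match(packs):
    """Return the 1-based index of the first pack equal to an earlier one,
    or None if a finite sequence contains no match."""
    seen = set()
    for index, pack in enumerate(packs, start=1):
        if pack in seen:
            return index
        seen.add(pack)
    return None


def pack_stream(n, d, rng):
    """Endless stream of independently filled packs (one purchase per day)."""
    while True:
        yield random_pack(n, d, rng)


def simulate(n, d, trials, seed=0):
    """Estimate the average number of purchases until the first match."""
    rng = random.Random(seed)
    total = sum(purchases_until_match(pack_stream(n, d, rng))
                for _ in range(trials))
    return total / trials


print(purchases_until_match([(1, 1), (2, 0), (1, 1)]))  # 3
print(simulate(n=5, d=2, trials=1000))
```

Each run terminates because there are only finitely many different packs. Small values of $n$ and $d$ keep the run time short; for larger parameters the number of trials may need to be reduced.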
We know that two independent $n$-step random walks $\omega_1, \omega_2$ represent identical $n$-packs of Skittles if and only if they have the same endpoint. In this case let us say that they are equivalent and write $\omega_1 \sim \omega_2$. In order to tackle the problem at hand, we must switch from considering two walks to considering a sequence $\omega_1, \omega_2, \omega_3, \dots$ of pairwise independent $n$-step random walks.

As we already know, the probability that $\omega_i \sim \omega_j$ for $i \neq j$ equals $p := P[E_n^d]$. We consider a random variable $X$ that shall return the number of purchases until the first match; in other words, $X$ takes on the value $\ell \ge 2$ if and only if $\omega_1, \omega_2, \dots, \omega_{\ell-1}$ are pairwise non-equivalent and $\omega_\ell \sim \omega_i$ for some $i \in \{1, \dots, \ell-1\}$. (For the precise definition of $X$ and a more formal derivation of $P[X = \ell]$ see Section 7.) The random walks are independent and there are $\binom{\ell-1}{2}$ possibilities to choose two elements of $\{\omega_1, \dots, \omega_{\ell-1}\}$. Thus, the probability that $\omega_1, \dots, \omega_{\ell-1}$ are pairwise non-equivalent equals $(1-p)^{\binom{\ell-1}{2}}$. For each $i \le \ell-1$ the probability that $\omega_\ell$ is equivalent to $\omega_i$ equals $p$, and adding up these $\ell - 1$ contributions, the probability that $\omega_\ell \sim \omega_i$ for some $i \le \ell-1$ is $(\ell-1)p$. From these considerations we obtain
$$P[X = \ell] = (1-p)^{\binom{\ell-1}{2}} \cdot (\ell-1)p.$$
Thus, we have
$$E[X] = \sum_{\ell=1}^{\infty} \ell \cdot P[X = \ell] = \sum_{\ell=1}^{\infty} \ell(\ell-1)p(1-p)^{\binom{\ell-1}{2}},$$
which can be evaluated numerically for $n = 60$ and $d = 5$. The resulting value is much smaller than the result of the experiment in the blog [1] and, in fact, it should be, considering that actually the packs of Skittles may contain different numbers of candies, which clearly reduces the probability for a match and therefore increases the expected value.

In the internet blog mentioned above a generating function for the numbers $P[E_n^d]$ is presented. In particular, it says that
$$P[E_n^d] = \frac{1}{d^{2n}} \left[\frac{x^{2n}}{(n!)^2}\right] \left(\sum_{k \ge 0} \left(\frac{x^k}{k!}\right)^2\right)^d, \qquad (d, n \in \mathbb{N},\ d \le n),$$
which means that $|E_n^d| = d^{2n} P[E_n^d]$ is the coefficient of $\frac{x^{2n}}{(n!)^2}$ in the $d$th power of the series $\sum_{k \ge 0} \left(\frac{x^k}{k!}\right)^2$. We want to see that this corresponds exactly to our result. We recall from (4) that
$$|E_n^d| = \sum_{\substack{k_1+\dots+k_d=n \\ k_i \in \{0,\dots,n\}}} \frac{(n!)^2}{(k_1!\,k_2! \cdots k_d!)^2}.$$
Thus, it remains to show that the coefficient of $x^{2n}$ in the $d$th power of the series $\sum_{k \ge 0} \left(\frac{x^k}{k!}\right)^2$ equals
$$\sum_{\substack{k_1+\dots+k_d=n \\ k_1,\dots,k_d \in \{0,\dots,n\}}} (k_1!\,k_2! \cdots k_d!)^{-2}.$$
To this end we consider the $d$th power of the series, i.e.,
$$\left(1 + \frac{x^{2 \cdot 1}}{(1!)^2} + \frac{x^{2 \cdot 2}}{(2!)^2} + \frac{x^{2 \cdot 3}}{(3!)^2} + \dots\right)^d.$$
To understand how to expand this expression imagine the $d$ brackets written out as a product. We have to pick exactly one factor from each of the $d$ brackets, multiply these $d$ factors, and sum over all possible choices. Thereby each of the chosen factors has the form $\frac{x^{2k}}{(k!)^2}$, and if we multiply $d$ such factors, say $\frac{x^{2k_1}}{(k_1!)^2}, \frac{x^{2k_2}}{(k_2!)^2}, \dots, \frac{x^{2k_d}}{(k_d!)^2}$, we get
$$\frac{x^{2(k_1+k_2+\dots+k_d)}}{(k_1!\,k_2! \cdots k_d!)^2}.$$
The exponent of $x$ equals $2n$ if and only if $k_1 + k_2 + \dots + k_d = n$, so we have to sum over all choices of $k_1, \dots, k_d \in \{0, \dots, n\}$ satisfying this condition and obtain that the coefficient of $x^{2n}$ is given by
$$\sum_{\substack{k_1+\dots+k_d=n \\ k_1,\dots,k_d \in \{0,\dots,n\}}} \frac{1}{(k_1!\,k_2! \cdots k_d!)^2},$$
as desired.

We have assumed that all packs of Skittles contain the same number $n$ of candies, which is actually not the case. In [1] it is pointed out that, assuming the number $n$ of Skittles in a pack is independently distributed with probability density function $f$, the probability that two randomly purchased packs are identical is given by
$$\sum_{n=1}^{\infty} f(n)^2 P[E_n^d].$$
Moreover, it says that they guessed $f(n)$ based on similar past studies and thereby obtained an expected value of 400-500 packs until the first match, depending on the assumptions for the density $f$.

Concluding this article we want to point out a possible pitfall.
It is relatively easy to determine the number of different packs of Skittles if each pack contains $n$ Skittles and $d$ colours are available, and one obtains $\binom{n+d-1}{d-1}$ possibilities. For instance, if $n = 60$ and $d = 5$ this gives
$$\binom{64}{4} = 635{,}376.$$
However, concluding that the probability of two packs being identical is $1/635{,}376$ would be wrong (even if we assume that each pack contains exactly 60 candies, which is not the case), because the different packs are not equally likely. For instance, a pack with only red Skittles is less likely than a pack with twelve Skittles of each colour.
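The pitfall can be made concrete with a short computation. This sketch, whose function names are hypothetical, counts the distinct packs and compares the exact probabilities of the two packs mentioned above under the uniform-colour model.

```python
from fractions import Fraction
from math import comb, factorial


def number_of_distinct_packs(n, d):
    """Number of ways to split n sweets among d colours (stars and bars)."""
    return comb(n + d - 1, d - 1)


def pack_probability(counts):
    """Probability that a randomly filled pack has exactly these colour
    counts: multinomial coefficient divided by d**n."""
    n, d = sum(counts), len(counts)
    ways = factorial(n)
    for k in counts:
        ways //= factorial(k)
    return Fraction(ways, d ** n)


print(number_of_distinct_packs(60, 5))  # 635376

all_red = pack_probability((60, 0, 0, 0, 0))
balanced = pack_probability((12, 12, 12, 12, 12))
print(all_red < balanced)  # True
```

Only one filling order produces the all-red pack, while very many orders produce the balanced one, so the distinct packs are far from equally likely.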
In the preceding sections we computed probabilities in an intuitive manner, and we want to make this precise here by stating the corresponding probability spaces. For the 2-dimensional random walks with $n$ steps considered in Section 2, which represent the filling processes of an $n$-pack of Skittles with only two possible colours, we use
$$\Omega = \big\{(x_1, \dots, x_n) : x_1, \dots, x_n \in \{r, u\}\big\},$$
where $x_i = r$ means that step $i$ is a step to the right, and $x_i = u$ indicates that the $i$th step is a step up. Then, we have $|\Omega| = 2^n$ and $P[A] = |A|/2^n$ for each $A \subseteq \Omega$, as used intuitively above.

For the 3-dimensional case corresponding to $d = 3$ colours we use $\Omega = \{(x_1, \dots, x_n) : x_1, \dots, x_n \in \{r, f, u\}\}$ with the obvious meanings of $x_i = r, f, u$, and $P[A] = |A|/3^n$ for each $A \subseteq \Omega$, since $|\Omega| = 3^n$. For the general case, we let
$$\Omega = \big\{(x_1, \dots, x_n) \mid x_1, \dots, x_n \in \{1, 2, \dots, d\}\big\},$$
where $x_i = k$ means that the $i$th step is a step in direction $x_k$ for each $k \in \{1, \dots, d\}$. Clearly, we then have $|\Omega| = d^n$.

When asking ourselves with which probability two randomly and independently chosen $n$-packs of Skittles are identical (i.e., two $n$-step random walks have the same endpoint) we formally consider the sample space $\Omega^2$ of all pairs $(\omega_1, \omega_2)$ of random walks $\omega_1, \omega_2$. The probability measure is then the product measure given by $P[(A_1, A_2)] = |A_1| \cdot |A_2| / |\Omega|^2$. In the general case ($d$ possible colours / $d$-dimensional space) this means $P[E_n^d] = |E_n^d|/d^{2n}$, as we have already used intuitively above.

When considering sequences of random walks as in Section 4, we formally deal with the product sample space $\Omega^{\mathbb{N}}$. The random variable $X$ considered there is precisely defined by
$$X : \Omega^{\mathbb{N}} \to \mathbb{N}, \qquad (\omega_i)_{i \in \mathbb{N}} \mapsto \min\big\{m \in \mathbb{N} : \omega_m \sim \omega_i \text{ for some } i < m\big\}.$$
To determine $P[X = \ell]$, for fixed $n, d \in \mathbb{N}$ and $\ell \ge 2$, we consider the events $A_{ij} = \{(\omega_k)_{k \in \mathbb{N}} : \omega_i \sim \omega_j\}$ and let $p := P[A_{ij}] = P[E_n^d]$ for $i \neq j$. As $\omega_1, \omega_2, \dots$ are independent, $A_{ij}$ is independent of $A_{rs}$ if $i < j$, $r < s$ and $(i, j) \neq (r, s)$, and we obtain
$$P[X = \ell] = P\Bigg[\bigcap_{i < j < \ell} A_{ij}^c \cap \bigcup_{i=1}^{\ell-1} A_{\ell i}\Bigg] = \prod_{i < j < \ell} P\big[A_{ij}^c\big] \sum_{i=1}^{\ell-1} P[A_{\ell i}],$$
where $A_{ij}^c$ denotes the complement of $A_{ij}$. The latter equals $(1-p)^{\binom{\ell-1}{2}} (\ell-1)p$ as explained above.

Acknowledgment

Joscha Prochno is supported by the Austrian Science Fund (FWF) with the Project P32405 “Asymptotic Geometric Analysis and Applications”. We thank Michael’s brother-in-law Friedrich Delgado for pointing out the internet blog [1] to him. We also thank Gunther Leobacher (Graz) for reading a preliminary version of this article and his helpful comments and suggestions.
References

[1] PossiblyWrong. Follow-up: I found two identical packs of Skittles, among 468 packs with a total of 27,740 Skittles. https://possiblywrong.wordpress.com/2019/04/06/follow-up-i-found-two-identical-packs-of-skittles-among-468-packs-with-a-total-of-27740-skittles/