Maximizing the Expected Value of a Lottery Ticket: How to Sell and When to Buy
Allen Kim and Steven Skiena
Dept. of Computer Science
Stony Brook University
{allekim | skiena}@cs.stonybrook.edu
January 2, 2020

Abstract
Unusually large prize pools in lotteries like Mega Millions and Powerball attract additional bettors, which increases the likelihood that multiple winners will have to share the pool. Thus, the expected value of a lottery ticket decreases as the probability of collisions (two or more bettors with identical winning tickets) increases. We propose a way to increase the expected value of lottery tickets by minimizing collisions, while preserving the independent generation necessary in a distributed point-of-sales environment. Our approach involves partitioning the ticket space among different vendors and pairing them off to ensure no collisions among pairs. Our analysis demonstrates that this approach increases the expected value of each ticket, without increasing the size of the prize pool. We also analyze when ticket sales have maximal expected value, and show that they provide positive returns when the jackpot is between $ $

With players attracted by the potential winnings from enormous lottery pools, multistate lotteries like Mega Millions and Powerball sell tens to hundreds of millions of tickets each week across the United States. Larger lottery pools attract more sales, but the real expected value of a particular lottery ticket is a function of combinatorics, pool size, and consumer behavior.

Calculating the probability of winning a lottery is a standard exercise in combinatorics [1, 2, 3, 4]. Each ticket for the Powerball lottery contains six numbers: five “white balls” in the range from 1 to 69, plus one “powerball” in the range from 1 to 26. The grand prize requires selecting all of these numbers correctly. Thus there are
\[
\binom{69}{5}\binom{26}{1} = 292{,}201{,}338
\]
possible tickets. The payoff for a winning ticket varies each week, depending on the size of the prize pool. The pool for Powerball starts at $40 million, but increases each week until there is a winner. On January 13, 2016 it reached a record high of $ without costing the central authorities any additional contributions to the payoff pool. Given that larger potential winnings attract more players, we anticipate that implementing our scheme would generate increased interest in these games, and enlarge the ostensible benefits the governments running them can provide.

Further, we demonstrate that the number of Powerball tickets bought increases quadratically with pool size, which implies that tickets become increasingly less valuable after the pool passes a critical threshold. This analysis enables us to determine the range of pool sizes where tickets have positive expected value. In particular, we establish that Powerball tickets bought (under the current sales model) with pool sizes between $ $

Purchasers of Powerball and Mega Millions tickets have the option to select the combination for each ticket they buy, and roughly 30% of tickets are sold with such self-selected combinations [6]. But self-selection leads to a greater likelihood of collisions, where multiple players pick the same combination and hence must share the prize pool should they win. People tend to choose numbers that are meaningful to them, such as dates and arithmetic progressions. This lack of independence causes certain combinations to be selected far more often than chance would suggest [7].

The remaining 70% of tickets for these lotteries are sold through
Quick Picks, where the point-of-sale terminal generates a combination at random. Details of the generation algorithm are not available to us, but we presume that something like a standard linear congruential generator (LCG) is used to produce pseudorandom numbers. These generators iteratively produce a sequence of values using the recurrence relation
\[
X_{n+1} = (a \cdot X_n + c) \bmod m.
\]
With appropriately chosen constants a, c, and m, such a generator cycles through all m possible values before repeating. See Knuth [8] for a thorough discussion of the theory of random number generation.

Our presumption is that such methods do an effective job of selecting tickets with uniform probability on each sales terminal. But under the well-known birthday paradox, collisions occur surprisingly early in any such independent sampling strategy. We expect the first collision to happen after about \(\sqrt{\pi N/2} \approx 1.25\sqrt{N}\) tickets sold, where N is the size of the ticket space. For Powerball, where N = 292,201,338, the 1.25√N approximation works out to an expected first collision after only 21,367 tickets are sold (\(\sqrt{\pi N/2}\) itself gives about 21,424).

The problem of collisions is further complicated because tickets are sold simultaneously at thousands of terminals across the nation. Synchronization of the random number generators across these machines (with the same constants a, c, m, and initial value \(X_0\)) would be disastrous, because the same combinations would get sold repeatedly by different stores.

Quick Pick works independently across different lottery terminals. Kelly Cripe, spokeswoman for the Multi-State Lottery Association, which runs Powerball, stated that Quick Picks “has no memory of what it previously selected” as an explanation for why multiple players can get identical combinations [9].
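The birthday-paradox estimate above can be checked numerically. The Python sketch below assumes, as the text does, that every Quick Pick is an independent, uniform draw from the N possible tickets; the helper name expected_first_collision is ours and is not part of any lottery system. It computes the exact expected number of tickets sold up to and including the first duplicate, which approaches \(\sqrt{\pi N/2} + 2/3\) for large N.

```python
import math

# Powerball ticket space: five white balls from 1..69 and one Powerball from 1..26.
N = math.comb(69, 5) * math.comb(26, 1)

def expected_first_collision(n_space):
    """Expected number of independent uniform draws from n_space values
    until some value is drawn for the second time.

    E[T] = sum_{k>=0} P(first k draws are all distinct)
         = 1 + sum_{k>=1} prod_{i=1}^{k-1} (1 - i/N),
    summed until the remaining terms are negligible.
    """
    total = 2.0  # k = 0 and k = 1 terms: the first draw can never collide
    term = 1.0
    i = 1
    while term > 1e-18 and i < n_space:
        term *= 1.0 - i / n_space  # P(first i + 1 draws are all distinct)
        total += term
        i += 1
    return total

print(N)  # 292201338
print(math.sqrt(math.pi * N / 2))   # asymptotic estimate sqrt(pi N / 2)
print(1.25 * math.sqrt(N))          # the 1.25 sqrt(N) shorthand used in the text
print(expected_first_collision(N))  # exact expectation under the uniform assumption
```

For Powerball all three printed estimates land near 21,400 tickets, so the shorthand is accurate to well under one percent.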
Presuming that the constants and initializations of the random number generators have been chosen correctly, the collection of tickets across stores should be generated independently, with the resulting collision probabilities well defined as a function of the number of tickets sold.

All of this leads to the question of whether it is possible to construct an efficient distributed lottery scheme such that the probability of having to share the prize is minimized.

We consider a setting where m stores independently generate tickets on demand. Each distinct lottery ticket can be ranked, or equivalently put in bijection with a distinct integer ranging from 0 to N − 1. It is a straightforward task to unrank each such integer into a ticket, as well as to perform the inverse operation of ranking each ticket to its corresponding integer, using a recursive combinatorial approach. We discuss these operations in the appendix.

In our analysis, we consider a ticket as winning only if it claims part of the grand prize, ignoring smaller prizes granted for similar but incomplete matches. We assume the winning ticket will be drawn uniformly at random over the ticket space. To simplify our analysis, we assume that all tickets are bought through a Quick Pick mechanism, meaning that customers cannot or do not select their own combinations.

Our goal here is to devise an efficient, distributed mechanism to implement Quick Pick so as to optimize the expected value of a ticket, given that k tickets have been sold. We consider three different models:

• Independent Generation – This is the simplest ticket generation strategy, and the one presumably implemented in current lottery point-of-sales terminals. Each store generates an integer in the ticket space from 0 to N − 1 uniformly at random, and the m stores generate tickets independently, without memory of what they or any other store have generated in the past. The downside is that no mechanism exists to prevent the same ticket from being generated twice, in different stores or even in the same store.

• Central Server Generation – At the other end of the spectrum, we consider a central server that stores communicate with, which ensures that no duplicate ticket ever gets sold until the (N + 1)st request. Such a server could be implemented by constructing a random permutation of the entire ticket space and responding to the ith ticket request with the ith element in this ordering. Alternately, we can represent the ticket space as a bit vector, and search from a randomly selected position 0 ≤ x ≤ N − 1 for the first unsold ticket i ≥ x.
After N tickets have been sold, every subsequent ticket sold will result in a collision, but this is clearly unavoidable due to the pigeonhole principle.

Although this central server idea appears to be optimal in terms of preventing collisions, it requires constant communication between each sales terminal and the server. If at any point connectivity is lost, tickets cannot be dispensed. We seek a generation approach where ticket machines can work independently, without any need for external communication, while still minimizing collisions.

• Deterministic Pairing – Here we propose a strategy where each store is assigned a “partner”, such that each store and its partner comprise a pair. Thus m stores yield \(p = \lceil m/2 \rceil\) pairs. We partition the ticket space of size N into p regions, and assign a distinct region of size N/p to each pair. This represents the range of tickets that a particular pair is allowed to sell from. (Recall that each possible ticket is represented by a distinct integer from 0 to N − 1.)

We now determine the expected value of a purchased ticket, given that k tickets have been sold. Let P be the prize pool for a winning ticket, and N be the size of the ticket space. We first consider the expected value of a single ticket. If this ticket's combination is unique among all tickets sold, then its expected value is P/N, because the probability of it winning is 1/N and the prize if it wins is P. If the ticket is not unique and shares its numbers with g tickets in total, then its expected value is P/(gN), since the prize would then be split among g people, giving each a prize of P/g.

To get the expected value of a purchased ticket, we sum the expected prizes of all tickets and divide by the total number of tickets. To find the sum of these expected prizes, we make the following observation.
If we consider just a single set of g tickets that share the same numbers, then the sum of the expected values of those tickets will always be P/N, regardless of g. This is because, for a combination held by g tickets, each ticket in the set has expected value P/(gN), and since there are g of them, their expected values sum to g · P/(gN) = P/N. Thus, to compute the sum of expected values over all tickets, we simply need to count the number of distinct ticket numbers and multiply this value by
P/N.

In summary, the expected value of each ticket is simply the number of distinct tickets sold multiplied by P/(kN), where k is the number of tickets sold so far.

Independent Random Generation

In the case of independent random generation, the number of distinct tickets can be computed analogously to the number of distinct birthdays among a random sample of k people. This is a known problem, but to motivate the solution, we assume we know how to compute the expected number of distinct birthdays for k − 1 people and extend it to k people. The probability that the kth person does not share a birthday with any of the original k − 1 people is \(\left(\frac{N-1}{N}\right)^{k-1}\), and we simply increase the expected value by 1 in such a case. In the other scenario, the new person does not contribute to the number of distinct birthdays, so the value does not increase.

This is summarized by the following recurrence, where N is the size of the ticket space and E(0) = 0:
\[
\begin{aligned}
E(k) &= \left(\frac{N-1}{N}\right)^{k-1} \bigl[1 + E(k-1)\bigr] + \left(1 - \left(\frac{N-1}{N}\right)^{k-1}\right) \bigl[E(k-1)\bigr] \\
&= \left(\frac{N-1}{N}\right)^{k-1} + E(k-1) \\
&= \sum_{i=0}^{k-1} \left(\frac{N-1}{N}\right)^{i}
 = \frac{1 - \left(\frac{N-1}{N}\right)^{k}}{1 - \frac{N-1}{N}}
 = N \cdot \left(1 - \left(\frac{N-1}{N}\right)^{k}\right)
\end{aligned}
\]
To get the final expected value, we multiply by P/(kN) to get:
\[
E_{IR}(k) = \left[1 - \left(\frac{N-1}{N}\right)^{k}\right] \cdot \frac{P}{k}
\]
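A small Python sketch can confirm that the recurrence and its closed form agree, and that both match a direct simulation. It assumes uniform, independent ticket generation, as in the derivation; exact rational arithmetic (fractions.Fraction) makes the recurrence-versus-closed-form check an exact equality rather than a floating-point tolerance. The function names are illustrative.

```python
import random
from fractions import Fraction

def distinct_by_recurrence(k, n_space):
    """Expected distinct tickets after k draws, via E(k) = q^(k-1) + E(k-1), E(0) = 0."""
    q = Fraction(n_space - 1, n_space)  # probability a draw misses one fixed ticket
    expected = Fraction(0)
    p_new = Fraction(1)                 # q^(j-1): probability the j-th ticket is new
    for _ in range(k):
        expected += p_new
        p_new *= q
    return expected

def distinct_closed_form(k, n_space):
    """Closed form N * (1 - q^k) with q = (N-1)/N."""
    q = Fraction(n_space - 1, n_space)
    return n_space * (1 - q ** k)

def ev_independent(k, n_space, prize):
    """E_IR(k): distinct tickets times P/(kN), i.e. [1 - q^k] * P / k."""
    return distinct_closed_form(k, n_space) * Fraction(prize) / (k * n_space)

def simulate_distinct(k, n_space, trials=2000, seed=1):
    """Monte Carlo: draw k tickets uniformly from range(N) and count distinct ones."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(n_space) for _ in range(k)})
    return total / trials

exact = distinct_by_recurrence(23, 365)
print(exact == distinct_closed_form(23, 365))  # True: recurrence matches closed form
print(float(exact), simulate_distinct(23, 365))
```

With a single ticket sold there can be no collision, so ev_independent(1, N, P) reduces to P/N, matching the unique-ticket case above.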
In the case of the central server, every ticket sold is a new, distinct combination until N tickets have been sold; at that point, the maximum number of distinct tickets has been reached. Thus, the expected number of distinct tickets is min(k, N), and the expected value of each ticket is given by:
\[
E_{CS}(k) = \begin{cases} P/N & k \le N \\ P/k & k > N \end{cases}
\]
The deterministic pairing scheme can be modeled as a balls-and-bins problem with a limited capacity c for each bin, where the balls represent tickets and the bins represent the partitions of the ticket space. A ball is discarded whenever it is thrown into a full bin. This captures the fact that ticket values are recycled after a partition of the ticket space is used up. We calculate the total number of balls remaining in the bins after k balls are thrown. This number of remaining balls represents the number of distinct tickets sold so far. In the worst case, all the balls get thrown into a single bin and k − c balls are discarded, but this is highly unlikely.

We solve this analytically for the case of two bins with c = N/2. This is equivalent to considering two pairs of stores with an even partition of the ticket space into two regions of size N/2. Recall that the expected value of a discrete random variable X with outcome values \(x_i\), each with probability \(p_i\), is
\[
E(X) = \sum_{i} p_i x_i.
\]
In this setting, we wish to consider every possible sequence of ball tosses. We can represent each sequence as a binary string, in which 0 represents the left bin and 1 represents the right bin; the ith bit of the string represents the ith ball thrown. We consider all \(2^k\) binary strings of length k, and compute the expected value by summing directly over the cases with i zeros and k − i ones, for all i. Exactly \(\binom{k}{i}\) strings have i zeros, so each such case occurs with probability \(\binom{k}{i}/2^k\).

The values, in this case, are the minimum of i (or k − i) and N/2. This reflects the discarding aspect, as a bin cannot hold more than N/2 balls. The expected value of a ticket, given that k tickets have been sold, is therefore:
\[
E_{DP}(k) = \sum_{i=0}^{k} \frac{\binom{k}{i}}{2^k} \cdot \frac{P}{kN} \left[ \min\left(i, \frac{N}{2}\right) + \min\left(k - i, \frac{N}{2}\right) \right]
\]
Figure 1: The expected fraction of the pool size claimed over all tickets sold under three different models, for N = 100,[…]

We provide simulation results in Figure 1, for N = 100,[…] The independent strategy does the worst of the three methods, as collisions arise relatively quickly. The deterministic pairing method does nearly as well as the ideal central server model, making it the best among practical methods.

We now analyze the jackpot size at which the expected value of a ticket is maximal. To do so, we first estimate the number of tickets sold for a given jackpot size, by collecting data on Powerball ticket sales across the United States. Graphing the number of tickets sold as a function of the jackpot shows that the curve is approximately quadratic, so we use linear regression to find the best-fit quadratic formula. If T(j) denotes the number of tickets sold for a jackpot j, we find that it is approximately:
\[
T(j) \approx 278.[\ldots]\, j^2 - [\ldots]\, j + 10582740.[\ldots]
\]
where j is measured in millions of dollars.

Figure 2: The number of tickets sold as a function of the jackpot
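The three per-ticket expected values derived earlier, E_IR(k), E_CS(k), and E_DP(k), can be compared numerically in the spirit of Figure 1. This Python sketch assumes the two-bin deterministic pairing model analyzed above (capacity N/2, N even) and adds a direct balls-and-bins simulation as a sanity check on the binomial sum. The function names and parameter values are illustrative and are not the settings used for Figure 1.

```python
import random
from math import comb

def ev_ir(k, n_space, prize):
    """E_IR(k) = [1 - ((N-1)/N)^k] * P / k (independent random generation)."""
    return (1 - ((n_space - 1) / n_space) ** k) * prize / k

def ev_cs(k, n_space, prize):
    """E_CS(k): P/N while k <= N, P/k afterwards (ideal central server)."""
    return prize / n_space if k <= n_space else prize / k

def ev_dp(k, n_space, prize):
    """E_DP(k): two bins of capacity N/2 (N assumed even); overflow is discarded."""
    half = n_space // 2
    weighted = sum(comb(k, i) * (min(i, half) + min(k - i, half)) for i in range(k + 1))
    distinct = weighted / 2 ** k  # expected number of balls kept in the two bins
    return distinct * prize / (k * n_space)

def simulate_dp(k, n_space, prize, trials=2000, seed=7):
    """Direct simulation of the same two-bin model: each ball picks a bin at random."""
    rng = random.Random(seed)
    half = n_space // 2
    total = 0.0
    for _ in range(trials):
        left = sum(1 for _ in range(k) if rng.random() < 0.5)
        kept = min(left, half) + min(k - left, half)
        total += kept * prize / (k * n_space)
    return total / trials

n_space, prize = 1000, 1.0  # illustrative values only
for k in (250, 500, 1000, 2000):
    print(k, ev_ir(k, n_space, prize), ev_dp(k, n_space, prize), ev_cs(k, n_space, prize))
```

With N = 1000 and k = N, for example, E_DP(k) lands just below the central-server value P/N and well above E_IR(k); for k ≤ N/2 the pairing model loses nothing, since neither bin can overflow.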
Given this model, we can evaluate E_IR(j), the expected value of a ticket for a given jackpot j. Recall that for the independent random generation scheme, the expected value of a ticket can be computed with k = T(j) tickets sold and ticket space size N (N = 292,201,338 for Powerball) as
\[
E_{IR}(j) = \left[1 - \left(\frac{N-1}{N}\right)^{T(j)}\right] \cdot \frac{P}{T(j)}
\]
Powerball ticket sales data: https://lottoreport.com/ticketcomparison.htm

Similarly, for the (ideal) central server approach, we evaluate E_CS(j) as
\[
E_{CS}(j) = \begin{cases} P/N & T(j) \le N \\ P/T(j) & T(j) > N \end{cases}
\]
In the end, we get the results presented in Figure 5. The cost of each lottery ticket is $2, so we are interested in situations where the expected value of a ticket is greater than $2. We see that for the standard Quick Pick scheme, one can expect to see returns when the jackpot is between $775 million and $ $ $584 million and $

We propose an alternative to the standard ticket generation scheme used in popular lotteries that generally minimizes collisions and raises the expected value of a ticket. Our deterministic pairing method only requires an agreed setup between the lottery association and its distributors. No further communication is required during sales. For future work, one may consider adding some degree of communication to establish how much more this method can be improved upon. Analyzing the impact of non-uniform ticket sales among stores (some more popular than others) is another factor to consider as well.

The reader may wonder what the catch is with our ticket generation procedure. How can we really increase expected value by changing the sales strategy, without any change in the cost of the lottery pool? Over the course of a single lottery, it is clear that we accomplish our goals. But there are certain subtleties in running a sequence of lotteries, where the pools increase whenever there is no winner the previous week. By reducing duplicate entries, we increase the likelihood that the prize will be claimed each week. Over a sequence of lotteries, our scheme will create fewer large pools resulting from long runs of unsuccessful contests. But no one likes to share, and a lucky winner would be more likely to keep all of it.

References

[1] Z. Füredi, G. J. Székely, and Z. Zubor, “On the lottery problem,” Journal of Combinatorial Designs, vol. 4, no. 1, pp. 5–10, 1996.
[2] C. Barboianu, The Mathematics of Lottery: Odds, Combinations, Systems. Lightning Source, 2009.
[3] V. Lim, E. Deahl, L. Rubel, and S. Williams, Local Lotto. IGI Global, 2015.
[4] V. Lim, L. Rubel, L. Shookhoff, M. Sullivan, and S. Williams, “The lottery is a mathematics powerball,” Mathematics Teaching in the Middle School, vol. 21, no. 9, pp. 526–532, 2016.
[5] “Jackpot history,” 2019.
[6] M. Keneally, “How to pick your powerball lottery numbers,” Jan 2016. https://abcnews.go.com/US/pick-powerball-lottery-numbers/story?id=36113250
[7] N. Wells, “It’s not a good idea to buy quick pick,” Jan 2016.
[8] D. E. Knuth, The Art of Computer Programming, Volume 2 (3rd Ed.): Seminumerical Algorithms. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1997.
[9] M. Rocheleau, “More than half of powerball tickets sold this time will be duplicates,” Jan 2016.
We describe a process of converting an integer n (the rank) into a sequence of numbers (a lottery ticket). The reverse, going from ticket to rank, can be done in a similar manner, with the steps essentially reversed. Recall that a lottery ticket consists of 5 integers from 1 to 69 (white balls), and a sixth integer ranging from 1 to 26 (the “powerball”). In this case, n must be at least 0 and less than 292,201,338. We first consider generating just the white balls in the range from 0 to 68 (we simply add 1 to each ball at the end).

Our approach is to generate each number sequentially, keeping track of a lower bound to ensure a strictly increasing order. Let GenTicket(n,l,s) be a function where n is the rank in question and l is a lower bound for the numbers we are allowed to use, which outputs a sequence of s integers in strictly increasing order with values at least l and less than h (globally provided). To generate the white balls, we would call GenTicket(n,0,5), where n is the rank and h is globally provided as 69. We can define GenTicket as follows, where
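Binom(n,k) counts the number of combinations to choose k out of n objects:

```python
from math import comb

h = 69  # global h: upper bound (exclusive) of the 0-indexed ticket numbers

def Binom(n, k):
    # Number of ways to choose k out of n objects.
    return comb(n, k)

def GenTicket(n, l, s):
    # Unrank n into s strictly increasing integers in the range [l, h).
    if s == 1:
        return [n + l]
    i = 1
    # Candidate first value l + i - 1 heads a block of Binom(h - l - i, s - 1)
    # tickets; skip whole blocks until n falls inside one.
    while n >= Binom(h - l - i, s - 1):
        n -= Binom(h - l - i, s - 1)
        i += 1
    return [l + i - 1] + GenTicket(n, l + i, s - 1)
```

Intuitively, we use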
Binom to determine how many tickets there are starting with each candidate first number, and continuously reduce the remaining rank until we know the range in which the first number should lie. Then, we can recursively compute the rest of the ticket.

For the remaining powerball, we simply divide the integer n by the total number of possible white-ball combinations (11,238,513 for Powerball) to get the powerball number; the remainder is the rank of the white balls. As long as n is within the possible number of tickets, this will compute the appropriate “level” in the ticket space.

Note that the first (n = 0) ticket would be (1, 2, 3, 4, 5, 1) and the last (n = 292,201,337) would be (65, 66, 67, 68, 69, 26). To compute the ticket with rank n = 100,000,000, we first divide n by 11,238,513 to get the Powerball number, which is 8 (before adding 1). Next, we find the remainder of n when divided by 11,238,513, which is 10,091,896; this is the rank for which we have to compute the values of the white balls. We go step by step through the process to show how intermediate values are determined.

Our starting point:
(?, ?, ?, ?, ?, 8)
n: 10091896, lower bound: 0, size: 5
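The intermediate states of this walk-through can be reproduced with a tracing variant of GenTicket. This Python sketch assumes h = 69 and 0-indexed white balls, as described above; the name gen_ticket_trace is ours. It records (n, lower bound, size) at every recursive call, so each line of output is one step of the walk-through.

```python
from math import comb

h = 69  # upper bound (exclusive) of the 0-indexed white-ball values

def gen_ticket_trace(n, l, s, steps):
    """Same unranking as GenTicket, but records (n, lower bound, size) per call."""
    steps.append((n, l, s))
    if s == 1:
        return [n + l]
    i = 1
    while n >= comb(h - l - i, s - 1):
        n -= comb(h - l - i, s - 1)
        i += 1
    return [l + i - 1] + gen_ticket_trace(n, l + i, s - 1, steps)

steps = []
white = gen_ticket_trace(10_091_896, 0, 5, steps)
for n, l, s in steps:
    print(f"n: {n}, lower bound: {l}, size: {s}")
print(white)
# n: 10091896, lower bound: 0, size: 5
# n: 75142, lower bound: 25, size: 4
# n: 5436, lower bound: 33, size: 3
# n: 67, lower bound: 47, size: 2
# n: 7, lower bound: 51, size: 1
# [24, 32, 46, 50, 58]
```

At each step the chosen value becomes the new lower bound minus one, which is what keeps the sequence strictly increasing.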
In the end, we increment each value by 1 so that every value starts at 1, and we find that the ticket with rank n = 100,000,000 is (25, 33, 47, 51, 59, 9).
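The reverse direction, ranking a ticket, can be sketched by summing the same binomial block sizes that GenTicket skips over. This Python version is our own illustration (the appendix only notes that ranking reverses the steps); it assumes a valid ticket given as five white balls followed by the Powerball, all 1-indexed, and performs no range checking.

```python
from math import comb

WHITE_MAX = 69   # white balls are numbered 1..69
WHITE_COUNT = 5  # five white balls per ticket
WHITE_COMBOS = comb(WHITE_MAX, WHITE_COUNT)  # 11,238,513 white-ball combinations

def rank_ticket(ticket):
    """Rank a Powerball ticket (five white balls, then the Powerball; 1-indexed).

    Mirrors GenTicket in reverse: every candidate value v skipped at position p
    accounts for comb(WHITE_MAX - 1 - v, WHITE_COUNT - 1 - p) tickets.
    """
    *white, powerball = ticket
    white = sorted(w - 1 for w in white)  # 0-indexed, strictly increasing
    rank = 0
    lower = 0
    for p, value in enumerate(white):
        for v in range(lower, value):
            rank += comb(WHITE_MAX - 1 - v, WHITE_COUNT - 1 - p)
        lower = value + 1
    return (powerball - 1) * WHITE_COMBOS + rank

print(rank_ticket((1, 2, 3, 4, 5, 1)))        # 0
print(rank_ticket((65, 66, 67, 68, 69, 26)))  # 292201337
print(rank_ticket((25, 33, 47, 51, 59, 9)))   # 100000000
```

The three calls recover the first ticket, the last ticket, and the worked example above, so this ranking is consistent with the unranking procedure on those inputs.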