Introducing Bayesian Analysis with m&m's®: an active-learning exercise for undergraduates
Gwendolyn Eadie, Daniela Huppenkothen, Aaron Springford, Tyler McCormick
a Department of Astronomy, Box 351580, University of Washington, Seattle, WA 98195-1580
b eScience Institute, Campus Box 351570, U.W., 3910 15th Ave NE, Seattle, WA 98195
c DIRAC Institute, Box 351580, University of Washington, Seattle, WA, USA 98195-1580
d Weyerhaeuser Company, 220 Occidental Ave S, Seattle, WA 98104
e Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322
f Department of Sociology, University of Washington, Seattle, WA
Accepted to the Journal of Statistics Education on 28 March 2019
Corresponding author: [email protected]

Abstract

We present an active-learning strategy for undergraduates that applies Bayesian analysis to candy-covered chocolate m&m's®. The exercise is best suited for small class sizes and tutorial settings, after students have been introduced to the concepts of Bayesian statistics. The exercise takes advantage of the non-uniform distribution of m&m's® colours, and the difference in distributions made at two different factories. In this paper, we provide the intended learning outcomes, lesson plan and step-by-step guide for instruction, and open-source teaching materials. We also suggest an extension to the exercise for the graduate level, which incorporates hierarchical Bayesian analysis.

KEYWORDS: Education, Bayesian Methods, Inference, Active-learning, Eliciting Priors
We have developed an active-learning exercise for upper-year undergraduates that applies Bayesian analysis to m&m's® candy. It is a fun activity that can be completed in a tutorial or small classroom setting within a 50-80 min class. Part of the exercise relies on the fact that m&m's® are made at two different factories, and that the colour distributions produced at these factories are different. The exercise may also be extended to the graduate level as a way to introduce and practically apply hierarchical Bayesian analysis (Section 8 and Appendix A).

In short, the activity involves giving each student a bag of m&m's® (their data), and guiding them through an exercise to perform Bayesian analysis in order to:

1. Infer the probability of drawing a blue m&m's® from a new bag of m&m's®, given a likelihood, prior, and their data.
2. Predict the factory from which the m&m's® were produced, based on the posterior distribution for the entire class' data.

The exercise is primarily meant for undergraduates who have been introduced to Bayesian statistics, but who have not yet applied Bayesian inference to a real problem. The exercise is also relevant to anyone new to Bayesian inference who has a quantitative background. The mathematics in the exercise involves probability density functions that are both analytic and readily available in many open-source software packages. Thus, it is straightforward for students to simplify the likelihood times prior on the fly, and also write computer code to plot their results immediately in class. In the Github repository associated with this paper, we also provide R and Python scripts for the instructor and/or student to complete the exercise. If class time allows, then it might be beneficial for students to write their own scripts.

In the next section of this paper, Section 2, we describe the pedagogical motivation and development of the Bayesian m&m's® exercise.
In Section 3, we describe the true colour distribution of m&m's® as produced by the MARS company, and how this relates to the exercise. The lesson plan is presented in Section 4, which includes an overview (4.1), a list of intended learning outcomes (4.2), and a detailed step-by-step outline (4.3) with suggested discussion questions and strategies for implementation of the exercise in classrooms. Next, we present the actual posterior distributions found by an undergraduate class and by a seminar for graduate students, postdocs, and faculty. Following this, we briefly describe and provide links to the publicly available software instructors might like to use while running our active-learning exercise (Section 7). Finally, we present two extensions to the m&m's® exercise that are intended for more advanced (e.g., graduate-level) students (Section 8 and Appendix A).

By the time students are introduced to Bayesian analysis, they have usually had several classes on the frequentist perspective. Therefore, it is important to provide students with a concrete way to conceptualize the Bayesian framework, and this can be achieved through active-learning. The m&m's® exercise presented in this paper provides such an activity.

Author GE, in collaboration with AS, originally created the Bayesian m&m's® lesson and activity for a guest lecture in a third-year undergraduate astronomy class focused on statistics. The students had already been exposed to the Bayesian perspective but needed an interactive example to solidify the basic concepts. The class enrollment was approximately 10 students, and class time was 80 minutes.
Thus, the learning environment was well-suited for an active-learning exercise that would engage students and allow them to apply what they had learned about Bayesian statistics to a tangible example.

While searching for an interactive learning tool for introductory Bayesian inference, GE found blog posts describing how the colour distribution of m&m's® can be used as a teaching tool for frequentist statistics. There is an m&m's® exercise using Bayes' theorem in Downey (2013), but the content is quite brief and the intended learning outcome (ILO) is somewhat unclear. The goal for the latter is to predict whether an m&m's® came from a bag produced in 1994 or 1996 (the colour distributions changed in 1995, when blue m&m's® were introduced). Thus, it seems the ILO for the latter exercise was not to introduce Bayesian inference, but possibly to practice calculating probabilities.

A handful of publications describing learning activities with m&m's® also exist (e.g., Alexander and SherriJoyce, 1994a,b; SherriJoyce and Alexander, 1994; Fricker, 1996; Dyck and Gee, 1998; Lin and Sanders, 2006; Froelich and Stephenson, 2013; Schwartz, 2013). As a group, these articles cover a variety of topics in statistics, such as regression and correlation, analysis of variance, sampling distribution of the mean, design of experiments, and chi-squared goodness-of-fit tests. Other activities have also been suggested for more general mathematics education, from population modeling (e.g., Winkel, 2009) to memoryless processes and hypergeometric functions (e.g., Badinski et al., 2017).

Albert and Rossman (2009) — a workbook intended for an entire introductory course on statistics through the Bayesian perspective — includes a Bayesian exercise using m&m's®, entitled, "What proportion of m&m's® are brown?".
This exercise resides in the chapter "Learning about models using Bayes' rule", and is not meant to be a comprehensive, stand-alone activity about Bayesian inference. The activity first states that a bag of m&m's® has 3 brown candies out of 10 (this is the "real" data). Next, a table is presented that shows mock m&m's® data that were simulated from four different factory models. The students are tasked with comparing the "real" m&m's® data to the simulations, along with their prior assumptions, in order to determine the best factory model given their data. Thus, the exercise is not a thorough introduction to Bayesian inference, and instead plays a small part in a workbook that has "[...] its emphasis on active learning and its use of the Bayesian viewpoint to introduce the basic notions of statistical inference" (Preface, Albert and Rossman, 2009).

In summary, a comprehensive Bayesian example using m&m's® candies does not seem to exist in the education literature, nor is one publicly available online, even though m&m's® have been used as a teaching tool for statistics since the 1990s. Thus, we designed this comprehensive active-learning exercise with m&m's® to help students learn and practice the concepts of Bayesian analysis.

GE first implemented our Bayesian m&m's® activity in a third-year undergraduate class and received very positive feedback from students. The exercise has also been presented and informally relayed at conferences in both astronomy and statistics, with consistently enthusiastic responses. We have since formalized the lesson plan, learning outcomes, and teaching materials, and made them available through this manuscript.
We have also developed and included an extension to the m&m's® exercise for more advanced classes that can be used to introduce concepts in hierarchical Bayesian analysis (Section 8 and Appendix A).

Our goal is that the m&m's® exercise may be widely used and improved upon by the greater statistics community and other quantitative disciplines that teach and use Bayesian analysis.

https://blogs.sas.com/content/iml/2017/02/20/proportion-of-colors-mandms.html
https://joshmadison.com/2007/12/02/mms-color-distribution-analysis/
https://katedegner.wordpress.com/tag/mm-color-distribution/

Figure 1: The m&m's® colour distributions produced at the New Jersey and Tennessee factories, respectively. The percentage of blue, orange, and green m&m's® differ the most between the two factories. This chart originally appears in Purtill (2017), and was created by Atlas (2017), using data from Wicklin (2017).

The active-learning exercise we present here takes advantage of an important fact about the production of m&m's®: two factories of the MARS Company (one in Hackettstown, New Jersey, the other in Cleveland, Tennessee) make m&m's®, but these factories produce different distributions of m&m's® colours! As shown in Figure 1, the New Jersey and Tennessee factories make significantly different percentages of blue, orange, and green m&m's®. The New Jersey and Tennessee factories make 25% and 20.7% blue m&m's® respectively, and this is the colour we use throughout the exercise. Of course, orange or green could instead be used and should give similar results. Due to the design of our activity, it is imperative to keep the true colour distribution a secret from students until the end of the exercise.
The exercise assumes most students have prior knowledge about m&m's® (e.g., they have eaten them), but that they are unaware of the nonuniform colour distribution. In our experience, students are surprised to find out that the m&m's® colour distributions are actually nonuniform, and that they also differ between the two factories. We have conducted this exercise with New Jersey and Tennessee m&m's®, and the posterior distributions predicted the correct factory in both cases. We used peanut m&m's®, but the exercise should work with traditional m&m's® too (and would also be more allergy-friendly). Admittedly, the exercise also relies on the factories not changing their production lines for the foreseeable future.
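It is worth seeing how much a single bag can say about the factory before building the full Bayesian machinery. The following minimal Python sketch (not from the paper's repository; the bag counts are invented for illustration) compares the binomial likelihood of a bag's blue count under the two published blue rates:

```python
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability of y successes in n trials with success rate p."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

# Published blue rates: New Jersey 25.0%, Tennessee 20.7% (Figure 1).
# Hypothetical single bag: 7 blue m&m's out of 23 total (made-up counts).
y, n = 7, 23
likelihood_nj = binom_pmf(y, n, 0.250)
likelihood_tn = binom_pmf(y, n, 0.207)
ratio = likelihood_nj / likelihood_tn  # > 1 means the bag favours New Jersey
print(round(ratio, 2))
```

A ratio only modestly above 1 for a single bag helps explain why pooling the whole class's data is part of the exercise.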
The entire lesson consists of interactive lecturing, discussion, and active-learning. We recommend not handing out the m&m's® candy too early in the lesson, lest students become distracted, or worse, eat the evidence before it is recorded! In Sections 4.1-4.3, we provide an overview for the exercise, ILOs, and detailed steps and discussion questions.

The lesson begins with a "hook" to engage students. A bag of m&m's® is shown to the class, and they are asked to think about the distribution of colours inside. Next, they are asked how they would predict the percentage of blue m&m's® produced at the factory, using only a single bag of m&m's®. The instructor leads a discussion and points out the benefits of using Bayesian inference, because it can incorporate the students' prior knowledge about m&m's®. Finally, the students are presented with the exercise: Use Bayes' theorem and a single bag of m&m's® as data to predict the percentage of blue m&m's® produced at the factory.

To help the students start the problem, Bayes' theorem is reviewed and the instructor helps students decide what likelihood will be appropriate for their study. The instructor also elicits prior information from the class, and helps students quantify the prior distribution based on this information. (Of course, this could instead be presented in reverse order, first eliciting prior information and then introducing a likelihood.) The instructor can either select a method of prior elicitation in advance, or can present different options to the class for consideration and discussion if wanted.

Next, students work out the product of the likelihood and prior to obtain an analytic form of the posterior distribution. If time and computer resources are available, they may also write and prepare a short computer script to plot this distribution. Alternatively, they may use the Jupyter Notebook provided as supplementary material to this manuscript.
Next, each student is given a bag of m&m's® to open and inspect. They record the total number of m&m's® and the number of each colour, and then eat the evidence if they so wish.

Students now plot the posterior distribution given their data using either their computer script or the one provided, and then discuss the results with a partner. Following this, the instructor pools the m&m's® data from the entire class and creates a posterior distribution for all of the m&m's®. A discussion and question period is then led about the effect of more data on the posterior distribution.

Next, the surprise twist is revealed (Figure 1), and students are asked to infer which factory produced their bag of m&m's®, using their posterior distribution. They can also compare their inference to what they would infer given the entire class' m&m's®. After the students have arrived at an answer, the instructor provides the factory codes for the two factories. Students can check the lot number on the back of their m&m's® packages for verification. In our experience, a class size of ten or more students seems to provide enough data for the mode of the posterior to accurately predict the percentage of blue m&m's® and the factory from which the candy originated.

It should be noted that our exercise may also provide an opportunity to compare the Bayesian formalism to the frequentist approach. Exactly how to make these connections will depend on the students' level and depth of exposure to both topics, so we leave this to the discretion of the instructor. However, we strongly suggest that any such comparisons be done in a follow-up lesson or in a homework assignment, as adding this material to our exercise would be too much to fit into one class.
This lesson assumes that students have been exposed to the idea of probability distributions and have seen Bayes' theorem, at least in passing. They should also have an understanding of parameters and their role in defining the shape of a probability distribution. As mentioned previously, we also assume that students do not have prior knowledge about the surprise twist mentioned in Section 3.

The intended learning outcomes (ILOs) are as follows. By the end of this lesson, students should be able to:

1. Recall Bayes' theorem and identify the likelihood, prior, and posterior.
2. Recognize and quantify prior information.
3. State the conjugate prior to the binomial distribution, and list the hyperparameters.
4. Understand the term hyperparameters.
5. Calculate the posterior distribution for the probability of drawing a blue m&m's® from a new candy bag, given the m&m's® data and the prior distribution.
6. Write a working computer script to plot and compare the prior distribution to the posterior distribution obtained with their data (if computing resources are available).
7. Perform inference based on the Bayesian posterior distribution.
8. Know the appropriate terminology for reporting results from Bayesian analysis.
9. Connect this simple example to similar real-world practical applications from the students' domain of study.

In many of the steps listed below, we include suggested discussion questions as bullets.

1. As a "hook", show a bag of m&m's® to the class and explain that they are going to infer the probability of getting a blue m&m's® using Bayes' theorem. Each person will get a bag of m&m's® as their data, and they will use this data with a probability model to infer the percentage of blue m&m's® produced in the factory.

2. Ask students prompting questions to help them formulate a quantitative picture of the problem.
   • What kind of data will we have when we open the m&m's® bags? Is it numerical? Categorical? Continuous?
   • What's one way to estimate the percentage of blue m&m's® in the bag? Are there any pitfalls to the approach?
   • If you observe zero blue m&m's® in your bag, then what is your estimate of the fraction of blue m&m's® in the population? Is this realistic?
   • How will we record the data from the m&m's® bags?

3. Review Bayes' theorem and its constituent parts.

4. Lead a discussion about what likelihood might best describe the probability of drawing a blue m&m's® from the bag, given their answers to the questions in Item 2 above. Ask for student input and guide the class to use a binomial distribution.
   • What assumptions are made by the likelihood?
   • Are we sampling with replacement or without?
   • After recording the data, can we eat it or should we wait until the analysis is complete?

5. Return to their answers in Item 2 to talk about prior information and to develop an approximate distribution for their prior belief in the percentage of blue m&m's®. Some questions for eliciting prior information could include:
   • How do you think m&m's® are made in the factory?
   • Do you think the manufacturing process affects the proportion of blue m&m's® in a typical bag?
   • Do you think m&m's® are well mixed before they are put into bags?
   • What do you think the percentage of blue m&m's® is, at the factory? Do you think every bag will have this exact percentage?

6. Help the students quantify their prior information.
   • Where do you think most of the prior probability should be?
   • What parts of parameter space would you like to apply a very low probability?
   • Sketch out a distribution that encompasses your belief about the percentage of blue m&m's® in the bag.

7. If conjugate priors are not a new concept to students, then briefly discuss why they are useful. If conjugate priors are new, introduce the beta distribution and its parameters, and explain its conjugacy to the binomial distribution.

8. Explore the properties of the beta distribution given its hyperparameters. If computer resources are available, students can interactively plot the prior distribution until they achieve a prior distribution that quantifies their prior belief about the proportion of blue m&m's®.
   • What happens to the beta distribution as both hyperparameters go to one? Do you recognize this distribution?
   • What happens when the hyperparameters are equal?
   • What values of the hyperparameters best approximate the prior distribution sketched in Item 6?

(A well-defined method of prior elicitation from the literature could be used to guide this part. A couple of entry points to the literature are Garthwaite et al. (2005) and Kadane and Wolfson (1998). Prior elicitation is a broad topic, but one worth introducing to the students if possible; in our experience, prior elicitation is rarely covered in coursework, but can be vitally important to statistical practice.)

9. Choose the hyperparameters, thus defining the prior distribution.
   • Why is it important that we define the prior distribution before looking at the data (i.e., before opening the bag of m&m's®)?
   • How many m&m's® worth of information does your prior contain?

10. As a class, decide on the hyperparameter values for the instructor's prior distribution (which will be used with all the class' data). One way to achieve class agreement is for the instructor to interactively plot the prior distribution for hyperparameter values suggested by the students, until a consensus is reached.
   • What is the mean of the prior distribution that was chosen?
   • How does the prior distribution compare to a uniform distribution? How does the prior distribution compare to a simple assumption that the probability of drawing a blue m&m's® is 1/6?

11. Students work out the product of likelihood and prior, and identify the kernel of this distribution. If computer resources are available, they write a short script to take in their data and plot the posterior distribution. Test their script with some toy data.

12. Open the bags of m&m's® and look at the data. Make sure students record not only the number of blue m&m's®, y, but also the total number of all m&m's®, n.
   • How does the number of blue m&m's® in your bag compare to your neighbor's bag?
   • Do you think there are enough m&m's® in one bag to correctly infer the percentage of blue m&m's® produced at the factory?

13. (If computer resources are not available, then skip to the next step.) Students plot the posterior distribution given their data and prior information, and compare it to the prior distribution.
   • Where is most of the probability for the percentage of blue m&m's®?
   • How has the posterior changed from the prior distribution?
   • Is this the result you expected, given that there are six different colours of m&m's®?
   • How does your posterior distribution compare to your neighbor's?

14. Instructor compiles the class' data for y and n.
   • What assumptions have we made when combining data from different bags?
   • Do you have any predictions about how the posterior distribution might change in light of more data?

15. Instructor produces a plot comparing the class posterior distribution to the class' prior distribution.
   • Where is most of the probability for the percentage of blue m&m's®?
   • How does the posterior distribution for the data from the whole class compare to your own?
   • How does the prior compare to the posterior, now that there is more data?
   • Has the shape of the posterior changed with more data? Did it change in the way you expected?
   • What is the expected value (mean) and variance of the new posterior distribution for the percentage of blue m&m's®?
   • Is this the result you expected for the percentage of blue m&m's®, given that there are six different colours of m&m's®? What might you infer about the colour distribution of m&m's® from these results?
   • Discuss the best way to report and display these results.

16. (Optional) Repeat the analysis for each of the other colours of m&m's® (red, orange, green, yellow, and brown). Repeating the analysis may help students master the practical components of the exercise which are generally useful, such as identifying the variable of interest, checking their computer code, and presenting results through graphics.

17. Reveal the surprise twist: Show the colour distributions presented in Figure 1, and ask students to discuss with one another and infer from the posterior distribution which factory their m&m's® came from.

18. Once they have performed inference, then they can look at the back of the m&m's® bag to see if their inference is correct. The lot number will show either CLV (Cleveland, Tennessee factory) or HKP (Hackettstown, New Jersey).

Bayes' theorem states that the probability of θ given y is

p(θ | y) = p(y | θ) p(θ) / p(y),   (1)

where p(y | θ) is the likelihood. For the probability of drawing a blue m&m's® from the bag, we use a binomial distribution for the likelihood

p(y | θ) ∝ θ^y (1 − θ)^(n−y),   (2)

where y is the number of successes (blue m&m's®), n is the total number of m&m's® drawn from the bag, and θ is the percentage of blue m&m's® produced at the factory. More details about eliciting prior information from the students are presented in Section 4.3. After eliciting prior information and sketching an approximate prior distribution, we set out to quantify this information.
We use the conjugate prior to Equation 2, the beta distribution, for the prior on θ:

p(θ) ∝ θ^(α−1) (1 − θ)^(β−1),   (3)

with hyperparameters α and β. Equation 3 not only simplifies the m&m's® example for a time-constrained class, but also provides the opportunity to review the concept of a conjugate prior. Now that the prior has been defined, the posterior distribution is proportional to

p(θ | y) ∝ θ^(y+α−1) (1 − θ)^(n−y+β−1).   (4)

We find it useful for students to simplify the likelihood and prior on their own, and then use think-pair-share to identify the form of the posterior distribution. Once students have recognized that the posterior is also a beta distribution, they may use software to plot the posterior distribution given their data (i.e., their m&m's®).
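Because the beta prior is conjugate to the binomial likelihood, the update in Equation 4 is just parameter arithmetic: the posterior is Beta(y + α, n − y + β). A minimal standalone Python sketch of this update (the R and Python scripts in the repository are the canonical versions; the single-bag counts below are invented for illustration):

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, 0 < theta < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)

def posterior_params(y, n, alpha, beta):
    """Beta(alpha, beta) prior + binomial data (y blue out of n) -> Beta posterior."""
    return alpha + y, beta + n - y

# Hypothetical single bag: 5 blue out of 23, with a Beta(2, 9) prior.
a_post, b_post = posterior_params(y=5, n=23, alpha=2, beta=9)
post_mean = a_post / (a_post + b_post)  # = (y + alpha) / (n + alpha + beta)
print(a_post, b_post, round(post_mean, 3))
```

Students can evaluate beta_pdf on a grid of θ values to plot the prior against the posterior, or swap in scipy.stats.beta (Python) or dbeta (R) for the density.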
Figure 2: The prior distribution for the probability of drawing a blue m&m's® from a bag of m&m's®, as determined by the class of undergraduates (solid curve) and by the seminar of graduate students, postdocs, and faculty (dashed curve). In general, students believed it was less likely to draw a blue m&m's® than did the seminar members.

We implemented the exercise presented in this paper in a third-year undergraduate astronomy class, and in a seminar for graduate students, postdocs, and faculty at the Institute for Data Intensive Research in Astrophysics & Cosmology (DIRAC). In this section, we show the posterior distributions that resulted from both instances.

In our class of approximately 10 students, it was agreed that a beta distribution with hyperparameters α = 2 and β = 9 best described their prior information about the probability of drawing a blue m&m's® from a bag. We reached this agreement through open discussion and the questions suggested in Items 6-10 of Section 4.3. In the seminar, the graduate students, postdocs, and faculty used hyperparameters α = 3 and β = 7. Both prior distributions are shown in Figure 2.

The posterior distribution for the first author's data (i.e., one bag of m&m's®), and the posterior distribution for the class' data from all bags, are shown with the prior distribution in Figure 3. The first author's data are included in the supplementary material. The ensemble data consisted of 100 m&m's®, 25 of which were blue. The class inferred from the total posterior distribution and its mean that the m&m's® must have originated from the New Jersey factory. We checked the back of the packages, and indeed they showed the code HKP, indicating they came from Hackettstown, NJ (Figure 4).

For the class, we had purchased individual packages of peanut m&m's® from a local store in Seattle, WA.
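With the class numbers reported here (prior Beta(2, 9); 25 blue out of 100 pooled m&m's®), the pooled posterior can be checked in a few lines of Python; the closing comparison against the published factory rates is our own illustrative addition, not a step from the lesson plan:

```python
# Class prior and pooled class data as reported in the text.
alpha, beta = 2, 9
y, n = 25, 100
a_post, b_post = alpha + y, beta + n - y  # Beta(27, 84) posterior
post_mean = a_post / (a_post + b_post)
post_mode = (a_post - 1) / (a_post + b_post - 2)
# Published blue rates: New Jersey 25.0%, Tennessee 20.7%.
closer = "NJ" if abs(post_mean - 0.250) < abs(post_mean - 0.207) else "TN"
print(round(post_mean, 3), round(post_mode, 3), closer)
```

The posterior mean of about 0.24 sits closer to the New Jersey rate, consistent with the HKP lot codes the class found.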
For the seminar, however, we expected more than twenty people, so instead we bought a large box of peanut m&m's® packages from Costco®.
Figure 3: Posterior distributions for the instructor's data (dashed curve), and for the combined data from the entire class (solid curve). The prior distribution (dotted curve) and posterior mean for the class' posterior distribution (red vertical dashed-dotted line) are also shown for reference. The m&m's® used were produced in the Hackettstown, New Jersey factory (HKP).

Figure 4: A picture of the lot number from a package in the third-year undergraduate class. The code HKP indicates the package came from the Hackettstown, New Jersey factory.
Figure 5: Posterior distributions for the DIRAC Institute seminar and the first author's bag of m&m's®. The m&m's® used for this seminar were produced in the Cleveland, Tennessee factory (code CLV), and the data used to generate this figure are included as supplemental material.

Figure 6: A picture of the lot number from a package from the seminar. The code CLV indicates the package came from the Cleveland, Tennessee factory.

The results from the DIRAC seminar are presented in Figure 5. In this case, the first author was very surprised to see a different result than the undergraduate class, as they had expected the packages to also come from New Jersey. As it turned out, the m&m's® packages in the large box were produced in Cleveland, Tennessee (Figure 6)! The data are included in a supplementary csv file entitled DIRAC mmsTallyCLVTennessee. Interestingly, Seattle, WA has packages from both factories, and perhaps other North American cities do too. This bodes well for instructors looking to purchase m&m's® from a specific factory (e.g., when they wish to implement the advanced version of this exercise in Section 8).

Ideally, the students should be given a chance to write computer scripts to plot the prior and posterior distributions in this exercise, in their chosen software (e.g., R, Python, SAS, etc.). If class time is short, then it might expedite the exercise to use the pre-authored scripts we provide on the Github repository accompanying this paper.

For higher-level (e.g., graduate-level) classes, there are several ways to extend this exercise to introduce and encompass more advanced concepts.
We explored extensions that use the masses of the m&m's®, that incorporate quality control levels at the factory, that consider multiple colours, or that include the colour distributions of m&m's® produced at different factories. The latter two most naturally follow from the original exercise, and so we describe these two ideas in more detail below.

A straightforward extension generalizes the case of two possible outcomes of the data (blue versus not-blue m&m's®) to the case of six possible outcomes (i.e., blue, orange, green, yellow, red, or brown). By including a parameter for the probability of each colour, a multi-dimensional posterior distribution is naturally introduced. More specifically, modeling all six colours of m&m's® simultaneously involves the multinomial distribution and its conjugate prior, the Dirichlet distribution — both of which have a vast range of applications in domains with categorical data (e.g., topic modeling; see e.g., Blei 2012 for an introduction).

An even more advanced extension not only incorporates multiple colours within a bag, but also the fact that m&m's® bags may come from one of two factories, and that the two factories (Tennessee and New Jersey) produce different colour distributions. Incorporating the individual bags, the class collection of bags, and the two factories into the exercise provides a natural basis for introducing and applying Bayesian hierarchical modeling and mixture models.

Instead of creating a joint data set from all bags in the class and obtaining a posterior distribution (which assumes all bags of m&m's® originated in the same factory), the students are asked to write down a joint model for all bags, with latent variables to be inferred along with the overall distributions.
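The six-colour extension keeps the same conjugate-update structure as the two-outcome exercise, just with a Dirichlet prior in place of the beta. A minimal Python sketch of the multinomial-Dirichlet update (the bag counts and the flat Dirichlet prior are illustrative choices, not values from the paper):

```python
# Colour order: blue, orange, green, yellow, red, brown.
counts = [5, 6, 4, 3, 2, 3]      # hypothetical colour counts for one bag
prior = [1.0] * 6                # flat Dirichlet(1, ..., 1) prior
# Conjugacy: Dirichlet prior + multinomial counts -> Dirichlet posterior.
posterior = [a + c for a, c in zip(prior, counts)]
total = sum(posterior)
post_means = [a / total for a in posterior]  # E[theta_i | data]
print([round(m, 3) for m in post_means])
```

Each colour's posterior mean plays the same role that (y + α)/(n + α + β) plays in the two-outcome version, and the means sum to one by construction.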
Students must jointly infer the colour distributions produced at each factory, the latent assignment of each bag to a factory, and the mixture proportions of factories within the set of bags in the class. The hierarchical nature of this model lies in the joint inference of the latent mixture assignments with the mixture proportions. In addition, the colour distributions are inferred as global population parameters, while the joint information from all bags will shrink the posteriors on the latent mixture assignment variables. The specifics of the hierarchical model for the m&m's® exercise are shown in Appendix A, and the code for this exercise is available at https://github.com/gweneadie/BayesianMandMs.

The hierarchical version of the m&m's® exercise is appealing because the variables' true values can be known by the instructor: the colour proportions from each factory are relatively well known (Figure 1), as long as the MARS company does not change their practices, and the instructor can deliberately set the mixture proportions of the bags (depending on the availability of m&m's® from both factories). In the context of a longer course on Bayesian analysis, this exercise can also be used to introduce related advanced concepts such as probabilistic graphical models, as well as Markov Chain Monte Carlo methods and Gibbs sampling.

Many successful active-learning activities that use m&m's® exist for topics in frequentist statistics and mathematics education. The literature, however, was lacking a Bayesian m&m's® example. We hope that this exercise fills the gap.

Introducing Bayesian analysis to undergraduate students, regardless of their field of study, requires a discussion of Bayes' theorem, probability distributions, and prior distributions. Examples that are meant to illuminate these concepts often rely on objects or systems such as playing cards, dice, or urns.
Although these examples can be useful and familiar, we feel they are overused and have become repetitive both for students and instructors. Most undergraduates have already seen examples using playing cards, dice, and urns in high-school-level mathematics classes, so an entirely different type of example may better entice student engagement. Students must also have the opportunity to apply what they are learning and, in Bayesian analysis in particular, develop practical skills. We hope that this m&m's® example provides an enticing (and tasty!) alternative to traditional examples.

It would be beneficial, and probably more cost-effective, if instructors could source m&m's® from the two MARS factories directly. We have not contacted the MARS Company about direct purchasing, but should they read this manuscript, we are happy to discuss setting up a program for discounted m&m's® offers in the name of statistical education.

Acknowledgements
The authors thank the editors and the anonymous referees whose comments and suggestions helped improve this manuscript.

This research was funded by an eScience Institute Postdoctoral Fellowship to the first author, G. Eadie. G. Eadie and T. McCormick acknowledge support from the Moore, Sloan, and Washington Research Foundations at the eScience Institute, University of Washington. G. Eadie and D. Huppenkothen acknowledge support from the University of Washington Institute for Data Intensive Research in Astrophysics and Cosmology (DIRAC). The DIRAC Institute is supported through generous gifts from the Charles and Lisa Simonyi Fund for Arts and Sciences, and the Washington Research Foundation. The authors would like to thank the DIRAC Institute members for their useful discussions relating to this exercise, and for their willingness to eat m&m's® on behalf of pedagogical development. The authors would also like to thank C. Morrison for identifying the location of the lot code on the candy packages during the DIRAC seminar.
References
Albert, J. and Rossman, A. (2009). Workshop Statistics: Discovery with Data, A Bayesian Approach. Springer-Verlag New York.

Alexander, M. and SherriJoyce, K. (1994a). Colored Candies and PROC MACONTROL: A Sweet Way to Produce Chi-Square Control Charts Part 2: Statistical Presentation. SUGI 19: Proceedings of the Nineteenth Annual SAS Users Group International Conference, pages 1145-1150.

Alexander, M. and SherriJoyce, K. (1994b). Colored Candies and SAS/QC: A Sweet Way to Produce Chi-Square Control Charts. SAS Conference Proceedings: SouthEast SAS Users Group 1994, pages 249-253.

Atlas, Purtill, C. (2017). M&M's Color Distribution, c.2017. Accessed: 2018-09-14.

Badinski, I., Huffaker, C., McCue, N., Miller, C. N., Miller, K. S., Miller, S. J., and Stone, M. (2017). The m&m game: From morsels to modern mathematics. Mathematics Magazine, 90(3):197-207.

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4):77-84.

Downey, A. (2013). Think Bayes. O'Reilly, Beijing, 1 edition.

Dyck, J. L. and Gee, N. R. (1998). A sweet way to teach students about the sampling distribution of the mean. Teaching of Psychology, 25(3):192-195.

Fricker, R. D. (1996). The mysterious case of blue m&m's. Chance, 9(4):19-22.

Froelich, A. G. and Stephenson, W. R. (2013). How much do m&m's weigh? Teaching Statistics, 35(1):14-20.

Garthwaite, P. H., Kadane, J. B., and O'Hagan, A. (2005). Statistical methods for eliciting probability distributions. Journal of the American Statistical Association, 100(470):680-701.

Kadane, J. and Wolfson, L. J. (1998). Experiences in elicitation. Journal of the Royal Statistical Society: Series D (The Statistician), 47(1):3-19.

Lin, T. and Sanders, M. S. (2006). A sweet way to learn DOE. Quality Progress, 38(88).

Purtill, C. (2017). A statistician got curious about m&m colors and went on an endearingly geeky quest for answers. Quartz. https://qz.com/918008/the-color-distribution-of-mms-as-determined-by-a-phd-in-statistics/.

Schwartz, T. A. (2013). Teaching principles of one-way analysis of variance using m&m's candy. Journal of Statistics Education, 21(1).

SherriJoyce, K. and Alexander, M. (1994). Colored Candies and Base SAS: A Sweet Way to Produce Chi-Square Control Charts Part 1: Tutorial. SUGI 19: Proceedings of the Nineteenth Annual SAS Users Group International Conference, pages 1470-1475.

Wicklin, R. (2017). The distribution of colors for plain M&M candies. https://blogs.sas.com/content/iml/2017/02/20/proportion-of-colors-mandms.html.

Winkel, B. (2009). Population modelling with m&m's. International Journal of Mathematical Education in Science and Technology, 40(4):554-558.
A Details of the Hierarchical Bayesian Model
In the hierarchical version of the exercise, there are two sets of parameters $\beta_1$ and $\beta_2$ (one for each factory, $f$), and each set represents the fraction of m&m's® per colour that are produced in Factories 1 and 2. A Dirichlet prior is placed on each set of parameters, with hyperparameters $\eta$. In addition, each bag $b$ out of the total sample of $B$ bags is given a latent categorical variable $z_b$ that assigns the bag to Factory 1 or Factory 2. This variable is drawn from a categorical distribution describing the probability of drawing a bag from either factory. The parameter for the mixture proportions is denoted $\theta$. The latter, too, has a Dirichlet prior, with hyperparameters $\alpha$. A graphical version of this model is shown in Figure 7.

Figure 7: A probabilistic graphical model describing the hierarchical approach to modeling $B$ bags of m&m's® from different factories $f = 1, 2$. The vector $c_b$ denotes the number of m&m's® of each of the six colours in a single bag $b$. $z_b$ are the latent assignments of each bag to a factory, $\theta$ indicates the mixture proportions for the two factories, and $\beta_f$ are the colour distributions for each factory. Finally, $\alpha$ and $\eta$ denote the hyperparameters for the Dirichlet priors on $\theta$ and $\beta_f$.

The full posterior distribution for the categorical mixture model described in Figure 7 can be written as

$$p\left(\{z_b\}_{b=1}^{B}, \{\beta_f\}_{f=1}^{2}, \theta \mid \{c_b\}_{b=1}^{B}, \eta, \alpha\right) = p(\theta \mid \alpha) \prod_{f=1}^{2} \left[ p(\beta_f \mid \eta) \right] \prod_{b=1}^{B} \left[ p(c_b \mid z_b, \beta_f)\, p(z_b \mid \theta) \right] / Z, \tag{5}$$

where

$$\theta \sim \mathrm{Dirichlet}(\alpha), \quad \beta_f \sim \mathrm{Dirichlet}(\eta), \quad z_b \sim \mathrm{Categorical}(\theta), \quad \text{and} \quad c_b \sim \mathrm{Multinomial}(z_b, \beta_f). \tag{6}$$
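The generative model in Equations (5) and (6) can be simulated forward, which lets the instructor check that inference recovers known truths. The sketch below uses pure Python with illustrative hyperparameters and bag sizes of our own choosing (not official MARS figures): it draws $\theta$ and the $\beta_f$ from their Dirichlet priors, samples bags, and then evaluates the posterior assignment probability for the first bag given the true $\theta$ and $\beta_f$, for which the multinomial coefficient cancels.

```python
import math
import random

random.seed(1)

# Illustrative sizes: colours, factories, bags, m&m's per bag
K, F, B, BAG_SIZE = 6, 2, 20, 56
alpha = [1.0] * F                  # Dirichlet hyperparameters for theta
eta = [1.0] * K                    # Dirichlet hyperparameters for each beta_f

def dirichlet(params):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in params]
    s = sum(g)
    return [x / s for x in g]

def categorical(probs):
    """Sample an index from a categorical distribution."""
    u, cum = random.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if u < cum:
            return i
    return len(probs) - 1          # guard against floating-point round-off

# Forward simulation following Equation (6): theta, beta_f, then (z_b, c_b)
theta = dirichlet(alpha)
beta = [dirichlet(eta) for _ in range(F)]
bags = []
for _ in range(B):
    z = categorical(theta)
    c = [0] * K
    for _ in range(BAG_SIZE):
        c[categorical(beta[z])] += 1
    bags.append((z, c))

# Posterior assignment for the first bag, given the true theta and beta:
# p(z_b = f | c_b) is proportional to theta_f * prod_k beta_{f,k}^{c_k}
z0, c0 = bags[0]
log_w = [math.log(theta[f]) + sum(n * math.log(beta[f][k])
                                  for k, n in enumerate(c0))
         for f in range(F)]
m = max(log_w)                     # subtract the max for numerical stability
w = [math.exp(lw - m) for lw in log_w]
post = [x / sum(w) for x in w]
print(f"true factory: {z0}, posterior over factories: "
      f"{[round(p, 3) for p in post]}")
```

In a full implementation, the same assignment step would sit inside a Gibbs sampler that alternates between updating $z_b$, $\theta$, and $\beta_f$; the simulation above is only the generative half of that loop.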