Enriching students' conceptual understanding of confidence intervals: An interactive trivia-based classroom activity
Xiaofei Wang
Department of Statistics, Yale University
Nicholas G. Reich
Department of Biostatistics and Epidemiology, University of Massachusetts at Amherst
Nicholas J. Horton
Department of Mathematics and Statistics, Amherst College
January 31, 2017
Abstract
Confidence intervals provide a way to determine plausible values for a population parameter. They are omnipresent in research articles involving statistical analyses. Appropriately, a key statistical literacy learning objective is the ability to interpret and understand confidence intervals in a wide range of settings. As instructors, we devote a considerable amount of time and effort to ensure that students master this topic in introductory courses and beyond. Yet, studies continue to find that confidence intervals are commonly misinterpreted and that even experts have trouble calibrating their individual confidence levels. In this article, we present a ten-minute trivia game-based activity that addresses these misconceptions by exposing students to confidence intervals from a personal perspective. We describe how the activity can be integrated into a statistics course as a one-time activity or with repetition at intervals throughout a course, discuss results of using the activity in class, and present possible extensions.
Keywords: uncertainty, calibrating confidence, subjective probability

Introduction
Confidence intervals are one of the most commonly used statistical methods to summarize uncertainty in parameter estimates from data analyses. However, both the formal concept of and intuition behind confidence intervals remain elusive to many students and data analysts. In the case of a 95% confidence interval for a proportion for a given sample size, one textbook definition is “95% of samples of this size will produce confidence intervals that capture the true proportion” (De Veaux et al.; 2011). Another textbook definition reads “A plausible range of values for the population parameter is called a confidence interval” and later specifies for 95% confidence “Suppose we took many samples and built a confidence interval [for the population mean] from each sample...then about 95% of those intervals would contain the actual mean” (Diez et al.; 2012). Studies have found that many students struggle with understanding definitions like these even after completing coursework at various levels (Fidler and Cumming; 2005; Kalinowski; 2010; Kaplan et al.; 2010).

In addition to a lack of intuition about the definition of a confidence interval, researchers have documented overconfidence in quantitative assessments of uncertainty in students and experts alike (Alpert and Raiffa; 1982; Soll and Klayman; 2004; Jørgensen et al.; 2004; Cesarini et al.; 2006). In this context, overconfidence has been defined as individuals having “excessive precision” in their beliefs about particular facts (Moore and Healy; 2008). One study showed that when subjects were asked to provide 98% confidence intervals relating to numeric facts, only 57.4% of their intervals captured the true value (Alpert and Raiffa; 1982).
This display of overconfidence is not limited to students and novices, but is also exhibited by field experts (Hynes and Vanmarcke; 1976; Christensen-Szalanski and Bushyhead; 1981).

The textbook definition of a confidence interval can be challenging to understand, and various pedagogical tools have been proposed to assist with comprehension. Some teachers have documented success by teaching confidence intervals with bootstrapping techniques (Maurer and Lock; 2015) or with visualizations of simulated samples (Cumming; 2007; Hagtvedt et al.; 2008). Behar et al. (2013) presented one analogy useful in understanding 95% confidence intervals: “It is like a person who tells the truth 95% of the time, but we do not know whether a particular statement is true or not.” Providing students with an opportunity to calibrate their intuitive understanding of what it means to be, say, 90% confident about a fact through short repetitive activities may enhance their ability to understand the strength of conclusions from a data analysis. In this manuscript, we describe an activity that aims to solidify an intuitive understanding of what it means to be “90% confident”.

Motivated by Alpert and Raiffa (1982), our short interactive classroom activity ties together formal concepts of confidence intervals to tangible information and questions about facts that students may or may not have some familiarity with and interest in. In brief, the instructor reads off ten trivia questions with specific quantitative answers. Students are asked to provide their answer to each question in the form of a 90% confidence interval (rather than just providing a point estimate).
After all the trivia questions are read, the correct answers are revealed and students count how many of their intervals capture the correct answers.

By providing immediate feedback and fostering a gently competitive spirit in the classroom, this activity incentivizes students to engage with the central conceptual challenges of confidence intervals as described in the GAISE College Report (ASA GAISE College working group: ASA; 2016).

The questions should be appropriate for the audience, and can be about topics that are timely and relate to cultural memes (for example, celebrities, sports teams, and TV shows). However, it is important to emphasize that this is not a test of knowledge, but rather of how well one knows and is able to quantify the limits of their knowledge. Specifically, if students are not familiar with a particular topic, they can (and should) respond with a wide interval. Assessing personal confidence interval coverage rates provides immediate feedback about how well-calibrated a student’s confidence level is.

The activity is simple, requires little preparation, and is appropriate for students at all different levels and backgrounds, including undergraduate and graduate students. The activity requires only a list of 10 questions with quantitative answers and can be used as early as in an introductory statistics course when the students are first exposed to confidence intervals. In an upper level course, the activity can be used early on for review. Given the simplicity of the exercise, this activity can be utilized in a classroom of twenty students or two hundred students with little modification.

In Section 2, we describe the activity procedure and necessary preparation. Section 3 presents anecdotal and numerical evidence relating to the instructional value of the activity. Finally, in Section 4, we present a discussion relating to the effectiveness of the activity.
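The textbook definition quoted in this introduction (“95% of samples of this size will produce confidence intervals that capture the true proportion”) can be checked with a short simulation. The sketch below is illustrative only and is not part of the classroom activity: it assumes the simple Wald interval for a proportion, and the function names, sample size, and number of simulated samples are hypothetical choices.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wald form)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage_rate(true_p, n, n_samples, seed=0):
    """Fraction of simulated samples whose interval captures true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one sample of size n from a population with proportion true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = wald_interval(successes, n)
        if lower <= true_p <= upper:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    # Hypothetical settings: true proportion 0.5, samples of size 200.
    print(coverage_rate(true_p=0.5, n=200, n_samples=2000))
```

With these settings the printed rate should land near, though not exactly at, 0.95; the Wald interval is only approximately calibrated, especially for small samples or proportions near 0 or 1.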
Before class, the instructor needs to prepare a series of ten or more trivia questions that each have specific quantitative answers. There are many sources for ideas. Trivia questions about the college or university may be popular (e.g. how many official student organizations are listed on the school website?). We have used the board game Wits and Wagers as one source of questions. The questions should ideally cover a broad range of topics so that most students will not know the answers exactly, but that educated guesses are possible. For example, a question such as “In what year was the Declaration of Independence signed?” does have a quantitative answer but would be a poor choice because most students would be able to answer 1776 without hesitation. Table 1 gives a series of example questions that might be used for this activity. These questions were obtained by one of the authors via the Wits & Wagers: Trivia Party iOS app. Appendix A contains a list of thirty additional questions that can be used for this activity.

We now outline the steps of the activity assuming that ten questions have already been selected for use.

1. Provide students with a blank sheet of paper and ask the students to make a numbered list from 1 to 10, leaving room for ten answers.
2. Tell students that they will hear a series of ten questions, each one having a numeric answer. Instead of writing down a specific value to answer each question, students should provide their answer in the form of a 90% confidence interval, i.e. (lower

Crapuchettes, D. and N. Heasley (2005). Wits and Wagers. Bethesda, MD: North Star Games.
https://itunes.apple.com/us/app/wits-wagers-trivia-party/id637929057?mt=8

N = 10 and various probabilities of success. Some students may have scored poorly due to an overestimation of how well they know the subject areas in question.
Other students may have been overeager to play a competitive trivia game and may have overlooked the specification of the 90% confidence level that should be considered as part of their answers. For all students, this activity serves to bridge the gap between a

[Table 1. Columns: Question, Answer]

We note that when exactly ten questions are utilized, students can ‘cheat’ the activity by giving nine extremely wide intervals and one obviously wrong interval. In doing so, they can all but guarantee a score of nine out of ten. One way to fix this loophole is to modify the number of questions used. For instance, students might be asked 15 questions but only a random selection of 10 are used for scoring. Additionally, the activity could be extended by asking students to think about and propose a metric that could identify this kind of cheating; for example, a mean and standard deviation could be calculated from all intervals for a question, and students having multiple interval widths more than some z-scores above the mean could be flagged as potential cheaters.

The proposed activity can take many forms and be used for different purposes. As a group, the authors have used this activity with both graduate and undergraduate students, at one or more times during the semester, and with slightly different learning goals in mind. Two of the authors first use the activity in an undergraduate introductory statistics course during the class period immediately following the introduction of confidence intervals, with a follow-up towards the end of the semester. Another has used it repeatedly throughout the semester (using different questions each time) in a graduate-level course on regression that is a required course for statistics Master's students. In this course, the goal of repeating the activity is to illustrate the challenges of calibrating uncertainty, i.e. avoiding “excessive precision”, and understanding what it means to be 90% confident about a fact.
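The width-based flagging idea suggested for the ‘cheating’ loophole can be sketched in a few lines. This is one possible reading of that suggestion, not a metric used by the authors: the function name, the z-score threshold of 1.5, and the requirement of at least two flagged questions are all hypothetical choices. Note that in a small class a single extreme width also inflates the standard deviation, so very high thresholds may never be reached.

```python
from statistics import mean, stdev


def flag_wide_intervals(intervals, z_threshold=1.5, min_flags=2):
    """Return students whose interval widths are unusually large.

    intervals maps each student to a list of (lower, upper) pairs,
    one pair per question, in the same question order for everyone.
    """
    students = list(intervals)
    n_questions = len(next(iter(intervals.values())))
    flags = {student: 0 for student in students}
    for q in range(n_questions):
        widths = [intervals[s][q][1] - intervals[s][q][0] for s in students]
        mu, sd = mean(widths), stdev(widths)
        if sd == 0:
            continue  # every student gave the same width; nothing to flag
        for student, width in zip(students, widths):
            if (width - mu) / sd > z_threshold:
                flags[student] += 1
    # Flag only students with several unusually wide intervals.
    return [s for s in students if flags[s] >= min_flags]
```

A student who knows little about a topic should legitimately give a wide interval, so any flag produced this way is best treated as a prompt for class discussion rather than as evidence of gaming the activity.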
In each of the courses described above, the instructors also employ small simulation-based exercises to illustrate the properties of confidence intervals in a more formal statistical sense.

Apart from using this activity for understanding confidence intervals, the act of producing intervals and then validating whether the answers fall within those intervals serves as a way of calibrating one’s level of confidence. We believe students stand to benefit from participating in this exercise several times over the course of a semester, of course, on different trivia questions. In our experience, students score better with each additional iteration (see Section 3). Such incremental improvements suggest that students learn to better quantify their levels of uncertainty with practice.

In the graduate-level course where the activity was repeatedly used, the Comprehensive Assessment of Outcomes in a First Statistics course, or the CAOS test, was used to evaluate overall conceptual understanding of statistics (Delmas et al.; 2007). Specifically, the CAOS pre- and post-tests were administered at the beginning and end of the semester. We include a comparison of pre- and post-test scores from confidence interval-related questions in the Supplementary Materials.

Over the course of one semester, the three authors used this activity two, two, and five times in their respective classes. We now describe the results with data consisting of student scores from these three classes. The data collection, management, and analysis were approved by the University of Massachusetts at Amherst Institutional Review Board (
rstanarm package (Gabry and Goodrich; 2016) in R version 3.3.2 (2016-10-31) (R Core Team; 2016) to fit the model with a Bayesian Markov Chain Monte Carlo algorithm, using standard weakly informative priors for the coefficients and variance parameters.
While accuracy increased with each iteration, the biggest incremental score gain occurred between the first and second iterations, with an improvement of 1.5 to 2 questions answered correctly (Table 2). Accuracy increased more slowly between iterations 2 through 5, although an average improvement of almost a single correct question was observed across these four iterations. Comparing the 90% posterior credible intervals (CI) for the number of questions answered correctly per iteration, only rounds 4 (90% CI: (4.7, 7)) and 5 (90% CI: (4.9, 7.2)) showed significant increases in accuracy over iteration 1 (90% CI: (2.4, 4.4)); the credible intervals for iterations 2 and 3 still overlapped with the credible interval for iteration 1.

                  Horton            Wang              Reich
Iteration         1        2        1        2        1        2        3        4        5
n                 21       21       26       19       11       13       14       14       14
mode              5        3        1        4        6        6        8        3        7
median            5.0      6.0      3.0      6.0      4.0      6.0      6.5      6.0      6.5
mean              4.5      5.7      3.3      5.8      3.6      5.2      5.6      5.8      6.0
sd                1.8      2.1      2.3      2.2      2.5      2.8      3.0      2.8      1.8
estimated mean                                        3.3      5.3      5.6      5.9      6.1
estimated 90% CI                                      2.2-4.5  4.1-6.5  4.4-6.8  4.7-7    4.9-7.2

Table 2: Summary Statistics of Activity By Class and Iteration. The last two rows represent model-based estimates.
Confidence intervals play a key role in inferential thinking. We have described an interactive activity to help solidify students’ understanding of confidence intervals. The activity is attractive because it requires no technology, is interactive and applicable to a wide variety of students, and is fun, in the spirit of friendly competition between students. The activity can help reinforce statistical thinking and help students recognize overconfidence. However, it should be noted as a limitation of the activity that the intervals requested of the students are somewhat removed from the typical notion of a confidence interval; students are not computing standard errors and point estimates but are rather relying on their own judgement. Nonetheless, because the activity does not involve any calculations, which tend to be the typical entry point for teaching confidence intervals, it can be particularly helpful for promoting an intuitive understanding. Students with very little mathematical background, such as graduate students from non-quantitative fields, may find this activity a helpful way to understand confidence intervals without the distraction of the mathematical formulae.

[Figure 1 axes: round; number right.]
Figure 1: Student scores from five iterations of the activity (n = 14, shown in lighter colors, color-coded by student). The solid black line indicates the mean score across all students. The dashed blue line indicates the model-estimated posterior median number of questions answered correctly, with corresponding 90% confidence intervals shown in vertical blue lines.

It is interesting to note that while scores improve over repeated iterations of the activity, perfect calibration of confidence was not attained within the course of a single semester. Even with five iterations of the activity, student scores plateau between 5 and 6 correct answers out of 10, well below the desired level of 9.
Moreover, students in the classes that only used the activity twice in a semester seemed to do about as well in the second iteration as those who had a third, fourth, or fifth try. This observation suggests that overconfidence is challenging to overcome. Examining whether the same achievement gap holds if students were asked to produce confidence intervals at different levels (e.g. 50% or 70%) could provide information on whether this is a problem inherent to individuals’ judgment uncertainty, or whether 90% is a particularly tricky level to achieve.

We relied on the CAOS test (details provided in the Supplementary Materials) to provide a quantitative evaluation of students’ understanding of confidence intervals. The data suggested improvement in students’ overall understanding of the interpretation of confidence intervals, although the sample size was small (15 students took both tests) and improvements in understanding could also be attributed to other content in the course.

With the continued increase of written content by “data journalists” in the mass media (often including some technical measures of uncertainty), the value of accurate intuition about uncertainty will grow for not just students and practitioners of statistics but for the general public as well. The results shown in this manuscript serve as proof-of-concept that asking students to create confidence intervals for quantities that are of potential interest can foster a more intuitive understanding of confidence and uncertainty. Our experience is that students are consistently surprised by the results of this exercise, leading to “teachable moments” when students grasp how and why their interval coverage was too low. Therefore, activities such as the one presented here and others may serve an important role in creating informed consumers of modern data-driven content, whether journalistic or academic, by arming them with tools to interpret quantitative results.
First we would like to thank the Associate Editor and two anonymous reviewers who provided very helpful feedback on the first iteration of this manuscript. We also thank the students who participated in these activities. Finally, we thank Eric W. Bright, CFA, who introduced NGR to a version of this activity.

References
Alpert, M. and Raiffa, H. (1982). A progress report on the training of probability assessors, Cambridge University Press, pp. 294–305.
ASA GAISE College working group: ASA (2016). Guidelines for assessment and instruction in statistics education: College report (draft).
Ayres, I. (2007). Super Crunchers, Bantam Books.
Behar, R., Grima, P. and Marco-Almagro, L. (2013). Twenty-five analogies for explaining statistical concepts, The American Statistician (1): 44–48.
Cesarini, D., Sandewall, Ö. and Johannesson, M. (2006). Confidence interval estimation tasks and the economics of overconfidence, Journal of Economic Behavior & Organization (3): 453–470.
Christensen-Szalanski, J. J. and Bushyhead, J. B. (1981). Physicians’ use of probabilistic information in a real clinical setting, Journal of Experimental Psychology: Human Perception and Performance (4): 928.
Cumming, G. (2007). Inference by Eye: Pictures of Confidence Intervals and Thinking About Levels of Confidence, Teaching Statistics (3): 89–93.
De Veaux, R. D., Velleman, P. and Bock, D. E. (2011). Stats: Data and Models, 3 edn, Pearson.
Delmas, R., Garfield, J., Ooms, A. and Chance, B. (2007). Assessing students’ conceptual understanding after a first course in statistics, Statistics Education Research Journal (2): 28–58.
Diez, D. M., Barr, C. D. and Cetinkaya-Rundel, M. (2012). OpenIntro statistics, 174-175 edn, CreateSpace.
Fidler, F. and Cumming, G. (2005). Teaching confidence intervals: Problems and potential solutions, Proceedings of the 55th international statistics institute session.
Gabry, J. and Goodrich, B. (2016). rstanarm: Bayesian Applied Regression Modeling via Stan. R package version 2.10.1. URL: https://CRAN.R-project.org/package=rstanarm
Hagtvedt, R., Jones, G. T. and Jones, K. (2008). Teaching Confidence Intervals Using Simulation, Teaching Statistics (2): 53–56.
Hynes, M. E. and Vanmarcke, E. H. (1976). Reliability of embankment performance predictions, Department of Civil Engineering, Massachusetts Inst. of Technology.
Jørgensen, M., Teigen, K. H. and Moløkken, K. (2004). Better sure than safe? Over-confidence in judgement based software development effort prediction intervals, Journal of Systems and Software (1): 79–93.
Kalinowski, P. (2010). Identifying misconceptions about confidence intervals, Proceedings of the Eighth International Conference on Teaching Statistics.
Kaplan, J., Fisher, D. G. and Rogness, N. T. (2010). Lexical ambiguity in statistics: how students use and define the words: association, average, confidence, random and spread, Journal of Statistics Education.
Maurer, K. and Lock, E. (2015). Bootstrapping in the introductory statistics curriculum, Technology Innovations in Statistics Education.
Moore, D. A. and Healy, P. J. (2008). The trouble with overconfidence, Psychological Review (2): 502.
R Core Team (2016). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
Russo, J. E. and Schoemaker, P. J. H. (1989). Confident Decision Making, Piatkus.
Soll, J. B. and Klayman, J. (2004). Overconfidence in interval estimates, Journal of Experimental Psychology: Learning, Memory, and Cognition (2): 299.

Appendix A: 30 Questions

In this section, we provide thirty more questions in addition to the ones presented in Table 1 for convenience. The first ten questions are adapted from Russo and Schoemaker (1989) (reprinted in Ayres (2007), among other places), the next nine are taken from the Wits and Wagers board game, and the rest are written up by the authors.
Sources for questions 20 to 30
http://web.mta.info/nyct/facts/ridership/
https://en.wikipedia.org/wiki/List_of_James_Bond_films
https://en.wikipedia.org/wiki/Yosemite_Falls
https://en.wikipedia.org/wiki/Moons_of_Saturn
https://en.wikipedia.org/wiki/Copa_Am%C3%A9rica_Centenario

[Table of Appendix A questions. Columns: Question, Answer]

Supplementary Materials

Activity Printout

The first two pages serve as a convenient one-sheet (front and back) summary of the activity that can be printed out for use in class.
Activity Outline
Activity lasts approximately 10-15 minutes. Steps follow:
1. Each student starts out with a sheet of paper with numbered list from 1 to 10, leaving room for 10 answers.
2. You will read aloud 10 trivia questions each with a numeric answer. Instruct students to write down a 90% confidence interval in response to each question.
3. Read each of the trivia questions.
4. Review answers to the trivia questions. Students should score themselves: 1 point for every interval that captures the correct numeric answer.
5. Ask students to tally up their score out of 10.
6. Obtain and visualize a distribution of student scores.
7. Discuss the results as a group.
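For instructors who collect responses electronically, steps 4 to 6 above can be sketched as follows. This is a minimal illustration rather than part of the printed activity: the function names and data layout are hypothetical, and it assumes that an answer equal to an interval endpoint counts as captured.

```python
from collections import Counter


def score_student(intervals, answers):
    """Count the (lower, upper) intervals that contain the matching answer."""
    return sum(lower <= answer <= upper
               for (lower, upper), answer in zip(intervals, answers))


def score_distribution(class_intervals, answers):
    """Map each score to the number of students who earned it."""
    return Counter(score_student(intervals, answers)
                   for intervals in class_intervals.values())


def text_histogram(distribution, n_questions=10):
    """Render the score distribution as one row of asterisks per score."""
    return "\n".join(f"{score:2d} | " + "*" * distribution.get(score, 0)
                     for score in range(n_questions + 1))
```

Calling `score_distribution` on a dictionary that maps student identifiers to their lists of intervals gives the tallies needed for the class histogram in step 6.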
Possible Discussion Questions
1. If a student provided 90% confidence intervals in all ten cases, how many points would we expect him/her to score?
2. If every student provided 90% confidence intervals in all ten cases, what would a histogram of scores look like for the class?
3. Examining the histogram (or stem-and-leaf plot) of scores from our class, do you think we were overconfident or underconfident?
4. How might we, as a class, do better at this exercise?
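Discussion questions 1 and 2 have a simple reference answer if we assume that each interval independently captures its answer with probability 0.9, so that a score follows a Binomial(10, 0.9) distribution. The sketch below computes that reference distribution; the function names are hypothetical, and the independence assumption is an idealization that real trivia responses may not satisfy.

```python
from math import comb


def score_pmf(n_questions=10, coverage=0.9):
    """P(score = k) for k = 0..n_questions under Binomial(n_questions, coverage)."""
    return [comb(n_questions, k) * coverage**k * (1 - coverage)**(n_questions - k)
            for k in range(n_questions + 1)]


def expected_score(n_questions=10, coverage=0.9):
    """Mean of the binomial score distribution."""
    return n_questions * coverage


if __name__ == "__main__":
    print("expected score:", expected_score())
    for k, prob in enumerate(score_pmf()):
        # One '#' per percentage point gives a rough text histogram.
        print(f"{k:2d} {prob:.4f} " + "#" * round(100 * prob))
```

Under these assumptions the expected score is 9 out of 10, and a perfectly calibrated student would score 6 or fewer only about 1.3% of the time, which gives the class a concrete benchmark for question 3.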
Sample Questions, Answers, and Scoring
In the table below, we reiterate the questions/answers presented in Table 1 and present sample responses for each. Responses that are bolded earn one point.

    Question                                                    Answer   Sample Response
10  How many states were part of the United States in 1860?     33       (0 ,

Total score: 6 out of 10
Sample Visualization
Sample Histogram of Scores
[Histogram axes: Score; Density.]

CAOS Test Results