Enriching students' conceptual understanding of confidence intervals: An interactive trivia-based classroom activity

Abstract

Confidence intervals provide a way to determine plausible values for a population parameter. They are omnipresent in research articles involving statistical analyses. Appropriately, a key statistical literacy learning objective is the ability to interpret and understand confidence intervals in a wide range of settings. As instructors, we devote a considerable amount of time and effort to ensure that students master this topic in introductory courses and beyond. Yet, studies continue to find that confidence intervals are commonly misinterpreted and that even experts have trouble calibrating their individual confidence levels. In this article, we present a ten-minute trivia game-based activity that addresses these misconceptions by exposing students to confidence intervals from a personal perspective. We describe how the activity can be integrated into a statistics course as a one-time activity or with repetition at intervals throughout a course, discuss results of using the activity in class, and present possible extensions.



Xiaofei Wang, Department of Statistics, Yale University
Nicholas G. Reich, Department of Biostatistics and Epidemiology, University of Massachusetts at Amherst
Nicholas J. Horton, Department of Mathematics and Statistics, Amherst College
January 31, 2017


Keywords: uncertainty, calibrating confidence, subjective probability

Introduction

Confidence intervals are one of the most commonly used statistical methods to summarize uncertainty in parameter estimates from data analyses. However, both the formal concept of and intuition behind confidence intervals remain elusive to many students and data analysts. In the case of a 95% confidence interval for a proportion for a given sample size, one textbook definition is “95% of samples of this size will produce confidence intervals that capture the true proportion” (De Veaux et al.; 2011). Another textbook definition reads “A plausible range of values for the population parameter is called a confidence interval” and later specifies for 95% confidence “Suppose we took many samples and built a confidence interval [for the population mean] from each sample...then about 95% of those intervals would contain the actual mean” (Diez et al.; 2012). Studies have found that many students struggle with understanding definitions like these even after completing coursework at various levels (Fidler and Cumming; 2005; Kalinowski; 2010; Kaplan et al.; 2010).

In addition to a lack of intuition about the definition of a confidence interval, researchers have documented overconfidence in quantitative assessments of uncertainty in students and experts alike (Alpert and Raiffa; 1982; Soll and Klayman; 2004; Jørgensen et al.; 2004; Cesarini et al.; 2006). In this context, overconfidence has been defined as individuals having “excessive precision” in their beliefs about particular facts (Moore and Healy; 2008). One study showed that when subjects were asked to provide 98% confidence intervals relating to numeric facts, only 57.4% of their intervals captured the true value (Alpert and Raiffa; 1982). This display of overconfidence is not limited to students and novices, but is also exhibited by field experts (Hynes and Vanmarcke; 1976; Christensen-Szalanski and Bushyhead; 1981).

The textbook definition of a confidence interval can be challenging to understand, and various pedagogical tools have been proposed to assist with comprehension. Some teachers have documented success by teaching confidence intervals with bootstrapping techniques (Maurer and Lock; 2015) or with visualizations of simulated samples (Cumming; 2007; Hagtvedt et al.; 2008). Behar et al. (2013) presented one analogy useful in understanding 95% confidence intervals: “It is like a person who tells the truth 95% of the time, but we do not know whether a particular statement is true or not.” Providing students with an opportunity to calibrate their intuitive understanding of what it means to be, say, 90% confident about a fact through short repetitive activities may enhance their ability to understand the strength of conclusions from a data analysis. In this manuscript, we describe an activity that aims to solidify an intuitive understanding of what it means to be “90% confident”.

Motivated by Alpert and Raiffa (1982), our short interactive classroom activity ties formal concepts of confidence intervals to tangible information and questions about facts that students may or may not have some familiarity with and interest in. In brief, the instructor reads off ten trivia questions with specific quantitative answers. Students are asked to provide their answer to each question in the form of a 90% confidence interval (rather than just providing a point estimate). After all the trivia questions are read, the correct answers are revealed and students count how many of their intervals capture the correct answers.

By providing immediate feedback and fostering a gently competitive spirit in the classroom, this activity incentivizes students to engage with the central conceptual challenges of confidence intervals as described in the GAISE College Report (ASA GAISE College working group: ASA; 2016).

The questions should be appropriate for the audience, and can be about topics that are timely and relate to cultural memes (for example, celebrities, sports teams, and TV shows). However, it is important to emphasize that this is not a test of knowledge, but rather of how well one knows and is able to quantify the limits of their knowledge. Specifically, if students are not familiar with a particular topic, they can (and should) respond with a wide interval. Assessing personal confidence interval coverage rates provides immediate feedback about how well-calibrated a student’s confidence level is.

The activity is simple, requires little preparation, and is appropriate for students at all different levels and backgrounds, including undergraduate and graduate students. The activity requires only a list of 10 questions with quantitative answers and can be used as early as in an introductory statistics course when the students are first exposed to confidence intervals. In an upper level course, the activity can be used early on for review. Given the simplicity of the exercise, this activity can be utilized in a classroom of twenty students or two hundred students with little modification.

In Section 2, we describe the activity procedure and necessary preparation. Section 3 presents anecdotal and numerical evidence relating to the instructional value of the activity. Finally, in Section 4, we present a discussion relating to the effectiveness of the activity.

Before class, the instructor needs to prepare a series of ten or more trivia questions that each have specific quantitative answers. There are many sources for ideas. Trivia questions about the college or university may be popular (e.g. how many official student organizations are listed on the school website?). We have used the board game Wits and Wagers as one source of questions. The questions should ideally cover a broad range of topics so that most students will not know the answers exactly but can still make educated guesses. For example, a question such as “In what year was the Declaration of Independence signed?” does have a quantitative answer but would be a poor choice because most students would be able to answer 1776 without hesitation. Table 1 gives a series of example questions that might be used for this activity. These questions were obtained by one of the authors via the Wits & Wagers: Trivia Party iOS app. Appendix A contains a list of thirty additional questions that can be used for this activity.

We now outline the steps of the activity assuming that ten questions have already been selected for use.

1. Provide students with a blank sheet of paper and ask the students to make a numbered list from 1 to 10, leaving room for ten answers.
2. Tell students that they will hear a series of ten questions, each one having a numeric answer. Instead of writing down a specific value to answer each question, students should provide their answer in the form of a 90% confidence interval, i.e. (lower

Crapuchettes, D. and N. Heasley (2005). Wits and Wagers. Bethesda, MD: North Star Games.
https://itunes.apple.com/us/app/wits-wagers-trivia-party/id637929057?mt=8

N = 10 and various probabilities of success. Some students may have scored poorly due to an overestimation of how well they know the subject areas in question. Other students may have been overeager to play a competitive trivia game and may have overlooked the specification of the 90% confidence level that should be considered as part of their answers. For all students, this activity serves to bridge the gap between a

[Table 1: Question, Answer]

We note that when exactly ten questions are utilized, students can ‘cheat’ the activity by giving nine extremely wide intervals and one obviously wrong interval. In doing so, they can all but guarantee a score of nine out of ten. One way to fix this loophole is to modify the number of questions used. For instance, students might be asked 15 questions but only a random selection of 10 are used for scoring. Additionally, the activity could be extended by asking students to think about and propose a metric that could identify this kind of cheating; for example, a mean and standard deviation could be calculated from all interval widths for a question, and students with multiple interval widths more than some z-score threshold above the mean could be flagged as potential cheaters.

The proposed activity can take many forms and be used for different purposes. As a group, the authors have used this activity with both graduate and undergraduate students, at one or more times during the semester, and with slightly different learning goals in mind. Two of the authors first use the activity in an undergraduate introductory statistics course during the class period immediately following the introduction of confidence intervals, with a follow-up towards the end of the semester. Another has used it repeatedly throughout the semester (using different questions each time) in a graduate-level course on regression that is a required course for statistics Master's students. In this course, the goal of repeating the activity is to illustrate the challenges of calibrating uncertainty, i.e. avoiding “excessive precision”, and understanding what it means to be 90% confident about a fact. In each of the courses described above, the instructors also employ small simulation-based exercises to illustrate the properties of confidence intervals in a more formal statistical sense.

Apart from using this activity for understanding confidence intervals, the act of producing intervals and then validating whether the answers fall within those intervals serves as a way of calibrating one’s level of confidence. We believe students stand to benefit from participating in this exercise several times over the course of a semester, using different trivia questions each time. In our experience, students score better with each additional iteration (see Section 3). Such incremental improvements suggest that students learn to better quantify their levels of uncertainty with practice.

In the graduate-level course where the activity was repeatedly used, the Comprehensive Assessment of Outcomes in a First Statistics course, or the CAOS test, was used to evaluate overall conceptual understanding of statistics (Delmas et al.; 2007). Specifically, the CAOS pre- and post-tests were administered at the beginning and end of the semester. We include a comparison of pre- and post-test scores from confidence interval-related questions in the Supplementary Materials.

Over the course of one semester, the three authors used this activity two, two, and five times in their respective classes. We now describe the results with data consisting of student scores from these three classes. The data collection, management, and analysis were approved by the University of Massachusetts at Amherst Institutional Review Board (

rstanarm package (Gabry and Goodrich; 2016) in R version 3.3.2 (2016-10-31) (R Core Team; 2016) to fit the model with a Bayesian Markov Chain Monte Carlo algorithm, using standard weakly informative priors for the coefficients and variance parameters. While accuracy increased with each iteration, the biggest incremental score gain occurred between the first and second iterations, with an improvement of 1.5 to 2 questions answered correctly (Table 2). Accuracy increased more slowly between iterations 2 through 5, although an average improvement of almost a single correct question was observed across these four iterations. Comparing the 90% posterior credible intervals (CI) for the number of questions answered correctly per iteration, only iterations 4 (90% CI: (4.7, 7)) and 5 (90% CI: (4.9, 7.2)) showed significant increases in accuracy over iteration 1 (90% CI: (2.4, 4.4)); the credible intervals for iterations 2 and 3 still overlapped with the credible interval for iteration 1.

Class       Horton        Wang          Reich
Iteration   1      2      1      2      1      2      3      4      5
n           21     21     26     19     11     13     14     14     14
mode        5      3      1      4      6      6      8      3      7
median      5.0    6.0    3.0    6.0    4.0    6.0    6.5    6.0    6.5
mean        4.5    5.7    3.3    5.8    3.6    5.2    5.6    5.8    6.0
sd          1.8    2.1    2.3    2.2    2.5    2.8    3.0    2.8    1.8

Iteration          1         2         3         4         5
estimated mean     3.3       5.3       5.6       5.9       6.1
estimated 90% CI   2.2-4.5   4.1-6.5   4.4-6.8   4.7-7     4.9-7.2

Table 2: Summary Statistics of Activity By Class and Iteration. The last two rows represent model-based estimates.
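The width-based cheating metric suggested in the description of the loophole (nine very wide intervals plus one deliberately wrong one) could be sketched roughly as follows. This is only an illustration, not the authors' procedure: the function name `flag_wide_intervals`, the default cutoff of 2.0, and the requirement of at least two flagged questions are all assumptions, since the text leaves the z-score threshold and the meaning of "multiple" open.

```python
from statistics import mean, stdev


def flag_wide_intervals(widths_by_question, z_cutoff=2.0, min_flags=2):
    """Return students whose interval widths are unusually large on several questions.

    widths_by_question: list with one dict per question, mapping a student
    identifier to that student's interval width (upper minus lower).
    z_cutoff and min_flags are hypothetical tuning values, not from the paper.
    """
    flag_counts = {}
    for widths in widths_by_question:
        values = list(widths.values())
        if len(values) < 2:
            continue  # a standard deviation needs at least two intervals
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue  # all widths identical, so nothing stands out
        for student, width in widths.items():
            if (width - mu) / sigma > z_cutoff:
                flag_counts[student] = flag_counts.get(student, 0) + 1
    # Keep only students flagged on "multiple" questions.
    return sorted(s for s, c in flag_counts.items() if c >= min_flags)
```

One design caveat: with the sample standard deviation, a single outlier among n students can never have a z-score above (n - 1) / sqrt(n), so in a very small class a cutoff of 2 may flag nobody and a lower cutoff would be needed.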

Confidence intervals play a key role in inferential thinking. We have described an interactive activity to help solidify students’ understanding of confidence intervals. The activity is attractive because it requires no technology, is interactive and applicable to a wide variety of students, and is fun, in the spirit of friendly competition between students. The activity can help reinforce statistical thinking and help students recognize overconfidence. However, it should be noted as a limitation of the activity that the intervals requested of the students are somewhat removed from the typical notion of a confidence interval; students are not computing standard errors and point estimates but are rather relying on their own judgment. Nonetheless, because the activity does not involve any calculations, which tend to be the typical entry point for teaching confidence intervals, it can be particularly helpful for promoting an intuitive understanding. Students with very little mathematical background, such as graduate students from non-quantitative fields, may find this activity a helpful way to understand confidence intervals without the distraction of the mathematical formulae.

[Figure 1 axes: round (x), number right (y).]
Figure 1: Student scores from five iterations of the activity (n = 14, shown in lighter colors, color-coded by student). The solid black line indicates the mean score across all students. The dashed blue line indicates the model-estimated posterior median number of questions answered correctly, with corresponding 90% confidence intervals shown in vertical blue lines.

It is interesting to note that while scores improve over repeated iterations of the activity, perfect calibration of confidence was not attained within the course of a single semester. Even with five iterations of the activity, student scores plateau between 5 and 6 correct answers out of 10, well below the desired level of 9. Moreover, students in the classes that only used the activity twice in a semester seemed to do about as well in the second iteration as those who had a third, fourth, or fifth try. This observation suggests that overconfidence is challenging to overcome. Examining whether the same achievement gap holds if students were asked to produce confidence intervals at different levels (e.g. 50% or 70%) could provide information on whether this is a problem inherent to individuals’ judgment of uncertainty, or whether 90% is a particularly tricky level to achieve.

We relied on the CAOS test (details provided in the Supplementary Materials) to provide a quantitative evaluation of students’ understanding of confidence intervals. The data suggested improvement in students’ overall understanding of the interpretation of confidence intervals, although the sample size was small (15 students took both tests) and improvements in understanding could also be attributed to other content in the course.

With the continued increase of written content by “data journalists” in the mass media (often including some technical measures of uncertainty), the value of accurate intuition about uncertainty will grow for not just students and practitioners of statistics but for the general public as well. The results shown in this manuscript serve as proof-of-concept that asking students to create confidence intervals for quantities that are of potential interest can foster a more intuitive understanding of confidence and uncertainty. Our experience is that students are consistently surprised by the results of this exercise, leading to “teachable moments” when students grasp how and why their interval coverage was too low. Therefore, activities such as the one presented here and others may serve an important role in creating informed consumers of modern data-driven content, whether journalistic or academic, by arming them with tools to interpret quantitative results.

First we would like to thank the Associate Editor and two anonymous reviewers who provided very helpful feedback on the first iteration of this manuscript. We also thank the students who participated in these activities. Finally, we thank Eric W. Bright, CFA, who introduced NGR to a version of this activity.

References

Alpert, M. and Raiffa, H. (1982). A progress report on the training of probability assessors, Cambridge University Press, pp. 294–305.

ASA GAISE College working group: ASA (2016). Guidelines for assessment and instruction in statistics education: College report (draft).

Ayres, I. (2007). Super Crunchers, Bantam Books.

Behar, R., Grima, P. and Marco-Almagro, L. (2013). Twenty-five analogies for explaining statistical concepts, The American Statistician (1): 44–48.

Cesarini, D., Sandewall, Ö. and Johannesson, M. (2006). Confidence interval estimation tasks and the economics of overconfidence, Journal of Economic Behavior & Organization (3): 453–470.

Christensen-Szalanski, J. J. and Bushyhead, J. B. (1981). Physicians’ use of probabilistic information in a real clinical setting, Journal of Experimental Psychology: Human Perception and Performance (4): 928.

Cumming, G. (2007). Inference by Eye: Pictures of Confidence Intervals and Thinking About Levels of Confidence, Teaching Statistics (3): 89–93.

De Veaux, R. D., Velleman, P. and Bock, D. E. (2011). Stats: Data and Models, 3 edn, Pearson.

Delmas, R., Garfield, J., Ooms, A. and Chance, B. (2007). Assessing students’ conceptual understanding after a first course in statistics, Statistics Education Research Journal (2): 28–58.

Diez, D. M., Barr, C. D. and Cetinkaya-Rundel, M. (2012). OpenIntro statistics, 174-175 edn, CreateSpace.

Fidler, F. and Cumming, G. (2005). Teaching confidence intervals: Problems and potential solutions, Proceedings of the 55th international statistics institute session.

Gabry, J. and Goodrich, B. (2016). rstanarm: Bayesian Applied Regression Modeling via Stan. R package version 2.10.1. URL: https://CRAN.R-project.org/package=rstanarm

Hagtvedt, R., Jones, G. T. and Jones, K. (2008). Teaching Confidence Intervals Using Simulation, Teaching Statistics (2): 53–56.

Hynes, M. E. and Vanmarcke, E. H. (1976). Reliability of embankment performance predictions, Department of Civil Engineering, Massachusetts Inst. of Technology.

Jørgensen, M., Teigen, K. H. and Moløkken, K. (2004). Better sure than safe? over-confidence in judgement based software development effort prediction intervals, Journal of Systems and Software (1): 79–93.

Kalinowski, P. (2010). Identifying misconceptions about confidence intervals, Proceedings of the Eighth International Conference on Teaching Statistics.

Kaplan, J., Fisher, D. G. and Rogness, N. T. (2010). Lexical ambiguity in statistics: how students use and define the words: association, average, confidence, random and spread, Journal of Statistics Education.

Maurer, K. and Lock, E. (2015). Bootstrapping in the introductory statistics curriculum, Technology Innovations in Statistics Education.

Moore, D. A. and Healy, P. J. (2008). The trouble with overconfidence, Psychological Review (2): 502.

R Core Team (2016). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.

Russo, J. E. and Schoemaker, P. J. H. (1989). Confident Decision Making, Piatkus.

Soll, J. B. and Klayman, J. (2004). Overconfidence in interval estimates, Journal of Experimental Psychology: Learning, Memory, and Cognition (2): 299.

Appendix A: 30 Questions

In this section, we provide thirty more questions in addition to the ones presented in Table 1 for convenience. The first ten questions are adapted from Russo and Schoemaker (1989) (reprinted in Ayres (2007), among other places), the next nine are taken from the Wits and Wagers board game, and the rest were written by the authors.

Sources for questions 20 to 30
http://web.mta.info/nyct/facts/ridership/
https://en.wikipedia.org/wiki/List_of_James_Bond_films
https://en.wikipedia.org/wiki/Yosemite_Falls
https://en.wikipedia.org/wiki/Moons_of_Saturn
https://en.wikipedia.org/wiki/Copa_Am%C3%A9rica_Centenario

[Table: Question, Answer]

Supplementary Materials

Activity Printout

The first two pages serve as a convenient one-sheet (front and back) summary of the activity that can be printed out for use in class.

Activity Outline

Activity lasts approximately 10-15 minutes. Steps follow:
1. Each student starts out with a sheet of paper with a numbered list from 1 to 10, leaving room for 10 answers.
2. You will read aloud 10 trivia questions, each with a numeric answer. Instruct students to write down a 90% confidence interval in response to each question.
3. Read each of the trivia questions.
4. Review answers to the trivia questions. Students should score themselves: 1 point for every interval that captures the correct numeric answer.
5. Ask students to tally up their score out of 10.
6. Obtain and visualize a distribution of student scores.
7. Discuss the results as a group.
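For instructors who collect responses electronically, the scoring and tallying steps of the outline could be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, an interval is treated as capturing the answer when the answer lies between its endpoints inclusive (the outline does not say how endpoints are handled), and the histogram is a plain-text stand-in for whatever plot is used in class.

```python
from collections import Counter


def score_student(intervals, answers):
    """Count the intervals (lower, upper) that capture the matching answer.

    Endpoints are treated as inclusive; that convention is an assumption.
    """
    return sum(1 for (lower, upper), answer in zip(intervals, answers)
               if lower <= answer <= upper)


def score_distribution(scores, n_questions=10):
    """Return how many students obtained each score from 0 to n_questions."""
    counts = Counter(scores)
    return [counts.get(k, 0) for k in range(n_questions + 1)]


def text_histogram(scores, n_questions=10):
    """Render the score distribution as one row of asterisks per score."""
    dist = score_distribution(scores, n_questions)
    return "\n".join(f"{k:2d} | {'*' * c}" for k, c in enumerate(dist))
```

Calling `text_histogram` on the list of class scores gives a quick display that can anchor the group discussion in the final step.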

Possible Discussion Questions

1. If a student provided 90% confidence intervals in all ten cases, how many points would we expect him/her to score?
2. If every student provided 90% confidence intervals in all ten cases, what would a histogram of scores look like for the class?
3. Examining the histogram (or stem-and-leaf plot) of scores from our class, do you think we were overconfident or underconfident?
4. How might we, as a class, do better at this exercise?
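A reference point for the first two discussion questions can be computed with a short sketch. It assumes, as a simplification, that a perfectly calibrated student captures each answer independently with probability 0.9, so the score out of 10 follows a binomial distribution; the function names are hypothetical.

```python
from math import comb


def score_pmf(n=10, p=0.9):
    """Binomial probabilities of scoring k = 0..n when each interval
    captures its answer independently with probability p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]


def expected_score(n=10, p=0.9):
    """Mean of the binomial distribution: n times p."""
    return n * p


def prob_at_most(k, n=10, p=0.9):
    """Probability that a calibrated student scores k or fewer."""
    return sum(score_pmf(n, p)[: k + 1])
```

Under these assumptions the expected score is 9, the most likely scores are 9 and 10, and a score of 6 or lower has probability of roughly 0.013, which gives students a benchmark for judging whether the class histogram points to overconfidence.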

Sample Questions, Answers, and Scoring

In the table below, we reiterate the questions/answers presented in Table 1 and present sample responses for each. Responses that are bolded earn one point.

[Table: Question, Answer, Sample Response]
10 How many states were part of the United States in 1860? 33 (0 ,
Total score: 6 out of 10

Sample Visualization

[Figure: Sample Histogram of Scores. Axes: Score (x), Density (y).]

CAOS Test Results