Kill The Math and Let the Introductory Course Be Born
David Kane
ABSTRACT: Our introductory classes in statistics and data science use too much mathematics. The key causal effect which our students want our classes to have is to improve their future performance and opportunities. The more professional their computing skills (in the context of data analysis), the greater their likely success. Introductory courses should feature almost no mathematical/statistical formulas beyond simple algebra.
KEYWORDS: Education, Teaching, Introductory statistics.
Introduction
In A Dance with Dragons, the fifth book in the Game of Thrones series, Maester Aemon Targaryen advises Jon Snow, newly elected Lord Commander of the Night’s Watch, to “Kill the boy and let the man be born.” Maester Aemon’s point was not that there was anything wrong with a boy’s friendships and interests. He was not anti-boy. Instead, Aemon recognized that those attachments prevented Jon Snow from moving into manhood, from devoting his full energy to what mattered the most for his new role.

This article provides similar advice:

Kill the math and let the introductory course be born.

The mathematical and statistical formulas found in a typical introductory course are not evil in and of themselves. Math is fun and beautiful. Yet it is precisely an over-emphasis on math which has allowed data science courses to thrive, to meet the informed student demand which statistics departments have too long ignored. Math is a boyish enthusiasm, unobjectionable when considered in isolation but inimical to the goals of our students because it crowds out topics which matter more.

Nolan and Temple Lang (2010) were among the most prominent to recognize this conflict, insisting that “Computational literacy and programming are as fundamental to statistical practice and research as mathematics.” In this article, I will discuss the causal effects which we and our students want our courses to achieve, connect those effects to some of the specific advice offered by Nolan and Temple Lang (2010), and recommend replacing large portions of the mathematics we currently inflict on our students in introductory courses with lessons in statistical programming. Kaplan (2020) bemoans “the crawling pace of the integration of modern computing into the statistics education curriculum.” Mathematics “grips with a dead hand on the undergraduate [statistics] curriculum,” according to Cobb (2015). Let us break loose.
Goals
Which causal effects do we seek to achieve in our classes? Rumsey (2002), echoing Cobb (1992) and others, argues for “statistical literacy.” Hardin et al. (2015) want to prepare students to “think with data.” (See also Horton and Hardin (2015).) Aliaga et al. (2005) used similar terminology, urging us to “emphasize statistical literacy and develop statistical thinking” among our students. No reader will disagree with these worthy goals. Or, perhaps a better word than goals would be platitudes. We need to be much more precise if we want to make informed choices about how to balance math and programming in our courses.

De Veaux et al. (2017) discuss guidelines for undergraduate data science programs. None are objectionable. Yet, in aggregate, they fail to provide clear metrics for making specific trade-offs. This issue applies across our curriculum but, for purposes of this article, I will focus on introductory courses. The (vast?) majority of our students only take one or two classes in statistics. Consider a collection of causal effects which an introductory course might have. Rubin (1974) defines a causal effect as the difference between two potential outcomes. For simplicity, compare the effect of spending 10 extra hours of student time on programming versus spending those same 10 hours on mathematics.

We want our students to do well:

• In the final project for this class. Such a causal effect is only defined, of course, for classes with a final project. For those without, however, we can consider the causal effect on final project performance if the class were, counterfactually, to have had a final project.

• In research projects for other classes. As Bray et al. (2014) and Mair (2016) explain, reproducible statistical research is impossible without basic computer skills. If we want our students to do better work in their other classes, we must give them the tools to do so.

• As a summer research intern. We want our students to perform well when they work on summer research for us, and for other faculty. As every supervisor knows, independent work outside the classroom is very different from the generally well-structured tasks involved in class assignments.

• When working as a summer intern for non-faculty administrators. The first three causal effects are not controversial. Preparing our students for academic pursuits (class projects, summer research) is at the heart of our pedagogical responsibilities. But why should we be concerned with how well students do in working for admissions or in the investment office? Still, we like our non-faculty colleagues and certainly hope our students do good work for them.

• When working as a summer intern or full time employee outside the university. “Hold on!” shout my fellow faculty. “When did our introductory statistics class morph into a seminar at the Office of Career Services?” The answer: When we decided to take our students’ hopes and dreams seriously. Students know they won’t be in school forever. Almost all of them will spend the rest of their lives working for, and being evaluated by, non-professors at non-academic organizations. Problem sets and exams will be left behind, along with other childish things. Our students want to do well in the outside world. Do we want that? None of us are so misanthropic that we want them to do poorly, of course. Yet any honest appraisal of our syllabi would suggest that helping them to do well outside of academia is not the highest of our priorities.

• In getting hired in any of the above roles. However well we might prepare students for summer research positions and full time employment, that preparation counts for naught if students can’t get those jobs. How many of us give any thought to increasing our students’ chances of being hired? Again, we aren’t against student success per se, but very few of us explicitly consider the causal effects which our courses have on this outcome. On the margin, 10 hours spent on coding increases the odds of a job offer much more than 10 hours spent on mathematics, however beautiful the mathematics might be.

Each professor will weigh the relative importance of these causal effects differently. Some (many?) will put very little weight on the last two. Your students, however, will put most (almost all?) of their weight on getting a desirable internship/job/career and doing well at it.

These are not the only outcomes professors (and students) care about. There is nothing evil about understanding the math behind the sampling distribution of the mean, the confidence interval for a proportion, a null hypothesis test, the Central Limit Theorem or the Chi-square statistic. The causal effect of spending more time on math is to increase student comprehension of these (important!) theories.

The central reason for the rise of data science courses is that students recognize, correctly, that taking a typical Data Science 101 course increases their odds of getting the future they want relative to what those odds would have been if they had taken Statistics 101. But even those data science classes don’t do as much as they might to improve the odds.

An Introductory Course
Even after the improvements in statistics education over the last 20 years, there is still too much math in the typical introductory course. Consider some of the statistical formulas in De Veaux et al. (2018), a superb introductory textbook:

• $\sigma(\hat{p}) = \sqrt{\frac{pq}{n}}$, the sampling distribution model for a proportion.
• $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$, the standard error of an estimated proportion.
• $\bar{y} \pm t^{*}_{n-1} \times \frac{\sqrt{\sum (y - \mu)^2 / n}}{\sqrt{n}}$, the one-sample t-interval for the mean.
• $t_{n-1} = \frac{\bar{y} - \mu}{\sqrt{\sum (y - \mu)^2 / n} \, / \sqrt{n}}$, the one-sample t-test for the mean.
• $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$, the sampling distribution model for a difference between proportions.
• $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$, a two-proportion z-interval.
• $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p} q}{n_1} + \frac{\hat{p} q}{n_2}}}$, a two-proportion z-test.
• $t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, a sampling distribution model for the difference between two means.
• $t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, a two-sample t-test for the difference between means.
• $t_{n-1} = \frac{\bar{d} - \Delta}{s_d / \sqrt{n}}$, the paired t-test.
• $\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}$, the Chi-square statistic.
• $t = \frac{b_1 - \beta_1}{s_e / (\sqrt{n - 1} \, s_x)}$, a sampling distribution for regression slopes.
• $VIF = \frac{1}{1 - R_j^2}$, the variance inflation factor.
• $SE(\hat{\mu}_\nu) = \sqrt{SE^2(b_1) \times (x_\nu - \bar{x})^2 + \frac{s_e^2}{n}}$, confidence interval for a predicted mean value.
• $SE(\hat{y}_\nu) = \sqrt{SE^2(b_1) \times (x_\nu - \bar{x})^2 + \frac{s_e^2}{n} + s_e^2}$, the prediction interval for an individual.

I use De Veaux et al. (2018) for this example, not because it is excessively mathematical, but because it represents the very best of accepted practice, including a solid introduction to randomization-based inference. With regard to the changes made for the current edition, the authors insist (p. xi) that “all of these enhancements follow the new Guidelines for Assessment and Instruction in Statistics Education (GAISE) 2016 report” resulting in “a course that is more aligned with the skills needed in the 21st century, one that focuses even more on statistical thinking and makes use of technology in innovative ways, while retaining core principles and topic coverage.” And who is better positioned than second author Paul Velleman, a co-author of both the GAISE 2005 and GAISE 2016 reports, to make such a claim?

The central problem with curriculum design, as with writing, is knowing what to cut. De Veaux et al. (2018) explicitly refuse that hard choice, noting that “Many first statistics courses serve wide audiences of students who need these skills for their own work in disciplines where traditional statistical methods are, well, traditional. So we have not reduced our emphasis on the concepts and methods you expect to find in our texts.”

In other words, despite decades of committee meetings and reports, these authors present (and expect their students to learn) about the same amount of mathematics as they would have last century. I urge a different approach.

Consider Gov 50: Data, a Harvard course which I have taught for the last 4 semesters (Kane (2020a)). This is an introductory statistics course, focused on meeting the needs of political science majors but open to the entire university. In the most recent semester, we had 120 students, ranging in level from first years through masters students, with the typical student having no background in statistics or programming. We use an open-source textbook (Kane (2020b)) created specifically for the course. One motto for the course is that, even though everyone is welcome, the target audience is “poets and philosophers,” students whose main academic focus is elsewhere but who recognize that some basic data analysis skills will be helpful, in both their other courses and after graduation.
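To make the contrast between formula and computation concrete, here is a minimal sketch of how the standard error of a sample proportion, one of the textbook formulas listed earlier in this section, can be approximated by simulation instead of being memorized. It is written in Python with only the standard library. The function names, the population proportion of 0.4, the sample size of 200 and the number of replications are all assumptions chosen for illustration; none of this is taken from Gov 50 or its textbook.

```python
import math
import random


def formula_se(p_hat: float, n: int) -> float:
    """Textbook standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def simulated_se(p: float, n: int, reps: int = 5000, seed: int = 1) -> float:
    """Approximate the same quantity by brute force.

    Draw `reps` samples of size `n` from a population in which a fraction `p`
    has the trait, record each sample proportion, and return the standard
    deviation of those proportions.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    props = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]
    mean = sum(props) / reps
    return math.sqrt(sum((x - mean) ** 2 for x in props) / (reps - 1))


# Hypothetical inputs, chosen only for this example.
p, n = 0.4, 200
print("formula:   ", round(formula_se(p, n), 4))
print("simulation:", round(simulated_se(p, n), 4))
```

The two printed numbers should agree closely. The simulated version takes more lines, but every line is an operation a student can run, inspect and modify, which is the kind of practice this article argues for.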
Gov 50 is designed to be the answer to the following question: “If I only want to devote one course to learning skills which are valued outside the university, which course should I take?”

I never discuss the formulas which De Veaux et al. (2018) spend so much time on. I never ask a question about them in problem sets or exams. I devote all that time, and more, toward improving students’ computational skills. My advice:

• Use Git and GitHub. See Bryan (2018), Fiksel et al. (2019) and Beckman et al. (2020) for guidance. It is possible to write an essay using just a typewriter. It is possible to write code without using source control. Neither is professional. Although Git/GitHub are the most popular tools today, that will change over time. The important point is not the exact tool. The key is that students should, as much as possible, work with professional tools in a professional fashion. No employer wants to teach your students about source control.

• Use an open source programming language. For introductory classes, R (R Core Team (2019)) is the best choice. See Çetinkaya-Rundel and Bray (2012), Carson and Basiliko (2016), Silva and Moura (2020) and Long and Turner (2020) for discussion. Python can also work, especially if some prior programming experience is a prerequisite for the class. These are the statistical programming languages that employers want our students to know.

• Teach randomization-based inference. Cobb (2007) provides some background on why we should prefer such an approach. See Wardrop (1995) for an early approach and Ernst (2004) for an overview. Ismay and Kim (2019) is an excellent textbook, suitable for use in an introductory class. Although there is evidence (Hildreth et al. (2018)) that a simulation-based curriculum improves student understanding, an even bigger advantage, relative to the traditional approach, is the excuse it offers for programming practice. The more code that students write, the better they will become at coding.

• Flip the classroom. Nielson et al. (2018) demonstrate that students learn more in a flipped classroom. Perhaps even more important, however, is that a flipped classroom allows for more time to be spent in supervised practice. Every minute spent confronting real data (importing, shaping, visualizing, modeling) is a minute well-spent. Can the same be said for every minute of every lecture? If students will only devote X hours per week to an introductory course, the best way to ensure that they get as much practice as possible is to devote class time to that practice. You learn soccer with the ball at your feet. You learn data science with your hands on the keyboard.

• Cold-call. There is no better way to ensure that students are engaged in a class than to call on them at random. See Dallimore et al. (2012) for a literature review and Lemov (2015) for practical details.

• Require that student work be reproducible. See Bray et al. (2014) for motivation and Baumer et al. (2014) for advice. If your own work is reproducible (and I assume that it is), why wouldn’t you demand the same level of rigor from your students?

• Require that student work be public. Students are somewhat diligent with the work that only you will see. They take work which others will see much more seriously. Requiring that student work be public is the easiest way, or at least the most pleasant way, to cause students to work harder. And the harder they work, the more likely they are to learn something. Naive observers might fear that government regulations, like the Family Educational Rights and Privacy Act (FERPA), prevent faculty from requiring work be made public. This is not true. See Ramirez (2009) for details. FERPA, and similar regulations, apply to the records (grades, comments, et cetera) which faculty create. You can’t make those public. You may require that student work be public. See bit.ly/1005_projects for several hundred projects completed by my students.

• Require solo final projects. See Ledolter (1995) for background and discussion of final (or research) projects in statistics classes. Imagine that you are trying to decide between two otherwise similar candidates for a summer research position. The first has created an impressive final project, written in R and hosted on GitHub, for which all the analysis is reproducible. The second has an impressive grasp of the mathematics behind a t-test. Who would you hire? Outside of academia, where almost no one uses much mathematics, the first student has a big advantage, just as Horton (2015) reports. Projects should be solo because otherwise students will divide-and-conquer the work, often with one doing all the writing and the other all the coding, which is not the outcome we want.

My introductory course does all these things. The combined causal effect of these requirements is to significantly increase my students’ odds of achieving the futures they want, relative to what those odds would have been if my course were more mathematical. Computational skills are more relevant than mathematical understanding to their future success.

A similar course would work just as well outside of Harvard. First, there is an extensive overlap between the bottom quarter of the Harvard distribution of student ability and the student bodies of other top tercile schools. Second, weaker students are even more interested in learning skills which employers value. With less talented students, I would follow exactly the same approach while, perhaps, cutting back on the total number of topics covered.

Conclusion
Horton (2015) writes:

    Anecdotal reports have indicated that statistics undergraduates are at a competitive disadvantage relative to undergraduate computer science (CS) degree holders for entry level positions. Such positions tend to have data-related skills at the core of the job descriptions. At present, many CS students tend to be able to perform computations with data more easily than their statistics equivalents. This should not be allowed to continue.

“Allowed by whom?” one might ask. The central problem is that, even a decade after Nolan and Temple Lang (2010) made a similar point, we have done too little. Nolan and Temple Lang (2010) quoted Friedman (2001):

    Computing has been one of the most glaring omissions in the set of tools that have so far defined Statistics. Had we incorporated computing methodology from its inception as a fundamental statistical tool (as opposed to simply a convenient way to apply our existing tools) many of the other data related fields would not have needed to exist. They would have been part of our field.

Indeed. But it is never too late to change course. Do you need to make the full leap described above next semester? No. Start slowly. Remove 10 hours of time spent on math and add 10 hours more on programming.

Friedman (2001) also suggested that “We may have to moderate our romance with mathematics.” There is no “may” about it. To make room for computation, we must ditch something. The most obvious something is mathematics. We should stop using mathematics beyond algebra in our introductory courses. No question on a problem set or exam should require the use of a formula. We need radical surgery if our courses are to have the causal effects which our students most want. My message to instructors: Stop using mathematics, despite your fond memories of how it helped you in the past, and, from this point forward, use only computers.
Kill the math and let the introductory course be born.

Acknowledgment: I thank Joe Blitzstein, Mike Parzen and Liberty Vittert for useful discussions. Special thanks to three anonymous reviewers and to an associate editor of the Journal of Statistics Education for their comments, and to an anonymous reviewer and an editor at the Harvard Data Science Review.

References
Aliaga, M., Cobb, G., Cuff, C., Garfield, J., Gould, R., Lock, R., Moore, T., Rossman, A., Stephenson, B., Utts, J., Velleman, P., and Witmer, J. (2005), Guidelines for assessment and instruction in statistics education (GAISE): College Report, American Statistical Association.
Baumer, B., Çetinkaya-Rundel, M., Bray, A., Loi, L., and Horton, N. J. (2014), “R markdown: Integrating a reproducible analysis tool into introductory statistics,” Technology Innovations in Statistics Education, University of California: Berkeley Electronic Press, 8.
Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., and Tackett, M. (2020), “Implementing version control with Git as a learning objective in statistics courses.”
Bray, A., Çetinkaya-Rundel, M., and Stangl, D. (2014), “Five concrete reasons your students should be learning to analyze data in the reproducible paradigm,” Chance, Abingdon: Taylor & Francis Ltd., 27, 53.
Bryan, J. (2018), “Excuse me, do you have a moment to talk about version control?” The American Statistician, Taylor & Francis, 72, 20.
Carson, M. A., and Basiliko, N. (2016), “Approaches to R education in Canadian universities,” F1000 Research, England: Faculty of 1000 Ltd, 5, 2802.
Cobb, G. (1992), “Heeding the call for change: Suggestions for curricular action,” Teaching Statistics, 22, 3–43.
Cobb, G. (2007), “The introductory statistics course: A Ptolemaic curriculum?” University of California: eScholarship.
Cobb, G. (2015), “Mere renovation is too little too late: We need to rethink our undergraduate curriculum from the ground up,” The American Statistician, Taylor & Francis, 69, 266–282. https://doi.org/10.1080/00031305.2015.1093029.
Çetinkaya-Rundel, M., and Bray, A. (2012), “Integrating R into introductory statistics.” Joint Statistical Meetings.
Dallimore, E. J., Hertenstein, J. H., and Platt, M. B. (2012), “Impact of cold-calling on student voluntary participation,” Journal of Management Education, Los Angeles, CA: SAGE Publications, 37, 305–341.
De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., Bryant, L., Cheng, L. Z., Francis, A., Gould, R., Kim, A. Y., Kretchmar, M., Lu, Q., Moskol, A., Nolan, D., Pelayo, R., Raleigh, S., Sethi, R. J., Sondjaja, M., Tiruviluamala, N., Uhlig, P. X., Washington, T. M., Wesley, C. L., White, D., and Ye, P. (2017), “Curriculum guidelines for undergraduate programs in data science,” Annual Review of Statistics and Its Application, 4, 15–30. https://doi.org/10.1146/annurev-statistics-060116-053930.
De Veaux, R., Velleman, P., and Bock, D. (2018), Intro stats, Boston: Pearson.
Ernst, M. D. (2004), “Permutation methods: A basis for exact inference,” Statistical Science, Institute of Mathematical Statistics, 19, 676–685.
Fiksel, J., Jager, L. R., Hardin, J. S., and Taub, M. A. (2019), “Using GitHub Classroom to teach statistics,” Journal of Statistics Education, Taylor & Francis, 27, 110–119.
Friedman, J. H. (2001), “The role of statistics in the data revolution?” International Statistical Review, Wiley, 69, 5–10.
Hardin, J., Hoerl, R., Horton, N. J., Nolan, D., Baumer, B., Hall-Holt, O., Murrell, P., Peng, R., Roback, P., Temple Lang, D., and Ward, M. D. (2015), “Data science in statistics curricula: Preparing students to ‘Think with Data’,” The American Statistician, Taylor & Francis, 69, 343–353. https://doi.org/10.1080/00031305.2015.1077729.
Hildreth, L. A., Robison-Cox, J., and Schmidt, J. (2018), “Comparing student success and understanding in introductory statistics under consensus and simulation-based curricula,” Statistics Education Research Journal, 17, 103–120.
Horton, N. J. (2015), “Challenges and opportunities for statistics and statistical education: Looking back, looking forward,” The American Statistician, Taylor & Francis, 69, 138–145. https://doi.org/10.1080/00031305.2015.1032435.
Horton, N. J., and Hardin, J. S. (2015), “Teaching the next generation of statistics students to ‘Think with Data’: Special issue on statistics and the undergraduate curriculum,” The American Statistician, Taylor & Francis, 69, 259–265. https://doi.org/10.1080/00031305.2015.1094283.
Ismay, C., and Kim, A. Y. (2019), Statistical inference via data science: A ModernDive into R and the Tidyverse, Chapman & Hall/CRC The R Series, CRC Press.
Kane, D. (2020a), “Syllabus for Gov 1005: Data,” Zenodo. https://doi.org/10.5281/zenodo.3766268.
Kane, D. (2020b), Preceptor’s Primer for Bayesian Data Science, Zenodo. https://doi.org/10.5281/zenodo.3766374.
Kaplan, D. (2020), “StatPREP: Living in interesting times for introductory statistics education,” Amstat News, American Statistical Association, 36.
Ledolter, J. (1995), “Projects in introductory statistics courses,” The American Statistician, Taylor & Francis Group, 49, 364–367.
Lemov, D. (2015), Teach like a champion 2.0: 62 techniques that put students on the path to college, San Francisco: Jossey-Bass.
Long, J. D., and Turner, D. (2020), “Applied R in the classroom,” Australian Economic Review, Wiley Subscription Services, Inc, 53, 139–157.
Mair, P. (2016), “Thou shalt be reproducible! A technology perspective,” Frontiers in Psychology, 7, 1079. https://doi.org/10.3389/fpsyg.2016.01079.
Nielson, P. L., Bean, N. W. B., and Larsen, R. A. A. (2018), “The impact of a flipped classroom model of learning on a large undergraduate statistics class,” Statistics Education Research Journal, 17, 121–140.
Nolan, D., and Temple Lang, D. (2010), “Computing in the statistics curricula,” The American Statistician, Taylor & Francis, 64, 97–107. https://doi.org/10.1198/tast.2010.09132.
R Core Team (2019), R: A language and environment for statistical computing, Vienna, Austria: R Foundation for Statistical Computing.
Ramirez, C. A. (2009), FERPA clear and simple: The college professional’s guide to compliance, The Jossey-Bass higher and adult education series, San Francisco: Jossey-Bass.
Rubin, D. B. (1974), “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology, American Psychological Association, 66, 688–701.
Rumsey, D. J. (2002), “Statistical literacy as a goal for introductory statistics courses,” Journal of Statistics Education, Taylor & Francis, 10. https://doi.org/10.1080/10691898.2002.11910678.
Silva, H. A. da, and Moura, A. S. (2020), “Teaching introductory statistical classes in medical schools using RStudio and R statistical language: Evaluating technology acceptance and change in attitude toward statistics,” Journal of Statistics Education, Taylor & Francis, 0, 1–8. https://doi.org/10.1080/10691898.2020.1773354.
Wardrop, R. L. (1995),