Integrating computing in the statistics and data science curriculum: Creative structures, novel skills and habits, and ways to teach computational thinking
Nicholas J. Horton, Amherst College
Johanna S. Hardin, Pomona College
December 23, 2020
Abstract
Nolan and Temple Lang (2010) argued for the fundamental role of computing in the statistics curriculum. In the intervening decade the statistics education community has acknowledged that computational skills are as important to statistics and data science practice as mathematics. There remains a notable gap, however, between our intentions and our actions. In this special issue of the Journal of Statistics and Data Science Education we have assembled a collection of papers that (1) suggest creative structures to integrate computing, (2) describe novel data science skills and habits, and (3) propose ways to teach computational thinking. We believe that it is critical for the community to redouble its efforts to embrace sophisticated computing in the statistics and data science curriculum. We hope that these papers provide useful guidance for the community to move these efforts forward.
Keywords: statistical computing, algorithmic thinking, education, data acumen, statistical analysis, workflow

Introduction
In their 2010 paper “Computing in the Statistics Curriculum”, Deborah Nolan and Duncan Temple Lang noted that “computational literacy and programming are as fundamental to statistical practice and research as mathematics” and that “these changes necessitate re-evaluation of the training and education practices in statistics” (Nolan & Temple Lang 2010). We couldn’t agree more about the fundamental role of computing and the need for change at all educational levels. Over the last decade we’ve seen the role of computing in the statistics curriculum change and grow. The tools have become better, computing is now more established in almost every classroom, and, arguably most importantly, the development and success of modern statistics have been enhanced by ideas of computational thinking.

Before introducing the articles in this special issue, we reflect on the questions originally posed by Nolan and Temple Lang:

1. When they graduate, what ought our students be able to do computationally, and are we preparing them adequately in this regard?
2. Do we provide students the essential skills needed to engage in statistical problem solving and keep abreast of new technologies as they evolve?
3. Do our students build the confidence needed to overcome computational challenges to, for example, reliably design and run a synthetic experiment or carry out a comprehensive data analysis?
4. Overall, are we doing a good job preparing students who are ready to engage in and succeed at statistical inquiry?

Nolan and Temple Lang also provided a damning critique of the status quo at the time:

Many statisticians advocate--or at least practice--the approach in which students are told to learn how to program by themselves, from each other, or from their teaching assistant in a two-week “crash course” in basic syntax at the start of a course. Let us reflect on how effective this approach has been. Can our students compute confidently, reliably, and efficiently? We find that this do-it-yourself ‘lite’ approach sends a strong signal that the material is not of intellectual importance relative to the material covered in lectures. In addition, students pick up bad habits, misunderstandings, and, more importantly, the wrong concepts. They learn just enough to get what they need done, but they do not learn the simple ways to do things nor take the time to abstract what they have learned and assimilate these generalities. Their initial knowledge shapes the way they think in the future and typically severely limits them, making some tasks impossible. (page 100)

We concur that such an approach to computation is insufficient and at times counterproductive.

What has happened in the intervening decade? We believe that there is a growing consensus on the importance of computational literacy and computing in the statistics and data science curriculum. The American Statistical Association’s updated Guidelines for Undergraduate Programs in Statistics (American Statistical Association 2014), the revised GAISE (Guidelines for Assessment and Instruction in Statistics Education) College report (American Statistical Association 2016), and the National Academies of Science, Engineering, and Medicine’s consensus study “Data Science for Undergraduates: Opportunities and Options” (National Academies of Science, Engineering, and Medicine 2018) provide detailed rationales for the fundamental role computing plays in statistical thinking. More pointedly, George Cobb (2015) noted a convergence of mathematics, computation, and context in statistics education and called for a deep rethinking of the curriculum from the ground up. The “Mere Renovation is Too Little Too Late” paper sparked 19 spirited responses and a provocative rejoinder (which expanded on the “tear-down” metaphor) that challenged the community in a number of fundamental ways (Various 2015).

We envisioned this special issue as a way both to highlight innovations and approaches that have helped move the profession forward and to identify places where future work is needed. Many of these papers work to answer the questions posed by Nolan and Temple Lang, as well as ones they had not anticipated in 2010.

The articles included in the special issue can be organized into three non-mutually exclusive clusters that take different approaches to the questions laid out by Nolan and Temple Lang. The first approach features creative structures for changing how we integrate computing into the learning of statistics. The second focuses on novel or technical data science skills and habits. The third reflects that, more and more, statistics educators are embracing and teaching ideas of computational thinking.

Creative structures
Restructuring how we conceive of a syllabus and how we teach particular material is never a small task. However, as different individuals modernize their own courses, we can all learn from their experiences. Both Çetinkaya-Rundel & Ellison (2021) and Donoghue et al. (2021) describe creative and modern data science courses that fold aspects of statistical inference together with vital computational skills. Schwab-McCoy et al. (2021) report on a study describing the emerging consensus on the elements of a data science course. Kim & Henke (2021) present some of the technical aspects vital to getting a solid computational course up and running. Burckhardt et al. (2021) describe a less technical approach that uses the suite of materials implemented in their Integrated Statistics Learning Environment (ISLE). Gundlach & Ward (2021) present an immersive data science living and learning community. Finally, Theobold et al. (2021) describe an alternative to course-based learning: a series of workshops.
Novel or technical data science skills and habits
The world of data science is rapidly changing, and it can be incredibly difficult to keep up. Many of the papers in this special issue focus on new, important, and exciting skills and tools that students need if they want to contribute in today’s data-centric world. Boehm & Hanlon (2021) and Çetinkaya-Rundel & Ellison (2021) discuss the full cycle of a data science project. Kim & Hardin (2021) take it one step further and describe the importance of iterating on the full cycle. A few specific skills are laid out in detail: Dogucu & Çetinkaya-Rundel (2021) describe web scraping, and Adams et al. (2021) explore techniques for working with multivariate data. Beckman et al. (2021) compare ways of incorporating Git into the statistics classroom so that students have the skills to hit the ground running in jobs and in their own data projects, reinforcing the value of reproducible workflows as a foundation for reproducible research.

Computational thinking
The last approach may be the most difficult for statistics and data science educators to embrace and implement in their own classes. The value of bringing in ideas of software engineering or computational thinking is that they help create a mindset that empowers students to think both statistically and computationally at the same time. Wing (2006) describes how computing can impact a field, for example: “Computer science’s contribution to biology goes beyond the ability to search through vast amounts of sequence data looking for patterns. The hope is that data structures and algorithms--our computational abstractions and methods--can represent the structure of proteins in ways that elucidate their function. Computational biology is changing the way biologists think.”

As a discipline, we are embracing the many ways that computing is changing how statisticians think. Woodard & Lee (2021) report on a study in which students talked through their thought processes as they performed computational tasks; their results show, somewhat unsurprisingly, that computing is difficult and not intuitive. Schwab-McCoy et al. (2021) describe the challenge in front of us to teach computational thinking effectively. Donoghue et al. (2021), who describe debugging, and Theobold et al. (2021), who discuss the teaching of iteration (a fundamental component of algorithmic thinking), speak to integrating small pieces of computational thinking within the data science curriculum. Reinhart & Genovese (2021) describe an entire course that is a cross between software engineering and statistics, providing insight into the types of skills that many statisticians (at all levels) need for success.
What would Deb and Duncan say?
As we worked on the special issue, we thought that there would be value in asking the authors of Nolan & Temple Lang (2010) to share their thoughts about the paper, including what they saw as most valuable, and to peer into the future. Their incisive and provocative retrospective leads off the special issue (Nolan & Temple Lang 2021).

Conclusion
The articles in the special issue encourage us to redouble our efforts to embrace computing in the classroom, to constantly push ourselves to learn more tools, and to let computational thinking make our own work better. We believe that the leading thinkers of the next decade will be those who seamlessly knit together tools from both statistics and computing, and that how we think about statistics will be informed by complementary computational thinking. To forge ahead, we need to cultivate computing foundations throughout the statistics paradigm. It is our hope that the papers in this special issue initiate a new way of thinking for you and your students.

References
Adams, B., Baller, D., Jonas, B., Joseph, A.-C. & Cummiskey, K. (2021), ‘Computational skills for multivariable thinking in introductory statistics’, Journal of Statistics and Data Science Education (1), 1–21. URL: https://doi.org/10.1080/10691898.2020.1852139
American Statistical Association (2014), ‘Curriculum guidelines for undergraduate programs in statistical science’. Accessed: 2020-12-20.
URL:
American Statistical Association (2016), ‘Guidelines for assessment and instruction in statistics education (GAISE) revised college report’. Accessed: 2020-12-20.
URL:
Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J. & Tackett, M. (2021), ‘Implementing version control with Git and GitHub as a learning objective in statistics and data science courses’, Journal of Statistics and Data Science Education (1), 1–35. URL: https://doi.org/10.1080/10691898.2020.1848485
Boehm, F. J. & Hanlon, B. M. (2021), ‘What is happening on Twitter? A framework for student research projects with tweets’, Journal of Statistics and Data Science Education (0), 1–XX. URL: https://doi.org/10.1080/10691898.2020.1848486
Burckhardt, P., Nugent, R. & Genovese, C. R. (2021), ‘Teaching statistical concepts and modern data analysis with a computing-integrated learning environment’, Journal of Statistics and Data Science Education (1), 1–28. URL: https://doi.org/10.1080/10691898.2020.1854637
Çetinkaya-Rundel, M. & Ellison, V. (2021), ‘A fresh look at introductory data science’, Journal of Statistics and Data Science Education (0), 1–11. URL: https://doi.org/10.1080/10691898.2020.1804497
Cobb, G. (2015), ‘Mere renovation is too little too late: We need to rethink our undergraduate curriculum from the ground up’, The American Statistician (4), 266–282.
Dogucu, M. & Çetinkaya-Rundel, M. (2021), ‘Web scraping in the statistics and data science curriculum: Challenges and opportunities’, Journal of Statistics and Data Science Education (0), 1–11. URL: https://doi.org/10.1080/10691898.2020.1787116
Donoghue, T., Voytek, B. & Ellis, S. E. (2021), ‘Teaching creative and practical data science at scale’, Journal of Statistics and Data Science Education (1), 1–22. URL: https://doi.org/10.1080/10691898.2020.1860725
Gundlach, E. & Ward, M. D. (2021), ‘The data mine: Enabling data science across the curriculum’, Journal of Statistics and Data Science Education (1), 1–14. URL: https://doi.org/10.1080/10691898.2020.1848484
Kim, A. Y. & Hardin, J. (2021), ‘“Playing the whole game”: A data collection and analysis exercise with Google Calendar’, Journal of Statistics and Data Science Education (0), 1–10. URL: https://doi.org/10.1080/10691898.2020.1799728
Kim, B. & Henke, G. (2021), ‘Easy-to-use cloud computing for teaching data science’, Journal of Statistics and Data Science Education (1), 1–18. URL: https://doi.org/10.1080/10691898.2020.1860726
National Academies of Science, Engineering, and Medicine (2018), Data Science for Undergraduates: Opportunities and Options. Accessed: 2020-12-20. URL: https://nas.edu/envisioningds
Nolan, D. A. & Temple Lang, D. (2010), ‘Computing in the statistics curriculum’, The American Statistician (2), 97–107.
Nolan, D. A. & Temple Lang, D. (2021), ‘Computing in the statistics curricula: A 10-year retrospective’, Journal of Statistics and Data Science Education (0), 1–XX. URL: https://doi.org/10.1080/10691898.2020.1862609
Reinhart, A. & Genovese, C. R. (2021), ‘Expanding the scope of statistical computing: Training statisticians to be software engineers’, Journal of Statistics and Data Science Education (1), 1–23. URL: https://doi.org/10.1080/10691898.2020.1845109
Schwab-McCoy, A., Baker, C. M. & Gasper, R. E. (2021), ‘Data science in 2020: Computing, curricula, and challenges for the next 10 years’, Journal of Statistics and Data Science Education (1), 1–17. URL: https://doi.org/10.1080/10691898.2020.1851159
Theobold, A. S., Hancock, S. A. & Mannheimer, S. (2021), ‘Designing data science workshops for data-intensive environmental science research’, Journal of Statistics and Data Science Education (1), 1–31. URL: https://doi.org/10.1080/10691898.2020.1854636
Various (2015), ‘Discussion papers: Mere renovation is too little too late: We need to rethink our undergraduate curriculum from the ground up’, The American Statistician (4). URL: https://nhorton.people.amherst.edu/mererenovation
Wing, J. M. (2006), ‘Computational thinking’, Communications of the ACM (3).
Woodard, V. & Lee, H. (2021), ‘How students use statistical computing in problem solving’, Journal of Statistics and Data Science Education (1), 1–18. URL: https://doi.org/10.1080/10691898.2020.1847007