Explorations in Statistics Research: An Approach to Expose Undergraduates to Authentic Data Analysis
Deborah Nolan
Berkeley, CA 94720-3860 ∗

Duncan Temple Lang
Davis, CA 95616 †

August 25, 2015

∗ Deborah Nolan is a Professor, Department of Statistics, University of California, 367 Evans Hall MC 3860, Berkeley CA 94720-3860 (email: deborah [email protected] ).

† Duncan Temple Lang is Director of the campus Data Sciences Initiative and a Professor, Department of Statistics, University of California, 4210 Mathematical Sciences Building, One Shield Ave, Davis, CA 95616 (email: [email protected] ). The authors gratefully acknowledge support from the National Science Foundation grant DMS-0840001.

Abstract

The Explorations in Statistics Research workshop is a one-week NSF-funded summer program that introduces undergraduate students to current research problems in applied statistics. The goal of the workshop is to expose students to exciting, modern applied statistical research and practice, with the ultimate aim of interesting them in seeking more training in statistics at the undergraduate and graduate levels. The program is explicitly designed to engage students in the connections between authentic domain problems and the statistical ideas and approaches needed to address these problems, which is an important aspect of statistical thinking that is difficult to teach and sometimes lacking in our methodological courses and programs. Over the past nine years, we ran the workshop six times and a similar program in the sciences two times. We describe the program, summarize feedback from participants, and identify the key features to its success. We abstract these features and provide a set of recommendations for how faculty can incorporate important elements into their regular courses.

KEY WORDS: statistical problem solving, visualization, data analysis pipeline, pedagogy, co-curricular activity.
The Explorations in Statistics Research (ESR) workshop is a week-long summer program where undergraduates work closely with statisticians and graduate students to analyze data from important research problems at the frontier of applied statistics. The aim of the program is to give students an understanding of and experience with the role of statistics in scientific discovery with the goal of encouraging them to pursue advanced studies in statistical science. The students are guided through the process of using statistics to address interesting scientific and social questions. In the workshop, they experience how statisticians work and reason about an authentic problem in a science or industry domain.

This one-week program attempts to bridge some of the gap between the teaching of statistics and the modern practice of statistics by exposing students to the interplay between a question in a scientific area and the way statistics can address the question. The workshop gives students exposure to how a statistician frames a research question in statistical terms, and they gain first-hand experience with how to explore relevant data, understand the statistical issues, and use statistical methods to address the scientific question. Speed (1986) describes the importance of this aspect of our field and why it should be a central part of statistics education:

“The interplay between questions, answers and statistics seems to me to be something which should interest teachers of statistics, for if students have a good appreciation of this interplay, they will have learned some statistical thinking, not just some statistical methods. Furthermore, I believe that a good understanding of this interplay can help resolve many of the difficulties commonly encountered in making inferences from data.”

The importance of this interplay in educating our students features prominently in the American Statistical Association’s 2014 Curriculum Guidelines for Undergraduate Programs in Statistical Science (ASA 2014).
There, the first guiding principle is the “scientific method and its relation to the statistical problem solving cycle,” and the guidelines state (p. 6):

“All too often, undergraduate statistics majors are handed a “canned” data set and told to analyze it using the methods currently being studied. ... Students need practice developing a unified approach to statistical analysis and integrating multiple methods in an iterative manner. ... Students need to see that the discipline of statistics is more than a collection of unrelated tools (or methods); it is a general approach to problem solving using data.”

In this paper, we present the ESR workshop and draw lessons from our experience with the program, which we hope provide insights and ideas for addressing this central aspect of statistics training. The ESR has evolved over the years as we experimented with different approaches and gathered student feedback. ESR began in 2005, organized by Hansen, Nolan, and Temple Lang, and was offered six times between 2005 and 2012. Altogether, 21 researchers worked with a total of 146 undergraduate participants and approximately 45 graduate students, teaching faculty, and organizers. We first describe the core of the program, including a description of how one researcher engaged students in her research. Next, we review the impact of the ESR as reported by the undergraduate and graduate participants, summarize the key elements of success, and present ideas for how these features can be adopted in the statistics major. These proposed changes to what and how we teach will equip our students with essential skills in statistical thinking with data.
The Explorations in Statistics Research workshop offers a strong scientific program led by research statisticians working at the frontiers of modern applications in science, policy, industry, political science, etc. It exposes undergraduates to research and the associated data analysis process that the students often do not experience until a capstone project at the end of their major, past the point when they are deciding on a major or whether to apply to graduate school. The core of the one-week workshop consists of three two-day data analysis projects. Each two-day topic is led by a researcher who organizes the activities to engage the students on problems related to his or her current work. (Table 1 lists the topics, researchers, and their institutions for all offerings of ESR.) The researcher provides data for analysis and prepares short talks and computer investigations where the students are introduced to the material in stages. Through hands-on data analysis, students experience statistical research, from exploratory data analysis (EDA) to modeling to conclusions. Throughout the workshop, students have multiple opportunities to converse with the researcher about their insights and ideas, both collectively and individually. The activities carefully build upon each other, beginning with simple EDA and advancing to the application of modern statistical methods. The three-to-one student-to-“teacher” ratio enables the students to be creative without getting bogged down by computational issues of implementation. The workshop emulates a research/work environment involving group work and frequent presentation and discussion of ideas.

In general, the topic begins with the researcher providing a high-level description of his or her

[Table 1: Year, Topic, Researcher, Affiliation]
Updating a Web Cache, Carrie Grimes, Google Research.
In an introductory session on the morning of the first day, Grimes described how search engines keep their database of Web pages current. The main issue she wanted the students to focus on was determining how often a Web site should be revisited by the search engine to ensure that the index is up-to-date. Google wants the index to always have the most up-to-date version of the page. However, it’s not possible to check if each page has changed every second, minute or even hour. This is the crux of the problem on which Grimes focused. She also introduced several additional issues that arise when trying to index the World Wide Web, which helped put the research question into a larger context. At the end of her introduction, Grimes described the data that had been collected to study this problem: thousands of Web sites were visited at regular intervals for 12 months, and for each site there is a record of the visits on which the site had changed from the prior visit.

The students spent the rest of the morning becoming familiar with the data, keeping in mind that the rate of change for a Web site was the main interest. The data were provided in two different formats: one was a ragged array where for each Web site there was a set of times when a change was observed; and the other format was a data frame with a row corresponding to each detected change to a Web page, with a value for the time of detected change. Advance preparation with Grimes led us to the decision to provide these two distinct formats for the data so the students had more freedom in thinking about the problem in different ways without being constrained to follow a single path of analysis.

The students self-organized into groups of two and three and worked with these representations of the data for the remainder of the morning. Before lunch they reported to everyone on their initial explorations.
Several groups chose one of their plots to present and were given one minute to describe an interesting feature of the data that was revealed in the figure. The students had taken several different approaches and made many interesting observations. Grimes led a conversation with each group as they presented, and she assisted in uncovering various relevant features of the data. Students in the audience were invited to contribute their observations, confirm these findings, ask questions, or describe different results. In this way, the students were guided to behave like researchers, albeit in an accelerated fashion.

Students noticed many natural categories of Web site updating, e.g., many Web sites are created once and never change, while others are updated at regular intervals, and others change frequently but without any obvious pattern. With this new knowledge, the workshop reconvened after lunch and students continued their investigations. This time they compared two groups of URLs, one that updated frequently and the other slowly. The goal was to investigate how similar they were with respect to the pattern of updates. This included a discussion led by Grimes on the exponential and Poisson distributions and their connection to the problem, i.e., changes to a site may follow a Poisson scatter.

Throughout the day, in addition to exploring their ideas in small group conversations with Grimes, the 27 undergraduates were assisted by seven graduate students and three PhD statisticians. With such a low student-teacher ratio, students were able to quickly convert their ideas into working code and convey their discoveries to others. The next morning, the students recapped their work from the previous afternoon and presented their findings. After this debrief, Grimes used the blackboard to uncover, with the students’ insights from their analysis, a problem with censoring that is inherent in the data. Web sites might be updated multiple times between visits but only one change is observed.
Given the students’ familiarity with the data, they understood the reasons for adopting a more complex estimator of the rate of change based on the MLE (Grimes et al. 2008) even though many did not have the related theoretical background.

This iterative process with the researcher was highly choreographed, yet still open to creative ideas. In this way, the students were figuring things out on their own, uncovering important issues and discussing them with Grimes. As the second day progressed, students were increasingly independent and branched off in different directions. For example, with the help of functions that made it easy to simulate the censored process, some students investigated and compared properties of the naïve estimator that ignores the censoring and the improved MLE. Other students explored via a simulation study how to choose the optimal time between updates. When groups prepared and presented their findings, visualization was again the main vehicle to present their results. Since they had not all been working on the same aspect of the problem, they were able to contribute different pieces to the story.

Grimes wrapped up the two days with a final presentation. She described a Bayesian method that she had developed to decide how often to crawl each URL and gave the students a sense of what is possible with advanced study of statistics.
At this time, she also spoke about her path from an undergraduate major in anthropology and archaeology, where a semester abroad in Guatemala sparked an interest in quantitative methods for dealing with disparate data, to graduate school in statistics where she worked on nonlinear dimensionality reduction problems, to what it is like to work as a research statistician at Google.

Although the workshop is short, students are able to engage in the creative research process and experience the excitement of making “independent” discoveries using modern statistics methods and the power of practical and computational training (albeit mostly guided by the researcher). By engaging students in the modern practice of statistical research, we hope to inspire them to seek more training and other research experiences at the undergraduate and graduate level.
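The censoring issue in the Web cache project can be illustrated with a small simulation. The sketch below is not the estimator of Grimes et al. (2008) and not the workshop's R code. It is a minimal Python illustration under simplifying assumptions: every site changes according to a Poisson process with the same rate, visits are equally spaced, and a visit reveals only whether at least one change occurred since the previous visit. Under those assumptions the probability of detecting a change at a visit is 1 - exp(-rate * interval), which gives a closed-form MLE. All function names, site labels, and parameter values are hypothetical. The sketch also builds the two data layouts described above, a ragged structure per site and a long table with one row per detected change.

```python
import math
import random


def simulate_change_times(rate, horizon, rng):
    """Poisson process on [0, horizon): exponential gaps with mean 1/rate."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def censor(change_times, interval):
    """1-based visit numbers at which the crawler sees a change.

    Several changes between two visits collapse into one detection.
    """
    return sorted({int(t // interval) + 1 for t in change_times})


def naive_rate(n_changed, n_visits, interval):
    """Detected changes per unit time, ignoring the censoring."""
    return n_changed / (n_visits * interval)


def censored_mle_rate(n_changed, n_visits, interval):
    """MLE of the rate when each visit only reveals 'changed or not'."""
    if n_changed == n_visits:
        # Every visit saw a change: the likelihood has no finite maximum.
        return math.inf
    return -math.log(1.0 - n_changed / n_visits) / interval


def run_demo(true_rate=2.0, interval=1.0, n_visits=200, n_sites=50, seed=0):
    """Simulate hypothetical sites and compare the two estimators."""
    rng = random.Random(seed)
    horizon = n_visits * interval
    # Ragged format: one list of detection visits per (hypothetical) site.
    ragged = {
        f"site{i}": censor(simulate_change_times(true_rate, horizon, rng), interval)
        for i in range(n_sites)
    }
    # Long format: one (site, visit) row per detected change.
    long_rows = [(site, v) for site, visits in ragged.items() for v in visits]
    total_visits = n_sites * n_visits
    return {
        "naive": naive_rate(len(long_rows), total_visits, interval),
        "mle": censored_mle_rate(len(long_rows), total_visits, interval),
        "n_rows": len(long_rows),
    }


if __name__ == "__main__":
    print(run_demo())
```

Because at most one change can be detected per visit, the naïve estimate can never exceed one change per visit interval, however fast the site actually changes. The censored MLE does not have this ceiling, which is one way to see why ignoring the censoring underestimates the rate for frequently updated sites.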
Participants.
The workshop brings together about 37 participants each year from around the country; typically this includes 25 undergraduate students, five graduate students, three researcher-presenters, two organizers, and two additional research statisticians. The undergraduate students come from a broad spectrum of institutions and academic preparation. Across the six workshops, 146 undergraduates participated from 78 institutions and 27 states. Additionally, more than half (77) were women. Once established, the program typically had 150 to 250 applications annually and, of those admitted to the program, the acceptance rate was about 90%.

In the admissions process, we looked for a balance of students in terms of computing background, statistics background, and institution. Some students were statistics majors who had taken advanced courses in their major and others were majors in other fields who had taken only one or two courses but saw statistics as an important asset to their future studies. No computing skills were required, but we did ensure the group consisted of students with some experience with statistical software.

Typically five graduate students participate in the workshop. They come from different universities and bring different perspectives and experiences about graduate school. In each of the two most recent offerings, one of the undergraduate participants from the previous year was invited to return as a “graduate student” assistant.

Each year the lead researchers include statisticians from academia and research labs in government and industry. The latter bring a valuable non-academic perspective, both in the nature of the applications they bring and also on career options.
The main criteria for selection are that the researcher works closely with another discipline and has tremendous enthusiasm for their work, excellent communication skills, and a flexible teaching style.

In addition to the organizers, we have routinely invited other researchers to join the workshop for two or three days. These visitors have included Joe Blitzstein (Harvard University), Di Cook (Iowa State University), Nicholas Horton (Amherst College), David James (Bell Labs), and Deborah Swayne (Shannon Research Labs, AT&T). Aside from assisting undergraduates, they also have given short lectures on statistical topics and software demos. They also interact with the students via panels and informally, and they provide different perspectives on career paths and experiences.
Program length.
The brevity of the program has many benefits. High-profile researchers are willing to volunteer for the program and dedicate time to prepare for and participate in the workshop. Additionally, the ESR exposes a large number of undergraduates (typically 25 each year) to statistics research, compared to typical REUs. Moreover, given the size and length of the program, we have been more willing to take risks in admitting students, with the goal of having the biggest impact by including students who we think would gain a lot from the program. Reciprocally, for the students, the low commitment and opportunity cost for them to spend one week learning about statistics research means that they are willing to take the risk of attending the workshop. If they discover that the field is not right for them, then they have not dedicated their entire summer to the program. Many students report that they are able to attend the workshop in addition to participating in other summer programs, jobs, and courses. Lastly, the participants receive roughly the same amount of information and advice about graduate school that they would receive in a longer program.

Without a doubt, in a one-week program, students do not learn as much about specific statistical methods or get the same extensive training in computing or visualization as they would in a 6- or 10-week program. However, this is not our goal. We simply want participants to see the scope and importance of statistics in a variety of contexts and to experience the challenge and excitement of addressing real-world questions with modern data analysis so they might be encouraged to take the next step in studying statistics.
Computing.
The one-week program begins with a one-day, fast-paced introduction to the statistical programming environment R (R Development Core Team 2012). This introduction is both general and carefully crafted to prepare the students for the needs of the following days. (See for the reference materials supplied to the students.) During this training we take the opportunity to explore interesting data sets and teach visualization. We have found that few students have received training in visualization, and this topic maintains the interest of those who are new to R and those who have extensive experience with it. Additionally, we attempt to provide differentiated instruction so students can find practice problems that are appropriate to their level. During the two-day projects themselves, the graduate students assist with the computing details so that computing is not a barrier to the students’ creative expression, yet the students are able to appreciate the power of and need for computational skills. Experienced students also typically learn new things, such as more sophisticated graphical functionality and computational approaches.
Information about graduate school.
An important goal of ESR is to encourage students to consider graduate studies in statistical science and to provide them with information about how to apply to graduate school, what graduate school is like, and career opportunities in statistical science. We organize three panel sessions on topics related to graduate school and careers. The first panel is an information session and group discussion on the process of applying to graduate school. Students receive general advice and materials on how to write a statement of purpose, who to ask for a letter of recommendation, funding opportunities, how to get the most out of a site visit, etc. More specific advice is also offered, based on faculty experience on graduate admissions committees, about preparing for graduate school, what graduate programs look for in an application, and also how to identify programs that are a good fit for each student. The second panel involves graduate students discussing their experience and perspective about the difference between life as an undergraduate versus graduate student, the process of selecting a graduate program and a PhD advisor, and student “community.” The final panel session includes statistical researchers working outside of a university setting, e.g., at industrial and national labs. The panelists offer their views on these non-university careers. Each of these panels generates many questions from and engaged discussions with the students and often provides eye-opening information to some about the possibilities of graduate school. Additionally, there are many informal opportunities over breaks and meals for students to receive individual advice on preparing for, applying to, and selecting a graduate school.

Variants.
We have experimented with a few variations on the presentation of three two-day projects. For example, the first ESR included five topics, each for a single day. We found that one day did not give the students enough time to familiarize themselves with the problem and data. As a result, the students were mechanically solving the problem without having an opportunity to think of approaches themselves and understand the implications of their discoveries and contribute ideas. Additionally, the context switching from one day to the next was mentally exhausting for them. We have had more success with two other variations.

Most recently, we had only two topics for the week. Instead of a third topic, we included a project where students worked in groups on one of six data sets that we provided. These were introduced on the first day of the workshop, and students had time to explore them during the R tutorial. They continued to work on the project during the second day, exploring interesting features of the data. Then, the formal two-day sessions began on the third day. This schedule gave the students an opportunity to further hone their R skills in preparation for the research topics. On the last day, they completed their analysis and presented their findings. In evaluations, many students commented that they liked having their own separate project to work on. Others noted that the continuity of working on their project throughout the week made it very apparent how much their R skills had improved.

In 2011 and 2013, we also offered a science version of the ESR with Berkeley faculty leading the research topics. The format of these workshops was slightly different from the ESR. Here there was a single theme for the week, such as the carbon cycle and sensor networks. These workshops included other activities, such as a poster session and having students design experiments and collect data.
Like ESR, the common thread was working with data collected to address an important, current research problem.
REPORTED IMPACT
Each year we have carried out end-of-program evaluations. We present here a summary of student feedback about the program that focuses on students’ perceived benefits of the workshop. Overall, the students report that the material in ESR is very different from what they are exposed to in traditional coursework and that they left the program with a much better understanding of the role of statistics in scientific discovery. More specifically, students were asked what were the most valuable aspects of the workshop. Five main themes emerged from their responses. In decreasing order of mention these are: hands-on experience with real data; exposure to modern statistics research; gaining expertise in R; access to faculty and graduate students; and information about graduate school.

When asked what were the least favorable aspects of the workshop, 30% reported nothing was unfavorable. The rest listed issues that mainly fell into two areas. One related to the different levels of experience with R. Some students were frustrated with sitting through the introduction to the language and others wished there was more time for preparation. The other problem raised was the level of technical detail that certain topics required. For example, we have found that the extensive background material needed for topics such as genetics can be a barrier to understanding the research problem. We have found that it is important to get students working with the data quickly, making discoveries, and offering insights on their own. This way they have a more rewarding experience despite the severe time limitation.

Students were also asked what surprised them about the workshop. Three themes emerged from their answers to this question. One was the importance of computing to modern statistical applications.
As one student put it, he/she was surprised at “How important having good computing skills is for a statistician.” And another student added a related note that he/she was surprised at “How many ways one can approach statistics problems visually.” Another theme was the high quality of the speakers. The students were very appreciative of the dedication of the experts who shared their research problems with them. Also mentioned regularly by the students in response to this question was the group work and the supportive community created by the faculty and graduate students. The students enjoyed the collaborative, non-competitive environment.

Finally, students were also asked: If recommending this workshop to a fellow student, what reasons would you give to him or her to participate? And, What reservations, if any, would you express to him or her about participating? Below are representative reasons to participate: “The problems covered are a lot more interesting, mentally stimulating and applicable [than] what you see in classes.” “It will expand your knowledge of statistics and data analysis in a major way, both through the topics, and through collaboration with peers, grad students, and the amazing professors.” “Using real life data in engaging exercises.” “Insight on what sorts of cutting-edge research is being done in stats.” “To see how many applications statistics [has] and how it is not just a science but also an art.” “It is self-driven research unlike anything in a classroom”

As for the reservations they would express to someone, about one-third had no reservations and those that did made the following types of comments: “Make sure you know a little R coming in because it can really make a difference.” “The workshop will be valuable only if you put in a lot of effort during the breakout sessions.” “Be prepared for a full week with not a ton of free time.”

For a different perspective, we recently contacted the graduate student assistants from the past four offerings of the
program. We asked them to comment on the ways, if any, the ESR has influenced their teaching and on what other ways they have benefited from the program. Fourteen of the 15 people contacted responded. They commented on the benefits of experiencing an alternative approach to teaching that was more interactive and real-problem-oriented and on being exposed to researchers in other areas of statistics and seeing how they think about their research problems. It was clear from their responses that they felt they had participated in a very different teaching and learning environment than previously experienced, and that they benefitted from exposure to this environment. Below are representative comments from their evaluation:

“I think ESR definitely was a great experience for me to approach teaching from a more hands-on, open-ended perspective. Much of my previous teaching had been centered around set curriculum and going over pre-set problems, but what we did during the program helped me communicate with students and colleagues in a more creative and collaborative way - which encourages deeper thinking and discussion.” “I think one of the biggest impacts ESR had on my teaching is to recognize that while it is uncomfortable to me to give students open-ended problems, it is beneficial to their learning and it is exciting to see what directions they take the problems. The program also taught me that it is good to sometimes give students messy data.” “The program gave me an opportunity to interact with students with many different statistical backgrounds and research interests. As I now collaborate frequently with social scientists, I am finding my previous experience in ESR quite helpful in my current work.” “The program gave me experience to respond on the spot to all kinds of surprising students’ questions. ...
[it] made me better in articulating my ideas/questions and understanding what other people were thinking when solving problems.” “When I first went on the job market for an academic job at a liberal arts institution, most of my teaching experience involved lecturing in a large-classroom setting. Several hiring committees were intrigued by the ESR format and were happy I had experience facilitating an active learning environment.” “I learned new statistical concepts and application areas while I was TAing – making me more well-rounded, and better equipped to make contributions to problems outside my research area.”

In summary, the undergraduate and graduate students highly valued the experience of working with authentic data on current, relevant scientific problems in a collaborative open-ended environment. We too believe these are essential elements of the program, and in the next section, we summarize the key features of the ESR that we think are responsible for creating this experience, and we make recommendations for how to incorporate some of these features into the classroom so more students are exposed to authentic data analysis processes.
KEY FEATURES & RECOMMENDATIONS
Our teaching and courses have benefited tremendously from preparing, participating in, and experimenting with the ESR. We have found our own efforts to bring the ESR into our classroom have helped create a higher level of student engagement, interest, and aptitude. From our experiences with the various versions of the workshop and from student evaluations, we have identified several aspects of ESR that we think are particularly important to its success, and we provide a set of recommendations for ways to incorporate aspects of these key features into “regular” courses. These recommendations include both ideas for how individuals can change their courses and how we as a community of statistics faculty can bring about larger change.
1. The Research Problem.
For the ESR, we invite researchers who are known for their active engagement in a scientific application, their tremendous enthusiasm for their work, and exceptional communication skills. The researchers’ close connection to the application fosters enthusiasm for statistics among the students as they see the relevance of the field in solving important problems at the frontiers of science. Moreover, the approach that the researchers take to engage students in the creative process of data analysis follows a non-traditional teaching practice that is more akin to an investigatory process. In preparing for each ESR, we had the privilege to be in regular contact with the researchers in advance of the workshop. This preparatory work included reading the relevant papers describing the researchers’ work, exploring the data, and documenting our initial questions and thought process in this first exposure to the problem. We acted as students during this preparatory stage and our learning process helped inform and shape the teaching and learning experience during the ESR.

This experience aided us in developing an approach/philosophy for adapting these projects into case studies and assignments for our courses. These case studies are more focused on the scientific problem itself than those typically found in, e.g., DASL (DASL Project 2014) which generally aim at providing a brief example of a statistical method. Rather, they are more in line with the context-laden open-ended case studies in Nolan and Speed (2001). However, they contain greater details on the statistical analysis (different possible approaches and statistical issues) and on computing and visualization.
We have made some of these data and materials available for teaching advanced undergraduate courses in Nolan and Temple Lang (2015) and its accompanying Web site http://rdatasciencecases.org/ , and other materials for teaching introductory courses are on the Web at .

While it can be difficult for instructors to find or access realistic, cutting-edge problems and very time consuming to work through the details, especially without access to the researcher, we encourage instructors of statistics to cull problems from their own applied research or to collaborate with a local applied statistician or scientist to develop a case study and make it available to the statistics community. The great advantage to developing a local application is that there is the possibility of bringing in an expert, as with the ESR. For example, a local expert can be invited to a class meeting to introduce the problem and data and then invited back for a follow-up meeting to discuss student findings.
2. Visualization.
We have found that structuring the initial stages of analysis around visualization creates a level playing field for the students and quickly engages them with the data. With visualization, students can uncover important aspects of a problem without needing knowledge of advanced methods. Despite their varied backgrounds, all students typically find that through a visualization they can make a contribution that addresses the research problem. From there, students head in different directions, analyzing the data with more sophisticated statistical techniques depending on their preparation.

Exploratory visualization is a vital element of all data analyses that is rarely emphasized or explicitly taught in our courses. Often only a few simple types of visualizations are used in courses, such as histograms, box plots, and scatter plots, and little or no attention is paid to the principles of good graphics. Presentation graphics are important for making convincing arguments, exploratory graphics are important for informing a data analysis, and modern software tools have lowered the barrier to making rich, informative data visualizations. For these reasons, we argue that statistical graphics deserves a larger role in our curriculum. Importantly, students also find it empowering and enjoyable to create informative and meaningful visualizations. See, e.g., Nolan and Perrett (2014) for examples of visualization assignments that can be used in a spectrum of undergraduate courses, and see http://datascience.ucdavis.edu/NSFWorkshops/Visualization/GraphicsPartI.pdf for an overview of material on graphics that we have included in our introductory and advanced courses.
3. Computing.
The preparatory work mentioned in recommendation
4. Engagement.
The low student-teacher ratio and the freedom from assessment created an open exchange of ideas between the undergraduates, graduate students, and PhD statisticians. Explicitly requiring the students to work in groups, asking them to have their own ideas about analyzing the data, expecting them all to contribute, and making it clear that there was no single correct answer were some of the key features that we believe helped foster their curiosity and build their confidence in expressing their ideas. The undergraduates could always find someone to discuss an idea with or to ask for assistance with programming. Graduate students were able to take care of many immediate issues and also help to identify more significant problems that required input from the researcher or organizers. This quick turnaround created open, responsive channels of communication that helped sustain the excitement of the data analysis.

We have found that we can partially create this atmosphere in our classes through the use of technology and near-peer instructors. Near-peers are students who have more advanced standing and have previously taken the particular course. They act as instructional aides by assisting in lab sessions. Research shows that peer instruction increases student mastery of both conceptual reasoning and quantitative problem solving and increases student engagement (Crouch and Mazur 2001). This approach can be particularly effective at large universities where low student-teacher ratios are not possible.

We have also had some success using online forums for addressing questions about projects and data analyses. We particularly like Piazza ( https://piazza.com/ ).
We organize our courses so that instructors, teaching assistants, and near-peer instructors share in the responsibility of monitoring and responding to student posts, and as the semester progresses, we reduce our responses with the expectation that students fill in the gap and answer each other’s questions.

There are many other possibilities for creating a community of student statisticians. For example, faculty can flip the classroom, which leaves more class time for student-student and faculty-student conversation on how to approach a data analysis problem because students receive the more traditionally delivered material outside the classroom. As another option, faculty can sponsor a DataFest (Cetinkaya-Rundel and Stangl 2013) at their institution, where students work intensively in groups for three days on a real-world project. Student clubs may be able to succeed here as well.
5. Advanced methods.
In the ESR, after the students have worked with the data and have an understanding of the research question, the researcher introduces an advanced method to analyze the data, such as spline smoothing, recursive partitioning, or empirical Bayes. This introduction is in the context of solving the current problem and from an intuitive point of view, rather than a more abstract, rigorous mathematical approach. In this context, students are excited about seeing how modern methods can be used to solve important real-world problems. The students are given a basic understanding of how the method works and why it is useful in the particular setting, but they are also well aware that further study of statistics is essential to understanding how best to employ these tools.

We believe that our undergraduate curriculum needs to introduce modern, advanced (and fun!) statistical methods into introductory courses. Typically, our courses focus on topics such as histograms, t-tests, and simple linear models, but why not also include one or more modern topics that are easy to understand at an intuitive and/or algorithmic level and that can excite students about statistics and attract them to the field? These methods can be incorporated into case studies that use more basic methods and so bring the teaching of statistics closer to the practice of statistics. A small change such as this has the potential to make a large impact on student interest in and perception of our field. Moreover, if the concepts behind testing and inference are embedded in this larger framework, we believe that students will better understand and properly use statistics.

CONCLUSION
In this article, we have described a program for undergraduates that aims to create a rich and vibrant experience working with modern, authentic statistics research problems. We have attempted to convey the unique aspects of the program with the hope that it will spark ideas and lead to change in our undergraduate statistics introductory courses and major programs. The first three guiding principles of the 2014 ASA Guidelines for undergraduate programs in statistical science are: the scientific method and its relation to the statistical problem solving cycle; real applications; and focus on problem solving. The ESR provides insights into how we might improve our curricular activities to follow these guiding principles. For example, we can give students early practice with the interplay between questions, answers, and statistics and with authentic data analysis. We can also update curricular topics to increase emphasis on data visualization and incorporate modern methods into introductory classes. Furthermore, there are opportunities with near-peer instruction, online discussion boards, etc. to foster a community of engaged student learners.

Finally, faculty development appears at the top of the list of “next steps” in the ASA Guidelines, which call for creating and sharing materials, such as those mentioned in Section 5. We further advocate creating opportunities for faculty to participate in inquiry-based approaches to teaching and approaches for bringing statistical problem solving into the undergraduate classroom, similar to the graduate students’ experience in the ESR. One possibility would be to develop an ESR-like experience for faculty where they have the opportunity to create materials to use in their classrooms and share with others. If statistics is to remain a vital field, then we must modernize our teaching, both the topics we cover and our approach to teaching them. Statistics educators are a key piece of this change.
References
J. J. Allaire. rmarkdown: Dynamic Documents for R. http://cran.r-project.org/web/packages/rmarkdown , 2015. R package version 0.5.1.

ASA. Alexandria, VA, 2014.

B. Baumer, M. Cetinkaya-Rundel, A. Bray, L. Loi, and N. J. Horton. R markdown: Integrating a reproducible analysis tool into introductory statistics, 2014. http://arxiv.org/pdf/1402.1894.pdf .

M. Cetinkaya-Rundel and D. Stangl. A Celebration of Data. The American Statistician, 23(3):43–46, 2013.

C. Crouch and E. Mazur. Peer Instruction: Ten years of experience and results. Am. J. Phys., 69:970–977, 2001.

DASL Project. Data and Story Library (DASL). http://lib.stat.cmu.edu/DASL/DataArchive.html , 2014.

C. Grimes, D. Ford, and E. Tassone. Keeping a Search Engine Fresh: Risk and optimality in estimating refresh rates for web pages. In Proceedings of the INTERFACE. 2008.

D. Nolan and J. Perrett. Copying the masters and other techniques for learning data visualization, 2014. arXiv:1503.00781.

D. Nolan and T. P. Speed. Stat Labs: Mathematical Statistics through Applications. Springer, 2001.

D. Nolan and D. Temple Lang. Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press, 2015.

R. Pruim, D. Kaplan, and N. J. Horton. mosaic: Project MOSAIC statistics and mathematics teaching utilities. http://cran.r-project.org/web/packages/mosaic/ , 2014. R package version 0.9.1-3.

R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria, 2012.

RStudio. RStudio: Integrated development environment for R; Version 0.98.978. Boston, MA, 2013.

T. P. Speed. Questions, Answers, and Statistics. In Proceedings of the International Conference on Teaching Statistics 2. 1986.

Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. http://cran.r-project.org/web/packages/knitr