I hear, I forget. I do, I understand: a modified Moore-method mathematical statistics course
Nicholas J. Horton∗
Department of Mathematics, Amherst College, Amherst, MA
October 1, 2013

∗ Address for correspondence: Dept of Mathematics, Seeley Mudd, Amherst College, Amherst, MA 01002-5000. Phone: 413-542-5655, email: [email protected]

Abstract
Moore introduced a method for graduate mathematics instruction that consisted primarily of individual student work on challenging proofs (Jones 1977). Cohen (1992) described an adaptation with less explicit competition suitable for undergraduate students at a liberal arts college. This paper details an adaptation of this modified Moore-method to teach mathematical statistics, and describes ways that such an approach helps engage students and foster the teaching of statistics.

Groups of students worked a set of 3 difficult problems (some theoretical, some applied) every two weeks. Class time was devoted to coaching sessions with the instructor, group meeting time, and class presentations. R was used to estimate solutions empirically where analytic results were intractable, as well as to provide an environment to undertake simulation studies with the aim of deepening understanding and complementing analytic solutions. Each group presented comprehensive solutions to complement oral presentations. Development of parallel techniques for empirical and analytic problem solving was an explicit goal of the course, which also attempted to communicate ways that statistics can be used to tackle interesting problems. The group problem solving component and use of technology allowed students to attempt much more challenging questions than they could otherwise solve.

Keywords: capstone course, empirical problem solving, intermediate statistics, R software, RStudio integrated development environment, reproducible analysis, simulation studies, statistical computing, statistical education
Introduction
In this paper, an implementation of a mathematical statistics course is described with the goal of developing a combination of analytic and empirical problem-solving skills through the solution of challenging problems and complex case studies. The course, offered at the Department of Mathematics and Statistics at Smith College in spring 2007 and spring 2011, adapted the approach of R. L. Moore (Jones 1977), using modifications suggested by Cohen (1992). A similar mathematical statistics course is described by McLoughlin (2008).

In the next subsection, recent developments in statistics education are described, followed by an overview of the modified Moore-Cohen method. Section 2 describes specific details of the course, Section 3 provides two example problems (with empirical as well as analytic solutions), Section 4 describes grading and assessment, while Section 5 provides additional discussion and closing thoughts.
Recent developments in statistics education

Extensive curricular reforms in undergraduate statistics education have transformed our programs and courses in recent decades (Cobb 1992, Moore, Cobb, Garfield & Meeker 1995, Cobb 2011). The Guidelines for Assessment and Instruction for Statistics Education (GAISE) report (GAISE College Group 2005), which succinctly described these changes, recommended that introductory statistics courses:

• Emphasize statistical literacy and develop statistical thinking,
• Use real data,
• Stress conceptual understanding rather than mere knowledge of procedures,
• Foster active learning in the classroom,
• Use technology for developing conceptual understanding and analyzing data, and
• Use assessments to improve and evaluate student learning.

Other related efforts have attempted to broaden the types of questions that statistics students grapple with (Brown & Kass 2009, Gould 2010), increase the use of case studies (Barrows & Tamblyn 1980, Nolan & Speed 2000, Nolan 2003) and take advantage of sophisticated computing technologies and environments such as R (Ihaka & Gentleman 1996) or Matlab (Kaplan 2003) to buttress understanding of statistical concepts (Buttrey, Nolan & Temple Lang 2001, Nolan & Temple Lang 2003, Horton, Brown & Qian 2004, Froelich 2008, Nolan & Temple Lang 2010, Lazar, Reeves & Franklin 2011).

The mathematical statistics course has undergone many transformations during this same period. A lively panel in 2003 with the provocative title “Is the Math Stat Course obsolete?” (Rossman & Chance 2003) provided a glimpse into ways that this intermediate level statistics course is adapting to a changing landscape. One idea raised was that the math stat course (still a common entry point to the field for many students studying mathematics) should convey the excitement of the discipline (“even if they don’t go on [in statistics], we want them to leave thinking statistics is interesting”). Another was that modeling, computing and problem-solving are key components of such a course.

Cobb (2011) provides a series of capsule summaries of innovations in the teaching of mathematical statistics, and discusses key tensions that underlie our courses in terms of what we want students to learn:

    Surely the most common answer must be that we want our students to learn to analyze data, and certainly I share that goal. But for some students, particularly those with a strong interest and ability in mathematics, I suggest a complementary goal, one that in my opinion has not received enough explicit attention: We want these mathematically inclined students to learn to solve methodological problems. I call the two goals complementary because, as I shall argue in detail, there are essential tensions between the goals of helping students learn to analyze data and helping students learn to solve methodological problems.

    For a ready example of the tension, consider the role of simple, artificial examples. For teaching data analysis, these “toy” examples are often and deservedly regarded with contempt. But for developing an understanding of a methodological challenge, the ability to create a dialectical succession of toy examples and exploit their evolution is critical (p. 32).
The modified Moore method

Moore was noted (Halmos 1985) for quoting the Chinese proverb
I hear, I forget. I see, I remember. I do, I understand.
He provided classes with a list of definitions and theorems which they would subsequently prove individually and then share with the rest of the class. Competition was a key driving force in the course (Jones 1977), with efforts to ensure that student background was as homogeneous as possible. The overall goal was to build student capacity to create structure from an axiomatic basis and communicate this to others. Smith, Yoo & Nichols (2009) described possible evaluations and assessment of Moore method mathematics courses.

Cohen (1992) modified Moore’s approach using three guiding principles:

• Students understand better and remember longer what they discover themselves than what is told to them,
• People master an idea thoroughly when they teach it to someone else, and
• Effective writing and clear thinking are inextricably linked (p. 474).

A fourth principle incorporated in this mathematical statistics course involved the use of R (R Core Team 2013) and RStudio (an open source integrated development environment for R) to facilitate parallel empirical and analytic problem solving techniques.

Much of the class time is spent with students working as a group and individually to solve sets of challenging problems, writing up solutions, and presenting them to the class as a whole. While each group tackled problems from each of the major units of the course, group members would tend to learn their assigned problems in more detail and rely on their classmates to convey understanding of the other problems.

The Moore and Cohen approaches deal more with pedagogy than with curriculum. Moore used his method to teach proofs in topology. Cohen used his method for linear algebra. Here we borrow Cohen’s pedagogy for a course in mathematical statistics. Students are not given theorems to prove as in Moore’s courses; instead they are given challenging problems to solve.

These problems are chosen according to four criteria. The first two are critical to the pedagogy: each problem should be easy to grasp, and each should be hard enough that solving it poses a genuine challenge. The first criterion helps ensure that all students in a group can participate; the second helps ensure that stronger students will not be able to cut off discussion with a quick solution.

A third criterion is that the problems should have links to actual applications. This is in the spirit of the GAISE recommendations.

Fourth, the problems should lend themselves to parallel and complementary pairs of solutions, one based on simulation and the other based on theory. The parallel solutions constitute a recurring theme of the course, one that is central to the curriculum. This criterion is in some ways incidental to the pedagogy, although it helps ensure that students with different strengths and backgrounds can contribute actively to group work.
Details of the course

For the sections (officially titled “Seminar in Mathematical Statistics”) taught by the author in Spring 2007 and Spring 2011, the class met three times per week for 80 minutes per session for thirteen weeks. While the only required prerequisite for the class was probability, most students in the course had also taken introductory statistics and linear algebra. No specific knowledge of statistics was assumed. At the beginning of the course, the students took the 40 item multiple choice CAOS (Comprehensive Assessment of Outcomes in a first statistics course) test (DelMas, Garfield, Ooms & Chance 2006). While designed to assess student reasoning after a first course in statistics (and not a mathematical statistics class), the CAOS focuses on conceptual understanding of variability and uncertainty. The average score for the mathematical statistics students on the CAOS pre-test was 67.2% correct with a standard deviation of 13.6% (values ranged from 43% to 90%).

The structure of the course included (almost) no lectures. Instead, the material was broken down into a number of problem sets. These questions were designed to be sufficiently difficult to provide a challenge to students, but still amenable (with some assistance) to solution.

During the first offering, four groups of 3 students were created, with seven groups of 3 students for the second offering. Each group would work a different set of problems for each problem set (with an occasional problem assigned to all groups). Throughout the semester, these groups were reshuffled twice, with no two individuals being in the same group twice. The re-balancing helped to address issues with groups that consisted of only weak students (as well as to provide a release valve for problems with group dynamics).

Most class sessions consisted of a series of “coaching” sessions several days after the initial presentation of the problems. These coaching sessions, described in detail in Cohen (1992), are critical in helping to guide students towards the desired solution without providing the answer. All students in a group attended a given coaching session, and discussed their preliminary attempts at the problems. In some cases they may have solved their problems. More commonly additional guidance was needed for them to make progress or to elaborate on their solutions. Early on in the course, much of this coaching involved support and scaffolding for the use of computing (to allow them to gradually build their skills in terms of simulation and exploration).

Each student created a draft of their preliminary solution in preparation for a second coaching session. To ensure that all students were engaged and making good faith efforts, these were reviewed by the instructor. One per group was graded in detail, to provide general feedback for all students.

The second coaching session was used to help answer any remaining questions and assist with preparations of the solutions (“weekly papers”). Other assistance was provided outside of the regular class meeting times by email or during office hours.

Before the final class session for a given set of problems, each group created a single clear and comprehensive solution, which was made available to the class. Finally, each group gave a 15 minute oral presentation that reviewed their solution, with questions and answers as needed.
The approach suggested by Cohen (1992) to teach linear algebra provides students several pages of axioms, definitions, theorems and problems. This serves as the foundation from which all of the remaining material is derived. Because of the need for more extensive material to support student work in a range of mathematical statistical topics, the text by Rice (1995) was used for background reading as well as the source of many of the problems. In addition, several modules (including case studies with advanced data analysis) were integrated from Nolan & Speed (2000).

The course began with a series of challenging probability problems, covering selected topics and highlights from Chapters 1 through 5 of Rice (1995). The next set of problems related to descriptive and graphical visualization (covering Chapter 10 of Rice (1995) and the “Maternal smoking and infant health” module from Nolan & Speed (2000)). Two sets of problems were devoted to estimation and the bootstrap (Chapter 8 of Rice (1995) and the “Patterns in DNA” and “Who plays video games?” modules from Nolan & Speed (2000)). Testing hypotheses and assessing goodness of fit (Chapter 9 of Rice (1995)) comprised two sets of problems, while Chapter 11 was used as a basis for a set on two sample comparisons. The first time the course was offered, it closed with a set of problems entitled “Bayesian inference: a big idea,” based loosely on Chapter 15 of Rice (1995) and Section 2.5 of Lavine (2013), while the second offering closed with precursors of informal inference and simulation studies of inference rules (Wild, Pfannkuch, Regan & Horton 2010).

Real data and mathematical statistics
While the course did not focus on advanced analysis of multivariate datasets, real data was regularly incorporated into the course, primarily as a component of problems assigned to the students throughout the semester. The textbooks by Rice as well as Nolan & Speed are notable for the number and variety of motivating examples provided throughout, including the exercises. As an example, students might be asked to find the method of moments estimator for θ for a Pareto distribution with known scale parameter x0, and compare this to the maximum likelihood estimator for θ. After finding the analytic results, and simulating to compare the variance of the estimators, they would be asked to calculate and interpret the sample statistic using data from an economic survey. Another set of problems related to the analysis of cell probabilities expected by genetic theories, through estimation of underlying parameters. Students were assessed both on their ability to report in context on the underlying applied statistical question, as well as on the relevant statistical derivations or simulations that they carried out.
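As a flavor of the parallel empirical approach, the following is a minimal sketch (not taken from the course materials) of how the two Pareto estimators might be compared by simulation. It assumes the standard parameterization f(x) = θ x0^θ / x^(θ+1) for x ≥ x0, under which the maximum likelihood estimator is n/Σ log(xi/x0) and the method of moments estimator (for θ > 1) is x̄/(x̄ − x0); the helper function rpareto is hypothetical.

# Sketch: compare method of moments and maximum likelihood estimators of the
# Pareto shape parameter theta, with known scale x0 (parameterization assumed)
rpareto = function(n, theta, x0) x0 * runif(n)^(-1/theta)  # inverse CDF method
theta = 3; x0 = 1; n = 100; numsim = 2000
mom = mle = numeric(numsim)
for (i in 1:numsim) {
  x = rpareto(n, theta, x0)
  mom[i] = mean(x) / (mean(x) - x0)   # method of moments (requires theta > 1)
  mle[i] = n / sum(log(x / x0))       # maximum likelihood estimator
}
c(var_mom = var(mom), var_mle = var(mle))  # the MLE is typically less variable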
As outlined by the GAISE guidelines, use of real data is essential to the introductory course, and central also to any statistics curriculum as a whole. Nevertheless, for certain individual courses that serve as elements of a larger statistics curriculum, real data may be less essential. There is an inherent complementarity between analysis of data using existing methods and the development of new methods (Cobb 2011). We need a curriculum that teaches students to engage, appreciate and enjoy both data analytic and methodological challenges. In a course such as the one described here, although connections to real data are important, the balance is weighted towards problems of a more abstract sort.

Technology

This approach would not be feasible without the use of computing technology to facilitate analysis and simulation. R (Ihaka & Gentleman 1996) and RStudio (http://rstudio.org) provided a flexible and adaptable environment for exploration (Horton et al. 2004, Pruim 2011).

RStudio is an open-source integrated development environment that provides a consistent and powerful interface for R (an open source general purpose statistical package, http://r-project.org) that is easier to install, learn and run than standard R. LaTeX (Lamport 2011) within the Sweave (Leisch 2002) system was used as the formatting environment for the solutions, with an annotated example distributed to all students during the initial class meeting. This included examples of tables, figures, cross referencing, bibliography and other useful attributes. Submissions were made available to students as both Sweave source and PDF files to allow students to borrow working code. RStudio is particularly attractive because it simplifies the user interface and has tightly integrated support for Sweave (including a single button click to Compile PDF from the source document). In future offerings, the Markdown system within the knitr package (Xie 2012) will be used, as it provides simplified functionality and does not require knowledge of LaTeX.

The course intentionally introduced students to concepts of reproducible analysis (Gentleman & Temple Lang 2007), where computation, code and results of an analysis are integrated. Being able to re-run a set of simulations and regenerate a report with a single click is a powerful motivator for students used to error-prone processes of cutting and pasting output and figures. Reproducible analysis systems are becoming standard in industry and academia, have the potential to help ensure better statistical analysis, and should be incorporated in the statistics curriculum.

To help simplify the learning curve for these somewhat complex systems, a number of examples and idioms were provided by the instructor, to help build students’ repertoire of useful techniques to attack problems. Students were encouraged to write their initial solutions using pseudo-code (an informal description that could later be turned into working R code). These were also posted to the course management system to facilitate re-use in other problems and settings.
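As a small illustration, a knitr Markdown document interleaves narrative text with executable R chunks; the following sketch (illustrative only, not one of the course handouts) shows the general shape of such a file:

    We estimate the mean of an exponential distribution by simulation.

    ```{r simulate}
    res = replicate(1000, mean(rexp(50, rate = 2)))
    mean(res)   # should be close to 1/rate = 0.5
    hist(res)
    ```

Recompiling such a document re-runs the simulation and regenerates all output and figures in a single step.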
Example problems

To give a better sense of the course, we describe two problems that were completed by the students, along with model solutions and commentary (additional examples are found in the online supplement). Each group would generally work 3 or 4 problems per assignment.

These problems feature both empirical (simulations in R) and analytic (closed-form) solutions by the groups. They range from easier to more challenging, but illustrate the approaches taken by students in three separate application areas. The general level of difficulty is similar to that of Rice (1995). (Rice states on page xi that “This book includes a fairly large number of problems, some of which will be quite difficult for students.” My students confirmed this assertion.)

Example 1: pooled blood testing

It is known that 5% of the members of a population have disease X, which can be discovered by a blood test (that is assumed to perfectly identify both diseased and nondiseased populations). Suppose that N people are to be tested. Under approach (A), each person is tested individually. Under approach (B), groups of k people are pooled to be analyzed; assume that N = nk with n an integer. If the pooled test is negative, all the people tested are healthy (that is, just this one test is needed). If the test result is positive, each of the k people must be tested separately (that is, a total of k + 1 tests are needed for that group). (This assumes that all of the tests are run at the same time. Otherwise, if the pool tested positive and the first k − 1 tests were negative, there would be no need to test the final member of the pool.)

Questions:

i. For fixed k what is the expected number of tests needed in (B)?
ii. Find the k that will minimize the expected number of tests in (B).
iii. Using the k that minimizes the number of tests, on average how many tests does (B) save in comparison with (A)? Be sure to check your answer using an empirical simulation.

Solution

We attempted to gain a better understanding of the problem by simulation. First, we set k = 10, n = 500 and P(infected) = p = 0.05 (refer to Figure 1 for code). Given these specific values for each of the variables, we found the expected number of tests to be approximately 2501.9. We then used this value to help us check our analytic solution.

Next, we tried different values of k and n such that N (the number of people to be tested) equaled 5000. We did this to find the value of k that minimized the expected number of tests. Given that N = 5000, possible integer values for k were 2, 4, 5, 8, and 10. We found the expected number of tests for each of these k values, respectively, were 2988.7, 2178.9, 2126.5, 2306.2, and 2501.9. For this example, the minimum value is found when k = 5.

Figure 1: R code to generate empirical estimates using Approach (B)
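A minimal sketch of a simulation along these lines, assuming the numsim and p settings shown in Figure 1 (the remaining lines are illustrative rather than the original figure code):

numsim = 1000
p = 0.05
k = 10; n = 500                 # pool size and number of pools (N = nk = 5000)
totaltests = numeric(numsim)
for (i in 1:numsim) {
  # number of diseased people in each of the n pools
  npos = rbinom(n, size=k, prob=p)
  # one test per pool, plus k more tests for each pool with at least one case
  totaltests[i] = n + k * sum(npos > 0)
}
mean(totaltests)   # close to the expected value of roughly 2506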
Approach (A): the expected number of tests needed is E[T_A] = N = nk, because we would be testing each individual exactly once.

For Approach (B):

i. Let Y = the number of pools that test positive, and T_B = the total number of tests needed. Assuming independence, we have that E[Y] = n(1 − 0.95^k) and E[T_B] = n + kE[Y] = n + k(n(1 − 0.95^k)). With N = 5000 people, this simplifies to E[T_B] = 5000(1/k + (1 − 0.95^k)). When k = 10, n = 500 and P(infected) = p = 0.05, E[T_B] = 2506.3, which closely matches the results from the simulation.

ii. We find the derivative of E[T_B] with respect to k and solve (using a symbolic mathematics package such as Maple or Wolfram Alpha), which yields a positive solution of k ≈ 5 (see Figure 2). When k = 5, E[T_B] = n + 5(n(1 − 0.95^5)) = n + 1.13n = 2.13n ≈ 2131. This result is similar to that shown in the empirical simulations.

iii. We compare the two expected numbers of tests needed for each of the approaches when k = 5 and p = 0.05: E[T_A]/E[T_B] = 5n/2.13n ≈ 2.35, so pooling requires less than half as many tests on average.

Figure 2: Display of expected number of blood tests required as a function of pool size (k), with N = 5000, p = 0.05.
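The curve displayed in Figure 2 follows directly from the expression for E[T_B]; this illustrative code (not the original figure code; the function name exptests is hypothetical) plots the curve and checks the minimizer numerically:

# expected total number of tests under approach (B) with N = 5000, p = 0.05
exptests = function(k, N=5000, p=0.05) N * (1/k + 1 - (1 - p)^k)
curve(exptests(x), from=2, to=20,
      xlab="number in pool (k)", ylab="expected number of tests")
optimize(exptests, interval=c(2, 20))  # minimum is near k = 5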
Commentary

This problem was part of a series of probability questions at the start of the course. While more efficient programming approaches could be used, the empirical solution features a number of idioms and tricks of the trade that are repeated throughout the class. This example demonstrates a setting where the analytic solution is straightforward using basic properties of expectations, but where the empirical solution provides a useful check on the results. This type of question helps students build confidence in using knowledge from the prerequisite course in new ways.
Example 2: sampling from an intractable density

Questions [from Evans & Rosenthal (2004)]:

i. Is it possible to find a tractable expression for the cdf of a distribution with density given by f(y) = c(1 + |y|)^3 exp(−y^4), where c is a normalizing constant and y is defined on the whole real line? If not, can you find c?
ii. Show how to generate a sample of observations from this distribution.
iii. Describe how this is useful in Bayesian inference.

Solution

i. While it is possible to find a closed-form solution for the cdf of this distribution it is not easily solvable. Note that because the density is a function of the absolute value of y, the integral can be broken into two symmetric parts. To find c, we evaluate twice the integral over [0, ∞) in R:

> f = function(x){exp(-x^4)*(1+abs(x))^3}
> integral = integrate(f, 0, Inf)

Hence c = 1/6.8096 ≈ 0.1469.

ii. We created a Markov Chain Monte Carlo sampler, using the Metropolis-Hastings algorithm. The premise for this algorithm is that it chooses proposal probabilities so that after the process has converged draws are generated from the desired distribution. A further discussion for enthusiasts can be found on page 610 of Evans & Rosenthal (2004). We find the acceptance probability α(x, y) in terms of two densities, our f(y) and q(x, y), a normal proposal density with mean x and variance 1, so that

α(x, y) = min{1, [f(y) q(y, x)] / [f(x) q(x, y)]}
        = min{1, [c exp(−y^4)(1 + |y|)^3 (2π)^(−1/2) exp(−(y − x)^2/2)] / [c exp(−x^4)(1 + |x|)^3 (2π)^(−1/2) exp(−(x − y)^2/2)]}
        = min{1, exp(−y^4 + x^4)(1 + |y|)^3 / (1 + |x|)^3}.

Pick an arbitrary starting value X_1. The Metropolis-Hastings algorithm then computes the value X_{n+1} as follows:

1. Generate Y_{n+1} from a normal(X_n, 1).
2. Let y = Y_{n+1} and x = X_n, and compute α(x, y) as before.
3. With probability α(x, y), let X_{n+1} = Y_{n+1} = y (use proposal value). Otherwise, with probability 1 − α(x, y), let X_{n+1} = X_n = x (keep previous value).

The code (displayed in Figure 3) uses the first 100,000 iterations as a burn-in period, then generates 100,000 samples. A histogram is displayed in Figure 4.

iii. The Metropolis-Hastings algorithm is a form of Markov Chain Monte Carlo (MCMC) and is particularly attractive when the posterior density function does not have a familiar integral (such as when f(x) is a posterior density that does not correspond to a conjugate prior). Simulation is a central part of applied Bayesian analysis, because of the relative ease with which samples can be generated from a probability distribution, even when the density function cannot be explicitly integrated (see page 25 of Gelman, Carlin, Stern & Rubin (2004)).

x = seq(from=-3, to=3, length=200)
pdfval = 1/6.809610784*exp(-x^4)*(1+abs(x))^3
par(mfrow=c(2, 1)); plot(x, pdfval, type="n"); lines(x, pdfval)
title("Actual pdf")
# acceptance probability for the normal(x, 1) proposal
alphafun = function(x, y) { return(exp(-y^4+x^4)*(1+abs(y))^3*(1+abs(x))^-3) }
numvals = 100000; burnin = 100000
xn = 3                                  # arbitrary starting value
res = numeric(numvals)
for (i in 1:(burnin + numvals)) {
  yn = rnorm(1, mean=xn, sd=1)                       # proposal value
  if (runif(1) < min(1, alphafun(xn, yn))) xn = yn   # accept, else keep xn
  if (i > burnin) res[i - burnin] = xn               # save post burn-in draws
}
hist(res, freq=FALSE, xlab="x", main="Metropolis-Hastings samples")

Figure 3: R code to generate Metropolis-Hastings samples

Figure 4: True density and simulated draws from probability distribution

Commentary
This problem was taken from the final set of problems, entitled “Bayesian statistics: a big idea,” which was intended to introduce students to more sophisticated simulations that are necessary to get answers for more complex models. Because the students had no prior experience with MCMC, a preliminary mini-lecture on the topic was provided along with supporting readings from Lavine (2013). This included some classic examples with conjugate priors. Throwing the nasty density function at students was initially off-putting, but it helped to motivate MCMC methods and introduce Bayesian ideas and methods. The goal of this section was to give students a glimpse into a flexible and sophisticated set of models that can tackle problems far outside the realm of a traditional math stat class.
Grading and assessment

Assessment of students in the course was done in several ways. Students completed 7 sets of problems over the course of the semester (each one approximately 2 weeks apart). Grades on preliminary solutions and weekly papers constituted 35% of the grade, with class participation, attendance and oral presentations an additional 20%. Two midterm exams accounted for 40%, while 5% reflected good faith effort towards completion of low-stakes online assessments. The midterm exams had in-class and take-home components. They included problems similar to those undertaken by the groups, albeit with simpler solutions.

An informal mid-semester evaluation was undertaken approximately halfway through the course. For the first offering of the course, a colleague met with the class during the last 15 minutes of a class session (without the instructor present). Feedback from this assessment indicated great worries about the structure of the take-home midterm (would the problems be as hard as Rice?) and queries about other forms of assessment.

For the second offering, a more formal evaluation was undertaken where a staff member from the college learning center spent the last 20 minutes of a class session with students in focus groups. The students appreciated the structure of the course and the opportunities for revision. They “like that we get lectures on background, the collaboration and group work” and “like that we do analytic and empirical solutions.” Students sought more input from the instructor, with a desire for more lectures to “put things into perspective.” Some students suggested that the instructor “tell us what are the key points to absolutely know from each problem set.” The final question from the focus groups related to the students’ roles as learners. Students revealed that they understand that they have to prepare more thoroughly for class, improve their own class participation and assume additional responsibilities outside of class. The students acknowledged that they should read the text more carefully, read other groups’ problems before the presentations, and “try harder” with Rice.

The outside evaluator summarized the report with the following quote:

    As you made clear to me in our discussion, although your students may want you to tell them “the key points to absolutely know,” you believe strongly that they must work their way towards knowledge mastery in this course. To assist them in achieving this end, you have structured the course in ways that require them to work individually and collaboratively–with guidance from you–as they become more expert and reflective learners.

    Many of your students are uneasy with this approach and unsure of themselves: they want to know the right answers, the correct way to think, hence their request for more input from you. Their unease marks them as less sophisticated about real learning and/or timid about undertaking independent intellectual journeys. You might have an explicit discussion with your students about your pedagogy and your learning goals for them. I suspect they would be quite responsive to this kind of communication given their high regard for you and this course: they know you believe in them. And, since their answers to the third question reveal that they are aware of their own responsibilities as students, you could also use this discussion to reinforce their own good insights on becoming more active and inquisitive learners.

The students also completed the CAOS post-test at the end of the class, with a mean of 72.5% correct (sd=13%, min=43%, max=90%).
There was a statistically significant increase in scores compared to the pre-test (paired t-test p=0.01, df=30, 95% confidence interval from 1.4 to 9.2 point increase). Figure 5 displays the relationship between pre- and post-test scores (with a solid scatterplot smoother plus dashed POST = PRE line). There is some indication of larger improvement for students with lower pre-test scores, which is consistent with a ceiling effect. Given that the CAOS test is intended to assess outcomes from a first course, this is not surprising.
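For readers who wish to reproduce this style of analysis, the following sketch applies the same paired comparison and display to simulated (not the actual) pre- and post-test scores:

# Illustrative paired comparison of pre- and post-test scores
# (scores here are simulated, not the actual class data)
set.seed(1)
n = 31                                  # df = 30 corresponds to 31 paired scores
pre = pmin(round(rnorm(n, mean=67, sd=14)), 100)
post = pmin(pre + round(rnorm(n, mean=5, sd=10)), 100)
t.test(post, pre, paired=TRUE)          # paired t-test with 95% CI for the change
plot(pre, post); abline(0, 1, lty=2)    # dashed POST = PRE line
lines(lowess(pre, post))                # scatterplot smoother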
Figure 5: Relationship between student outcomes on the CAOS (Comprehensive Assessment of Outcomes in a first Statistics course) from the class in 2007 and 2011 (plus smoothed line and POST = PRE line).
Discussion
This paper describes an implementation of a modified Moore-Cohen method mathematical statistics course at an undergraduate liberal arts college. The course featured a series of challenging problems, some theoretical, others data-driven, designed to help teach mathematical statistics using applications. A key idea is that the use of technology (R/RStudio and reproducible analysis tools) has opened up new possibilities.

An attractive aspect of the proposed course was how it intentionally dovetailed with the GAISE recommendations (GAISE College Group 2005). In particular, it was designed to encourage statistical thinking through empirical problem solving, use real data to motivate methods, stress conceptual understanding, foster active learning and use technology to develop conceptual understanding. The course is consistent with the American Statistical Association guidelines for statistics programs (Workgroup on Undergraduate Statistics 2000), which call for students to develop effective technical writing, presentation skills, teamwork and collaboration, in addition to knowledge of statistics.
This approach differs significantly from the traditional Moore-method, which was developed for a definition-theorem-proof type course and relies primarily on individual work and competition as a motivator. The modified Moore-Cohen method uses group work to facilitate engagement, with stronger students able to plunge more deeply into their solutions while still ensuring that weaker students can receive assistance as needed. This modification might be better thought of as a species split-off, where rather than competing, students are supported to go beyond their expectations and discover something in themselves.

A primary challenge of teaching is to engage students in the material being studied. Cohen (1992) noted that the method effectively raises the level of communication between students and that most students are stimulated by the change from passive to active learning. Lazar et al. (2011) described the importance of capstone courses in statistics. Structuring the class with multiple, challenging problems that were not amenable to quick individual solution helped to achieve the goals of a capstone. This includes getting students to grapple with real-world problems, helping them develop capacities to work effectively in groups, augmenting their ability to compute to extend their problem-solving abilities, and helping them to sharpen their abilities to communicate the complexity and power of statistical methods. The course also dovetails with other efforts to involve students in interdisciplinary research projects (Legler, Roback, Ziegler-Graham, Scott, Lane-Getaz & Richey 2010), which tend to focus on larger, more complex datasets in the context of a client discipline.
Focusing much of the work in groups allowed students to tackle far more challenging questions than they could solve individually and also modeled a common post-college work environment. Several students have provided anecdotal reports of the value of learning tools for statistical computing and reproducible analysis.

There are other challenges to the use of this method for teaching the mathematical statistics course. The enrollments were 12 and 20 students in Spring 2007 and Spring 2011, respectively. While scaling to course sizes of 30–40 students would be straightforward, larger class sizes would require different systems and structures. These might include multiple sections taught with some common mini-lectures, doubling up on problems or student support for computing. The time commitment was comparable to a standard course, due to the extensive coaching and preparation, despite the fact that formal lectures were relatively short (generally at the start of each new topic).

Use of technology
Empirical (simulation-based) estimation complements analytic solutions, and can often allow approximate solution of extremely challenging problems. Besides providing a useful check on analytic answers, these simulations can help with insights into how to solve a problem. R and RStudio serve as a flexible and powerful environment for such exploration.

A number of technologies were prominently featured in the course. These included extensive use of LaTeX and R. Reproducible analysis (the Sweave system (Leisch 2002) as implemented within RStudio) greatly facilitated integration of commands, output and graphics, and led to better facility for students to undertake analyses outside the course. This scaffolding also helped to move students from a “point and click” approach to statistical analysis towards a more flexible scripting interface. Further discussion of how to integrate reproducible analysis and effective mechanisms to build students’ ability to “compute with data” are important issues but lie somewhat outside the scope of this paper.

Other courses may find the use of R and RStudio for simulation and approximation of analytic solutions to be helpful, without the Moore-method approach. The new text by Pruim (2011) features such a presentation.
Closing thoughts

Cobb (2011) argues that the profession needs two types of statisticians: those with the capacity to appropriately analyze and interpret data, as well as those with interest in devising novel solutions to methodological challenges. Teaching mathematical statistics in this manner has the potential to foster engagement by presenting students with extended glimpses of the excitement of developing statistical procedures to solve challenging problems (Nolan & Temple Lang 2010). This approach could also serve as a model for other intermediate and advanced undergraduate statistics classes. This method may also be relevant for the teaching of similar quantitative courses in other disciplines.
Acknowledgements
This work was supported by NSF grant 0920350 (Phase II: Building a Community around Modeling, Statistics, Computation, and Calculus). Thanks to Sarah Anoke, George Cobb, David Cohen, Daniel Kaplan, David Palmer and Randall Pruim for many useful discussions about pedagogy as well as helpful comments on an earlier draft. I am also indebted to the Editor, Associate Editor and anonymous reviewers for many suggestions which led to improvements in the manuscript.

References
Barrows, H. & Tamblyn, R. (1980). Problem based learning: an approach to medical education, Springer-Verlag, New York.
Brown, E. N. & Kass, R. E. (2009). What is statistics?, The American Statistician (2): 105–110.
Buttrey, S. E., Nolan, D. & Temple Lang, D. (2001). Computing in the mathematical statistics course, ASA Proceedings of the Joint Statistical Meetings.
Cobb, G. (2011). Teaching statistics: some important tensions, Chilean Journal of Statistics (1): 31–62.
Cobb, G. W. (1992). Teaching statistics, in Lynn A. Steen (ed.), Heeding the call for change: suggestions for curricular action (MAA Notes No. 22), pp. 3–43.
Cohen, D. W. (1992). A modified Moore method for teaching undergraduate mathematics, American Mathematical Monthly (7): 473–474, 487–490.
DelMas, R., Garfield, J., Ooms, A. & Chance, B. (2006). Assessing students’ conceptual understanding after a first course in statistics, Proceedings of the Annual Meetings of the American Educational Research Association.
Evans, M. J. & Rosenthal, J. S. (2004). Probability and Statistics: the Science of Uncertainty, W. H. Freeman and Company, New York.
Froelich, A. (2008). Using R in probability and mathematical statistics courses, ASA Proceedings of the Joint Statistical Meetings.
GAISE College Group (2005). Guidelines for assessment and instruction in statistics education, accessed August 18, 2013, Technical report, American Statistical Association.
Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B. (2004). Bayesian data analysis (second edition), Chapman and Hall.
Gentleman, R. & Temple Lang, D. (2007). Statistical analyses and reproducible research, Journal of Computational and Graphical Statistics (1): 1–23.
Gould, R. (2010). Statistics and the modern student, International Statistical Review (2): 297–315.
Halmos, P. (1985). I Want to Be a Mathematician: An Automathography, Springer.
Horton, N. J., Brown, E. R. & Qian, L. (2004). Use of R as a toolbox for mathematical statistics exploration, The American Statistician (4): 343–357.
Ihaka, R. & Gentleman, R. (1996). R: A language for data analysis and graphics, Journal of Computational and Graphical Statistics (3): 299–314.
Jones, F. B. (1977). The Moore method, American Mathematical Monthly: 273–278.
Kaplan, D. (2003). Introduction to Scientific Computation and Programming, CL-Engineering.
Lamport, L. (2011). LaTeX: a document preparation system, accessed August 18, 2013, Technical report, LaTeX Project.
Lavine, M. (2013). Introduction to Statistical Thought, accessed August 18, 2013, Creative Commons.
Lazar, N. A., Reeves, J. & Franklin, C. (2011). A capstone course for undergraduate statistics majors, The American Statistician (3): 183–189.
Legler, J., Roback, P., Ziegler-Graham, K., Scott, J., Lane-Getaz, S. & Richey, M. (2010). A model for an interdisciplinary undergraduate research program, The American Statistician: 184–189.
Leisch, F. (2002). Sweave, part I: Mixing R and LaTeX, R News (3): 28–31.
McLoughlin, P. (2008). A modified Moore approach to teaching probability and mathematical statistics: An inquiry based learning technique, ASA Proceedings of the Joint Statistical Meetings.
Moore, D. S., Cobb, G. W., Garfield, J. & Meeker, W. Q. (1995). Statistics education fin de siècle, The American Statistician: 250–260.
Nolan, D. A. (2003). Case studies in the mathematical statistics course, Science and Statistics: A Festschrift for Terry Speed (IMS Press, Fountain Hills, AZ), pp. 165–176.
Nolan, D. & Speed, T. (eds) (2000). Stat Labs: Mathematical Statistics Through Applications, Springer-Verlag, New York.
Nolan, D. & Temple Lang, D. (2003). Case studies and computing: broadening the scope of statistical education, Proceedings of the 2003 ISI Meeting.
Nolan, D. & Temple Lang, D. (2010). Computing in the statistics curriculum, The American Statistician (2): 97–107.
Pruim, R. (2011). Foundations and Applications of Statistics: An Introduction using R, American Mathematical Society.
R Core Team (2013). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
Rice, J. A. (1995). Mathematical statistics and data analysis, Duxbury.
Rossman, A. & Chance, B. (2003). Notes from the 2003 JSM session “Is the Math Stat Course Obsolete?”, accessed August 18, 2013, Technical report, American Statistical Association.
Smith, J. C., Yoo, S. & Nichols, S. R. (2009). Evaluation and assessment: Effectiveness of the method, The Moore Method: A Pathway to Learner-Centered Instruction, pp. 139–149.
Wild, C. J., Pfannkuch, M., Regan, M. & Horton, N. J. (2010). Towards more accessible conceptions of statistical inference (with discussion), Journal of the Royal Statistical Society, Series A: 247–295.
Workgroup on Undergraduate Statistics (2000). Guidelines for undergraduate statistics programs, accessed August 18, 2013, Technical report, American Statistical Association.
Xie, Y. (2012). knitr: A general-purpose package for dynamic report generation in R. R package version 0.8. URL: http://CRAN.R-project.org/package=knitr, accessed August 18, 2013.
Online Appendix: Additional Example Problems

The following material is proposed as an online appendix.

Estimating σ using the IQR

Assume that we observe n iid observations from a normal distribution.

Questions:

i. Use the IQR of the list to estimate σ.
ii. Use simulation to assess the variability of this estimator for samples of n = 100 and n = 400.
iii. How does the variability of this estimator compare to ŝ (the usual estimator)?

numsim=1000; mu=42; n1=100; n2=400
runsim = function(numsim, n, mu, sigma) {
  res1 = numeric(numsim); res2 = res1
  for (i in 1:numsim) {
    mynorms = rnorm(n, mu, sigma)
    vals = quantile(mynorms)
    res1[i] = (vals[4]-vals[2])/1.34898   # IQR-based estimate of sigma
    res2[i] = sd(mynorms)                 # usual estimate
  }
  return(data.frame(IQR=res1, S=res2))
}
res100 = runsim(numsim, n1, mu, pi)
res400 = runsim(numsim, n2, mu, pi)
boxplot(res100$IQR, res100$S, res400$IQR, res400$S,
        names=c("n=100 (IQR)","n=100 (S)","n=400 (IQR)", "n=400 (S)"),
        ylab="distribution of sigma-hat")
text(3.5, 4.0, "True sigma is 3.14159")
abline(h=pi); abline(v=2.5)

Figure 6: R code to carry out simulation study (estimation of σ)
Figure 7: Distribution of sample estimates by estimator and sample size

Solution

i. We know that for a standard normal random variable P(Z > 0.6745) ≈ 0.25. So we would expect that the IQR (interquartile range) would extend to 2 × 0.6745 ≈ 1.349 standard units. We use this expectation to determine the estimator: s̃ = IQR/1.349.
ii. We carried out a simple simulation study with a fixed mean and standard deviation (set to π). A thousand simulations of samples were taken using s̃ and ŝ (MLE). The results are displayed in Figure 7. We note that both estimators are less variable when n = 400 than for n = 100 and conclude that the variability of the estimators goes down as a function of √n.
iii. The IQR for s̃ is 0.50 for n=100 and 0.25 for n=400, while the IQR for ŝ is 0.30 for n=100 and 0.16 for n=400. We conclude that the MLE is more efficient than our ad-hoc estimator.

Commentary

This exercise was included with a problem set mid-way through the class as the nature and properties of estimators were explored. This problem introduced the idea of a simulation study to investigate the behavior of a new estimator. While the analytic solution was straightforward, it required the students to think about estimation in a different way, and tap properties of the normal distribution. The empirical solution provided a glimpse into the additional variability of the IQR estimator compared to the standard estimator of standard deviation. A full analytic solution for this problem was beyond the scope of the course, but can be undertaken for specific values of n.

Sensitivity of the chi-square test

Perform a simulation study on the sensitivity of the χ² test for the uniform distribution to expected cell counts below 5. Simulate the distribution of the test statistic for 16 and 64 observations from a uniform distribution using 8 equal-length bins (from Nolan & Speed (2000)).

Solution

We know that the chi-square test is recommended only in situations where the expected cell count is 5 or more in each cell. In this simulation study, we generate repeated samples from the null distribution and compare these to the large-sample distribution of the chi-square (χ²) statistic (see Figure 8). We know that in this setting, the appropriate degrees of freedom are equal to the number of bins minus 1. The main work is done using the simchisq() function, which generates data from a continuous uniform variable, then constructs the observed and expected cell counts and the chi-square statistic. This is repeated for the two scenarios and displayed in Figure 9. We see that the observed distribution under the null is somewhat jumpy (due to the discreteness of the possible values) when the expected cell counts are low (left figure), and that the observed curve is quite similar to the chi-square distribution when the expected cell count is 8 (right figure).

simchisq = function(n, bins) {
  vals = cut(runif(n, 0, bins), breaks=0:bins)
  obs = c(table(vals))                  # observed cell counts
  exp = c(rep(n/bins, bins))            # expected cell counts under the null
  return(sum(((obs - exp)^2)/exp))      # chi-square statistic
}
library(mosaic); par(mfrow=c(1, 2))
bins = 8; n = 16
res = replicate(10000, simchisq(n, bins))
plot(density(res), main="N=16, 8 bins", xlab="", ylab="Density")
curve(dchisq(x, df=bins-1), add=TRUE, lty=2)   # large-sample reference
n = 64
res = replicate(10000, simchisq(n, bins))
plot(density(res), main="N=64, 8 bins", xlab="", ylab="Density")
curve(dchisq(x, df=bins-1), add=TRUE, lty=2)

Figure 8: R code to carry out simulation study (chi-square problem)

Figure 9: Simulated null distribution of the chi-square statistic compared to the large-sample chi-square distribution (left panel: N=16, 8 bins; right panel: N=64, 8 bins)
Commentary

This problem was intended to provide more practice in the construction of simulation studies as well as introduce new idioms related to looping and writing of functions. It also serves to highlight the importance of assumptions and the idea of sampling under the null distribution (as a precursor to resampling based inference). This was included with a group of problems mid-way through the class as the nature and properties of tests of hypotheses along with sampling distributions under the null were explored.