Statistical Science
© Institute of Mathematical Statistics, 2014
A Conversation with Donald B. Rubin
Fan Li and Fabrizia Mealli
Abstract.
Donald Bruce Rubin is John L. Loeb Professor of Statistics at Harvard University. He has made fundamental contributions to statistical methods for missing data, causal inference, survey sampling, Bayesian inference, computing and applications to a wide range of disciplines, including psychology, education, policy, law, economics, epidemiology, public health and other social and biomedical sciences.

Don was born in Washington, D.C. on December 22, 1943, to Harriet and Allan Rubin. One year later, his family moved to Evanston, Illinois, where he grew up. He developed a keen interest in physics and mathematics in high school. In 1961, he went to college at Princeton University, intending to major in physics, but graduated in psychology in 1965. He began graduate school in psychology at Harvard, then switched to Computer Science (MS, 1966) and eventually earned a Ph.D. in Statistics under the direction of Bill Cochran in 1970. After graduating from Harvard, he taught for a year in Harvard’s Department of Statistics, and then in 1971 he began working at Educational Testing Service (ETS) and served as a visiting faculty member at Princeton’s new Statistics Department. He held several visiting academic appointments in the next decade at Harvard, UC Berkeley, University of Texas at Austin and the University of Wisconsin at Madison. He was a full professor at the University of Chicago in 1981–1983, and in 1984 moved back to the Harvard Statistics Department, where he remains until now, and where he served as chair from 1985 to 1994 and from 2000 to 2004.
Fan Li is Assistant Professor, Department of Statistical Science, Duke University, Durham, North Carolina 27708-0251, USA e-mail: fl[email protected]. Fabrizia Mealli is Professor, Department of Statistics, Computer Science, Applications, University of Florence, Viale Morgagni 59, Florence 50134, Italy e-mail: [email protected]fi.it.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2014, Vol. 29, No. 3, 439–457. This reprint differs from the original in pagination and typographic detail.
Don has advised or coadvised over 50 Ph.D. students, written or edited 12 books, and published nearly 400 articles. According to Google Scholar, by May 2014, Rubin’s academic work has 150,000 citations, 16,000 in 2013 alone, placing him among the most cited scholars in the world.

For his many contributions, Don has been honored by election to Membership in the US National Academy of Sciences, the American Academy of Arts and Sciences, the British Academy, and Fellowship in the American Statistical Association, Institute of Mathematical Statistics, International Statistical Institute, Guggenheim Foundation, Humboldt Foundation and Woodrow Wilson Society. He has also received the Samuel S. Wilks Medal from the American Statistical Association, the Parzen Prize for Statistical Innovation, the Fisher Lectureship and the George W. Snedecor Award of the Committee of Presidents of Statistical Societies. He was named Statistician of the Year by the American Statistical Association’s Boston and Chicago Chapters. In addition, he has received honorary degrees from Bamberg University, Germany and the University of Ljubljana, Slovenia.

Besides being a statistician, he is a music lover, audiophile and fan of classic sports cars.

This interview was initiated on August 7, 2013, during the Joint Statistical Meetings 2013 in Montreal, in anticipation of Rubin’s 70th birthday, and completed at various times over the following months.
BEGINNINGS
Fan:
Let’s begin with your childhood. I understand you grew up in a family of lawyers, which must have heavily influenced you intellectually. Can you talk a little about your family?
Don:
Yes. My father was the youngest of four brothers, all of whom were lawyers, and we used to have stimulating arguments about all sorts of topics. Probably the most argumentative uncle was Sy (Seymour Rubin, senior partner at Arnold, Fortas and Porter, diplomat, and professor of law at American University), from D.C., who had framed personal letters of thanks for service from all the presidents starting with Harry Truman and going through Jerry Ford, as well as from some contenders, such as Adlai Stevenson, and various Supreme Court Justices. I found this impressive but daunting. The relevance of this is that it clearly created in me a deep respect for the principles of our legal system, to which I find statistics highly relevant—this has obviously influenced my own application of statistics to law, for example, concerning issues as diverse as the death penalty, affirmative action and the tobacco litigation.
Fabri:
We will surely get back to these issues later, but was there anyone else who influenced your interest in statistics?
Don:
Probably the most influential was Mel, my mother’s brother, a dentist (then a bachelor). He loved to gamble small amounts, either in the bleachers at Wrigley Field, betting on the outcome of the next pitch, while watching the Cubs lose, or at Arlington Race track, where I was taught at a young age how to read the Racing Form and estimate the “true” odds from the various displayed betting pools, while losing two dollar bets. Wednesday and Saturday afternoons, during the warm months when I was a preteen, were times to learn statistics—even if at various bookie joints that were sometimes raided. As I recall, I was a decent student of his, but still lost small amounts.

There were two other important influences on my statistical interests from the late 1950s and early 1960s. First, there was an old friend of my father’s from their government days together, a Professor Emeritus of Economics at UC Berkeley, George Mehren, with whom I had many entertaining and educational (to me) arguments, which generated a respect for economics that continues to grow to this day. And second, my wonderful teacher of physics at Evanston Township High School—Robert Anspaugh—who tried to teach me to think like a real scientist, and how to use mathematics in the pursuit of science.

By the time I left high school for college, I appreciated some statistical thinking from gambling, some scientific thinking from physics, and I had deep respect for disciplines other than formal mathematics, in particular, physics and the law. These, in hindsight, are exposures that were crucial to the kind of statistics to which I gravitated in my later years. More details of the influence of my mentors can be found in Rubin (2014b).

Fig. 1. Five-year-old D. B. Rubin.
COLLEGE TIME AT PRINCETON
Fan:
You entered Princeton in 1961, first as a physics major, but later changed to psychology. Why the change and why psychology?
Don:
That’s a good question. Inspired by Anspaugh, I wanted to become a physicist. I was lined up for a BA in three years when I entered Princeton, and unknown to me before I entered, also lined up for a crazy plan to get a Ph.D. in physics in five years, in a program being reconditely planned by John Wheeler, a very well-known professor of physics there (and Richard Feynman’s Ph.D. advisor years earlier). In retrospect, this was a wildly over-ambitious agenda, at least for me. For a combination of complications, including the Vietnam War (and its associated drafts) and Professor Wheeler’s sabbatical at a critical time, I think no one succeeded in completing a five-year Ph.D. from entry. In any case, there were many kids like me at Princeton then, who, even though primarily interested in math and physics, were encouraged to explore other subjects. I did that, and one of the courses I took was on personality theory, taught by a wonderful professor, Silvan Tomkins, who later became a good friend. At the end of my second year, I switched from Physics to Psychology, where my mathematical and scientific background seemed both rare and appreciated—it was an immature decision (not sure what a mature one would have been), but a fine one for me because it introduced me to some new ways of thinking, as well as to new fabulous academic mentors.

Fabri:
You had some computing skills which were uncommon then, right? So you started to use computers quite early.
Don:
Yes. Sometime between my first and second year at Princeton, I taught myself Fortran. As you mentioned, those skills were not common, even at places like Princeton then.
Fabri:
Was learning Fortran just a matter of having fun or did you actually use these skills to solve problems?
Don:
It was for solving problems. When I was in the Psychology Department, I was helping to support myself by coding some of the early batch computer packages for PSTAT, a Princeton statistical software package, which competed with BMDP of UCLA at the time. I also wrote various programs for simulating human behavior.
Fan:
In your senior year at Princeton, you applied for Ph.D. programs in psychology and were accepted by several very good places.
Don:
Yes, I was accepted by Stanford, Michigan and Harvard. I met some extraordinary people during my visits to these programs. I went out to Stanford first, and met William Estes, a quiet but wonderful professor with strong mathematical skills and a wry wit, who later moved to Harvard. Michigan had a very strong mathematical psychology program, and when I visited in the spring of 1965, I was hosted primarily by a very promising graduating Ph.D. student, Amos Tversky, who was doing extremely interesting work on human behavior and how people handled risks. In later years, he connected with another psychologist, Daniel Kahneman, and they wrote a series of extremely influential papers in psychology and economics, which eventually led to Kahneman’s winning the Nobel Prize in Economics in 2002; Tversky passed away in 1996 and was thus not eligible for the Nobel Prize. Kahneman (who recently was awarded a National Medal of Science by President Obama) always acknowledges that the Nobel Prize was really a joint award (to Tversky and him). I was on a committee sometime last year with Kahneman, and it was interesting to find out that I had known Tversky longer than he had.
Fan:
But ultimately you chose Harvard.
Don:
Well, we all make strange decisions. The reason was that I had an east-coast girlfriend who had another year in college.
GRADUATE YEARS AT HARVARD
Fabri:
You first arrived at Harvard in 1965 as a Ph.D. student in psychology, which was in the Department of Social Relations then, but were soon disappointed, and switched to computer science. What happened?
Don:
When I visited Harvard in the summer of 1965, some senior people in Social Relations appeared to find my background, in subjects like math and physics, attractive, so they promised me that I could skip some of the basic more “mathy” requirements. But when I arrived there, the chair of the department, a sociologist, told me something like, “No, no, I looked over your transcript and found your undergraduate education scientifically deficient because it lacked ‘methods and statistics’ courses. You will have to take them now or withdraw.” Because of all the math and physics that I’d had at Princeton, I felt insulted! I had to get out of there. Because I had independent funding from an NSF graduate fellowship, I looked around. At the time, the main applied math appeared to be done in the Division of Engineering and Applied Physics, which recently became Harvard’s “School of Engineering and Applied Sciences.” The division had several sections; one of them was computer science (CS), which seemed happy to have me.
Fan:
But you got bored again soon. Was this because you found the problems in CS not interesting or challenging enough?
Don:
No, not really that. There were several reasons. First, there was a big emphasis on automatic language translation, because it was cold war time, and it appeared that CS got a lot of money for computational linguistics from ARPA (Advanced Research Projects Agency), now known as DARPA. The Soviet Union, from behind the iron curtain, produced a huge number of documents in Russian, but evidently there were not enough people in the US to translate them. A complication is that there are sentences that you could not translate without their context. I still remember one example: “Time flies fast,” a three-word sentence that has three different meanings depending on which of the three words is the verb. If this three-word sentence cannot be
automatically translated, how can one get an automatic (i.e., by computer) translation of a complex paragraph? Related to this was Noam Chomsky’s work on transformational grammars, down the river at MIT.

Second, although I found some real math courses and the ones in CS on mathy topics, such as computational complexity, which dealt with Turing machines, Gödel’s theorem, etc., interesting, I found many of the courses dull. Much of the time they were about programming. I remember one of my projects was to write a program to project 4-dimensional figures into 2 dimensions, and then rotate them using a DEC PDP-1. It took an enormous number of hours. Even though my program worked perfectly, I felt it was a gigantic waste of time. I also got a C+ in that course because I never went to any of the classes. Now, having dealt with many students, I would be more sympathetic that I deserved a C+, but not when I was a kid. At that time, I figured there must be something better to do than rotating 4D objects and getting a C+. But marching through rice paddies in Vietnam or departing for somewhere in Canada didn’t seem appealing. So after picking up an MS degree in CS in 1966, although I stayed another year in CS, I was ready to try something else.
Fabri:
How did statistics end up in your path?
Don:
A summer job in Princeton in 1966 led to it. I did some programming for John Tukey in Fortran, LISP and COBOL. I also did some consulting for a Princeton sociology professor, Robert Althauser, basically writing programs to do matched sampling, matching blacks and whites, to study racial disparity in dropout rates at Temple University. I had a conversation with Althauser about how psychology and then CS weren’t working out for me at Harvard. Because Bob was doing some semi-technical things in sociology, he knew of Fred Mosteller, although not personally, and also knew that Harvard had a decade-old Statistics Department that was founded in 1957. He suggested that I contact Mosteller. After getting back to Harvard, I talked to Fred, and he suggested that I take some stat courses. So in my third year in Harvard, I took mostly stat courses and did OK in them. And the Stat department said “Yes” to me. It also helped to have my own NSF funding, which I had from the start; they kept renewing for some reason, showing their bad taste probably, but it worked out well for me. Anyway, at the end of my third year at Harvard, I had switched to statistics, my third department in four years.
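The matched-sampling programs Don mentions can be sketched in a few lines: greedy nearest-neighbor matching of each treated unit to the closest unused control on a covariate. This is a generic illustration of the idea (function and variable names are ours, not Althauser’s actual code):

```python
# Greedy nearest-neighbor matching on a single covariate: each treated
# unit is paired with the closest control that has not yet been used.
def greedy_match(treated, controls):
    pool = list(enumerate(controls))           # (index, covariate value)
    pairs = []
    for i, t in enumerate(treated):
        j, _ = min(pool, key=lambda jc: abs(jc[1] - t))
        pairs.append((i, j))
        pool = [(k, v) for k, v in pool if k != j]
    return pairs

# Two treated units matched against four controls:
pairs = greedy_match([1.0, 2.0], [0.4, 1.1, 2.5, 1.9])
print(pairs)  # -> [(0, 1), (1, 3)]
```

Greedy matching is order-dependent; optimal matching would solve an assignment problem instead, but the greedy version conveys the basic idea.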
Fabri:
Besides Mosteller, who else was on the statistics faculty then? It was a quite new department, as you said.
Don:
The other senior people were Bill Cochran and Art Dempster, who had recently been promoted to tenure. The junior ones were Paul Holland; Jay Goldman, a probabilist; and Shulamith Gross from Berkeley, a student of Erich Lehmann’s.
Fabri:
And you decided to work with Bill.
Don:
Actually, I first talked to Fred. Fred always had a lot of projects going; one was with John Tukey and he proposed that I work on it. I told him that I had this matched sampling project of my own, and he suggested that I talk to Cochran—Cochran a few years earlier was an advisor for the Surgeon General’s report on smoking and lung cancer. It was obviously based on observational data, not on randomized experiments, and Fred said that Cochran knew all about these issues in epidemiology and biostatistics. So I went to knock on Bill’s door. He answered with a grumpy sounding “yes,” I went in and he said, “No, not now, later!” So I thought “Hmmm, rough guy,” but actually he was a sweetheart, with a great Scottish dry sense of humor and a love of scotch and cigarettes (I understand the former, although not the latter).
Fabri:
Cochran did have a lasting influence on you, right?
Don:
Yes, he had a tremendous influence on me. Once I was doing some irrelevant math on matching, which I now see popping up again in the literature. I showed that to Bill, and he asked me, “Do you think that’s important, Don?” I said, “Well, I don’t know.” Then he said, “It is not important to me. If you want to work on it, go find someone else to advise you. I care about statistical problems that matter, not about making things epsilon better.” Another person who was very influential was Art Dempster. Once I did some consulting for Data-Text, a collection of batch computer programs like PSTAT or BMDP. I was designing programs to calculate analyses of variance, do regressions, ordinary least squares, matrix inversions, all when you have, in hindsight, limited computing power. For advice on some of those I talked to Dempster, who always has great multivariate insights based on his deep understanding of geometry—very Fisherian.
Fan:
Your Ph.D. thesis was on matching, which is the start of your life-long pursuit of causal inference. How did your interest in causal inference start?
Don:
When I worked with Althauser on the racial disparity problem, I always emphasized to him that it was inherently descriptive, not really causal. I remembered enough from my physics education in high school and Princeton that association is not causation. So I was probably not intrigued by causal inference per se, but rather by the confusion that the social scientists had about it. You have to describe a real or hypothetical experiment where you could intervene, and after you intervene, you see how things change, not in time but between intervention (i.e., treatment) groups. If you are not talking about intervention, you can’t talk about causality. For some reason, when I look at old philosophy, it seems to me that they didn’t get it right, whereas in previous centuries, some experimenters got it. They bred cows, or mated hunting falcons. If you mated excellent female and male falcons, the resulting next generation of falcons would generally be better hunters than those resulting from random mating. In the 20th century, many scientists and experimentalists got it.

Fabri:
So you were only doing descriptive com-parisons in your Ph.D. thesis, and the notation ofpotential outcomes was not there.
Don:
Partly correct. At that time, the notation of potential outcomes was in my mind, because that is the way that Cochran initiated discussions of randomized experiments in the class he taught in 1968. Initially, it was all based on randomization, unbiasedness, Fisher’s test, etc. But the concepts had to be flipped into ordinary least squares (OLS) regression and analysis of variance tables, because nobody could compute anything difficult back then. One of the lessons in Bill’s class in regression and experimental design was to use the abbreviated Doolittle method to invert matrices, by hand! So you really couldn’t do randomization tests in any generality. The other reason I was interested in experiments and social science was my family history. There was always this legal question lurking: “But for this alleged misconduct, what would have happened?”
Fan:
What was your first job after getting your Ph.D. degree in 1970?
Don:
I stayed at Harvard for one more year, as an instructor in the Statistics Department, partly supported by teaching, partly supported by the Cambridge Project, which was an ARPA-funded Harvard–MIT joint effort; the idea was to bring the computer science technologies of MIT and the social sciences research of Harvard together to do wonderful things in the social sciences. In the Statistics Department, I was coteaching with Bob Rosenthal the “Statistics for Psychologists” course that, ironically, the Social Relations Department wanted me to take five years earlier, thereby driving me out of their department! Bob had, and has, tremendous intuition for experimental design and other practical issues, and we have written many things together.

Fig. 2. D. B. Rubin (on left) with his puppy friend Thor (on right), about 1967.
THE ETS DECADE: MISSING DATA, EM AND CAUSAL INFERENCE
Fan:
After that one year, you went for a position at ETS in Princeton instead of a junior faculty position in a research university. It was quite an unusual choice, given that you could probably have found a position in a respected university statistics department easily.
Don:
Right—many people thought I was goofy. I did have several good offers, one was to stay at Harvard, and another was to go to Dartmouth. But I met Al Beaton, who was later my boss at ETS in Princeton, at a conference in Madison, Wisconsin, and he offered me a job, which I took. Al had a doctorate in education at Harvard, and had worked with Dempster on computational issues, such as the
“sweep operator.” He was a great guy with a deep understanding of practical computing issues. Also, he appreciated my research. Because I was an undergrad at Princeton, it was almost like going home. For several years, I taught one course at Princeton. Between the jobs at ETS and Princeton, I was earning twice what the Harvard salary would have been, which allowed me to buy a house on an acre and a half, with a garage for rebuilding an older Mercedes roadster, etc. A different style of life from that in Cambridge.
Fan:
You seem to have had a lot of freedom to pursue research at ETS. What was your responsibility at ETS?
Don:
The position at ETS was like an academic position with teaching responsibilities replaced by consulting on ETS’s social science problems, including psychological and educational testing ones. I found consulting much easier for me than teaching, and ETS had interesting problems. Also there were many very good people around, like Fred Lord, who was highly respected in psychometrics. The Princeton faculty was great, too: Geoffrey Watson (of the Durbin–Watson statistic) was the chair; Peter Bloomfield was there as a junior faculty member before he moved to North Carolina; and of course Tukey was still there, even though he spent a lot of time at Bell Labs. John was John, having a spectacular but very unusual way of thinking—obviously a genius. Stuart Hunter was in the Engineering School then. These were fine times for me, with tremendous freedom to pursue what I regarded as important work.
Fabri:
By any measure, your accomplishments in the ETS years were astounding. In 1976, you published the paper “Inference and Missing Data” in Biometrika (Rubin, 1976) that lays the foundation for modern analysis of missing data; in 1977, with Arthur Dempster and Nan Laird, you published the EM paper “Maximum Likelihood from Incomplete Data via the EM Algorithm” in JRSS-B (Dempster, Laird and Rubin, 1977); in 1974, 1977, 1978, you published a series of papers that lay the foundation for the Rubin Causal Model (Rubin, 1974, 1977, 1978a). What was it like for you at that time? How come so many groundbreaking ideas exploded in your mind at the same time?
Don:
Probably the most important reason is that I always worried about solving real problems. I didn’t read the literature to uncover a hot topic to write about. I always liked math, but I never regarded much of mathematical statistics as real math—much of it is just so tedious. Can you keep track of these epsilons?
Fabri:
It is no coincidence that all these papers share the common theme of missing data.
Don:
That’s right. That theme arose when I was a graduate student. The first paper I wrote on missing data, which is also my first sole-authored paper, was on analysis of variance designs, a quite algorithmic paper. It was always clear to me, from the experimental design course from Cochran, that you should set up experiments as missing data problems, with all the potential outcomes under the not-taken treatments missing. But nobody did observational studies that way, which seemed very odd to me. Indeed, nobody was using potential outcomes outside the context of randomized experiments, and even there, most writers dropped potential outcomes in favor of least squares when actually doing things.
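Don’s framing of a randomized experiment as a missing-data problem, with the potential outcomes under the not-taken treatments missing, can be illustrated with a small hypothetical simulation (numbers and variable names are ours, purely for illustration):

```python
import random

random.seed(0)
n = 10000

# The "science": each unit carries two potential outcomes, Y(0) and
# Y(1) = Y(0) + 2, so the true average causal effect is 2.
science = []
for _ in range(n):
    y0 = random.gauss(0, 1)
    science.append((y0, y0 + 2))

# Randomization reveals exactly one potential outcome per unit; the
# outcome under the not-taken treatment is missing by design.
treated, control = [], []
for y0, y1 in science:
    if random.random() < 0.5:
        treated.append(y1)
    else:
        control.append(y0)

# Difference in means over the revealed halves of the science table.
effect_hat = sum(treated) / len(treated) - sum(control) / len(control)
print(round(effect_hat, 1))  # close to the true effect of 2
```

The assignment mechanism (here a fair coin, known by construction) is what licenses treating the missing half of the table as ignorably missing.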
Fan:
What was the state of research on missing data before you came on the scene?
Don:
It was extremely ad hoc. The standard approach to missing data then was comparing the biases of filling in the means, or of regression imputation under different situations, but almost always under an implicit “missing completely at random” assumption. The purely technical sides of these papers are solid. But I found there were always counterexamples to the propriety of the specific methods being considered, and to explore them, one almost needed a master’s thesis for each situation. I would rather address the class of problems with some generality. There is a mechanism that creates missing data, which is critical for deciding how to deal with the missing data. That idea of formal indicators for missing data goes way back in the contexts of experimental design and survey design. I am consistently amazed how this was not used in observational studies until I did so in the 1970s; maybe someone did, but I’ve looked for years and haven’t found anything. But probably because the missing data paper was done in a relatively new way, I had great difficulty in getting it published (more details in Rubin, 2014a).
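The weakness of the ad hoc methods Don describes is easy to reproduce in a toy simulation of our own construction: mean imputation is fine when values are missing completely at random, but biased as soon as missingness depends on an observed variable, while imputation that exploits the mechanism’s observed driver recovers the truth (here we cheat by using the known true regression of y on x, just to make the point):

```python
import random

random.seed(1)
n = 100000

# x is always observed; y = x + noise is missing with high probability
# whenever x > 0, so missingness depends only on observed data (MAR).
rows = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = x + random.gauss(0, 1)
    observed = not (x > 0 and random.random() < 0.8)
    rows.append((x, y if observed else None))

obs_y = [y for _, y in rows if y is not None]

# Mean imputation is biased here: observed cases over-represent small x,
# hence small y, so the filled-in mean underestimates the true mean of 0.
mean_fill = sum(obs_y) / len(obs_y)

# Imputing from the (known, by construction) regression y = x removes
# the bias, because x, the driver of the missingness, is fully observed.
filled = [y if y is not None else x for x, y in rows]
reg_fill = sum(filled) / len(filled)

print(round(mean_fill, 2), round(reg_fill, 2))
```

Under MCAR the same mean imputation would have been roughly unbiased; the mechanism, not the method alone, decides.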
Fan:
The EM algorithm is another milestone in modern statistics; it is also relevant in computer science and one of the most important algorithms in data mining. Though similar ideas had been used in several specific contexts before, nobody had realized the generality of EM. How did Dempster, Laird and you discover the generality?
Don:
In those early years at ETS, I had the freedom to remain in close contact with the Harvard people, Cochran, Dempster, Holland and Rosenthal, which was very important to me. I always enjoyed talking to Dempster, who is a very principled and deep thinker. I was able to arrange some consulting projects at ETS to bring him to Princeton. Once we were talking about some missing data problem, and we started discussing filling these values in, but I knew it wouldn’t work in generality. I pointed to a paper by Hartley and Hocking (1971), where they deserted the approach of iteratively filling in missing values, as in Hartley (1956) for the counted data case, and went to Newton–Raphson, I think, in the normal case. Even though aspects of EM were known for years, and Hartley and others were sort of nibbling around the edges of EM, apparently nobody put it all together as a general algorithm. Art and I realized that you have to fill in sufficient statistics. I had all these examples like t distributions, factor analysis (the ETS guys loved that), latent class models. And Art had a great graduate student, Nan Laird, available to work on parts of it, and we started writing it up. The EM paper was accepted right away by JRSS-B, even with invited discussions.
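The “fill in sufficient statistics” idea can be seen in the genetic-linkage data that Dempster, Laird and Rubin (1977) used as their opening illustration: the first multinomial cell, with count 125, is a mixture of a known 1/2 part and a θ/4 part, and the E-step splits that count in expectation rather than filling in any raw data value. A minimal sketch:

```python
# Genetic-linkage counts from the opening example of Dempster, Laird
# and Rubin (1977): a multinomial with cell probabilities
# (1/2 + t/4, (1 - t)/4, (1 - t)/4, t/4).
y1, y2, y3, y4 = 125, 18, 20, 34

t = 0.5  # starting value
for _ in range(50):
    # E-step: fill in the expected part of the first count that belongs
    # to the t/4 component (a sufficient statistic of the complete data).
    x2 = y1 * (t / 4) / (1 / 2 + t / 4)
    # M-step: complete-data MLE of t given the filled-in count.
    t = (x2 + y4) / (x2 + y2 + y3 + y4)

print(round(t, 4))  # converges to the MLE, roughly 0.6268
```

Each iteration increases the observed-data likelihood, which is the general guarantee the 1977 paper established.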
Fan:
Now let’s talk more about causal inference. You are known for proposing the general potential outcome framework. It was Neyman who first mentioned the notation of potential outcomes in his Ph.D. thesis (Neyman, 1990), but the notation seemed to have long been neglected.
Don:
Yes, it was ignored outside randomized experiments. Within randomized studies, the notion became standard and used, for example, in Kempthorne’s work, but as I mentioned earlier, ignored otherwise.
Fan:
Were you aware of Neyman’s work before?
Don:
No. I wasn’t aware of his work defining potential outcomes until 1990 when his Ph.D. thesis was translated into English, although I attributed much of the perspective to him because of his work on surveys in Neyman (1934) and onward (see Rubin, 1990a, followed by Rubin, 1990b).
Fabri:
You actually met Neyman when you visited Berkeley in the mid-1970s. During all those lunches, had you ever discussed causal inference and potential outcomes with him?
Don:
I did. In fact, I had an office right next to his. Neyman came to Berkeley in the late 30s. He was very impressive, not only as a mathematical statistician, but also as an individual. There was a tremendous aura about him. Shortly after arriving in Berkeley, I gave a talk on missing data and causal inference. The next day, I went to lunch with Neyman and I said something like, “It seems to me that formulating causal problems in terms of missing potential outcomes is an obvious thing to do, not just in randomized experiments, but also in observational studies.” Neyman answered to the effect that (remarkable in hindsight because he did so without acknowledging that he was the person who first formulated potential outcomes), “No, causality is far too speculative in nonrandomized settings.” He repeated something like this quote from his biography, “. . . Without randomization an experiment has little value irrespective of the subsequent treatment.” (Also see my comment on this conversation in Rubin, 2010.) Then he went on to say politely but firmly, “Let’s not talk about that, let’s instead talk about astronomy.” He was very into astronomy at the time.
Fabri:
You probably learned the reasons why he was so involved in the frequentist approach.
Don:
Yes. I remember we once had a conversation about what confidence intervals really meant and why the formal Neyman–Pearson approach seemed irrelevant to me. He said something like, “You misinterpret what we have done. We were doing the mathematics; go back and read my 1934 paper where I first defined a confidence interval.” He defined it as a procedure that has the correct coverage for all prior distributions (see page 589, Neyman, 1934). If you think of that, you are forced to include all point-mass priors and, therefore, you are forced to do Neyman–Pearson. He went on to say (approximately), “If you are a real scientist with a class of problems to work on, you don’t care about all point-mass priors, you only care about the priors for the class of problems you will be working on. But if you are doing the mathematics, you can’t talk about the problems you or anyone is working on.” I tried to make this point in a comment (Rubin, 1995), but it didn’t seem to resonate with others.
Fabri:
In his famous 1986 JASA paper, Paul Holland coined the term “Rubin Causal Model (RCM),” referring to the potential outcome framework for causal inference (Holland, 1986). Can you explain why, if you think so, the term “Rubin Causal Model” is a fair description of your contribution to this topic?
Don:
Actually Angrist, Imbens and I had a rejoinder in our 1996 JASA paper (Angrist, Imbens and Rubin, 1996), where we explain why we think it is fair. Neyman is pristinely associated with the development of potential outcomes in randomized experiments, no doubt about that. But in the 1974 paper (Rubin, 1974), I made the potential outcomes approach for defining causal effects front and center, not only in randomized experiments, but also in observational studies, which apparently had never been done before. As Neyman told me back in Berkeley, in some sense, he didn’t believe in doing statistical inference for causal effects outside of randomized experiments.
Fan:
Also there are features in the RCM, such as the definition of the assignment mechanism, that belong to you.
Don:
Yes, it was crucial to realize that random-ized experiments are embedded in a larger class ofassignment mechanisms, which was not in the liter-ature. Also, in the 1978 paper (Rubin, 1978a), I pro-posed three integral parts to this RCM framework:potential outcomes, assignment mechanisms, and a(Bayesian) model for the science (the potential out-comes and covariates). The last two parts were notonly something that Neyman never did, he possiblywouldn’t even like the third part. In fact, I thinkit is unfair to attribute something to someone whois dead, who may not approve of the content beingattributed. If the fundamental idea is clear, such aswith Fisher’s randomization test of a sharp null hy-pothesis, sure, attribute that idea to Fisher no mat-ter what the test statistic, as in Brillinger, Jonesand Tukey (1978). Panos Toulis (a fine HarvardPh.D. student) helped me track down this statementthat I remembered reading in my ETS days from amanuscript John gave to me:“
In the precomputer era, the fact that almost all work could be done once and for all was of great importance. As a consequence, the advantages of randomization approaches—except for those few cases where the randomization distributions could be dealt with once and for all—were not adequately valued. One reason for this undervaluation lay in the fact that, so long as randomization was confined to specially manageable key statistics, there seemed no way to introduce into the randomization approach the insights—some misleading and some important and valuable—into what test statistics would be highly sensitive to the changes that it was most desired to detect. The disappearance of this situation with the rise of the computer seems not to have received the attention that it deserves.” (Brillinger, Jones and Tukey, 1978, Chapter 25, page F-5.)
Fabri:
Here I am quoting an interesting question by Tom Belin regarding potential outcomes: “Do you believe potential outcomes exist in people as fixed quantities, or is the notion that potential outcomes are a device to facilitate causal inference?”
Don:
Definitely the latter. Among other things, a person’s potential outcomes could change over time, and how do we know the people who were studied in the past are still exchangeable with people today? But there are lots of devices like that in science.
Fan:
In the RCM, the cause/intervention should always be defined before you start the analysis. In other words, the RCM is a framework to investigate the “effects of a cause,” but not the “causes of an effect.” Some criticize this as a major limitation. Do you regard this as a limitation? Do you think it is ever possible to draw inference on the causes of effects from data, or is it, per se, an interesting question worth further investigation?
Don:
I regard “the cause” of an event as more of a cocktail conversation topic than a scientific inquiry, because it leads to an essentially infinite regress. Someone says, “He died of lung cancer because he smoked three packs a day”; then someone else counters, “Oh no, he died of lung cancer because both of his parents smoked three packs a day and, therefore, there was no hope of his doing anything other than smoking three packs a day”; then another one says, “No, no, his parents smoked because his grandparents smoked—they lived in North Carolina where, back then, everyone smoked three packs a day, so the cause is where the grandparents lived,” and so on. How far back should you go? You can’t talk sensibly about the cause of an event; you can talk about “but for that cause (and there can be many ‘but for’s), what would have happened?” All these questions can be addressed hypothetically. But the cause? The notion is meaningless to me.
Fabri:
Do you feel that you benefit from knowing about the history of statistics when you are thinking about the fundamentals of statistics?
Don:
I know some history, but not a large amount. The most important lessons occur when I feel that I understand why one of the giants, like Fisher or Neyman, got something wrong. When you understand why a mediocre thinker got something wrong, you learn little, but when you understand why a genius got something wrong, you learn a tremendous amount.
Fig. 3. D. B. Rubin (on left) poses with the captain (on right) of Sy’s boat harbored in Bodrum, Turkey, mid-1970s.
BACK TO HARVARD: PROPENSITY SCORE, MULTIPLE IMPUTATION AND MORE
Fabri:
After those productive years at ETS, you spent some time at the EPA (US Environmental Protection Agency). Why did you decide to move, given that you were apparently doing very well at ETS?
Don:
It started partly from my joking answer to the question, “How long have you been at ETS?” I answered, “Too long.” The problems that I had dealt with at ETS started to appear repetitive, and I felt that I had made important contributions to them, including EM and multiple imputation ideas, which were being used to address some serious issues, like test equating and formulating the right ways to collect data. So I wanted to try something else. At the time, David Rosenbaum was the head of the Office of Radiation Programs at the EPA. He had the grand idea of putting together a team of applied mathematicians and statisticians. Somehow he found my name and invited me to D.C. to find out whether I wanted to lead such a group. Basically, I had the freedom to hire several people of my choice, and I had a good government salary (at the level of “Senior Executive Service”). So I said, “Let’s see whom I can get.” I was able to convince both Rod Little (who was in England at that time) and Paul Rosenbaum (whom I advised while I was still at ETS), as well as Susan Hinkins, who wrote a thesis on missing data at Montana State University, and two others. That was shortly before the presidential election. Then the Democrats lost and Reagan was to come in, and everything seemed to be falling apart. All of a sudden, many of the people above my level at the EPA (most of whom were presidential appointments) had to prepare to turn in their resignations, and had to be concerned about their next positions.
Fabri:
So the EPA project ended before it even got started.
Don:
It didn’t start at all in some sense. I formally signed on at the beginning of December, and after one pay period, I turned in my resignation. But I felt responsible to find jobs for all these people I brought there. Eventually, Susan Hinkins got connected with Fritz Scheuren at the IRS; Paul Rosenbaum got a position at the University of Wisconsin at Madison; Rod got a job related to the Census. One nice thing about that short period of time is that, through the projects I was in charge of, I made several good connections, such as to Herman Chernoff and George Box. George and I really hit it off, primarily because of his insistence on statistics having connections to real problems, but also because of his wonderful sense of humor, which was witty and ribald, and his love of good spirits. In any case, the EPA position led to an invitation to visit Box at the Mathematics Research Center at the University of Wisconsin, which I gladly accepted. That gave me the chance to finish writing the propensity score papers with Paul (Rosenbaum and Rubin, 1983a, 1983b, 1984a).
Fan:
Since you mentioned the propensity score, arguably the most popular causal inference technique in a wide range of applied disciplines, can you give some insights on the “natural history” of the propensity score?
Don:
I first met Paul in 1978, when I came to Harvard on a Guggenheim fellowship; he was a first-year Ph.D. student, extremely bright and devoted. Back in my Princeton days I did some consulting for a psychologist at Rutgers, June Reinisch, who later became the first director of the Kinsey Institute after Kinsey. She was very interested in studying the nature–nurture controversy—what makes men and women so different? She and her husband, who was also a psychologist, were doing experiments on rats and pigs. They injected hormones into the uteri of pregnant animals, and thereby exposed the fetuses to different prebirth environments; this kind of randomized experiment is obviously unethical to do with humans. One of the problems Paul and I were working on for this project, also as part of Paul’s thesis, was matching—matching background characteristics of exposed and unexposed. The covariates included a lot of continuous and discrete variables, some of which were rare events, like certain serious diseases prior to, or during, early pregnancy. Soon it became clear that standard matching approaches, like Mahalanobis matching, do not work well in such high-dimensional settings. You have to find some type of summaries of these variables and balance the summaries in the treatment and control groups, not individual to individual. Then we realized that if you have an assignment mechanism, you can match on the individual assignment probabilities, which is essentially the Horvitz–Thompson idea, to eliminate all systematic bias. I don’t remember the exact details, but I think we first got the propensity score idea when working on a Duke data bank on coronary artery bypass surgery, but refined it for the Reinisch data, which is very similar in principle. Again, the idea of the propensity score is motivated by addressing real problems, but with generality.
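In symbols, the summary being described is the propensity score of Rosenbaum and Rubin (1983a): with a binary treatment indicator $Z$ and covariate vector $X$,

```latex
\[
e(x) \;=\; \Pr(Z = 1 \mid X = x),
\qquad\text{with the balancing property}\qquad
X \,\perp\, Z \mid e(X),
\]
```

so that matching or subclassifying treated and control units on the single scalar $e(X)$ balances the whole covariate vector at once, which is exactly what resolves the high-dimensional matching problem described above.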
Fan:
Multiple Imputation (MI) is another very influential contribution of yours. Your book “Multiple Imputation for Nonresponse in Sample Surveys” (Rubin, 1987a) has commonly been cited as the origin of MI. But my understanding is that you first developed the idea and coined the term much earlier.
Don:
Correct, I first wrote about MI in an ASA proceedings paper in 1978 (Rubin, 1972, 1978b). That’s where “the 18+ years” comes from when I wrote “Multiple Imputation After 18+ Years” (Rubin, 1996).
Fabri:
MI has been developed in the context of missing data, but its applicability seems to go far beyond missing data.
Don:
Yes, MI has been applied, and will be, I think, all over the place. The reason I titled the book that way, “Multiple Imputation for Nonresponse in Sample Surveys,” is that it was obvious to me that in the settings where you need to create public-use data sets, you had to have a separation between the person who fixed up the missing data problem and the many people who might do analyses of the data. So there was an obvious need to do something like this, because users could not possibly have the collection of tools and resources to do the imputation, for example, using confidential information. My Ph.D. students Trivellore Raghunathan (Raghu) and Jerry Reiter have made wonderful contributions to confidentiality using MI. Of course, other great Ph.D. students of mine, Nat Schenker, Kim Hung Lee, Xiao-Li Meng, Joe Schafer, as well as many others, have also made major contributions to MI. The development of MI really reflects the collective efforts of these people and others, like Rod Little and his colleagues and students.
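For reference, the combining rules at the core of MI (Rubin, 1987a) are simple enough to state in a few lines: with $m$ completed data sets yielding estimates $\hat{Q}_j$ and associated variance estimates $U_j$,

```latex
\[
\bar{Q}_m = \frac{1}{m}\sum_{j=1}^{m} \hat{Q}_j,\qquad
\bar{U}_m = \frac{1}{m}\sum_{j=1}^{m} U_j,\qquad
B_m = \frac{1}{m-1}\sum_{j=1}^{m} \bigl(\hat{Q}_j - \bar{Q}_m\bigr)^2,
\]
\[
T_m \;=\; \bar{U}_m + \Bigl(1 + \frac{1}{m}\Bigr) B_m,
\]
```

where $T_m$ estimates the total variance of $\bar{Q}_m$; the $(1+1/m)B_m$ term carries the extra uncertainty due to the missing data, which is what lets the imputer and the analyst be separate people.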
Fabri:
Rod Little once half-jokingly said, “Want to be highly cited? Coauthor a book with Rubin!” And indeed he wrote the book “Statistical Analysis with Missing Data” with you (Little and Rubin, 1987, 2002), which is now regarded as the classic textbook on missing data. There have been a lot of new advances and changes in missing data since then. Will we see a new edition of the book that incorporates these developments sometime soon?
Don:
Oh yes, we are working on that now. The main changes from 1987 to 2002 reflect the greater acceptability of Bayesian methods and MCMC-type computations. Rod is a fabulous coauthor, a much more fluid writer than I am. I believe this third edition will have even more major changes than the 2002 one did from the 1987 one, but again many driven by computational advances.
ON BAYESIAN
Fan:
In the 1978 Annals paper (Rubin, 1978a), you gave, for the first time, a rigorous formulation of Bayesian inference for causal effects. But the Bayesian approach to causal inference did not have much following until very recently, and the field of causal inference is still largely frequentist. How do you view the role of the Bayesian approach in causal inference?
Don:
I believe being Bayesian is the right way to approach things, because the basic frequentist approach, such as the Fisherian tests and Neyman’s unbiased estimates and confidence intervals, usually does not work in complicated problems with many nuisance unknowns. So you have to go Bayesian to create procedures. You can go partially Bayesian using things like posterior predictive checks, where you put down a null that you may discover evidence against, or direct likelihood approaches as in Frumento et al. (2012); if the data are consistent with a null that is interesting, you live with it. But Neyman-style frequentist evaluations of Bayesian procedures are still relevant.

Fig. 4. D. B. Rubin at Harvard, early 1980s.
Fan:
But why is the field of causal inference still predominantly frequentist?
Don:
I think there are several reasons. First, there are many Bayesian statisticians who are far more interested in MCMC algebra and algorithms, and do not get into the science. Second, I regard the method of moments (MOM) frequentist approach as pedagogically easier for motivating and revealing sources of information. Take the simple instrumental variable setting with one-sided noncompliance. Here, it is very easy to look at the simple MOM estimate to see where the information comes from. With Bayesian methods, the answer is, in some sense, just there in front of you. But when you ask where the information comes from, you have to start with any value, and iterate using conditional expectations, or draws from the current joint distributions. You have to have far more sophisticated mathematical thinking to understand fully Bayesian ideas. There are these problems with missing data (as in my discussion of Efron, 1994) where there are unique, consistent estimates of some parameters using MOM, but for which the joint MLE is on the boundary. So I think it is often easier, pedagogically, to motivate simple estimators and simple procedures, and not try to be efficient when you convey ideas. In causal inference, that corresponds to talking about unbiased or nearly unbiased estimates of causal estimands, as in Rubin (1977). There are other reasons having to do with the current education of most statisticians.
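The one-sided noncompliance example can be made concrete: with random assignment $Z$, treatment received $D$ (where $D=0$ whenever $Z=0$), and outcome $Y$, the simple MOM estimate of the complier average causal effect is the ratio of two intention-to-treat contrasts,

```latex
\[
\widehat{\mathrm{CACE}}
\;=\;
\frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}
\;=\;
\frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1}},
\]
```

which makes the source of information visible at a glance: the ITT effect on the outcome, scaled up by the observed compliance rate among those assigned to treatment (Angrist, Imbens and Rubin, 1996).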
Fan:
After EM, starting from the early 1980s, you were heavily involved in developing methods for Bayesian computing, including the Bayesian bootstrap (Rubin, 1981), the sampling importance resampling (SIR) algorithm (Rubin, 1987b), and the (lesser-acknowledged) “approximate Bayesian computation (ABC)” (Rubin, 1984, Section 3.1).
Don:
It was clear then that computers were going to allow Bayes to work far more broadly than earlier. You, as well as others such as Simon Tavare, Christian Robert and Jean-Michel Marin, are giving me credit for first proposing ABC. Thanks! Although, frankly, I never thought that would be a useful algorithm except in problems with simple sufficient statistics.
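The algorithm being credited here is, in its simplest rejection form, easy to sketch. The following is a minimal illustration, not production code; the coin-flip model and every name in it are invented for the example:

```python
import random

def abc_rejection(observed_stat, prior_draw, simulate_stat, tol, n_draws=10000):
    """Rejection ABC: keep prior draws whose simulated statistic
    falls within tol of the observed statistic."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw()
        if abs(simulate_stat(theta) - observed_stat) <= tol:
            accepted.append(theta)
    return accepted  # approximate draws from the posterior

# Toy example: infer a coin's heads probability after seeing 7 heads in 10 flips.
random.seed(0)
posterior = abc_rejection(
    observed_stat=7,
    prior_draw=lambda: random.random(),  # uniform prior on [0, 1]
    simulate_stat=lambda p: sum(random.random() < p for _ in range(10)),
    tol=0,  # exact match on the statistic
)
```

Because the number of heads is a sufficient statistic here, the accepted draws are exact posterior draws, matching the remark that the idea is most useful in problems with simple sufficient statistics; with only approximate summaries and a positive tolerance, the output is only approximately posterior.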
Fabri:
But you do not seem to have followed up much on these ideas later, even if you have used them. Also, you do not label yourself as a Bayesian or a frequentist, even though all these papers made extraordinary contributions to Bayesian inference with fundamental and big ideas.
Don:
First of all, fundamentally I am hostile to all “religions.” I recently heard a talk by Raghu in Bamberg, Germany, where he said that in his world they have zillions of gods, and I think that is right; you should have zillions of gods, one for this good idea, one for that good idea. And different people can create different gods to whatever extent they want to. I am not a full-fledged member of the Bayesian camp—I like being friends with them, but I never want to be religiously Bayesian. My attitude is that any complication that creates problems for one form of inference creates problems for all forms of inference, just in different ways. For example, the fact that confounded treatment assignments cause problems for frequentist inference is obvious. Does it generate problems for the Bayesian? Yeah, that point was made in the 1978 Annals paper: Randomization matters to a Bayesian, although not in the same way as to a frequentist, that is, not as the basis for inference, but because it affects the likelihood function. There is something I am currently working on with a Ph.D. student, Viviana Garcia, that builds on a paper I wrote with Paul Rosenbaum in 1984 (Rosenbaum and Rubin, 1984b), which is the only Bayesian paper that Paul has ever written, at least with me. In that paper, we did some simulations to show that there is an effect of the stopping rule on Bayesian inference. We show that if you have a stopping rule and use the “wrong” prior to do the analysis, like a uniform improper prior, but the data are coming from a “correct” prior, and you look at the answer you get from the right prior and from the “wrong” prior, they are different. The portion of the right posterior that you cover using the “wrong” posterior is incorrect. This extends to all situations and it is related to all of these ignorability theorems, and it means that you need to have the right model with respect to the right measure.
Of course, achieving this is impossible in practice and, therefore, leads to the need for frequentist (Neymanian) evaluations of the operating characteristics of Bayesian procedures when using incorrect models (Rubin, 1984). Bayes works, in principle, there is no doubt, but it can be so hard! It can work, in practice, but you must have some other principles floating around somewhere to evaluate the consequences—how wrong your conclusions can be. So you must have something to fall back on, and I think that is where these frequentist evaluations are extremely useful, not the unconditional Neyman–Pearson frequentist evaluations for all point mass priors (which were critical as mathematical demonstrations that we cannot achieve the ideal goal in any generality), but evaluations for the class of problems that you are dealing with in your situation.
Fan:
The 1984 Annals paper, “Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician” (Rubin, 1984), is one of my all-time favorite papers. This paper, like the earlier paper by George Box (Box, 1980), deals with the “calibrated Bayes” paradigm with generality, which can be viewed as a compromise or middle ground between the Bayesian and frequentist paradigms. It has had a profound influence on many of us. In particular, Rod Little has strongly advocated “calibrated Bayes” as the 21st-century roadmap of statistics in several of his prominent talks, including the 2005 ASA President’s Invited Address and the 2012 Fisher Lecture. What was the background and what were the reasons for you to write that paper?
Don:
Interesting question. I was visiting Box at the Mathematics Research Center in 1981–1982 and wrote Rubin (1983) partly during that period—I think it’s a good paper with some good ideas, but without a satisfying big picture. That dissatisfaction led to that 1984 paper—what is the big picture? It took me a very long time to “get it right,” but it all seems very obvious to me now. The idea of posterior predictive checks has been further articulated and advanced in Meng (1994), Gelman, Meng and Stern (1996), and the multiauthored book “Bayesian Data Analysis” (Gelman et al., 1995, 2003, 2014).
Fabri:
Can you talk a little more about the “Bayesian Data Analysis” book, probably one of the most popular Bayesian textbooks?
Don:
Yup, I think that the Gelman et al. book might be THE most popular Bayesian text. It started out as notes by John Carlin for a Bayesian course that he taught when I was Chair sometime in the mid or late 1980s. Andy must have been a Ph.D. student at that time, with tremendous energy for scholarship. John was heading back to Australia, which is his homeland, and somehow the department had some extra teaching money, and we wanted to keep John around for a year—I do not remember the details. But I do remember that the idea of turning the notes for the course into a full text was percolating. Also, Hal Stern was an Associate Professor with us at that time, and so the four of us decided to make it happen. We basically divided up chapters and started writing. Even though John’s initial notes were the starting basis, things changed as soon as Andy “took charge.” Quickly, Andy and Hal were the most active. Andy, with Hal, was even more dominant in the second edition, where I added some parts and edited others, but clearly this was Andy’s show. The third edition, which just came out in early 2014, was even more extreme, with Andy adding two coauthors (David Dunson and Aki Vehtari) because he liked their work, and they had been responsive to Andy’s requests. As the old man of the group, I just requested that I be the last author; Andy obviously was the first author, and the second and third were as in the first edition. In some ways, I feel like I’m an associate editor of a journal that has Andy as the editor! We get along fine, and clearly it’s a successful book.
Fan:
A revolutionary development in statistics since the early 90s was the MCMC methodology. You left your mark in this with Gelman, proposing the Gelman–Rubin statistic for convergence checking (Gelman and Rubin, 1992), which seems to be very much connected to some of your previous work.
Don:
Correct. We embedded the convergence check problem into the combination of the multiple imputation and multiple chains frameworks, using the idea of the combining rules for MI. The idea of using multiple chains—that comes from physics—was Andy’s knowledge, not mine. My contribution was to suggest using modified MI combining rules to help do the assessment of convergence. The idea is powerful because it is so simple. If the starting value does not matter, which is the whole point, then it doesn’t matter, period. The real issue should be how you choose the functions of the estimands that you are assessing, and as always, you want convergence to asymptotic normality to be good for these functions, so that the simple justification for the Gelman–Rubin statistic is roughly accurate.
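In the form now standard (as presented, for example, in “Bayesian Data Analysis”), the statistic compares between- and within-chain variability of a scalar summary $\psi$ computed from $m$ chains of length $n$, with chain means $\bar{\psi}_j$, grand mean $\bar{\psi}$, and within-chain variances $s_j^2$:

```latex
\[
B = \frac{n}{m-1}\sum_{j=1}^{m} \bigl(\bar{\psi}_j - \bar{\psi}\bigr)^2,
\qquad
W = \frac{1}{m}\sum_{j=1}^{m} s_j^2,
\qquad
\widehat{R} = \sqrt{\frac{\frac{n-1}{n}\,W + \frac{1}{n}\,B}{W}},
\]
```

with $\widehat{R}$ near 1 indicating that between-chain variation has become negligible relative to within-chain variation, that is, that the starting values no longer matter.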
THE 1990S: COLLABORATING WITH ECONOMISTS
Fabri:
In the 1990s, you started to work with economists. With Joshua Angrist, and particularly with Guido Imbens, you wrote a series of very influential papers, connecting the potential outcomes framework to causal inference with instrumental variables. Can you tell us how this collaboration started?
Don:
Absolutely. I always liked economics; many economists are great characters! It was in the early 90s when Guido came to my office as a junior faculty member in the Harvard Economics Department and basically said, “I think I have something that may interest you.” I had never met him before, and he was asking if the concept of instrumental variables already had a history in statistics. Guido and Josh Angrist had already defined the LATE (local average treatment effect) in an Econometrica paper (Imbens and Angrist, 1994)—although I think CACE (Complier Average Causal Effect) is a much better name because it is more descriptive and more precise—local can be local for anything, local for Boston, local for females, etc. Then I asked in return, “Well, tell me the setup; I have never heard of it in statistics before,” and while he was explaining I started thinking, “Gosh, there is something important here! I have never seen it before,” and then I said, “Let’s meet tomorrow and talk about it more,” because these kinds of assumptions (monotonicity and the “exclusion restriction”) were fascinating to me, and it was clear that there was something there that I had never really thought hard about; it was great. That eventually led to the instrumental variables paper (Angrist, Imbens and Rubin, 1996) and the later Bayesian paper (Imbens and Rubin, 1997).

A closely related development was a project I was consulting on for AMGEN at about the same time, for a product for the treatment of ALS (amyotrophic lateral sclerosis), or Lou Gehrig’s disease, which is a progressive neuromuscular disease that eventually destroys motor neurons, and death follows. The new product was to be compared to the control treatment, where the primary outcome was quality of life (QOL) two years post-randomization, as measured by “forced vital capacity” (FVC), essentially, how big a balloon you can blow up.
In fact, many people do not reach the endpoint of two-year post-randomization survival, and so two-year QOL is “truncated” or “censored” by death. People were trying to fit this problem into a “missing data” framework, but I realized right away that it was something different.
Fan:
Essentially both ideas are special cases of the general idea of Principal Stratification, which we can discuss in a moment.
Don:
Yes, indeed. These meetings with Guido and this way of thinking were so much more articulated and close to the thinking of European economists in the 30s and 40s, like Tinbergen and Haavelmo, than many subsequent economists who seemed sometimes to be too into their OLS algebra in some sense. There was some correspondence between one of the two—Haavelmo, I think—and Neyman on these hypothetical experiments on supply and demand. European brains were talking to each other, and not simply exchanging technical mathematics!
Fabri:
I know that many years before you met Guido, you had discussions with other statisticians, like Tukey, about the way economists were treating selection problems, or missing data problems. But you had some adventurous, to say the least, previous experiences with economists dealing with problems that you had worked on, which they had almost completely neglected.
Don:
Yes, James Heckman was tracking my work in the early 1980s when I came to Chicago after ETS. The public exchange came out in the ETS volume edited by Howard Wainer (which is where Glynn, Laird and Rubin, 1986, appears), with comments from Heckman, Tukey, Hartigan and others.
Fabri:
Economics is a field where the idea of causality is crucial; did you find interest in economics also for this very reason? The problems they have are usually very interesting.
Fig. 5. In classroom at Harvard, late 1980s.
Don:
There are often interesting questions from social science students that come up in class. One recent example is how to answer questions like “What would the Americas be like if they were not settled by Europeans?” I asked the questioner, “Who would they be settled by instead? By the Chinese? By the Africans? What are you talking about? What are we comparing the current American world to?” Another example comes from an undergraduate thesis that I directed, by Alice Xiang, which won both the Hoopes Prize and the economics’ Harris Prize for an outstanding honors thesis. The thesis is on the causal effect of racial affirmative action in law school admissions on some outcomes versus the same proportion of affirmative action admissions but counterfactually based on socioeconomic status. This is not just for cocktail conversation—it was a case recently before the US Supreme Court, Fisher v. University of Texas, which was kicked back to the lower court to reconsider, and additionally the issue was recently affected by a state law in Michigan. There is an amicus brief sent to the US Supreme Court to which Guido (Imbens), former Ph.D. students Dan Ho and Jim Greiner, and I (with others) contributed.

Such careful formulation of questions is critical, and to me is central to the field of statistics. It is crucial to formulate your causal question clearly. What is the alternative intervention you are considering, when you talk about the causal effect of affirmative action on graduation rates or bar-passage rates? Immediately formulating the problem as an OLS regression is the wrong way to do this, at least to me.

Fig. 6. (Left to right) Guido Imbens, Don Rubin, Josh Angrist. March, 2014.

Fan:
You apparently have a long-standing interest in law; besides the aforementioned “affirmative action” thesis, you have done some interesting work in applied statistics in law.
Don:
Yes. Paul Rosenbaum was, I think, the first of my Harvard students who did something about statistics in law. Either his qualifying paper or a class paper in 1978 was on the effect of the death penalty. Jim Greiner, another great Ph.D. student of mine, who had a law degree before entering Harvard Statistics, wrote his Ph.D. thesis (and subsequently several important papers) on potential outcomes and causal effects of immutable characteristics. He is now a full professor at the Harvard Law School. There were also several previous undergraduate students of mine who were interested in statistics and law, but (sadly) most went to law school. Since 1980, I have been involved in many legal topics.
THE NEW MILLENNIUM: PRINCIPAL STRATIFICATION
Fabri:
The work you did with Guido, as well as the work on censoring due to death, led to your paper on Principal Stratification (Frangakis and Rubin, 2002), coauthored with this brilliant student of yours, Constantine Frangakis, who happens to be Fan’s advisor.
Don:
Yes, Constantine is fabulous, but the original title of that paper was very long, same with the title of his thesis. It went on and on, with probably a few Latin, a few Italian, a few French and a few Greek words! Of course I was exasperated, so I convinced him to simplify the paper’s title to “Principal Stratification in Causal Inference.” He is brilliant, so good that he has no trouble dealing with all the complexity in his own mind, but therefore he struggles at times pulling out the kernels of all these ideas, making them simple.
Fan:
What do you think is the most remarkable thing about the development of Principal Stratification?
Don:
It is a whole new collection of ways of thinking about what the real information is in causal problems. Once you understand what the real information is, you can start thinking about how you can get the answers to questions that you want to extract from that information; you always have to make assumptions, and it forces you to explicate what these assumptions are, not in terms of OLS, which no social scientist or doctor would really understand—but in terms of scientific or medical entities. And because you have to make assumptions, be honest and state them clearly. For example, I like your papers (Mealli and Pacini, 2013; Mattei, Li and Mealli, 2013) about multiple post-randomization outcomes, where you discuss that for some outcomes, exclusion restrictions or other structural assumptions may be more plausible.
Fabri:
Principal Stratification is sometimes compared to other tools for doing so-called mediation analysis—what is your view about inference on mediation effects?
Don:
I think we (Don and Fabri) discussed a paper recently in JRSS-A, and those discussions summarize my–our view on that. Essentially, some of the people writing about mediation seem to misunderstand what a function is. They write down something that has two arguments inside parentheses, with a comma separating them, and they seem to think that therefore something is well defined!
Fan:
Even though causal inference has gained increasing attention in statistics and beyond, there seems to be a lot of misunderstanding, misuse, misinterpretation and mystifying of causal inference. Why? And what needs to be done to change this?
Don:
I think it is partly because causal inference is a very different topic from many topics in statistics in that it does not demand a lot of advanced technical mathematical knowledge, but does demand a lot of conceptual and basic mathematical sophistication. Principal Stratification is one such example. Writing down notation does not take the place of understanding what the notation means and how to prove things mathematically. Also, partly because causal inference has become a popular topic, it has been flooded with publications that are often done casually. For some fields, it is important to bridge the “old” (everything-based-on-OLS) thinking with the newer ideas. That’s a battle Guido and I constantly had to deal with when writing our book (Imbens and Rubin, 2015).
Fan:
You mentioned the book; when will it finally come out? It has been forthcoming for the last ten years or so.
Don: (Laughing) Come on, Fan, that’s not fair! Has it only been ten years? We have promised the publisher (Cambridge University Press) that it will be ready by September 30, 2013. It will be about 500 pages, 25 chapters. It will be followed by another volume, dealing with topics that we could not get to in the volume due to length, such as principal stratification beyond IV settings, or because we believe the topics have not been sharply and cleanly formulated yet, such as regression discontinuity designs, or using propensity scores with multiple treatments. Also, in this volume we didn’t discuss so-called case–control studies, which are the meat of much of epidemiology; it is very important to embed these studies into a framework that makes sense, not just teach them as a bag of tricks.
MENTORING, CONSULTING AND EDITORSHIP
Fabri:
You have advised over 50 Ph.D. students and many BA students as well. This sounds like a job interview, but what is your teaching philosophy?
Don:
My view is that one should approach teaching very differently depending on the kind of students you have and their goals. Harvard has tremendous undergraduate and graduate students, but their strengths vary and their objectives vary. A long time ago I decided that I don’t have the desire or ability to be an entertainer in class, that is, to entertain to get their attention. If they find me entertaining, fine; but it is better if they find the topic I am presenting entertaining.
Fabri:
Many of your students went on to become leaders, and not only in academia. And you often say that the thing you are most proud of is your students. Though it is clearly impossible to talk about them here one by one, can you share some of your fond memories of the students?
Don:
Fabri, that is a killer question unless we have another day for this. What I can say is that it has been a great pleasure to supervise so many very talented students. I could start listing my superb Ph.D. students at the University of Chicago and at Harvard. All of my Ph.D. students are talented in many, and sometimes different, dimensions: among them there are two COPSS award winners, one president of the ASA, one president of ENAR, two JSM program chairs, and other such honors, and many of them have made substantial contributions to government, academia and industry.

(As of April 1, 2014, the Imbens and Rubin book can be preordered on Amazon.com.)
Fan:
You have also advised a large number of undergraduate students on a wide range of topics. This is quite uncommon, because some people find mentoring undergraduates more challenging and less rewarding than mentoring graduate students. What is your take on this?
Don:
I am not completely innocent of this charge. I have no interest in “babysitting” and trying to motivate unmotivated students, either undergraduate or graduate. But Harvard does attract some extremely talented and motivated undergraduates, some of whom I have had the pleasure to advise. Five have won Hoopes and other prizes for outstanding undergraduate theses.
Fabri:
Now let’s talk about writing, with which both Fan and I, like many others, have some quite memorable first-hand experience. You are known as a perfectionist in writing. As you mentioned, you are willing to withdraw accepted papers if you are not a hundred percent satisfied with them.
Don:
Yes, as you guys know, I am generally a pain in the neck as a coauthor. I have withdrawn three accepted papers and tried to improve them; all eventually got reaccepted. One of these is the paper with you guys and others on multiple imputation for the CDC Anthrax vaccine trial (Li et al., 2014). You were not too happy about it initially.
Fabri: (Laughing) Yeah, we tried to revolt, without success. A different question: How do you approach rejections? Do you have some advice for young statisticians on that?
Don:
Over the years I have had many papers immediately rejected, or rejected with the suggestion that it would not be wise to resubmit. However, in almost all of these cases, this treatment led to markedly improved publications, somewhere. In fact, I think that the drafts that have been repeatedly rejected possibly represent my best contributions. Certainly, the repeated rejections, combined with my trying to address various comments, led to better exposition and sometimes better problem formulation, too. The most important idea is: Do not think that people who are critics are hostile. In the vast majority of cases, editors and reviewers are giving up their time to try to help authors, and, I believe, are often especially generous and helpful to younger or inexperienced authors. Do not read rejection letters as personal attacks, which are extremely rare. So my advice is: Quality trumps quantity, and stick with good ideas even when you have to do polite battle with editors and reviewers—they are not perfect judges, but they are, almost uniformly, on your side. More details are given in Rubin (2014b).

Fig. 7. D. B. Rubin (on left) with Tom Belin (on right) and Tom’s daughter Janet (middle), Cambridge, 2008.
Fan:
In 1978, you became the Coordinating and Applications Editor of JASA. Is there anything particularly unique about your editorship?
Don:
As an author, I am willing to withdraw accepted papers. As a new editor, at least then, I was also willing to suggest to authors that they withdraw papers accepted by the previous editors! I took some heat for that at the beginning. I read through all the papers that the previous editorial board had accepted and that were awaiting copyediting for publication; for the ones that I thought were bad (I remember there were about eight), I wrote, “Dear authors, I think you should consider withdrawing this paper,” with long explanations of why I thought it would be an embarrassment to them if the paper were published. Fabri knows that I can be brutally frank about such suggestions.
Fan:
Did they comply?
Don:
Yes, all but one. This one author fought, and I kept saying, “You have to fix this up.” Eventually, the changes made the paper OK. For the other ones, the authors agreed with my criticisms: Just because the previous editor didn’t get a good reviewer, or the reviewers overlooked mistakes, does not mean the paper should appear. But I was not very popular, at least at first.
Fabri:
You have done a wide range of consulting. What role does consulting play in your research?
Don:
To me, consulting is always a stimulating source of problems. As I mentioned before, for example, propensity score technology partly came from the consulting work we did for June Reinisch.
Fabri:
One of the more controversial cases in which you were involved as a consultant is the US tobacco litigation, in which you represented the tobacco companies as an expert witness. Would you mind sharing some of your thoughts on this case?
Don:
Happy to. This comes from my family background dealing with lawyers. We have a legal system where certain things are legal and certain things are not. You should generally obey laws even if you don’t like them, or you should try to change them. If a company is making a legal product, and they are advertising it legally under current laws, then accept it or work to change the laws. If they lie, punish them for lying, if that is legal to do. You never see a commercial for sporty cars that shows the cars going around corners extremely slowly and safely. How do they advertise cars? They usually show them sweeping around corners, and say “Don’t do this on your own.” Things that are enjoyable typically have uncertainties or risks associated with them. Flying to Europe to visit Fabri has risks!

Certainly I do not doubt that no matter how I would intervene to reduce cigarette smoking, lung cancer rates would drop. But what intervention that would reduce smoking would involve reducing illegal conduct of the cigarette industry—that is the essence of the legal question.

When I was first contacted by a tobacco lawyer, I was very reluctant to consult for them, and I feared strong pressure to be dishonest, which was absent throughout. The original topic was simply to comment on the ways the plaintiffs’ experts were handling missing data. On examination, their methods seemed to me to be not the best available and, at worst, silly (e.g., when missing “marital status,” call them “married”). As I continued to read these initial reports, I was appalled that hundreds of billions of dollars could be sought on the basis of such analyses. From a broader perspective, the logic underlying most of the analyses also seemed to me entirely confused. For example, alleged misconduct seemed to play no role in nearly all calculations, and phrases such as “caused by” or “attributable to” were used nearly interchangeably, and often apparently without thought. Should nearly a trillion dollars in damages be awarded on the basis of faulty logic and bad statistical analyses because we “know” the defendant is evil and guilty? If the issue were assessing the tobacco industry a trillion-dollar fine for lying about its products, I would be amazed but mute. But these reports were using statistical arguments to set the numbers—is it acceptable to use bad statistics to set numbers because we “know” the defendant is guilty? What sort of precedent does that imply? The ethics of this consulting is discussed at some length in Rubin (2002).

Fig. 8. Celebrating Don’s 70th birthday at the Yenching Restaurant, Harvard Square, March 29, 2014. Front (left to right): Alan Zaslavsky, Elizabeth Stuart, Xiao-Li Meng, TE Raghunathan; Back (left to right): Fan Li, Elizabeth Zell, Fabrizia Mealli, Don Rubin. The restaurant has a dish named in Don’s honor, the “Rubin.”
Fabri:
We have talked quite a lot about statistics. Let’s talk about some of your other passions in life, for example, music, audio systems and sports cars.
Don:
There are other passions, too, and their order is very age dependent (I leave more to your perceptions). When I was a kid, for example, sports cars, both driving them and rebuilding them, were at the top of those three hobbies. But age (poorer vision, slower reflexes, more aches and pains, etc.) shifted the balance more to music, both live and recorded—luckily my ears are still good enough to enjoy these, but as more age catches up, things may shift.
Fan and Fabri:
Well, it has been nearly three hours since we started the conversation. Here is the final question before letting you go for dinner: What is your short advice to young researchers in statistics?
Don:
Have fun! Don’t be grumpy. If lucky, you may live to have a wonderful 70th birthday celebration! Video of the celebration is available at:
ACKNOWLEDGMENTS

We thank Elizabeth Zell, Guido Imbens, Tom Belin, Rod Little, Dale Rinkel and Alan Zaslavsky for helpful suggestions. This work is partially funded by NSF-SES Grant 1155697.

REFERENCES
Angrist, J., Imbens, G. W. and Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with discussion and rejoinder). J. Amer. Statist. Assoc.
Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. J. Roy. Statist. Soc. Ser. A
Brillinger, D. R., Jones, L. V. and Tukey, J. W. (1978). The management of weather resources. In The Role of Statistics in Weather Resources Management II. Report of the Statistical Task Force to the Weather Modification Advisory Board. US Government Printing Office, Washington, DC.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol.
Frangakis, C. E. and Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics
Frumento, P., Mealli, F., Pacini, B. and Rubin, D. B. (2012). Evaluating the effect of training on wages in the presence of noncompliance, nonemployment, and missing outcome data. J. Amer. Statist. Assoc.
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statist. Sci.
Gelman, A., Meng, X. L. and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statist. Sinica
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. Chapman & Hall, London. MR1385925
Gelman, A., Carlin, J., Stern, H. and Rubin, D. B. (2003). Bayesian Data Analysis, 2nd ed. CRC Press, New York.
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A. and Rubin, D. B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press, New York.
Glynn, R., Laird, N. M. and Rubin, D. B. (1986). Selection modelling versus mixture modelling with nonignorable nonresponse. In Drawing Inferences from Self-Selected Samples (H. Wainer, ed.) 119–146. Springer, New York.
Hartley, H. O. (1956). A plan for programming analysis of variance for general purpose computers. Biometrics
Hartley, H. O. and Hocking, R. R. (1971). The analysis of incomplete data. Biometrics
Holland, P. W. (1986). Statistics and causal inference. J. Amer. Statist. Assoc.
Imbens, G. W. and Angrist, J. (1994). Identification and estimation of local average treatment effects. Econometrica
Imbens, G. W. and Rubin, D. B. (1997). Bayesian inference for causal effects in randomized experiments with noncompliance. Ann. Statist.
Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge Univ. Press, New York.
Li, F., Baccini, M., Mealli, F., Zell, E. R., Frangakis, C. E. and Rubin, D. B. (2014). Multiple imputation by ordered monotone blocks with application to the anthrax vaccine research program. J. Comput. Graph. Statist.
Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data. Wiley, New York. MR0890519
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. Wiley, Hoboken, NJ. MR1925014
Mattei, A., Li, F. and Mealli, F. (2013). Exploiting multiple outcomes in Bayesian principal stratification analysis with application to the evaluation of a job training program. Ann. Appl. Stat.
Mealli, F. and Pacini, B. (2013). Using secondary outcomes to sharpen inference in randomized experiments with noncompliance. J. Amer. Statist. Assoc.
Meng, X.-L. (1994). Posterior predictive p-values. Ann. Statist.
Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. J. Roy. Statist. Soc.
Neyman, J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statist. Sci.
Rosenbaum, P. R. and Rubin, D. B. (1983a). The central role of the propensity score in observational studies for causal effects. Biometrika
Rosenbaum, P. R. and Rubin, D. B. (1983b). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J. R. Stat. Soc. Ser. B Stat. Methodol.
Rosenbaum, P. R. and Rubin, D. B. (1984a). Reducing bias in observational studies using subclassification on the propensity score. J. Amer. Statist. Assoc.
Rosenbaum, P. R. and Rubin, D. B. (1984b). Sensitivity of Bayes inference with data-dependent stopping rules. Amer. Statist.
Rubin, D. B. (1972). A non-iterative algorithm for least squares estimation of missing values in any analysis of variance design. J. R. Stat. Soc. Ser. C. Appl. Stat.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educational Psychology
Rubin, D. B. (1976). Inference and missing data. Biometrika
Rubin, D. B. (1977). Assignment to treatment group on the basis of a covariate. J. Educational Statistics
Rubin, D. B. (1978a). Bayesian inference for causal effects: The role of randomization. Ann. Statist.
Rubin, D. B. (1978b). Multiple imputations in sample surveys—A phenomenological Bayesian approach to nonresponse (with discussion and reply). In The Proceedings of the Survey Research Methods Section of the American Statistical Association. Also in Imputation and Editing of Faulty or Missing Survey Data. U.S. Dept. Commerce, Bureau of the Census, Washington, DC.
Rubin, D. B. (1981). The Bayesian bootstrap. Ann. Statist.
Rubin, D. B. (1983). A case study of the robustness of Bayesian methods of inference: Estimating the total in a finite population using transformations to normality. In Scientific Inference, Data Analysis, and Robustness (Madison, Wis., 1981). Publ. Math. Res. Center Univ. Wisconsin.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist.
Rubin, D. B. (1987a). Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
Rubin, D. B. (1987b). A noniterative sampling/importance resampling alternative to the data augmentation algorithm for creating a few imputations when fractions of missing information are modest: The SIR algorithm. Discussion of “The calculation of posterior distributions by data augmentation” by M. Tanner and W. H. Wong. J. Amer. Statist. Assoc.
Rubin, D. B. (1990a). Formal modes of statistical inference for causal effects. J. Statist. Plann. Inference
Rubin, D. B. (1990b). Comment on “Neyman (1923) and causal inference in experiments and observational studies.” Statist. Sci.
Rubin, D. B. (1994). Comment on “Missing data, imputation, and the bootstrap” by Bradley Efron. J. Amer. Statist. Assoc.
Rubin, D. B. (1995). Bayes, Neyman, and calibration. Discussion of Berk, Western and Weiss. Sociological Methodology
Rubin, D. B. (1996). Multiple imputation after 18+ years (with discussion and rejoinder). J. Amer. Statist. Assoc.
Rubin, D. B. (2002). The ethics of consulting for the tobacco industry. Special issue on “Ethics, statistics and statisticians.” Stat. Methods Med. Res.
Rubin, D. B. (2010). Reflections stimulated by the comments of Shadish (2010) and West and Thoemmes (2010). Psychol. Methods
Rubin, D. B. (2014a). Converting rejections into positive stimuli. In Past, Present, and Future of Statistical Science (X. Lin et al., eds.) 593–603. CRC Press, New York.
Rubin, D. B. (2014b). The importance of mentors. In