Statistical Science
Institute of Mathematical Statistics, 2013
Another Conversation with Persi Diaconis
David Aldous
Abstract.
Persi Diaconis was born in New York on January 31, 1945. Upon receiving a Ph.D. from Harvard in 1974 he was appointed Assistant Professor at Stanford. Following periods as Professor at Harvard (1987–1997) and Cornell (1996–1998), he has been Professor in the Departments of Mathematics and Statistics at Stanford since 1998. He is a member of the National Academy of Sciences, a past President of the IMS and has received honorary doctorates from Chicago and four other universities. The following conversation took place at his office and at Aldous's home in early 2012.
Key words and phrases:
Bayesian statistics, card shuffling, exchangeability, foundations of statistics, magic, Markov chain Monte Carlo, mixing times.
1. MARKOV CHAINS, MIXING TIMES AND MONTE CARLO
Aldous:
You were interviewed in October 1984 for a Statistical Science conversation article [7], so I won't ask about your earlier personal and academic life, but try to pick up from that point. You and I were both involved, in the early 1980s, with the start of the topic now often labeled "Markov chains and mixing times" [34]. Can you tell us your recollections of early days, and give some overview of how the whole topic has developed over the last 30 years?
Diaconis:
That’s been the main focus of my worksince the 1980s, and it started for me with an ap-plied problem. I was working at Bell Labs and wewere simulating optimal strategies in various gamesand needed a lot of random permutations. The stan-dard way is to pick a random number between 1 and n and switch it with 1, then pick a random number This is an electronic reprint of the original articlepublished by the Institute of Mathematical Statistics in
Statistical Science , 2013, Vol. 28, No. 2, 269–281. Thisreprint differs from the original in pagination andtypographic detail. between 2 and n and switch it with 2, etc. If you dothat n − n times that would beenough . . . Aldous: . . . by an easy coupling argument . . .
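The "standard way" described here is the classic Fisher–Yates shuffle; a minimal Python sketch (the function name is mine, not from the conversation), together with the Diaconis–Shahshahani cutoff count for random transpositions of a 52-card deck:

```python
import math
import random

def fisher_yates(n, rng=None):
    """Return a uniformly random permutation of 0..n-1 by the
    'standard way': swap position i with a random position in i..n-1."""
    rng = rng or random.Random()
    perm = list(range(n))
    for i in range(n - 1):        # n - 1 swaps suffice
        j = rng.randrange(i, n)   # random index in {i, ..., n-1}
        perm[i], perm[j] = perm[j], perm[i]
    return perm

print(fisher_yates(10, random.Random(0)))

# The Diaconis-Shahshahani result for *random transpositions*
# (swap two uniformly chosen cards each step) gives a cutoff
# at (1/2) n log n; for a 52-card deck that is about 103 steps.
cutoff = 0.5 * 52 * math.log(52)
print(round(cutoff))  # 103
```

The Fisher–Yates procedure is exact after n − 1 swaps; the (1/2) n log n count is the separate question, for the random-transpositions chain, that Diaconis describes answering by Fourier analysis on the symmetric group.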
Diaconis: . . . but it wasn’t clear if that was theright answer. Eventually Mehrdad Shahshahani washere and we realized we could set it up as a problemin Fourier analysis and carefully do the Fourier anal-ysis on a noncommutative group and get the rightanswer and it turned out [26] to be n log n , whichwhen n = 52 gives 103. That was for me the start ofit. And in 1983 you wrote this article [1] on mixing D. ALDOUS times for Markov chains. Around the same time JimReeds had become interested in riffle shuffling. Hehad reinvented a model that Gilbert and Shannonhad invented and had numerical results and ideasbut couldn’t seem to push them through. You andI started to talk about it and invented stopping timearguments [3] that turned out to give good answersin some cases. Spurred on by these two examplesI started to think hard about mixing times. Aroundthe same time, what I now call “the Markov chainMonte Carlo revolution” [11] began with the paperby Geman and Geman [30]. So the topic of mixingtimes became about more than just card shuffling,it was also about how long should you run a simula-tion until it converges. I say now, as I said then, thatif you take any application of MCMC in a real prob-lem and ask if we theoreticians can give a sensibleanswer to a practitioner about “how long . . . ,” thenwe can’t. These are open research problems, everyone of them. We have ideas, we have heuristics, butas math problems they are really open.A fitting proof of that is the following: The Metro-polis algorithm, Glauber dynamics, the Gibbs sam-pler and molecular dynamics were all invented tosolve one problem, the problem of random place-ment of hard discs in a box. Take, say, two-dimen-sional discs of radius ε in the unit square. You wantthem to be uniform random subject to nonoverlap.The Metropolis algorithm is that you pick a disc atrandom, you try to move it a little, if it’s possibleto move then do it, if not then try another pick.Glauber dynamics is similar. 
But as far as I know—despite billions of steps of simulations over the last 60 years—nobody has ever sampled from anything close to the stationary distribution, in the interesting case of high disc density. There's supposed to be a phase transition around 81%, but the algorithms have no hope of converging near that point, and yet people get numbers from the simulations and talk about them. I think the same goes for statistical algorithms too—people who don't want to think about it just run the simulation until something seems to have settled down. So I think the current state of the art is there's a ton of research still to be done; everyone finds us theoreticians annoying prigs for asking what can you show rigorously. But it's not just being annoying. In enough cases the algorithms really don't converge, and people don't seem to want to own up to that. In [21, 22] we tried pretty seriously to do the hard discs problem, but there's still a very long way to go.

Aldous:
Now there’s a distinction between “don’tconverge” and “we can’t prove they do converge”. . . And there’s an argument that in practice oneuses “black box” methods like MCMC in compli-cated situations where you don’t have any nice struc-ture, whereas to do any theoretical analysis you needto assume some structure, so we (perhaps) are ina Catch-22 situation where one can do theory forMCMC only in situations where you wouldn’t actu-ally use it. And then you have to rely on the heuris-tics that applied researchers have developed.
Diaconis:
Well, at least for the chemists I talk to, who study molecular dynamics, they haven't converged, they're in some kind of local minimum, and really dramatically new ideas are needed. I think a very interesting research question is to look at the zoo of diagnostic techniques that are available today, and look at the hundreds of examples of Markov chains about which we know everything. Take some of those examples and diagnostic techniques and try to see how they behave. That seems like a very reasonable thing to do. I've tried for 20 years to get a graduate student to do this, but somehow I can't get anyone to sit down and do the work. I should have learned I need to do it myself. They're hard math problems. The diagnostics can be pretty sophisticated; they're not just second eigenvalue but involve sups and infs of complicated functionals. We do have a lot of machinery and they're nice math problems, so this project would be useful to help evaluate diagnostics. What's annoying to me is how little that problem is recognized. If you go to a Statistics meeting, in talk after talk somebody runs the Gibbs sampler because that's the standard thing to do, and they say they ran it 10,000 times and it seemed to be OK, and they just go on with what they're doing. People don't even try to prove the chain does what it's supposed to do, that is, have the desired stationary distribution.
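The "examples about which we know everything" include tiny chains whose convergence can be computed exactly. A sketch (my own illustration, not from the conversation): exact total-variation distance to uniform for a lazy random walk on a 12-cycle, the kind of ground truth against which a convergence diagnostic could be benchmarked.

```python
# Exact convergence of a lazy random walk on an n-cycle:
# hold with prob 1/2, step left or right with prob 1/4 each.
n = 12
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    P[i][i] = 0.5
    P[i][(i + 1) % n] += 0.25
    P[i][(i - 1) % n] += 0.25

dist = [0.0] * n
dist[0] = 1.0                      # start at state 0

def tv_to_uniform(d):
    """Total-variation distance from d to the uniform distribution."""
    return 0.5 * sum(abs(p - 1.0 / n) for p in d)

t = 0
while tv_to_uniform(dist) > 0.25:  # standard mixing-time threshold
    dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    t += 1
print(t)                           # steps needed to reach TV <= 1/4
```

Because the stationary distribution and the eigenvalues are known here, any proposed diagnostic can be checked against the true distance-to-stationarity curve, which is the comparison Diaconis is proposing.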
2. BAYESIAN STATISTICS
Aldous:
Let me move on to Bayesian statistics, which has been a recurrent feature of your research. Maybe I should remind readers that 30 years ago this was completely unrelated to Markov chains, but now a major use of MCMC is for computing Bayes posteriors—but let's leave MCMC for later. I'm not competent to ask good questions here, so let me just throw out two points and then I'll sit back and listen.

(a) You have work, such as the 1986 paper [14] with David Freedman, that addresses foundational technical (rather than philosophical) issues in Bayesian statistics.

(b) There is a recent newsletter piece by Mike Jordan [32] summarizing comments by many leading Bayesian statisticians (including you) on open problems in Bayesian statistics.

So I guess I am asking for your thoughts on the history and current state of methodological/technical aspects of Bayesian statistics.

Diaconis:
I came into Statistics late in life, becoming aware of the Bayesian position when I was a graduate student at Harvard. Art Dempster and Fred Mosteller were Bayesians—not everyone; Bill Cochran wouldn't dream of doing anything Bayesian. I read de Finetti's work and found it frustrating and fascinating, as I still do today, but it was inspiring and so I tried to make my own sense out of it. One of the things I noticed was that de Finetti's theorem involves an infinite exchangeable sequence. I wondered whether there could be a finite version, saying the sequence is almost a mixture of IIDs. In fact, I wrote my first paper on that topic when I was still a graduate student [8]. That was when I came to meet the Berkeley people—David Freedman and Lester Dubins and David Blackwell, the latter two being Bayesians of various stripes—and, in fact, David Freedman's thesis (published as [27]) was about de Finetti's theorem for Markov chains. The story, in a nutshell, is that David was a precocious but difficult young man who wanted to do a thesis in Probability, and (the way David told it) he went into Feller's office and Feller looked up, said, "prove de Finetti's theorem for Markov chains," looked back down, and David left. So he went and did it. In order to prove the theorem, he needed to assume the chain was stationary. When I met him at a Berkeley–Stanford joint colloquium, I said that I knew how to do it without stationarity. I could make a finite version of the theorem and it didn't need anything like stationarity. He agreed to listen, and Lester did too. Lester was very dismissive, but David wasn't, and that led to our work on finite versions of both the Markov and the IID cases [12, 13].

I've written far too many papers. I'll try to distinguish the ones that people seem to like into Statistics or Probability or something in between. You presented me with a list of papers . . .
Aldous: . . . your 30 most cited papers, according to Google Scholar . . .
Diaconis: . . . and about a third are Statistics and a third are Probability and a third are in between, like de Finetti's theorem, which I was interested in for philosophical reasons, trying to make sense of the way model-building goes. I like de Finetti's take, focusing on observables, and I'd like to understand just what you need to assume about a process, in terms of observables, in order for it to be a mixture of standard parametric families, a mixture of exponentials or normals or some other thing. That led to a lot of work [15]. That era seems to have quieted way down—nowadays no one works on exchangeability particularly, though a few of us still dabble in it.

About a year ago, some of our chemists here came to me. They were working on a protein folding problem with the IBM Blue Gene project. They're really doing protein folding—taking forty molecules and ten thousand water molecules and then doing the molecular dynamics to see how the protein folds by using the equations of physics. It's a very high-dimensional system—one particle is represented by twelve numbers—and the chemists were coarse-graining and dividing this high-dimensional space into maybe five thousand boxes. Their hope is that within a box it will quickly get random—in the sense of invariant measure for a dynamical system—and that jumps from box to box can be modeled as some Markov chain. Refreshingly to me, they were Bayesians, so they wanted to put a prior on transition matrices and, because the laws of physics are reversible, they wanted the prior to live on reversible chains. I realized that some earlier work with Silke Rolles [24] exactly gave the conjugate prior for reversible Markov chains. I told them about it, they implemented it and they say it makes a big difference. There's a marvelous graduate student here, Sergio Bacallado; he's a chemist, and he's written papers such as [5] in the Annals which extend our work on priors in more practical directions. There's something very exciting here—our old work had horrible formulas involving quotients of Gamma functions and now someone is caring to get it right, and thinking it's sensible. So that subject is quite alive and well today, although Sergio has taken it a lot further. One of the main problems for Markov chain theory is to make the mixing time theory for continuous-space chains. There really are technical difficulties for continuous spaces, and he's managed to get around that.

Now in a larger view, it's a very exciting time for Bayesian statistics. When I first learned about it, in the early 1970s, it was still Good and Savage, and
people were still arguing about whether an egg in a fridge is rotten or not . . .
Aldous: . . . and the Bayesian lady tasting tea.
Diaconis:
I remember going to my first Valencia meeting. One of the world's leading Bayesians, John Pratt, a marvelous man, was analyzing some data, his wife's estimates of upcoming gross receipts at a cinema where she worked in Cambridge, MA. He was doing regression, and at the end he did an ordinary least squares but nothing Bayesian [35]. I asked him why not Bayesian? He said it was too hard to figure out the priors and it wouldn't have made any difference anyway. I was shocked and dumbfounded. That was 1983, but since then we can actually implement Bayesian methods. And we do. Now the judgement has to be put off—frequentist methods have had 200 years of people tinkering with them and we're just starting to use Bayesian methods. I think it's reasonable to let time settle down before deciding whether they are better or worse. There are lots and lots of groups doing Bayesian analysis.

One of the big tensions in Statistics, which is a mystery to me, is really big data sets. You can try to estimate huge numbers of parameters with very few data points. Now I understand sometimes there's a story that seeks to justify that, but it makes me very, very nervous. If you try to think about being a Bayesian in that kind of problem, it can't be that you have any idea about what priors you're putting on; you're completely making something up. It's nothing other than a way of suggesting procedures. It might be useful, it might not be useful. There are a lot of people trying to do that, but it's a completely different part of the world and I don't have much feeling for it. It's so taken over Statistics right at the moment that I feel compelled to put in the following sentence. There are huge data sets; there are also many, many small data sets. And that's where the inferential subtleties matter. If you're sick and you're trying to think about a new procedure for your tooth and there are two available procedures, with 10 or 50 instances of each . . . what should you do?
Statistics encounters lots of problems like this too. So it's good to remember that while there are huge data sets and that's very exciting, there are also lots of small data sets and there's still room for the classical way of thinking about statistical problems.
Aldous:
A cynical view is that there's more money in the fields with big data sets.
Diaconis:
Tsk tsk (laughs), you won't get any argument from me.
3. TEACHING THE PHILOSOPHICAL FOUNDATIONS
Aldous:
You teach an undergraduate course with Brian Skyrms on the philosophical foundations of Statistics. You describe its topic as "10 great ideas about Chance." Now most readers of Statistical Science have surely never taken, let alone taught, such a course. Can you tell us about the course?
Diaconis:
Philosophers and statisticians have thought for a very long time about what probabilistic statements mean and how to combine disparate sources of information to reach a conclusion. These are still important questions and not ones to which we know the answer. We begin our course with the first great idea, that probability can be measured—the emergence of equally likely cases, the first probability calculations. There is of course a discussion of frequentism and of various kinds of Bayesians. Indeed, I. J. Good once wrote an article entitled 46656 Varieties of Bayesians where he states 11 "facets," like whether utility is emphasized or avoided, whether physical probabilities are denied or allowed, and so on. We try to explain some of the different kinds of Bayesians. Brian and I are both subjectivists—I am what I call a nonreligious Bayesian, that is, I find it useful and interesting and I don't really care what you do. Some of the course is pointing out the shallowness of naive frequentism. Bayesians are happy to talk about frequencies, in that when you have a lot of data the data swamps the prior, and you will use the frequency in order to make your inferences. It's not that Bayesians argue against frequencies; they're happy to have a lot of data, and frequencies are forced on you by the mathematics. So we discuss and prove those things. We also explain von Mises collectives, which have morphed into the complexity approach to probability.

One of the things I find interesting, that's hard to make philosophy out of, is what I want to call the von Mises pragmatic approach. If you ask working statisticians what they think probability is, they say, well, you do something a lot of times, and it's the proportion of times something happens. If you ask about the probability Obama will be re-elected, they will respond with a cloud of words. Or they'll walk away or say it's too difficult to talk about. What von Mises said is that any scientific area has practice and theory. He discusses geometry—there's the mathematical notion of circles and straight lines, then there's practical architecture and drawing. The theory can be used, but at some point you have to relate the theory to the real world. I think that sort of pragmatic approach to foundations is important. But von Mises never tells you how to do so. I ask this question for differential equations. If some guy writes down a differential equation, and there's a picture of water whirling around in a vessel with blockages—what does that equation have to do with the whirling of the water? In order to answer that, many of us would say, "That's what Statistics is about." Whether theory fits data is a statistical question. So we can apply this to our own subject: does statistical theory fit the real world?

Anyway, we hope to turn the course into a book, after several years of iterations.

Aldous:
What kind of students take the course?
Diaconis:
About 70 students, undergraduates or graduates in Statistics or Philosophy, and just interested other people; even some faculty attend. It's quite lively, there's lots of discussion. We teach it once a week for three hours, which is exhausting for everyone concerned.

Trying to think about why we do what we do is important, but nobody talks about it. I tell the following two stories. One is about you, and one is about Brad Efron. At some stage you and I were talking, as we often do, and I said I was going to teach a course on the Philosophy of Probability. And you got quite irate, saying, "You're just going to tell a bunch of words that won't illuminate anything." And my good friend Efron got similarly very angry. He said, "That's just going to be that Bayesian garbage," reached into his pocket, took out a handful of coins, threw them, and said, "Look: Head, Tail, Tail, . . . —that's random." So people hear "Philosophy" and take it in a religious way. To me, the question "is what you're doing really about anything?" is worth discussing, and we're just trying to talk about it.

If you want to know what the problems in Bayesian statistics are, ask a Bayesian. We know! It's very hard to put meaningful priors on high-dimensional real problems. And the choices can really make a difference. I'm going to give one example of that, just for fun. Suppose you're teaching an elementary Probability course. It's the first day of term, you walk into class, you see there are 26 students in the class, so you decide to do the birthday problem. Here are two thoughts about the birthday problem. First, if it doesn't work, then it's a disastrous way to start a course. Second, the usual calculation assumes each day is equally likely. But my students are about the same age, and there are more births on weekdays than weekend-days—that's about a 20% effect—and then there are smaller seasonal effects. So the uniformity might not be true for my class. We don't really know what the probabilities are.
So let me put a prior on (p_1, p_2, . . . , p_365). If your prior is uniform on the simplex, then the key number of people (to have a 50% chance of some birthday coincidence) decreases from 23 to about 18. For the coupon collector's problem, using a story that Feller suggested, the key number of people in a village (to have a 50% chance that every day is someone's birthday) is about 2300. That's under the uniform multinomial model. If instead you take the uniform prior on the simplex, then—it's a slightly harder calculation to do—but if I remember, the key number increases to about 190,000. That's a little surprising when you first hear it, but under the uniform prior some p_i will be around (1/365)^2, so you need order 365^2 people just to have a good chance of having that one day as a birthday.

Aldous:
But isn’t this a good argument againstthe naive Bayesian idea of inventing priors that aremathematically simple but without any real-worldreason?
Diaconis:
Sure, and that was the point of the exercise. Bayesian statisticians should be thinking more carefully about their priors. Part of that is understanding the effect of different priors, and those are math problems. In the birthday problem, math showed the prior didn't have too much effect, whereas for the coupon collector's problem it had a huge effect. Susan Holmes and I wrote a paper called A Bayesian peek into Feller volume I [18], taking his elementary problems and making Bayesian versions of them. When does it make a difference and when not? It's a paper I like a lot.
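The prior-sensitivity claim for the birthday problem can be checked directly: integrating a uniform (Dirichlet(1, . . . , 1)) prior out of the multinomial turns the sample of birthdays into a Pólya urn, which gives a closed form for the coincidence probability. A sketch (function names are mine, not from the conversation):

```python
from math import prod

DAYS = 365

def p_coincidence_uniform(k):
    """Classical birthday problem: fixed p_i = 1/365."""
    return 1.0 - prod((DAYS - i) / DAYS for i in range(k))

def p_coincidence_dirichlet(k):
    """Coincidence probability when (p_1, ..., p_365) has the uniform
    (Dirichlet(1, ..., 1)) prior on the simplex.  Integrating out the
    prior gives a Polya urn, so
    P(all k distinct) = prod_{i<k} (365 - i) / (365 + i)."""
    return 1.0 - prod((DAYS - i) / (DAYS + i) for i in range(k))

def key_number(p):
    """Smallest k with a >= 50% chance of some shared birthday."""
    k = 1
    while p(k) < 0.5:
        k += 1
    return k

print(key_number(p_coincidence_uniform))    # classical answer: 23
print(key_number(p_coincidence_dirichlet))  # 17
```

The exact Pólya computation gives 17, consistent with the "about 18" quoted in the conversation; the uniform prior makes coincidences noticeably more likely.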
Aldous:
A version of the nonuniform birthday problem I give in my own "probability and the real world" course [2] is to take p_i = 1. ×365^(−1) for half the days and 0. ×365^(−1) for the other half. This makes surprisingly little difference—the key number decreases from 23 to 22. And to avoid the possible disaster of it failing with my students, I show the active roster of a baseball team (easily found online; each MLB team has a page in the same format) which conveniently has 25 players and their birth dates. The predicted chance of a birthday coincidence is about 57%. With 30 MLB teams one expects around 17 teams to have the coincidence; and one can readily check this prediction in class in a minute or so (print out the 30 pages and distribute among students).
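The 57% and "around 17 teams" figures follow from the classical uniform calculation with k = 25; a quick check (my own, not from the conversation):

```python
from math import prod

def p_coincidence(k, days=365):
    """Classical birthday problem: chance that k people with
    uniformly distributed birthdays share some birthday."""
    return 1.0 - prod((days - i) / days for i in range(k))

p25 = p_coincidence(25)          # a 25-player MLB roster
print(round(100 * p25))          # 57 (percent), as quoted
print(round(30 * p25))           # expected teams out of 30: 17
```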
4. BOOKS: ON MAGIC AND ON COINCIDENCES
Aldous:
On a lighter note, I have found myself following in your footsteps in various aspects of academic life, a minor such aspect being "unfinished books." The 1984 conversation refers to the book on coincidences you were writing with Mosteller, and there is a 1989 joint paper [23], but when can we expect to see the book?
Diaconis:
Well, there were two books mentioned in that interview, and the other one, with Ron Graham on mathematics and magic, has recently been published [17]. So it took 27 years, but we did finish it. I'm starting to think about the coincidences book again. We're sitting in my office and you see those folders up there . . .
Aldous:
I see about 15 of those very wide old open-ended cardboard files . . .
Diaconis: . . . those folders have newspaper clippings collected by Fred Mosteller over 30 years, and every one has a few pages saying here's a kind of coincidence we might study via a model, and here's some back-of-an-envelope calculation. I give a lot of public talks, about 50 a year, and I had stopped giving the talk on coincidences, but I've now committed to giving the talk again in a few weeks. That's how I trick myself and get back into thinking about the topic. So look for the book sometime in the next five years. I promised Fred (before he died in 2006) I would do it, and I'm going to gear up and do it.
Aldous:
The colorful story of you running away from home at age 14 to do magic, then buying Feller and teaching yourself enough mathematics to understand it, was told in the 1984 interview, and has become well known in our community. But I've joked to students "if you meet the Queen of England, don't slap her on the back; if you meet Persi Diaconis, don't ask him to do a magic trick." Now that you and Ron Graham have published the book on mathematics and magic [17], could you tell us a little about what's in the book?
Diaconis:
The reason I first got interested in mathematics was via magic. I had hoped to call the book Mathematics to Magic and Back, but the publisher vetoed that, saying people wouldn't get the idea. Now it's called Magical Mathematics: The Mathematical Ideas that Animate Great Magic Tricks, maybe a bit pompous. One of the things about mathematics and magic is that if some person says, "I know a card trick," you wince inside, because they're going to deal cards into piles on the table, and everyone's going to fall asleep. How long until I can change the conversation? We're interested in good magic tricks, which are performable and don't look mathematical, but which have some math behind them. Some of the math turns out to be pretty interesting. Most of the tricks are ones we invented ourselves, which is why we don't get strung up for revealing secrets; the magic community doesn't like that, but we seem to get away with it. There's not much probability in the book—there's some material on riffle shuffles and that sort of thing—and some old tricks of Charles Jordan that we made mathematical sense of. To whet your appetite, there's a chapter on the connection between riffle shuffles and the Mandelbrot set.
Aldous:
Science has a notion of progress—one could take any scientific topic and write a nontechnical article on progress in that topic over the last 30 years. Is there an analog of progress in magic?
Diaconis:
Here I’m a bit negative. The final chap-ters are about who are the current stars—who is in-venting tricks that are new and really different? Thepeople we describe are old or now departed. Theyounger people don’t seem to be inventing math-based tricks. But in the coming quarter I’ll be teach-ing a course on mathematics and magic here at Stan-ford, so I’m trying to cultivate young people myself.Magic is changing in many ways, and the main oneis again negative. Because of Wikipedia and youtubethere are very few secrets any more. You could bewatching a show and type the right words into yoursmartphone and get an explanation, and this won’tgo away. It’s profoundly changing magic, likely notfor the better.Now I do have a positive hope—maybe this willencourage people to invent new and better tricks.Also . . . when I was a kid, I was once hanging aroundwith my magic mentor Dai Vernon at a billiard par-lor. Billiards is a very refined game, the gentleman’sversion of pool. Now pool halls are notoriously rowdy,smoke-filled with gambling and drinking. This was agroup of people, seated around two masters, playingthree-cushion billiards. The crowd was silent asidefrom an occasional quiet ooh of appreciation. Ver-non looked at me and said, “Wouldn’t it be won-derful if people watched magic that way.” If peoplewould learn a bit more about magic and appreci-ate the skill and presentation, then maybe it wouldbecome like watching a classical violinist. Those aremy dreams about how exposure might change magicfor the better.
5. COLLABORATION WITH DAVID FREEDMAN
Aldous:
We’ve already mentioned David Freed-man, my long-time colleague at Berkeley, and per-haps your major collaborator, who sadly died in2008. I regarded him as one of the handful of peo-ple in our business who are unique—there was no-body like
David. I mentally pictured him as MycroftHolmes (Sherlock’s smarter older brother, who ap-pears briefly in several stories to give sage advice)and I recall you having some “bright light” image.Can you tell us some things about your collabora-tions and about David’s impact on the field?
Diaconis:
I first met David at a Berkeley–Stanford joint colloquium barbeque at Tom Cover's house. I had read his thesis when I was a graduate student, so I had something to say to him. He was a very crusty character. He had a kind of "gee shucks, I'm just a farm boy" outer style, but he was in fact the debating champion of Canada. He was an honest man, and there aren't so many of them. He could be difficult. There's an image—that I heard from Jim Pitman who maybe heard it from Lester Dubins—of David working on a problem: you'd ask him a question and he would berate you and say that's stupid, but then he would get down and focus. And when he was focused it was like there was this very bright clear light on a narrow part of the problem, and then it would shift slightly over and focus on a next part. That was how he worked. He wasn't a quick glib guy.

At some stage he decided that the main impact he could make in Statistics was what he called defensive statistics, which was trying to make an art and science out of critiquing knee-jerk modeling and the wild misuse of probability models. He was as effective as anyone ever has been at that. Was he actually effective? Maybe not in our business, but he has a following in some of the social sciences and that's marvelous. He certainly made me very sensitized to the misuse of models.
Aldous:
And me too.
Diaconis:
Now it’s easy to just criticize modeling,but what should we do about it? I wrote a paperabout my version of David’s argument which wascalled
A place for philosophy? The rise of modelingin statistical science [9]. I tried to make a list of whatwe can do. David’s approach to what we should dowas embodied in the last book he wrote [28]. Hespent years writing out with infinite clarity abouttopics he had such scorn for. I had never quite un-derstood why he put so much energy into expound- ing (e.g.) the Cox proportional hazards model or themysteries of regression. Then he said to me, as if itwere obvious, though it hadn’t occurred to me be-fore: “If I say it really, really clearly, then people willsee how crazy it is.”David was a brilliant mathematician. I miss himdaily, because we used to chat all the time. AndI could ask him anything, from “where to eat” to finepoints of nonmeasurable sets. This continued until afew years before his death. We had written 33 paperstogether, and I’m a shoot-from-the-hip guy in writ-ing first drafts, and David was very careful, and veryartful in his prose, and finally we got rather tired ofeach other, like an old married couple—we felt wehad heard everything the other had to say. I foundhis constant negativity draining, and he found myconstant enthusiasm draining. But we had been apretty good pair for a long time.Right now, Laurent Saloff-Coste and I [25] are try-ing to make a little theory of “who needs positivity?”What happens when you start convolving signedmeasures? Infinite products are often not well-defin-ed. I’m sure there’s some technical way of fixingthat. It’s the kind of thing where David would havesaid, “Let’s think about it,” and some nice mathwould have come out of it. Now, with David gone,I don’t know who to ask about such things, I don’tknow who cares about measure theory any more.
Aldous:
But we all figure you have 57 collaborators, so you always have somebody to call.
Diaconis:
I do have a lot of collaborators, and that's an absolute joy, though there's a cost. You have to own up to how little you know, and not be afraid to make a fool of yourself.
6. MORE COLLABORATORS
Aldous:
Because you have had a huge number of collaborators, we might apologize in advance to any who are not mentioned in this conversation. In the 1984 conversation you emphasized Martin Gardner and Fred Mosteller and Charles Stein and David Freedman as the people you had interacted with and been influenced by the most by that time. Are there others during your later career, not already mentioned, who you would like to talk about?
Diaconis:
Well, there’s you, with exchangabilityand card shuffling and mixing in MCMC, and statis-tics and probability in the real world. And Lau-rent Saloff-Coste, an analyst who I’ve converted tobe somewhat of a probabilist. He was visiting DanStroock, and at that time was very far from prob-
D. ALDOUS ability, and we got into an argument, and he wasright and I was wrong.I’ve written a lot of papers with Ron Graham. Hetried to hire me when I got my Ph.D. I rememberknocking on his office door at Bell Labs, where hewas running the math and statistics group. I openedthe door and there was this man with a net attachedto his waist belt and going up to the ceiling. Hewas practicing 7 ball juggling and the net caughtdropped balls so he didn’t have to pick then up offthe floor, and I thought, this guy’s great.I’ve written papers with Susan Holmes, my wife,and that has its complexity. One of the most stress-ful things, for each of us, is to hear the other give atalk on our joint work. You sit there thinking, “No,no, no, that’s not the way to say it,” and you haveto keep quiet. We’ve all had this experience with agraduate student, but when it’s your wife it’s rad-ically worse. I’ve just finished writing a paper [16]with Susan and Jason Fulman that was based on acasino card shuffling machine that we were asked toanalyze and could in fact analyze. This was done tenyears ago and the machine didn’t work, so it wasn’tso polite to publish back then.I don’t write so many papers with my graduatestudents—they should get the credit for their work—but one I have resumed working with is Jason Ful-man. I enjoy working with him because he startswith a natural algebraic bent, but I taught himto look at a formula and look for some probabil-ity story, and he’s great at it. I have also startedwriting papers with Sourav Chatterjee. He’s movingtoward the probability-physics field, but I’m encour-aging him to keep some connection with statistics.
7. NETWORKING
Diaconis:
I’m an extremely social statistician. That is, it’s a lot of fun to go ask somebody something. You need to be not too proud, to not be embarrassed about what you don’t know. If someone asks you a question, and you don’t know the answer, then suggest someone else who might know—try to be helpful. I do this all the time—asking and answering, helping other people and having them help you—but most people don’t. Learning social skills is undervalued in the research community. There’s a joy in having a community, in having people who know what you’re doing.
Aldous:
As a related aspect of social skills, I tell incoming graduate students that the faculty are friendly but busy; they won’t come talk to you, but you can make the effort to go talk to them. Also, I say to pay attention to your cohort of students—some will become eminent in the future—and they always laugh.
Diaconis:
Sometimes when I interview postdocs, they think they can come to Stanford and have you work on their problem. Or they just want to work on their own thing by themselves. It’s a lot better to read some paper by the person you want to interact with, and say, “Can we talk about that?”, at least as a way of getting started. It’s a simple thing to do, but most people don’t do it.
8. OLD TOPICS NEVER DIE
Aldous:
You recently sent me an email from country X saying that most of the people you talked to were our generation and still working on the same kind of topics that had established their careers. I’ve always liked the well-known quote from von Neumann [37]:

As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from “reality” . . . there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganized mass of details and complexities.

Of course math naturally grows in a “one thing leads to another” way, but is there any test for when enough has been done on a topic and it’s time to move on?
Diaconis:
It’s a difficult question. Right now I’m doing some work in algebraic topology, a subject with enormous depth, but many of the prominent practitioners are involved in the minutiae of how the big machine works and don’t bother to solve real problems. They just think that if the machine is well enough developed, then it can solve any problem that’s handed to it. I do think it’s important to try to focus on real world problems. A lot of my motivation is MCMC, which is really used on real problems, and, as I said earlier, we don’t know how to give theoretical analyses of MCMC on real problems. So what we do is problems with nice structure, say, symmetry, and hope that will grow into something useful. von Neumann’s quote is perfect—you make a small change in a solved problem, it’s still not real, you can’t do it but one of your students makes progress, and an area grows and gets a name. It does happen that way.

Fig. 1. Juliet Shaffer, Erich Lehmann, Persi Diaconis, 1997.

Of course it’s easy to criticize. One way I try to be constructive is to take a classic like the original Metropolis algorithm applied to hard discs in a box. Can I prove anything about it? I worked very hard for five years with wonderful analysts. We wrote papers [21, 22] in the best math journals. But our theorems are basically useless as regards the real problem.

But again . . . sometimes things were done because they were beautiful as pure math, and then 50 years later it’s just what somebody needed. A reasonable case in point is partial exchangeability for matrices, which David Freedman and I were working on in 1979, and you independently came up with a proof. That was an esoteric corner of probability, and soon the subject went quiet for 20 years, but now it’s completely re-emerged in contexts such as graph limits [20] and other parts of pure math [4]. People are looking back at the old papers and asking how did they do that. I just opened the Annals of Probability and there’s an article on free probability versions of de Finetti’s theorem. Is that probability, or some other area of math? It’s very hard to know what will turn out to be useful.

Fig. 2. Persi Diaconis, 2006.

Fig. 3. Persi Diaconis, 2006.
Aldous:
An unconventional idea for a workshop would be to invite senior people to talk about one nonrecent idea of theirs which has not been developed or followed up by others, but which (the speaker thinks) should be. Following Hammersley [31], one might call these “ungerminated seedlings of research.” Do you have any ideas in this category?
Diaconis:
There’s a problem that I worked on as part of my thesis but have never managed to get anyone else interested in. It’s about summability. A sequence of real numbers that doesn’t converge in the usual sense may be Abel or Cesàro summable. And there are theorems that say if a sequence is summable in scheme A, then it’s summable in scheme B. I noticed that any time there was such a known theorem, there was a probabilistic identity which said that the stronger method was an average of the weaker method. So is there a kind of meta-theorem that says this is always true?

I once gave the Hardy Memorial Lecture at Cambridge and wrote a paper [10] titled G. H. Hardy and Probability??? with the three question marks. Hardy notoriously didn’t have much regard for applied math of any sort, and probability was particularly low on his list. He hated being remembered for the Hardy–Weinberg principle. I knew Paul Erdős well, and he said that Hardy and Littlewood were great mathematicians, but if they had had any knowledge of probability at all, then they would have been able to prove the law of the iterated logarithm. They certainly had the techniques, but because they just couldn’t think probabilistically their work on that particular problem was second-rate. Anyway, in the lecture I wove together such stories and my own open problems about Tauberian theory.
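The averaging observation is easy to exhibit in the best-known case, Frobenius’s theorem that Cesàro summability implies Abel summability. The following is a standard textbook identity (not from the conversation itself), with partial sums s_n and Cesàro means σ_n:

```latex
% Abel mean as a weighted average of Cesàro means, for 0 < x < 1,
% where s_n = a_0 + \dots + a_n and \sigma_n = (s_0 + \dots + s_n)/(n+1):
A(x) \;=\; (1-x)\sum_{n \ge 0} s_n x^n
     \;=\; \sum_{n \ge 0} (1-x)^2 (n+1)\, x^n \, \sigma_n .
```

The weights (1 − x)²(n + 1)xⁿ are nonnegative and sum to 1 (they form a negative binomial distribution), so the Abel mean is literally an expectation of Cesàro means at a random index. Letting x ↑ 1 then gives Cesàro summable ⇒ Abel summable: the stronger method (Abel) is an average of the weaker (Cesàro), exactly as described.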
Aldous:
Outside academia you are perhaps best known for magic and for the “7 shuffles suffice” result from your 1992 paper with Dave Bayer [6]. I’m sure that features in every other interview you’ve done, so I won’t ask again here. More recent work of yours that attracted popular interest was the 2007 Dynamical bias in the coin toss paper [19], asserting (by a mixture of Newtonian physics and experimental observation of the initial parameters when real people performed tosses) that there was about a 50.8% chance for a coin to land the same way up as tossed. I had two undergraduates actually do the 40,000 tosses required to have a good chance of detecting this effect, but the results were ambiguous [33]. Have you or other people followed up on your paper?
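As a rough sanity check on that number, here is a back-of-the-envelope sample-size calculation under textbook normal-approximation assumptions (the function name and the 5% level / 90% power defaults are illustrative choices, not taken from the study):

```python
from math import sqrt
from statistics import NormalDist

def tosses_needed(p1=0.508, p0=0.5, alpha=0.05, power=0.90):
    """Approximate number of tosses for a one-sided binomial test of
    H0: p = p0 against H1: p = p1 to reach the given level and power,
    using the usual normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical z under H0
    z_beta = NormalDist().inv_cdf(power)       # z-quantile for the desired power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return (numerator / (p1 - p0)) ** 2

# Detecting a 0.8% bias is hard: the answer is in the tens of thousands.
print(round(tosses_needed()))  # roughly 33,000 tosses at 90% power
```

With these conventional settings the formula gives a figure in the low thirty-thousands, consistent with the roughly 40,000 tosses the undergraduates performed.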
Diaconis:
Aside from your students, there’s a physics group at Boston [38] who carefully repeated our measurements of angular velocity etc., and a Polish group who have written a book [36] on the physics of gambling. They reproduced our analysis and added bouncing and air resistance, which we neglected.

Speaking of coin-tossing, every year we get a call from ESPN and they want a two-minute spot on “is the coin toss in the Superbowl fair?” Of course the Superbowl coin is a big thick specially minted object, and I don’t have much to say on that. I recently got a letter from a German Gymnasium teacher who tried to make a biased coin by making one side of balsa wood, and he couldn’t do it. I wrote back saying that some coins are biased when you spin them on a flat surface, but for flipping in the air we can prove you can’t make it biased . . .

Fig. 4. Philip Stark, Don Ylvisaker, Persi Diaconis, Larry Brown, Terry Speed and Ani Adhikari at the memorial for David Freedman, 2008.
Aldous: . . . by conservation of angular momentum, which a high school physics teacher should know. You may recall that two of our colleagues have a paper titled You can load a die, but you can’t bias a coin [29].

Fig. 5. Persi Diaconis and Elizabeth Purdom, 2010.
9. MODERN TIMES
Aldous:
In the 1984 conversation, when asked about the future you were wise enough not to make very specific predictions about particular topics, but I do notice two points. You noted there was increasing collaboration—“more and more 2- or 3-author papers”—and we’re all aware this trend has continued. The current (October 2011) Annals of Statistics has only 2 out of 17 articles being single-authored, whereas going back 30 years (September 1981) it was 10 out of 17. Incidentally, the total length of the 17 articles increased from under 200 pages to almost 500 pages, a perhaps less predictable effect. Your second point, paraphrasing slightly, was “I’m glad Statistics is not that kind of high-pressure field where you have to publish every two weeks.” But today we do have younger colleagues who publish fifteen papers per year.

We can probably all agree that increased collaboration is A Good Thing, but what about the increased number of papers and the implicit pressure on young people to publish more than in our day?
Diaconis:
Right now I’m on the hiring committees for both the Math and Stat departments, and it’s noticeable that even applicants straight out of grad school have 3–10 papers on their CV, many of them in pretty good journals. How has that happened? When I was at that stage I just had some technical reports. So it’s just a cultural change. We perceive an exponentially growing literature with just too many papers. People publish the most obscure things. But then the ability to search on the web allows us to keep track, and, as I said earlier, sometimes the most obscure-looking paper turns out to contain just the right thing. And I should be the last one to criticize there being too many papers, because I’m now writing almost ten papers a year. I would hate to have to choose which ones I shouldn’t have written.

In our field we still referee, or pretend to referee, papers, and we all know it can take six months or a year to get through. I do some work with physicists and physics is largely an unrefereed subject. Their logic is that if somebody publishes a wrong result, the community becomes aware of it, and then that group gets a bad reputation. It’s not that no one looks at the paper at all; someone reads the abstract and scans the paper to check it looks reasonable. Then it gets published, in time maybe closer to three weeks than three months. So our field is moving in that direction. Publication is less and less meaningful because of the arXiv. But as an author I find it useful to imagine that some referee is going to read my paper. It makes me take care about the details and the exposition.
Aldous:
Your answer in 1984 to “what does the future hold for you?” was “just going crazy, working hard, learning more math.” I think we can agree that prediction was correct. So let me ask the same question again, and ask for your thoughts on the future of the field of Statistics, and ask for advice to someone completing an undergraduate degree and contemplating starting a Ph.D. program in Statistics.
Diaconis:
Yes, I still like working hard and learning more. Over my career Statistics has changed so drastically it’s almost unrecognizable. Companies like Target predicting what their individual customers will want or can be persuaded to want—this kind of aggressive analysis of massive data sets. So there’s a lot of new Statistics for someone like me who’s classically trained. You have to find a part of it you want to learn. For example, I’m trying to think about large networks via general models for random graphs. And for a theoretical statistician, looking at what applied people are doing and asking, “Can I break it, can I do it better?” will always give us plenty to do.

About what a youngster should do . . . for a start, you can’t learn too much about using computers. I lament that the academic statistics world doesn’t know how to recognize and reward that skill appropriately. There are people who are amazing hackers and that’s an invaluable skill, but they don’t get the same credit as mathematically-focused people. I don’t know why this is, but it should change.
Aldous:
Presumably because of the traditional “research = papers in journals” equation—we’re so used to assessing research contributions in that particular way. Even though there are journals like the Journal of Computational and Graphical Statistics, maybe they are perceived as less prestigious.
Diaconis:
Another piece of advice is to read classic papers. If there’s a topic that interests you, look back at what the people who invented it actually wrote. It gives you a more concrete sense of why they invented it and what it’s about, compared to reading textbooks. Nowadays people don’t pay enough attention to such things—instead it’s “Let’s try it out and write a quick paper.”

Statistics is as healthy as it’s ever been. One can see the prominence of machine learning, but they are really just using ideas that were developed in Statistics twenty or fifty years ago. They are applying them—that’s great—but we are inventing the ideas that will be applied in the next twenty or fifty years. Statistics is a great field to be part of, and I’m still excited by it.

ACKNOWLEDGMENTS
I thank Raazesh Sainudiin for proofreading a first draft.

REFERENCES

[1] Aldous, D. (1983). Random walks on finite groups and rapidly mixing Markov chains. In Seminar on Probability, XVII. Lecture Notes in Math. Springer.
[2] Aldous, D. (2012). On chance and unpredictability: 20 lectures on the links between mathematical probability and the real world. Available at .
[3] Aldous, D. and Diaconis, P. (1986). Shuffling cards and stopping times. Amer. Math. Monthly.
[4] Aldous, D. J. (2010). Exchangeability and continuum limits of discrete random structures. In Proceedings of the International Congress of Mathematicians, Volume I.
[5] Bacallado, S. (2011). Bayesian analysis of variable-order, reversible Markov chains. Ann. Statist.
[6] Bayer, D. and Diaconis, P. (1992). Trailing the dovetail shuffle to its lair. Ann. Appl. Probab.
[7] DeGroot, M. H. (1986). A conversation with Persi Diaconis. Statist. Sci.
[8] Diaconis, P. (1977). Finite forms of de Finetti’s theorem on exchangeability. Synthese.
[9] Diaconis, P. (1998). A place for philosophy? The rise of modeling in statistical science. Quart. Appl. Math. LVI.
[10] Diaconis, P. (2002). G. H. Hardy and probability??? Bull. Lond. Math. Soc.
[11] Diaconis, P. (2009). The Markov chain Monte Carlo revolution. Bull. Amer. Math. Soc. (N.S.)
[12] Diaconis, P. and Freedman, D. (1980). de Finetti’s theorem for Markov chains. Ann. Probab.
[13] Diaconis, P. and Freedman, D. (1980). Finite exchangeable sequences. Ann. Probab.
[14] Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist.
[15] Diaconis, P. and Freedman, D. (1987). A dozen de Finetti-style results in search of a theory. Ann. Inst. Henri Poincaré Probab. Stat.
[16] Diaconis, P., Fulman, J. and Holmes, S. (2012). Analysis of casino shelf shuffling machines. Ann. Appl. Probab. To appear.
[17] Diaconis, P. and Graham, R. (2011). Magical Mathematics: The Mathematical Ideas that Animate Great Magic Tricks. Princeton Univ. Press, Princeton, NJ. MR2858033
[18] Diaconis, P. and Holmes, S. (2002). A Bayesian peek into Feller volume I. Sankhyā Ser. A.
[19] Diaconis, P., Holmes, S. and Montgomery, R. (2007). Dynamical bias in the coin toss. SIAM Rev.
[20] Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. (7).
[21] Diaconis, P. and Lebeau, G. (2009). Micro-local analysis for the Metropolis algorithm. Math. Z.
[22] Diaconis, P., Lebeau, G. and Michel, L. (2011). Geometric analysis for the Metropolis algorithm on Lipschitz domains. Invent. Math.
[23] Diaconis, P. and Mosteller, F. (1989). Methods for studying coincidences. J. Amer. Statist. Assoc.
[24] Diaconis, P. and Rolles, S. W. W. (2006). Bayesian analysis for reversible Markov chains. Ann. Statist.
[25] Diaconis, P. and Saloff-Coste, L. (2012). Convolution powers of complex functions on Z. Unpublished manuscript.
[26] Diaconis, P. and Shahshahani, M. (1981). Generating a random permutation with random transpositions. Z. Wahrsch. Verw. Gebiete.
[27] Freedman, D. A. (1962). Mixtures of Markov processes. Ann. Math. Statist.
[28] Freedman, D. A. (2009). Statistical Models: Theory and Practice, revised ed. Cambridge Univ. Press, Cambridge. MR2489600
[29] Gelman, A. and Nolan, D. (2002). You can load a die, but you can’t bias a coin. Amer. Statist.
[30] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intelligence.
[31] Hammersley, J. M. (1972). A few seedlings of research. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. I: Theory of Statistics.
[32] Jordan, M. I. (2011). What are the open problems in Bayesian statistics? ISBA Bulletin.
[33] Ku, P., Larwood, J. and Aldous, D. (2009). 40,000 coin tosses yield ambiguous evidence for dynamical bias.
[34] Levin, D. A., Peres, Y. and Wilmer, E. L. (2009). Markov Chains and Mixing Times. Amer. Math. Soc., Providence, RI. MR2466937
[35] Pratt, J. and Schlaifer, R. (1985). Repetitive assessment of judgmental probability distributions: A case study. In Proc. Second Valencia International Meeting on Bayesian Statistics.
[36] Strzalko, J., Grabski, J., Stefanski, A., Perlikowski, P. and Kapitaniak, T. (2009). Dynamics of Gambling: Origins of Randomness in Dynamical Systems. Springer, New York.
[37] von Neumann, J. (1947). The mathematician. In The Works of the Mind.
[38] Yong, E. and