THE RECOMPUTATION MANIFESTO
IAN P. GENT, 12 APRIL 2013, VERSION 1, REVISION 9479

Computational experiments should be recomputable for all time
Recomputation of recomputable experiments should be very easy
Tools and repositories can help recomputation become standard
It should be easier to make experiments recomputable than not to
The only way to ensure recomputability is to provide virtual machines
Runtime performance is a secondary issue
Replication of scientific experiments is critical to the advance of science. Unfortunately, the discipline of Computer Science has never treated replication seriously, even though computers are very good at doing the same thing over and over again. Not only are experiments rarely replicated, they are rarely even replicable in a meaningful way. Scientists are being encouraged to make their source code available [13], but this is only a small step. Even in the happy event that source code can be built and run successfully, running code is a long way away from being able to replicate the experiment that code was used for.

I propose that the discipline of Computer Science must embrace replication of experiments as standard practice. I propose that the only credible technique to make experiments truly replicable is to provide copies of virtual machines in which the experiments are validated to run. I propose that tools and repositories should be made available to make this happen. I propose to be one of those who makes it happen.

I am using the word ‘recomputation’ to mean the replication of computational experiments. This is the ‘recomputation manifesto’, not the ‘replication manifesto’. The word ‘replication’ has a well-established technical meaning in computing. I am adding a technical meaning to an existing word, but the new meaning adds only a nuance, and also serves usefully to distinguish replication of computational experiments from other types of experiment.
1. Computational experiments should be recomputable for all time.
Why recomputable? And why for all time? The necessity for recomputation is simple. An experiment is used to form scientific conclusions, and those conclusions are never final. They may always be subject to question, so experimental work may need to be repeated. I would even argue that the less experiments are recomputed, the more important it is that they be recomputable. The reason is that flaws in an experiment that is frequently recomputed are likely to come to light, but not so if the experiment is rarely recomputed. The more rarely an experiment is repeated, the longer the time lapse until it is, and the less likely a new researcher will be able to recompute it easily without the original experiment being deliberately made recomputable. If the original experiment is flawed in some way, misleading results can lie uncorrected in the literature for years [1]. As well as being critical, recomputation is a timely issue: witness [17] in Nature, 3 April 2013.

Should experiments be recomputable for all time? Yes. Computer Science is unique in having this ability. Imagine if physicists had the opportunity to look through Galileo’s telescope at Jupiter’s moons, but had thrown it away on the basis that a description of how to make it was available. There is no reasonable limit to the useful life of an experiment, so the simplest answer is to have no limit. Moore’s law, while it lasts, makes it technically feasible to store experiments forever. It is affordable to store experiments: if cost per GB decreases exponentially, a fixed sum is enough to store each GB in perpetuity. It is even affordable to rerun all computational experiments in history regularly. If researchers donated 10% of CPU resources for new experiments to recomputing old ones, all 5 year old experiments could be rerun each year (assuming computing power doubles every 18 months, so that CPU resources increase roughly tenfold every 5 years).

A note on the word itself: ‘recomputation’ is older than the USA, first reported by the Oxford English Dictionary in 1766. It does have some technical meanings, e.g. [18], but none as widespread as those for ‘replication’.
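The affordability argument can be made concrete with a little arithmetic. Under the illustrative assumption that storage cost per GB halves every 18 months, the total cost of storing a GB forever is a convergent geometric series; and if CPU power likewise doubles every 18 months, it grows roughly tenfold over 5 years, which is what makes the 10% donation scheme add up. A minimal sketch (the function names and the 18-month assumption are mine, for illustration only):

```python
# Illustrative arithmetic behind the affordability claims. Assumption
# (hypothetical): storage cost per GB halves, and CPU power doubles,
# every 18 months.

def perpetual_storage_cost(first_period_cost, periods=1000):
    """Total cost of storing 1 GB forever, paid per 18-month period,
    when the per-period cost halves each period (geometric series)."""
    return sum(first_period_cost * 0.5 ** k for k in range(periods))

def cpu_growth(years, doubling_months=18):
    """Factor by which CPU resources grow over the given number of years."""
    return 2 ** (years * 12 / doubling_months)

# The series converges: a fixed sum of twice the first payment covers eternity.
print(perpetual_storage_cost(1.0))  # -> 2.0
# Over 5 years resources grow about tenfold, so donating 10% of today's
# CPU matches 100% of the CPU available 5 years ago.
print(round(cpu_growth(5), 1))      # -> 10.1
```

The exact doubling period is of course contingent; the point is only that a convergent series makes "forever" a bounded cost.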
2. Recomputation of recomputable experiments should be very easy
Recomputation should be very easy. Ideally, very easy means clicking a button. This will not always be possible, but we should make it as easy as we possibly can for experimenters to reproduce their own experiments and those of other researchers. A lovely example of making it easy to rerun existing experiments is provided by the website runmycode.org, which experimenters can use to provide a simple interface to test their code on different inputs [19].

An important point about this manifesto is that by recomputation I focus mainly on the exact replication of an experiment. No other choice is available since (as I argue below) we have to duplicate the exact experimental conditions to ensure that recomputation is possible. Drummond argues powerfully that this, which he calls ‘replicability’, is the poor cousin of ‘reproducibility’, where experiments are reproduced with changes to factors believed to be insignificant [7]. There is no question that key advances in science should be reproducible in Drummond’s richer sense, but neither do I accept that “the impoverished version, replicability, is one not worth having” [7]. To take a non-computational example, it would be wonderful if the original experiments on cold fusion could be replicated exactly. If there was some technical flaw with the experiments, it could be discovered; if not, the effects that led to the apparent conclusion that cold fusion existed could be investigated. Once we have ensured that experiments can be recomputed exactly, we can have the luxury of seeking to ensure that experiments can be reproduced in richer ways.
3. Tools and repositories can help recomputation become standard
I draw an analogy with version control tools. Learning and using subversion, git, or mercurial adds complication to one’s life. But the benefits are enormous, and without question it is easier overall to build and maintain code using source code control systems than without. Similarly, repositories such as github or bitbucket greatly enhance development and sharing of code. Once we have the appropriate tools and repositories, we in Computer Science will have an enormous advantage over other sciences. Many sciences have massive online repositories, of course leveraging advances in Computer Science to their benefit. We can be unique in having massive online repositories of fully realised experiments, not just the data resulting from experiments. We should research and make available tools and repositories for the recomputation of scientific experiments. Sadly, tools for recomputation are relatively lacking to date, although there are promising signs. In a single PhD thesis, Philip Guo introduced Burrito, a tool for electronic lab notebooks, and CDE, an automatic packaging tool for experiments in linux [11]. Other important recent tools include “verifiable computational results” [8], “Sumatra” [6] and HAL [14]. As well as runmycode.org, other repositories helping to make experiments replicable are myExperiment.org [16], SHARE [10], and the specialist journal “Image Processing Online”, in which each article is accompanied by runnable code.
4. It should be easier to make experiments recomputable than not to
How can it be easier to make new experiments recomputable than not to bother? I draw your attention to Jon Claerbout’s words [4]:

“It’s not really for the benefit of other people. Experience shows the principal beneficiary of reproducible research is you the author yourself.”
I’ve been doing major computational experiments for 20 years: some of them I am rather proud of. But whenever I start on a new set of experiments, my heart sinks a little. I expect the pain of getting everything set up, and then I expect the difficulty or impossibility of reproducing my experiments if, say, the paper is rejected and I need to rerun them for a later submission. We should fix this. Tools should be available to make it easier to run experiments, encourage good experimental practice, and simultaneously make them recomputable. An experienced experimenter’s heart should sink no more than an experienced programmer’s when starting a new program. Just as programmers have comfortable development environments they can use, experimenters should have a range of environments to develop experiments in. Repositories of past experiments, one’s own and other people’s, should act like software libraries to aid rapid development of new experiments.
5. The only way to ensure recomputability is to provide virtual machines
While the other points of the manifesto state how I think the world should be, this point is an empirical statement about how the world is. There are two reasons that virtual machines are, at least for now, the only way to go. The first is the almost universal (I suspect) experience among anyone who has built software, and especially tried to rebuild it later. A build of the resulting program (and hence experiment) can fail due to arbitrary changes in the machine being used. This applies even where the machine is physically the same one that was used months ago to run an experiment the first time. An innocent software update, to a package not obviously used by the build software, can cause disaster. If the machine is not the same one or a clone thereof, all bets are off. The only way to ensure an experiment is recomputable is to make available a virtual machine identical to one which was tested and worked when the experiment was originally run. The second reason is that the available computers and operating systems change over time. It may be true that code you make available today can be built with only minor pain by many people on current computers. That is unlikely to be true in 5 years, and hardly credible in 20.

A classic example illustrates both these reasons. SHRDLU is one of the most famous programs in the history of AI [20]. The complete source code is available, but cannot be run. The physical machine and OS it ran on can be emulated, but we do not have the exact machine state that enabled the program to run. Amongst other issues, Terry Winograd changed his own copy of Lisp, changes that are now lost [5]. What we need, and should provide for new experiments, is an exact virtual machine in which the experiment worked. There are isolated cases of researchers providing virtual machines for recomputation, such as Brown [2, 3] and SHARE [10]. This should be the standard way Computer Scientists do business. Howe makes this case much more extensively, giving 13 reasons why virtual machines in the cloud can improve reproducibility [12].

This does not mean that all uploads and downloads to repositories should be of full virtual machines. For ease of use, as well as reducing bandwidth, it is important that tools provide as many ways as possible of allowing experiments to be uploaded and downloaded.
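As a sketch of what lodging an exact environment might involve, a repository tool could fingerprint a VM image and record the metadata needed to verify, years later, that a recomputation is running against the very bytes that were tested. Everything here (the function names, the manifest fields) is hypothetical, not a description of any existing tool:

```python
import hashlib
import json

def vm_manifest(image_path, experiment_name, chunk_size=1 << 20):
    """Build a manifest entry for an archived VM image. The SHA-256
    fingerprint lets a future recomputation verify it has the exact
    image that was tested, byte for byte."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as f:
        # Hash in chunks so multi-GB images do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "experiment": experiment_name,
        "image": image_path,
        "sha256": digest.hexdigest(),
    }

def manifest_json(manifest):
    """Serialise a manifest for lodging alongside the image."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

A recomputation then starts by recomputing the hash of the downloaded image and refusing to proceed on a mismatch, which is exactly the guarantee a description of how to rebuild the environment cannot give.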
6. Runtime performance is a secondary issue
If a measure is deterministic, e.g. the data structure resulting from a deterministic algorithm, it should be identical every time the experiment is recomputed. We cannot guarantee the same results in non-deterministic measures such as CPU time. By using virtual machines, we may obtain different results from those seen on a physical machine, and future results on virtual machines may differ from current ones. Even where serious efforts are made, there is no guarantee that reproducible run-time results can be obtained. For example, even though max-clique researchers have a standard methodology for cross-referencing CPU times, Prosser has shown that these results cannot be relied on even approximately [15].

We must tackle the easier problems first. Ensuring recomputation of experiments is not easy, but it is vital. Obtaining meaningfully replicable CPU time comparisons requires significant additional research, if it is even possible. By allowing recomputation of experiments, we allow researchers to do them in a variety of environments, discovering whether conclusions about run times are universal or contingent. The crucial thing is to preserve scientific experiments. It is unarguable that if we can’t recompute an experiment at all, we can’t preserve run time performance.
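The distinction can be seen in a toy recomputation: a deterministic computation yields an identical result on every run, while the CPU time it takes is a measurement that varies between runs and between machines. A small illustration (the computation chosen is arbitrary):

```python
import time

def deterministic_result(n):
    """A deterministic computation: the sum of squares below n.
    Recomputing it must give the identical answer every time."""
    return sum(i * i for i in range(n))

runs = []
for _ in range(3):
    start = time.process_time()
    result = deterministic_result(100_000)
    elapsed = time.process_time() - start
    runs.append((result, elapsed))

results = {r for r, _ in runs}
times = [t for _, t in runs]

# Exactly one distinct result: the deterministic measure is recomputable.
print(len(results))  # -> 1
# The timings are measurements, not results; they will generally differ
# between runs and certainly between machines.
print(times)
```

This is the sense in which recomputability preserves the experiment itself while treating run-time performance as a secondary, environment-dependent observation.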
The recomputation.org Mission
To help make recomputation a reality, I am starting http://recomputation.org as one repository for recomputable experiments, and starting to work on the tools to use on the site. Based on my arguments above I have the following plans. The overriding goal is the following mission statement:
If we can compute your experiment now, anyone can recompute it 20 years from now
The site recomputation.org is brand new and (as I write) holds zero experiments. But the following are the principles which we will use in developing it to deliver the mission statement.

1. recomputation.org will make available virtual machines or equivalent technology to allow exact recomputation of lodged experiments.
2. recomputation.org will make its best efforts to ensure that all experiments which it believes to be recomputable will remain recomputable for all time.
3. recomputation.org will always be free to those lodging bona-fide scientific experiments and to those obtaining past experiments, provided that: all aspects of the experiments are freely available; the experimenter’s contributions are open source; and fees are not charged for the related scientific publication. (The point of the caveats is that free service might not be offered to authors who will not make source code available, or to publications such as journals who charge either authors or readers for access. Equally, there is major value in making non-open experiments recomputable too, so if funding is available it may be appropriate to allow these to be freely deposited also.)
4. recomputation.org will provide its code and tools using an appropriate open source licence, including server-side code.
5. recomputation.org will serve as a testbed for scientific research into issues such as experimental techniques and methodologies.

Vote Recomputation
The recomputation manifesto is similar in spirit to the ‘Science Code Manifesto’ but addresses an orthogonal issue. The Science Code Manifesto, sciencecodemanifesto.org, demands that code used in scientific publications should be made available in an open way for the useful lifetime of the publication. But this is neither a necessary condition for recomputation nor sufficient for it. Closed source experiments can be recomputed if the appropriate environment is provided, e.g. a suitable virtual machine containing the necessary binaries but not source. Arguably recomputability is more important for closed source than for open source. But it must not be thought that open source is sufficient for recomputability. The only guarantee of recomputability is the exact environment being available. I do endorse the Science Code Manifesto, and initial efforts at http://recomputation.org will be focussed on open source efforts only. As well as the scientific desirability of open source, it avoids one form of potential licencing problems in recomputation.

A manifesto is a call that people reading it should vote for your point of view. Don’t vote with a signature or a petition. Vote by making your computational experiments recomputable. Do it at http://recomputation.org, or at your own web site, or at another repository. But make your experiments recomputable.
References

[1] J. Christopher Beck, Patrick Prosser, and Richard J. Wallace. Trying again to fail-first. In Boi Faltings, Adrian Petcu, François Fages, and Francesca Rossi, editors, CSCLP, volume 3419 of Lecture Notes in Computer Science, pages 41–55. Springer, 2004.
[2] C. Titus Brown. Our approach to replication in computational science. http://ivory.idyll.org/blog/replication-i.html, 2012.
[3] C. Titus Brown, Adina Howe, Qingpeng Zhang, Alexis B. Pyrkosz, and Timothy H. Brom. A reference-free algorithm for computational normalization of shotgun sequencing data. 2012.
[4] Jon Claerbout. Reproducible computational research: A history of hurdles, mostly overcome.
[5] Semaphore Corporation. SHRDLU resurrection. 2011.
[6] Andrew Davison. Automated tracking of computational experiments using Sumatra. In EuroSciPy: 3rd European meeting on Python in Science, 2010.
[7] Chris Drummond. Replicability is not reproducibility: Nor is it good science. In Proceedings of the Twenty-Sixth International Conference on Machine Learning: Workshop on Evaluation Methods for Machine Learning IV, 2009.
[8] Matan Gavish and David L. Donoho. Three dream applications of verifiable computational results. Computing in Science and Engineering, 14(4):26–31, 2012.
[9] I. P. Gent, S. A. Grant, E. MacIntyre, P. Prosser, P. Shaw, B. M. Smith, and T. Walsh. How not to do it. Research Report Series, University of Leeds School of Computer Studies, LU SCS RR, 1997.
[10] Pieter Van Gorp and Paul W. P. J. Grefen. Supporting the internet-based evaluation of research software with cloud infrastructure. Software and System Modeling, 11(1):11–28, 2012.
[11] Philip J. Guo. Software tools to facilitate research programming. Ph.D. dissertation, Department of Computer Science, Stanford University, 2012.
[12] Bill Howe. Virtual appliances, cloud computing, and reproducible research. Computing in Science and Engineering, 14(4):36–41, 2012.
[13] Darrel C. Ince, Leslie Hatton, and John Graham-Cumming. The case for open computer programs. Nature, 482(7386):485–488, February 2012.
[14] Christopher Nell, Chris Fawcett, Holger H. Hoos, and Kevin Leyton-Brown. HAL: A framework for the automated analysis and design of high-performance algorithms. In Carlos A. Coello Coello, editor, LION, volume 6683 of Lecture Notes in Computer Science, pages 600–615. Springer, 2011.
[15] Patrick Prosser. Exact algorithms for maximum clique: A computational study. Algorithms, 5(4):545–587, 2012.
[16] David De Roure, Carole A. Goble, and Robert Stevens. The design and realisation of the myExperiment virtual research environment for social sharing of workflows. Future Generation Computer Systems, 25(5):561–567, 2009.
[17] Jonathan F. Russell. If a job is worth doing, it is worth doing twice. Nature, 496:7, April 2013.
[18] Christian Schulte. Programming Constraint Services: High-Level Programming of Standard and New Constraint Services, volume 2302 of Lecture Notes in Computer Science. Springer, 2002.
[19] Victoria Stodden, Christophe Hurlin, and Christophe Perignon. RunMyCode.org: A novel dissemination and collaboration platform for executing published computational results. In eScience, pages 1–8. IEEE Computer Society, 2012.
[20] Terry Winograd. Understanding natural language. Cognitive Psychology, 3(1):1–191, 1972.
About the Author
Ian Gent is Professor of Computer Science at the University of St Andrews, Scotland. His interest in the proper foundations of empirical science in computing dates back almost 20 years. He has given tutorials on “Empirical Methods in CS and AI” at conferences such as IJCAI 2001. Of his non peer-reviewed papers, his most cited by far is “How Not To Do It” [9], a collection of embarrassing mistakes he and colleagues have made in computational experiments. To show how good we are at not doing things right, we mis-spelt the name of one of the authors!
Acknowledgements
I wish to thank the many scientists I have discussed these issues with over the years, for example my fellow authors of [9]. I especially thank Patrick Prosser for his tenacious pursuit of replication of past experiments and stories of his struggles in achieving them, Ewan (not Ewen) MacIntyre for accepting our mis-spelling of his name, Adam Barker, and Lars Kotthoff. I also thank more recent colleagues, including Edwin Brady, Chris Jefferson, Steve Linton, Ian Miguel, Pete Nightingale, Karen Petrie, Aaron Quigley, and Jonathan Ward.