Re-run, Repeat, Reproduce, Reuse, Replicate: Transforming Code into Scientific Contributions
Fabien C. Y. Benureau*, Nicolas P. Rougier

INRIA Bordeaux Sud-Ouest, Talence, France
Institut des Maladies Neurodégénératives, Université de Bordeaux, CNRS UMR 5293, Bordeaux, France
LaBRI, Université de Bordeaux, Bordeaux INP, CNRS UMR 5800, Talence, France

*Corresponding author: [email protected]
Introduction

Replicability is a cornerstone of science. If an experimental result cannot be re-obtained by an independent party, it merely becomes, at best, an observation that may inspire future research (Mesirov, 2010; Open Science Collaboration, 2015). Replication issues have received increased attention in recent years, with a particular focus on medicine and psychology (Iqbal, Wallach, Khoury, Schully, & Ioannidis, 2016). One could think that computational research would mostly be shielded from such issues, since a computer program describes precisely what it does and is easily disseminated to other researchers without alteration.

But precisely because it is easy to believe that if a program runs once and gives the expected results it will do so forever, crucial steps to transform working code into meaningful scientific contributions are rarely undertaken (Collberg & Proebsting, 2016; Sandve, Nekrutenko, Taylor, & Hovig, 2013; Schwab, Karrenbach, & Claerbout, 2000). Computational research is plagued by replication problems, in part, because it seems impervious to them. Contrary to production software, which provides a service geared towards a practical outcome, the motivation behind scientific code is to test a hypothesis. While in some instances production software and scientific code are indistinguishable, the reasons why they were created are different, and, therefore, so are the criteria to evaluate their success. A program can fail as a scientific contribution in many different ways for many different reasons. Borrowing the terms coined by Goble (2016), for a program to contribute to science, it should be re-runnable (R1), repeatable (R2), reproducible (R3), reusable (R4) and replicable (R5). Let us illustrate this with a small example, a random walk (Hughes, 1995) written in Python:

    import random
    x = 0
    for i in xrange(10):
        step = random.choice([-1,+1])
        x += step
        print x,

Listing 1: Random walk (R0) [raw code, archive]
In the code above, the random.choice function randomly returns either +1 or -1. The instruction "for i in xrange(10):" executes the next three indented lines ten times. Executed, this program would display:
1 0 -1 0 -1 0 -1 0 1 2
Output
What could go wrong with such a simple program? Well…

Re-runnable (R1)

Have you ever tried to re-run a program you wrote some years ago? It can often be frustratingly hard. Part of the problem is that technology is evolving at a fast pace and you cannot know in advance how the system, the software and the libraries your program depends on will evolve. Since you wrote the code, you may have reinstalled or upgraded your operating system. The compiler, interpreter or set of libraries installed may have been replaced with newer versions. You may find yourself battling with arcane issues of library compatibility—thoroughly orthogonal to your immediate research goals—to execute again a code that worked perfectly before. To be clear, it is impossible to write future-proof code, and the best efforts can be stymied by the smallest change in one of the dependencies. At the same time, modernizing an unmaintained ten-year-old code can reveal itself to be an arduous and expensive undertaking—and precarious, since each change risks affecting the semantics of the program. Rather than trying to predict the future or painstakingly dusting off old code, an often more straightforward solution is to recreate the old execution environment (although virtual machines are often a great help here, this is not always possible; it is, however, always more difficult when the original execution environment is unknown). For this to happen, however, the dependencies in terms of systems, software and libraries must be made clear enough.

A re-runnable code is one that can be run again when needed, and in particular more than the one time that was needed to produce the results. It is important to notice that the re-runnability of a code is not an intrinsic property. Rather, it depends on the context, and becomes increasingly difficult as the code ages. Therefore, to be and remain re-runnable on other researchers' computers, a re-runnable code should describe—with enough details to be recreated—an execution environment in which it is executable. As shown by Collberg and Proebsting (2016), this is far from being either obvious or easy.

    import random
    x = 0
    walk = []
    for i in range(10):
        step = random.choice([-1,+1])
        x += step
        walk.append(x)
    print(walk)

Listing 2: Re-runnable random walk (R1) [raw code, archive]
In our case, the R0 version of our tiny walker seems to imply that any version of Python would be fine. This is not the case: it uses the print instruction and the xrange operator, both specific to Python 2. The print instruction, available in Python 2 (a version still widely used; support is scheduled to stop in 2020), has been removed in Python 3 (first released in 2008, almost a decade ago) in favor of a print function, while the xrange operator has been replaced by the range operator in Python 3. In order to future-proof the code a bit, we might as well target Python 3, as is done in the R1 version. Incidentally, it remains compatible with Python 2. But whichever version is chosen, the crucial step here is to document it.

Repeatable (R2)

The code is running and producing the expected results. The next step is to make sure that you can produce the same output over successive runs of your program. In other words, the next step is to obtain repeatable output. Repeatability is valuable. If a run of the program produces a particularly puzzling result, repeatability allows you to scrutinize any step of the execution of the program by re-running it again with extraneous prints, or inside a debugger. Repeatability is also useful to prove that the program did indeed produce the published results. Repeatability is not always possible or easy (Diethelm, 2012; Courtès & Wurmus, 2015). But for sequential and deterministically parallel programs (Hines & Carnevale, 2008; Collange, Defour, Graillat, & Iakymchuk, 2015) not depending on analog inputs, it often comes down to controlling the initialization of the pseudo-random number generators (RNG).

For our program, that means setting the seed of the random module. We may also want to save the output of the program to a file, so that we can easily verify that consecutive runs do produce the same output: eyeballing differences is unreliable and time-consuming, and therefore won't be done systematically.

    import random

    random.seed(1)
    x = 0
    walk = []
    for i in range(10):
        step = random.choice([-1,+1])
        x += step
        walk.append(x)
    print(walk)

    with open('results-R2.txt', 'w') as fd:
        fd.write(str(walk))

Listing 3: Re-runnable, repeatable random walk (R2) [raw code, archive]
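Once the seed is set, repeatability can be checked mechanically rather than by eyeballing. The following sketch is not one of the paper's listings; it assumes Listing 3 has been saved as walk_r2.py (a hypothetical file name) and simply compares the result files of two consecutive runs:

    import subprocess

    def one_run():
        """Run the R2 script once and return the content of its result file."""
        subprocess.check_call(('python', 'walk_r2.py'))
        with open('results-R2.txt') as fd:
            return fd.read()

    # Two consecutive runs must produce identical result files.
    assert one_run() == one_run(), 'the program is not repeatable'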
Setting seeds should be done carefully. Using 439 as a seed in the previous program would result in ten consecutive +1 steps (with CPython 3.3-3.6; see the next section for details), which—although a perfectly valid random walk—lends itself to a gross misinterpretation of the overall dynamics of the algorithm. Verifying that the qualitative aspects of the results and the conclusions that are made are not tied to a specific initialization of the pseudo-random generator is an integral part of any scientific undertaking in computational science; this is usually done by repeating the simulations multiple times with different seeds.

Reproducible (R3)

The R2 code seems fine enough, but it hides several problems that come to light when trying to reproduce results. A result is said to be reproducible if another researcher can take the original code and input data, execute it, and re-obtain the same result (Peng, Dominici, & Zeger, 2006). As explained by Donoho, Maleki, Rahman, Shahram, and Stodden (2009), scientific practice must expect that errors are ubiquitous, and therefore be robust to them. Ensuring reproducibility is a fundamental step toward this: it provides other researchers the means to verify that the code does indeed produce the published results, and to scrutinize the procedures it used to produce them. As demonstrated by Mesnard and Barba (2016), reproducibility is hard.

Indeed, the R2 program will not produce the same results all the time. It will, because it is repeatable, produce the same results over repeated executions. But it will not necessarily do so over different execution environments. The cause is to be found in a change that occurred in the pseudo-random number generator between Python 3.2 and Python 3.3. Executed with Python 2.7 to 3.2, the code will produce the sequence -1, 0, 1, 0, -1, -2, -1, 0, -1, -2. But with Python 3.3 to 3.6, it will produce -1, -2, -1, -2, -1, 0, 1, 2, 1, 0. With future versions of the language, it may change still. For the R3 version, we abandon the use of the random.choice function in favor of the random.uniform function, whose behavior is consistent across versions 2.7 to 3.6 of Python.

Because any dependency of a program—down to the most basic one, the language itself—can change its behavior from one version to another, executability (R1) and determinism (R2) are necessary but not sufficient for reproducibility. The exact execution environment used to produce the results must also be specified—rather than the broadest set of environments where the code can be effectively run. In other words, assertions such as "the results were obtained with CPython 3.6.1" are more valuable, in a scientific context, than "the program works with Python 3.x and above". With the increasing complexity of computational stacks, retrieving and deciding what is pertinent (CPU architecture? operating system version? endianness?) might be non-trivial. A good rule of thumb is to include more information than necessary rather than not enough, and some rather than none.

Recording the execution environment is only the first step. The R2 program uses a random seed but does not keep a trace of it except in the code.
Should the code change after the production of the results, someone provided with the last version of the code will not be able to know which seed was used to produce the results, and would need to iterate through all possible random seeds, an impossible task in practice. (Here, with only 2^10 = 1024 possible sequences for a 10-step random walk, a seed matching any possible sequence could certainly be found: for instance, seed 11235813 matches the results of R2 with Python 2.7. Such a search becomes intractable for a 100-step walk.)

This is why result files should come alongside their context, i.e. an exhaustive list of the parameters used as well as a precise description of the execution environment, as the R3 code does. The code itself is part of that context: the version of the code must be recorded. It is common for different results or different figures to have been generated by different versions of the code. Ideally, all results should originate from the same (and last) version of the code. But for long or expensive computations, this may not be feasible. In that case, the result files should contain the version of the code that was used to produce them. This information can be obtained from the version control software. This also allows, if some errors are found and corrected after some results have been obtained, to identify which ones should be recomputed. In R3, the code records the git revision, and whether the repository holds uncommitted changes when the computation starts.

Published results should obviously come from a version of the code where every change and every file has been committed. This includes pre-processing, post-processing and plotting code. Plotting code may seem mundane, but it is as vulnerable as any other piece of the code to bugs and errors. When it comes to checking that the reproduced data match the ones published in the article, however, figures can reveal themselves to be imprecise and cumbersome, and sometimes plain unusable. To avoid having to manually overlay pixelated plots, published figures should be accompanied by their underlying data (coordinates of the plotted points) in the supplementary data to allow straightforward numeric comparisons.

Another good practice is to make the code self-verifiable. In R3, a short unit test is provided that allows the code to verify its own reproducibility. Should this test fail, then there is little hope of reproducing the results. Of course, passing the test does not guarantee anything.
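As a minimal sketch of such a self-test (not the paper's own test; the function name and structure are ours), one can assert the reference sequence quoted earlier for the R2 walker under CPython 3.3-3.6:

    import random

    def self_test():
        """Fail early if this environment does not reproduce the
        reference sequence quoted in the text for CPython 3.3-3.6."""
        random.seed(1)
        x, walk = 0, []
        for i in range(10):
            x += random.choice([-1, +1])
            walk.append(x)
        assert walk == [-1, -2, -1, -2, -1, 0, 1, 2, 1, 0], \
            'this environment does not reproduce the reference results'

    self_test()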
It is obvious that reproducibility implies availability. As shown in Collberg and Proebsting (2016), code is often unavailable, or only available upon request. While the latter may seem sufficient, changes in email address, changes in career, retirement, a busy inbox or poor archiving practices can make a code just as unreachable. Code, input data and result data should be available with the published article, as supplementary data, or through a DOI link to a scientific repository such as Figshare, Zenodo or a domain-specific database, such as ModelDB for computational neuroscience. The code presented in this article is available in the GitHub repository github.com/rougier/random-walk and at doi.org/10.5281/zenodo.848217. (Online code repositories such as GitHub are not scientific repositories, and may disappear, change name or change their access policy at any moment. Direct links to them are not perpetual, and, when used, they should always be supplemented by a DOI link to a scientific archive.)

    # Copyright (c) 2017 Nicolas P. Rougier and Fabien C. Y. Benureau
    import sys, subprocess, datetime, random

    def generate_walk():
        x = 0
        walk = []
        for i in range(10):
            if random.uniform(-1, +1) > 0:
                x += 1
            else:
                x -= 1
            walk.append(x)
        return walk

    # If the repository is dirty, refuse to run
    if subprocess.call(('git', 'diff-index', '--quiet', 'HEAD')):
        print('Repository is dirty, please commit first')
        sys.exit(1)

    # Record the git revision of the code
    revision = subprocess.check_output(('git', 'rev-parse', 'HEAD'))

    seed = 1
    random.seed(seed)
    walk = generate_walk()
    print(walk)

    results = {'data'     : walk,
               'seed'     : seed,
               'timestamp': str(datetime.datetime.utcnow()),
               'revision' : revision,
               'system'   : sys.version}
    with open('results-R3.txt', 'w') as fd:
        fd.write(str(results))

Listing 4: Re-runnable, repeatable, reproducible random walk (R3) [raw code, archive]
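Since the result file is written as a plain Python literal, anyone can load it back and inspect the recorded context before attempting to reproduce the run. A minimal sketch, not part of the original listings:

    import ast

    with open('results-R3.txt') as fd:
        results = ast.literal_eval(fd.read())

    # The context travels with the data: seed, code revision, interpreter.
    print(results['seed'], results['revision'], results['system'])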
To recap, reproducibility implies re-runnability, repeatability and availability, yet imposes additional conditions. Dependencies and platforms must be described as precisely and as specifically as possible. Parameter values and inputs should accompany the result files. The data and scripts behind the graphs must be published. Unit tests are a good way to embed self-diagnostics of reproducibility in the code. Reproducibility is hard, yet tremendously necessary.
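Much of this environment description can be collected programmatically. A minimal sketch, assuming only the standard sys and platform modules, that errs on the side of recording more than necessary:

    import sys, platform

    def environment():
        """Return a coarse description of the execution environment."""
        return {'python'   : sys.version,           # exact interpreter version
                'platform' : platform.platform(),   # OS name and version
                'machine'  : platform.machine(),    # CPU architecture
                'byteorder': sys.byteorder}         # endianness

    print(environment())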
Reusable (R4)

Making your program reusable means it can be easily used, and modified, by you and other people, inside and outside your lab. Ensuring your program is reusable is advantageous for a number of reasons.

For you, first. Because the you now and the you in two years are two different persons. Details on how to use the code, its limitations, its quirks, may be present in your mind now, but will probably escape you in six months (Donoho et al., 2009). Here, comments and documentation can make a significant difference. Source code reflects the results of the decisions that were made during its creation, but not the reasons behind those decisions. In science, where the method and its justification matter as much as the results, those reasons are precious knowledge. In that context, a comment on how a given parameter was chosen (optimization, experimental data, educated guess), or why a library was chosen over another (conceptual or technical reasons?) is valuable information.

Reusability of course directly benefits other researchers, from your team and outside of it. The easier it is to use your code, the lower the threshold is for others to study, modify and extend it. Scientists constantly face the constraint of time: if a model is available, documented, and can be installed, run and understood all in a few hours, it will be preferred over another that would require weeks to reach the same stage. A reproducible and reusable code offers a platform both verifiable and easy to use, fostering the development of derivative works by other researchers on solid foundations. Those derivative works contribute to the impact of your original contribution.

Having more people examining and using your code also means that potential errors have a higher chance of being caught. If people start using your program, they will most likely report bugs or malfunctions they encounter. If you are lucky enough, they might even propose bug fixes or improvements, hence improving the overall quality of your software. This process contributes to long-term reproducibility to the extent that people continue to use and maintain the program.

Despite all this, reusability is often overlooked, and it is not hard to see why. Scientists are rarely trained in software engineering, and reusability can represent an expensive endeavour if undertaken as an afterthought, for little tangible short-term benefit, for a codebase that might, after all, see only a single use. And, in fact, reusability is not as indispensable a requirement as re-runnability, repeatability and reproducibility. Yet, some simple measures can tremendously increase reusability, and at the same time strengthen reproducibility and re-runnability over the long term.

Avoid hardcoded or magic numbers. Magic numbers are numbers present directly in the source code that do not have a name and can therefore be difficult to interpret semantically. Hardcoded values are variables that cannot be changed through a function argument or a parameter configuration file. To be modified, they require editing the code, which is cumbersome and error-prone. In the R3 code, the seed and the number of steps are respectively hardcoded and magic.

Similarly, code behavior should not be changed by commenting/uncommenting code (Wilson et al., 2017). Modification of the behavior of the code, required when different experiments examine slightly different conditions, should always be explicitly set through parameters accessible to the end-user.
This improves reproducibility in two ways: it allows those conditions to be recorded as parameters in the result files, and it makes it possible to define separate scripts to run, or configuration files to load, to produce each of the figures of the published paper. With documentation explaining which script or configuration file corresponds to which experiment, reproducing the different figures becomes straightforward.

    # Copyright (c) 2017 Nicolas P. Rougier and Fabien C.Y. Benureau
    import sys, subprocess, datetime, random

    def generate_walk(count, x0=0, step=1, seed=0):
        """Random walk

        count: number of steps
        x0   : initial position (default 0)
        step : step size (default 1)
        seed : seed for the initialization of the random generator (default 0)
        """
        random.seed(seed)
        x = x0
        walk = []
        for i in range(count):
            if random.uniform(-1, +1) > 0:
                x += step
            else:
                x -= step
            walk.append(x)
        return walk

    def generate_results(count, x0=0, step=1, seed=0):
        """Compute a walk and return it alongside its context"""
        # If the repository is dirty, refuse to run
        if subprocess.call(('git', 'diff-index', '--quiet', 'HEAD')):
            print('Repository is dirty, please commit first')
            sys.exit(1)
        # Record the git revision of the code
        revision = subprocess.check_output(('git', 'rev-parse', 'HEAD'))
        walk = generate_walk(count=count, x0=x0, step=step, seed=seed)
        return {'data'      : walk,
                'parameters': {'x0': x0, 'step': step, 'count': count, 'seed': seed},
                'timestamp' : str(datetime.datetime.utcnow()),
                'revision'  : revision,
                'system'    : sys.version}

    if __name__ == '__main__':
        results = generate_results(count=10, x0=0, step=1, seed=1)
        with open('results-R4.txt', 'w') as fd:
            fd.write(str(results))
        print(results['data'])

Listing 5: Re-runnable, repeatable, reproducible, reusable random walk (R4) [raw code, archive]
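With every condition exposed as a parameter, each published figure can be tied to a short driver script that records exactly how it was produced. A hypothetical sketch (the module name walk and the figure numbering are illustrative, not from the paper), assuming Listing 5 is saved as walk.py:

    # figure1.py: parameters used to produce figure 1
    from walk import generate_results

    results = generate_results(count=1000, x0=0, step=1, seed=42)
    with open('results-figure1.txt', 'w') as fd:
        fd.write(str(results))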
Documentation is one of the most potent tools for reusability. Proper documentation on how to install and run the software often makes the difference in whether other researchers manage to use it or not. A comment describing what each function does, however evident, can avoid hours of head-scratching. Great code may need few comments. Scientists, however, are not always brilliant developers. Of course, bad, complicated code should be rewritten until it is simple enough to explain itself. But realistically, this is not always going to be done: there is simply not enough incentive for it. There, a comment that explains the intentions and reasons behind a block of code can be tremendously useful.

Reusability is not a strict requirement for scientific code. But it has many benefits, and a few simple measures can foster it considerably. To complement the R4 version provided here, we provide an example repository of a re-runnable, repeatable, reproducible and reusable random walk code. The repository is available on GitHub at github.com/benureau/r5 and at doi.org/10.5281/zenodo.848284.

Replicable (R5)

Having made a software reusable offers an additional way to find errors, especially if your scientific contribution is popular. Unfortunately, this is not always effective, and some recent cases have shown that bugs can lurk in well-used open-source code, impacting the false-positive rates of fMRI studies (Eklund, Nichols, & Knutsson, 2016), or the encryption of communications over the Internet (Durumeric et al., 2014). Let's be clear: the goal here is not to remove all bugs and mistakes from science. The goal is to have methods and practices in place that make it possible for the inevitable errors that will be made to be caught and corrected by motivated investigators. This is why, as explained by Peng et al. (2006), the replication of important findings by multiple independent investigators is fundamental to the accumulation of scientific evidence.

Replicability is the implicit assumption made by an article that does not provide its source code: that the description it provides of the algorithms is sufficiently precise and complete to re-obtain the results it presents. Here, replicating implies writing a new code matching the conceptual description of the article, in order to re-obtain the same results. Replication affords robustness to the results because, should the original code contain an error, a different codebase creates the possibility that this error will not be repeated, in the same way that replicating a laboratory experiment in a different laboratory can ferret out subtle biases. While every published article should strive for replicability, it is seldom obtained. In fact, absent an explicit effort to make an algorithmic description replicable, there is little probability that it will be.

This is because most papers strive to communicate the main ideas behind their contribution in terms as simple and as clear as possible, so that the reader may be able to easily understand them and the results that are presented. Trying to ensure replicability in the main text adds a myriad of esoteric details that are not conceptually significant and clutter the explanations.
Therefore, unless the writer dedicates an addendum or a section of the supplementary information to technical details specifically aimed at replicability, the information will not be there, because there are incentives not to include it.

But even when those details are present, the best efforts may fall short because of an oversight, a typo, or a difference between what is evident for the writer and for the reader (Mesnard & Barba, 2016). Minute changes in the numerical estimation of a common first-order differential equation can have a significant impact (Crook, Davison, & Plesser, 2013). Hence, a reproducible code plays an important role alongside its article: it is an objective catalog of all the implementation details.

A researcher seeking to replicate published results might first consider only the article. If she fails to replicate the results, she will consult the original code, and with it be able to pinpoint why her code and the code of the authors differ in behavior. Is it because of a mistake on their part? Hers? Or a difference in a seemingly innocuous implementation detail? A fine analysis of why a particular algorithmic description is lacking or ambiguous, or why a minor implementation decision is in fact crucial to obtain the published results, is of great scientific value. Such an analysis can only be done with access to both the article and the code. With only the article, the researcher will often be unable to understand why she failed to replicate the results, and will naturally be inclined to only report replication successes. Replicability, therefore, does not negate the necessity of reproducibility. In fact, it often relies on it. To illustrate this, let us consider what could be the textual description of the random walker (as it would be written in an article describing it):
    The model uses the Mersenne Twister generator initialized with the seed 1. At each iteration,
    a uniform number between -1 (included) and +1 (excluded) is drawn and the sign of the result
    is used for generating a positive or negative step.
This description, while somewhat precise, forgoes—as is common—the initialization of the variables (here the starting value of the walk: 0), and the technical details about which implementation of the RNG is used.

    import random
    import numpy as np

    def _rng(seed):
        """Return a numpy random number generator initialized with seed
        as it would be with the python random generator."""
        rng = random.Random()
        rng.seed(seed)
        _, keys, _ = rng.getstate()
        rng = np.random.RandomState()
        state = rng.get_state()
        rng.set_state((state[0], keys[:-1], state[2], state[3], state[4]))
        return rng

    def walk(n, seed):
        """Random walk for n steps"""
        rng = _rng(seed)
        steps = 2*(rng.uniform(-1, +1, n) > 0) - 1
        return steps.cumsum().tolist()

    if __name__ == '__main__':
        results = walk(10, seed=1)
        with open("results-R5.txt", "w") as fd:
            fd.write(str(results))
        print(results)

Listing 6: Replicated random walk (R5) [raw code, archive]
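The _rng helper above is needed because, as discussed next, CPython and NumPy expand the same integer seed into different internal states. A minimal sketch of the discrepancy (not one of the paper's listings):

    import random
    import numpy as np

    seed = 1
    py_rng = random.Random(seed)          # CPython's Mersenne Twister
    np_rng = np.random.RandomState(seed)  # NumPy's Mersenne Twister

    # Same algorithm, same seed, different streams: the two libraries
    # interpret the seed differently.
    print(py_rng.uniform(-1, +1))
    print(np_rng.uniform(-1, +1))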
It may look innocuous. After all, the Python documentation states that "Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1". Someone trying to replicate the work, however, might choose to use the RNG from the NumPy library. The NumPy library is extensively used in the science community, and it provides an implementation of the Mersenne Twister generator too. Unfortunately, the way the seed is interpreted by the two implementations is different, yielding different random sequences. Here we are able to replicate exactly the behavior of the pure-Python random walker by setting the internal state of the NumPy RNG appropriately, but only because we have access to specific technical details (the use of the random module of the standard Python library of CPython 3.6.1), or to the code itself.

But there are still more subtle problems with the description given above. If we look more closely at it, we can realize that nothing is said about the specific case of 0 when generating a step. Do we have to consider 0 to be a positive or a negative step? Without further information and without the original code, it is up to the reader to decide. Likewise, the description is ambiguous regarding the first element of the walk. Is the initialization value included (it was not in our codes so far)? This slight difference might affect the statistics of short runs.

All these ambiguities in the description of an algorithm pile up; some are inconsequential (the 0 case has null probability), but some may affect the results in important ways. They are mostly inconspicuous to the reader and, oftentimes, to the writer as well. In fact, the best way to ferret out the ambiguities, big and small, of an article is to replicate it. This is one of the reasons why the ReScience journal (Rougier et al., 2017) was created (the second author, Nicolas Rougier, is one of the editors-in-chief of ReScience). This journal, run entirely by volunteers on GitHub, targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research is reproducible.

Code is an integral part of any submission to the ReScience journal. During the review process, reviewers run the submitted code, may criticise its quality and its ease of use, and verify the reproducibility of the replication. The Journal of Open Source Software (Smith et al., 2017) functions similarly: testing the code is a fundamental part of the review process.

Conclusion
Throughout the evolution of a small random walk example implemented in Python, we illustrated some of the issues that may plague scientific code. The code may be correct and of good quality, yet many problems may still reduce its contribution to scientific knowledge. To make these problems explicit, we articulated five characteristics that a code should possess to be a useful part of a scientific publication: it should be re-runnable, repeatable, reproducible, reusable and replicable.

Running old code on tomorrow's computer and software stacks may not be possible. But recreating the old code's execution environment may be: to ensure the long-term re-runnability of a code, its execution environment must be documented. For our example, a single comment went a long way to transform the R0 code into the R1 (re-runnable) one.

Science is built on verifying the results of others. This is harder to do if each execution of the code produces a different result. While this may not be possible for complex parallel workflows, in all other instances the program should be made deterministic, as in the R2 (repeatable) version.

Even more care is needed to make a code reproducible. The exact execution environment, code and parameters used must be recorded and embedded in the result files, as the R3 (reproducible) version does. Furthermore, the code must be made available as supplementary data with the whole computational workflow, from preprocessing steps to plotting scripts.

Making code reusable is a stretch goal that can yield tremendous benefits for you, your team and other researchers. Taken into account during development rather than as an afterthought, simple measures can avoid hours of head-scratching for others, and for yourself—in a few years. Documentation is paramount here, even if it is a single comment per function, as was done in the R4 (reusable) version.

Finally, there is the belief that an article should suffice by itself: the descriptions of the algorithms present in the paper should suffice to re-obtain (to replicate) the published results. For well-written papers that precisely dissociate conceptually significant aspects from irrelevant implementation details, that may be. But scientific practice should not assume the best of cases. Science assumes that errors can crop up everywhere. Every paper is a mistake or a forgotten parameter away from irreproducibility. Replication efforts use the paper first, and then the reproducible code that comes along with it whenever the paper falls short of being precise enough to be reimplemented. (Striving, as we do here, for a perfect quantitative match may seem unnecessary. Yet, in replication projects, quantitative comparisons are a simple and effective way to verify that the behavior has been reproduced. Moreover, they are particularly helpful to track exactly where the code of a tentative replication fails to reproduce the published results.)

In conclusion, the R3 (reproducible) form should be accepted as the minimum scientific standard (Wilson et al., 2017). This means that it should actually be checked by reviewers and publishers when code is part of a work worth being published. But this is hardly the case today.

Compared to psychology or biology, the replication issues of computational works have reasonable and efficient solutions. But making sure that these solutions are adopted will not be achieved by articles such as this one. Just like in other fields, we have to modify the incentives for researchers to publish by adopting requirements, enforced domain-wide, on what constitutes an acceptable scientific computational work.

References
Collange, S., Defour, D., Graillat, S., & Iakymchuk, R. (2015). Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Computing, 49, 83–97. doi:10.1016/j.parco.2015.09.001

Collberg, C., & Proebsting, T. A. (2016). Repeatability in computer systems research. Communications of the ACM, 59(3), 62–69. doi:10.1145/2812803

Courtès, L., & Wurmus, R. (2015). Reproducible and user-controlled software environments in HPC with Guix.

Crook, S. M., Davison, A. P., & Plesser, H. E. (2013). Learning from the past: Approaches for reproducibility in computational neuroscience. In J. M. Bower (Ed.), 20 Years of Computational Neuroscience (pp. 73–102). New York, NY: Springer. doi:10.1007/978-1-4614-1424-7_4

Diethelm, K. (2012). The limits of reproducibility in numerical simulation. Computing in Science & Engineering, 14(1), 64–72. doi:10.1109/MCSE.2011.21

Donoho, D. L., Maleki, A., Rahman, I. U., Shahram, M., & Stodden, V. (2009). Reproducible research in computational harmonic analysis. Computing in Science & Engineering, 11(1), 8–18. doi:10.1109/mcse.2009.15

Durumeric, Z., Payer, M., Paxson, V., Kasten, J., Adrian, D., Halderman, J. A., Bailey, M., Li, F., Weaver, N., Amann, J., & Beekman, J. (2014). The matter of Heartbleed. In Proceedings of the 2014 Conference on Internet Measurement Conference (IMC '14). ACM Press. doi:10.1145/2663716.2663755

Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 113(28), 7900–7905. doi:10.1073/pnas.1602413113

Goble, C. A. (2016). What is reproducibility? The R* brouhaha. Slides on SlideShare.

Hines, M. L., & Carnevale, N. T. (2008). Translating network models to parallel hardware in NEURON. Journal of Neuroscience Methods, 169(2), 425–455. doi:10.1016/j.jneumeth.2007.09.010

Hughes, B. D. (1995). Random Walks and Random Environments. Oxford: Clarendon Press.

Iqbal, S. A., Wallach, J. D., Khoury, M. J., Schully, S. D., & Ioannidis, J. P. A. (2016). Reproducible research practices and transparency across the biomedical literature. PLOS Biology, 14(1), e1002333. doi:10.1371/journal.pbio.1002333

Mesirov, J. P. (2010). Accessible reproducible research. Science, 327(5964), 415–416. doi:10.1126/science.1179653

Mesnard, O., & Barba, L. (2016). Reproducible and replicable CFD: It's harder than you think. Preprint, arXiv:1605.04339. Accepted in Computing in Science & Engineering.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. doi:10.1126/science.aac4716

Peng, R. D., Dominici, F., & Zeger, S. L. (2006). Reproducible epidemiologic research. American Journal of Epidemiology, 163(9), 783. doi:10.1093/aje/kwj093

Rougier, N. P., Hinsen, K., Alexandre, F., Arildsen, T., Barba, L., Benureau, F. C. Y., Titus Brown, C., de Buyl, P., Caglayan, O., Davison, A. P., André Delsuc, M., Detorakis, G., Diem, A. K., Drix, D., Enel, P., Girard, B., Guest, O., Hall, M. G., Neto Henriques, R., Hinaut, X., Jaron, K. S., Khamassi, M., Klein, A., Manninen, T., Marchesi, P., McGlinn, D., Metzner, C., Petchey, O. L., Ekkehard Plesser, H., Poisot, T., Ram, K., Ram, Y., Roesch, E., Rossant, C., Rostami, V., Shifman, A., Stachelek, J., Stimberg, M., Stollmeier, F., Vaggi, F., Viejo, G., Vitay, J., Vostinar, A., Yurchak, R., & Zito, T. (2017). Sustainable computational science: The ReScience initiative. arXiv:1707.04393 [cs.DL]

Sandve, G. K., Nekrutenko, A., Taylor, J., & Hovig, E. (2013). Ten simple rules for reproducible computational research. PLoS Computational Biology, 9(10), e1003285. doi:10.1371/journal.pcbi.1003285

Schwab, M., Karrenbach, N., & Claerbout, J. (2000). Making scientific computations reproducible. Computing in Science & Engineering, 2(6), 61–67. doi:10.1109/5992.881708

Smith, A. M., Niemeyer, K. E., Katz, D. S., Barba, L. A., Githinji, G., Gymrek, M., Huff, K. D., Madan, C. R., Cabunoc Mayes, A., Moerman, K. M., Prins, P., Ram, K., Rokem, A., Teal, T. K., Valls Guimera, R., & Vanderplas, J. T. (2017). Journal of Open Source Software (JOSS): Design and first-year review. arXiv:1707.02264 [cs.DL]

Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T. K. (2017). Good enough practices in scientific computing. PLOS Computational Biology, 13(6), e1005510. doi:10.1371/journal.pcbi.1005510