Replication, Communication, and the Population Dynamics of Scientific Discovery

Abstract

Many published research results are false, and controversy continues over the roles of replication and publication policy in improving the reliability of research. Addressing these problems is frustrated by the lack of a formal framework that jointly represents hypothesis formation, replication, publication bias, and variation in research quality. We develop a mathematical model of scientific discovery that combines all of these elements. This model provides both a dynamic model of research and a formal framework for reasoning about the normative structure of science. We show that replication may serve as a ratchet that gradually separates true hypotheses from false, but the same factors that make initial findings unreliable also make replications unreliable. The most important factors in improving the reliability of research are the rate of false positives and the base rate of true hypotheses, and we offer suggestions for addressing each. Our results also bring clarity to verbal debates about the communication of research. Surprisingly, publication bias is not always an obstacle, but instead may have positive impacts: suppression of negative novel findings is often beneficial. We also find that communication of negative replications may aid true discovery even when attempts to replicate have diminished power. The model speaks constructively to ongoing debates about the design and conduct of science, focusing analysis and discussion on precise, internally consistent models, as well as highlighting the importance of population dynamics.



Richard McElreath and Paul E. Smaldino

November 16, 2018

Keywords: replication, publication bias, epistemology, scientific method

Department of Anthropology, UC Davis, One Shields Avenue, Davis CA 95616
Center for Population Biology, UC Davis
E-mail address: [email protected]

Introduction

Imagine two of your close colleagues have just heard about attempts to replicate their positive research findings. Colleague A is thrilled that the attempt was successful. Colleague B is upset that the attempt was unsuccessful. What is the probability that Colleague A's hypothesis is true? What is the probability that Colleague B's hypothesis is false?

This is not a fair quiz, because in truth no one knows the answers to these questions. The absence of replication in many fields [2–4], combined with the absence of a formal framework for understanding replication, makes it difficult to even outline an answer. In the absence of replication, there is substantial concern that many published findings may be false [1], an argument with empirical support [5–7]. The history of science buttresses these observations. A recent catalog of false discoveries of chemical elements outnumbers the current number of real elements in the periodic table [8]. In addition to concerns about replication are concerns about research practice and publication bias. Without knowing how many studies were conducted but not published, it is not possible to assign evidential value to either initial findings or replications. And it is not yet easy to acquire empirical evidence about these factors, as even the best empirical studies of publication bias still rely upon researcher self-report [3].

Thus many opinions can be sustained about the evidential value of both initial findings and replications. As a result, recent controversies over failed replications demonstrate a lack of consensus on norms for replication and publication [9–12]. What is the evidential value of replication, positive or negative? What is the impact of publication bias [13]? If replication is part of an "invisible hand" [14] that corrects scientific errors, how much replication is needed? And what are the risks of poorly designed or interpreted replication attempts [9]? When replication is not possible or practical, what other measures can be taken to improve the reliability of research?

These questions remind us that little is understood about the population dynamics of discovery, replication, and scientific communication. Much more attention has been given to individual methods of research design and data analysis. And while it is useful to analyze research methods in isolation, such calculations are unsatisfying. A lot of research activity is hidden from the public record. This means the actual number of findings for an hypothesis may never be known [13]. And since researchers select hypotheses for further study from the literature itself, findings and publication biases cascade into other findings, interacting with biases and incentives [15].

To know the evidential value of research, we must study the population dynamics that produce it [14, 16–18]. So here we construct and solve a mathematical model of scientific beliefs formed by a population of boundedly rational agents who accumulate evidence for and against hypotheses.


We adopt a general signal detection framework that may apply to diverse statistical paradigms, whether p-valued or Bayesian. We study the joint dynamics that arise from replication, publication bias, and differences in research quality between original studies and replications. Our goal is not to accurately simulate science, but rather to understand it better using the same reductionist tools that have been so successful in illuminating population dynamics more generally [19, 20]. Our model implicitly provides, for example, a neutral model of scientific dynamics in which all hypotheses are false and yet discoveries are continuously published. It also provides a range of "selectionist" models that might be compared to data. The clarity of a quantitative framework will stimulate and clarify the development of later empirical investigation and experimental intervention.

The paper proceeds by first outlining the dynamic structure of the model. We then solve the model for both its long-run dynamics and its epistemological implications: what should a rational agent believe about an hypothesis, given a record of published results? We present a general interpretation of the joint dynamics, so the reader can extrapolate lessons from our simple model to the complexity and diversity of real science. We conclude by relating our results to ongoing debates about improving the reliability of scientific research.

Model Description

The model is illustrated in Fig. 1. We have also constructed an interactive, web-based tutorial on the conceptual foundations of the model, as well as fully adjustable simulation code, available at http://xcelab.net/replication/.

A population of researchers studies many different hypotheses. Each hypothesis is either true (green) or false (red). These hypotheses could be simple associations, such as "green jelly beans cause acne" [21], or more general claims, such as "evolution is predictable". Research results in either a positive or a negative finding. These findings may be the result of formal hypothesis tests or informal assessments. True hypotheses produce positive findings more often than do false hypotheses, but the researchers never know for sure which hypotheses are true. Under these assumptions, the only information relevant for judging the truth of an hypothesis is its tally, the difference between the number of published positive findings and the number of published negative findings for each hypothesis, and we summarize results in terms of these tallies. In reality, much other information is relevant to judging the truth of an hypothesis. Our assumptions are tactical ones. More complex models of scientific communication are possible, but any such model must include the components in our model, and so our results establish a critical baseline.

Figure 1. Population dynamics of replication. (1) Hypothesis selection: a previously tested hypothesis is selected for replication with probability r, otherwise a novel (untested) hypothesis is selected. Novel hypotheses are true with probability b. (2) Investigation: the probability of a positive (+) or negative (–) result depends on the real truth of the hypothesis, through β and α. (3) Communication: experimental results are communicated to the scientific community with a probability that depends upon both the experimental result (+, –) and whether the hypothesis was novel (N) or a replication (R). Communicated results join the set of tested hypotheses. Uncommunicated replications revert to their prior status; uncommunicated results go to the file drawer. Key: interior = true epistemic state, True (T) or False (F); exterior = experimental evidence, Unknown, Positive (+), or Negative (–).

Each time interval, research activity has three stages that alter these tallies. In stage 1 (Fig. 1, upper-left) each researcher chooses to investigate one of $n$ previously published hypotheses, with probability $r$, or a novel hypothesis, with probability $1 - r$. When replicating, a researcher chooses a previously published hypothesis at random and performs a new study of it. Later, we allow researchers to target hypotheses with specific tally values, rather than choosing at random. A novel hypothesis is true with probability $b$, the base rate, reflecting mechanisms of hypothesis formation. Untutored intuition, for example, may be expected to yield a very low $b$. Genome-wide association studies likewise have low $b$, because relatively few loci are associated with any particular phenotype. There is no consensus on base rate, except that most scientists we know believe their own personal $b$ values are better than average. So we allow $b$ to vary freely in the model.

In stage 2, a true hypothesis produces a positive finding $1 - \beta$ of the time, its power. A false hypothesis produces a positive finding $\alpha$ of the time, its false positive rate. We assume that $1 - \beta > \alpha$. Later we allow the values of $\beta$ and $\alpha$ to differ between replication attempts and initial studies. Note that $\beta$ and $\alpha$ are not merely properties of a statistical procedure, but rather of an entire investigation. For example, using several procedures and selecting the one that produces a positive result will inflate $\alpha$ [22].

In stage 3, findings may be communicated to other researchers. Not every finding is communicated, either because no one tries to communicate it or because it cannot be published. Only communicated findings can adjust a tally. Let $c_{N-}$ be the probability that a negative ($-$) finding about a new (N) hypothesis is communicated. We assume for simplicity that all new positive results are communicated ($c_{N+} = 1$). Let $c_{R-}$ and $c_{R+}$ be the probabilities that replications with negative and positive findings, respectively, are communicated.

These assumptions define the dynamics of the expected numbers of true and false hypotheses with a given tally. We present the full recursions in the Supporting Material. In the simplest case (full communication: $c_{N-} = c_{R-} = c_{R+} = 1$), the expected number $n_{T,s}$ of true hypotheses with an observed tally $s$ in the next time step is given by:

$$ n'_{T,s} = n_{T,s} + a n r \left( -\frac{n_{T,s}}{n} + \frac{n_{T,s-1}}{n}(1-\beta) + \frac{n_{T,s+1}}{n}\beta \right) \qquad (1) $$

where $a > 0$ sets the amount of research activity in each time step relative to $n$. This expression says that the number in the next time step is just the current number plus all of the flows in and out caused by replications. In the case that $s = -1$ or $s = 1$, there is an additional term $an(1-r)b\beta$ or $an(1-r)b(1-\beta)$, respectively, to represent the inflow of novel findings. Recursions $n'_{F,s}$ for false hypotheses are constructed from a change in variables: $1 - \beta \to \alpha$, $b \to 1 - b$. Notice that this implies that the model is easily extended to any number of hypothesis types, such as effect size differences, that differ in power and false-positive rate. We analyze the true/false dichotomy because of its prominence and simplicity.

Analysis
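As a numerical companion to the recursion in Eq. (1), the following Python sketch iterates the full-communication recursion for true and false hypotheses and reports the resulting tally shares. It is an illustration only, not the authors' simulation code: the function name `iterate_recursion`, the truncation of tallies to a finite window `s_max`, the starting population, and the parameter values in the usage lines are all assumptions made for this example.

```python
def iterate_recursion(b, r, beta, alpha, a=0.1, s_max=30, steps=1000):
    """Iterate Eq. (1) and its false-hypothesis analogue under full communication.

    Returns two dicts mapping tally -> share of all published hypotheses.
    Tallies are truncated to [-s_max, s_max] (an assumption of this sketch).
    """
    size = 2 * s_max + 1               # tally s is stored at index s + s_max
    lo, hi = s_max - 1, s_max + 1      # indices of tallies -1 and +1
    n_true = [0.0] * size
    n_false = [0.0] * size
    n_false[hi] = 1.0                  # arbitrary start: one false hypothesis at tally +1

    for _ in range(steps):
        n = sum(n_true) + sum(n_false)  # published hypotheses before this step
        updated = []
        for counts, pos, base in ((n_true, 1 - beta, b), (n_false, alpha, 1 - b)):
            new = counts[:]
            for i in range(size):
                below = counts[i - 1] if i > 0 else 0.0
                above = counts[i + 1] if i < size - 1 else 0.0
                # a*n*r*(-n_s/n + n_{s-1}/n * pos + n_{s+1}/n * (1 - pos))
                new[i] += a * r * (-counts[i] + below * pos + above * (1 - pos))
            # Novel findings enter at tally +1 (positive) or -1 (negative).
            new[hi] += a * n * (1 - r) * base * pos
            new[lo] += a * n * (1 - r) * base * (1 - pos)
            updated.append(new)
        n_true, n_false = updated

    n = sum(n_true) + sum(n_false)
    p_true = {i - s_max: x / n for i, x in enumerate(n_true)}
    p_false = {i - s_max: x / n for i, x in enumerate(n_false)}
    return p_true, p_false


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    p_true, p_false = iterate_recursion(b=0.1, r=0.2, beta=0.2, alpha=0.05)
    for s in range(-2, 4):
        share = p_true[s] + p_false[s]
        print(s, round(p_true[s], 5), round(p_false[s], 5), round(p_true[s] / share, 3))
```

The last printed column is the share of hypotheses at each tally that are true. Because the counts grow every step while the shares settle, the sketch normalizes only at the end; a much longer run would be better served by renormalizing inside the loop.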

By literature review, a tally can be constructed for any given hypothesis. Given an observed tally, but a number of possibly unobserved studies, what is the probability that an hypothesis is correct? The model allows us to address this question for a diversity of scenarios. Before presenting the solutions, note that the answers that the model provides can be understood both from a pure population dynamics perspective and from a probabilistic reasoning perspective. From the dynamics perspective, the population will converge from any initial condition to a unique steady state in which the solutions give frequencies of true hypotheses at each tally value. Equally valid is the epistemological perspective that the solutions tell us for any unique hypothesis the probability it is true, given a state of information [23]. One consequence of this is that the solutions do not require that all hypotheses share the same parameter values.

For each tally value $s$, we solved for the steady state proportions of true and false hypotheses, $\hat{p}_{T,s}$ and $\hat{p}_{F,s}$. We also derived the same solutions under the probabilistic interpretation, and verified our solutions numerically and through stochastic simulation. We present complete analytical solutions in the Supporting Material. In the simplest case (for full communication), solutions take the form:

$$ \hat{p}_{T,s} = b(1-r) \sum_{m=1}^{\infty} r^{m-1} \binom{m}{\frac{1}{2}(m+s)} (1-\beta)^{\frac{1}{2}(m+s)} \beta^{\frac{1}{2}(m-s)} \qquad (2) $$

where terms in which $\frac{1}{2}(m+s)$ is not an integer between 0 and $m$ are zero. This expression defines an infinite geometric series of binomial probabilities arising from all of the different possible histories by which a true hypothesis could achieve a tally of $s$, for every possible number of findings $m$. In the majority of cases, only the first few terms of the series are important, because of the leading factor $r^{m-1}$. This fact also informs us that the rate of convergence to steady state will be quite rapid, unless $r$ is large.

For any particular tally, for example $s = 1$, expression (2) yields a closed-form solution like:

$$ \hat{p}_{T,1} = \frac{b(1-r)}{2\beta r^{2}} \left( \left( 1 - 4 r^{2} \beta (1-\beta) \right)^{-1/2} - 1 \right) \qquad (3) $$

For arbitrary communication parameters, the solutions have a similar structure, but are instead a series of multinomial probabilities in which the events are combinations of findings ($+$ or $-$) and communication outcomes.

These solutions are not easy to interpret by inspection. But they do provide answers to the question: what is the probability that an hypothesis with a given tally is correct? For any tally $s$, we can calculate:

$$ \Pr(\text{true} \mid s) = \frac{\hat{p}_{T,s}}{\hat{p}_{T,s} + \hat{p}_{F,s}}, \qquad \Pr(s \mid \text{true}) = \frac{\hat{p}_{T,s}}{\sum_i \hat{p}_{T,i}}, \qquad \Pr(s \mid \text{false}) = \frac{\hat{p}_{F,s}}{\sum_i \hat{p}_{F,i}} \qquad (4) $$

The precision of a tally $s$ is $\Pr(\text{true} \mid s)$, the proportion of hypotheses with tally $s$ that are true. The sensitivity, $\Pr(s \mid \text{true})$, is the proportion of true hypotheses with tally $s$. It indicates where the true hypotheses are. Sensitivity is important because a high precision for a tally $s$ is little help when there are few hypotheses that achieve a tally $s$. And the specificity, $\Pr(s \mid \text{false})$, is the proportion of false hypotheses with tally $s$, indicating where the false hypotheses are. We use these definitions to explain the behavior of the system.

Overall dynamics.

Fig. 2 describes the overall dynamics of precision, as a function of the different parameters. In each panel, the trend lines show the proportion of true hypotheses at each tally on the vertical axis. The tally corresponding to each trend is indicated by a number. The horizontal axis in each panel varies a single parameter. Each vertical hairline shows the value of each parameter that is held constant in other panels. This figure is complex. We'll use it to highlight the most important factors in the reliability of findings and demonstrate counter-intuitive aspects of communication. Then in the next section, we'll turn to a more general explanation of the causes of these results.

There are two clusters of plots. The top cluster represents a normatively optimistic scenario, with an auspicious base rate ( b = − β = α = b = − β = α = b < − ), to predicting the winner of a presidential election, on the high end ( b = s = s = s = b = s = s = s = s = b = r in panel (b), has remarkably little impact. This is because replication impacts the rate at which hypotheses reach different tallies, but not so much the precision at each tally. Therefore at low replication rates, few hypotheses will ever attain $s = 5$, but those that do are almost certainly true. We expand on this point in the next section.

Third, communication of findings, panels (e-g), can either assist discovery or hinder it. Suppression of negative replications (e) reduces precision. But suppression of positive replications (f) and novel negative findings (g) either improves precision or has almost no impact on it. These aspects of the population dynamics are counter-intuitive, but quite general and revealing. The next section explains them.

Figure 2. Effects of base rate, replication, power, false-positives, and communication on the probability that an hypothesis with a given tally is true. The two clusters (optimistic scenario, pessimistic scenario) illustrate different scenarios. Panels: (a) base rate ($b$), (b) replication rate ($r$), (c) power ($1-\beta$), (d) false-positive rate ($\alpha$), (e) communication of negative replications ($c_{R-}$), (f) communication of positive replications ($c_{R+}$), (g) communication of negative new findings ($c_{N-}$). Vertical axis: proportion true. The blue trends, each labeled with its tally value, show precision as it varies by the parameter on each horizontal axis. The numbers indicate the tally of a curve. Dashed curves are tallies of an even number. The vertical hairlines show the parameter values held constant across panels within the same cluster.
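The precision curves discussed in this section can be recomputed directly from Eqs. (2) and (4). The sketch below is a minimal illustration under stated assumptions: the function names, the truncation of the infinite series at `m_max` terms, and the parameter values in the usage lines are choices made for this example, not values from the paper. The false-hypothesis proportions use the change of variables given with Eq. (1), that is $1-\beta \to \alpha$ and $b \to 1-b$.

```python
from math import comb, sqrt


def steady_state(s, base, pos, r, m_max=400):
    """Eq. (2), truncated at m_max findings (truncation is an assumption here).

    base: prior share of this hypothesis type (b for true, 1 - b for false)
    pos:  probability of a positive finding (1 - beta for true, alpha for false)
    """
    total = 0.0
    for m in range(max(1, abs(s)), m_max + 1):
        if (m + s) % 2:           # tally s is reachable only if m and s share parity
            continue
        k = (m + s) // 2          # number of positive findings among m
        total += r ** (m - 1) * comb(m, k) * pos ** k * (1 - pos) ** (m - k)
    return base * (1 - r) * total


def closed_form_tally_one(b, r, beta):
    """Eq. (3): closed form for true hypotheses at tally s = 1."""
    x = r * r * beta * (1 - beta)
    return b * (1 - r) / (2 * beta * r * r) * (1 / sqrt(1 - 4 * x) - 1)


def precision(s, b, r, beta, alpha):
    """Eq. (4): share of hypotheses at tally s that are true."""
    p_true = steady_state(s, b, 1 - beta, r)
    p_false = steady_state(s, 1 - b, alpha, r)
    return p_true / (p_true + p_false)


if __name__ == "__main__":
    b, beta, alpha = 0.1, 0.2, 0.05   # illustrative values only
    print("series vs closed form:",
          steady_state(1, b, 1 - beta, 0.2), closed_form_tally_one(b, 0.2, beta))
    for r in (0.1, 0.3, 0.5):
        print(r, [round(precision(s, b, r, beta, alpha), 4) for s in range(0, 5)])
```

Printing precision for several values of `r` gives one way to examine the statement that the replication rate changes precision at a given tally only modestly, at least for the illustrative parameters chosen here.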

Dynamics of communication.

The "file drawer problem" [13] arises when the failure to publish negative findings distorts the estimated strength of an association. We consider a related phenomenon by asking how changes in the communication parameters $c_{N-}$, $c_{R-}$, and $c_{R+}$ alter the precision, sensitivity, and specificity across tallies. In the process, we'll have opportunity to explain the joint dynamics of research quality and communication biases.

In this model, it is rarely best to communicate everything. In the Supporting Material, we prove for the case of small $b$ (such that $b \approx 0$) and small $r$ ($r \approx 0$) that c N − < α < β (usually satisfied), that c R − < α > (hopefully never satisfied), and that c R + < β − α ≤ (often satisfied). So some suppression of novel negative findings ( c N − < c R + < 1) can improve the value of replication. At larger $b$ and $r$, the conditions are more complicated, but the qualitative finding remains intact.

To grasp why suppressing findings might help us learn what is true, think of replication as epistemological chromatography. Chromatography is a set of techniques for separating substances that are mixed together. For example, mixed plant pigments can be separated by painting the mixture onto the tip of a strip of filter paper and then soaking the tip in a solvent. Different pigments bind more or less strongly to the solvent or the paper. Therefore as the paper absorbs the solvent, different pigments travel at different speeds, eventually separating and appearing as differently colored bands on the paper. In the epistemological case, it is true and false hypotheses that are mixed. We wish to separate the true ones from the false. Replication applies a "solvent" that diffuses false hypotheses towards negative tallies and true hypotheses towards positive tallies. A true hypothesis diffuses upwards with probability $(1-\beta)c_{R+}$, while a false hypothesis diffuses downwards with probability $(1-\alpha)c_{R-}$. Thus the communication parameters adjust rates of diffusion. Just as manipulating rates of chemical diffusion can improve real chromatography, manipulating communication can improve epistemological chromatography.

In Fig. 3, we turn on communication one parameter at a time, in order to explain the contribution of each mode of communication to the resulting population dynamics. All four panels (a, b, c, d) show steady state precision, sensitivity, and specificity and use b = r = − β = α = any parameters the reader chooses.

Figure 3. Replication and communication as epistemological chromatography. Each panel plots proportion against tally. Precision is indicated in blue, sensitivity in orange, and specificity in gray.
(a) Positive only: $c_{N-} = 0$, $c_{R-} = 0$, $c_{R+} = 1$. Only positive findings are initially communicated, and replication can only increase tallies, which are here counts of positive findings. True hypotheses diffuse upward faster than false ones. So large tallies have a high precision, the proportion of true hypotheses.
(b) Negative only: $c_{N-} = 0$, $c_{R-} = 1$, $c_{R+} = 0$. Tallies can only decrease. False hypotheses diffuse down faster than true ones. But since the mixture at tally +1 is mostly false, precision is always low.
(c) Screen and check. Solid: $c_{N-} = 0$, $c_{R-} = 1$, $c_{R+} = 1$. Up diffusion of true hypotheses is aided by down diffusion of false ones from the mixed source tally +1. Compare precision to (a). Dashed: $c_{N-} = 0$, $c_{R-} = 1$, $c_{R+} = 0.2$. Suppressing positive replications regulates the rate of up diffusion, purifying high tallies at the price of sensitivity.
(d) Total communication: $c_{N-} = c_{R-} = c_{R+} = 1$. True and false hypotheses diffuse in both directions, and everything is communicated. Since most effort investigates new findings at tally –1, few hypotheses ever achieve a high tally, but those that do have high precision.
Note that for sensitivity and specificity, probability above/below the highest/lowest tally displayed is added up on the highest/lowest tally, so that none of the probability mass is hidden.

In the first three panels (a, b, c), only positive initial findings are communicated, and all new hypotheses appear at tally $s = 1$. The mixture of hypotheses at this tally is heavily skewed towards false hypotheses, and so has a low precision. Replication may cause an hypothesis to diffuse in either direction, depending upon communication. In panel (a), negative findings are never communicated. But since true hypotheses diffuse up at a rate $1 - \beta$ and false ones only at a rate $\alpha < 1 - \beta$, truth is slowly separated from falsity. At tallies of 8 or more, nearly all hypotheses are true, as indicated by the precision. Note however that most true hypotheses that have been communicated at all exist at low tallies, as indicated by the sensitivity. With enough time and replication, every true hypothesis can be split from the false. This is unlike the case in panel (b), where only negative replications are communicated. The same dynamic works in reverse here, and replication creates a pure sample of false hypotheses at low tallies.

Combining both directions of diffusion is synergistic, as illustrated in panel (c). Now both positive and negative replications are communicated. The downward diffusion of false hypotheses makes the upward diffusion of true hypotheses more efficient. This effect arises because $1 - \alpha > 1 - \beta$. False hypotheses diffuse down faster than true hypotheses diffuse up. This purifies the source mixture at $s = 1$, allowing for precision to approach high values at much smaller tallies than in the absence of either diffusion process. In this example, hypotheses with tallies of s = c R + < 1, we have effectively slowed all upward diffusion. This allows rapid downward diffusion from negative replications to further clean the source mixture, but at the cost of diffusing more true hypotheses towards negative tallies. This dynamic is beneficial when base rate is especially low. So we achieve a very clean sample of truth at smaller positive tallies in this scenario, but at the price of finding fewer true hypotheses in total. Whether this is an improvement depends upon context, an issue we take up in the discussion.

Finally, full communication is illustrated in panel (d). High precision is achieved at high tallies, but few hypotheses reside at those tallies. This inefficiency arises from the unbiased allocation of replication effort. When all initial findings are communicated, replication effort is overwhelmed by following up on initial negative findings, the spike in specificity seen at tally $s = -1$. When the base rate is low, it can be better to screen for positive findings than to publish every negative finding. Note however that increasing precision, the proportion of hypotheses at a given tally that are true, is not necessarily the only objective. It does us little good if sensitivity is very low at all high tally values. We return to this point in a later section, when we consider differential power and false-positive rates between initial studies and replications.
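To experiment with the communication regimes discussed in this section, the Python sketch below solves for steady-state tally shares under arbitrary $c_{N-}$, $c_{R-}$, and $c_{R+}$. It should be read with caution: the full recursions are given only in the Supporting Material, so the balance equation used here is an assumption, reconstructed from the verbal model description (communicated novel findings enter at tally $\pm 1$, communicated replications move a tally by one, uncommunicated replications leave it unchanged). The function names, the finite tally window `s_max`, the fixed-point iteration, and the parameter values are likewise hypothetical choices for illustration.

```python
def steady_state_with_communication(b, r, beta, alpha, c_n_neg, c_r_neg, c_r_pos,
                                    s_max=40, tol=1e-14, max_iter=100000):
    """Steady-state tally shares under partial communication (assumed balance equation).

    For each hypothesis type the shares p_s are taken to satisfy
        p_s * ((1 - r) * growth + r * (up + down))
            = r * (p_{s-1} * up + p_{s+1} * down) + novel inflow at s = +1 or -1,
    solved here by simple fixed-point iteration on tallies -s_max..s_max.
    """
    size = 2 * s_max + 1
    lo, hi = s_max - 1, s_max + 1          # indices of tallies -1 and +1
    # (prior share, probability of a positive finding) for each hypothesis type
    kinds = {"true": (b, 1 - beta), "false": (1 - b, alpha)}
    # Fraction of novel studies that are communicated; sets how fast the literature grows.
    growth = sum(base * (pos + (1 - pos) * c_n_neg) for base, pos in kinds.values())
    result = {}
    for name, (base, pos) in kinds.items():
        up = pos * c_r_pos                 # communicated positive replication: tally + 1
        down = (1 - pos) * c_r_neg         # communicated negative replication: tally - 1
        inflow = [0.0] * size
        inflow[hi] = (1 - r) * base * pos
        inflow[lo] = (1 - r) * base * (1 - pos) * c_n_neg
        denom = (1 - r) * growth + r * (up + down)
        p = [0.0] * size
        for _ in range(max_iter):
            new = [(inflow[i]
                    + r * ((p[i - 1] * up if i > 0 else 0.0)
                           + (p[i + 1] * down if i < size - 1 else 0.0))) / denom
                   for i in range(size)]
            change = max(abs(x - y) for x, y in zip(new, p))
            p = new
            if change < tol:
                break
        result[name] = {i - s_max: p[i] for i in range(size)}
    return result


def precision_at(state, s):
    """Share of hypotheses at tally s that are true."""
    return state["true"][s] / (state["true"][s] + state["false"][s])


if __name__ == "__main__":
    params = dict(b=0.1, r=0.2, beta=0.2, alpha=0.05)   # illustrative values only
    regimes = {
        "positive only": (0, 0, 1),
        "screen and check": (0, 1, 1),
        "total communication": (1, 1, 1),
    }
    for label, (c_n_neg, c_r_neg, c_r_pos) in regimes.items():
        state = steady_state_with_communication(**params, c_n_neg=c_n_neg,
                                                c_r_neg=c_r_neg, c_r_pos=c_r_pos)
        print(label, [round(precision_at(state, s), 4) for s in (1, 2, 3)])
```

With all three communication parameters set to 1, the assumed balance equation reduces to the full-communication case, which gives a basic consistency check against Eq. (2). Comparing the printed precisions across regimes is one way to explore the chromatography argument, within the limits of the assumptions above.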

Targeted replication.

Replication in the preceding analysis is purely random: every communicated hypothesis has an equal chance of being the target of a replication effort. Targeting particular tally values, like s = r T of all replication attempts target a chosen list of tally values, selecting an hypothesis randomly from all hypotheses within the list. For example, this list might consist of all previously communicated hypotheses with a positive tally of three or less, so that researchers concentrate their replication efforts on hypotheses thought to be true but with relatively high uncertainty. The rest of the time, $1 - r_T$, replication effort remains unbiased.

Fig. 4 shows the resulting modification of the dynamics. The dashed curves in these plots show the steady-state dynamics in the absence of targeting. The shaded pink regions show the range of tally values included in the target. In each case, targeting improves sensitivity at higher positive tallies. Thus it helps to diffuse true hypotheses towards tallies with very high precision. But there is very little effect on precision itself. Targeting helps because it directs effort towards tallies that may not have a high density of hypotheses. When replication effort is unbiased, most effort is directed to tallies where the bulk of hypotheses reside. Therefore when the target includes a wide range of tallies, as in panel (c), targeting becomes relatively ineffective.

Why doesn't targeting improve the proportion of hypotheses that are true at higher tallies? Targeting serves mainly to speed up diffusion, without altering the relative rates at which true and false hypotheses diffuse. Changes in communication rates, in contrast, do alter the differential rates of diffusion, and so may dramatically alter precision, as seen in the previous section.

Differential power and false-positives.

So far, we have assumed that power $1 - \beta$ and false-positive rate $\alpha$ are the same in initial studies and replications. Differences between initial studies and replications have been at the center of concerns about replication [9]. Here we analyze a version of our model in which we allow the power and false-positive rate to vary. Let $1 - \beta_R$ and $\alpha_R$ be the power and false-positive rate, respectively, for replications. What effects do both higher-powered replication and lower-powered replication have on dynamics?

Figure 4. Targeted replication effort. Panels (a), (b), (c) plot proportion against tally. In all three plots, tallies marked for targeted replication are shown by the shaded region. Precision is indicated in blue, sensitivity in orange, and specificity in gray. Baseline parameters set to b = α = r = r T = c N − = c R − = c R + = 1. Dashed curves display steady-state without targeted replication, $r_T = 0$. (a) High power setting, 1 − β = − β = − β = s =

In Fig. 5, we present two extreme, illustrative scenarios. Both scenarios use b = c N − = c R − = c R + = r = r T = − β = α = − β R = α R = − β = α = − β R = α R = false ones. Unfortunately, it also diffuses many true hypotheses towards negative tallies. The high precision at positive tallies is a result of a false hypothesis' relative inability to attain a positive replication, not a result of a true hypothesis' ability to avoid a negative replication.

In the last two panels, (c) and (d), we show how these scenarios change when negative replications are suppressed, $c_{R-} = 0.1$.

Figure 5. Differential power and replication dynamics. Panels plot proportion against tally: (a) Low/high, $c_{R-} = 1$; (b) High/low, $c_{R-} = 1$; (c) Low/high, $c_{R-} = 0.1$; (d) High/low, $c_{R-} = 0.1$. Precision is indicated in blue, sensitivity in orange, and specificity in gray. (a) Low power initial studies (1 − β = α = − β R = α R = − β = α = − β R = α R =

Discussion

Ours is the first analytical model of the joint population dynamics of scientific hypothesis generation, communication, and replication. Such a model is necessary to illuminate debates about scientific practice, because until researchers report the results of every study, empirical estimates of base rate are not possible. And without consideration of population dynamics, any discussion of the value of research findings remains at least partly naïve, because it is notoriously difficult to reason verbally about complex systems. Our model produces a number of valuable counter-intuitive results. But even when its results are intuitive, some model like ours is needed to demonstrate their logic. It is not enough to merely hold the correct belief; we must also justify that belief.

This model is not a definitive representation of the scientific process, nor does it aim to be. It omits many relevant factors, such as investigator bias and disagreements about the interpretation of evidence. These omissions allow the model to address focused questions about the evidential value of research as it emerges from the joint dynamics of hypothesis generation, replication, and communication. Models that account for more and different factors must also include variants of these complex dynamics, so our model is a necessary and useful first step.

Our analysis re-emphasizes what every textbook says: replication is an essential aspect of scientific discovery. However, it also quantifies its impact and emphasizes that replication itself can be unreliable: the factors that make initial findings unreliable also make replication less reliable. When base rate is low, power is low, or false positives common, then many successful replications will be needed to attain confidence in an hypothesis. This is especially true when negative replications are difficult to publish.

We find that low base rate and high false positive rate are the most important threats to the effectiveness of research, replicated or not. This re-emphasizes the importance of quality theorizing, in order to improve base rate. While it is appealing to think that science works regardless of where hypotheses come from, undisciplined hypothesis generation reduces base rate and makes initial findings mostly false. Then large amounts of replication will be needed to uncover the truth. In fields such as physics and evolutionary biology, a great deal can be and is done to vet theory in the realm of pure thought, using mathematics and simulation.
But in fields such as social psychology, theory development is rarely formalized [27].

The results also re-emphasize the value of efforts to suppress false positive findings, such as pre-registered data analysis plans. It is important to recognize that any single scientific hypothesis may correspond to many different statistical hypotheses. If a statistical hypothesis can be chosen after seeing the data, reasonable scientific hypotheses can become unreasonably flexible [28]. And many data-contingent transformations and modeling choices that increase power, conditional on an hypothesis being true, will also increase false-positives, conditional on the hypothesis being false. For example, dropping outliers may well aid discovery, if the hypothesis is true. But it may also dramatically inflate false-positives, if the hypothesis is not true [29].

Our model immediately informs debates over the meaning of failed replications. For example, some have suggested that positive replications have more worth than negative replications [12], or even that failed replications "cannot contribute to a cumulative understanding of scientific phenomena" [30]. We find the opposite: communicating a failure to replicate is typically more informative than communicating a successful replication. This remains true even when replication attempts have lower power than original studies. However, a single failure to replicate is entirely consistent with a true hypothesis in many scenarios. So both positive and negative replications may be regarded with skepticism. But neither is without value. Of course our model is merely a model. But unlike the verbal arguments we cite, it is at least clear in its assumptions, and its logic can be verified.

Our model also sheds light on proposals for improving the reliability of research.
For example, many have called for pre-registration and review with a commitment from journals to publish research results, positive or negative, in order to reduce under-reporting of negative findings [31]. Our analysis suggests that these proposals should distinguish between new hypotheses and replication attempts. If indeed many new hypotheses are false in many fields, a pre-registration process would merely fill journal pages with null findings, doing great harm by crowding out candidate hypotheses that have passed an initial screening. In our model, there is little harm in ignoring novel negative findings, because they add very little information. Indeed, Figure 2 illustrates that the effect of ignoring novel negative results on precision is negligible. In contrast, a negative replication may add a lot of information. We suspect however that our model exaggerates this effect, because the model ignores the wasted effort arising from different researchers repeating an investigation in ignorance of one another's negative findings. And there are certainly fields in which full publication may be the best policy, such as when false-positive rates are low or when the total number of testable hypotheses is very small. Nevertheless, the qualitative difference in information value between novel and follow-up negative findings will remain as long as the base rate in the published literature is higher than it is in novel investigations.

The model stimulates empirical investigation by clarifying which factors must be estimated in order to gauge the evidential value of research, as well as being readily translatable into a statistical framework, due to its analytical specification. Our model provides an implicit 'null model' of research: setting $b =$ […] a priori false, but have nevertheless played an important role in science [20].

There are additional factors to address in future work.
Our model ignores researcher bias, multiple testing, and data snooping, each of which deflates base rate or inflates false-positive rate. Our analysis is framed in a standard, but unsatisfying, "true" and "false" classification, rather than considering practical significance and effect size estimation [26]. Our model can be directly generalized to consider variation in effect size instead of true and false hypotheses. We explain this generalization in the Supporting Material. However, our model does not directly address causal inference nor point estimation.

Incentives also matter. A dynamic analysis of strategic behavior under different incentive structures would aid policy analysis [18]. As Karl Popper argued, science does not work because scientists are selfless and unbiased people. Rather it works because its institutions channel our bias into the production of public goods [32]. In particular, we worry that a research environment that lacks replication may actually select for statistical practices that inflate false-positives, as labs with such practices can more readily publish findings and place students in new positions, all while outrunning the truth.

Replication may offer other benefits that are not accounted for in our model. A failed replication may be valuable because it inspires a new hypothesis in order to explain variation in findings. When findings do not generalize across samples, this creates an opportunity to explain the variation [33, 34]. In our view, the goal of replication is not merely to find the same result, but also to discover how a result arises and how it is likely to vary in realistic, non-laboratory, contexts.

Despite these shortcomings, our model provides specific quantitative evaluations of many verbal arguments, as well as drawing attention to the population dynamics of scientific knowledge. Science is a subtle project. Understanding it demands the same rigor that we apply to projects within science itself.

SUPPLEMENTAL INFORMATION

Replication, Communication, and the Population Dynamics of Scientific Discovery

1. DERIVATION OF FULL MODEL WITH RANDOM REPLICATION

Let $f_{T,s} = n_{T,s}/n$ be the frequency of true hypotheses with tally $s$. Under the assumptions and definitions supplied in the main text, the full recursion for $n'_{T,s}$ is given by:

$$
n'_{T,s} = n_{T,s} + anr\Big(-f_{T,s}\big(c_{R+}(1-\beta) + c_{R-}\beta\big) + f_{T,s-1}(1-\beta)c_{R+} + f_{T,s+1}\,\beta c_{R-}\Big) \tag{5}
$$

for $s$ not equal to $1$ or $-1$. In those cases, there is an additional term. For $s = 1$:

$$
n'_{T,1} = n_{T,1} + anr\Big(-f_{T,1}\big(c_{R+}(1-\beta) + c_{R-}\beta\big) + f_{T,0}(1-\beta)c_{R+} + f_{T,2}\,\beta c_{R-}\Big) + an(1-r)\,b\,(1-\beta) \tag{6}
$$

The $an(1-r)b(1-\beta)$ term accounts for inflow of novel positive findings, all of which are communicated. For $s = -1$:

$$
n'_{T,-1} = n_{T,-1} + anr\Big(-f_{T,-1}\big(c_{R+}(1-\beta) + c_{R-}\beta\big) + f_{T,-2}(1-\beta)c_{R+} + f_{T,0}\,\beta c_{R-}\Big) + an(1-r)\,b\,\beta\,c_{N-} \tag{7}
$$

The $an(1-r)b\beta c_{N-}$ term accounts for inflow of novel negative findings, only $c_{N-}$ of which are communicated. Recursions for false hypotheses can be derived just by substitution of variables: $b \to 1-b$ and $1-\beta \to \alpha$.

These recursions implicitly define the population growth recursion for $n$:

$$
n' = n + an(1-r)\Big(b\,(1-\beta+\beta c_{N-}) + (1-b)\big(\alpha + (1-\alpha)c_{N-}\big)\Big) \tag{8}
$$

This just indicates that the population of published hypotheses grows in proportion to the innovation rate, $1-r$, and the rates at which true and false hypotheses respectively produce positive and negative findings, as well as the rate at which negative findings are communicated.

2. BEYOND "TRUE" AND "FALSE"

Above we noted that recursions for false hypotheses can be derived just by substitution of variables: $b \to 1-b$ and $1-\beta \to \alpha$. In other words, true and false hypotheses are differentiated only by the rate at which they appear in new investigations and their respective probabilities of producing positive findings. This also means it is straightforward to expand the model to additional epistemic states, as "true" and "false" really just mean more and less correct. For example, small, medium, and large effect sizes could be represented by three states, each with its own base rate and probability of producing a positive result. The derivation would remain the same, but an additional set of steady-state solutions would appear.

3. STEADY-STATE SOLUTIONS
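As a concrete reference point for this section, the recursions of Section 1 can simply be iterated until the tally frequencies stop changing. The Python sketch below is illustrative only and is not the authors' code. It assumes Eqs. (5)-(8) as written above, divides each update by $n'/n$ so that it tracks frequencies rather than counts, confines tallies to a finite range, and uses the same power for initial studies and replications. The function name, the `states` dictionary, and all parameter values are hypothetical choices made for the example. Because each state is described only by its share of new hypotheses and its probability of a positive finding, further epistemic states in the sense of Section 2 can be added as extra dictionary entries.

```python
def iterate_recursions(states, r, c_n, c_rm, c_rp, a=1.0, s_max=12, steps=500):
    """Iterate the tally recursions on frequencies freq[state][s] = n[state][s] / n.

    states maps a label to (share of new hypotheses, Pr(positive finding)).
    Tallies are confined to -s_max..s_max; moves past either end are ignored.
    Index i in each list corresponds to tally s = i - s_max.
    """
    size = 2 * s_max + 1
    freq = {name: [0.0] * size for name in states}
    # Eq. (8): relative growth of the published population per time step.
    growth = a * (1 - r) * sum(
        share * (pos + (1 - pos) * c_n) for share, pos in states.values()
    )
    for _ in range(steps):
        updated = {}
        for name, (share, pos) in states.items():
            old = freq[name]
            new = old[:]
            for i in range(size):
                if i + 1 < size:  # communicated positive replication: s -> s + 1
                    flow = a * r * old[i] * pos * c_rp
                    new[i] -= flow
                    new[i + 1] += flow
                if i > 0:  # communicated negative replication: s -> s - 1
                    flow = a * r * old[i] * (1 - pos) * c_rm
                    new[i] -= flow
                    new[i - 1] += flow
            # Novel findings enter at s = +1 (always communicated) and at s = -1.
            new[s_max + 1] += a * (1 - r) * share * pos
            new[s_max - 1] += a * (1 - r) * share * (1 - pos) * c_n
            # Dividing by n'/n = 1 + growth turns counts back into frequencies.
            updated[name] = [x / (1 + growth) for x in new]
        freq = updated
    return freq


# Illustrative values: base rate 0.1, power 0.8, false-positive rate 0.05.
states = {"true": (0.1, 0.8), "false": (0.9, 0.05)}
freq = iterate_recursions(states, r=0.2, c_n=1.0, c_rm=1.0, c_rp=1.0)
s_max = (len(freq["true"]) - 1) // 2
for s in (-1, 0, 1, 2):
    print(s, freq["true"][s + s_max], freq["false"][s + s_max])
```

Starting from empty tallies is a convenience: the update is a contraction, so any starting point approaches the same frequencies, and the activity rate `a` only affects how fast it gets there (it should be kept small enough that `a * r` does not exceed 1).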

We have analyzed this model using a variety of methods. First, we solved the model analytically for every structure except for targeted replication (to be defined later). Second, when analytical solution was not possible, we solved the model numerically. Third, we studied the model under both deterministic and stochastic simulations, written independently by both authors in different programming languages. All forms of analysis yield identical results.

The model above can be solved directly, in one of two ways. First, it can be solved exactly by bounding tallies within a minimum and maximum (using either absorbing or reflecting boundaries) and then solving the system of simultaneous equations for values of the state variables $f_{i,s}$ for $i \in \{T, F\}$. This approach is probably the most straightforward. Second, it can be solved to any level of approximation desired by iteratively solving the system of equations outward from $s =$ […] ansatz is what our mathematics instructors used to call it. Since the solutions from the brute-force approach looked like closures of infinite series, and the simulation results produced what resembled a mixture of geometric series, we guessed the underlying limiting distribution. We then verified our ansatz solution by plugging it back into the recursions and also by comparing it to numerical results and our previous solutions. Finally, we induced the infinite series representation by constructing Taylor series expansions of the closed series expressions, yielding the sequential terms of the solution expression in the next section.

3.1. Full communication solution.

Here we repeat the simplest such solution from the main text and then motivate its justification. The steady state proportion of hypotheses that are both true and have tally $s$, when all findings are communicated, is given by:

$$
\hat{p}_{T,s} = b(1-r)\sum_{m=1}^{\infty} r^{m-1}\, K\big(m, (m+s)/2\big)\,(1-\beta)^{(m+s)/2}\,\beta^{(m-s)/2} \tag{9}
$$

where $K(m, (m+s)/2)$ is the number of ways to get $(m+s)/2$ positive findings in $m$ investigations of the same hypothesis. This is simply the binomial chooser, but implicitly evaluating to zero whenever $(m+s)/2$ is not an integer. Since $s$ is the difference between the number of positive and negative findings, this multiplicity accounts for the number of paths by which an hypothesis can be studied $m$ times and end up with a tally $s$. The remaining terms leading with $1-\beta$ and $\beta$ are just the probabilities of getting $(m+s)/2$ positive findings and $(m-s)/2$ negative findings, respectively.

Here's how to motivate the above solution. For any given tally $s$, there are an infinite number of histories by which it could have ended up with that tally.

• Consider tally $s = 1$, for example. If the hypothesis is true, it could end up most simply at $s = 1$ after a single investigation that yields a positive finding. This has probability $(1-r)b(1-\beta)$, indicating innovation times base rate of true hypotheses times the probability of an initial positive finding.
• Similarly, if instead the hypothesis has been studied twice, which happens $(1-r)br$ of the time, the number of ways it could end up with $s = 1$ is $K(2, (2+1)/2) = 0$, since $(2+1)/2$ is not an integer.
• For three studies, there are $K(3, 2) = 3$ ways to end up with $s = 1$: (1) $++-$, (2) $+-+$, and (3) $-++$. The probability of any one of these is $(1-\beta)^2\beta$, and the probability that an hypothesis is true and has been studied three times is $(1-r)br^2$.

The pattern here generalizes so that the total probability is just:

• the sum over number of studies on an hypothesis from $m = 1$ to $m = \infty$ of the probability the hypothesis was studied $m$ times, given by $(1-r)r^{m-1}$
• times the number of ways it could end up with a tally $s$ in $m$ steps, given by $K(m, (m+s)/2)$
• times the probability of getting $(m+s)/2$ positive and $(m-s)/2$ negative findings.

Writing down this summation and factoring out the common term $b(1-r)$ completes the expression.

This steady-state solution obviously assumes that there has been an infinite amount of research time, such that every $m$ can be realized. In practice, since the sequence is geometric in $r$, the probabilities of higher values of $m$ decline very rapidly and simulations confirm that steady-state is reached quite rapidly, as long as the replication rate $r$ is not close to $r = 1$. […] real dynamical system, they are never exactly realized. For example, problems in evolutionary theory are routinely solved by asking what happens on the infinite time horizon. Such solutions have been incredibly useful, despite the fact that no real population or environment is stationary enough to make the exercise literally sensible.

3.2. Arbitrary communication solution.
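As a baseline for the partial-communication case, the full-communication solution of Eq. (9) can be evaluated directly by truncating the series. The Python sketch below is illustrative and not the authors' implementation. It assumes that the series is cut off at a hypothetical `m_max`, that false hypotheses follow from the substitution $b \to 1-b$ and $1-\beta \to \alpha$, and that precision at a tally is the true share of all hypotheses holding that tally. Function names and parameter values are invented for the example.

```python
from math import comb


def tally_prob(s, share, pos, r, m_max=200):
    """Truncated Eq. (9): steady-state proportion with tally s for one kind of hypothesis.

    share is b for true hypotheses and 1 - b for false ones;
    pos is the chance of a positive finding (1 - beta if true, alpha if false).
    """
    total = 0.0
    for m in range(max(1, abs(s)), m_max + 1):
        if (m + s) % 2:  # K(m, (m + s) / 2) is zero when (m + s) / 2 is not an integer
            continue
        k = (m + s) // 2  # number of positive findings among m investigations
        total += r ** (m - 1) * comb(m, k) * pos**k * (1.0 - pos) ** (m - k)
    return share * (1.0 - r) * total


def precision(s, b, r, alpha, beta):
    """Share of hypotheses with tally s that are true."""
    true_part = tally_prob(s, b, 1.0 - beta, r)
    false_part = tally_prob(s, 1.0 - b, alpha, r)
    return true_part / (true_part + false_part)


# Illustrative values: base rate 0.1, replication rate 0.2, alpha 0.05, power 0.8.
for s in (1, 2, 3):
    print(s, precision(s, b=0.1, r=0.2, alpha=0.05, beta=0.2))
```

With these values, precision climbs with each additional net positive finding, which is the ratchet behaviour described in the main text. Truncating `m_max` to 3 reproduces the two nonzero histories for $s = 1$ listed in Section 3.1.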

When communication parameters are allowed to be less than one, the above strategy generalizes directly, but the expressions become much more complex, because now the infinite series is over multinomial probabilities of three possible outcomes at each replication investigation of an hypothesis: (1) positive and communicated, (2) negative and communicated, or (3) not communicated. In addition, when findings are not always communicated, then the effective activity rate changes, making other probabilities conditional on observable activity. Still, these solutions can be derived either by the logic to follow or by brute-force solution of the system of recursions. Solving the system of recursions does allow for easily defining reflecting or absorbing tally boundaries, which may be appealing in some contexts. The combinatoric solution to follow assumes unbounded tallies. Solutions in the bounded and unbounded cases are nearly identical, for all scenarios considered in the main text. The Mathematica notebooks in the supplemental materials present code for both types of solution.

We present the solutions here as a sequence of conditional probabilities, as we've found this form easier to interpret, and therefore more insightful, than the general multinomial form. Specifically, we decompose the multinomial probabilities into a binomial series for observed/unobserved investigations of a hypothesis and a binomial series for positive/negative findings conditional on being observed. The solutions take the form:

$$
\hat{p}_{T,s} = \Pr(T)\,\Pr(\text{activity})\,\Pr(\text{new} \mid \text{activity})\,\Big((1-\beta)\Pr(s \mid +) + \beta\, c_{N-}\Pr(s \mid -)\Big) \tag{10}
$$

Where:

$$
\Pr(T) = b \tag{11}
$$

$$
\Pr(\text{activity}) = r + (1-r)\Big(b\big((1-\beta) + \beta c_{N-}\big) + (1-b)\big(\alpha + (1-\alpha)c_{N-}\big)\Big) \tag{12}
$$

$$
\Pr(\text{new} \mid \text{activity}) = \frac{(1-r)\Big(b\big((1-\beta) + \beta c_{N-}\big) + (1-b)\big(\alpha + (1-\alpha)c_{N-}\big)\Big)}{\Pr(\text{activity})} \tag{13}
$$

The probabilities $\Pr(s \mid +)$ and $\Pr(s \mid -)$ give the probabilities of tally $s$ averaging over number of investigations $m$ and un-communicated findings $u$, beginning with either a positive finding or a negative finding, respectively. This conditioning is necessary because a tally $s$ can be reached by different paths once communication is partial. These probabilities are given by:

$$
\Pr(s \mid +) = I_{1}(s) + \sum_{m=1}^{\infty}\sum_{u=0}^{m} R^{m}\,\Pr(u \mid m)\,S(s-1 \mid m-u) \tag{14}
$$

$$
\Pr(s \mid -) = I_{-1}(s) + \sum_{m=1}^{\infty}\sum_{u=0}^{m} R^{m}\,\Pr(u \mid m)\,S(s+1 \mid m-u) \tag{15}
$$

where $I_a(b)$ is a function that returns 1 when $a = b$ and zero otherwise and $R = r/\Pr(\text{activity})$ is the probability of replication, conditional on activity as defined earlier. The term $\Pr(u \mid m)$ gives the probability of $u$ un-communicated findings in $m$ investigations, defined as:

$$
\Pr(u \mid m) = \frac{m!}{u!\,(m-u)!}\; q_{\circ}^{\,u}\,(1-q_{\circ})^{m-u} \tag{16}
$$

where

$$
q_{\circ} = (1-\beta_R)(1-c_{R+}) + \beta_R(1-c_{R-}) \tag{17}
$$

is the probability a replication finding is un-communicated, averaging over positive and negative findings. Finally, the function $S(z \mid n)$ provides the probability that a sequence of $n$ communicated replication findings produces a difference $z$ between positive and negative replications. It is defined as:

$$
S(z \mid n) =
\begin{cases}
I_{0}(z) & \text{if } n = 0 \\
K\big(n, (n+z)/2\big)\; q_{+}^{(n+z)/2}\,(1-q_{+})^{(n-z)/2} & \text{if } n > 0
\end{cases} \tag{18}
$$

where $K(a, b)$ is again the binomial chooser function, but evaluating to zero when $b$ is not an integer, and:

$$
q_{+} = \frac{(1-\beta_R)\,c_{R+}}{1-q_{\circ}} \tag{19}
$$

which is the probability of a positive replication, conditional on the replication finding being communicated.

4. APPROXIMATE CONDITIONS FOR REDUCED COMMUNICATION

We argue in the main text that full communication is rarely optimal, from the perspective of precision. Consider the full communication context: $c_{N-} = c_{R-} = c_{R+} = 1$. For small $b$ ($b \approx 0$) and small $r$ ($r \approx 0$), reducing a communication parameter below 1 improves precision under these approximate conditions:

• $c_{N-} < 1$ when $\alpha < \beta$ (easy to satisfy)
• $c_{R-} < 1$ when $\alpha > 1 - \beta$
• $c_{R+} < 1$ when $\beta - \alpha \le 1/4$
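The direction of each effect can be probed numerically before turning to the derivation. The Python sketch below is illustrative and not the authors' code. It evaluates precision at $s = 1$ from the series of Section 3.2, Eqs. (14)-(19), and takes a backward finite difference in each communication parameter at full communication. It assumes replications have the same power and false-positive rate as initial studies ($\beta_R = \beta$, $\alpha_R = \alpha$), truncates the series at a hypothetical `m_max`, and drops the leading factors of Eq. (10) that are shared by true and false hypotheses, since they cancel in precision. All names and parameter values are invented for the example.

```python
from math import comb


def walk_prob(z, n, q_plus):
    """S(z | n): chance that n communicated replications shift the tally by z."""
    if n == 0:
        return 1.0 if z == 0 else 0.0
    if abs(z) > n or (n + z) % 2:
        return 0.0
    k = (n + z) // 2  # number of positive replications
    return comb(n, k) * q_plus**k * (1.0 - q_plus) ** (n - k)


def tally_weight(s, pos, r_cond, c_n, c_rm, c_rp, m_max=40):
    """pos * Pr(s|+) + (1 - pos) * c_n * Pr(s|-) for one kind of hypothesis.

    pos is the chance of a positive finding (1 - beta if true, alpha if false);
    r_cond is R = r / Pr(activity).
    """
    q_none = pos * (1.0 - c_rp) + (1.0 - pos) * (1.0 - c_rm)  # Eq. (17)
    q_plus = pos * c_rp / (1.0 - q_none)                       # Eq. (19)
    after_pos = 1.0 if s == 1 else 0.0   # I_1(s)
    after_neg = 1.0 if s == -1 else 0.0  # I_{-1}(s)
    for m in range(1, m_max + 1):
        for u in range(m + 1):
            w = r_cond**m * comb(m, u) * q_none**u * (1.0 - q_none) ** (m - u)
            after_pos += w * walk_prob(s - 1, m - u, q_plus)
            after_neg += w * walk_prob(s + 1, m - u, q_plus)
    return pos * after_pos + (1.0 - pos) * c_n * after_neg


def precision(s, b, r, alpha, beta, c_n=1.0, c_rm=1.0, c_rp=1.0):
    """Share of hypotheses with tally s that are true."""
    novel = b * ((1 - beta) + beta * c_n) + (1 - b) * (alpha + (1 - alpha) * c_n)
    r_cond = r / (r + (1 - r) * novel)  # R, using Pr(activity) from Eq. (12)
    true_w = b * tally_weight(s, 1 - beta, r_cond, c_n, c_rm, c_rp)
    false_w = (1 - b) * tally_weight(s, alpha, r_cond, c_n, c_rm, c_rp)
    return true_w / (true_w + false_w)


def slope(name, h=1e-3, **params):
    """Backward finite difference of precision at s = 1 in one communication parameter."""
    return (precision(1, **params) - precision(1, **params, **{name: 1.0 - h})) / h


# Illustrative values: small base rate and replication rate, alpha 0.05, power 0.8.
params = dict(b=0.001, r=0.05, alpha=0.05, beta=0.2)
slopes = {name: slope(name, **params) for name in ("c_n", "c_rm", "c_rp")}
print(slopes)
```

A negative slope means that lowering the parameter below 1 raises precision at $s = 1$. For these values the slopes in `c_n` and `c_rp` come out negative and the slope in `c_rm` positive, which agrees with the three conditions listed above.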

These conditions are derived by first defining precision at $s = 1$, which is the most conservative precision to investigate, because it benefits the least from replication, and higher tallies always have higher precision than $s = 1$. So improvements at $s = 1$ […]. Let PPV be the precision at $s = 1$. Then the first condition is proved by computing the derivative $\partial\,\mathrm{PPV}/\partial c_{N-}$, evaluated at full communication parameter values. Then Taylor expand the result simultaneously to second order around $r = 0$ and $b = 0$. Neglecting higher-order terms in $b$ and $r$:

$$
\frac{\partial\,\mathrm{PPV}}{\partial c_{N-}} \approx -r^{2}\,\frac{1-\beta}{\alpha}\; b\,(\beta-\alpha)(1-\beta-\alpha)(5-6\alpha) \tag{20}
$$

which is negative unless $\alpha > \beta$. Thus suppressing some initial negative findings is favorable, provided the base rate is small and replication is not too common. We think most scientific fields satisfy these conditions, but reasonable people can and do disagree on that point.

In contrast, suppressing negative replications is unlikely to help. By the same strategy, but this time differentiating with respect to $c_{R-}$:

$$
\frac{\partial\,\mathrm{PPV}}{\partial c_{R-}} \approx r\,b\,\frac{1-\beta}{\alpha}\,(1-\beta-\alpha)\big(1 + 2r(\beta-\alpha)\big) \tag{21}
$$

which is guaranteed positive, indicating that $c_{R-} = 1$ is favored, provided $1-\beta > \alpha$.

The third condition is derived similarly:

$$
\frac{\partial\,\mathrm{PPV}}{\partial c_{R+}} \approx -b\,r\,\frac{1-\beta}{\alpha}\,(1-\beta-\alpha)\big(1 - 4r(\beta-\alpha)\big) \tag{22}
$$

The last term is the one in play. For the above to be negative, it is required that:

$$
r < \frac{1}{4}\,\frac{1}{\beta-\alpha} \tag{23}
$$

And this is guaranteed when $\beta - \alpha \le 1/4$.

REFERENCES

1. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med. 2005 Aug;2(8):e124. Available from: http://dx.doi.org/10.1371/journal.pmed.0020124.
2. Makel MC, Plucker JA, Hegarty B. Replications in Psychology Research: How Often Do They Really Occur? Perspectives on Psychological Science. 2012;7(6):537–542. Available from: http://pps.sagepub.com/content/7/6/537.
3. Franco A, Malhotra N, Simonovits G. Publication bias in the social sciences: Unlocking the file drawer. Science. 2014;345:1502–1505.
4. Schmidt S. Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology. 2009;13(2):90–100.
5. Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012;483:531–533.
6. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011;10(9):712–712.
7. Sullivan PF. Spurious Genetic Associations. Biological Psychiatry. 2007 May;61(10):1121–1126.
8. Fontani M, Costa M, Orna MV. The Lost Elements: The Periodic Table's Shadow Side. Oxford University Press; 2014.
9. Bissell M. Reproducibility: The risks of the replication drive. Nature. 2013;503:333–334.
10. Bohannon J. Replication effort provokes praise - and 'bullying' charges. Science. 2014;344:788–789.
11. Kahneman D. A new etiquette for replication. Social Psychology. 2014;45:310–311.
12. Schnall S. Clean data: Statistical artefacts wash out replication efforts. Social Psychology. 2014;45(4):315–320.
13. Rosenthal R. The file drawer problem and tolerance for null results. Psychological Bulletin. 1979;86(3):638–641.
14. Hull DL. Science as a Process: An Evolutionary Account of the Social and Conceptual Development of Science. Chicago, IL: University of Chicago Press; 1988.
15. O'Rourke K, Detsky AS. Meta-analysis in medical research: Strong encouragement for higher quality in individual research efforts. Journal of Clinical Epidemiology. 1989;42(10):1021–1024.
16. Campbell DT. Toward an epistemologically-relevant sociology of science. Science, Technology, & Human Values. 1985;10(1):38–48.
17. Popper K. Conjectures and Refutations: The Growth of Scientific Knowledge. New York: Routledge; 1963.
18. Kitcher P. Reviving the Sociology of Science. Philosophy of Science. 2000;67:S33–S44.
19. Levins R. The Strategy of Model Building in Population Biology. American Scientist. 1966;54.
20. Wimsatt WC. False Models as means to Truer Theories. In: Nitecki M, Hoffman A, editors. Neutral Models in Biology. London: Oxford University Press; 1987. p. 23–55.
21. Munroe R. "Significant": http://xkcd.com/882/; 2014. Available from: http://xkcd.com/882/ [cited 2014].
22. Simmons JP, Nelson LD, Simonsohn U. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science. 2011;22(11):1359–1366. Available from: http://pss.sagepub.com/content/22/11/1359.
23. Cox RT. Probability, Frequency and Reasonable Expectation. American Journal of Physics. 1946;14:1–10.
24. Sedlmeier P, Gigerenzer G. Do studies of statistical power have an effect on the power of studies? Psychological Bulletin. 1989;105(2):309–316.
25. Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14(5):365–376.
26. Gelman A, Loken E. Ethics and Statistics: The AAA Tranche of Subprime Science. CHANCE. 2014;27(1):51–56. Available from: http://amstat.tandfonline.com/doi/abs/10.1080/09332480.2014.890872.
27. Smaldino PE, Calanchini J, Pickett CL. Theory development with agent-based models. Organizational Psychology Review. 2015;in press.
28. Gelman A, Loken E. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no 'fishing expedition' or 'p-hacking' and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University; 2013.
29. Bakker M, Wicherts JM. Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations. Psychological Methods. 2014;19:409–427.
30. Mitchell J. On the emptiness of failed replications; 2014. Available from: http://wjh.harvard.edu/~jmitchel/writing/failed_science.htm.
31. American Political Science Association Task Force on Public Engagement. Increasing the credibility of political science research: A proposal for journal reforms; 2014.
32. Popper K. The Myth of the Framework: In Defence of Science and Rationality. Routledge; 1996.
33. Henrich J, Ensminger J, McElreath R, Barr A, Barrett C, Bolyanatz A, et al. Markets, Religion, Community Size, and the Evolution of Fairness and Punishment. Science. 2010;327:1480–1484. Available from: /rmpubs/henrichetalfairnessmarketsreligiongroupsizeScience2010.pdf.
34. Scott IM, Clark AP, Josephson SC, Boyette AH, Cuthill IC, Fried RL, et al. Human preferences for sexually dimorphic faces may be evolutionarily novel. Proceedings of the National Academy of Sciences. 2014;111(40):14388–14393. Available from: