Replication, Communication, and the Population Dynamics of Scientific Discovery
RICHARD MCELREATH AND PAUL E. SMALDINO
November 16, 2018
Many published research results are false [1], and controversy continues over the roles of replication and publication policy in improving the reliability of research. Addressing these problems is frustrated by the lack of a formal framework that jointly represents hypothesis formation, replication, publication bias, and variation in research quality. We develop a mathematical model of scientific discovery that combines all of these elements. This model provides both a dynamic model of research and a formal framework for reasoning about the normative structure of science. We show that replication may serve as a ratchet that gradually separates true hypotheses from false, but the same factors that make initial findings unreliable also make replications unreliable. The most important factors in improving the reliability of research are the rate of false positives and the base rate of true hypotheses, and we offer suggestions for addressing each. Our results also bring clarity to verbal debates about the communication of research. Surprisingly, publication bias is not always an obstacle, but instead may have positive impacts: suppression of negative novel findings is often beneficial. We also find that communication of negative replications may aid true discovery even when attempts to replicate have diminished power. The model speaks constructively to ongoing debates about the design and conduct of science, focusing analysis and discussion on precise, internally consistent models, as well as highlighting the importance of population dynamics.

Keywords: replication, publication bias, epistemology, scientific method

DEPARTMENT OF ANTHROPOLOGY, UC DAVIS, ONE SHIELDS AVENUE, DAVIS CA 95616
CENTER FOR POPULATION BIOLOGY, UC DAVIS
E-mail address: [email protected]
arXiv [stat.OT]

INTRODUCTION
Imagine two of your close colleagues have just heard about attempts to replicate their positive research findings. Colleague A is thrilled that the attempt was successful. Colleague B is upset that the attempt was unsuccessful. What is the probability that Colleague A's hypothesis is true? What is the probability that Colleague B's hypothesis is false?

This is not a fair quiz, because in truth no one knows the answers to these questions. The absence of replication in many fields [2–4], combined with the absence of a formal framework for understanding replication, makes it difficult to even outline an answer. In the absence of replication, there is substantial concern that many published findings may be false [1], an argument with empirical support [5–7]. The history of science buttresses these observations. A recent catalog of false discoveries of chemical elements lists more entries than there are real elements in the periodic table [8]. In addition to concerns about replication are concerns about research practice and publication bias. Without knowing how many studies were conducted but not published, it is not possible to assign evidential value to either initial findings or replications. And it is not yet easy to acquire empirical evidence about these factors, as even the best empirical studies of publication bias still rely upon researcher self-report [3].

Thus many opinions can be sustained about the evidential value of both initial findings and replications. As a result, recent controversies over failed replications demonstrate a lack of consensus on norms for replication and publication [9–12]. What is the evidential value of replication, positive or negative? What is the impact of publication bias [13]? If replication is part of an “invisible hand” [14] that corrects scientific errors, how much replication is needed? And what are the risks of poorly designed or interpreted replication attempts [9]? When replication is not possible or practical, what other measures can be taken to improve the reliability of research?

These questions remind us that little is understood about the population dynamics of discovery, replication, and scientific communication. Much more attention has been given to individual methods of research design and data analysis. And while it is useful to analyze research methods in isolation, such calculations are unsatisfying. A lot of research activity is hidden from the public record. This means the actual number of findings for an hypothesis may never be known [13]. And since researchers select hypotheses for further study from the literature itself, findings and publication biases cascade into other findings, interacting with biases and incentives [15].

To know the evidential value of research, we must study the population dynamics that produce it [14, 16–18]. So here we construct and solve a mathematical model of scientific beliefs formed by a population of boundedly rational agents who accumulate evidence for and against hypotheses.
We adopt a general signal detection framework that may apply to diverse statistical paradigms, whether p-valued or Bayesian. We study the joint dynamics that arise from replication, publication bias, and differences in research quality between original studies and replications. Our goal is not to accurately simulate science, but rather to understand it better using the same reductionist tools that have been so successful in illuminating population dynamics more generally [19, 20]. Our model implicitly provides, for example, a neutral model of scientific dynamics in which all hypotheses are false and yet discoveries are continuously published. It also provides a range of “selectionist” models that might be compared to data. The clarity of a quantitative framework will stimulate and clarify the development of later empirical investigation and experimental intervention.

The paper proceeds by first outlining the dynamic structure of the model. We then solve the model for both its long-run dynamics and its epistemological implications: what should a rational agent believe about an hypothesis, given a record of published results? We present a general interpretation of the joint dynamics, so the reader can extrapolate lessons from our simple model to the complexity and diversity of real science. We conclude by relating our results to ongoing debates about improving the reliability of scientific research.

MODEL DESCRIPTION
The model is illustrated in Fig. 1. We have also constructed an interactive, web-based tutorial on the conceptual foundations of the model, as well as fully adjustable simulation code, available at http://xcelab.net/replication/.

A population of researchers studies many different hypotheses. Each hypothesis is either true (green) or false (red). These hypotheses could be simple associations, such as green jelly beans cause acne [21], or more general claims, such as evolution is predictable. Research results in either a positive or a negative finding. These findings may be the result of formal hypothesis tests or informal assessments. True hypotheses produce positive findings more often than do false hypotheses, but the researchers never know for sure which hypotheses are true. Under these assumptions, the only information relevant for judging the truth of an hypothesis is its tally, the difference between the number of published positive findings and the number of published negative findings for each hypothesis, and we summarize results in terms of these tallies. In reality, much other information is relevant to judging the truth of an hypothesis. Our assumptions are tactical ones. More complex models of scientific communication are possible, but any such model must include the components in our model, and so our results establish a critical baseline.
[Figure 1: schematic of the three stages of research activity.
1. Hypothesis selection. A previously tested hypothesis is selected for replication with probability $r$, otherwise a novel (untested) hypothesis is selected. Novel hypotheses are true with probability $b$.
2. Investigation. The probability of a positive (+) or negative (–) result, governed by $\beta$ and $\alpha$, depends on the real truth of the hypothesis.
3. Communication. Experimental results are communicated to the scientific community with a probability that depends upon both the experimental result (+, –) and whether the hypothesis was novel (N) or a replication (R). Communicated results join the set of tested hypotheses. Uncommunicated replications revert to their prior status. Results that are not communicated go to the file drawer.
Key: interior = true epistemic state (true, false, or general case); exterior = experimental evidence (unknown, positive, negative, or general case).]

FIGURE 1. Population dynamics of replication.
Each time interval, research activity has three stages that alter these tallies. In stage 1 (Fig. 1, upper-left) each researcher chooses to investigate one of $n$ previously published hypotheses, with probability $r$, or a novel hypothesis, with probability $1 - r$. When replicating, a researcher chooses a previously published hypothesis at random and performs a new study of it. Later, we allow researchers to target hypotheses with specific tally values, rather than choosing at random. A novel hypothesis is true with probability $b$, the base rate, reflecting mechanisms of hypothesis formation. Untutored intuition, for example, may be expected to yield a very low $b$. Genome-wide association studies likewise have low $b$, because relatively few loci are associated with any particular phenotype. There is no consensus on base rate, except that most scientists we know believe their own personal $b$ values are better than average. So we allow $b$ to vary freely in the model.

In stage 2, a true hypothesis produces a positive finding $1 - \beta$ of the time, its power. A false hypothesis produces a positive finding $\alpha$ of the time, its false positive rate. We assume that $1 - \beta > \alpha$. Later we allow the values of $\beta$ and $\alpha$ to differ between replication attempts and initial studies. Note that $\beta$ and $\alpha$ are not merely properties of a statistical procedure, but rather of an entire investigation. For example, using several procedures and selecting the one that produces a positive result will inflate $\alpha$ [22].

In stage 3, findings may be communicated to other researchers. Not every finding is communicated, either because no one tries to communicate it or because it cannot be published. Only communicated findings can adjust a tally. Let $c_{N-}$ be the probability that a negative ($-$) finding about a new (N) hypothesis is communicated. We assume for simplicity that all new positive results are communicated ($c_{N+} = 1$). Let $c_{R-}$ and $c_{R+}$ be the probabilities that replications with negative and positive findings, respectively, are communicated.

These assumptions define the dynamics of the expected numbers of true and false hypotheses with a given tally. We present the full recursions in the Supporting Material. In the simplest case (full communication: $c_{N-} = c_{R-} = c_{R+} = 1$), the expected number $n_{T,s}$ of true hypotheses with an observed tally $s$ in the next time step is given by:

$$n'_{T,s} = n_{T,s} + a n r \left( -\frac{n_{T,s}}{n} + \frac{n_{T,s-1}}{n}(1 - \beta) + \frac{n_{T,s+1}}{n}\beta \right) \tag{1}$$

where $a > 0$ scales research activity relative to $n$, so that $an$ investigations are conducted per time step. This expression says that the number in the next time step is just the current number plus all of the flows in and out caused by replications. In the case that $s = -1$ or $s = 1$, there is an additional term $an(1 - r)b\beta$ or $an(1 - r)b(1 - \beta)$, respectively, to represent the inflow of novel findings. Recursions $n'_{F,s}$ for false hypotheses are constructed from a change in variables: $1 - \beta \to \alpha$, $b \to 1 - b$. Notice that this implies that the model is easily extended to any number of hypothesis types, such as effect size differences, that differ in power and false-positive rate. We analyze the true/false dichotomy because of its prominence and simplicity.

ANALYSIS
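Before turning to the analytical solutions, the full-communication recursion in Eq. (1) can be iterated numerically to see the expected counts settle into fixed proportions. The Python sketch below is an illustration, not the authors' simulation code: the function name `iterate_recursion`, the truncation of tallies to a finite window, the starting record, and the parameter values in the usage note are all assumptions made here. Replications move a true hypothesis up one tally with probability $1 - \beta$ and down with probability $\beta$; false hypotheses use $\alpha$ and $1 - \alpha$; novel findings enter at tally $+1$ or $-1$.

```python
def iterate_recursion(b, r, power, alpha, a=0.1, steps=1000, max_tally=30):
    """Iterate Eq. (1) and its false-hypothesis analogue (full communication).

    Returns (true, false): dicts mapping tally s to the proportion of all
    hypotheses that are true (resp. false) and sit at tally s.
    Assumption of this sketch: tallies are truncated to
    [-max_tally, max_tally]; flows past the edge are dropped, which is
    negligible unless r is close to 1.
    """
    beta = 1.0 - power
    tallies = range(-max_tally, max_tally + 1)
    n_true = {s: 0.0 for s in tallies}
    n_false = {s: 0.0 for s in tallies}
    n_true[1] = 1.0  # arbitrary starting record; it is diluted as n grows
    for _ in range(steps):
        n = sum(n_true.values()) + sum(n_false.values())
        new_true, new_false = {}, {}
        for s in tallies:
            # Replication inflow: from s-1 after a positive finding,
            # from s+1 after a negative finding.
            in_true = n_true.get(s - 1, 0.0) * power + n_true.get(s + 1, 0.0) * beta
            in_false = (n_false.get(s - 1, 0.0) * alpha
                        + n_false.get(s + 1, 0.0) * (1.0 - alpha))
            # a*n*r replications, each hitting tally s with probability n_s/n.
            new_true[s] = n_true[s] + a * r * (in_true - n_true[s])
            new_false[s] = n_false[s] + a * r * (in_false - n_false[s])
        # a*n*(1-r) novel hypotheses enter at tally +1 or -1.
        novel = a * n * (1.0 - r)
        new_true[1] += novel * b * power
        new_true[-1] += novel * b * beta
        new_false[1] += novel * (1.0 - b) * alpha
        new_false[-1] += novel * (1.0 - b) * (1.0 - alpha)
        n_true, n_false = new_true, new_false
    n = sum(n_true.values()) + sum(n_false.values())
    return ({s: v / n for s, v in n_true.items()},
            {s: v / n for s, v in n_false.items()})
```

With illustrative values such as `iterate_recursion(b=0.1, r=0.2, power=0.8, alpha=0.05)`, the returned proportions sum to one and the true hypotheses sum to the base rate $b$, because under full communication every novel hypothesis enters the record whatever its first result.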
By literature review, a tally can be constructed for any given hypothesis. Given an observed tally, but a number of possibly unobserved studies, what is the probability that an hypothesis is correct? The model allows us to address this question for a diversity of scenarios. Before presenting the solutions, note that the answers that the model provides can be understood both from a pure population dynamics perspective and from a probabilistic reasoning perspective. From the dynamics perspective, the population will converge from any initial condition to a unique steady state in which the solutions give frequencies of true hypotheses at each tally value. Equally valid is the epistemological perspective that the solutions tell us for any unique hypothesis the probability it is true, given a state of information [23]. One consequence of this is that the solutions do not require that all hypotheses share the same parameter values.

For each tally value $s$, we solved for the steady state proportions of true and false hypotheses, $\hat{p}_{T,s}$ and $\hat{p}_{F,s}$. We also derived the same solutions under the probabilistic interpretation, and verified our solutions numerically and through stochastic simulation. We present complete analytical solutions in the Supporting Material. In the simplest case (for full communication), solutions take the form:

$$\hat{p}_{T,s} = b(1 - r) \sum_{m=1}^{\infty} r^{m-1} \binom{m}{\frac{1}{2}(m+s)} (1 - \beta)^{\frac{1}{2}(m+s)} \beta^{\frac{1}{2}(m-s)} \tag{2}$$

Here the binomial coefficient is zero whenever $\frac{1}{2}(m+s)$ is not an integer between $0$ and $m$. This expression defines an infinite geometric series of binomial probabilities arising from all of the different possible histories by which a true hypothesis could achieve a tally of $s$, for every possible number of findings $m$. In the majority of cases, only the first few terms of the series are important, because of the leading factor $r^{m-1}$. This fact also informs us that the rate of convergence to steady state will be quite rapid, unless $r$ is large.

For any particular tally, for example $s = 1$, expression (2) yields a closed-form solution like:

$$\hat{p}_{T,1} = \frac{b(1 - r)}{2\beta r^{2}} \left( \left(1 - 4 r^{2} \beta (1 - \beta)\right)^{-1/2} - 1 \right) \tag{3}$$

For arbitrary communication parameters, the solutions have a similar structure, but are instead a series of multinomial probabilities in which the events are combinations of findings ($+$ or $-$) and communication outcomes.

These solutions are not easy to interpret by inspection. But they do provide answers to the question: what is the probability that an hypothesis with a given tally is correct? For any tally $s$, we can calculate:

$$\Pr(\text{true} \mid s) = \frac{\hat{p}_{T,s}}{\hat{p}_{T,s} + \hat{p}_{F,s}}, \qquad \Pr(s \mid \text{true}) = \frac{\hat{p}_{T,s}}{\sum_i \hat{p}_{T,i}}, \qquad \Pr(s \mid \text{false}) = \frac{\hat{p}_{F,s}}{\sum_i \hat{p}_{F,i}} \tag{4}$$

The precision of a tally $s$ is $\Pr(\text{true} \mid s)$, the proportion of hypotheses with tally $s$ that are true. The sensitivity, $\Pr(s \mid \text{true})$, is the proportion of true hypotheses with tally $s$. It indicates where the true hypotheses are. Sensitivity is important because a high precision for a tally $s$ is little help when there are few hypotheses that achieve a tally $s$. And the specificity, $\Pr(s \mid \text{false})$, is the proportion of false hypotheses with tally $s$, indicating where the false hypotheses are. We use these definitions to explain the behavior of the system.

Overall dynamics.
Fig. 2 describes the overall dynamics of precision, as a function of the different parameters. In each panel, the trend lines show the proportion of true hypotheses at each tally on the vertical axis. The tally corresponding to each trend is indicated by a number. The horizontal axis in each panel varies a single parameter. Each vertical hairline shows the value of each parameter that is held constant in other panels. This figure is complex. We'll use it to highlight the most important factors in the reliability of findings and demonstrate counter-intuitive aspects of communication. Then in the next section, we'll turn to a more general explanation of the causes of these results.

[Figure 2: two clusters of panels (top: optimistic scenario; bottom: pessimistic scenario). Each panel plots the proportion true against one parameter: (a) base rate ($b$), (b) replication rate ($r$), (c) power ($1 - \beta$), (d) false-positive rate ($\alpha$), (e) comm. neg. rep. ($c_{R-}$), (f) comm. pos. rep. ($c_{R+}$), (g) comm. neg. new ($c_{N-}$).]

FIGURE 2. Effects of base rate, replication, power, false-positives, and communication on the probability that an hypothesis with a given tally is true. The two clusters illustrate different scenarios. The blue trends, each labeled with its tally value, show precision as it varies by the parameter on each horizontal axis. The numbers indicate the tally of a curve. Dashed curves are tallies of an even number. The vertical hairlines show the parameter values held constant across panels within the same cluster.

There are two clusters of plots. The top cluster represents a normatively optimistic scenario, with an auspicious base rate ( b = − β = α = b = − β = α = b < − ), to predicting the winner of a presidential election, on the high end ( b = s = s = s = b = s = s = s = s = b = r in panel (b), has remarkably little impact. This is because replication impacts the rate at which hypotheses reach different tallies, but not so much the precision at each tally. Therefore at low replication rates, few hypotheses will ever attain $s = 5$, but those that do are almost certainly true. We expand on this point in the next section.

Third, communication of findings, panels (e-g), can either assist discovery or hinder it. Suppression of negative replications (e) reduces precision. But suppression of positive replications (f) and novel negative findings (g) either improves precision or has almost no impact on it. These aspects of the population dynamics are counter-intuitive, but quite general and revealing. The next section explains them.
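The steady-state series in Eq. (2) and the precision and sensitivity in Eq. (4) are straightforward to evaluate numerically, which may help in reading trends like those in Fig. 2. The sketch below assumes the full-communication form of Eq. (2), in which a hypothesis with $m$ findings and tally $s$ has $(m+s)/2$ positive findings, and truncates the infinite sum at a hypothetical `max_m` terms. The function names and the parameter values in the usage note are illustrative assumptions, not values taken from the figure.

```python
from math import comb


def steady_state(s, base, r, p_pos, max_m=200):
    """Truncated Eq. (2): steady-state proportion of hypotheses at tally s.

    base  : b for true hypotheses, 1 - b for false ones
    p_pos : probability of a positive finding
            (1 - beta for true hypotheses, alpha for false ones)
    """
    total = 0.0
    for m in range(max(1, abs(s)), max_m + 1):
        if (m + s) % 2:
            continue  # tally s cannot be reached with m findings
        k = (m + s) // 2  # number of positive findings among the m
        total += r ** (m - 1) * comb(m, k) * p_pos ** k * (1 - p_pos) ** (m - k)
    return base * (1 - r) * total


def precision(s, b, r, power, alpha):
    """Pr(true | s) from Eq. (4)."""
    p_true = steady_state(s, b, r, power)
    p_false = steady_state(s, 1 - b, r, alpha)
    return p_true / (p_true + p_false)


def sensitivity(s, b, r, power, tallies=range(-60, 61)):
    """Pr(s | true) from Eq. (4), normalised over a finite window of tallies."""
    norm = sum(steady_state(i, b, r, power) for i in tallies)
    return steady_state(s, b, r, power) / norm
```

With assumed values $b = 0.1$, $r = 0.2$, $1 - \beta = 0.8$ and $\alpha = 0.05$, `precision(1, 0.1, 0.2, 0.8, 0.05)` comes out at roughly 0.64 and `precision(2, 0.1, 0.2, 0.8, 0.05)` above 0.95, while `sensitivity` shows that most true hypotheses still sit at tally 1. Because of the factor $r^{m-1}$, a few dozen terms already suffice unless $r$ is close to 1.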
Dynamics of communication.
The “file drawer problem” [13] arises when the failure to publish negative findings distorts the estimated strength of an association. We consider a related phenomenon by asking how changes in the communication parameters $c_{N-}$, $c_{R-}$, and $c_{R+}$ alter the precision, sensitivity, and specificity across tallies. In the process, we'll have opportunity to explain the joint dynamics of research quality and communication biases.

In this model, it is rarely best to communicate everything. In the Supporting Material, we prove for the case of small $b$ (such that $b \approx 0$) and small $r$ ($r \approx 0$) that c N − < α < β (usually satisfied), that c R − < α > (hopefully never satisfied), and that c R + < β − α ≤ (often satisfied). So some suppression of novel negative findings ($c_{N-} < 1$) and of positive replications ($c_{R+} < 1$) can improve the value of replication. At larger $b$ and $r$, the conditions are more complicated, but the qualitative finding remains intact.

To grasp why suppressing findings might help us learn what is true, think of replication as epistemological chromatography. Chromatography is a set of techniques for separating substances that are mixed together. For example, mixed plant pigments can be separated by painting the mixture onto the tip of a strip of filter paper and then soaking the tip in a solvent. Different pigments bind more or less strongly to the solvent or the paper. Therefore as the paper absorbs the solvent, different pigments travel at different speeds, eventually separating and appearing as differently colored bands on the paper. In the epistemological case, it is true and false hypotheses that are mixed. We wish to separate the true ones from the false. Replication applies a “solvent” that diffuses false hypotheses towards negative tallies and true hypotheses towards positive tallies. A true hypothesis diffuses upwards with probability $(1 - \beta)c_{R+}$, while a false hypothesis diffuses downwards with probability $(1 - \alpha)c_{R-}$. Thus the communication parameters adjust rates of diffusion. Just as manipulating rates of chemical diffusion can improve real chromatography, manipulating communication can improve epistemological chromatography.

In Fig. 3, we turn on communication one parameter at a time, in order to explain the contribution of each mode of communication to the resulting population dynamics. All four panels (a, b, c, d) show steady state precision, sensitivity, and specificity and use b = r = − β = α =

[Figure 3: four panels plotting proportion against tally.
(a) Positive only: $c_{N-} = 0$, $c_{R-} = 0$, $c_{R+} = 1$. Only positive findings are initially communicated, and replication can only increase tallies, which are here counts of positive findings. True hypotheses diffuse upward faster than false ones. So large tallies have a high precision, the proportion of true hypotheses.
(b) Negative only: $c_{N-} = 0$, $c_{R-} = 1$, $c_{R+} = 0$. Tallies can only decrease. False hypotheses diffuse down faster than true ones. But since the mixture at tally +1 is mostly false, precision is always low.
(c) Screen and check. Solid: $c_{N-} = 0$, $c_{R-} = 1$, $c_{R+} = 1$. Up diffusion of true hypotheses is aided by down diffusion of false ones from the mixed source tally +1. Compare precision to (a). Dashed: $c_{N-} = 0$, $c_{R-} = 1$, $c_{R+} = 0.2$. Suppressing positive replications regulates the rate of up diffusion, purifying high tallies at the price of sensitivity.
(d) Total communication: $c_{N-} = c_{R-} = c_{R+} = 1$. True and false hypotheses diffuse in both directions, and everything is communicated. Since most effort investigates new findings at tally –1, few hypotheses ever achieve a high tally, but those that do have high precision.]

FIGURE 3. Replication and communication as epistemological chromatography. Precision is indicated in blue, sensitivity in orange, and specificity in gray.

any parameters the reader chooses.
Note that for sensitivity and specificity, probability mass beyond the highest (or lowest) tally displayed is added to that highest (or lowest) tally, so that none of the probability mass is hidden.

In the first three panels (a, b, c), only positive initial findings are communicated, and all new hypotheses appear at tally $s = 1$. The mixture of hypotheses at this tally is heavily skewed towards false hypotheses, and so has a low precision. Replication may cause an hypothesis to diffuse in either direction, depending upon communication. In panel (a), negative findings are never communicated. But since true hypotheses diffuse up at a rate $1 - \beta$ and false ones only at a rate $\alpha < 1 - \beta$, truth is slowly separated from falsity. At tallies of 8 or more, nearly all hypotheses are true, as indicated by the precision. Note however that most true hypotheses that have been communicated at all exist at low tallies, as indicated by the sensitivity. With enough time and replication, every true hypothesis can be split from the false. This is unlike the case in panel (b), where only negative replications are communicated. The same dynamic works in reverse here, and replication creates a pure sample of false hypotheses at low tallies.

Combining both directions of diffusion is synergistic, as illustrated in panel (c). Now both positive and negative replications are communicated. The downward diffusion of false hypotheses makes the upward diffusion of true hypotheses more efficient. This effect arises because $1 - \alpha > 1 - \beta$. False hypotheses diffuse down faster than true hypotheses diffuse up. This purifies the source mixture at $s = 1$, allowing for precision to approach high values at much smaller tallies than in the absence of either diffusion process. In this example, hypotheses with tallies of s = c R + < 1, we have effectively slowed all upward diffusion. This allows rapid downward diffusion from negative replications to further clean the source mixture, but at the cost of diffusing more true hypotheses towards negative tallies. This dynamic is beneficial when base rate is especially low. So we achieve a very clean sample of truth at smaller positive tallies in this scenario, but at the price of finding fewer true hypotheses in total. Whether this is an improvement depends upon context, an issue we take up in the discussion.

Finally, full communication is illustrated in panel (d). High precision is achieved at high tallies, but few hypotheses reside at those tallies. This inefficiency arises from the unbiased allocation of replication effort. When all initial findings are communicated, replication effort is overwhelmed by following up on initial negative findings, the spike in specificity seen at tally $s = -1$. When the base rate is low, it can be better to screen for positive findings than to publish every negative finding. Note however that increasing precision, the proportion of hypotheses at a given tally that are true, is not necessarily the only objective. It does us little good if sensitivity is very low at all high tally values. We return to this point in a later section, when we consider differential power and false-positive rates between initial studies and replications.
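The chromatography argument can be explored numerically. The full recursions with communication parameters are in the Supporting Material and are not reproduced here, so the Python sketch below builds its own update from the verbal description alone, and should be read as an assumption-laden illustration: a replication moves a tally up with probability $\Pr(+)\,c_{R+}$, down with probability $\Pr(-)\,c_{R-}$, and otherwise leaves it unchanged; novel positive findings always enter at tally $+1$, and novel negative findings enter at $-1$ with probability $c_{N-}$. The function names, the finite tally window, one investigation per recorded hypothesis per step, and the parameter values in the usage note are all hypothetical choices.

```python
def steady_state(b, r, power, alpha, c_neg_new, c_neg_rep, c_pos_rep,
                 steps=1500, max_tally=20):
    """Long-run proportions of communicated hypotheses by truth and tally.

    Returns {"true": {s: proportion}, "false": {s: proportion}}.
    Proportions are renormalised every step, which leaves the steady
    state unchanged and avoids overflow as the record grows.
    """
    tallies = range(-max_tally, max_tally + 1)
    # For each kind: (probability of a positive finding, share of novel hypotheses).
    kinds = {"true": (power, b), "false": (alpha, 1.0 - b)}
    props = {kind: {s: 0.0 for s in tallies} for kind in kinds}
    props["false"][1] = 1.0  # arbitrary starting record
    for _ in range(steps):
        updated = {}
        for kind, (p_pos, share) in kinds.items():
            up = p_pos * c_pos_rep            # communicated positive replication
            down = (1.0 - p_pos) * c_neg_rep  # communicated negative replication
            cur = props[kind]
            nxt = {}
            for s in tallies:
                arriving = cur.get(s - 1, 0.0) * up + cur.get(s + 1, 0.0) * down
                # Uncommunicated replications leave the tally where it was.
                nxt[s] = cur[s] + r * (arriving - cur[s] * (up + down))
            # Novel findings: positives always enter at +1; negatives enter
            # at -1 only when communicated, otherwise they are never recorded.
            nxt[1] += (1.0 - r) * share * p_pos
            nxt[-1] += (1.0 - r) * share * (1.0 - p_pos) * c_neg_new
            updated[kind] = nxt
        total = sum(sum(d.values()) for d in updated.values())
        props = {k: {s: v / total for s, v in d.items()} for k, d in updated.items()}
    return props


def precision(props, s):
    """Pr(true | s); undefined for tallies that hold no hypotheses."""
    t, f = props["true"][s], props["false"][s]
    return t / (t + f)
```

For instance, with assumed values $b = 0.001$, $r = 0.2$, $1 - \beta = 0.8$ and $\alpha = 0.05$, the positive-only setting (`c_neg_new=0, c_neg_rep=0, c_pos_rep=1`) gives precision below 1% at tally 1 but above 95% by tally 8. Adding negative replications (`c_neg_rep=1`) raises precision at tally 3, and slowing upward diffusion (`c_pos_rep=0.2`) raises it further at tally 2, in line with the qualitative account above.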
Targeted replication.
Replication in the preceding analysis is purely random: every communicated hypothesis has an equal chance of being the target of a replication effort. Targeting particular tally values, like s = r T of all replication attempts target a chosen list of tally values, selecting an hypothesis randomly from all hypotheses within the list. For example, this list might consist of all previously communicated hypotheses with a positive tally of three or less, so that researchers concentrate their replication efforts on hypotheses thought to be true but with relatively high uncertainty. The rest of the time, $1 - r_T$, replication effort remains unbiased.

Fig. 4 shows the resulting modification of the dynamics. The dashed curves in these plots show the steady-state dynamics in the absence of targeting. The shaded pink regions show the range of tally values included in the target. In each case, targeting improves sensitivity at higher positive tallies. Thus it helps to diffuse true hypotheses towards tallies with very high precision. But there is very little effect on precision itself. Targeting helps because it directs effort towards tallies that may not have a high density of hypotheses. When replication effort is unbiased, most effort is directed to tallies where the bulk of hypotheses reside. Therefore when the target covers a wide range of tallies, as in panel (c), targeting becomes relatively ineffective.

Why doesn't targeting improve the proportion of hypotheses that are true at higher tallies? Targeting serves mainly to speed up diffusion, without altering the relative rates at which true and false hypotheses diffuse. Changes in communication rates, in contrast, do alter the differential rates of diffusion, and so may dramatically alter precision, as seen in the previous section.

Differential power and false-positives.
So far, we have assumed that power $1 - \beta$ and false-positive rate $\alpha$ are the same in initial studies and replications. Differences between initial studies and replications have been at the center of concerns about replication [9]. Here we analyze a version of our model in which we allow the power and false-positive rate to vary. Let $1 - \beta_R$ and $\alpha_R$ be the power and false-positive rate, respectively, for replications. What effects do both higher-powered replication and lower-powered replication have on dynamics?

[Figure 4: three panels (a), (b), (c) plotting proportion against tally.]

FIGURE 4. Targeted replication effort. In all three plots, tallies marked for targeted replication are shown by the shaded region. Precision is indicated in blue, sensitivity in orange, and specificity in gray. Baseline parameters set to b = α = r = r T = c N − = c R − = c R + = 1. Dashed curves display steady-state without targeted replication, $r_T = 0$. (a) High power setting, 1 − β = − β = − β = s =

In Fig. 5, we present two extreme, illustrative scenarios. Both scenarios use b = c N − = c R − = c R + = r = r T = − β = α = − β R = α R = − β = α = − β R = α R =

[Figure 5: four panels plotting proportion against tally. (a) Low/high, $c_{R-} = 1$. (b) High/low, $c_{R-} = 1$. (c) Low/high, $c_{R-} = 0.1$. (d) High/low, $c_{R-} = 0.1$.]

FIGURE 5. Differential power and replication dynamics. Precision is indicated in blue, sensitivity in orange, and specificity in gray. (a) Low power initial studies (1 − β = α = − β R = α R = − β = α = − β R = α R =

false ones. Unfortunately, it also diffuses many true hypotheses towards negative tallies. The high precision at positive tallies is a result of a false hypothesis' relative inability to attain a positive replication, not a result of a true hypothesis' ability to avoid a negative replication.

In the last two panels, (c) and (d), we show how these scenarios change when negative replications are suppressed, $c_{R-} = 0.1$.

DISCUSSION
Ours is the first analytical model of the joint population dynamics of scientific hypothesis generation, communication, and replication. Such a model is necessary to illuminate debates about scientific practice, because until researchers report the results of every study, empirical estimates of base rate are not possible. And without consideration of population dynamics, any discussion of the value of research findings remains at least partly naïve, because it is notoriously difficult to reason verbally about complex systems. Our model produces a number of valuable counter-intuitive results. But even when its results are intuitive, some model like ours is needed to demonstrate their logic. It is not enough to merely hold the correct belief; we must also justify that belief.

This model is not a definitive representation of the scientific process, nor does it aim to be. It omits many relevant factors, such as investigator bias and disagreements about the interpretation of evidence. These omissions allow the model to address focused questions about the evidential value of research as it emerges from the joint dynamics of hypothesis generation, replication, and communication. Models that account for more and different factors must also include variants of these complex dynamics, so our model is a necessary and useful first step.

Our analysis re-emphasizes what every textbook says: replication is an essential aspect of scientific discovery. However, it also quantifies its impact and emphasizes that replication itself can be unreliable: the factors that make initial findings unreliable also make replication less reliable. When base rate is low, power is low, or false positives common, then many successful replications will be needed to attain confidence in an hypothesis. This is especially true when negative replications are difficult to publish.

We find that low base rate and high false positive rate are the most important threats to the effectiveness of research, replicated or not. This re-emphasizes the importance of quality theorizing, in order to improve base rate. While it is appealing to think that science works regardless of where hypotheses come from, undisciplined hypothesis generation reduces base rate and makes initial findings mostly false. Then large amounts of replication will be needed to uncover the truth. In fields such as physics and evolutionary biology, a great deal can be and is done to vet theory in the realm of pure thought, using mathematics and simulation.
But in fields such as social psychology, theory development is rarely formalized [27].

The results also re-emphasize the value of efforts to suppress false positive findings, such as pre-registered data analysis plans. It is important to recognize that any single scientific hypothesis may correspond to many different statistical hypotheses. If a statistical hypothesis can be chosen after seeing the data, reasonable scientific hypotheses can become unreasonably flexible [28]. And many data-contingent transformations and modeling choices that increase power, conditional on an hypothesis being true, will also increase false-positives, conditional on the hypothesis being false. For example, dropping outliers may well aid discovery, if the hypothesis is true. But it may also dramatically inflate false-positives, if the hypothesis is not true [29].

Our model immediately informs debates over the meaning of failed replications. For example, some have suggested that positive replications have more worth than negative replications [12], or even that failed replications “cannot contribute to a cumulative understanding of scientific phenomena” [30]. We find the opposite: communicating a failure to replicate is typically more informative than communicating a successful replication. This remains true even when replication attempts have lower power than original studies. However, a single failure to replicate is entirely consistent with a true hypothesis in many scenarios. So both positive and negative replications may be regarded with skepticism. But neither is without value. Of course our model is merely a model. But unlike the verbal arguments we cite, it is at least clear in its assumptions, and its logic can be verified.

Our model also sheds light on proposals for improving the reliability of research.
For example, many have called for pre-registration and review with a commitment from journals to publish research results, positive or negative, in order to reduce under-reporting of negative findings [31]. Our analysis suggests that these proposals should distinguish between new hypotheses and replication attempts. If indeed many new hypotheses are false in many fields, a pre-registration process would merely fill journal pages with null findings, doing great harm by crowding out candidate hypotheses that have passed an initial screening. In our model, there is little harm in ignoring novel negative findings, because they add very little information. Indeed, Figure 2 illustrates that the effect of ignoring novel negative results on precision is negligible. In contrast, a negative replication may add a lot of information. We suspect however that our model exaggerates this effect, because the model ignores the wasted effort arising from different researchers repeating an investigation in ignorance of one another’s negative findings. And there are certainly fields in which full publication may be the best policy, such as when false-positive rates are low or when the total number of testable hypotheses is very small. Nevertheless, the qualitative difference in information value between novel and follow-up negative findings will remain as long as the base rate in the published literature is higher than it is in novel investigations.

The model stimulates empirical investigation by clarifying which factors must be estimated in order to gauge the evidential value of research, as well as being readily translatable into a statistical framework, due to its analytical specification. Our model provides an implicit ‘null model’ of research, obtained by setting $b = 0$. Such null models are a priori false, but have nevertheless played an important role in science [20].

There are additional factors to address in future work. Our model ignores researcher bias, multiple testing, and data snooping, each of which deflates base rate or inflates false-positive rate. Our analysis is framed in a standard, but unsatisfying, “true” and “false” classification, rather than considering practical significance and effect size estimation [26]. Our model can be directly generalized to consider variation in effect size instead of true and false hypotheses. We explain this generalization in the Supporting Material. However, our model does not directly address causal inference nor point estimation.

Incentives also matter. A dynamic analysis of strategic behavior under different incentive structures would aid policy analysis [18]. As Karl Popper argued, science does not work because scientists are selfless and unbiased people. Rather it works because its institutions channel our bias into the production of public goods [32]. In particular, we worry that a research environment that lacks replication may actually select for statistical practices that inflate false-positives, as labs with such practices can more readily publish findings and place students in new positions, all while outrunning the truth.

Replication may offer other benefits that are not accounted for in our model. A failed replication may be valuable because it inspires a new hypothesis in order to explain variation in findings. When findings do not generalize across samples, this creates an opportunity to explain the variation [33, 34]. In our view, the goal of replication is not merely to find the same result, but also to discover how a result arises and how it is likely to vary in realistic, non-laboratory, contexts.

Despite these shortcomings, our model provides specific quantitative evaluations of many verbal arguments, as well as drawing attention to the population dynamics of scientific knowledge. Science is a subtle project. Understanding it demands the same rigor that we apply to projects within science itself.
SUPPLEMENTAL INFORMATION
Replication, Communication, and the Population Dynamics of Scientific Discovery
1. DERIVATION OF FULL MODEL WITH RANDOM REPLICATION
Let $f_{T,s} = n_{T,s}/n$ be the frequency of true hypotheses with tally $s$. Under the assumptions and definitions supplied in the main text, the full recursion for $n'_{T,s}$ is given by:

$$n'_{T,s} = n_{T,s} + a n r \Big( -f_{T,s}\big(c_{R+}(1-\beta) + c_{R-}\beta\big) + f_{T,s-1}(1-\beta)\,c_{R+} + f_{T,s+1}\,\beta\, c_{R-} \Big) \tag{5}$$

for $s$ not equal to $1$ or $-1$. In those cases, there is an additional term. For $s = 1$:

$$n'_{T,1} = n_{T,1} + a n r \Big( -f_{T,1}\big(c_{R+}(1-\beta) + c_{R-}\beta\big) + f_{T,0}(1-\beta)\,c_{R+} + f_{T,2}\,\beta\, c_{R-} \Big) + a n (1-r)\, b\, (1-\beta) \tag{6}$$

The $a n (1-r) b (1-\beta)$ term accounts for inflow of novel positive findings, all of which are communicated. For $s = -1$:

$$n'_{T,-1} = n_{T,-1} + a n r \Big( -f_{T,-1}\big(c_{R+}(1-\beta) + c_{R-}\beta\big) + f_{T,-2}(1-\beta)\,c_{R+} + f_{T,0}\,\beta\, c_{R-} \Big) + a n (1-r)\, b\, \beta\, c_{N-} \tag{7}$$

The $a n (1-r) b \beta c_{N-}$ term accounts for inflow of novel negative findings, only $c_{N-}$ of which are communicated. Recursions for false hypotheses can be derived just by substitution of variables: $b \to 1-b$ and $1-\beta \to \alpha$.

These recursions implicitly define the population growth recursion for $n$:

$$n' = n + a n (1-r) \Big( b\,(1-\beta + \beta c_{N-}) + (1-b)\big(\alpha + (1-\alpha)\, c_{N-}\big) \Big) \tag{8}$$

This just indicates that the population of published hypotheses grows proportional to the innovation rate, $1-r$, and the rates at which true and false hypotheses respectively produce positive and negative findings, as well as the rate at which negative findings are communicated.

2. BEYOND “TRUE” AND “FALSE”

Above we noted that recursions for false hypotheses can be derived just by substitution of variables: $b \to 1-b$ and $1-\beta \to \alpha$. In other words, true and false hypotheses are differentiated only by the rate at which they appear in new investigations and their respective probabilities of producing positive findings. This also means it is straightforward to expand the model to additional epistemic states, as “true” and “false” really just mean more and less correct. For example, small, medium, and large effect sizes could be represented by three states, each with its own base rate and probability of producing a positive result. The derivation would remain the same, but an additional set of steady-state solutions would appear.

3. STEADY-STATE SOLUTIONS
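As a rough illustration of the deterministic route to the steady state, the recursions (5) to (8) can be iterated on frequencies until they stop changing. The sketch below is not the authors' code. The function name `steady_state`, the parameter values, the reflecting boundaries at tallies $\pm S$, and the choice to start from novel findings only are all assumptions made for illustration. As in Eq. (5), replications use the same $\alpha$ and $\beta$ as initial studies.

```python
import numpy as np

def steady_state(b=0.1, alpha=0.05, beta=0.4, r=0.2, a=0.1,
                 c_n=1.0, c_rp=1.0, c_rm=1.0, S=30,
                 tol=1e-13, max_iter=100_000):
    """Iterate recursions (5)-(8) on frequencies until they stop changing.

    Returns (tallies, f_true, f_false); the two arrays jointly sum to 1.
    All default values are illustrative assumptions.
    """
    tallies = np.arange(-S, S + 1)
    i_pos, i_neg = S + 1, S - 1            # array indices of s = +1 and s = -1
    # (weight, probability of a positive finding) for true and false hypotheses
    kinds = [(b, 1.0 - beta), (1.0 - b, alpha)]
    # G: probability that a novel finding is communicated (bracket in Eq. 8)
    G = sum(w * (p + (1.0 - p) * c_n) for w, p in kinds)

    def inflow(w, p):
        x = np.zeros(2 * S + 1)
        x[i_pos] = w * p                   # novel positives, always communicated
        x[i_neg] = w * (1.0 - p) * c_n     # novel negatives, communicated at rate c_n
        return x

    f = [inflow(w, p) / G for w, p in kinds]    # start from novel findings only
    for _ in range(max_iter):
        new = []
        for (w, p), fk in zip(kinds, f):
            up, down = c_rp * p, c_rm * (1.0 - p)
            flow = -fk * (up + down)       # hypotheses leaving their current tally
            flow[1:] += fk[:-1] * up       # s-1 -> s after a positive replication
            flow[:-1] += fk[1:] * down     # s+1 -> s after a negative replication
            flow[-1] += fk[-1] * up        # reflecting upper boundary
            flow[0] += fk[0] * down        # reflecting lower boundary
            nk = fk + a * r * flow + a * (1.0 - r) * inflow(w, p)
            new.append(nk / (1.0 + a * (1.0 - r) * G))   # divide by n'/n from Eq. 8
        change = max(np.abs(nk - fk).max() for nk, fk in zip(new, f))
        f = new
        if change < tol:
            break
    return tallies, f[0], f[1]

tallies, f_true, f_false = steady_state()
i = np.flatnonzero(tallies == 1)[0]
print("precision at s = 1:", f_true[i] / (f_true[i] + f_false[i]))
```

Working with frequencies rather than counts avoids the exponential growth of $n$, and the activity rate $a$ only affects how quickly the iteration settles, not where it settles.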
We have analyzed this model using a variety of methods. First, we solved the model analytically for every structure except for targeted replication (to be defined later). Second, when analytical solution was not possible, we solved the model numerically. Third, we studied the model under both deterministic and stochastic simulations, written independently by both authors in different programming languages. All forms of analysis yield identical results.

The model above can be solved directly, in one of two ways. First, it can be solved exactly by bounding tallies within a minimum and maximum (using either absorbing or reflecting boundaries) and then solving the system of simultaneous equations for values of the state variables $f_{i,s}$ for $i \in \{T, F\}$. This approach is probably the most straightforward. Second, it can be solved to any level of approximation desired by iteratively solving the system of equations outward from an initial tally.

We also found the solution by educated guess; ansatz is what our mathematics instructors used to call it. Since the solutions from the brute-force approach looked like closures of infinite series, and the simulation results produced what resembled a mixture of geometric series, we guessed the underlying limiting distribution. We then verified our ansatz solution by plugging it back into the recursions and also by comparing it to numerical results and our previous solutions. Finally, we induced the infinite series representation by constructing Taylor series expansions of the closed series expressions, yielding the sequential terms of the solution expression in the next section.

3.1. Full communication solution.
Here we repeat the simplest such solution from the main text and then motivate it. The steady state proportion of hypotheses that are both true and have tally $s$, when all findings are communicated, is given by:

$$\hat{p}_{T,s} = b\,(1-r) \sum_{m=1}^{\infty} r^{m-1}\, K\big(m, (m+s)/2\big)\, (1-\beta)^{(m+s)/2}\, \beta^{(m-s)/2} \tag{9}$$

where $K(m, (m+s)/2)$ is the number of ways to get $(m+s)/2$ positive findings in $m$ investigations of the same hypothesis. This is simply the binomial chooser (the binomial coefficient), but implicitly evaluating to zero whenever $(m+s)/2$ is not an integer. Since $s$ is the difference between the number of positive and negative findings, this multiplicity accounts for the number of paths by which an hypothesis can be studied $m$ times and end up with a tally $s$. The remaining terms leading with $1-\beta$ and $\beta$ are just the probabilities of getting $(m+s)/2$ positive findings and $(m-s)/2$ negative findings, respectively.

Here’s how to motivate the above solution. For any given tally $s$, there are an infinite number of histories by which an hypothesis could have ended up with that tally.
• Consider tally $s = 1$, for example. If the hypothesis is true, it could end up most simply at $s = 1$ after a single positive finding, with probability $(1-r)\,b\,(1-\beta)$, indicating innovation times base rate of true hypotheses times the probability of an initial positive finding.
• Similarly, if instead the hypothesis has been studied twice, which happens $(1-r)\,b\,r$ of the time, the number of ways it could end up with $s = 1$ is $K(2, (2+1)/2) = 0$.
• For three studies, there are $K(3, 2) = 3$ ways to get $s = 1$: (1) $+ + -$, (2) $+ - +$, and (3) $- + +$. The probability of any one of these is $(1-\beta)^2 \beta$, and the probability that an hypothesis is true and has been studied three times is $(1-r)\,b\,r^2$.

The pattern here generalizes so that the total probability is just:
• the sum over number of studies on an hypothesis from $m = 1$ to $m = \infty$ of the probability the hypothesis was studied $m$ times, given by $(1-r)\,r^{m-1}$
• times the number of ways it could end up with a tally $s$ in $m$ steps, given by $K(m, (m+s)/2)$
• times the probability of getting $(m+s)/2$ positive and $(m-s)/2$ negative findings.

Writing down this summation and factoring out the common term $b(1-r)$ completes the expression.

This steady-state solution obviously assumes that there has been an infinite amount of research time, such that every $m$ can be realized. In practice, since the sequence is geometric in $r$, the probabilities of higher values of $m$ decline very rapidly and simulations confirm that steady-state is reached quite rapidly, as long as the replication rate $r$ is not close to $r = 1$. Like the steady-state solutions of any real dynamical system, they are never exactly realized. For example, problems in evolutionary theory are routinely solved by asking what happens on the infinite time horizon. Such solutions have been incredibly useful, despite the fact that no real population or environment is stationary enough to make the exercise literally sensible.

3.2. Arbitrary communication solution.
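Before generalizing, it may help to see that the full-communication series in Eq. (9) is easy to evaluate by truncating the sum over $m$. The following is a minimal sketch, not the authors' Mathematica code. The names `n_paths`, `p_hat`, and `precision`, the truncation point `m_max`, and the parameter values are assumptions chosen for illustration. False hypotheses are handled by the substitution $b \to 1-b$, $1-\beta \to \alpha$ from Section 1.

```python
from math import comb

def n_paths(m, s):
    """K(m, (m+s)/2): orderings of m findings whose tally is s, or 0 if none exist."""
    if abs(s) > m or (m + s) % 2:
        return 0
    return comb(m, (m + s) // 2)

def p_hat(s, weight, p_pos, r, m_max=300):
    """Eq. (9) truncated at m_max investigations.

    weight = b and p_pos = 1 - beta for true hypotheses;
    weight = 1 - b and p_pos = alpha for false hypotheses.
    """
    total = 0.0
    for m in range(1, m_max + 1):
        k = n_paths(m, s)
        if k:
            n_pos, n_neg = (m + s) // 2, (m - s) // 2
            total += r ** (m - 1) * k * p_pos ** n_pos * (1.0 - p_pos) ** n_neg
    return weight * (1.0 - r) * total

def precision(s, b, alpha, beta, r):
    """Share of hypotheses with tally s that are true, under full communication."""
    t = p_hat(s, b, 1.0 - beta, r)
    f = p_hat(s, 1.0 - b, alpha, r)
    return t / (t + f)

b, alpha, beta, r = 0.1, 0.05, 0.4, 0.2    # illustrative values, not from the paper
for s in (1, 2, 3):
    print("tally", s, "precision", precision(s, b, alpha, beta, r))
```

Because the terms decay geometrically in $r$, a few hundred terms are far more than needed unless $r$ is close to 1. Summing `p_hat` over all tallies should return the base rate $b$, which gives a quick sanity check on the truncation.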
When communication parameters are allowed to be less than one, the above strategy generalizes directly, but the expressions become much more complex, because now the infinite series is over multinomial probabilities of three possible outcomes at each replication investigation of an hypothesis: (1) positive and communicated, (2) negative and communicated, or (3) not communicated. In addition, when findings are not always communicated, then the effective activity rate changes, making other probabilities conditional on observable activity. Still, these solutions can be derived either by the logic to follow or by brute-force solution of the system of recursions. Solving the system of recursions does allow for easily defining reflecting or absorbing tally boundaries, which may be appealing in some contexts. The combinatoric solution to follow assumes unbounded tallies. Solutions in the bounded and unbounded cases are nearly identical, for all scenarios considered in the main text. The Mathematica notebooks in the supplemental materials present code for both types of solution.

We present the solutions here as a sequence of conditional probabilities, as we’ve found this form easier to interpret than the general multinomial form, and therefore more insightful. Specifically, we decompose the multinomial probabilities into a binomial series for observed/unobserved investigations of a hypothesis and a binomial series for positive/negative findings conditional on being observed.
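One way to see that this decomposition loses nothing is to write it out. The shorthand $p_+ = (1-\beta_R)\,c_{R+}$ and $p_- = \beta_R\, c_{R-}$ is introduced here for illustration and is not the authors' notation; it denotes the probabilities that a replication is communicated as positive or as negative. A replication then goes un-communicated with probability $q_\circ = 1 - p_+ - p_-$, which agrees with Eq. (17). For $m$ replications of which $u$ are un-communicated, $k$ are communicated positives, and $m-u-k$ are communicated negatives:

$$\frac{m!}{u!\,k!\,(m-u-k)!}\; q_\circ^{\,u}\, p_+^{\,k}\, p_-^{\,m-u-k} \;=\; \underbrace{\binom{m}{u}\, q_\circ^{\,u}\,(1-q_\circ)^{m-u}}_{\text{observed vs. unobserved}} \;\times\; \underbrace{\binom{m-u}{k}\, q_+^{\,k}\,(1-q_+)^{m-u-k}}_{\text{positive vs. negative, given observed}}, \qquad q_+ = \frac{p_+}{1-q_\circ}.$$

The identity holds because $1 - q_+ = p_-/(1-q_\circ)$, so the powers of $1-q_\circ$ cancel. The first factor has the form of $\Pr(u \mid m)$ in Eq. (16), and the second has the form of $S(z \mid n)$ in Eq. (18) with $n = m-u$ and $z = 2k - n$.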
The solutions take the form:

$$\hat{p}_{T,s} = \Pr(T)\,\Pr(\text{activity})\,\Pr(\text{new} \mid \text{activity}) \Big( (1-\beta)\,\Pr(s \mid +) + \beta\, c_{N-} \Pr(s \mid -) \Big) \tag{10}$$

Where:

$$\Pr(T) = b \tag{11}$$

$$\Pr(\text{activity}) = r + (1-r)\Big( b\big((1-\beta) + \beta c_{N-}\big) + (1-b)\big(\alpha + (1-\alpha)\, c_{N-}\big) \Big) \tag{12}$$

$$\Pr(\text{new} \mid \text{activity}) = \frac{(1-r)\Big( b\big((1-\beta) + \beta c_{N-}\big) + (1-b)\big(\alpha + (1-\alpha)\, c_{N-}\big) \Big)}{\Pr(\text{activity})} \tag{13}$$

The probabilities $\Pr(s \mid +)$ and $\Pr(s \mid -)$ give the probabilities of tally $s$ averaging over number of investigations $m$ and un-communicated findings $u$, beginning with either a positive finding or a negative finding, respectively. This conditioning is necessary because a tally $s$ can be reached by different paths once communication is partial. These probabilities are given by:

$$\Pr(s \mid +) = I_{1}(s) + \sum_{m=1}^{\infty} \sum_{u=0}^{m} R^{m}\, \Pr(u \mid m)\, S(s-1 \mid m-u) \tag{14}$$

$$\Pr(s \mid -) = I_{-1}(s) + \sum_{m=1}^{\infty} \sum_{u=0}^{m} R^{m}\, \Pr(u \mid m)\, S(s+1 \mid m-u) \tag{15}$$

where $I_a(b)$ is a function that returns 1 when $a = b$ and zero otherwise and $R = r / \Pr(\text{activity})$ is the probability of replication, conditional on activity as defined earlier. The term $\Pr(u \mid m)$ gives the probability of $u$ un-communicated findings in $m$ investigations, defined as:

$$\Pr(u \mid m) = \frac{m!}{u!\,(m-u)!}\; q_\circ^{\,u}\,(1-q_\circ)^{m-u} \tag{16}$$

where

$$q_\circ = (1-\beta_R)(1-c_{R+}) + \beta_R\,(1-c_{R-}) \tag{17}$$

is the probability a replication finding is un-communicated, averaging over positive and negative findings. Finally, the function $S(z \mid n)$ provides the probability that a sequence of $n$ communicated replication findings produces a difference $z$ between positive and negative replications. It is defined as:

$$S(z \mid n) = \begin{cases} I_0(z) & \text{if } n = 0 \\ K\big(n, (n+z)/2\big)\; q_+^{(n+z)/2}\,(1-q_+)^{(n-z)/2} & \text{if } n > 0 \end{cases} \tag{18}$$

where $K(a, b)$ is again the binomial chooser function, but evaluating to zero when $b$ is not an integer, and:

$$q_+ = \frac{(1-\beta_R)\, c_{R+}}{1-q_\circ} \tag{19}$$

which is the probability of a positive replication, conditional on the replication finding being communicated.

4. APPROXIMATE CONDITIONS FOR REDUCED COMMUNICATION
We argue in the main text that full communication is rarely optimal, from the perspective of precision. Consider the full communication context: $c_{N-} = c_{R-} = c_{R+} = 1$. For small $b$ ($b \approx 0$) and small $r$ ($r \approx 0$), precision is improved by:
• $c_{N-} < 1$ when $\alpha < \beta$ (easy to satisfy)
• $c_{R-} < 1$ when $\alpha >$ […]
• $c_{R+} < 1$ when $\beta - \alpha \le 1/4$
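The direction of each effect can be probed numerically before turning to the derivation. The sketch below takes the bounded-tally route: setting $f' = f$ in recursions (5) to (8) gives, for each tally, the linear equation $f_s\,[(1-r)G + r(u + d)] = r\,(u\, f_{s-1} + d\, f_{s+1}) + (1-r)\,\text{inflow}_s$, where $G$ is the bracket in Eq. (8) and $u$, $d$ are the probabilities of a communicated positive or negative replication. This rearrangement, the function name `ppv_at_1`, the truncation at $\pm S$, and all parameter values are assumptions made for illustration. Replications use the same $\alpha$ and $\beta$ as initial studies, as in Eq. (5).

```python
import numpy as np

def ppv_at_1(b, alpha, beta, r, c_n=1.0, c_rp=1.0, c_rm=1.0, S=40):
    """Steady-state precision at tally s = 1, with tallies truncated to [-S, S]."""
    # G: probability that a novel finding is communicated (bracket in Eq. 8)
    G = b * (1 - beta + beta * c_n) + (1 - b) * (alpha + (1 - alpha) * c_n)
    size = 2 * S + 1
    at_one = []
    for w, p in ((b, 1 - beta), (1 - b, alpha)):       # true, then false hypotheses
        up, down = c_rp * p, c_rm * (1 - p)            # communicated +/- replications
        A = np.eye(size) * ((1 - r) * G + r * (up + down))
        A -= r * up * np.eye(size, k=-1)               # arrivals from tally s - 1
        A -= r * down * np.eye(size, k=1)              # arrivals from tally s + 1
        rhs = np.zeros(size)
        rhs[S + 1] = (1 - r) * w * p                   # novel positives enter at s = +1
        rhs[S - 1] = (1 - r) * w * (1 - p) * c_n       # novel negatives enter at s = -1
        at_one.append(np.linalg.solve(A, rhs)[S + 1])  # frequency at s = +1
    return at_one[0] / (at_one[0] + at_one[1])

b, alpha, beta, r = 0.01, 0.05, 0.4, 0.05   # illustrative values with small b and r
full = ppv_at_1(b, alpha, beta, r)
for name in ("c_n", "c_rp", "c_rm"):
    reduced = ppv_at_1(b, alpha, beta, r, **{name: 0.9})
    print(name, "reduced to 0.9 changes precision by", reduced - full)
```

With these values, lowering the communication of novel negatives or of positive replications should raise precision at $s = 1$, while lowering the communication of negative replications should reduce it. The effect of $c_{N-}$ is much smaller than the other two, since it only enters at second order in $r$.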
These conditions are derived by first defining precision at $s = 1$, which is the most conservative precision to investigate, because it benefits the least from replication, and higher tallies always have higher precision than $s = 1$. We therefore focus on improvements at $s = 1$. Let PPV be the precision at $s = 1$. Then the first condition is proved by computing the derivative $\partial\,\mathrm{PPV} / \partial c_{N-}$, evaluated at full communication parameter values. Then Taylor expand the result simultaneously to second order around $r = 0$ and $b = 0$. Neglecting terms of order $O(b^2)$ and $O(r^3)$ and higher:

$$\frac{\partial\,\mathrm{PPV}}{\partial c_{N-}} \approx -r^2\, \frac{1-\beta}{\alpha}\; b\,(\beta-\alpha)(1-\beta-\alpha)(5-6\alpha) \tag{20}$$

which is negative unless $\alpha > \beta$. Thus suppressing some initial negative findings is favorable, provided the base rate is small and replication is not too common. We think most scientific fields satisfy these conditions, but reasonable people can and do disagree on that point.

In contrast, suppressing negative replications is unlikely to help. By the same strategy, but this time differentiating with respect to $c_{R-}$:

$$\frac{\partial\,\mathrm{PPV}}{\partial c_{R-}} \approx r\,b\, \frac{1-\beta}{\alpha}\,(1-\beta-\alpha)\big(1 + 2r(\beta-\alpha)\big) \tag{21}$$

which is guaranteed positive, indicating that $c_{R-} = 1$ is favored, as long as $\alpha \le 1/2$ and $1-\beta > \alpha$.

The third condition is derived similarly:

$$\frac{\partial\,\mathrm{PPV}}{\partial c_{R+}} \approx -b\,r\, \frac{1-\beta}{\alpha}\,(1-\beta-\alpha)\big(1 - 4r(\beta-\alpha)\big) \tag{22}$$

The last term is the one in play. For the above to be negative, it is required that:

$$r < \frac{1}{4}\,\frac{1}{\beta-\alpha} \tag{23}$$

And this is guaranteed when $\beta - \alpha \le 1/4$.

REFERENCES
1. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med. 2005 Aug;2(8):e124. Available from: http://dx.doi.org/10.1371/journal.pmed.0020124.
2. Makel MC, Plucker JA, Hegarty B. Replications in Psychology Research: How Often Do They Really Occur? Perspectives on Psychological Science. 2012;7(6):537–542. Available from: http://pps.sagepub.com/content/7/6/537.
3. Franco A, Malhotra N, Simonovits G. Publication bias in the social sciences: Unlocking the file drawer. Science. 2014;345:1502–1505.
4. Schmidt S. Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology. 2009;13(2):90–100.
5. Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012;483:531–533.
6. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011;10(9):712–712.
7. Sullivan PF. Spurious Genetic Associations. Biological Psychiatry. 2007 May;61(10):1121–1126.
8. Fontani M, Costa M, Orna MV. The Lost Elements: The Periodic Table’s Shadow Side. Oxford University Press; 2014.
9. Bissell M. Reproducibility: The risks of the replication drive. Nature. 2013;503:333–334.
10. Bohannon J. Replication effort provokes praise—and ‘bullying’ charges. Science. 2014;344:788–789.
11. Kahneman D. A new etiquette for replication. Social Psychology. 2014;45:310–311.
12. Schnall S. Clean data: Statistical artefacts wash out replication efforts. Social Psychology. 2014;45(4):315–320.
13. Rosenthal R. The file drawer problem and tolerance for null results. Psychological Bulletin. 1979;86(3):638–641.
14. Hull DL. Science as a Process: An Evolutionary Account of the Social and Conceptual Development of Science. Chicago, IL: University of Chicago Press; 1988.
15. O’Rourke K, Detsky AS. Meta-analysis in medical research: Strong encouragement for higher quality in individual research efforts. Journal of Clinical Epidemiology. 1989;42(10):1021–1024.
16. Campbell DT. Toward an epistemologically-relevant sociology of science. Science, Technology, & Human Values. 1985;10(1):38–48.
17. Popper K. Conjectures and Refutations: The Growth of Scientific Knowledge. New York: Routledge; 1963.
18. Kitcher P. Reviving the Sociology of Science. Philosophy of Science. 2000;67:S33–S44.
19. Levins R. The Strategy of Model Building in Population Biology. American Scientist. 1966;54.
20. Wimsatt WC. False Models as means to Truer Theories. In: Nitecki M, Hoffman A, editors. Neutral Models in Biology. London: Oxford University Press; 1987. p. 23–55.
21. Munroe R. “Significant”: http://xkcd.com/882/; 2014. Available from: http://xkcd.com/882/ [cited 2014].
22. Simmons JP, Nelson LD, Simonsohn U. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science. 2011;22(11):1359–1366. Available from: http://pss.sagepub.com/content/22/11/1359.
23. Cox RT. Probability, Frequency and Reasonable Expectation. American Journal of Physics. 1946;14:1–10.
24. Sedlmeier P, Gigerenzer G. Do studies of statistical power have an effect on the power of studies? Psychological Bulletin. 1989;105(2):309–316.
25. Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14(5):365–376.
26. Gelman A, Loken E. Ethics and Statistics: The AAA Tranche of Subprime Science. CHANCE. 2014;27(1):51–56. Available from: http://amstat.tandfonline.com/doi/abs/10.1080/09332480.2014.890872.
27. Smaldino PE, Calanchini J, Pickett CL. Theory development with agent-based models. Organizational Psychology Review. 2015;in press.
28. Gelman A, Loken E. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University; 2013.
29. Bakker M, Wicherts JM. Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations. Psychological Methods. 2014;19:409–427.
30. Mitchell J. On the emptiness of failed replications; 2014. Available from: http://wjh.harvard.edu/~jmitchel/writing/failed_science.htm.
31. American Political Science Association Task Force on Public Engagement. Increasing the credibility of political science research: A proposal for journal reforms; 2014.
32. Popper K. The Myth of the Framework: In Defence of Science and Rationality. Routledge; 1996.
33. Henrich J, Ensminger J, McElreath R, Barr A, Barrett C, Bolyanatz A, et al. Markets, Religion, Community Size, and the Evolution of Fairness and Punishment. Science. 2010;327:1480–1484. Available from: /rmpubs/henrichetalfairnessmarketsreligiongroupsizeScience2010.pdf.
34. Scott IM, Clark AP, Josephson SC, Boyette AH, Cuthill IC, Fried RL, et al. Human preferences for sexually dimorphic faces may be evolutionarily novel. Proceedings of the National Academy of Sciences. 2014;111(40):14388–14393. Available from: