Reputation and Impact in Academic Careers
Alexander M. Petersen, Santo Fortunato, Raj K. Pan, Kimmo Kaski, Orion Penner, Armando Rungi, Massimo Riccaboni, H. Eugene Stanley, Fabio Pammolli
Laboratory for the Analysis of Complex Economic Systems, IMT Institute for Advanced Studies Lucca, 55100 Lucca, Italy
Department of Biomedical Engineering and Computational Science, Aalto University School of Science, P.O. Box 12200, FI-00076, Finland
Laboratory of Innovation Management and Economics, IMT Institute for Advanced Studies Lucca, 55100 Lucca, Italy
Department of Managerial Economics, Strategy and Innovation, Katholieke Universiteit Leuven, 3000 Leuven, Belgium
Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts 02215, USA
(Dated: July 7, 2018)

Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication’s citation rate $\Delta c$ depends on the reputation of its central author $i$, in addition to its net citation count $c$. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly-cited scientists, using the total citations $C_i$ of each scientist as his/her reputation measure. We find a citation crossover $c_\times$ which distinguishes the strength of the reputation effect. For publications with $c < c_\times$, the author’s reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in $C_i$. However, the reputation effect becomes negligible for highly cited publications, meaning that for $c \geq c_\times$ the citation rate measures scientific impact more transparently.
In addition we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.

Published in: Proceedings of the National Academy of Sciences USA, 2014. DOI: 10.1073/pnas.1323111111. Send correspondence to: [email protected], santo.fortunato@aalto.fi, or [email protected]

Citation counts are widely used to judge the impact of both scientists and their publications [1–4]. While it is recognized that many factors outside the pure merit of the research or the authors influence such counts, little effort has been devoted to identifying and quantifying the role of author-specific factors. Recent investigations have begun to study the impact that individual scientists have through collaboration and reputation spillovers [5, 6], two integrative features of scientific careers that contribute to cumulative advantage [7–9]. However, the majority of citation models avoid author-specific effects, mainly due to the difficulty in acquiring comprehensive disambiguated career data [10–13]. As these measures are becoming increasingly common in evaluation scenarios throughout science, it is crucial to better understand what the citation measures actually represent in the context of scientists’ careers. Moreover, how does reputation affect a scientist’s access to key resources, the incentives to publish quality over quantity, and other key decisions along the career path [14–18]? And what role does reputation play in the “mentor matching” process within academic institutions, in the effectiveness of single/double blinding in peer-review, and in the reward system of science [14, 15, 19]?

It is against this background that we have developed a quantitative framework with the goal of isolating the effect of author reputation upon citation dynamics.
Specifically, by controlling for time- and author-specific factors, we quantify the role of author reputation on the citation life cycle of individual publications at the micro level. We use a longitudinal career dataset from Thomson Reuters Web of Science comprising 450 highly-cited scientists, 83,693 articles and 7,577,084 citations tracked over 387,103 publication years. Dataset [A] refers to 100 top-cited physicists, [B] to another set of 100 highly prolific physicists, [C] to 100 assistant professors in physics, [D] to 100 top-cited cell biologists, and [E] to 50 top-cited pure mathematicians (for further data elaboration see the Supporting Information (SI) Appendix). For each central scientist $i$ we analyze the scientific production measured by the number $n_i(t)$ of publications published in year $t$, the cumulative number of citations $c_{i,p}(t)$ received by publication $p$, and our quantitative reputation measure defined here as the net citations aggregated across all publications, $C_i(t) = \sum_p c_{i,p}(t)$.

We begin with a description of our reputation model, followed by an empirical analysis of career trajectories, establishing $C_i$ as a good quantitative measure of reputation. We then establish quantitative benchmarks from the citation distribution within individual publication portfolios and also quantify features of the citation life-cycle, both of which are crucial components of our reputation effect model. Combining several empirical features of our analysis, we then investigate the role of the reputation effect, showing that author reputation accounts for a significant fraction of the citation rate of young publications, thus providing a testable mechanism underlying cumulative advantage in science [7–9]. And finally, we develop a stochastic Monte Carlo reputation model which matches the micro- and macroscopic citation dynamics.

FIG. 1: Quantifying cumulative reputation measures and citation dynamics. (A,B) Growth trajectories of the cumulative publications $N'(t)$ and citations $C'(t)$, appropriately rescaled to start from unity in each ordinate. The characteristic $\alpha$ and $\zeta$ exponents shown in each legend are calculated over the growth phase of the career. The mathematicians [E] have distinct career trajectories, with $\alpha \approx 1$, since collaboration spillovers via division of labor likely play a smaller role in publication rate growth. See Tables S1–S9 for $\alpha_i$ and $\zeta_i$ values calculated for individual careers. (C) Relation between $\tau_{1/2}$ and cumulative citations $c_p$. (D) Preferential attachment dynamics with $\pi \approx 1$ break down for $c < c_\times$. The reputation effect provides a citation boost above the baseline preferential attachment citation rate attributable to $c_p(t)$ only.

Results
Reputation signaling.
Academic career growth is a complex process emerging from the institutional, social, and cognitive aspects of science. Conceptually, each career $i$ is embedded in two fundamental networks which are interconnected: the nodes in the first network represent scientists and in the second network represent publications. The links within the first network represent collaborations between scientists, and within the second network they represent citations between publications; the cross-links represent the associations between individuals and their publications.

Since these networks are dynamic, it is difficult to fully understand for any given individual, let alone the entire system, the complex information contained by all associations. As a result, scientific reputation has emerged as a key signaling mechanism to address the dilemma of excessive information that arises, for example, in the task of evaluating, comparing, and ranking publication profiles in academic competitions. Reputation signals can flow between scientists, $j \leftrightarrow k$, between publications, $p \leftrightarrow q$, and between a publication and a scientist, $p \leftrightarrow i$. The focus of our analysis is on this latter dependency, $i \to p$, whereby author reputation can impact the citation rate of his/her publications, generating subsequent reputation feedback, $p \to i$.

Reputation plays an important role as a signal of trustworthiness and quality, a role which addresses directly the “agency problem” characterizing the reward system in science [14]. Moreover, reputation signaling in scientific networks is used to overcome information asymmetries between scientists and other academic agents; in this role it will become increasingly important as the rate of science publication grows and scientists have less time to absorb relevant advancements [14, 19–21].
With little time to read every paper on a given topic, this trustworthiness signal is anecdotally consistent with the common practice of perusing the author names when preliminarily evaluating the relevance of a newly-found publication. In the past, an author’s identity and associated reputation were mainly linked to reference lists and personal interactions. Nowadays, an author’s reputation is becoming increasingly visible through searchable publication databases, laboratory websites, press, and other media, in addition to citations.

We measure the author reputation by $C_i(t)$, which measures not only the number of times his/her $N_i(t)$ publications have been referenced (an indication of overall scientific impact), but also the number of appearances of his/her name in the literature, thereby providing a name-association visibility. What $C_i$ does not account for is intrinsic research quality, e.g. the quality ratio $C_i/N_i$ is broadly distributed across scientists. Since quantitative proxies for quality are limited to citation counts, it is presently difficult to distill the role played by quality in assessing overall scientific impact.

By analyzing the top scientists, we reduce the compound reputation effect occurring when two or more scientists of comparable reputation are coauthors on a publication, a scenario where it may be difficult to estimate the differential impact of these scientists on the citation rate. Because a full treatment would require author name disambiguation and career data for all coauthors $j$, we assume that a majority of the reputation signal is attributable to the central scientist $i$ by the approximation $C(t) \approx \sum_j C_j(t) \approx C_i(t)$. Also, by analyzing top-cited cohorts, we can establish an upper bound to the strength of the reputation effect. We note that $C_i$ possibly discounts the role of mentor reputation effects early in the career [22]. Nevertheless, by analyzing top scientists, the signaling advantage received early in their careers by associating with prestigious mentors/coauthors should be negligible over the long run [20].

FIG. 2: Quantitative patterns in the growth and size-distribution of the publication portfolio for scientists from 3 disciplines. (Left) $c_{i,p}(t)$ for each author’s most cited papers (colored according to net citations in 2010) along with $C_i(t) \sim t^{\zeta_i}$ (dashed black curve). (Right) The evolution of each author’s rank-citation profile using snapshots taken at 5 year intervals. The darkest blue data points represent the most recent $c_i(r,t)$, and the subset of red data points indicate the logarithmically spaced data values used to fit the empirical data to our benchmark DGBD rank-citation distribution model [4] (solid black curve, see SI Appendix). The intersection of $c_i(r,t)$ with the dashed black line corresponds to the author’s $h$-index $h_i(t)$.

To measure the role of author reputation vis-à-vis publication impact, we use a regression model that relates the increase in the number of citations $\Delta c_{i,p}(t+1)$ for a given paper $p$ in year $t+1$ to three explanatory variables: (i) the role played by the net number of citations $c_p(t)$ accrued up to paper age $\tau_p$, quantified by the power-law regression parameter $\pi$; (ii) the role of publication age and the obsolescence of knowledge, quantified by the exponential regression parameter $\tau$; and (iii) the role of author reputation $C_i(t)$, quantified by the power-law regression parameter $\rho$.

Together, these three features are (i) the publication citation effect $\Pi_p(t) \equiv [c_p(t)]^{\pi}$, (ii) the life cycle effect $A_p(\tau) \equiv \exp[-\tau_p/\tau]$, and (iii) the author reputation effect $R_i(t) \equiv [C_i(t)]^{\rho}$. We perform a multivariate regression to estimate the $\pi$, $\tau$, and $\rho$ values which parameterize the citation model,

$$\Delta c_{i,p}(t+1) \equiv \eta \times \Pi_p(t) \times A_p(\tau) \times R_i(t), \qquad (1)$$

with the multiplicative log-normal noise term $\eta$.
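Taking logarithms of Eq. 1 gives a linear form, $\ln \Delta c = \ln \eta + \pi \ln c_p - \tau_p/\tau + \rho \ln C_i$, so the three parameters can be estimated by ordinary least squares. The following is a minimal sketch under stated assumptions, not the regression specification used in the paper: the function names, the synthetic data generator, and the use of plain OLS without fixed effects are all illustrative choices, and observations with zero citations (where the logarithm is undefined) are assumed to have been excluded beforehand.

```python
import numpy as np

def fit_citation_model(delta_c, c_p, age, C_i):
    """OLS fit of ln(delta_c) = a + pi*ln(c_p) - age/tau + rho*ln(C_i).

    All inputs are 1-D arrays of strictly positive values (hypothetical
    layout: one entry per publication-year observation).
    """
    y = np.log(delta_c)
    X = np.column_stack([np.ones_like(y), np.log(c_p), age, np.log(C_i)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    _, pi, age_slope, rho = coef
    # The coefficient on age equals -1/tau in the log form of Eq. 1.
    return {"pi": pi, "tau": -1.0 / age_slope, "rho": rho}

def synthetic_sample(n, pi, tau, rho, sigma, seed=0):
    """Draw illustrative observations from Eq. 1 with log-normal noise eta."""
    rng = np.random.default_rng(seed)
    c_p = rng.integers(1, 200, size=n).astype(float)      # citations so far
    age = rng.integers(1, 30, size=n).astype(float)       # paper age tau_p
    C_i = rng.integers(100, 50000, size=n).astype(float)  # author reputation
    eta = rng.lognormal(mean=0.0, sigma=sigma, size=n)
    delta_c = eta * c_p**pi * np.exp(-age / tau) * C_i**rho
    return delta_c, c_p, age, C_i

if __name__ == "__main__":
    data = synthetic_sample(5000, pi=0.4, tau=5.0, rho=0.2, sigma=0.3)
    print(fit_citation_model(*data))
```

Fitting the model separately on observations with $c_p < c_\times$ and $c_p \geq c_\times$ would give the two parameter sets that the paper compares.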
In the SI Appendix we perform an additional fixed effects regression using year as well as author variables to better control for the overall growth in scientific output across time. In order to fully justify our reputation effect model, in what follows, we first account for two key features: measures for cumulative career reputation and obsolescence features of the citation life-cycle.

Patterns of growth for longitudinal reputation measures.
In this section we investigate the patterns of cumulative publication and citation growth across the career. A striking statistical pattern observed for top scientists is the faster than linear growth in time, both in cumulative publication number $N_i(t) \equiv \sum_{t'=1}^{t} n_i(t')$ and in cumulative citation count $C_i(t) \equiv \sum_{p=1}^{N_i(t)} c_{i,p}(t)$, for a large part of a scientist’s “growth phase,” which we find to be ≈ years after their first publication. Figures 1(A&B) show the characteristic growth trajectories $\langle N'(t) \rangle \sim t^{\alpha}$ and $\langle C'(t) \rangle \sim t^{\zeta}$, calculated by an appropriate average over individual $N_i(t)$ and $C_i(t)$, respectively. To facilitate visual comparison, we use arbitrary normalized ordinate units so that each curve starts from the same point, $\langle N'(1) \rangle = \langle C'(1) \rangle \equiv 1$. The growth trajectories are characterized by superlinear algebraic growth, with $\alpha \gtrsim 1$ and $\zeta > \alpha$ (values shown in Fig. 1). Individual exponents $\alpha_i$ and $\zeta_i$ are also calculated for the $N_i(t)$ and $C_i(t)$ of each author (in addition to multiple other quantitative measures, see SI Appendix, Tables S1–S9). We averaged both $\alpha_i$ and $\zeta_i$ within each dataset, finding that $\langle \alpha_i \rangle \cong \alpha$ and $\langle \zeta_i \rangle \cong \zeta$, which confirms that the aggregate patterns hold at the individual scale.

In the SI Appendix we control for the exponential growth in scientific publication rates which can contribute to the longitudinal growth in $C_i(t)$. We define “deflated” citation counts $\Delta c^{D}_{i,p}(t) \equiv \Delta c_{i,p}(t)/D(t)$, which are normalized by the number of publications $D(t)$ within a given discipline (since a new publication can cite an old publication only once). For each discipline we observe a 5% exponential growth (per year) in $D(t)$ over the last half century.
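The growth exponent and the deflation step described above can be sketched numerically. This is a hedged illustration, not the procedure of the SI Appendix: the function names are hypothetical, the exponent is taken as a plain least-squares slope on log-log axes, and the deflator is assumed to be $D(t) \propto \exp(0.05\,t)$ normalized to 1 in the first career year.

```python
import numpy as np

def growth_exponent(cumulative):
    """Least-squares slope of ln(cumulative) against ln(t), with t = 1..T."""
    cumulative = np.asarray(cumulative, dtype=float)
    t = np.arange(1, len(cumulative) + 1)
    slope, _ = np.polyfit(np.log(t), np.log(cumulative), 1)
    return slope

def deflate(delta_c, growth_rate=0.05):
    """Divide annual citation counts by an assumed deflator D(t).

    D(t) = exp(growth_rate * t) with D = 1 in the first year; the
    normalization is an assumption made for this sketch.
    """
    delta_c = np.asarray(delta_c, dtype=float)
    t = np.arange(len(delta_c))
    return delta_c / np.exp(growth_rate * t)

if __name__ == "__main__":
    t = np.arange(1, 31)
    C = t ** 2.5                                 # synthetic C_i(t) ~ t^zeta
    annual = np.diff(C, prepend=0.0)             # annual increments
    C_deflated = np.cumsum(deflate(annual))
    print(growth_exponent(C), growth_exponent(C_deflated))
```

In this synthetic example the deflated trajectory has a smaller fitted exponent than the raw one, which is the qualitative effect the deflator is meant to capture.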
After deflating each $C_i(t)$, the net effect is only to reduce the estimated $\zeta_i$ values by roughly 15%, meaning that the growth exponents $\zeta_i$ ≳ reflect significant growth above the underlying baseline growth trend in science.

Hence, we use $C_i(t)$ as a quantitative measure of reputation owing to the fact that the time dependence is readily quantified by a single parameter $\zeta_i$. We also use the power-law scaling of $C_i(t)$ as a benchmark for the stochastic career model we develop in the final section. Figure 2 shows two additional empirical benchmarks: (a) the microscopic citation dynamics of individual publications comprising the publication portfolio and (b) the rank-citation profile, which is the Zipf distribution of the publications ranked in decreasing order of citations and indexed by rank $r$, $c_i(1) \geq c_i(2) \geq \cdots \geq c_i(N_i)$. We confirm that the individual curves $c_i(r)$ belong to the class of the discrete generalized beta distributions (DGBD), which in the general form reads $c(r) \propto r^{-\beta}(N+1-r)^{\gamma}$ [4]. We validate the DGBD fits using the $\chi^2$ test (see SI Appendix), also using $\beta_i$ and $\zeta_i$ as quantitative benchmarks for our MC model.

FIG. 3: The citation life cycle reflects both the intrinsic pace of discovery and the obsolescence rate of new knowledge, two features which are discipline dependent. (Left panels) For each of three disciplines, the averaged citation trajectory $\langle \Delta c'(\tau) \rangle$ is calculated for papers in the $n$-th quintile with the corresponding citation range indicated in each legend. For example, for physicists in dataset [A], the top 20% of papers have between 74 and 17,032 citations, and the papers in percentile 21–40 have between 31 and 73 citations. (Right panels) $\langle \Delta c'(\tau) \rangle$ calculated for rank-ordered groups of papers (listed in each legend) for 3 authors chosen from each discipline.

Variability in the citation life-cycle.
Important scientific discoveries can cause paradigm shifts and significantly boost the reputation of scientists associated with the discovery [23]. In order to measure the reputation effect, one must also account for obsolescence features of scientific knowledge. It is also important to account for the variations in scientific impact, since most publications report results that are not seminal breakthroughs, but, rather, report incremental advances that are likely to have relatively short-term relevance.

In this section we analyze the dynamics of the citation trajectory $\Delta c_p(\tau)$, the number of new citations received in publication year $\tau$, where $\tau$ is the number of years since the publication was first cited. We analyze $\Delta c_p(\tau)$ at two levels of aggregation: (i) For each discipline, we calculate an averaged $\Delta c_p(\tau)$ by collecting publications with similar total citation counts $c_p$. To achieve a scaled trajectory that is better suited for averaging we normalize each individual $\Delta c_p(\tau)$ by its peak citation value, $\Delta c'_p(\tau) \equiv \Delta c_p(\tau)/\mathrm{Max}[\Delta c_p(\tau)]$. In Fig. 3, the panels on the left show the characteristic citation trajectory of publications belonging to each of the 5 quintiles of each disciplinary citation distribution. Each curve represents the average trajectory $\langle \Delta c'(\tau) \rangle \equiv N_q^{-1} \sum_p \Delta c'_p(\tau)$ calculated from the $N_q$ publications in quintile $q$. (ii) For each career $i$, we calculate $\langle \Delta c'_i(\tau) \rangle$ by averaging over groups of ranked citation sets within his/her publication portfolio. The panels on the right of Fig. 3 show that even within prestigious careers, there is significant variation in the citation life cycle.

At both levels of aggregation, the impact life cycle typically peaks before publication age $\tau$ ≈ years. Counterexamples likely correspond to publications which receive delayed secondary attention, e.g. receiving subsequent experimental validation of a previous theoretical prediction, and vice versa. We define the half-life $\tau_{1/2}$ as the time to reach half the peak citation rate, $\Delta c'(\tau_{1/2}) = 1/2$, in the decay phase. Papers in the theoretical domains of mathematics and physics can exhibit $\tau_{1/2}$ > years. Remarkably, some top mathematics publications even have $\tau_{1/2}$ that span nearly the entire data sample duration of years, reflecting the indisputable and foundational nature of “progress by proof.” This is in contrast to top-cited cell biology publications, where, even for the top 20% of most cited works, the value $\tau_{1/2}$ ≈ years. This relatively short decay timescale likely arises from the large scale of research output in bio-medical fields, which leads to a significantly higher discovery rate, and likewise, a relatively faster obsolescence rate.

The relation between the decay time scale $\tau_{1/2}$ and $c_p$ provides insight into the knowledge diffusion rate. Fig. 1(C) shows an approximate scaling relation $\tau_{1/2} \sim c_p^{\Omega}$ when grouping publications into logarithmically spaced $c_p$ bins. Physics and biology differ mainly for the highly cited publications with $c_p$ ≳ , whereas mathematics shows larger variation in $\tau_{1/2}$ per citation. The $\Omega$ value provides an approximate relation between citations and time. In mathematics $\tau_{1/2} \propto c_p$, indicating that the impact is distributed roughly uniformly across time. However, for biology publications the sub-linear relation with $\Omega$ ≈ . indicates that for two publications, one with twice the citation impact as the other, the more cited publication gained twice the number of citations over a time period $\tau_{1/2}$ that was less than twice as large as the $\tau_{1/2}$ of the less-cited publication. The differences in $\Omega$ are possibly related to discipline-dependent bursts in technological advancement, funding initiatives [15], and other social aspects of science that are related to non-linearities in scientific advancement.
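The peak normalization, the half-life $\tau_{1/2}$, and the scaling exponent $\Omega$ defined above can be illustrated with a short sketch. The function names are hypothetical, publication age $\tau$ is assumed to be counted from 1 in the first year of the trajectory, and the exponent is fitted directly on the supplied points rather than on logarithmically spaced bins as in the paper.

```python
import numpy as np

def normalized_trajectory(delta_c):
    """Rescale an annual citation trajectory by its peak value."""
    delta_c = np.asarray(delta_c, dtype=float)
    return delta_c / delta_c.max()

def half_life(delta_c):
    """Publication age at which the normalized rate first falls to 1/2
    after the peak (decay phase). Returns None if that never happens
    within the observed trajectory."""
    norm = normalized_trajectory(delta_c)
    peak = int(np.argmax(norm))
    for k in range(peak + 1, len(norm)):
        if norm[k] <= 0.5:
            return k + 1  # ages are counted from 1
    return None

def scaling_exponent(c_p, tau_half):
    """Slope Omega of ln(tau_half) against ln(c_p)."""
    slope, _ = np.polyfit(np.log(c_p), np.log(tau_half), 1)
    return slope

if __name__ == "__main__":
    print(half_life([2, 10, 8, 6, 5, 3, 1]))  # 5
```

A trajectory that is still rising at the end of the observation window has no measurable half-life under this definition, which is why the function returns `None` in that case.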
TABLE I: Best-fit parameters for each effect (± std. errors), both for individual careers and the average values (± std. dev.) calculated within each disciplinary dataset. The three features of the citation model are parameterized by the publication citation effect ($\pi$), the life-cycle effect ($\tau$), and the reputation effect ($\rho$). Columns: $\pi_i$ (paper), $\tau_i$ (life cycle), and $\rho_i$ (reputation), each reported separately for $c < c_\times$ and for $c \geq c_\times$. Rows: Gossard, AC; Barabási, AL; Ave. ± Std. Dev. [A]; Baltimore, D; Laemmli, UK; Ave. ± Std. Dev. [D]; Serre, JP; Wiles, A; Ave. ± Std. Dev. [E]. For statistical significances see SI Appendix Tables S10–S22.

Baseline citation model.
To provide an initial test for basic mechanistic differences between the citation dynamics of highly-cited publications and less-cited publications, in this subsection we analyze the relation between $\Delta c_p(t+1)$ and $c_p(t)$ representing the standard baseline preferential attachment (PA) model (corresponding to the limit $\tau \to \infty$ and $\rho = 0$). Grouping together papers by $c_p(t)$ (using logarithmic bins), we calculate for each group the mean number of new citations in the following year, $\langle \Delta c_p(t+1) \rangle$. Fig. 1(D) shows the empirical relation for physicists in datasets [A/B], indicating that publications with citations above a gradual but substantial citation crossover value $c_\times$ obey a distinct scaling law that matches approximately linear ($\pi \approx 1$) preferential attachment dynamics (see SI Appendix, Fig. S8, for other disciplines). However, below $c_\times$, the citation rates are in excess of the citation rate expected from linear preferential attachment alone, reflecting the citation premium that can be achieved via reputation.

Quantifying the role of the reputation effect.
By analyzing the publications of highly-cited scientists, we have shown that the basic citation dynamics above and below the citation crossover value $c_\times$ vary considerably. In this subsection we investigate the role played by the reputation effect for publications with $c_p(\tau) \geq c_\times$ compared to publications with $c_p(\tau) < c_\times$. Based upon the assessment of the growth dynamics (see SI Appendix, Figs. S8 and S9), we choose the crossover values $c_\times$ ≡ [A/B], $c_\times$ ≡ [C], $c_\times$ ≡ [D], and $c_\times$ ≡ [E]. Our results are not strongly dependent on reasonable variations around our choice of $c_\times$. Table I shows the $\pi_i$, $\tau_i$, and $\rho_i$ estimates, above and below $c_\times$, for the individual careers highlighted in Figs. 1 and 3. For tables of the regression values aggregating over all careers in each disciplinary dataset see SI Appendix Tables S10–S13, and for the values for all 450 scientists analyzed individually see SI Appendix Tables S14–S22. The estimated model values are consistent when comparing between aggregated disciplinary datasets and individual career datasets. Interestingly, we find that mathematicians exhibit relatively high life-cycle exponents $\tau_i$ as compared to physicists and biologists, consistent with the empirical trajectories shown in Fig. 3. However, the reputation effect $\rho_i$ is less prominent in mathematics, possibly related to features of small team sizes and axiomatic discoveries which may decrease the role of reputation effects in conveying prestige signals.

Our main result is a robust pattern of role switching by author- and publication-specific effects, specifically

$$\rho(c < c_\times) > \rho(c \geq c_\times) \quad \text{and} \quad \pi(c < c_\times) < \pi(c \geq c_\times). \qquad (2)$$

For example, for the aggregate dataset [A/B] representing prolific physicists, we estimate the values ρ(c < ) ≈ . , ρ(c ≥ ) ≈ , π(c < ) ≈ . , and π(c ≥ ) ≈ .

To emphasize the role of reputation on new publications, consider two scientists separated by a factor of 10 in their cumulative citations, $C_1(t) = 10\,C_2(t)$.
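Under Eq. 1, if two publications have the same citation count and age, the ratio of their citation rates depends only on the reputation ratio, $\Delta c_1/\Delta c_2 = (C_1/C_2)^{\rho}$. The short sketch below works through this arithmetic for a tenfold reputation gap; the function names are illustrative, and the 66% figure is the one quoted in the abstract.

```python
import math

def reputation_premium(reputation_ratio, rho):
    """Ratio of citation rates implied by Eq. 1, all else being equal."""
    return reputation_ratio ** rho

def rho_from_premium(premium, reputation_ratio=10.0):
    """Invert the relation: the rho that yields a given premium."""
    return math.log(premium) / math.log(reputation_ratio)

if __name__ == "__main__":
    # A 66% increase per tenfold increase in C_i means a premium of 1.66.
    rho = rho_from_premium(1.66)
    print(round(rho, 2))                            # 0.22
    print(round(reputation_premium(10.0, 0.22), 2)) # 1.66
```

Because the relation is a power law, the premium compounds multiplicatively: a hundredfold reputation gap corresponds to the square of the tenfold premium.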
All other things being equal, the citation premium attributable to reputation alone for publications in the reputation regime ($c < c_\times$) is $\Delta c_1(t)/\Delta c_2(t) = 10^{\rho} \approx 1.66$ (using the value $\rho = 0.22$ for dataset [A]). Hence, there is a 66% increase in the citation rate for each tenfold increase in $C_i(t)$, which integrated over a career can provide significant positive feedback. A pattern that emerges independent of discipline is $\rho(c \geq c_\times) \approx 0$, meaning that reputation only plays a significant role for $c < c_\times$. In the SI Appendix Section S6 we test the robustness of this result by implementing a fixed effects regression, the result of which reaffirms the distinct roles of $\pi$ and $\rho$ above and below $c_\times$. Hence, the two inequalities in Eq. 2 indicate that publications are initially boosted by author reputation to a citation “tipping point” $c_{i,p} \approx c_\times$, above which the citation rate is sustained in large part by publication reputation. These findings show how microscopic reputation mechanisms contribute to cumulative “rich-get-richer” processes in science [7, 9].

Simulating synthetic Monte Carlo careers with the reputation model.
Here we discuss three variants of a Monte Carlo (MC) career growth model which simulates the dynamics of $\Delta c_{i,p}(t+1)$ for each publication $p$ in each time period $t$ of the career of synthetic author $i$. With each variant we progressively introduce a new feature of publication citation trajectories. (i) We begin with a basic linear preferential attachment model (PA model) whereby $\Delta c_{i,p}(t+1) \propto c_{i,p}(t)$. (ii) The PA-LC model includes a life-cycle (LC) obsolescence effect, $A_p(\tau)$. Fig. 4 compares models (i–ii), which do not incorporate author-specific factors, with the reputation model (iii) given by Eq. 1. The PA model fails to reproduce the characteristic trajectories of real publications, since there is a clear first-mover advantage [24] for the first publications published in the career, as well as non-power-law growth of $C_i(t)$.

FIG. 4: Comparison of three Monte Carlo career models against empirical benchmarks demonstrated in Figs. 1–3 and S1–S3. For each model we show $\langle \Delta c'(\tau) \rangle$ for the top 4 groups of ranked papers, the evolution of $c_{i,p}(\tau)$ and $C_i(t)$ (dashed black curve), and the evolution of the rank-citation profile $c_i(r)$ at 5-period intervals. The best-fit DGBD $\beta$ and $\gamma$ parameters are also useful as quantitative benchmarks. For each model we evolve the system over $T$ ≡ periods, each period representative of a year. See SI Appendix for further elaboration of the model parameters used in the MC simulation.

We use quantitative patterns demonstrated for real careers in Figs. 2–3 as empirical benchmarks to distinguish models (ii) and (iii). We confirm that the reputation model (iii) satisfies the empirical benchmark characteristics in all 3 graphical categories (see Fig. 4). We also confirm for model (iii), but not for model (ii), that there is a distinction between $\langle \Delta c'(\tau_p) \rangle$ for different rank sets.
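The three variants can be expressed as one simulation loop with different parameter settings. The sketch below is a loose illustration only: the actual MC parameters are given in the SI Appendix and are not reproduced here. The Poisson draw, the `base_rate` constant, the fixed number of papers per period, and the `1 + c` and `1 + C` offsets (which let uncited papers and new authors receive a first citation) are all assumptions made for this example.

```python
import numpy as np

def simulate_career(T=30, papers_per_period=3, pi=1.0, tau=np.inf,
                    rho=0.0, base_rate=0.5, seed=0):
    """Simulate one synthetic career.

    pi=1, tau=inf, rho=0   -> (i)   linear PA model
    finite tau, rho=0      -> (ii)  PA-LC model
    finite tau, rho>0      -> (iii) reputation model in the spirit of Eq. 1
    Returns per-paper citation counts and the history of C_i(t).
    """
    rng = np.random.default_rng(seed)
    citations = np.zeros(0)  # c_{i,p}(t) for every paper published so far
    ages = np.zeros(0)       # paper ages tau_p
    C_history = []
    for _ in range(T):
        # Publish new papers with zero citations and zero age.
        citations = np.concatenate([citations, np.zeros(papers_per_period)])
        ages = np.concatenate([ages, np.zeros(papers_per_period)])
        C = citations.sum()
        # Expected new citations: PA term x life-cycle term x reputation term.
        rate = (base_rate * (1.0 + citations) ** pi
                * np.exp(-ages / tau) * (1.0 + C) ** rho)
        citations = citations + rng.poisson(rate)
        ages = ages + 1.0
        C_history.append(citations.sum())
    return citations, np.array(C_history)

if __name__ == "__main__":
    c, C_hist = simulate_career(pi=0.4, tau=5.0, rho=0.2)
    rank_profile = np.sort(c)[::-1]  # c_i(1) >= c_i(2) >= ...
    print(rank_profile[:5], C_hist[-1])
```

Sorting the final citation counts in decreasing order gives the synthetic rank-citation profile, which can then be compared with the DGBD form used for the empirical benchmarks.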
Furthermore, for model (iii) we quantitatively confirm that $C(t) \sim t^{\zeta}$ with ≲ $\zeta$ ≲ . For sufficiently large $t$ we also confirm that $c(r,t)$ belongs to the class of DGBD distributions, with $\beta$ values within the range of values observed empirically. In the SI Appendix text we further demonstrate how the model can be used to estimate properties of “average” careers for a given MC parameter set. For example, Fig. S11 shows excellent agreement between the reputation model’s prediction and empirical data when estimating the fraction $f_{\geq c_\times}$ of publications with $c_p \geq c_\times$ for a given career age $t$. Empirically, we observe saturation $f_{\geq c_\times}$ ≈ t .

Discussion
Social networks in science are characterized by heterogeneous structure [25] that provides opportunities for intellectual and social capital investment [26] and influences scientists’ research strategies [21]. Identifying patterns of career growth is becoming increasingly important, largely due to the widespread emergence of quantitative evaluation processes and recent efforts to develop quantitative models of career development. However, difficulties in accounting for complex social mechanisms, in addition to non-linearities and non-stationarities in the career growth process, highlight the case for caution in the development of predictive career models [16, 17]. Without a better understanding of the institutional features and scientific norms that affect scientific careers, along the variable path from apprentice to group leader and mentor, there is a possibility to misuse quantitative career metrics in the career evaluation process.

Toward the goal of better understanding career growth, with potential policy implications for the quantitative career evaluation process, we have analyzed the effect of reputation on the micro-level processes underlying the dynamics of a scientist’s research impact. We used a regression model for the citation rate $\Delta c_{i,p}$ which accounts for the role of publication impact ($\pi$), the role of knowledge obsolescence ($\tau$), and the role of author reputation ($\rho$). Interestingly, we find that the reputation parameter $\rho(c \geq c_\times) \approx 0$, meaning that in the long run the reputation effect makes a negligible contribution to the citation rate of papers with large $c_p$. However, we identify caveats concerning the way publications can become highly cited.
By analyzing the variation of $\rho$ and $\pi$ for publications above and below a citation threshold $c_\times$ we identify the advantageous role that author reputation plays in the citation dynamics of new publications, finding that future publications can gain roughly a 66% increase in $\Delta c$ for each tenfold increase in reputation $C_i$. We note that it is likely that both institutional affiliation and journal reputation also play a role in the citation dynamics; however, disentangling the interaction between the multiple reputation sources will likely be challenging and remains an open avenue for investigation.

In the process of analyzing the effect of reputation on career growth, it was necessary to also quantify two essential features of our model, namely patterns of cumulative productivity and impact across the career, and patterns of obsolescence in the citation life cycle of individual publications. For prolific scientists, we have identified a robust pattern of growth for two cumulative reputation measures, $N_i(t)$ and $C_i(t)$, each of which is quantifiable by a single scaling parameter, $\alpha_i$ and $\zeta_i$, respectively. These regularities suggest that underlying social processes sustain career growth via reinforcing coevolution of scientific collaboration and publication [6, 27–29]. We also introduced a citation deflator index to control for the increased supply of citations arising from the exponential 5% growth (per year) in the total publication output. Analyzing the growth of “deflated” citation trajectories, $C^{D}_i(t)$, we observed $\zeta_i$ ≳ values, which confirms that the observed career growth is significantly above the baseline inflation rate of science.
We note that in using the non-decreasing cumulative reputation measure C_i(t), we have overlooked the possibility that reputation can significantly decrease, as occurs when a scientist is associated with invalidated and/or fraudulent science. Indeed, recent evidence indicates that the retraction of a publication can have a negative impact on the potential growth of C_i [30]. As a robustness check we also used the annual citation rate ∆C_i(t) as an additional (non-cumulative) reputation measure, one that is more amenable to controlling for secular growth trends. We applied a multivariate fixed effects regression using ∆C_i(t) as the reputation measure (see SI Appendix Section S6), which reconfirms the role of reputation in citation dynamics.

Our analysis tracks the evolution of each scientist’s publication portfolio across the career, suitably illustrated by the rank-citation profile c_i(r), which highlights the skewed distribution of c_{i,p}, even within a career. Because of the power-law features of c_i(r) [4], a disproportionate fraction of a scientist’s total citations C_i is owed to the c_i(r = 1) citations coming from his/her highest-cited publication. For example, the average and standard deviation of the ratio c_i(1)/C_i is . ± . for the physicists, . ± . for the biologists, and . ± . for the mathematicians we analyzed, which emphasizes the potentially large reputation boost that can follow from just a single high-impact publication. With rapidly increasing numbers of journals accompanied by the opportunity for rapid publication, the reputation effect provides an incentive to aim for quality over quantity in the publication process, reinforcing a research strategy which is beneficial for science and scientists.

It is also important to consider the role of reputation in light of the increasing orientation of science around team endeavors characterized by multiple levels of hierarchy and division of labor [31].
Because it is difficult to evaluate and assign credit to individual contributions in a team setting, there may be an increase in the role and strength of reputation in overcoming the problem associated with asymmetric and incomplete information. In addition to the collaboration network, reputation also plays a key role in numerous other scientific inputs (money, labor, knowledge, etc.) which inevitably affect the overall quantity and quality of scientific outputs. It will become increasingly important to understand the relation between these inputs and outputs in order to efficiently allocate scientific resources [6, 15, 18].

In light of individual careers, an institutional setting based on quantitative appraisal that neglects these complex relations may inadvertently go against the goal of sustaining the careers of talented and diligent young academics [6]. For example, our finding of a crossover behavior around c_× shows how young scientists lacking reputation can be negatively affected by social stratification in science. The appealing competitive advantage gained by working with a prestigious mentor may be countered by the possibility that it may not be the ideal mentor-advisee match. Despite having analyzed cohorts of highly cited scientists, our results have broad implications across the scientific population when one considers the numerous careers that interact with top scientists via collaboration or mentorship.

In excess, the reputation effect may also negatively affect science, especially considering how online visibility has become a relatively new reputation platform in an increasingly competitive environment. As such, strategies of self-promotion may emerge as scientists try to “game” reputation systems. In such scenarios, it may be hard to disentangle fair from foul play.
For example, it may be difficult to distinguish self-citation strategies aimed at boosting C_i from the natural tendency for scientists who are crossing disciplinary borders to self-cite with the intention to send credibility signals [32]. Reputation will also become increasingly important in light of preferential treatment in search queries, e.g. Google Scholar, which provides query results ordered according to citation measures. These systemic search and retrieval features may further strengthen the association of reputation between publications and authors. In all, our results should motivate future research to inspire institutional and funding body evaluation schemes to appropriately account for the roles that reputation and social context play in science. For example, our results can be used in support of the double-blind review system, which, by reducing the role of reputation, is perceived to have advantages due to its objectivity and fairness [33].

We conclude with a general note that the data deluge brought forth during the past decade is fueling extensive efforts in the computational social sciences [34] to identify and study the so-called “social atom” [35]. Because our methodology is general, we speculate that other social networks characterized by trust and partial/asymmetric information are also based on similar reputation mechanisms. Indeed, it is likely that agent-based reputation mechanisms will play an increasing role due to the omnipresence of online recommender systems governed by reputation dynamics operating as a general diffusive contagion phenomenon [36].

Supporting Information (SI) Appendix available at AMP homepage
Acknowledgements
We thank M. C. Buchanan, A. Scharnhorst, J. N. Tenenbaum, S. V. Buldyrev, V. Tortolini, A. Morescalchi, and S. Succi for helpful discussions and each referee for their extremely valuable comments and insights. AMP acknowledges COST Action MP0801 and COST Action TD1210. AMP, SF, KK, MR, and FP thank the EU FP project “Multiplex” and AMP, MR, and FP acknowledge PNR “Crisis Lab” project at IMT. OP acknowledges funding from the Canadian SSHRC. HES thanks NSF grant CMMI 1125290.

[1] Radicchi F, Fortunato S, Castellano C (2008) Universality of citation distributions: Toward an objective measure of scientific impact. Proc. Natl. Acad. Sci. USA : 17268–17272.
[2] Radicchi F, Fortunato S, Markines B, Vespignani A (2009) Diffusion of scientific credits and the ranking of scientists. Phys. Rev. E : 056103.
[3] Petersen AM, Wang F, Stanley HE (2010) Methods for measuring the citations and productivity of scientists across time and discipline. Phys. Rev. E : 036114.
[4] Petersen AM, Stanley HE, Succi S (2011) Statistical regularities in the rank-citation profile of scientists. Scientific Reports : 181.
[5] Azoulay P, Zivin JSG, Wang J (2010) Superstar Extinction. Q. J. of Econ. 125 (2) : 549–589.
[6] Petersen AM, Riccaboni M, Stanley HE, Pammolli F (2012) Persistence and Uncertainty in the Academic Career. Proc. Natl. Acad. Sci. USA : 5213–5218.
[7] Merton RK (1968) The Matthew effect in science. Science : 56–63.
[8] De Solla Price D (1976) A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science : 292–306.
[9] Petersen AM, et al. (2011) Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proc. Natl. Acad. Sci. USA : 18–23.
[10] Peterson GJ, Presse S, Dill KA (2010) Nonuniversal power law scaling in the probability distribution of scientific citations. Proc. Natl. Acad. Sci. USA : 16023–16027.
[11] Eom Y-H, Fortunato S (2011) Characterizing and Modeling Citation Dynamics. PLoS ONE : e24926.
[12] Medo M, Cimini G, Gualdi S (2011) Temporal Effects in the Growth of Networks. Phys. Rev. Lett. : 238701.
[13] Golosovsky M, Solomon S (2012) Stochastic Dynamical Model of a Growing Citation Network Based on a Self-Exciting Point Process. Phys. Rev. Lett. : 098701.
[14] Stephan PE (1996) The economics of science. J. Econ. Lit. : 1199–1235.
[15] Stephan PE. How Economics Shapes Science. (Harvard University Press, Cambridge MA, 2012).
[16] Penner O, Petersen AM, Pan RK, Fortunato S (2013) The case for caution in predicting scientists’ future impact. Phys. Today : 8–9.
[17] Penner O, Pan RK, Petersen AM, Kaski K, Fortunato S (2013) On the predictability of future impact in science. Scientific Reports : 3052.
[18] Duch J, et al. (2012) The Possible Role of Resource Requirements and Academic Career-Choice Risk on Gender Differences in Publication Rate and Impact. PLoS ONE : e51332.
[19] David PA (2008) The Historical Origins of ‘Open Science’: An essay on patronage, reputation and common agency contracting in the scientific revolution. Capitalism and Society : Article 5.
[20] Ductor L, Fafchamps M, Goyal S, van der Leij MJ. Social networks and research output. Review of Economics and Statistics , DOI: 10.1162/REST_a_00430.
[21] Shane S, Cable S (2002) Network ties, reputation, and the financing of new ventures. Management Science : 364–381.
[22] Malmgren RD, Ottino JM, Amaral LAN (2010) The role of mentorship in protégé performance. Nature : 622–626.
[23] Mazloumian A, Eom Y-H, Helbing D, Lozano S, Fortunato S (2011) How citation boosts promote scientific paradigm shifts and Nobel prizes. PLoS ONE : e18975.
[24] Newman MEJ (2009) The first-mover advantage in scientific publication. Europhysics Letters : 68001.
[25] Burt RS. Structural Holes (Harvard University Press, Cambridge MA, 1992).
[26] Nahapiet J, Ghoshal S (1998) Social capital, intellectual capital, and the organizational advantage. Acad. of Management Rev. : 242–266.
[27] Börner K, Maru JT, Goldstone RL (2004) The simultaneous evolution of author and paper networks. Proc. Natl. Acad. Sci. USA : 5266–5273.
[28] Guimera R, Uzzi B, Spiro J, Amaral LAN (2005) Team assembly mechanisms determine collaboration network structure and team performance. Science : 697–702.
[29] Palla G, Barabási AL, Vicsek T (2007) Quantifying social group evolution. Nature : 664–667.
[30] Lu SF, Jin GZ, Uzzi B, Jones B (2013) The retraction penalty: Evidence from the web of science. Scientific Reports : 3146.
[31] Wuchty S, Jones BF, Uzzi B (2008) The increasing dominance of teams in production of knowledge. Science : 1259–1262.
[32] Hellsten I, et al. (2007) Self-citations, co-authorships and keywords: A new approach to scientists’ field mobility? Scientometrics : 469–486.
[33] Ware M (2008) Peer review: benefits, perceptions and alternatives. Publishing Research Consortium , December: 1–20.
[34] Lazer D, et al. (2009) Life in the network: the coming age of computational social science. Science : 721–723.
[35] Buchanan M. The Social Atom. (Bloomsbury Publishing, New York NY, 2007).
[36] Vespignani A (2012) Modelling dynamical processes in complex socio-technical systems. Nature Physics : 32–39.
[37] Sarigöl E, Pfitzner R, Scholtes I, Garas A, Schweitzer F (2014) Predicting scientific success based on coauthorship networks. EPJ Data Science , (in press) DOI:10.1140/epjds/s13688-014-0009-x
[38] Uzzi B (1999) Embeddedness in the making of financial capital: How social relations and networks benefit firms seeking financing. Amer. Soc. Rev. : 481–505.
[39] Lazer D, Friedman A (2007) The network structure of exploration and exploitation. Adm. Sci. Quarterly : 667–694.
[40] Pan RK, Saramäki J (2012) The strength of strong ties in scientific collaboration networks. EPL : 18007.
[41] Petersen AM, Penner O (2014) Inequality and cumulative advantage in science careers: a case study of high-impact journals. EPJ Data Science . DOI:10.1140/epjds/s13688-014-0024-y
[42] van de Rijt A, Kang SM, Restivo M, Patil A (2014) Field experiments of success-breeds-success dynamics. Proc. Natl. Acad. Sci. USA : 6934–6939.
[43] Azoulay P, Stuart T, Wang H (2014) Matthew: Effect or Fable? Management Science : 92–109.
[44] Borjas GJ, Doran KB (2012) The collapse of the Soviet Union and the productivity of American mathematicians. Q. J. of Econ. : 1143–1203.
[45] Horlings E, Gurney T (2013) Search strategies along the academic lifecycle. Scientometrics : 1137–1160.
[46] Levin SG, Stephan PE (1991) Research productivity over the life cycle: Evidence for academic scientists. Am. Econ. Rev. : 114–132.
[47] Vlachy J (1983) Tracing innovative papers in physics by successive citation - Concepts and exemplars. Czechoslovak Journal of Physics B : 841–844.
[48] Börner K, et al. (2010) A multi-level systems perspective for the science of team science. Sci. Transl. Med. : 49cm24.
[49] Petersen AM, Pavlidis I, Semendeferi I (2014) A quantitative perspective on ethics in large team science. Science & Engineering Ethics , (in press) DOI:10.1007/s11948-014-9562-8
[50] Cho A (2012) Who Invented the Higgs? Science : 1286–1289.
[51] Birney E (2012) Lessons for big-data projects. Nature : 49–51.
[52] Goldstein M, Morris SA, Yen GG (2005) Group-based Yule model for bipartite author-paper networks. Phys. Rev. E : 026108.
[53] Romer PM (1987) Growth based on increasing returns due to specialization. Amer. Econ. Rev. : 56–62.
[54] Newman MEJ (2004) Coauthorship network and patterns of scientific collaboration. Proc. Natl. Acad. Sci. USA : 5200–5205.
[55] Catanzaro M, Caldarelli G, Pietronero L (2004) Assortative model for social networks. Phys. Rev. E : 037101.
[56] Oettl A (2012) Reconceptualizing Stars: Scientist Helpfulness and Peer Performance. Management Science : 1122–1140.
[57] Hirsch JE (2005) An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. USA : 16569–16572.
[58] Bornmann L, Daniel H-J (2007) What do we know about the h index? Journal of the American Society for Information Science and Technology : 1381–1385.
[59] Bornmann L, Mutz R, Daniel H-J (2008) Are there better indices for evaluation purposes than the h Index? A comparison of nine different variants of the h Index using data from biomedicine. Journal of the American Society for Information Science and Technology : 001–008.
[60] Batista PD, Campiteli MG, Martinez AS (2006) Is it possible to compare researchers with different scientific interests? Scientometrics : 179–189.
[61] Iglesias JE, Pecharromán C (2007) Scaling the h-index for different scientific ISI fields. Scientometrics : 303–320.
[62] Hirsch JE (2007) Does the h index have predictive power. Proc. Natl. Acad. Sci. USA : 19193–19198.
[63] Redner S (2010) On the meaning of the h-index. J. Stat. Mech. : L03005.
[64] Naumis GG, Cocho G (2008) Tail universalities in rank distributions as an algebraic problem: The beta-like function. Physica A : 84–96.
[65] Martinez-Mekler G, Martinez RA, del Rio MB, Mansilla R, Miramontes P, Cocho G (2009) Universality of rank-ordering distributions in the arts and sciences. PLoS ONE : e4791.
[66] Redner S (2005) Citation statistics from 110 years of Physical Review. Physics Today : 49–54.
[67] Jeong H, Neda Z, Barabasi AL (2003) Measuring preferential attachment in evolving networks. EPL : 567–572.
[68] Wang M, Yu G, Yu D (2008) Measuring the preferential attachment mechanism in citation networks. Physica A : 4692–4698.
[69] Golosovsky M, Solomon S (2013) The transition towards immortality: non-linear autocatalytic growth of citations to scientific papers. JSTAT : 340–354.
[70] Evans JA (2008) Electronic Publication and the Narrowing of Science and Scholarship. Science : 395–321.
[71] Althouse BM, West JD, Bergstrom CT, Bergstrom T (2009) Differences in Impact Factor Across Fields and Over Time. JASIST : 27–34.
[72] Wang D, Song C, Barabasi AL (2013) Quantifying Long-Term Scientific Impact. Science : 127–131.
[73] Petersen AM, Penner O, Stanley HE (2011) Methods for detrending success metrics to account for inflationary and deflationary factors. Eur. Phys. J. B79