Heavy-tailed distribution of the number of publications within scientific journals

Abstract

The community of scientists is characterized by its need to publish in peer-reviewed journals, in an attempt to avoid the "perish" side of the famous maxim "publish or perish". Accordingly, almost all researchers have authored some scientific articles. Scholarly publications offer at least two benefits for the study of the scientific community as a social group. First, they attest to some form of relationship between scientists (collaboration, mentoring, heritage, ...), which is useful to identify and analyze social subgroups. Second, most of them are recorded in large, easily accessible databases containing a lot of pertinent information, which eases the quantitative and qualitative study of the scientific community. Understanding the dynamics driving the creation of knowledge in general, and scientific publication in particular, is of interest from a social science point of view, and it can also help maintain a high level of research by identifying good and bad practices in science. In this manuscript, we attempt to advance this understanding through a statistical analysis of publications within peer-reviewed journals. Namely, we show that the distribution of the number of articles published by an author in a given journal is heavy-tailed, but has a lighter tail than a power law. Moreover, we observe some anomalies in the data that pinpoint underlying dynamics of the scholarly publication process.

Robin Delabays (a, b) and Melvyn Tyloo (a)
(a) School of Engineering, University of Applied Sciences of Western Switzerland (HES-SO), CH-1951 Sion, Switzerland
(b) Center for Control, Dynamical Systems, and Computation, University of California at Santa Barbara (UCSB), Santa Barbara, 93106-9560 California, USA
(Dated: November 11, 2020)

INTRODUCTION

One of the core mechanisms in the practice of science is the self-examination of a field of research. The validation of a scientific result is always collective, in the sense that it has been scrutinized, criticized, and (hopefully) validated by a sufficient number of peers. Furthermore, any scientific result is always subject to new evaluation and might eventually be superseded by more accurate work. At the level of a community, scientists are therefore used to criticizing the work of colleagues and to having their own work criticized. It is then not surprising that some scientists started to study (and thus, to some extent, criticize) the scientific community itself [1]. The study of the scientific community, sometimes referred to as Science of Science [2, 3], is a key step to unravel the behaviors of its members and draw lessons from them. In recent decades, such investigations have been significantly eased by the emergence of large databases of scientific publications (Web of Science, PubMed, arXiv, ...). They allowed, for instance, the construction of the time-evolving collaboration network of scientists [4].

Such an approach and its associated tools have the potential to help maintain the quality of research, and thus a good use of public funding. Indeed, in the current context of an increasing number of scientific publications [5, 6], in parallel with the ubiquitous presence of predatory journals [7, 8], distinguishing bad practices from honest work in scientific publishing becomes more and more challenging. Understanding the underlying dynamics of scientific publication will be instrumental in this task.

A scientist's work is commonly evaluated by two different, but related, quantities, namely, their number of publications and the number of citations thereof. These quantities are summarized in the criticized, but widely used, h-index [9, 10]. Naturally, a vast majority of investigations of the scientific publication process focus on the citation side. They mostly aim at describing how the citation network impacts the number of citations a given paper (and therefore its authors) is likely to receive. In particular, evidence suggests that citations follow a rich-get-richer, or preferential attachment, process, where the more citations a scientist has, the more likely they are to get new citations [11], leading to a power law distribution of citations [12, 13] or other heavy-tailed distributions [14].
Indeed, preferential attachment has been shown to lead to heavy-tailed distributions [15], with some refinements accounting for the lifetime of a publication [16].

Compared to the number of citations that an article or a scientist gets, the number of articles published by a scientist has been much less investigated, even though publishing papers is a sine qua non condition for being cited. In this manuscript, we focus on the distribution of articles published by a scientist within a given peer-reviewed journal. As interestingly pointed out by Sekara et al. [17], publishing in a peer-reviewed journal (especially in high-impact ones) is more likely if one author of the manuscript has already published in the same journal. Such a process can be viewed as a form of preferential attachment, and an expected outcome of this observation is a high representation of a few authors in a given journal [15]. Furthermore, a scientist whose field of research is well aligned with a journal's topic is likely to publish a large proportion of their work in this journal, leading again to a high representation of a few specialized authors in a given journal.

We support these expectations by showing that, in a selection of fourteen journals (listed in Table I), the distribution of the number of articles published by an author within a journal has a heavy tail. It appears, however, that this distribution has a lighter tail than a power law. We argue that this distribution can be explained by a preferential attachment process, an explanation that is backed up by data. On top of that, in some of the selected journals, we observe interesting anomalies for which we give an explanation.

RESULTS

For each journal in Table I, we consider the list of all authors who published in it and the number of articles published by each of them up to 2017. From this, one can plot the empirical distribution of the number of articles published by an author in a given journal (Fig. 1). On these data, we fit three heavy-tailed distributions, namely a power law (Eq. 2), a power law with cutoff (Eq. 3), and a Yule-Simon distribution (Eq. 4), using a Maximum Likelihood Estimator (see Methods). We then assess the goodness-of-fit following [18], which is encoded in a p-value (see Methods). The results of each fit and goodness-of-fit test are presented in Table II, and the resulting distributions together with the data are shown in Figs. 1, 3, and 5. Clearly, the power law distribution is a poor fit for all data, its p-value being zero for all journals. This can be seen in Figs. 1, 3, and 5, where, for most of the journals, the tail of the data set is lighter than the tail of its power law fit (black dashed lines). For three journals (namely SCI, PLC, and CHA), the p-value of the power law with cutoff is larger than 5%, making it a rather good fit, and for two others (NEM and SIA), the Yule-Simon distribution cannot be excluded.

General explanation

We propose the following explanation for this heavy-tailedness. Many social processes are ruled by so-called preferential attachment [19]. Scientific collaborations [20] and citations [12] are apparently no exception to the rule. Namely, the probability that an author will create a new scientific collaboration at time t is proportional to the number of scientific collaborations they already have. It is reasonable to assume that the evolution of the number of articles published by an author in a given journal is described by a similar preferential attachment process. In other words, the probability that a new article published in a given journal is signed by a given author would be proportional to the number of articles already published by this author in the very same journal.

Heuristically, our argument is that if an author has published many articles in a journal, it means (i) that they write a lot of papers, and (ii) that their research topic is well aligned with the topics covered by the journal (for specialized journals), or that the scientific impact of their research matches the standards of the journal (for interdisciplinary journals). Assumptions (i) and (ii) together imply that this author is likely to publish again in this journal.

This intuition can be made more rigorous. For three journals (SCI, LAN, and PRL), we refined the data to account for the time evolution of the number of articles published by each author. It turns out that, on average, the number of articles published during a year by an author having already published k articles is close to proportional to k (details are found in the Methods section). According to [15], if it were exactly proportional to k, the final distribution would be a power law. The fact that the relation is not exactly proportional, but close to it, probably explains the lighter-than-power-law tails observed in Figs. 1, 3, and 5.

Observations

Aside from these general considerations, we note three interesting observations in the data. First, for journals with a large number of authors and published articles, the tail of the histogram drops dramatically. Second, some authors appear to publish more articles than even the power law would predict. And third, some very large groups of authors can be identified even in long-term aggregated data.

Decay in long-lived journals.

We observe in Figs. 1, 3, and 5 that, for old journals where many articles are published, the tail of the histogram has a rather fast decay after a heavy-tailed regime (this is particularly striking for PRL and PRD, Fig. 3). We explain this by the fact that the number of publications of a given author depends on two parameters, namely their publication rate and the length of their career. Both quantities are bounded in practice, and even if it is possible to publish a very large number of articles in a given journal, there is a practical limit to this number. We hypothesize that the decay in the histograms of long-lived journals comes from the finiteness of publication rates and career lengths.

Key players.

The general distribution of the number of papers per author is quite clear from our analysis: it seems to lie somewhere between an exponential distribution and a power law. Since the power law has the heaviest tail of the four distributions considered (exponential, power law, power law with cutoff, and Yule-Simon), we use it to estimate an upper bound on the number of articles published by an author in each journal, shown as the vertical dashed lines in Figs. 1, 3, and 5 (more details in the Methods section). In some journals (see, e.g., PNA, CHA, SIA, and AMA in Fig. 1, and NEM and ACS in the Appendix, Fig. 5), it appears that some authors, which we refer to as key players, publish significantly more articles in a journal than the power law would predict. Note that we checked that these key players are not artifacts due to multiple authors having the same name, which would be counted as the same person. In all cases presented here, there is a unique person appearing in the authors' list of a very large number of papers.

Table I. Journals considered: label, journal name (in parentheses, the last year of the reduced data set), and number of authors (in parentheses, the reduced number of authors).

Label  Journal name (red. year)                       Authors (reduced)
NAT    Nature* (1950)                                 63'791 (3'374)
PNA    Proc. Natl. Acad. Sci. USA** (1950)            55'849 (2'495)
SCI    Science* (1940)                                48'928 (4'788)
LAN    The Lancet* (1910)                             33'416 (3'015)
NEM    New England Journal of Medicine* (1950)        27'078 (3'842)
PLC    Plant Cell (2000)                              20'649 (4'712)
ACS    J. of the American Chemical Society* (1930)    82'223 (5'301)
TAC    IEEE Trans. on Automatic Control (2000)        8'911 (3'603)
ENE    Energy (2005)                                  28'920 (4'491)
CHA    Chaos                                          7'409
SIA    SIAM Journal on Applied Mathematics            6'106
AMA    Annals of Mathematics                          3'679
PRD    Physical Review D                              64'922
PRL    Physical Review Letters*

Figure 1. Histograms of the number of authors a_J with respect to the number of articles published, for the six journals indicated in the insets. The grey dotted line is an exponential fit of the data, emphasizing that the distribution is heavy-tailed. We also show the best fit (MLE) for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximal number of publications if the distribution were the fitted power law (see Eq. 8). The same plots for the other journals are available in Fig. 3 and in the Appendix, Fig. 5.

In order to make the data more comparable, we restrict our investigation to the early years, between 1900 (the earliest possible in WoS) and the year in parentheses in the second column of Table I, for the first nine journals of the table. This yields a number of authors comparable to that of the following three journals in the table (the reduced number of authors is given in parentheses in the third column of Table I). The resulting distributions are depicted in Fig. 2 and in the Appendix, Fig. 6, and the fitted parameters are detailed in Table III. It appears from Figs. 2 and 6 that, for such a reduced number of authors, the overshoot of some authors is more systematic, suggesting that in the early years of a scientific journal there are usually a few very prolific authors publishing in it at a rather high rate.

Table II. Fitted parameters and p-value of the goodness-of-fit for the power law (PL: α, p [%]), power law with cutoff (PLwC: β, γ, p [%]), and Yule-Simon (Y-S: ρ, p [%]) distributions, one row per journal. No set of data is well fitted by a power law distribution. However, the power law with cutoff seems to be a good fit for three journals (SCI, PLC, CHA), and the Yule-Simon distribution seems to correctly fit the distribution of NEM and SIA. For the other journals, none of the distributions seem to fit the data appropriately.

Considering the results of the fitting in Table III, we observe better agreement than for the full data sets. This probably indicates that the sample size is not large enough to accurately fit heavy-tailed distributions, which require large samples. The fact that NAT and PNA are well fitted by two distributions also indicates that the reduced data sets are not large enough to be conclusive.

Peaks in PRL and PRD.

In Fig. 3, we observe two peaks in the empirical distributions of PRL (around 66 and 96) and PRD (around 77 and 104). Crossing the lists of authors for each number of articles between 63 and 102 for PRL (resp. 72 and 111 for PRD), we obtain the right panel of Fig. 3. The fact that the authors composing a peak in PRL are also the ones composing one of the peaks in PRD suggests that these authors are all part of a large group publishing together. A quick search indicates that the peaks correspond to the research groups of the ATLAS and CMS experiments at CERN. These two experiments are so big and gather so many authors that they can be seen even in the data used in our analysis, aggregated over the whole history of PRL (since 1958) and PRD (since 1970).
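The crossing of author lists described above amounts to a two-dimensional histogram restricted to two windows of article counts. A minimal Python sketch, assuming per-journal dictionaries that map an author name to their number of articles (a hypothetical input format, not the WoS export), could look as follows:

```python
from collections import Counter


def cross_histogram(counts_a, counts_b, window_a, window_b):
    """Count authors by their pair of article counts (n_a, n_b) in two journals.

    counts_a, counts_b: dicts mapping an author name to the number of articles
        that author published in journal A and in journal B.
    window_a, window_b: inclusive (low, high) ranges of article counts to keep,
        e.g. (63, 102) for PRL and (72, 111) for PRD.
    Returns a Counter mapping (n_a, n_b) to the number of authors.
    """
    low_a, high_a = window_a
    low_b, high_b = window_b
    hist = Counter()
    # Only authors appearing in both journals can contribute to a joint peak.
    for author in counts_a.keys() & counts_b.keys():
        n_a, n_b = counts_a[author], counts_b[author]
        if low_a <= n_a <= high_a and low_b <= n_b <= high_b:
            hist[(n_a, n_b)] += 1
    return hist


# Toy example with made-up authors (not real data).
prl = {"A": 96, "B": 96, "C": 66, "D": 3}
prd = {"A": 77, "B": 77, "C": 104, "E": 80}
print(cross_histogram(prl, prd, (63, 102), (72, 111)))
# Counter({(96, 77): 2, (66, 104): 1})
```

In real data, a large collaboration whose members co-sign the same papers would show up as one dense cell, which is how a peak in one journal can be matched to a peak in the other.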

Table III. Fitted parameters and p-value of the goodness-of-fit for the power law (PL: α, p [%]), power law with cutoff (PLwC: β, γ, p [%]), and Yule-Simon (Y-S: ρ, p [%]) distributions for the reduced data sets (see Table I), one row per journal. The only data well approximated by the power law are those of NAT when reduced to the first 3374 entries of WoS. The power law with cutoff, however, seems to be a good fit for the reduced data of six journals (NAT, PNA, SCI, LAN, TAC, and ENE). ENE is particularly well fitted by the power law with cutoff. Finally, the Yule-Simon distribution seems to correctly fit the distribution of PNA, PLC, and ACS. For the other journals, none of the distributions seem to fit the data appropriately. Note that the reduced data of NAT and PNA are correctly fitted by two distributions, indicating that the amount of data is probably not sufficient for a good fit.

DISCUSSION

Our analysis reveals a series of interesting, even though not surprising, dynamics ruling the process of publication within scientific journals. The main observation is the heavy-tailed shape of the distribution of publications, which we explain by a preferential attachment process. We showed that the preferential attachment dynamics is heuristically meaningful, in the sense that if an author publishes many papers and if their profile aligns with the journal's profile, they are likely to publish in this journal and, at the same time, they are likely to have already published in the same journal. Moreover, we backed up the preferential attachment process with data-based evidence, showing that the number of articles published in a journal per author with already k articles (in this journal) is approximately proportional to k. An exact proportionality would lead, according to Ref. [15], to a power law distribution. Of course, in the long run, scientists cannot publish an unbounded number of articles, due to the finiteness of their careers. In our analysis, this translates into a drop in the tail of the distribution for older journals, which then do not follow a power law. A power law with cutoff or a Yule-Simon distribution appears better suited to describe the data.

On top of this general dynamics, our analysis displays some interesting anomalies that point towards specific underlying dynamics. First, the data show that in the early decades of existence of a journal, a small number of authors are extremely influential. This translates into some authors having many more publications than a power law distribution would predict, even though the power law already has a heavier tail than our data. Such authors, which we refer to as key players, are likely to be very influential scientists in the topic(s) covered by the journal.

Second, we realized that some huge scientific projects can impact the distribution of publications even in large-scale aggregated data. In our samples, this is seen for the journals Physical Review Letters (PRL) and Physical Review D (PRD), which publish the outcomes of the large ATLAS and CMS experiments at CERN, gathering thousands of scientists. Our approach was thus able to pinpoint further dynamics taking place in present-day science.

As seen in Table II, the fitting of the data by a power law with cutoff or a Yule-Simon distribution is not perfect. More advanced fitting techniques might be able to identify a common distribution for all journals, provided that one exists. From a social science point of view, a more refined explanation of the approximate preferential attachment taking place in scientific publishing could unravel with more certainty the source of the distributions observed in this manuscript. This is left for future research.

Figure 2. Histograms of the number of authors a_J with respect to the number of articles published, for the first three journals of Table I, with data restricted to the years between 1900 (earliest possible in WoS) and the years indicated in the insets. The number of authors covered is given in parentheses in the third column of Table I. We show the best fit for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximal number of published papers if the distribution were the fitted power law (see Eq. 8). Some authors almost systematically exceed this number. The same plot for the other journals is available in the Appendix, Fig. 6.

Figure 3. Analysis of PRL and PRD. Left and center: same figures as in Fig. 1 for PRD and PRL, respectively. The arrows indicate the increased number of authors corresponding to the ATLAS and CMS experiments at CERN. Right: two-dimensional, color-coded histogram of the number of authors with respect to the number of articles published in PRL (horizontal axis) and PRD (vertical axis). The peak centered at (96, 77) is the CMS experiment and the one at (66, 104) is the ATLAS experiment, both at CERN.

MATERIALS AND METHODS

Data sets

We consider an arbitrary selection of 14 peer-reviewed journals (see Table I), whose data are available in the Web of Science database (WoS). The selected journals vary in age (from a few decades to more than a century) but are not too young, in order to have sufficiently many publications available, and all of them are still publishing today. We denote by $\mathcal{J} := \{\mathrm{NAT}, \mathrm{PNA}, \ldots, \mathrm{PRL}\}$ the set of journals considered (see Table I for the list of labels).

Within each journal $J \in \mathcal{J}$, we index authors by an integer, and for each author $i = 1, \ldots, N_J$, we count the number $n_{Ji}$ of articles published by $i$ in $J$ up to the year 2017, which gives the data set $A_J = \{n_{Ji}\}$. We restrict our investigation to publications labelled as "Article" in the WoS database, to focus on peer-reviewed articles and to discard, for instance, editorial material. For some journals, the number of authors was too large to be downloaded from the WoS database. As a consequence, the authors having published only one or two articles in these journals had to be removed from the data (e.g., NAT, PNA, or SCI, indicated by asterisks in Table I). Note also that we do not take into account articles published anonymously, which represent a large number of articles in medical journals in particular.

From the data set $A_J$, we can compute the proportion of authors who published $n \in \mathbb{N}$ articles,
$$a_J(n) := \frac{|\{i : n_{Ji} = n\}|}{N_J}. \tag{1}$$
These values are represented in logarithmic scales in Figs. 1, 3, and 5, each panel corresponding to a different journal.

Distribution fitting
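The maximum-likelihood fits of this subsection can be sketched in plain Python for the two one-parameter models, starting from the proportions $a_J(n)$ of Eq. (1). The sketch assumes that the fitted distributions start at n = 1, so that the power law of Eq. (2) is normalized by C = 1/ζ(α), and it takes the Yule-Simon distribution as P(n) = (ρ - 1) B(n, ρ) with ρ > 1. The helper names and the golden-section optimizer are illustrative choices, not the procedure used for the paper; the power law with cutoff of Eq. (3) has two parameters and would need a two-dimensional optimizer instead.

```python
import math
from collections import Counter


def empirical_distribution(counts):
    """Eq. (1): proportion a_J(n) of authors with exactly n articles."""
    total = len(counts)
    return {n: c / total for n, c in sorted(Counter(counts).items())}


def zeta(s, k_max=20):
    """Riemann zeta for s > 1: direct sum plus an Euler-Maclaurin tail."""
    head = sum(k ** -s for k in range(1, k_max))
    tail = (k_max ** (1 - s) / (s - 1) + k_max ** -s / 2
            + s * k_max ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * k_max ** (-s - 3) / 720)
    return head + tail


def golden_max(f, lo, hi, tol=1e-7):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def powerlaw_loglik(alpha, counts):
    """Log-likelihood of P(n) = n**(-alpha) / zeta(alpha), n >= 1."""
    return (-alpha * sum(math.log(n) for n in counts)
            - len(counts) * math.log(zeta(alpha)))


def yule_simon_loglik(rho, counts):
    """Log-likelihood of P(n) = (rho - 1) * B(n, rho), n >= 1, rho > 1."""
    log_beta = sum(math.lgamma(n) + math.lgamma(rho) - math.lgamma(n + rho)
                   for n in counts)
    return len(counts) * math.log(rho - 1) + log_beta


def fit(loglik, counts, lo, hi):
    """Maximum-likelihood estimate of a one-parameter model on [lo, hi]."""
    return golden_max(lambda x: loglik(x, counts), lo, hi)


# Made-up article counts for a handful of authors (not real data).
counts = [1, 1, 1, 1, 2, 1, 3, 1, 2, 7, 1, 1, 15, 2, 1]
print(empirical_distribution(counts))
alpha_hat = fit(powerlaw_loglik, counts, 1.001, 10.0)
rho_hat = fit(yule_simon_loglik, counts, 1.001, 50.0)
print(round(alpha_hat, 3), round(rho_hat, 3))
```

Golden-section search only needs function evaluations, which keeps the sketch dependency-free; for real data sets, a library optimizer and a fitted lower cutoff, as discussed in [18], would be the more careful choice.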

For each empirical distribution in Figs. 1, 3, and 5, we fit an exponential distribution (grey dotted lines) to emphasize their heavy-tailed behavior. Given this observation, it is tempting to fit a power law distribution (black dashed lines),
$$P_{\mathrm{pl}}(a_J = n) = C \cdot n^{-\alpha}, \tag{2}$$
with $\alpha > 1$ and $C \in \mathbb{R}$ normalizing the distribution. However, as pointed out in [18], fitting a heavy-tailed distribution is not trivial and should be done carefully, the risk being to derive spurious conclusions [21]. Following the recommendations of Ref. [18], we also fit other heavy-tailed distributions, such as the power law with cutoff (black dash-dotted lines),
$$P_{\mathrm{plc}}(a_J = n) = C \cdot n^{-\beta} e^{-\gamma n}, \tag{3}$$
with $\beta > \ldots$, $\gamma > 0$, and normalizing constant $C \in \mathbb{R}$, and the Yule-Simon distribution (black dotted lines),
$$P_{\mathrm{ys}}(a_J = n) = C \cdot (\rho - 1)\, \mathrm{B}(n, \rho), \tag{4}$$
with $\rho > 1$, where $C \in \mathbb{R}$ is the normalizing constant and $\mathrm{B}(x, y)$ is the Euler beta function. We perform the distribution fitting by optimizing the parameters α, β, γ, and ρ with a Maximum Likelihood Estimator [18]. Other distributions (such as log-normal, Lévy, and Weibull) were tested and discarded because they were far from matching the data.

Goodness-of-fit
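The bootstrap test of this subsection can be sketched generically: fit the model, compute the Kolmogorov-Smirnov distance of Eq. (7), then count how many synthetic data sets, each refitted, lie further from their own fit. To stay short and self-contained, the sketch plugs in a geometric distribution (the discrete analogue of the exponential baseline) as a stand-in model; the power law and Yule-Simon fits would be passed in the same way. The function names and the reduced number of synthetic sets in the example are illustrative assumptions.

```python
import bisect
import math
import random


def ks_distance(sample, model_cdf):
    """Eq. (7) between the empirical CDF of `sample` (Eq. (5)) and `model_cdf`.

    For data on n >= 1, both CDFs are step functions that only jump at
    integers, so it suffices to compare them at k = 1, ..., max(sample).
    """
    ordered = sorted(sample)
    size = len(ordered)
    return max(abs(bisect.bisect_right(ordered, k) / size - model_cdf(k))
               for k in range(1, ordered[-1] + 1))


# Stand-in model: geometric P(n) = (1 - q) * q**(n - 1) on n >= 1.
def geometric_fit(sample):
    return 1 - len(sample) / sum(sample)  # MLE: q = 1 - 1 / mean


def geometric_cdf(k, q):
    return 1 - q ** k


def geometric_draw(q, rng):
    if q == 0:
        return 1
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1].
    return 1 + int(math.log(1 - rng.random()) / math.log(q))


def bootstrap_p_value(data, fit, cdf, draw, n_sets=5000, seed=0):
    """Eq. (6): share of refitted synthetic sets further from their fit."""
    rng = random.Random(seed)
    param = fit(data)
    d_data = ks_distance(data, lambda k: cdf(k, param))
    exceed = 0
    for _ in range(n_sets):
        synthetic = [draw(param, rng) for _ in range(len(data))]
        param_i = fit(synthetic)  # each synthetic set is refitted
        if ks_distance(synthetic, lambda k: cdf(k, param_i)) > d_data:
            exceed += 1
    return exceed / n_sets


# A clearly non-geometric toy data set: many 1s and a few 100s.
data = [1] * 50 + [100] * 5
p = bootstrap_p_value(data, geometric_fit, geometric_cdf, geometric_draw, n_sets=200)
print(p)  # the fit would be rejected if p < 0.05
```

Refitting every synthetic set, rather than reusing the parameters fitted on the real data, is what makes the comparison fair: the real data were also compared with a distribution tuned to them.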

To evaluate the goodness of our fits, we again follow the recommendations of [18]. We generate 5000 sets of synthetic data $\tilde A_i$, $i = 1, \ldots, 5000$, each of size $|\tilde A_i| = N_J$ and following the distribution whose goodness-of-fit is to be tested. For each of these data sets, we define its associated empirical cumulative distribution function (CDF)
$$S_i(k) := \frac{|\{x \in \tilde A_i : x \le k\}|}{|\tilde A_i|}, \tag{5}$$
and denote by $S_J$ the empirical CDF of $A_J$. We denote by $P_i$ the CDF of the best-fitting distribution associated with $\tilde A_i$ ($P_J$ for $A_J$). The p-value of the goodness-of-fit is then given by
$$p := \frac{|\{i : d_{\mathrm{KS}}(S_i, P_i) > d_{\mathrm{KS}}(S_J, P_J)\}|}{5000}, \tag{6}$$
where the Kolmogorov-Smirnov distance between two CDFs $Q_1$ and $Q_2$ is defined as the maximum difference between them, i.e.,
$$d_{\mathrm{KS}}(Q_1, Q_2) := \max_k |Q_1(k) - Q_2(k)|. \tag{7}$$
Namely, p is the proportion of synthetic data sets that are further from their fitted distribution (in the Kolmogorov-Smirnov sense) than the data set investigated is from its own. The fit is rejected if $p < 5\%$ and considered good otherwise (see [18] for more details).

Maximum number of articles
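The bound derived in this subsection, $n_{\max} \approx (N_J C)^{1/\alpha}$, is a one-line computation once the power-law fit is known. A minimal sketch follows; the values N_J = 10 000, α = 2, and C = 6/π² (the normalization 1/ζ(2) of a power law starting at n = 1) are made up for illustration and do not correspond to any journal of Table I.

```python
import math


def expected_authors(n, n_authors, c, alpha):
    """x_n ~ N_J * C * n**(-alpha): expected number of authors with n articles."""
    return n_authors * c * n ** -alpha


def max_articles(n_authors, c, alpha):
    """Eq. (8): the n at which x_n drops to 1, i.e. (N_J * C) ** (1 / alpha)."""
    return (n_authors * c) ** (1 / alpha)


# Illustrative values only: N_J = 10 000 authors, alpha = 2, C = 1 / zeta(2).
n_authors, alpha = 10_000, 2.0
c = 6 / math.pi ** 2
n_max = max_articles(n_authors, c, alpha)
print(round(n_max, 2))  # 77.97
print(round(expected_authors(n_max, n_authors, c, alpha), 6))  # 1.0
```

An author above this bound is, under the fitted power law, expected to occur less than once in the whole journal, which is why authors beyond the vertical dashed lines stand out as key players.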

Based on Eq. 2, one can compute $x_n$, the number of authors with n publications in J if the distribution followed a power law. Setting this number to $x_n = 1$, the maximal number of articles is given by
$$x_n \approx N_J C n^{-\alpha} \quad \Longrightarrow \quad n_{\max} \approx (N_J C)^{1/\alpha}. \tag{8}$$
This determines a theoretical upper bound on the number of articles published by an author for each journal, shown as the vertical dashed lines in Figs. 1, 3, and 5.

Number of articles published every year
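The quantities $N_k(t)$ and $m_k(t)$ of this subsection, and the correlation between the ratio $m_k(t)/N_k(t)$ and k, can be computed from a list of (author, year) records, one per article. The sketch below is an illustration under stated assumptions: the record format is hypothetical, only k ≥ 1 is considered, and the `active_only` flag is our reading of the note that authors who did not publish during year t are left out.

```python
import math
from collections import Counter


def yearly_ratio(records, t, active_only=True):
    """Return {k: m_k(t) / N_k(t)} for k >= 1.

    records: list of (author, year) pairs, one pair per article.
    N_k(t): authors with exactly k articles published before year t (i.e. on
        December 31st of year t - 1); with active_only=True, only those who
        also published during year t are counted.
    m_k(t): articles published during year t by those authors.
    """
    before = Counter(author for author, year in records if year < t)
    during = Counter(author for author, year in records if year == t)
    n_k, m_k = Counter(), Counter()
    for author, k in before.items():
        published = during[author]  # 0 if the author did not publish in t
        if active_only and published == 0:
            continue
        n_k[k] += 1
        m_k[k] += published
    return {k: m_k[k] / n_k[k] for k in sorted(n_k)}


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient between two sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy records (made up): author B had 2 articles before 2001 and 2 in 2001.
records = [("A", 2000), ("A", 2001), ("B", 2000), ("B", 2000),
           ("B", 2001), ("B", 2001), ("C", 2000), ("D", 2001)]
ratio = yearly_ratio(records, 2001)
print(ratio)  # {1: 1.0, 2: 2.0}
print(pearson(list(ratio), list(ratio.values())))  # 1.0
```

With only two values of k, the correlation is trivially 1; on real data, a coefficient computed over many k is what supports the approximately linear relation of Eq. (9). Author D, who first publishes in 2001, does not enter $N_k(2001)$ for any k ≥ 1.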

For three journals (SCI, LAN, and PRL), we compare the number of authors having published k articles at the beginning of year t with the number of articles published by these authors during year t. We define:
• $N_k(t)$: the number of authors who have published k articles on December 31st of year $t-1$;
• $m_k(t)$: the number of articles published during year t by all the authors with k articles on December 31st of year $t-1$.
Figure 4 shows $m_k(t)/N_k(t)$ with respect to k for years $t \in \{\ldots\}$ for SCI, LAN, and PRL. Note that, for each year considered, we do not take into account authors who did not publish, because the majority of them are no longer active (retired or deceased). For each of the three journals, these values have a linear correlation coefficient larger than 0.7, supporting a fairly good linear dependence,
$$m_k(t) \sim k \cdot N_k(t). \tag{9}$$
The probability that a new paper is signed by an author with k publications is then close to proportional to k. According to [15], if it were exactly proportional, after a long enough time, the distribution of $N_k$ would follow a power law. The fact that relation (9) is not exact, and that our samples are limited to a finite time horizon, explains why we do not obtain exactly a power law. However, the good correlation between $m_k(t)/N_k(t)$ and k tells us that the distribution should not be too far from a power law, in agreement with our observations in Table II.

DATA AVAILABILITY

The data are available from WoS. The study used no special computer code.

ACKNOWLEDGMENTS

RD and MT were supported by the Swiss National Science Foundation under grant number 2000020 182050. RD was supported by the Swiss National Science Foundation under grant number P400P2 194359.

APPENDIX

We show here the figures not displayed in the Results section.

[1] D. J. de Solla Price, Little Science, Big Science (Columbia University Press, 1963).
[2] A. Clauset, D. B. Larremore, and R. Sinatra, Science, 477 (2017).
[3] S. Fortunato, C. T. Bergstrom, K. Börner, J. A. Evans, D. Helbing, S. Milojević, A. M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi, A. Vespignani, L. Waltman, D. Wang, and A.-L. Barabási, Science, eaao0185 (2018).
[4] M. E. J. Newman, Proc. Natl. Acad. Sci. USA, 404 (2001).
[5] D. J. de Solla Price, Science, 510 (1965).
[6] L. Bornmann and R. Mutz, J. Assoc. Inf. Sci. Tech., 2215 (2015).
[7] J. Bohannon, Science, 60 (2013).
[8] P. Sorokowski, E. Kulczycki, A. Sorokowska, and K. Pisanski, Nature, 481 (2017).
[9] J. E. Hirsch, Proc. Natl. Acad. Sci. USA, 16569 (2005).
[10] G. Siudem, B. Żogała-Siudem, A. Cena, and M. Gagolewski, Proc. Natl. Acad. Sci. USA, 13896 (2020).
[11] D. de Solla Price, J. Am. Soc. Inf. Sci., 292 (1976).
[12] Y.-H. Eom and S. Fortunato, PLoS ONE, e24926 (2011).
[13] L. Waltman, N. J. van Eck, and A. F. J. van Raan, J. Am. Soc. Inf. Sci. Tech., 72 (2012).
[14] M. Thelwall, J. Informetr., 336 (2016).
[15] P. L. Krapivsky, S. Redner, and F. Leyvraz, Phys. Rev. Lett., 4629 (2000).
[16] P. Parolo, R. K. Pan, R. Ghosh, B. A. Huberman, K. Kaski, and S. Fortunato, J. Informetr., 734 (2015).
[17] V. Sekara, P. Deville, S. E. Ahnert, A.-L. Barabási, R. Sinatra, and S. Lehmann, Proc. Natl. Acad. Sci. USA, 12603 (2018).
[18] A. Clauset, C. R. Shalizi, and M. E. J. Newman, SIAM Review, 661 (2009).
[19] H. Jeong, Z. Néda, and A. L. Barabási, Europhys. Lett., 567 (2003).
[20] A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek, Physica A, 590 (2002).
[21] A. D. Broido and A. Clauset, Nature Comm., 1 (2019).

Figure 4. Average number of publications within year t for authors with k publications at the beginning of year t, with respect to k, for years $t \in \{\ldots\}$ and for the three journals SCI, LAN, and PRL. The Pearson correlation coefficients are respectively $r_{\mathrm{SCI}} \approx \ldots$, $r_{\mathrm{LAN}} \approx \ldots$, and $r_{\mathrm{PRL}} \approx \ldots$.