The h-index is no longer an effective correlate of scientific reputation
Vladlen Koltun and David Hafner
Intelligent Systems Lab, Intel, Santa Clara, CA 95054, USA
Intelligent Systems Lab, Intel, 85579 Neubiberg, Germany
* [email protected]

ABSTRACT
The impact of individual scientists is commonly quantified using citation-based measures. The most common such measure is the h-index. A scientist's h-index affects hiring, promotion, and funding decisions, and thus shapes the progress of science. Here we report a large-scale study of scientometric measures, analyzing millions of articles and hundreds of millions of citations across four scientific fields and two data platforms. We find that the correlation of the h-index with awards that indicate recognition by the scientific community has substantially declined. These trends are associated with changing authorship patterns. We show that these declines can be mitigated by fractional allocation of citations among authors, which has been discussed in the literature but not implemented at scale. We find that a fractional analogue of the h-index outperforms other measures as a correlate and predictor of scientific awards. Our results suggest that the use of the h-index in ranking scientists should be reconsidered, and that fractional allocation measures such as h-frac provide more robust alternatives.
Introduction
The h-index, proposed by Hirsch in 2005, has become the leading measure for quantifying the impact of a scientist's published work. The h-index is prominently featured in citation databases such as Google Scholar, Scopus, and Web of Science. It informs hiring, promotion, and funding decisions. It thereby shapes the evolution of the scientific community and the progress of science.

Numerous variants of the h-index have been explored, and sophisticated alternatives have been proposed. None of these has displaced the h-index as the dominant measure of a scientist's output. The endurance of the h-index can be attributed to a number of characteristics. First, it summarizes a scientist's output in a single number that can be readily used for comparison and ranking. Second, it does not require a minimal number of publications or career length, and can thus be computed for scientists at all career stages. Third, it does not require tuning thresholds or parameters. Fourth, it is easily interpretable. Lastly, criticism notwithstanding, the h-index is seen as a robust measure of an individual scientist's impact.

Science continues to evolve and publication patterns change over time. Here we report an extensive empirical evaluation of individual research metrics. Since publication patterns differ across scientific fields, we collect large datasets in four fields of research: biology, computer science, economics, and physics. In each field, we consider the 1,000 most highly cited researchers and trace their published output and its impact through two bibliographic data platforms: Scopus and Google Scholar. The resulting datasets comprise 1.3 million articles and 102 million citations identified via Scopus and 2.6 million articles and 221 million citations identified via Google Scholar (Supplementary Fig. S3).

We have cross-referenced the scientists in our datasets against lists of recipients of scientific awards that indicate recognition by the scientific community: Nobel Prizes, Breakthrough Prizes, membership in the National Academies, fellowship of the American Physical Society, the Turing Award, fellowship of the Econometric Society, and other distinctions (Supplementary Fig. S4 and Supplementary Table S1). Among the 4,000 authors in our dataset, 75.6% have no such awards, 13.3% have one award, 5.1% have two, and 6.0% have three or more (Supplementary Fig. S4d). Our basic methodology is to correlate rankings induced by scientometric measures with rankings induced by scientific awards. The assumption is that a citation-based measure that more reliably uncovers laureates of elite awards is a more veridical indicator of scientific reputation. Since publication, citation, and award patterns differ substantially across fields, we conduct parallel experiments in the four fields of research. To confirm the robustness of the findings, the studies are replicated across the two bibliographic platforms (Scopus and Google Scholar).

A number of prior studies are related to our work. Sinatra et al. analyze the careers of 2,887 physicists in the APS dataset and 7,630 scientists in the Web of Science database, considering approximately one million publications in total. Their study includes evaluations that correlate individual scientific impact indicators with scientific awards. However, this is performed on a limited scale, taking into account only Nobel prizes in physics and the Dirac and Boltzmann medals as indicators of scientific reputation.
Considering publication and citation data of 84,116 scientists, Ioannidis et al. investigate a number of citation indicators based on how well they capture Nobel prize winners from the years 2011–2015. The recent study of Ayaz and Masood evaluates indices of researchers' impact by analyzing 236,416 publications in the area of computer science. Their comparison of bibliometric indices is based on 47 award winners in their dataset.

Our study is conducted on a much larger scale. We analyze millions of articles in four different research fields that are cited hundreds of millions of times. We collect more than 10,000 awards and trace 1,848 distinct awards to the 4,000 scientists in our dataset. (See supplementary information.) Most importantly, our datasets have yearly temporal granularity from 1970 onwards. This enables detailed evaluation of the temporal evolution of the effectiveness and predictive power of research metrics that, to the best of our knowledge, has not been presented before.

Our first major finding is that the effectiveness of scientometric measures is declining. For example, the correlation of the h-index with scientific awards in physics has dropped from 0.34 in 2010 to 0.00 in 2019 (Kendall's τ, Scopus physics dataset). This is associated with changing authorship patterns, including a higher prevalence of hyperauthorship. Our second major finding is that fractional allocation of citations among coauthors can mitigate this decline. In particular, for each measure we study, its fractional counterpart is a better correlate and predictor of scientific awards. Among all measures, a fractional analogue of the h-index, h-frac, consistently outperforms alternatives.

We test the robustness of the findings via controlled experiments across datasets. The main findings hold in all conditions: fractional allocation improves the effectiveness and predictive power of research metrics, and h-frac is consistently the most reliable bibliometric indicator. Our results suggest that the use of the h-index in ranking scientists should be reconsidered, and that fractional allocation measures such as h-frac provide more robust alternatives. The data also indicate, contrary to concerns expressed in the literature, that fractional allocation measures are not antithetical to collaboration. Our findings can lead to more effective distribution of resources and thus accelerate scientific discovery. Our data, methodology, and findings may also have broader applications in the empirical analysis of science.

Results
Declining effectiveness of individual research metrics
Fig. 1a shows the effectiveness of scientometric measures over the past 30 years. The effectiveness of a scientometric measure is quantified by the correlation between the ranking induced by this measure and the ranking induced by scientific community awards at a given point in time. Here we report Kendall's τ on the Scopus physics dataset (see Supplementary Fig. S6 for other correlation criteria and datasets). In addition to the h-index (h), we evaluate the total number of citations to a scientist's work (c), the mean number of citations per paper (µ, advocated by Lehmann et al.), Egghe's g-index, the o-index, and the median number of citations received by a scientist's highly-cited papers (m, highlighted by Bornmann et al.). (See supplementary information.)

As Fig. 1a demonstrates, the effectiveness of scientometric measures has declined. The decline is particularly pronounced for the h-index. The effectiveness of the h-index, as measured by Kendall's τ, varied between 0.33 and 0.36 from 1990 to 2010, but dropped to 0.00 by 2019 on the Scopus physics dataset. This is concomitant with a dramatic shift in authorship patterns, illustrated by the average number of coauthors per paper for highly-cited physicists. While the mean number of coauthors per publication, averaged across highly cited physicists, was 78 in 1994 and 121 in 2004, it rose to 952 in 2019, with 10% of the scientists having more than 2,441 coauthors per publication on average. (See https://h-frac.org/dataset-s2.)

This is further illustrated in Fig. 1b, which shows the distribution of the average number of coauthors per paper for highly-cited physicists in each year from 1970 onwards. While small authorship teams were nearly universal in the beginning of this period (84% of the scientists had <10 coauthors per publication on average in 1980), the set of highly-cited physicists has come to be dominated by "hyper-collaborators": 68% of the scientists had >100 coauthors per publication on average in 2019. Large-scale collaboration has been a feature of science for centuries, but joint authorship has been institutionalized on a new scale in the past decade. Scientific consortia comprise thousands of authors who jointly author hundreds of publications. All members of the consortium are listed as authors on all papers. This has been referred to as hyperauthorship. Our results indicate that this behavior is reducing the effectiveness of established scientometric indicators. This is further illustrated in Fig. 1c, which shows the ranking of physicists by h-index in 1999, 2009, and 2019. The hyper-collaborators have permeated the ranking.

Fractional allocation
Are there scientific impact metrics that share the advantages of the h-index and are robust to contemporary publication patterns? Hirsch proposed a bibliometric indicator that takes authorship into account, but his mechanism requires recursive computation across the citation network and, even in its more tractable approximate form, is "particularly unkind to junior researchers". An alternative that inherits the simplicity of the h-index is to allocate citations fractionally among authors.

Figure 1. The effectiveness of scientometric measures is declining. (a) Effectiveness of scientometric measures as correlates of scientific awards in the Scopus physics dataset. (b) Color-coded distribution of the average number of coauthors per publication in this dataset. (c) Ranking of physicists by the h-index. Each data point is a scientist. Color and the vertical axis represent the average number of coauthors per publication.

Derek de Solla Price advocated distributing credit for a scientific publication among all authors to preclude undesirable publication practices: "The payoff in brownie points of publications or citations must be divided among all authors listed on the byline, and in the absence of evidence to the contrary it must be divided equally among them. [...] If this is strictly enforced it can act perhaps as a deterrent to the otherwise pernicious practice of coining false brownie points by awarding each author full credit for the whole thing." Since the introduction and broad adoption of the h-index, many variants and related measures have been proposed. Some of these implement fractional allocation. Batista et al. present a normalization of the h-index by the average number of authors of papers in the h-core. Wan et al. perform a similar normalization, but use the square root of the average number of authors of papers in the h-core. Chai et al. describe a variant of the h-index that is based on citation counts normalized by the square root of the number of authors per paper. Egghe introduces alternative versions of the h- and g-index (see supplementary information) that use citation counts normalized by the number of authors. Egghe's version of the h-index corresponds to the h-frac measure that we find to be particularly effective in our experiments. Note that the work of Egghe is purely theoretical and does not include any experiments with real bibliographic data. Schreiber presents an alternative fractional allocation measure. Instead of using normalized citation counts, Schreiber proposes to first compute alternative ("effective") publication ranks that are divided by the number of authors. These effective ranks are then used to determine the h_m-index, akin to computing the h-index with unmodified publication ranks. A related alternative has also been proposed for the g-index. Other variants that apply different fractional allocation schemes can also be found in the literature. While there exist bibliometric tools that implement fractional versions of the h-index, we are not aware of published systematic empirical evaluation of fractional allocation measures with real bibliographic data, on a large scale (millions of articles), and across multiple scientific fields and data platforms. We contribute such an evaluation. Among other measures, we experimentally evaluate h-frac alongside the scientometric measures of Batista et al. (h_I), Schreiber (h_m), Wan et al. (h_p), and Chai et al. (h_ap).
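As a purely illustrative calculation of our own (not an example drawn from the paper's datasets), fractional allocation divides a paper's citations equally among its authors before any index is computed:

$$ \bar{c} = \frac{c}{A}: \qquad \frac{600\ \text{citations}}{300\ \text{authors}} = 2\ \text{per author}, \qquad \frac{600\ \text{citations}}{3\ \text{authors}} = 200\ \text{per author}. $$

Under full counting, both papers add 600 citations to every coauthor's record; under fractional counting, the hyperauthored paper contributes only 2 citations per author, while the three-author paper still contributes 200.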
Fig. 2a(top) contrasts the effectiveness of fractional allocation measures and traditional ones across all research fields and data platforms. We again measure the correlation of rankings induced by different bibliometric measures and scientific reputation as evidenced by awards bestowed by the scientific community. Detailed results for the individual research areas can be found in Supplementary Fig. S6(left).

We find that fractional measures are significantly more effective correlates of scientific awards than unnormalized indicators such as the h-index. The fractional analogue of the h-index, h-frac, is the most effective measure across datasets (average τ = 0.32 in 2019, compared to 0.16 for the h-index; see Supplementary Table S2(top)). The effectiveness of fractional allocation measures is more stable over time than the effectiveness of their traditional counterparts. (For h-frac, average τ = 0.28 in 1989 and 0.32 in 2019; for the h-index, average τ = 0.27 in 1989 and 0.16 in 2019.)

Figure 2. Effectiveness and predictive power of scientometric measures. In each subfigure, the top row depicts the correlation of bibliometric indicators and scientific awards, and the bottom row shows the predictive power five years into the future. (a) Evaluation across all research areas and data platforms (Scopus and Google Scholar). (b) Evaluation of h-frac alongside additional measures across all research areas and data platforms.
Predictive power and other measures
Next we evaluate the predictive power of different bibliometric measures. Prior studies have largely focused on the ability of measures to predict their own future values, or those of other bibliometric indicators. In contrast, we study the ability of an indicator to predict a scientist's future reputation as evidenced by scientific awards. (Hirsch recognized this as a meaningful goal when he wrote "how likely is each candidate to become a member of the National Academy of Sciences 20 years down the line?", but did not operationalize this.) We measure the correlation of rankings induced by scientometric indicators in a given year (e.g. 2010) with rankings induced by awards in a future year (e.g. 2015). Higher correlation implies stronger ability to predict future scientific reputation based on present-day bibliometric data.

Fig. 2a(bottom) reports predictive power five years into the future. The results are summarized across all research fields and data sources. The predictive power of the h-index has declined since its introduction (average τ = 0.32 in 2004 versus 0.24 in 2014). Other traditional indicators have also declined in effectiveness. Fractional measures are more predictive. h-frac has the highest predictive power across datasets and its predictive power is stable over time (average τ is 0.34 in 1994, 0.36 in 2004, and 0.33 in 2014).

We further evaluate h-frac alongside an extensive list of other scientometric measures. The results are summarized in Fig. 2b. Measures that integrate some form of normalization by the number of coauthors (h-frac, h_I, h_m, h_p, h_ap) outperform measures that do not apply such normalization. h-frac is the best-performing measure in terms of both correlation with scientific awards and predictive power.

Robustness of the findings
We now test the robustness of the findings in a number of additional controlled experiments.

Figure 3. Controlled experiments that test the robustness of the findings. (a) Reference result from the main experiments (cf. Fig. 2a(top)). (b) Corresponding results with other correlation statistics. (c and d) Results in different conditions: using subsets of awards, researchers, and different mechanisms for counting awards.

First, we repeat the experiments with different correlation statistics (see supplementary information). The results are summarized in Fig. 3b, and detailed results for all research areas and data platforms can be found in the supplementary materials (Supplementary Fig. S6). Fractional measures continue to outperform their traditional counterparts, and h-frac is the most reliable indicator.

Next we analyze robustness with respect to the set of scientific awards considered in our datasets. Our main experiments treated all awards equally, and ranked scientists by the total number of awards received. For example, a Nobel prize was given the same weight as membership in the National Academy of Sciences, and a scientist with two awards was ranked higher than a scientist with one award. To examine whether our findings are sensitive to this choice, we repeat the experiments under different conditions. First, we assign 10 times higher weight to awards with 100 or fewer laureates. (See Supplementary Table S1.) Second, we evaluate a design in which the number of awards does not affect a scientist's ranking: a scientist with an award of any kind is ranked higher than a scientist with no awards, but all scientists with one or more awards are ranked equally. The results are summarized in Fig. 3c(left) and presented in detail in the supplementary materials (Supplementary Fig. S7). Our findings hold for both conditions. (The results remain consistent for other weighting factors and thresholds as well.)

To further assess sensitivity, we repeat the experiments with random subsets of awards (using 75% and 50% of awards in our database). The results are reported in Fig. 3c(right) and Supplementary Fig. S7. Our findings again hold. This demonstrates the robustness of our findings with respect to the considered awards and the matching procedure. (See supplementary information.)

Is the decline in the effectiveness of the h-index and other traditional scientometric measures solely due to the rise of hyperauthorship? To investigate this hypothesis, we curtail the effect of hyperauthorship by reproducing the experiments with the set of authors who have at most 100 coauthors per paper on average. The results in Fig. 3d(left) show that our findings hold in this condition as well: we see a strong decline in the effectiveness of traditional measures, in contrast to the stable performance of their fractional counterparts. Hyperauthors appear to be an extreme manifestation of a broader shift in publication patterns. Hyperauthors themselves are not the main cause of the decline in the effectiveness of the h-index and other measures, and pruning hyperauthors from datasets does not avert this decline.

Next we perform experiments with different subsets of researchers. First we remove the most highly-cited researchers in our datasets and repeat the experiments with the bottom 50% of researchers in each field by number of citations. This examines whether our findings hold for researchers that are not at the very top of their fields in terms of citations. Then we analyze the effect of the main time period of a scientist's work. (Details on the temporal coverage of the authors in our dataset can be found in Supplementary Fig. S3.)
To this end, we choose subsets of researchers that are active at different periods of time. Specifically, we test the subset of researchers whose peak productivity (in terms of number of publications) occurs during the years [2000, 2010), and another subset whose peak productivity occurs during the years [2010, 2020). The results are summarized in Fig. 3d and given in detail in Supplementary Fig. S8. Our main findings are robust to all these perturbations and hold in all conditions: fractional allocation measures always outperform their traditional counterparts, and h-frac is the most reliable bibliometric indicator across all conditions.

Figure 4. Correlation between scientometric measures. (a) Correlation matrices of scientometric measures in the years 1999, 2009, and 2019. (b) Temporal evolution of correlations between traditional measures and their fractional counterparts.

Correlation between scientometric measures
Our experiments indicate that fractional allocation measures are superior to their traditional counterparts. To analyze this further, we investigate the correlation between different scientometric measures. To this end, we compute the correlation between each pair of measures, aggregated over all datasets (Fig. 4a). To interpret the results, we consider three different 6×6 blocks in the correlation matrices:

(i) The lower right block summarizes the correlations between the fractional measures. It is quite stable over the years. All fractional measures are moderately correlated, with the exception of µ-frac. The lower correlation of µ-frac with the other fractional measures can be explained by the explicit normalization by the number of publications in µ-frac, which is absent in the other measures. As can be seen in the preceding results, µ-frac is the worst-performing measure among the fractional ones.

(ii) The upper left block summarizes the correlations between the traditional measures. These correlations are stable over time. The traditional measures are moderately correlated with each other, again with the exception of µ. This can again be attributed to the explicit normalization by the number of publications in µ.

(iii) The lower left block captures the correlations between the traditional and fractional measures. Notably, we observe that these correlations decrease significantly from 2009 to 2019. All correlation values decrease, including the correlations between the traditional measures and their direct fractional counterparts (the diagonal in the lower-left block). The measures µ and µ-frac stand out again, which can be attributed to the same factors as in the other blocks.

Why have the traditional and fractional measures become less correlated over time? We examine the temporal evolution of correlations between traditional measures and their fractional counterparts at finer granularity (Fig. 4b). We see that the correlation decreases over time, with accelerated decline after 2010. Concurrently, the average number of authors per publication rises significantly. The two trends are strongly correlated. Since accounting for the number of authors per publication is the central feature that distinguishes fractional measures from their traditional counterparts, we attribute the diminishing correlation between the measures to the changing publication culture, as reflected in the dramatic increase in the average number of authors per paper.

Further analysis
Fig. 5a provides a number of case studies that highlight the stability of h-frac and the deterioration of the h-index over time. These case studies are further illustrated in Fig. 5b. The evolution of h and h-frac values over time is visualized in Figs. 5c and 5d. Hyperauthors (red) acquire increasingly high h-indices over time, commonly rising above 80 by 2019. In contrast, their h-frac values remain low, predominantly less than 20. Fig. 5e visualizes the distribution of h-frac values in the four fields of research. The top 100 scientists have h-frac values of 59 and higher in biology, 39 and higher in computer science, 37 and higher in physics, and 29 and higher in economics.

Fig. 5f examines in detail the output of the 10 physicists with the highest h-frac in 2019. The data suggests that the h-frac measure is not antithetical to collaboration, which is associated with scientific progress. Among the physicists with the highest h-frac are prolific collaborators such as Albert-László Barabási.

Figure 5. Further analysis. (a) Ranking induced by h and h-frac for a number of scientists in the Scopus physics dataset. (b) Comparison of rankings induced by h and h-frac in the Scopus physics dataset. Scientists are color-coded by the average number of coauthors per publication. (c) Evolution of the h-index of each scientist in the Scopus physics dataset over time. Each scientist is a curve. Color represents the average number of coauthors per publication. (d) Evolution of h-frac over time. (e) Distribution of h-frac values in each field of research. (f) Distribution of the number of authors per publication for 10 physicists with the highest h-frac in 2019.

Discussion
We have conducted a large-scale systematic analysis of scientometric measures. We have demonstrated that commonly used measures of a scientist's impact have become less effective as correlates and predictors of scientific reputation as evidenced by scientific awards. The decline in the effectiveness of these measures is associated with changing authorship patterns in the scientific community, including the rise of hyperauthorship. We have also demonstrated that fractional allocation of citations among coauthors improves the robustness of scientometric measures. In particular, h-frac, a fractional analogue of the h-index, is the most reliable measure across different experimental conditions.

Our analysis did not uncover unreasonable penalization of collaboration among researchers by fractional allocation measures. Fractional allocation does make explicit the expectation that each author makes a meaningful contribution to the publication's impact. In the words of Derek de Solla Price, "Those not sharing the work, support, and responsibility do not deserve their names on the paper, even if they are the great Lord Director of the Laboratory or a titular signatory on the project. Any time you take a collaborator you must give up a share of the outcome, and you diminish your own share. That is as it should be; to do otherwise is a very cheap way of increasing apparent productivity." Our study indicates that fractional allocation neutralizes the inflationary effects of hyperauthorship on bibliometric impact indicators, but continues to reward collaborative production of impactful scientific research.

A number of aspects of bibliometric impact indicators have not been addressed in our study. One is the normalization of bibliometric indicators across different fields, so as to enable direct comparison of scientists across fields with different publication and citation patterns. Another is the presence of self-citations and whether such citations should be handled differently. Likewise we have not addressed the role of author order and whether this order should be taken into account in automatically allocating credit for a publication's impact. These are interesting avenues for future work.

Our work has both near-term and long-term implications. In the near term, our work indicates that the use of the h-index in assessing individual scientific impact should be reconsidered, and that h-frac can serve as a more robust alternative. This can ameliorate distortions introduced by contemporary authorship practices, lead to a more effective allocation of resources, and facilitate scientific discovery. In the longer term, our data, methodology, and findings can inform the science of science and support further quantitative analysis of research, publication, and scientific accomplishment.

An interactive visualization of our work can be found at https://h-frac.org.

Methods
Highly-cited researchers
We construct a dataset of highly-cited researchers in four research fields: biology, computer science, economics, and physics. To begin, we retrieve a set of highly-cited researchers in each field via Google Scholar. To this end, we query Google Scholar with labels that are characteristic of different research areas (Supplementary Fig. S1). The retrieved authors are sorted by the number of citations: most highly cited researchers appear first. However, the results are noisy because the queries retrieve all authors that feature the queried keyword phrases in their profiles. For example, a physicist who features "high performance computing" as a keyword phrase in their profile would be retrieved by the corresponding query. Since "high performance computing" is one of our queries for computer science researchers, the physicist would, in the absence of further validation, be added to the computer science dataset.

To clean up the initial lists compiled via Google Scholar, we cross-reference them with the Scopus database. A scientist's Scopus profile indicates their primary research area. We use this primary research area to filter the initial lists. To this end, we need to match author profiles in Google Scholar with Scopus profiles. To perform the association, we first create a set of candidate matches by querying the Scopus database with the researcher's name. To obtain the query name, we clean the Google Scholar profile name via simple heuristics (e.g. remove extraneous information such as links or affiliation names). To reduce false positives, we limit the candidates to Scopus profiles with more than 50 papers (more than 30 papers for economics). To perform the actual matching, we analyze the top 100 papers (sorted by citation counts) of the different candidate profiles. If we find at least three matching paper titles in the Scholar and Scopus profiles, we associate the two profiles, as sketched below.

After matching, we filter the authors in each field by their primary subject area in Scopus (Supplementary Fig. S2). After filtering, we retain the top 1,000 authors in each field. This filtered set is derived from the top 1,186 Google Scholar profiles in biology, 1,711 in computer science, 1,632 in economics, and 1,296 in physics. This means that, in aggregate, more than two thirds of the initial Google Scholar profiles are matched to corresponding Scopus profiles with the desired primary subject area. Authors that could not be matched or do not have the requisite primary subject area are removed from the corresponding list. (They may still be retained in a list for a different field; e.g. physics rather than computer science.) One attribute of our filtering procedure is that the lists of authors in the four fields are disjoint: a scientist is only included in at most one list.
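The following Python sketch is our own illustration of the matching heuristic described above; the profile data structure and the title normalization are assumptions, not the authors' released code, while the thresholds (50/30 papers, top 100 titles, at least three shared titles) are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Profile:
    name: str
    paper_titles: List[str]  # titles sorted by citation count, most cited first
    num_papers: int
    primary_subject: str = ""

def normalize(title: str) -> str:
    # Crude normalization so that trivial formatting differences do not block a match.
    return " ".join(title.lower().split())

def match_scholar_to_scopus(scholar: Profile,
                            scopus_candidates: List[Profile],
                            field: str) -> Optional[Profile]:
    """Associate a Google Scholar profile with a Scopus profile, if possible.

    Candidates must have more than 50 papers (30 for economics); a candidate is
    accepted if at least three of its top-100 paper titles also appear among the
    Scholar profile's top-100 paper titles.
    """
    min_papers = 30 if field == "economics" else 50
    scholar_top = {normalize(t) for t in scholar.paper_titles[:100]}
    for candidate in scopus_candidates:
        if candidate.num_papers <= min_papers:
            continue
        candidate_top = {normalize(t) for t in candidate.paper_titles[:100]}
        if len(scholar_top & candidate_top) >= 3:
            return candidate
    return None
```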
Google Scholar data
For all 4,000 researchers, we collect their Google Scholar publications including citation data. In particular, we collect (for each publication) the publication year, the number of authors, and the number of citations per year. We filter out certain publications: (i) publications that do not list authors or the publication year, (ii) patents, and (iii) duplicates marked by Google Scholar. Moreover, we noticed that the publication date and the citation years in Google Scholar are sometimes inconsistent: a publication is sometimes cited before it was published. As a remedy, we take the minimum of the publication year and the year of the first citation as the effective publication year.

We also noticed that Google Scholar generally under-reports the number of authors for publications with large author sets. Manual inspection indicates that Scholar does not record all authors, but only the first ∼150 authors. In particular, the maximal value of the average author count in the Scholar dataset is 230, versus 3,130 in Scopus. This is an important limitation of the Scholar data that has to be kept in mind. The consistency of our findings across the Scholar and Scopus datasets, in spite of the truncated author counts in the Scholar data, indicates that our findings are robust to such noise and bias in the data.
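A minimal sketch of these cleaning rules, under our own assumptions about how a raw Scholar record might be represented (the dictionary keys are illustrative, not the authors' actual schema):

```python
from typing import Optional

def clean_publication(pub: dict) -> Optional[dict]:
    """Apply the filters and the effective-publication-year rule described above."""
    # (i)-(iii): drop records without authors or a year, patents, and marked duplicates.
    if not pub.get("num_authors") or not pub.get("year"):
        return None
    if pub.get("is_patent") or pub.get("is_duplicate"):
        return None
    # A paper cannot be cited before it exists, so use the minimum of the listed
    # publication year and the year of the first recorded citation.
    citation_years = sorted(pub.get("citations_per_year", {}))
    if citation_years:
        pub["year"] = min(pub["year"], citation_years[0])
    return pub
```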
Scopus data
Similar to the Google Scholar data, we collect for each of the 4,000 authors their Scopus publications with citation data. Since the Scopus data is significantly less noisy than the Scholar data, no special data cleaning and filtering are required.

One salient difference between the datasets is that the Google Scholar datasets contain approximately twice as many publications and citations as the Scopus datasets. One contributing factor is that Scopus indexes only a subset of the venues crawled by Google Scholar. For example, Scopus does not index online repositories such as arXiv. In agreement with prior studies, we have found Google Scholar data to be both broader and noisier than Scopus. The consistency of our findings across the Scholar and Scopus datasets highlights their robustness.

Award data
We use awards bestowed by the scientific community as indicators of scientific reputation. To this end, we consider highly selective distinctions, some of which span multiple scientific fields, such as membership in the National Academy of Sciences, and some of which are field-specific, such as fellowship of the Econometric Society (Supplementary Fig. S4a, Supplementary Table S1, and https://h-frac.org/dataset-s1).

Our award data collection procedure begins by compiling complete lists of laureates for each award from the respective web sites. (This is nontrivial since it requires customized parsing techniques for each award.) Next, we search these lists of laureates for names in our datasets. This search is based on the surname and the initials from each Scopus author profile in our dataset. This yields a list of candidate matches. We then manually check all candidate matches, considering the author details in the Scopus profile, such as name variations, affiliations, and subject areas, as well as details extracted from the corresponding award pages, such as bio, affiliation, and country (Supplementary Fig. S4a, b, and Supplementary Table S1).

For each laureate, we also retain the year in which the award was conferred. This is central to our measurement of correlation and predictive power over time.
References
1. Hirsch, J. E. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences (2005).
2. … et al. Do metrics matter? Nature
3. Science
4. Nature
5. Scientometrics (2009).
6. Sinatra, R., Wang, D., Deville, P., Song, C. & Barabási, A.-L. Quantifying the evolution of individual scientific impact. Science (2016).
7. Hirsch, J. E. Does the h index have predictive power? Proceedings of the National Academy of Sciences
8. Proceedings of the National Academy of Sciences
9. Scientometrics
10. Nature
11. … et al. Science of science. Science (2018).
12. Alonso, S., Cabrerizo, F. J., Herrera-Viedma, E. & Herrera, F. h-Index: A review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics
13. Journal of Informetrics
14. Journal of Informetrics
15. Scopus. http://api.elsevier.com. Access date: February 5–21, 2020.
16. Google Scholar. https://scholar.google.com/. Access date: February 5–21, 2020.
17. Ioannidis, J. P., Klavans, R. & Boyack, K. W. Multiple Citation Indicators and Their Composite across Scientific Disciplines. PLOS Biology (2016).
18. Ayaz, S. & Masood, N. Comparison of researchers' impact indices. PLoS One (2020).
19. Price, D. d. S. Multiple Authorship. Science
20. Journal of the American Society for Information Science and Technology
21. PLOS Biology (2015).
22. Lehmann, S., Jackson, A. D. & Lautrup, B. E. Measures for measures. Nature
23. Scientometrics
24. Nature Physics
25. Journal of the American Society for Information Science and Technology
26. ScienceWatch (2012).
27. ATLAS Collaboration. ATLAS Authorship Policy. https://twiki.cern.ch/twiki/bin/view/Main/ATLASAuthorshipPolicy. 2010.
28. Castelvecchi, D. Physics paper sets record with more than 5,000 authors. Nature (2015).
29. Milojević, S. Principles of scientific research team formation and evolution. Proceedings of the National Academy of Sciences
30. Scientometrics
31. Science & Technology Libraries
32. Scientometrics
33. COLLNET Journal of Scientometrics and Information Management
34. The Adapted Pure h-Index. In Proceedings of WIS 2008, Berlin, Fourth International Conference on Webometrics, Informetrics and Scientometrics & Ninth COLLNET Meeting (2008).
35. Schreiber, M. A modification of the h-index: The h_m-index accounts for multi-authored manuscripts. Journal of Informetrics
36. Schreiber, M. To Share the Fame in a Fair Way, h_m Modifies h for Multi-authored Manuscripts. New Journal of Physics
37. Journal of the American Society for Information Science and Technology
38. Journal of Informetrics
39. Scientometrics
40. Scientometrics
41. Scientometrics (2011).
42. Rousseau, R. A note on the interpolated or real-valued h-index with a generalization for fractional counting. Aslib Journal of Information Management
43. Publish or Perish. https://harzing.com/resources/publish-or-perish. 2007.
44. Kozlowski, L. P. fCite: a fractional citation tool to quantify an individual's scientific research output. bioRxiv: https://doi.org/10.1101/771485. 2019.
45. Penner, O., Pan, R. K., Petersen, A. M., Kaski, K. & Fortunato, S. On the predictability of future impact in science. Scientific Reports (2013).
46. Jin, B.-H. H-index: An Evaluation Indicator Proposed by Scientist. Science Focus
47. Chinese Science Bulletin
48. ISSI Newsletter
49. Information Processing & Management
50. Journal of the American Society for Information Science and Technology (2009).
51. Kosmulski, M. MAXPROD: A new index for assessment of the scientific output of an individual, and a comparison with the h-index. International Journal of Scientometrics, Informetrics and Bibliometrics (2007).
52. Radicchi, F. & Castellano, C. Analysis of bibliometric indicators for individual scholars in a large data set. Scientometrics
53. Science
54. A Century of Science: Globalization of Scientific Collaborations, Citations, and Innovations. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2017), 1437–1446.
55. Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science and technology. Nature
56. Annalen der Physik
57. PLOS ONE (2011).

Author contributions statement
V.K. conceived and directed the project. D.H. collected the data, implemented the methods, and performed experiments. V.K. and D.H. designed experiments, analyzed data, and wrote the paper.
Additional information
The authors declare no competing interests.

Supplementary Information
Data Collection
Scholar Data
In total, we collect 2,624,994 (valid) publications in Google Scholar that are collectively cited 220,783,854 times. The distribution of the publications and citations among the research fields is as follows (Fig. S3(bottom)): biology accounts for 31% of the publications and 38% of the citations, computer science for 22% and 17%, economics for 12% and 10%, and physics for 35% and 35%. Our dataset offers yearly granularity from 1970 onwards.
Scopus Data
In total, the Scopus dataset comprises 1,290,219 publications with 102,405,086 citations. The distribution of publications and citations is as follows (Fig. S3(top)): biology accounts for 34% of publications and 49% of citations, computer science for 20% and 14%, economics for 6% and 4%, and physics for 41% and 33%.
Award Data
In total, we trace 1,848 distinct awards to the 4,000 scientists in our dataset. Some scientists have received multiple awards. The number of distinct scientists who have received at least one award in the dataset is 976 (24.4%). 13.3% of the researchers received exactly one award, 5.1% received two, 3.1% three, and 2.9% more than three awards (Fig. S4d). Of the 1,848 distinct awards, 653 (35.3%) were granted to researchers in biology, 526 (28.5%) in economics, 402 (21.8%) in physics, and 267 (14.4%) in computer science (Fig. S4c).
Scientometric Measures
The following paragraphs explain the main scientometric measures that we consider in this work.
H-Index
The h-index, originally proposed by Hirsch in 2005, is defined as the maximal value of h such that h publications by the author have at least h citations each. Let $N$ be the number of publications and let $\{c_1, \ldots, c_N\}$ be the numbers of citations per paper in decreasing order, i.e. $c_i \geq c_j$ for $i < j$. The h-index is given by

$$ h = \max(h) \;\; \text{s.t.} \;\; c_h \geq h. \quad (1) $$

C-Index
We define the c-index as the total number of citations to all publications by the author:

$$ c = \sum_{i=1}^{N} c_i. \quad (2) $$

µ-Index

Lehmann et al. advocated the use of the mean number of citations per paper:

$$ \mu = \frac{1}{N} \sum_{i=1}^{N} c_i. \quad (3) $$

G-Index
Egghe's g-index is a variation on the h-index. It is defined as the maximal value of g such that the g most-cited publications by the author collectively have at least g² citations in total:

$$ g = \max(g) \;\; \text{s.t.} \;\; \sum_{i \leq g} c_i \geq g^2. \quad (4) $$

O-Index
The o-index, proposed by Dorogovtsev and Mendes in 2015, is defined as the geometric mean of the h-index (h) and the citation count of the most-cited publication ($c_1$):

$$ o = \sqrt{h \, c_1}. \quad (5) $$

M-Index

The m-index, proposed by Bornmann et al., is defined as the median number of citations received by publications in the h-core. The h-core comprises the top h publications ranked by citation count. Thus

$$ m = \mathrm{median}(\{c_1, \ldots, c_h\}). \quad (6) $$

Based on these traditional scientometric measures, we define their fractional counterparts (-frac). The fractional measures are based on citation counts $\bar{c}$ that are normalized by the number of authors per publication:

$$ \bar{c} = \frac{c}{A}, \quad (7) $$

where $A$ is the number of authors. The intuition is that this normalization distributes the contribution of a publication equally among the authors. This is clearly a simplification of credit allocation in science, but it is simple and does not introduce new parameters.

H-Frac
The fractional h-index, h-frac, is defined as

$$ \text{h-frac} = \max(h) \;\; \text{s.t.} \;\; \bar{c}_h \geq h. \quad (8) $$

Here $\{\bar{c}_1, \ldots, \bar{c}_N\}$ are the normalized citation counts per paper in decreasing order, i.e. $\bar{c}_i \geq \bar{c}_j$ for $i < j$.

C-Frac
The fractional measure c-frac is the aggregate of the author's normalized citation counts:

$$ \text{c-frac} = \sum_{i=1}^{N} \bar{c}_i. \quad (9) $$

µ-Frac

µ-frac is the mean of the normalized citation counts, averaged over all publications by the author:

$$ \mu\text{-frac} = \frac{1}{N} \sum_{i=1}^{N} \bar{c}_i. \quad (10) $$

G-Frac

g-frac is likewise defined by analogy with the g-index using normalized citation counts:

$$ \text{g-frac} = \max(g) \;\; \text{s.t.} \;\; \sum_{i \leq g} \bar{c}_i \geq g^2. \quad (11) $$

O-Frac
We define o-frac as the geometric mean of the fractional h-index (h-frac) and the largest normalized citation count ($\bar{c}_1$):

$$ \text{o-frac} = \sqrt{\text{h-frac} \cdot \bar{c}_1}. \quad (12) $$

M-Frac
The fractional counterpart of the m-index, m-frac, is the median of the normalized citation counts among the top h-frac publications ranked by normalized citation counts:

$$ \text{m-frac} = \mathrm{median}(\{\bar{c}_1, \ldots, \bar{c}_{\text{h-frac}}\}). \quad (13) $$
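As a concrete (and purely illustrative) sketch of equations (1), (7), and (8), the following Python functions compute the h-index and h-frac from per-publication citation and author counts. This is our own toy implementation, not code released with the paper.

```python
def h_index(citations):
    """h = max h such that the h most-cited papers have at least h citations each (Eq. 1)."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def h_frac(citations, authors):
    """h-frac: the h-index computed on citation counts divided by the number of authors (Eqs. 7-8)."""
    normalized = [c / a for c, a in zip(citations, authors)]
    return h_index(normalized)

# Toy example: five papers, one of them hyperauthored.
citations = [900, 40, 30, 20, 3]
authors = [300, 2, 3, 1, 5]
print(h_index(citations))            # 4: four papers with at least 4 citations each
print(h_frac(citations, authors))    # 3: the hyperauthored paper contributes only 3 normalized citations
```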
Effectiveness of Scientometric Measures

ROC Curves
We analyze a receiver operating characteristic (ROC) curve for each dataset (Fig. S5a). We rank the scientists by the considered scientometric measure. Lower rank corresponds to higher value of the measure. The scientist with the highest value in the dataset has rank 1. The ROC curve starts at (0, 0). We iterate over the list of scientists, in order of rank r (from 1 onwards), and aggregate the awards. Step r adds the following data point to the ROC curve. The x-coordinate is the fraction of the first r scientists that have not received any award in the dataset (false positive rate). The y-coordinate is the fraction of the total number of awards in the dataset received by the first r scientists (true positive rate). By construction, the ROC curve ends at (1, 1) once all scientists have been processed. The area under the curve (AUC) is an indicator of the effectiveness of the considered scientometric measure. If a measure ranks scientists that have garnered more awards more highly, the ROC curve rises faster and the AUC is higher.

The fractional measures perform much better than their non-fractional counterparts. h-frac performs best across all research areas and datasets (Fig. S5).

In addition to the AUC, we analyze other criteria that quantify the correlation between a ranking of scientists by a certain scientometric measure and a ranking by the number of awards. If the two rankings are similar (high correlation), the scientometric measure is taken to be a more veridical indicator of scientific reputation. We evaluate the following correlation measures.

Kendall's τ

We use the $\tau_b$ form of Kendall's τ, which accounts for ties. It is defined as

$$ \tau = \tau_b = \frac{C - D}{\sqrt{(C + D + T_A)(C + D + T_B)}}, \quad (14) $$

where $C$ is the number of concordant and $D$ the number of discordant pairs in two rankings $A$ and $B$. $T_A$ is the number of ties in $A$ only and $T_B$ is the number of ties in $B$ only. If a tie occurs in both $A$ and $B$, it is not added to either $T_A$ or $T_B$. Equation (14) reduces to $\tau_a$ when no ties are present:

$$ \tau_a = \frac{C - D}{n(n - 1)/2}, \quad (15) $$

where $n$ is the number of elements in $A$ or $B$.
Somers' D

We also measure Somers' D. Somers' D of a ranking A w.r.t. a ranking B is defined as

$$ D_{AB} = \frac{\tau_a(A, B)}{\tau_a(B, B)}. \quad (16) $$

Note that Somers' D is asymmetric. In our evaluation, we set A to the ranking by the considered scientometric measure and B to the ranking based on awards.

Goodman and Kruskal's γ

Goodman and Kruskal's γ is defined as follows:

$$ \gamma = \frac{C - D}{C + D}. \quad (17) $$

Spearman's ρ

We also compute Spearman's rank correlation coefficient, which is defined as the Pearson correlation coefficient between the rank variables:

$$ \rho = \frac{\mathrm{cov}(r_A, r_B)}{\sigma_{r_A} \sigma_{r_B}}, \quad (18) $$

where $r_A$ and $r_B$ are rank variables and $\sigma_{r_A}$ and $\sigma_{r_B}$ the corresponding standard deviations.

The results in Table S2 support the following observations. First, the fractional measures perform consistently better than their non-fractional counterparts. Furthermore, the relative order of effectiveness of scientometric measures is consistent in the different correlation statistics. This highlights the robustness of our findings. Overall, h-frac is the most effective scientometric measure in terms of correlation with scientific reputation (as indicated by scientific awards).

Of the four research fields we study, economics stands out in terms of the relative effectiveness of different scientometric measures. In economics, g-frac and o-frac appear to be the most effective measures. However, the variation between the scientometric measures in economics is substantially smaller than in the other research fields. For example, the minimal and maximal values of Kendall's τ in biology in the Scopus dataset are 0.02 and 0.34, while the minimal and maximal values for economics are 0.22 and 0.30 (Table S2(top)). Examination of the data suggests that the field of economics has retained more classical publication patterns, with smaller author sets, fewer publications per author, and minimal hyperauthorship.
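As an illustration (our own sketch, not the authors' evaluation code), the following Python snippet computes two of the correlation statistics above with SciPy and the award-based ROC AUC by hand for a toy set of scientists. The calls scipy.stats.kendalltau and scipy.stats.spearmanr are standard SciPy functions; award_auc follows the construction described in the ROC Curves paragraph.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

# Toy data: one bibliometric score and one award count per scientist (illustrative values).
scores = np.array([120, 85, 60, 44, 30, 12])  # e.g. h-index or h-frac values
awards = np.array([3, 0, 2, 1, 0, 0])         # number of awards per scientist

tau_b, _ = kendalltau(scores, awards)  # Kendall's tau-b, Eq. (14), accounts for ties
rho, _ = spearmanr(scores, awards)     # Spearman's rho, Eq. (18)

def award_auc(scores, awards):
    """AUC of the award-based ROC curve: walk down the ranking induced by the measure;
    x is the fraction of award-less scientists seen so far, y is the fraction of all
    awards accumulated so far."""
    order = np.argsort(-scores)                 # rank 1 = highest score
    no_award_total = int(np.sum(awards == 0))
    award_total = int(awards.sum())
    xs, ys = [0.0], [0.0]
    seen_no_award, seen_awards = 0, 0
    for idx in order:
        if awards[idx] == 0:
            seen_no_award += 1
        seen_awards += awards[idx]
        xs.append(seen_no_award / no_award_total)
        ys.append(seen_awards / award_total)
    # Trapezoidal area under the piecewise-linear curve.
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2 for i in range(1, len(xs)))

print(tau_b, rho, award_auc(scores, awards))
```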
Temporal Dynamics

Effectiveness Over Time

Next, we analyze the effectiveness of scientometric measures in each year from 1990 onwards. To this end, we consider for each year Y the publication and award data up to that year. In particular, we only consider publications up to year Y as well as citations and awards up to the end of year Y. This enables us to investigate the evolution of the effectiveness of scientometric measures over time (Fig. 2(top)).

We again observe that the fractional measures perform better than their non-fractional counterparts. While most measures tend to decrease in effectiveness over time, the fractional measures are more stable. The difference between the fractional and non-fractional measures increases over time. From 2014 onwards, all fractional measures are on average more effective than any of the traditional measures (Fig. 2a(top)). Among all measures, h-frac is the most effective in terms of correlation with scientific reputation (Fig. 2(top)).
Predictive Power Over Time
We also investigate the temporal evolution of the predictive power of scientometric measures. Our aim is to quantify how well a scientometric measure predicts future scientific reputation. To this end, we compare a ranking of scientists induced by the considered scientometric measure in year Y to a ranking induced by awards garnered up to year Y + X. A high correlation among these two rankings implies that the scientometric measure is a good predictor of scientific reputation X years into the future. We compute the same correlation measures defined earlier and take X = 5.
Figure S1. Google Scholar queries used to initialize the datasets. Distribution of search queries in the initial lists of researchers, i.e. the number of researchers in the initial lists who feature the respective keyword phrase in their profile.

Figure S2. Scopus subject areas used for filtering the initial author list compiled from Google Scholar. The plots show the number of author profiles in the filtered datasets with the respective subject as their primary research area.
Figure S3. Overview of the Scopus and Google Scholar datasets. Scopus (top) and Google Scholar (bottom). From left to right: cumulative number of authors, publications, and citations per year, from 1970 onwards. Authors are considered present in the database if they have at least one publication recorded by the considered year.
Figure S4. Award statistics. (a) Cumulative number of awards indexed in our data collection. (b) Cumulative number of awards to scientists in our datasets. (c) Cumulative number of awards to scientists in each research field. (d) Distribution of the number of awards garnered by individual scientists.
Figure S5. Receiver operating characteristic (ROC) curve and area under the curve (AUC) for each research field and data source. (a) The horizontal axis is the accumulated fraction of scientists with no awards (false positive rate). The vertical axis is the fraction of awards accumulated by scientists (true positive rate). Larger area under the curve (AUC) indicates that a given bibliometric indicator ranks scientists who have received more awards more highly. Details are given in the text. (b) Numerical values of AUC for each research field and data source.
Figure S6. Effectiveness of scientometric measures over time for different evaluation criteria. From left to right: Kendall's τ, area under the curve (AUC), Somers' D, Goodman and Kruskal's γ, Spearman's ρ.
Figure S7. Effectiveness of scientometric measures over time for different perturbations of rankings induced by awards. From left to right: equal weight for all awards (default), higher weight for awards with <100 laureates, binary (yes / no) award counting, random subsets of awards received by researchers in our database (75% and 50%).
Figure S8. Effectiveness of scientometric measures over time for different subsets of researchers. From left to right: all researchers, without hyperauthors, authors with fewer citations (bottom half), authors with publication peak in [2000, 2010), authors with publication peak in [2010, 2020).

Table S1. Awards used in our study. The first five awards apply to all research areas (cross-field), while the others are field-specific (CS stands for computer science). The second-to-last column lists the total number of laureates of each award. The last column shows the number of laureates in our datasets.

Field | Award | Laureates | Matches
Cross-field | American Academy of Arts & Sciences | 13,837 | 354
Cross-field | Fellows of the American Association for the Advancement of Science | 65,303 | 390
Cross-field | Fellows of the American Statistical Association | 2,485 | 45
Cross-field | National Academy of Engineering | 4,401 | 106
Cross-field | National Academy of Sciences | 6,085 | 263
Biology | Breakthrough Prize in Life Sciences | 48 | 11
Biology | National Academy of Medicine | 2,980 | 121
Biology | Nobel Prize in Chemistry | 184 | 5
Biology | Nobel Prize in Physiology or Medicine | 219 | 2
CS | ACM Prize in Computing | 13 | 7
CS | Turing Award | 70 | 9
Economics | AEA/AFA Joint Luncheon Speakers | 58 | 10
Economics | American Economic Association Distinguished Fellows | 172 | 25
Economics | American Economic Association Foreign Honorary Members | 40 | 11
Economics | American Economic Association Richard T. Ely Lecturers | 58 | 14
Economics | American Finance Association Fischer Black Prize | 8 | 4
Economics | Fellows of the American Finance Association | 66 | 22
Economics | Fellows of the Econometric Society | 719 | 194
Economics | Fisher-Schultz Lecture | 54 | 13
Economics | Frisch Memorial Lecture | 9 | 2
Economics | John Bates Clark Medal | 42 | 13
Economics | Morgan Stanley - AFA Award for Excellence in Finance | 5 | 0
Economics | Nobel Prize in Economics | 84 | 17
Economics | Walras-Bowley Lecture | 48 | 12
Physics | Breakthrough Prize in Fundamental Physics | 33 | 13
Physics | Fellows of the American Physical Society | 10,902 | 178
Physics | Nobel Prize in Physics | 213 | 10
Table S2. Effectiveness of scientometric measures. Higher is better. The most effective measure in each dataset is highlighted in bold. For each of Kendall's τ, Somers' D, Goodman and Kruskal's γ, and Spearman's ρ, the table reports values for each measure on the Scopus and Scholar datasets in biology, computer science, economics, and physics, together with the average across datasets.

Additional References
58. Kendall, M. G. The treatment of ties in ranking problems. Biometrika
59. Biometrika
60. American Sociological Review
61. Journal of the American Statistical Association
62. British Journal of Mathematical and Statistical Psychology, 255–269 (1995).