Crediting multi-authored papers to single authors
Anna Tietze, Serge Galam, and Philip Hofmann
Institute of Neuroradiology, Charité University Medicine Berlin, 13353 Berlin, Germany ∗
CEVIPOF - Centre for Political Research, Sciences Po and CNRS, 98 rue de l’Université, Paris 75007, France †
Department of Physics and Astronomy, Interdisciplinary Nanoscience Center (iNANO), Aarhus University, 8000 Aarhus C, Denmark ‡
(Dated: May 7, 2019)

A fair assignment of credit for multi-authored publications is a long-standing issue in scientometrics. In the calculation of the h-index, for instance, all co-authors receive equal credit for a given publication, independent of a given author’s contribution to the work or of the total number of co-authors. Several attempts have been made to distribute the credit in a more appropriate manner. In a recent paper, Hirsch has suggested a new way of credit assignment that is fundamentally different from the previous ones: All credit for a multi-author paper goes to a single author, the so-called “α-author”, defined as the person with the highest current h-index (not the highest h-index at the time of the paper’s publication) [1]. The collection of papers this author has received credit for as α-author is then used to calculate a new index, h_α, following the same recipe as for the usual h-index. The objective of this new assignment is not a fairer distribution of credit, but rather the determination of an altogether different property, the degree of a person’s scientific leadership. We show that, given the complex time dependence of h for individual scientists, the approach of using the current h value instead of the historic one is problematic, and we argue that it would be feasible to determine the α-author at the time of the paper’s publication instead. On the other hand, there are other practical considerations that make the calculation of the proposed h_α very difficult.
As an alternative, we explore other ways of crediting papers to a single author in order to test early career achievement or scientific leadership.

INTRODUCTION
Assigning appropriate credit to contributors to multi-authored publications is a challenging problem. Several schemes have been proposed to achieve this in a “fair” way [2–8]. The issue has received special attention in connection with the h-index proposed by Hirsch in 2005 [9], motivated by the huge importance that this single number has gained in the evaluation of scientists and institutions.

Recently, Hirsch has proposed an interesting variation of the h-index that uses a strongly biased credit distribution compared to all previous suggestions: Instead of giving (possibly normalized) credit to all co-authors, only a single author, the so-called α-author, can claim credit for a given paper [1]. More precisely, the new index h_α is constructed in the same way as h from a scientist’s list of publications, but only those publications are counted in which the person is the α-author. The α-author, in turn, is defined as the contributor with the highest value of the conventional h at present. While h_α may at first seem to be even more “unfair” than h, it is not intended to serve the same purpose. Instead, it is meant to measure “scientific leadership”, based on the assumption that the α-author has contributed significantly to the conception and realization of the project leading up to the paper. In this sense, h_α can be used to complement other, more conventional, measures of scientific success, including h as such.

Hirsch’s introduction of h_α quickly gave rise to some criticism in Ref. [10], partly because it was perceived to reinforce the Matthew effect in science, and partly because of a technical issue: In order to determine who the α-author of a given paper is, Hirsch proposed to compare the h values of all co-authors at the present time and not at the time of the paper’s publication. This simplifies the calculation, as current h values are readily available. In fact, it is often assumed that calculating historical h values is difficult or impossible [10].
However, Leydesdorff et al. argued that using current h values can lead to significant instabilities in the resulting h_α values for collaborating scientists, since small relative changes in the h values of the collaborators can shift the α-author status in many co-authored papers [10]. Hirsch has addressed both aspects of this criticism in Ref. [11].

The consequences of substituting the current h values for the historic ones at the time of a paper’s publication when calculating h_α are, in fact, poorly understood and closely related to the time dependence of h for individual scientists. In the present work, we show that historical h values can be readily calculated using retrievable data from Web of Science, and we study the time dependence of h for a large number of condensed matter physicists. We show that the average of h over a larger population does indeed show a roughly linear time dependence, but this does not hold on the level of individual scientists. Given the relative ease of calculating historical h values, the definition of h_α could thus be modified such that the α-author is determined at the time of a paper’s publication. However, a more severe practical problem with h_α is that it is extremely difficult to calculate due to (co-)author name disambiguation.

As an alternative implementation of Hirsch’s proposal, we consider variations of Galam’s gh-index [6] to identify early career achievements or scientific leadership of authors from the position of their name in the author list. Such an approach is obviously only meaningful in a research field where the order of the authors encodes information about their contributions (as is the case in condensed matter physics).

This paper is structured as follows: The Methods section briefly explains the data set used in this study and how to automatically calculate the time dependence of h for a large group of scientists.
In the Results and Discussion section, we first present our results on the time dependence of h and then discuss the possibility of using Galam’s gh to assign credit to single authors in order to investigate certain properties such as research leadership. We end the paper with a Conclusions section.

METHODS
The data set used in this study has already been introduced in Ref. [12] and is described in detail there. In short, it consists of general citation data for 302 condensed matter physicists (number of publications, citations and h-values today) extracted from ResearcherID between April and December 2018, combined with detailed citation data for every paper co-authored by these individuals (24,286 papers), extracted from Web of Science (WoS).

For a given individual from the group, the value of h at any desired point in time can be determined as follows: In WoS, a search for all the individual’s papers is performed using the Researcher ID number as unique identifier. Using the “create citation report” function, the details of every single paper can be obtained, i.e. the author list and the number of times the paper has been cited in each year after its publication. These data can be exported from WoS for further analysis. Calculating h at a given point in time is now a trivial matter because it simply requires counting the number of published papers and the number of citations these papers have acquired up to that point in time. It is also possible to obtain the position of the author in question by inspecting the author list. This cannot be done entirely automatically, notably in the case of name changes or different possible spellings of a name. Note that the procedure offers a certain amount of protection against false Researcher ID profiles because publications which are not co-authored by the owner of the profile stand out.

The starting point t = 0 is defined as the year in which a given author published their first paper, so that the starting year counts as year zero. When we refer to a 20-year career, we are thus considering years 0 to 19.
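The counting procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a `pub_year` field and a `cites_by_year` mapping per paper) and the function names are hypothetical and do not correspond to the actual WoS export format or to the scripts used for this study.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_at(papers, year):
    """h-index of an author at the end of `year`.

    `papers` is a list of dicts with the hypothetical keys
    'pub_year' (int) and 'cites_by_year' ({year: citations in that year}).
    Only papers published up to `year` and citations received up to
    `year` are counted.
    """
    counts = []
    for paper in papers:
        if paper["pub_year"] > year:
            continue
        cumulative = sum(n for y, n in paper["cites_by_year"].items() if y <= year)
        counts.append(cumulative)
    return h_index(counts)


def h_history(papers):
    """h(t) for t = 0, 1, ..., where t = 0 is the year of the first paper."""
    start = min(p["pub_year"] for p in papers)
    end = max(max(p["cites_by_year"], default=p["pub_year"]) for p in papers)
    return [h_at(papers, year) for year in range(start, end + 1)]
```

Calling `h_history` on an author's paper list returns one value per career year, which is the kind of curve analysed in the following section.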
FIG. 1: (a) Time dependence of the average h for researchers with a career length of 15, 20 and 25 years. (b) Time dependence of h for five selected individuals (V, W, X, Y and Z) with qualitatively different behaviour.

RESULTS AND DISCUSSION

Time dependence of the h-index

The time dependence of the h-index for individual researchers has been the central issue in the debate about the stability of the proposed h_α-index [1, 10, 11]. Determining the α-author of a historical paper by choosing the author with the highest h today is only unproblematic if h increases linearly and at the same rate for all co-authors [11]. This assumption is obviously questionable because a variation in the growth rates of h is the feature that turns h into a useful quantity in the first place. The time dependence of the h-index has been investigated previously [13–19], albeit mostly theoretically or for a small number of individuals. Different curve shapes consistent with monotonic growth have been discussed [17]. To the best of our knowledge, no systematic study of the time dependence of h for a well-defined larger group of scientists has been carried out so far.

We start by investigating the average time dependence of h for three sub-sets of the 302 individuals in our data set: scientists with a career length of at least 15 years (154 individuals), 20 years (105 individuals) and 25 years (67 individuals). Note that the group of 154 individuals with a career length of at least 15 years also contains all members of the other two groups. The results are shown in Figure 1(a). The assumption of a linear increase of h with time is clearly reasonable, if not entirely correct. Remarkably, the slope of the curve is higher for the groups with a shorter career span.
This can be expected because these groups also contain the “young” researchers with a career start after 2002, and these profit especially from the general growth in the number of published papers and hence also citations [20].

A roughly linear time dependence of h on average, however, does not imply that this also holds on the level of an individual researcher. In fact, this is not the case. Fig. 1(b) illustrates some strong deviations from the linear behaviour for a few chosen examples from our data set. Researcher V shows almost ideal linear behaviour, apart from a slight delay in the start of the career. The other individuals show a variety of different characteristics, such as an increased slope at later times (W and, to a rather extreme degree, X), or more complicated curve shapes (Y and Z). These different shapes and possible reasons for them have been discussed previously [17]. It could be interesting to investigate this further by assigning researchers to different shape categories, but this goes beyond the scope of the present work. Here, it is only important that a large variety of curve shapes is found and that their tendency to average out to a linear curve does not imply that linear behaviour holds on the level of individual researchers. With respect to the calculation of h_α, a consequence of this is that, from knowing h today, it is impossible to make reliable statements about what h has been at some point in the past.

FIG. 2: Result of a linear fit to h(t) for 67 individuals with a career length of 25 years or more, shown as histograms. (a) Slope (Δh / year) and (b) goodness of linear fit (expressed as the sum of squared residuals).

In order to quantify the deviation from a linear time dependence, we perform a linear fit of h(t) for individuals of the sub-group with a career length of at least 25 years, using the constraint that h in the first year of the career, h(0), matches the actually observed value (mostly 0, but sometimes 1 or even 2).
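A minimal sketch of such a constrained fit is given below, assuming an ordinary least-squares fit of h(t) = h(0) + s·t in which only the slope s is free. The text specifies the constraint on h(0) but not the fitting routine, so the closed-form solution and the function name used here are illustrative assumptions.

```python
def fit_slope_fixed_start(h_values):
    """Least-squares fit of h(t) = h(0) + slope * t with h(0) held fixed.

    `h_values[t]` is the h-index in career year t (t = 0, 1, 2, ...).
    Returns (slope, sum of squared residuals).
    """
    if len(h_values) < 2:
        raise ValueError("need at least two years of data to fit a slope")
    h0 = h_values[0]
    # Minimising sum_t (h_t - h0 - s*t)^2 over s gives
    # s = sum_t t*(h_t - h0) / sum_t t^2.
    numerator = sum(t * (h - h0) for t, h in enumerate(h_values))
    denominator = sum(t * t for t in range(len(h_values)))
    slope = numerator / denominator
    ssr = sum((h - (h0 + slope * t)) ** 2 for t, h in enumerate(h_values))
    return slope, ssr
```

For a perfectly linear career the sum of squared residuals vanishes, while a curve with a late increase in slope yields a large residual sum even if its fitted slope is unremarkable.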
The resulting slope and the sum of the squared residuals are shown as histograms in Figure 2(a) and (b), respectively. We see that h typically increases at a rate of 0-2 per year and that, while a linear fit works reasonably well in most cases, there are many individuals for which it does not.

Using the current instead of the historical value of h to determine the α-author of a paper has another potential drawback: If one of the authors of a paper ceases to publish, he/she could be “out-α’ed” by the others who continue doing so (except for especially outstanding authors with a very high h). To illustrate this, let us assume that three collaborators (A, B, C) with similar h values have published many papers together. At some point, B and C stop publishing due to a change of career or owing to certain life circumstances. This causes B’s and C’s h(t) to level off while growth continues for A. Eventually, A would out-α B and C for reasons that are not related to scientific leadership. How long would A need to wait until his/her h value overtakes those of B and C?

FIG. 3: Time dependence of h when an end to publication activity is artificially enforced. (a) Average h(t) for 67 individuals with a research career length of at least 25 years (solid line). The dashed lines show h(t) when papers published after the first 10, 15 and 20 years are not considered. (b) Corresponding curves for three of the five researchers from Figure 1(b) (V, X and Y; only three are chosen for clarity).

The saturation of h following the end of a scientific career has already been discussed by Hirsch in his first paper on the subject [9]. Using a simple model, he pointed out that the time needed for h(t) to level off increases with the total length of the scientific career, i.e. with the total number of papers published. Saturation in Hirsch’s model results from all papers ending up in the h-core, and while this is not expected in a realistic scenario, the key feature of a slower saturation after a long career is still found in our data. Fig. 3(a) shows the average h(t) for the group of scientists with a career length of 25 years or more as a solid line, as already given in Fig. 1(a). The dashed lines show the resulting curve if we artificially enforce a career end after 10, 15 or 20 years. The expected tendency for h(t) to level off is indeed observed, and this appears to happen faster after a shorter career.

It is important to remember that, again, the trend observed for the average population does not permit conclusions about the trend for individual researchers, which can be very different. This is illustrated in Fig. 3(b), which shows the corresponding curves for researchers V, X and Y from the data in Fig. 1(b).
In this case, only the curve for V is similar to the (roughly linear) average, whereas the curves for X and Y are quite different.

An important conclusion from this section is that determining the α-author on the basis of the current h values does not seem to be a good choice. An obvious solution to the issue would be to determine the α-author of a paper by using the historical h values at the time of the paper’s publication. Leydesdorff et al. argue that this is challenging because the required citation data are not provided directly in WoS [10] but, as we have shown here, they can actually be extracted, provided that the author’s paper collection can be identified without ambiguity by a search in WoS.

The biggest practical difficulty in the calculation of h_α is, in fact, another one: Even if all information about every article authored by a person, including detailed citation data, is available, this is not sufficient. The same information is needed for every co-author with whom the person has ever published, and this is essentially impossible to obtain unless all authors have unique identifiers. Such unique identifiers are currently being introduced (e.g. ORCID or Researcher ID), but even if such identifiers were used throughout today (which they are not), it would still take at least 20 years before a calculation of h_α would be practical.

Alternative approaches to crediting publications to single authors
As we have shown in the previous section, the proposed h_α suffers from two important drawbacks: (1) Using the current values of h as a proxy for the historical values of h in order to determine the α-author of a paper causes several problems. (2) Mainly because of name disambiguation, determining h_α is extremely difficult and can probably only be done for authors one is very familiar with, or who have sufficiently high h values, such that all past and current competitors for the α-status can be identified and checked by a person familiar with the field of research (this does not protect against situations such as former PhD students who have meanwhile obtained a high h value through research in a different field). The first problem for the calculation of h_α can be fixed quite easily, but the second cannot. Still, the idea of an “unfair” credit distribution to a single author in order to identify characteristics such as research leadership is very interesting, and in this sub-section we explore alternative, more practical, approaches to accomplish this.

A useful tool for giving credit to a single author only is the gh-index proposed by Galam [6]. Originally, this was introduced as an index that obeys conservation laws for the number of published papers and citations. Indeed, the current practice of all co-authors taking credit for the entire paper and all of its citations appears to violate elementary conservation laws. As an illustration, imagine a business where every associated partner gets the total business revenue and also owns 100% of the business. Unfortunately, this is too good to be true and only feasible with one single owner. As soon as two or more partners own the business, the profit and the ownership must be divided according to the respective shares of the partners, whose total equals 100% of the business assets.
Similarly, any meaningful quantitative bibliometric treatment should obey the same kind of conservation law, here with respect to the numbers of papers and citations.

While the gh-index was proposed with the intention of achieving a “fair” distribution of credit, it can be defined in a rather large number of variants, each one giving a specific distribution of a paper’s citations to each co-author. This distribution can also be chosen to be extremely “unfair”, such as attributing all of a paper’s citations to a single author with the purpose of measuring quantities different from the usual total impact. We thus define the following variations of gh:

• gh_e gives equal credit to every author by dividing the number of a paper’s citations by k if there are k authors, and then follows the same recipe as for the calculation of h. gh_e can thus be seen as a variation of the usual h-index but adapted to multi-author publications and conserving the total number of citations.

• gh gives all credit to the first author, i.e. all citations of a paper are counted if the author is the first in the author list and otherwise the citations are set to zero. This is again followed by the usual procedure of calculating h. gh can be viewed as an index for early career achievements, or in connection with single author publications.

• gh_L gives all credit to the last author and is otherwise calculated in the same way as gh. In condensed matter physics, as in many other fields, the last author is usually the person with the overall responsibility for a project. In the sense of Hirsch’s suggestion, this would be the α-author, but it would not necessarily be the person with the highest value of h.

If probing research leadership is the objective, using gh_L is much simpler than using h_α because its calculation is based on the easily determined position of an author in the list of co-authors.
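The three variants defined above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the paper records (an ordered `authors` list and a total `citations` count) and the scheme labels `"equal"`, `"first"` and `"last"` are hypothetical names chosen for this example, and author names are assumed to be already disambiguated.

```python
def h_index(values):
    """Largest h such that h entries have a value of at least h."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def gh_variant(papers, author, scheme):
    """h-type index for `author` under a given credit scheme.

    scheme = "equal": citations divided by the number of authors
    scheme = "first": all citations if `author` is first, otherwise zero
    scheme = "last":  all citations if `author` is last, otherwise zero
    A single author paper counts as both first and last author paper.
    """
    credited = []
    for paper in papers:
        authors = paper["authors"]
        if author not in authors:
            continue
        if scheme == "equal":
            credit = paper["citations"] / len(authors)
        elif scheme == "first":
            credit = paper["citations"] if authors[0] == author else 0
        elif scheme == "last":
            credit = paper["citations"] if authors[-1] == author else 0
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        credited.append(credit)
    return h_index(credited)
```

Only the author list of each paper is needed in addition to the citation counts, which is what makes these indices so much easier to obtain than h_α: no information about the co-authors' own publication records is required.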
An important restriction is that the author position needs to contain information about the author’s role in a collaboration, which is not always the case. If applicable in a given field of research, choosing the last author instead of the one with the highest h as the leading author can have several advantages. Some have already been mentioned, for example the presence of high-h co-authors on a paper who are on the author list simply because they have contributed to the funding of the project [10], a practice that is not desirable but common. Another example could be a collaboration involving several groups, all led by senior people with a high h. Even if the project is led by one individual, this would not necessarily be the person with the highest h, especially if the groups cover different sub-disciplines. In condensed matter physics, for instance, density functional theory is a sub-field of huge impact and therefore of high citation numbers and h-indices [21]. A senior individual working in this field would be likely to out-α the other co-authors.

FIG. 4: (a) Average time dependence of h, gh_e, gh and gh_L for the sub-group of researchers with a career length of at least 25 years. (b) and (c) Distribution of gh and gh_L as a function of h for the 105 researchers with a career length of at least 20 years. Multiple incidents are colour-coded as red (1), green (2) and purple (3).

Note that both gh and gh_L contain single author papers. If gh_L is to be used as an indicator of research leadership, it is not obvious that single author papers should be included. On the other hand, single author papers are quite rare (less than 3% of the total for our data set), and we can therefore ignore this issue here.

The average of the three gh indices is plotted in Fig. 4(a) for the authors with a career length of at least 25 years and compared to the average h for the same group.
All gh’s are smaller than h because they either include a normalization (gh_e) or represent only a subset of an author’s publications (gh and gh_L). As one might expect, gh_e is somewhat similar to h, in that it is almost linear but has a slightly increasing slope over time. gh, on the other hand, shows a decreasing slope over time. This might be expected because many first author papers are written at the beginning of a researcher’s career and the total number of citations they receive saturates at a later career stage. This is completely equivalent to the case of ceasing to publish altogether, just restricted to first author papers. The overall characteristic of gh_L, a clearly increasing slope over time, is also consistent with what would be expected for an h-type index associated with research leadership. Last author papers are typically first published at the beginning of an independent career, and their number and impact become apparent only at a later career stage.

As in the case of the h-index, merely inspecting the average of gh and gh_L is insufficient to draw conclusions on the level of a single researcher. In Figure 4(b) and (c), we therefore show the distribution of gh and gh_L, respectively, as a function of the conventional h. In the case of gh, we observe that the role of first author papers decreases for individuals with a high value of h. This is to be expected because a high h is mostly based on collaborative work with changing first authors. The distribution of gh_L is more interesting, especially when it comes to using this as a possible indicator of research leadership. The distribution of gh_L values for a given h is, in fact, very broad. For h = 29, for example, gh_L varies between 6 and 24, meaning that one author has achieved a relatively high value of h with a gh_L of only 6 while another has done the same with a gh_L of 24. This large variation clearly indicates that gh_L could be a useful quantity to complement h.
For the two authors in question, one would conclude that both made a significant contribution to their research field, with the latter, but not the former, in a leading role.

CONCLUSION
We have inspected the time dependence of the h-index for a large number of individuals. While the average is roughly a linear function of time, this does not hold on the level of individual researchers, and we have discussed the implications of this finding for the calculation of the h_α-index recently suggested by Hirsch. Based on our findings, it appears more appropriate to identify the α-author of a paper based on the historical h-values at the time of the paper’s publication rather than on the current ones, and we have shown that this is relatively easy. On the other hand, there are other severe practical limitations for a calculation of h_α. We have adapted Hirsch’s suggestion to assign credit in a multi-author paper to a single author in h-type indices for purposes other than merely identifying total research impact, and we have implemented this using the scheme of the gh-index suggested by Galam. Such h-type indices could, for instance, be used to study the correlation between early career success (e.g. a high gh or many first author papers in the initial career stage) and later career achievements (such as a high h or gh_L).

ACKNOWLEDGEMENTS
This work was supported by VILLUM FONDEN via the Center of Excellence for Dirac Materials (Grant No. 11744). One of us (SG) would like to thank J. E. Hirsch for helpful discussions.

∗ Electronic address: [email protected]
† Electronic address: [email protected]
‡ Electronic address: [email protected]

[1] J. E. Hirsch, Scientometrics, 673 (2019).
[2] T. Tscharntke, M. E. Hochberg, T. A. Rand, V. H. Resh, and J. Krauss, PLoS Biology, e18 (2007).
[3] M. Schreiber, New Journal of Physics, 040201 (2008).
[4] L. Egghe, Journal of the American Society for Information Science and Technology, 1608 (2008).
[5] J. E. Hirsch, Scientometrics, 741 (2010).
[6] S. Galam, Scientometrics, 365 (2011).
[7] H.-W. Shen and A.-L. Barabasi, Proceedings of the National Academy of Sciences, 12325 (2014).
[8] V. Vavryčuk, PLOS ONE, e0195509 (2018).
[9] J. E. Hirsch, Proceedings of the National Academy of Sciences of the United States of America, 16569 (2005).
[10] L. Leydesdorff, L. Bornmann, and T. Opthof, Scientometrics, 1163 (2019).
[11] J. E. Hirsch, Scientometrics, 1167 (2019).
[12] A. Tietze and P. Hofmann, Scientometrics, 171 (2019).
[13] L. Egghe, Journal of the American Society for Information Science and Technology, 452 (2007).
[14] L. Egghe, Mathematical and Computer Modelling, 864 (2007).
[15] Q. L. Burrell, Scientometrics, 19 (2007).
[16] R. Guns and R. Rousseau, Journal of the American Society for Information Science and Technology, 410 (2009).
[17] J. Wu, S. Lozano, and D. Helbing, Journal of Informetrics, 489 (2011).
[18] R. Mannella and P. Rossi, Journal of Informetrics, 176 (2013).
[19] Y. Tarasevich and T. Shinyaeva, Bulletin of the South Ural State University. Series “Mathematical Modelling, Programming and Computer Software”, 32 (2016).
[20] S. Wuchty, B. F. Jones, and B. Uzzi, Science, 1036 (2007).
[21] R. Van Noorden, B. Maher, and R. Nuzzo, Nature 514