Bibliometrics for collaboration works
Paolo Rossi (a), Alessandro Strumia (a), and Riccardo Torre (b,c)
(a) Dipartimento di Fisica dell'Università di Pisa, Italy
(b) CERN, Theory Division, Geneva, Switzerland
(c) INFN, Sezione di Genova, Italy
Abstract
An important issue in bibliometrics is the weighing of co-authorship in the production of scientific collaborations, which are becoming the standard modality of research activity in many disciplines. The problem is especially relevant in the field of high-energy physics, where collaborations reach 3000 authors, but it can no longer be ignored in other domains either, such as medicine or biology. We present theoretical and numerical arguments in favour of weighing the individual contributions as $1/N_{\rm aut}^{\alpha}$, where $N_{\rm aut}$ is the number of co-authors. When counting citations we suggest the exponent $\alpha \approx 1$, which corresponds to fractional counting. When counting the number of papers we suggest $\alpha \approx \ldots/\ldots - \ldots/2$, with the former (latter) value more appropriate for larger (smaller) collaborations. We expect and verify that the $h$ index scales as the square root of the average number of co-authors, and define a fractionalized $h$ index that does not scale with collaboration size.

arXiv: … [cs.DL]

Introduction
In many research fields, scientific collaboration has become the standard way of operating. Moreover, due to the increasing complexity of the problems to be faced, the number of scientists with different competences involved in each collaboration is increasing. In the extreme case of high-energy physics the numbers have already reached the four-digit level, but in many other domains, like medicine or biology, it is not unusual to find two-digit collaborations.

In the context of bibliometrics this hyper-authorship phenomenon poses a very important question: what individual share of credit should be assigned to each author of a common scientific article, both for the paper itself and for the citations it receives? Attributing the full credit of a paper to each of its authors is clearly misleading and, if adopted by policymakers, tends to encourage fictitious collaborations, because of the obvious competitive advantage resulting from the much larger number of articles that a collaboration can produce in the same amount of time compared with an isolated author. Moreover, the number of citations received is also strongly correlated with the typical size of the collaborations operating in a given field of research.

Fractional counting of papers and citations could be a solution to this issue. Fractional counting has been extensively discussed in the literature. For instance, it has been considered in the context of metrics and rankings by Aksnes, Schneider, and Gunnarsson (2012), Bouyssou and Marchant (2016), Carbone (2011), Egghe (2008), Hooydonk (1997), Leydesdorff and Bornmann (2010, 2011), Leydesdorff and Opthof (2010), Leydesdorff and Shin (2011), Rousseau (2014), Strumia and Torre (2019), and in the context of constructing research networks by Leydesdorff and Park (2017) and Perianes-Rodríguez, Waltman, and van Eck (2016).
Fractional counting gives an additive (extensive) quantity: this means, for example, that the total index of the European Union is the sum of the indices of its members, unlike what happens if full counting is adopted (see e.g. the discussions by Gauffriau (2017), Gauffriau, Larsen, Maye, Roulin-Perriard, and von Ins (2008), Strumia and Torre (2019), and Waltman and van Eck (2015)).

However, fractionally counting papers by attributing a weight $1/N_{\rm aut}$ to each of the $N_{\rm aut}$ co-authors of a paper would imply a strong penalty for authors belonging to large collaborations. As we will show in the following (see for instance fig. 3), this can be seen from the strong dependence of the fractionally counted number of papers on the collaboration size, which implies a strong reduction of the fractionally counted number of papers for collaborations with many co-authors. Indeed, $N_{\rm aut}$ authors typically produce fewer than $N_{\rm pap} = N_{\rm aut}$ papers in the time in which a single author produces a single paper. While it is clear that from a pure research perspective what matters is the scientific impact, which cannot be quantified through the publication frequency, there are still policymakers who consider the number of papers a simple relevant indicator and use it for the evaluation of scientists.

Footnote: As an example, the Italian Ministry for Research imposes lower thresholds on the publication frequency for access to professorship positions.

While full counting of papers penalises authors working in small collaborations (for instance small experiments in fundamental physics), full fractional counting of papers, if adopted by policymakers, would on the contrary discourage large collaborations, which are a necessary endeavour in modern research. This problem therefore needs a non-subjective solution, namely … $h$-index, etc.

The aim of the present paper is to find fractional counting algorithms that do not scale with collaboration size.
This has a two-fold advantage. On the one hand, it ensures that neither authors belonging to large collaborations, nor single authors, nor authors working in small groups are favoured or disfavoured, when bibliometric indicators are used in their evaluation, by their choice of carrying out research in large versus small groups. On the other hand, it allows one to better quantify and qualify the bibliometric output of large collaborations in comparison with small groups of authors (or even single authors).

The bibliometric literature documents that collaboration papers tend to have higher impact than single-authored papers. Beaver (1986) studied the field of physics, finding that co-authored research tends to be of higher quality than solo research. Bordons, Garcia-Jover, and Barrigon (1993) studied Spanish publications in pharmacology and pharmacy, finding that internationally co-authored documents have higher impact than the remaining collaborative documents or non-collaborative ones. Avkiran (2013) found that collaboration leads to articles of higher impact in finance, up to 4 collaborators. Gazni and Didegah (2011) found a significant positive correlation between the number of authors and the number of citations in Harvard publications. Hsu and Huang (2011) considered 90k articles in natural sciences, finding that the average number of citations scales as $N_{\rm aut}^{\ldots/\ldots}$ (data extend up to about 10 co-authors), up to wide fluctuations. Lee and Bozeman (2005) found that the number of peer-reviewed journal papers is strongly and significantly associated with the number of collaborators, unlike the number of fractionally-counted papers. Katz and Hicks (1997) studied how the average number of citations per paper varies with different types of collaborations.
See also the works of van Raan (1997), Sooryamoorthy (2009) and Birnholtz (2006) for additional studies.

General theoretical arguments concerning scale-free systems suggest that the scientific productivity of collaborations and the corresponding frequency distribution of citations should show some, at least approximate, power-law dependence on $N_{\rm aut}$. Empirical evidence appears to support these arguments. Finding the most appropriate exponents for these scaling laws would make it possible to weigh the production of collaborations in the bibliometric estimate of the (quantitative) value of their results in such a way as to discourage adaptive and opportunistic behaviours while encouraging more appropriate practices in the indication of co-authorship.

In section 2 we develop and present some theoretical arguments in favour of weighing the individual contributions to a single paper as $1/N_{\rm aut}^{\alpha}$, where $\alpha \approx \ldots/\ldots - \ldots/2$, with the former (latter) value more appropriate for larger (smaller) collaborations. When counting overall citations we suggest the exponent $\alpha = 1$, corresponding to fractional counting. By combining the two above arguments, in section 2.4 we define an $h$ index that does not scale with collaboration size.

In section 3 we analyze empirical data concerning a very large number of collaborations active in fundamental physics, where the range of available values of $N_{\rm aut}$ allows for sufficiently convincing estimates of the exponents describing the dependence on $N_{\rm aut}$ of the total number of papers and of the mean and total number of citations.

Finally, we summarize and draw our conclusions in section 4.

The behaviour of collaborations with $N_{\rm aut}$ authors can be viewed as a scale-free phenomenon for a wide range of values of $N_{\rm aut}$. Any upper limit on $N_{\rm aut}$ would be sufficiently large to exclude any sensible effect on the equilibrium distributions in the range of values we are interested in exploring (3 to 4 orders of magnitude).

We therefore expect that the various indices $N_I$ that characterise the bibliometric output of collaborations are distributed at equilibrium following a power-law behaviour, which we parametrise as

$$\langle N_I \rangle = C_I\, N_{\rm aut}^{p_I} \tag{1}$$

where $C_I$ and the powers $p_I$ are constants. For example, this applies to the number of papers produced (in a definite amount of time) by a scientific collaboration, $N_{\rm pap} = C_{\rm pap} N_{\rm aut}^{p_{\rm pap}}$, and to the average number of citations per paper, $N_{\rm cit} = C_{\rm cit} N_{\rm aut}^{p_{\rm cit}}$. The total number of citations $N^{\rm tot}_{\rm cit}$ received by the papers of a collaboration then scales as

$$N^{\rm tot}_{\rm cit} = N_{\rm cit} N_{\rm pap} = C_{\rm cit} C_{\rm pap} N_{\rm aut}^{p^{\rm tot}_{\rm cit}}, \qquad p^{\rm tot}_{\rm cit} = p_{\rm cit} + p_{\rm pap}. \tag{2}$$

Assuming a collective rational behaviour, and that on average the number of citations received by scientific papers may be considered a reasonable proxy for their quality, we might expect that individual and collective choices lead at equilibrium to

$$p^{\rm tot}_{\rm cit} \approx 1, \tag{3}$$

namely that the total number of citations received by collaborations scales, on average, with the number of members. This expresses the fact that the (average) value of the work done by $N_{\rm aut}$ scientists should approximately equal $N_{\rm aut}$ times the work done by a single scientist. A lower power $p^{\rm tot}_{\rm cit}$ would arise in the presence of gift authorships, namely of authors who sign papers without substantially contributing.

This means that the total number of citations is not a fair indicator for authors, as it grows with the number of co-authors. Similarly, the $h$-index introduced by Hirsch (2005), being on average proportional to the square root of the number of citations, grows on average as the square root of the number of co-authors.

A bibliometric index which neither overestimates nor underestimates individual contributions to a collaboration is then the number of fractionally-counted citations $N_{\rm fcit}$ received by each author. This means that a fraction $f_A$ of each paper is attributed to each co-author $A$, such that the fractions $f_A$ sum up to unity. On average $N_{\rm fcit}$ scales with power index $p_{\rm fcit} = p^{\rm tot}_{\rm cit} - 1 \approx 0$, so that $N_{\rm fcit}$ is a scale-invariant quantity that (unlike the number of citations) cannot be arbitrarily inflated by grouping authors.

In order to implement these concepts into actual bibliometric indices for the total number of papers or for the average number of citations, we must offer arguments in favour of explicit values for the exponents $p_{\rm pap}$ and $p_{\rm cit}$.

We present here a simple "theoretical" argument. As the goal of collaborations is achieving more than what single authors can achieve, we expect $p_{\rm cit} > 0$. Assuming rational behaviour in the formation of collaborations, one may expect that (at least for not-too-big groups) the individual competences of the partners are as far as possible complementary, and therefore "orthogonal" in some abstract $N$-dimensional "space of competencies". We may therefore regard the qualitative output of a collaboration as the vector sum of $N$ orthogonal vectors.

The limit where all authors have "orthogonal" competencies corresponds to $N = N_{\rm aut}$: then the length of such a vector scales as $N_{\rm aut}^{1/2}$. This corresponds to

$$p_{\rm pap} = 1/2, \qquad p_{\rm cit} = 1/2. \tag{4}$$

Sometimes more than one collaborator is needed to fulfil a needed competency: realistic collaborations organise in $N_{\rm sub} \le N_{\rm aut}$ sub-collaborations that work on "orthogonal" topics. We assume that the number of sub-collaborations satisfies the scaling law

$$N_{\rm sub} = N_{\rm aut}^{s} \quad \text{with exponent } s \le 1. \tag{5}$$

Footnote: We do not address the relative assignment of credit. In some fields the contribution of different authors is reflected by their order, with special recognition given to first and last authors. Various proposals have been put forward to encode the relative credit in the fractions $f_A$ (see, e.g., Kosmulski (2012) and Waltman (2015)). In some other fields authors are sorted alphabetically, giving no information about who contributed more. This happens in most papers in our database, so we will assume a common $f_A$ equal to the inverse of the number of authors.

Footnote: Some authors think that counting publications has no bibliometric interest, with citations being the only relevant quantity to be measured. From such a perspective, it nevertheless remains interesting to know how collaborations tend to split their bibliometric output among their publications. For example, one might observe a gap in citation output between (groups of) authors and want to understand whether it mostly arises from publication intensity. As another example, one might have a partial database (e.g. limited to some area) that only allows publication intensity to be computed reliably. Furthermore, the publication indicator is available immediately, while the citation indicator becomes more significant after some years as citations accumulate. As a matter of fact, while publication intensity is a less significant indicator, it remains used or at least mentioned because of its simplicity. At the opposite extreme, other authors think that citations can be significantly distorted by social biases, and view publication numbers as a more objective bibliometric indicator. In order to obtain a result that does not scale with collaboration size, one should then fractionalize papers according to the appropriate power for papers.

Footnote: Assuming that authors have different skills, the average length squared of such a vector scales as $N_{\rm aut}$.
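As a minimal numerical sketch of eqs. (1)-(3), assuming unit prefactors $C_{\rm pap} = C_{\rm cit} = 1$ and a common share $f_A = 1/N_{\rm aut}$, the snippet below checks that the fractionally-counted citations per author do not depend on $N_{\rm aut}$ whenever $p_{\rm pap} + p_{\rm cit} = 1$. The function names and sample values are illustrative and are not taken from the paper.

```python
def power_law(n_aut, coeff, exponent):
    """Mean value of an index for a collaboration of n_aut authors, as in eq. (1)."""
    return coeff * n_aut ** exponent


def fractional_citations_per_author(n_aut, p_pap, p_cit, c_pap=1.0, c_cit=1.0):
    """Total citations of the collaboration, eq. (2), shared equally among authors."""
    n_pap = power_law(n_aut, c_pap, p_pap)   # number of papers
    n_cit = power_law(n_aut, c_cit, p_cit)   # mean citations per paper
    n_cit_tot = n_pap * n_cit                # scales as n_aut ** (p_pap + p_cit)
    return n_cit_tot / n_aut                 # assumed equal share f_A = 1 / n_aut


# With p_pap + p_cit = 1 the per-author value stays flat as n_aut grows;
# with p_pap + p_cit > 1 it would grow with the collaboration size.
for n in (1, 10, 100, 1000):
    print(n, fractional_citations_per_author(n, p_pap=0.5, p_cit=0.5))
```

Any pair of exponents summing to one gives the same flat behaviour, which is why eq. (3) alone fixes $p_{\rm fcit}$ without fixing $p_{\rm pap}$ and $p_{\rm cit}$ separately.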
Figure 1: Left: total number of collaborations listed in the InSpire database with the number of authors shown on the horizontal axis. Right: distribution of the number of individual citations (citations divided by the number of references of the citing papers) received by papers with the indicated number of authors.

Then, the average number of citations of each paper scales as the square root of the number of "orthogonal" competencies $N = N_{\rm sub}$:

$$N_{\rm cit} \propto \sqrt{N} \propto N_{\rm aut}^{s/2}. \tag{6}$$

Assuming again an optimal distribution of resources, $N^{\rm tot}_{\rm cit}$ is expected to scale as $N_{\rm aut}$, and thereby the number of papers is expected to scale as

$$N_{\rm pap} \propto N_{\rm aut}^{1 - s/2}. \tag{7}$$

It is reasonable to assume that the number of papers scales as $N$, the number of topics about which the collaboration has competencies, i.e. $N_{\rm pap} \propto N_{\rm aut}^{s}$. This leads to $s = 1 - s/2$, solved by $s = 2/3$, and thereby to

$$p_{\rm pap} = 2/3, \qquad p_{\rm cit} = 1/3. \tag{8}$$

A weaker growth of the number of papers with $N$ leads to smaller $p_{\rm pap}$.

h index

The $h$ index (defined by Hirsch (2005) as the largest number $h$ of papers that each received at least $h$ citations) provides extra information on the distribution of the number of citations, favouring authors who produced many highly-cited papers over authors who produced many poorly cited papers plus a small number of top-cited papers.

Theoretical arguments (see e.g. Yong (2014)) and evidence from data analysis (see e.g. Mannella and Rossi (2013) and Strumia and Torre (2019)) indicate that the $h$ index is strongly correlated with the square root of the total number of citations received by an author:

$$h \approx \alpha\, \big(N^{\rm tot}_{\rm cit}\big)^{0.5}, \tag{9}$$

where the theoretical prediction from Yong (2014) is $\alpha \approx 0.54$ and the phenomenological result obtained from the data of about 1400 Italian physicists is $\alpha \approx 0.53$ (Mannella and Rossi, 2013).

Like the number of citations, the $h$ index is affected by the collaboration size, being higher for authors with more collaborators. Assuming for the $h$ index the above-mentioned scaling as a function of $N^{\rm tot}_{\rm cit}$, and recalling our prediction $p^{\rm tot}_{\rm cit} \approx 1$, we thereby expect that the $h$-index should scale approximately as the square root of the (average) number of authors:

$$h \propto N_{\rm aut}^{1/2}, \tag{10}$$

independently of the specific values taken by $p_{\rm pap}$ and $p_{\rm cit}$, as long as they satisfy $p_{\rm cit} + p_{\rm pap} \approx 1$.

In this section we present bibliometric data in fundamental physics that offer support for

$$p_{\rm pap} \approx \ldots - \ldots, \qquad p_{\rm cit} \approx \ldots - \ldots, \qquad p^{\rm tot}_{\rm cit} \approx 1, \qquad p_{\rm fcit} \approx 0. \tag{11}$$

We use the InSpire database, which gives a picture of fundamental physics world-wide from $\sim$ … $\times$ …

1. Official collaborations. We consider the 5965 (mostly experimental) collaborations listed in the
InSpire database. Each collaboration produced a certain number $N_{\rm pap}$ of papers, roughly written by the same group of $N_{\rm aut}$ authors. Figure 1 gives some demographic information: the distribution of the number of authors in collaborations (left panel) and the distribution of fractionally counted citations for different collaboration sizes (right panel). In the following, when showing results for official collaborations, we indicate collaborations as dots in a scatter plot, with the main collaborations indicated by their names. Furthermore, we show the mean (median) as a red (magenta) curve, and a blue dotted line highlighting the scaling with the number of authors.

2. Occasional collaborations. Many more multi-authored papers have been written by collaborations that form for one or a few papers. To study them we proceed as follows. For each author in the InSpire database we compute the average number of authors of his/her papers, $\langle N_{\rm aut} \rangle \ge 1$, as well as his/her bibliometric indices (number of papers, of citations, etc.). In view of the large number of authors, in the following, when presenting results on occasional collaborations, we avoid showing scatter plots and only show averages. Moreover, results are shown separately within the main topics of fundamental physics: experiment, theory, astro/cosmo. The first category includes all papers in the hep-ex (high-energy experiments) and nucl-ex (nuclear experiments) categories of arXiv. The last category includes papers in astro-ph, which contains astrophysics and cosmology. Theoretical papers are those that appeared in hep-ph (high-energy phenomenology), hep-th (high-energy theory), hep-lat (lattice), nucl-th (nuclear theory), gr-qc (general relativity and quantum cosmology).

Footnote: High-Energy Physics Literature InSpire Database (https://inspirehep.net).

Footnote: A few collaborations significantly varied their number of authors. We define the number of authors of a collaboration by averaging the number of authors of all its papers, with weights proportional to their number of citations. This procedure assigns minor weight to proceedings written by one or a few authors and to papers written during earlier, incomplete phases of the collaboration.

Figure 2: Number of papers versus number of collaborators.
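The per-author quantities used for occasional collaborations can be sketched as follows. The record layout (a list of author names plus a citation count per paper) and the toy records are assumptions made for illustration; they do not reflect the actual InSpire schema.

```python
from collections import defaultdict


def author_indices(papers):
    """Per-author number of papers, total citations and mean number of authors.

    `papers` is an iterable of (authors, n_cit) pairs (hypothetical layout).
    """
    n_pap = defaultdict(int)
    n_cit_tot = defaultdict(int)
    n_aut_sum = defaultdict(int)
    for authors, n_cit in papers:
        for author in authors:
            n_pap[author] += 1
            n_cit_tot[author] += n_cit
            n_aut_sum[author] += len(authors)
    return {
        author: {
            "n_pap": n_pap[author],
            "n_cit": n_cit_tot[author],
            # plain average of the author-list length over the author's papers
            "mean_n_aut": n_aut_sum[author] / n_pap[author],
        }
        for author in n_pap
    }


# Toy records: three papers written by four hypothetical authors.
toy_papers = [(["A", "B"], 10), (["A"], 4), (["B", "C", "D"], 6)]
indices = author_indices(toy_papers)
print(indices["A"])
```

By construction `mean_n_aut` is never smaller than one, matching the bound $\langle N_{\rm aut} \rangle \ge 1$ quoted above.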
Figure 2 shows that the number of papers produced by official (left panel) or occasional (right panel) collaborations scales with the number of authors as

$$N_{\rm pap} \propto N_{\rm aut}^{\ldots - \ldots}. \tag{12}$$

In the right panel, theoretical papers with many authors fall below the scaling. These are rare outliers: almost all papers in theoretical categories have few authors. Theoretical papers with many authors are mostly collections of separate contributions grouped together, rather than big collaborations.

Figure 3: Mean number of citations per paper versus number of collaborators.
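An exponent such as the one in eq. (12) can be estimated, for instance, by an ordinary least-squares fit of a straight line in log-log space. This is a generic sketch and an assumption about method: the text does not state which fitting procedure was used, and the synthetic data below are made up for illustration.

```python
import math


def fit_power_law(n_aut, values):
    """Fit values = C * n_aut**p by least squares on the logarithms.

    Returns the pair (C, p). All inputs must be strictly positive.
    """
    xs = [math.log(n) for n in n_aut]
    ys = [math.log(v) for v in values]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    exponent = s_xy / s_xx                       # slope in log-log space
    coeff = math.exp(y_mean - exponent * x_mean)  # intercept gives log(C)
    return coeff, exponent


# Synthetic check: data generated with C = 3 and p = 0.5 are recovered.
sizes = [1, 10, 100, 1000]
synthetic = [3.0 * n ** 0.5 for n in sizes]
coeff, exponent = fit_power_law(sizes, synthetic)
print(coeff, exponent)
```

On real data the points scatter widely around the trend, so a fit to binned means or medians, as plotted in the figures, would be a natural variant.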
Figure 3 shows that the mean number of citations received by papers written by an official collaboration (left panel) or by an author (right panel) roughly scales with the average number of co-authors as

$$N_{\rm cit} = \frac{N^{\rm tot}_{\rm cit}}{N_{\rm pap}} \propto N_{\rm aut}^{\ldots - \ldots}. \tag{13}$$

This result is in reasonable agreement with Hsu and Huang (2011), who found, in a much smaller sample, a power $\approx \ldots/\ldots$ … $N_{\rm aut} \sim$ …

Figure 4 shows that the total number of citations received by an official collaboration (left panel) or author (right panel) grows roughly linearly with the average number of co-authors:

$$N^{\rm tot}_{\rm cit} \propto N_{\rm aut}. \tag{14}$$

This is expected by combining the two previous scalings: the total number of citations of a collaboration can be decomposed as the product of the number of papers written by the collaboration times the average number of citations received per paper, and these factors roughly scale as $N_{\rm aut}^{\ldots - \ldots} \times N_{\rm aut}^{\ldots - \ldots}$.

Figure 5 shows that the total number of fractionally-counted citations $N_{\rm fcit} = \sum_p N^{p}_{\rm cit}/N^{p}_{\rm aut}$ received by papers $p$ written by an official collaboration (left panel) or author (middle panel) is roughly independent of the average number of co-authors. A similar result holds for a related quantity, "individual citations", defined as fractionally counted citations divided by the number of references of the citing papers:

$$N_{\rm fcit},\ N_{\rm icit} \propto N_{\rm aut}^{0}. \tag{15}$$

Figure 4: Number of citations versus number of collaborators.
Figure 5: Total number of fractionally-counted citations versus number of collaborators.

This means that fractionally-counted citations or individual citations neither reward nor penalise working in big collaborations, while citations reward authors who prefer working in big collaborations.
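The difference between full and fractional counting can be made concrete with a short sketch of $N_{\rm fcit} = \sum_p N^{p}_{\rm cit}/N^{p}_{\rm aut}$ and of individual citations, assuming equal shares $1/N_{\rm aut}$ per author. The tuple layouts and the toy numbers are hypothetical.

```python
def full_and_fractional_citations(papers):
    """Return (full, fractional) citation counts for one author.

    `papers` is a list of (n_aut, n_cit) pairs, one per paper.
    """
    full = sum(n_cit for _, n_cit in papers)
    fractional = sum(n_cit / n_aut for n_aut, n_cit in papers)
    return full, fractional


def individual_citations(papers):
    """Fractional citations, each further divided by the references of the citing paper.

    `papers` is a list of (n_aut, citing_refs) pairs, where citing_refs holds
    the number of references of every paper that cites the given paper.
    """
    return sum(
        1.0 / (n_aut * n_ref)
        for n_aut, citing_refs in papers
        for n_ref in citing_refs
    )


# One solo paper, one four-author paper, one paper signed by 1000 authors.
toy = [(1, 10), (4, 40), (1000, 500)]
full, fractional = full_and_fractional_citations(toy)
print(full, fractional)
```

In this toy record the large-collaboration paper dominates the full count but contributes only half a citation to the fractional count.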
When analysing different fields it is usually meaningful to correct for their different publication intensities. As different fields also show different collaboration patterns, this raises the issue of how to properly account for the two aspects in a combined way. A simple general answer is obtained by counting citations divided by the number of references of the citing paper, namely $N_{\rm icit} \equiv N_{\rm cit}/(N_{\rm aut}^{p^{\rm tot}_{\rm cit}} N_{\rm ref})$ (for a precise definition see Strumia and Torre (2019), where this indicator is dubbed "individual citations"). This quantity is similar to citations (so that $p^{\rm tot}_{\rm cit} \approx 1$) and it automatically provides a field-independent indicator, without having to identify fields. Indeed, within any hypothetical closed field (with no citations to or from other fields) this indicator satisfies the sum rule $\sum N_{\rm icit} = N_{\rm pap}$, up to recent papers that will receive citations in the future. In words, papers in sectors with higher publication intensity tend to receive more citations and thereby also tend to have more references. One can thereby use references to factor out publication intensity (see e.g. Zitt and Small (2008) and Waltman (2015) for a review).

Figure 6: Total number of individual citations (fractionally-counted citations divided by references of citing papers) versus number of collaborators.

Figure 6 shows this field-normalised indicator applied to fundamental physics, showing that it exhibits the desired negligible scaling with collaboration size.

h index

Looking at data, the top-left panel of fig. 7 shows that the $h$ index scales with the number of authors as

$$h \propto N_{\rm aut}^{\ldots - \ldots} \tag{16}$$

consistently with our expectation. In order to avoid this scaling, various authors defined modified $h$ indices with fractionalized counting. As the $h$ index combines information on the number of citations and of papers, one can fractionalize with respect to citations and/or with respect to papers. The first option was considered by Egghe (2008), who defined an index (here called $h_{\rm cit}$) equal to the number of papers that received more than $h_{\rm cit}$ fractionalized citations. The second option was considered by Schreiber (2008), who defined an index (here called $h_{\rm pap}$) equal to the number of fractionally-counted papers that received more than $h_{\rm pap}$ citations. Looking at data, the lower row of fig. 7 shows that these two possibilities are only partially successful, as they both still exhibit a milder scaling with the average number of co-authors:

$$h_{\rm cit},\ h_{\rm pap} \propto N_{\rm aut}^{\ldots}. \tag{17}$$

Figure 7: Top-left: the $h$-index scales as the square root of the average number of collaborators. Top-right: the $h$-index fractionalized as suggested in eq. (18) does not scale with the average number of collaborators. Bottom: the $h$-index fractionalized with respect to papers (left) or to citations (right) shows a reduced residual scaling with the average number of collaborators.

In order to understand the reason for this residual scaling with the average number of co-authors, we recall that larger collaborations tend to produce more papers as well as papers that get cited more. The $h$ index fractionalized with respect to papers would exhibit the desired scale-invariant behaviour if the scalings were $N_{\rm pap} \propto N_{\rm aut}^{p_{\rm pap}}$ and $N_{\rm cit} \propto N_{\rm aut}^{p_{\rm cit}}$ with $p_{\rm pap} = 1$ and $p_{\rm cit} = 0$. Similarly, the $h$ index fractionalized with respect to citations would exhibit the desired scale-invariant behaviour if $p_{\rm pap} = 0$ and $p_{\rm cit} = 1$.

Instead, data show $p_{\rm pap} \approx p_{\rm cit} \approx 1/2$. An $h$-index $h_f$ that exhibits the desired scale-invariant behaviour is obtained by partially fractionalizing with respect to both papers and citations. To compute our $h_f$ one sorts papers according to citations partially fractionalized as $c = N_{\rm cit}/\sqrt{N_{\rm aut}}$, and sums authorships partially fractionalized as $1/\sqrt{N_{\rm aut}}$ until $h_f$ is larger than the fractionalized $c$. In formulae,

$$h_f = \sum_{h_f < N_{\rm cit}/\sqrt{N_{\rm aut}}} \frac{1}{\sqrt{N_{\rm aut}}}. \tag{18}$$

… InSpire and in terms of the large variability in collaboration size, which is by far the largest in the public research domain, reaching thousands of authors. To our knowledge there are no other studies on this topic carried out with such a dataset, so that our results cannot be compared with previous studies.

Having to deal with $N_{\rm aut} \gg$ … / 2. We therefore formulated a theoretical model for bibliometrics of collaborations based on the aforementioned considerations and assumptions.
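For concreteness, the fractionalized index $h_f$ described in the text (sort papers by $c = N_{\rm cit}/\sqrt{N_{\rm aut}}$, then add weights $1/\sqrt{N_{\rm aut}}$ until the running sum overtakes $c$) can be sketched as below. The stopping rule, which counts a paper only if the sum including its own weight does not exceed its $c$, is an assumption chosen so that the result reduces to the usual $h$ index for single-author papers; the input layout and the numbers are illustrative.

```python
import math


def fractional_h_index(papers):
    """Partially fractionalized h index for one author.

    `papers` is an iterable of (n_aut, n_cit) pairs (hypothetical layout).
    """
    # For every paper: (fractionalized citations c, fractionalized authorship weight).
    scored = sorted(
        (
            (n_cit / math.sqrt(n_aut), 1.0 / math.sqrt(n_aut))
            for n_aut, n_cit in papers
        ),
        reverse=True,  # most cited (after fractionalization) first
    )
    h_f = 0.0
    for c, weight in scored:
        if c < h_f + weight:  # assumed stopping rule, see the note above
            break
        h_f += weight
    return h_f


solo = [(1, 10), (1, 5), (1, 3), (1, 1)]    # single-author papers
group = [(4, 10), (4, 8), (4, 6), (4, 2)]   # four-author papers
print(fractional_h_index(solo), fractional_h_index(group))
```

For the single-author record the function returns the ordinary $h$ index, while the four-author record, whose unfractionalized $h$ index is also 3, is scaled down because both its citations and its authorships are divided by $\sqrt{4}$.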
As any model, it needs to be confronted with data to extract its unknowns which, in our case, are the scaling exponents of the power-law dependence of the number of papers ($p_{\rm pap}$), the number of citations per paper ($p_{\rm cit}$), the total number of citations ($p^{\rm tot}_{\rm cit}$), and the number of fractionally counted citations ($p_{\rm fcit}$).

In section 3 we computed all the quantities relevant for our model and extracted the unknowns from the data in the InSpire dataset. On the one hand, we observed in data the expected approximate power-law scaling. On the other hand, we were able to estimate the desired exponents, observing $p_{\rm pap} \approx \ldots - \ldots$, $p_{\rm cit} \approx \ldots - \ldots$, $p^{\rm tot}_{\rm cit} \approx 1$, and $p_{\rm fcit} \approx 0$. … $h$ index, despite its well known correlation with the number of citations, we also predicted and evaluated the scaling with the number of authors of the $h$ index, $h \propto N_{\rm aut}^{\ldots - \ldots}$. Furthermore, we defined a modified $h$ index (see eq. (18)) that roughly does not scale with the number of authors. We compared our result with different modified $h$ indices proposed in the literature, which did not solve the issue of dependence on the number of authors.

Our results apply to mean (or median) quantities. Before concluding, a comment on the full distributions, or at least on their variances, is in order. In the right panel of Figure 1 we show the distributions of individual citations received by all papers in our database, split according to their number of authors. We see that, as already suggested in the literature, papers with more authors are more cited. We also see that the distributions have large variabilities: they are approximately log-normal, with log-scale means that scale as $\langle N_{\rm cit} \rangle \propto N_{\rm aut}^{\ldots}$ and with log-scale widths that remain approximately constant.
This behaviour of the distributions follows from our initial theoretical considerations by adding one extra assumption: that collaborations tend to equalise the total amount of skill within each competence, such that the distribution in $N_{\rm cit}$ of a collaboration is simply obtained by rescaling the distribution of single-author papers. The bibliometric output of collaborations formed as random groups of authors would instead show larger variabilities.

References

Aksnes, D. W., Schneider, J. W., and Gunnarsson, M. (2012). Ranking national research systems by citation indicators. A comparative analysis using whole and fractionalised counting methods. J. Informetrics (1), 36–43.
Avkiran, N. K. (2013). An empirical investigation of the influence of collaboration in finance on article impact. Scientometrics (3), 911–925.
Beaver, D. B. (1986). Collaboration and teamwork in physics. Czech. J. Phys. B (1), 14–18.
Birnholtz, J. P. (2006). What does it mean to be an author? The intersection of credit, contribution, and collaboration in science. JASIST (13), 1758–1770.
Bordons, M., Garcia-Jover, F., and Barrigon, S. (1993). Is collaboration improving research visibility? Spanish scientific output in pharmacology and pharmacy. Research Evaluation (1), 19–24.
Bouyssou, D. and Marchant, T. (2016). Ranking authors using fractional counting of citations: An axiomatic approach. J. Informetrics (1), 183–199.
Carbone, V. (2011). Fractional counting of authorship to quantify scientific research output. arXiv.
Egghe, L. (2008). Mathematical theory of the h- and g-index in case of fractional counting of authorship. JASIST (10), 1608–1616.
Gauffriau, M. (2017). A categorization of arguments for counting methods for publication and citation indicators. J. Informetrics (3), 672–684.
Gauffriau, M., Larsen, P. O., Maye, I., Roulin-Perriard, A., and von Ins, M. (2008). Comparisons of results of publication counting using different methods. Scientometrics (1), 147–176.
Gazni, A. and Didegah, F. (2011). Investigating different types of research collaboration and citation impact: a case study of Harvard University’s publications. Scientometrics (2), 251–265.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proc. Nat. Acad. Sci.
Hooydonk, G. V. (1997). Fractional counting of multiauthored publications: Consequences for the impact of authors. JASIS (10), 944–945.
Hsu, J. and Huang, D. (2011). Correlation between impact and collaboration. Scientometrics (2), 317–324.
Katz, J. S. and Hicks, D. (1997). How much is a collaboration worth? A calibrated bibliometric model. Scientometrics (3), 541–554.
Kosmulski, M. (2012). The order in the lists of authors in multi-author papers revisited. J. Informetrics (4), 639–644.
Lee, S. and Bozeman, B. (2005). The impact of research collaboration on scientific productivity. Social Studies of Science (5), 673–702.
Leydesdorff, L. and Bornmann, L. (2010). How fractional counting affects the impact factor: Steps towards field-independent classifications of scholarly journals and literature. arXiv.
Leydesdorff, L. and Bornmann, L. (2011). How fractional counting of citations affects the impact factor: Normalization in terms of differences in citation potentials among fields of science. JASIST (2), 217–229.
Leydesdorff, L. and Opthof, T. (2010). Normalization at the field level: fractional counting of citations. J. Informetrics, 644–646.
Leydesdorff, L. and Park, H. W. (2017). Full and fractional counting in bibliometric networks. J. Informetrics (1), 117–120.
Leydesdorff, L. and Shin, J. C. (2011). How to evaluate universities in terms of their relative citation impacts: Fractional counting of citations and the normalization of differences among disciplines. JASIST (6), 1146–1155.
Mannella, R. and Rossi, P. (2013). On the time dependence of the h-index. J. Informetrics (1), 176–182.
Perianes-Rodríguez, A., Waltman, L., and van Eck, N. J. (2016). Constructing bibliometric networks: A comparison between full and fractional counting. J. Informetrics (4), 1178–1195.
Rossi, P., Strumia, A., and Torre, R. (2019). Bibliometrics for collaboration works. In Proceedings of the 17th International Conference on Scientometrics and Informetrics ISSI 2019, Rome, 2–5 September 2019 (Vol. I, p. 975–983).
Rousseau, R. (2014). A note on the interpolated or real-valued h-index with a generalization for fractional counting. Aslib J. Inf. Manag. (1), 2–12.
Schreiber, M. (2008, July). A modification of the h-index: the hm-index accounts for multi-authored manuscripts. Journal of Informetrics (3), 211–216.
Sooryamoorthy, R. (2009). Do types of collaboration change citation? Collaboration and citation patterns of South African science publications. Scientometrics, 177–193.
Strumia, A. and Torre, R. (2019). Biblioranking fundamental physics. J. Informetrics.
van Raan, A. F. J. (1997). Science as an international enterprise. Science and Public Policy (5), 290–300.
Waltman, L. (2015). A review of the literature on citation impact indicators. CoRR, abs/1507.02099. Retrieved from http://arxiv.org/abs/1507.02099
Waltman, L. and van Eck, N. J. (2015). Field-normalized citation impact indicators and the choice of an appropriate counting method. J. Informetrics (4), 872–894.
Yong, A. (2014). Critique of Hirsch’s citation index: a combinatorial Fermi problem. Notices of the AMS (9), 1040–1050.
Zitt, M. and Small, H. (2008). Modifying the journal impact factor by fractional citation weighting: The audience factor.