A paper's corresponding affiliation and first affiliation are consistent at the country level in Web of Science
Jianfei Yu, Chunxiao Yin∗, Linlin Liu, and Tao Jia∗
College of Computer and Information Science, Southwest University, Chongqing, 400715, P. R. China.
Keywords: first affiliation; corresponding affiliation; web of science; straight counting
ABSTRACT
Purpose:
The purpose of this study is to explore the relationship between the first affiliation and the corresponding affiliation at different levels via scientometric analysis.
Design/methodology/approach:
We select over 18 million papers in the core collection database of Web of Science (WoS) published from 2000 to 2015, and measure the percentage of match between the first and the corresponding affiliation at the country and institution level.
Findings:
We find that a paper’s first affiliation and corresponding affiliation are highly consistent at the country level, with over 98% of matches on average. However, the match at the institution level is much lower, and it varies significantly with time and country. Hence, for studies at the country level, using the first and corresponding affiliations gives almost the same results, but more caution is needed in selecting the affiliation when the institution is the focus of the investigation. Meanwhile, we find some evidence that the recorded corresponding information in the WoS database has undergone some changes since 2013, which sheds light on future studies on the comparison of different databases or the affiliation accuracy of WoS.
Research limitations:
Our finding relies on the records of WoS, which may not be entirely accurate.
Practical implications:
Given the scale of the analysis, our findings can serve as a useful reference for further studies when country allocation or institute allocation is needed.
Originality/value:
Existing studies on comparisons of straight counting methods usually cover a limited number of papers, a particular research field, or a limited range of time. More importantly, the counted numbers alone cannot sufficiently tell whether the corresponding and first affiliations are similar. This paper uses a metric similar to the Jaccard similarity to measure the percentage of the match and performs a comprehensive analysis based on a large-scale bibliometric database.
INTRODUCTION

When investigating the scientific productivity, scientific impact, or scientific development of a country, we need to first allocate papers to different countries (Gazni, Sugimoto, & Didegah, 2012; Gingras & Khelfaoui, 2018; Gonzalez-Brambila, Reyes-Gonzalez, Veloso, & Perez-Angón, 2016). This problem is partially related to how we “count” papers, giving rise to a rich body of studies on the counting method (Aksnes, Schneider, & Gunnarsson, 2012; Gauffriau, Larsen, Maye, Roulin-Perriard, & von Ins, 2008; M.-H. Huang, Lin, & Chen, 2011; Korytkowski & Kulczycki, 2019; Smolinsky & Lercher, 2020; Vavryčuk, 2018; Waltman & van Eck, 2015). Such studies become more important given the intensified international collaboration nowadays, through which scientific production is characterized not only by multiple institutes but also by multiple countries (Gul et al., 2015; Zacharewicz, Lepori, Reale, & Jonkers, 2019). Among the counting methods commonly applied, straight counting is the one that allocates the whole credit of a paper to a single entity (Gauffriau et al., 2008; M.-H. Huang et al., 2011; Lin, Huang, & Chen, 2013). In other words, the paper would belong to one single country or one institute among the multiple affiliations of the paper. Previous studies suggest that straight counting is preferred in professional and scientific bibliometrics operations, especially when dealing with large-scale literature data (M.-H. Huang et al., 2011; Larsen, 2008). The logic behind straight counting is that one of the most prominent affiliations (or authors) owns the whole paper (Hagen, 2014; Mattsson, Sundberg, & Laget, 2011).
Existing studies mainly consider two options: using the first affiliation (author) (Börner, Penumarthy, Meiss, & Ke, 2006; Gauffriau, Larsen, Maye, Roulin-Perriard, & von Ins, 2007; van Leeuwen, 2009) or the corresponding affiliation (author) (Man, Weinkauf, Tsang, & Sin, 2004; Mazloumian, Helbing, Lozano, Light, & Börner, 2013). The counting results of the two options are compared. Some previous studies find that counting results using the first affiliation and the corresponding affiliation are consistent in reflecting research productivity at the country level (M.-H. Huang et al., 2011; Waltman & van Eck, 2015). Nevertheless, these studies usually focus on a small fragment of paper data, covering only a limited research field or time range, which may not be conclusively generalized to other circumstances. More importantly, country allocation is related to applications more general than just counting (Petersen, Pan, Pammolli, & Fortunato, 2019). For instance, when studying the citation network of countries, we need to assign a country to each paper (Apolloni, Rouquier, & Jensen, 2013; Bornmann, Stefaner, de Moya Anegón, & Mutz, 2014; Hu, Wang, & Deng, 2020). It can happen that the counting results by the first or the corresponding affiliation are the same, but the country allocation by the first affiliation is entirely different from that by the corresponding affiliation.

In this study, we perform a comprehensive analysis using a large bibliometric database that contains over 18 million papers published from 2000 to 2015 on the Web of Science (WoS). Instead of counting, we measure the percentage of matches between the first affiliation and the corresponding affiliation. We find that the two affiliations of a paper are consistent at the country level, with over 98% of matches on average. Therefore, for studies at the country level, whether counting the number of papers or constructing citation networks, results based on the first or corresponding affiliation are almost the same.
The match at the institution level is lower, and it also varies significantly with time and country. Hence, we may need to be more cautious in selecting the affiliation when the institution is the focus of the investigation. Given the large scale of the analysis, our results can serve as a useful reference for further research when country allocation or institute allocation is needed. The analyses also reveal the existence of record changes in the WoS database, which bring variations in how the corresponding affiliation is recorded. This finding may shed light on future studies on the comparison of different databases or the affiliation accuracy of WoS data.

DATA AND METHOD

Data set.
We use the data in the Web of Science (WoS), which is a well-established database used for bibliometric analysis (Han et al., 2014). The data cover the Science Citation Index Expanded (SCIE) database, the Social Sciences Citation Index (SSCI) database, and the Arts & Humanities Citation Index (A&HCI) database. In total, we analyze over 18 million papers published from 2000 to 2015, which include articles, notes, reviews, letters, and conference proceeding papers.
Countries considered.
The most productive 16 countries, in terms of the number of scientific papers, are selected for our study. They are the United States (US), China (CN), the United Kingdom (GB), Germany (DE), Japan (JP), Italy (IT), France (FR), Canada (CA), India (IN), Korea (KR), Spain (ES), Australia (AU), Brazil (BR), Netherlands (NL), Turkey (TR), and Russia (RU). The total number of papers by these 16 countries is 18,432,794, covering over 76% of worldwide papers.
Address information.
WoS records the list of affiliations and the order of these affiliations for each paper. Starting in 2008, WoS also records the list of affiliations of each author. WoS specifically records the reprint affiliation of each paper, which is considered to be equivalent to the corresponding affiliation (Duffy, 2017; Fox, Ritchey, & Paine, 2018; González-Alcaide, Park, Huamaní, & Ramos, 2017; Kahn & MacGarvie, 2016; X. Wang, Xu, Wang, Peng, & Wang, 2013). In some very recent records, one paper may have multiple reprint affiliations. This is, however, not commonly observed in papers published during the period analyzed in this work. Almost all papers have only one reprint affiliation.
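For a single paper, the country-level comparison amounts to checking whether the first affiliation in the paper's address list and the reprint (corresponding) affiliation name the same country. A minimal sketch follows; the record layout is a hypothetical assumption for illustration only, not the actual WoS export format:

```python
# Hedged sketch: the dictionary layout below is hypothetical, chosen only
# to illustrate the comparison; real WoS exports use their own field names.

def countries_match(paper: dict) -> bool:
    """True if the paper's first affiliation and its reprint
    (corresponding) affiliation point to the same country."""
    first_country = paper["affiliations"][0]["country"]
    reprint_country = paper["reprint_affiliation"]["country"]
    return first_country == reprint_country

paper = {
    # affiliations are stored in the order given on the paper
    "affiliations": [
        {"name": "Southwest Univ", "country": "Peoples R China"},
        {"name": "Univ Colorado", "country": "USA"},
    ],
    "reprint_affiliation": {"name": "Southwest Univ",
                            "country": "Peoples R China"},
}
```

For this toy record the first and reprint affiliations agree at the country level even though a second, foreign affiliation is present.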
Comparing Institution Names.
The institution names in the metadata are usually not consistent (Donner, Rimmert, & van Eck, 2020; Rimmert, Schwechheimer, & Winterhager, 2017). One institution can be referred to in different ways related to different writing and coding rules (i.e., by the official, full institution name, and/or by varying forms of abbreviations). For example, “Tsinghua University” and “Tsinghua Univ” are the same institution, where the former is the full name and the latter is the abbreviated form. Likewise, two names with similar word composition may be related to two distinct institutions. For example, “Univ Colorado” and “Univ Colorado Denver” are different, with the latter being a branch of the former. In general, the name comparison is related to a broader and more challenging problem called institution name disambiguation (S. Huang, Yang, Yan, & Rousseau, 2014; Jacob, Javed, Zhao, & Mcnair, 2014). Fortunately, in this work, we only need to determine if two institutions are the same in one single paper. It is unlikely that a paper contains two distinct yet literally similar institutions as its reprint and first affiliation. Therefore, we do not need to solve the name disambiguation problem. Here, we measure the edit distance of the two institutions in the first and reprint affiliation. The edit distance, also called the Levenshtein distance, is defined as the minimum number of edits needed to transform one string into the other (Levenshtein, 1966). We set the threshold at 90%: two institution names with an edit similarity equal to or greater than this threshold are considered the same name.

Calculating the Match.
We use a metric similar to the Jaccard similarity to measure the percentage of the match between the first and corresponding affiliation. In particular, we have

P_i = |C_i ∩ F_i| / |C_i|,   (1)

where C_i is the set of papers whose corresponding affiliation is associated with i (which can be a country or an institution) and F_i is the set of papers whose first affiliation is associated with i. We also test the results by changing the denominator to |F_i| and |C_i ∪ F_i|. The conclusion does not change with such variations.

RESULTS
The match between the first affiliation and the corresponding affiliation at the country level
We first compare the country in the first and the corresponding affiliation of a paper. The statistics demonstrate a high consistency at the country level (Figure 1a). In 98.57% of all papers analyzed, the first and the corresponding affiliation point to the same country. China ranks first among the 16 countries in the percentage of the match P_ctry. The P_ctry values of all countries are high in different years. Although a sharp decline of P_ctry occurs in the year 2012, the lowest value is above 94% (Figure 1b). In general, we can conclude that the country in the first and the corresponding affiliation of a paper has a high percentage of match for different countries and in different years.

It is noteworthy that the label of the corresponding affiliation in the WoS may not be very accurate (M.-H. Huang et al., 2011; Moya-Anegón, Guerrero-Bote, Bornmann, & Moed, 2013). Indeed, for some countries we find a high percentage of papers (such as 76.34% in China and 72.8% in India) whose corresponding affiliation is also the first affiliation. This gives rise to a concern about the validity of our conclusion, as the high percentage of match can simply be a result of the high overlap between the first and the corresponding affiliation. For this reason, we focus on papers whose corresponding and first affiliations are different. The percentage of match decreases slightly, and P_ctry in different years also remains at a high level, with the lowest value being 90%. In other words, even when the first and corresponding affiliation are different, we still have at least 90% of match at the country level. The result supports our conclusion that the first and the corresponding affiliation are highly consistent at the country level.

Finally, both Figure 1b and Figure 1d show a sharp decline of P_ctry in the year 2013, which virtually splits the curve into two phases (2000-2012 and 2013-2015).
P_ctry in the first phase is higher (on average 98.96% in Figure 1b and 98.15% in Figure 1d) than that in the second phase (on average 96.90% in Figure 1b and 94.82% in Figure 1d). Something happens in 2013 that brings down the overall percentage of the match by roughly 3 percentage points. Although the overall consistency is high and not significantly affected by this decline, this phenomenon needs further exploration, which will be discussed in detail later.
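The match percentage behind these statistics is the metric of Eq. (1), which reduces to plain set operations. A minimal sketch with toy paper ids (our own illustration, not the authors' code):

```python
# Toy sketch of Eq. (1): P_i = |C_i ∩ F_i| / |C_i|, where C_i (F_i) is the
# set of papers whose corresponding (first) affiliation belongs to entity i.

def match_percentage(C_i: set, F_i: set) -> float:
    """Share of entity i's corresponding-affiliation papers whose first
    affiliation also belongs to i; 0.0 if C_i is empty."""
    return len(C_i & F_i) / len(C_i) if C_i else 0.0

# Three papers have their corresponding affiliation in entity i; two of
# them also have their first affiliation in i, so P_i = 2/3.
C_i = {"p1", "p2", "p3"}
F_i = {"p1", "p2", "p4"}
P_i = match_percentage(C_i, F_i)
```

Swapping the denominator for `len(F_i)` or `len(C_i | F_i)` reproduces the robustness variants mentioned in Data and Methods.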
Figure 1. (a)
The percentage of affiliation match at the country level P_ctry for all papers published during 2000 to 2015. The dashed line corresponds to the global average. (b) P_ctry for papers published in different years, which also remains at a high level. (c) The percentage of affiliation match at the country level P_ctry for all papers whose first and corresponding affiliations are different. The dashed line corresponds to the global average. (d) The change of P_ctry over time for papers published in a given year whose first and corresponding affiliations are different.

Match between the first affiliation and the corresponding affiliation at the institution level
Since there are thousands of institutions all over the world, it is impossible to show the results for each institution. Therefore, we use the average value grouped by the countries of the institutions. As shown above, the country information in the first and corresponding affiliation is highly consistent, so using either of them should give roughly the same results. In particular, let P_inst denote the percentage of papers with the first and the corresponding affiliation matched at the institution level in each country. We find that P_inst is lower than P_ctry, with a global average of 91.43% (Figure 2a). The P_inst also demonstrates a large variety among different countries. China is the highest with P_inst = 97.23% and Brazil is the lowest (86.63%), a difference of about 10 percentage points (Figure 2a).
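The institution-level comparison behind P_inst relies on the 90% edit-similarity rule described in Data and Methods. A minimal sketch of that rule (our own illustration, not the authors' code):

```python
# Sketch of the name-matching rule: two institution names are treated as
# the same if their normalized Levenshtein similarity is at least 0.9.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def same_institution(name1: str, name2: str, threshold: float = 0.9) -> bool:
    """Compare the first and reprint affiliation names of one paper,
    normalizing the edit distance by the longer name's length."""
    n1, n2 = name1.lower(), name2.lower()
    longest = max(len(n1), len(n2)) or 1
    return 1 - levenshtein(n1, n2) / longest >= threshold
```

Under this rule, "Univ Colorado" and "Univ Colorado Denver" are kept distinct (similarity 0.65), matching the example given in Data and Methods.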
Figure 2. (a)
The percentage of affiliation match at the institution level P_inst for all papers published during 2000 to 2015. The dashed line corresponds to the global average. (b) P_inst for papers published in different years, which is lower than P_ctry. (c) The percentage of affiliation match at the institution level P_inst for all papers whose first and corresponding affiliations are different. The dashed line corresponds to the global average. (d) The change of P_inst over time for papers published in a given year whose first and corresponding affiliations are different.

There is also a large change of P_inst over time. In particular, except for China and Russia, the P_inst of the other countries lies between 88% and 94%, and after 2012 the P_inst of most countries is below 90%. For Brazil and Korea, this value is even below 80%. The match at the institution level is not high in all countries in all years. Hence, allocation by the institution in the first affiliation may give different results compared with that by the institution in the corresponding affiliation. One may need to carefully examine the robustness of the conclusion if the analysis is at the institution level.

Similar to the analysis at the country level, we also conduct an analysis by excluding papers whose first affiliation is labeled as the corresponding affiliation. The results in Figure 2c indicate that P_inst is much lower, with 84.02% on average, and the P_inst of most countries lies between 80% and 90% (Figure 2d). In Turkey, as an extreme case, the lowest P_inst is only 42.2% in 2000. This further supports our conclusion that the first and corresponding affiliation are not consistent at the institution level.

It is noteworthy that the sharp decline from 2012 to 2013 observed at the country level is also found at the institution level. The drop is even larger: there is a 7 percentage point drop in Figure 2b on average and a 12 percentage point drop in Figure 2d. This further urges us to explore the cause of the sharp decline.

The sharp decline caused by the record change in WoS
Figure 3. (a)
The percentage of papers produced by international collaboration. The increase is gradual and steady with no sudden or significant changes. (b)
The percentage of papers whose first affiliation is included in the affiliation list of the first author. Before 2013, about 8% of papers have the first author not affiliated with the first affiliation. (c)
The percentage of papers whose reprint and first affiliation are the same. The value has a sharp decrease in 2013. (d)
We consider papers whose first author is also the corresponding author and measure the percentage of affiliation match P_ctry at the country level. P_ctry has a sharp decrease in 2013. (e, f) The percentage of affiliation match at the country level P_ctry and at the institution level P_inst for papers whose first author serves as the corresponding author. The statistics no longer demonstrate the sharp decline.

There are two reasons that seem capable of explaining the sharp decline. Since the decline is observed at both the country and the institution level, one possibility is that the number of internationally co-authored papers accelerated in 2012, which gives rise to a sudden decrease in the percentage of the matched countries. The other possibility is that the decline is simply caused by the manner in which WoS records the data. In other words, the label position of the corresponding affiliation has changed in the WoS since 2012.

To test the first hypothesis, we calculate the percentage of papers in our data that are produced by international collaboration. The internationally collaborated papers are those that contain affiliations in different countries (Gazni & Didegah, 2011; Iwami et al., 2020). We observe that the output by international collaboration demonstrates an increasing trend (Figure 3a), which is in line with previous studies (Gazni et al., 2012; Larivière, Gingras, Sugimoto, & Tsou, 2015). Nevertheless, the increase is gradual and steady. There is no sudden or significant change in the extent of international collaboration. Therefore, the decline observed in Figures 1 and 2 cannot be attributed to patterns of international collaboration.

For the second possibility, we indeed notice certain changes in the WoS data occurring in 2013 that can generate some drastic fluctuations in the statistics. For example, WoS records two types of address information. One is the affiliation list of the paper and the other is the affiliation list of each author.
Ideally, the affiliation list of the first author should contain the first affiliation of the paper (Larsen, 2008; Nederhof & Moed, 1993). But before the year 2013, there were roughly 8% of papers whose first affiliation is not included in the affiliation list of the first author (Figure 3b). The turning point appears in 2013; since then, the first author is almost always affiliated with the first affiliation (Figure 3b). Likewise, the fraction of papers whose reprint and first affiliation are the same increases with time, but the value has a sharp decrease in 2013 (Figure 3c). These sudden changes imply some updates in the WoS data set. However, the patterns observed in Figures 3b and 3c cannot explain the sharp decrease observed in Figures 1 and 2. When we remove papers whose first author is not affiliated with the first affiliation, the sharp decline still persists. In Figures 1b and 2b, we have already shown that P_ctry and P_inst suddenly decrease in 2013 when removing papers whose first and corresponding affiliation are the same.

What we find most relevant to the sharp decrease is the change in the records of the corresponding author. In the WoS data, the percentage of papers whose first author does not serve as the corresponding author increases smoothly with time. But if we focus on these kinds of papers, we can observe a sudden decrease in the percentage of papers whose first and corresponding affiliation are the same (Figure 3d). If we remove these papers in our analysis and consider only papers whose first author is also the corresponding author, the sharp declines in P_ctry and P_inst are no longer observed (Figures 3e and 3f). Therefore, we believe that it is the change in the corresponding author records that gives rise to the sudden drop of the matched affiliation at the country and the institution level.

CONCLUSION
To summarize, we analyze over 18 million papers in the WoS database published from 2000 to 2015. We find that a paper’s first affiliation and corresponding affiliation are highly consistent at the country level, with over 98% of the match on average. The extent of the match varies slightly when we focus on different years or consider only the circumstance in which the first and the corresponding affiliation are different. Nevertheless, the match remains at a high level, with the lowest value over 90%. The result is in line with previous findings that straight counting by the first and the corresponding affiliation gives rise to close numbers. But our result can be applied to more general applications. When allocating a country to a paper, using the first or the corresponding affiliation would yield roughly the same results. Considering the fact that the corresponding affiliation is not usually explicitly given (in the Microsoft Academic Graph, for example (Ranjbar-Sahraei, van Eck, & de Jong, 2018; K. Wang et al., 2020)), our finding can be a useful reference for future studies that require country allocation.

We also find that the match at the institution level is much lower. On average, about 10% of the time, one would get different results when allocating the institution by the first affiliation instead of the corresponding affiliation. The difference may not be significant when only the number of papers is concerned. But for extended studies such as the impact, the research behavior, and the collaboration patterns of different institutions, we need to be more cautious in deciding which institution a paper belongs to. At least, the robustness of the conclusion needs to be tested by different allocation methods.
This also raises interesting questions on university rankings (Abramo & D’Angelo, 2015; Chen et al., 2020; Lin et al., 2013; Selten et al., 2020), whose results rely on how the scientific output of different universities is grouped.

Finally, we observe some drastic changes in WoS records that bring a sharp decline in our measures. In particular, the change of corresponding author records gives rise to a lower match at the country and institution level. There are studies analyzing and comparing different data sets of publications (Adriaanse & Rensleigh, 2013; Aghaei Chadegani et al., 2013; Falagas, Pitsouni, Malietzis, & Pappas, 2008; López-Illescas, de Moya-Anegón, & Moed, 2008). Some studies also question the accuracy of WoS data in citations and topic classifications (Franceschini, Maisano, & Mastrogiacomo, 2016; Ranjbar-Sahraei et al., 2018; van Eck & Waltman, 2019). Except for a few works, however, the accuracy of the affiliation information is not well discussed. Our observation implies that the affiliation of a certain fraction of papers may not be accurately recorded in WoS before 2013. At least, some papers in the WoS may not have the correct corresponding information. The decrease in the statistics also implies that records after 2013 may have better accuracy than before. The potential errors in WoS data naturally raise concerns about the validity of our findings. If there are flaws in the data we analyzed, to what extent can we generalize the conclusion that a paper’s corresponding and first affiliation are consistent at the country level? Note, however, that some data sets may have better accuracy in certain records, but none of them are perfect. If we inevitably need to utilize imperfect data to perform extended and comprehensive research, we need to tolerate certain errors within it. From that perspective, we believe that our finding is still useful, at least for research relying on WoS data.
Based on what WoS tells us, the percentage of the match P_ctry is very high across different periods of time and different sets of papers considered. Our finding also provides a reference point if other data sets are considered. Given the size of the data analyzed, it is hard to manually check the accuracy of the affiliation records of WoS. It would be meaningful and interesting to find an automatic approach to perform a large-scale examination of the corresponding affiliation and author records in WoS data.

REFERENCES
Abramo, G., & D’Angelo, C. A. (2015). Evaluating university research: Same performance indicator, different rankings. Journal of Informetrics, (3), 514–525.
Adriaanse, L. S., & Rensleigh, C. (2013). Web of Science, Scopus and Google Scholar. The Electronic Library.
Aghaei Chadegani, A., Salehi, H., Yunus, M., Farhadi, H., Fooladi, M., Farhadi, M., & Ale Ebrahim, N. (2013). A comparison between two main academic literature collections: Web of Science and Scopus databases. Asian Social Science, (5), 18–26.
Aksnes, D. W., Schneider, J. W., & Gunnarsson, M. (2012). Ranking national research systems by citation indicators. A comparative analysis using whole and fractionalised counting methods. Journal of Informetrics, (1), 36–43.
Apolloni, A., Rouquier, J.-B., & Jensen, P. (2013). Collaboration range: Effects of geographical proximity on article impact. The European Physical Journal Special Topics, (6), 1467–1478.
Börner, K., Penumarthy, S., Meiss, M., & Ke, W. (2006). Mapping the diffusion of scholarly knowledge among major US research institutions. Scientometrics, (3), 415–426.
Bornmann, L., Stefaner, M., de Moya Anegón, F., & Mutz, R. (2014). Ranking and mapping of universities and research-focused institutions worldwide based on highly-cited papers. Online Information Review.
Chen, W., Zhu, Z., & Jia, T. (2020). The rank boost by inconsistency in university rankings: Evidence from 14 rankings of Chinese universities. Quantitative Science Studies (Just Accepted), 1–17.
Donner, P., Rimmert, C., & van Eck, N. J. (2020). Comparing institutional-level bibliometric research performance indicator values based on different affiliation disambiguation systems. Quantitative Science Studies, (1), 150–170.
Duffy, M. A. (2017). Last and corresponding authorship practices in ecology. Ecology and Evolution, (21), 8876–8887.
Falagas, M. E., Pitsouni, E. I., Malietzis, G. A., & Pappas, G. (2008). Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses. The FASEB Journal, (2), 338–342.
Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., . . . others (2018). Science of science. Science, (6379).
Fox, C. W., Ritchey, J. P., & Paine, C. T. (2018). Patterns of authorship in ecology and evolution: First, last, and corresponding authorship vary with gender and geography. Ecology and Evolution, (23), 11492–11507.
Franceschini, F., Maisano, D., & Mastrogiacomo, L. (2016). Empirical analysis and classification of database errors in Scopus and Web of Science. Journal of Informetrics, (4), 933–953.
Gauffriau, M., Larsen, P., Maye, I., Roulin-Perriard, A., & von Ins, M. (2007). Publication, cooperation and productivity measures in scientific research. Scientometrics, (2), 175–214.
Gauffriau, M., Larsen, P., Maye, I., Roulin-Perriard, A., & von Ins, M. (2008). Comparisons of results of publication counting using different methods. Scientometrics, (1), 147–176.
Gazni, A., & Didegah, F. (2011). Investigating different types of research collaboration and citation impact: A case study of Harvard University’s publications. Scientometrics, (2), 251–265.
Gazni, A., Sugimoto, C. R., & Didegah, F. (2012). Mapping world scientific collaboration: Authors, institutions, and countries. Journal of the American Society for Information Science and Technology, (2), 323–335.
Gingras, Y., & Khelfaoui, M. (2018). Assessing the effect of the United States’ “citation advantage” on other countries’ scientific impact as measured in the Web of Science (WoS) database. Scientometrics, (2), 517–532.
González-Alcaide, G., Park, J., Huamaní, C., & Ramos, J. M. (2017). Dominance and leadership in research activities: Collaboration between countries of differing human development is reflected through authorship order and designation as corresponding authors in scientific publications. PLoS ONE, (8), e0182513.
Gonzalez-Brambila, C. N., Reyes-Gonzalez, L., Veloso, F., & Perez-Angón, M. A. (2016). The scientific impact of developing nations. PLoS ONE, (3), e0151328.
Gul, S., Nisa, N. T., Shah, T. A., Gupta, S., Jan, A., & Ahmad, S. (2015). Middle East: Research productivity and performance across nations. Scientometrics, (2), 1157–1166.
Guo, J., Liu, X., Yang, L., & Wu, J. (2019). Are contributions from Chinese physicists undercited? Journal of Data and Information Science, (4), 84–95.
Hagen, N. T. (2014). Counting and comparing publication output with and without equalizing and inflationary bias. Journal of Informetrics, (2), 310–317.
Han, P., Shi, J., Li, X., Wang, D., Shen, S., & Su, X. (2014). International collaboration in LIS: Global trends and networks at the country and institution level. Scientometrics, (1), 53–72.
Hu, H., Wang, D., & Deng, S. (2020). Global collaboration in artificial intelligence: Bibliometrics and network analysis from 1985 to 2019. Journal of Data and Information Science, (ahead-of-print).
Huang, M.-H., Lin, C.-S., & Chen, D.-Z. (2011). Counting methods, country rank changes, and counting inflation in the assessment of national research productivity and impact. Journal of the American Society for Information Science and Technology, (12), 2427–2436.
Huang, S., Yang, B., Yan, S., & Rousseau, R. (2014). Institution name disambiguation for research assessment. Scientometrics, (3), 823–838.
Iwami, S., Shimizu, T., Empizo, M. J. F., Gabayno, J. L. F., Sarukura, N., Fujii, S., & Sumimura, Y. (2020). Current status and enhancement of collaborative research in the world: A case study of Osaka University. Journal of Data and Information Science, (4), 75–85.
Jacob, F., Javed, F., Zhao, M., & Mcnair, M. (2014). scool: A system for academic institution name normalization. In (pp. 86–93).
Jia, T., Wang, D., & Szymanski, B. K. (2017). Quantifying patterns of research-interest evolution. Nature Human Behaviour, (4), 1–7.
Kahn, S., & MacGarvie, M. (2016). Do return requirements increase international knowledge diffusion? Evidence from the Fulbright program. Research Policy, (6), 1304–1322.
Korytkowski, P., & Kulczycki, E. (2019). Publication counting methods for a national research evaluation exercise. Journal of Informetrics, (3), 804–816.
Larivière, V., Gingras, Y., Sugimoto, C. R., & Tsou, A. (2015). Team size matters: Collaboration and scientific impact since 1900. Journal of the Association for Information Science and Technology, (7), 1323–1332.
Larsen, P. (2008). The state of the art in publication counting. Scientometrics, (2), 235–251.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics Doklady (Vol. 10, pp. 707–710).
Lin, C., Huang, M., & Chen, D. (2013). The influences of counting methods on university rankings based on paper count and citation count. Journal of Informetrics, (3), 611–621.
Liu, L., Yu, J., Huang, J., Xia, F., & Jia, T. (2020). China may need to support more small teams in scientific research. arXiv preprint arXiv:2003.01108.
López-Illescas, C., de Moya-Anegón, F., & Moed, H. F. (2008). Coverage and citation impact of oncological journals in the Web of Science and Scopus. Journal of Informetrics, (4), 304–316.
Man, J. P., Weinkauf, J. G., Tsang, M., & Sin, J. H. D. D. (2004). Why do some countries publish more than others? An international comparison of research funding, English proficiency and publication output in highly ranked general medical journals. European Journal of Epidemiology, (8), 811–817.
Mattsson, P., Sundberg, C. J., & Laget, P. (2011). Is correspondence reflected in the author position? A bibliometric study of the relation between corresponding author and byline position. Scientometrics, (1), 99–105.
Mazloumian, A., Helbing, D., Lozano, S., Light, R. P., & Börner, K. (2013). Global multi-level analysis of the ‘scientific food web’. Scientific Reports, (1), 1–5.
Milojević, S. (2014). Principles of scientific research team formation and evolution. Proceedings of the National Academy of Sciences, (11), 3984–3989.
Moya-Anegón, F., Guerrero-Bote, V. P., Bornmann, L., & Moed, H. F. (2013). The research guarantors of scientific papers and the output counting: A promising new approach. Scientometrics, (2), 421–434.
Nederhof, A. J., & Moed, H. F. (1993). Modeling multinational publication: Development of an on-line fractionation approach to measure national scientific output. Scientometrics, (1), 39–52.
Petersen, A. M., Pan, R. K., Pammolli, F., & Fortunato, S. (2019). Methods to account for citation inflation in research evaluation. Research Policy, (7), 1855–1865.
Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, (5), 056103.
Ranjbar-Sahraei, B., van Eck, N. J., & de Jong, R. (2018). Accuracy of affiliation information in Microsoft Academic: Implications for institutional level research evaluation. In STI 2018 Conference Proceedings: Proceedings of the 23rd International Conference on Science and Technology Indicators (pp. 1065–1067).
Rimmert, C., Schwechheimer, H., & Winterhager, M. (2017). Disambiguation of author addresses in bibliometric databases - technical report.
Selten, F., Neylon, C., Huang, C.-K., & Groth, P. (2020). A longitudinal analysis of university rankings.
Quantitative ScienceStudies , (3), 1109–1135.Shen, H.-W., & Barab´asi, A.-L. (2014). Collective credit alloca-tion in science. Proceedings of the National Academy of Sciences , (34), 12325–12330.Smolinsky, L., & Lercher, A. J. (2020). Co-author weighting in bib-liometric methodology and subfields of a scientific discipline. arXiv preprint arXiv:2005.05471 .Thelwall, M., Bailey, C., Makita, M., Sud, P., & Madalli, D. P.(2019). Gender and research publishing in india: Uniformlyhigh inequality? Journal of informetrics , (1), 118–131.van Eck, N. J., & Waltman, L. (2019). Accuracy of citation data inweb of science and scopus. arXiv preprint arXiv:1906.07011 .van Leeuwen, T. (2009). Strength and weakness of nationalscience systems: A bibliometric analysis through cooperationpatterns. Scientometrics , (2), 389–408.Vavryˇcuk, V. (2018). Fair ranking of researchers and researchteams. PloS one , (4), e0195509.Waltman, L., & van Eck, N. J. (2015). Field-normalized citationimpact indicators and the choice of an appropriate countingmethod. Journal of Informetrics , (4), 872–894.Wang, K., Shen, Z., Huang, C., Wu, C.-H., Dong, Y., & Kanakia,A. (2020). Microsoft academic graph: When experts are notenough. Quantitative Science Studies , (1), 396–413.Wang, X., Xu, S., Wang, Z., Peng, L., & Wang, C. (2013). Interna-tional scientific collaboration of china: Collaborating countries,institutions and individuals. Scientometrics , (3), 885–894.Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop andsmall teams disrupt science and technology. Nature , (7744),378–382.Zacharewicz, T., Lepori, B., Reale, E., & Jonkers, K. (2019).Performance-based research funding in eu member states—acomparative assessment. Science and Public Policy , (1), 105–115.Zeng, A., Shen, Z., Zhou, J., Wu, J., Fan, Y., Wang, Y., & Stanley,H. E. (2017). The science of science: From the perspective ofcomplex systems. Physics Reports , , 1–73. ACKNOWLEDGMENTS e thank Prof. 
Barabási at CCNR for giving access to the WoS data.

AUTHOR CONTRIBUTIONS