A paper's corresponding affiliation and first affiliation are consistent at the country level in Web of Science
Jianfei Yu, Chunxiao Yin∗, Linlin Liu, and Tao Jia∗
College of Computer and Information Science, Southwest University, Chongqing, 400715, P. R. China.
Keywords: first affiliation; corresponding affiliation; web of science; straight counting
ABSTRACT
Purpose:
The purpose of this study is to explore the relationship between the first affiliation and the corresponding affiliation at different levels via scientometric analysis.
Design/methodology/approach:
We select over 18 million papers in the core collection database of Web of Science (WoS) published from 2000 to 2015, and measure the percentage of match between the first and the corresponding affiliation at the country and institution level.
Findings:
We find that a paper’s first affiliation and corresponding affiliation are highly consistent at the country level, with over 98% of matches on average. However, the match at the institution level is much lower, and it varies significantly with time and country. Hence, for studies at the country level, using the first and corresponding affiliations gives almost the same results, but more caution is needed in selecting the affiliation when the institution is the focus of the investigation. Meanwhile, we find some evidence that the recorded corresponding information in the WoS database has undergone some changes since 2013, which sheds light on future studies on the comparison of different databases or the affiliation accuracy of WoS.
Research limitations:
Our finding relies on the records of WoS, which may not be entirely accurate.
Practical implications:
Given the scale of the analysis, our findings can serve as a useful reference for further studies when country allocation or institute allocation is needed.
Originality/value:
Existing studies on comparisons of straight counting methods usually cover a limited number of papers, a particular research field, or a limited range of time. More importantly, the counted numbers alone cannot sufficiently tell whether the corresponding and first affiliations are similar. This paper uses a metric similar to the Jaccard similarity to measure the percentage of the match and performs a comprehensive analysis based on a large-scale bibliometric database.
INTRODUCTION

When investigating the scientific productivity, scientific impact, or scientific development of a country, we need to first allocate papers to different countries (Gazni, Sugimoto, & Didegah, 2012; Gingras & Khelfaoui, 2018; Gonzalez-Brambila, Reyes-Gonzalez, Veloso, & Perez-Angón, 2016). This problem is partially related to how we “count” papers, giving rise to a rich body of studies on the counting method (Aksnes, Schneider, & Gunnarsson, 2012; Gauffriau, Larsen, Maye, Roulin-Perriard, & von Ins, 2008; M.-H. Huang, Lin, & Chen, 2011; Korytkowski & Kulczycki, 2019; Smolinsky & Lercher, 2020; Vavryčuk, 2018; Waltman & van Eck, 2015). Such studies become more important given the intensified international collaboration nowadays, through which scientific production is characterized not only by multiple institutes but also by multiple countries (Gul et al., 2015; Zacharewicz, Lepori, Reale, & Jonkers, 2019). Among the counting methods commonly applied, straight counting is the one that allocates the whole credit of a paper to a single entity (Gauffriau et al., 2008; M.-H. Huang et al., 2011; Lin, Huang, & Chen, 2013). In other words, the paper would belong to one single country or one institute among the multiple affiliations of the paper. Previous studies suggest that straight counting is preferred in professional and scientific bibliometrics operations, especially when dealing with large-scale literature data (M.-H. Huang et al., 2011; Larsen, 2008). The logic behind straight counting is that one of the most prominent affiliations (or authors) owns the whole paper (Hagen, 2014; Mattsson, Sundberg, & Laget, 2011).
Existing studies mainly consider two options: using the first affiliation (author) (Börner, Penumarthy, Meiss, & Ke, 2006; Gauffriau, Larsen, Maye, Roulin-Perriard, & von Ins, 2007; van Leeuwen, 2009) or the corresponding affiliation (author) (Man, Weinkauf, Tsang, & Sin, 2004; Mazloumian, Helbing, Lozano, Light, & Börner, 2013). The counting results of the two options are compared. Some previous studies find that counting results using the first affiliation and the corresponding affiliation are consistent in reflecting research productivity at the country level (M.-H. Huang et al., 2011; Waltman & van Eck, 2015). Nevertheless, these studies usually focus on a small fragment of paper data, covering only a limited research field or time range, which may not be conclusively generalized to other circumstances. More importantly, country allocation is related to applications more general than just counting (Petersen, Pan, Pammolli, & Fortunato, 2019). For instance, when studying the citation network of countries, we need to assign a country to each paper (Apolloni, Rouquier, & Jensen, 2013; Bornmann, Stefaner, de Moya Anegón, & Mutz, 2014; Hu, Wang, & Deng, 2020). It can happen that the counting results by the first or the corresponding affiliation are the same, but the country allocation by the first affiliation is entirely different from that by the corresponding affiliation.

In this study, we perform a comprehensive analysis using a large bibliometric database that contains over 18 million papers published from 2000 to 2015 on the Web of Science (WoS). Instead of counting, we measure the percentage of matches between the first affiliation and the corresponding affiliation. We find that the two affiliations of a paper are consistent at the country level, with over 98% of matches on average. Therefore, for studies at the country level, whether counting the number of papers or constructing citation networks, results based on the first or corresponding affiliation are almost the same.
The match at the institution level is lower, and it also varies significantly with time and country. Hence, we may need to be more cautious in selecting the affiliation when the institution is the focus of the investigation. Given the large scale of the analysis, our results can serve as a useful reference for further research when country allocation or institute allocation is needed. The analyses also reveal the existence of record changes in the WoS database, which bring variations in how the corresponding affiliation is recorded. This finding may shed light on future studies on the comparison of different databases or the affiliation accuracy of WoS data.

DATA AND METHOD

Data set.
We use the data in the Web of Science (WoS), which is a well-established database used for bibliometric analysis (Han et al., 2014). The data cover the Science Citation Index Expanded (SCIE) database, the Social Sciences Citation Index (SSCI) database, and the Arts & Humanities Citation Index (A&HCI) database. In total, we analyze over 18 million papers published from 2000 to 2015, which include articles, notes, reviews, letters, and conference proceeding papers.
Countries considered.
The most productive 16 countries, in terms of the number of scientific papers, are selected for our study. They are the United States (US), China (CN), the United Kingdom (GB), Germany (DE), Japan (JP), Italy (IT), France (FR), Canada (CA), India (IN), Korea (KR), Spain (ES), Australia (AU), Brazil (BR), Netherlands (NL), Turkey (TR), and Russia (RU). The total number of papers by these 16 countries is 18,432,794, covering over 76% of worldwide papers.
Address information.
WoS records the list of affiliations and the order of these affiliations for each paper. Starting in 2008, WoS also records the list of affiliations of each author. WoS specifically records the reprint affiliation of each paper, which is considered to be equivalent to the corresponding affiliation (Duffy, 2017; Fox, Ritchey, & Paine, 2018; González-Alcaide, Park, Huamaní, & Ramos, 2017; Kahn & MacGarvie, 2016; X. Wang, Xu, Wang, Peng, & Wang, 2013). In some very recent records, one paper may have multiple reprint affiliations. This is, however, not commonly observed in papers published during the period analyzed in this work. Almost all papers have only one reprint affiliation.
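For a single paper, the country-level comparison amounts to checking whether the first affiliation in the paper's address list and the reprint (corresponding) affiliation name the same country. A minimal sketch follows; the record layout is a hypothetical assumption for illustration only, not the actual WoS export format:

```python
# Hedged sketch: the dictionary layout below is hypothetical, chosen only
# to illustrate the comparison; real WoS exports use their own field names.

def countries_match(paper: dict) -> bool:
    """True if the paper's first affiliation and its reprint
    (corresponding) affiliation point to the same country."""
    first_country = paper["affiliations"][0]["country"]
    reprint_country = paper["reprint_affiliation"]["country"]
    return first_country == reprint_country

paper = {
    # affiliations are stored in the order given on the paper
    "affiliations": [
        {"name": "Southwest Univ", "country": "Peoples R China"},
        {"name": "Univ Colorado", "country": "USA"},
    ],
    "reprint_affiliation": {"name": "Southwest Univ",
                            "country": "Peoples R China"},
}
```

For this toy record the first and reprint affiliations agree at the country level even though a second, foreign affiliation is present.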
Comparing Institution Names.
The institution names in the metadata are usually not consistent (Donner, Rimmert, & van Eck, 2020; Rimmert, Schwechheimer, & Winterhager, 2017). One institution can be referred to in different ways related to different writing and coding rules (i.e., by the official, full institution name, and/or by varying forms of abbreviations). For example, “Tsinghua University” and “Tsinghua Univ” are the same institution, where the former is the full name and the latter is the abbreviated form. Likewise, two names with similar word composition may be related to two distinct institutions. For example, “Univ Colorado” and “Univ Colorado Denver” are different, with the latter being a branch of the former. In general, the name comparison is related to a broader and more challenging problem called institution name disambiguation (S. Huang, Yang, Yan, & Rousseau, 2014; Jacob, Javed, Zhao, & Mcnair, 2014). Fortunately, in this work, we only need to determine if two institutions are the same in one single paper. It is unlikely that a paper contains two distinct yet literally similar institutions as its reprint and first affiliation. Therefore, we do not need to solve the name disambiguation problem. Here, we measure the edit distance of the two institutions in the first and reprint affiliation. The edit distance, also called the Levenshtein distance, is defined as the minimum number of edits needed to transform one string into the other (Levenshtein, 1966). We set the threshold at 90%: two institution names with an edit similarity equal to or greater than this threshold are considered the same name.

Calculating the Match.
We use a metric similar to the Jaccard similarity to measure the percentage of the match between the first and corresponding affiliation. In particular, we have

P_i = |C_i ∩ F_i| / |C_i|,   (1)

where C_i is the set of papers whose corresponding affiliation is associated with i (which can be a country or an institution) and F_i is the set of papers whose first affiliation is associated with i. We also test the results by changing the denominator to |F_i| and |C_i ∪ F_i|. The conclusion does not change with such variations.

RESULTS
The match between the first affiliation and the corresponding affiliation at the country level
We first compare the country in the first and the corresponding affiliation of a paper. The statistics demonstrate a high consistency at the country level (Figure 1a). In 98.57% of all papers analyzed, the first and the corresponding affiliation point to the same country. China ranks first among the 16 countries in the percentage of the match P_ctry. The P_ctry values of all countries are high in different years. Although a sharp decline of P_ctry occurs in the year 2012, the lowest value is above 94% (Figure 1b). In general, we can conclude that the country in the first and the corresponding affiliation of a paper has a high percentage of match for different countries and in different years.

It is noteworthy that the label of the corresponding affiliation in the WoS may not be very accurate (M.-H. Huang et al., 2011; Moya-Anegón, Guerrero-Bote, Bornmann, & Moed, 2013). Indeed, for some countries we find a high percentage of papers (such as 76.34% in China and 72.8% in India) whose corresponding affiliation is also the first affiliation. This gives rise to a concern about the validity of our conclusion, as the high percentage of match can simply be a result of the high overlap between the first and the corresponding affiliation. For this reason, we focus on papers whose corresponding and first affiliations are different. The percentage of match decreases slightly, and P_ctry in different years also remains at a high level, with the lowest value being 90%. In other words, even when the first and corresponding affiliation are different, we still have at least 90% of match at the country level. The result supports our conclusion that the first and the corresponding affiliation are highly consistent at the country level.

Finally, both Figure 1b and Figure 1d show a sharp decline of P_ctry in the year 2013, which virtually splits the curve into two phases (2000-2012 and 2013-2015).
P_ctry in the first phase is higher (on average 98.96% in Figure 1b and 98.15% in Figure 1d) than that in the second phase (on average 96.90% in Figure 1b and 94.82% in Figure 1d). Something happens in 2013 that brings down the overall percentage of the match by roughly 3 percentage points. Although the overall consistency is high and not significantly affected by this decline, this phenomenon needs further exploration, which will be discussed in detail later.
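The match percentage behind these statistics is the metric of Eq. (1), which reduces to plain set operations. A minimal sketch with toy paper ids (our own illustration, not the authors' code):

```python
# Toy sketch of Eq. (1): P_i = |C_i ∩ F_i| / |C_i|, where C_i (F_i) is the
# set of papers whose corresponding (first) affiliation belongs to entity i.

def match_percentage(C_i: set, F_i: set) -> float:
    """Share of entity i's corresponding-affiliation papers whose first
    affiliation also belongs to i; 0.0 if C_i is empty."""
    return len(C_i & F_i) / len(C_i) if C_i else 0.0

# Three papers have their corresponding affiliation in entity i; two of
# them also have their first affiliation in i, so P_i = 2/3.
C_i = {"p1", "p2", "p3"}
F_i = {"p1", "p2", "p4"}
P_i = match_percentage(C_i, F_i)
```

Swapping the denominator for `len(F_i)` or `len(C_i | F_i)` reproduces the robustness variants mentioned in Data and Methods.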
Figure 1. (a)
The percentage of affiliation match at the country level P_ctry for all papers published during 2000 to 2015. The dashed line corresponds to the global average. (b) P_ctry for papers published in different years, which also remains at a high level. (c) The percentage of affiliation match at the country level P_ctry for all papers whose first and corresponding affiliations are different. The dashed line corresponds to the global average. (d) The change of P_ctry over time for papers published in a given year whose first and corresponding affiliations are different.

Match between the first affiliation and the corresponding affiliation at the institution level
Since there are thousands of institutions all over the world, it is impossible to show the results for each institution. Therefore, we use the average value grouped by the countries of the institutions. As shown above, the country information in the first and corresponding affiliation is highly consistent, so using either of them should give roughly the same results. In particular, let P_inst denote the percentage of papers with the first and the corresponding affiliation matched at the institution level in each country. We find that P_inst is lower than P_ctry, with a global average of 91.43% (Figure 2a). The P_inst also demonstrates a large variety among different countries. China is the highest with P_inst = 97.23% and Brazil is the lowest (86.63%), a difference of about 10 percentage points (Figure 2a).
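The institution-level comparison behind P_inst relies on the 90% edit-similarity rule described in Data and Methods. A minimal sketch of that rule (our own illustration, not the authors' code):

```python
# Sketch of the name-matching rule: two institution names are treated as
# the same if their normalized Levenshtein similarity is at least 0.9.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def same_institution(name1: str, name2: str, threshold: float = 0.9) -> bool:
    """Compare the first and reprint affiliation names of one paper,
    normalizing the edit distance by the longer name's length."""
    n1, n2 = name1.lower(), name2.lower()
    longest = max(len(n1), len(n2)) or 1
    return 1 - levenshtein(n1, n2) / longest >= threshold
```

Under this rule, "Univ Colorado" and "Univ Colorado Denver" are kept distinct (similarity 0.65), matching the example given in Data and Methods.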
Figure 2. (a)
The percentage of affiliation match at the institution level P_inst for all papers published during 2000 to 2015. The dashed line corresponds to the global average. (b) P_inst for papers published in different years, which is lower than P_ctry. (c) The percentage of affiliation match at the institution level P_inst for all papers whose first and corresponding affiliations are different. The dashed line corresponds to the global average. (d) The change of P_inst over time for papers published in a given year whose first and corresponding affiliations are different.

There is also a large change of P_inst over time. In particular, except for China and Russia, the P_inst of the other countries lies between 88% and 94%, and after 2012 the P_inst of most countries is below 90%. For Brazil and Korea, this value is even below 80%. The match at the institution level is not high in all countries in all years. Hence, allocation by the institution in the first affiliation may give different results compared with that by the institution in the corresponding affiliation. One may need to carefully examine the robustness of the conclusion if the analysis is at the institution level.

Similar to the analysis at the country level, we also conduct an analysis by excluding papers whose first affiliation is labeled as the corresponding affiliation. The results in Figure 2c indicate that P_inst is much lower, with 84.02% on average, and the P_inst of most countries lies between 80% and 90% (Figure 2d). In Turkey, as an extreme case, the lowest P_inst is only 42.2% in 2000. This further supports our conclusion that the first and corresponding affiliation are not consistent at the institution level.

It is noteworthy that the sharp decline from 2012 to 2013 observed at the country level is also found at the institution level. The drop is even larger: there is a 7 percentage point drop in Figure 2b on average and a 12 percentage point drop in Figure 2d. This further urges us to explore the cause of the sharp decline.

The sharp decline caused by the record change in WoS
Figure 3. (a)
The percentage of papers produced by international collaboration. The increase is gradual and steady with no sudden or significant changes. (b)
The percentage of papers whose first affiliation is included in the affiliation list of the first author. Before 2013, about 8% of papers have the first author not affiliated with the first affiliation. (c)
The percentage of papers whose reprint and first affiliation are the same. The value has a sharp decrease in 2013. (d)
We consider papers whose first author is also the corresponding author and measure the percentage of affiliation match P_ctry at the country level. P_ctry has a sharp decrease in 2013. (e, f) The percentage of affiliation match at the country level P_ctry and at the institution level P_inst for papers whose first author serves as the corresponding author. The statistics no longer demonstrate the sharp decline.

There are two reasons that seem capable of explaining the sharp decline. Since the decline is observed at both the country and the institution level, one possibility is that the number of internationally co-authored papers accelerated in 2012, which gives rise to a sudden decrease in the percentage of the matched countries. The other possibility is that the decline is simply caused by the manner in which WoS records the data. In other words, the label position of the corresponding affiliation has changed in the WoS since 2012.

To test the first hypothesis, we calculate the percentage of papers in our data that are produced by international collaboration. The internationally collaborated papers are those that contain affiliations in different countries (Gazni & Didegah, 2011; Iwami et al., 2020). We observe that the output by international collaboration demonstrates an increasing trend (Figure 3a), which is in line with previous studies (Gazni et al., 2012; Larivière, Gingras, Sugimoto, & Tsou, 2015). Nevertheless, the increase is gradual and steady. There is no sudden or significant change in the extent of international collaboration. Therefore, the decline observed in Figures 1 and 2 cannot be attributed to patterns of international collaboration.

For the second possibility, we indeed notice certain changes in the WoS data occurring in 2013 that can generate some drastic fluctuations in the statistics. For example, WoS records two types of address information. One is the affiliation list of the paper and the other is the affiliation list of each author.
Ideally, the affiliation list of the first author should contain the first affiliation of the paper (Larsen, 2008; Nederhof & Moed, 1993). But before the year 2013, there were roughly 8% of papers whose first affiliation is not included in the affiliation list of the first author (Figure 3b). The turning point appears in 2013; since then, the first author is almost always affiliated with the first affiliation (Figure 3b). Likewise, the fraction of papers whose reprint and first affiliation are the same increases with time, but the value has a sharp decrease in 2013 (Figure 3c). These sudden changes imply some updates in the WoS data set. However, the patterns observed in Figures 3b and 3c cannot explain the sharp decrease observed in Figures 1 and 2. When we remove papers whose first author is not affiliated with the first affiliation, the sharp decline still persists. In Figures 1b and 2b, we have already shown that P_ctry and P_inst suddenly decrease in 2013 when removing papers whose first and corresponding affiliation are the same.

What we find most relevant to the sharp decrease is the change in the records of the corresponding author. In the WoS data, the percentage of papers whose first author does not serve as the corresponding author increases smoothly with time. But if we focus on these kinds of papers, we can observe a sudden decrease in the percentage of papers whose first and corresponding affiliation are the same (Figure 3d). If we remove these papers in our analysis and consider only papers whose first author is also the corresponding author, the sharp declines in P_ctry and P_inst are no longer observed (Figures 3e and 3f). Therefore, we believe that it is the change in the corresponding author records that gives rise to the sudden drop of the matched affiliation at the country and the institution level.

CONCLUSION
To summarize, we analyze over 18 million papers in the WoS database published from 2000 to 2015. We find that a paper’s first affiliation and corresponding affiliation are highly consistent at the country level, with over 98% of the match on average. The extent of the match varies slightly when we focus on different years or consider only the circumstance in which the first and the corresponding affiliation are different. Nevertheless, the match remains at a high level, with the lowest value over 90%. The result is in line with previous findings that straight counting by the first and the corresponding affiliation gives rise to close numbers. But our result can be applied to more general applications. When allocating a country to a paper, using the first or the corresponding affiliation would yield roughly the same results. Considering the fact that the corresponding affiliation is not usually explicitly given (in the Microsoft Academic Graph, for example (Ranjbar-Sahraei, van Eck, & de Jong, 2018; K. Wang et al., 2020)), our finding can be a useful reference for future studies that require country allocation.

We also find that the match at the institution level is much lower. On average, about 10% of the time, one would get different results when allocating the institution by the first affiliation instead of the corresponding affiliation. The difference may not be significant when only the number of papers is concerned. But for extended studies such as the impact, the research behavior, and the collaboration patterns of different institutions, we need to be more cautious in deciding which institution a paper belongs to. At least, the robustness of the conclusion needs to be tested by different allocation methods.
This also raises interesting questions on university rankings (Abramo & D’Angelo, 2015; Chen et al., 2020; Lin et al., 2013; Selten et al., 2020), whose results rely on how the scientific output of different universities is grouped.

Finally, we observe some drastic changes in WoS records that bring a sharp decline in our measures. In particular, the change of corresponding author records gives rise to a lower match at the country and institution level. There are studies analyzing and comparing different data sets of publications (Adriaanse & Rensleigh, 2013; Aghaei Chadegani et al., 2013; Falagas, Pitsouni, Malietzis, & Pappas, 2008; López-Illescas, de Moya-Anegón, & Moed, 2008). Some studies also question the accuracy of WoS data in citations and topic classifications (Franceschini, Maisano, & Mastrogiacomo, 2016; Ranjbar-Sahraei et al., 2018; van Eck & Waltman, 2019). Except for a few works, however, the accuracy of the affiliation information is not well discussed. Our observation implies that the affiliation of a certain fraction of papers may not be accurately recorded in WoS before 2013. At least, some papers in the WoS may not have the correct corresponding information. The decrease in the statistics also implies that records after 2013 may have better accuracy than before. The potential errors in WoS data naturally raise concerns about the validity of our findings. If there are flaws in the data we analyzed, to what extent can we generalize the conclusion that a paper’s corresponding and first affiliation are consistent at the country level? Note, however, that some data sets may have better accuracy in certain records, but none of them are perfect. If we inevitably need to utilize imperfect data to perform extended and comprehensive research, we need to tolerate certain errors within it. From that perspective, we believe that our finding is still useful, at least for research relying on WoS data.
Based on what WoS tells us, the percentage of the match P_ctry is very high across different periods of time and different sets of papers considered. Our finding also provides a reference point if other data sets are considered. Given the size of the data analyzed, it is hard to manually check the accuracy of the affiliation records of WoS. It would be meaningful and interesting to find an automatic approach to perform a large-scale examination of the corresponding affiliation and author records in WoS data.

REFERENCES
Abramo, G., & D’Angelo, C. A. (2015). Evaluating university research: Same performance indicator, different rankings. Journal of Informetrics, (3), 514–525.
Adriaanse, L. S., & Rensleigh, C. (2013). Web of Science, Scopus and Google Scholar. The Electronic Library.
Aghaei Chadegani, A., Salehi, H., Yunus, M., Farhadi, H., Fooladi, M., Farhadi, M., & Ale Ebrahim, N. (2013). A comparison between two main academic literature collections: Web of Science and Scopus databases. Asian Social Science, (5), 18–26.
Aksnes, D. W., Schneider, J. W., & Gunnarsson, M. (2012). Ranking national research systems by citation indicators. A comparative analysis using whole and fractionalised counting methods. Journal of Informetrics, (1), 36–43.
Apolloni, A., Rouquier, J.-B., & Jensen, P. (2013). Collaboration range: Effects of geographical proximity on article impact. The European Physical Journal Special Topics, (6), 1467–1478.
Börner, K., Penumarthy, S., Meiss, M., & Ke, W. (2006). Mapping the diffusion of scholarly knowledge among major US research institutions. Scientometrics, (3), 415–426.
Bornmann, L., Stefaner, M., de Moya Anegón, F., & Mutz, R. (2014). Ranking and mapping of universities and research-focused institutions worldwide based on highly-cited papers. Online Information Review.
Chen, W., Zhu, Z., & Jia, T. (2020). The rank boost by inconsistency in university rankings: Evidence from 14 rankings of Chinese universities. Quantitative Science Studies (Just Accepted), 1–17.
Donner, P., Rimmert, C., & van Eck, N. J. (2020). Comparing institutional-level bibliometric research performance indicator values based on different affiliation disambiguation systems. Quantitative Science Studies, (1), 150–170.
Duffy, M. A. (2017). Last and corresponding authorship practices in ecology. Ecology and Evolution, (21), 8876–8887.
Falagas, M. E., Pitsouni, E. I., Malietzis, G. A., & Pappas, G. (2008). Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses. The FASEB Journal, (2), 338–342.
Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., . . . others (2018). Science of science. Science, (6379).
Fox, C. W., Ritchey, J. P., & Paine, C. T. (2018). Patterns of authorship in ecology and evolution: First, last, and corresponding authorship vary with gender and geography. Ecology and Evolution, (23), 11492–11507.
Franceschini, F., Maisano, D., & Mastrogiacomo, L. (2016). Empirical analysis and classification of database errors in Scopus and Web of Science. Journal of Informetrics, (4), 933–953.
Gauffriau, M., Larsen, P., Maye, I., Roulin-Perriard, A., & von Ins, M. (2007). Publication, cooperation and productivity measures in scientific research. Scientometrics, (2), 175–214.
Gauffriau, M., Larsen, P., Maye, I., Roulin-Perriard, A., & von Ins, M. (2008). Comparisons of results of publication counting using different methods. Scientometrics, (1), 147–176.
Gazni, A., & Didegah, F. (2011). Investigating different types of research collaboration and citation impact: A case study of Harvard University’s publications. Scientometrics, (2), 251–265.
Gazni, A., Sugimoto, C. R., & Didegah, F. (2012). Mapping world scientific collaboration: Authors, institutions, and countries. Journal of the American Society for Information Science and Technology, (2), 323–335.
Gingras, Y., & Khelfaoui, M. (2018). Assessing the effect of the United States’ “citation advantage” on other countries’ scientific impact as measured in the Web of Science (WoS) database. Scientometrics, (2), 517–532.
González-Alcaide, G., Park, J., Huamaní, C., & Ramos, J. M. (2017). Dominance and leadership in research activities: Collaboration between countries of differing human development is reflected through authorship order and designation as corresponding authors in scientific publications. PLoS ONE, (8), e0182513.
Gonzalez-Brambila, C. N., Reyes-Gonzalez, L., Veloso, F., & Perez-Angón, M. A. (2016). The scientific impact of developing nations. PLoS ONE, (3), e0151328.
Gul, S., Nisa, N. T., Shah, T. A., Gupta, S., Jan, A., & Ahmad, S. (2015). Middle East: Research productivity and performance across nations. Scientometrics, (2), 1157–1166.
Guo, J., Liu, X., Yang, L., & Wu, J. (2019). Are contributions from Chinese physicists undercited? Journal of Data and Information Science, (4), 84–95.
Hagen, N. T. (2014). Counting and comparing publication output with and without equalizing and inflationary bias. Journal of Informetrics, (2), 310–317.
Han, P., Shi, J., Li, X., Wang, D., Shen, S., & Su, X. (2014). International collaboration in LIS: Global trends and networks at the country and institution level. Scientometrics, (1), 53–72.
Hu, H., Wang, D., & Deng, S. (2020). Global collaboration in artificial intelligence: Bibliometrics and network analysis from 1985 to 2019. Journal of Data and Information Science, (ahead-of-print).
Huang, M.-H., Lin, C.-S., & Chen, D.-Z. (2011). Counting methods, country rank changes, and counting inflation in the assessment of national research productivity and impact. Journal of the American Society for Information Science and Technology, (12), 2427–2436.
Huang, S., Yang, B., Yan, S., & Rousseau, R. (2014). Institution name disambiguation for research assessment. Scientometrics, (3), 823–838.
Iwami, S., Shimizu, T., Empizo, M. J. F., Gabayno, J. L. F., Sarukura, N., Fujii, S., & Sumimura, Y. (2020). Current status and enhancement of collaborative research in the world: A case study of Osaka University. Journal of Data and Information Science, (4), 75–85.
Jacob, F., Javed, F., Zhao, M., & Mcnair, M. (2014). scool: A system for academic institution name normalization. In (pp. 86–93).
Jia, T., Wang, D., & Szymanski, B. K. (2017). Quantifying patterns of research-interest evolution. Nature Human Behaviour, (4), 1–7.
Kahn, S., & MacGarvie, M. (2016). Do return requirements increase international knowledge diffusion? Evidence from the Fulbright program. Research Policy, (6), 1304–1322.
Korytkowski, P., & Kulczycki, E. (2019). Publication counting methods for a national research evaluation exercise. Journal of Informetrics, (3), 804–816.
Larivière, V., Gingras, Y., Sugimoto, C. R., & Tsou, A. (2015). Team size matters: Collaboration and scientific impact since 1900. Journal of the Association for Information Science and Technology, (7), 1323–1332.
Larsen, P. (2008). The state of the art in publication counting. Scientometrics, (2), 235–251.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics Doklady (Vol. 10, pp. 707–710).
Lin, C., Huang, M., & Chen, D. (2013). The influences of counting methods on university rankings based on paper count and citation count. Journal of Informetrics, (3), 611–621.
Liu, L., Yu, J., Huang, J., Xia, F., & Jia, T. (2020). China may need to support more small teams in scientific research. arXiv preprint arXiv:2003.01108.
López-Illescas, C., de Moya-Anegón, F., & Moed, H. F. (2008). Coverage and citation impact of oncological journals in the Web of Science and Scopus. Journal of Informetrics, (4), 304–316.
Man, J. P., Weinkauf, J. G., Tsang, M., & Sin, J. H. D. D. (2004). Why do some countries publish more than others? An international comparison of research funding, English proficiency and publication output in highly ranked general medical journals. European Journal of Epidemiology, (8), 811–817.
Mattsson, P., Sundberg, C. J., & Laget, P. (2011). Is correspondence reflected in the author position? A bibliometric study of the relation between corresponding author and byline position. Scientometrics, (1), 99–105.
Mazloumian, A., Helbing, D., Lozano, S., Light, R. P., & Börner, K. (2013). Global multi-level analysis of the ‘scientific food web’. Scientific Reports, (1), 1–5.
Milojević, S. (2014). Principles of scientific research team formation and evolution. Proceedings of the National Academy of Sciences, (11), 3984–3989.
Moya-Anegón, F., Guerrero-Bote, V. P., Bornmann, L., & Moed, H. F. (2013). The research guarantors of scientific papers and the output counting: A promising new approach. Scientometrics, (2), 421–434.
Nederhof, A. J., & Moed, H. F. (1993). Modeling multinational publication: Development of an on-line fractionation approach to measure national scientific output. Scientometrics, (1), 39–52.
Petersen, A. M., Pan, R. K., Pammolli, F., & Fortunato, S. (2019). Methods to account for citation inflation in research evaluation. Research Policy, (7), 1855–1865.
Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, (5), 056103.
Ranjbar-Sahraei, B., van Eck, N. J., & de Jong, R. (2018). Accuracy of affiliation information in Microsoft Academic: Implications for institutional level research evaluation. In STI 2018 Conference Proceedings: Proceedings of the 23rd International Conference on Science and Technology Indicators (pp. 1065–1067).
Rimmert, C., Schwechheimer, H., & Winterhager, M. (2017). Disambiguation of author addresses in bibliometric databases - technical report.
Selten, F., Neylon, C., Huang, C.-K., & Groth, P. (2020). A longitudinal analysis of university rankings.
Quantitative ScienceStudies , (3), 1109–1135.Shen, H.-W., & Barab´asi, A.-L. (2014). Collective credit alloca-tion in science. Proceedings of the National Academy of Sciences , (34), 12325–12330.Smolinsky, L., & Lercher, A. J. (2020). Co-author weighting in bib-liometric methodology and subfields of a scientific discipline. arXiv preprint arXiv:2005.05471 .Thelwall, M., Bailey, C., Makita, M., Sud, P., & Madalli, D. P.(2019). Gender and research publishing in india: Uniformlyhigh inequality? Journal of informetrics , (1), 118–131.van Eck, N. J., & Waltman, L. (2019). Accuracy of citation data inweb of science and scopus. arXiv preprint arXiv:1906.07011 .van Leeuwen, T. (2009). Strength and weakness of nationalscience systems: A bibliometric analysis through cooperationpatterns. Scientometrics , (2), 389–408.Vavryˇcuk, V. (2018). Fair ranking of researchers and researchteams. PloS one , (4), e0195509.Waltman, L., & van Eck, N. J. (2015). Field-normalized citationimpact indicators and the choice of an appropriate countingmethod. Journal of Informetrics , (4), 872–894.Wang, K., Shen, Z., Huang, C., Wu, C.-H., Dong, Y., & Kanakia,A. (2020). Microsoft academic graph: When experts are notenough. Quantitative Science Studies , (1), 396–413.Wang, X., Xu, S., Wang, Z., Peng, L., & Wang, C. (2013). Interna-tional scientific collaboration of china: Collaborating countries,institutions and individuals. Scientometrics , (3), 885–894.Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop andsmall teams disrupt science and technology. Nature , (7744),378–382.Zacharewicz, T., Lepori, B., Reale, E., & Jonkers, K. (2019).Performance-based research funding in eu member states—acomparative assessment. Science and Public Policy , (1), 105–115.Zeng, A., Shen, Z., Zhou, J., Wu, J., Fan, Y., Wang, Y., & Stanley,H. E. (2017). The science of science: From the perspective ofcomplex systems. Physics Reports , , 1–73. ACKNOWLEDGMENTS e thank Prof. 
Barabási at CCNR for giving access to the WoS data.

AUTHOR CONTRIBUTIONS