Characterizing Research Leadership on Geographical Weighted Collaboration Network
Characterizing Research Leadership on Geographical Weighted Collaboration Network Chaocheng He a,b , Jiang Wu a* , Qingpeng Zhang b* a. School of Information Management, Wuhan University, Wuhan, Hubei, China. b.
School of Data Science, City University of Hong Kong, Kowloon, Hong Kong, China.
Abstract
Research collaborations, especially long-distance and cross-border collaborations, have become increasingly prevalent worldwide. Recent studies highlighted the significant role of research leadership in collaborations. However, existing measures of the research leadership do not take into account the intensity of leadership in the co-authorship network. More importantly, the spatial features, which influence the collaboration patterns and research outcomes, have not been incorporated in measuring the research leadership. To fill the gap, we construct an institution-level weighted co-authorship network that has two types of weight on the edges: the intensity of collaborations and the spatial score (the geographic distance adjusted by the cross-border nature). Based on this network, we propose a novel metric, namely the spatial research leadership rank (SpatialLeaderRank), to identify the leading institutions while considering both the collaboration intensity and the spatial features. Harnessing a dataset of 323,146 journal publications in pharmaceutical sciences during 2010-2018, we perform a comprehensive analysis of the geographical distribution and dynamic patterns of research leadership flows at the institution level. The results demonstrate that the SpatialLeaderRank outperforms baseline metrics in predicting the scholarly impact of institutions. And the result remains robust in the field of Information Science & Library Science.
Keywords: Social network analysis; Spatial scientometrics; Research leadership Corresponding authors: Jiang Wu ([email protected]); Qingpeng Zhang ([email protected]); All authors contributed equally to the study. . Introduction
In the “Big Science” era, research collaboration is playing an important role in knowledge creation, iteration, and dissemination (Katz and Martin 1997, Gazni, Sugimoto et al. 2012). Thanks to the increasing scale and complexity of scientific projects, we have witnessed a rapid growth in the frequency and influence of collaborative research (Wang, Xu et al. 2013, Chen, Zhang et al. 2019). This trend could be evidenced by the increase in research collaborations at a distance (Hoekman, Frenken et al. 2010, Bercovitz and Feldman 2011). It is important to gain an in-depth understanding of the trend towards collaborations at a longer distance and even cross-border for the following reasons. First, it enhances the quality of research by combining the expertise and resources and sharing the cost (Hoekman, Frenken et al. 2010). Compared with single-authored publications, collaborative work often leads to more novel and higher internal quality control (Fernandez, Ferrandiz et al. 2016). And these benefits increase as the distances between collaborators increases, as potential collaboration partners are more likely to be found (Hoekman, Frenken et al. 2010). Indeed, cross-border collaborative research tends to be of higher quality and greater impact compared with that of local collaborators (Guan, Zuo et al. 2016, Jiang, Zhu et al. 2018). Second, because of the aforementioned benefits, significant public policies and expenditures have been established and allocated to facilitate long-distance research collaborations (Hoekman, Frenken et al. 2010). For example, China’s 12th Five Year (2011-2015) Plan for Science and Technology Development stated that China will actively take part in international science and technology organizations and cross-border research collaborations (Wang and Wang 2017). Similarly, the European government attempts to construct a
European Research Area by coordinating regional, national and EU research activities (Hoekman, Frenken et al. 2010) to improve European states’ internal consistency and to break down barriers to innovate. Third, many countries and regions are making great efforts to attract overseas talents, who further facilitate the cross-border collaborations (Baruffaldi and Landoni 2012). For example, the 1000-talent program of China has attracted more than 7000 full-time and part-time senior scholars by 2018. Many are the leader in the field and 22.5% are non-Chinese researchers (Jia 2018). Within the first five years of the 1000-talent program, China has attracted more full professors than the ast 30 years combined . These talents coming from overseas bridged the gap between the research communities in two countries and greatly improved both the intensity and quality of cross-border collaborations. Existing literature on spatial research collaboration has two main limitations. First, most studies assume that the research collaborations are homogeneous while ignoring the existence of leadership in the collaboration. It is sensible that the first and corresponding author(s) play a leading role in the collaboration, and thus their relationship with other authors should be stronger than those among others (Wang, Wu et al. 2014). There are several studies examining the research leadership (Wang, Wu et al. 2014, González-Alcaide, Park et al. 2017). However, they either do not consider the social network structure of the co-authorship network (Chaocheng, Jiang et al. 2019) or ignores the intensity of collaborations while measuring the leadership (Zhou, Zeng et al. 2018). Second, collaboration relationship is assumed to be evenly distributed to authors of the same article, without considering distances between the institutions. Existing algorithms for identifying key research entities in collaborations mainly focus on the topological features of the entities in the collaboration network (Zhou, Zeng et al. 2018), without incorporating the spatial features into the network. Collaboration at a longer distance has a higher impact than that at a short distance (Hoekman, Frenken et al. 2010). Identifying important research entities in research collaboration from a spatial perspective becomes necessary and essential to knowledge creation as well as dissemination (Forman and van Zeebroeck 2019) (Wu 2013, Copiello 2019). To fill the above research gaps, we extend the literature as follows. First, we model the collaboration relationships as a directed network at the institution level, where the direction of an edge indicates the leadership flow between the two institutions. Second, we incorporate the geographical distance between institutions as a weight on the edge in the collaboration network. Third, based on the constructed geographical weighted collaboration network, we propose a novel metric, namely the spatial research leadership rank (SpatialLeaderRank), to identify the leading institutions while considering both the collaboration intensity and the spatial features. More specifically, an institution is considered with higher leadership status according to the following three criteria: (a) the institution frequently plays the corresponding rule in papers with other institutions; (b) the institution frequently plays the corresponding rule in longer distance and even http://kfq.job1001.com/ ross-border collaborations; (c) the participating institutions have high leadership status themselves (i.e. it is leading other leaders). We exemplify and validate the proposed SpatialLeaderRank metric using the journal publications in the pharmaceutical sciences, a field that has witnessed a dramatic increase in collaborations between multinational scientists in both academia and biotechnology sector because of the need for very diverse expertise from various disciplines and the rise of R&D outsourcing (Herrling 1998, McKelvey, Alm et al. 2003, Plotnikova and Rake 2014). We also examine the SpatialLeaderRank in the Information Science & Library Science as the robustness check. In the following sections, we review the literature in Section 2, and then describe the data and methods in Section 3. The results are presented in Section 4. We conclude the paper with discussions of the limitations and the implications for theory and policymaking in Section 5.
2. Literature review
The existing research on the spatial research collaboration mainly focuses on the spatial pattern of research collaboration and the role of geographical proximity on research collaboration. The spatial pattern of research collaboration has been systematically studied at multiple levels. A recent study illustrated the spatial patterns of cross-border knowledge flows and evaluates the effect of various factors including the geographical factor (Gui, Liu et al. 2018). The establishment of research alliances with more developed countries constituted a critical mechanism, which integrated developing countries into the global research community (González-Alcaide, Park et al. 2017). Researchers in small European states were found to be less homogenously collaborating with both domestic and foreign partners (Ukrainski, Masso et al. 2014). Research activities in Brazil were found to be spatially heterogeneous and a geographical decentralization process of scientific research activities across the country was needed to stimulate the development of those privileged areas (Sidone, Haddad et al. 2017). The role of geographical proximity has been explored in various fields. Plotnikova and Rake (2014) examined the country-level determinants in pharmaceutical research and found that the geographic distance was negatively associated with the cross-border research collaboration. In umanities, arts and social sciences, the geographical distance was found to be critical to the collaborative activities (Luo, Xia et al. 2018). Similar conclusions could be drawn from the field of ecology (Parreira, Machado et al. 2017), and immunology (Lander 2015), and so on. Research collaboration at a distance has become increasingly prevalent (Hoekman, Frenken et al. 2010, Bercovitz and Feldman 2011). In all countries and all academic fields, cross-border research collaboration tends to result in higher publishing rates and impact (Kwiek 2015). Researchers who collaborate at a distance are often better performing researchers (Jonsen, Butler et al. 2013, Wagner, Whetsell et al. 2019). Given the abundant evidence that the spatial features influence the research performance, existing research on collaboration networks did not account for the spatial patterns, and mainly assumed that the edges are homogeneous in the collaboration network.
It is commonly recognized that in the same publication, the first author and the corresponding author often lead the research collaboration and make a major contribution (Wang and Wang 2017). Typical instances include the field of biology and engineering. In the biomedical field, the first author is often an early-career researcher who is assigned to carry out the research and write the research paper. Simultaneously, other non-leading (participating) coauthors act more specialized roles, such as contributing statistical analyses or data visualization (Sekara, Deville et al. 2018). However, corresponding authors should be responsible for both scientific and non-scientific contributions such as designing the roadmap of a project, assigning research tasks, guiding the experiments, and checking the logic of the paper, for coordinating the completion and submission of the work (Wang and Wang 2017). The corresponding author’s responsibility is more prominent with the increase of collaboration scales, the growing complexity and depth of the research (Hemlin, Allwood et al. 2013). In most cases, the first and corresponding authors belong to the same organization (Wang, Wu et al. 2014). Hence, the corresponding author’s affiliation is considered the research leader of a research paper in this study. Chaocheng, Jiang et al. (2019) define that each paper possesses a total leadership mass of 1, and thus the intensity of the research leadership flow from the leading institution 𝑎 to institution 𝑏 in the paper 𝑖 , 𝑅𝐿 𝑎𝑏,𝑖 is obtained as 𝐿 𝑎𝑏,𝑖 = 1𝐿𝐼𝑁 𝑖 × 1𝑁 𝑖 , (1) where 𝐿𝐼𝑁 𝑖 is the number of leading institutions and 𝑁 𝑖 is the number of institutions in paper 𝑖 . The aggregated intensity of RL flow from institution a to institution 𝑏 in all their co-publications, 𝐶 𝑎𝑏 , is defined as follows, 𝐶 𝑎𝑏 = ∑ 𝐶 𝑎𝑏,𝑖𝑀 𝑎𝑏 𝑖=1 , (2) where 𝑀 𝑎𝑏 is the number of papers where 𝑎 is the leading institution and 𝑏 is a participating institution. Institution 𝑎 ’s total research leadership mass (the number of publications where institution 𝑎 is the leading institution), 𝐿𝑀 𝑎 , is defined as follows, 𝐿𝑀 𝑎 = ∑ 𝐶 𝑎𝑏𝐵𝑏=1 , (3) where 𝐵 is the number of institutions that 𝑎 has led. We refer the readers to (Chaocheng, Jiang et al. 2019) for details of the research leadership flow and research leadership mass. PageRank was originally proposed by Google to rank the importance of webpages (Brin and Page 1998). There is a boom in its variation and application in a broad set of fields in the following decades. PageRank has been widely applied to analyzing the research collaboration network. Liu, Bollen et al. (2005) transformed each undirected edge in the research collaboration network into a set of bi-directional, symmetrical edges, and defined modification of PageRank, namely the AuthorRank. Fiala, Rousselot et al. (2008) modified the PageRank by incorporating both citation and co-authorship graph property. Successful as it is, PageRank has several drawbacks (Gleich 2015). The stability of ranking and the robustness to noise and manipulation vary given different parameters (Lu, Chen et al. 2016). Moreover, if there are disconnected components in the network, the ranking result is not unique. To this end, Lu, Zhang et al. (2011) proposed the LeaderRank, an adaptive and nonparametric algorithm, by adding a ground node that bi-directionally connected to every other node and then performing random walks. As a result, LeaderRank has a faster convergence rate, igher stability for noisy data, and more robustness to manipulations (Li, Zhou et al. 2014). LeaderRank is further applied to identify the influential nodes in complex products and systems (Li, Wang et al. 2019); in power grids (Zhou, Lei et al. 2019); in manufacturing services (Wu, Peng et al. 2019). However, it is noteworthy that, all the previous variants of PageRank including LeaderRank only consider the topological features of nodes, while ignoring other non-topological features, especially, the spatial features that is found to be very important to the academic performance and impact.
3. Data and methodology 3.1. Data collection
We leverage the Web of Science Core Citation Database and perform a data collection to construct the geographical weighted and directed network. Specifically, following (Plotnikova and Rake 2014) we collect publications in categories ( “Biochemistry and Molecular Biology”, “Biotechnology and Applied Microbiology’’, “Chemistry, Applied”, “Chemistry, Medicinal”, “Medicine, Research and Experimental”, “Pharmacology and Pharmacy” and “Toxicology”) related to pharmaceutical study during 2010-2018. To check the robustness and have a comprehensive understanding, we also collect the publications in the category of “Information Science & Library Science (ISLS)” and compare the key results in these two very different fields. More precisely, the data was retrieved using the search term “WC = A AND PY = B”, where A is the above sub-categories and B it 2010-2018. We restrict publications of journal articles and exclude other non-journal publications such as meeting abstracts, letters, editorial materials or reviews. We further restrict the data and sample 323,146 publications in pharmaceutical sciences and 28,158 publications in ISLS that have at least two co-institutions. To construct the weighted and directed spatial research leadership network, we extract 2459 institutions in pharmaceutical sciences and 841 institutions in ISLS, which have been the primary affiliation of the corresponding author for at least one paper (with multiple institutions) in each year. We utilize Google Map to obtain the latitude, longitude, and the country of each institution. .2. Geographical weighted and directed network
Spatial distance is one of the classic proximities for research collaborations (Boschma 2005). There are plenty of evidence that the spatial distance is a hinder determinant to the formation of research collaborations (Fernandez, Ferrandiz et al. 2016, Zhang and Guo 2017). Recently, cross-border collaborations are becoming increasingly popular. Cross-border collaboration brings unique benefits (Jiang, Zhu et al. 2018), including the better access to international data (Jonsen, Butler et al. 2013), the higher tendency to stimulate new ideas (Ellis and Zhan 2011), and increased international visibility and impact (Kwiek 2015). Therefore, in this study, we propose to measure the spatial proximity using the following spatial score, which takes an additive form of both the distance and the cross-border nature. For a publication 𝑖 , the spatial score of leading institution 𝑎 and participating institution 𝑏 pair is defined as: 𝑆𝑃𝑆 𝑖,𝑏𝑎 = 𝑙𝑔(𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑏𝑎 ) + 𝜆𝐶𝑜𝑢𝑛𝑡𝑟𝑦 𝑏𝑎 , (4) where 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑏𝑎 denotes the geographical distance between institutions 𝑏 and 𝑎 . 𝐶𝑜𝑢𝑛𝑡𝑟𝑦 𝑏𝑎 is a dummy variable indicating if 𝑏 and 𝑎 are from different countries. 𝜆 represents the relative importance of the cross-border nature of the collaboration. 𝜆 is a field-specific parameter because of the varying occurrence of the cross-border collaboration for different research fields. The value of 𝜆 is obtained by calculating the ratio between the coefficient of 𝐶𝑜𝑢𝑛𝑡𝑟𝑦 𝑏𝑎 and the coefficient of 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑏𝑎 in the gravity model (Fernandez, Ferrandiz et al. 2016, Zhang and Guo 2017), as described by the following equation: 𝐼 𝑎𝑏 = 𝛽 + 𝛽 𝑙𝑔 𝑃𝑢𝑏𝑚𝑎𝑠𝑠 𝑎 + 𝛽 𝑙𝑔 𝑃𝑢𝑏𝑚𝑎𝑠𝑠 𝑏 + 𝛽 𝑙𝑔 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑎𝑏 + 𝛽 𝐶𝑜𝑢𝑛𝑡𝑟𝑦 𝑎𝑏 + ∑ 𝛽 𝑘 𝑙𝑔𝑠 𝑘 + 𝜖 𝑎𝑏𝐾𝑘=5 , (5) where 𝐼 𝑎𝑏 denotes the collaboration intensity, measured by the number of co-publications, 𝑃𝑢𝑏𝑚𝑎𝑠𝑠 𝑎 and 𝑃𝑢𝑏𝑚𝑎𝑠𝑠 𝑏 denote the number of previous publications of institutes 𝑎 and 𝑏 , respectively. 𝑠 𝑘 denotes other dimensions of proximity, including cognitive proximity, social proximity and economic proximity. The gravity model results showed that, during 2010-2018, 𝜆 was 1.32 and 3.23 in pharmaceutical sciences and ISLS, respectively, indicating that the hindering effect of cross-border nature in pharmaceutical sciences was lower than that in ISLS. We can also obtain the evolution of 𝜆 by running the gravity model for each year. Figure 1 shows the yearly volution of 𝜆 during 2010-2018. The value of 𝜆 for pharmaceutical sciences was stable, with a lightly decreasing trend, indicating that the hindering effect of cross-border nature has been declining over time. While the value of 𝜆 for ISLS oscillated (ranging between 2.76 and 3.77). Figure 1 The values of 𝜆 for pharmaceutical sciences and ISLS during 2010-2018. Therefore, the longer the geographical distance is, the smaller the probability of collaboration and the higher the spatial score are. Meanwhile, the cross-border nature positively influences the spatial score. The spatial score of a publication 𝑖 is the average spatial score of the leading institution and participating institution pairs, which can be expressed as 𝑆𝑃𝑆 𝑖 = 1𝑁 𝑎 × 1𝑁 𝑏 × ∑ ∑ 𝑆𝑃𝑆 𝑖,𝑏𝑎𝑁 𝑏 𝑏 , 𝑁 𝑎 𝑎 (6) where 𝑁 𝑏 is the number of institutions and 𝑁 𝑎 is the number of leading institutions (i.e. more than one correspondence institutions). For a publication 𝑖 , we define the spatial research leadership flow intensity from participating institution 𝑏 to leading institution 𝑎 as follows , In (Chaocheng, Jiang et al. 2019), the direction of research leadership is from the leading institution to the participating institution. Differently, to better depict the process of gaining the leadership from leading others, the direction is defined to be from the participating institution to the leading institution. This definition also makes the following calculation more intuitive and consistent with the network science literature. 𝐿 𝑖,𝑏𝑎 = 𝑆𝑃𝑆 𝑖 × 1𝑁 𝑎 × 1𝑁 𝑏 , (7) The total spatial research leadership flow intensity from 𝑏 to 𝑎 is calculated as 𝑆𝐿 𝑏𝑎 = ∑ 𝑆𝐿 𝑖,𝑏𝑎𝑁 𝑏𝑎 𝑖=1 , (8) where 𝑁 𝑏𝑎 is the number of papers where 𝑎 is the leading institution and 𝑏 is a participating institution. Now, we can construct a geographical weighted and directed network where institutions are nodes and research leadership flows are directed edges. The weight on the edge represents the spatial research leadership flows’ intensities between the two institutions. Figure 2 illustrates the construction of the network. Publication 1
Author 1, Institution a , Country A Author 2, Institution b , Country A Author 3, Institution c , Country B Author 4, Institution a , Country A Author 5, Institution d , Country C Corresponding AuthorAuthor 4, Institution a , Country A Publication 2Author 3, Institution c , Country B Author 4, Institution a , Country A Author 5, Institution d , Country C Author 2, Institution b , Country A Corresponding AuthorAuthor 3, Institution c , Country B Publication 3Author 1, Institution a , Country A Author 3, Institution c , Country B Corresponding AuthorAuthor 1, Institution a , Country A a bc dDistance ba =100SL ba =0.96Distance dc =1000 Figure 2. An illustration of a hypothetical geographical weighted and directed network. .3. Spatial research leadership rank
LeaderRank is a simple variant of PageRank but has been widely proved to outperform PageRank regarding ranking effectiveness with good robustness (Li, Zhou et al. 2014).
In LeaderRank, a ground node, which bi-directionally connects to every other node, is added to the existing network with 𝑁 nodes and 𝑀 weighted directed edges. Figure 3 is an illustration of adding the ground node to the weighted and directed spatial research leadership network. Thus, the network consists of 𝑁 + 1 nodes and
𝑀 + 2𝑁 edges, and forms a strongly connected network. LeaderRank performs a standard random walk process to rank every node. Successful as it is, LeaderRank only considers the topological features of the collaboration network, while ignoring other non-topological features, especially, the spatial features, which have been recognized as important factors of the academic performance and research impact. Therefore, we propose a new metric, namely the SpatialLeaderRank, to incorporate the spatial features into the measure of research leadership. Specifically, the SpatialLeaderRank of node 𝑎 at the time step 𝑡 denoted as 𝑆𝑝𝑎𝑡𝑖𝑎𝑙𝐿𝑒𝑎𝑑𝑒𝑟𝑅𝑎𝑛𝑘 𝑎 (𝑡) . Thus, the dynamics of SpatialLeaderRank is described by the following iterative process, 𝑆𝑝𝑎𝑡𝑖𝑎𝑙𝐿𝑒𝑎𝑑𝑒𝑟𝑅𝑎𝑛𝑘 𝑎 (𝑡 + 1) = ∑ [ 𝑆𝐿 𝑏𝑎 ∑ 𝑆𝐿 𝑏𝑐𝑁+1𝑐=1,𝑐≠𝑏 × 𝑆𝑝𝑎𝑡𝑖𝑎𝑙𝐿𝑒𝑎𝑑𝑒𝑟𝑅𝑎𝑛𝑘 𝑏 (𝑡)] 𝑁+1𝑏=1,𝑏≠𝑎 , (9) where 𝑆𝐿 𝑏𝑎 is the spatial research leadership flow intensity from institution 𝑏 to 𝑎 . If there is no research leadership from b to c , 𝑆𝐿 𝑏𝑐 = 0 . The spatial research leadership flow intensity from other institutions to the ground node and from the ground node to other institutions is set to 1 (Lu, Zhang et al. 2011). The process starts with the initialization where all institutions’ SpatialLeaderRank being equal to 1. According to the iterative process described by equation (9), the SpatialLeaderRank value will converge to a unique and steady-state 𝑆𝑝𝑎𝑡𝑖𝑎𝑙𝐿𝑒𝑎𝑑𝑒𝑟𝑅𝑎𝑛𝑘 𝑎 (∞) , ( 𝑎 = 1,2, … , 𝑁, 𝑁 + 1 ). We rank all institutions according to 𝑆𝑝𝑎𝑡𝑖𝑎𝑙𝐿𝑒𝑎𝑑𝑒𝑟𝑅𝑎𝑛𝑘 𝑎 (∞) . eight ba =SL ba = 0.96 W e i gh t g a = W e i gh t a g = Weight gc = Weight cg = 1 Weight gd = Weight dg = 1Weight dc =SL dc =1.11a bc dGround Node g W e i gh t gb = W e i gh t bg = Figure 3 An illustration of adding the ground node to the geographical weighted and directed network to calculate SpatialLeaderRank.
4. Result and analysis
For simplicity, we mainly introduce the results for pharmaceutical sciences, and then summarize the results for ISLS in sub-section 4.2.4.
Figure 4. (a) The yearly boxplot of the distance of research leadership flow during 2010-2018. (b) The overall distribution of the distance of research leadership flow during 2010-2018. Red dots denote the data. The green dashed line is the fit of the power-law distribution.
Figure 5 The power-law distribution of the distance of research leadership flow during 2010-2018.
Figure 4 (a) presents the yearly boxplot of the distance of research leadership flow during 2010-2018, with the blue dotted line representing the median and red dots representing the mean. The increasing trend of both the median and mean indicate that, in general, the research leadership flow distance has been increasing steadily over time. Institutions are leading or being led by institutions located far away with an increasing frequency. Similarly, the distribution of research leadership flows exhibits a long-tail distribution (see Figure 4 (b)), with most research leadership flows occurring among institutions that are geographically close to each other, and in the meantime, the distance of research leadership flows can be very long (cross continent) for a small amount of cases. The data can well fit a power-law distribution
𝑃(𝑘) ∝ 𝑘 −𝛼 , where 𝑘 is the distance and 𝛼 is the exponential parameter as shown in Figure 5. The value of 𝛼 was decreasing over time (from 1.13 in 2010 to 1.10 in 2018), indicating that the decaying speed was increasing. To sum, the long-distance research leadership flow has become increasingly popular during 2010-2018. Figure 6.
The kernel density heat map of the distribution of research leadership mass at the regional level.
To present the change in the research leadership mass in pharmaceutical sciences, we visualize the geographical distribution of research leadership mass in Figure 6, the kernel density heat map of global research leadership mass in two split time periods, 2010-2014 and 2015-2018. The kernel density estimation smooths the spatial coordinates to generate a probability density surface of a set of point locations. For the detailed calculation of the kernel density estimation, please refer to (Downs and Horner 2012). We can identify three main research leadership mass clusters, Northeastern United States, European Union, and Northeastern Asia. More specifically, during 2010-2014, the cluster of European Union covered the largest area and the Northeastern United States had the highest density. In the Northeastern Asia cluster, Japan and Republic of orea led the region. In the Greater China Region, the research leadership mass was mainly distributed in the eastern part. In general, during 2010-2014, the pharmaceutical sciences research was dominated by the most economically developed countries. Differently, during 2015-2018, although these developed countries were still playing the prominent role in leading the pharmaceutical research, Eastern Asia has emerged as a key leader, with multiple significant clusters in China, Japan and Republic of Korea. In particular, the Yangtze River Delta Region (the densest region in the figure, including Shanghai, Jiangsu, Anhui and Zhejiang Provinces), the Jing-Jin-Ji Metropolitan Region (including Beijing, Tianjin and Hebei Provinces), and Sichuan Province have emerged as prominent leaders in pharmaceutical research, because the finest universities and medical schools locate in these regions. Other developing countries also had great improvement, especially India and Iran. In general, the distribution of research leadership mass has become more balanced between western and eastern countries. Figure 7 intuitively presents the detailed evolution of research leadership mass over year with a bump graph, where the width of the bump is proportional to the research leadership mass of the corresponding country or region. We present the top 15 countries/ regions in terms of their research leadership mass. At the first glance, the bump lines of all countries/regions were widening, indicating that the research leadership masses of all were growing during 2010-2018. In line with Figure 6, some significant changes were underway. Mainland China has taken the
Figure 7.
The bump graph of research leadership mass over the year at the country level. .2. Effectiveness analysis of SpatialLeaderRank
To evaluate the effectiveness of the proposed SpatialLeaderRank, we examine the correlation between the SpatialLeaderRank and other conventional indices (indices of collaboration network and publication number). We then perform the receiver operating characteristic (ROC) curve analysis to evaluate the ability of SpatialLeaderRank and other conventional indices to identify the top 5% institutions with high academic impact. We further use
Ksim (Haveliwala 2003) to measure the similarity between each indices’ ranking results and the academic impact rank. Last, we also compare the detailed rank of institutions in terms of SpatialLeaderRank, other conventional indices, and academic impact indices. The conventional indices include PageRank, betweenness centrality, closeness centrality, indegree centrality, and publication number (Wu 2013, Kim and Diesner 2015). And the academic impact indices include citation count, citation-based h-index, altmetrics count, altmetrics-based h-index.
We present the scatter plot between the SpatialLeaderRank and other conventional indices in Figure 8, and the Spearman’s correlation coefficients in Table 1. We find that all pairs of indices are positively correlated. In addition, except for the pair of closeness and other indices, all other pairs of indices are linearly correlated. More specifically, the Spearman’s correlation coefficients for all pairs of indices are larger than 0.833 ( 𝑝 ≤ 0.001 ), indicating that all the indices are highly positively correlated with each other. The correlation between the SpatialLeaderRank and LeaderRank is 0.992, indicating that the rank of institutions is identical for the majority of institutions. This is expected because most research collaborations took place among local institutions (see Figure 4b). The difference is mainly among those highly ranked institutions, and cannot be well captured by the correlation coefficient. We will find the difference between them in Section 4.2.2. Another intuitive observation is that the number of publications has the second-highest correlation with SpatialLeaderRank. It is worth noting that the actual causality cannot be revealed by the correlation analysis. It is interesting to explore the causality between these indices in the future.
Figure 8. The pairwise relationships between SpatialLeaderRank and other indices. Table 1. Spearman correlation coefficient between SpatialLeaderRank and other indices.
SpatialLeaderRank LeaderRank Publication Indegree Betweenness Closeness PageRank SpatialLeaderRank 1.000 0.992 *** *** *** *** *** ***
LeaderRank 0.992 *** *** *** *** *** ***
Publication 0.936 *** *** *** *** *** ***
Indegree 0.904 *** *** *** *** *** ***
Betweenness 0.875 *** *** *** *** *** ***
Closeness 0.858 *** *** *** *** *** ***
PageRank 0.870 *** *** *** *** *** *** *** p<0.001;
We evaluate the effectiveness of the SpatialLeaderRank by using it to predict institutions’ academic impact, which is measured by four common indices. We also compare its performance with that of conventional indices. Citation count is a widely recognized measure of academic impact (Yan and Ding 2011). However, simple citation count is not robust against manipulations Hirsch 2005). To this end, Hirsch (2005) proposed h-index, the maximum value of h papers being cited at least h times for each entity (author, journal, institution, etc.). H-index combines the quantity and quality of publications and has been widely adopted by the scientific community (Lund 2019). Recently, Altmetrics indices have emerged as popular tools to measure academic impact because they are less subject to the publication delays than citation. Similarly, the simple Altmetrics count is not robust against manipulations either. To this end, Askeridis (2018) proposed an Altmetrics-based H-index, namely, Mendeley-based H-index, which replaces the citation count with the view count of Mendeley readers. Mendeley is a free reference manager and academic social network. The view count of Mendeley users has been found to be associated with the later citation count (Aduku, Thelwall et al. 2017), because the view count is crowdsourced from the research community, and thus reflects the academic impact earlier than the citation count. In this study, we have two variants of the H-index, the citation-based index, and the Altmetrics-based H-index. Both of them wouldn’t be large if an entity publishes either many publications with low citation/Altmetrics or very few publications with high citation/Altmetrics. In total, four evaluation metrics (citation, citation-based h-index, Altmetrics, and Altmetrics-based h-index) are employed as the model outcome to evaluate the effectiveness of SpatialLeaderRank. We use the ROC curve analysis to examine the capability of SpatialLeaderRank in identifying top institutions with high academic impact (top 5% in terms of citation, citation-based h-index, Altmetrics, and Altmetrics-based h-index) and compare its performance with that of other conventional indices. ROC curve analysis is a graphic method to illustrate the performance of a binary classification system under different recognition thresholds (Hassan, Imran et al. 2017, Zhang, Wang et al. 2019). The larger AUC (area under the ROC curve) is, the better performance the focal binary classification system has. Figure 9 presents the ROC curve of the SpatialLeaderRank and other indices for identifying the top 5% institutions with high academic impact in terms of citation, citation-based h-index, Altmetrics, and Altmetrics-based h-index. Figure 9 ROC curves for
SpatialLeaderRank and other indices to identify top institutions with high academic impact.
As shown in Figure 9, all the indices have a reasonably high AUC (over 0.900) in terms of all the four evaluation metrics, indicating the generally good performance of all the indices in identifying the top institutions with high academic impact. Note that the high AUC value is expected because of the majority (95%) of institutions are classified as not influential. The roposed SpatialLeaderRank consistently outperforms other indices with all measures of academic impact. This indicates that integrating the spatial features does help to improve the capability in identifying top institutions with high academic impact. The larger SpatialLeaderRank score an institution has, the greater the leadership status that it has, and the higher academic impact that it generates. It’s noteworthy that the AUC gaps between SpatialLeaderRank and other indices are more significant in h-index (citation-based/altmetrics-based) than that in citation/altmetrics counts. This is due to the fact that h-index (both the citation-based/altmetrics-based) is more robust than citation/ altmetrics, and can more closely reflect the academic impact. SpatialLeaderRank has more advantages in identifying top institutions with high academic impact compared with other conventional indices. The ROC curve analysis only examines the overlap of top-k ranked institutions according to different indices, as it considers these top-k ranked institutions as an unordered set). Thus, it is meaningful to adopt another metric to further examine the relative ordering of the top-k ranked institutions. In this study, we adopt the
KSim metric (Haveliwala 2003) which is based on Kendall’s 𝜏 distance measure. Consider a ranking list by an index 𝜏 and a ranking list by an academic impact evaluation metric 𝜏 . Let U be the union of the institutions in 𝜏 and 𝜏 . Let 𝜎 be 𝑈 − 𝜏 and 𝜏 be the extension of 𝜏 , where 𝜏 contains 𝜎 in addition to the existing ranked institutions in 𝜏 . The rank of institutions in 𝜎 is set to have the same ordinal rank in the end of 𝜏 . Similarly, 𝜏 is extended to yield 𝜏 . The Ksim between 𝜏 and 𝜏 is defined as the Kendall’s distance between 𝜏 and 𝜏 , respectively: 𝐾𝑆𝑖𝑚(𝜏 , 𝜏 ) = |(𝑢, 𝑣): 𝜏 , 𝜏 agree on order of (𝑢, 𝑣), 𝑢 ≠ 𝑣|(|𝑈|)(|𝑈| − 1) , (10) In other words,
KSim measures the probability of 𝜏 and 𝜏 ’s agreement on the relative ordering of a randomly selected pair of institutions (𝑢, 𝑣) ∈ 𝑈 × 𝑈. Figure 10.
KSim between each index and four evaluation metrics from top 5 to top 500.
Figure 10 visualizes the
KSim curve between each index and the four academic impact evaluation metrics from the top 5 institutions to the top 500 institutions. In general, SpatialLeaderRank consistently outperforms all other indices. In particular, when N is small, the effect of extreme cases is more significant. In such a case, the advantage of SpatialLeaderRank over other indices is more obvious, indicating that SpatialLeaderRank is more robust to rare extreme cases. As N increases, all indices’ Ksim increase and become stable. Betweenness centrality has the worst performance according to
KSim in all evaluation metrics. Betweenness centrality of an institution is proportional to the number of shortest paths traversing through the institution. The unsatisfactory performance of betweenness centrality is due to the fact that most of the top-ranked institutions by betweenness centrality are actually the local hubs that are connecting to local institutions. Publication count and indegree have similar performance. Interestingly, we find that if the ublications and leading behaviors are mostly associated with local institutions, the academic impact does not get much improved. It’s interesting that closeness centrality has a relatively high
Ksim in altmetrics and particularly, altmetrics-based H-index. In altmetrics-based H-index, when N is larger than 300, closeness centrality’s performance is close to the SpatialLeaderRank, primarily because institutions with high closeness centrality have shorter distance from other institutions, so the spread of research output is faster through the collaboration network. We rank the institutions according to SpatialLeaderRank, the other conventional indices (LeaderRank, piublication, indegree, betweenness, closeness, and PageRank) and academic impact indices (citation, citation-based h-index, Altmetrics, and Altmetrics-based h-index). Table 2 presents the top 20 institutions. A larger table of top 100 institutions has been posted on our Github site . Table 3 presents the rank of the SpatialLeaderRank-based top 20 institutions in other rankings. The top institutions ranked by SpatialLeaderRank are also highly ranked according to all the four academic impact indices. This strong association indicates that leading long-distance and cross-border collaborations generally lead to greater academic impact. Four institutions (Harvard University, University of Oxford, University of Cambridge and University of California San Diego) are among the top 20 according to all indices. Particularly, Harvard University is ranked as the top one institution according to four academic impact indices, and top three according to the other indices. Chinese Academy of Sciences (CAS) and French National Centre for Scientific Research (French: Centre national de la recherche scientifique, CNRS) are two state research organizations in China and France, respectively. These two mega organizations have multiple research institutes, and have published many papers in the field ( https://github.com/chaochehe/Top-100-institutions or further improvement. Similarly, we found that many other Chinese institutions are highly ranked in terms of the conventional indices, however, neither their SpatialLeaderRank nor academic impact is highly ranked. Their publications don’t match their academic impact status, suggesting that Chinese institutions should focus on improving the quality of research instead of quantity only. a b l e . T op i n s tit u ti on s ’r a nk by S p a ti a l L ea d e r R a nk a nd by o t h e r i nd i ce s . S p a ti a l L ea d e r R a nk C h i n e s e A ca d S c i H a r v a r d U n i v C N R S U n i v O x f o r d U n i v T okyo U n i v C a m b r i dg e U n i v T o r on t o U C S D U n i v C op e nh a g e n U C SF N C I U n i v M i c h i g a n U n i v W a s h i ng t on J ohn s H opk i n s U n i v S t a n f o r d U n i v U n i v P e nn D uk e U n i v K yo t o U n i v U n i v I lli no i s U n i v N C a r o li n a “ C A M S ” : C h i n e s e A c ad e m y o f M e d i c a l S c i e n ce s ; “ CN R S ” : C e n t r e na ti ona l d e l a r ec h e r c h e s c i e n tifi qu e ( F r e n c h N a ti ona l C e n t r e f o r S c i e n tifi c R e s e a r c h ); “ C S I C ” : C on s e j o Sup e r i o r d e I n ve s ti ga c i on e s C i e n tífi c a s ( Span i s h N a ti ona l R e s e a r c h C oun c il ); “ HH M I ” : H o w a r d H ugh e s M e d i c a l I n s tit u t e ; “ M GH ” : M a ss a c hu s e tt s G e n e r a l H o s p it a l ; “ M I T” : M a ss a c hu s e tt s I n s tit u t e o f T ec hno l og y ; “ NC I ” : N a ti ona l C an ce r I n s tit u t e , U S A ; “ S J T U ” : Shangha i J i ao T ong U n i ve rs it y ; “ U C L A ” : U n i ve rs it y o f C a lif o r n i a , L o s A ng e l e s ; “ U C S D ” : U n i ve rs it y o f C a lif o r n i a San D i e go ; “ U C S F ” : T h e U n i ve rs it y o f C a lif o r n i a , San F r an c i s c o ; “ U C L” : U n i ve rs it y C o ll e g e L ondon ; “ U C L A ” : U n i ve rs it y o f C a lif o r n i a , L o s A ng e l e s ; “ U N A M ” : N a ti ona l A u t ono m ou s U n i ve rs it y o f M ex i c o ; L ea d e r R a nk C h i n e s e A ca d S c i H a r v a r d U n i v C N R S U n i v T okyo U n i v T o r on t o U n i v C op e nh a g e n C A M S K yo t o U n i v U C S D U n i v O x f o r d U n i v M i c h i g a n U C SF J ohn s H opk i n s U n i v U n i v C a m b r i dg e U n i v P e nn R u ss i a n A ca d S c i U n i v S a o P a u l o U n i v W a s h i ng t on S e ou l N a tl U n i v O s a k a U n i v P ub li ca ti on C h i n e s e A ca d S c i H a r v a r d U n i v R u ss i a n A ca d S c i U n i v S a o P a u l o S J T U U n i v T okyo C N R S U n i v T o r on t o S e ou l N a tl U n i v Z h e ji a ng U n i v F ud a n U n i v U n i v C op e nh a g e n U C S D U n i v M i c h i g a n U n i v O x f o r d P e k i ng U n i v U n i v I lli no i s U n i v N C a r o li n a S un Y a t S e n U n i v U n i v C a m b r i dg e I nd e g r ee C h i n e s e A ca d S c i H a r v a r d U n i v U n i v M i c h i g a n U C S D U n i v O x f o r d U n i v C a m b r i dg e U n i v C op e nh a g e n U n i v I lli no i s U n i v S a o P a u l o Z h e ji a ng U n i v C S I C U n i v T okyo J ohn s H opk i n s U n i v P e k i ng U n i v U n i v N C a r o li n a S t a n f o r d U n i v U n i v M a r y l a nd S J T U U n i v P itt s bu r gh U n i v F l o r i d a B e t w ee nn e ss C h i n e s e A ca d S c i H a r v a r d U n i v U n i v S a o P a u l o U n i v T okyo C S I C U n i v O x f o r d K i ng S a ud U n i v U n i v C a m b r i dg e K yo t o U n i v U n i v M i c h i g a n U n i v C op e nh a g e n U C S D Z h e ji a ng U n i v U n i v I lli no i s K a r o li n s k a I n s t C N R S J ohn s H opk i n s U n i v N C I U C L S e ou l N a tl U n i v C l o s e n e ss C h i n e s e A ca d S c i H a r v a r d U n i v U C S D U n i v M i c h i g a n U n i v O x f o r d U n i v C a m b r i dg e U n i v I lli no i s S t a n f o r d U n i v U n i v C op e nh a g e n N C I U n i v P e nn Y a l e U n i v J ohn s H opk i n s U n i v U n i v T o r on t o O h i o S t a t e U n i v U n i v W i s c on s i n U C SF F ud a n U n i v M c G ill U n i v U n i v W a s h i ng t on P a g e r a nk C h i n e s e A ca d S c i U n i v S a o P a u l o H a r v a r d U n i v C S I C U C S D U n i v O x f o r d U n i v M i c h i g a n Z h e ji a ng U n i v U n i v C a m b r i dg e U n i v T okyo U n i v C op e nh a g e n P e k i ng U n i v U n i v I lli no i s K yo t o U n i v S J T U J ohn s H opk i n s U n i v UNA M K a r o li n s k a I n s t U n i v N C a r o li n a O h i o S t a t e U n i v C it a ti on H a r v a r d U n i v C h i n e s e A ca d S c i U n i v C a m b r i dg e U n i v O x f o r d U C S D U C SF S t a n f o r d U n i v U n i v T o r on t o M I T U n i v P e nn C N R S U n i v C a li f B e r k e l e y U n i v C op e nh a g e n U n i v M i c h i g a n U n i v T okyo U n i v W a s h i ng t on D uk e U n i v U C L A J ohn s H opk i n s U n i v Y a l e U n i v C it a ti on - b a s e d h - i n e dx H a r v a r d U n i v M I T U C SF S t a n f o r d U n i v U C S D U n i v C a m b r i dg e U n i v C a li f B e r k e l e y U n i v T o r on t o U n i v W a s h i ng t on U n i v P e nn M GH Y a l e U n i v U n i v O x f o r d C h i n e s e A ca d S c i J ohn s H opk i n s U n i v HH M I U n i v C op e nh a g e n U n i v M i c h i g a n C o l u m b i a U n i v C N R S A lt m e t r i c s H a r v a r d U n i v S t a n f o r d U n i v M I T U n i v C a m b r i dg e U C SF U C S D U n i v O x f o r d U n i v C a li f B e r k e l e y C h i n e s e A ca d S c i U n i v T o r on t o Y a l e U n i v U n i v P e nn U n i v W a s h i ng t on U n i v C op e nh a g e n U n i v M i c h i g a n U C L U C L A J ohn s H opk i n s U n i v C N R S D uk e U n i v A lt m e t r i c s - b a s e d h - i nd e x H a r v a r d U n i v S t a n f o r d U n i v M I T U C SF U C S D U n i v C a li f B e r k e l e y U n i v C a m b r i dg e U n i v O x f o r d Y a l e U n i v HH M I U n i v P e nn U n i v T o r on t o M GH U n i v W a s h i ng t on U C L U C L A C o l u m b i a U n i v J ohn s H opk i n s U n i v W a s h i ng t on U n i v D uk e U n i v T a b l e . C o m p a r i s on o f i n s tit u ti on r a nk by S p a ti a l L ea d e r R a nk a nd o t h e r i nd i ce s . P a g e R a nk C l o s e n e ss B e t w ee nn e ss I nd e g r ee P ub li ca ti on L ea d e r R a nk A lt m e t r i c s - b a s e d h - i nd e x A lt m e t r i c s C it a ti on - b a s e d h - i nd e x C it a ti on S p a ti a l L ea d e r R a nk I n s tit u ti on C h i n e s e A ca d S c i H a r v a r d U n i v C N R S U n i v O x f o r d U n i v T okyo U n i v C a m b r i dg e U n i v T o r on t o U C S D U n i v C op e nh a g e n U C SF N C I U n i v M i c h i g a n U n i v W a s h i ng t on J ohn s H opk i n s U n i v S t a n f o r d U n i v U n i v P e nn D uk e U n i v K yo t o U n i v U n i v I lli no i s U n i v N C a r o li n a .2.4. Results in Information Science & Library Science To check the robustness of the study, we implement the same ROC curve analysis and Ksim measure analysis in ISLS field, as shown in Figure 11 and Figure 12. The results are consistent with those for pharmaceutical sciences. The proposed SpatialLeaderRank consistently outperformed other indices in predicting the academic impact of institutions. Similar to Section 4.2.3, we present the top 20 institutions in ISLS in Tables 4. A larger table of top 100 institutions is posted on Github site . Figure 11 ROC curve for
SpatialLeaderRank and other indices to identify top 5% institutions in
ISLS. https://github.com/chaochehe/Top-100-institutions Figure 12
KSim between each index and four evaluation metrics from top 5 to top 500 in
ISLS. a b l e . T op i n s tit u ti on s ’r a nk by S p a ti a l L ea d e r R a nk a nd by o t h e r i nd i ce s i n I S L S . S p a ti a l L ea d e r R a nk U n i v I lli no i s C it y U n i v H ong K ong U n i v W a s h i ng t on U n i v W i s c on s i n U n i v M i c h i g a n I nd i a n a U n i v U n i v T e x a s A u s ti n G e o r g i a S t a t e U n i v H a r v a r d U n i v U n i v N C a r o li n a U n i v B r iti s h C o l u m b i a U n i v M a r y l a nd N a tl U n i v S i ng a po r e U n i v P itt s bu r gh N a ny a ng T ec hno l U n i v U n i v A m s t e r d a m U n i v M a n c h e s t e r G e o r g i a I n s t T ec hno l M c g ill U n i v W uh a n U n i v L ea d e r R a nk W uh a n U n i v U n i v G r a n a d a C it y U n i v H ong K ong I nd i a n a U n i v U n i v W i s c on s i n C h i n e s e A ca d S c i U n i v A m s t e r d a m U n i v I lli no i s N a ny a ng T ec hno l U n i v U n i v N C a r o li n a M c g ill U n i v A r i z on a S t a t e U n i v U n i v M a r y l a nd N a tl U n i v S i ng a po r e Z h e ji a ng U n i v N a n ji ng U n i v U n i v A r i z on a U n i v M i nn e s o t a U n i v P o lit ec n V a l e n c i a M a x P l a n c k G e s e ll P ub li ca ti on W uh a n U n i v U n i v I lli no i s I nd i a n a U n i v U n i v G r a n a d a N a ny a ng T ec hno l U n i v U n i v W i s c on s i n U n i v M a l a y a R u ss i a n A ca d S c i U n i v N C a r o li n a U n i v M a r y l a nd U n i v C a r l o s I ii M a d r i d U n i v F e d M i n a s G e r a i s U n i v C o m p l u t e n s e M a d r i d U n i v A m s t e r d a m U n i v S h e ff i e l d C h i n e s e A ca d S c i M c g ill U n i v P e nn S t a t e U n i v N a tl T a i w a n U n i v U n i v F e d S a n t a C a t a r i n a I nd e g r ee W uh a n U n i v I nd i a n a U n i v U n i v W i s c on s i n U n i v G r a n a d a C it y U n i v H ong K ong U n i v I lli no i s B e i h a ng U n i v U n i v C a r l o s I ii M a d r i d N a ny a ng T ec hno l U n i v C h i n e s e A ca d S c i U n i v M a r y l a nd U n i v A m s t e r d a m N a tl T a i w a n U n i v N a tl U n i v S i ng a po r e U n i v N C a r o li n a Y on s e i U n i v A r i z on a S t a t e U n i v U n i v S h e ff i e l d U n i v M i nn e s o t a D r e x e l U n i v B e t w ee nn e ss U n i v W i s c on s i n U n i v I lli no i s I nd i a n a U n i v U n i v S h e ff i e l d W uh a n U n i v U n i v A m s t e r d a m U n i v M a r y l a nd U n i v N C a r o li n a U n i v C a r l o s I ii M a d r i d M c g ill U n i v C it y U n i v H ong K ong U n i v W e s t e r n O n t a r i o U n i v G r a n a d a U n i v T e x a s A u s ti n U n i v W a s h i ng t on G e o r g i a S t a t e U n i v N a tl U n i v S i ng a po r e U n i v M i c h i g a n C h i n e s e A ca d S c i N a ny a ng T ec hno l U n i v C l o s e n e ss W uh a n U n i v I nd i a n a U n i v U n i v W i s c on s i n N a ny a ng T ec hno l U n i v C it y U n i v H ong K ong U n i v I lli no i s M c g ill U n i v C h i n e s e A ca d S c i U n i v A r i z on a U n i v A m s t e r d a m U n i v S h e ff i e l d U n i v N C a r o li n a A r i z on a S t a t e U n i v U n i v B r iti s h C o l u m b i a U n i v M a r y l a nd U n i v H ong K ong N a tl U n i v S i ng a po r e U n i v W e s t e r n O n t a r i o U n i v M i c h i g a n U n i v M i nn e s o t a P a g e r a nk I nd i a n a U n i v U n i v W i s c on s i n W uh a n U n i v C it y U n i v H ong K ong U n i v I lli no i s U n i v M a r y l a nd N a ny a ng T ec hno l U n i v A r i z on a S t a t e U n i v U n i v A r i z on a U n i v A m s t e r d a m U n i v T e x a s A u s ti n C h i n e s e A ca d S c i M c g ill U n i v U n i v N C a r o li n a U n i v M i nn e s o t a U n i v P itt s bu r gh U n i v M i c h i g a n G e o r g i a S t a t e U n i v U n i v C a r l o s I ii M a d r i d U n i v H ong K ong it a ti on U n i v M a r y l a nd I nd i a n a U n i v L e i d e n U n i v U n i v A r i z on a U n i v A r k a n s a s W o l v e r h a m p t on U n i v U n i v A m s t e r d a m C it y U n i v H ong K ong V a nd e r b ilt U n i v H a r v a r d U n i v U n i v W i s c on s i n T e m p l e U n i v G e o r g i a S t a t e U n i v U n i v I lli no i s W uh a n U n i v U n i v G r a n a d a U n i v B r iti s h C o l u m b i a U n i v N C a r o li n a U n i v M i c h i g a n U n i v W a s h i ng t on C it a ti on - b a s e d h - i n e dx U n i v M a r y l a nd L e i d e n U n i v I nd i a n a U n i v U n i v A m s t e r d a m C it y U n i v H ong K ong W o l v e r h a m p t on U n i v V a nd e r b ilt U n i v H a r v a r d U n i v U n i v A r k a n s a s U n i v W i s c on s i n T e m p l e U n i v G e o r g i a S t a t e U n i v U n i v A r i z on a U n i v W a s h i ng t on M i c h i g a n S t a t e U n i v U n i v I lli no i s U n i v G r a n a d a N o r t h w e s t e r n U n i v U n i v T e x a s A u s ti n U n i v N C a r o li n a A lt m e t r i c s U n i v M a r y l a nd W o l v e r h a m p t on U n i v L e i d e n U n i v D e l f t U n i v T ec hno l U n i v S h e ff i e l d U n i v M i c h i g a n I nd i a n a U n i v V a nd e r b ilt U n i v B r un e l U n i v U n i v A m s t e r d a m U n i v N C a r o li n a U n i v G r a n a d a U n i v W i s c on s i n C it y U n i v H ong K ong U n i v I lli no i s U n i v S ydn e y N a ny a ng T ec hno l U n i v U n i v T w e n t e U n i v M a l a y a H a r v a r d U n i v A lt m e t r i c s - b a s e d h - i nd e x V a nd e r b ilt U n i v H a r v a r d U n i v I nd i a n a U n i v U n i v M a r y l a nd U n i v M i c h i g a n W o l v e r h a m p t on U n i v C it y U n i v H ong K ong U n i v A m s t e r d a m L e i d e n U n i v C o l u m b i a U n i v U n i v W i s c on s i n U n i v W a s h i ng t on D e l f t U n i v T ec hno l U n i v G r a n a d a S t a n f o r d U n i v U n i v S h e ff i e l d U n i v S ydn e y U n i v N C a r o li n a G e o r g i a S t a t e U n i v U n i v M a n c h e s t e r . Discussions and conclusions In this paper, to address the spatial bias in research leadership flows in research collaboration, we examine the spatial distribution and the dynamic trend of research leadership flows in pharmaceutical sciences. We find that the distribution of research leadership flow distance presents a long-tail effect. In general, institutions are leading others at a growing distance. We observe that developing countries have been playing an increasingly important role in leading the research in pharmaceutical sciences. Then, we construct a geographically-weighted and directed network, based on which we propose the SpatialLeaderRank. SpatialLeaderRank ranks the institutions integrating both topological features and spatial features. Comprehensive experiments with the data in both pharmaceutical sciences and ISLS demonstrate the superior performance of the proposed SpatialLeaderRank in predicting the academic impact of institutions. Leading institutions are identified and presented. This study sheds light on the important association between long-distance and cross-border collaborations and academic impact. With the growing trend of cross-border collaborations, the distance between collaborators, particularly between the research leader and participators, should be an integral part to consider while examining the research leadership. From a policy perspective, we found a clear rebalancing process between the research leadership mass in developed and developing countries. A number of eastern Asian countries, particularly China, is quickly emerging as a new global leader in pharmaceutical sciences. There are two main reasons for the change. First, the expenditure on research has been increasing rapidly in China (Basu, Foland et al. 2018). As of 2018, the research and development (R&D) expenditure in China was 1967.79 billion CNY , a 179% increase from 2010. China recently passed the European Union in R&D investment (Basu, Foland et al. 2018). Meanwhile, the research funding in the United States only increased by 57% . Apparently, with abundant funding, Chinese institutions are playing the role as the leader more often. Second, the evaluation of research output in China is primarily based on quantitative measures such as publication number, the impact factor f journals, and citation count. The emphasis on these quantitative measures drove the whole academic community to publish as many papers as possible. Despite the success in publication number, the academic impact of Chinese institutions is still laid behind the Western institutions. It is suggested that policymakers in China shift the focus of the research evaluation towards the actual academic impact from quantitative measures. On 18/02/2020, the Ministry of Science and Technology and the Ministry of Education of China jointly published an announcement to urge the Chinese institutions to adopt a more scientific and influence-driven research evaluation approach . This indicates the start of transformation from quantity to quality in the pharmaceutical sciences in China. For the Western countries, including Europe and the United States, they are still playing the role as the major leaders in the field, more research expenditure is needed to maintain a good status. In general, cross-border collaboration is playing an increasingly important role in pharmaceutical research. Given the higher impact of long-distance collaborations, cross-border collaboration should be encouraged by means of joint-funding schemes and academic exchanges. On the other hand, although our research suggests the long-distance collaboration is beneficial, it is not appropriate to directly take the spatial features in the evaluation of academic performance, as it is easy to manipulate. Long-distance and cross-border collaborations shall be encouraged to increase the chance to generate high impact research, but the impact of research should be purely based on its scientific merit. The proposed SpatialLeaderRank is a general method that can be used to evaluate the leadership at the author and country levels. It is also applicable to examine other types of relationships, such as to evaluate the academic influence of scientific journals, to identify the key innovator in the industry by analyzing the patent-citation network and to identify the key moderator in the financial system by analyzing the guarantee-relationship network. eferenceeference