Quantifying the Impact of Scholarly Papers Based on Higher-Order Weighted Citations

Abstract

Quantifying the impact of a scholarly paper is of great significance, yet the effect of the geographical distance of cited papers has not been explored. In this paper, we examine 30,596 papers published in Physical Review C and identify the relationship between citations and the geographical distances between author affiliations. Subsequently, a relative citation weight is applied to assess the impact of a scholarly paper. A higher-order weighted quantum PageRank algorithm is also developed to address the behavior of multi-step citation flow. Capturing the citation dynamics with higher-order dependencies reveals the actual impact of papers, including necessary self-citations that are sometimes excluded in prior studies. Quantum PageRank is utilized in this paper to help differentiate nodes whose PageRank values are identical.


Xiaomei Bai, Fuli Zhang, Jie Hou, Ivan Lee, Xiangjie Kong, Amr Tolba, Feng Xia


Introduction

With the rapid growth of scholarly big data [1], there is a crucial need to quantify the impact of scholarly papers and to assess the performance of individual scholars, institutions, and even countries [2]. Currently, metrics of the impact of scholarly papers fall into two main categories: unstructured metrics and structured metrics [3]. Unstructured metrics evaluate the impact of scholarly papers from a statistical point of view. Citations [4] [5] are the most representative unstructured metrics, with examples such as the H-index [6], the g-index [7], and the impact factor (IF) [8]. As an alternative measure of scientific impact, Xia et al. [9] investigated scholarly impact as reflected on social media and explored the correlation between citations and messages/tweets on Facebook and Twitter. Structured metrics mainly consider the importance of scholarly entities in scholarly networks, such as citation networks, co-author networks, and author-paper networks. PageRank [10], a seminal example of structured metrics, has attracted growing attention in scholarly impact evaluation. Sayyadi et al. [11] estimated future prestige scores of scholarly papers via three features: citations, publication date, and authorship. Wang et al. [12] quantified the impact of scholarly papers by applying PageRank and HITS [13] to the citation network, the author-paper network, and the journal-paper network. In unweighted structured metrics, all citations are treated with equal importance. An alternative approach is to evaluate the impact of scholarly papers using a time-aware weighted citation network [14]. In another study, Shah et al. [15] proposed the S-index metric to model influence propagation in a weighted paper-paper citation network. That work applies a hierarchical model between the citing paper and the cited paper, so that the impact of a scholarly paper decays rapidly over successive hierarchical levels.

One potential problem for both unstructured and structured metrics is that the impact of individual papers can be manipulated. For instance, aggressive self-citations or induced citations may lead to an inflated impact. Bai et al. [16] evaluated the impact of scholarly papers using a weighted citation network in which Conflict of Interest citation relationships are identified and the corresponding citation strengths are weakened. Another potential problem with structured metrics is that little is known about how actual geographic distance influences the impact of scholarly papers, and how higher-order dependencies in citation networks affect that impact. Liben-Nowell et al. [17] investigated the relationship between geographical distance and friendship in the LiveJournal network, indicating that geographical proximity can indeed increase the probability of friendship. This showed that social network attributes and geographical distance are related, which is an important aspect of small-world theory.
Previous research found a strong linear relationship between institutions and distance [18]. Schubert et al. [19] revealed that geopolitical location, cultural relations, and language are important factors in shaping cross-citation preferences. Wu [20] investigated citing distances, citation patterns, and spatial diversity to explore geographical knowledge diffusion. Albarran et al. [21] found that economic, political, sociological, and intellectual factors influence the shape of countries' citation distributions and their research performance. A geographic analysis of citation flows between cities helps to uncover how new scientific paradigms spread and to understand how quickly new research is recognized by academic circles in different geographical areas [22]. Bai et al. [23] explored the relationships between citations and the actual geographic location of institutions for evaluating the impact of scholarly papers. Building on this previous work, we further explore these relationships and construct a relative weight to represent the importance of a citation.

The concept of higher-order dependencies was introduced by Xu et al. [24] to ensure the correctness of network analysis. Higher-order dependencies mean that, when movements are simulated on the network, the next movement depends on several previous steps. Higher-order dependencies have been widely applied to model various phenomena, including Web browsing behaviors [25], vehicle and human movements [26], and stock markets [27]. Bohlin et al. [28] modelled citation flow between journals with memory of previous steps, corresponding to zero-, first-, and second-order Markov models. Previous researchers evaluated the impact of papers based on the original citation network, ignoring the influence of multi-step citation flow on the impact of papers.
In this paper, we construct a higher-order citation network and apply the hierarchical citation structure to quantify the impact of scholarly papers.

Once a citation network is constructed, evaluation methods such as PageRank or HITS can be applied. Although PageRank was introduced to rank Web pages, the algorithm has been deployed in many applications, such as finding important nodes in networks [29], measuring the impact of scholarly papers [30], evaluating the impact of scholars [31] or journals [32], as well as various applications in social networks [33] and [...] α. When the value of α changes, the evaluation results change accordingly.

To address the limitation of PageRank, Paparo et al. [37] proposed the quantum PageRank algorithm to unambiguously identify the underlying topology of networks. The quantum PageRank algorithm clearly highlights the structure of secondary hubs in scale-free networks. It recognizes the hierarchical structure in scale-free networks, amplifying the differences in the importance of nodes. The algorithm mainly consists of the following parts: (1) The input state of the algorithm is constructed based on the transition matrix of PageRank. (2) The unitary matrix and the transfer matrix are constructed to generate the total transformation matrix. (3) To obtain the probability of the particle appearing at each node, the square of the total transformation matrix is used to update the initial state.
(4) The average value over m steps is calculated for each node of a given network, which gives the quantum PageRank value.

This paper analyzes temporal and geographical attributes of publications and citations, and addresses the limitations of conventional techniques in quantifying the impact of scholarly papers. The main contributions of this paper are summarized as follows: (1) Identifying the relationship between citations and geographic locations of affiliations. (2) Introducing a relative citation weight based on geographical distance between institutions to better quantify the impact of scholarly papers. (3) Exploring higher-order dependencies in citation networks. (4) Developing the higher-order weighted quantum PageRank algorithm to rank the impact of scholarly papers.

Methods

Citation between institutions

Fig 1 shows the citation relationships between institutions using two statistical analysis techniques: grouping analysis [38, 39] and clustering analysis [40]. Red dots represent institutions, and the links between institutions represent citations. Fig 1A shows the citation relationships between different institutions obtained by grouping analysis. By clustering analysis, the number of institutions is about 200. As Fig 1 shows, institutions across six continents cite each other. In particular, in the field of physics, citations between North America and Europe are more frequent than those between other continents. Fig 1B shows the frequency of citations more clearly.

[Fig 1 panels: (A) Grouping analysis; (B) Clustering analysis]

Fig 1. Visualizing citations between institutions.

Relative citation weight

Geographical distance: Let $I$ represent a set of institutions, $I = \{I_1, I_2, \cdots, I_a, \cdots, I_b, \cdots\}$, and let $D$ represent the geographic distance between two institutions $I_a$ and $I_b$. Approximating the geographic distance with the spherical model, $D$ can be formulated as:

$$D_{I_a,I_b} = 2R \cdot \left| \arcsin \sqrt{\sin^2\left(\frac{|\Delta\theta|}{2}\right) + \cos(\theta_a) \cdot \cos(\theta_b) \cdot \sin^2\left(\frac{|\Delta\phi|}{2}\right)} \right|, \tag{1}$$

where $R$ is the radius of the earth, $\theta_a$ and $\theta_b$ are the latitudes of $I_a$ and $I_b$, and $\phi_a$ and $\phi_b$ are the longitudes of $I_a$ and $I_b$. $\Delta\theta$ is the difference of latitudes between $I_a$ and $I_b$, $\Delta\theta = \theta_a - \theta_b$, and $\Delta\phi$ is the difference of longitudes between $I_a$ and $I_b$, $\Delta\phi = \phi_a - \phi_b$.

Because physical distance raises the barrier to in-person interaction between the author and the citing researcher, citation counts are expected to decline with the geographical distance that separates the researchers. (Further discussion can be found in the Discussion Section.) The decline pattern is modelled with an exponential decay, according to the following equation:

$$y = y_0 + A e^{-x/t}, \tag{2}$$

where $y$ represents the citation count, $y_0$ is a constant representing an offset of the citation count, $x$ is the physical distance separating the researchers, and $t$ represents a scaling factor. $A$ is the number of citations, less the offset, when the author and the citing researcher are co-located at the same physical location. Experimental results of the citation pattern are presented in the Results Section.

Upon identifying the citation pattern, we construct a relative citation weight to quantify the impact of scholarly papers. We consider the citation network at the institution level, in which each institution has its actual latitude and longitude. Institutions are identified with nodes, and an edge exists between two institutions if they have citation relationships.
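As a minimal sketch of Eq (1), the following Python code computes the spherical distance between two institutions and then normalizes every pairwise distance by the largest one, which is the kind of normalization the relative citation weight relies on. The function names, the Earth radius value, and the three institutions with their approximate coordinates are illustrative assumptions, not values taken from the data set.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the text only names it R


def haversine_km(lat_a, lon_a, lat_b, lon_b, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance (Eq 1) between two points given in decimal degrees."""
    theta_a, theta_b = math.radians(lat_a), math.radians(lat_b)
    d_theta = theta_a - theta_b
    d_phi = math.radians(lon_a - lon_b)
    h = (math.sin(d_theta / 2) ** 2
         + math.cos(theta_a) * math.cos(theta_b) * math.sin(d_phi / 2) ** 2)
    # min() guards against rounding pushing h marginally above 1.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def relative_weights(coords):
    """Map each unordered pair of institutions to distance / maximum distance."""
    names = sorted(coords)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[(a, b)] = haversine_km(*coords[a], *coords[b])
    d_max = max(dist.values())
    return {pair: d / d_max for pair, d in dist.items()}


# Hypothetical institutions with approximate (latitude, longitude) pairs.
coords = {
    "inst_boston": (42.36, -71.06),
    "inst_geneva": (46.20, 6.14),
    "inst_tokyo": (35.68, 139.69),
}
weights = relative_weights(coords)
for pair, w in sorted(weights.items()):
    print(pair, round(w, 3))
```

Under this normalization the most distant pair receives weight 1 and closer pairs receive proportionally smaller weights, so a citation across a long distance counts for more than one between neighbouring institutions.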
In the citation network, the relative citation weight between two institutions, $W_{I_a,I_b}$, is defined as:

$$W_{I_a,I_b} = \frac{D_{I_a,I_b}}{\max_{m,n \in G} D_{I_m,I_n}}, \tag{3}$$

where $G$ represents the set of all institutions, $I_m$ and $I_n$ denote any two individual institutions with $m \neq n$, $D_{I_m,I_n}$ represents the geographic distance between two different institutions, and $\max D_{I_m,I_n}$ is the maximum geographic distance between institutions.

Higher-order weighted quantum PageRank

In this section, we introduce the proposed higher-order quantum PageRank algorithm. First, we construct higher-order dependencies in the citation network. The specific process is as follows: (1) We use the random walk method to find citation chains in the original citation network, in order to identify the higher-order dependencies of the citation relationships among the papers. (2) We traverse all citation chains and add up the number of occurrences of each order of citation for all nodes in the chain. Citation chains can be navigated backwards and forwards to build up a picture of the intellectual base of a topic [41]. (3) For each order, we calculate the probability of each node citing other articles separately. (4) Across the different orders, we compare the probabilities of occurrence of the same citation relationship. If the probability change is large in

[...] $G$ according to all the generated citation relationships.

Given a directed graph with $M$ nodes, $i|k$ indicates the $k$th order of node $i$, and $N_{i \to j}$ indicates the number of times that node $i$ cites node $j$. The probability of node $i$ transferring to its neighboring nodes is defined as:

$$P_{i|k \to j} = \frac{N_{i|k \to j}}{\sum_{t=1}^{M} N_{i|k \to t}}, \tag{4}$$

where $k \in [2, order]$, with $order$ denoting the highest order, and $t$ ranges from 1 to $M$.

In order to calculate the probability, the K-L divergence value $D(P_i)$ needs to be obtained:

$$D(P_i) = \sum_{j=1}^{M} P_{i|k \to j} \log \frac{P_{i|k \to j}}{P_{i \to j}}. \tag{5}$$

If the K-L divergence value $D(P_i)$ of node $i$ is larger than $\frac{k}{\log \sum_{t=1}^{M} N_{i|k \to t}}$, then $i|k$ replaces the previous node $i$, and node $i$ obtains an updated transition probability.

Second, we calculate the transfer matrix $G$ according to the directed graph with $N$ nodes. Subsequently, we need to construct the initial state, namely the input state. The details are as follows: (1) $|i\rangle|j\rangle$ represents the directed edge from node $i$ to node $j$. $G_{k,i}$ indicates the probability of moving from node $i$ to node $k$, where $i, k \in [0, N-1]$. [...]

$$|\psi_j\rangle := |j\rangle \otimes \sum_{k=0}^{N-1} \sqrt{G_{kj}}\, |k\rangle, \tag{6}$$

where $|\psi_j\rangle$ is a superposition of the vectors representing the outgoing edges from node $j$.

The stochastic pattern of the vectors $|\psi_j\rangle$ for $j = 0, 1, 2, \ldots, N-1$ [...] $G$. These vectors form an $N$-dimensional orthonormal set of vectors, and they are used as the initial state of the quantum walk.

Then, we need to construct the unitary matrix $\pi$ and the transfer matrix $S$ to obtain the general transform matrix. The unitary matrix $\pi$ is

$$\pi = 2 \sum_{j=0}^{N-1} |\psi_j\rangle\langle\psi_j| - E. \tag{7}$$

The transfer matrix $S$ is used to move a quantum particle from node $j$ to node $k$:

$$S = \sum_{j,k=0}^{N-1} |jk\rangle\langle kj|. \tag{8}$$

The general transform matrix is defined as

$$U = \pi S. \tag{9}$$

As the directions of the edges of the graph need to be swapped an even number of times, we use $U^2$ to update the initial state $|\psi\rangle$ each time. Then we calculate the [...] $i$. The probability that the particle appears at node $i$ after $m$ steps of walking, $P_{i,m}$, can be obtained using the following formula:

$$P_{i,m} = \langle\psi_i| U^{m\dagger} \cdot U^{m} |\psi_i\rangle, \tag{10}$$

where $U^m$ denotes $m$ applications of $U$, and $U^{m\dagger}$ is the conjugate transpose of $U^m$.

Finally, to guarantee a probabilistic interpretation of the higher-order quantum PageRank, we require

$$\sum_{i=0}^{N-1} P_{i,m} = \sum_{i=0}^{N-1} \langle\psi_i| U^{m\dagger} \cdot U^{m} |\psi_i\rangle = 1, \quad \forall m. \tag{11}$$

$P_{i,m}$ can be interpreted as the relative importance of node $i$, and it can be found by calculating the probability of finding a quantum walker on node $i$. Thus, the impact score of each scholarly paper can be calculated from the $P_{i,m}$ value, as shown in Eq (12).

Definition of a scholarly paper impact
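The quantum walk of Eqs (6)-(11), together with the time average on which the definition below is built, can be sketched numerically on a toy graph. The sketch makes several assumptions that are not stated in the text: it follows the usual presentation of the quantum PageRank of [37], in which one elementary step applies the reflection $2\sum_j |\psi_j\rangle\langle\psi_j| - E$ followed by the swap $S$, it starts from the uniform superposition of the $|\psi_j\rangle$, it reads the probability from the second register, and it builds $G$ as a damped Google matrix with a conventional damping factor of 0.85. Geographic weights and higher-order nodes are left out, and all function names and the four-node graph are illustrative.

```python
import math


def google_matrix(adj, alpha=0.85):
    """Column-stochastic G[k][j]: probability of moving from node j to node k.

    adj[j] lists the nodes that node j points to. alpha is an assumed damping
    factor; nodes without outgoing links jump uniformly at random.
    """
    n = len(adj)
    g = [[0.0] * n for _ in range(n)]
    for j, targets in enumerate(adj):
        for k in range(n):
            link = (1.0 / len(targets) if k in targets else 0.0) if targets else 1.0 / n
            g[k][j] = alpha * link + (1.0 - alpha) / n
    return g


def step(v, root_g):
    """Apply the reflection about span{psi_j}, then the swap S, to state v[a][b]."""
    n = len(v)
    out = [[0.0] * n for _ in range(n)]
    for a in range(n):
        # Overlap <psi_a | v>, with psi_a = |a> (x) sum_k sqrt(G[k][a]) |k>.
        c = sum(root_g[b][a] * v[a][b] for b in range(n))
        for b in range(n):
            reflected = 2.0 * root_g[b][a] * c - v[a][b]
            out[b][a] = reflected  # the swap exchanges the two registers
    return out


def quantum_pagerank(adj, steps=50, alpha=0.85):
    """Return per-step node probabilities and their average over all steps."""
    n = len(adj)
    g = google_matrix(adj, alpha)
    root_g = [[math.sqrt(x) for x in row] for row in g]
    # Initial state: uniform superposition of the psi_j vectors (Eq 6).
    v = [[root_g[b][a] / math.sqrt(n) for b in range(n)] for a in range(n)]
    history = []
    for _ in range(steps):
        v = step(step(v, root_g), root_g)  # one walk step applies U twice
        history.append([sum(v[a][i] ** 2 for a in range(n)) for i in range(n)])
    average = [sum(p[i] for p in history) / steps for i in range(n)]
    return history, average


# Toy citation graph: adj[j] lists the papers cited by paper j.
adj = [[1, 2], [2], [0], [2]]
history, average = quantum_pagerank(adj, steps=20)
print([round(x, 4) for x in average])
```

Because the reflection and the swap both preserve the norm of the state, the node probabilities at every step sum to one, which is the normalization stated in Eq (11); the returned average corresponds to the time-averaged score.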

Based on the observation that citations decrease with geographical distance following an exponential decay, the impact of each scholarly paper is defined as its average higher-order weighted quantum PageRank value:

$$S(P_i) = \langle P_{i,m} \rangle := \frac{1}{M} \sum_{m=1}^{M} P_{i,m}, \tag{12}$$

where $S(P_i)$ represents the prestige score of a scholarly paper, $\langle P_{i,m} \rangle$ represents the average value of the higher-order weighted quantum PageRank scores, $M$ represents the number of iterations of the algorithm, and $P_{i,m}$ indicates the $m$-th value of the higher-order weighted quantum PageRank score. The concept of the prestige score is inherited from the Quantum Google algorithm [37], with the importance of a node corresponding to the prestige score of a scholarly paper in our work.

Data description

Our experiments are conducted on the Physical Review C (PRC) data set, a subset of the American Physical Society (APS) data set (http://publish.aps.org/datasets). PRC consists of 34,443 papers, and each paper includes the title, author names and affiliations, date of publication, and a list of cited papers. We remove 3,587 papers without citation details from the PRC data set. Overall, 212,421 citations are identified from the data set. Geographic coordinates of over 27,000 institutions are obtained by calling the Geocode function of the Google Maps API.

Data Processing

To better explore the relationship between citations and geographical distance, we partition geographical distances using two statistical analysis techniques: grouping analysis and clustering analysis. For grouping analysis, we use multiples of 100 km as threshold values to determine the group of any two institutions. For instance, if institutions $I_a$ and $I_b$ are 250 km apart, their citations are counted in the 200–300 km group. For clustering analysis, we use Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [42], a density-based spatial clustering algorithm. The number of clusters is determined by two parameters: (1) the furthest [...]

Geographic distribution of institutions

Fig 2 shows the geographic distribution of institutions with PRC publications. Each red dot represents an institution, with the dot size reflecting the number of publications and the color representing the number of citations. We observe that research institutions are spread over all continents, with those in North America and Europe being more research intensive and attracting more citations. The top 10 institutions according to the number of papers published are shown in Table 1, and their geographical locations are marked by green labels in Fig 2.

Fig 2. Visualizing the geographical distribution of institutions.

Results

Citation dynamics

Fig 3 characterizes the change of citations ($C$) (i.e., the variation in citation quantity) with geographical distance ($d$) by grouping analysis. In order to characterize the citation trend, we also analyze the relationship between citations and geographical distance by clustering analysis (Fig 4). For both analysis methods, we analyze the citation trend by considering four cases: intra-country, inter-country, raw distance (with oceans), and land distance (without oceans). The citation trends of scholarly papers within countries approximately follow $C(d) \sim y_0 + A e^{-d/t}$ (Fig 3A and Fig 4A). Yet, we find that the citation trends between countries (Fig 3B and Fig 4B) are different from the ones within countries. Citations rapidly decrease when the geographical distances between institutions range from 0 Mm to 5 Mm, then consistently increase and reach a peak at around 7 Mm.

The citation trend exhibits a rapid decline from 7 Mm to 20 Mm. Together, Fig 3C and Fig 4C indicate that citations change with actual geographic distance. We find, however, that this trend is similar to the one between countries. This phenomenon drives us to explore the reason behind the peak in Fig 3B, Fig 3C, Fig 4B, and Fig 4C. As a result, we find that the width of the Atlantic Ocean plays a significant role. The reason is that the Atlantic separates America from Europe: about 75% of affiliations and 67% of citations of papers are from America and Europe, and the citations between America and Europe account for around 68% of the total citations. The uneven geographical distribution of institutions causes this trend. This observation drives us to explore the relationship between citations and geographical distance while ignoring the influence of the non-uniform geographical distribution of institutions. To this end, we construct the distance matrix of six continents (Asia, Europe, Africa, North America, South America, and Oceania) through the ranging function of Google Maps. According to Fig 3D and Fig 4D, the change of citations for publications is closely related to geographical distance, with more citations associated with shorter geographical distances, and vice versa. This has clear implications for quantifying the impact of scholarly papers: if the citations of a paper come from a long distance, these citations are more valuable than citations from a short distance. Further elaboration can be found in the Discussion Section. Fig 3D and Fig 4D indicate that citations follow a trend similar to that of Fig 3A and Fig 4A.

Fig 3. Characterizing the relationship of citations and geographical distance by grouping analysis. Panels: (A) Intra-country citations; (B) Inter-country citations; (C) Raw distance (including oceans); (D) Land distance (excluding oceans).

Fig 4. Characterizing the relationship of citations and geographical distance by clustering analysis. Panels: (A) Intra-country citations; (B) Inter-country citations; (C) Raw distance (including oceans); (D) Land distance (excluding oceans).

In addition, we analyze the citation trend by considering the time factor. To illustrate the difference between citation trends and geographical distance over different periods, comparisons over 4 decades ('70s, '80s, '90s, and '00s) are shown in Fig 5. Fig 5A compares the relationship between citations and geographical distance across the 4 decades. The trends are similar to those shown in Fig 3B, Fig 3C, Fig 4B, and Fig 4C. Fig 5B compares the change in the number of papers over time. Fig 5A indicates that, between 2000 and 2009, the citation trend approximately follows a Gaussian distribution over the range from 4 Mm to 12 Mm. The change in trend is more noticeable than in the other three periods: 1970-1979, 1980-1989, and 1990-1999. The differences in citation trends (Fig 5A) are consistent with the change of productivity (Fig 5B) over the four periods. We observe a positive correlation between the number of publications and citations. Fig 5C and Fig 5D show the trends of citation in North America and Europe. These trends indicate that citations are closely related to geographical distance. These results inspire us to evaluate the impact of papers based on the geographical distance (see Methods).
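The two steps behind the intra-country curves, counting citations per 100 km group and fitting the exponential decay $C(d) \sim y_0 + A e^{-d/t}$, can be illustrated with a short Python sketch. The fitting procedure is not described in the text; the version below assumes the offset $y_0$ is known and fits $\ln(C - y_0)$ against $d$ by ordinary least squares, which is a simplification. The function names are hypothetical and the data are synthetic, generated from the model itself rather than taken from the PRC data set.

```python
import math
from collections import Counter

BIN_WIDTH_KM = 100  # group width used in the grouping analysis


def distance_group(distance_km, width=BIN_WIDTH_KM):
    """Return the (lower, upper) bounds in km of the group containing distance_km."""
    lower = int(distance_km // width) * width
    return (lower, lower + width)


def citations_per_group(distances_km):
    """Count citations per distance group, given one distance per citation."""
    return Counter(distance_group(d) for d in distances_km)


def fit_decay(distances, counts, y0=0.0):
    """Fit ln(C - y0) = ln(A) - d / t by least squares and return (A, t).

    Assumes y0 is known; points with C <= y0 are skipped because their
    logarithm is undefined.
    """
    xs, ys = [], []
    for d, c in zip(distances, counts):
        if c > y0:
            xs.append(d)
            ys.append(math.log(c - y0))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -1.0 / slope


# A citation between institutions 250 km apart falls in the 200-300 km group.
print(distance_group(250))

# Synthetic, noise-free counts at the group midpoints (illustrative values only).
true_y0, true_a, true_t = 50.0, 1000.0, 1500.0
midpoints = [50.0 + 100.0 * i for i in range(30)]
counts = [true_y0 + true_a * math.exp(-d / true_t) for d in midpoints]
a_hat, t_hat = fit_decay(midpoints, counts, y0=true_y0)
print(round(a_hat, 3), round(t_hat, 3))
```

With noisy real counts, a nonlinear fit that also estimates $y_0$ would be the more natural choice; the log-linear form is used here only to keep the sketch free of external dependencies.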

Comparing the impact of papers

Based on the citation network, we compare the scores of quantum PageRank and PageRank. We observe that many papers share the same PageRank score. The importance of some nodes in the citation network therefore cannot be distinguished, which is considered a typical drawback of PageRank. In order to show the difference between the scores of quantum PageRank and PageRank, we randomly select 100 scores out of 27,000 for each algorithm. Fig 6 shows the comparative results of quantum PageRank and PageRank for the same nodes in the citation network. According to Fig 6A, node15 - node21 yield the same PageRank scores, while their quantum PageRank scores are different (see Fig 6B). Fig 6 indicates that quantum PageRank can better reveal the hierarchy of levels in hierarchical networks.

In order to explore the performance of higher-order weighted quantum PageRank, we compare the scores of higher-order weighted quantum PageRank and weighted quantum PageRank. We find that the higher-order weighted quantum PageRank algorithm can capture different scores when the weighted PageRank algorithm shows the same scores, as shown in Fig 7. Fig 7A shows the weighted quantum PageRank scores of 100 randomly selected nodes in the citation network, and Fig 7B shows the higher-order weighted quantum PageRank scores of the same nodes. According to this figure, the scores of node15 - node19 are the same in the weighted quantum PageRank, while their scores are different in the higher-order quantum PageRank. The comparison between the two algorithms indicates that considering higher-order dependencies in the citation network can better identify the impact of papers.

Fig 5. Characterizing citation dynamics. Panels: (A) Change of citations over distance; (B) Change of number of papers over time; (C) North America; (D) Europe.

Fig 6. Comparing the scores of PageRank and quantum PageRank. Identical scores using PageRank (the red region) can be differentiated using quantum PageRank (the green region). Panels: (A) PageRank; (B) Quantum PageRank; axes: Scores versus Nodes.

Fig 8 illustrates the effect of higher-order citation networks on quantifying the impact of self-citation. The pathways represent the citations between different papers. The red arrow indicates self-citation. The green arrow indicates the self-citation chain with higher-order dependencies. In Fig 8A, paper P cites paper P , and the citation is a self-citation. The other citation relationships do not include self-citation. For example, paper P cites paper P , and there is no common author for the two papers. Four papers cite P , and P cites six papers. W ( P → P ) represents the weight of paper P citing paper P . W ( P → P ) is equal to 0.[...] × 10^−[...] in the original citation network. However, in the higher-order citation network, W ( P → P ) is equal to the weight of paper P | P citing paper P ( W ( P | P → P )), namely 2.[...] × 10^−[...]. The weight in the higher-order network is higher than the weight in the original citation network, indicating that the impact of the self-citation is improved. The citation structure contributes to the enhancement of the weight of the self-citation. Because the preceding nodes of paper P are cited multiple times in the citation network, the weight of paper P | P citing paper P is improved in the higher-order citation network.

In Fig 8B, paper P cites paper P , and the citation is a self-citation. There is no self-citation in the other citation relationships. W ( P → P ) represents the weight of paper P citing paper P in the original citation network. W ( P | P → P ) represents the weight of paper P citing paper P in the higher-order citation network. We observe that W ( P | P → P ) in the higher-order citation network is lower than W ( P → P ) in the original citation network. The reason is that the preceding nodes of paper P are cited by only one paper, and paper P is a root node in the higher-order citation chain. The citation structure determines the weight change in the higher-order citation network.

Discussion

Geographical distance

An interesting finding is that the citation pattern is closely related to the geographical distribution of institutions, discounting the separation by oceans. The shorter the actual geographic distance between citing and cited institutions, the more citations. We weight the citations between institutions while ignoring the oceans separating them. Rare citations are considered more valuable: “less is more.” Intuitively, long distance presents a barrier to disseminating research findings and to socializing with other researchers in person. Although publishing over the Internet has become a popular alternative, it is a challenge to promote a work among the massive amount of information available on the Web. In addition to a Web presence, additional publicity through conferences, seminars, and workshops helps make the work well known. Given the increased cost and effort of frequent travel to far-away destinations, citations made by geographically far-away researchers are considered more valuable. At the same time, long-distance citations involve less manipulated promotion, and thus better reflect the true impact of a paper.

Fig 7. Comparing the scores of weighted quantum PageRank and higher-order weighted quantum PageRank. Identical scores using weighted quantum PageRank (the red regions) can be differentiated using higher-order weighted quantum PageRank (the green regions). (A) Weighted quantum PageRank. (B) Higher-order weighted quantum PageRank. Both panels plot scores against nodes 1 to 100.

It should be noted that this finding does not conflict with, and can be applied as a weighting factor on top of, other “reputation” metrics such as citations from a paper written by a leading institute or published in a prestigious journal. Investigating such weighted citations would be a different topic, and combining them with the geographical distance analysis is beyond the scope of this paper.
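The distance-based weighting discussed in this subsection can be illustrated with a short sketch. It computes a great-circle distance between two affiliations and gives larger weights to citations that fall in rarely observed distance ranges, in the spirit of "less is more." This is only a sketch under stated assumptions, not the weighting used in this paper: the haversine distance, the 1000 km bin width, the inverse-share weight, the function names, and the example distances are all illustrative choices.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def rarity_weights(distances_km, bin_km=1000.0):
    """Hypothetical weighting: weight each citation by the inverse of the
    share of citations in its distance bin, then rescale so that the
    weights average to 1."""
    bins = [int(d // bin_km) for d in distances_km]
    counts = Counter(bins)
    total = len(bins)
    raw = [total / counts[b] for b in bins]
    mean_raw = sum(raw) / total
    return [r / mean_raw for r in raw]


# Hypothetical citing-to-cited distances in km: three short, one long.
example_distances = [120.0, 340.0, 560.0, 7800.0]
example_weights = rarity_weights(example_distances)
```

In this toy input the single long-distance citation receives a larger weight than each of the three short-distance ones, because its distance bin is rarer. A different bin width or a smooth function of distance would change the numbers but not the idea.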

Higher-order dependencies

In this paper, we propose a quantitative approach for evaluating the impact of scholarly papers via higher-order citation networks. Evaluating the impact of papers in higher-order citation networks can more objectively reflect the true influence of scholarly papers. Meanwhile, the higher-order dependencies can weaken the effect of manipulated citation activities. For example, when researchers manipulate citations to boost the impact of their papers, they usually deliberately cite newly published papers by themselves or their friends. Such manipulation can distort the true citation networks, and it has more influence on the first-order citation networks.

Fig 8. Comparing self-citation weights in two different citation networks. Panels (A) and (B) each show a citation network alongside the corresponding higher-order citation network.

The higher-order dependencies are more likely to occur for the denser nodes and root nodes in citation networks. We exclude sparse nodes (citation chains appearing fewer than 50 times among all the citation chains) from the citation networks when finding the higher-order dependencies. The ignored nodes in citation networks are regarded as the zero-order dependencies, and such nodes make up a large proportion of citation networks. In fact, the number of citation relationships is based on the statistics of the citation chains, which are generated using the random walk method. Therefore, for a given citation pair, we find that the number of cited papers of the preceding node and the number of citations of the succeeding node determine the number of occurrences of the pair of nodes in all the citation chains. Based on this finding, we roughly estimate that the probability of such nodes getting more citations is low if the higher-order dependencies of the nodes appear in the citation chains fewer than 50 times. Given a paper, we trace its citation path and generate a citation tree according to the citation relationships. For the root node in the citation tree, the total number of times the root node appears in all the citation chains is related only to the post-sequence nodes. For a leaf node in the citation tree, the total number of times the leaf node appears in all the citation chains is related only to the pre-sequence nodes. The finding mentioned above can be extended to all networks, in which researchers can find the corresponding higher-order dependencies to better rank the nodes. The general pattern is that the in-degree and the out-degree of a node determine the number of occurrences of the node in all the communication paths in a given network. Furthermore, the
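The counting of citation chains described in this subsection can be sketched as follows. The example takes a list of citation chains as given (in the paper they come from random walks, which are not reproduced here), counts how often each consecutive pair and each consecutive triple of papers occurs, keeps only patterns that reach a minimum count, and normalizes counts into transition weights. The function names, the toy chains, and the normalization are assumptions made for illustration; only the threshold of 50 is taken from the text.

```python
from collections import Counter


def count_chain_patterns(chains):
    """Count consecutive pairs (first order) and triples (second order)
    over all citation chains; each chain is a list of paper ids."""
    pair_counts = Counter()
    triple_counts = Counter()
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            pair_counts[(a, b)] += 1
        for a, b, c in zip(chain, chain[1:], chain[2:]):
            triple_counts[(a, b, c)] += 1
    return pair_counts, triple_counts


def frequent(counts, min_count=50):
    """Keep only patterns seen at least min_count times (50 in the text)."""
    return {key: n for key, n in counts.items() if n >= min_count}


def transition_weights(counts):
    """Assumed normalization: divide each pattern count by the total count
    of patterns sharing the same context (all elements but the last)."""
    totals = Counter()
    for key, n in counts.items():
        totals[key[:-1]] += n
    return {key: n / totals[key[:-1]] for key, n in counts.items()}


# Toy chains with hypothetical paper ids.
chains = [["A", "B", "C"]] * 60 + [["D", "B", "E"]] * 60 + [["A", "B", "E"]] * 10
pairs, triples = count_chain_patterns(chains)
first_order = transition_weights(pairs)       # weight of B -> C given only B
second_order = transition_weights(triples)    # weight of B -> C given A -> B
kept_triples = frequent(triples)              # drops ("A", "B", "E"), seen 10 times
```

In the toy data the step from B to C has a different weight once the preceding paper is taken into account, which is the kind of difference a first-order network cannot express.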


Ranking algorithm analysis

Because PageRank scores depend strongly on the damping parameter α, they look more arbitrary. Compared to PageRank, the scores of quantum PageRank are less dependent on α, indicating that quantum PageRank is more robust to variation of the damping parameter [37]. We find that more citations are associated with shorter geographical distances. To weaken the impact of citations from citing papers at short distances, and strengthen the impact of scholarly papers from citations at long distances, we weight the citation networks by an inverse function of the geographical distance between institutions. Based on the finding that citations are closely related to geographical distance, we construct the higher-order weighted quantum PageRank algorithm for objectively quantifying the impact of scholarly papers. In hierarchical networks, quantum PageRank can better distinguish the impact of nodes compared to PageRank, as shown in Fig 6. Higher-order weighted quantum PageRank can capture deeper structural information, and better distinguish the impact of nodes compared to weighted quantum PageRank, as shown in Fig 7.

Supporting information

S1 Data Source. Data source used in this paper.

Acknowledgments

The authors extend their appreciation to the International Scientific Partnership Program ISPP at King Saud University for funding this research work through ISPP

References

1. Xia F, Wang W, Bekele TM, Liu H. Big scholarly data: a survey. IEEE Transactions on Big Data. 2017;3(1):18–35.
2. Aguinis H, Suárez-González I, Lannelongue G, Joo H. Scholarly impact revisited. The Academy of Management Perspectives. 2012;26(2):105–132.
3. Bai X, Liu H, Zhang F, Ning Z, Kong X, Lee I, et al. An overview on evaluating and predicting scholarly article impact. Information. 2017;8(3):73.
4. Evans JA, Reimer J. Open access and global participation in science. Science. 2009;323(5917):1025–1025.
5. Gargouri Y, Hajjem C, Larivière V, Gingras Y, Carr L, Brody T, et al. Self-selected or mandated, open access increases citation impact for higher quality research. PLoS One. 2010;5(10):e13636.
6. Hirsch JE. An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(46):16569–16572.


Table. Top institutions by publication quantities.

Number | Institution | Latitude | Longitude | Number of papers | Citations
1 | Los Alamos Scientific Laboratory, University of California, Los Alamos, New Mexico 87544 | 35. | -. | , | ,739
2 | Cyclotron Laboratory and Physics Department, Michigan State University, East Lansing, Michigan | . | -. | , | ,748
3 | Joint Institute for Heavy Ion Research, Oak Ridge, Tennessee, USA | . | -. | , | ,684
4 | Argonne National Laboratory, Argonne, Illinois, USA | . | -. | , | ,128
5 | Berkeley Geochronology Center, Ridge Road, Berkeley, California 94709 | 37. | -. | , | ,865
6 | Catholic University of America, Washington, D.C. | . | -. | , | ,402
7 | C.E.N. Saclay, B.P. No. , -Gif-sur-Yvette, France | . | . | , | ,361
8 | Departement of Chemistry, Brookhaven National Laboratory, Upton, New York, USA | . | -. | , | ,599
9 | G.S.I. Darmstadt, Darmstadt, Germany | . | . | ,