Predicting the Citations of Scholarly Paper
Xiaomei Bai (a), Fuli Zhang (b, *), Ivan Lee (c)
(a) Computing Center, Anshan Normal University, Anshan, China
(b) Library, Anshan Normal University, Anshan, China
(c) School of Information Technology and Mathematical Sciences, University of South Australia, Australia
Abstract
Citation prediction of scholarly papers is of great significance in guiding funding allocations, recruitment decisions, and rewards. However, little is known about how citation patterns evolve over time. By exploring the inherent involution property in scholarly paper citation, we introduce the Paper Potential Index (PPI) model based on four factors: inherent quality of the scholarly paper, scholarly paper impact decaying over time, early citations, and early citers' impact. In addition, by analyzing factors that drive citation growth, we propose a multi-feature model for impact prediction. Experimental results demonstrate that the two models improve the accuracy of predicting scholarly paper citations. Compared to the multi-feature model, the PPI model yields superior predictive performance in terms of range-normalized RMSE. The PPI model better interprets the changes in citation, without the need to adjust parameters. Compared to the PPI model, the multi-feature model performs better in terms of Mean Absolute Percentage Error and Accuracy; however, its predictive performance is more dependent on parameter adjustment.
Keywords:
Scholarly Paper, Paper Potential Index, Multi-feature Model
* Corresponding author. Email address: [email protected] (Fuli Zhang)
Preprint submitted to Journal of Informetrics, August 13, 2020 (arXiv, cs.DL)
1. Introduction
There is increasing interest in understanding the citation dynamics of scholarly papers and the evolution of science (Xia et al., 2017). So far, studies in this field have primarily focused on success in science (Xia et al., 2016; Bai et al., 2016; Cao et al., 2016; Fiala and Tutoky, 2018; Zhang et al., 2017), academic collaboration networks (Panagopoulos et al., 2017), team science (Heidi, 2015) and scientific impact prediction (Bai et al., 2017). While citation count serves as a popular indicator for measuring research outcomes, it is often necessary to estimate future impact as well. For instance, research impact prediction helps in the effective allocation of research funds (Clauset et al., 2017). An important challenge in scientific impact prediction is to characterize the change in citations over time, and it is important to identify the factors that affect the citations of scholarly papers.
Previous studies have mainly focused on predicting citations or analyzing future citation distributions. Some studies utilize machine learning algorithms such as Gradient Boosting Decision Tree (Sandulescu and Chiru, 2016), Support Vector Machine (Adankon and Cheriet, 2010), and XGBoost (Chen and Guestrin, 2016). To build valid predictive models, crucial features have been identified for citation prediction, including early citations, journal impact factor, authors' authority, journal reputation, topic of the scholarly paper, and age (Petersen et al., 2014; Sarigöl et al., 2014; Yu et al., 2014). Some citation prediction studies have applied generative models to reflect the observation that older papers typically attract more citations (Newman, 2008), or to address citation patterns that show an initial period of growth followed by a gradual decline over time (Wang et al., 2008, 2013).
More recently, Xiao et al. (2016) proposed a point process model to predict the long-term impact of individual publications based on early citations. Furthermore, Singh et al. (2017) found that early influential citers negatively affected long-term scientific impact, possibly due to attention stealing, whereas non-influential early citers positively affected long-term scientific impact.
Inspired by the prior work of Wang et al. (2008, 2013), Xiao et al. (2016), and Singh et al. (2017), we model the Paper Potential Index (PPI) by considering the following factors: inherent quality of the scholarly paper, scholarly paper impact decaying over time, early citations, and early citers' impact. The PPI predictive model combines these factors and extends the Hawkes process. It mainly depends on the inherent involution mechanism of paper citations, which has the following three properties: (1) paper citation declines along with the decay of paper novelty over time; (2) the early citers' impact can increase scholarly paper impact in the predictive model; (3) early citations help retain long-term citations.
In addition, we propose a multi-feature predictive model, which considers author-based features, journal-based features, and the citations feature. We compare the prediction results of the two models in terms of mean absolute error, root mean squared error, range-normalized RMSE, mean absolute percentage error and accuracy.
The main contributions of this paper include: (1) introduction of the PPI, which reflects the potential impact of a scholarly article; (2) consideration of scholarly paper impact decaying over time, scholarly papers' quality, early citations, and early citing authors' impact, to quantify the potential impact of scholarly articles; (3) discussion of how PPI outperforms the existing multi-feature models in citation prediction.
2. Related work
Citation prediction of scholarly papers has been extensively investigated, and these studies are mostly based on the analysis of a mixture of features, including author-based features (the number of authors, the country of the author's institution, authors' authority, etc.), journal-based features (the total citations of the journal, journal impact factor, keyword frequency of each journal, etc.), paper-based features (the topic of the scholarly paper, scholarly paper length, keyword repetition in the abstract of a paper, the number of references, etc.), and other features such as institutional features (institutional rankings and reputation, etc.). In addition, altmetrics are also employed to predict the citations of scholarly papers. Various investigations have been conducted to explore the correlation between Twitter activities and citation patterns (Peoples et al., 2016; Timilsina et al., 2016; Erdt et al., 2016). Seminal examples of citation prediction using a mixture of features are summarized in Table 1. Three categories of features are used in our multi-feature predictive model: author-based features, journal-based features, and the citations feature. To improve prediction performance, the Author Impact Factor (AIF), Q value, H-index, Journal Impact Factor and citations are used to predict the citations of scholarly papers. The main difference between our multi-feature predictive model and the prior studies is the selection of features.

Table 1: Examples of multi-feature citation prediction of scholarly paper.

| Source | Author features | Journal features | Paper features | Other features |
|---|---|---|---|---|
| Haslam et al. (2008) | the number of authors, first author gender | journal prestige | title length, the number of references | first author institution's prestige |
| Bornmann et al. (2012) | the number of authors, the reputation of the authors | the language of the publishing journal | citation count | citation performance of the cited references, reviewers' ratings of importance |
| Livne et al. (2013) | H-index, g-index | journal prestige | citations | content similarity, graph density, clustering coefficient |
| Yu et al. (2014) | the number of authors, the country of the author's institution, H-index | journal impact factor, total citations, 5-year impact factor, the cited half-life | the number of references, the reciprocal of the first-cited age of this paper | the document type |
| Singh et al. (2015) | H-index, author rank, past influence of authors, productivity, sociality, authority, versatility | journal rank, journal centrality, past influence of journals | publication count, citation count, novelty, topic rank, diversity | average countX, average citeWords |
| Robson and Mousquès (2016) | the number of authors, author name | the number of journal pages, journal prestige | the year of publication, title length, abstract length | special issue |
| Sohrabi and Iraj (2017) | the number of authors | | title length, abstract length | SCImago quartile |

In order to analyze the efficiency of multi-feature citation prediction, regression models are often used. Popular regression models for citation prediction include quantile regression (Robson and Mousquès, 2016), semi-continuous regression (Sohrabi and Iraj, 2017) and Gradient Boosted Regression Trees (GBDT) (Chen and Zhang, 2015). Generative models can also be used to predict the citations of scholarly papers (Li et al., 2015; Zhang et al., 2016). Wang et al. (2013) proposed a point process that predicts the probability of a paper being cited by identifying three fundamental mechanisms in paper impact prediction: preferential attachment (highly cited papers are more likely to be cited again), decay rate, and fitness (capturing the inherent differences between papers). To characterize the citation dynamics of scientific papers, a nonlinear stochastic model of citation dynamics based on the copying-redirection-triadic closure mechanism was reported by Golosovsky and Solomon (2017).
3. Modeling citing behavior as a point process
The American Physical Society (APS) dataset includes all papers published in 9 journals, namely Physical Review A, Physical Review B, Physical Review C, Physical Review D, Physical Review E, Physical Review I, Physical Review L, Physical Review ST and Reviews of Modern Physics, from 1970 to 2013 (http://publish.aps.org/datasets). Each record in the APS dataset includes the paper title, author names, author affiliations, date of publication, and a list of cited papers. Because the APS dataset does not provide unique author identifiers, we first perform name disambiguation based on the method proposed by Sinatra et al. (2016). Two authors are considered to be the same individual if all of the following three conditions are fulfilled: (1) the last names of the two authors are identical; (2) the first names are identical or have matching initials; (3) at least one of the following is true: the two authors cited each other at least once; the two authors share at least one co-author; the two authors share at least one similar affiliation. We select 183,336 papers published from 1978 to 1998 in the APS dataset as experimental data. Scholarly papers with at least 5 citations within the first 5 years of publication are used as the training data, and their citations in the subsequent 10 years are used as the testing data.
3.2. Prediction model
Intrinsic potential. Citations reflect the impact of a research paper, which corresponds to the authors' impact; the latter can be quantified as $Q_i$ for an author $i$ (Sinatra et al., 2016). A scholar with a high $Q_i$ is expected to publish high-impact publications. In this paper, we use the parameter $Q_i$ to indicate the intrinsic potential of a paper's impact.
Paper impact decaying over time. As the new ideas presented in each paper are developed further in follow-up studies, the novelty eventually fades away and the impact of the paper decays over time (Wang et al., 2013). Figure 1 shows the citation pattern of individual scholarly papers over time. The vertical axis is the yearly citations of 100 randomly selected scholarly papers published between 1978 and 1997 in the APS dataset. The color represents the publication year of each scholarly paper. According to Figure 1, each paper has its own inherent citation trend, and the patterns may not correlate with one another.
Figure 1: Citation pattern of individual scholarly papers over time.
Early citers' impact. Some prior studies have ignored the citers' impact on the citation dynamics (Wang et al., 2013). According to Singh et al. (2017), influential early citers might negatively affect the long-term scientific impact of papers due to attention stealing, whereas non-influential early citers could positively affect the long-term scientific impact of papers. Inspired by this idea, the early citers' impact is used in PPI to model the citation pattern of a scholarly paper.
Early citation. Based on the observation that high early citations lead to more citations in the future, we model the Paper Potential Index $\lambda_d(t)$ of a scholarly paper $d$ by extending a self-exciting Hawkes process:

$$\lambda_d(t) = \beta_d Q_{dMax}\, e^{-w_{d1} t} + \alpha_d \sum_{j:\, t_j < t} D_j\, e^{-w_{d2}(t - t_j)} \tag{1}$$

[...] 1, we maximize the probability that the $i$-th citation occurs at time $t_i$. The concept can be formulated as follows:

$$p(t_i \mid t_{i-1}) = \exp\left(-\int_{t_{i-1}}^{t_i} \lambda(t)\, dt\right) \lambda(t_i) \tag{4}$$

We then use maximum likelihood estimation to calculate the likelihood function on the citation sequence of each article, and take its logarithm:

$$\log \prod_{i=1}^{n} p(t_i \mid t_{i-1}) = \sum_{i=1}^{n} \log \lambda(t_i) - \int_{0}^{T} \lambda(t)\, dt \tag{5}$$

where $n$ is the citation count of a scholarly paper, $t_i$ is the time at which the $i$-th citation occurs, and $T$ is the period of time over which a paper is cited. The maximum of the log-likelihood function is obtained by minimizing its negative. Substituting the intensity function into Equation (5) and adding a sparse regularization term $\|\beta\|_1$, we get the objective function $L_\beta$:

$$L_\beta = -\sum_{d=1}^{N} \left\{ \sum_{i=1}^{n} \log\left( \beta s_d\, e^{-w_{d1} t_i} + \sum_{j=1}^{i-1} \alpha_d D_j\, e^{-w_{d2}(t_i - t_j)} \right) - \frac{\beta s_d}{w_{d1}} \left(1 - e^{-w_{d1} T}\right) - \frac{\alpha_d}{w_{d2}} \sum_{i=1}^{n} D_i \left(1 - e^{-w_{d2}(T - t_i)}\right) \right\} + \lambda \|\beta\|_1 \tag{6}$$

where $N$ is the number of papers in the experimental data and $s_d$ denotes the features of paper $d$. Because the regularization term makes the objective function non-differentiable, we use the Alternating Direction Method of Multipliers (ADMM) to decompose the original optimization problem into a few simpler sub-problems. By introducing the auxiliary variable $z$, the optimization problem in Equation (6) can be formulated as the following constrained optimization:

$$\min\; L + \lambda \|z\|_1 \quad \text{s.t.} \quad \beta = z \tag{7}$$

The corresponding augmented Lagrangian is:

$$L_\rho = L + \lambda \|z\|_1 + \rho\, u\,(\beta - z) + \frac{\rho}{2} \|\beta - z\|_2^2 \tag{8}$$

where $u$ is the dual variable or Lagrange multiplier, and $\rho$ is the penalty coefficient, which is usually used as the iterative step size to update the dual variable. The steps to solve the above augmented Lagrangian optimization problem using the ADMM algorithm are as follows:

$$(\beta^{l+1}, \alpha^{l+1}) = \arg\min_{\beta \ge 0,\, \alpha \ge 0} L_\rho(\beta, \alpha, z^{l}, u^{l}) \tag{9}$$

$$z^{l+1} = S_{\lambda/\rho}\left(\beta^{l+1} + u^{l}\right) \tag{10}$$

$$u^{l+1} = u^{l} + \beta^{l+1} - z^{l+1} \tag{11}$$

where $S_{\lambda/\rho}$ is the soft-thresholding function. The ADMM algorithm is similar to the dual ascent algorithm: it includes a parameter minimization step, Equation (9); an auxiliary parameter minimization step, Equation (10); and a dual parameter update step, Equation (11). In order to efficiently solve the optimization problem in Equation (9), we use the EM framework to update the parameters $\alpha$ and $\beta$. Given that the probability that feature $k$ activates event $i$ is $p_{ki}$, and the probability that event $j$ activates event $i$ is $p_{ij}$, the EM algorithm is as follows:

$$p_{ki}^{d\,(l+1)} = \frac{\beta_k s_{dk}\, e^{-w_{d1} t_i}}{\lambda(t_i)} \tag{12}$$

$$p_{ij}^{d\,(l+1)} = \frac{\alpha_d D_j\, e^{-w_{d2}(t_i - t_j)}}{\lambda(t_i)} \tag{13}$$

$$\beta_k^{l+1} = \frac{-B + \sqrt{B^2 + 4\rho \sum_{d=1}^{N} \sum_{i=1}^{n} p_{ki}^{d}}}{2\rho} \tag{14}$$

$$\alpha_d^{(l+1)} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{i-1} p_{ij}^{d}}{\sum_{i=1}^{n} D_i \left(1 - e^{-w_{d2}(T - t_i)}\right) / w_{d2}} \tag{15}$$

where $B = \sum_{d=1}^{N} s_{dk} \left(1 - e^{-w_{d1} T}\right) / w_{d1} + \rho\,(u_k - z_k)$. Equation (12) represents the probability that the value of the $k$-th feature $s_{dk}$, together with its coefficient $\beta_k$, affects the citations of the paper when the paper is cited for the $i$-th time. Equation (13) represents the probability that the $j$-th ($j < i$) citation affects the citations of a paper when it is cited for the $i$-th time. Therefore, $\sum_{d=1}^{N} \sum_{i=1}^{n} \lambda(t_i)\, p_{ki}^{d}$ indicates the expectation that the coefficient $\beta_k$ corresponding to feature $k$ affects the citations of papers over the entire data set, and $\sum_{i=1}^{n} \sum_{j=1}^{i-1} \lambda(t_i)\, p_{ij}^{d}$ indicates the expectation that the number of existing citations of the paper affects its citations. In Equation (8), we find the maximum of these two expectations and derive the partial derivatives with respect to $\alpha$ and $\beta$. Setting the partial derivatives to zero yields Equations (14) and (15). By iterating until convergence, we get the optimal values of the parameters $\alpha$ and $\beta$. After that, the new values of $\alpha$ and $\beta$ are substituted back to update $z$ and $u$ in the ADMM algorithm.
After obtaining the parameters $\alpha$ and $\beta$, the parameters $w_1$ and $w_2$ of each paper are solved by the gradient descent method. The gradients of the objective function with respect to $w_1$ and $w_2$ are as follows (the paper index $d$ is omitted):

$$\frac{\partial L_\rho}{\partial w_1} = \sum_{i=1}^{n} \frac{\beta s\, t_i\, e^{-w_1 t_i}}{\lambda(t_i)} + \frac{\beta s}{w_1^2} \left( e^{-w_1 T} + T \cdot w_1 \cdot e^{-w_1 T} - 1 \right) \tag{16}$$

$$\frac{\partial L_\rho}{\partial w_2} = \sum_{i=1}^{n} \left\{ \frac{\sum_{j=1}^{i-1} (t_i - t_j)\, \alpha D_j\, e^{-w_2 (t_i - t_j)}}{\lambda(t_i)} + \frac{\alpha D_i}{w_2^2} \left[ w_2 (T - t_i)\, e^{-w_2 (T - t_i)} + e^{-w_2 (T - t_i)} - 1 \right] \right\} \tag{17}$$

After obtaining the optimal values of all parameters $\alpha$, $\beta$, $w_1$ and $w_2$, we estimate the citations of a scholarly paper after a certain period of time by taking the integral of the intensity function $\lambda(t)$.
4. Multi-features predictive model
Author-based features.
• Author Impact Factor (AIF). Similar to the concept of journal impact factor, an author's AIF in year $T$ is the average citations of the papers published in a period of $\Delta T$ years before year $T$.
Based on the APS dataset, we compute each author's AIF value according to the author's publishing history and use the statistics of all authors' AIF of a given institution as a group of its features, including sum, maximum, minimum, median, average and deviation. We briefly explore and report the authors' AIF features in this work.
• Q value. The Q value is calculated according to Equation (2).
• H-index. A scholar has an index value of H if the scholar has H papers with at least H citations each. The H-index gives an estimate of the impact of a scholar's cumulative research contributions.
Journal-based feature. The Journal Impact Factor is a quantitative index for evaluating the impact of a journal. It is the ratio of the citations received by a journal to the number of papers published in the journal.
Citations feature. The historical citations of each paper are used to predict the impact of a paper.
In order to investigate the effect of the author-based features, the journal-based feature and the citations feature, we evaluate the importance of the features (see Table 2).
In this section, we describe the multi-feature predictive model, which integrates the author-based features, the journal-based feature and citations into gradient boosting decision trees (GBDT). The GBDT model is suited to a large number of features and to nonlinear relationships between the predictor variables and the target variable. For the multi-feature predictive model, parameter adjustment is crucial for predictive performance. The main parameters include:
(1) learning rate: the model's learning speed on the distribution characteristics of the sample, expressed as the weight of the regression tree for each iteration in the algorithm. The larger the learning rate is, the faster the algorithm converges. The smaller the learning rate is, the slower the algorithm converges, but the prediction accuracy may increase.
(2) number of iterations: the number of iterations is the number of weak learners obtained in the model.
In general, the number of iterations depends on the learning rate.
(3) minimum samples of leaf nodes: this parameter defines the condition under which a subtree continues to be divided. If the number of samples on a leaf node is smaller than the set value, the node will not be divided further.
(4) maximum depth of decision tree: this parameter controls the maximum depth of the decision tree generated in each iteration. The purpose is to prevent over-fitting.
(5) sampling rate: this parameter indicates the proportion of training samples used in each training round, and its value ranges from 0 to 1. When the value is 1, all the samples are involved in training. The main role of this parameter is to add sample perturbation to prevent over-fitting. The sampling rate is generally set between 0.5 and 0.8. If the value is too large, the risk of over-fitting increases. If the value is too small, the correct patterns may not be learned because there are too few samples, and the model's bias will increase.

Table 2: Features used in the prediction model.

| Feature | Description | Feature | Description |
|---|---|---|---|
| c1 | one-year citations | max(H-index) | maximum of H-index |
| c2 | two-year citations | min(H-index) | minimum of H-index |
| c3 | three-year citations | avg(H-index) | average of H-index |
| c4 | four-year citations | med(H-index) | median of H-index |
| c5 | five-year citations | dev(H-index) | deviation of H-index |
| sum(Q) | sum of Q value | sum(AIF) | sum of AIF |
| max(Q) | maximum of Q value | max(AIF) | maximum of AIF |
| min(Q) | minimum of Q value | min(AIF) | minimum of AIF |
| avg(Q) | average of Q value | avg(AIF) | average of AIF |
| med(Q) | median of Q value | med(AIF) | median of AIF |
| dev(Q) | deviation of Q value | dev(AIF) | deviation of AIF |
| sum(H-index) | sum of H-index | JIF | journal impact factor |

We used the grid search method to adjust the above-mentioned parameters. The learning rate ranges from 0.0005 to 0.5 with a step size of 0.0005. The number of iterations ranges from 500 to 3000 with a step size of 500.
The minimum number of samples per leaf node ranges from 10 to 80 with a step size of 10. The maximum depth of the decision tree ranges from 5 to 7 with a step size of 1. The sampling rate ranges from 0.5 to 1.0 with a step size of 0.1.
According to the range of values and the step size of each parameter, a grid covering the parameter space is generated for grid search. Each point on the grid is traversed, and the parameter combination corresponding to the point is used to train the model on the training set. Prediction is then performed on the validation set, and the predictive accuracy is calculated as an estimate of the prediction performance of the model under that set of parameters. After traversing all the parameter combinations, the set of parameters with the highest prediction accuracy on the validation set is taken as the parameters of the final model.
5. Results and discussion
In this subsection, we introduce several evaluation metrics for validating the PPI prediction model.
Mean absolute error (MAE). MAE quantifies how close the predictions are to the ground truth. MAE is given by:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i| \tag{18}$$

The mean absolute error is the average of the absolute errors $|e_i| = |f_i - y_i|$, where $f_i$ is the prediction, $y_i$ is the true value, and $n$ is the number of predictions.
Root mean squared error (RMSE). RMSE is similar to MAE and is defined as follows:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2} \tag{19}$$

RMSE also provides an average error and quantifies the overall error rate. In some cases, we need to compare results across activities, but RMSE cannot give an indication of the relative error. We therefore need a normalized error, such as the range-normalized RMSE.
Range-normalized RMSE (NRMSE).

$$NRMSE = \frac{RMSE}{\max(y_i) - \min(y_i)} \tag{20}$$

where $\max(y_i)$ and $\min(y_i)$ are the maximum and minimum over all ground-truth values of the test instances.
Mean absolute percentage error (MAPE). A useful normalized metric is MAPE, which normalizes the error of each prediction by its true value. This metric shows the average relative deviation between the predicted output and the true output over the $n$ experimental data points. MAPE is defined as follows:

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{|e_i|}{y_i} \tag{21}$$

Accuracy. Accuracy shows the fraction of papers correctly predicted for a given error tolerance $\epsilon$:

$$Accuracy = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[ \frac{|e_i|}{y_i} \le \epsilon \right] \tag{22}$$

where $\mathbf{1}[\cdot]$ equals 1 if the condition holds and 0 otherwise.
Figure 2 shows the feature importance scores of all features for predicting the 15th-year citations of the published papers. The features c[...], c[...], c[...], c[...] [...] Q value, JIF, authors' Q value's sum, respectively; their values are 0.0608, 0.0527, 0.0441, 0.0338, 0.0331, and 0.0317. The feature importance scores for predicting the 6th-14th year citations of the published papers are shown in the appendix.
Figure 2: Feature importance score of all features.
Based on the importance of all features in predicting the 6th-15th year citations of papers, we selected the top 10 features and retrained the model; the prediction accuracy remains high. The feature importance scores differ across predictive years (see appendix). Figure 3 shows the top 10 feature importance scores for predicting the 15th-year citations of the published papers. The features c[...], c[...] [...] Q value's minimum ranks fourth, and its value is 0.0969. The other feature importance scores are less than 0.0950.
Figure 3: Importance scores of top 10 features.
[...] Q value, and JIF are ranked in the top 10 of the feature importance ranking. The features related to author H-index and author AIF rank near the bottom of the feature importance ranking.
In summary, we observe that historical citations play an important role in predicting the impact of a paper. Besides, author-based features are important in predicting paper impact, especially the authors' Q value.
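The five evaluation metrics of Equations (18) to (22) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the evaluation code used in the experiments: the function names and the sample numbers are hypothetical, and the sketch assumes that every true value $y_i$ is positive (needed for MAPE and Accuracy) and that the true values are not all equal (needed for NRMSE).

```python
import math


def absolute_errors(predicted, actual):
    """Return |e_i| = |f_i - y_i| for each prediction / ground-truth pair."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return [abs(f - y) for f, y in zip(predicted, actual)]


def mae(predicted, actual):
    """Mean absolute error, Equation (18)."""
    errors = absolute_errors(predicted, actual)
    return sum(errors) / len(errors)


def rmse(predicted, actual):
    """Root mean squared error, Equation (19)."""
    errors = absolute_errors(predicted, actual)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def nrmse(predicted, actual):
    """RMSE divided by the range of the true values, Equation (20)."""
    # Assumption: max(actual) != min(actual), otherwise the range is zero.
    return rmse(predicted, actual) / (max(actual) - min(actual))


def mape(predicted, actual):
    """Mean absolute percentage error, Equation (21)."""
    # Assumption: every true value y_i is strictly positive.
    errors = absolute_errors(predicted, actual)
    return sum(e / y for e, y in zip(errors, actual)) / len(errors)


def accuracy(predicted, actual, eps):
    """Fraction of predictions with relative error at most eps, Equation (22)."""
    errors = absolute_errors(predicted, actual)
    hits = sum(1 for e, y in zip(errors, actual) if e / y <= eps)
    return hits / len(errors)


if __name__ == "__main__":
    # Hypothetical citation counts, for illustration only.
    predicted = [12, 8, 30, 21]
    actual = [10, 10, 25, 20]
    print("MAE:", mae(predicted, actual))
    print("RMSE:", rmse(predicted, actual))
    print("NRMSE:", nrmse(predicted, actual))
    print("MAPE:", mape(predicted, actual))
    print("Accuracy (eps=0.1):", accuracy(predicted, actual, 0.1))
```

Because MAE and RMSE are expressed in raw citation counts while NRMSE, MAPE and Accuracy are relative measures, the same model can rank differently under the two groups of metrics.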
To test the validity of the PPI prediction model, its predictive performance is compared against four competing models: PPI_NECAI, GBDT_All, GBDT_10 and PLI_Science, published by Wang et al. (2013). The comparison is made in terms of MAE, RMSE, NRMSE, MAPE, and accuracy.
Figure 4 shows the MAE values of the five models. According to Figure 4, PPI outperforms all competing models, with lower MAE values for predicting citations after a scholarly paper has been published for 5 years. We also observe that the MAE values of all five models increase with the year, indicating that the predictive performance of all five models degrades over time.
Figure 4: Comparing MAE for different models. (MAE versus time for PPI, PPI_NECAI, GBDT_All, GBDT_10 and PLI_Science.)
Figure 5: Comparing RMSE for different models. (RMSE versus time for PPI, PPI_NECAI, GBDT_All, GBDT_10 and PLI_Science.)
Figure 6 shows the NRMSE values of the five models. For the PPI model and the PPI_NECAI model, the NRMSE values are about 0.006. The NRMSE values of the GBDT_All model and the GBDT_10 model show increasing trends, reaching about 0.018 in the 10th year after the fifth year of publication. The NRMSE values of the PLI_Science model show a decaying trend. In terms of NRMSE, the predictive performance of the PPI model is better than that of the other four models.
Figure 7 shows the MAPE values of the five models. We observe that the MAPE values of the GBDT_All model and the GBDT_10 model are below those of the other three models. The MAPE value of the PPI model is slightly higher than that of the GBDT_All model and the GBDT_10 model.
Figure 8 shows the accuracy of the five models. The accuracy values of [...]
Figure 6: Comparing NRMSE for different models. (NRMSE versus time for PPI, PPI_NECAI, GBDT_All, GBDT_10 and PLI_Science.)
Figure 7: Comparing MAPE for different models. (MAPE versus time for PPI, PPI_NECAI, GBDT_All, GBDT_10 and PLI_Science.)
Figure 8: Comparing Accuracy for different models. (Accuracy versus time for PPI, PPI_NECAI, GBDT_All, GBDT_10 and PLI_Science.)
By comparing PPI and PPI_NECAI, we observe that early citing authors' impact contributes to improved prediction of scholarly paper impact. PPI yields superior citation prediction over PPI_NECAI, GBDT_All, GBDT_10 and PLI_Science in terms of MAE and NRMSE. Although the predictive performance of the GBDT_All model and the GBDT_10 model is better than that of the other three models in terms of MAPE and accuracy, the proposed PPI prediction model gives a clear explanation of its predictive effect through the following factors: inherent quality of the scholarly paper, scholarly paper impact decaying over time, early citations, and early citers' impact. Compared to PPI_NECAI and PLI_Science, PPI predicts scholarly paper impact more accurately. Although considering early citers' impact can improve the predictive performance of the PPI model, other factors exist, such as the author's team impact, journal impact, authors' cooperation relationships, and disciplinary differences. In addition, the APS dataset only contains local citations, which might limit the predictive accuracy of this work. Uncovering the essence of the paper potential index is promising future work, which might improve the predictive performance of the PPI model and could provide a better understanding of the evolution of scholarly paper impact.
6. Conclusion
Based on a point process, we present the PPI predictive model, which considers the following four factors: (1) inherent quality of the scholarly paper; (2) scholarly paper impact decaying over time; (3) early citations; and (4) early citers' impact. Experimental results indicate that the PPI model improves citation prediction of scholarly papers. The predictive performance of PPI is better than that of PPI_NECAI, which reflects that early citing authors' impact is important for predicting the citations of scholarly papers.
Although the predictive performance of the GBDT_All model and the GBDT_10 model is better than that of the other three models in terms of MAPE and accuracy, the proposed PPI predictive model gives a clear explanation of its predictive effect, indicating that an ultimate understanding of the long-term impact of scholarly papers will benefit from understanding the inherent evolutionary mechanism of citations of scholarly papers.
Acknowledgement
We thank Feng Xia and Jie Hou from the School of Software, Dalian University of Technology, for valuable discussions on this work. This work was partially supported by the Liaoning Provincial Key R&D Guidance Project (2018104021) and the Liaoning Provincial Natural Fund Guidance Plan (20180550011).