Author Impact: Evaluations, Predictions, and Challenges
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI
FULI ZHANG, XIAOMEI BAI, IVAN LEE
Library, Anshan Normal University, Anshan 114007, China
Computing Center, Anshan Normal University, Anshan 114007, China
School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide SA 5001, Australia
Corresponding author: Xiaomei Bai (e-mail: [email protected]). This work was partially supported by the Liaoning Provincial Key R&D Guidance Project (2018104021) and the Liaoning Provincial Natural Fund Guidance Plan (20180550011).
ABSTRACT
Author impact evaluation and prediction play a key role in determining rewards, funding, and promotion. In this paper, we first introduce the background of author impact evaluation and prediction. Then, we review recent developments in author impact evaluation, including data collection, data pre-processing, data analysis, feature selection, algorithm design, and algorithm evaluation. Thirdly, we provide an in-depth literature review on author impact predictive models and common evaluation metrics. Finally, we look into representative research issues, including author impact inflation, unified evaluation standards, the academic success gene, identification of the origins of hot streaks, and higher-order academic network analysis. This paper should help researchers obtain a broader understanding of author impact evaluation and prediction, and provides future research directions.
INDEX TERMS
Author impact, author impact evaluation, author impact prediction.
I. INTRODUCTION

Big scholarly data has grown exponentially, in line with the expansion of academic activities and productivity; however, it has also brought unprecedented challenges [1]. For example, it is difficult to identify the most relevant research work or scholars from a vast amount of scholarly data through a simple search. In addition, decision makers who allocate research funds need more information, so that the research evaluation system not only reflects past performance but also predicts potential research productivity [2]–[4]. Therefore, author impact evaluation and prediction are of great significance. On one hand, they make it possible to distinguish authors' impact and to assist researchers, especially beginners, in exploring a new research field. On the other hand, author impact evaluation supports rewards, funding, and promotion decisions to a certain extent.

The past few decades have witnessed progress in author impact evaluation and prediction, including several shifts in research focus: (1) from past performance analysis to future prediction of author impact; (2) from simple citation analysis to complex citation analysis; (3) from unstructured metrics to structured metrics; (4) from single-dimensional to multi-dimensional evaluation methods. To quantify scholarly impact, citation counting has been the most widely used technique [5]–[7]. A large number of citation-based indicators have been proposed, such as the h-index and its variants [8]–[11]. However, measuring author impact along a single dimension cannot keep pace with the rapid development of big scholarly data [12]. The emergence of academic media platforms and the evolution of social network relationships have challenged the evaluation and prediction of author impact [13]. Structured evaluation based on citations has become a popular method for quantifying author impact in recent years [14]–[16].
This method evaluates author impact mainly from the perspective of scholarly network structure. The advantage of the network-based structured evaluation method is that it can use rich scholarly data and relationships in the academic community rather than relying solely on citation relationships.

As an alternative to structured evaluation, model-based methods have also been introduced for author impact prediction [17], [18]. Sinatra et al. [17] introduced a stochastic model that assigns a unique parameter Q to each individual author to accurately predict the evolution of the author's impact. The Q model mainly considers the effects of productivity, individual ability, and luck, forming a generalised pattern of scientific success.

Although researchers have delivered various achievements in author impact evaluation and prediction, many challenging problems remain unresolved [19]–[23]. The heterogeneous attributes and the dynamic nature of big scholarly data lead to highly diversified scholarly networks, which raises challenges in exploring the relationships between authors and other scholarly entities. In most current author impact prediction models, the mining of implicit features and implicit relationships needs to be further improved; that is, the factors that can influence the success of scholars need to be explored in depth. Achieving this would make it possible to discover academic rising stars more accurately and to evaluate and predict author impact more reasonably.
Deep citation behavior analysis is another challenging issue in existing structured author impact evaluation and prediction research. Authors' citation behavior is complex and diverse, and it is necessary to fully explore the hidden relationships in scholarly networks and fine-tune the evaluation and prediction models.

This paper presents a review of recent developments in author impact evaluation and prediction, and complements relevant earlier work. Waltman et al. [24] offer a review of the literature on citation impact indicators, covering data sets, basic citation impact indicators, normalization, counting methods, journal citation impact indicators, and recommendations for future research. That overview has a broader scope than ours, but it mainly covers basic indicators such as citations, the number of highly cited publications, and the h-index. Wildgaard et al. [25] present a review of author impact evaluation; one limitation of their review is that it does not consider author impact prediction research. In this paper, the progress of both author impact evaluation and prediction is described in detail.

The rest of this paper is organized as follows. First, author impact evaluation methods are presented in Section 2. Next, a review of the literature on author impact prediction methods is provided in Section 3. Open issues and challenges are then discussed in Section 4. Finally, we conclude this paper in Section 5.

II. AUTHOR IMPACT EVALUATION
Author impact research mainly addresses two related issues: (1) evaluating the past impact of authors; and (2) predicting their future impact. Author impact evaluation includes the following parts: data collection, data pre-processing, data analysis, feature selection, algorithm design, and algorithm evaluation, as shown in Figure 1.
A. DATA SOURCES
Web of Science, Scopus, and Google Scholar are frequently used for author impact evaluation. Web of Science and Scopus are subscription-based databases. In addition to covering journals and book series, Web of Science also offers a conference proceedings citation index [24]. More scholarly resources can be retrieved through Google Scholar, including metadata of scholarly papers, conference proceedings, books, theses, patents, and technical reports.

In addition to proprietary data sets, several publicly accessible data sets are available, including the American Physical Society (APS), the Digital Bibliography & Library Project (DBLP), and the Microsoft Academic Graph (MAG). One advantage of APS is that it provides citation records as part of its data set. DBLP distinguishes different authors based on names, but it does not provide citation records. In comparison, MAG offers heterogeneous information with publication records, authors, institutions, journals, conferences, fields of study, and citation relationships.

Apart from accessing meta records made available in proprietary or publicly accessible data sets, another approach is to crawl online social data such as downloads, mentions, tweets, shares, views, discussions, saves, and bookmarks for author impact evaluation [26].

B. DATA PRE-PROCESSING
Data pre-processing is crucial for author impact evaluation, as it significantly affects accuracy. Upon obtaining the authors' raw data, a few questions need to be considered: (1) How can authors be accurately differentiated based on names and affiliations? (2) How should authors associated with multiple affiliations be handled? (3) How should individual author contributions be weighted in jointly published papers?

In practice, different pre-processing techniques are applied depending on the evaluation objective. For example, in author impact evaluation and prediction research, author name disambiguation is necessary to distinguish authors with the same full name in some datasets [27], such as the APS dataset, which is commonly used for scholarly data analysis in the Physics discipline.
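A naive disambiguation pass along these lines can be sketched as follows. The record layout and the rule of matching on normalized name plus affiliation are illustrative assumptions for this sketch, not the procedure used by any particular dataset, which typically requires far more sophisticated heuristics:

```python
from collections import defaultdict

def disambiguate(records):
    """Naive author disambiguation: treat two records as the same author
    only if the normalized full name AND the affiliation match.
    `records` is a list of (name, affiliation) tuples; returns a dict
    mapping a (name, affiliation) key to the matching record indices."""
    groups = defaultdict(list)
    for idx, (name, affiliation) in enumerate(records):
        # Normalize: lowercase and collapse runs of whitespace.
        key = (" ".join(name.lower().split()), affiliation.lower().strip())
        groups[key].append(idx)
    return dict(groups)

# Hypothetical records: the first two should collapse into one author,
# while the third is kept separate because the affiliation differs.
records = [
    ("Wei Zhang", "Anshan Normal University"),
    ("wei  zhang", "Anshan Normal University"),
    ("Wei Zhang", "University of South Australia"),
]
groups = disambiguate(records)
```

Real systems would additionally compare co-authors, venues, and topics before merging two records, since a shared name and affiliation does not guarantee the same person.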
C. DATA ANALYSIS
In author impact evaluation, scholarly data analysis can be divided into two categories: statistical analysis and scholarly network analysis. Statistical analysis can reveal the scientific knowledge hidden behind big scholarly data [1].

The heterogeneity and diversity of scholarly network structures have raised challenges in scholarly network analysis. In recent decades, researchers have made important progress in network analysis research, such as the structural hole theory [28]. The structural hole theory has been applied to academic networks to evaluate author impact [29], [30]; the experimental results indicate that structural holes are closely related to individual scholars' success. Social networks connecting authors and co-authorship networks have attracted increasing attention for author impact prediction. Zhou et al. [31] propose a co-ranking method to evaluate authors and their publications based on three scholarly networks: authors' social network, citation network, and co-authorship network.

(Data set URLs: APS, http://publish.aps.org; DBLP, https://dblp.uni-trier.de/; MAG, http://aka.ms/academicgraph)

FIGURE 1: Framework of author impact evaluation (data sets; data pre-processing; analyzed relationships; evaluation methods; evaluation indices).
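The structural-hole intuition can be made concrete with Burt's constraint measure, which is available in networkx; the toy co-authorship graph below is a hypothetical illustration, not data from the cited studies:

```python
import networkx as nx

# Hypothetical co-authorship network: author "A" bridges two otherwise
# disconnected collaboration groups, so A occupies a structural hole.
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"),   # group 1: closed triangle
              ("A", "X"), ("A", "Y"), ("X", "Y")])  # group 2: closed triangle

# Burt's constraint: LOWER values indicate more brokerage opportunities.
constraint = nx.constraint(G)
# Effective size: LARGER values indicate a less redundant neighborhood.
effective_size = nx.effective_size(G)
```

Under structural hole theory, the broker A shows lower constraint and a larger effective size than authors embedded within a single closed group, which is the property the cited evaluation methods exploit.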
D. FEATURE SELECTION
Early studies in author impact evaluation mainly consider two quantitative features: citation counts and paper counts. More recent research, inspired by the PageRank algorithm, uses the structural features of scholarly networks to assess author impact [32]–[35]. Social network measures such as degree centrality, closeness centrality, betweenness centrality, and PageRank are also frequently used to assess author impact [14], [19], [32], [34], [36]–[40]. In addition, tweets have been used to quantify author impact [41]. Table 1 shows an example of selected features for evaluating author impact.
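These network features can be extracted with networkx; the five-author citation graph below is a made-up toy example for illustration:

```python
import networkx as nx

# Toy directed citation network among five authors: an edge u -> v
# means author u cites author v.
cites = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("e", "c"), ("c", "b")]
G = nx.DiGraph(cites)

# The centrality measures commonly used as author-impact features.
features = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "pagerank": nx.pagerank(G, alpha=0.85),
}
```

In this toy graph, author "c" receives citations from four others and therefore accumulates the highest PageRank score, matching the intuition that frequently cited authors rank highly.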
E. AUTHOR IMPACT EVALUATION
1) Citations-based evaluation
The most representative author impact evaluation method is the h-index, defined as follows: “a scientist has index h if h of his or her N_p papers have at least h citations each and the other (N_p − h) papers have ≤ h citations each” [8]. While the simplicity of the h-index may be the reason for its popularity [51], researchers have pointed out its drawbacks. For instance, the h-index is a cumulative measure and cannot fall, so it fails to reflect the reduced impact of scholars who become inactive in research. Also, the h-index does not by default differentiate between citation types, such as self-citations, so the measure may not reflect true impact and is subject to different degrees of manipulation [52].

Given the limitations of the h-index, a large number of variants have emerged to address its shortcomings [11], [53]–[58]. Table 2 compares the h-index and its representative variants. Egghe [9] proposes the g-index, which keeps the advantages of the h-index while also measuring the global citation performance of an author: with an author's papers ranked in descending order of citations, the g-index is the largest number g such that the top g papers together received at least g² citations. Because the g-index gives more weight to the citations of an author's top papers, it is easily inflated by highly cited papers. For example, suppose an author publishes several papers, one of which is very highly cited while each of the others receives at most one citation; the author's h-index is 1, yet the g-index can be large. To overcome this shortcoming of the g-index, Alonso et al. [59] quantify author impact with their hg-index (see the formula in Table 2), which keeps the advantages of the h-index and g-index while minimizing their disadvantages. For the above example, the hg-index is the geometric mean √(h · g), which tempers the inflated g-index.
Comparing the hg-index with the g-index and h-index, its advantages are as follows: (1) the hg-index weakens the impact of highly cited papers; (2) the hg-index mitigates the shortcoming of the h-index. For the above example, if the w-index [60] is used instead, the author's w-index equals the h-index, so the w-index does not resolve the h-index's shortcoming in this case. Because the h-index ignores excess citations, Zhang [11] defines the e-index, where e denotes the excess citations received by all publications in the h-core. The e-index is a necessary complement to the h-index, especially for assessing highly cited scholars. Further, to overcome the limitations of the h-index and e-index, Bihari et al. [61] propose the EM-index, an extension of both. The EM-index is a more fine-grained indicator than the h-index, g-index, and e-index. However, the EM-index does not consider the citations of all publications and is therefore best suited to evaluating highly cited papers. To overcome this limitation, a multidimensional extension of the EM-index called the EM'-index is proposed [61]. Subsequently, to overcome the limitations of year-based indices, Bihari [62] defines the year-based EM-index and the year-based EM'-index, which consider three parameters: the total number of papers, the yearly citations of each paper, and the citations obtained in a particular year.

In addition, Egghe et al. [54] propose a weighted h-index, named the h_w-index, which depends on the citations obtained by papers belonging to the h-core, with h-index ≤ h_w-index < g-index. Würtz et al. [63] propose the stratified h-index, which supplements the conventional h-index with three separate h-indices: first authorships, second authorships, and last authorships. Other indices, such as the multiple h-index [64], rp-index and cp-index [65], b-index [66], q-index [67], year-based h-type indicators [68], pure h-index [69], Wl-index [70], R- and AR-indices [71], π-index [72], and h_m-index [73], are also used for author impact evaluation.

TABLE 1: An example of selected features for evaluating author impact.

citations (statistical feature): [17], [34], [40], [42]–[46]
maximum entropy (statistical feature): [47]
time (statistical feature): [37]
number of papers published, number of co-authors (statistical features): [40], [47]
number of authors, number of papers in a certain journal (statistical features): [45]
active years; citing times; citing times per paper and per year; cited times of papers the author cites; citing times of papers the author cites; cited times of co-authors (statistical features): [47]
references cited by the author before and their ratio; references in the author's previous publications and their ratio; keywords; times the author attended the venue (statistical features): [48]
number of unique publication venues; number of paper-paper citation edges; number of coauthorship edges; number of author-citation edges (statistical features): [16]
tweets (statistical feature): [41]
bridge counts, betweenness, diversity of cooperators (statistical features): [30]
degree, betweenness, closeness (network features): [34]
PageRank (network feature): [14], [19], [32], [34], [37]–[40]
network index (NI), PageRank, HITS (network features): [30]
researcher importance (network feature): [44]
Eigenfactor scores (network feature): [49]
paper authority vector (network feature): [50]
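The indices discussed above can be computed directly from a list of per-paper citation counts. This is a minimal sketch of the published definitions; note that the simple g-index variant used here caps g at the number of papers rather than allowing fictitious ones:

```python
import math

def h_index(citations):
    """h: the largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

def g_index(citations):
    """g: the largest g such that the top g papers together have >= g^2 citations."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cites, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def hg_index(citations):
    """hg = sqrt(h * g): geometric mean balancing the two indices."""
    return math.sqrt(h_index(citations) * g_index(citations))

def e_index(citations):
    """e = sqrt(excess citations of the h-core papers beyond h each)."""
    h = h_index(citations)
    cites = sorted(citations, reverse=True)
    return math.sqrt(sum(c - h for c in cites[:h]))
```

For the single-highly-cited-paper scenario in the text, `h_index([50, 1, 1, 1])` stays at 1 while `g_index([50, 1, 1, 1])` climbs to 4, illustrating the g-index's sensitivity to one very successful paper.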
2) Network-based evaluation
Because citations can be easily manipulated, citation-based indices may not objectively evaluate the actual impact of authors. Instead of citations, scholarly networks can be used, and network-based methods have been investigated as alternatives for author impact evaluation.

The exponential growth of academic data offers unprecedented opportunities to explore patterns characterizing the structure of scholarly networks and the evolution of science [74]. To demonstrate these academic relationships (see Figure 2), we randomly selected 10 authors from the computer science area in the MAG dataset and constructed eight typical networks based on the papers they published, journals or conferences, and institutions. These scholarly networks include the citation network, co-author network, author-paper network, author-journal network, author-institution network, author-conference network, paper-journal network, and paper-conference network. In Figure 2, nodes of different colors represent different types of academic entities, and the lines between them represent scholarly relationships. Because of the shortcomings of citation-based author impact evaluation, summarized in Table 2, researchers also measure author impact using scholarly networks. By exploring quantitative methods, from statistics to network science approaches, machine learning algorithms, and mathematical analysis, scientists have developed structural author impact evaluation methods based on scholarly networks (see Table 3).

Table 3 compares the different author impact evaluation methods along eight aspects: method and reference, scholarly network, homogeneous relationships, heterogeneous relationships, data sets, comparison algorithms, evaluation metrics, and performance. Ding et al. [34] introduce the PageRank algorithm to academic networks to evaluate author impact. During this period, researchers mainly leveraged homogeneous networks for evaluation.
Based on PageRank, Pradhan et al. [75] propose the C-index, which ranks authors using weighted multi-layered scholarly networks, including the author-author citation network, the author-author co-authorship network, and the paper-paper citation network. The C-index score is obtained by combining three individual component scores from the three layers; the component score for each layer is in fact its PageRank score.

Recently, author impact evaluation has received wide attention, especially on heterogeneous networks with multiple types of nodes and relationships. Liu et al. [76] propose a graph-based ranking framework, Tri-Rank, to co-rank authors, scholarly papers, and venues simultaneously in heterogeneous scholarly networks. Their experimental results show that Tri-Rank with heterogeneous networks is more effective and efficient than PageRank [77], HITS [78], and Co-Rank [31] in ranking authors. However, in these studies, all citations are treated as equally important. To automatically identify how the references in a bibliography affect the citing paper, Zhu et al. [79] examine the effectiveness of several features in determining the academic influence of a citation. Furthermore, researchers consider weighted citation networks to measure
scholar impact. Nykl et al. [39] use the h-index, the number of papers, citations, journal impact values, and the author count of a paper as citation weights in citation networks, and then apply the PageRank algorithm to the weighted networks. Their experimental results indicate that using journal impact values in PageRank can improve author ranking. Li et al. [80] propose a network-based, multi-parameter model for finding influential authors. The idea stems from the fact that the authority of scholarly networks changes as nodes are removed. Author i's prestige in academic networks is defined as

p_i(g) = \alpha_i \cdot \sum_{j=1}^{n} b_{ij}(g, \beta)    (1)

where b_{ij}(g, \beta) is the element of matrix B(g, \beta) at row i and column j, and the parameter \alpha represents the base value of a node.

TABLE 2: Comparing the h-index and its representative variants.

h-index [8]. Formula: h = \frac{c}{1 + c/p} \cdot n, where c is the number of new citations each paper receives per subsequent year, p is the number of papers the researcher publishes per year, and n is the number of years. Advantages: easily computable; gives an estimate of the importance and significance of an author's publications. Disadvantages: influenced by self-citations; never decreases; completely ignores excess citations; does not quantify co-authors' contributions.

g-index [9]. Formula: g = \left(\frac{\alpha - 1}{\alpha - 2}\right)^{\frac{\alpha - 1}{\alpha}} \cdot h, where \alpha is a parameter and h is the h-index. Advantages: as simple as the h-index; better takes into account the citation scores of the top articles. Disadvantage: may be greatly influenced by a single very successful paper.

hg-index [59]. Formula: hg = \sqrt{h \cdot g}. Advantages: simple to compute; provides more granularity than the h- and g-indices; takes the citations of highly cited papers into account while significantly reducing the impact of a single very highly cited paper. Disadvantage: influenced by self-citations.

w-index [60]. Formula: the largest w such that cit_p \ge w - p + 1 for all p \le w, where cit_p is the citation count of the p-th paper in descending order. Advantages: as simple as the h-index; corresponds to the largest isosceles right triangle under the citation curve. Disadvantage: influenced by self-citations.

h_w-index [54]. Formula: h_w = \sqrt{\sum_{j=1}^{k} cit_j}, where cit_j are citation counts and k is the largest qualifying row index; depends on the citations obtained by papers belonging to the h-core. Disadvantages: influenced by self-citations; does not consider the excess citation count.

e-index [11]. Formula: e = \sqrt{\sum_{p=1}^{h} (cit_p - h)}, where h is the h-index and cit_p is the citation count of the p-th paper. Advantage: considers excess citations, which is especially useful for evaluating highly cited scientists. Disadvantage: does not consider the core citation count.

EM-index [61]. Formula: EM = \sqrt{\sum_{e=1}^{k} E_e}, where E_1 = h and subsequent components are obtained by computing the h-index from the excess citations of the h-core papers. Advantage: considers both the core citation count and the excess citation count. Disadvantage: does not consider the citations of all publications.

EM'-index [61]. Formula: EM' = \sqrt{\sum_{e=1}^{k} E'_e}, where E' is the k-dimensional vector containing the citations of all papers cited at least once. Advantage: a multidimensional extension of the EM-index that considers all papers. Disadvantage: influenced by self-citations.

Year-based EM-index [62]. Formula: Y\_EM = \sqrt{\sum_{e=1}^{k} YE_e}, the square root of the sum of the components of the year-based EM-index. Advantage: an extension of the year-based h-indices. Disadvantage: does not consider the citations of all publications.

Year-based EM'-index [62]. Formula: Y\_EM' = \sqrt{\sum_{e=1}^{k} YE'_e}, the square root of the sum of the components of the year-based EM'-index. Advantage: considers all items that occur at least once. Disadvantage: influenced by self-citations.
The parameter \beta captures the value of being connected to a given node, which decays with distance.

Citation networks evolve over time; thus, time is an important feature for quantifying scholarly or institutional impact [81]. Wang et al. [82] define a time-aware edge-weighting strategy for evaluating scholarly impact. In practice, they find that older publications obtain more accurate predictions than recent ones. They therefore give the edges associated with older authority publications higher weights, because the scores of these publications are more reliable than those of new publications. In their research, the hub score of an author is obtained by

H(A_i) = \frac{\sum_{P_j \in Neighbor(A_i)} W_{ap}(i, j) \cdot S(P_j)}{\sum_{P_j \in Neighbor(A_i)} W_{ap}(i, j)}    (2)

FIGURE 2: Eight typical scholarly networks - an example of 10 randomly-selected computer science authors from the MAG data set.

where
Neighbor(A_i) is the collection of papers neighboring author A_i, S(P_j) is the score of scholarly paper P_j, and W_{ap}(i, j) is the weight of the edge from author A_i to paper P_j. The weight W_{ap}(i, j) is calculated as

W_{ap}(i, j) = a^{T_{current} - T_j}    (3)

where T_{current} - T_j indicates the age (in years) of paper P_j since it was published, and a is a constant greater than 1; in their experiments, a is set to 2. In addition, a temporal citation network among scholars is used by Franceschet et al. [83], who allocate ratings by considering the relative position of two authors at the time of the citation; they name this dynamic rating method TimeRank. The difference between TimeRank and PageRank is that PageRank uses the citing author's absolute rating, while TimeRank uses the citing author's relative rating. It is worth noting that PageRank uses the citing author's rating at the end of the temporal citation network, whereas TimeRank uses the ratings of the citing and cited scholars at the actual time of citation.

Apart from the time factor, scholars' positions in academic networks can also be used to assess scholar impact. Zhang et al. [30] first define the scholar's structural index (SI) to capture the effect of scholars' positions in scholarly networks. They then use the PageRank and HITS algorithms together to obtain the scholar's network index (NI). Finally, based on the values of SI and NI, they calculate the scholar's final score. In their research, to determine scholars' positions in the scholarly network, they apply the structural hole theory, which indicates that scholars linking different disciplines have more influence.

The evaluation models described above suffer from one limitation: they are usually based on scholarly network structure, ignoring content semantics. To address this limitation, Zhang et al. [48] propose a task-guided and semantic-aware ranking model.
The ranking model performs joint optimization of GRU-based content encoding and task-guided ranking. Their experimental results demonstrate that the performance of TSR+ is better than a number of baselines.
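Equations (2) and (3) can be sketched as follows; the list layout for papers and the choice a = 2 follow the description above, while the sample years and scores are invented for illustration:

```python
def edge_weight(current_year, pub_year, a=2.0):
    """Time-aware weight of an author->paper edge (Eq. 3): older papers
    get exponentially larger weights because their scores are
    considered more reliable."""
    return a ** (current_year - pub_year)

def hub_score(papers, current_year, a=2.0):
    """Hub score of an author (Eq. 2): weighted average of the scores
    of the author's neighboring papers. `papers` is a list of
    (publication_year, paper_score) pairs."""
    weights = [edge_weight(current_year, year, a) for year, _ in papers]
    weighted_sum = sum(w * score for w, (_, score) in zip(weights, papers))
    return weighted_sum / sum(weights)

# An old, reliable paper dominates a recent one in the weighted average.
score = hub_score([(2010, 1.0), (2018, 0.0)], current_year=2020)
```

Because the weights grow exponentially with paper age, the hub score here is pulled almost entirely toward the older paper's score, matching the intuition that established publications carry more reliable signals.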
F. EVALUATION METRICS
Two popular metrics, Precision and Recall, are usually used to evaluate the performance of author impact methods. Precision measures the accuracy of the top-k authors returned by a method and is calculated as Precision = TP / (TP + FP), where TP (True Positive) is the number of truly relevant authors correctly included in the top-k list and FP (False Positive) is the number of authors wrongly included. Recall reflects the fraction of truly relevant authors that are returned in the top-k list; it is defined as Recall = TP / (TP + FN), where FN (False Negative) is the number of relevant authors wrongly excluded. In addition, Spearman's rank correlation coefficient and Discounted Cumulative Gain can be used to evaluate author impact.
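A minimal sketch of Precision@k and Recall@k for a ranked author list follows; the author IDs are hypothetical placeholders:

```python
def precision_recall_at_k(ranked_authors, relevant_authors, k):
    """Precision@k and Recall@k for a top-k author ranking.
    TP: relevant authors present in the top-k list;
    FP: authors in the top-k that are not relevant;
    FN: relevant authors missing from the top-k."""
    top_k = set(ranked_authors[:k])
    relevant = set(relevant_authors)
    tp = len(top_k & relevant)
    precision = tp / k
    recall = tp / len(relevant)
    return precision, recall

# Hypothetical ranking and ground-truth set of influential authors.
p, r = precision_recall_at_k(["a", "b", "c", "d"], ["a", "c", "e"], k=3)
```

With the top-3 list {a, b, c} and ground truth {a, c, e}, two of the three returned authors are relevant (precision 2/3) and two of the three relevant authors are found (recall 2/3).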
Spearman's rank correlation coefficient:

\rho = \frac{\sum_i (R_1(A_i) - \bar{R}_1)(R_2(A_i) - \bar{R}_2)}{\sqrt{\sum_i (R_1(A_i) - \bar{R}_1)^2 \sum_i (R_2(A_i) - \bar{R}_2)^2}}    (4)

where R_1(A_i) and R_2(A_i) are the positions of author A_i in the ground-truth rank list and the corresponding algorithm's rank list, respectively, and \bar{R}_1 and \bar{R}_2 are the average rank positions of all authors in the two lists.

Discounted Cumulative Gain (DCG):

DCG_n = \sum_{i=1}^{n} \frac{rel_i}{\log_2(i + 1)},    (5)

where DCG_n is the weighted sum of the relevance of the ranked authors, with a weight that decreases with rank position; i is the rank of an author and rel_i is the relevance score of the i-th ranked author.

III. AUTHOR IMPACT PREDICTION
In the previous section, we discussed author impact evaluation methods and common evaluation metrics. In this section, we focus on author impact prediction models and common evaluation indices. Author impact predictive models can be roughly divided into three categories: feature-driven predictive models, network-based predictive models, and generative predictive models. The framework of author impact prediction includes input data, the predictive model, and output results, as shown in Figure 3.
A. AUTHOR IMPACT PREDICTION MODEL
1) Feature-driven predictive model
To examine how one's h-index will evolve over time, feature-driven predictive models based on the following features have been studied: author features (author influence, number of co-authors, first author's h-index, average h-index of all authors), paper features (citations, average citations, topic novelty, topic diversity), social features (PageRank score, weighted average h-index of co-authors, co-authors' citations), and other features (venue citations, venue count, venue specificity score) [87]–[92]. Several representative examples of author impact prediction using mixtures of features are summarized in Table 4. To analyze the efficiency of multiple features for author impact prediction, regression models are often used, such as linear regression [93], semi-continuous regression [94], and XGBoost [95].

McCarty et al. [87] integrate variables reflecting an author's collaborative behavior into a regression model for predicting the author's h-index. In their studies, the number of authors across all h-index articles, the average number of authors per article, normalized mean betweenness, the average number of articles published between co-authors, and the average h-index among co-authors are selected as features to train the learning model. Penner et al. [97] propose an age-dependent cumulative model whose predictive power depends on scholars' career age.

Dong et al. [88] formalize a novel author impact prediction problem to examine the factors that drive an article to increase its author's h-index. They explore six categories of factors, including author, paper content, venue, social, and temporal factors. From the correlation analysis of these factors, they find that the author's authority on the paper topic and the venue are important factors for improving the author's h-index. Furthermore, Dong et al. [90] find that future impact prediction is more difficult for a scholar with a higher h-index than for a scholar with a lower h-index. Ayaz et al. [92] consider a comprehensive Computer Science data set from Arnetminer and explore the effect of different career ages on predicting an author's h-index.

The prediction of academic rising stars has attracted widespread attention in academia. Daud et al. [89] first use supervised machine learning techniques to predict rising stars. A set of features is constructed on the basis of scholars and their social attributes, such as author influence, author contribution, venue citations, and co-author citations. Weihs et al. [96] generate a collection of 44 features for each author and integrate these features into several regression models such as linear regression, a simple Markov model, Random Forest (RF) [98], and Gradient Boosted Regression Trees (GBRT) [99].
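As a sketch of the feature-driven approach, the snippet below fits an ordinary least-squares model mapping a few author features to a future h-index. The feature choice and all numbers are synthetic illustrations, not results from the cited studies, which also use more powerful learners such as Random Forests and GBRT:

```python
import numpy as np

# Hypothetical training data: each row is an author with features
# [current h-index, number of co-authors, paper count]; the target is
# a (synthetic) h-index five years later.
X = np.array([[3, 10, 12], [8, 25, 30], [1, 4, 5],
              [12, 40, 55], [6, 18, 22]], dtype=float)
y = np.array([5.0, 12.0, 2.0, 18.0, 9.0])

# Ordinary least squares with an intercept term, solved via the
# closed-form least-squares solution.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

def predict(features):
    """Predict a future h-index from a [h, co-authors, papers] vector."""
    return float(np.dot(np.concatenate(([1.0], features)), coef))
```

In real studies the features number in the dozens (Weihs et al. use 44), and tree ensembles are typically preferred over plain linear regression for their ability to capture non-linear feature interactions.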
2) Network-based predictive model
Li et al. [80] propose a network-based model with two parameters for finding influential scholars. They use Katz-Bonacich centrality to define scholarly network prestige [100]. The parameter \alpha captures information exogenous to the scholarly networks, and the parameter \beta captures the value of being connected to other nodes, which decays with distance.

TABLE 3: Comparison of different network-based author impact evaluation methods.
Method andreference scholarlynetwork Homogeneousrelationships Heterogeneousrelationships Data sets Comparingalgorithms Evaluationmetrics PerformancePageRank forrankingauthor [34] author-author yes no Web ofScience PageRankwithdifferentdampingfactors Spearmancorrelationcoefficient citation rank is highlycorrelated with PageRankwith different dampingfactorsP-Rank [84] paper-paper,co-author,paper-journal yes yes DBLP SimRank Similarity The advantages of P-Rankare its semanticcompleteness, robustnessand flexibilitygraph-basedalgorithms [85] author-paper no yes ArnetMinerand UvT baselinemethod MAP, MRR currently focusing onimproving the expertranking performancep-index [86] paper-paper yes no GoogleScholar h-index p-index is robust againstmanipulations andperforms fairer and moreeffectively in rankingscientistsTri-Rank [76] author-paper,paper-venue,venue-author no yes ACM DigitalLibrary PageRank,HITS,Co-Rank Precision@k,Bpref, DCG more effective andaccurate than thestate-of-the-artcompetitors includingPageRank, HITS andCo-Rank C -index [75] author-author,paper-paper,co-author yes no MAS h-index Spearmancorrelationcoefficient C -index is as efficient ash-indexTimeRank [83] author-paper no yes Web ofScience TotCit,PageRank,h-index Frequency more effective thanalternativesTRank(TR-re) [37] author-paper no yes DBLP andAPS RW, PRW,TR-ex,TR-po, PAve AUC better than a number ofbaselines AIRank
BrC [30] co-author no yes MAG
AIRank
BeC , SI BrC , SI BeC ,NI,PageRank AUC better than other methodsin ranking top scholarswith more cross-domaincitationstask-guided andsemantic-awareranking (TSR+)model [48] author-paper no yes AMiner TSR, TaskE,metap-ath2vec,word2v+BPR Precision@k,Recall@kand AUC better than a number ofbaselines
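Li et al.'s exact two-parameter formulation is given in [80]; as a rough sketch of the underlying idea, one textbook form of Katz–Bonacich prestige on a toy adjacency matrix (hypothetical data, not their model or parameters) can be computed as follows.

```python
import numpy as np

# Toy directed scholarly network (hypothetical): A[i, j] = 1 means
# scholar j endorses scholar i (e.g., through a citation).
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=float)

def katz_bonacich(A, alpha=1.0, beta=0.1):
    """Textbook Katz-Bonacich prestige: x = alpha * (I - beta*A)^(-1) * 1.
    The underlying geometric series converges only if beta < 1/lambda_max(A)."""
    n = A.shape[0]
    lam_max = max(abs(np.linalg.eigvals(A)))
    if beta >= 1.0 / lam_max:
        raise ValueError("beta too large: centrality series diverges")
    return alpha * np.linalg.solve(np.eye(n) - beta * A, np.ones(n))

scores = katz_bonacich(A)
ranking = np.argsort(-scores)  # scholars ordered by network prestige
```

The closed-form solve replaces summing the infinite walk series, which is why the convergence bound on beta matters.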
FIGURE 3: Framework of author impact prediction. Input data (author social structure, author individual information, historical citations, citation relationships, time factor) are drawn from sources such as Web of Science, Scopus, Google Scholar, American Physical Society, Digital Bibliography & Library Project, and Microsoft Academic Graph; selected features feed a predicting model through learning and testing phases, and the output results include rising stars, citations, and h-index.
TABLE 4: Comparison of several representative multiple features-based author impact prediction models.

| Reference | Author features | Paper features | Social features | Other features | Model | Evaluation metrics | Predictive target |
|---|---|---|---|---|---|---|---|
| [87] | number of co-authors, AvgAuthors, proportion of academic co-authors | — | components, isolates, betweenness, hierarchy, MeanTie | — | bivariate models, multivariate model, final model | R coefficient | h-index |
| [88] | A-first-max, A-ave-max, A-sum-max, A-first-ratio, A-max-ratio, A-num-authors, A-num-first | C-popularity, C-popularity-ratio, C-novelty, C-diversity, C-authority-first, C-authority-max, C-authority-ave | S-degree, S-pagerank, S-h-co-author, S-h-weight | R-ratio-max, R-citation, T-ave-h, T-max-h, T-h-first, T-h-max, V-ratio-max, V-citation | logistic regression classifier (LRC), random forest (RF), bagged decision trees (BAG) | precision, recall, F1-score, area under curve (AUC), accuracy | h-index |
| [89] | author influence, author contribution, temporal dimension | — | co-author citations, co-author count | venue count, venue score, venue citation | MEMM, CART, BN, NB | average F1-score | rising stars |
| [90] | A-first-max, A-ave-max, A-sum-max, A-first-ratio, A-max-ratio, A-num-authors | C-popularity, C-novelty, C-diversity, C-authority-first, C-authority-max, C-authority-ave | S-degree, S-pagerank, S-h-co-author, S-h-weight | R-ratio-max, R-citation, T-ave-h, T-max-h, T-h-first, T-h-max, V-ratio-max, V-citation | LRC, SVM, NB, RBF, BAG, RF | precision, recall, F1-score, AUC, accuracy, MAP, Pre@3 | h-index |
| [96] | h-index, h-index variation over the last two years, cumulative citation count | number of papers published, number of papers published in the last two years | PageRank of author in unweighted co-authorship network, PageRank of author in weighted co-authorship network | h-index of venues, number of papers in venues, total number of venues published in | linear regression, RF, gradient boosted regression trees (GBRT) | R², MAPE | h-index |
| [92] | current h-index, number of co-authors | number of publications, years since publishing first article, average citations per paper | — | number of distinct journals published in, number of articles in top-10 Computer Science journals | fitted regression equation | R², RMSE, Max_error | h-index |
Based on co-author networks, Daud et al. [91] develop a weighted mutual influence rank (WMIRank) for finding academic rising stars by combining three attributes of co-authorship: co-authors' citation-based mutual influence, co-author order-based mutual influence, and co-author venues' citation-based mutual influence. Zhang et al. [101] propose the ScholarRank method by considering three factors: the citation counts of authors, the mutual influence among co-authors, and the mutual reinforcement process of different academic entities in heterogeneous academic networks. These academic networks include citation networks, paper-journal networks and paper-author networks. However, most network-based predictive models for author impact ignore an important fact: academic networks evolve over time.

Zhang et al. [102] propose the PePSI method for personalized prediction of scholars' influence in time-series academic networks. They first classify scholars into different types according to their citation dynamics. Furthermore, they construct four academic networks: temporal paper-citation networks, temporal co-author networks, temporal paper-venue networks and temporal paper-author networks. Based on these academic networks, they calculate each scholar's impact by applying random walk algorithms.
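Most of the network-based predictors above ultimately rank scholars with random-walk scores. A minimal, generic PageRank sketch over a toy weighted co-author network (hypothetical weights, not the PePSI or ScholarRank formulation) looks like this:

```python
import numpy as np

# Toy weighted co-author network (hypothetical): W[i, j] is the number
# of papers scholars i and j wrote together.
W = np.array([
    [0, 3, 1, 0],
    [3, 0, 2, 1],
    [1, 2, 0, 4],
    [0, 1, 4, 0],
], dtype=float)

def pagerank(W, d=0.85, tol=1e-10):
    """Standard PageRank by power iteration on a column-stochastic
    transition matrix built from the edge weights."""
    n = W.shape[0]
    P = W / W.sum(axis=0, keepdims=True)   # column-stochastic transitions
    r = np.full(n, 1.0 / n)                # uniform starting distribution
    while True:
        r_new = (1 - d) / n + d * (P @ r)  # teleport + follow edges
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

r = pagerank(W)  # stationary scores; higher means more influential
```

Temporal variants such as PePSI rerun walks of this kind over time-sliced, heterogeneous networks instead of a single static one.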
3) Generative predictive model
Although feature-driven and network-based predictive models can improve the accuracy of author impact prediction to a certain extent, these models lack explanatory power. Sinatra et al. [17] quantify scholar impact by formulating a stochastic model that assigns a unique individual parameter
to each scholar. The Q value reflects an author's influence on a paper's impact, and it is constant across a scholar's career. The Q parameter for scholar i is defined as:

Q_i = e^{\langle \log c_{i\alpha} \rangle - \mu_p}  (6)

where Q_i represents the Q value of scholar i, \langle \log c_{i\alpha} \rangle represents the average logarithmic citations of all papers published by scholar i, \alpha indexes scholar i's \alpha-th paper, and \mu_p is equal to \langle \hat{p} \rangle. They find that a scholar's h-index is jointly determined by the Q parameter and the productivity N. In addition, they find that a scholar's future career impact can be predicted by the Q value. The Q model can be explained by temporal changes in productivity, luck, and the heavy-tailed nature of a scholar's impact distribution.

B. EVALUATION INDICES
In this subsection, we introduce several evaluation metrics used to verify the validity of author impact prediction. In addition to Precision and Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), F-Measure and Accuracy are used as evaluation metrics. MAE quantifies how close the predictions are to the ground truth. It is defined as
MAE = \frac{1}{n}\sum_{i=1}^{n}|e_i|, where MAE is the average of the absolute errors |e_i| = |f_i - y_i|, f_i is the prediction, y_i is the true value, and n is the number of predictions. RMSE is similar to MAE and is defined as RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}; RMSE also measures the average error and quantifies the overall error rate. F-Measure is defined as F\text{-}Measure = \frac{(\beta^2+1)PR}{\beta^2 P + R}, where \beta is a weighting parameter, P is the accuracy rate (Precision), and R is the recall rate (Recall). Accuracy is the fraction of predictions that fall within a given error tolerance \epsilon. This metric is defined as Accuracy = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\!\left(\frac{|e_i|}{y_i} \le \epsilon\right).

IV. OPEN ISSUES AND CHALLENGES
In this section, we discuss several open issues for further research in this area, including author impact inflation, unified evaluation standards, the academic success gene, identifying the origins of hot streaks, and higher-order academic network analysis.
A. AUTHOR IMPACT INFLATION
Author impact inflation, which is mainly caused by citation inflation, is important to account for in the measurement, interpretation, and modeling of science. Citation inflation stems from the exponential growth of scholarly papers and affects the relative number of citations [103]. Further, citation inflation distorts the comparative evaluation of scholars, institutions, and countries across different periods. For this reason, normalization strategies for quantifying citation impact across disciplines are continually explored in the bibliometrics community [24]. As author impact is related to an author's citations, citation inflation increases the difficulty of author impact evaluation and prediction.
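A minimal sketch of one common normalization strategy: rescale raw citation counts by the average citations of papers published in the same year, so that a paper at its cohort's mean scores 1.0 regardless of when it appeared. The papers and counts below are hypothetical.

```python
from collections import defaultdict

# Hypothetical papers: (publication year, raw citation count).
# The 2015 cohort collects far more raw citations than the 2000
# cohort, mimicking citation inflation.
papers = [
    (2000, 10), (2000, 30), (2000, 20),
    (2015, 40), (2015, 120), (2015, 80),
]

# Mean citations per publication-year cohort.
by_year = defaultdict(list)
for year, c in papers:
    by_year[year].append(c)
year_mean = {y: sum(cs) / len(cs) for y, cs in by_year.items()}

# Normalized impact: raw citations divided by the cohort average,
# making scores comparable across periods.
normalized = [(y, c / year_mean[y]) for y, c in papers]
```

Field-normalized indicators work the same way, with cohorts defined by discipline as well as year.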
B. UNIFIED EVALUATION STANDARDS
Although predictive modeling of author impact has generated enormous progress in quantifying scientific studies, different researchers choose different predictive performance metrics. For example, Ayaz et al. [92] choose R² and RMSE to evaluate their predictive results, whereas Dong et al. [90] measure Precision, Recall, F1-score, AUC, Accuracy, MAP and Pre@3. To qualify scientific studies more objectively, a unified evaluation standard needs to be defined.

C. ACADEMIC SUCCESS GENE
In the past, most attention has focused on predicting authors' h-indices and academic rising stars using feature-driven and network-based models. Yet little is known about the mechanisms behind the temporal evolution of author impact. Although the Q parameter can accurately predict a scholar's impact, the dependence of Q on exogenous factors, such as education level, current institution, or publication habits, remains unknown [17]. More likely, the academic success genes comprise multiple factors rather than a single one. Uncovering the origin of the academic success genes is a challenging task, which could not only offer a better understanding of the evolution of scholar impact, but also help guide and train high-impact scholars.

D. IDENTIFYING THE ORIGINS OF HOT STREAKS
The hot streak phenomenon in scientists' individual careers has attracted researchers' attention. Liu et al. [104] uncover that hot streaks fundamentally drive the collective academic impact of a scholar. The uncovered hot streak phenomenon is particularly crucial to understanding the long-term academic impact of a scholar over his or her career: if it is ignored, the future impact of a scholar's career will be overestimated or underestimated. They present a hot-streak model that captures a wide range of real academic impact trajectories. However, the origins of the hot streak phenomenon remain unknown.
E. HIGHER-ORDER ACADEMIC NETWORKS ANALYSIS
Researchers have traditionally focused on analyzing homogeneous and heterogeneous academic networks to quantify the impact of scholars. Most prior studies capture citation dynamics with first-order academic networks involving two kinds of nodes: citing nodes and cited nodes. Because first-order academic networks cannot reflect true citation flow patterns, while higher-order citation networks can represent citation dynamics more accurately, higher-order academic network analysis can help us understand the long-term impact of scholars over their careers [105]. As the analysis of higher-order academic networks is difficult due to the complexity of constructing higher-order dependencies in academic networks, further study of this topic remains an open challenge in scholar impact evaluation.

V. CONCLUSION
In this paper, we have provided a comprehensive review of author impact evaluation and prediction, focusing on the different quantifying methods that can be used for each. Several changes have taken place in this area: (1) from simple analysis to prediction; (2) from single-dimensional assessment to multi-dimensional assessment; (3) from explicit features to implicit features; (4) from unstructured metrics to structured metrics. However, our analysis of the literature leads to the conclusion that, although a number of methods have been proposed to resolve the problems in this area, several important issues remain open, including author impact inflation, unified evaluation standards, the academic success gene, identifying the origins of hot streaks, and higher-order academic network analysis.
REFERENCES
[1] F. Xia, W. Wang, T. M. Bekele, and H. Liu, "Big scholarly data: A survey," IEEE Transactions on Big Data, vol. 3, no. 1, pp. 18–35, 2017.
[2] X. Bai, F. Zhang, and I. Lee, "Predicting the citations of scholarly paper," Journal of Informetrics, vol. 13, no. 1, pp. 407–418, 2019.
[3] L. Cai, J. Tian, J. Liu, X. Bai, I. Lee, X. Kong, and F. Xia, "Scholarly impact assessment: a survey of citation weighting solutions," Scientometrics, vol. 118, no. 2, pp. 453–478, 2019.
[4] X. Bai, F. Xia, I. Lee, J. Zhang, and Z. Ning, "Identifying anomalous citations for objective evaluation of scholarly article impact," PloS One, vol. 11, no. 9, p. e0162364, 2016.
[5] G. Eugene, "The history and meaning of the journal impact factor," JAMA, vol. 295, no. 1, pp. 90–93, 2006.
[6] R. K. Pan and S. Fortunato, "Author impact factor: tracking the dynamics of individual scientific impact," Scientific Reports, vol. 4, no. 4880, pp. 1–24, 2014.
[7] M. Farooq, H. U. Khan, S. Iqbal, E. U. Munir, and A. Shahzad, "DS-index: Ranking authors distinctively in an academic network," IEEE Access, vol. 5, no. 99, pp. 19588–19596, 2017.
[8] J. E. Hirsch, "An index to quantify an individual's scientific research output," Proceedings of the National Academy of Sciences, vol. 102, no. 46, pp. 16569–16572, 2005.
[9] L. Egghe, "Theory and practise of the g-index," Scientometrics, vol. 69, no. 1, pp. 131–152, 2006.
[10] S. Alonso, F. J. Cabrerizo, E. Herrera-Viedma, and F. Herrera, "h-index: A review focused in its variants, computation and standardization for different scientific fields," Journal of Informetrics, vol. 3, no. 4, pp. 273–289, 2009.
[11] C.-T. Zhang, "The e-index, complementing the h-index for excess citations," PLoS One, vol. 4, no. 5, p. e5429, 2009.
[12] J. E. Hirsch, "Does the h index have predictive power?" Proceedings of the National Academy of Sciences, vol. 104, no. 49, pp. 19193–19198, 2007.
[13] X. Bai, "Predicting the number of publications for scholarly networks," IEEE Access, vol. 6, pp. 11842–11848, 2018.
[14] U. Senanayake, M. Piraveenan, and A. Zomaya, "The PageRank-index: Going beyond citation counts in quantifying scientific impact of researchers," PloS One, vol. 10, no. 8, p. e0134794, 2015.
[15] D. Fiala, L. Šubelj, S. Žitnik, and M. Bajec, "Do PageRank-based author rankings outperform simple citation counts?" Journal of Informetrics, vol. 9, no. 2, pp. 334–348, 2015.
[16] D. Pradhan, P. S. Paul, U. Maheswari, S. Nandi, and T. Chakraborty, "C3-index: a PageRank based multi-faceted metric for authors' performance measurement," Scientometrics, vol. 110, no. 1, pp. 1–21, 2017.
[17] R. Sinatra, D. Wang, P. Deville, C. Song, and A.-L. Barabási, "Quantifying the evolution of individual scientific impact," Science, vol. 354, no. 6312, p. aaf5239, 2016.
[18] M. Nezhadbiglari, M. A. Gonçalves, and J. M. Almeida, "Early prediction of scholar popularity," in Digital Libraries, 2016, pp. 181–190.
[19] M. Dunaiski, J. Geldenhuys, and W. Visser, "Author ranking evaluation at scale," Journal of Informetrics, vol. 12, no. 3, pp. 679–702, 2018.
[20] D. Fiala and G. Tutoky, "PageRank-based prediction of award-winning researchers and the impact of citations," Journal of Informetrics, vol. 11, no. 4, pp. 1044–1068, 2017.
[21] M. Dunaiski, J. Geldenhuys, and W. Visser, "How to evaluate rankings of academic entities using test data," Journal of Informetrics, vol. 12, no. 3, pp. 631–655, 2018.
[22] K. Higham, M. Governale, A. Jaffe, and U. Zülicke, "Unraveling the dynamics of growth, aging and inflation for citations to scientific articles from specific research fields," Journal of Informetrics, vol. 11, no. 4, pp. 1190–1200, 2017.
[23] G. Panagopoulos, G. Tsatsaronis, and I. Varlamis, "Detecting rising stars in dynamic collaborative networks," Journal of Informetrics, vol. 11, no. 1, pp. 198–222, 2017.
[24] L. Waltman, "A review of the literature on citation impact indicators," Journal of Informetrics, vol. 10, no. 2, pp. 365–391, 2016.
[25] L. Wildgaard, J. W. Schneider, and B. Larsen, "A review of the characteristics of 108 author-level bibliometric indicators," Scientometrics, vol. 101, no. 1, pp. 125–158, 2014.
[26] F. Xia, X. Su, W. Wang, C. Zhang, Z. Ning, and I. Lee, "Bibliographic analysis of Nature based on Twitter and Facebook altmetrics data," PloS One, vol. 11, no. 12, p. e0165997, 2016.
[27] A. A. Ferreira, M. A. Gonçalves, and A. H. Laender, "A brief survey of automatic methods for author name disambiguation," ACM Sigmod Record, vol. 41, no. 2, pp. 15–26, 2012.
[28] G. Ahuja, "Collaboration networks, structural holes, and innovation: A longitudinal study," Administrative Science Quarterly, vol. 45, no. 3, pp. 425–455, 2000.
[29] T. Lou and J. Tang, "Mining structural hole spanners through information diffusion in social networks," in Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013, pp. 825–836.
[30] J. Zhang, Y. Hu, Z. Ning, A. Tolba, E. Elashkar, and F. Xia, "AIRank: Author impact ranking through positions in collaboration networks," Complexity, vol. 2018, pp. 1–16, 2018.
[31] D. Zhou, S. A. Orshanskiy, H. Zha, and C. L. Giles, "Co-ranking authors and documents in a heterogeneous network," in Seventh IEEE International Conference on Data Mining (ICDM). IEEE, 2007, pp. 739–744.
[32] M. Dunaiski, W. Visser, and J. Geldenhuys, "Evaluating paper and author ranking algorithms using impact and contribution awards," Journal of Informetrics, vol. 10, no. 2, pp. 392–407, 2016.
[33] M. E. Falagas, V. D. Kouranos, R. Arencibia-Jorge, and D. E. Karageorgopoulos, "Comparison of SCImago journal rank indicator with journal impact factor," The FASEB Journal, vol. 22, no. 8, pp. 2623–2628, 2008.
[34] Y. Ding, E. Yan, A. Frazho, and J. Caverlee, "PageRank for ranking authors in co-citation networks," Journal of the American Society for Information Science and Technology, vol. 60, no. 11, pp. 2229–2243, 2009.
[35] E. Yan and Y. Ding, "Discovering author impact: A PageRank perspective," Information Processing & Management, vol. 47, no. 1, pp. 125–134, 2011.
[36] J. Bollen, H. Van de Sompel, A. Hagberg, and R. Chute, "A principal component analysis of 39 scientific impact measures," PloS One, vol. 4, no. 6, p. e6022, 2009.
[37] J. Zhang, Z. Ning, X. Bai, X. Kong, J. Zhou, and F. Xia, "Exploring time factors in measuring the scientific impact of scholars," Scientometrics, vol. 112, no. 3, pp. 1301–1321, 2017.
[38] D. Fiala, "Time-aware PageRank for bibliographic networks," Journal of Informetrics, vol. 6, no. 3, pp. 370–388, 2012.
[39] M. Nykl, M. Campr, and K. Ježek, "Author ranking based on personalized PageRank," Journal of Informetrics, vol. 9, no. 4, pp. 777–799, 2015.
[40] A. Usmani and A. Daud, "Unified author ranking based on integrated publication and venue rank," International Arab Journal of Information Technology (IAJIT), vol. 14, no. 1, pp. 111–118, 2017.
[41] S. Kong and L. Feng, "A tweet-centric approach for topic-specific author ranking in micro-blog," in International Conference on Advanced Data Mining and Applications. Springer, 2011, pp. 138–151.
[42] D. G. Bharathi, "Evaluation and ranking of researchers – Bh index," PloS One, vol. 8, no. 12, p. e82050, 2013.
[43] S. N. Dorogovtsev and J. F. Mendes, "Ranking scientists," Nature Physics, vol. 11, no. 11, pp. 882–883, 2015.
[44] X. Jiang, "Graph-based algorithms for ranking researchers: not all swans are white!" Scientometrics, vol. 96, no. 3, pp. 743–759, 2013.
[45] T. Marchant, "Score-based bibliometric rankings of authors," Journal of the American Society for Information Science and Technology, vol. 60, no. 6, pp. 1132–1137, 2009.
[46] D. Bouyssou and T. Marchant, "Ranking authors using fractional counting of citations: An axiomatic approach," Journal of Informetrics, vol. 10, no. 1, pp. 183–199, 2016.
[47] J. Stallings, E. Vance, J. Yang, M. W. Vannier, J. Liang, L. Pang, L. Dai, I. Ye, and G. Wang, "Determining scientific impact using a collaboration index," Proceedings of the National Academy of Sciences, vol. 110, no. 24, pp. 9680–9685, 2013.
[48] C. Zhang, L. Yu, X. Zhang, and N. V. Chawla, "Task-guided and semantic-aware ranking for academic author-paper correlation inference," in IJCAI, 2018, pp. 3641–3647.
[49] J. D. West, M. C. Jensen, R. J. Dandrea, G. J. Gordon, and C. T. Bergstrom, "Author-level Eigenfactor metrics: Evaluating the influence of authors, institutions, and countries within the social science research network community," Journal of the American Society for Information Science and Technology, vol. 64, no. 4, pp. 787–801, 2013.
[50] R. Liang and X. Jiang, "Scientific ranking over heterogeneous academic hypernetwork," in Thirtieth AAAI Conference on Artificial Intelligence. AAAI, 2016, pp. 20–26.
[51] F. Franceschini and D. A. Maisano, "Analysis of the Hirsch index's operational properties," European Journal of Operational Research, vol. 203, no. 2, pp. 494–504, 2010.
[52] L. Engqvist and J. G. Frommen, "The h-index and self-citations," Trends in Ecology & Evolution, vol. 23, no. 5, pp. 250–252, 2008.
[53] L. Bornmann, R. Mutz, and H. D. Daniel, "Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine," Journal of the American Society for Information Science and Technology, vol. 59, no. 5, pp. 830–837, 2014.
[54] L. Egghe and R. Rousseau, "An h-index weighted by citation impact," Information Processing & Management, vol. 44, no. 2, pp. 770–780, 2008.
[55] L. Egghe, "An improvement of the h-index: The g-index," ISSI, 2006, pp. 1–4.
[56] L. Bornmann, R. Mutz, H.-D. Daniel, G. Wallon, and A. Ledin, "Are there really two types of h index variants? A validation study by using molecular life sciences data," Research Evaluation, vol. 18, no. 3, pp. 185–190, 2009.
[57] R. Guns and R. Rousseau, "Real and rational variants of the h-index and the g-index," Journal of Informetrics, vol. 3, no. 1, pp. 64–71, 2009.
[58] L. Bornmann, R. Mutz, and H.-D. Daniel, "Do we need the h index and its variants in addition to standard bibliometric measures?" Journal of the American Society for Information Science and Technology, vol. 60, no. 6, pp. 1286–1289, 2009.
[59] S. Alonso, F. J. Cabrerizo, E. Herrera-Viedma, and F. Herrera, "hg-index: a new index to characterize the scientific output of researchers based on the h- and g-indices," Scientometrics, vol. 82, no. 2, pp. 391–400, 2010.
[60] Q. Wu, "The w-index: A measure to assess scientific impact by focusing on widely cited papers," Journal of the American Society for Information Science and Technology, vol. 61, no. 3, pp. 609–614, 2010.
[61] A. Bihari and S. Tripathi, "EM-index: a new measure to evaluate the scientific impact of scientists," Scientometrics, vol. 112, no. 1, pp. 659–677, 2017.
[62] ——, "Year based EM-index: a new approach to evaluate the scientific impact of scholars," Scientometrics, vol. 114, no. 3, pp. 1175–1205, 2018.
[63] M. Würtz and M. Schmidt, "The stratified h-index makes scientific impact transparent," Ugeskrift for Laeger, vol. 179, no. 14, 2017.
[64] M. Yaminfirooz and H. Gholinia, "Multiple h-index: a new scientometric indicator," The Electronic Library, vol. 33, no. 3, pp. 547–556, 2015.
[65] A. Abbasi, J. Altmann, and J. Hwang, "Evaluating scholars based on their academic collaboration activities: two indices, the RC-index and the CC-index, for quantifying collaboration activities of researchers and scientific communities," Scientometrics, vol. 83, no. 1, pp. 1–13, 2010.
[66] R. J. Brown, "A simple method for excluding self-citation from the h-index: the b-index," Online Information Review, vol. 33, no. 6, pp. 1129–1136, 2009.
[67] F. J. Cabrerizo, S. Alonso, E. Herrera-Viedma, and F. Herrera, "q2-index: Quantitative and qualitative evaluation based on the number and impact of papers in the Hirsch core," Journal of Informetrics, vol. 4, no. 1, pp. 23–28, 2010.
[68] D. Mahbuba and R. Rousseau, "Year-based h-type indicators," Scientometrics, vol. 96, no. 3, pp. 785–797, 2013.
[69] J.-K. Wan, P.-H. Hua, and R. Rousseau, "The pure h-index: calculating an author's h-index by taking co-authors into account," COLLNET Journal of Scientometrics and Information Management, vol. 1, no. 2, pp. 1–5, 2007.
[70] X. Wan and F. Liu, "WL-index: Leveraging citation mention number to quantify an individual's scientific impact," Journal of the Association for Information Science and Technology, vol. 65, no. 12, pp. 2509–2517, 2014.
[71] B. Jin, L. Liang, R. Rousseau, and L. Egghe, "The R- and AR-indices: Complementing the h-index," Chinese Science Bulletin, vol. 52, no. 6, pp. 855–863, 2007.
[72] P. Vinkler, "The π-index: A new indicator for assessing scientific impact," Journal of Information Science, vol. 35, no. 5, pp. 602–612, 2009.
[73] M. Schreiber, "A modification of the h-index: The hm-index accounts for multi-authored manuscripts," Journal of Informetrics, vol. 2, no. 3, pp. 211–216, 2008.
[74] S. Fortunato, C. T. Bergstrom, K. Börner, J. A. Evans, D. Helbing, S. Milojević, A. M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi et al., "Science of science," Science, vol. 359, no. 6379, p. eaao0185, 2018.
[75] D. Pradhan, P. S. Paul, U. Maheswari, S. Nandi, and T. Chakraborty, "C3-index: revisiting author's performance measure," in Proceedings of the 8th ACM Conference on Web Science. ACM, 2016, pp. 318–319.
[76] Z. Liu, H. Huang, X. Wei, and X. Mao, "Tri-Rank: An authority ranking framework in heterogeneous academic networks by mutual reinforce," in Proceedings of the 26th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2014, pp. 493–500.
[77] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford Digital Libraries Working Paper, 1998, pp. 1–20.
[78] J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins, "The web as a graph: measurements, models, and methods," in International Computing and Combinatorics Conference. Springer, 1999, pp. 1–17.
[79] X. Zhu, P. Turney, D. Lemire, and A. Vellino, "Measuring academic influence: Not all citations are equal," Journal of the Association for Information Science and Technology, vol. 66, no. 2, pp. 408–427, 2015.
[80] Y. Li, C. Wu, X. Wang, and P. Luo, "A network-based and multi-parameter model for finding influential authors," Journal of Informetrics, vol. 8, no. 3, pp. 791–799, 2014.
[81] X. Bai, F. Zhang, J. Hou, F. Xia, A. Tolba, and E. Elashkar, "Implicit multi-feature learning for dynamic time series prediction of the impact of institutions," IEEE Access, vol. 5, pp. 16372–16382, 2017.
[82] Y. Wang, Y. Tong, and M. Zeng, "Ranking scientific articles by exploiting citations, authors, journals, and time information," in Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. AAAI Press, 2013, pp. 933–939.
[83] M. Franceschet and G. Colavizza, "TimeRank: A dynamic approach to rate scholars using citations," Journal of Informetrics, vol. 11, no. 4, pp. 1128–1141, 2017.
[84] P. Zhao, J. Han, and Y. Sun, "P-Rank: a comprehensive structural similarity measure over information networks," in Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, 2009, pp. 553–562.
[85] S. D. Gollapalli, P. Mitra, and C. L. Giles, "Ranking authors in digital libraries," in Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries. ACM, 2011, pp. 251–254.
[86] U. Senanayake, M. Piraveenan, and A. Y. Zomaya, "The p-index: Ranking scientists using network dynamics," Procedia Computer Science, vol. 29, pp. 465–477, 2014.
[87] C. McCarty, J. W. Jawitz, A. Hopkins, and A. Goldman, "Predicting author h-index using characteristics of the co-author network," Scientometrics, vol. 96, no. 2, pp. 467–483, 2013.
[88] Y. Dong, R. A. Johnson, and N. V. Chawla, "Will this paper increase your h-index? Scientific impact prediction," in Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 2015, pp. 149–158.
[89] A. Daud, M. Ahmad, M. Malik, and D. Che, "Using machine learning techniques for rising star prediction in co-author network," Scientometrics, vol. 102, no. 2, pp. 1687–1711, 2015.
[90] Y. Dong, R. A. Johnson, and N. V. Chawla, "Can scientific impact be predicted?" IEEE Transactions on Big Data, vol. 2, no. 1, pp. 18–30, 2016.
[91] A. Daud, N. R. Aljohani, R. A. Abbasi, Z. Rafique, T. Amjad, H. Dawood, and K. H. Alyoubi, "Finding rising stars in co-author networks via weighted mutual influence," in Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 2017, pp. 33–41.
[92] S. Ayaz, N. Masood, and M. A. Islam, "Predicting scientific impact based on h-index," Scientometrics, vol. 114, no. 3, pp. 993–1010, 2018.
[93] K. H. Zou, K. Tuncali, and S. G. Silverman, "Correlation and simple linear regression," Radiology, vol. 227, no. 3, pp. 617–628, 2003.
[94] B. Sohrabi and H. Iraj, "The effect of keyword repetition in abstract and keyword frequency per journal in predicting citation counts," Scientometrics, vol. 110, no. 1, pp. 243–251, 2017.
[95] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785–794.
[96] L. Weihs and O. Etzioni, "Learning to predict citation-based impact measures," in Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries. IEEE Press, 2017, pp. 49–58.
[97] O. Penner, R. K. Pan, A. M. Petersen, K. Kaski, and S. Fortunato, "On the predictability of future impact in science," Scientific Reports, vol. 3, no. 3052, pp. 1–8, 2013.
[98] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[99] A. Mohan, Z. Chen, and K. Weinberger, "Web-search ranking with initialized gradient boosted regression trees," in Proceedings of the Learning to Rank Challenge. IEEE Press, 2011, pp. 77–89.
[100] C. Ballester, A. Calvó-Armengol, and Y. Zenou, "Who's who in networks. Wanted: The key player," Econometrica, vol. 74, no. 5, pp. 1403–1417, 2006.
[101] J. Zhang, Z. Ning, X. Bai, W. Wang, S. Yu, and F. Xia, "Who are the rising stars in academia?" in Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, 2016, pp. 211–212.
[102] J. Zhang, B. Xu, J. Liu, A. Tolba, Z. Al-Makhadmeh, and F. Xia, "PePSI: Personalized prediction of scholars' impact in heterogeneous temporal academic networks," IEEE Access, vol. 6, pp. 55661–55672, 2018.
[103] R. K. Pan, A. M. Petersen, F. Pammolli, and S. Fortunato, "The memory of science: Inflation, myopia, and the knowledge network," Journal of Informetrics, vol. 12, no. 3, pp. 656–678, 2018.
[104] L. Liu, Y. Wang, R. Sinatra, C. L. Giles, C. Song, and D. Wang, "Hot streaks in artistic, cultural, and scientific careers," Nature, vol. 559, no. 7714, p. 396, 2018.
[105] X. Bai, F. Zhang, J. Hou, I. Lee, X. Kong, A. Tolba, and F. Xia, "Quantifying the impact of scholarly papers based on higher-order weighted citations," PloS One, vol. 13, no. 3, p. e0193192, 2018.