Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks
Qingyun Sun, Hao Peng, Jianxin Li, Senzhang Wang, Xiangyu Dong, Liangxuan Zhao, Philip S. Yu, Lifang He
Qingyun Sun*, Hao Peng*, Jianxin Li*, Senzhang Wang†, Xiangyu Dong*, Liangxuan Zhao*, Philip S. Yu‡ and Lifang He§
* Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
† Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
‡ University of Illinois at Chicago, Chicago 60607, USA
§ Lehigh University, Bethlehem, PA, USA
Email: {sunqy, penghao, lijx}@act.buaa.edu.cn, [email protected], {dongxiangyu, zhaolx}@buaa.edu.cn, [email protected], [email protected]

Abstract: Name disambiguation aims to identify unique authors with the same name. Existing name disambiguation methods always exploit author attributes to enhance disambiguation results. However, some discriminative author attributes (e.g., email and affiliation) may change because of graduation or job-hopping, which will result in the separation of the same author's papers in digital libraries. Although these attributes may change, an author's co-authors and research topics do not change frequently with time, which means that papers within a period have similar text and relation information in the academic network. Inspired by this idea, we introduce the Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem. We divide papers into small blocks based on discriminative author attributes, and blocks of the same author are merged according to the pairwise classification results of MA-PairRNN. MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning into one framework. In addition to attribute and structure information, MA-PairRNN also exploits semantic information via meta-paths and generates node representations in an inductive way, which is scalable to large graphs. Furthermore, a semantic-level attention mechanism is adopted to fuse multiple meta-path based representations.
A Pseudo-Siamese network consisting of two RNNs takes two paper sequences in publication time order as input and outputs their similarity. Results on two real-world datasets demonstrate that our framework achieves a significant and consistent performance improvement on the name disambiguation task. We also demonstrate that MA-PairRNN performs well with a small amount of training data and has better generalization ability across different research areas.
Keywords: Name disambiguation, graph embedding, pairwise learning, heterogeneous information network
I. INTRODUCTION
The namesake problem [1] poses a huge challenge to many applications, e.g., information retrieval and bibliographic data analysis. When searching for academic publications by author name, the results may contain a long list of publications by multiple authors with the same name. Some digital libraries (e.g., DBLP and Google Scholar) list candidates after name disambiguation with the corresponding homepage, email, and affiliation to make it easier to obtain all publications of one particular author. The academic impacts of researchers are always measured by the impacts of their publications in the research community. Therefore, it is important to keep publication data in digital libraries accurate, consistent, and up to date.

(Footnote: Qingyun Sun and Hao Peng contributed equally to this work. Jianxin Li is the corresponding author.)

Name disambiguation [2], [3], which aims to identify unique persons with the same name, has been studied for decades but remains largely unsolved. Most of the existing solutions utilize author attributes, including name, affiliation, email, homepage, etc., to generate paper representations or further validate disambiguation results. However, these discriminative attributes, especially email and affiliation, may change because of graduation or job-hopping. We take Jian Pei, the well-known leading researcher in data science, as an example to show the change of discriminative attributes in Fig. 1.
Jian Pei's papers from 2003 to 2005 are associated with [email protected] and State University of New York at Buffalo. His papers from 2005 to 2020 are associated with [email protected] and Simon Fraser University. The change of discriminative attributes may lead to the paper separation problem [4], i.e., papers of one author are regarded as belonging to different authors, which commonly occurs in digital libraries. To address this issue, name disambiguation methods should perform well even when discriminative attributes change.

Even though discriminative attributes may have changed, researchers often have a fixed co-author set and a few specific research areas that do not change frequently over time, which can also be exploited to solve the name disambiguation problem. As shown in Fig. 1, even though Jian Pei has different affiliations and emails in the two time periods, his close co-authors (e.g., Jiawei Han, Ke Wang) are fixed and his research areas (e.g., Data mining, Time series) are also consistent over time.

(Footnotes: https://dblp.uni-trier.de/ and https://scholar.google.com/)

Figure 1. An example of the change of Jian Pei's discriminative attributes.
Figure 2. Academic network.

There are several challenges that should be overcome:
(1) Heterogeneity of the academic network. The academic network is a heterogeneous network that contains multiple entities (e.g., author, paper, venue) and multiple relationships (e.g., writing, publishing), as shown in Fig. 2. It is challenging to preserve diverse structural and semantic information simultaneously.
(2) Inductive capability. Many real-world applications encounter a large number of new papers every day. It is challenging for name disambiguation methods to have the inductive capability to generate representations of new papers efficiently.
(3) Uncertain number of authors. It is challenging to determine the number of authors with the same name. In existing clustering-based name disambiguation methods [2], [3], [5], the number of authors (i.e., the cluster size) is usually a pre-specified parameter.

Current works [6], [7] did not efficiently handle the change of discriminative attributes and the inductive paper embedding problem in the heterogeneous academic network simultaneously. In this work, we propose a novel Multi-view Attention-based Pairwise Recurrent Neural Network framework, namely MA-PairRNN, to solve the name disambiguation problem. The intuitive idea is that an author's papers during a period of time should have more similar representations, since the co-authors and research interests of most authors are consistent despite attribute changes. Inspired by this idea, we formulate name disambiguation as a pairwise paper set classification problem that does not require estimating the number of authors with the same name. We divide papers into small blocks according to discriminative author attributes to reduce the search space of the name disambiguation algorithm. Then small blocks are merged based on pairwise classification results, and each block after merging is the paper set of one author. We represent each paper block as a sequence in publication time order and solve the pairwise classification problem by comparing sequence similarity. MA-PairRNN combines multiple multi-view graph embedding layers, a semantic-level attention layer, and a Pseudo-Siamese recurrent neural network layer to learn node embeddings and node sequence pair similarity simultaneously. Specifically, the multi-view graph embedding layer generates meta-path based embeddings of papers in the heterogeneous academic network. Then, the semantic-level attention layer fuses these meta-path based embeddings into a vector. Finally, the Pseudo-Siamese recurrent neural network layer learns the similarity of a node sequence pair. We elaborate on the three components as follows:
Multi-view graph embedding layer. The multi-view graph embedding layer incorporates meta-paths to capture rich semantic information in the heterogeneous network. The heterogeneous network is converted into multiple relation views according to meta-paths. For each view, we learn K aggregator functions to incorporate the K-hop neighborhood of each node. In this way, node embeddings are generated by enhancing node features with semantics.

Semantic attention layer. The semantic attention layer captures the importance of meta-paths via an attention mechanism and fuses semantic information for specific tasks.

Pseudo-Siamese recurrent neural network layer. The Pseudo-Siamese recurrent neural network is composed of two recurrent neural networks, which are used to learn the inherent relations of paper sequences. It takes two sequences of paper embeddings as input and outputs their similarity.

The main contributions are summarized as follows:
• We propose a novel pairwise classification framework called MA-PairRNN for the name disambiguation task, which learns heterogeneous graph representations and paper set pairwise similarity simultaneously.
• Under MA-PairRNN, we propose an inductive graph embedding method that takes both the heterogeneity and the large scale of the academic network into account. A semantic-level attention mechanism is leveraged to put different emphases on each of the meta-paths. A Pseudo-Siamese recurrent neural network is adopted to learn inherent relations and measure the similarity of two paper sets.
• We conduct extensive experiments on AMiner-AND and a large-scale real-world dataset collected from Semantic Scholar. The results illustrate the best performance as well as the good generalization ability of the proposed MA-PairRNN compared to other methods.

The code of MA-PairRNN is available at https://github.com/RingBDStack/MA-PairRNN.

II. RELATED WORK

In this section, we will briefly review name disambiguation methods and graph embedding methods.

A. Name Disambiguation
Name disambiguation methods can be divided into supervised [1], [8], unsupervised [6], [9], and graph-based ones [2], [5]. Graph-based works exploit graph topological features in the academic network to enhance the representation of papers. For instance, GHOST [2] constructs a document graph based on co-authorship. [5] leverages only relational data in the form of anonymized graphs to preserve author privacy. Pairwise classification methods are applied to estimate the probability that a pair of author mentions belongs to the same author and are essential in the name disambiguation task. [6] first learns a representation for every name mention in a pairwise or tripletwise way and refines the representation by a graph auto-encoder, but this method neglects the linkage between paper and author as well as co-authorship. [7] addresses the pairwise classification problem by extracting both structure-aware features and global features, without considering semantic features. In this paper, we focus on the paper-set-level pairwise classification problem and exploit attribute, structure, and semantic features to form better representations.
B. Graph Embedding
Graph embedding aims to represent a graph as a low-dimensional vector while preserving graph structure and properties. Recently, Graph Neural Networks (GNNs) [10] have attracted rising attention due to their effective representation ability. While most GNN works [10]–[12] focus on the transductive setting, there have been some recent works adopting an inductive learning setting. DeepGL [13] aggregates a set of base graph features by relational functions that can generalize across networks. GraphSage [14] samples a fixed number of neighbors and generates node embeddings by aggregating their features. Both DeepGL and GraphSage are designed for homogeneous graphs. LAN [15] aggregates neighbors with both rule-based and network-based attention weights for knowledge graphs.

Heterogeneous information networks [16]–[19] have been studied in recent years. The meta-path is designed to preserve diverse semantic information of node types and edge types [20]–[22]. GTN [23] converts a heterogeneous graph into new graph structures, which involves identifying task-specific meta-paths and multi-hop connections. HAN [24] includes both node-level and semantic-level attention to take the importance of nodes and meta-paths into consideration simultaneously.

In this paper, we propose an inductive graph embedding method utilizing rich heterogeneous information.

III. PROPOSED METHOD
A. Problem Definition
In this section, we formally define the Heterogeneous Academic Network and the problem of Name Disambiguation.
Definition 1 (Heterogeneous Academic Network): A Heterogeneous Academic Network is defined as $G = \{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V}$ and $\mathcal{E}$ denote the set of nodes and edges, respectively. A Heterogeneous Academic Network is associated with a node type mapping function $f_v: \mathcal{V} \rightarrow O$ and an edge type mapping function $f_e: \mathcal{E} \rightarrow R$. $O = \{P, A, T, V\}$ denotes the node type set, and $R = \{A \text{ writes } P,\ P \text{ cites } P,\ P \text{ is related to } T,\ P \text{ is published in } V\}$ denotes the edge type set, where $P, A, T, V$ denote the types of Paper, Author, Topic, and Venue, respectively.
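To make Definition 1 concrete, the sketch below builds a tiny heterogeneous academic network with explicit type mappings $f_v$ and $f_e$. The class, the edge-type names, and the toy graph itself are invented for illustration and are not the paper's data structures.

```python
from collections import defaultdict

# Node types O and edge types R from Definition 1.
NODE_TYPES = {"P", "A", "T", "V"}          # Paper, Author, Topic, Venue
EDGE_TYPES = {"writes", "cites", "related_to", "published_in"}

class HeteroAcademicNetwork:
    """Minimal heterogeneous network G = {V, E} with type mappings f_v and f_e."""
    def __init__(self):
        self.node_type = {}                 # f_v: V -> O
        self.adj = defaultdict(list)        # v -> [(neighbor, edge type)]

    def add_node(self, v, ntype):
        assert ntype in NODE_TYPES
        self.node_type[v] = ntype

    def add_edge(self, u, v, etype):
        assert etype in EDGE_TYPES          # f_e: E -> R
        self.adj[u].append((v, etype))
        self.adj[v].append((u, etype))

# A tiny invented example: one author writes two papers; one is published at a venue.
g = HeteroAcademicNetwork()
g.add_node("a1", "A"); g.add_node("p1", "P"); g.add_node("p2", "P"); g.add_node("v1", "V")
g.add_edge("a1", "p1", "writes")
g.add_edge("a1", "p2", "writes")
g.add_edge("p1", "v1", "published_in")
print(g.node_type["p1"], len(g.adj["a1"]))  # prints: P 2
```

Edges are stored in both directions so meta-path neighborhoods (e.g., Paper-Author-Paper) can later be traversed from either endpoint.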
Definition 2 (Name Disambiguation): Given a name $a$, $D^a = \{d^a_1, d^a_2, \ldots, d^a_N\}$ is a set of papers with name mention $a$. Every paper $d^a_k$ consists of some metadata, including paper attributes (e.g., title, year, venue, keywords) and author attributes (e.g., name, email, affiliation). The objective of name disambiguation is to partition all name mentions into a set of unique authors $C^a = \{c^a_1, c^a_2, \ldots, c^a_n\}$.

B. Model Architecture
In this section, we propose a novel framework named MA-PairRNN for name disambiguation. As described above, the main intuition is that papers of the same author within a period should have similar representations in the academic network, since the author's research topics and scholarly relations are consistent. We divide the set of papers $D^a$ into small blocks by discriminative author attributes in the metadata. These small blocks will be merged based on the pairwise classification results of MA-PairRNN. First, the multi-view inductive graph embedding layer is designed to generate the paper representation for each meta-path. Then a semantic attention layer is designed to learn the importance of meta-paths and fuse the meta-path based representations. Finally, the papers in every block are arranged as a sequence, denoted as $s \in S$, according to their publication time. Two sequences of paper embeddings are fed into a Pseudo-Siamese network with two RNNs for pairwise similarity learning. The overall architecture of MA-PairRNN is shown in Fig. 3.

C. Multi-View Graph Embedding Layer
The multi-view graph embedding layer generates node representations inductively by learning a function to aggregate attribute and topology information from local neighborhoods. To exploit the rich semantic information in the heterogeneous academic network, we propose the concept of the meta-path based view. Given a heterogeneous academic network $G = \{\mathcal{V}, \mathcal{E}\}$ and a meta-path $p$, a meta-path based view $G_p$ is derived from a type of proximity or relationship between nodes characterized by the meta-path. It can capture different aspects of structure information through meta-paths and makes it possible to add new nodes dynamically.

Figure 3. An overview of our overall network architecture.

For each meta-path based view, similar to GraphSage [14], node representations are generated by aggregating the features of meta-path based neighbors and propagating information across $K$ layers. Node $v_i$'s representation based on meta-path $p$ is generated as below. First, in the $k$-th layer, each node aggregates its own representation and the representations of its 1-hop neighborhood $N_i$ generated by the $(k-1)$-th layer into a single vector $z^{(k)}_p(N_i)$ as (1):

$$z^{(k)}_p(N_i) = \mathrm{mean}(\{z^{(k-1)}_p(v_j), \forall v_j \in \{v_i\} \cup N_i\}), \qquad (1)$$

where $z^{(k-1)}_p(v_j)$ denotes the representation of $v_j$ in the $(k-1)$-th layer. When $k = 0$, $z^{(0)}_p(v_j)$ is defined as the original feature $x(v_j)$ of $v_j$. Then a weight matrix $W^{(k)}_p$ and a bias vector $b^{(k)}_p$ are used to transfer information between layers as (2):

$$z^{(k)}_p(v_i) = \sigma(W^{(k)}_p \cdot z^{(k-1)}_p(N_i) + b^{(k)}_p). \qquad (2)$$

To extend the algorithm to a mini-batch setting, we first sample the $l$-egonet of the papers in the batch. The $l$-egonet of node $v$ is defined as the set of its $l$-hop neighbors and all edges between nodes in the set. For each batch, multi-view subgraphs are constructed based on the union of the $l$-egonets of all paper nodes in this batch. Then we generate meta-path based representations of every node in these multi-view subgraphs. For more convenient notation, we denote $v_i$'s final representation based on meta-path $p$ after $K$ layers as $z_p(v_i) \equiv z^{(K)}_p(v_i)$, where $z_p(v_i) \in \mathbb{R}^d$.

D. Semantic Attention Layer
For each paper, multiple meta-path based representations are obtained, and they can collaborate with each other. Since we assume that the importance of meta-paths varies, an attention mechanism is adopted to capture their contributions and fuse the meta-path based node representations.

We first introduce a meta-path preference vector $a_p \in \mathbb{R}^{d'}$ for each meta-path $p$ to guide the semantic attention mechanism. For a meta-path based representation $z_p$ and the meta-path preference vector $a_p$, the more similar they are, the greater the weight that will be assigned to $z_p$. We use a non-linear function to transform the $d$-dimensional meta-path based embedding into $d'$ dimensions as (3):

$$z'_p(v_i) = \sigma(W_p \cdot z_p(v_i) + b_p), \qquad (3)$$

where $W_p \in \mathbb{R}^{d' \times d}$ is the weight parameter and $b_p \in \mathbb{R}^{d'}$ is the bias parameter of the transformation. $z'_p(v_i) \in \mathbb{R}^{d'}$ is the representation of node $v_i$ based on meta-path $p$ after the transformation. The similarity $\omega_p(v_i)$ of the transformed representation vector and the preference vector is calculated as (4):

$$\omega_p(v_i) = \frac{a_p^T \cdot z'_p(v_i)}{\|a_p\| \cdot \|z'_p(v_i)\|}, \qquad (4)$$

where $\|\cdot\|$ is the L2 norm of a vector. The weight of meta-path $p$ for node $v_i$ is defined using a softmax unit as follows:

$$\omega'_p(v_i) = \frac{\exp(\omega_p(v_i))}{\sum_{p' \in P} \exp(\omega_{p'}(v_i))}. \qquad (5)$$

The final representation of node $v_i$ is generated by fusing all meta-path based representations in the weighted-sum form:

$$z(v_i) = \sum_{p' \in P} \omega'_{p'}(v_i) \cdot z_{p'}(v_i). \qquad (6)$$

E. Pseudo-Siamese Recurrent Neural Network Layer
We design a Pseudo-Siamese recurrent neural network layer to capture the inherent relations of papers and measure the similarity of two paper sets. The Pseudo-Siamese recurrent neural network layer is a Pseudo-Siamese network consisting of two RNNs with different parameters that generate the representations of two node sequences. Specifically, we feed the two sequences of paper embeddings into the two RNNs respectively. The learned embedding of each paper is taken as the input of an RNN unit. The output of each RNN unit can be formalized as:

$$h_t = \mathrm{RNN}(z_t, \theta_t), \qquad (7)$$

where $\theta_t$ denotes the parameters of the RNN unit. Here we apply the popular LSTM to capture the inherent relations of paper sequences and learn their similarity. Note that the paper sequence published earlier is fed in publication time order and the other sequence is fed in reverse. This setting is based on the assumption that an author's research topics and co-authors are stable during the period of attribute change. All outputs of the RNN units are aggregated by a $\mathrm{GlobalPool}$ function to generate the representation of the paper sequence as follows:

$$h = \mathrm{GlobalPool}(\{h_t, t = 1, 2, \cdots, |s|\}), \qquad (8)$$

where $|\cdot|$ denotes the length of the sequence. We apply a simple averaging strategy as the $\mathrm{GlobalPool}$ function here. The final representations of the two paper sequences, $h^{(1)}$ and $h^{(2)}$, are concatenated and then fed into a multi-layer fully connected neural network:

$$\hat{y}_s = \sigma(\mathrm{MLP}([h^{(1)}, h^{(2)}])), \qquad (9)$$

where $\sigma(\cdot)$ denotes the softmax function and $[\cdot, \cdot]$ represents the concatenation operation.

Since our task is classification, the loss function $L_{classify}$ can be defined as the Cross-Entropy between the ground truth and the predicted results over all labeled node sequence pairs. The proposed framework can be trained on a set of example pairs. For each pair of paper sequences, a cosine score function is applied to measure the similarity of the two paper sequence representations as (10).
$$L_{sim} = \mathrm{sim}(h^{(1)}, h^{(2)}) = \frac{h^{(1)} \cdot h^{(2)}}{\|h^{(1)}\| \cdot \|h^{(2)}\|}. \qquad (10)$$

The pairwise similarity loss function encourages node sequences of the same author to have similar representations and enforces those of different authors to be highly distinct. The model is then trained to minimize the sum of the losses as follows:

$$L = L_{classify} + \eta \cdot L_{sim}, \qquad (11)$$

where $\eta$ denotes the coefficient of the pairwise similarity loss. The overall process of MA-PairRNN is shown in Algorithm 1.

IV. EXPERIMENTS

A. Dataset
For our experiments, we use two datasets: Aminer-AND and Semantic Scholar.
• Aminer-AND [6]: This dataset contains 70,285 records of 12,798 unique authors with 100 ambiguous name references.
Algorithm 1: The overall process of MA-PairRNN

Input: paper set $D$; heterogeneous graph $G = \{\mathcal{V}, \mathcal{E}\}$; node features $\{x(v), \forall v \in \mathcal{V}\}$; meta-path set $P = \{p_1, p_2, \cdots, p_M\}$; number of multi-view graph embedding layers $K$
Output: meta-path based node representations $\{z_{p_1}, z_{p_2}, \cdots, z_{p_M}\}$

Separate the paper set $D$ into small blocks according to discriminative author attributes;
Arrange the papers in every block as a sequence $s \in S$;
Construct the meta-path based views $\{G_{p_1}, G_{p_2}, \cdots, G_{p_M}\}$;
$z^{(0)}_p(v_i) = x(v_i), \forall v_i \in \mathcal{V}$;
while not converged do
  for $v_i \in \mathcal{V}$ do
    for $p \in P$ do
      for $k = 1, 2, \cdots, K$ do
        Aggregate the meta-path based neighbor information from the previous layer by (1);
        Calculate the representation of the current layer by (2);
      end
    end
    Calculate the attention weight of each meta-path by (3), (4), (5);
    Fuse the semantic representations of the meta-path based views by (6);
  end
  for $s \in S$ do
    Calculate the representations of the sequence pair by (7) and (8);
    Classify the sequence pair by (9);
  end
  Calculate the loss by (10) and (11);
end
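As a hedged sketch of Algorithm 1's outer workflow, the fragment below implements the block-merging step with a union-find structure and a mean-aggregation step in the spirit of Eqs. (1)-(2). The scalar weight, the tanh nonlinearity, and the `same_author` predicate are illustrative stand-ins for the trained matrices $W^{(k)}_p$, the activation $\sigma$, and the MA-PairRNN pairwise classifier.

```python
import math
from collections import defaultdict

def aggregate(features, neighbors, v):
    """Eq. (1): mean of {z(v)} ∪ {z(u) : u ∈ N(v)} for one layer of one meta-path view."""
    group = [features[v]] + [features[u] for u in neighbors[v]]
    return [sum(col) / len(group) for col in zip(*group)]

def layer(features, neighbors, weight=1.0, bias=0.0):
    """Eq. (2), with a scalar weight/bias standing in for W and b, and tanh as σ."""
    return {v: [math.tanh(weight * x + bias) for x in aggregate(features, neighbors, v)]
            for v in features}

class UnionFind:
    """Merges paper blocks whose sequence pairs are classified as the same author."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:          # path halving
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def merge_blocks(blocks, same_author):
    """same_author(block_i, block_j) stands in for the pairwise classifier's decision."""
    uf = UnionFind(len(blocks))
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if same_author(blocks[i], blocks[j]):
                uf.union(i, j)
    clusters = defaultdict(list)
    for i, block in enumerate(blocks):
        clusters[uf.find(i)].extend(block)
    return list(clusters.values())          # one merged paper set per unique author
```

Union-find makes the merge transitive: if blocks A-B and B-C are each classified as the same author, all three end up in one paper set even if A-C is never compared.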
Table I. Statistics of Semantic Scholar.

| Node type | Edge type |
| --- | --- |
| author | author-paper |
| paper | paper-term |
| topic | paper-venue |
| venue | paper-paper |

• Semantic Scholar: We construct a new real-world academic dataset from the digital library Semantic Scholar. There are 154,822 records of 857 unique authors with 226 highly ambiguous names in the medicine area, together with the reference papers of these records. A detailed description is shown in Table I. The statistics of these authors' paper sets are shown in Fig. 4.

Figure 4. Length statistics of paper sets.
B. Evaluation Metrics and Baselines
We apply pairwise Precision, Recall, and F1 score on Aminer-AND and averaged Accuracy, F1 score, and AUC on Semantic Scholar to measure the performance of all methods. We compare with attribute-based methods as well as attribute-and-structure-based methods to demonstrate the effectiveness of our model. To verify the effectiveness of each component, including the meta-path based views, the semantic-level attention, and the Pseudo-Siamese structure, we also test three variants of MA-PairRNN.
• MLP [25]: A multilayer perceptron that directly projects input features into a low-dimensional vector.
• Deepwalk [26]: Deepwalk captures contextual information of neighborhoods via uniform random walks for node embedding in a homogeneous network.
• GraphSage [14]: GraphSage samples node neighborhoods to generate node embeddings for unseen data in an inductive way and is designed for homogeneous networks.
• Zhang et al. [5]: This method learns paper embeddings by sampling triplets from three graphs constructed from the relations of authors and papers, and clusters them with a hierarchical agglomerative algorithm.
• GHOST [2]: GHOST uses the affinity propagation algorithm for clustering on a co-author graph where the node distance is measured based on the number of valid paths.
• Louppe et al. [3]: This method trains a pairwise distance function based on similarity features and uses a semi-supervised HAC algorithm for clustering.
• Aminer [6]: This method first learns supervised global embeddings and then refines the global embeddings for each candidate set based on the local contexts.
• Kim et al. [7]: A hybrid pairwise classification method which generates paper representations by extracting both structure-aware features and global features.
• PairRNN_LSTM: A variation of MA-PairRNN_LSTM which directly feeds node features into a Pseudo-Siamese recurrent neural network layer with two LSTMs.
• G-PairRNN_LSTM: A variation of MA-PairRNN_LSTM which neglects the heterogeneity of the academic network and generates representations on the original graph.
• M-PairRNN_LSTM: A variation of MA-PairRNN_LSTM which removes the semantic-level attention layer and assigns the same importance to each meta-path.
• MA-PairRNN_LSTM: The proposed model, which fuses attribute, structure, and semantic features for node embedding generation with a semantic attention mechanism.
C. Implementation Details
On Aminer-AND, the selected meta-paths of our method consist of Paper-Author-Paper, Paper-Topic-Paper, and Paper-Venue-Paper. We use the author's affiliation as the discriminative attribute to separate papers into small blocks, and we use the same training set and test set as in [6].

On Semantic Scholar, the selected meta-paths of our method consist of Paper-Paper, Paper-Author-Paper, Paper-Topic-Paper, and Paper-Venue-Paper. We use the author's email as the discriminative attribute to separate papers into small blocks. To evaluate the learning ability of the models, we test them on Semantic Scholar with different training ratios.

The common training parameters are a fixed learning rate and dropout = 0.2. The node embedding dimension is set to 64, and the classifier of all methods is a three-layer fully connected neural network with a ReLU function. In our proposed model MA-PairRNN_LSTM, $K$ is set to 2 and the dimension of the meta-path preference vector $a$ is set to 32.
The performance of different methods on some sampled names from Aminer-AND is reported in Table II. The results on Semantic Scholar are reported in Table III. The major findings from the experimental results can be summarized as follows:

Performance Comparison. As shown in Table II and Table III, by incorporating attribute, structure, and semantic information, MA-PairRNN_LSTM outperforms all baselines on both datasets. Generally, GNN-based methods that combine attribute and structure information usually perform better than methods that only exploit attribute information. Compared to simply concatenating node representations, the Pseudo-Siamese RNN network can better extract the inherent relations of paper sequences. Compared to treating the graph as homogeneous, M-PairRNN_LSTM and MA-PairRNN_LSTM exploit semantic information successfully and show their superiority. This demonstrates that the combined use of attribute, structure, and semantic features better captures the similarities between papers. In addition, the semantic-level attention mechanism in MA-PairRNN_LSTM can exploit semantic information more properly.

Fig. 5 shows the F1 scores of MA-PairRNN_LSTM on different partition versions of Semantic Scholar with a training ratio of 80%. After adequate rounds of training, the performance of MA-PairRNN_LSTM on each dataset partition version has gained stability and certainty and is difficult to improve further, though fluctuations exist.

Table II. The detailed results (%) on Aminer-AND. Each cell reports Prec / Rec / F1; the column groups correspond to Louppe et al. (Attr.), Zhang et al. (Struc.), GHOST (Attr. + Struc.), Aminer (Attr. + Struc.), and MA-PairRNN_LSTM (Attr. + Struc. + Sem.).

| Name | Louppe et al. | Zhang et al. | GHOST | Aminer | MA-PairRNN_LSTM |
| --- | --- | --- | --- | --- | --- |
| Hongbin Li | 19.48 / 85.96 / 31.77 | 54.66 / 53.05 / 53.84 | 56.29 / 29.12 / 38.39 | 77.20 / 69.21 / 72.99 | 88.89 / 65.98 / 75.74 |
| Hua Bai | 36.39 / 41.33 / 38.70 | 58.58 / 35.90 / 44.52 | 83.06 / 29.54 / 43.58 | 71.49 / 39.73 / 51.08 | 89.22 / 70.54 / 78.79 |
| Kexin Xu | 91.26 / 98.35 / 94.67 | 90.02 / 82.47 / 86.08 | 92.90 / 28.52 / 43.64 | 91.37 / 98.64 / 94.87 | 85.19 / 71.88 / 77.97 |
| Lu Han | 30.25 / 46.65 / 36.70 | 47.88 / 20.62 / 28.82 | 69.72 / 17.39 / 27.84 | 51.78 / 28.05 / 36.39 | 92.43 / 69.62 / 79.42 |
| Lin Huang | 24.86 / 71.32 / 36.87 | 71.84 / 34.17 / 46.31 | 86.15 / 17.25 / 28.74 | 77.10 / 32.87 / 46.09 | 88.26 / 73.44 / 80.17 |
| Meiling Chen | 58.32 / 47.14 / 52.14 | 59.36 / 28.80 / 38.79 | 86.11 / 23.85 / 37.35 | 74.93 / 44.70 / 55.99 | - |
| Min Zheng | 25.86 / 32.67 / 28.87 | 54.76 / 19.70 / 28.98 | 80.50 / 15.21 / 25.58 | 57.65 / 22.35 / 32.21 | 86.07 / 82.03 / 84.00 |
| Qiang Shi | 35.31 / 47.18 / 40.39 | 43.84 / 36.94 / 40.10 | 53.72 / 26.80 / 35.76 | 52.20 / 36.15 / 42.72 | 80.25 / 69.15 / 74.29 |
| Rong Yu | 38.85 / 91.43 / 54.53 | 65.48 / 40.85 / 50.32 | 92.00 / 36.41 / 52.17 | 89.13 / 46.51 / 61.12 | 90.67 / 68.69 / 78.16 |
| Tao Deng | 40.46 / 51.38 / 45.27 | 53.04 / 29.89 / 38.23 | 73.33 / 24.50 / 36.73 | 81.63 / 43.62 / 56.86 | 88.42 / 65.12 / 75.00 |
| Wei Quan | 37.86 / 63.41 / 47.41 | 64.45 / 47.66 / 54.77 | 86.42 / 27.80 / 42.07 | 53.88 / 39.02 / 45.26 | 75.76 / 78.13 / 76.92 |
| Xudong Zhang | 72.38 / 79.83 / 75.92 | 70.20 / 23.35 / 35.04 | 85.75 / 7.23 / 13.34 | 62.40 / 22.54 / 33.12 | - |
| Xu Xu | 22.55 / 64.40 / 33.40 | 48.16 / 41.87 / 44.80 | 61.34 / 21.79 / 32.15 | 74.18 / 45.86 / 56.68 | 78.68 / 79.08 / 78.88 |
| Yanqing Wang | 29.64 / 79.08 / 43.11 | 60.40 / 51.97 / 55.87 | 80.79 / 40.39 / 53.86 | 71.52 / 75.33 / 73.37 | 77.42 / 64.86 / 70.59 |
| Yong Tian | 32.08 / 63.71 / 42.67 | 70.74 / 56.85 / 63.04 | 86.94 / 54.58 / 67.06 | 76.32 / 51.95 / 61.82 | 87.80 / 70.59 / 78.26 |
| Average | 57.09 / 77.22 / 63.10 | 70.63 / 59.53 / 62.81 | 81.62 / 40.43 / 50.23 | 77.96 / 63.03 / 67.79 | - |

Table III. Quantitative results and standard deviation (%) on Semantic Scholar, comparing MLP and PairRNN_LSTM (Attr.); Deepwalk, GraphSage, Aminer, and Kim et al. (Attr. + Struc.); and G-PairRNN_LSTM, M-PairRNN_LSTM, and MA-PairRNN_LSTM (Attr. + Struc. + Sem.) under each metric and training ratio.

Figure 5. Performance of MA-PairRNN_LSTM on different Semantic Scholar partition versions with a training ratio of 80%.
Impact of training ratio. The F1 scores of all methods on Semantic Scholar with different training ratios are shown in Fig. 6 (a), and their distributions are shown in Fig. 6 (b). The performance of all methods gets worse as the training ratio decreases. Our method MA-PairRNN_LSTM and its variants suffer less performance degradation than the others, which shows better learning ability.
Siamese Network vs. Pseudo-Siamese Network. As mentioned above, the Pseudo-Siamese neural network component consists of two RNNs with different parameters. We also test three variations, including a Pseudo-Siamese network with two BiLSTMs (MA-PairRNN_BiLSTM), a Siamese network with two parameter-shared LSTMs (MA-RNN_LSTM), and a Siamese network with two parameter-shared BiLSTMs (MA-RNN_BiLSTM). The results on Semantic Scholar are shown in Table IV. We can see that the Pseudo-Siamese network models have better performance than the two Siamese network models. Based on our assumption that papers during the period of discriminative attribute change have similar text and structure features, the paper sequence published earlier is fed into its RNN in publication time order and the other in reverse order. The Pseudo-Siamese network may better capture the changing trend of research topics and scholarly relationships.
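The forward/reverse pairing described above can be sketched end to end. The toy recurrent cell below is an invented stand-in for the two trained LSTM branches; its update rule and the scalar weights are illustrative only, while the mean pooling and cosine score follow Eqs. (8) and (10).

```python
import math

def toy_rnn(seq, w):
    """Invented stand-in for one RNN branch (Eq. (7)): h_t = tanh(w * (z_t + h_{t-1}))."""
    h = [0.0] * len(seq[0])
    outputs = []
    for z in seq:
        h = [math.tanh(w * (zi + hi)) for zi, hi in zip(z, h)]
        outputs.append(h)
    return outputs

def global_pool(outputs):
    """Eq. (8): average the per-step outputs into one sequence representation."""
    return [sum(dim) / len(outputs) for dim in zip(*outputs)]

def cosine(a, b):
    """Eq. (10): cosine similarity of the two sequence representations."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pair_similarity(seq_early, seq_late):
    """Feed the earlier sequence in publication order and the later one reversed
    into two branches with different parameters (the Pseudo-Siamese setting)."""
    h1 = global_pool(toy_rnn(seq_early, w=0.5))
    h2 = global_pool(toy_rnn(list(reversed(seq_late)), w=0.7))
    # Training would combine this score with the classification loss:
    # L = L_classify + eta * L_sim (Eq. (11)); omitted in this sketch.
    return cosine(h1, h2)
```

With the trained LSTMs in place of `toy_rnn`, this score is exactly the $L_{sim}$ term that pushes same-author sequence pairs toward similar representations.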
Impact of Different Meta-paths. To verify the ability of semantic-level attention, we report the F1 scores of MA-PairRNN_LSTM using each single meta-path, together with the corresponding attention values, on Semantic Scholar in Fig. 7. Obviously, there is a positive correlation between the performance of each meta-path and its attention value. Among the four meta-paths, MA-PairRNN_LSTM gives PVP the highest weight, which means that PVP is considered the most critical meta-path for paper representation. This makes sense because authors' research areas are highly correlated with the venues where their papers are published. Meanwhile, PP is also given a high weight. This also makes sense because an author's papers are often closely related and share similar references.

Figure 6. Performance with different training ratios on Semantic Scholar: (a) F1 scores; (b) distributions of F1 scores.

Table IV. Performance comparison (%) of different sequence representation models on Semantic Scholar, reporting Accuracy, F1 score, and AUC for MA-PairRNN_LSTM, MA-PairRNN_BiLSTM, MA-RNN_LSTM, and MA-RNN_BiLSTM.
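The semantic-level fusion that produces these per-meta-path weights can be sketched as below. This is a hedged illustration: scoring each view by a dot product with a preference vector `a` is our assumption, and the exact parameterization in the paper may differ; only the softmax-weighted combination of per-meta-path embeddings is taken from the section.

```python
import math

# Hedged sketch of semantic-level attention fusion: given one embedding of
# the same node per meta-path, score each view against a preference vector
# `a` (assumed scoring function), softmax the scores, and return the
# weighted sum as the fused node embedding plus the attention weights.
def fuse_meta_paths(view_embeddings, a):
    scores = [sum(ai * vi for ai, vi in zip(a, v)) for v in view_embeddings]
    m = max(scores)                               # numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(view_embeddings[0])
    fused = [sum(w * v[i] for w, v in zip(weights, view_embeddings))
             for i in range(dim)]
    return fused, weights

views = [[1.0, 0.0], [0.0, 1.0]]   # toy embeddings from two meta-paths
a = [1.0, 0.0]                     # preference vector favoring the first view
fused, weights = fuse_meta_paths(views, a)
# The first view receives the larger weight, mirroring how PVP receives
# the largest attention value in Fig. 7.
```

The learned weights themselves are what make the model interpretable here: inspecting them directly yields the PVP/PP ranking discussed above.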
Generalization ability across research areas. On Semantic Scholar, our models are trained on papers from the medical area. To verify the generalization ability of the models across different research areas, we collected data of 100 authors from each of the biology, chemistry, computer science, and mathematics areas. The performance of all models on these data is shown in Fig. 8. When trained on data from the medical area and tested on the other four areas, the performance degradation of our proposed model (MA-PairRNN_LSTM) and its variations (G-PairRNN_LSTM and M-PairRNN_LSTM) is less than 3%, which is better than the other models. This indicates that structure information can enhance the model's generalization ability. Most models perform better when transferred to the biology and chemistry areas than to the other two areas. This makes sense because these two areas share more domain knowledge with the medical one.

Figure 7. Performance of each single meta-path and its corresponding attention value.

Figure 8. Performance (F1 score %) in different research areas.
E. Parameters Analysis
In this section, we investigate how the dimension of the node embedding, the dimension of the attention preference vector, and the coefficient of the similarity loss affect classification performance. The results on Semantic Scholar are reported in Fig. 9.
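Before examining each parameter, the role of the similarity-loss coefficient η analyzed in this section can be made concrete. The following is a hedged sketch of the assumed training objective (pairwise cross-entropy plus an η-weighted cosine-similarity term); the exact loss form and signs are our assumption, not the paper's verbatim formulation.

```python
import math

# Hedged sketch: pairwise objective = classification cross-entropy
# + eta * cosine-similarity term. The similarity term is assumed to pull
# same-author pairs together (1 - cos) and push different-author pairs
# apart (positive cos is penalized).
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def total_loss(p_same, label, emb_u, emb_v, eta=1.0):
    # binary cross-entropy on the pairwise classifier output p_same
    bce = -(label * math.log(p_same) + (1 - label) * math.log(1 - p_same))
    cos = cosine(emb_u, emb_v)
    sim = (1.0 - cos) if label == 1 else max(cos, 0.0)
    return bce + eta * sim

# With eta = 0 only the classifier term remains; identical embeddings of a
# same-author pair contribute no similarity penalty.
loss = total_loss(0.9, 1, [1.0, 0.0], [1.0, 0.0], eta=1.0)
```

Under this reading, a very small η under-uses the embedding-similarity signal while a very large η overwhelms the classification term, which matches the sensitivity pattern reported for Fig. 9(c).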
Dimension of the final node embedding z. The representation ability of graph embedding methods is affected by the dimension of the node embedding z. We explore its impact with dimensions in {16, 32, 64, 128, 256}. As shown in Fig. 9(a), the performance first improves as the node embedding dimension increases, then degrades slowly, achieving its best value at a dimension of 64. The reason may be that a larger dimension introduces additional redundancy.

Dimension of the semantic attention vector a. We evaluate the effect of the dimension of the semantic attention vector a over several settings. As shown in Fig. 9(b), the F1 score changes only slightly, which shows that MA-PairRNN_LSTM is not very sensitive to the dimension of the attention preference vector.

Figure 9. Parameter sensitivity: (a) dimension of the node embedding z; (b) dimension of the semantic attention vector a; (c) coefficient η of the cosine similarity loss.
Coefficient η of the cosine similarity loss. The impact of the similarity loss term is controlled by η. We vary η over a range of values. As shown in Fig. 9(c), optimal performance is obtained near η = 1, indicating that η should be set neither too small nor too large, so as to avoid both underfitting and overfitting.

F. Case Study
We specifically choose three author variants named Jian Pei in Semantic Scholar as a case study, denoted as Jian Pei 1, Jian Pei 2, and Jian Pei 3. Statistics of the three selected author variants are shown in Table V. Our model classifies Jian Pei 1 and Jian Pei 2 as the same person, while Jian Pei 3 is another person, which is consistent with the ground truth. We visualize the subgraph of the academic network containing the three author variants. The visualized subgraph includes the papers and co-authors of the three author variants, and the topics their papers relate to. Papers of the three author variants are colored blue, green, and red, respectively, and the other nodes are colored by their type. The blue paper nodes of Jian Pei 1 and the green paper nodes of Jian Pei 2 tend to be closely connected, and many of them are linked through the same topics (e.g., Data mining, Social network) and the same venues (e.g., KDD, TKDE). Jian Pei 3's paper nodes are connected to the paper nodes of the other two only through topic nodes such as Algorithm and Simulation experiment, which are used in many research areas.

Table V
STATISTICS OF SELECTED AUTHOR VARIANTS

Author       Papers   Citations   Research topics
Jian Pei 1   441      23,729      Data mining, Social networks, Frequent pattern mining
Jian Pei 2   78       4,512       Data mining, Sequential pattern mining, Frequent pattern mining
Jian Pei 3   36       690         Molecular synthesis, Functional materials, Convenient syntheses

Figure 10. Subgraph visualization of selected author variants. Paper node color represents the author variant (blue: Jian Pei 1; green: Jian Pei 2; red: Jian Pei 3).

V. CONCLUSION AND FUTURE WORK

In this paper, we propose MA-PairRNN, a novel pairwise node sequence classification framework for name disambiguation, in which a multi-view graph embedding layer is designed to generate node representations inductively, and a Pseudo-Siamese recurrent neural network is designed to learn sequence pair similarity. Our proposed method learns node representations and sequence pair similarity simultaneously, and scales to large graphs thanks to its inductive capability. Experimental results on two real-world datasets demonstrate the effectiveness of our method. By analyzing the learned attention weights of meta-paths, MA-PairRNN shows potentially good interpretability. By testing on data from unseen areas, MA-PairRNN also shows good generalization ability. In the future, we plan to leverage hierarchical clustering to address the problem that an author may have diverse research areas and may work with non-overlapping sets of co-authors in each research area.

ACKNOWLEDGMENT
This work is supported by the National Key R&D Program of China (2018YFC0830804), NSFC No. 61872022, NSF of Jiangsu Province BK20171420, NSF of Guangdong Province (2017A030313339), and the CCF-Tencent Open Research Fund, and in part by NSF under grants III-1526499, III-1763325, III-1909323, and SaTC-1930941.