Sampling Subgraph Network with Application to Graph Classification
Jinhuan Wang, Pengtao Chen, Bin Ma, Jiajun Zhou, Zhongyuan Ruan, Guanrong Chen, Qi Xuan
Abstract—Graphs are naturally used to describe the structures of various real-world systems in biology, society, computer science, etc., where subgraphs or motifs, as basic building blocks, play an important role in function expression and information processing. However, existing research focuses on the basic statistics of certain motifs, largely ignoring the connection patterns among them. Recently, a subgraph network (SGN) model was proposed to study the potential structure among motifs, and it was found that integrating SGNs can enhance a series of graph classification methods. However, the SGN model lacks diversity and has quite high time complexity, making it difficult to apply widely in practice. In this paper, we introduce sampling strategies into SGN and design a novel sampling subgraph network model, which is scale-controllable and of higher diversity. We also present a hierarchical feature fusion framework to integrate the structural features of diverse sampling SGNs, so as to improve the performance of graph classification. Extensive experiments demonstrate that, compared with the SGN model, our new model indeed has much lower time complexity (reduced by two orders of magnitude) and better enhances a series of graph classification methods (doubling the performance enhancement).
Index Terms—Network sampling, subgraph network, feature fusion, graph classification, biological network, social network.
I. INTRODUCTION
Networks or graphs are frequently used to capture various relationships that exist in the real world, and thus we witness the emergence of social networks [1]–[3], traffic networks [4]–[6], biological networks [7]–[9], literature citation networks [10], [11], etc. The recently proposed graph representation methods allow us to better understand the structures of these networks and promote the development of various disciplines. Interestingly, the early graph embedding methods benefited from natural language processing [12], while now graph neural networks (GNNs) are successfully used to deal with visual semantic segmentation [13]. Furthermore, these graph embedding methods have made remarkable achievements in such areas as recommendation systems [14], [15], QA sites [16], [17], and even drug discovery [18], [19]. In fact,
J. Wang, P. Chen, J. Zhou, Z. Ruan, and Q. Xuan are with the Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). B. Ma is with the Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, United States (e-mail: [email protected]). G. Chen is with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong SAR, China (e-mail: [email protected]). Corresponding author: Qi Xuan.

network science, together with machine learning (especially deep learning), has made an important contribution to the development of cross-disciplines.

Subgraphs or motifs [20], [21], as basic building blocks, can be used to describe the mesoscale structure of a network. The networks constructed from different subgraphs may have vastly different topological properties and functions, and thus could be integrated into many graph algorithms to improve their performance. For instance, after extracting the rooted subgraphs with a modified skip-gram model, Narayanan et al. [22] proposed Subgraph2Vec as an unsupervised representation learning method, leading to good performance on graph classification. Ugander et al. [23] treated subgraph frequencies in social networks as local attributes and found that subgraph frequencies do provide unique insights for identifying social and graph structures of large networks. Inspired by neural document embedding models, Nguyen et al. [24] proposed the GE-FSG method, which adopts a series of frequent subgraphs as the inputs of the PV-DBOW model to obtain entire-graph embeddings, achieving good performance in graph classification and clustering. These studies focus more on basic statistics, e.g., the number of subgraphs, but lack analysis of the underlying structure among these subgraphs. The recently proposed subgraph network (SGN) model [25] takes the above issue into consideration and connects different subgraphs to construct a new network at a higher level. This process can be iterated to form a series of SGNs of different orders. It has been proven that SGNs can effectively expand the structural space and further improve the performance of network algorithms.

However, the SGN model has the following two shortcomings. First, the rule to establish an SGN is deterministic, i.e., users can generate only one SGN of each order for a network. Such lack of diversity limits the capacity of SGN to expand the latent structure space. Second, when the number of subgraphs exceeds the number of nodes in a network, the generated SGN can be even larger than the original network, which makes it extremely time-consuming to process SGNs of higher order, let alone integrate these SGNs to design algorithms of better performance. On the other hand, it is noted that network sampling can increase diversity by introducing randomness, and meanwhile control the scale, providing an effective and inexpensive solution for network analysis. This merit is thus exactly complementary to the SGN model.

In this paper, we introduce network sampling into the SGN model, and propose the Sampling SubGraph Network (S²GN).
In particular, we utilize four network sampling strategies, namely random walk, biased walk, link selection, and spanning tree, to sample a subnetwork containing certain numbers of nodes and links, and then map the subnetwork to an SGN based on certain rules. Network sampling and SGN construction can be used iteratively, so as to create a series of S²GNs of different orders, whose structural features can then be fused with those of the original network, so as to enhance a number of network algorithms. Specifically, we make the following contributions:

• We propose a new network model, the sampling subgraph network (S²GN), by introducing network sampling into SGN. Compared with SGN, our S²GN increases the diversity and decreases the complexity to a certain extent, benefiting the subsequent network algorithms.

• We propose hierarchical fusion to fully utilize the structural information extracted from S²GNs of different orders, generated by different sampling strategies, to enhance various graph classification algorithms based on manual attributes, Graph2Vec, DeepKernel, and CapsGNN.

• We apply the new method to eight real-world network datasets, and our experimental results demonstrate the effectiveness and efficiency of S²GN. The fusion of S²GNs generated by different sampling strategies can increase the performance of graph classification algorithms in 30 out of 32 cases, with a relative improvement of 10.75% on average (4.68% for SGN). This value increases to 14.49% (2.06% for SGN) when only CapsGNN is considered, i.e., the combination of S²GN-Fusion and CapsGNN achieves the state-of-the-art F-Score. Moreover, generating S²GNs needs much less time, reduced by almost two orders of magnitude.

The rest of the paper is structured as follows. In Sec. II, we briefly describe the related work on network sampling and feature extraction. In Sec. III, we introduce the construction method of S²GN. In Sec. IV, we describe several feature extraction methods, which together with S²GN are applied to eight real-world network datasets. Finally, we conclude the paper and highlight some promising directions for future work in Sec. V.

II. RELATED WORK
In this section, to supply some necessary background information, we give a brief overview of network sampling strategies and graph representation algorithms in graph mining and network science.
A. Network Sampling
Our work is closely related to the line of research on network analysis based on sampling. Sampling methods in graph mining have two main tasks: generating node sequences and limiting the scale of the network. For the former, many studies utilize sampling strategies to extract node sequences that provide materials for subsequent network representation. Random walk [26] is one of the most famous node sampling methods and has a wide influence in the field of graph mining [27], [28]. For example, DeepWalk [29] combined the random walk with a language model from NLP and was applied to node classification as a graph embedding method. In addition, Grover and Leskovec [30] designed a biased walk mechanism based on the random walk, which further improved node classification. Breadth-First Sampling [31] is a node sampling algorithm that is biased toward nodes of high degree and has been successfully applied in the measurement and topological analysis of online social networks (OSNs). By limiting the scale of a network, Satuluri et al. [32] sparsified graphs and achieved faster graph clustering without sacrificing quality. Moreover, sampling on graphs also has a wide spectrum of applications in network visualization [33]. A sampling method can simplify the network while preserving significant structural information, which is of great importance in graph mining.
B. Graph Representation
The most naive network representation method is to calculate graph attributes according to certain typical topological metrics [34]. Early graph embedding methods were considerably influenced by NLP. For example, as graph-level embedding algorithms, Narayanan et al. proposed Subgraph2Vec [22] and Graph2Vec [35], which achieve good performance on graph classification.

Another popular approach is to use graph kernel methods to capture the similarity between graphs. Although they represent networks well, they generally have relatively high computational complexity [34]. It is worth mentioning that the WL kernel [36] was used to make the subgraph isomorphism check more effective. On this basis, Yanardag and Vishwanathan [37] proposed an alternative kernel formulation termed the Deep Graph Kernel (DeepKernel), which achieved good performance on several datasets.

With the rise of spectral analysis of graph data in recent years, the graph convolutional neural network (GCN) has been developed. It uses the Laplacian decomposition of graphs to achieve convolutional operations in the spectral domain. Kipf et al. [38] used this neural network structure for semi-supervised learning and achieved excellent results. Later, mathematical analysis of GCN went further and proved that the Laplacian decomposition used by GCN and Laplacian smoothing on images have mathematically equivalent forms [39]. At the same time, GCNs in the spatial domain have also been proposed. Inspired by the idea of convolution kernels in CNNs, Niepert et al. [40] proposed the PATCHY-SAN method, which can determine the direction of the convolutions and the order of the nodes in the convolution window; this model also achieved good results in graph classification. In this way, GCN treats the obtained information without weighting, i.e., the information of important neighbors and non-important neighbors is put into the convolution layer in an unbiased manner. GAT overcomes this shortage by supplementing a self-attention coefficient before the convolution layer [41]. Based on the newly proposed capsule network architecture, Zhang et al. [42] designed CapsGNN to generate multiple embeddings for each graph, thereby capturing the classification-related information and the potential information with respect to the graph properties at the same time, which achieved good performance.

Although the above graph representation methods have relatively high expressiveness and learning ability, largely improving the performance of graph classification, they do not have good interpretability; in addition, they rely on only a single network structure, limiting their ability to exploit the latent structural space. Therefore, we generate multiple S²GNs to fully expand the latent structural space, so as to enhance the network algorithms. Our experiments demonstrate that S²GNs can be naturally integrated with many graph representation methods through our feature fusion framework for a further improvement of their effectiveness.

III. METHODOLOGY
We first briefly review SGN and the four network sampling methods. Then we introduce the framework to establish S²GN.
A. Subgraph Network
The subgraph network (SGN) [25] can be considered as a mapping function in network space. It provides a scalable model that transforms the original node-level network into a subgraph-level network. As shown in Fig. 1, the SGN in Fig. 1(b) can be obtained by SGN mapping from the original network in Fig. 1(a). One can see that the edges of different colors in (a) are mapped into the corresponding nodes in (b), which are connected depending on whether they share the same node in the original network.
Fig. 1. Schematic diagram of SGN construction: (a) original network; (b) SGN.
Formally, consider an undirected network G = (V, E) as the original network, where V and E are the node and edge sets, respectively. Let V_i ⊆ V and E_i ⊆ E; then g_i = (V_i, E_i) is a subgraph of G. The SGN, denoted by G_s = L(G), is a mapping from G to G_s = (V_s, E_s), where the node and edge sets are V_s = {g_i | i = 1, 2, ..., n} and E_s ⊆ (V_s × V_s). If g_a ∩ g_b ≠ ∅ in the original network, i.e., the two subgraphs share at least one node of V, then they are connected in the SGN, i.e., (g_a, g_b) ∈ E_s. It can be seen that the construction of an SGN has three steps: (i) detect subgraphs {g_i} from the original network; (ii) define the connection rules between subgraphs; (iii) build the SGN by leveraging the subgraphs.

For simplicity, in the case of the 1st-order SGN, denoted by SGN^(1), pairwise linked nodes are chosen as the building units, and adjacent node pairs are connected. In this case, SGN^(1) is equivalent to the line graph [43], which reveals the topological interaction between edges of the original network. Fu et al. [3] used this method to map the original network to an SGN, and then used the node centrality in the SGN to predict the weights of edges of the original network. As the SGN gradually maps to a higher-order network space, one can observe more abundant feature information. For example, the 2nd-order subgraph network, denoted by SGN^(2), is obtained by repeating the mapping process on SGN^(1). The building unit of SGN^(2) is a 2-hop structure (open triangle), which maintains the 2nd-order interactive information of the edge structures and can provide more insights about the local structure of a network [44]. To reduce the density of SGN^(2), two building units are connected only when they share the same edge. The latent structural information provided by higher-order SGNs may steadily diminish as the order increases; therefore, SGN generally works best within the first two orders [25].
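As a quick illustration (not the authors' code), the first-order mapping coincides with the line graph built into networkx. Note that SGN^(2) as defined above additionally reduces density by requiring shared edges, so iterating line_graph is only an approximation of the second order.

import networkx as nx

# Toy example: a triangle with a pendant edge.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

# SGN^(1): one node per edge of G; two nodes are linked when the
# corresponding edges of G share an endpoint (i.e., the line graph).
S1 = nx.line_graph(G)
print(sorted(S1.nodes()))    # [(0, 1), (0, 2), (1, 2), (2, 3)]
print(S1.number_of_edges())  # 5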
B. Network Sampling Strategies

In this paper, we adopt the following four sampling strategies to design our S²GN: random walk, biased walk, link selection, and spanning tree.
Random walk.
Random walk [45] can be used to obtain the co-occurrence relationships between nodes during network sampling. A node in a network can be described by the wandering sequence starting from it. The wandering sequence obtained from the node contains both local and higher-order neighbors. When the wandering scope is extended to the graph level, one can peek into the topology of the whole network. In our model, given a network G = (V, E), the random walk algorithm is described as follows:

• Start with an initial node v_0 ∈ V.
• At step i, choose one neighbouring node u ∈ N(v_{i-1}).
• Let v_i ← u be the next node and collect the edge: Ê ← Ê + {(v_{i-1}, v_i)}.
• Repeat the above steps until |Ê| = |V|.

Node v_i is generated by the following distribution:

P(v_i = x | v_{i-1} = m) = α/N, if (m, x) ∈ E; 0, otherwise,

where α is the transition probability between nodes m and x, and N is the normalizing constant. One can follow the above steps to simulate a random walk and get the final substructure Ĝ = (V̂, Ê), as sketched below.
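A minimal sketch of this sampler, assuming a connected graph with sortable node labels and at least |V| edges (otherwise the stop condition cannot be met):

import random
import networkx as nx

def random_walk_sample(G: nx.Graph, source) -> nx.Graph:
    """Walk from `source`, choosing a uniformly random neighbour at each
    step, and collect the traversed edges until |E_hat| = |V|,
    the stop condition described above."""
    v = source
    sampled = set()
    while len(sampled) < G.number_of_nodes():
        u = random.choice(list(G.neighbors(v)))
        sampled.add(tuple(sorted((v, u))))  # undirected: store edges canonically
        v = u
    return nx.Graph(sampled)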
Biased walk.
In the field of network science, the biased walk [46] differs from the random walk, in which the probability of a potential new state is independent of external conditions. When a network is too complex to be analyzed by statistical methods, the biased walk provides an effective approach for structural analysis by extracting the symmetry of an undirected network. The concept of the biased walk has attracted considerable attention, especially in the fields of transportation and social networks [47]. Here, we adopt the walking mechanism of Node2Vec [30], where the homogeneity equivalence and structural equivalence of nodes are preserved by integrating depth-first search and breadth-first search. Specifically, we adopt a 2nd-order random walk with parameters p and q, which takes into account the topological distance between the next node and the previous node, as well as the connectivity of the current node. Thus, the transition probability α between v_i and v_{i+1} is determined by

α(v_i, v_{i+1}) = ω_pq(v_{i-1}, v_{i+1}) = 1/p, if d(v_{i-1}, v_{i+1}) = 0; 1, if d(v_{i-1}, v_{i+1}) = 1; 1/q, if d(v_{i-1}, v_{i+1}) = 2,

where v_{i-1}, v_i, and v_{i+1} are the previous, current, and next nodes, respectively, and d(v_{i-1}, v_{i+1}) ∈ {0, 1, 2} indicates the shortest-path distance between v_{i-1} and v_{i+1}. Note that α is equal to ω_pq when the network is unweighted. Various substructures of the network can be obtained by controlling p and q. The transition weight can be sketched as follows.
Fig. 2. Illustration of the walk procedure in link selection.

Link selection.
We also propose a new edge-based sampling method, namely link selection. Given a network G = (V, E), we first sample an initial edge e = (v_1, v_2), and then randomly select a node of this edge as the source node of the next sampled edge. The nodes of all the sampled edges form the source node pool V_pool for the next sampling. The sampling process does not terminate until the stop condition is met. The substructure obtained by this sampling strategy grows by a diffuse search from a central edge, which ensures the acquisition of important network structures to a certain extent. As shown in Fig. 2, the node pair (1, 2) is selected as the initial edge; after one iteration through node "2" we get the substructure that contains nodes (1, 2, 3), and after a second iteration through node "1" we get an expanding substructure that contains nodes (1, 2, 3, 4). After several iterations, one gets the final substructure, which contains 7 nodes and 8 edges when the program satisfies the stop condition. The procedure is as follows:

• Start with an initial edge e_1 = (v_1, v_2) ∈ E, and let V_pool = {v_1, v_2}, E_pool = {e_1}.
• At step i, choose one node u ∈ V_pool.
• Let u_i ← u be the next start node and select an edge (u_i, u_{i+1}) ∉ E_pool.
• Update V_pool ← V_pool + {u_{i+1}} and the edge pool E_pool ← E_pool + {(u_i, u_{i+1})}.
• Repeat the above steps until |E_pool| = |V|.

Note that (u_i, u_{i+1}) has the same transition probability as in the random walk, and V_pool and E_pool are the node and edge sets of the final substructure Ĝ. This method differs from the random walk in that it can search the network on the basis of the current substructure rather than a single node, which reduces the appearance of chain structures to a greater extent. A sketch of this sampler is given below.
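A minimal sketch, under the same assumptions as the random walk sampler (connected graph with sortable node labels and |E| ≥ |V|):

import random
import networkx as nx

def link_selection_sample(G: nx.Graph, seed_edge) -> nx.Graph:
    """Grow a substructure outward from `seed_edge`: repeatedly pick a node
    already in the pool and attach one of its incident edges not yet
    sampled, until |E_pool| = |V|."""
    v_pool = set(seed_edge)
    e_pool = {tuple(sorted(seed_edge))}
    while len(e_pool) < G.number_of_nodes():
        u = random.choice(sorted(v_pool))
        candidates = [w for w in G.neighbors(u)
                      if tuple(sorted((u, w))) not in e_pool]
        if not candidates:        # this pool node is exhausted; try another
            continue
        w = random.choice(candidates)
        v_pool.add(w)
        e_pool.add(tuple(sorted((u, w))))
    return nx.Graph(e_pool)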
Fig. 3. The substructure obtained by a spanning tree: (a) original network; (b) a spanning tree.
Spanning tree.
A spanning tree [48] is a minimally connected substructure that contains all nodes of the graph, as shown in Fig. 3. Different spanning trees can be obtained by traversing from different nodes; here we randomly select a node as the initial node. The maximum and minimum spanning trees coincide when edge weights are not considered. In this section, we use the typical Kruskal algorithm [49] to generate spanning trees, with the weight values of all edges set to 1, as sketched below.
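A minimal sketch; since all weights are equal, shuffling the edge list randomises which of the many equal-weight spanning trees Kruskal returns (networkx's stable sort preserves insertion order among ties):

import random
import networkx as nx

def spanning_tree_sample(G: nx.Graph) -> nx.Graph:
    """Kruskal with every edge weight fixed to 1, so minimum and maximum
    spanning trees coincide; the shuffle supplies the randomness."""
    edges = list(G.edges())
    random.shuffle(edges)
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    H.add_weighted_edges_from((u, v, 1) for u, v in edges)
    return nx.minimum_spanning_tree(H, algorithm="kruskal")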
C. Framework for Constructing S²GN

Most real-world networks have large scale and complex structure. Typically, an SGN can be even larger and denser, making the follow-up network algorithms less efficient. It may also introduce extra noisy structural information, disturbing the network algorithms to a certain extent. In view of this, we focus on optimizing the SGN model and propose a framework for constructing a sampling subgraph network (S²GN) by integrating different network sampling methods. The pseudocodes for constructing S²GN and for sampling substructures are given in Algorithms 1 and 2, respectively. In Algorithm 1, GetMaxSubstructure(·) obtains the maximally connected substructure of the original network if it is not connected; NodeRanking(·) ranks the input nodes; SGNAlgorithms(·) constructs SGNs. GetNextEdgeWithStrategy(·) in Algorithm 2 gets the next edge according to a given sampling strategy. In general, S²GN can be constructed in three steps: source node selection, sampling substructure, and S²GN construction, which are introduced in the following.

• Source node selection: There are many ways to choose the initial node: (i) randomly select a node as the source node; (ii) select an initial node according to its importance, measured by closeness centrality [50], K-shell [51], PageRank [52], or others. In this paper, we use the K-shell method, so as to more likely capture the key structure.

• Sampling substructure: After the initial source node is determined, a substructure can be obtained by conducting a certain sampling strategy to extract the main context of the current network. Different sampling strategies generate diverse substructures, reflecting different aspects of the original network and further benefiting the subsequent network algorithms.

• S²GN construction: Based on the sampled substructure, we use the SGN model to construct the S²GN. Note that network sampling and SGN mapping are applied iteratively so as to get the S²GNs of higher orders. This method controls the size of the S²GNs and meanwhile increases their diversity. Therefore, compared with SGN, S²GN can further enhance both the efficiency and the effectiveness of the subsequent network algorithms.

Now, we use various feature extraction methods to get structural features from the S²GNs of different orders, which are first fused and then used to establish the graph classification models. The overall framework of S²GN construction for structural feature space expansion is shown in Fig. 4.

Fig. 4. The overall framework of the S²GN algorithm for network structure feature fusion.

Algorithm 1: Construction of S²GN.
Input: A network G(V, E) with node set V and link set E ⊆ (V × V); sampling strategy f_s(·); the order of SGN, h.
Output: S²GN, denoted by G_s(V_s, E_s).
1: Initialize a temporary object G_s = G;
2: while h > 0 do
3:   if G_s is not fully connected then
4:     G_s = GetMaxSubstructure(G_s);
5:   end if
6:   Initial node u = NodeRanking(V_s);
7:   Get sampling substructure Ĝ_s by executing Algorithm 2;
8:   G_sgn = SGNAlgorithms(Ĝ_s);
9:   G_s ← Relabeled(G_sgn);
10:  h = h − 1;
11: end while
12: return G_s(V_s, E_s)
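For concreteness, a rough Python rendering of Algorithm 1 is sketched below, with nx.line_graph standing in for SGNAlgorithms(·) and any of the samplers above playing the role of Algorithm 2; the NodeRanking step (K-shell source selection) is omitted for brevity:

import networkx as nx

def s2gn(G: nx.Graph, sample, h: int = 2) -> nx.Graph:
    """Alternate a sampling strategy with the SGN mapping for h iterations."""
    Gs = G
    for _ in range(h):
        if not nx.is_connected(Gs):                    # GetMaxSubstructure(.)
            giant = max(nx.connected_components(Gs), key=len)
            Gs = Gs.subgraph(giant).copy()
        sub = sample(Gs)                               # sampling substructure
        Gs = nx.convert_node_labels_to_integers(       # Relabeled(.)
            nx.line_graph(sub))                        # SGN mapping
    return Gs

# e.g. a 2nd-order S2GN via random walk sampling from node 0:
# S = s2gn(G, lambda g: random_walk_sample(g, source=0), h=2)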
Algorithm 2: Sampling substructure.
Input: A network G(V, E); source node u; number of sampling walks l.
Output: Sampling substructure, denoted by Ĝ_s = g(v̂, ê).
1: Let v = u, initialize walk_v to [v] and walk_e to ∅;
2: Select the first edge e with the probability given by the sampling strategy;
3: Append v = dst(e) to walk_v and e to walk_e;
4: for i = 2 to l − 1 do
5:   cur_v = walk_v[−1], cur_e = walk_e[−1];
6:   e_i = GetNextEdgeWithStrategy(cur_v, cur_e);
7:   Append e_i to walk_e and v_i = dst(e_i) to walk_v;
8: end for
9: v̂ = walk_v, ê = walk_e;
10: return Ĝ_s = g(v̂, ê)

Note that, generally, information fusion tries to integrate information from multiple aspects to improve algorithm performance, and it has a wide range of applications in practice. For instance, in speech recognition, the visual features of lip motion are fused with the speech signal features to predict the words expressed [53]. In image recognition, Xuan et al. [54] developed a multistream convolutional neural network to automatically merge the features of multi-view pearl images, so as to improve the accuracy of pearl classification. In this paper, we use different sampling strategies to capture the structural features from different aspects. As an example, we visualize the 1st-order and 2nd-order S²GNs generated by the four network sampling strategies on positive and negative samples from the MUTAG dataset, as shown in Fig. 5. It can be seen that the S²GNs generated by different sampling strategies have quite different structures, and the structural difference between the positive and negative samples may be enlarged in S²GNs. Therefore, it can be expected that the fusion of these diverse S²GNs could improve the performance of graph classification.
Fig. 5. Visualization of the 1st-order and 2nd-order S²GNs generated by the four network sampling strategies on positive and negative samples from the MUTAG dataset.
IV. EXPERIMENTS
Now, we compare the S²GN and SGN models on their abilities to enhance graph classification based on four feature extraction methods. We first introduce the datasets, followed by the feature extraction methods and the parameter settings. After that, we present the experimental results with discussion.
A. Datasets
We test our S²GN method on eight real-world network datasets, introduced in the following. IMDB-BINARY is a social network dataset, while the others are bio- and chemo-informatics networks. The basic statistics of these datasets are presented in Table I.

TABLE I. Basic statistics of the eight datasets. N_G is the number of graphs, C_max is the number of graphs belonging to the largest class, N_C is the number of classes, and Nodes and Edges are the average numbers of nodes and edges, respectively, of the graphs in the dataset.

• MUTAG [55] contains 188 mutagenic aromatic and heteroaromatic compounds, with nodes and edges representing atoms and the chemical bonds between them, respectively. They are labeled according to whether there is a mutagenic effect on a specific bacterium.

• PTC [56] includes 344 chemical compound graphs, with nodes and edges representing atoms and the chemical bonds between them, respectively. Their labels are determined by their carcinogenicity for rats.

• PROTEINS [57] comprises 1113 graphs. The nodes are Secondary Structure Elements (SSEs) and the edges are neighbors in the amino-acid sequence or in 3D space. These graphs represent either enzyme or non-enzyme proteins.

• ENZYMES [58] contains 600 protein tertiary structures, and each enzyme belongs to one of the 6 EC top-level classes.

• NCI1 & NCI109 [8] comprise 4110 and 4127 graphs, respectively. The nodes and edges represent atoms and the chemical bonds between them, respectively. They are two balanced subsets of the datasets of chemical compounds screened for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively. The positive and negative samples are distinguished according to whether they are effective against cancer cells.

• IMDB-BINARY [59] is a movie-collaboration dataset including 1000 graphs, which is collected from IMDB and contains lots of information about different movies. Each graph is an ego-network, where nodes represent actors or actresses and edges indicate whether they appear in the same movie. Each graph is categorized into one of two genres (Action and Romance).

• D&D [60] contains 1178 graphs of protein structures. A node represents an amino acid, and edges are constructed if the distance between two nodes is less than 6 Å. A label denotes whether a protein is an enzyme or a non-enzyme.

B. Feature Extraction Methods
We adopt four typical methods to generate graph representations, namely manual attributes, Graph2Vec, DeepKernel, and CapsGNN, introduced in the following.

• Attributes: Here, we use the same 11 manual attributes as those introduced in [25], including the number of nodes, the number of edges, average degree, network density, average clustering coefficient, the percentage of leaf nodes, the largest eigenvalue of the adjacency matrix, average betweenness centrality, average closeness centrality, and average eigenvector centrality.

• Graph2Vec [35]: This is the first unsupervised embedding approach for an entire network, which is based on extending the word-and-document embedding techniques that have shown great advantages in natural language processing (NLP).

• DeepKernel [37]: This method provides a unified framework that leverages the dependency information of substructures by learning latent representations. The substructure similarity matrix M = VV^T is calculated from the matrix V, whose rows represent substructure vectors. Denote by P the matrix whose rows represent the substructure frequency vectors of the graphs. According to the definition of the kernel, K = PMP^T = PVV^T P^T = HH^T, so one can use the rows of the matrix H = PV as the inputs to the classifier (a toy numerical check of this factorization follows the list).

• CapsGNN [42]: This method was inspired by CapsNet [61], which adopts the concept of capsules to overcome the weaknesses of existing GNN-based graph embedding algorithms. In particular, CapsGNN extracts node features in the form of capsules and utilizes the routing mechanism to capture important information at the graph level. The model generates multiple embeddings for each graph so as to capture graph properties from different aspects.
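A toy numerical check of the DeepKernel factorization, using the convention that rows of P index graphs; the shapes and random values are illustrative only:

import numpy as np

rng = np.random.default_rng(0)
P = rng.integers(0, 5, size=(4, 10)).astype(float)  # 4 graphs x 10 substructure counts
V = rng.normal(size=(10, 3))                        # 10 substructures x 3-dim vectors

H = P @ V            # per-graph features handed to the classifier
K = H @ H.T          # kernel matrix: K = P (V V^T) P^T = H H^T
assert np.allclose(K, P @ (V @ V.T) @ P.T)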
C. Parameter Setting
For source node selection, we choose the node with the largest K-shell value [51] as the source node for random walk (RW) and biased walk (BW), and choose the edge with the largest betweenness centrality as the source edge for link selection (LS). For the spanning tree (ST), we randomly pick a node as the source node to increase the diversity of S²GN, since the sampled subnetworks would be quite similar if we fixed the source node for this method. Moreover, we set the two parameters of BW as p = 4 and q = 1.

In this study, for Graph2Vec, the embedding dimension is adopted according to [35]. Since the embedding dimension is predominant for learning performance, a commonly-used value of 1024 is adopted. The other parameters are set to default values: the learning rate is set to 0.5, the batch size is set to 512, and the number of epochs is set to 1000. For DeepKernel, according to [37], the Weisfeiler-Lehman subtree kernel is used to build the corpus and its height is set to 2. Furthermore, the embedding dimension is set to 10, the window size is set to 5, and skip-gram is used for the word2vec model. We adopt the default parameters for CapsGNN and flatten the multiple embeddings of each graph as the input. Without loss of generality, the well-known Random Forest is chosen as the classification model. Meanwhile, for each feature extraction method, the feature space is first expanded by using S²GNs, and then the dimension of the feature vectors is reduced by PCA to the same value as that of the feature vector obtained from the original network, for a fair comparison. Each dataset is randomly split into 8 folds for training and 2 folds for testing. Here, the F-Score is adopted as the metric to evaluate the classification performance:

F1 = 2PR / (P + R),   (1)

where P and R are the precision and recall, respectively. In order to diminish the random effect of the fold assignment to some extent, the experiment is repeated 100 times and the average F-Score and its standard deviation are reported. We further define the relative improvement rate (RIMP) of the SGN or S²GN model as

RIMP = (F_model − F_ori) / F_ori,   (2)

where F_ori and F_model refer to the F-Score of the graph classification algorithm without and with the SGN (or S²GN-Fusion) model, respectively. A minimal sketch of this evaluation pipeline is given below.
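A minimal sketch of the fuse-reduce-classify pipeline, assuming binary labels (multi-class datasets such as ENZYMES would need an `average` argument for f1_score); `views` is a hypothetical list holding one feature matrix per network (the original plus each S²GN):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def fuse_and_classify(views, y, out_dim):
    """Concatenate the per-view features, reduce back to out_dim with PCA,
    then train and evaluate a Random Forest on an 8/2 split."""
    X = PCA(n_components=out_dim).fit_transform(np.hstack(views))
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
    clf = RandomForestClassifier().fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te))   # F1 = 2PR / (P + R)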
D. Experimental Results
We use the four network sampling strategies to generate sampling substructures, and further construct the corresponding 1st-order and 2nd-order S²GNs, denoted by S²GN-RW, S²GN-BW, S²GN-LS, and S²GN-ST, respectively. It has been proven that graph classification models can be significantly enhanced by appropriately using the structural information of the SGNs of the first two orders, while such gain diminishes quickly as more SGNs of higher orders are integrated [25]; this is why we only use the S²GNs of the first two orders here.
TABLE II. Classification results on the eight datasets, measured by F-Score, for the different feature extraction methods; the rows report the relative improvement (RIMP) achieved by SGN and by S²GN-Fusion over each original method. Columns: MUTAG / PTC / PROTEINS / ENZYMES / NCI1 / NCI109 / IMDB-BINARY / D&D / Avg.

Attributes:
RIMP-SGN:    5.78% / 6.96% / 1.48% / 15.79% / 3.50% / 3.55% / 6.37% / 1.05% / 4.97%
RIMP-Fusion: 9.42% / 13.44% / 1.07% / 27.39% / 12.67% / 11.21% / 5.44% / 1.56% / 9.12%

Graph2Vec:
RIMP-SGN:    4.44% / 5.10% / 1.56% / 7.88% / 4.67% / 0.81% / 13.09% / 14.48% / 4.39%
RIMP-Fusion: -1.71% / 7.00% / 2.46% / 21.28% / 5.04% / 1.97% / 22.35% / 17.79% / 8.46%

DeepKernel:
RIMP-SGN:    12.94% / 11.59% / 4.75% / 1.98% / 4.77% / 6.00% / 12.15% / 2.46% / 7.29%
RIMP-Fusion: 14.20% / 20.05% / 5.24% / 15.92% / 5.96% / 5.13% / 13.33% / 10.27% / 10.94%

CapsGNN:
RIMP-SGN:    3.65% / 3.32% / 0.59% / 0.52% / 0.40% / 1.00% / 5.17% / 1.42% / 2.06%
RIMP-Fusion: 7.91% / 35.55% / 12.24% / 12.66% / -0.22% / 7.18% / 30.79% / 14.90% / 14.49%
After that, we adopt the four feature extraction methods, namely manual attributes, Graph2Vec, DeepKernel, and CapsGNN, to get the structural feature vectors. For each feature extraction method, we fuse the vectors generated from the different S²GNs into a single vector. Finally, this vector is fed into the Random Forest model to produce the classification result. Note that we also produce the results for each single sampling strategy for a more comprehensive comparison. Here, ten-fold cross-validation is used to calculate the F-Score of graph classification. To enrich the sampled structures and reduce the probability of sampling repetition, 10 sampling-averaging runs were carried out for each sampling strategy.
1) Enhancement on classification performance:
The experimental results are shown in Table II, where one can see that the four S²GN models based on a single sampling strategy, i.e., S²GN-RW, S²GN-BW, S²GN-LS, and S²GN-ST, are comparable with the SGN model: they all produce similar classification results across the different datasets and feature extraction methods. Interestingly, S²GN-BW outperforms SGN in enhancing the classification models based on the four feature extraction methods in most cases, leading to a relative improvement of 4.52% on average. Such results are consistent with the experience that Node2Vec is a powerful method for capturing the structural properties of a network. Moreover, since the S²GNs generated by different sampling strategies capture different aspects of a network, as visualized in Fig. 5, one may expect that the fusion of these S²GNs can produce even better classification results. Indeed, we find that the fusion of S²GNs increases the performance of the original graph classification algorithms in 30 out of 32 cases, with a relative improvement of 10.75% on average (much better than 4.68% by SGN). The value increases to 14.49% (much better than 2.06% by SGN) when only CapsGNN is considered. This result is quite impressive, since CapsGNN, together with S²GN, achieves the state-of-the-art performance on the PROTEINS and IMDB-BINARY datasets.

Fig. 6. Average F-Score as a function of the training set size (represented by the fraction of samples in the training set), for the various feature extraction methods on the eight datasets, based on RW, BW, LS, ST, and Fusion, respectively.

To address the robustness of our S²GN model against variation of the training set size, the F-Score is calculated using various sizes of training set (from 10 to 90 percent, with a 20 percent interval). For each size, the training and test sets are randomly divided, and this is repeated 100 times with the average result recorded. The results are shown in Fig. 6 for the various feature extraction methods on the eight datasets. It can be seen that the curves of S²GN-Fusion are still relatively higher than those of the S²GNs generated by a single sampling strategy in most cases, indicating that the superiority of S²GN-Fusion in enhancing graph classification algorithms is robust. In particular, such superiority seems much more significant when enhancing CapsGNN, which is interesting and may indicate that the potential of S²GN-Fusion could be further exploited by connecting a better embedding method or an end-to-end graph neural network; meanwhile, there could be much room for further improvement in graph classification.
2) Reduction of time complexity:
Note that one important motivation to introduce sampling strategies into SGN is to control the network size so as to improve the efficiency of the network algorithms based upon them. Therefore, to address the computational complexity of our method, we record the average computational time of SGN and of the S²GNs generated by the four sampling strategies on the eight datasets, namely MUTAG, PTC, PROTEINS, ENZYMES, NCI1, NCI109, IMDB-BINARY, and D&D. The results are presented in Table III, where one can see that, overall, the computational time of S²GN is much less than that of SGN for each sampling strategy on each dataset, decreasing from hundreds of seconds to less than 19 seconds. In fact, the computational times of the S²GNs generated by different sampling strategies are comparable to each other. Considering that the S²GN-Fusion method needs to generate all four S²GNs, its computational time is close to the sum of the individual ones, which is still less than 25 seconds. Such results suggest that, compared with SGN, our S²GN model can indeed largely increase the efficiency of the network algorithms.

In fact, we can estimate the time complexity of our model in theory. Random walk is a computationally efficient sampling method, which requires only O(|E|) space to store the neighbors of each node in the graph. As for the time complexity, by imposing graph connectivity in the sample generation process, random walk provides a convenient mechanism to increase the effective sampling rate by reusing samples across different source nodes. For the biased walk, we adopt the 2nd-order random walk mechanism of Node2Vec, where each step of the walk is based on the transition probability α, which can be precomputed, so the time consumed by each step using alias sampling is O(1). Link selection broadens the scope of the start node at each step of the walk process, thereby accelerating the time to reach the stop condition. The Kruskal algorithm used to generate spanning trees is a greedy algorithm with O(|E| log |E|) time complexity. Our S²GN model constrains the expansion of the network scale and thereby reduces the cost of constructing the SGNs to a fixed one. Thus, depending on the sampling strategy, the overall time complexity T of our S²GN model ranges from the O(|E|) cost of the walk-based samplers to the O(|E| log |E|) cost of the spanning tree, plus the fixed cost of the SGN mapping on the sampled substructure, which is much lower than that of SGN.

TABLE III. Average computational time (in seconds) to establish SGN and the S²GNs generated by the four sampling strategies on the eight datasets.

Fig. 7. The t-SNE visualization of structural features using CapsGNN without (left) and with (right) S²GN-ST. Points of the same color represent the same class of graphs in the IMDB-BINARY dataset.
3) Visualization:
As a simple case study, we visualize the classification results on the IMDB-BINARY dataset based on the CapsGNN method, to verify the effectiveness of our S²GN model. Here, we choose S²GN-ST for visualization, since this is the single-strategy S²GN that enhances the classification performance of CapsGNN the most. As shown in Fig. 7, the structural features are located in different places by utilizing t-SNE. The left panel shows the original classification result using CapsGNN without S²GN-ST, while the right panel depicts the optimized distribution of the same dataset using CapsGNN with S²GN-ST. One can see that the graphs in the IMDB-BINARY dataset can indeed be distinguished by the original features of CapsGNN, but the distinction between graphs becomes more explicit after hierarchical representation through network sampling and SGN mapping, demonstrating the effectiveness of our S²GN model.

V. CONCLUSIONS
In this paper, we present a novel sampling subgraph network (S²GN) model, as well as a hierarchical feature fusion framework for graph classification, by introducing network sampling strategies into the SGN model. Compared with the latter, the S²GNs are of higher diversity and controllable scale, and thus help the network feature extraction methods to capture more aspects of the network structure with higher efficiency. We use different sampling strategies, namely random walk (RW), biased walk (BW), link selection (LS), and spanning tree (ST), to generate the corresponding sampling subgraph networks S²GN-RW, S²GN-BW, S²GN-LS, and S²GN-ST, respectively. The experimental results show that, compared with SGN, S²GN has much lower time complexity, reduced by almost two orders of magnitude, while having comparable effects on graph classification. In fact, the network algorithms based on S²GN-BW behave even better than those based on SGN, although each sampled subnetwork is only a part of the original network. More interestingly, when the features of all four S²GNs are fused and then fed into the graph classification models, the classification performance can be significantly enhanced. In particular, when CapsGNN is used to extract the features of these S²GNs, we achieve the state-of-the-art results on the PROTEINS and IMDB-BINARY datasets.

In the future, we will try more sampling strategies and integrate them with SGN to generate more diverse S²GNs; we will also apply our framework to more tasks beyond graph classification, such as link prediction, node classification, etc.

VI. ACKNOWLEDGMENTS
The authors would like to thank all the members of the IVSN Research Group, Zhejiang University of Technology, for valuable discussions about the ideas and technical details presented in this paper. This work was partially supported by the National Natural Science Foundation of China under Grant 61973273, by the Zhejiang Provincial Natural Science Foundation of China under Grant LR19F030001, and by the Hong Kong Research Grants Council under the GRF Grant CityU11200317.

REFERENCES
[1] Q. Xuan, X. Shu, Z. Ruan, J. Wang, C. Fu, and G. Chen, "A self-learning information diffusion model for smart social networks," IEEE Transactions on Network Science and Engineering, vol. 7, no. 3, pp. 1466–1480, 2019.
[2] J. Kim and M. Hastak, "Social network analysis: Characteristics of online social networks after a disaster," International Journal of Information Management, vol. 38, no. 1, pp. 86–96, 2018.
[3] C. Fu, M. Zhao, L. Fan, X. Chen, J. Chen, Z. Wu, Y. Xia, and Q. Xuan, "Link weight prediction using supervised learning methods and its application to Yelp layered network," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 8, pp. 1507–1518, 2018.
[4] Z. Ruan, C. Song, X.-H. Yang, G. Shen, and Z. Liu, "Empirical analysis of urban road traffic network: A case study in Hangzhou city, China," Physica A: Statistical Mechanics and its Applications, vol. 527, p. 121287, 2019.
[5] D. Tang, W. Du, L. Shekhtman, Y. Wang, S. Havlin, X. Cao, and G. Yan, "Predictability of real temporal networks," National Science Review, vol. 7, no. 5, pp. 929–937, 2020.
[6] D. Xu, C. Wei, P. Peng, Q. Xuan, and H. Guo, "GE-GAN: A novel deep learning framework for road traffic state estimation," Transportation Research Part C: Emerging Technologies, vol. 117, p. 102635, 2020.
[7] M. Walter, C. Chaban, K. Schütze, O. Batistic, K. Weckermann, C. Näke, D. Blazevic, C. Grefen, K. Schumacher, C. Oecking, K. Harter, and J. Kudla, "Visualization of protein interactions in living plant cells using bimolecular fluorescence complementation," The Plant Journal, vol. 40, no. 3, pp. 428–438, 2004.
[8] N. Wale, I. A. Watson, and G. Karypis, "Comparison of descriptor spaces for chemical compound retrieval and classification," Knowledge and Information Systems, vol. 14, no. 3, pp. 347–375, 2008.
[9] J. Zhou, J. Shen, S. Yu, G. Chen, and Q. Xuan, "M-Evolve: Structural-mapping-based data augmentation for graph classification," IEEE Transactions on Network Science and Engineering, 2020.
[10] M. R. Hosseini, M. Maghrebi, A. Akbarnezhad, I. Martek, and M. Arashpour, "Analysis of citation networks in building information modeling research," Journal of Construction Engineering and Management, vol. 144, no. 8, p. 04018064, 2018.
[11] M. Yasunaga, J. Kasai, R. Zhang, A. R. Fabbri, I. Li, D. Friedman, and D. R. Radev, "ScisummNet: A large annotated corpus and content-impact models for scientific paper summarization with citation networks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 7386–7393.
[12] J. B. Lee, R. A. Rossi, X. Kong, S. Kim, E. Koh, and A. Rao, "Graph convolutional networks with motif-based attention," in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 499–508.
[13] Y. Lu, Y. Chen, D. Zhao, and J. Chen, "Graph-FCN for image semantic segmentation," in International Symposium on Neural Networks. Springer, 2019, pp. 97–105.
[14] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016, pp. 7–10.
[15] H. Wang, M. Zhao, X. Xie, W. Li, and M. Guo, "Knowledge graph convolutional networks for recommender systems," in The World Wide Web Conference, 2019, pp. 3307–3313.
[16] X. Zhang, Y. Li, D. Shen, and L. Carin, "Diffusion maps for textual network embedding," in Advances in Neural Information Processing Systems, 2018, pp. 7587–7597.
[17] C. Fu, Y. Zheng, Y. Liu, Q. Xuan, and G. Chen, "NES-TL: Network embedding similarity-based transfer learning," IEEE Transactions on Network Science and Engineering, vol. 7, no. 3, pp. 1607–1618, 2019.
[18] Y. Jing, Y. Bian, Z. Hu, L. Wang, and X.-Q. S. Xie, "Deep learning for drug design: An artificial intelligence paradigm for drug discovery in the big data era," The AAPS Journal, vol. 20, no. 3, p. 58, 2018.
[19] T. Lane, D. P. Russo, K. M. Zorn, A. M. Clark, A. Korotcov, V. Tkachenko, R. C. Reynolds, A. L. Perryman, J. S. Freundlich, and S. Ekins, "Comparing and validating machine learning models for mycobacterium tuberculosis drug discovery," Molecular Pharmaceutics, vol. 15, no. 10, pp. 4346–4360, 2018.
[20] S.-Y. Liu, J. Xiao, and X.-K. Xu, "Link prediction in signed social networks: From status theory to motif families," IEEE Transactions on Network Science and Engineering, vol. 7, no. 3, pp. 1724–1735, 2019.
[21] Q. Xuan, H. Fang, C. Fu, and V. Filkov, "Temporal motifs reveal collaboration patterns in online task-oriented networks," Physical Review E, vol. 91, no. 5, p. 052813, 2015.
[22] A. Narayanan, M. Chandramohan, L. Chen, Y. Liu, and S. Saminathan, "subgraph2vec: Learning distributed representations of rooted subgraphs from large graphs," in International Workshop on Mining and Learning with Graphs, 2016.
[23] J. Ugander, L. Backstrom, and J. Kleinberg, "Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections," in Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013, pp. 1307–1318.
[24] D. Nguyen, W. Luo, T. D. Nguyen, S. Venkatesh, and D. Phung, "Learning graph representation via frequent subgraphs," in Proceedings of the 2018 SIAM International Conference on Data Mining. SIAM, 2018, pp. 306–314.
[25] Q. Xuan, J. Wang, M. Zhao, J. Yuan, C. Fu, Z. Ruan, and G. Chen, "Subgraph networks with application to structural feature space expansion," IEEE Transactions on Knowledge and Data Engineering, 2019, doi:10.1109/TKDE.2019.2957755.
[26] J. D. Noh and H. Rieger, "Random walks on complex networks," Physical Review Letters, vol. 92, no. 11, p. 118701, 2004.
[27] R. Andersen, F. Chung, and K. Lang, "Local graph partitioning using PageRank vectors," in 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06). IEEE, 2006, pp. 475–486.
[28] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens, "Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 355–369, 2007.
[29] B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 701–710.
[30] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 855–864.
[31] M. Kurant, A. Markopoulou, and P. Thiran, "On the bias of BFS (breadth first search)," in 2010 22nd International Teletraffic Congress (ITC 22). IEEE, 2010, pp. 1–8.
[32] V. Satuluri, S. Parthasarathy, and Y. Ruan, "Local graph sparsification for scalable clustering," in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery, 2011, pp. 721–732.
[33] N. M. Devi and S. R. Kasireddy, "Graph analysis and visualization of social network big data," in Social Network Forensics, Cyber Security, and Machine Learning. Springer, 2019, pp. 93–104.
[34] G. Li, M. Semerci, B. Yener, and M. J. Zaki, "Graph classification via topological and label attributes," in Proceedings of the 9th International Workshop on Mining and Learning with Graphs (MLG), San Diego, USA, vol. 2, 2011.
[35] A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal, "graph2vec: Learning distributed representations of graphs," in International Workshop on Mining and Learning with Graphs, 2017.
[36] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, "Weisfeiler-Lehman graph kernels," Journal of Machine Learning Research, vol. 12, pp. 2539–2561, 2011.
[37] P. Yanardag and S. Vishwanathan, "Deep graph kernels," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 1365–1374.
[38] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in International Conference on Learning Representations (ICLR), 2017.
[39] Q. Li, Z. Han, and X.-M. Wu, "Deeper insights into graph convolutional networks for semi-supervised learning," in AAAI, 2018.
[40] M. Niepert, M. Ahmed, and K. Kutzkov, "Learning convolutional neural networks for graphs," in Proceedings of the 33rd International Conference on Machine Learning, ser. ICML'16. JMLR.org, 2016, pp. 2014–2023.
[41] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," International Conference on Learning Representations, 2018. [Online]. Available: https://openreview.net/forum?id=rJXMpikCZ
[42] X. Zhang and L. Chen, "Capsule graph neural network," in International Conference on Learning Representations, 2019. [Online]. Available: https://openreview.net/forum?id=Byl8BnRcYm
[43] F. Harary and R. Z. Norman, "Some properties of line digraphs," Rendiconti del Circolo Matematico di Palermo, vol. 9, no. 2, pp. 161–168, 1960.
[44] J.-P. Eckmann and E. Moses, "Curvature of co-links uncovers hidden thematic layers in the World Wide Web," Proceedings of the National Academy of Sciences, vol. 99, no. 9, pp. 5825–5829, 2002.
[45] K. Pearson, "The problem of the random walk," Nature, vol. 72, no. 1867, pp. 342–342, 1905.
[46] Y. Azar, A. Z. Broder, A. R. Karlin, N. Linial, and S. Phillips, "Biased random walks," in Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing, 1992, pp. 1–9.
[47] K. M. Adal, B. B. Samir, and N. B. Z. Ali, "Biased random walk based routing for mobile ad hoc networks." IEEE, 2010, pp. 1–6.
[48] A. Dey, S. Broumi, A. Bakali, M. Talea, F. Smarandache et al., "A new algorithm for finding minimum spanning trees with undirected neutrosophic graphs," Granular Computing, vol. 4, no. 1, pp. 63–69, 2019.
[49] L. Najman, J. Cousty, and B. Perret, "Playing with Kruskal: Algorithms for morphological trees in edge-weighted graphs," in International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing. Springer, 2013, pp. 135–146.
[50] K. Okamoto, W. Chen, and X.-Y. Li, "Ranking of closeness centrality for large-scale social networks," in International Workshop on Frontiers in Algorithmics. Springer, 2008, pp. 186–195.
[51] L. Lü, D. Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang, and T. Zhou, "Vital nodes identification in complex networks," Physics Reports, vol. 650, pp. 1–63, 2016.
[52] A. N. Langville and C. D. Meyer, "Deeper inside PageRank," Internet Mathematics, vol. 1, no. 3, pp. 335–380, 2004.
[53] P. Zhou, W. Yang, W. Chen, Y. Wang, and J. Jia, "Modality attention for end-to-end audio-visual speech recognition," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 6565–6569.
[54] Q. Xuan, B. Fang, Y. Liu, J. Wang, J. Zhang, Y. Zheng, and G. Bao, "Automatic pearl classification machine based on a multistream convolutional neural network," IEEE Transactions on Industrial Electronics, vol. 65, no. 8, pp. 6538–6547, 2018.
[55] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch, "Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity," Journal of Medicinal Chemistry, vol. 34, no. 2, pp. 786–797, 1991.
[56] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, and C. Helma, "Statistical evaluation of the predictive toxicology challenge 2000–2001," Bioinformatics, vol. 19, no. 10, pp. 1183–1193, 2003.
[57] K. M. Borgwardt, C. S. Ong, S. Schönauer, S. Vishwanathan, A. J. Smola, and H.-P. Kriegel, "Protein function prediction via graph kernels," Bioinformatics, vol. 21, pp. i47–i56, 2005.
[58] R. Rossi and N. Ahmed, "The network data repository with interactive graph analytics and visualization," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29, no. 1, 2015.
[59] D. Nguyen, W. Luo, T. D. Nguyen, S. Venkatesh, and D. Phung, "Learning graph representation via frequent subgraphs," in Proceedings of the 2018 SIAM International Conference on Data Mining. SIAM, 2018, pp. 306–314.
[60] P. D. Dobson and A. J. Doig, "Distinguishing enzyme structures from non-enzymes without alignments," Journal of Molecular Biology, vol. 330, no. 4, pp. 771–783, 2003.
[61] S. Sabour, N. Frosst, and G. Hinton, "Matrix capsules with EM routing," in 6th International Conference on Learning Representations, ICLR, 2018.