Graph Convolutional Networks with EigenPooling
Yao Ma
Michigan State University, [email protected]
Suhang Wang
Pennsylvania State University, [email protected]
Charu C. Aggarwal
IBM T. J. Watson Research Center, [email protected]
Jiliang Tang
Michigan State University, [email protected]
ABSTRACT
Graph neural networks, which generalize deep neural network models to graph-structured data, have attracted increasing attention in recent years. They usually learn node representations by transforming, propagating and aggregating node features, and have been proven to improve the performance of many graph-related tasks such as node classification and link prediction. To apply graph neural networks to the graph classification task, approaches to generate the graph representation from node representations are needed. A common way is to globally combine the node representations. However, rich structural information is overlooked in this way. Thus a hierarchical pooling procedure is desired to preserve the graph structure during graph representation learning. There are some recent works on hierarchically learning graph representations, analogous to the pooling step in conventional convolutional neural networks (CNNs). However, the local structural information is still largely neglected during the pooling process. In this paper, we introduce a pooling operator EigenPooling based on the graph Fourier transform, which can utilize the node features and local structures during the pooling process. We then design pooling layers based on the pooling operator, which are further combined with traditional GCN convolutional layers to form a graph neural network framework EigenGCN for graph classification. Theoretical analysis is provided to understand EigenPooling from both local and global perspectives. Experimental results on the graph classification task over 6 commonly used benchmarks demonstrate the effectiveness of the proposed framework.
ACM Reference Format:
Yao Ma, Suhang Wang, Charu C. Aggarwal, and Jiliang Tang. 2019. Graph Convolutional Networks with EigenPooling. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
INTRODUCTION
Recent years have witnessed increasing interest in generalizing neural networks to graph-structured data. The stream of research
on this topic usually goes under the name of "Graph Neural Networks" [34], which typically involves transforming, propagating and aggregating node features across the graph. Among them, some focus on node-level representation learning [18, 22, 35] while others investigate learning graph-level representations [4, 8, 11, 14, 15, 19, 25, 48]. While standing from different perspectives, these methods have been proven to advance various graph-related tasks. The methods focusing on node representation learning have brought improvement to tasks such as node classification [14–16, 18, 22, 35] and link prediction [35], while those working on graph-level representation learning have mainly facilitated graph classification. In this paper, we work on graph-level representation learning with a focus on the task of graph classification.
The task of graph classification is to predict the label of a given graph utilizing its associated features and graph structure. Graph neural networks can extract graph representations while using all associated information. The majority of existing graph neural networks [7, 11, 17, 25] have been designed to generate good node representations, and then globally summarize the node representations as the graph representation. These methods are inherently "flat" since they treat all the nodes equivalently when generating the graph representation from the node representations. In other words, the entire graph structure information is totally neglected during this process. However, nodes are naturally of different statuses and roles in a graph, and they should contribute differently to the graph-level representation. Furthermore, graphs often have different local structures (or subgraphs), which contain vital graph characteristics. For instance, in a graph of a protein, atoms (nodes) are connected via bonds (edges); some local structures, which consist of groups of atoms and their direct bonds, can represent specific functional units, which, in turn, are important to tell the functionality of the entire protein [3, 11, 37]. These local structures are also not captured during the global summarizing process. To generate a graph representation which preserves the local and global graph structures, a hierarchical pooling process, analogous to the pooling process in conventional convolutional neural networks (CNNs) [23], is needed.

There are very recent works investigating the pooling procedure for graph neural networks [8, 13, 39, 48]. These methods group nodes into subgraphs (supernodes), coarsen the graph based on these subgraphs, and then reduce the entire graph information to the coarsened graph by generating features of supernodes from their corresponding nodes in subgraphs. However, when pooling the features for supernodes, average pooling or max pooling has usually been adopted, where the structures of these grouped nodes (the local structures) are still neglected. With the local structures, the nodes in the subgraphs are of different statuses and roles when they contribute to the supernode representations. It is challenging to design a general pooling operator while incorporating the local structure information as 1) the subgraphs may contain different numbers of nodes, so a fixed-size pooling operator cannot work for all subgraphs; and 2) the subgraphs could have very different structures, which may require different approaches to summarize the information for the supernode representation. To address the aforementioned challenges, we design a novel pooling operator EigenPooling based on the eigenvectors of the subgraphs, which naturally have the same size as each subgraph and can effectively capture the local structures when summarizing node features for supernodes.
EigenPooling can be used as pooling layers stacked with any graph neural network layers to form a novel framework EigenGCN for graph classification. Our major contributions can be summarized as follows:

• We introduce a novel pooling operator EigenPooling, which can naturally summarize the subgraph information while utilizing the subgraph structure;
• We provide theoretical understandings of EigenPooling from both local and global perspectives;
• We incorporate pooling layers based on EigenPooling into existing graph neural networks as a novel framework EigenGCN for representation learning for graph classification; and
• We conduct comprehensive experiments on numerous real-world graph classification benchmarks to demonstrate the effectiveness of the proposed pooling operator.
THE PROPOSED FRAMEWORK: EigenGCN

In this paper, we aim to develop a graph neural network (GNN) model, which consists of convolutional layers and pooling layers, to learn graph representations such that graph-level classification can be applied. Before going into the details, we first introduce some notations and the problem setting.
Problem Setting: A graph can be represented as $\mathcal{G} = \{\mathcal{E}, \mathcal{V}\}$, where $\mathcal{V} = \{v_1, \dots, v_N\}$ is the set of $N$ nodes and $\mathcal{E}$ is the set of edges. The graph structure information can also be represented by an adjacency matrix $A \in \mathbb{R}^{N \times N}$. Furthermore, each node in the graph is associated with node features, and we use $X \in \mathbb{R}^{N \times d}$ to denote the node feature matrix, where $d$ is the dimension of the features. Note that this node feature matrix can also be viewed as a $d$-dimensional graph signal [38] defined on the graph $\mathcal{G}$. In the graph classification setting, we have a set of graphs $\{\mathcal{G}_i\}$, where each graph $\mathcal{G}_i$ is associated with a label $y_i$. The task of graph classification is to take the graph (structure information and node features) as input and predict its corresponding label. To make the prediction, it is important to extract useful information from both the graph structure and the node features. We aim to design graph convolution layers and EigenPooling to hierarchically extract graph features, which finally yields a vector representation of the input graph for graph classification.
An Overview of EigenGCN
In this work, we build our model based on Graph Convolutional Networks (GCN) [22], which have been demonstrated to be effective in node-level representation learning. While the GCN model was originally designed for semi-supervised node classification, we only discuss the part for node representation learning and ignore the classification part. A GCN is a stack of several convolutional layers, and a single convolutional layer can be written as:

$$F_{i+1} = \mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} F_i W_i\right), \quad (1)$$

where $F_i \in \mathbb{R}^{N \times d_i}$ is the output of the $i$-th convolutional layer for $i > 0$ and $F_0 = X$ denotes the input node features. Here $\tilde{A} = A + I$ is the adjacency matrix with added self-loops and $\tilde{D}$ is its diagonal degree matrix [22]. A total of $I$ convolutional layers are stacked to learn node representations, and the output matrix $F_I$ can be viewed as the final node representations learned by the GCN model.
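For concreteness, the following is a minimal NumPy sketch of the convolutional layer in Eq. (1). The function name `gcn_layer` and the dense-matrix representation are our own illustrative choices, not part of the original implementation.

```python
import numpy as np

def gcn_layer(A, F, W):
    """One GCN convolutional layer, Eq. (1): ReLU(D~^{-1/2} A~ D~^{-1/2} F W)."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops: A~ = A + I
    d = A_tilde.sum(axis=1)                   # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D~^{-1/2}
    return np.maximum(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ F @ W, 0.0)  # ReLU
```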
As described above, the GCN model is designed for learning node representations, so the output of the GCN model is a matrix instead of a vector. The procedure of the GCN is rather "flat", as it can only "pass messages" between nodes through edges but cannot summarize the node information into a higher-level graph representation. A simple way to summarize the node information into a graph-level representation is global pooling; for example, we could use the average of the node representations as the graph representation. However, in this way a lot of key information is ignored and the graph structure is totally overlooked during the pooling process.

To address this challenge, we propose eigenvector-based pooling layers, EigenPooling, to hierarchically summarize node information and generate the graph representation. An illustrative example is shown in Figure 1. In particular, several pooling layers are added between convolutional layers. Each pooling layer pools the graph signal defined on a graph into a graph signal defined on a coarsened version of the input graph, which consists of fewer nodes. Thus, the design of the pooling layers consists of two components: 1) graph coarsening, which divides the graph into a set of subgraphs and forms a coarsened graph by treating the subgraphs as supernodes; and 2) EigenPooling, which transforms the original graph signal into a graph signal defined on the coarsened graph.

We coarsen the graph based on a subgraph partition. Given a subgraph partition with no overlaps between subgraphs, we treat each of the subgraphs as a supernode. To form a coarsened graph of the supernodes, we determine the connectivity between the supernodes by the edges across the subgraphs. During the pooling process, for each of the subgraphs, we summarize the information of the graph signal on the subgraph into the supernode. With graph coarsening, we utilize the graph structure information to form coarsened graphs, which makes it possible to learn representations level by level in a hierarchical way. With EigenPooling, we learn node features of the coarsened graph that exploit the subgraph structure as well as the node features of the input graph.

Figure 1 shows an illustrative example, where a binary graph classification is performed. In this example, the graph is coarsened three times and finally becomes a single supernode. The input is a graph signal (the node features), which can be multi-dimensional; for ease of illustration, we do not show the node features on the graph. Two convolutional layers are applied to the graph signal. Then, the graph signal is pooled to a signal defined on the coarsened graph. This procedure (two convolutional layers and one pooling layer) is repeated two more times and the graph signal is finally pooled to a signal on a single node.

Figure 1: An illustrative example of the general framework.

This pooled signal on the single node, which is a vector, can be viewed as the graph representation. The graph representation then goes through several fully connected layers, and the prediction is made upon the output of the last layer. Next, we introduce the details of graph coarsening and EigenPooling in EigenGCN.
Graph Coarsening

In this subsection, we introduce how we perform the graph coarsening. As mentioned in the previous subsection, the coarsening process is based on a subgraph partition. There are different ways to separate a given graph into a set of subgraphs with no overlapping nodes. In this paper, we adopt spectral clustering to obtain the subgraphs, so that we can control the number of subgraphs, which, in turn, determines the pooling ratio. We leave other options as future work. Given a set of subgraphs, we treat them as supernodes and build the connections between them similarly to [40]. An example of the graph coarsening and supernodes is shown in Figure 1, where a subgraph and its supernode are denoted using the same color. Next, we introduce how to mathematically describe the subgraphs, supernodes, and their relations.

Let $c$ be a partition of a graph $\mathcal{G}$, which consists of $K$ connected subgraphs $\{\mathcal{G}^{(k)}\}_{k=1}^{K}$. For the graph $\mathcal{G}$, we have the adjacency matrix $A \in \mathbb{R}^{N \times N}$ and the feature matrix $X \in \mathbb{R}^{N \times d}$. Let $N_k$ denote the number of nodes in the subgraph $\mathcal{G}^{(k)}$ and $\Gamma^{(k)}$ the list of nodes in subgraph $\mathcal{G}^{(k)}$. Note that each subgraph can also be viewed as a supernode. For each subgraph $\mathcal{G}^{(k)}$, we can define a sampling operator $C^{(k)} \in \mathbb{R}^{N \times N_k}$ as follows:

$$C^{(k)}[i, j] = 1 \text{ if and only if } \Gamma^{(k)}(j) = v_i, \quad (2)$$

where $C^{(k)}[i, j]$ denotes the element in the $(i, j)$-th position of $C^{(k)}$ and $\Gamma^{(k)}(j)$ is the $j$-th element in the node list $\Gamma^{(k)}$. This operator provides a relation between the nodes in the subgraph $\mathcal{G}^{(k)}$ and the nodes in the original graph. Given a single-dimensional graph signal $x \in \mathbb{R}^{N \times 1}$ defined on the entire original graph, the induced signal that is only defined on the subgraph $\mathcal{G}^{(k)}$ can be written as

$$x^{(k)} = (C^{(k)})^T x. \quad (3)$$

On the other hand, we can also use $C^{(k)}$ to up-sample a graph signal $x^{(k)}$ defined only on the subgraph $\mathcal{G}^{(k)}$ to the entire graph $\mathcal{G}$ by

$$\bar{x} = C^{(k)} x^{(k)}. \quad (4)$$

This keeps the values of the nodes in the subgraph untouched while setting the values of all the other nodes, which do not belong to the subgraph, to 0. The operator can be applied to a multi-dimensional signal $X \in \mathbb{R}^{N \times d}$ in a similar way. The induced adjacency matrix $A^{(k)} \in \mathbb{R}^{N_k \times N_k}$ of the subgraph $\mathcal{G}^{(k)}$, which only describes the connections within the subgraph $\mathcal{G}^{(k)}$, can be obtained as

$$A^{(k)} = (C^{(k)})^T A C^{(k)}. \quad (5)$$

The intra-subgraph adjacency matrix of the graph $\mathcal{G}$, which only consists of the edges inside each subgraph, can be represented as

$$A_{int} = \sum_{k=1}^{K} C^{(k)} A^{(k)} (C^{(k)})^T. \quad (6)$$

Then the inter-subgraph adjacency matrix of graph $\mathcal{G}$, which only consists of the edges between subgraphs, can be represented as $A_{ext} = A - A_{int}$.

Let $\mathcal{G}_{coar}$ denote the coarsened graph, which consists of the supernodes and their connections. We define the assignment matrix $S \in \mathbb{R}^{N \times K}$, which indicates whether a node belongs to a specific subgraph, as $S[i, j] = 1$ if and only if $v_i \in \Gamma^{(j)}$. Then, the adjacency matrix of the coarsened graph is given as

$$A_{coar} = S^T A_{ext} S. \quad (7)$$

With graph coarsening, we can obtain the connectivity of $\mathcal{G}_{coar}$, i.e., $A_{coar}$. Obviously, $A_{coar}$ encodes the network structure information of $\mathcal{G}$. Next, we describe how to obtain the node features $X_{coar}$ of $\mathcal{G}_{coar}$ using EigenPooling. With $A_{coar}$ and $X_{coar}$, we can stack more layers of GCN–GraphCoarsening–EigenPooling to learn higher-level representations of the graph for classification.
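As a concrete illustration, the sketch below computes $A_{int}$, $A_{ext}$, $S$, and $A_{coar}$ (Eqs. (5)–(7)) from a given node-to-subgraph assignment. Obtaining that assignment from spectral clustering is left outside the sketch, and all names here are our own illustrative choices.

```python
import numpy as np

def coarsen_graph(A, assign):
    """Coarsen a graph with adjacency A, where assign[i] = k means v_i is in G^(k)."""
    N = A.shape[0]
    K = int(assign.max()) + 1
    S = np.zeros((N, K))                      # assignment matrix: S[i, k] = 1 iff v_i in G^(k)
    S[np.arange(N), assign] = 1.0
    A_int = np.zeros_like(A)                  # intra-subgraph edges, Eq. (6)
    for k in range(K):
        idx = np.flatnonzero(assign == k)     # node list Gamma^(k)
        A_int[np.ix_(idx, idx)] = A[np.ix_(idx, idx)]   # C^(k) A^(k) (C^(k))^T
    A_ext = A - A_int                         # inter-subgraph edges
    A_coar = S.T @ A_ext @ S                  # adjacency of the coarsened graph, Eq. (7)
    return A_coar, A_int, A_ext, S
```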
EigenPooling
In this subsection, we introduce EigenPooling, which aims to obtain $X_{coar}$ encoding both the network structure information and the node features of $\mathcal{G}$. Globally, the pooling operation transforms a graph signal defined on a given graph into a corresponding graph signal defined on the coarsened version of this graph. It is expected that the important information of the original graph signal is largely preserved in the transformed graph signal. Locally, for each subgraph, we summarize the features of the nodes in this subgraph into a single representation of the supernode. It is necessary to consider the structure of the subgraph when we perform the summarization, as the subgraph structure also encodes important information. However, commonly adopted pooling methods such as max pooling [8, 48] or average pooling [11] ignore the graph structure. In some works [30], the subgraph structure is used to find a canonical ordering of the nodes, which is, however, very difficult and expensive. In this work, to use the structure of the subgraphs, we design the pooling operator based on graph spectral theory by utilizing the eigenvectors of the Laplacian matrix of the subgraph. Next, we first briefly review the graph Fourier transform and then introduce the design of EigenPooling based on it.
Graph Fourier Transform: Consider a graph $\mathcal{G} = \{\mathcal{E}, \mathcal{V}\}$ with $A \in \mathbb{R}^{N \times N}$ being the adjacency matrix and $X \in \mathbb{R}^{N \times d}$ being the node feature matrix. Without loss of generality, for the following description we consider $d = 1$, i.e., $x \in \mathbb{R}^{N \times 1}$, which can be viewed as a single-dimensional graph signal defined on the graph $\mathcal{G}$ [33]. This is the spatial view of a graph signal, which maps each node in the graph to a scalar value (or a vector if the graph signal is multi-dimensional). Analogous to classical signal processing, we can define the graph Fourier transform [38] and the spectral representation of the graph signal in the spectral domain. To define the graph signal in the spectral domain, we need the Laplacian matrix [6] $L = D - A$, where $D$ is the diagonal degree matrix with $D[i, i] = \sum_{j=1}^{N} A[i, j]$. The Laplacian matrix $L$ can be used to define the "smoothness" of a graph signal [38] as follows:

$$s(x) = x^T L x = \frac{1}{2} \sum_{i,j} A[i, j] (x[i] - x[j])^2. \quad (8)$$

$s(x)$ measures the smoothness of the graph signal $x$. The smoothness of a graph signal depends on how dramatically the values of connected nodes change: the smaller $s(x)$, the smoother the signal. For example, for a connected graph, a graph signal with the same value on all nodes has a smoothness of 0, which means it is "extremely smooth" with no change.

As $L$ is a real symmetric positive semi-definite matrix, it has a complete set of orthonormal eigenvectors $\{u_l\}_{l=1}^{N}$. These eigenvectors are also known as the graph Fourier modes [38], which are associated with the ordered real non-negative eigenvalues $\{\lambda_l\}_{l=1}^{N}$. Given a graph signal $x$, the graph Fourier transform can be obtained as follows:

$$\hat{x} = U^T x, \quad (9)$$

where $U = [u_1, \dots, u_N] \in \mathbb{R}^{N \times N}$ is the matrix consisting of the eigenvectors of $L$. The vector $\hat{x}$ obtained after the transform is the representation of the graph signal in the spectral domain. Correspondingly, the inverse graph Fourier transform, which transfers the spectral representation back to the spatial representation, can be written as:

$$x = U \hat{x}. \quad (10)$$

Note that we can also view each eigenvector $u_l$ of the Laplacian matrix $L$ as a graph signal, and its corresponding eigenvalue $\lambda_l$ measures its "smoothness". For any eigenvector $u_l$, we have:

$$s(u_l) = u_l^T L u_l = u_l^T \lambda_l u_l = \lambda_l. \quad (11)$$

The eigenvectors (or Fourier modes) are a set of base signals with different "smoothness" defined on the graph $\mathcal{G}$. Thus, the graph Fourier transform of a graph signal $x$ can also be viewed as linearly decomposing $x$ into this set of base signals, and $\hat{x}$ can be viewed as the coefficients of the linear combination of the base signals that recovers the original signal $x$.
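The following minimal sketch (our own illustration) computes the graph Fourier transform of Eq. (9); `np.linalg.eigh` returns eigenvalues in ascending order, so the columns of $U$ are ordered from smoothest to least smooth mode.

```python
import numpy as np

def graph_fourier(A, x):
    """Graph Fourier transform of signal x on a graph with adjacency A."""
    L = np.diag(A.sum(axis=1)) - A        # Laplacian L = D - A
    lam, U = np.linalg.eigh(L)            # Fourier modes, ascending eigenvalues
    x_hat = U.T @ x                       # Eq. (9): spectral representation
    return lam, U, x_hat

# inverse transform, Eq. (10): x == U @ x_hat (up to numerical error)
```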
Since the graph Fourier transform maps a graph signal to the spectral domain, taking into consideration both the graph structure and the graph signal information, we adopt it to design pooling operators, which pool the graph signal defined on a given graph $\mathcal{G}$ to a signal defined on its coarsened version $\mathcal{G}_{coar}$. The design of the pooling operators is based on the graph Fourier transform of the subgraphs $\{\mathcal{G}^{(k)}\}_{k=1}^{K}$. Let $L^{(k)}$ denote the Laplacian matrix of the subgraph $\mathcal{G}^{(k)}$, and denote the eigenvectors of $L^{(k)}$ as $u_1^{(k)}, \dots, u_{N_k}^{(k)}$. We then use the up-sampling operator $C^{(k)}$ to up-sample these eigenvectors (base signals on this subgraph) to the entire graph:

$$\bar{u}_l^{(k)} = C^{(k)} u_l^{(k)}, \quad l = 1, \dots, N_k. \quad (12)$$

With the up-sampled eigenvectors, we organize them into matrices to form pooling operators. Let $\Theta_l \in \mathbb{R}^{N \times K}$ denote the pooling operator consisting of all the $l$-th eigenvectors from all the subgraphs:

$$\Theta_l = [\bar{u}_l^{(1)}, \dots, \bar{u}_l^{(K)}]. \quad (13)$$
Note that the subgraphs do not necessarily all have the same number of nodes, which means that the number of eigenvectors can differ across subgraphs. Let $N_{max} = \max_{k=1,\dots,K} N_k$ be the largest number of nodes among all the subgraphs. Then, for a subgraph $\mathcal{G}^{(k)}$ with $N_k$ nodes, we set $u_l^{(k)} = \mathbf{0} \in \mathbb{R}^{N_k \times 1}$ for $N_k < l \leq N_{max}$. The pooling process with the $l$-th pooling operator $\Theta_l$ can be described as

$$X_l = \Theta_l^T X, \quad (14)$$

where $X_l \in \mathbb{R}^{K \times d}$ is the pooled result using the $l$-th pooling operator. The $k$-th row of $X_l$ contains the information pooled from the $k$-th subgraph, which is the representation of the $k$-th supernode. Following this construction, we build a set of $N_{max}$ pooling operators. To combine the information pooled by different pooling operators, we concatenate the results as follows:

$$X_{pooled} = [X_1, \dots, X_{N_{max}}], \quad (15)$$

where $X_{pooled} \in \mathbb{R}^{K \times d \cdot N_{max}}$ is the final pooled result. For efficient computation, instead of using the results pooled by all the pooling operators, we can choose to use only the first $H$ of them:

$$X_{coar} = X_{pooled} = [X_1, \dots, X_H]. \quad (16)$$

In Section 3.1 and Section 3.2, we will show that with $H \ll N_{max}$ we can still preserve most of the information. We will further empirically investigate the effect of the choice of $H$ in Section 4.
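Putting Eqs. (12)–(16) together, a minimal sketch of the pooling step looks as follows. Rather than materializing the $N \times K$ operators $\Theta_l$, it applies each $(u_l^{(k)})^T$ directly to the subgraph features, which is equivalent by Eq. (17) below. All function and variable names are illustrative.

```python
import numpy as np

def eigen_pooling(A, X, assign, H):
    """Pool node features X into K x (H*d) supernode features, Eqs. (12)-(16)."""
    K = int(assign.max()) + 1
    d = X.shape[1]
    X_coar = np.zeros((K, H, d))              # X_coar[k, l] = l-th coefficient of subgraph k
    for k in range(K):
        idx = np.flatnonzero(assign == k)     # Gamma^(k)
        A_k = A[np.ix_(idx, idx)]             # induced adjacency, Eq. (5)
        L_k = np.diag(A_k.sum(axis=1)) - A_k  # subgraph Laplacian
        _, U = np.linalg.eigh(L_k)            # eigenvectors, smoothest first
        for l in range(min(H, len(idx))):     # rows l >= N_k stay zero (zero-padding)
            X_coar[k, l] = U[:, l] @ X[idx]   # (u_l^(k))^T x^(k), Eq. (17)
    return X_coar.reshape(K, H * d)           # concatenation [X_1, ..., X_H], Eq. (16)
```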
THEORETICAL ANALYSIS OF EigenPooling

In this section, we provide a theoretical analysis of EigenPooling by understanding it from local and global perspectives. We prove that the pooling operation can preserve useful information to be processed by the following GCN layers. We also verify that EigenGCN is permutation invariant, which lays the foundation for graph classification with EigenGCN.

A Local View of EigenPooling
In this subsection, we analyze the pooling operator from a local perspective, focusing on a specific subgraph $\mathcal{G}^{(k)}$. For the subgraph $\mathcal{G}^{(k)}$, the pooling operator summarizes the nodes' features and forms a representation for the corresponding supernode of the subgraph. For a pooling operator $\Theta_l$, the part that is effective on the subgraph $\mathcal{G}^{(k)}$ is only the up-sampled eigenvector $\bar{u}_l^{(k)}$, as the other eigenvectors have value 0 on the subgraph $\mathcal{G}^{(k)}$. Without loss of generality, consider a single-dimensional graph signal $x \in \mathbb{R}^{N \times 1}$ defined on $\mathcal{G}$; the pooling operation on $\mathcal{G}^{(k)}$ can be represented as:

$$(\bar{u}_l^{(k)})^T x = (u_l^{(k)})^T x^{(k)}, \quad (17)$$

which is the Fourier coefficient of the Fourier mode $u_l^{(k)}$ of the subgraph $\mathcal{G}^{(k)}$. Thus, from a local perspective, the pooling process is a graph Fourier transform of the graph signal defined on the subgraph. As introduced in Section 2.3.1, each of the Fourier modes (eigenvectors) is associated with an eigenvalue, which measures its smoothness. The Fourier coefficient of the corresponding Fourier mode indicates the importance of this Fourier mode to the signal. The coefficient summarizes the graph signal information utilizing both the node features and the subgraph structure, as the smoothness is related to both of them. Each of the coefficients characterizes a different property (smoothness) of the graph signal. Using the first $H$ coefficients while discarding the others means that we focus more on the "smoother" part of the graph signal, which is common in many applications such as signal denoising and compression [5, 29, 40]. Therefore, we can use the squared summation of the coefficients to measure how much information is preserved, as shown in the following theorem.

Theorem 3.1. Let $x$ be a graph signal defined on the graph $\mathcal{G}$ and $U = [u_1, \dots, u_N]$ be the Fourier modes of this graph. Let $\hat{x}$ be the corresponding Fourier coefficients, i.e., $\hat{x} = U^T x$. Let $x' = \sum_{l=1}^{H} \hat{x}[l] \cdot u_l$ be the signal reconstructed using only the first $H$ Fourier modes. Then $\frac{\|\hat{x}[H]\|_2^2}{\|\hat{x}\|_2^2}$ measures the information preserved by this reconstruction, where $\hat{x}[H]$ denotes the vector consisting of the first $H$ elements of $\hat{x}$.

Proof. According to Eq. (10), $x$ can be written as $x = \sum_{l=1}^{N} \hat{x}[l] \cdot u_l$. Since $U$ is orthogonal, we have

$$\frac{\|x'\|_2^2}{\|x\|_2^2} = \frac{\left\|\sum_{l=1}^{H} \hat{x}[l] \cdot u_l\right\|_2^2}{\left\|\sum_{l=1}^{N} \hat{x}[l] \cdot u_l\right\|_2^2} = \frac{\|\hat{x}[H]\|_2^2}{\|\hat{x}\|_2^2}, \quad (18)$$

which completes the proof. □

It is common for natural graph signals that the magnitude of the spectral form of the signal is concentrated on the first few coefficients [33, 38], which means that $\frac{\|\hat{x}[H]\|_2^2}{\|\hat{x}\|_2^2} \approx 1$ even when $H \ll N_k$. In other words, by using the first $H$ coefficients, we can preserve the majority of the information while reducing the computational cost. We will empirically verify this in the experiment section.

A Global View of EigenPooling
In this subsection, we analyze the pooling operators from a global perspective, focusing on the entire graph $\mathcal{G}$. The pooling operators we constructed can be viewed as a filterbank [40]. Each of the filters in the filterbank filters the given graph signal and obtains a new graph signal; in our case, the filtered signal is defined on the coarsened graph $\mathcal{G}_{coar}$. Without loss of generality, we consider a single-dimensional signal $x \in \mathbb{R}^{N \times 1}$ of $\mathcal{G}$, so the filtered signals are $\{x_l\}_{l=1}^{N_{max}}$. Next, we describe some key properties of these pooling operators.

Property 1: Perfect Reconstruction.
The first property is that when $N_{max}$ filters are used, the input graph signal can be perfectly reconstructed from the filtered signals.

Lemma 3.2. The graph signal $x$ can be perfectly reconstructed from its filtered signals $\{x_l\}_{l=1}^{N_{max}}$ together with the pooling operators $\{\Theta_l\}_{l=1}^{N_{max}}$ as $x = \sum_{l=1}^{N_{max}} \Theta_l x_l$.

Proof. With the definition of $\Theta_l$ given in Eq. (13), we have

$$\sum_{l=1}^{N_{max}} \Theta_l x_l = \sum_{l=1}^{N_{max}} \sum_{k=1}^{K} \bar{u}_l^{(k)} \cdot x_l[k] = \sum_{k=1}^{K} \sum_{l=1}^{N_{max}} \bar{u}_l^{(k)} \cdot x_l[k]. \quad (19)$$

From Eq. (14), we know that $x_l[k] = (\bar{u}_l^{(k)})^T x$. Together with the fact that $\bar{u}_l^{(k)} = C^{(k)} u_l^{(k)}$ from Eq. (12), we can rewrite $\sum_{l=1}^{N_{max}} \bar{u}_l^{(k)} \cdot x_l[k]$ as

$$\sum_{l=1}^{N_{max}} \bar{u}_l^{(k)} \cdot x_l[k] = C^{(k)} \left( \sum_{l=1}^{N_{max}} u_l^{(k)} (u_l^{(k)})^T \right) (C^{(k)})^T x. \quad (20)$$

Obviously, $\sum_{l=1}^{N_{max}} u_l^{(k)} (u_l^{(k)})^T = \sum_{l=1}^{N_k} u_l^{(k)} (u_l^{(k)})^T = I$, since $\{u_l^{(k)}\}_{l=1}^{N_k}$ are orthonormal and $\{u_l^{(k)}\}_{l=N_k+1}^{N_{max}}$ are all zero vectors. Thus, $\sum_{l=1}^{N_{max}} \bar{u}_l^{(k)} \cdot x_l[k] = C^{(k)} x^{(k)}$. Substituting this into Eq. (19), we arrive at

$$\sum_{l=1}^{N_{max}} \Theta_l x_l = \sum_{k=1}^{K} C^{(k)} x^{(k)} = x, \quad (21)$$

which completes the proof. □

From Lemma 3.2, we know that if $N_{max}$ filters are chosen, the filtered signals $\{x_l\}_{l=1}^{N_{max}}$ preserve all the information of $x$. Thus, together with graph coarsening, eigenvector pooling can preserve the signal information of the input graph while enlarging the receptive field, which allows us to finally learn a vector representation for graph classification.
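Lemma 3.2 can also be checked numerically. The snippet below is an illustrative sanity check under our own naming: it materializes the operators $\Theta_l$ of Eq. (13) for a given partition and verifies that $x = \sum_l \Theta_l x_l$.

```python
import numpy as np

def build_pooling_operators(A, assign):
    """Construct the pooling operators Theta_l of Eq. (13) for a partition."""
    N = A.shape[0]
    K = int(assign.max()) + 1
    N_max = max(int(np.sum(assign == k)) for k in range(K))
    Theta = np.zeros((N_max, N, K))           # Theta[l] is the N x K operator Theta_{l+1}
    for k in range(K):
        idx = np.flatnonzero(assign == k)
        A_k = A[np.ix_(idx, idx)]
        L_k = np.diag(A_k.sum(axis=1)) - A_k
        _, U = np.linalg.eigh(L_k)
        for l in range(len(idx)):
            Theta[l, idx, k] = U[:, l]        # up-sampled eigenvector, Eq. (12)
    return Theta

# sanity check of Lemma 3.2 on a signal x:
#   x_l = Theta_l^T x  and  x == sum_l Theta_l x_l
# Theta = build_pooling_operators(A, assign)
# x_rec = sum(Theta[l] @ (Theta[l].T @ x) for l in range(Theta.shape[0]))
# assert np.allclose(x, x_rec)
```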
Property 2: Energy/Information Preserving. The second property is that the filtered signals preserve all the energy when $N_{max}$ filters are chosen. To show this, we first give the following lemma, which serves as a tool for demonstrating Property 2.

Lemma 3.3. All the columns in the operators $\{\Theta_l\}_{l=1}^{N_{max}}$ are orthogonal to each other.

Proof. By definition, for the same $k$, i.e., the same subgraph, $u_l^{(k)}, l = 1, \dots, N_{max}$ are orthogonal to each other, which means $\bar{u}_l^{(k)}, l = 1, \dots, N_{max}$ are also orthogonal to each other. In addition, all the $\bar{u}_l^{(k)}$ with different $k$ are orthogonal to each other, as they have non-zero values only on different subgraphs. □

With the above lemma, we can further conclude that the squared $\ell_2$ norm of the graph signal $x$ is equal to the summation of the squared $\ell_2$ norms of the pooled signals $\{x_l\}_{l=1}^{N_{max}}$.

Lemma 3.4. The squared $\ell_2$ norm of the graph signal $x$ is equal to the summation of the squared $\ell_2$ norms of the pooled signals $\{x_l\}_{l=1}^{N_{max}}$:

$$\|x\|_2^2 = \sum_{l=1}^{N_{max}} \|x_l\|_2^2. \quad (22)$$

Proof. From Lemmas 3.2 and 3.3, we have

$$\|x\|_2^2 = \left\| \sum_{l=1}^{N_{max}} \Theta_l x_l \right\|_2^2 = \left\| \sum_{k=1}^{K} \sum_{l=1}^{N_{max}} \bar{u}_l^{(k)} \cdot x_l[k] \right\|_2^2 = \sum_{k=1}^{K} \sum_{l=1}^{N_{max}} x_l[k]^2 = \sum_{l=1}^{N_{max}} \|x_l\|_2^2,$$

which completes the proof. □
Property 3: Approximate Energy Preserving. Lemma 3.4 describes the energy preservation when $N_{max}$ filters are chosen. In practice, we only need $H \ll N_{max}$ filters for efficient computation. Next we show that even with $H$ filters, the filtered signals preserve most of the energy/information.

Theorem 3.5. Let $x' = \sum_{l=1}^{H} \Theta_l x_l$ be the graph signal reconstructed using only the first $H$ pooling operators and pooled signals $\{x_l\}_{l=1}^{H}$. Then the ratio $\frac{\sum_{l=1}^{H} \|x_l\|_2^2}{\sum_{l=1}^{N_{max}} \|x_l\|_2^2}$ measures the portion of information preserved by $x'$.

Proof. As shown in Lemma 3.4, $\|x\|_2^2 = \sum_{l=1}^{N_{max}} \|x_l\|_2^2$. Similarly, we can show that $\|x'\|_2^2 = \sum_{l=1}^{H} \|x_l\|_2^2$. The portion of information preserved can thus be represented as

$$\frac{\|x'\|_2^2}{\|x\|_2^2} = \frac{\sum_{l=1}^{H} \|x_l\|_2^2}{\sum_{l=1}^{N_{max}} \|x_l\|_2^2}, \quad (23)$$

which completes the proof. □

Since for natural graph signals the magnitude of the spectral form of the signal is concentrated in the first few coefficients [33], even with $H$ filters EigenPooling can preserve the majority of the information/energy.

Permutation Invariance of EigenGCN
EigenGCN takes the adjacency matrix $A$ and the node feature matrix $X$ as input and aims to learn a vector representation of the graph. The nodes in a graph do not have a specific ordering, i.e., $A$ and $X$ can be permuted. Obviously, for the same graph, we want EigenGCN to extract the same representation no matter which permutation of $A$ and $X$ is used as input. Thus, in this subsection, we prove that EigenGCN is permutation invariant, which lays the foundation of using EigenGCN for graph classification.

Theorem 3.6. Let $P \in \{0, 1\}^{N \times N}$ be any permutation matrix. Then $\mathrm{EigenGCN}(A, X) = \mathrm{EigenGCN}(P A P^T, P X)$, i.e., EigenGCN is permutation invariant.
Proof. In order to prove that EigenGCN is permutation invariant, we only need to show that its key components, namely GCN, graph coarsening and EigenPooling, are permutation invariant. For GCN, before permutation, the output is $F = \mathrm{ReLU}(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X W)$. With permutation, the output becomes

$$\mathrm{ReLU}\left( (P \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} P^T)(P X) W \right) = \mathrm{ReLU}(P \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X W) = P F,$$

where we have used $P^T P = I$. This shows that permutation only permutes the order of the node representations produced by GCN but does not change their values. Second, the graph coarsening is done by spectral clustering on $A$; no matter which permutation we apply, the detected subgraphs will not change. Finally, EigenPooling summarizes the information within each subgraph. Since the subgraph structures are not affected by the permutation and the representation of each node in the subgraphs is also not affected by the permutation, the supernodes' representations after EigenPooling are not affected by the permutation. In addition, the inter-connectivity of the supernodes is not affected, since it is determined by spectral clustering. Thus, one step of GCN–Graph Coarsening–EigenPooling is permutation invariant. Since EigenGCN finally learns one vector representation of the input graph, we can conclude that the vector representation is the same under any permutation $P$. □
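Theorem 3.6 can also be checked empirically. The sketch below is illustrative only; `model` stands for any function implementing EigenGCN that maps $(A, X)$ to a vector representation.

```python
import numpy as np

def check_permutation_invariance(model, A, X, trials=5, atol=1e-6):
    """Empirically verify model(P A P^T, P X) == model(A, X) for random P."""
    ref = model(A, X)
    for _ in range(trials):
        perm = np.random.permutation(A.shape[0])
        P = np.eye(A.shape[0])[perm]          # permutation matrix
        if not np.allclose(model(P @ A @ P.T, P @ X), ref, atol=atol):
            return False
    return True
```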
EXPERIMENTS

In this section, we conduct experiments to evaluate the effectiveness of the proposed framework EigenGCN. Specifically, we aim to answer two questions:

• Can EigenGCN improve the graph classification performance through the design of EigenPooling?
• How reliable is it to use $H$ filters for pooling?

We begin by introducing the datasets and experimental settings. We then compare EigenGCN with representative and state-of-the-art baselines for graph classification to answer the first question. We further conduct analysis on graph signals to verify the reasonableness of using $H$ filters, which answers the second question.

To verify whether the proposed framework can hierarchically learn good graph representations for classification, we evaluate
EigenGCN on 6 widely used benchmark data sets for graph classification [21], which include three protein graph data sets, i.e., ENZYMES [3, 36], PROTEINS [3, 10], and D&D [10, 37]; one mutagen data set, Mutagenicity [20, 31] (denoted as MUTAG in Table 1 and Table 2); and two data sets that consist of chemical compounds screened for activity against non-small cell lung cancer and ovarian cancer cell lines, NCI1 and NCI109 [44]. Some statistics of the data sets can be found in Table 1. From the table, we can see that the data sets contain varied numbers of graphs and have different graph sizes. We include data sets of different domains, sample sizes and graph sizes to give a comprehensive understanding of how EigenGCN performs under various conditions.

Table 1: Statistics of datasets
          ENZYMES  PROTEINS  D&D     NCI1   NCI109  MUTAG
graphs    600      1,113     1,178   4,110  4,127   4,337
mean |V|  32.63    39.06     284.32  29.87  29.68   30.32
classes   6        2         2       2      2       2
To compare the performance of graph classification, we consider representative and state-of-the-art graph neural network models with various pooling layers. Next, we briefly introduce these baseline approaches as well as their experimental settings.

• GCN [22] is a graph neural network framework proposed for semi-supervised node classification. It learns node representations by aggregating information from neighbors. As the GCN model does not contain a pooling layer, we directly pool the learned node representations as the graph representation. We use it as a baseline to assess whether a hierarchical pooling layer is necessary.
• GraphSage [18] is similar to GCN and provides various aggregation methods. As with GCN, we directly pool the learned node representations as the graph representation.
• SET2SET is also built upon GCN; it is also "flat" but uses the set2set architecture introduced in [43] instead of averaging over all the nodes. We select this method to further show whether a hierarchical pooling layer is necessary regardless of whether averaging or other pooling methods are used.
• DGCNN [49] is built upon the GCN layer. The features of nodes are sorted before being fed into traditional 1-D convolutional and dense layers [49]. This method is also "flat", without a hierarchical pooling procedure.
• Diff-pool [48] is a graph neural network model designed for graph-level representation learning with differentiable pooling layers. It uses node representations learned by an additional convolutional layer to learn the subgraphs (supernodes) and coarsens the graph based on them. We select this model as it achieves state-of-the-art performance on the graph classification task.
• EigenGCN-H represents variants of the proposed framework EigenGCN, where H denotes the number of pooling operators used for EigenPooling. In this evaluation, we choose H = 1, 2, 3.

Each experiment is run 10 times and the average graph classification accuracy is reported in Table 2. From the table, we make the following observations:
Table 2: Performance comparison.

            ENZYMES  PROTEINS  D&D    NCI1   NCI109  MUTAG
GCN         0.440    0.740     0.759  0.725  0.707   0.780
GraphSage   0.554    0.746     0.766  0.732  0.703   0.785
SET2SET     0.380    0.727     0.745  0.715  0.686   0.764
DGCNN       0.410    0.732     0.778  0.729  0.723   0.788
Diff-pool   0.636    0.759     0.780  0.760  0.741   —
EigenGCN-1  —        —         —      —      —       —
EigenGCN-2  0.645    0.754     0.770  0.767  0.748   0.789
EigenGCN-3  0.645    —         —      —      —       —
• Diff-pool and the EigenGCN framework perform better than the methods without a hierarchical pooling procedure in most cases. Aggregating the node information hierarchically can help learn better graph representations.
• The proposed framework EigenGCN shares the same convolutional layer with GCN, GraphSage, and SET2SET. However, the proposed framework (with different H) outperforms them on most of the data sets. This further indicates the necessity of the hierarchical pooling procedure. In other words, the proposed EigenPooling can indeed help the graph classification performance.
• On most of the data sets, the variants of EigenGCN with more eigenvectors achieve better performance than those with fewer eigenvectors. Including more eigenvectors, which allows us to preserve more information during pooling, can help learn better graph representations in most cases. On some data sets, including more eigenvectors does not bring any improvement or even hurts performance: theoretically, we preserve more information by using more eigenvectors, but noise may also be preserved, which can be filtered out when using fewer eigenvectors.
• The proposed EigenGCN achieves state-of-the-art or at least comparable performance on all the data sets, which shows the effectiveness of the proposed framework EigenGCN.

To sum up, EigenPooling can help learn better graph representations, and the proposed framework EigenGCN with EigenPooling can achieve state-of-the-art performance on the graph classification task.
In this subsection, we investigate the distribution of the Fourier coefficients of signals in real data. We aim to show that for natural graph signals, most of the information/energy concentrates on the first few Fourier modes (or eigenvectors). This justifies using only $H$ filters in EigenPooling. Specifically, given one data set, for each graph $\mathcal{G}_i$ with $N_i$ nodes and its associated signal $X_i \in \mathbb{R}^{N_i \times d}$, we first calculate the graph Fourier transform and obtain the coefficients $\hat{X}_i \in \mathbb{R}^{N_i \times d}$. We then calculate the following ratio for various values of $H$:

$$r_i^H = \frac{\|\hat{X}_i[H, :]\|_2^2}{\|\hat{X}_i\|_2^2},$$

where $\hat{X}_i[H, :]$ denotes the first $H$ rows of the matrix $\hat{X}_i$. According to Theorem 3.5, this ratio measures how much information can be preserved by the first $H$ coefficients. We then average the ratio over the entire data set and obtain

$$r^H = \frac{1}{|\{\mathcal{G}_i\}|} \sum_i r_i^H. \quad (24)$$

Note that if $H > N_i$, we set $r_i^H = 1$.
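The ratio of Eq. (24) can be computed with a few lines. The sketch below is illustrative, with `graphs` assumed to be a list of $(A_i, X_i)$ pairs.

```python
import numpy as np

def preserved_ratio(A, X, H):
    """r_i^H of one graph: energy of the first H Fourier coefficients (Theorem 3.5)."""
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)
    X_hat = U.T @ X                 # Fourier coefficients
    H = min(H, X_hat.shape[0])      # r_i^H = 1 when H > N_i
    return np.sum(X_hat[:H] ** 2) / np.sum(X_hat ** 2)

# r^H averaged over a data set, Eq. (24):
# r_H = np.mean([preserved_ratio(A_i, X_i, H) for A_i, X_i in graphs])
```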
We visualize the ratio for each data set up to $H = 40$ in Figure 2.

Figure 2: Understanding graph signals. Panels: (a) ENZYMES, (b) PROTEINS, (c) NCI1, (d) NCI109, (e) Mutagenicity, (f) DD.

As shown in Figure 2, for most of the data sets, the magnitude of the coefficients is concentrated in the first few coefficients, which demonstrates the reasonableness of using only $H \ll N_{max}$ filters in EigenPooling. In addition, using $H$ filters saves computational cost.

RELATED WORK

In recent years, graph neural network models, which try to extend deep neural network models to graph-structured data, have attracted increasing interest. These models have been applied to various applications in many different areas. In [22], a graph neural network model that learns node representations by aggregating node features from neighbors is applied to perform semi-supervised node classification. Similar methods were later proposed to further enhance performance by including an attention mechanism [42]. GraphSage [18], which allows a more flexible aggregation procedure, was designed for the same task. Some graph neural network models are designed to reason about the dynamics of physical systems, where the model is applied to predict future states of nodes given their previous states [1, 32]. Most of the aforementioned methods fit in the framework of "message passing" neural networks [17], which mainly involves transforming, propagating and aggregating node features across the graph through edges. Another stream of graph neural networks was developed based on the graph Fourier transform [4, 8, 19, 24]: the features are first transformed to the spectral domain, then filtered with learnable filters, and finally transformed back to the spatial domain. The connection between these two streams of work is shown in [8, 22]. Graph neural networks have also been extended to different types of graphs [9, 26, 27] and applied to various applications [12, 28, 35, 41, 45, 47]. Comprehensive surveys on graph neural networks can be found in [2, 46, 50, 51].

However, the design of graph neural network layers is inherently "flat", which means the output of pure graph neural network layers is a set of node representations for all the nodes in the graph. To apply graph neural networks to the graph classification task, an approach to summarize the learned node representations and generate the graph representation is needed. A simple way to generate the graph representation is to globally combine the node representations. Different combination approaches have been investigated, including averaging over all node representations [11], adding a "virtual node" connected to all the nodes in the graph and using its representation as the graph representation [25], and using conventional fully connected or convolutional layers after arranging the graph to a fixed size [17, 49]. However, these global pooling methods cannot hierarchically learn graph representations and thus ignore important information in the graph structure. There are a few recent works [8, 13, 39, 48] investigating learning graph representations with a hierarchical pooling procedure. These methods usually involve two steps: 1) coarsening a graph by grouping nodes into supernodes to form a hierarchical structure, and 2) learning supernode representations level by level to finally obtain the graph representation. These methods use mean-pooling or max-pooling when generating supernode representations, which neglects important structural information in the subgraphs.
In this paper, we propose a pooling operator based on the local graph Fourier transform, which utilizes the subgraph structure as well as the node features to generate the supernode representations.
CONCLUSION

In this paper, we design EigenPooling, a pooling operator based on the local graph Fourier transform, which can extract subgraph information utilizing both the node features and the structure of the subgraph. We provide a theoretical analysis of the pooling operator from both local and global perspectives. The pooling operator, together with a subgraph-based graph coarsening method, forms a pooling layer, which can be incorporated into any graph neural network to hierarchically learn graph-level representations. We further propose a graph neural network framework EigenGCN by combining the proposed pooling layers with GCN convolutional layers. Comprehensive graph classification experiments were conducted on 6 commonly used graph classification benchmarks. Our proposed framework achieves state-of-the-art performance on most of the data sets, which demonstrates its effectiveness.
ACKNOWLEDGEMENTS
Yao Ma and Jiliang Tang are supported by the National Science Foundation (NSF) under grant numbers IIS-1714741, IIS-1715940, IIS-1845081 and CNS-1815636, and a grant from the Criteo Faculty Research Award.
REFERENCES
[1] Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. 2016. Interaction networks for learning about objects, relations and physics. In NIPS. 4502–4510.
[2] Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018).
[3] Karsten M Borgwardt, Cheng Soon Ong, Stefan Schönauer, SVN Vishwanathan, Alex J Smola, and Hans-Peter Kriegel. 2005. Protein function prediction via graph kernels. Bioinformatics 21, suppl_1 (2005), i47–i56.
[4] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013).
[5] Siheng Chen, Aliaksei Sandryhaila, Jose MF Moura, and Jelena Kovacevic. 2014. Signal denoising on graphs via graph filtering. In GlobalSIP.
[6] Fan RK Chung and Fan Chung Graham. 1997. Spectral graph theory. Number 92. American Mathematical Soc.
[7] Hanjun Dai, Bo Dai, and Le Song. 2016. Discriminative embeddings of latent variable models for structured data. In ICML. 2702–2711.
[8] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS.
[9] Tyler Derr, Yao Ma, and Jiliang Tang. 2018. Signed graph convolutional networks. In ICDM. IEEE, 929–934.
[10] Paul D Dobson and Andrew J Doig. 2003. Distinguishing enzyme structures from non-enzymes without alignments. JMB 330, 4 (2003), 771–783.
[11] David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In NIPS. 2224–2232.
[12] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph Neural Networks for Social Recommendation. arXiv preprint arXiv:1902.07243 (2019).
[13] Matthias Fey, Jan Eric Lenssen, Frank Weichert, and Heinrich Müller. 2018. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In CVPR. 869–877.
[14] Hongyang Gao and Shuiwang Ji. 2019. Graph representation learning via hard and channel-wise attention networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM.
[15] Hongyang Gao and Shuiwang Ji. 2019. Graph U-Nets. In Proceedings of The 36th International Conference on Machine Learning.
[16] Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. 2018. Large-scale learnable graph convolutional networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1416–1424.
[17] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. 2017. Neural Message Passing for Quantum Chemistry. In ICML. 1263–1272.
[18] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024–1034.
[19] Mikael Henaff, Joan Bruna, and Yann LeCun. 2015. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163 (2015).
[20] Jeroen Kazius, Ross McGuire, and Roberta Bursi. 2005. Derivation and validation of toxicophores for mutagenicity prediction. JMC 48, 1 (2005), 312–320.
[21] Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. 2016. Benchmark Data Sets for Graph Kernels. http://graphkernels.cs.tu-dortmund.de
[22] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[23] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In NIPS. 1097–1105.
[24] Ron Levie, Federico Monti, Xavier Bresson, and Michael M Bronstein. 2017. CayleyNets: Graph convolutional neural networks with complex rational spectral filters. arXiv preprint arXiv:1705.07664 (2017).
[25] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
[26] Yao Ma, Ziyi Guo, Zhaochun Ren, Eric Zhao, Jiliang Tang, and Dawei Yin. 2018. Dynamic graph neural networks. arXiv preprint arXiv:1810.10627 (2018).
[27] Yao Ma, Suhang Wang, Charu C Aggarwal, Dawei Yin, and Jiliang Tang. 2019. Multi-dimensional Graph Convolutional Networks. In Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 657–665.
[28] Federico Monti, Michael Bronstein, and Xavier Bresson. 2017. Geometric matrix completion with recurrent multi-graph neural networks. In Advances in Neural Information Processing Systems. 3697–3707.
[29] Sunil K Narang and Antonio Ortega. 2012. Perfect reconstruction two-channel wavelet filter banks for graph structured data. IEEE Transactions on Signal Processing 60, 6 (2012), 2786–2799.
[30] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In ICML. 2014–2023.
[31] Kaspar Riesen and Horst Bunke. 2008. IAM graph database repository for graph based pattern recognition and machine learning. In Joint IAPR International Workshops on SPR and SSPR. Springer, 287–297.
[32] Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, and Peter Battaglia. 2018. Graph networks as learnable physics engines for inference and control. arXiv preprint arXiv:1806.01242 (2018).
[33] Aliaksei Sandryhaila and José MF Moura. 2013. Discrete signal processing on graphs. IEEE Transactions on Signal Processing 61, 7 (2013), 1644–1656.
[34] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2009), 61–80.
[35] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference. Springer, 593–607.
[36] Ida Schomburg, Antje Chang, Christian Ebeling, Marion Gremse, Christian Heldt, Gregor Huhn, and Dietmar Schomburg. 2004. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Research 32, suppl_1 (2004), D431–D433.
[37] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. 2011. Weisfeiler-Lehman graph kernels. JMLR 12, Sep (2011), 2539–2561.
[38] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. 2013. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30, 3 (2013), 83–98.
[39] Martin Simonovsky and Nikos Komodakis. 2017. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs. In CVPR. 3693–3702.
[40] Nicolas Tremblay and Pierre Borgnat. 2016. Subgraph-based filterbanks for graph signals. IEEE Transactions on Signal Processing 64, 15 (2016), 3827–3840.
[41] Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. 2017. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the 34th International Conference on Machine Learning. JMLR.org, 3462–3471.
[42] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. Graph Attention Networks. arXiv preprint arXiv:1710.10903 (2017).
[43] Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. 2015. Order matters: Sequence to sequence for sets. arXiv preprint arXiv:1511.06391 (2015).
[44] Nikil Wale, Ian A Watson, and George Karypis. 2008. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems 14, 3 (2008), 347–375.
[45] Xiaolong Wang, Yufei Ye, and Abhinav Gupta. 2018. Zero-shot recognition via semantic embeddings and knowledge graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6857–6866.
[46] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. 2019. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596 (2019).
[47] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 974–983.
[48] Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. 2018. Hierarchical graph representation learning with differentiable pooling. In NIPS. 4805–4815.
[49] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. 2018. An End-to-End Deep Learning Architecture for Graph Classification. In AAAI.
[50] Ziwei Zhang, Peng Cui, and Wenwu Zhu. 2018. Deep learning on graphs: A survey. arXiv preprint arXiv:1812.04202 (2018).
[51] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. 2018. Graph Neural Networks: A Review of Methods and Applications. arXiv preprint arXiv:1812.08434 (2018).