Enhancing Geometric Deep Learning via Graph Filter Deconvolution
Jingkang Yang (a,b) and Santiago Segarra (c)
(a) School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
(b) International School, Beijing University of Posts and Telecommunications, Beijing, China
(c) Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA
ABSTRACT
In this paper, we incorporate a graph filter deconvolution step into the classical geometric convolutional neural network pipeline. More precisely, under the assumption that the graph domain plays a role in the generation of the observed graph signals, we pre-process every signal by passing it through a sparse deconvolution operation governed by a pre-specified filter bank. This deconvolution operation is formulated as a group-sparse recovery problem, and convex relaxations that can be solved efficiently are put forth. The deconvolved signals are then fed into the geometric convolutional neural network, yielding better classification performance than their unprocessed counterparts. Numerical experiments showcase the effectiveness of the deconvolution step on classification tasks in both synthetic and real-world settings.
Index Terms — Geometric deep learning, graph signal processing, convolutional neural networks, deconvolution.
I. INTRODUCTION
Graph signal processing (GSP) emerges in response to the need to better process and understand the ever-increasing volume of network data, often conceptualized as signals defined on graphs [1], [2]. For example, graph-supported signals can model economic activity observed over a network of production flows between industrial sectors [3], as well as brain activity signals supported on brain connectivity networks [4]–[6]. However, due to the complexity and irregularity of such networks, most of the standard signal processing notions – such as convolution and downsampling – are no longer directly applicable in graph settings. Fortunately, in the past years, numerous works have extended several of the classical signal processing tools to the realm of graphs, including sampling and reconstruction [3], [7]–[9], stationarity and power spectral density estimation [10]–[12], filter design [13], [14], and blind deconvolution [15]–[17]. Of special importance to this paper is the latter body of work on (blind) deconvolution, where the input to a network diffusion process (formalized as a graph filter) is recovered from the observation of the corresponding output, thus extending deconvolution of temporal and spatial signals to the less structured graph domain.

General learning tasks for network data, such as classification and regression, are also of practical importance. For example, one might want to classify brain activity patterns while taking into account the brain network on which they are defined. This provides an opportunity for a synergistic relation between machine learning and GSP, something currently being investigated under the name of geometric (deep) learning (GDL). The general goal of GDL is to generalize structured deep neural networks to non-Euclidean domains, such as graphs and manifolds [18].

GDL has achieved state-of-the-art performance in several application areas including shape correspondence tasks [19], [20] and recommender systems [21], [22]. Different from classic convolutional neural networks (CNNs) that operate on well-structured data such as audio and images, GDL acts on irregular domains – represented by graphs – where the fundamental operations of convolution and pooling (downsampling) are not straightforward [23]. Thus, most of the existing efforts are geared towards developing alternatives for these operations on general graphs. To extend the classic pooling operation, most works rely on existing clustering techniques to coarsen the graph structures [24], [25], while some recent approaches seek to avoid this operation in general graphs [26]. For the generalization of the convolutional layer, various classes of graph filters have been used, starting from fully non-parametric definitions in the frequency domain [27] to a myriad of parametric definitions that simultaneously reduce overfitting and speed up the training process [28]–[30].

There is a clear potential for cross-pollination between GSP and GDL, since the former is designed to process data defined on graphs whereas the latter seeks to better learn from these data. However, most of the attention so far has been dedicated to the use of GSP tools to design graph filters that better mimic the convolutional layer in traditional CNNs. We contend that there is room for further collaboration between these fields, where concepts like deconvolution of graph signals can be used for classification tasks.

SS received an IDSS MIT seed grant. Emails: [email protected], [email protected]
Contribution and paper organization.
In this paper, we incorporate the notion of graph filter deconvolution into the pipeline of GDL. More specifically, every signal to be classified is first passed through a deconvolution block using a pre-specified filter bank with sparse regularizers that jointly promote sparsity in the number of filters used in the deconvolution and in the number of non-zero elements in the deconvolved (seeding) signals. The classification is then performed on the seeding signals, leading to better performance in practice.

The rest of the paper is organized as follows. In Section II we briefly review basic notions of GSP and GDL. In Section III we present the sparse deconvolution problem and how it can be incorporated into the traditional GDL pipeline. Section IV illustrates the performance benefits of the proposed approach in the classification of both synthetic and real-world data, before concluding in Section V with closing remarks and potential avenues for future research.

II. BACKGROUND

II-A. Fundamentals of Graph Signal Processing
Consider the (possibly directed) graph G = (N, E) formed by the set N of N nodes and the set of edges E, such that the pair (i, j) belongs to E if there exists a link from node i to node j. Associated with a given G, a graph signal can be represented as a vector x = [x_1, ..., x_N]^T ∈ R^N, where the i-th component, x_i, represents the signal value at node i. The network structure is captured by the graph-shift operator S [2], a sparse matrix such that [S]_{ij} ≠ 0 only if (i, j) ∈ E or i = j. The adjacency matrix and the graph Laplacian are usual choices for the shift operator. Assuming that S is diagonalizable, the shift can be decomposed as S = V diag(λ) V^{-1}, where λ = [λ_1, ..., λ_N]^T collects the eigenvalues. Linear graph filters are defined as graph-signal operators of the form H = Σ_{l=0}^{L-1} h_l S^l, i.e., polynomials in S [2]. The filtering operation is thus given by y = Hx, where y is the filtered signal, x the input, h = [h_0, ..., h_{L-1}]^T the filter coefficients, and L − 1 the filter degree. Notice that y is effectively a linear combination of successive shifted versions of the input x. Indeed, graph filters are the natural extension of convolutions to the graph domain.

The filter coefficients h determine the range of influence of the filter in the graph domain. For example, if for a given filter only the first three elements of h are significant, then the output of the filter at node i will only be determined by the input values at node i and its two-hop neighborhood. In this paper we consider four types of filters with varying range: i) uniform-range h_UR, where h_UR = 1; ii) short-range h_SR, where [h_SR]_i = (1 + i/L)^{-1}; iii) medium-range h_MR, where [h_MR]_i = exp(−(i/L − 0.5)^2); and iv) long-range h_LR, where [h_LR]_i = [h_SR]_{L−i+1}. As expected, the values of [h_SR]_i decrease with i whereas the opposite is true for the values of [h_LR]_i.
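The four coefficient profiles can be generated with a few lines of NumPy. The sketch below is illustrative only: the exact exponents of the short- and medium-range profiles are reconstructed assumptions rather than the paper's verbatim constants.

```python
import numpy as np

L = 10                      # number of filter taps
i = np.arange(1, L + 1)     # tap index i = 1, ..., L

h_UR = np.ones(L)                    # uniform-range: all taps equal
h_SR = (1 + i / L) ** -1             # short-range: decreasing in i (assumed exponent)
h_MR = np.exp(-(i / L - 0.5) ** 2)   # medium-range: Gaussian centered at L/2 (assumed width)
h_LR = h_SR[::-1]                    # long-range: [h_LR]_i = [h_SR]_{L-i+1}
```

As a sanity check, h_SR is strictly decreasing and h_LR strictly increasing, matching the ranges described above.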
Moreover, for the medium-range filter, the values of [h_MR]_i are given by a Gaussian kernel centered at L/2.

As in classical signal processing, graph filters and signals may be represented in the frequency (or Fourier) domain. Defining the graph Fourier operator as U = V^{-1}, the graph Fourier transform (GFT) of the signal x is x̃ = Ux. For graph filters, the definition of the GFT that maps the filter coefficients h to the frequency response of the filter, h̃, is given by h̃ = Ψh, where Ψ is an N × L Vandermonde matrix whose elements are [Ψ]_{i,j} = λ_i^{j−1} [13]. With ∘ denoting the Hadamard (elementwise) product, the definitions of the GFT for signals and filters allow us to rewrite the filtering operation in the spectral domain as ỹ = h̃ ∘ x̃, mimicking the classical convolution theorem.

II-B. Basics of Geometric Deep Learning
The main goal of GDL is to generalize structured deep neural networks to non-Euclidean domains, such as graphs and manifolds. Specifically, generalizing CNNs for data defined in irregular domains is one of the main challenges of GDL. In this section, we review some of the existing approaches to overcome this challenge. We separate the presentation into the three main components of CNNs: convolution, non-linearity, and pooling.
Convolutional layer.
Graph filters naturally extend the convolution operation to general graphs (cf. Section II-A). Hence, for a given graph-shift operator, to represent a convolutional layer with R input channels (each input represents a graph signal) and M output channels, we use the following expression

g^(m) = V Σ_{r=1}^{R} diag(h̃^(m,r)) V^{-1} f^(r),   for m = 1, ..., M,   (1)

where (f^(1), ..., f^(R)) represent the R input channels, (g^(1), ..., g^(M)) represent the M output channels, and h̃^(m,r) ∈ R^{N×1} is the frequency response of one of the M × R filters that encode the convolutional layer. These frequency responses are the ones learned during the training process. We denominate this method 'non-parametric geometric CNN (non-parametric GNN)', since every element of every frequency response h̃^(m,r) is decoupled and can be tuned freely. To prevent overfitting and to promote locality of the learned filters, one can enforce the frequency responses h̃^(m,r) to belong to some parametric subspace [20], such as the one described by cubic splines. Specifically, we can define the responses as h̃^(m,r) = B α^(m,r), where B is an N × Q fixed interpolation kernel encoding cubic splines, and α^(m,r) ∈ R^{Q×1} is a vector of learnable parameters [27]. We name this method 'spline GNN'. Additional parametric families of graph filters have been defined in the literature [28]–[30].

Non-linearity.
Activation functions ξ are non-linear functions applied elementwise to the outputs of convolutional layers before the action of the pooling layer. Given the elementwise application of ξ, activation functions can be implemented in a straightforward manner for GDL. In this paper we use the well-known rectified linear unit (ReLU) activation function ξ(x) = max(0, x).

Pooling layer.
The pooling layer reduces the dimensionality of the signal being treated. For the treatment of images by classical CNNs, this operation is simple: groups of neighboring pixels are merged into single pixels, thus obtaining a smaller image as the output of the layer. However, for the pooling layer in irregular graphs one has to group nodes into blocks via clustering operations and then define a graph topology between the coarser nodes. Although multiple methods co-exist in the literature, in this paper we use the fast pooling method proposed in [28], which makes use of the Graclus coarsening algorithm [25] to build a binary tree structure for the pooling layer.
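As a concrete recap of Section II-A, the following self-contained NumPy sketch builds a polynomial graph filter on a small undirected graph (so that S is symmetric, hence diagonalizable with U = V^T) and numerically verifies the spectral identity ỹ = h̃ ∘ x̃. The graph, sizes, and filter coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 20, 4

# Random undirected graph; its symmetric adjacency serves as the shift operator S.
A = (rng.random((N, N)) < 0.3).astype(float)
S = np.triu(A, 1) + np.triu(A, 1).T

h = np.array([1.0, 0.6, 0.3, 0.1])                          # filter taps h_0, ..., h_{L-1}
H = sum(h[l] * np.linalg.matrix_power(S, l) for l in range(L))  # H = sum_l h_l S^l

x = rng.standard_normal(N)
y = H @ x                                                   # graph filtering y = Hx

# Spectral view: S = V diag(lam) V^{-1}; for symmetric S, V is orthonormal and U = V^T.
lam, V = np.linalg.eigh(S)
U = V.T
Psi = np.vander(lam, L, increasing=True)                    # [Psi]_{i,j} = lam_i^{j-1}
h_tilde = Psi @ h                                           # frequency response of the filter
assert np.allclose(U @ y, h_tilde * (U @ x))                # y~ = h~ o x~ (Hadamard product)
```

The final assertion is exactly the graph convolution theorem stated above: filtering in the node domain corresponds to elementwise multiplication in the frequency domain.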
III. GRAPH FILTER DECONVOLUTION FOR GDL
The archetypal formulation of a GDL classifier consists of a GNN whose weights are learned during a training phase via backpropagation, and can then be readily used as in Fig. 1(a). More precisely, whenever a new graph signal has to be classified, it is processed by the trained GNN and a classification output is obtained. This simple pipeline processes the graph signals as given, relying on the graph convolutions to generate useful, graph-dependent features for classification. However, this pipeline ignores potential generative mechanisms behind the observed graph signals. In particular, it is often the case that graph signals are obtained from some diffusion process in the graph, for example, the contagion of an epidemic or the spread of news. Moreover, if this diffusion process is common across the classes that we are trying to differentiate, it then obscures the difference between classes, complicating the task of the GNN. In this context, we propose the alternative pipeline in Fig. 1(b). Instead of directly using graph signals as inputs of the GNN, we first perform a deconvolution step with a pre-specified filter bank to obtain for each signal a set of sparse (deconvolved) seeding signals. These seeding signals are
then fed into the GNN for either training or classification. In the remainder of this section we elaborate on the deconvolution step.

Fig. 1: Sketch of the classical and proposed GDL pipelines. (a) In the classical approach, a given graph signal is passed through a GNN to obtain a classification output. (b) We propose to first pass the given signal through a deconvolution block using a pre-specified filter bank. The GNN is then applied to the seeding (deconvolved) signals.

Consider the case where, indeed, the observed graph signals y (such as the current state of an epidemic) can be modeled as the output of a diffusion process H excited by an originally sparse input x (such as a small group of 'patients zero'). Formally, we have that y = Hx, where the seeding signal x is sparse with at most S ≪ N non-zero entries. Whenever H is known, one can recover x from y by solving the sparse recovery problem

min_x ‖x‖_1   subject to   y = Hx,   (2)

where the ℓ_1 norm in the objective function is used as the usual convex surrogate of the sparsity-promoting ℓ_0 (pseudo-)norm. Naturally, whenever the observed signal y is noisy or the knowledge of H is not perfect, one may replace the constraint in (2) by its robust counterpart ‖y − Hx‖_2 ≤ ε, where ε is tuned according to some prior knowledge on the noise level.

A more challenging setting is one where neither H nor a noisy version of it is known. A possible approach in this situation is to leverage the existing results on blind graph filter identification [15]–[17] to jointly recover H and x from y. However, solving such blind identification problems can be computationally demanding for graphs with on the order of a thousand nodes. To overcome this computational difficulty, we instead pre-specify a bank of filters with the intention that one of them (or a sparse combination of these) will be close to the true underlying diffusion filter.
Formally, consider a filter bank P = [H_1, H_2, ..., H_K] ∈ R^{N×NK} that collects K possible diffusion processes for the graph signals at hand. In practice, we build P by combining filters of different range (cf. Section II-A). Associating a seeding signal x_i with each of these filters, we define the matrix X = [x_1, ..., x_K] ∈ R^{N×K}. With this notation in place, the proposed deconvolution step boils down to solving the following optimization problem

min_X ‖X‖_{2,1} + α ‖vec(X)‖_1   subject to   y = P vec(X).   (3)

The objective function in (3) consists of two different regularizers, with α > 0 trading off their relative importance. The ℓ_{2,1} norm promotes column-sparsity in X, i.e., that only a few of the columns of X are different from zero. This implies that we seek to explain the observed signal y using only a small number of the filters included in the bank P. Furthermore, the ℓ_1 norm, just as in (2), promotes that the few seeding signals that explain y are themselves sparse. Notice that both problems (2) and (3) are convex and, thus, can be solved efficiently in practice. This enables the implementation of the proposed GDL pipeline in Fig. 1(b), as we illustrate in the next section.

IV. NUMERICAL EXPERIMENTS
The proposed method is evaluated on both synthetic data and the well-known MNIST handwritten digit dataset [31]. The results show that our deconvolution step improves both the accuracy and the learning rate of classification tasks.
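Before turning to the experiments, it may help to see how the deconvolution step of Section III can be carried out in practice. The sketch below solves a penalized (unconstrained) variant of problem (3) by proximal gradient descent, exploiting the fact that the proximal operator of the combined ℓ1 + ℓ2,1 penalty is an elementwise soft-threshold followed by a columnwise (group) soft-threshold. The weights alpha and beta, the toy filter bank, and the iteration count are assumptions, not the settings used in the paper.

```python
import numpy as np

def soft(V, t):
    # Elementwise soft-thresholding: proximal operator of t * ||.||_1.
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def group_soft(X, t):
    # Columnwise soft-thresholding: proximal operator of t * ||.||_{2,1}.
    norms = np.maximum(np.linalg.norm(X, axis=0), 1e-12)
    return X * np.maximum(1.0 - t / norms, 0.0)

def deconvolve(y, filters, alpha=0.1, beta=0.05, n_iter=1000):
    """Proximal gradient on a penalized variant of problem (3):
       min_X 0.5*||y - sum_k H_k x_k||^2 + beta*(||X||_{2,1} + alpha*||vec(X)||_1)."""
    N, K = y.size, len(filters)
    P = np.hstack(filters)                      # N x NK filter bank
    step = 1.0 / np.linalg.norm(P, 2) ** 2      # inverse Lipschitz constant of the gradient
    X = np.zeros((N, K))
    for _ in range(n_iter):
        r = sum(Hk @ X[:, k] for k, Hk in enumerate(filters)) - y
        G = np.column_stack([Hk.T @ r for Hk in filters])
        X = group_soft(soft(X - step * G, step * beta * alpha), step * beta)
    return X

# Toy usage: a two-filter bank on a 10-node cycle graph (illustrative only).
N = 10
S = np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)  # cycle adjacency
H1 = np.eye(N) + 0.5 * S                      # short-range diffusion
H2 = np.eye(N) + 0.5 * S + 0.25 * (S @ S)     # longer-range diffusion
x_true = np.zeros(N); x_true[3] = 1.0         # sparse seed
y = H1 @ x_true                               # observation diffused by H1
X_hat = deconvolve(y, [H1, H2])               # one seeding signal per filter in the bank
```

The design choice of thresholding first elementwise and then by groups implements the exact proximal operator of the sparse-group penalty, so each iteration monotonically decreases the penalized objective.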
IV-A. Synthetic data: Diffused graph signals
We evaluate the benefits of deconvolution via a synthetic binary classification task with known ground truth. To generate the data, we follow the ensuing procedure. We start with a sparse signal x_S of size N ∈ {30, 50} (we vary the size for different experiments) where only a few of the values in x_S are different from zero, drawn from a standard normal distribution. Signal x_S is then separately processed by two filters H_1 and H_2 defined on two different graphs G_1 and G_2, drawn from Erdős-Rényi (ER) random models with different edge probabilities. Filter H_1 is of uniform range whereas filter H_2 is of short range (cf. Section II-A). We denote the outputs of these filters as x_1 and x_2, respectively belonging to classes 1 and 2. Our observations are y_1 and y_2, obtained by diffusing x_1 and x_2 on the same ER graph G via the same graph filter H (short-range of degree L ∈ {3, 5}). Our objective is to classify y_1 and y_2 into their correct classes; see Fig. 2(a) for details.

We generate a large number of such sparse signals x_S, leading to twice as many observed signals y (half of each class), most of which are used for training and the rest for testing. We compare the classification accuracy of three different logistic regression classifiers: one based on the observed diffused signals y, another based on the true seeding signals x, and the last one based on the deconvolved signals x̂. To obtain x̂ from y we solve (2). We do not assume perfect knowledge of H but rather have access to a noisy version H̃ = H + εZ, where ε ∼ N(0, σ²) and every entry of Z is drawn from a standard normal distribution. In Figs. 2(b)-(c) we present the accuracy of these three classifiers as a function of σ. As expected, the classifier based on the true signal x (depicted in green) achieves the highest accuracy. By contrast, the classifier based on y (portrayed in blue) suffers since the diffusion filter H obscures the differences between both classes.
Lastly, the accuracy of the classifier based on the deconvolved signals x̂ (in red) largely depends on σ. When σ = 0, the perfect knowledge of H leads to noiseless deconvolution so that x̂ = x. As σ increases, the performance degrades but is still preferable to the case where no deconvolution is performed. This effect is observed for different values of the filter degree L and graph size N.

Fig. 2: (a) Schematic view of the generation process for the synthetic data. (b) Classification accuracy as a function of the noise level σ in the deconvolution filter for three different classifiers based on x (green), y (blue), and x̂ (red). The solid and dashed lines respectively correspond to L = 3 and L = 5 for the diffusing filter. The size of the graphs is N = 30. (c) Counterpart of (b) for N = 50.

Fig. 3: (a) Validation accuracy of the non-parametric GNN as a function of the training steps for the four classes of input signals considered. (b) Counterpart of (a) for the spline GNN. (c) Testing accuracy for both GNNs and the four classes of input signals considered.
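The synthetic pipeline above can be reproduced in miniature. The sketch below diffuses a sparse seed through a short-range filter on an ER-like graph and recovers it with an ℓ1-penalized least-squares (ISTA) surrogate of problem (2); the edge probability, sparsity level, and regularization weight are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, nnz = 30, 3, 3                         # graph size, filter taps, seed sparsity

# Undirected ER-like graph as the shift operator (edge probability is an assumption).
A = (rng.random((N, N)) < 0.2).astype(float)
S = np.triu(A, 1) + np.triu(A, 1).T

# Short-range diffusion filter H = sum_l h_l S^l (cf. Section II-A).
h = (1 + np.arange(1, L + 1) / L) ** -1.0
H = sum(h[l] * np.linalg.matrix_power(S, l) for l in range(L))

# Sparse seeding signal and its diffused observation y = H x.
x = np.zeros(N)
x[rng.choice(N, nnz, replace=False)] = rng.standard_normal(nnz)
y = H @ x

# ISTA on min_z 0.5*||y - H z||^2 + lam*||z||_1, a penalized surrogate of (2).
lam = 1e-3
step = 1.0 / np.linalg.norm(H, 2) ** 2       # inverse Lipschitz constant
x_hat = np.zeros(N)
for _ in range(2000):
    z = x_hat - step * (H.T @ (H @ x_hat - y))
    x_hat = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
```

With exact knowledge of a well-conditioned H, the recovered x̂ stays close to x; replacing H by a noisy H̃ = H + εZ degrades the recovery gradually, mirroring the behavior in Figs. 2(b)-(c).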
IV-B. Handwritten number classification on MNIST
To evaluate the benefits of the proposed GDL pipeline [cf. Fig. 1(b)], we consider the MNIST classification task [31]. The MNIST dataset consists of 70,000 handwritten digit images from 10 classes corresponding to the different digits. The size of each image is 28 × 28 pixels. We randomly split the samples into training, validation, and testing sets. We consider a nearest-neighbor representation of each image, where each pixel is connected to its closest pixels in an unweighted graph of N = 784 nodes. We consider the raw images y ∈ R^N as diffused signals to which we apply a deconvolution step. For this step, we use a filter bank with four filters – uniform, short, medium, and long-range – each of degree L = 10 (cf. Section II-A). Thus, for each input signal y we have four (deconvolved) seeding signals x̂_UR, x̂_SR, x̂_MR, and x̂_LR, obtained by solving (3).

For the GNN classifier, we experiment with both a non-parametric GNN and a spline GNN (Q = 25), using Graclus as the graph coarsening algorithm for the pooling layer (cf. Section II-B). The network architecture is a LeNet-5-like CNN [32] with standard TensorFlow specification given by 'GC32-P4-GC64-P4-FC512'. This means that the GNN consists of two graph convolutional layers (GC) with depths of 32 and 64, two pooling layers (P) with a stride of 4, and one fully connected layer (FC) of size 512, configured in the order mentioned above.

We consider the classification accuracy of both GNNs when applied to four different types of inputs: i) raw: the raw signals (images) y ∈ R^N; ii) raw-copy: four copies of the raw signals [y, y, y, y] ∈ R^{N×4}; iii) decon-4: the seeding signals [x̂_UR, x̂_SR, x̂_MR, x̂_LR] ∈ R^{N×4}; and iv) decon-1: the aggregated seeding signals x̂ = x̂_UR + x̂_SR + x̂_MR + x̂_LR ∈ R^N. Notice that the first two approaches fall under the classical setting in Fig. 1(a) with no preprocessing of the raw data, while the last two are examples of the proposed deconvolution approach in Fig. 1(b).

The validation accuracy as a function of the training step for both types of GNNs considered is illustrated in Figs. 3(a)-(b). In both cases, the training process for the deconvolved signals is faster and achieves higher values of validation accuracy. Notice that even the replicated raw data (raw-copy) achieves lower accuracy levels than the consolidated deconvolved signals (decon-1), indicating that the increase in performance is not attributable to the consideration of larger input signals but rather to the proposed deconvolution step. Similar conclusions can be obtained when considering the testing accuracy of the trained GNNs; see Fig. 3(c). Even though the spline GNN achieves better performance than the non-parametric one, for both configurations the deconvolved signals yield a higher accuracy than their raw (non-processed) counterparts.

V. CONCLUSIONS
We explored synergies between GDL and GSP that go beyond the area of graph filter design and touch upon the existing work on (blind) graph filter identification. More precisely, we proposed a pre-processing step for GDL that consists of a deconvolution procedure applied to the observed signals using a graph filter bank. This deconvolution step – formulated as a group-sparse recovery problem – was shown to improve the empirical classification power of GNNs when compared to the classification of the unprocessed graph signals. Potential research avenues that follow from the proposed work include the derivation of theoretical guarantees for the deconvolution step to better characterize the classification improvements, and the development of criteria for the optimal design of the graph filter bank that drives the deconvolution operation.
REFERENCES

[1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
[2] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
[3] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, "Sampling of graph signals with successive local aggregations," IEEE Trans. Signal Process., vol. 64, no. 7, pp. 1832–1843, Apr. 2016.
[4] E. Bullmore and O. Sporns, "Complex brain networks: Graph theoretical analysis of structural and functional systems," Nature Reviews Neuroscience, vol. 10, no. 3, p. 186, 2009.
[5] W. Huang, T. A. W. Bolton, J. D. Medaglia, D. S. Bassett, A. Ribeiro, and D. Van De Ville, "A graph signal processing perspective on functional brain imaging," Proc. IEEE, vol. 106, no. 5, pp. 868–885, May 2018.
[6] J. D. Medaglia, W. Huang, S. Segarra, C. Olm, J. Gee, M. Grossman, A. Ribeiro, C. T. McMillan, and D. S. Bassett, "Brain network efficiency is influenced by the pathologic source of corticobasal syndrome," Neurology, vol. 89, no. 13, pp. 1373–1381, 2017.
[7] S. Chen, R. Varma, A. Sandryhaila, and J. Kovacevic, "Discrete signal processing on graphs: Sampling theory," IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6510–6523, Dec. 2015.
[8] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, "Reconstruction of graph signals through percolation from seeding nodes," IEEE Trans. Signal Process., vol. 64, no. 16, pp. 4363–4378, Aug. 2016.
[9] D. Romero, M. Ma, and G. B. Giannakis, "Kernel-based reconstruction of graph signals," IEEE Trans. Signal Process., vol. 65, no. 3, pp. 764–778, Feb. 2017.
[10] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, "Stationary graph processes and spectral estimation," IEEE Trans. Signal Process., vol. 65, no. 22, pp. 5911–5926, Aug. 2017.
[11] N. Perraudin and P. Vandergheynst, "Stationary signal processing on graphs," IEEE Trans. Signal Process., vol. 65, no. 13, pp. 3462–3477, Jul. 2017.
[12] B. Girault, "Stationary graph signals using an isometric graph translation," in European Signal Process. Conf. (EUSIPCO), Aug. 2015, pp. 1516–1520.
[13] S. Segarra, A. G. Marques, and A. Ribeiro, "Optimal graph-filter design and applications to distributed linear network operators," IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, Aug. 2017.
[14] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, "Autoregressive moving average graph filtering," IEEE Trans. Signal Process., vol. 65, no. 2, pp. 274–288, Jan. 2017.
[15] S. Segarra, G. Mateos, A. G. Marques, and A. Ribeiro, "Blind identification of graph filters," IEEE Trans. Signal Process., vol. 65, no. 5, pp. 1146–1159, Mar. 2017.
[16] D. Ramirez, A. G. Marques, and S. Segarra, "Graph-signal reconstruction and blind deconvolution for diffused sparse inputs," in IEEE Intl. Conf. Acoust., Speech and Signal Process. (ICASSP), 2017, pp. 4104–4108.
[17] C. Ye, R. Shafipour, and G. Mateos, "Blind identification of invertible graph filters with multiple sparse inputs," arXiv preprint arXiv:1803.04072, 2018.
[18] J. Masci, E. Rodola, D. Boscaini, M. M. Bronstein, and H. Li, "Geometric deep learning," in SIGGRAPH ASIA 2016 Courses. ACM, 2016, p. 1.
[19] L. Yi, H. Su, X. Guo, and L. Guibas, "SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation," in Computer Vision and Pattern Recognition (CVPR), 2017.
[20] O. Litany, T. Remez, E. Rodola, A. M. Bronstein, and M. M. Bronstein, "Deep functional maps: Structured prediction for dense shape correspondence," in Proc. ICCV, vol. 2, 2017, p. 8.
[21] F. Monti, M. Bronstein, and X. Bresson, "Geometric matrix completion with recurrent multi-graph neural networks," in Advances in Neural Information Processing Systems, 2017, pp. 3700–3710.
[22] W. Huang, A. G. Marques, and A. Ribeiro, "Collaborative filtering via graph signal processing," in European Signal Process. Conf. (EUSIPCO), Aug. 2017, pp. 1094–1098.
[23] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, "Geometric deep learning: Going beyond Euclidean data," IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
[24] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[25] I. S. Dhillon, Y. Guan, and B. Kulis, "Weighted graph cuts without eigenvectors: A multilevel approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 11, 2007.
[26] F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, "Convolutional neural networks architectures for signals supported on graphs," arXiv preprint arXiv:1805.00165, 2018.
[27] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," arXiv preprint arXiv:1312.6203, 2013.
[28] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.
[29] J. Atwood and D. Towsley, "Diffusion-convolutional neural networks," in Advances in Neural Information Processing Systems, 2016, pp. 1993–2001.
[30] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
[31] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[32] Y. LeCun et al., "LeNet-5, convolutional neural networks."