Inferring Graph Signal Translations as Invariant Transformations for Classification Tasks
Raphaël Baena, Lucas Drumetz, Vincent Gripon
IMT Atlantique and Lab-STICC, [email protected]
Abstract—The field of Graph Signal Processing (GSP) has proposed tools to generalize harmonic analysis to complex domains represented through graphs. Among these tools are translations, which are required to define many others. Most works propose to define translations using solely the graph structure (i.e. edges). Such a problem is ill-posed in general, as a graph conveys information about neighborhood but not about directions. In this paper, we propose to infer translations as edge-constrained operations that make a supervised classification problem invariant, using a deep learning framework. As such, our methodology uses both the graph structure and labeled signals to infer translations. We perform experiments with regular 2D images and abstract hyperlink networks to show the effectiveness of the proposed methodology in inferring meaningful translations for signals supported on graphs.
Index Terms—graph signal translation, deep learning, classification, invariant operators
I. INTRODUCTION
Translations are among the most fundamental transformations in signal processing. They are often used as a basic building block to define convolutions, Fourier transforms, filters and related tools. In machine learning, they can be exploited to define ad-hoc operators that benefit from the underlying simple regular structure of processed signals, as in the case of Convolutional Neural Networks (CNNs). As a matter of fact, CNNs were introduced because they can be made invariant to translations when combined with downsampling operators, which is often desirable in practice.

Recently, the field of Graph Signal Processing (GSP) arose with the aim of generalizing classical harmonic analysis to irregular domains described using graphs [1]. Among the numerous tools that were introduced in this field, translation has attracted a lot of attention [2]. Contrary to the case of nD structures, defining translations for graph signals can be challenging. Incidentally, graphs can represent very regular structures (e.g. sensor networks) as well as abstract ones (e.g. social networks), and the definitions of translations, and hence of harmonic operators, should be sensible for all these domains.

In the early days of GSP, the Graph Fourier Transform (GFT) was introduced without relying on translations [3]. Convolutions could then be defined by simple pointwise multiplications in the graph spectral domain, and translations were then obtained by particularizing convolutions with Dirac signals. Later, in [4], the authors pointed out that this operator is not an isometry. They proposed alternative definitions based on complex exponentials of the Laplacian matrix of the considered graph. Problematically, these operators do not generalize well classical circular translations on signals defined on grid graphs. Using a completely different approach, the authors in [5] defined translations of graph signals directly in the vertex domain (without using the GFT), thus providing an actual generalization of classical tools. Still, this approach comes with a large computational complexity, and struggles with abstract and irregular graph structures.

There are fundamental reasons why it is so challenging to define translations for graph signals. One of them is that a graph typically encompasses a notion of neighborhood (or similarity) between its vertices. On the other hand, translations are defined using directions, which are typically not explicitly available or even meaningful when considering a graph [5]. In this work, we propose to infer graph signal translations using not only the graph, but also additional information such as annotated signals on this graph.

Our solution builds upon the idea of translational invariance of classification tasks. In more detail, given a graph and samples that belong to distinct classes, we aim at inferring operators that are constrained by the graph structure and that allow to define weight-sharing deep learning architectures reaching high accuracy on the considered classification task. As such, the inferred operators can be interpreted as transformations that are invariant for the considered task. In the case of regular nD signals, we would expect these transformations to include classical translations, but also possibly other operators such as directional dilations or contractions. Interestingly, this approach does not require strong assumptions about the regularity of the graph structure, and can thus be deployed even for abstract domains such as relational networks.
II. RELATED WORK
Let us consider a graph G = ⟨V, E⟩, where V is a finite set of vertices and E is a set of pairs of vertices called the edges. Such a graph can be conveniently expressed using its binary adjacency matrix A, defined as:

A[i, j] = 1 if (i, j) ∈ E, and 0 otherwise.   (1)

The degree matrix of G is defined as:

D[i, j] = Σ_{i′ ∈ V} A[i, i′] if i = j, and 0 otherwise.   (2)

In the field of spectral graph theory, it is common to also introduce the (combinatorial) Laplacian of the graph, defined as the matrix L = D − A.

In this work, we are interested in processing signals on graphs. A graph signal is a vector s ∈ R^V. Of particular interest are Dirac signals, which are simple one-hot vectors. The field of GSP introduces tools to manipulate signals on graphs. These tools include convolutions, filtering, smoothing, translations... The rationale is that such operators are defined by taking into account the graph structure (i.e. the graph edges). In the particular case where the considered graph is an oriented ring graph, the tools defined by the framework of GSP perfectly match the ones defined for 1D signals [3].

This matching does not necessarily hold for more complex graph structures. In particular, considering regular 2D grid graphs, the operators defined using the GSP toolbox typically differ from the traditional 2D corresponding ones [5]. Incidentally, defining a graph signal translation operator is challenging, because a graph structure only encompasses information about the neighborhood of vertices and not directionality [5].

In the early days of GSP, translations were defined on top of convolutions. As a matter of fact, the authors in [2] propose a definition of the GFT of a signal s by simply projecting s onto a basis where the Laplacian of the graph is diagonal. The inverse GFT can be obtained by projecting backwards to the canonical basis. Then, in [6], [7], the authors define convolutions in three steps: first they compute the GFT of the considered signals, then they pointwise multiply their spectral coordinates, and finally they perform an inverse GFT on the resulting vector. Graph signal translations can then be obtained by convolving signals with a Dirac. The authors of [4] point out that these translations are not isometric. They introduce alternative definitions using complex exponentials of the Laplacian matrix. Problematically, the definitions in [6], [7], [4] do not properly generalize translations for signals on graphs because, as we pointed out previously, these operators typically do not match the expected ones when considering regular 2D grid graphs. As a matter of fact, the translations defined in [3] are isotropic.

In [8], the authors aim at identifying directions or relevant graph motifs in order to define graph signal convolutions. These motifs represent meaningful connectivity patterns, e.g. triangle motifs, which are crucial for social networks [9]. Once a set of motifs is chosen, nonisotropic Laplacians are defined for each one. Convolutions are then defined as multivariate polynomials of the Laplacian matrices. Two key issues with this method are the huge number of parameters it relies upon and the difficulty of choosing relevant motifs.

With the purpose of proposing graph signal operators that fully match the expected ones for regular grid graphs, the authors in [5] introduce a definition of translations directly in the vertex domain (i.e. one that does not use the GFT).
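For concreteness, the spectral pipeline described above (Laplacian, GFT, convolution with a Dirac) can be sketched in a few lines of NumPy. This is our own illustration, not code from any cited toolbox, and normalization conventions vary across definitions:

```python
import numpy as np

# Toy graph: undirected path on N = 5 vertices.
N = 5
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

D = np.diag(A.sum(axis=1))   # degree matrix, Eq. (2)
L = D - A                    # combinatorial Laplacian

# GFT: coordinates in an orthonormal eigenbasis of L (symmetric here).
_, U = np.linalg.eigh(L)
gft = lambda x: U.T @ x
igft = lambda x_hat: U @ x_hat

def spectral_translate(s: np.ndarray, v: int) -> np.ndarray:
    """'Translate' s to vertex v by convolving with a Dirac at v,
    i.e. a pointwise product in the spectral domain ([6], [7])."""
    delta = np.zeros(N)
    delta[v] = 1.0
    return igft(gft(s) * gft(delta))

s = np.sin(np.linspace(0.0, np.pi, N))  # a smooth test signal
print(spectral_translate(s, 2))
```

On a ring graph this recovers the expected circular shift; on irregular graphs, the output typically differs from any intuitive notion of translation, which is precisely the issue raised above.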
In their work, they characterize translations as functions φ, defined from a subset of vertices V′, that are i) injective (φ(v) = φ(v′) ⇒ v = v′, ∀v, v′ ∈ V′), ii) edge-constrained ((v, φ(v)) ∈ E, ∀v ∈ V′) and iii) neighborhood-preserving ((v, v′) ∈ E ⇔ (φ(v), φ(v′)) ∈ E, ∀v, v′ ∈ V′). Injectivity and neighborhood preservation are key characteristics to ensure the matching with regular translations, but they are poorly suited for abstract graph structures such as social networks.

In [10], the authors introduce pseudo-convolutions for deep neural networks that can be seen as implementing the edge constraint previously introduced. Namely, they introduce a tensor S and a vector w. The binary tensor S is of dimension N × N × K, where N is the number of vertices in the considered graph and K is a hyperparameter. Moreover, S[i, j, k] is zero if (i, j) ∉ E, and S[i, j, :] contains at most one nonzero entry. The vector w contains K coordinates. The tensor-matrix product along the third mode of S by w, denoted S ×₃ w, creates an N × N matrix W that can be seen as a weighted version of the adjacency matrix A of the considered graph. The authors show that for particular choices of S, they can retrieve classical convolutions for regular grid graphs. More generally, slices S[:, :, k] can be interpreted as graph signal translations. In this paper, we propose to infer the tensor S using both the graph structure and a set of labeled signals.
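A minimal NumPy sketch of this construction may help; the toy graph and the slice assignments below are our own choices for illustration, not taken from [10]:

```python
import numpy as np

# Toy graph: undirected path on N = 4 vertices, plus self-loops so that
# the identity slice is also edge-constrained.
N, K = 4, 3
A = np.eye(N)
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Binary tensor S (N x N x K): S[i, j, k] is nonzero only where A[i, j] != 0,
# and each fiber S[i, j, :] holds at most one nonzero entry, as in [10].
S = np.zeros((N, N, K))
for i in range(N):
    S[i, i, 0] = 1.0              # slice 0: stay in place
for i in range(N - 1):
    S[i, i + 1, 1] = 1.0          # slice 1: move toward higher indices
for i in range(1, N):
    S[i, i - 1, 2] = 1.0          # slice 2: move toward lower indices

w = np.array([0.2, 1.0, -0.5])    # K coordinates, one per slice

# Product along the third mode: W = S x_3 w = sum_k w[k] S[:, :, k].
W = np.tensordot(S, w, axes=([2], [0]))     # shape (N, N)
assert np.all((W != 0) <= (A != 0))          # W is a weighted adjacency matrix
```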
III. PROBLEM STATEMENT AND METHODOLOGY

The rationale behind CNNs is to exploit the invariance of input labels to translations [11], which is achieved through weight sharing schemes. In more detail, translations are used to define convolutions. When convolutions are combined with pooling operations, they can produce representations that are invariant with respect to translations. The resulting CNNs obtain significant gains in accuracy compared to translation-agnostic architectures such as multi-layer perceptrons [11].

The key idea of our proposed methodology is to reverse this reasoning. Namely, we propose to define learnable operators that are aligned with the graph structure, from which we build pseudo-convolutions by learning ad-hoc weight sharing schemes. Combined with pooling, we obtain architectures that can be trained end-to-end to solve classification tasks. Once a network with good performance is found, we can then interpret our learned operators as pseudo-translations, or more generally as classification-invariant operations.

Fig. 1: Depiction of the used deep learning architecture (GSL → ... → GSL → Pool → FC → SM → y). GSL stands for Graph-Signal Layer, Pool for a global average pooling, FC for a fully connected layer and SM for a softmax.

In more detail, let us consider a simple example where the graph is a ring with adjacency matrix A. Let us suppose that we consider a periodic graph signal s made of N = 4 dimensions, on which we can operate K = 3 translations denoted through their matrix representations (T_k)_k, where T_k ∈ R^{N×N}. For this simple example, T_1 will be the identity, and T_2 and T_3 the circular translations corresponding to the two orientations of the ring. We build a tensor S ∈ R^{N×N×K} by concatenating the matrices (T_k)_k. We also define a convolutional kernel vector w indexed by the K possible translations. Then it holds that:

S ×₃ w = Σ_k w[k] S[:, :, k] = Σ_k w[k] T_k =
  [ w[1]  w[2]  0     w[3] ]
  [ w[3]  w[1]  w[2]  0    ]
  [ 0     w[3]  w[1]  w[2] ]
  [ w[2]  0     w[3]  w[1] ]   (3)

We indeed recognize a Toeplitz circulant convolution matrix. These equations can easily be generalized to any regular nD grid graph, and to any graph by constraining the structure of S. The graph convolution operation ⋆ can then be defined as:

s ⋆ w = s^⊤ (S ×₃ w).   (4)

We propose to learn the matrices (T_k)_k by optimizing a deep neural network meant to classify graph signals, under the constraint that T_k[i, j] ≠ 0 ⇒ A[i, j] ≠ 0, where A is the graph adjacency matrix. In other words, the (T_k)_k are edge-constrained transformations.
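The ring example and Equations (3) and (4) can be checked numerically; the following is a minimal NumPy sketch with our own variable names:

```python
import numpy as np

N, K = 4, 3
T1 = np.eye(N)                        # identity
T2 = np.roll(np.eye(N), 1, axis=1)    # circular shift, one orientation
T3 = np.roll(np.eye(N), -1, axis=1)   # circular shift, other orientation
S = np.stack([T1, T2, T3], axis=2)    # S in R^{N x N x K}

w = np.array([3.0, 5.0, 7.0])         # kernel over the K translations
W = np.tensordot(S, w, axes=([2], [0]))  # S x_3 w, as in Eq. (3)

# W is circulant: every row is the first row circularly shifted.
assert all(np.allclose(W[i], np.roll(W[0], i)) for i in range(N))

# Eq. (4): graph convolution of a signal s with the kernel w.
s = np.array([1.0, 2.0, 3.0, 4.0])
print(s @ W)  # s^T (S x_3 w)
```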
A. Problem Statement

For the sake of simplicity, we describe here the processing of tensors with only one filter, that is to say a single w. Note that all the equations of this section can be generalized to the case of multiple filters, which boils down to adding a dimension to all tensors and computations presented thereafter.

Let us recall that a deep neural network can be described by a function f mapping the input to the output. The function f is obtained by assembling elementary functions, called layers, that are most of the time of the form x ↦ σ(Wx + b), where W is a weight matrix, b is a bias vector and σ is a nonlinear function, usually parameter-free and applied component-wise. The weight matrices and their associated bias vectors are the trainable parameters θ of the network.

In the case of classification, the aim is to train f to map raw inputs (e.g. images) to their corresponding class. For that matter, we typically use two datasets: a training one, denoted D_train, that is used to learn the parameters, and a validation one used to stress the ability of the trained function f to correctly predict the class of previously unseen inputs. Also, the network function f ends by applying a softmax operator. The most typical setting for training a classifier is to rely on a cross-entropy loss function L. Denoting (x, y) ∈ D_train, where x is an input and y its corresponding output, we have:

L(x, y) = −log(f(x)[y]).

The deep neural network function f is optimized to solve the following problem:

arg min_θ Σ_{(x, y) ∈ D_train} L(x, y).

In practice, variants of the Stochastic Gradient Descent algorithm are often used for this optimization.

Of particular interest for vision tasks are convolutional layers, in which the weight tensor W implements a convolution operator. In our case, we do not have explicit access to translations, hence to convolutions. Thus we rather make use of Graph-Signal Layers (GSLs): s ↦ σ(s^⊤(S ×₃ w)), where the slices S[:, :, k] are edge-constrained: S[i, j, k] ≠ 0 ⇒ A[i, j] ≠ 0. Given Equation (3), this layer is a generalization of convolutional layers. Multiple GSLs can be defined, each with its own weight vector w, but sharing the same global S.

Let us now imagine that we are given a deep neural network function f, with parameters {S, ω, θ}, containing some GSLs. Here S represents the graph transformations (which are implicit in CNNs), ω are the parameters of the GSLs, and θ are the remaining parameters (e.g. for fully connected layers). The problem we aim at solving is to find:

arg min_{S, ω, θ} Σ_{(x, y) ∈ D_train} L(x, y).

Specifically, we are interested in solutions in which the lines S[i, :, k] are one-hot vectors, so that the slices S[:, :, k] can be interpreted as pseudo-translations. In the next subsection, we detail how we propose to enforce this constraint.
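One possible PyTorch realization of a single GSL follows. This is a hedged sketch under our own assumptions (parameterizing S through logits, masking non-edges, and anticipating the tempered softmax of the next subsection), not the authors' released implementation:

```python
import torch
import torch.nn as nn

class GraphSignalLayer(nn.Module):
    """One GSL: s -> sigma(s^T (S x_3 w)), with S edge-constrained by A."""

    def __init__(self, A: torch.Tensor, K: int):
        super().__init__()
        N = A.shape[0]
        self.register_buffer("edge_mask", (A != 0))          # edge constraint
        self.S_logits = nn.Parameter(torch.randn(N, N, K))   # relaxed S
        self.w = nn.Parameter(torch.randn(K))                # kernel weights

    def forward(self, s: torch.Tensor, t: float = 1.0) -> torch.Tensor:
        # Tempered softmax over the second dimension: as t decreases, each
        # line S[i, :, k] is pushed toward a one-hot vector supported on
        # the neighbors of vertex i.
        logits = self.S_logits / t
        logits = logits.masked_fill(~self.edge_mask.unsqueeze(-1), float("-inf"))
        S = torch.softmax(logits, dim=1)
        W = torch.einsum("ijk,k->ij", S, self.w)  # W = S x_3 w
        return torch.relu(s @ W)                  # batch of signals (batch, N)

# Usage on a toy path graph with self-loops (so every row has neighbors).
N = 8
A = torch.eye(N) + torch.diag(torch.ones(N - 1), 1) + torch.diag(torch.ones(N - 1), -1)
layer = GraphSignalLayer(A, K=3)
out = layer(torch.randn(4, N), t=10.0)  # shape (4, N)
```

In the full architecture described above, S would be shared across all GSLs while each layer keeps its own w.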
B. Methodology

As stated in the introduction, convolutional neural networks, when they are built with pooling layers, have the asset of producing translation-invariant decisions. However, performing pooling on graph signals can be hard, because it requires computing graph downsamplings [2]. This is why, in this paper, we adopt a simple workaround where we only perform a single pooling operation at the penultimate layer of our proposed architecture, right before the final fully connected layer. This pooling is global, so that it completely shrinks the graph dimension: all vertex values are averaged into a single value, for each considered filter. A depiction of the proposed architecture is available in Figure 1.

Optimizing deep neural network functions over a discrete domain is a hard task [12], since it involves binary matrix constraints, which are not straightforward to enforce. Because our aim is to obtain one-hot vectors, which is similar to [13], we adopt the same strategy. Namely, we apply a softmax operator over the second dimension of S, with a varying temperature t (x ↦ softmax(x/t)). This temperature starts at a value t_init, typically large, in which case the softmax operator has the effect of making the lines S[i, :, k] constant where defined (recall that S[:, :, k] is edge-constrained). At the end of the training, the final temperature is t_final, typically small, so that the softmax boils down to a regular max operator, transforming the lines S[i, :, k] into one-hot vectors.

Fig. 2: Depiction of inferred pseudo-translations when considering the CIFAR-10 dataset on a regular 2D grid graph (panels (a)-(e): per-vertex arrow fields of the inferred operators).
We experimented with various strategies to interpolate the temperature between t_init and t_final. Our most consistent results were obtained using an exponential interpolation:

t(s) = t_init (t_final / t_init)^{s / s_total},

where s is the current step in the training phase, and s_total is the total number of steps used for training. At the end of the training process, we use a temperature of 0 to interpret the slices S[:, :, k] as pseudo-translations. In the next section, we present experiments on toy and real datasets.
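The schedule itself is a one-liner; a minimal sketch (the default t_init and t_final values are placeholders, not tuned settings):

```python
def temperature(step: int, s_total: int,
                t_init: float = 10.0, t_final: float = 0.1) -> float:
    """Exponential schedule t(s) = t_init * (t_final / t_init)**(s / s_total)."""
    return t_init * (t_final / t_init) ** (step / s_total)

# Sweeps geometrically from t_init down to t_final over training.
print([round(temperature(s, s_total=100), 3) for s in (0, 50, 100)])
```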
IV. EXPERIMENTS

In this section, we present experiments on various types of graphs, from very regular structures (images supported on 2D grid graphs) to abstract ones (hyperlink networks). We evaluate our method on two datasets: CIFAR-10 [14] and WebKB [15].
CIFAR-10 is a classification dataset of 32 × 32 pixel images with three primary colors, grouped in 10 classes. WebKB is a dataset composed of 877 web pages from computer science departments of universities, classified into one of five classes (student, project, course, staff, and faculty). The dataset contains a word-based feature vector of dimension 1703 for each of the websites, as well as a hyperlink graph. This dataset is typically used in contexts of semi-supervised classification, where only a portion of the websites are labeled.
A. Sanity check with regular grid graphs
In our first experiment, we aim at verifying the ability of our proposed method to retrieve classical translations when dealing with 2D signals and structures. To this end, we use the CIFAR-10 dataset downscaled to
16 × 16 pixel images, and suppose given a regular grid graph for supporting the image signals. In more detail, the grid graph is such that a vertex corresponds to a pixel, and each pixel is connected through the edges to its four direct neighbors.

In Figure 2, we depict the results of our proposed method. An inferred pseudo-translation T is represented on a grid of size 16 × 16. For each vertex, we represent by an arrow the neighbor vertex it is associated with through T (recall that inferred pseudo-translations are edge-constrained, so that this representation is well defined). For each T_k, we highlight the vertices that correspond to the majority direction. Interestingly, we observe that two of the inferred operators tend to approach regular translations, while another is almost the identity function. Surprisingly, we also observe that the two remaining operators resemble respectively a horizontal dilation and a horizontal compression. As a matter of fact, such transformations are valid in our framework and would typically be invariant for the classification problem at hand.

Fig. 3: Inferred translations applied to a sample image and comparison with the translation defined in [6], on a near-regular graph ((a): original image, (f): translation of [6]).

B. Experiments with a near-regular inferred graph structure
In our second experiment, we use an inferred graph structure that is obtained by computing the covariance matrix from the training set of CIFAR-10, and thresholding it to keep only the 5 nearest neighbors of each vertex (including self-loops). The inferred graph structure is not as regular as the previously used 2D grid graph, even though it remains similar.

Due to the non-regular structure of the graph, it is not possible to use the same representation as in Figure 2. Therefore, we illustrate the obtained transformations by applying them directly to an arbitrarily selected input image. Results are shown in Figure 3. We can clearly see that the obtained transformations are not exactly classical translations, but most of them are interpretable: one looks like a vertical translation, another like the identity, and two others like a horizontal dilation and contraction.

Fig. 4: Impact of t_final on the accuracy (black) and on the distance between the obtained transformations and: identity (orange), up (green), down (purple), dilation (blue); the average distance is shown in red.

C. Experiments with hyperlink networks
To illustrate the genericity of the approach, we next run an experiment with the WebKB dataset. For lack of a better method to evaluate the obtained transformations, we compare the accuracy achieved using the proposed methodology with that of a standard method from the literature: the Graph Convolutional Network (GCN) [16]. We averaged the obtained accuracy over 10 different splits of training/validation/test sets, and found that GCN and the proposed methodology reach similar performance. Yet the two systems are quite different: GCN uses isotropic diffusion of signals, whereas we focus on directional inferred translations. Moreover, contrary to GCN, our approach is not designed to optimize classification performance but to infer meaningful edge-constrained transformations.
D. Influence of hyperparameters
Finally, in a last series of experiments, we illustrate the sensitivity of the proposed method with respect to the hyperparameters t_init and t_final. In Figure 4, we fix t_init and vary t_final, whereas in Figure 5, we fix t_final and vary t_init. In these experiments, we evaluate the impact of the initial and final temperatures on the accuracy of the network and on the transformations obtained. The "distance" measures the number of differences between the obtained transformations and the closest 2D translation, dilation or contraction. For this evaluation, we use the CIFAR-10 dataset and assume that the images rely on the grid graph. As can be observed, the method is quite robust to changes in these hyperparameters.
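This distance can be computed as a per-vertex disagreement count; the following is a hedged sketch of the metric as we describe it (helper names and the reference operators are ours):

```python
import numpy as np

def arrow_field(T: np.ndarray) -> np.ndarray:
    """Vertex i -> the vertex it points to under operator T (rows ~ one-hot)."""
    return T.argmax(axis=1)

def distance(T: np.ndarray, T_ref: np.ndarray) -> int:
    """Number of vertices on which T disagrees with a reference operator
    T_ref (e.g. an ideal 2D translation, dilation or contraction)."""
    return int(np.sum(arrow_field(T) != arrow_field(T_ref)))

# Example: compare an inferred operator with the identity on 4 vertices.
T_identity = np.eye(4)
T_inferred = np.eye(4)[[0, 2, 1, 3]]  # swaps vertices 1 and 2
print(distance(T_inferred, T_identity))  # -> 2
```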
V. CONCLUSION

In this paper, we have introduced a new methodology based on deep learning to infer graph signal translations from both a graph structure and a set of labeled signals. We empirically showed that this methodology is able to retrieve usual 2D translations from regular images. We also conducted experiments on an abstract hyperlink network and obtained performance similar to that of the state of the art. There are many open questions following this work, including other possible ways to infer translations using labeled graph signals, better choices of hyperparameters, and the design of the deep learning architectures and of the classification dataset.

Fig. 5: Impact of t_init on the accuracy (black) and on the distance between the obtained transformations and: identity (orange), up (green), down (purple), dilation (blue); the average distance is shown in red.

REFERENCES
[1] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, 2018.
[2] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
[3] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
[4] B. Girault, P. Gonçalves, and E. Fleury, “Translation on graphs: An isometric shift operator,” IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2416–2420, 2015.
[5] B. Pasdeloup, V. Gripon, J.-C. Vialatte, N. Grelier, and D. Pastor, “A neighborhood-preserving translation operator on graphs,” 2018.
[6] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, Mar. 2011. [Online]. Available: https://hal.inria.fr/inria-00541855
[7] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “A windowed graph Fourier transform,” in Proc. IEEE Statistical Signal Processing Workshop (SSP), 2012, pp. 133–136.
[8] F. Monti, K. Otness, and M. M. Bronstein, “MotifNet: A motif-based graph convolutional network for directed graphs,” in Proc. IEEE Data Science Workshop (DSW), 2018, pp. 225–228.
[9] A. R. Benson, D. F. Gleich, and J. Leskovec, “Higher-order organization of complex networks,” Science, vol. 353, no. 6295, pp. 163–166, 2016.
[10] J.-C. Vialatte, V. Gripon, and G. Coppin, “Learning local receptive fields and their weight sharing scheme on graphs,” 2017.
[11] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proceedings of the 2010 IEEE International Symposium on Circuits and Systems, 2010, pp. 253–256.
[12] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1,” 2016.
[13] G. B. Hacene, C. Lassance, V. Gripon, M. Courbariaux, and Y. Bengio, “Attention based pruning for shift networks,” 2019.
[14] A. Krizhevsky, “Learning multiple layers of features from tiny images,” Tech. Rep., 2009.