Smoothing graph signals via random spanning forests
Yusuf Y. Pilavci, Pierre-Olivier Amblard, Simon Barthelmé, Nicolas Tremblay
CNRS, Univ. Grenoble Alpes, Grenoble INP, GIPSA-lab, Grenoble, France
ABSTRACT
Another facet of the elegant link between random processes on graphs and Laplacian-based numerical linear algebra is uncovered: based on random spanning forests, novel Monte-Carlo estimators for graph signal smoothing are proposed. These random forests are sampled efficiently via a variant of Wilson's algorithm, in time linear in the number of edges. The theoretical variance of the proposed estimators is analyzed, and their application to several problems is considered, such as Tikhonov denoising of graph signals or semi-supervised learning for node classification on graphs.
Index Terms — graph signal processing, smoothing, random spanning forests
1. INTRODUCTION
Tikhonov denoising of graph signals [1, 2], semi-supervised learning for node classification in graphs [3], and post-sampling graph signal reconstruction [4] are a few examples falling in the following class of optimization problems. Given a graph signal y = (y_1 | ... | y_n)^t ∈ R^n, where y_i is the value measured at node i of a graph, one considers the problem

x̂ = argmin_{z ∈ R^n} q ||y − z||² + z^t L z,   (1)

where L is the Laplacian of the graph and q > 0 is a parameter tuning the trade-off between a data-fidelity term ||y − z||², encouraging the solution x̂ to be close to y, and a smoothness term z^t L z, encouraging the solution x̂ to vary slowly along any path of the graph. In Tikhonov denoising, y is a noisy measurement of an underlying signal x that one wants to recover. In semi-supervised learning (SSL), y contains known labels that one wants to propagate on the graph to classify all the nodes in different classes (see Section 4 for more details). This optimization problem admits an explicit solution:

x̂ = K y with K = q (qI + L)^{−1} ∈ R^{n×n},   (2)

which requires the inversion of a regularized Laplacian qI + L, where I is the identity matrix. Computing K costs O(n³) elementary operations and becomes prohibitive as n increases.

This work was partly funded by the ANR GenGP (ANR-16-CE23-0008), the ANR GraVa (ANR-18-CE40-0005), the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01), the CNRS PEPS I3A (Project RW4SPEC), the Grenoble Data Institute (ANR-15-IDEX-02), the LIA CNRS/Melbourne Univ Geodesic, and the MIAI chair "LargeDATA at UGA".
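For concreteness, Eq. (2) can be evaluated on a small graph by solving the linear system (qI + L) x̂ = q y rather than forming the inverse explicitly. The sketch below (plain NumPy, on a toy 4-node path graph of our choosing, not from the paper) illustrates the smoothing effect:

```python
import numpy as np

# Toy graph: an unweighted path on 4 nodes.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W        # combinatorial Laplacian L = D - W

q = 1.0
y = np.array([1.0, 0.0, 0.0, 1.0])    # a (noisy) graph signal

# Eq. (2): x_hat = q (qI + L)^{-1} y, computed by a linear solve
# instead of an explicit O(n^3) inversion.
x_hat = np.linalg.solve(q * np.eye(len(y)) + L, q * y)
```

Since the all-ones vector is an eigenvector of K with eigenvalue 1, the smoothed signal preserves the total mass of y, and its extreme values are pulled inward.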
State-of-the-art.
For large n, the state-of-the-art includes iterative methods such as preconditioned conjugate gradient [5], where L is only accessed via matrix-vector multiplications of the form Lz, and polynomial approximation methods [6] that approximate K by a low-order polynomial in L. Both classes of methods enable the computation of x̂ in time linear in |E|, the number of edges of the graph.

Contribution.
We introduce two Monte-Carlo estimators of x̂ based on random spanning forests, that:
• show another facet of the elegant link between random processes on graphs and Laplacian-based numerical linear algebra, such as in [7, 8, 9];
• scale linearly in |E| and are thus useful on very large graphs;
• can be implemented in a fully distributed fashion: as long as each node is able to communicate with its neighbours, the result can be obtained without centralized knowledge of the graph's structure (see the implementation paragraph at the end of Section 3).
The Julia code implementing these estimators and reproducing this paper's results is available on the authors' website (nicolas.tremblay/files/graph_smoothing_via_RSF.zip).

Structure of the paper.
We provide the necessary notations and background in Section 2. In Section 3, we detail the two novel estimators before generalizing them to cases where K is of the form K = (Q + L)^{−1} Q, with Q a positive diagonal matrix. The experimental Section 4 gives an illustration on image denoising before showing how to use our estimators in the context of SSL. We conclude in Section 5.
2. BACKGROUND

Notations and preliminary definitions.
We consider undirected graphs G = (V, E, W) where V is the set of |V| = n nodes, E the set of |E| edges, and W ∈ R^{n×n} the symmetric weighted adjacency matrix. We denote by d_i = Σ_j W_ij the degree of node i, by d = (d_1, d_2, ..., d_n)^t ∈ R^n the degree vector, and by D = diag(d) ∈ R^{n×n} the diagonal degree matrix. In this paper, we consider the Laplacian matrix defined as L = D − W ∈ R^{n×n}. L is positive semi-definite (PSD) [7] and its eigenvalues can be ordered as 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n. We also consider graph signals x ∈ R^n defined over the nodes V: x_i = x(i) is the signal's value at node i. We denote by 1 ∈ R^n the constant vector equal to 1, and by δ_i ∈ R^n the Kronecker delta such that δ_i(j) = 1 if j = i, and 0 otherwise.

A tree of G is understood as a connected subgraph of G that does not contain cycles. A rooted tree of G, i.e., a tree of G whose edges are all oriented towards one node called the root, is generically denoted by τ. A rooted forest of G, i.e., a set of disjoint rooted trees, is generically denoted by φ. In the following, we only consider rooted trees and rooted forests, and thus simply refer to them as trees and forests. Also, ρ(φ) stands for the set of roots of the trees in φ. As such, |ρ(φ)| is the total number of trees in φ. We define the function r_φ : V → V such that r_φ(i) is the root of the tree containing node i in the forest φ. Node i is said to be rooted in r_φ(i).

Random spanning forests (RSFs).
Let us recall that a spanning tree (resp. forest) of G is a tree (resp. forest) of G that spans all nodes in V. For a given graph G, there exist in general many spanning trees (and even more spanning forests). Probabilists and computer scientists have studied several distributions over spanning trees and forests in the past. A classical distribution over spanning trees is:

P(T = τ) ∝ Π_{(ij)∈τ} W_ij.   (3)

Sampling from this distribution yields a so-called uniform spanning tree (UST) T, and can be efficiently performed by Wilson's algorithm [10] via loop-erased random walks. We focus here on the following parametrized distribution over spanning forests:

P(Φ_q = φ) ∝ q^{|ρ(φ)|} Π_{τ∈φ} Π_{(ij)∈τ} W_ij,   q ∈ R+*.   (4)

Sampling from this distribution yields a RSF Φ_q and is efficiently performed (in time O(|E|/q)) via a variant of Wilson's algorithm [11]. Many properties of Φ_q are known [11, 12]. For instance, the probability that node i is rooted in j is (see Lemma 3.3 in [12]):

P(r_{Φ_q}(i) = j) = K_ij.   (5)
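To make the sampling step concrete, here is a minimal, non-distributed Python sketch of the variant of Wilson's algorithm: a loop-erased random walk that, at node u of degree d_u, is interrupted with probability q/(q + d_u) (u then becomes a root) and otherwise moves to a neighbour v with probability W_uv/(q + d_u). The graph encoding and the function name are our own, not from the paper:

```python
import random

def sample_rsf(adj, q, rng=random):
    """Sample a random spanning forest Phi_q via a variant of Wilson's
    algorithm. `adj` maps each node to a list of (neighbour, weight) pairs.
    Returns a dict mapping each node to the root of its tree."""
    in_forest = {u: False for u in adj}
    nxt = {u: None for u in adj}
    for i in adj:
        u = i
        while not in_forest[u]:               # walk until absorbed or in forest
            d = sum(w for _, w in adj[u])
            if rng.random() < q / (q + d):
                in_forest[u] = True           # interruption: u becomes a root
                nxt[u] = None
                break
            r = rng.random() * d              # pick a neighbour prop. to weight
            for v, w in adj[u]:
                r -= w
                if r <= 0:
                    break
            nxt[u] = v                        # overwriting nxt erases loops
            u = v
        u = i                                 # retrace the loop-erased path
        while not in_forest[u]:
            in_forest[u] = True
            u = nxt[u]
    root = {}
    for i in adj:                             # follow pointers up to the root
        u = i
        while nxt[u] is not None:
            u = nxt[u]
        root[i] = u
    return root

# Example: an unweighted path graph 0 - 1 - 2.
adj = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0)]}
root = sample_rsf(adj, q=1.0, rng=random.Random(0))
```

On this 3-node path with q = 1, Eq. (5) predicts for instance P(r_{Φ_q}(0) = 0) = K_00 = 5/8, which can be checked empirically by averaging over many sampled forests.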
3. RSF-BASED ESTIMATORS
Given a signal y, our goal is to compute x̂ = K y without computing K explicitly, that is, without inverting qI + L. We define two Monte-Carlo estimators of x̂, both based on RSFs.

A first estimator of x̂. Let us consider a realisation Φ_q of a RSF and define the estimator x̃ as:

∀i,  x̃(i) = y[r_{Φ_q}(i)].   (6)

Proposition 1. x̃ is unbiased: E[x̃] = x̂. Moreover:

E(||x̃ − x̂||²) = Σ_i var(x̃(i)) = y^t (I − K²) y.

(The terminology "uniform spanning tree" comes from the fact that in unweighted graphs, the distribution of Eq. (3) reduces to the uniform distribution over all spanning trees.)

Proof.
Let i ∈ V. One has, using Eq. (5):

E[x̃(i)] = E[ y[r_{Φ_q}(i)] ] = Σ_j P(r_{Φ_q}(i) = j) y_j   (7)
         = Σ_j K_ij y_j = δ_i^t K y = x̂(i).   (8)

x̃ is thus unbiased. A similar calculation yields E[x̃(i)²] = δ_i^t K y^(2), where ∀k, y^(2)_k = y_k², such that the variance verifies:

var(x̃(i)) = E[x̃(i)²] − E[x̃(i)]² = δ_i^t K y^(2) − (δ_i^t K y)².

Summing over all nodes yields:

Σ_i var(x̃(i)) = 1^t K y^(2) − y^t K² y.

Noticing that 1 is an eigenvector of K with eigenvalue 1 (as 1 is an eigenvector of L with eigenvalue 0) ends the proof.

An improved estimator of x̂. A random forest Φ_q contains |ρ(Φ_q)| trees that we enumerate from 1 to |ρ(Φ_q)|. Consider V_k ⊂ V the subset of nodes that are in the k-th tree. As Φ_q is a spanning forest, P = {V_1, V_2, ..., V_{|ρ(Φ_q)|}} is a partition of V (i.e., a set of disjoint subsets that cover V). Let t be the tree membership function that associates to each node i the number t(i) of the tree to which it belongs. For instance, |V_{t(i)}| is the size of the tree containing node i. We define a second estimator as:

∀i,  x̄(i) = (1 / |V_{t(i)}|) Σ_{j ∈ V_{t(i)}} y_j.   (9)

Proposition 2. x̄ is unbiased: E[x̄] = x̂. Moreover:

E(||x̄ − x̂||²) = Σ_i var(x̄(i)) = y^t (K − K²) y.

Proof.
Let us denote by π the function that takes as input a forest φ and outputs its associated partition. Let us also define s_ij = 1 if t(i) = t(j), and 0 otherwise. We need Proposition 2.3 of [11]. Fixing a partition P of V, one has:

P(r_{Φ_q}(i) = j | π(Φ_q) = P) = s_ij / |V_{t(i)}|.

In words, this means that given a partition, the distribution of the root within each part V_l is uniform over V_l. Also, note that K_ij = P(r_{Φ_q}(i) = j) can be written:

K_ij = Σ_P P(r_{Φ_q}(i) = j | π(Φ_q) = P) P(π(Φ_q) = P)
     = Σ_P (s_ij / |V_{t(i)}|) P(π(Φ_q) = P)
     = Σ_P (s_ij / |V_{t(i)}|) Σ_{φ s.t. π(φ)=P} P(Φ_q = φ)
     = Σ_φ P(Φ_q = φ) s_ij / |V_{t(i)}| = E(s_ij / |V_{t(i)}|).   (10)

Given a RSF Φ_q, define the matrix S ∈ R^{n×n} with entries S_ij = s_ij / |V_{t(i)}|. Eq. (10) translates as: E(S) = K. Thus, ∀i ∈ V:

E[x̄(i)] = E[ (1/|V_{t(i)}|) Σ_{j∈V_{t(i)}} y_j ]   (11)
         = E[δ_i^t S y] = δ_i^t E[S] y = δ_i^t K y = x̂(i).   (12)

The estimator x̄ is thus unbiased. Note that one also has:

E[x̄(i)²] = E[(δ_i^t S y)²] = y^t E[S^t δ_i δ_i^t S] y,   (13)

such that

Σ_i var(x̄(i)) = Σ_i ( y^t E[S^t δ_i δ_i^t S] y − (δ_i^t K y)² )   (14)
              = y^t [ E[S^t S] − K² ] y.   (15)

Noting that S^t S = S (S is symmetric and S² = S), i.e., E[S^t S] = K, finishes the proof.

Rao-Blackwellisation.
Note that x̄ is a Rao-Blackwellisation of x̃ where the sufficient statistic is the partition induced by the RSF. As such, x̄ is necessarily an improvement over x̃. This improvement can also be observed in the variance equations of both propositions: as K is PSD with eigenvalues between 0 and 1, one has:

∀y,  y^t (K − K²) y ≤ y^t (I − K²) y.

Tikhonov denoising.
Let y = x + ε be a noisy measurement of a signal x that one wishes to recover. Assuming that the measurement noise ε is Gaussian with covariance E_ε(εε^t) = γI, one can write (for instance for the second estimator):

E_ε [ E(||x̄ − x̂||²) ] = x^t (K − K²) x + γ Tr(K − K²)
                      = ||F̄ x||² + γ Tr(F̄²),

where F̄ = U f̄(Λ) U^t is a graph bandpass filter [1, 13] with frequency response f̄(λ) = √(qλ)/(q + λ). The second term depends on the noise level γ. The first term, however, depends on the original signal x filtered by f̄: the Fourier components of x associated with graph frequencies around λ = q (maximizing f̄) are thus harder to denoise than the ones close to 0 or λ_n.

In practice.
For a given q, N independent RSFs are sampled, in time O(N|E|/q). Each RSF provides an independent estimate of x̂, and all N estimates are finally averaged.

Generalisation.
Instead of estimating results of the form (qI + L)^{−1} q y, one may need to estimate results of the form (Q + L)^{−1} Q y, where Q = diag(q) is a diagonal matrix, with q = (q_1 | ... | q_n)^t ∈ (0, +∞)^n. In order to tackle this case, one considers the following distribution over forests:

P(Φ_Q = φ) ∝ Π_{r∈ρ(φ)} q_r Π_{τ∈φ} Π_{(ij)∈τ} W_ij,   (16)

which generalizes the distribution of Eq. (4). One can efficiently sample from this distribution, also via a variant of Wilson's algorithm (see the next paragraph). The introduced estimators generalize naturally to this case. In fact, given a RSF Φ_Q, their formulation is exactly the same, the sole difference stemming from the distribution from which the forest is drawn. Propositions 1 and 2 are still correct (proofs are omitted due to lack of space) for K = (Q + L)^{−1} Q instead of K = q(qI + L)^{−1}.

Implementation.
In a nutshell, an algorithm sampling from the distribution of Eq. (16): i/ adds a node ∆ to the graph and connects it to each node i with weight q_i; ii/ runs Wilson's RandomTreeWithRoot algorithm (based on loop-erased random walks, see Figure 1 of [10]) on this augmented graph to sample a UST T rooted in ∆; iii/ cuts the edges connected to ∆ in T, yielding a RSF Φ_Q. Then, the estimator x̃ associates to each node i the value of y at its root r_{Φ_Q}(i), whereas the estimator x̄ associates to each node i the average of y over all the nodes belonging to its tree. All these operations can be done in a distributed fashion: no centralized knowledge of the graph is needed. Also, once the RSF is sampled, the computations involved in the estimators are not only distributed but can also be made in parallel (within each tree of the forest). To give an order of magnitude of computation times, for a random vector y and a fixed q, our naive Julia implementation of x̄ on a graph with n = 10 (resp. ) nodes and |E| = 10 (resp. ) edges runs on average in 35 (resp. 550) ms on a laptop. These times are to be compared to the optimized built-in sparse matrix multiplication Ly running in 6 (resp. 115) ms, which is the building block of both the conjugate gradient and polynomial approximation methods mentioned in the introduction. Our method is thus comparable to the state-of-the-art in computation time.
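As a companion sketch (ours, not the paper's Julia code), the two estimators are straightforward to compute once a forest has been sampled and each node knows its root: x̃ copies the root's value (Eq. (6)), while x̄ averages y within each tree (Eq. (9)). The toy forest below is hard-coded for illustration; in practice the `root` map would come from the RSF sampler:

```python
import numpy as np
from collections import defaultdict

def x_tilde(root, y):
    """First estimator (Eq. (6)): node i takes the value of y at its root."""
    return np.array([y[root[i]] for i in range(len(y))])

def x_bar(root, y):
    """Second estimator (Eq. (9)): node i takes the average of y over its tree."""
    trees = defaultdict(list)
    for i, r in root.items():
        trees[r].append(i)                 # group nodes by their root
    out = np.empty(len(y))
    for members in trees.values():
        out[members] = y[members].mean()   # one average per tree
    return out

# Toy forest on 5 nodes: tree {0, 1, 2} rooted at 0, tree {3, 4} rooted at 3.
root = {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
xt = x_tilde(root, y)   # constant per tree: the root's value
xb = x_bar(root, y)     # constant per tree: the tree average
```

Averaging either estimator over N independent forests gives the Monte-Carlo estimate of x̂; x̄ converges faster, in line with the Rao-Blackwellisation argument.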
4. EXPERIMENTS

Illustration on an image.
Fig. 1 shows an image denoising example on a grayscale image. Constant irregular patches are observed on the realisation of x̃ (N = 1): they are the trees of the associated RSF realisation. Also, as expected, x̄ converges faster than x̃ (as N increases) for all values of q.

Illustration on SSL.
The goal of SSL is to infer the label (or class) of all the nodes of a graph given a few pre-labelled nodes. Consider a partial labeling Y = (y_1 | y_2 | ... | y_k) ∈ R^{n×k} of the nodes, where k is the number of classes and y_l(i) is equal to 1 if node i is a priori known to belong to class l, and 0 otherwise. The objective is to find k classification functions {f_l}_{l=1,...,k} such that each f_l is on the one hand close to the labeling function y_l and on the other hand smooth on the graph, with a trade-off given by a parameter µ > 0. Depending on the choice of Laplacian used to define this graph smoothness (in the following, σ = 1 corresponds to the combinatorial Laplacian, σ = 1/2 to the normalized Laplacian, σ = 0 to the random walk Laplacian), the explicit formulation of f_l can be written as (Prop. 2.2 of [3]):

f_l = (µ/(2+µ)) ( I − (2/(2+µ)) D^{−σ} W D^{σ−1} )^{−1} y_l.

Note that this can be re-written, with K = (Q + L)^{−1} Q and Q = (µ/2) D, as:

∀l = 1, ..., k,  f_l = D^{1−σ} K D^{σ−1} y_l.

Finally, once all classification functions f_l are computed, each node i is classified in the class argmax_l f_l(i). One may use our estimators to solve the SSL classification task: ∀l, i/ use the proposed estimators on the vector D^{σ−1} y_l to estimate K D^{σ−1} y_l; ii/ left-multiply the result by D^{1−σ} to obtain an estimate of f_l; iii/ once all functions f_l have been estimated, classify each node i to argmax_l f_l(i). In the following, we choose σ = 0 and set µ = 1.

We illustrate this on the Stochastic Block Model (SBM): a random graph model with communities. Consider a SBM with n = 3000 nodes and two communities of equal size. We generate two scenarios: a well-structured (resp. fuzzy) setting with probability of intra-block connection p_in = 2 · 10^{−} (resp. 10^{−}) and inter-block connection p_out = 3 · 10^{−}, which corresponds to a sparse graph with small average degree. The following experiment is performed: i/ choose m nodes randomly in each community to serve as a priori knowledge on the labels and use them to define the two label functions y_l; ii/ compute the two classification functions f_l either via direct computation (method referred to as x̂) or via our proposed estimators; iii/ classify each node i in community argmax_l [f_1(i), f_2(i)]. The performance of the community recovery is measured by the Adjusted Rand Index (ARI), a number between −1 and 1: the larger it is, the more accurate the recovery of the ground truth. Results are shown in Fig. 2. The estimator x̄ matches the performance of x̂ after N = 500 forest realizations. Also, the smaller the amount of prior knowledge m and the fuzzier the block structure, the harder it is to match x̂.

Fig. 1. Illustration on an image. A grayscale image x is considered as a graph signal on an unweighted 2D grid graph where each pixel is a node connected to its four immediate neighbours. y = x + ε is a noisy measurement of x (ε is Gaussian with covariance matrix γI). x̂ = q(qI + L)^{−1} y is the exact Tikhonov denoised signal (here with q = 1) that we try to estimate. Bottom line: the two left images show estimates of x̂ obtained with the RSF-based estimators x̃ and x̄ detailed in Section 3. Averaging over N = 20 forest realisations, one obtains the two images on the right. Finally, the top-right figure is the Peak Signal-to-Noise Ratio (PSNR) of the denoised images (averaged over realisations of ε). As usual in these scenarios, there exists an optimal regularization parameter (a value of q maximizing the PSNR).

Fig. 2. Illustration on SSL. Performance of the community recovery in a SBM with two equal-size classes, vs. the number of pre-labeled nodes m in each class. Results are averaged over 10 realisations of the SBM. Left: setting with strong communities. Right: setting with fuzzy communities.
A closer look at the sampled forests shows that some trees do not contain any labeled nodes, thus failing to propagate the label information. This proof-of-concept could be improved in various ways to avoid this difficulty, but that goes beyond the scope of this paper.
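The SSL pipeline above can be sketched end-to-end with the exact K (a direct solve standing in for the forest-based estimate of K D^{σ−1} y_l; the graph, labels, and function name are illustrative, not from the paper):

```python
import numpy as np

def ssl_classify(W, labels, mu=1.0, sigma=0.0):
    """SSL by graph smoothing: f_l = D^{1-sigma} K D^{sigma-1} y_l,
    with K = (Q + L)^{-1} Q and Q = (mu/2) D, computed here by a
    direct solve. `labels` maps a few node indices to their class."""
    n = W.shape[0]
    d = W.sum(axis=1)
    L = np.diag(d) - W                         # combinatorial Laplacian
    Q = (mu / 2.0) * np.diag(d)
    K = np.linalg.solve(Q + L, Q)              # K = (Q+L)^{-1} Q
    k = max(labels.values()) + 1
    Y = np.zeros((n, k))                       # one-hot label functions y_l
    for i, c in labels.items():
        Y[i, c] = 1.0
    F = np.diag(d ** (1 - sigma)) @ K @ np.diag(d ** (sigma - 1)) @ Y
    return F.argmax(axis=1)                    # class of each node

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3);
# one labeled node per class.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
classes = ssl_classify(W, {0: 0, 5: 1})
```

Replacing the direct solve with the RSF estimators applied to each column D^{σ−1} y_l recovers steps i/–iii/ of the pipeline.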
5. CONCLUSION
We provide an original and scalable method enabling the estimation of graph smoothing operations in large graphs. In future work, we will further explore this deep link between RSFs and Laplacian-based numerical linear algebra, to apply these ideas to more advanced graph signal processing operations.

6. REFERENCES

[1] D.I. Shuman, S.K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, May 2013.
[2] A. Sandryhaila and J.M.F. Moura, "Big Data Analysis with Signal Processing on Graphs: Representation and processing of massive data sets with irregular structure," IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 80–90, Sept. 2014.
[3] Konstantin Avrachenkov, Alexey Mishenin, Paulo Gonçalves, and Marina Sokol, "Generalized Optimization Framework for Graph-based Semi-supervised Learning," in Proceedings of the 2012 SIAM International Conference on Data Mining, pp. 966–974.
[4] Gilles Puy, Nicolas Tremblay, Rémi Gribonval, and Pierre Vandergheynst, "Random sampling of bandlimited signals on graphs," Applied and Computational Harmonic Analysis, 2016.
[5] Yousef Saad, Iterative Methods for Sparse Linear Systems, vol. 82, SIAM, 2003.
[6] D.I. Shuman, P. Vandergheynst, and P. Frossard, "Chebyshev polynomial approximation for distributed signal processing," in Distributed Computing in Sensor Systems and Workshops (DCOSS), 2011 International Conference on, June 2011, pp. 1–8.
[7] F.R.K. Chung, Spectral Graph Theory, Number 92, American Mathematical Society, 1997.
[8] Rafig Agaev and Pavel Chebotarev, "Spanning Forests of a Digraph and Their Applications," Automation and Remote Control, vol. 62, no. 3, pp. 443–466, 2001, arXiv: math/0602061.
[9] Simon Barthelmé, Nicolas Tremblay, Alexandre Gaudillière, Luca Avena, and Pierre-Olivier Amblard, "Estimating the inverse trace using random forests on graphs," in Proc. GRETSI Symposium Signal and Image Processing, Lille, France, 2019, arXiv: 1905.02086.
[10] David Bruce Wilson, "Generating random spanning trees more quickly than the cover time," in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, 1996, pp. 296–303, ACM.
[11] L. Avena and A. Gaudillière, "Two Applications of Random Spanning Forests," Journal of Theoretical Probability, July 2017.
[12] Luca Avena and Alexandre Gaudillière, "Random spanning forests, Markov matrix spectra and well distributed points," arXiv:1310.1723 [math], Oct. 2013.
[13] Nicolas Tremblay, Paulo Gonçalves, and Pierre Borgnat, "Chapter 11 - Design of Graph Filters and Filterbanks," in