Smoothing graph signals via random spanning forests
Yusuf Y. Pilavci, Pierre-Olivier Amblard, Simon Barthelmé, Nicolas Tremblay
CNRS, Univ. Grenoble Alpes, Grenoble INP, GIPSA-lab, Grenoble, France
ABSTRACT
Another facet of the elegant link between random processes on graphs and Laplacian-based numerical linear algebra is uncovered: based on random spanning forests, novel Monte-Carlo estimators for graph signal smoothing are proposed. These random forests are sampled efficiently via a variant of Wilson's algorithm, in time linear in the number of edges. The theoretical variance of the proposed estimators is analyzed, and their application to several problems is considered, such as Tikhonov denoising of graph signals or semi-supervised learning for node classification on graphs.
Index Terms — graph signal processing, smoothing, random spanning forests
1. INTRODUCTION
Tikhonov denoising of graph signals [1, 2], semi-supervised learning for node classification in graphs [3], and post-sampling graph signal reconstruction [4] are a few examples falling in the following class of optimization problems. Given a graph signal y = (y_1 | ... | y_n)^t ∈ R^n, where y_i is the value measured at node i of a graph, one considers the problem

x̂ = argmin_{z ∈ R^n} q ||y − z||² + z^t L z,   (1)

where L is the Laplacian of the graph and q > 0 is a parameter tuning the trade-off between a data-fidelity term ||y − z||², encouraging the solution x̂ to be close to y, and a smoothness term z^t L z, encouraging the solution x̂ to vary slowly along any path of the graph. In Tikhonov denoising, y is a noisy measurement of an underlying signal x that one wants to recover. In semi-supervised learning (SSL), y contains known labels that one wants to propagate on the graph to classify all the nodes in different classes (see Section 4 for more details). This optimization problem admits an explicit solution:

x̂ = K y with K = q (qI + L)^{−1} ∈ R^{n×n},   (2)

which requires the inversion of a regularized Laplacian qI + L, where I is the identity matrix. Computing K costs O(n³) elementary operations and becomes prohibitive as n increases.

This work was partly funded by the ANR GenGP (ANR-16-CE23-0008), the ANR GraVa (ANR-18-CE40-0005), the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01), the CNRS PEPS I3A (Project RW4SPEC), the Grenoble Data Institute (ANR-15-IDEX-02), the LIA CNRS/Melbourne Univ Geodesic, and the MIAI chair "LargeDATA at UGA".
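For concreteness, Eq. (2) can be evaluated on a small graph by solving the linear system (qI + L) x̂ = q y rather than forming the inverse explicitly. The sketch below (plain NumPy, on a toy 4-node path graph of our choosing, not from the paper) illustrates the smoothing effect:

```python
import numpy as np

# Toy graph: an unweighted path on 4 nodes.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W        # combinatorial Laplacian L = D - W

q = 1.0
y = np.array([1.0, 0.0, 0.0, 1.0])    # a (noisy) graph signal

# Eq. (2): x_hat = q (qI + L)^{-1} y, computed by a linear solve
# instead of an explicit O(n^3) inversion.
x_hat = np.linalg.solve(q * np.eye(len(y)) + L, q * y)
```

Since the all-ones vector is an eigenvector of K with eigenvalue 1, the smoothed signal preserves the total mass of y, and its extreme values are pulled inward.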
State-of-the-art.
For large n, the state-of-the-art includes iterative methods such as preconditioned conjugate gradient [5], where L is only accessed via matrix-vector multiplications of the form Lz, and polynomial approximation methods [6] that approximate K by a low-order polynomial in L. Both classes of methods enable the computation of x̂ in time linear in |E|, the number of edges of the graph.

Contribution.
We introduce two Monte-Carlo estimators of x̂ based on random spanning forests, that:
• show another facet of the elegant link between random processes on graphs and Laplacian-based numerical linear algebra, such as in [7, 8, 9];
• scale linearly in |E| and are thus useful on very large graphs;
• can be implemented in a fully distributed fashion: as long as each node is able to communicate with its neighbours, the result can be obtained without centralized knowledge of the graph's structure (see the implementation paragraph at the end of Section 3).
The Julia code implementing these estimators and reproducing this paper's results is available on the authors' website (nicolas.tremblay/files/graph_smoothing_via_RSF.zip).

Structure of the paper.
We provide the necessary notations and background in Section 2. In Section 3, we detail the two novel estimators before generalizing them to cases where K is of the form K = (Q + L)^{−1} Q, with Q a positive diagonal matrix. The experimental Section 4 gives an illustration on image denoising before showing how to use our estimators in the context of SSL. We conclude in Section 5.
2. BACKGROUND

Notations and preliminary definitions.
We consider undirected graphs G = (V, E, W) where V is the set of |V| = n nodes, E the set of |E| edges, and W ∈ R^{n×n} the symmetric weighted adjacency matrix. We denote by d_i = Σ_j W_ij the degree of node i, by d = (d_1, d_2, ..., d_n)^t ∈ R^n the degree vector, and by D = diag(d) ∈ R^{n×n} the diagonal degree matrix. In this paper, we consider the Laplacian matrix defined as L = D − W ∈ R^{n×n}. L is positive semi-definite (PSD) [7] and its eigenvalues can be ordered as 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n. We also consider graph signals x ∈ R^n defined over the nodes V: x_i = x(i) is the signal's value at node i. We denote by 1 ∈ R^n the constant vector equal to 1, and by δ_i ∈ R^n the Kronecker delta such that δ_i(j) = 1 if j = i, and 0 otherwise.

A tree of G is understood as a connected subgraph of G that does not contain cycles. A rooted tree of G, i.e., a tree of G whose edges are all oriented towards one node called the root, is generically denoted by τ. A rooted forest of G, i.e., a set of disjoint rooted trees, is generically denoted by φ. In the following, we only consider rooted trees and rooted forests, and thus simply refer to them as trees and forests. Also, ρ(φ) stands for the set of roots of the trees in φ. As such, |ρ(φ)| is the total number of trees in φ. We define the function r_φ : V → V such that r_φ(i) is the root of the tree containing node i in the forest φ. Node i is said to be rooted in r_φ(i).

Random spanning forests (RSFs).
Let us recall that a spanning tree (resp. forest) of G is a tree (resp. forest) of G that spans all nodes in V. For a given graph G, there exist in general many spanning trees (and even more spanning forests). Probabilists and computer scientists have studied several distributions over spanning trees and forests in the past. A classical distribution over spanning trees is:

P(T = τ) ∝ Π_{(ij)∈τ} W_ij.   (3)

Sampling from this distribution yields a so-called uniform spanning tree (UST) T, and can be efficiently performed by Wilson's algorithm [10] via loop-erased random walks. We focus here on the following parametrized distribution over spanning forests:

P(Φ_q = φ) ∝ q^{|ρ(φ)|} Π_{τ∈φ} Π_{(ij)∈τ} W_ij,   q ∈ R+*.   (4)

Sampling from this distribution yields a RSF Φ_q and is efficiently performed (in time O(|E|/q)) via a variant of Wilson's algorithm [11]. Many properties of Φ_q are known [11, 12]. For instance, the probability that node i is rooted in j is (see Lemma 3.3 in [12]):

P(r_{Φ_q}(i) = j) = K_ij.   (5)
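To make the sampling step concrete, here is a minimal, non-distributed Python sketch of the variant of Wilson's algorithm: a loop-erased random walk that, at node u of degree d_u, is interrupted with probability q/(q + d_u) (u then becomes a root) and otherwise moves to a neighbour v with probability W_uv/(q + d_u). The graph encoding and the function name are our own, not from the paper:

```python
import random

def sample_rsf(adj, q, rng=random):
    """Sample a random spanning forest Phi_q via a variant of Wilson's
    algorithm. `adj` maps each node to a list of (neighbour, weight) pairs.
    Returns a dict mapping each node to the root of its tree."""
    in_forest = {u: False for u in adj}
    nxt = {u: None for u in adj}
    for i in adj:
        u = i
        while not in_forest[u]:               # walk until absorbed or in forest
            d = sum(w for _, w in adj[u])
            if rng.random() < q / (q + d):
                in_forest[u] = True           # interruption: u becomes a root
                nxt[u] = None
                break
            r = rng.random() * d              # pick a neighbour prop. to weight
            for v, w in adj[u]:
                r -= w
                if r <= 0:
                    break
            nxt[u] = v                        # overwriting nxt erases loops
            u = v
        u = i                                 # retrace the loop-erased path
        while not in_forest[u]:
            in_forest[u] = True
            u = nxt[u]
    root = {}
    for i in adj:                             # follow pointers up to the root
        u = i
        while nxt[u] is not None:
            u = nxt[u]
        root[i] = u
    return root

# Example: an unweighted path graph 0 - 1 - 2.
adj = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0)]}
root = sample_rsf(adj, q=1.0, rng=random.Random(0))
```

On this 3-node path with q = 1, Eq. (5) predicts for instance P(r_{Φ_q}(0) = 0) = K_00 = 5/8, which can be checked empirically by averaging over many sampled forests.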
3. RSF-BASED ESTIMATORS
Given a signal y, our goal is to compute x̂ = K y without computing K explicitly, that is, without inverting qI + L. We define two Monte-Carlo estimators of x̂, both based on RSFs.

A first estimator of x̂. Let us consider a realisation Φ_q of a RSF and define the estimator x̃ as:

∀i,  x̃(i) = y[r_{Φ_q}(i)].   (6)

Proposition 1. x̃ is unbiased: E[x̃] = x̂. Moreover:

E(||x̃ − x̂||²) = Σ_i var(x̃(i)) = y^t (I − K²) y.

(The terminology "uniform spanning tree" comes from the fact that in unweighted graphs, the distribution of Eq. (3) reduces to the uniform distribution over all spanning trees.)

Proof.
Let i ∈ V. One has, using Eq. (5):

E[x̃(i)] = E[ y[r_{Φ_q}(i)] ] = Σ_j P(r_{Φ_q}(i) = j) y_j   (7)
         = Σ_j K_ij y_j = δ_i^t K y = x̂(i).   (8)

x̃ is thus unbiased. A similar calculation yields E[x̃(i)²] = δ_i^t K y^(2), where ∀k, y^(2)_k = y_k², such that the variance verifies:

var(x̃(i)) = E[x̃(i)²] − E[x̃(i)]² = δ_i^t K y^(2) − (δ_i^t K y)².

Summing over all nodes yields:

Σ_i var(x̃(i)) = 1^t K y^(2) − y^t K² y.

Noticing that 1 is an eigenvector of K with eigenvalue 1 (as 1 is an eigenvector of L with eigenvalue 0) ends the proof.

An improved estimator of x̂. A random forest Φ_q contains |ρ(Φ_q)| trees that we enumerate from 1 to |ρ(Φ_q)|. Consider V_k ⊂ V the subset of nodes that are in the k-th tree. As Φ_q is a spanning forest, P = {V_1, V_2, ..., V_{|ρ(Φ_q)|}} is a partition of V (i.e., a set of disjoint subsets that cover V). Let t be the tree membership function that associates to each node i the number t(i) of the tree to which it belongs. For instance, |V_{t(i)}| is the size of the tree containing node i. We define a second estimator as:

∀i,  x̄(i) = (1 / |V_{t(i)}|) Σ_{j ∈ V_{t(i)}} y_j.   (9)

Proposition 2. x̄ is unbiased: E[x̄] = x̂. Moreover:

E(||x̄ − x̂||²) = Σ_i var(x̄(i)) = y^t (K − K²) y.

Proof.
Let us denote by π the function that takes as input a forest φ and outputs its associated partition. Let us also define s_ij = 1 if t(i) = t(j), and 0 otherwise. We need Proposition 2.3 of [11]. Fixing a partition P of V, one has:

P(r_{Φ_q}(i) = j | π(Φ_q) = P) = s_ij / |V_{t(i)}|.

In words, this means that given a partition, the distribution of the root within each part V_l is uniform over V_l. Also, note that K_ij = P(r_{Φ_q}(i) = j) can be written:

K_ij = Σ_P P(r_{Φ_q}(i) = j | π(Φ_q) = P) P(π(Φ_q) = P)
     = Σ_P (s_ij / |V_{t(i)}|) P(π(Φ_q) = P)
     = Σ_P (s_ij / |V_{t(i)}|) Σ_{φ s.t. π(φ)=P} P(Φ_q = φ)
     = Σ_φ P(Φ_q = φ) s_ij / |V_{t(i)}| = E(s_ij / |V_{t(i)}|).   (10)

Given a RSF Φ_q, define the matrix S ∈ R^{n×n} with entries S_ij = s_ij / |V_{t(i)}|. Eq. (10) translates as: E(S) = K. Thus, ∀i ∈ V:

E[x̄(i)] = E[ (1/|V_{t(i)}|) Σ_{j∈V_{t(i)}} y_j ]   (11)
         = E[δ_i^t S y] = δ_i^t E[S] y = δ_i^t K y = x̂(i).   (12)

The estimator x̄ is thus unbiased. Note that one also has:

E[x̄(i)²] = E[(δ_i^t S y)²] = y^t E[S^t δ_i δ_i^t S] y,   (13)

such that

Σ_i var(x̄(i)) = Σ_i ( y^t E[S^t δ_i δ_i^t S] y − (δ_i^t K y)² )   (14)
              = y^t [ E[S^t S] − K² ] y.   (15)

Noting that S^t S = S (S is symmetric and S² = S), i.e., E[S^t S] = K, finishes the proof.

Rao-Blackwellisation.
Note that x̄ is a Rao-Blackwellisation of x̃ where the sufficient statistic is the partition induced by the RSF. As such, x̄ is necessarily an improvement over x̃. This improvement can also be observed in the variance equations of both propositions: as K is PSD with eigenvalues between 0 and 1, one has:

∀y,  y^t (K − K²) y ≤ y^t (I − K²) y.

Tikhonov denoising.
Let y = x + ε be a noisy measurement of a signal x that one wishes to recover. Assuming that the measurement noise ε is Gaussian with covariance E_ε(εε^t) = γI, one can write (for instance for the second estimator):

E_ε [ E(||x̄ − x̂||²) ] = x^t (K − K²) x + γ Tr(K − K²)
                      = ||F̄ x||² + γ Tr(F̄²),

where F̄ = U f̄(Λ) U^t is a graph bandpass filter [1, 13] with frequency response f̄(λ) = √(qλ)/(q + λ). The second term depends on the noise level γ. The first term, however, depends on the original signal x filtered by f̄: the Fourier components of x associated with graph frequencies around λ = q (maximizing f̄) are thus harder to denoise than the ones close to 0 or λ_n.

In practice.
For a given q, N independent RSFs are sampled, in time O(N|E|/q). Each RSF provides an independent estimate of x̂, and all N estimates are finally averaged.

Generalisation.
Instead of estimating results of the form (qI + L)^{−1} q y, one may need to estimate results of the form (Q + L)^{−1} Q y, where Q = diag(q) is a diagonal matrix, with q = (q_1 | ... | q_n)^t ∈ (0, +∞)^n. In order to tackle this case, one considers the following distribution over forests:

P(Φ_Q = φ) ∝ Π_{r∈ρ(φ)} q_r Π_{τ∈φ} Π_{(ij)∈τ} W_ij,   (16)

which generalizes the distribution of Eq. (4). One can efficiently sample from this distribution, also via a variant of Wilson's algorithm (see the next paragraph). The introduced estimators generalize naturally to this case. In fact, given a RSF Φ_Q, their formulation is exactly the same, the sole difference stemming from the distribution from which the forest is drawn. Propositions 1 and 2 are still correct (proofs are omitted due to lack of space) for K = (Q + L)^{−1} Q instead of K = q(qI + L)^{−1}.

Implementation.
In a nutshell, an algorithm sampling from the distribution of Eq. (16): i/ adds a node ∆ to the graph and connects it to each node i with weight q_i; ii/ runs Wilson's RandomTreeWithRoot algorithm (based on loop-erased random walks, see Figure 1 of [10]) on this augmented graph to sample a UST T rooted in ∆; iii/ cuts the edges connected to ∆ in T, yielding a RSF Φ_Q. Then, the estimator x̃ associates to each node i the value of y at its root r_{Φ_Q}(i), whereas the estimator x̄ associates to each node i the average of y over all the nodes belonging to its tree. All these operations can be done in a distributed fashion: no centralized knowledge of the graph is needed. Also, once the RSF is sampled, the computations involved in the estimators are not only distributed but can also be made in parallel (within each tree of the forest). To give an order of magnitude of computation times, for a random vector y and a fixed q, our naive Julia implementation of x̄ on a graph with n = 10 (resp. ) nodes and |E| = 10 (resp. ) edges runs on average in 35 (resp. 550) ms on a laptop. These times are to be compared to the optimized built-in sparse matrix multiplication Ly running in 6 (resp. 115) ms, which is the building block of both the conjugate gradient and polynomial approximation methods mentioned in the introduction. Our method is thus comparable to the state-of-the-art in computation time.
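As a companion sketch (ours, not the paper's Julia code), the two estimators are straightforward to compute once a forest has been sampled and each node knows its root: x̃ copies the root's value (Eq. (6)), while x̄ averages y within each tree (Eq. (9)). The toy forest below is hard-coded for illustration; in practice the `root` map would come from the RSF sampler:

```python
import numpy as np
from collections import defaultdict

def x_tilde(root, y):
    """First estimator (Eq. (6)): node i takes the value of y at its root."""
    return np.array([y[root[i]] for i in range(len(y))])

def x_bar(root, y):
    """Second estimator (Eq. (9)): node i takes the average of y over its tree."""
    trees = defaultdict(list)
    for i, r in root.items():
        trees[r].append(i)                 # group nodes by their root
    out = np.empty(len(y))
    for members in trees.values():
        out[members] = y[members].mean()   # one average per tree
    return out

# Toy forest on 5 nodes: tree {0, 1, 2} rooted at 0, tree {3, 4} rooted at 3.
root = {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
xt = x_tilde(root, y)   # constant per tree: the root's value
xb = x_bar(root, y)     # constant per tree: the tree average
```

Averaging either estimator over N independent forests gives the Monte-Carlo estimate of x̂; x̄ converges faster, in line with the Rao-Blackwellisation argument.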
4. EXPERIMENTS

Illustration on an image.
Fig. 1 shows an image denoising example on a grayscale image. Constant irregular patches are observed on the realisation of x̃ (N = 1): they are the trees of the associated RSF realisation. Also, as expected, x̄ converges faster than x̃ (as N increases) for all values of q.

Illustration on SSL.
The goal of SSL is to infer the label (or class) of all the nodes of a graph given a few pre-labelled nodes. Consider a partial labeling Y = (y_1 | y_2 | ... | y_k) ∈ R^{n×k} of the nodes, where k is the number of classes and y_l(i) is equal to 1 if node i is a priori known to belong to class l, and 0 otherwise. The objective is to find k classification functions {f_l}_{l=1,...,k} such that each f_l is on the one hand close to the labeling function y_l and on the other hand smooth on the graph, with a trade-off given by a parameter µ > 0. Depending on the choice of Laplacian used to define this graph smoothness (in the following, σ = 1 corresponds to the combinatorial Laplacian, σ = 1/2 to the normalized Laplacian, σ = 0 to the random walk Laplacian), the explicit formulation of f_l can be written as (Prop. 2.2 of [3]):

f_l = (µ/(2+µ)) ( I − (2/(2+µ)) D^{−σ} W D^{σ−1} )^{−1} y_l.

Note that this can be re-written, with K = (Q + L)^{−1} Q and Q = (µ/2) D, as:

∀l = 1, ..., k,  f_l = D^{1−σ} K D^{σ−1} y_l.

Finally, once all classification functions f_l are computed, each node i is classified in the class argmax_l f_l(i). One may use our estimators to solve the SSL classification task: ∀l, i/ use the proposed estimators on the vector D^{σ−1} y_l to estimate K D^{σ−1} y_l; ii/ left-multiply the result by D^{1−σ} to obtain an estimate of f_l; iii/ once all functions f_l have been estimated, classify each node i to argmax_l f_l(i). In the following, we choose σ = 0 and set µ = 1.

We illustrate this on the Stochastic Block Model (SBM): a random graph model with communities. Consider a SBM with n = 3000 nodes and two communities of equal size. We generate two scenarios: a well-structured (resp. fuzzy) setting with probability of intra-block connection p_in = 2 · 10^{−} (resp. 10^{−}) and inter-block connection p_out = 3 · 10^{−}, which corresponds to a sparse graph with small average degree. The following experiment is performed: i/ choose m nodes randomly in each community to serve as a priori knowledge on the labels and use them to define the two label functions y_l; ii/ compute the two classification functions f_l either via direct computation (method referred to as x̂) or via our proposed estimators; iii/ classify each node i in community argmax_l [f_1(i), f_2(i)]. The performance of the community recovery is measured by the Adjusted Rand Index (ARI), a number between −1 and 1: the larger it is, the more accurate the recovery of the ground truth. Results are shown in Fig. 2. The estimator x̄ matches the performance of x̂ after N = 500 forest realizations. Also, the smaller the amount of prior knowledge m and the fuzzier the block structure, the harder it is to match x̂.

Fig. 1. Illustration on an image. A grayscale image x is considered as a graph signal on an unweighted 2D grid graph where each pixel is a node connected to its four immediate neighbours. y = x + ε is a noisy measurement of x (ε is Gaussian with covariance matrix γI). x̂ = q(qI + L)^{−1} y is the exact Tikhonov denoised signal (here with q = 1) that we try to estimate. Bottom line: the two left images show estimates of x̂ obtained with the RSF-based estimators x̃ and x̄ detailed in Section 3. Averaging over N = 20 forest realisations, one obtains the two images on the right. Finally, the top-right figure is the Peak Signal-to-Noise Ratio (PSNR) of the denoised images (averaged over realisations of ε). As usual in these scenarios, there exists an optimal regularization parameter (a value of q maximizing the PSNR).

Fig. 2. Illustration on SSL. Performance of the community recovery in a SBM with two equal-size classes, vs. the number of pre-labeled nodes m in each class. Results are averaged over 10 realisations of the SBM. Left: setting with strong communities. Right: setting with fuzzy communities.
A closer look at the sampled forests shows that some trees do not contain any labeled nodes, thus failing to propagate the label information. This proof-of-concept could be improved in various ways to avoid this difficulty, but that goes beyond the scope of this paper.
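The SSL pipeline above can be sketched end-to-end with the exact K (a direct solve standing in for the forest-based estimate of K D^{σ−1} y_l; the graph, labels, and function name are illustrative, not from the paper):

```python
import numpy as np

def ssl_classify(W, labels, mu=1.0, sigma=0.0):
    """SSL by graph smoothing: f_l = D^{1-sigma} K D^{sigma-1} y_l,
    with K = (Q + L)^{-1} Q and Q = (mu/2) D, computed here by a
    direct solve. `labels` maps a few node indices to their class."""
    n = W.shape[0]
    d = W.sum(axis=1)
    L = np.diag(d) - W                         # combinatorial Laplacian
    Q = (mu / 2.0) * np.diag(d)
    K = np.linalg.solve(Q + L, Q)              # K = (Q+L)^{-1} Q
    k = max(labels.values()) + 1
    Y = np.zeros((n, k))                       # one-hot label functions y_l
    for i, c in labels.items():
        Y[i, c] = 1.0
    F = np.diag(d ** (1 - sigma)) @ K @ np.diag(d ** (sigma - 1)) @ Y
    return F.argmax(axis=1)                    # class of each node

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3);
# one labeled node per class.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
classes = ssl_classify(W, {0: 0, 5: 1})
```

Replacing the direct solve with the RSF estimators applied to each column D^{σ−1} y_l recovers steps i/–iii/ of the pipeline.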
5. CONCLUSION
We provide an original and scalable method enabling the estimation of graph smoothing operations in large graphs. In future work, we will further explore this deep link between RSFs and Laplacian-based numerical linear algebra, to apply these ideas to more advanced graph signal processing operations.

6. REFERENCES

[1] D.I. Shuman, S.K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, May 2013.
[2] A. Sandryhaila and J.M.F. Moura, "Big Data Analysis with Signal Processing on Graphs: Representation and processing of massive data sets with irregular structure," IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 80–90, Sept. 2014.
[3] Konstantin Avrachenkov, Alexey Mishenin, Paulo Gonçalves, and Marina Sokol, "Generalized Optimization Framework for Graph-based Semi-supervised Learning," in Proceedings of the 2012 SIAM International Conference on Data Mining, pp. 966–974.
[4] Gilles Puy, Nicolas Tremblay, Rémi Gribonval, and Pierre Vandergheynst, "Random sampling of bandlimited signals on graphs," Applied and Computational Harmonic Analysis, 2016.
[5] Yousef Saad, Iterative Methods for Sparse Linear Systems, vol. 82, SIAM, 2003.
[6] D.I. Shuman, P. Vandergheynst, and P. Frossard, "Chebyshev polynomial approximation for distributed signal processing," in Distributed Computing in Sensor Systems and Workshops (DCOSS), 2011 International Conference on, June 2011, pp. 1–8.
[7] F.R.K. Chung, Spectral Graph Theory, Number 92, American Mathematical Society, 1997.
[8] Rafig Agaev and Pavel Chebotarev, "Spanning Forests of a Digraph and Their Applications," Automation and Remote Control, vol. 62, no. 3, pp. 443–466, 2001, arXiv: math/0602061.
[9] Simon Barthelmé, Nicolas Tremblay, Alexandre Gaudillière, Luca Avena, and Pierre-Olivier Amblard, "Estimating the inverse trace using random forests on graphs," in Proc. GRETSI Symposium Signal and Image Processing, Lille, France, 2019, arXiv: 1905.02086.
[10] David Bruce Wilson, "Generating random spanning trees more quickly than the cover time," in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, 1996, pp. 296–303, ACM.
[11] L. Avena and A. Gaudillière, "Two Applications of Random Spanning Forests," Journal of Theoretical Probability, July 2017.
[12] Luca Avena and Alexandre Gaudillière, "Random spanning forests, Markov matrix spectra and well distributed points," arXiv:1310.1723 [math], Oct. 2013.
[13] Nicolas Tremblay, Paulo Gonçalves, and Pierre Borgnat, "Chapter 11 - Design of Graph Filters and Filterbanks," in