Topology-Aware Joint Graph Filter and Edge Weight Identification for Network Processes
2020 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 21–24, 2020, ESPOO, FINLAND
Alberto Natali, Mario Coutino and Geert Leus
Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
E-mails: {a.natali; m.a.coutinominguez; g.j.t.leus}@tudelft.nl

ABSTRACT
Data defined over a network have been successfully modelled by means of graph filters. However, although in many scenarios the connectivity of the network is known, e.g., smart grids, social networks, etc., the lack of well-defined interaction weights hinders the ability to model the observed networked data using graph filters. Therefore, in this paper, we focus on the joint identification of coefficients and graph weights defining the graph filter that best models the observed input/output network data. While these two problems have been mostly addressed separately, we here propose an iterative method that exploits the knowledge of the support of the graph for the joint identification of graph filter coefficients and edge weights. We further show that our iterative scheme guarantees a non-increasing cost at every iteration, ensuring a globally-convergent behavior. Numerical experiments confirm the applicability of our proposed approach.
Index Terms — Filtering over graphs, graph signal processing, graph filter identification, networked data modeling, topology identification
1. INTRODUCTION
The increasing amount of networked data, also conceptualized as graph signals within the graph signal processing (GSP) field [1] [2], has gained a lot of attention in the scientific community. Due to this, many signal processing tasks have been adapted towards their networked counterpart, as extensively detailed in [3].

In the graph setting, it is common to parameterize network processes through graph filters, due to their versatility and their natural distributed implementation [4] [5]. They play an important role within GSP, with applications ranging from reconstruction [6] [7] [8], denoising [9] and classification [10], to forecasting [11] [12] and (graph-)convolutional neural networks [13]. Notable recent advances in such structures are [14], which generalizes state-of-the-art graph filters to filters where every node weights the signal of its neighbors with different values, and [15], which extends the classical problem of blind system identification or blind deconvolution to the graph setting.

Given the structure of the graph, encoded by the so-called graph shift operator (GSO) [2], and assuming a process modelled by a graph filter, identifying an underlying network process from input/output networked data amounts to estimating the graph filter coefficients, thus alleviating the estimation workload [2] [16]. A key assumption in graph filtering is the knowledge of the GSO, which can be obtained from some other field of research or can be estimated from historical data. The latter relates to network topology inference or graph learning which, in recent years, has experienced an exponentially-increasing scientific interest, see, e.g., [17] [18] [19].

Related to the scenario we are going to consider, there are also works that model the observed signal as the output of an unknown graph filter over an unknown graph.
In [20], a two-step GSO identification approach is taken, where first the GSO's eigenvectors are identified from the diffused (stationary) graph signals and then the GSO's eigenvalues are estimated based on some general properties of the GSO. In [21], the work of [20] is extended to non-stationary graph signals, entailing the solution of a system of quadratic matrix equations. Using the same approach, the problem of directed network topology identification is investigated in [22]. Note, though, that none of the above works focuses on estimating the related graph filter. More similar to our work is the approach of [23], where not only the GSO but also the filter taps are learned. Although the context of [23] is different, in that work a general linear filter operator is estimated from the data and then both the GSO and the filter taps are recovered from it.

All the previous approaches rely on a multi-step algorithm and only exploit some general properties of the GSO, e.g., sparsity. In addition, in many practical networks such as social and supply networks, the support of the graph is a priori known, that is, the connections between different entities of the network are already known, yet their importance might be unknown. This information is not directly handled by the above algorithms.

Motivated by the above reasons, this work aims to jointly estimate the graph filter coefficients and the weights of the network topology. This joint approach leads to an optimization problem that is non-convex. We tackle the non-convexity of the problem by building on sequential convex programming (SCP), a local optimization tool for non-convex problems that leverages the convex optimization machinery. We show that an alternating minimization between the filter coefficients and the GSO guarantees that the objective function value at each iteration is non-increasing, obtaining a globally convergent method.
2. PRELIMINARIES
In this section, we introduce the GSP background material necessary for the rest of the paper, including the formal definition of graph signals and the core concepts of graph filtering and topology identification.
Graph Signal Processing
We consider the case in which the data of interest live in a non-Euclidean domain, described by the undirected graph G = (V, E, S), where V = {1, ..., N} is the set of nodes (or vertices), E ⊆ V × V is the set of edges, and S is a symmetric N × N matrix that represents the graph structure. The matrix S is called the graph shift operator (GSO) [2], whose entries [S]_ij for i ≠ j are different from zero only if nodes i and j are connected by an edge. Typical choices of the GSO include the (weighted) adjacency matrix W [2] and the graph Laplacian L [1].

This allows us to define a graph signal, denoted by the vector x ∈ R^N, as a mapping from the node set to the set of real vectors; that is, x : V → R^N. In this way, x_i ∈ R is a scalar that represents the signal value at node i. Because S reflects the local connectivity of G, the operation Sx performs at each node a local computation, enabling us to introduce the concept of filtering in the graph setting.
Graph Filters

We can process a graph signal x by means of a so-called graph filter [2] as:

    y = H(h, S) x = Σ_{k=0}^{K} h_k S^k x,    (1)

where K is the order of the filter, H(h, S) is a polynomial matrix in S, and h := [h_0, ..., h_K]^⊤ is the vector that contains the filter taps. Due to the locality of S, graph filters represent linear transformations that can be implemented in a distributed setting [20]. More formally, the output entry y_i of y at node i is a linear combination of K + 1 terms: the first term is the signal value x_i of node i; the k-th term (k = 1, 2, ..., K) combines signal values x_j from the k-hop neighbors of node i.
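As an illustration of (1), the filter output can be computed with K repeated applications of the shift, without ever forming the matrix powers S^k explicitly; a minimal numpy sketch (the function and variable names are ours, not from the paper):

```python
import numpy as np

def graph_filter(h, S, x):
    """Compute y = sum_{k=0}^{K} h_k S^k x via K repeated shifts.

    Each step only needs the local one-hop operation S @ (previous shift),
    which is what makes graph filters amenable to distributed implementation.
    """
    y = h[0] * x
    Skx = x
    for hk in h[1:]:
        Skx = S @ Skx        # k-hop shifted version of the input
        y = y + hk * Skx
    return y
```

Because each multiplication by S only mixes one-hop neighbor values, this recursion is the basis of the distributed implementations mentioned above.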
Topology Identification

When the connections of the network cannot be directly observed or the network is just a conceptual model of pair-wise relationships among entities, a fundamental question is how to learn its structure from the graph signals. Formally, consider the matrix X = [x_1, ..., x_T] ∈ R^{N×T} that stacks column-wise T graph signals x_t residing over the network G = (V, E, S). The goal is to infer the latent underlying network topology encoded in the GSO S under some optimality criterion.

This problem has been addressed in the past by means of statistical approaches, mostly based on correlation analysis and its connections to covariance selection and high-dimensional regression for learning Gaussian graphical models. Only more recently, GSP postulated the network topology inference problem under the assumption that the observed signals exhibit certain properties over the graph, such as smoothness, stationarity or band-limitedness. The reader interested in this topic is referred to [17] [18] [19].

Differently from the traditional topology identification setting, instead of estimating S from X, we rely on model (1) and focus on a problem where, given input and output data, the values of the nonzero entries of S, i.e., the edge weights, and the filter taps h of a graph filter H(h, S) have to be jointly identified. In Section 3, we rigorously formulate this problem and, in Section 4, we propose a way to efficiently tackle it.
3. JOINT GRAPH FILTER AND TOPOLOGY ESTIMATION
Suppose there is an unknown network process that can be accurately modelled by a graph filter H(h, S) where, in response to an input x_t, we observe a corresponding output y_t. Such dynamics can be found for instance in social networks, where as a result of an advertisement campaign, we may expect to observe a response of the network's users; or in epidemics, where the nodes of the network are cities and we monitor the evolution of a spreading disease from one time instant to the next.

Let us assume that there are T input-output pairs available, and that we stack them column-wise in the matrices X = [x_1, ..., x_T] and Y = [y_1, ..., y_T], respectively. Let the unknown filter H(h, S) be of the form in (1). At this point, we are ready to formally state the problem we are going to address.
Problem Statement

Given the input-output data {x_t, y_t}_{t=1}^{T} and the support, A, of the graph G, the goal is to identify the filter coefficients h and the GSO S embodied in the graph filter H(h, S) that maps x_t into y_t as accurately as possible.

The above problem can be mathematically defined with a least-squares formulation as:

    argmin_{h,S}  ‖Y − Σ_{k=0}^{K} h_k S^k X‖_F^2
    s.t.  S ∈ S,  supp(S) ⊆ A,    (2)

where S represents the set of valid GSOs, A denotes the set with the support of G, and ‖·‖_F denotes the Frobenius matrix norm. Note the (relaxed) constraint on the support: as the sparsity pattern of the GSO might have been overestimated, we leave it to the algorithm to optimize it, eventually shrinking to zero some unnecessary edges. That is, we constrain only the entries of the GSO to be zero in correspondence to the zeros of the support, leaving the other entries unconstrained (both zero and non-zero values are admitted).

From (2), we can deduce that the problem is not convex. Indeed, the objective function is made up of cross-products between the entries of S and the filter coefficients h_k, and of the power terms S^k. The overall optimization problem is hence not convex and traditional tools of convex optimization cannot be used.

Although not directly handling the fixed-support case, the works referenced in Section 1 address the estimation problem using multi-step approaches to find S and/or h. For instance, in [23] each realization is modeled through a graph filter-based vector auto-regressive (VAR) model, and this structure is leveraged to first recover the graph filters H_i(h, S) representing the matrix filter taps of the VAR, and only then to recover the shift S and the coefficients h from them. Other approaches, such as [20] [21], are only interested in learning the shift S, while others, such as [15], only in the filter coefficients h.
Differently from the method in [23], in the following we introduce a globally convergent SCP-based method to directly find both the filter taps h and the GSO S. To the best of our knowledge, this is the first work that jointly learns the filter taps and the graph topology from observations.
4. ALTERNATING MINIMIZATION
To tackle the non-convexity of the problem and to bypass the limited flexibility of other methods, we resort to the alternating minimization (AM) approach, acting iteratively on h and S. The general AM pseudo-code, adapted to our case, is reported in Algorithm 1. Notice that, due to the two minimization steps in Algorithm 1, the cost is guaranteed to be a non-increasing function of the iteration number. In the following, we show how to perform both steps of the proposed algorithm.

Given the estimate of the GSO S at the (n−1)-th iteration, i.e., S^(n−1), the estimation problem at the n-th iteration for the filter tap vector h, i.e., h^(n), reads as:

    h^(n) = argmin_h ‖Y − Σ_{k=0}^{K} h_k (S^(n−1))^k X‖_F^2.    (3)

Problem (3) is convex and boils down to the traditional linear least squares (LLS) problem.
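Since the residual in (3) is linear in h, the LLS step can be written compactly by stacking the vectorized matrices S^k X as columns of a regression matrix; a sketch of this step (function names are ours):

```python
import numpy as np

def estimate_filter_taps(S, X, Y, K):
    """Solve (3): least-squares fit of the filter taps h for a fixed GSO.

    vec(Y) is linear in h: column k of the regressor is vec(S^k X),
    so the problem is an ordinary linear least-squares fit.
    """
    SkX = X.copy()
    cols = [SkX.ravel()]
    for _ in range(K):
        SkX = S @ SkX                 # next power applied to the input
        cols.append(SkX.ravel())
    B = np.stack(cols, axis=1)        # shape (N*T, K+1)
    h, *_ = np.linalg.lstsq(B, Y.ravel(), rcond=None)
    return h
```

In the noiseless case this recovers the generating taps exactly (up to numerical precision), provided the regressor has full column rank.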
Algorithm 1 Joint GF & GSO Identification

Require: feasible S^(0), ε > 0, A, S
1: n ← 1
2: while not converged do
3:   h^(n) ← argmin_h f(h, S^(n−1))   [See Eq. (3)]
4:   S^(n) ← argmin_S f(h^(n), S) (SCP)   [See Alg. 2]
5:   check convergence(h^(n), S^(n), ε)
6:   n ← n + 1
7: end while
8: return S^(n), h^(n)

The solution of (3) is then used in the next step to minimize the function with respect to the constrained GSO S; that is,

    S^(n) = argmin_S { f(S) := ‖Y − Σ_{k=0}^{K} h_k^(n) S^k X‖_F^2 }
    s.t.  S ∈ S,  supp(S) ⊆ A.    (4)

As problem (4) is not convex, we employ SCP [24], a heuristic and local optimization method for non-convex problems that leverages convex optimization, where the non-convex portion of the problem is modeled by convex functions that are (at least locally) accurate.

Given the non-convex function f(S), the idea in SCP is to maintain a solution estimate S^[l] and a respective convex trust region T^[l] ⊆ R^{N×N} over which we "trust" our solution to reside. Then, using a convex approximation f̂ of f around S^[l], the next solution estimate, S^[l+1], is computed using the optimizer of f̂ in T^[l]. Typical trust regions include ℓ-norm balls or bounded regions.

For our case, we define as trust region the box:

    T^[l] = { S ∈ S : |[S]_ij − [S^[l]]_ij| ≤ ρ_ij(l) if (i,j) ∈ E, [S]_ij = 0 if (i,j) ∉ E, ∀ i,j ∈ V },    (5)

where ρ_ij : Z_+ → R_++ is a mapping from the iteration number to the breadth of the search for the (i,j)-th entry.

For the convex approximation of the function, we linearize f(S) around the previous estimate S^[l] using its first-order Taylor approximation:

    f̂^[l](S) := f(S^[l]) + tr[∇_S f(S^[l])^⊤ (S − S^[l])].    (6)

We then find a feasible intermediate iterate by solving the problem:

    Ŝ = argmin_{S ∈ T^[l]} f̂^[l](S).    (7)
Due to the non-convexity of the cost function f(S), its value at the (feasible) point Ŝ is not guaranteed to be lower than the one at S^[l]. Hence, to find the "best" feasible solution at the (l+1)-th iteration, we first resort to a line search to find the optimal scaling step size parameter α_l toward the feasible descent direction Δ_l := Ŝ − S^[l]; that is,

    α*_l = argmin_{α_l ∈ [0,1]} f(S^[l] + α_l Δ_l).    (8)

(We use the superscript with square brackets to indicate the SCP iterations; the computation of ∇_S f(S^[l]) is reported in the Appendix.)

Then, we compute our next solution estimate S^[l+1] through

    S^[l+1] = S^[l] + α*_l Δ_l,    (9)

which is feasible for the original problem as long as the set S is convex, i.e., the update in (9) is a convex combination of feasible points. The specialized SCP procedure for our problem is summarized in Algorithm 2. Note that the line search and update steps guarantee, at each iteration, the feasibility of the iterate and a non-increasing cost function value, leading to the global convergence of Algorithm 1.
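The line search (8) is one-dimensional over [0, 1] and can be approximated with a simple grid; since α = 0 (i.e., keeping S^[l]) is always a candidate, the cost can never increase, which is exactly the monotonicity used in the convergence argument. A minimal sketch, assuming a grid line search (our naming):

```python
import numpy as np

def line_search_update(f, S_l, S_hat, num_alphas=51):
    """Steps (8)-(9): grid search over alpha in [0, 1] along the feasible
    direction Delta = S_hat - S_l, followed by the convex-combination update.
    Including alpha = 0 in the grid guarantees f never increases."""
    Delta = S_hat - S_l
    alphas = np.linspace(0.0, 1.0, num_alphas)
    costs = [f(S_l + a * Delta) for a in alphas]
    a_star = float(alphas[int(np.argmin(costs))])
    return S_l + a_star * Delta, a_star
```

A finer or exact line search (f restricted to the segment is a univariate polynomial in α) could be substituted here without changing the convergence argument.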
Algorithm 2 SCP

Require: S^(n), h^(n), {ρ_ij}_{(i,j)∈E}, ε > 0
1: l ← 1, S^[0] ← S^(n)
2: while not converged do
3:   compute {ρ_ij(l−1)}_{(i,j)∈E}
4:   construct f̂^[l−1](S) as in (6)
5:   define T^[l−1] as in (5)
6:   Ŝ ← argmin_{S ∈ T^[l−1]} f̂^[l−1](S)
7:   α*_{l−1} ← argmin_{α ∈ [0,1]} f(S^[l−1] + α(Ŝ − S^[l−1]))
8:   S^[l] ← S^[l−1] + α*_{l−1}(Ŝ − S^[l−1])
9:   check convergence(h^(n), S^[l], ε)
10:  l ← l + 1
11: end while
12: return S^[l]

Due to the non-convexity of the cost function, the global optimality of the solution is not guaranteed, thus the results are dependent on the initial starting point(s), as they might lead to different local minima. Although in these cases multi-start is recommended, we have found in our numerical experiments that both the unweighted adjacency matrix, A, and the respective combinatorial Laplacian matrix, L, are good initial iterates, i.e., S^(0), for the proposed approach; they are straightforward choices and can be computed using the support of the graph. To validate this claim, in our experiments, we generate initial GSO iterates S^(0)_i through a method reported in the Appendix, and show their performance in the next section, along with those of A and L.
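Both straightforward starting points can be computed from the known support alone; a small helper (our naming) that builds the unweighted adjacency matrix A and the combinatorial Laplacian L = D − A:

```python
import numpy as np

def initial_gso_from_support(support):
    """Build the two straightforward initial iterates S^(0) from the support:
    the unweighted adjacency matrix A and the combinatorial Laplacian L = D - A."""
    A = (np.asarray(support) != 0).astype(float)
    np.fill_diagonal(A, 0.0)              # no self-loops in the support
    L = np.diag(A.sum(axis=1)) - A        # D - A, with D the degree matrix
    return A, L
```

Both outputs are symmetric and share the sparsity pattern of the support, so they are feasible starting points for Algorithm 1.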
5. NUMERICAL RESULTS
In this section, we show some numerical results obtained for identifying different graph filters and GSOs S. In these experiments, we consider cases where the GSOs to identify are the weighted adjacency matrix and the Laplacian.

To evaluate the correctness of our method, we first generate a random graph composed of N = 30 nodes with the GSP Toolbox [25] and construct from the graph the respective GSO S involved in the graph filter that generates the output data. We then generate T = 500 input graph signals {x_t}_{t=1}^{T} drawn from a standard normal distribution. By fixing the order of the graph filter to K = 5, we generate graph filter taps h following a Gaussian distribution with zero mean and σ = 3. Finally, the output graph signals {y_t}_{t=1}^{T} are generated following (1).

In our experiments, we analyze two main aspects of the proposed method: i) the convergence of the algorithm, regardless of the initial starting point; and ii) the similarity in terms of edge weights between the groundtruth GSO S and the identified one Ŝ.

Fig. 1: NMSE versus the iteration number for different settings of the true GSO type S_g and the hypothesis GSO type S_h: (a) S_g = L, S_h = L; (b) S_g = W, S_h = W; (c) S_g = W, S_h = L. The legend in each plot contains the considered GSOs for initializing the algorithm.

To provide a fair comparison, we assume we do not know in advance the type of GSO that generated the network process, i.e., S is not completely known a priori. For this, we provide a guess of the GSO type as input to Algorithm 1, and hope that a proper guess leads to a good fitting. In the sequel, we denote with S_g the type of GSO used to generate the data, and with S_h the type of GSO hypothesized.
Both types of GSOs can assume the values W and L, indicating respectively the (weighted) adjacency matrix and the Laplacian matrix.

As performance metric for the error evaluation, we consider the normalized MSE (NMSE), defined as

    NMSE = ( Σ_{t=1}^{T} ‖ŷ_t − y_t‖_2^2 ) / ( Σ_{t=1}^{T} ‖y_t‖_2^2 ),    (10)

where ŷ_t is the predicted graph signal relative to the input x_t.

Figure 1a shows the NMSE as a function of the "cumulative" iteration number, for S_g = S_h = L. Regardless of the starting point, we observe the non-increasing behavior of the NMSE, corroborating the global convergence of the algorithm. For this particular (S_g, S_h) combination, L and A are the best performing starting points in terms of final NMSE, with L reaching convergence in just a few iterations. The sharp steps downwards, especially noticeable in the case of A, are due to the update of the graph filter coefficients h. In this case, the other initial points are not better than the straightforward initial guesses. Similar observations can be made from Fig. 1b.

A case of GSO mismatch is shown in Fig. 1c, where the data are generated using the weighted adjacency matrix, but the algorithm is running based on the Laplacian hypothesis. As expected, the A matrix is the best starting point. Comparing Fig. 1b and Fig. 1c, where the curves starting at A and L achieve the same NMSE, we note how, in case of matched hypotheses, the GSOs generated through the generation procedure yield a lower error with respect to the mismatched counterpart.

As a quantitative measure of similarity between the groundtruth and the inferred weights, we report their Spearman correlation coefficient r_s, which is a non-parametric measure of rank correlation. In particular, it answers the following question: do edges with higher weight in the groundtruth GSO tend to have a higher weight in the inferred one?
A perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other. In our setting, the obtained r_s value confirms a strong positive correlation of the two vectors. Moreover, as depicted in the Q-Q plot of Fig. 2, the quantiles of the two vectors lie almost entirely on the straight line, allowing us to state that the weights of the two GSOs come approximately from the same distribution.

Fig. 2: Q-Q plot of the weights of the groundtruth GSO (y-axis) against the weights of the inferred Laplacian (x-axis) for the case S_g = L, S_h = L.

For a qualitative and visual assessment of the method, in Fig. 3a and Fig. 3b we show, respectively, the graphs and the weighted sparsity pattern of the groundtruth and the learned Laplacian matrix (for S_g = S_h = L). We observe how, up to a scaling factor, the algorithm is able to give a larger weight to those edges that are also "important" in the original graph. All these considerations make us optimistic in the continuation of the development and the study of the proposed approach, driving us toward its application in more complex real-world scenarios.

(Notation and counting remarks: we do not use the bold notation for S_g and S_h because both are (textual) parameters of the algorithm, in contrast to the considered GSO starting points, which are effectively matrices. The iteration count is cumulative: we count all the iterations of the algorithm up to its convergence, summing the outer and the inner iterations of Algorithm 1.)
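The two performance measures used above are easy to compute; a sketch of the NMSE in (10) and of a basic Spearman coefficient (tie handling omitted; function names are ours):

```python
import numpy as np

def nmse(Y_hat, Y):
    """NMSE of (10): sum_t ||yhat_t - y_t||^2 / sum_t ||y_t||^2,
    with the signals stacked column-wise in Y_hat and Y."""
    return float(np.sum((Y_hat - Y) ** 2) / np.sum(Y ** 2))

def spearman_r(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Illustration only; assumes no ties among the entries."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean(); rb -= rb.mean()
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))
```

Applied to the vectors of groundtruth and inferred edge weights (e.g., the upper-triangular entries of the two GSOs), spearman_r answers exactly the monotonicity question posed above.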
6. CONCLUSION
In this work, we formulated and studied the problem of jointly estimating the filter coefficients and the graph shift operator (GSO) defining a graph filter that models the dynamics of signals defined over a network. In particular, motivated by practical scenarios, we exploited the a priori knowledge of the sparsity pattern of the network. We proposed an alternating-minimization approach, whose non-convex subproblem is handled through sequential convex programming methods. As shown in the numerical results, the proposed method is globally convergent and is able to identify the type of GSO used to generate the data. Quantitative statistical measures and qualitative graphics demonstrated the efficacy of the algorithm to assign higher values to those weights that are prominent in the real graph.

Fig. 3: Graphs and Laplacian matrix heatmaps for the case S_g = S_h = L. (a) Left: groundtruth graph; right: inferred graph (initial condition L; the darker the edge, the higher its value). (b) Left: groundtruth Laplacian; right: inferred Laplacian.
7. APPENDIX
To compute the derivative ∇_S f(S), let us first expand the function f(S):

    f(S) = tr[(Y − H(h,S)X)(Y − H(h,S)X)^⊤]
         = tr(YY^⊤) − 2 tr(HXY^⊤) + tr(H^⊤HXX^⊤)
         = tr(YY^⊤) − 2 Σ_{k=0}^{K} h_k tr(S^k XY^⊤) + Σ_{k1=0}^{K} Σ_{k2=0}^{K} h_{k1} h_{k2} tr(S^{k1+k2} XX^⊤).

We set h^(n) to h, and H(h, S) to H, for the rest of the proof. Then

    ∇_S f(S) = −2 Σ_{k=0}^{K} h_k ∇_S tr(S^k XY^⊤) + Σ_{k1=0}^{K} Σ_{k2=0}^{K} h_{k1} h_{k2} ∇_S tr(S^{k1+k2} XX^⊤).

Because S is symmetric, we have to take its structure into account for the computation of the derivative of f(S). Indeed, due to the matrix symmetry, the overall gradient can be decomposed as:

    ∇_S f(S) = [∂f(S)/∂S] + [∂f(S)/∂S]^⊤ − diag[∂f(S)/∂S].
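As a sanity check, the trace expansion of f(S) can be verified numerically on random data (a sketch; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 5, 8, 2
S = rng.standard_normal((N, N)); S = (S + S.T) / 2   # symmetric GSO
h = rng.standard_normal(K + 1)
X = rng.standard_normal((N, T)); Y = rng.standard_normal((N, T))

# Left-hand side: f(S) = ||Y - H X||_F^2 with H = sum_k h_k S^k.
H = sum(hk * np.linalg.matrix_power(S, k) for k, hk in enumerate(h))
lhs = float(np.sum((Y - H @ X) ** 2))

# Right-hand side: the trace expansion (the quadratic term uses H^T H = H^2,
# valid since S, and hence H, is symmetric).
rhs = float(np.trace(Y @ Y.T)
            - 2 * sum(h[k] * np.trace(np.linalg.matrix_power(S, k) @ X @ Y.T)
                      for k in range(K + 1))
            + sum(h[k1] * h[k2]
                  * np.trace(np.linalg.matrix_power(S, k1 + k2) @ X @ X.T)
                  for k1 in range(K + 1) for k2 in range(K + 1)))
```

The two values agree up to floating-point tolerance, confirming the expansion term by term.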
Finally, because ∂/∂S tr(S^k) = k (S^{k−1})^⊤ and ∂/∂S tr(BS^k) = Σ_{r=0}^{k−1} (S^r B S^{k−r−1})^⊤, we have that the component [∂f(S)/∂S] of the gradient is:

    ∂f(S)/∂S = −2 Σ_{k=1}^{K} h_k [ Σ_{r=0}^{k−1} (S^r XY^⊤ S^{k−r−1})^⊤ ] + Σ_{k1=0}^{K} Σ_{k2=0}^{K} h_{k1} h_{k2} Σ_{r=0}^{k1+k2−1} (S^r XX^⊤ S^{k1+k2−r−1})^⊤.

Let the model be y = H(h, S)x for some order K of the filter. Then, a K = 1 approximation for the overall problem (2) is given by

    y ≈ (ĥ_0^(1) I + Ŝ) x,    (11)

where ĥ_0^(1) is the constant filter tap estimate related to the first order approximation, and Ŝ ∈ S is the respective estimate for the GSO. We assume ĥ_1^(1) = 1 to avoid the scalar ambiguity that would otherwise arise in the term ĥ_1 Ŝ. This also decouples the filter parameters from the GSO, both of which can be estimated by LLS. This way, a first GSO candidate S^(0)_1 for the algorithm is found. Next, we consider a second order approximation of the model

    y ≈ (ĥ_0^(2) I + Ŝ + ĥ_2^(2) (S^(0)_1)^2) x,    (12)

where the variables are now ĥ_0^(2), ĥ_2^(2) and Ŝ, and we still assume the first filter tap ĥ_1^(2) is equal to one. This again leads to an LLS problem, which generates a second GSO candidate S^(0)_2. We iterate this procedure by increasing the order of the filter at each step, and maintaining the term that is linear in the GSO variable S. At the end, we have K initial GSO candidates S^(0)_1, S^(0)_2, ..., S^(0)_K, which can be given as input to Algorithm 1.
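The gradient expressions above translate directly into code; a numpy sketch (our naming), with the symmetric-matrix correction applied at the end:

```python
import numpy as np

def grad_f(S, h, X, Y):
    """Gradient of f(S) = ||Y - sum_k h_k S^k X||_F^2 for symmetric S.

    Computes the free partial derivative dF (all entries treated as
    independent), then applies the correction dF + dF^T - diag(dF)."""
    K = len(h) - 1
    XYt = X @ Y.T
    XXt = X @ X.T

    def d_tr_SkB(B, k):
        # d/dS tr(S^k B) = sum_{r=0}^{k-1} (S^r B S^{k-1-r})^T
        out = np.zeros_like(S)
        for r in range(k):
            out += (np.linalg.matrix_power(S, r) @ B
                    @ np.linalg.matrix_power(S, k - 1 - r)).T
        return out

    dF = np.zeros_like(S)
    for k in range(1, K + 1):
        dF -= 2.0 * h[k] * d_tr_SkB(XYt, k)
    for k1 in range(K + 1):
        for k2 in range(K + 1):
            dF += h[k1] * h[k2] * d_tr_SkB(XXt, k1 + k2)
    return dF + dF.T - np.diag(np.diag(dF))
```

The result can be cross-checked against central finite differences of f along symmetric perturbation directions, which is a useful unit test for any SCP implementation of Algorithm 2.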
REFERENCES

[1] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
[2] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, April 2013.
[3] Antonio Ortega, Pascal Frossard, Jelena Kovačević, José M. F. Moura, and Pierre Vandergheynst, "Graph signal processing: Overview, challenges, and applications," Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, 2018.
[4] David I Shuman, Pierre Vandergheynst, Daniel Kressner, and Pascal Frossard, "Distributed signal processing via Chebyshev polynomial approximation," IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 4, pp. 736–751, 2018.
[5] Santiago Segarra, Antonio G Marques, and Alejandro Ribeiro, "Optimal graph-filter design and applications to distributed linear network operators," IEEE Transactions on Signal Processing, vol. 65, no. 15, pp. 4117–4131, 2017.
[6] Sunil K Narang, Akshay Gadde, and Antonio Ortega, "Signal processing techniques for interpolation in graph structured data," in . IEEE, 2013, pp. 5445–5449.
[7] Benjamin Girault, Paulo Gonçalves, Eric Fleury, and Arashpreet Singh Mor, "Semi-supervised learning for graph to signal mapping: A graph signal Wiener filter interpretation," in . IEEE, 2014, pp. 1115–1119.
[8] Elvin Isufi, Paolo Di Lorenzo, Paolo Banelli, and Geert Leus, "Distributed Wiener-based reconstruction of graph signals," pp. 21–25, 2018.
[9] Masaki Onuki, Shunsuke Ono, Masao Yamagishi, and Yuichi Tanaka, "Graph signal denoising via trilateral filter on graph spectral domain," IEEE Transactions on Signal and Information Processing over Networks, vol. 2, no. 2, pp. 137–148, 2016.
[10] J. Ma, W. Huang, S. Segarra, and A. Ribeiro, "Diffusion filtering of graph signals and its use in recommendation systems," in , March 2016, pp. 4563–4567.
[11] Elvin Isufi, Andreas Loukas, Nathanael Perraudin, and Geert Leus, "Forecasting time series with VARMA recursions on graphs," IEEE Transactions on Signal Processing, vol. 67, no. 18, pp. 4870–4885, 2019.
[12] A. Natali, E. Isufi, and G. Leus, "Forecasting multi-dimensional processes over graphs," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 5575–5579.
[13] Fernando Gama, Antonio G Marques, Geert Leus, and Alejandro Ribeiro, "Convolutional neural network architectures for signals supported on graphs," IEEE Transactions on Signal Processing, vol. 67, no. 4, pp. 1034–1049, 2018.
[14] Mario Coutino, Elvin Isufi, and Geert Leus, "Advances in distributed graph filtering," IEEE Transactions on Signal Processing, vol. 67, no. 9, pp. 2320–2333, 2019.
[15] Santiago Segarra, Gonzalo Mateos, Antonio G Marques, and Alejandro Ribeiro, "Blind identification of graph filters," IEEE Transactions on Signal Processing, vol. 65, no. 5, pp. 1146–1159, 2016.
[16] Jiani Liu, Elvin Isufi, and Geert Leus, "Filter design for autoregressive moving average graph filters," IEEE Transactions on Signal and Information Processing over Networks, vol. 5, no. 1, pp. 47–60, 2018.
[17] Gonzalo Mateos, Santiago Segarra, Antonio G Marques, and Alejandro Ribeiro, "Connecting the dots: Identifying network structure via graph signal processing," IEEE Signal Processing Magazine, vol. 36, no. 3, pp. 16–43, 2019.
[18] Xiaowen Dong, Dorina Thanou, Michael Rabbat, and Pascal Frossard, "Learning graphs from data: A signal representation perspective," IEEE Signal Processing Magazine, vol. 36, no. 3, pp. 44–63, 2019.
[19] Georgios B Giannakis, Yanning Shen, and Georgios Vasileios Karanikolas, "Topology identification and learning over graphs: Accounting for nonlinearities and dynamics," Proceedings of the IEEE, vol. 106, no. 5, pp. 787–807, 2018.
[20] Santiago Segarra, Antonio G Marques, Gonzalo Mateos, and Alejandro Ribeiro, "Network topology inference from spectral templates," IEEE Transactions on Signal and Information Processing over Networks, vol. 3, no. 3, pp. 467–483, 2017.
[21] Rasoul Shafipour, Santiago Segarra, Antonio G Marques, and Gonzalo Mateos, "Identifying the topology of undirected networks from diffused non-stationary graph signals," arXiv preprint arXiv:1801.03862, 2018.
[22] R. Shafipour, S. Segarra, A. G. Marques, and G. Mateos, "Directed network topology inference via graph filter identification," in , June 2018, pp. 210–214.
[23] J. Mei and J. M. F. Moura, "Signal processing on graphs: Causal modeling of unstructured data," IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 2077–2092, April 2017.
[24] Stephen Boyd, "Sequential convex programming," Lecture Notes, Stanford University, 2008.
[25] Nathanaël Perraudin, Johan Paratte, David Shuman, Lionel Martin, Vassilis Kalofolias, Pierre Vandergheynst, and David K. Hammond, "GSPBOX: A toolbox for signal processing on graphs,"