Modelling Graph Errors: Towards Robust Graph Signal Processing
Jari Miettinen, Member, IEEE, Sergiy A. Vorobyov, Fellow, IEEE, and Esa Ollila, Senior Member, IEEE
Abstract
The first step of any graph signal processing (GSP) procedure is to learn the graph signal representation, i.e., to capture the dependence structure of the data in an adjacency matrix. Indeed, the adjacency matrix is typically not known a priori and has to be learned. However, it is learned with errors. Little attention has been paid to modelling such errors in the adjacency matrix or to studying their effects on GSP methods, yet modelling errors in the adjacency matrix enables both the study of graph error effects in GSP and the development of robust GSP algorithms. In this paper, we therefore introduce practically justifiable graph error models. We also study, analytically when possible and numerically otherwise, the effect of graph errors on the performance of GSP methods in different types of problems, such as filtering of graph signals and independent component analysis of graph signals (graph decorrelation).
Keywords:
Erdős–Rényi graphs, error effect, graph signal processing, minimum distance index, shift matrix

I. INTRODUCTION

In the classical signal processing setup, where digital signals are represented as time series or vectors of spatial measurements (for example, measurements by sensor arrays), it is assumed that each point of a discrete signal depends on the preceding or spatially close points of the signal. With the current advances in data collection and representation, when
The authors are with the Dept. of Signal Processing and Acoustics, Aalto University, PO Box 13000, FI-00076 Aalto, Finland. Emails: [email protected], [email protected], [email protected]. Some preliminary results of this work have been presented at the IEEE ICASSP 2018. This work was supported in part by the Academy of Finland grants No. 299243 and No. 298118. Sergiy A. Vorobyov is the corresponding author.
the data points may not be ordered temporally or spatially and the digital signal may no longer be accurately or even adequately represented by a simple time series structure, the classical signal processing tools are no longer applicable. Thus, graph signal processing (GSP) [1]–[3] has emerged as a new area of signal processing and data analysis, where one of the major aims is to generalize the standard signal processing methods and concepts to the context of more complex signals represented on graphs that can have various structures. The graph corresponding to ordered data is then just a simple special case of a digital signal graph. Examples of digital signals represented on graphs include sensor networks, brain networks, gene regulatory networks, and social networks, to name just a few of the most frequently encountered graph signals [4]–[7].

In GSP, each signal value is indexed by a node of a graph at which this signal value is measured, and edges between pairs of nodes indicate dependencies between the corresponding signal values. The underlying graph can then be fully specified by an adjacency matrix, denoted hereafter as $\mathbf{A}$, whose $(i,j)$th element is nonzero if the $i$th and the $j$th nodes are connected, and the value $[\mathbf{A}]_{i,j} = a_{ij}$ describes the strength of the relationship. The adjacency matrix can be viewed as the basis of GSP, on which, for example, the graph Fourier transform (GFT) and graph filters [1] are built, often via the graph Laplacian matrix or other shift operators based on the adjacency matrix.

The process of obtaining the adjacency matrix may take different forms. There may exist physical or friendship connections between the nodes known in advance, which directly yield a graph. At the other extreme, there may be no auxiliary information available about the connectedness of the nodes, and the methods for finding an adjacency matrix are then entirely based on training data. Methods for estimating the graph Laplacian matrix in the latter case have been developed in [14]–[18]. Third, the choice of the adjacency matrix may be based on other variables, which are not perfectly correlated with the graph signal of interest. In both the second and the third cases, the process of adjacency matrix learning is necessarily stochastic. As a result, there are multiple sources of errors which lead to imperfect learning of the adjacency matrix. Even in the case when the choice of the adjacency matrix seems obvious and based on the known physical or friendship connections between the nodes, such a choice may still not be the best one.

The GSP literature is developing fast and by now also covers sampling theory for graph signals [8], [9], stationarity theory for graph signals [10], and percolation of graph signals [11]. Graph filters have been applied, for example, in the distributed average consensus problem [12], [13].
Our focus in this paper is to study the consequences of imperfect specification of the graph signal adjacency matrix for GSP tasks. The sources of errors in adjacency matrix learning may be related to, or completely unrelated to, the sources of errors in the signal measurements. However, even when the sources of errors in the adjacency matrix learning and in the signal value measurements are the same, the effect of errors in the adjacency matrix on GSP tasks is different from that caused by errors in the signal measurements. Consideration of graph learning errors has only recently gained attention. For example, a discussion of the topic is given in [19], where the eigen-decomposition of the graph Laplacian was analyzed when small subsets of edges are added or deleted. Based on this analysis, robust methods were built for GSP tasks such as signal recovery, label propagation, and clustering, when knowledge of the edgewise probabilities of errors in the graph topology is available.

There are also some works which study robustness with respect to graph topology errors in specific tasks. For example, a total least-squares approach for robust graph signal recovery was proposed in [20]. Graph neural networks were found to be stable to small changes in graph topology in [21]. Node-variant graph filters were introduced in [22], and they were shown by numerical experiments to be more robust to errors in the graph shift operator than node-invariant filters. For the distributed average consensus problem, [23] proposed a method which optimizes the worst-case performance when the eigenvalue distribution is considered uncertain. In [24], a new centrality measure was introduced and its robustness with respect to edge weight perturbations was considered. Robustness to missing edges was also part of the comparison of autoregressive moving average graph filters and finite impulse response graph filters for time-varying graph signals in [25]. For spectral graph filters, [26] defined a Cayley smoothness filter space where the filter perturbation is bounded by a constant times the perturbation of the graph. In the presence of erroneous edges, [27] studied differences between a small-world graph and an Erdős–Rényi graph, and [28] studied clustering of nodes into two communities.

The starting point for studying GSP performance under imperfect knowledge of the graph signal adjacency matrix is the modelling of graph errors. As errors in signal modelling in classical signal processing may lead to significant performance degradation (see, for example, [29]), errors in the adjacency matrix may have a significant effect on GSP performance. Then the development of robust GSP methods, just as the development of robust signal processing methods [29], is of great importance.
This paper is, to the best of our knowledge, the first attempt towards developing justifiable and sufficiently generic models for adjacency matrix mismatches. It also studies the effects of a mismatched adjacency matrix on the performance of some traditional GSP applications, such as filtering of graph signals, as well as independent component analysis (ICA) of graph signals (also referred to as graph decorrelation (GraDe) in the context of separating signals based on graph structure only).
A. Contributions
Our main contributions are the following.

• We formulate graph error models for the adjacency matrix, which help to quantify the deviation from the true matrix using a few parameters. We consider adding and removing edges as well as perturbing the edge weights. First, we build models assuming equal probabilities of mislearning the edge status for each pair of nodes, but later we generalize to the case where there are several subsets of node pairs such that within each group the probabilities are equal but differ between groups.

• We study the structural effects of the proposed error models on the adjacency matrix, including spectrum analysis where applicable and possible.

• We introduce the graph autocorrelation, which is a scale and edge density invariant measure of smoothness between the signal and the graph.

• We illustrate the effects that different types of errors in the adjacency matrix specification produce in filtering of graph signals and in ICA of graph signals. Some theoretical studies as well as numerical and real data-based studies are performed.
B. Notation
We use boldface capital letters for matrices, boldface lowercase letters for vectors, and capital calligraphic letters for sets. The exceptions are $\mathbf{1}_N$, which is the $N$-dimensional vector full of ones, the $M \times N$ matrix full of ones $\mathbf{1}_{M \times N} = \mathbf{1}_M \mathbf{1}_N^\top$, and $\mathbf{A}_0$, which is a matrix of the same size as $\mathbf{A}$ such that $[\mathbf{A}_0]_{i,j} = 1$ if $a_{i,j} \neq 0$ and $[\mathbf{A}_0]_{i,j} = 0$ if $a_{i,j} = 0$. The matrix $\mathbf{I}_{N \times N}$ is the $N \times N$ identity matrix. The notations $(\cdot)^\top$, $\odot$, $\|\cdot\|$, $\mathrm{tr}\{\cdot\}$, $P(\cdot)$, $E\{\cdot\}$, and $\mathrm{var}(\cdot)$ stand for the transpose, the Hadamard product, the Euclidean norm of a vector, the trace of a matrix, probability, mathematical expectation, and variance, respectively.

Our preliminary work on the topic has been reported in [30].
The notation for the diagonal elements of a matrix $\mathbf{A}$ is $\mathrm{diag}(\mathbf{A})$, and the notation $\mathcal{N}(0, \sigma^2)$ stands for the Gaussian zero-mean distribution with variance $\sigma^2$.

C. Paper Organization
In Section II, we start by recalling graph models and establishing a result which brings an analogy between the Gaussian distribution and the Erdős–Rényi model. This motivates the special role of the Erdős–Rényi model in the graph error models, which are then introduced for different types of graphs. Section III is devoted to defining and studying the graph autocorrelation. In Section IV, the effect of adjacency matrix mismatch on several GSP applications is studied using simulations and real data examples. Section V concludes the paper.

II. GRAPH ERROR MODELS
A. Basic Building Blocks
Let us first go through the basics of graphs and define the graph models which will be used in the paper.

Let $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ be a directed graph that represents the basis of a graph signal, where $\mathcal{N}$ is the set of $N$ nodes and $\mathcal{E}$ is the set of edges. The adjacency matrix of the graph $\mathcal{G}$, denoted as $\mathbf{A}$, is a matrix that satisfies the conditions $a_{i,i} = 0$ for $i = 1, \ldots, N$ and $a_{i,j} \neq 0$ if and only if $(i,j) \in \mathcal{E}$, i.e., there is an edge from node $j$ to node $i$.

For developing our graph error models, we will use the Erdős–Rényi model, according to which a random graph is constructed by connecting nodes randomly with a constant probability [31]. The corresponding graph is denoted as $\mathcal{G} = (\mathcal{N}, \epsilon)$, and its adjacency matrix $\mathbf{\Delta}_\epsilon$ is a random $N \times N$ matrix such that $P([\mathbf{\Delta}_\epsilon]_{i,j} = 1) = \epsilon$ and $P([\mathbf{\Delta}_\epsilon]_{i,j} = 0) = 1 - \epsilon$ for all $i \neq j$, and $[\mathbf{\Delta}_\epsilon]_{i,i} = 0$ for $i = 1, \ldots, N$, where each element of the matrix is generated independently from the other elements.

Another graph model which will be used in this paper is the stochastic block model (SBM) [32], where each node belongs to one of $r$ communities and connections within a community are more common than connections between nodes in different communities. Let $N_1, \ldots, N_r$ denote the sizes of the communities, and let us assume that the nodes are ordered so that the $i$th node belongs to the $j$th community if and only if $\sum_{l=1}^{j-1} N_l < i \leq \sum_{l=1}^{j} N_l$. The probability of edge existence from a node of the $j$th community to a node of the $i$th community is denoted as $p_{i,j}$. The special case of the stochastic block model in which $p_{i,i} = p_{j,j} = p$ for all $i$ and $j$, and $p_{i,j} = p_{k,l} = q$ for any $i \neq j$ and $k \neq l$, is called the planted partition model (PPM).

Because of the errors in adjacency matrix learning, the estimated adjacency matrix deviates from the true one. In the following subsections, we introduce graph error models in a form aligned with the objective of studying the effect of the errors in the adjacency matrix. All models share the same additive structure, i.e., have the form
\[ \mathbf{W} = \mathbf{A} + \mathbf{E}, \tag{1} \]
where $\mathbf{W}$ represents the estimated adjacency matrix, $\mathbf{A}$ is the correct adjacency matrix, and $\mathbf{E}$ is an unknown error matrix which can be viewed as an analog of the additive error/distortion/noise component (applied not to the signal points, but to the signal structure) in traditional signal processing and time series analysis. Notice that $\mathbf{E}$ needs to be a function of $\mathbf{A}$, because whether distorting an edge means adding or deleting it depends on $\mathbf{A}$. Moreover, the Erdős–Rényi graph is the basic GSP error/distortion/noise model, analogous to the basic Gaussian noise model in traditional signal processing, as will be shown in the next subsection.

The correct adjacency matrix for a given graph signal and a GSP method can be defined as the one which yields the best results. When the data generating process is known, the optimal adjacency matrix in that sense can often be found. On the other hand, when the optimal adjacency matrix is not known, we might still want to use the graph error models to see how much the results vary when the adjacency matrix is perturbed. Then the role of the correct adjacency matrix $\mathbf{A}$ can be assigned to an estimated adjacency matrix as well.
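For concreteness, the two random-graph building blocks above can be simulated in a few lines. The following is a minimal NumPy sketch; the function names are ours and not from any particular library.

```python
import numpy as np

def er_adjacency(N, eps, rng):
    """Directed Erdos-Renyi adjacency matrix Delta_eps:
    each off-diagonal entry is 1 with probability eps, independently."""
    D = (rng.random((N, N)) < eps).astype(float)
    np.fill_diagonal(D, 0.0)           # no self-loops
    return D

def sbm_adjacency(sizes, P, rng):
    """Stochastic block model: sizes[k] nodes in community k,
    P[i, j] is the probability of an edge from community j to community i."""
    N = sum(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = P[np.ix_(labels, labels)]  # N x N matrix of edge probabilities
    A = (rng.random((N, N)) < probs).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

rng = np.random.default_rng(0)
Delta = er_adjacency(100, 0.05, rng)
A = sbm_adjacency([50, 50], np.array([[0.3, 0.05], [0.05, 0.3]]), rng)
```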
In the context of our paper, an important characteristic of the Erd¨os-R´enyi graph is that itdoes not allow for formation of communities [33], and if applied on the top of another graph,it will not change the essential structure of that graph. Instead, it just disturbs the spectrum ofthe original graph as will be shown later in this section.The simplicity and the above-mentioned properties are the main reasons for the use of Erd¨os-R´enyi model as the main block for modelling the mismatches in the graphical structures of graph
However, we will further justify it by drawing an analogy to the basic role of additive Gaussian noise in modelling errors related to the imperfect fit between signal models and actual measured signal values.

Similar to the fact that the key to modelling additive noise is the noise probability density function, the key to modelling graph errors is the graph function, or graphon. The term graphon was introduced in [34] in the context of graph sequence theory, and it refers to measurable and symmetric functions mapping from $[0,1]^2$ to $[0,1]$. Graphons have been used, for example, as graph limit objects [34], [35], and as building blocks in the kernel graph model [36]. Moreover, they have been used for spectral analysis of the adjacency matrix [33]. See Appendix A for a brief introduction to graphons.

In traditional signal processing, the main reason for the use of Gaussian random variables as error terms, meant to model the discrepancies between the analytic signal model and actual measurements, is the central limit theorem (CLT). This theorem states that the distribution of a sum of independent and identically distributed random variables tends to the Gaussian distribution as the number of summands grows. With respect to modelling errors in graphs, the Erdős–Rényi graph plays a special role due to its simplicity. However, the importance of the Erdős–Rényi graph also follows from the fact that, similar to the Gaussian distribution for measurement error modelling, the Erdős–Rényi graphon leads to a theorem that can be considered an analog of the CLT for modelling graph structure errors.

Let $\mathcal{W}_\epsilon$ denote the set of all graphons $W$, where a graphon is a weighted graph with an uncountable set of nodes, satisfying
\[ \int_0^1 \int_0^1 W(x, y)\, dx\, dy = \epsilon, \]
where $W(x,y)$ is the edge weight between nodes $x$ and $y$ in the graphon $W$ (see Appendix A for more details), and $0 < \epsilon < 1$. Then the following theorem is in order.

Theorem 1.
Let $W_1, \ldots, W_M$ be random samples from $\mathcal{W}_\epsilon$ and $c_1, \ldots, c_M$ be random samples from a distribution with support on $[0,1]$ and expected value $c$. Assume that $W_1, \ldots, W_M, c_1, \ldots, c_M$ are mutually independent. Then for all $0 \leq x, y \leq 1$, we almost surely have
\[ \bar{W}(x, y) = \frac{1}{M} \sum_{i=1}^{M} c_i W_i(x, y) \to c\epsilon \quad \text{as } M \to \infty. \]
The proof of the theorem is given at the end of Appendix A.
Theorem 1 states that the average of many independent graphons converges to a constant, i.e., to the Erdős–Rényi graphon. Thus, the theorem further motivates the use of Erdős–Rényi graphs as the main building block in developing graph error models for GSP, when the summands are considered as noise factors which impede the estimation of the graph structure. In this sense, Theorem 1 is an analog of the CLT.
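As a quick numerical illustration (our own construction, not one of the paper's experiments), one can average many random piecewise-constant graphons with common integral $\epsilon$ and uniform weights $c_i$ on $[0,1]$; the average flattens towards the constant $c\epsilon = \epsilon/2$.

```python
import numpy as np

rng = np.random.default_rng(1)
eps, M, grid = 0.2, 5000, 50   # target integral, number of graphons, grid resolution

def random_graphon(eps, grid, rng):
    """A random piecewise-constant symmetric graphon with integral eps:
    draw a symmetric nonnegative block matrix and rescale its mean to eps."""
    B = rng.random((grid, grid))
    B = (B + B.T) / 2.0
    B *= eps / B.mean()            # enforce the integral constraint
    return np.clip(B, 0.0, 1.0)    # keep values in [0, 1] (inactive for small eps)

avg = np.zeros((grid, grid))
for _ in range(M):
    c = rng.random()               # c_i uniform on [0, 1], so E{c} = 1/2
    avg += c * random_graphon(eps, grid, rng)
avg /= M

print(avg.mean(), avg.std())       # mean close to eps/2 = 0.1, small deviation
```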
C. A Basic Model for Unweighted Graphs
We start by considering unweighted graphs, for which the adjacency matrix becomes $\mathbf{A} = \mathbf{A}_0$. First, making the somewhat idealistic assumption that the outcome of the graph signal adjacency matrix learning is accurate enough, that is, assuming that incorrect graph edge learning is equally probable for any edge in the graph, the learning errors can be simply modelled by the Erdős–Rényi graph. Then the actually available learned adjacency matrix of a graph signal can be modelled in terms of the following inaccurate version of $\mathbf{A}$:
\[ \mathbf{W} = \mathbf{A} + \mathbf{\Delta}_\epsilon \odot (\mathbf{1}_{N \times N} - 2\mathbf{A}). \tag{M1} \]
According to (M1), the true adjacency matrix of a graph signal is distorted because of the imperfect learning, and this distortion follows the Erdős–Rényi model, where the level of distortion depends on a single parameter, the probability $\epsilon$. As a result of the distortion captured by model (M1), an edge can be added with probability $\epsilon$ when there is no edge in the true graph, or an edge of the true graph can be removed with the same probability if there exists an edge in the true graph. This corresponds to flipping the value from 0 to 1 or from 1 to 0 in the adjacency matrix $\mathbf{A}$ in the positions corresponding to value 1 in the Erdős–Rényi adjacency matrix $\mathbf{\Delta}_\epsilon$. This basic model is introduced due to its simplicity, as it has a single parameter $\epsilon$. However, it may not be sufficiently flexible to describe the signal graph errors that appear as the outcome of a practical graph learning process. For example, it will change the number of edges significantly if $\epsilon$ is large and the number of edges is small.

Model (M1) can be easily modified for application to undirected graphs by defining a lower triangular matrix $\mathbf{\Delta}_\epsilon^l$ analogously to $\mathbf{\Delta}_\epsilon$, and then replacing $\mathbf{\Delta}_\epsilon$ by $\mathbf{\Delta}_\epsilon^l + (\mathbf{\Delta}_\epsilon^l)^\top$. It is also worth noting that all the following graph error models will be given for directed graphs
and the same technique as above can be used to derive the corresponding models for undirected graphs.

Even though the results in [31] are for undirected graphs only, and the Erdős–Rényi model is often considered to imply an undirected graph, we will use the term Erdős–Rényi for both undirected and directed graphs.
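A minimal sketch of model (M1), for both the directed and the undirected case (our own code):

```python
import numpy as np

def perturb_m1(A, eps, rng, undirected=False):
    """Model (M1): flip each potential edge independently with probability eps.
    A is an unweighted (0/1) adjacency matrix with zero diagonal."""
    N = A.shape[0]
    D = (rng.random((N, N)) < eps).astype(float)
    if undirected:
        D = np.tril(D, -1)        # lower-triangular Delta^l_eps ...
        D = D + D.T               # ... symmetrized as in the text
    np.fill_diagonal(D, 0.0)
    return A + D * (1.0 - 2.0 * A)   # flips 0 -> 1 and 1 -> 0 where D = 1

rng = np.random.default_rng(2)
A = (rng.random((6, 6)) < 0.3).astype(float); np.fill_diagonal(A, 0)
W = perturb_m1(A, 0.1, rng)
```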
D. Different Probabilities of Missed and Mislearned Edges
Basic model (M1) can be easily extended to the case where the probability of removing an edge of the true graph as a result of distortion, denoted as $\epsilon_1$, is not the same as the probability of adding an edge which does not exist in the true graph, denoted as $\epsilon_2$. The corresponding inaccurately learned adjacency matrix of a graph signal can then be modelled as
\[ \mathbf{W} = \mathbf{A} - \mathbf{\Delta}_{\epsilon_1} \odot \mathbf{A} + \mathbf{\Delta}_{\epsilon_2} \odot (\mathbf{1}_{N \times N} - \mathbf{A}). \tag{M2} \]
The second term on the right-hand side of (M2) corresponds to edge removal and the third term corresponds to edge addition. This is a straightforward extension of basic model (M1), which is, however, very useful for modelling the typical situation in graph structure learning where a particular algorithm is more or less likely to miss an actually existing edge than to mislearn an actually non-existing edge [14]. Model (M2) can be interpreted as an application of two Erdős–Rényi graphs on top of the true graph, where one Erdős–Rényi graph $\mathcal{G} = (\mathcal{N}, \epsilon_2)$ can only add edges which do not exist in the true graph, while the other Erdős–Rényi graph $\mathcal{G} = (\mathcal{N}, \epsilon_1)$ can only erroneously remove actually existing edges. It is easy to see that model (M2) is equivalent to model (M1) when $\epsilon_1 = \epsilon_2 = \epsilon$.

To analyze the perturbation that the proposed error models produce, let us start with the trivial observation that if $\mathbf{A}$ is given by an Erdős–Rényi graph with probability parameter $\alpha$, then $\mathbf{W}$ from model (M2) is also an Erdős–Rényi graph, with probability parameter $\alpha(1 - \epsilon_1) + (1 - \alpha)\epsilon_2$.

Moreover, if $\mathbf{A}$ is given by the SBM, then after applying graph error model (M2), the SBM structure and its communities remain the same, as is shown in the following proposition.

Proposition 1.
Assume that $\mathbf{A}$ follows the SBM with probability $p^A_{i,j}$ for edges from a node in the $j$th community to a node in the $i$th community. Then $\mathbf{W}$ from graph error model (M2) follows the SBM with the same communities as in $\mathbf{A}$, and the probabilities are given by
\[ p^W_{i,j} = p^A_{i,j}(1 - \epsilon_1) + (1 - p^A_{i,j})\epsilon_2 = (1 - \epsilon_1 - \epsilon_2)\, p^A_{i,j} + \epsilon_2. \]
Proof:
For $\mathbf{W}$ given by (M2), the probability $p^W_{i,j}$ for each edge from the $j$th community to the $i$th community is given by the probability that the edge is in $\mathbf{A}$ (probability $p^A_{i,j}$) and is not removed (probability $1 - \epsilon_1$), plus the probability that the edge is not in $\mathbf{A}$ (probability $1 - p^A_{i,j}$) and is added (probability $\epsilon_2$). Since $\mathbf{\Delta}_{\epsilon_1}$ and $\mathbf{\Delta}_{\epsilon_2}$ in (M2) are independent of $\mathbf{A}$, the two probabilities above are products of their component probabilities. This observation gives the stated result and completes the proof. □

The spectrum of the adjacency matrix can be found in closed form for simple models such as the undirected (for the error model of undirected graphs, see the end of the next subsection) PPM with equal community sizes [37]. Thus, we also have the following proposition for the spectrum perturbation due to the graph error given by (M2).
Proposition 2.
Assume that $\mathbf{A}$ follows the PPM with $M$ equally large communities, and let $p$ be the probability of edges within communities and $q$ be the probability of edges between communities. If $\mathbf{W}$ follows graph error model (M2), the expected difference between the largest eigenvalues of the true adjacency matrix $\mathbf{A}$ and the mismatched one $\mathbf{W}$ is given by
\[ E\{\lambda_1(\mathbf{A}) - \lambda_1(\mathbf{W})\} = \frac{N\big((\epsilon_1 + \epsilon_2)(p + (M - 1)q) - M\epsilon_2\big)}{M}. \]
Moreover, the expected differences between the second to $M$th largest eigenvalues of $\mathbf{A}$ and $\mathbf{W}$ are mutually equal and given by
\[ E\{\lambda_k(\mathbf{A}) - \lambda_k(\mathbf{W})\} = \frac{N(\epsilon_1 + \epsilon_2)(p - q)}{M}, \quad k = 2, \ldots, M. \]
Proof:
For the true adjacency matrix $\mathbf{A}$, the expected value of the largest eigenvalue is $N(p + (M-1)q)/M$ and the expected values of the next $M - 1$ largest eigenvalues are $N(p - q)/M$ [37]. From Proposition 1, we see that the expected value of the largest eigenvalue of $\mathbf{W}$ becomes
\[ E\{\lambda_1(\mathbf{W})\} = \frac{N}{M}\big((1 - \epsilon_1 - \epsilon_2)p + \epsilon_2 + (M - 1)((1 - \epsilon_1 - \epsilon_2)q + \epsilon_2)\big) \]
and the expected values of the next $M - 1$ largest eigenvalues become
\[ E\{\lambda_2(\mathbf{W})\} = \cdots = E\{\lambda_M(\mathbf{W})\} = \frac{N}{M}\big((1 - \epsilon_1 - \epsilon_2)p + \epsilon_2 - (1 - \epsilon_1 - \epsilon_2)q - \epsilon_2\big) = \frac{N}{M}(1 - \epsilon_1 - \epsilon_2)(p - q). \]
Combining these two results, the differences of the corresponding expected values can be found to be as stated in the proposition. This completes the proof. □
The relative change of the largest eigenvalue is thus $\epsilon_1 + \big(1 - M/(p + (M-1)q)\big)\epsilon_2$, and the relative change of the next $M - 1$ largest eigenvalues is $\epsilon_1 + \epsilon_2$.
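The eigenvalue shifts in Proposition 2 are easy to check empirically; the sketch below (our own code, not the paper's) compares the averaged top eigenvalues of PPM graphs before and after an (M2) perturbation.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, p, q = 300, 3, 0.3, 0.05       # PPM: M equal communities
eps1, eps2 = 0.1, 0.05

def ppm(N, M, p, q, rng):
    """Undirected PPM adjacency matrix with M equal communities."""
    labels = np.repeat(np.arange(M), N // M)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    U = np.triu((rng.random((N, N)) < probs), 1).astype(float)
    return U + U.T

def perturb_m2(A, eps1, eps2, rng):
    """Model (M2) for undirected graphs: remove existing edges w.p. eps1,
    add non-existing ones w.p. eps2 (symmetric perturbations)."""
    N = A.shape[0]
    R = np.triu((rng.random((N, N)) < eps1), 1).astype(float); R = R + R.T
    S = np.triu((rng.random((N, N)) < eps2), 1).astype(float); S = S + S.T
    return A - R * A + S * (1.0 - np.eye(N) - A)

gaps = []
for _ in range(200):
    A = ppm(N, M, p, q, rng)
    W = perturb_m2(A, eps1, eps2, rng)
    la = np.sort(np.linalg.eigvalsh(A))[::-1]
    lw = np.sort(np.linalg.eigvalsh(W))[::-1]
    gaps.append(la[:M] - lw[:M])

print(np.mean(gaps, axis=0))
print(N * ((eps1 + eps2) * (p + (M - 1) * q) - M * eps2) / M,   # predicted, k = 1
      N * (eps1 + eps2) * (p - q) / M)                          # predicted, k >= 2
```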
E. Generalized Graph Error Model

It is assumed in models (M1) and (M2) that mislearning the edge status is equally probable (although the probabilities of adding a non-existing edge and removing an existing edge may be different) for all pairs of nodes. For some adjacency matrix estimation methods, this assumption might not hold strictly even if the pairwise connections were equally strong. To allow for differences in learning accuracy caused by the structure of the graph or by differences in connectivity strength, we formulate the following generalized graph error model.

Consider an unweighted adjacency matrix $\mathbf{A}$. Define $\mathcal{D} = \{\mathbf{D}_1, \ldots, \mathbf{D}_K\}$, where $\mathbf{D}_1, \ldots, \mathbf{D}_K$ are $N \times N$ matrices satisfying $[\mathbf{D}_k]_{ij} \in \{0, 1\}$ for all $k = 1, \ldots, K$ and $i, j = 1, \ldots, N$, and $\sum_{k=1}^{K} \mathbf{D}_k = \mathbf{1}_{N \times N} - \mathbf{I}_{N \times N}$. Each matrix $\mathbf{D}_k$ represents the pairs of nodes, indicated by ones, for which the probabilities of mislearning the existence of the edges are equal. The probabilities of removing edges in the subsets are given in $\boldsymbol{\epsilon}_1 = \{\epsilon_{1,1}, \ldots, \epsilon_{1,K}\}$, and the probabilities of adding edges are given in $\boldsymbol{\epsilon}_2 = \{\epsilon_{2,1}, \ldots, \epsilon_{2,K}\}$. The graph error model specified by $\mathcal{D}$, $\boldsymbol{\epsilon}_1$, and $\boldsymbol{\epsilon}_2$ can then be written as
\[ \mathbf{W} = \mathbf{A} - \sum_{k=1}^{K} \mathbf{\Delta}_{\epsilon_{1,k}} \odot \mathbf{D}_k \odot \mathbf{A} + \sum_{k=1}^{K} \mathbf{\Delta}_{\epsilon_{2,k}} \odot \mathbf{D}_k \odot (\mathbf{1}_{N \times N} - \mathbf{A}). \tag{M3} \]
This construction makes it possible to build very flexible and accurate graph error models, which can adjust to essentially any type of graph topology learning errors. However, a flexible model requires a large $K$, i.e., many parameters. Therefore, in practice it is of interest to only approximate the graph error using a reasonable value of $K$. The accuracy of such an approximation then appears as a trade-off against the amount and reliability of prior information about the graph structure of a graph signal and the complexity in terms of the value of $K$, i.e., the number of parameters. It is worth noting here that the Erdős–Rényi graph-based error model (M2) is a special case of graph error model (M3). Indeed, it can be obtained by letting $K = 1$ and $\mathbf{D}_1 = \mathbf{1}_{N \times N} - \mathbf{I}_{N \times N}$.
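A sketch of model (M3) in code form, again our own illustration rather than a reference implementation:

```python
import numpy as np

def perturb_m3(A, D_list, eps1, eps2, rng):
    """Model (M3): group-wise removal/addition probabilities.
    D_list[k] is a 0/1 mask of the node pairs in group k; the masks must
    partition the off-diagonal entries. eps1[k] and eps2[k] are the removal
    and addition probabilities within group k."""
    N = A.shape[0]
    W = A.copy()
    for Dk, e1, e2 in zip(D_list, eps1, eps2):
        R = (rng.random((N, N)) < e1) & (Dk > 0)
        S = (rng.random((N, N)) < e2) & (Dk > 0)
        W = W - R * A + S * (1.0 - A)
    np.fill_diagonal(W, 0.0)
    return W

# Recovering (M2): K = 1 and D_1 = all off-diagonal pairs.
rng = np.random.default_rng(4)
A = (rng.random((8, 8)) < 0.25).astype(float); np.fill_diagonal(A, 0)
D1 = 1.0 - np.eye(8)
W = perturb_m3(A, [D1], [0.1], [0.05], rng)
```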
For further insight into the above-introduced generalized graph error model (M3), we take the SBM as an example. Let $\mathbf{C}_{k,m}$ denote an $N \times N$ matrix such that $[\mathbf{C}_{k,m}]_{i,j} = 1$ if $i \neq j$, $\sum_{l=1}^{k-1} N_l < i \leq \sum_{l=1}^{k} N_l$, and $\sum_{l=1}^{m-1} N_l < j \leq \sum_{l=1}^{m} N_l$, and $[\mathbf{C}_{k,m}]_{i,j} = 0$ otherwise. Then we can set, for example, $K = r^2$, $\mathbf{D}_1 = \mathbf{C}_{1,1}$, $\mathbf{D}_2 = \mathbf{C}_{1,2}$, $\ldots$, $\mathbf{D}_{r^2} = \mathbf{C}_{r,r}$, and we obtain a model where the probability of mislearning an edge depends on which communities the start and end nodes belong to. For such a model, the following result about the structure of the true adjacency matrix $\mathbf{A}$ and that of the mismatched one $\mathbf{W}$, as modelled by (M3), is of interest.

Proposition 3.
Assume that the true adjacency matrix $\mathbf{A}$ follows the SBM with $r$ communities, the mismatched adjacency matrix $\mathbf{W}$ follows model (M3), and $\mathbf{D}_1, \ldots, \mathbf{D}_K$ are chosen according to the SBM structure of $\mathbf{A}$. Then $\mathbf{W}$ retains the SBM with the same communities.

Proof: When all pairs of nodes for which one node belongs to the $i$th community and the other to the $j$th community of $\mathbf{A}$ are in the same set $\mathbf{D}_k$, the probability of edges between those pairs of nodes remains constant in the adjacency matrix $\mathbf{W}$ mismatched according to (M3) as well. The latter trivially means that the SBM structure of $\mathbf{A}$ is retained in $\mathbf{W}$. □

It is also interesting to note that a natural graph error model related to the previously used PPM can be derived from model (M3) by letting $K = 2$, $\mathbf{D}_1 = \sum_{k=1}^{r} \mathbf{C}_{k,k}$, and $\mathbf{D}_2 = \mathbf{1}_{N \times N} - \mathbf{I}_{N \times N} - \sum_{k=1}^{r} \mathbf{C}_{k,k}$.
F. Models for Weighted Graphs

The above-defined models can be extended to signals on weighted graphs. Then, in addition to adding and removing edges, we have to consider changes in the edge weights and also determine how to generate the weights of the added edges in a mismatched adjacency matrix.

Let us start with extending graph error model (M2) (the extension of graph error model (M1) is similar). Recall that $\mathbf{A}_0$ indicates the set of nonzero elements of the true graph adjacency matrix $\mathbf{A}$. The inaccurately learned weighted adjacency matrix can then be modelled as
\[ \mathbf{W} = \mathbf{A} + (\mathbf{1}_{N \times N} - \mathbf{\Delta}_{\epsilon_1}) \odot \mathbf{A}_0 \odot \mathbf{\Sigma}_c - \mathbf{\Delta}_{\epsilon_1} \odot \mathbf{A} + \mathbf{\Delta}_{\epsilon_2} \odot \mathbf{B} \odot (\mathbf{1}_{N \times N} - \mathbf{A}_0). \tag{M2w} \]
As in graph error model (M2), $\epsilon_1$ is the probability of erroneously removing edges, while $\epsilon_2$ is the probability of erroneously adding edges.
The second term in (M2w) perturbs the weights of the remaining edges using an $N \times N$ matrix $\mathbf{\Sigma}_c$ whose elements are drawn from a zero-mean Gaussian distribution with variance $c \cdot \sigma^2$, where $\sigma^2$ is the sample variance of the elements of $\mathbf{A}$ and $c$ is the variance multiplier parameter. The third term in (M2w) models the erroneous removal of edges, and the fourth term models the erroneous addition of edges with weights given by the matrix $\mathbf{B}$, which is an $N \times N$ matrix whose elements are derived by the same rules that were used for the edge weights of $\mathbf{A}$, if applicable, or alternatively drawn from $\mathbf{A}$ with replacement. Typically, only positive edge weights are considered in practical GSP tasks, and then, if the above-described process produces negative elements in $\mathbf{W}$, they should be set to zero.

Similarly, the weighted generalized graph error model can be written as a weighted extension of graph error model (M3), that is,
\[ \mathbf{W} = \mathbf{A} + \sum_{k=1}^{K} \Big( (\mathbf{1}_{N \times N} - \mathbf{\Delta}_{\epsilon_{1,k}}) \odot \mathbf{D}_k \odot \mathbf{A}_0 \odot \mathbf{\Sigma}_{c_k} - \mathbf{\Delta}_{\epsilon_{1,k}} \odot \mathbf{D}_k \odot \mathbf{A} + \mathbf{\Delta}_{\epsilon_{2,k}} \odot \mathbf{B}_k \odot \mathbf{D}_k \odot (\mathbf{1}_{N \times N} - \mathbf{A}_0) \Big), \tag{M3w} \]
where $\epsilon_{1,k}$, $\epsilon_{2,k}$, and $\mathbf{D}_k$, $k = 1, \ldots, K$, are defined in the same way as in Subsection II-E, and $\mathbf{B}_k$ and $\mathbf{\Sigma}_{c_k}$, $k = 1, \ldots, K$, are analogous to $\mathbf{B}$ and $\mathbf{\Sigma}_c$ above.

III. GRAPH AUTOCORRELATION

In GSP tasks such as filtering or signal recovery from samples, smoothness with respect to the chosen shift matrix is a key assumption. Therefore, smoothness is also instrumental in shift operator learning from training data, as can be seen, for example, in [15]. Graph autocorrelation is a measure of smoothness which we discuss here.

As a model for graph signals we use the moving average model, which has been used at least since [7]. The graph moving average (GMA) signal model of order $m$, denoted hereafter as GMA($m$), is an extension to graph signals of the traditional time series moving average (MA) model, and it is given as
\[ \mathbf{z} = \mathbf{y} + \sum_{l=1}^{m} \theta_l \mathbf{A}^l \mathbf{y}, \tag{2} \]
where $\mathbf{y} \triangleq [y_1, \ldots, y_N]^\top$ with $y_1, \ldots, y_N \sim \mathcal{N}(0, \sigma_y^2)$ being mutually independent Gaussian random variables with zero mean and variance $\sigma_y^2$, and $\theta_1, \ldots, \theta_m$ are the MA coefficients.

Later in the paper, we deal with a related concept for multivariate signals in the context of ICA, that is, the graph autocorrelation matrix.
In this paper, we mainly consider the GMA(1) model
\[ \mathbf{z} = \mathbf{y} + \theta_1 \mathbf{A} \mathbf{y} = \tilde{\mathbf{A}} \mathbf{y}, \tag{3} \]
where $\tilde{\mathbf{A}} = \mathbf{I}_{N \times N} + \theta_1 \mathbf{A}$.

We start with the autocovariance of a graph signal. For a centered graph signal, its autocovariance at lag $k$ with respect to $\mathbf{W}$ is defined as
\[ s_{\mathbf{z},k}(\mathbf{W}) = \frac{1}{N-k}\, \mathrm{tr}\{\mathrm{cov}(\mathbf{z}, \mathbf{W}^k \mathbf{z})\} = E\Big\{\frac{1}{N-k} \mathbf{z}^\top \mathbf{W}^k \mathbf{z}\Big\} = \frac{1}{N-k} E\{\mathbf{z}^\top \mathbf{W}^k \mathbf{z}\}. \tag{4} \]
Note here that another quadratic form, $\mathbf{z}^\top \mathbf{L} \mathbf{z}$, where $\mathbf{L}$ is the graph Laplacian, appears often in the GSP literature, for example in [15]. Since the off-diagonal elements of the Laplacian matrix are the negatives of those of the adjacency matrix, and the diagonal elements of the Laplacian are given by the node degrees whereas the diagonal elements of the adjacency matrix are zeros, the two quadratic forms, as measures of smoothness, are closely related but not fully equivalent. Large graph autocovariance values indicate smoothness, whereas in the case of the Laplacian, small values indicate smoothness. Indeed, assume that there are two adjacency matrices $\mathbf{W}_1$ and $\mathbf{W}_2$, and the matrix $\mathbf{W}_1$ gives a larger graph autocovariance value for $\mathbf{z}$ and $k = 1$ than the matrix $\mathbf{W}_2$. Then, for the corresponding Laplacian matrices $\mathbf{L}_1$ and $\mathbf{L}_2$, respectively, it holds that $\mathbf{z}^\top \mathbf{L}_1 \mathbf{z} < \mathbf{z}^\top \mathbf{L}_2 \mathbf{z}$ whenever the autocovariance values are distinct and not close to each other; only when the autocovariance values are close to each other may it sometimes happen that $\mathbf{z}^\top \mathbf{L}_1 \mathbf{z} > \mathbf{z}^\top \mathbf{L}_2 \mathbf{z}$.

For a GMA(1) signal $\mathbf{z}$ from (3), we obtain
\[ s_{\mathbf{z},k}(\mathbf{W}) = \frac{1}{N-k} E\{\mathbf{y}^\top \tilde{\mathbf{A}}^\top \mathbf{W}^k \tilde{\mathbf{A}} \mathbf{y}\} = \frac{1}{N-k} \mathrm{tr}\big\{\mathbf{W}^k E\{\tilde{\mathbf{A}} \mathbf{y} (\tilde{\mathbf{A}} \mathbf{y})^\top\}\big\} = \frac{\sigma_y^2}{N-k} \mathrm{tr}\big\{\mathbf{W}^k (\mathbf{I}_{N \times N} + \theta_1 \mathbf{A})(\mathbf{I}_{N \times N} + \theta_1 \mathbf{A}^\top)\big\}, \]
where the property $\mathrm{tr}\{\mathbf{A}\mathbf{B}\} = \mathrm{tr}\{\mathbf{B}\mathbf{A}\}$ is used in the first step, and the second step follows from the formula for the covariance matrix $E\{\mathbf{z}\mathbf{z}^\top\}$ of $\mathbf{z} = \tilde{\mathbf{A}}\mathbf{y}$; see [38]. Thus, the graph autocovariance of a GMA(1) signal is a weighted sum of the covariances between the nodes of the graph signal.

As a standardized version of the graph autocovariance, we define the graph autocorrelation, which is invariant to changes in the scales of the graph signal and the matrix $\mathbf{W}$. The following definition can then be given.
Definition 1.
The graph autocorrelation of lag $k$ with respect to the matrix $\mathbf{W}$ is
\[ r_{\mathbf{z},k}(\mathbf{W}) = \frac{E\{\mathbf{z}^\top \mathbf{W}^k \mathbf{z}\}}{\big(E\{\|\mathbf{z}\|^2\}\, E\{\|\mathbf{W}^k \mathbf{z}\|^2\}\big)^{1/2}}. \]

It is interesting to see how the graph autocorrelation in Definition 1 depends on the specific parameters of a random graph error model, for example, the parameters $\epsilon_1$ and $\epsilon_2$ of graph error model (M2). It can be seen that the graph autocorrelation is also a function of the graph adjacency matrix $\mathbf{A}$. Thus, let us also assume that $\mathbf{A} = \mathbf{\Delta}_\alpha$, i.e., $\mathbf{A}$ also follows the Erdős–Rényi model with probability parameter $\alpha$.

For the graph autocorrelation $r_{\mathbf{z},1}(\mathbf{W})$ of the GMA(1) signal $\mathbf{z}$, we would like to derive the expected value when both $\mathbf{A}$ and $\mathbf{W}$ are considered random, that is, the expected value $E_{\mathbf{A},\mathbf{W},\mathbf{z}}\{r_{\mathbf{z},1}(\mathbf{W})\}$ for lag $k = 1$. Even though this expected value is an average over Erdős–Rényi graphs $\mathbf{A}$, we have verified by extensive simulations that the graph autocorrelation values are very similar for any specific fixed realization of $\mathbf{A}$. The expected graph autocorrelation value can then be expressed in terms of the essential parameters only, such as the GMA coefficient $\theta_1$, the probability parameter $\alpha$ in the graph model $\mathbf{A} = \mathbf{\Delta}_\alpha$, and, for example, the parameters $\epsilon_1$ and $\epsilon_2$ in the graph error model (M2). The expected value $E_{\mathbf{A},\mathbf{W},\mathbf{z}}\{r_{\mathbf{z},1}(\mathbf{W})\}$ can be used, for example, to verify that a GMA signal is smoother with respect to its own adjacency matrix than with respect to a mismatched adjacency matrix.

Example: Expected value of the graph autocorrelation at $k = 1$ for GMA(1) and graph error model (M2). We approximate the expected value of the ratio in $r_{\mathbf{z},1}(\mathbf{W})$ by the ratio of expected values, and the expected value of the product in the denominator by the product of the expected values. We start by rescaling the matrices, $\mathbf{A}_a = a\mathbf{A}$ and $\mathbf{W}_w = w\mathbf{W}$, and choosing the variance $\sigma_y^2$ so that the approximation
\[ E_{\mathbf{A},\mathbf{W},\mathbf{z}}\{r_{\mathbf{z},1}(\mathbf{W})\} \approx \frac{1}{N} E_{\mathbf{A}_a,\mathbf{W}_w,\mathbf{z}}\{\mathbf{z}^\top \mathbf{W}_w \mathbf{z}\} \]
holds. For this, $a$, $w$, and $\sigma_y^2$ need to be chosen so that the three equations below are satisfied.

Even though the elements are not independent, it will be seen in the simulations that this approximation is accurate.
\[ \frac{1}{N} E\{\|\mathbf{z}\|^2\} = 1, \qquad \frac{1}{N} E\{\|\mathbf{A}_a \mathbf{z}\|^2\} = 1, \qquad \frac{1}{N} E\{\|\mathbf{W}_w \mathbf{z}\|^2\} = 1. \tag{5} \]
For given values of $N$, $\alpha$, $\epsilon_1$, and $\epsilon_2$, the only unknowns in (5) are $a$, $w$, and $\sigma_y^2$. Hence we can find $a$, $w$, and $\sigma_y^2$ by simply solving the system of three equations (5). Notice that only the third equation includes $w$, which implies that $a$ and $\sigma_y^2$ can be found from the first two equations. However, the expectations in (5) need to be calculated first. For calculating the expected values in (5), the fact that $E\{y_i y_j\} = \sigma_y^2$ if $i = j$ and $E\{y_i y_j\} = 0$ otherwise can be used, together with the assumption that the elements of the matrices $\mathbf{\Delta}_\alpha$, $\mathbf{\Delta}_{\epsilon_1}$, and $\mathbf{\Delta}_{\epsilon_2}$ are independent within each matrix as well as between the matrices.

The parameter $w$ can then be found from the last equation in (5) in closed form as $w = \big(\sigma_y^2\, g(\alpha, N, \theta_1, a, \epsilon_1, \epsilon_2)\big)^{-1/2}$, where $g$ is a polynomial in its arguments whose full expression is lengthy, and the parameters $a$ and $\sigma_y^2$ can be found by solving the system of the two remaining equations, which, after the expectations are computed, are polynomial in $a$ and linear in $\sigma_y^2$.

Finally, for $\mathbf{W}$ from (M2) with parameters $\epsilon_1$ and $\epsilon_2$, we find the closed-form approximate expression for the expected autocorrelation value of the graph signal $\mathbf{z}$ as a function of $\epsilon_1$ and $\epsilon_2$:
\[ E_{\mathbf{A},\mathbf{W},\mathbf{z}}\{r_{\mathbf{z},1}(\mathbf{W})\} \approx \sigma_y^2 \big( \theta_1 a w N (\alpha - \alpha^2) - \alpha^2 w \big)(1 - \epsilon_1 - \epsilon_2) - w \epsilon_2. \tag{6} \]
The analysis of (6) is non-trivial because, for example, $w$ depends on all the other parameters. Thus, we leave it to the numerical study in the next section. For more details, see Appendix B.
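The behavior of the expected graph autocorrelation under (M2) is easy to probe by Monte Carlo. The following self-contained sketch (our own construction, prefiguring Study Case 1 below) generates GMA(1) signals on a directed Erdős–Rényi graph, perturbs the graph with (M2), and averages the sample version of the autocorrelation of Definition 1.

```python
import numpy as np

rng = np.random.default_rng(5)
N, alpha, theta, eps1, eps2, reps = 200, 0.1, 0.5, 0.1, 0.05, 500

def er(N, p):  # directed Erdos-Renyi adjacency matrix
    D = (rng.random((N, N)) < p).astype(float)
    np.fill_diagonal(D, 0.0)
    return D

vals = []
for _ in range(reps):
    A = er(N, alpha)
    # model (M2): remove existing edges w.p. eps1, add new ones w.p. eps2
    W = A - er(N, eps1) * A + er(N, eps2) * (1.0 - np.eye(N) - A)
    y = rng.standard_normal(N)
    z = y + theta * A @ y                   # GMA(1) signal, eq. (3)
    Wz = W @ z
    # sample version of the graph autocorrelation of Definition 1 (k = 1)
    vals.append((z @ Wz) / np.sqrt((z @ z) * (Wz @ Wz)))

print(np.mean(vals))   # decreases as eps1 or eps2 grows (cf. Figs. 1 and 2)
```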
IV. NUMERICAL STUDY
A. Study Case 1: Graph Autocorrelation of GMA(1) Signal for the Mismatched Adjacency Matrix
In our first example, we examine how the expected graph autocorrelation of the GMA(1) signal derived in (6) changes when the matrix with respect to which it is computed is gradually changed from the adjacency matrix of the signal according to graph error model (M2). We assume that $\mathbf{z}$ is a GMA(1) signal with an unweighted Erdős–Rényi adjacency matrix $\mathbf{A}$ with probability $\alpha$ and coefficient $\theta_1$ as parameters.

Figs. 1 and 2 illustrate the behavior of the graph autocorrelation as a function of $\epsilon_1$ and $\epsilon_2$, respectively, when one of them varies and the other one is kept fixed. Both the theoretical values and the averages of 2000 graph autocorrelations from simulated datasets are shown. For each run, a new Erdős–Rényi adjacency matrix $\mathbf{A}$ is generated with $N = 500$ nodes and fixed values of the parameter $\alpha$ and the GMA coefficient $\theta_1$. The choice of the variance parameter value $\sigma_y^2$ has no effect on the results because of the standardization in the graph autocorrelation (see Definition 1).

The figures show that the graph autocorrelation is decreasing with respect to both $\epsilon_1$ and $\epsilon_2$, which suggests that the graph autocorrelation is useful for estimating the adjacency matrix even if we only have a single realization of the graph signal.
As in any other GSP task, the adjacency matrix has a key role in graph filtering. In fact, any linear and shift-invariant filter (one that commutes with other shift-invariant filters) can be written as a polynomial of the form [1]
\[ F_{\mathrm{GMA}}(\mathbf{W}) = h_0 \mathbf{I}_{N \times N} + h_1 \mathbf{W} + \cdots + h_K \mathbf{W}^K, \]
where $K$ is smaller than or equal to the degree of the minimal polynomial of $\mathbf{W}$. The coefficients $h_0, \ldots, h_K$ can be found by using least squares to solve the system of equations
\[ h_0 + h_1 \lambda_1 + \cdots + h_K \lambda_1^K = \alpha_1, \quad \ldots, \quad h_0 + h_1 \lambda_N + \cdots + h_K \lambda_N^K = \alpha_N, \]
where $\lambda_1, \ldots, \lambda_N$ are the eigenvalues of $\mathbf{W}$ ordered in increasing order of their distance to the maximum magnitude, i.e., $d_n = |\max\{|\lambda_1|, \ldots, |\lambda_N|\} - \lambda_n|$. Then $\lambda_1, \ldots, \lambda_N$ can be interpreted as graph frequencies from lowest to highest [39], and $\alpha_1, \ldots, \alpha_N$ are the corresponding frequency responses.

Fig. 1. Theoretical (solid) and numerical (dashed) values of the graph autocorrelation $E\{r_{\mathbf{z},1}(\mathbf{W})\}$ of order $k = 1$ as a function of the probability $\epsilon_1$ of removing an edge. The lines from top to bottom correspond to increasing fixed values of $\epsilon_2$, starting from $\epsilon_2 = 0$.

Fig. 2. Theoretical (solid) and simulated (dashed) values of the graph autocorrelation $E\{r_{\mathbf{z},1}(\mathbf{W})\}$ of order $k = 1$ as a function of the probability $\epsilon_2$ of adding an edge. The lines from top to bottom correspond to increasing fixed values of $\epsilon_1$, starting from $\epsilon_1 = 0$.

In this example, we study the filter sensitivity to the choice of the adjacency matrix, when the high-pass filter is defined by the frequency responses
\[ \alpha_n = \begin{cases} 1, & \text{if } d_n > \mathrm{median}\{d_1, \ldots, d_N\}, \\ 0, & \text{otherwise}, \end{cases} \]
and it is applied to detecting malfunctioning sensors as in [39]. The data are the daily temperatures in the year 2016 from 150 Finnish weather stations [40]. Stations that had at most one missing observation were selected, and since those missing observations were from the same day in November, after dropping that day we have clean data for 365 days (2016 was a leap year). The technique to detect outlying measurements, presented in [39], is to high-pass filter the data and threshold the Fourier transform coefficients of the output by the maximum absolute value of the GFT coefficients from the three previous days. If any of the coefficients exceeds the threshold value, it is diagnosed that at least one of the sensors is not working properly.
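A least-squares design of the polynomial filter described above can be sketched as follows; this is a minimal illustration under our own conventions (a Vandermonde basis and `numpy.linalg.lstsq` do the work), assuming a (near-)real spectrum, e.g., a symmetric $\mathbf{W}$.

```python
import numpy as np

def design_polynomial_filter(W, alphas, K):
    """Least-squares fit of h_0, ..., h_K so that the filter's frequency
    response at each eigenvalue of W matches the desired response alphas
    (given in graph-frequency order, lowest frequency first)."""
    lam = np.linalg.eigvals(W).real           # assume (near-)real spectrum
    d = np.abs(np.max(np.abs(lam)) - lam)     # distance to max magnitude
    lam = lam[np.argsort(d)]                  # low graph frequency first
    V = np.vander(lam, K + 1, increasing=True)   # Vandermonde system
    h, *_ = np.linalg.lstsq(V, alphas, rcond=None)
    return h

def apply_filter(W, h, x):
    """Evaluate (h_0 I + h_1 W + ... + h_K W^K) x via Horner's scheme."""
    y = h[-1] * x
    for hk in h[-2::-1]:
        y = W @ y + hk * x
    return y
```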
In this example, there is no obvious choice of a benchmark adjacency matrix, but we choose a weighted 6-nearest neighbors graph as in [39], with edge weights
\[ a_{kl} = \frac{e^{-(d_{kl}/c_0)^2}}{\sqrt{\sum_{j \in \mathcal{N}_k} e^{-(d_{kj}/c_0)^2} \sum_{j \in \mathcal{N}_l} e^{-(d_{lj}/c_0)^2}}}, \tag{7} \]
where $d_{kl}$ is the distance between the locations of the $k$th and $l$th sensors in kilometers and $c_0$ is a scaling constant. We use the generalized graph error model, because it is not sensible to connect two stations that are far away from each other. We set a threshold of 250 kilometers, above which a pair of stations is never connected. Otherwise, the probabilities of removing and creating connections have constant values $\epsilon_1$ and $\epsilon_2$ between the pairs. Hence, in graph error model (M3w), which is applicable here, we have the matrices
\[ [\mathbf{D}_1]_{kl} = \begin{cases} 1, & \text{if } d_{kl} \leq 250, \\ 0, & \text{otherwise}, \end{cases} \qquad [\mathbf{D}_2]_{kl} = \begin{cases} 1, & \text{if } d_{kl} > 250, \\ 0, & \text{otherwise}, \end{cases} \]
and the probabilities $\epsilon_{1,1} = \epsilon_1$, $\epsilon_{2,1} = \epsilon_2$, $\epsilon_{1,2} = \epsilon_{2,2} = 0$.

By perturbing the edge weights, we deviate slightly from graph error model (M3w), in that the variance of the Gaussian component related to the incoming edges of a given node is the variance multiplier $c$ times the variance of the incoming edge weights of that node in the 6-nearest neighbors graph, rather than the variance of the edge weights of the whole 6-nearest neighbors graph. We do not allow negative weights, but use zero weights instead.

In the experiment, we change one sensor value at a time to +20 degrees Celsius and perform the test described above using adjacency matrices given by all combinations of four edge-removing probabilities $\epsilon_1$, four edge-adding probabilities $\epsilon_2$, and four variance parameter values $c$, in each case starting from zero (see Fig. 3).

Finding the malfunctioning sensors can be seen as a binary classification task. Theoretically, all classifiers (the same method with different graphs) have an equal proportion of false positives of about 25%, because the maximum GFT coefficient of the current day is larger than the maximum of the GFT coefficients from the three previous days one out of four times when there are no outliers. Hence, we can focus on the accuracy in finding the true positives. The benchmark adjacency matrix, given by $\epsilon_1 = \epsilon_2 = c = 0$, provides the reference accuracy in finding the malfunctioning sensors. For the other combinations, 200 realizations of the adjacency matrix are generated.

In the other examples, there always exists an obvious choice of the benchmark adjacency matrix.
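A sketch of the benchmark graph construction and of the distance-based groups follows (our own code; `dists` is assumed to be the station-to-station distance matrix in kilometers, and `c0` stands in for the scaling constant of (7)).

```python
import numpy as np

def knn_weighted_graph(dists, k=6, c0=100.0):
    """Symmetric weighted k-nearest-neighbor graph with weights as in (7).
    c0 is an assumed scaling constant of the squared-exponential kernel."""
    N = dists.shape[0]
    kern = np.exp(-(dists / c0) ** 2)
    A = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dists[i])[1:k + 1]      # skip self (distance 0)
        A[i, nbrs] = 1.0
    A = np.maximum(A, A.T)                         # symmetrize the k-NN mask
    # per-node normalizers: kernel sums over each node's neighborhood
    denom = np.array([kern[i, A[i] > 0].sum() for i in range(N)])
    return A * kern / np.sqrt(np.outer(denom, denom))

def distance_groups(dists, thr=250.0):
    """Masks D1 (pairs within thr km) and D2 (pairs farther apart)."""
    off = 1.0 - np.eye(dists.shape[0])
    return (dists <= thr) * off, (dists > thr) * off
```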
In Fig. 3, we give the average shares of correct decisions for different values of each parameter of interest when the other parameters are zero. There are no visible patterns of interaction between the parameters, i.e., the shapes of the curves look identical at all levels of the other two parameters. The results show that adding edges to the graph has no effect, but removing edges weakens the performance. The perturbation of the weights also has a negative impact, though the reduction is significant only for values of $c$ from 0 to 0.005, and for the larger values of $c$ the numbers are approximately the same. This suggests that the number of edges in the 6-nearest neighbors graph might be too small for these purposes, but that the edge weights are good as they are, because small changes in them worsen the performance. The reason why the addition of edges has a smaller effect can be partly explained by the fact that the added edges have smaller weights than the edges in the original nearest neighbors graph.

C. Study Case 3: Graph Error Effect on Graph Autoregressive Moving Average Filter
The first-order graph autoregressive moving average (GARMA) filter [41] has coefficients $\varphi$, $\psi$, and $c$, and uses some graph Laplacian matrix $\mathbf{L}$. We use here the translated normalized Laplacian
\[ \mathbf{L} = -\mathbf{T}^{-1/2} \mathbf{W} \mathbf{T}^{-1/2}, \]
where $\mathbf{T}$ is the diagonal matrix of node degrees and $\mathbf{W}$ is a symmetric adjacency matrix. Note that the eigenvalues of $\mathbf{L}$ are in the interval $[-1, 1]$, but, as in the GMA filter case, they can be very different for the true adjacency matrix $\mathbf{A}$ and the estimated one $\mathbf{W}$.

The output of the GARMA(1) filter is given by $\mathbf{z} = \mathbf{y} + c\mathbf{x}$, where $\mathbf{x}$ is the input and $\mathbf{y}$ is the state after convergence of the recursion $\mathbf{y}_{t+1} = \varphi \mathbf{L} \mathbf{y}_t + \psi \mathbf{x}$. A GARMA filter of order $K$ can be constructed from $K$ parallel GARMA(1) filters, whose coefficients are designed based on the eigenvalues of $\mathbf{L}$.

Again, the errors in the adjacency matrix $\mathbf{W}$ have a significant effect on the filter directly via $\mathbf{L}$, but also via the distortion of the spectrum of $\mathbf{L}$. The errors can then affect even the stability of the GARMA filter.

We follow the simulation setup in [25]. Undirected and unweighted graphs are created by generating $N = 100$ random points in the area $[0,1] \times [0,1]$ using the uniform distribution, and connecting two points if the distance between them is below a fixed connectivity radius, as in [25].
Fig. 3. The share of correctly detected outlying sensor values as a function of the edge-removing probability $\epsilon_1$, the edge-adding probability $\epsilon_2$, and the variance multiplier $c$ (each varied with the other two set to zero), for filter orders $K = 3$ and $K = 10$.

Let $\mathbf{L} = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^\top$ be the eigendecomposition of the translated normalized graph Laplacian, and let $\lambda_n$ and $\mathbf{v}_n$ denote the $n$th diagonal element of $\mathbf{\Lambda}$ and the $n$th column vector of $\mathbf{V}$, i.e., the $n$th eigenvalue and eigenvector, respectively. The eigenvalues are in the interval $[-1, 1]$. The graph signal is then given by $\mathbf{x} = \bar{\mathbf{x}} + \mathbf{n}$, where $\bar{\mathbf{x}}$ is a low-frequency signal satisfying $\bar{\mathbf{x}}^\top \mathbf{v}_n = 1$ if $\lambda_n$ is below a given cutoff and $\bar{\mathbf{x}}^\top \mathbf{v}_n = 0$ otherwise, and $\mathbf{n}$ is Gaussian noise with zero mean and a scaled identity covariance matrix.

Frequency responses of GARMA graph filters of orders $K = 1, 3, 5, 7$ are designed to match the frequency content of the signal $\bar{\mathbf{x}}$. The outputs of the filters, denoted by $\mathbf{z}^{(e)}$, are compared to the output of the ideal filter, denoted by $\mathbf{z}^{(d)}$. The performance is then measured using the square root of the mean square error, $\sigma_e = [\mathrm{tr}(\mathbf{e}\mathbf{e}^\top)/N]^{1/2}$, where $\mathbf{e} = \mathbf{z}^{(e)} - \mathbf{z}^{(d)}$.

The values in Figs. 4 and 5 are averages over 2000 runs for each pair $(\epsilon_1, \epsilon_2)$ in graph error model (M2), which is applicable here. The results show that higher-order GARMA filters are more accurate when the correct adjacency matrix is used. On the other hand, the lower-order filters are more robust to graph errors, and thus all filters perform almost equally when the error probabilities grow.
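For completeness, a minimal sketch of the GARMA(1) recursion described above (our own implementation; the iteration count and the illustrative coefficients are assumptions, and parallel first-order sections would be summed for higher orders):

```python
import numpy as np

def translated_normalized_laplacian(W):
    """L = -T^{-1/2} W T^{-1/2}, with T the diagonal degree matrix;
    its eigenvalues lie in [-1, 1]."""
    d = W.sum(axis=1)
    Tinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return -Tinv @ W @ Tinv

def garma1(L, x, phi, psi, c, iters=200):
    """First-order GARMA filter: iterate y_{t+1} = phi * L @ y_t + psi * x
    to (approximate) convergence and return z = y + c * x.
    The recursion converges when |phi| times the spectral radius of L
    is below one."""
    y = np.zeros_like(x)
    for _ in range(iters):
        y = phi * (L @ y) + psi * x
    return y + c * x
```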
Fig. 4. Averages of $\sigma_e$ over 2000 repetitions for GARMA filter orders $K = 1, 3, 5, 7$ as a function of the probability $\epsilon_1$ in graph error model (M2), when $\epsilon_2 = 0$.
Fig. 5. Averages of $\sigma_e$ over 2000 repetitions for GARMA filter orders $K = 1, 3, 5, 7$ as a function of the probability $\epsilon_2$ in graph error model (M2), when $\epsilon_1 = 0$.
D. Study Case 4: Graph Error Effect on ICA of Graph Signals

In this example, we use both a numerical study and theoretical results to investigate the effect of non-optimal knowledge of the signal graph on the performance of GraDe [7], [42] for graph signal separation in the ICA problem.

Let $\mathbf{X} \in \mathbb{R}^{P \times N}$ denote a centered $P$-dimensional graph signal generated as a mixture of independent components according to the model [42]
\[ \mathbf{X} = \mathbf{\Omega} \mathbf{Z}, \]
where $\mathbf{\Omega} \in \mathbb{R}^{P \times P}$ is a full-rank mixing matrix and $\mathbf{Z} \in \mathbb{R}^{P \times N}$ is the matrix of $P$ mutually independent graph signals with zero means and unit variances.
The goal of ICA is to estimate the unmixing matrix $\mathbf{\Gamma} = \mathbf{\Omega}^{-1}$ using only the signal matrix $\mathbf{X}$. Let $\mathbf{X}_w = \hat{\mathbf{S}}^{-1/2} \mathbf{X}$ be the whitened signals, where $\hat{\mathbf{S}}$ is the sample covariance matrix of $\mathbf{X}$. In GraDe, the unmixing matrix estimate is obtained by diagonalizing/jointly diagonalizing one or more graph autocorrelation matrices, given as
\[ \hat{\mathbf{S}}_k(\mathbf{W}) = \frac{1}{N - k} \mathbf{X}_w \mathbf{W}^k \mathbf{X}_w^\top, \quad k = 1, \ldots, K, \]
i.e., by finding the orthogonal $\mathbf{U}$ which maximizes the objective function
\[ \sum_{k=1}^{K} \big\|\mathrm{diag}\big(\mathbf{U} \hat{\mathbf{S}}_k(\mathbf{W}) \mathbf{U}^\top\big)\big\|^2. \]
The unmixing matrix estimate is then given as $\hat{\mathbf{\Gamma}} = \mathbf{U} \hat{\mathbf{S}}^{-1/2}$. A fast algorithm for the joint diagonalization is available in [43], and it is applicable when the shift matrix $\mathbf{W}$ is chosen to be symmetric or the graph autocorrelation matrices are symmetrized. The unmixing matrix estimate for an inaccurately learned adjacency matrix $\mathbf{W}$ is denoted as $\hat{\mathbf{\Gamma}}(\mathbf{W})$. Notice that GraDe reduces to the well-known second-order blind identification (SOBI) estimator [44] when $\mathbf{W}$ is the cyclic graph.

We will use the following asymptotic result (see [45] for details), derived in the context of the SOBI estimator, for an unmixing matrix estimate $\hat{\mathbf{\Gamma}}$ obtained using joint diagonalization of the matrices $\hat{\mathbf{S}}_1, \ldots, \hat{\mathbf{S}}_K$. When $\mathbf{\Omega} = \mathbf{I}_{P \times P}$, for $i \neq j$, we have
\[ \sqrt{N}(\hat{\gamma}_{ii} - 1) = -\frac{1}{2}\sqrt{N}\big([\hat{\mathbf{S}}]_{ii} - 1\big) + o_p(1) \]
and
\[ \sqrt{N}\, \hat{\gamma}_{ij} = \frac{\sum_k (\lambda_{ki} - \lambda_{kj})\big(\sqrt{N} [\hat{\mathbf{S}}_k]_{ij} - \lambda_{ki} \sqrt{N} [\hat{\mathbf{S}}]_{ij}\big)}{\sum_k (\lambda_{ki} - \lambda_{kj})^2} + o_p(1), \tag{8} \]
where $\lambda_{ki} \triangleq E\{[\mathbf{S}_k]_{ii}\}$ and $o_p(1)$ stands for negligible terms. The diagonal elements of $\hat{\mathbf{\Gamma}}$ do not depend asymptotically on $\hat{\mathbf{S}}_1, \ldots, \hat{\mathbf{S}}_K$, and thus, in the case of graph signal ICA, do not depend on $\mathbf{W}$. Therefore, the sum of the variances of the off-diagonal elements,
\[ \mathrm{SOV}\big(\hat{\mathbf{\Gamma}}(\mathbf{W})\big) = N \sum_{j \neq i} \mathrm{var}\big([\hat{\mathbf{\Gamma}}(\mathbf{W})]_{ij}\big), \tag{9} \]
can be used when comparing the separation efficiencies induced by different choices of $\mathbf{W}$. We will use the ratio of the sums, given as
\[ R(\mathbf{W}_1, \mathbf{W}_2) = \frac{\mathrm{SOV}(\hat{\mathbf{\Gamma}}(\mathbf{W}_1))}{\mathrm{SOV}(\hat{\mathbf{\Gamma}}(\mathbf{W}_2))}. \]
In particular, $\mathbf{W}_1$ is the adjacency matrix which is used for generating the source components and $\mathbf{W}_2$ is a perturbed version of the adjacency matrix. Using the ratio of the sums of variances instead of the sums themselves helps to visualize the efficiency losses.
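For $K = 1$, the joint diagonalization reduces to a single eigendecomposition. A minimal GraDe sketch under that simplification (our own code, symmetrizing the autocorrelation matrix as mentioned above):

```python
import numpy as np

def grade_k1(X, W):
    """GraDe unmixing estimate with K = 1 graph autocorrelation matrix.
    X is P x N (centered), W is an N x N shift matrix."""
    P, N = X.shape
    S = X @ X.T / N                                   # sample covariance
    evals, evecs = np.linalg.eigh(S)
    S_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Xw = S_isqrt @ X                                  # whitened signals
    S1 = Xw @ W @ Xw.T / (N - 1)                      # graph autocorrelation
    S1 = (S1 + S1.T) / 2                              # symmetrize
    _, U = np.linalg.eigh(S1)                         # diagonalizer for K = 1
    return U.T @ S_isqrt                              # unmixing estimate
```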
We consider an ICA model where the independent components are GMA(1) signals with a symmetric and unweighted adjacency matrix $\mathbf{A}$, and evaluate the performance of the GraDe estimate with $K = 1$ and the matrix $\mathbf{W}$, which is a mismatched version of $\mathbf{A}$ obtained according to graph error model (M2) for undirected graphs. In this model, the asymptotic variances of $\hat{\mathbf{\Gamma}}(\mathbf{W})$, which are needed for computing $\mathrm{SOV}(\hat{\mathbf{\Gamma}}(\mathbf{W}))$, can be calculated using (8). Expressions for these asymptotic variances can be derived in the same way as the corresponding formulas for the variances in the time series context in [45].

The performance in the numerical simulations is measured using the minimum distance (MD) index [46],
\[ D(\hat{\mathbf{\Gamma}}) \triangleq \frac{1}{\sqrt{P - 1}} \inf_{\mathbf{C} \in \mathcal{C}} \|\mathbf{C} \hat{\mathbf{\Gamma}} \mathbf{\Omega} - \mathbf{I}_{P \times P}\|, \]
where $\mathcal{C} \triangleq \{\mathbf{C} : \text{each row and column of } \mathbf{C} \text{ has exactly one non-zero element}\}$. The MD index takes values between zero and one, and it is invariant with respect to the mixing matrix. Also, there is a connection between the minimum distance index and the sum of the variances of the off-diagonal elements when $\mathbf{\Omega} = \mathbf{I}_{P \times P}$, given as
\[ N(P - 1)\, E\{D^2(\hat{\mathbf{\Gamma}})\} \to \mathrm{SOV}(\hat{\mathbf{\Gamma}}), \quad \text{as } N \to \infty, \tag{10} \]
where SOV is defined in (9).

For two sets of estimates, obtained with $\mathbf{W}_1$ and $\mathbf{W}_2$, we define
\[ \hat{R}(\mathbf{W}_1, \mathbf{W}_2) = \frac{\mathrm{ave}\{D^2(\hat{\mathbf{\Gamma}}(\mathbf{W}_1))\}}{\mathrm{ave}\{D^2(\hat{\mathbf{\Gamma}}(\mathbf{W}_2))\}}, \]
where the averages are taken over 1000 Monte Carlo trials. Equation (10) implies that $\hat{R}(\mathbf{W}_1, \mathbf{W}_2) \approx R(\mathbf{W}_1, \mathbf{W}_2)$ for large $N$.

Erdős–Rényi matrices $\mathbf{A} = \mathbf{\Delta}_\alpha$ with different values of $\alpha$ are used as the adjacency matrices of the GMA signals. The estimate $\hat{\mathbf{\Gamma}}(\mathbf{A})$ (with the true $\mathbf{A}$) is a natural benchmark to which we compare the estimates obtained using $\mathbf{W}$.

Indeed, the extension of the asymptotic variance expressions in [45] from the time series context to the context of graph signals is straightforward, but the formulas are too long to be presented here because of space limitations.
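The MD index can be computed exactly: for a fixed permutation the optimal row scaling has a closed form, and the permutation itself is a linear assignment problem. A sketch of one way to do this (our own implementation, using `scipy.optimize.linear_sum_assignment`):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def md_index(Gamma_hat, Omega):
    """Minimum distance index D(Gamma_hat) of [46]: the infimum over
    matrices C with one nonzero entry per row and column of
    ||C Gamma_hat Omega - I|| / sqrt(P - 1)."""
    G = Gamma_hat @ Omega
    P = G.shape[0]
    # For row i assigned to source j, the optimally scaled residual is
    # 1 - G[i, j]**2 / ||G[i, :]||^2, so we maximize the subtracted ratios.
    ratio = G ** 2 / np.sum(G ** 2, axis=1, keepdims=True)
    rows, cols = linear_sum_assignment(-ratio)    # maximize the total ratio
    return np.sqrt((P - np.sum(ratio[rows, cols])) / (P - 1))
```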
TABLE I. $R(\mathbf{A}, \mathbf{W})$ for $\mathbf{A}$ with fixed $\alpha$ and $\mathbf{W}$ given by a grid of probabilities $\epsilon_1$ and $\epsilon_2$ starting from zero.
TABLE II. $\hat{R}(\mathbf{A}, \mathbf{W})$ from 1000 repetitions for the same grid of probabilities $\epsilon_1$ and $\epsilon_2$.

In Tables I and II, the values of $R(\mathbf{A}, \mathbf{W})$ and $\hat{R}(\mathbf{A}, \mathbf{W})$ are shown, respectively, when $\mathbf{A}$ is generated with a fixed $\alpha$ and there are $P = 4$ independent components generated using (2) with four different values of $\theta_1$, one of them being zero. For Table II, we generate 1000 datasets for each pair $(\epsilon_1, \epsilon_2)$ and always generate a new $\mathbf{W}$. In Table I, the sum of variances is an average over ten $\mathbf{W}$'s, even though $\mathrm{SOV}(\hat{\mathbf{\Gamma}}(\mathbf{W}))$ is quite stable for fixed $\epsilon_1$ and $\epsilon_2$. The simulation results match the theoretical values quite well. When looking at the results with respect to the error probabilities $\epsilon_1$ and $\epsilon_2$, it seems that GraDe is more sensitive to adding irrelevant edges than to missing real edges. Notice, however, that since $\alpha$ is small, a given value of $\epsilon_1$ corresponds, in terms of the number of mislearned edges, to a considerably smaller value of $\epsilon_2$.

For four selected pairs $(\epsilon_1, \epsilon_2)$, Fig. 6 plots $R(\mathbf{A}, \mathbf{W})$ as a function of the $\alpha$ used in creating $\mathbf{A}$. The curves display the averages of ten values given by different $\mathbf{W}$'s. As expected, the efficiency loss caused by inaccuracy in the adjacency matrix is larger the sparser the graph is.

Fig. 6. Ratio of the theoretical variances, $R(\mathbf{A}, \mathbf{W})$, as a function of $\alpha$ for four choices of $(\epsilon_1, \epsilon_2)$: $(0.05, 0.025)$, $(0.1, 0.05)$, $(0.2, 0.075)$, and $(0.4, 0.1)$.

V. CONCLUSION AND DISCUSSION

In this paper, the effect of graph adjacency matrix mismatch has been analyzed, and graph error models for different types of graphs, e.g., directed/undirected and weighted/unweighted graphs, have been developed. The complexity of the error models varies from a simple Erdős–Rényi type model to one which captures nonconstant edge mislearning probabilities. The latter is of interest because it has been reported in the GSP literature that deleting one edge can have an immensely larger effect than deleting another edge. Better understanding and formalization of which kinds of edges are important, and of why connecting some pairs of nodes is more harmful than others, is therefore crucial. The graph error models have been applied to studying graph error effects in graph signal filtering and graph signal ICA applications, where both theoretical arguments and numerical studies with real and synthetic data have been used.
The results for the different application examples differed in whether missing or extra links were more detrimental, and in the GARMA filter example it was observed that the higher-order filters were more sensitive to graph errors. These findings suggest that graph error effects need to be studied case by case, and that competing GSP methods may differ in terms of their robustness to graph errors. Therefore, studying the robustness properties of GSP methods, as well as developing robust GSP methods, is of high importance for GSP in general.
APPENDIX A

PROOF OF THEOREM 1
A. Introduction to Graphons
Limits of sequences (as the number of nodes tends to infinity) of dense graphs (graphs in which each node is connected to a positive percentage of the other nodes) cannot easily be defined using graphs with a countable set of nodes, since there is no uniform distribution on countably infinite sets. Hence, an uncountable cardinality of nodes is needed in order to have a natural limit object for graph sequences. In this context, a graphon $W$ is interpreted as a weighted graph with an uncountable set of nodes, with $W(x,y)$ giving the edge weight between nodes $x$ and $y$. In [47], centrality measures of graphons were defined, and they were shown to be limit objects of the centrality measures of finite-size graphs.

Another role of a graphon is as the kernel in the kernel graph model, which consists of the triple $G = (N, W, \mu)$, where $N$ is the number of nodes, $W$ is a graphon (or kernel), and $\mu$ is a mapping from a probability space to the space $[0,1]^N$. A realization of a kernel graph is obtained by generating values $u_1, \ldots, u_N$ from $\mu$. Then the $i$th and the $j$th nodes are connected with probability $W(u_i, u_j)$, independently of the other pairs. Thus, the graphon can be seen here as a sort of density function for the graph structure. Typically, $\mu$ is chosen to have the uniform density on $[0,1]^N$.

Some variants of the kernel graph model have been discussed in the literature. For example, $N$ can be derived through a random process, or $\mu = \mu_D$ can be taken as a deterministic function that gives the sequence $u_i = i/N$ for $i = 1, \ldots, N$ with probability one. For example, the Erdős–Rényi model with probability parameter $\epsilon$, which is of interest here, is obtained by choosing the constant graphon $W(x,y) = \epsilon$ for all $0 \leq x, y \leq 1$ and any choice of $\mu$.
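A kernel graph realization is straightforward to sample; the sketch below (our own code) draws $u_i$ uniformly and connects pairs with probability $W(u_i, u_j)$, so that the constant graphon $W(x, y) = \epsilon$ reproduces the Erdős–Rényi model, and the second example uses the exponential graphon discussed next (with assumed parameters $\beta_1 = \beta_2 = 1$).

```python
import numpy as np

def sample_kernel_graph(N, graphon, rng):
    """Sample an undirected kernel graph: u_i ~ Uniform[0, 1], and nodes
    i, j are connected with probability graphon(u_i, u_j)."""
    u = rng.random(N)
    probs = graphon(u[:, None], u[None, :])   # N x N matrix W(u_i, u_j)
    A = np.triu((rng.random((N, N)) < probs), 1).astype(float)
    return A + A.T

rng = np.random.default_rng(6)
er = sample_kernel_graph(300, lambda x, y: 0.1 + 0.0 * x * y, rng)   # W = 0.1
expo = sample_kernel_graph(300, lambda x, y: np.exp(-(x + y + 1.0)), rng)
```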
The SBM with fixed community sizes N₁, . . . , N_r is given by the deterministic µ_D and W = W_SB satisfying W_SB(x, y) = p_{k,l} when Σ_{i=1}^{k−1} N_i/N < x ≤ Σ_{i=1}^{k} N_i/N and Σ_{i=1}^{l−1} N_i/N < y ≤ Σ_{i=1}^{l} N_i/N, where Σ_{i=1}^{0} N_i = 0. When µ has the uniform distribution and W = W_SB, we have an SBM with random community sizes, the expected sizes being N₁, . . . , N_r.

In the kernel graph model, the expected degree of a random node is

\[
(N - 1)\int\!\!\int W(x, y)\, d\mu(x)\, d\mu(y),
\]

and the range of expected degrees is

\[
(N - 1)\int W(x, y)\, d\mu(y), \quad x \in [0, 1].
\]

While the stochastic block model allows for a very heterogeneous degree distribution with the extreme setup r = N and N₁ = · · · = N_N = 1, such a construction is quite cumbersome. It is therefore of interest to use continuous graphons with only a few parameters, such as the exponential model [33], where the graphon is of the form

\[
W(x, y) = e^{-(\beta_1 (x + y) + \beta_0)}
\]

with β₀, β₁ ≥ 0. The parameter β₀ controls the sparsity of the graph: the larger the value of β₀, the sparser the graph. Moreover, increasing the parameter β₁ makes the degree distribution more heterogeneous.

The eigenvalues of large normalized and unweighted graph adjacency matrices can then be approximated in terms of the eigenvalues and eigenfunctions of the kernel [35]. The eigenvalues λ and eigenfunctions f of the kernel corresponding to graphon W can be found by solving the equation

\[
\lambda f(x) = \int_0^1 W(x, y) f(y)\, dy.
\]

Due to symmetry, every eigenvalue λ is real-valued. Examples of solving the eigenvalue/eigenfunction problem for the Erdős-Rényi, stochastic block, and exponential models can be found in [33], where the behavior of the eigenvalues of finite graphs is also studied by simulations. Thus, graphons have proved useful in the analysis of graph sequences, but they are also convenient for formulating other theoretical results. Here, we are mainly interested in justifying the use of the Erdős-Rényi graphon as a basic model for graph errors in graph signals.
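The eigenvalue equation above can be solved numerically by the standard discretization approach: evaluate the kernel on a uniform grid and compute matrix eigenvalues. The sketch below is such an approximation, not a method from the paper, and graphon_eigs is a hypothetical name. The constant Erdős-Rényi graphon provides a quick sanity check, since its only nonzero eigenvalue is ε.

```python
import numpy as np

def graphon_eigs(W, n=2000, k=5):
    """Approximate the leading eigenvalues of the integral operator
    f -> int_0^1 W(., y) f(y) dy by discretizing [0, 1] on a uniform
    grid: the n x n matrix W(x_i, x_j) / n has approximately the same
    leading eigenvalues as the kernel."""
    x = (np.arange(n) + 0.5) / n                 # midpoint grid on [0, 1]
    K = W(x[:, None], x[None, :]) / n
    vals = np.linalg.eigvalsh(K)                 # symmetric kernel -> real
    return vals[np.argsort(-np.abs(vals))][:k]

# Constant (Erdos-Renyi) graphon: single nonzero eigenvalue eps = 0.1.
print(graphon_eigs(lambda x, y: np.full_like(x * y, 0.1)))
```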
B. Proof of Theorem 1
Consider the random variable cW(x, y), where x and y are fixed and W is picked randomly from 𝒲_ε. First notice that the distribution is the same for all pairs (x, y), and therefore E{cW(x, y)} = E{c}E{W(x, y)} = E{c}ε. The result then follows by applying the law of large numbers.
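The averaging step in the proof can be illustrated with a quick Monte Carlo check. Since the exact construction of the class 𝒲_ε is not reproduced here, the sketch below uses a stand-in assumption: at a fixed pair (x, y), the value W(x, y) of a randomly picked graphon is a Beta random variable with mean ε, drawn independently of the random weight c. The distributional choices are ours, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, M = 0.1, 200000

# Stand-in for the class W_eps: at a fixed (x, y), the edge probability
# W(x, y) is Beta-distributed with mean eps, independent of the weight c.
w = rng.beta(1.0, (1.0 - eps) / eps, size=M)   # E{W(x, y)} = eps
c = rng.exponential(2.0, size=M)               # random weight, E{c} = 2
print(np.mean(c * w), "vs", 2.0 * eps)         # LLN: average -> E{c} * eps
```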
APPENDIX B

DERIVATION OF THE EXPECTED VALUE OF THE GRAPH AUTOCORRELATION FOR THE EXAMPLE IN SECTION III

Let us first compute the expectation in the first equation in (5). The calculations of the expectations in the other two equations are briefly explained afterwards. We start by substituting (2) into the expectation in the first equation in (5) and opening the square:

\[
E\{\|z\|^2\} = E\{\|\theta A y + y\|^2\}
= E\left\{\sum_{i=1}^{N}\left(\theta\sum_{k=1}^{N} a_{ik}y_k + y_i - \frac{1}{N}\sum_{j=1}^{N}\left(\theta\sum_{l=1}^{N} a_{jl}y_l - y_j\right)\right)^{2}\right\}
\]
\[
= E\left\{\sum_{i=1}^{N}\left(\theta\sum_{k=1}^{N} a_{ik}y_k + y_i - \frac{1}{N}\sum_{j=1}^{N}\left(\theta\sum_{l=1}^{N} a_{jl}y_l - y_j\right)\right)
\times\left(\theta\sum_{k'=1}^{N} a_{ik'}y_{k'} + y_i - \frac{1}{N}\sum_{j'=1}^{N}\left(\theta\sum_{l'=1}^{N} a_{j'l'}y_{l'} - y_{j'}\right)\right)\right\}.
\]

Then, after some algebraic manipulations, we obtain

\[
E\{\|z\|^2\} = E\left\{\sum_{i=1}^{N}\left(\left(\theta\sum_{k=1}^{N} a_{ik}y_k\right)^{2} - \frac{2}{N}\sum_{j=1}^{N}\left(\theta^{2}\sum_{k=1}^{N} a_{ik}a_{jk}y_k^{2} - \theta a_{ij}y_j^{2}\right) + y_i^{2} - \frac{2}{N}\sum_{j=1}^{N}\left(\theta a_{ji}y_i^{2} - y_i y_j\right)\right.\right.
\]
\[
\left.\left. + \frac{\theta^{2}}{N^{2}}\sum_{j,j',l=1}^{N} a_{jl}a_{j'l}y_l^{2} + \frac{2\theta}{N^{2}}\sum_{j,l=1}^{N} a_{jl}y_l^{2} + \frac{1}{N^{2}}\sum_{j=1}^{N} y_j^{2}\right)\right\}
\]
\[
= \sigma_y^{2}\left(N^{2}\theta^{2}\alpha a^{2} - N\theta^{2}\alpha a^{2} - N^{2}\theta^{2}\alpha^{2}a^{2} - N\theta\alpha a + N - N\theta\alpha a - N\theta^{2}\alpha a^{2} + N\theta^{2}\alpha^{2}a^{2} + 2N\theta\alpha a + 1\right).
\]
Dropping the negligible terms in the above expression, i.e., the zero-order terms of N and the first-order terms of N that also include a, the first equation in (5) can be approximated as

\[
\frac{1}{N}E\{\|z\|^2\} \approx \sigma_y^{2}\left((\alpha - \alpha^{2})\theta^{2}a^{2}N - \alpha\theta a + 1\right).
\]

For deriving the expected values in the other two equations of (5), there are sums over terms of the type $w_{ij}w_{i'j'}a_{kl}a_{k'l'}y_m^{2}$. To calculate their expected values, we can use the fact that $w_{ij}$ and $a_{kl}$ are independent if $\min(i,j) \neq \min(k,l)$ or $\max(i,j) \neq \max(k,l)$, in which case

\[
P(w_{ij} = w \text{ and } a_{kl} = a) = P(w_{ij} = w)P(a_{kl} = a) = \left(\alpha(1-\epsilon_1) + (1-\alpha)\epsilon_2\right)\alpha.
\]

If $\min(i,j) = \min(k,l)$ and $\max(i,j) = \max(k,l)$, we have

\[
P(w_{ij} = w \text{ and } a_{kl} = a) = (1-\epsilon_1)\alpha.
\]

REFERENCES
[1] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Jul. 2013.
[2] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Magazine, vol. 30, no. 3, pp. 83–98, May 2013.
[3] A. Ortega, P. Frossard, J. Kovacevic, J. M. F. Moura, and P. Vandergheynst, "Graph signal processing: Overview, challenges, and applications," Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, May 2018.
[4] S. Y. Tu and A. H. Sayed, "Mobile adaptive networks," IEEE J. Select. Topics Signal Process., vol. 5, no. 4, pp. 649–664, Apr. 2011.
[5] D. Acemoglu, G. Como, F. Fagnani, and A. Ozdaglar, "Opinion fluctuations and disagreement in social networks," Mathematics of Operations Research, vol. 38, no. 1, pp. 1–27, Jan. 2013.
[6] W. Huang, L. Goldsberry, N. F. Wymbs, S. T. Grafton, D. S. Bassett, and A. Ribeiro, "Graph frequency analysis of brain signals," IEEE J. Select. Topics Signal Process., vol. 10, no. 7, pp. 1189–1203, Oct. 2016.
[7] F. Blöchl, A. Kowarsch, and F. J. Theis, "Second-order source separation based on prior knowledge realized in a graph model," in Latent Variable Analysis and Signal Separation, LNCS (V. Vigneron, V. Zarzoso, E. Moreau, R. Gribonval, and E. Vincent, eds.), vol. 6365, Heidelberg: Springer, 2010, pp. 434–441.
[8] S. Chen, R. Varma, A. Sandryhaila, and J. Kovacevic, "Discrete signal processing on graphs: Sampling theory," IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6510–6523, Dec. 2015.
[9] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, "Sampling of graph signals with successive local aggregations," IEEE Trans. Signal Process., vol. 64, no. 7, pp. 1832–1843, Apr. 2016.
[10] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, "Stationary graph processes and spectral estimation," IEEE Trans. Signal Process., vol. 65, no. 22, pp. 5911–5926, Nov. 2017.
[11] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, "Reconstruction of graph signals through percolation from seeding nodes," IEEE Trans. Signal Process., vol. 64, no. 16, pp. 4363–4378, Aug. 2016.
[12] A. Sandryhaila, S. Kar, and J. M. Moura, "Finite-time distributed consensus through graph filters," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 2014, pp. 1080–1084.
[13] S. Safavi and U. A. Khan, "Revisiting finite-time distributed algorithms via successive nulling of eigenvalues," IEEE Signal Process. Letters, vol. 22, no. 1, pp. 54–57, Jan. 2015.
[14] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, "Learning Laplacian matrix in smooth graph signal representations," IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6160–6173, Dec. 2016.
[15] S. P. Chepuri, S. Liu, G. Leus, and A. O. Hero, "Learning sparse graphs under smoothness prior," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., New Orleans, Louisiana, USA, Mar. 2017, pp. 6508–6512.
[16] P. A. Traganitis, Y. Shen, and G. B. Giannakis, "Network topology inference via elastic net structural equation models," in Proc. European Signal Processing Conference, Kos Island, Greece, Aug.–Sept. 2017, pp. 156–160.
[17] B. Pasdeloup, V. Gripon, G. Mercier, D. Pastor, and M. G. Rabbat, "Characterization and inference of graph diffusion processes from observations of stationary signals," IEEE Trans. Signal and Information Processing over Networks, vol. 4, no. 3, pp. 481–496, Sept. 2018.
[18] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, "Network topology inference from spectral templates," IEEE Trans. Signal and Information Processing over Networks, vol. 3, no. 3, pp. 467–483, Sept. 2017.
[19] E. Ceci and S. Barbarossa, "Graph signal processing in the presence of topology uncertainties," IEEE Trans. Signal Process., vol. 68, pp. 1558–1573, Feb. 2020.
[20] E. Ceci, Y. Shen, G. B. Giannakis, and S. Barbarossa, "Graph-based learning under perturbations via total least-squares," IEEE Trans. Signal Process., vol. 68, pp. 2870–2882, Feb. 2020.
[21] F. Gama, J. Bruna, and A. Ribeiro, "Stability properties of graph neural networks," arXiv preprint arXiv:1905.04497, Sep. 2019.
[22] S. Segarra, A. G. Marques, and A. Ribeiro, "Optimal graph-filter design and applications to distributed linear network operators," IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, Aug. 2017.
[23] S. Kruzick and J. M. Moura, "Optimal filter design for signal processing on random graphs: Accelerated consensus," IEEE Trans. Signal Process., vol. 66, no. 5, pp. 1258–1272, Mar. 2017.
[24] S. Segarra and A. Ribeiro, "Stability and continuity of centrality measures in weighted graphs," IEEE Trans. Signal Process., vol. 64, no. 3, pp. 543–555, Feb. 2016.
[25] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, "Filtering random graph processes over random time-varying graphs," IEEE Trans. Signal Process., vol. 65, no. 16, pp. 4406–4421, Aug. 2017.
[26] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, "On the transferability of spectral graph filters," in Proc. Int. Conf. Sampling Theory and Applications, Bordeaux, France, July 2019, pp. 1–5.
[27] T. Cai, T. Liang, and A. Rakhlin, "On detection and structural reconstruction of small-world random networks," IEEE Trans. Network Science and Engineering, vol. 4, no. 3, pp. 165–176, 2017.
[28] E. Abbe, A. S. Bandeira, A. Bracher, and A. Singer, "Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery," IEEE Trans. Network Science and Engineering, vol. 1, no. 1, pp. 10–22, 2014.
[29] S. A. Vorobyov, A. B. Gershman, and Z.-Q. Luo, "Robust adaptive beamforming using worst-case performance optimization: A solution to the signal mismatch problem," IEEE Trans. Signal Process., vol. 51, no. 2, pp. 313–324, Feb. 2003.
[30] J. Miettinen, S. A. Vorobyov, and E. Ollila, "Graph error effect in graph signal processing," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Calgary, Canada, Apr. 2018, pp. 4164–4168.
[31] P. Erdős and A. Rényi, "On random graphs," Publ. Math. Debrecen, vol. 6, pp. 290–297, 1959.
[32] P. W. Holland, K. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, Jun. 1983.
[33] M. W. Morency and G. Leus, "Signal processing on kernel-based random graphs," in Proc. European Signal Processing Conference, Kos Island, Greece, Aug.–Sept. 2017, pp. 385–389.
[34] L. Lovász and B. Szegedy, "Limits of dense graph sequences," Journal of Combinatorial Theory B, vol. 96, pp. 933–957, 2006.
[35] L. Lovász, Large Networks and Graph Limits. Providence: American Mathematical Society, vol. 60, 2012.
[36] B. Bollobás, S. Janson, and O. Riordan, "The phase transition in inhomogeneous random graphs," Random Structures and Algorithms, vol. 31, no. 1, pp. 3–122, Jan. 2007.
[37] K. Avrachenkov, L. Cottatellucci, and A. Kadavankandy, "Spectral properties of random matrices for stochastic block model," in Proc. IEEE International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), May 2015, pp. 537–544.
[38] S. P. Chepuri and G. Leus, "Graph sampling for covariance estimation," IEEE Trans. Signal and Information Processing over Networks, vol. 3, no. 3, pp. 451–466, Sept. 2017.
[39] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs: Frequency analysis," IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3042–3054, Jun. 2014.
[40] National Climatic Data Center, ftp://ftp.ncdc.noaa.gov/pub/data/gsod/, 2016.
[41] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, "Autoregressive moving average graph filtering," IEEE Trans. Signal Process., vol. 65, no. 2, pp. 274–288, Jan. 2017.
[42] J. Miettinen, E. Nitzan, S. A. Vorobyov, and E. Ollila, "Graph signal processing meets blind source separation," arXiv preprint arXiv:2008.10461, Aug. 2020.
[43] D. Clarkson, "Remark AS R74: A least squares version of algorithm AS 211: The F-G diagonalization algorithm," Applied Statistics, vol. 37, pp. 317–321, 1988.
[44] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, "A blind source separation technique using second-order statistics," IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, Feb. 1997.
[45] J. Miettinen, K. Illner, K. Nordhausen, H. Oja, S. Taskinen, and F. Theis, "Separation of uncorrelated stationary time series using autocovariance matrices," Journal of Time Series Analysis, vol. 37, no. 3, pp. 337–354, Mar. 2016.
[46] P. Ilmonen, K. Nordhausen, H. Oja, and E. Ollila, "A new performance index for ICA: Properties, computation and asymptotic analysis," in Latent Variable Analysis and Signal Separation, LNCS (V. Vigneron, V. Zarzoso, E. Moreau, R. Gribonval, and E. Vincent, eds.), vol. 6365, Heidelberg: Springer, 2010, pp. 229–236.
[47] M. Avella-Medina, F. Parise, M. Schaub, and S. Segarra, "Centrality measures for graphons: Accounting for uncertainty in networks," IEEE Trans. Network Science and Engineering, vol. 7, no. 1, pp. 520–537, 2018.