Polynomial Filtering for Fast Convergence in Distributed Consensus
Effrosyni Kokiopoulou and Pascal Frossard
Ecole Polytechnique Fédérale de Lausanne (EPFL), Signal Processing Laboratory (LTS4)
Lausanne - 1015, Switzerland
{effrosyni.kokiopoulou, pascal.frossard}@epfl.ch

This work has been partly supported by the Swiss NSF, under NCCR IM2.

Abstract
In the past few years, the problem of distributed consensus has received a lot of attention, particularly in the framework of ad hoc sensor networks. Most methods proposed in the literature address the consensus averaging problem by distributed linear iterative algorithms, with asymptotic convergence to the consensus solution. The convergence rate of such distributed algorithms typically depends on the network topology and on the weights given to the edges between neighboring sensors, as described by the network matrix. In this paper, we propose to accelerate the convergence rate for given network matrices by the use of polynomial filtering algorithms. The main idea of the proposed methodology is to apply a polynomial filter on the network matrix that shapes its spectrum in order to increase the convergence rate. Such an algorithm is equivalent to periodic updates in each of the sensors by aggregating a few of its previous estimates. We formulate the computation of the coefficients of the optimal polynomial as a semi-definite program that can be efficiently and globally solved for both static and dynamic network topologies. We finally provide simulation results that demonstrate the effectiveness of the proposed solutions in accelerating the convergence of distributed consensus averaging problems.
1 Introduction

We consider the problem of distributed consensus [1], which has recently attracted significant interest, especially in the context of ad hoc sensor networks. In particular, the problem of distributed average consensus has attracted a lot of research effort due to its numerous applications in diverse areas. A few examples include distributed estimation [2], distributed compression [3], coordination of networks of autonomous agents [4], and computation of averages and least-squares in a distributed fashion (see e.g., [5, 6, 7, 8] and references therein).

In general, the main goal of distributed consensus is to reach a global solution using only local computation and communication, while staying robust to changes in the network topology. Given the initial values at the sensors, the problem of distributed averaging is to compute their average at each sensor using distributed linear iterations. Each distributed iteration involves local communication among the sensors. In particular, each sensor updates its own local estimate of the average by a weighted linear combination of the corresponding estimates of its neighbors. The weights, gathered in a weight matrix W, typically drive the importance of the measurements of the different neighbors.

One of the important characteristics of distributed consensus algorithms is the rate of convergence to the asymptotic solution. In many cases, the average consensus solution can be reached by successive multiplications of W with the vector of initial sensor values. Furthermore, it has been shown in [5] that, in the case of a fixed network topology, the convergence rate depends on the second largest eigenvalue of W, λ_2(W). In particular, the convergence is faster when the value of λ_2(W) is small. Similar convergence results have been proposed recently in the case of random network topologies [9, 10], where the convergence rate is governed by the expected value of λ_2(W), E[λ_2(W)].

The main research direction so far focuses on the computation of the optimal weights W that yield the fastest convergence rate to the consensus solution [5, 6, 7]. In this work, we diverge from methods that are based on successive multiplications of W, and we rather allow the sensors to use their previous estimates in order to accelerate the convergence rate. This is similar in spirit to the works proposed in [12, 13] that reach the consensus solution in a finite number of steps; they use respectively extrapolation methods and a linear dynamical system formulation for fixed network topologies. In order to address more generic network topologies, we propose here to use a matrix polynomial p applied on the weight matrix W in order to shape its spectrum. Given the fact that the convergence rate is driven by λ_2(W), it is therefore possible to influence the convergence rate by a careful design of the polynomial p. From the implementation viewpoint, working with p(W) is equivalent to each sensor aggregating its value periodically using its own previous estimates. We further formulate the problem of the computation of the optimal coefficients for both static and dynamic network topologies. We show that this problem can be efficiently and globally solved in both cases through the definition of a semi-definite program (SDP).

The rest of this paper is organized as follows. In Section 2 we review the main convergence results of average consensus in both fixed and dynamic random network topologies.
Next, in Section 3 we introduce the polynomial filtering methodology and discuss its implementation for distributed consensus problems. We compute the optimal polynomial filter in Section 4 for both static and dynamic network topologies. In Section 5 we provide simulation results that verify the validity and the effectiveness of our method. Related work is finally presented in Section 6.

2 Distributed consensus averaging

Let us first define formally the problem of distributed consensus averaging. Assume that initially each sensor i reports a scalar value x_0(i) ∈ R. We denote by x_0 = [x_0(1), ..., x_0(n)]^⊤ ∈ R^n the vector of initial values on the network. Denote by

    µ = (1/n) Σ_{i=1}^{n} x_0(i)    (1)

the average of the initial values of the sensors. However, one rarely has a complete view of the network. The problem of distributed averaging therefore becomes typically to compute µ at each sensor by distributed linear iterations. In what follows we review the main convergence results for distributed consensus algorithms on both fixed and random network topologies.

2.1 Static network topology

We model the static network topology as an undirected graph G = (V, E) with nodes V = {1, ..., n} corresponding to sensors. An edge (i, j) ∈ E is drawn if and only if sensor i can communicate with sensor j. We denote the set of neighbors of node i as N_i = {j | (i, j) ∈ E}. Unless otherwise stated, we assume that each graph is simple, i.e., no loops or multiple edges are allowed.

In this work, we consider distributed linear iterations of the following form

    x_{t+1}(i) = W_{ii} x_t(i) + Σ_{j ∈ N_i} W_{ij} x_t(j),    (2)

for i = 1, ..., n, where x_t(j) represents the value computed by sensor j at iteration t. Since the sensors communicate in each iteration t, we assume that they are synchronized. The parameters W_{ij} denote the edge weights of G. Since each sensor communicates only with its direct neighbors, W_{ij} = 0 when (i, j) ∉ E. The above iteration can be compactly written in the following form

    x_{t+1} = W x_t,    (3)

or more generally

    x_t = ( Π_{i=0}^{t-1} W ) x_0 = W^t x_0.    (4)

We call the matrix W that gathers the edge weights W_{ij} the weight matrix. Note that W is a sparse matrix whose sparsity pattern is driven by the network topology. We assume that W is symmetric, and we denote its eigenvalue decomposition as W = Q Λ Q^⊤. The (real) eigenvalues can further be arranged as follows:

    λ_1(W) ≥ λ_2(W) ≥ ... ≥ λ_{n-1}(W) ≥ λ_n(W).    (5)

The distributed linear iteration given in eq. (3) converges to the average if and only if

    lim_{t→∞} W^t = 11^⊤/n,    (6)

where 1 is the vector of ones [5]. Indeed, notice that in this case x* = lim_{t→∞} x_t = lim_{t→∞} W^t x_0 = (11^⊤/n) x_0 = µ1.

It has been shown that for a fixed network topology the convergence rate of eq. (3) depends on the magnitude of the second largest eigenvalue λ_2(W) [5]. The asymptotic convergence factor is defined as

    r_asym(W) = sup_{x_0 ≠ µ1} lim_{t→∞} ( ‖x_t − µ1‖ / ‖x_0 − µ1‖ )^{1/t},    (7)

and the per-step convergence factor is written as

    r_step(W) = sup_{x_t ≠ µ1} ‖x_{t+1} − µ1‖ / ‖x_t − µ1‖.    (8)

Furthermore, it has been shown that the convergence rate relates to the spectrum of W, as given by the following theorem [5].

Theorem 1 The convergence given by eq. (6) is guaranteed if and only if

    1^⊤ W = 1^⊤,    (9)
    W 1 = 1,    (10)
    ρ( W − 11^⊤/n ) < 1,    (11)

where ρ(·) denotes the spectral radius of a matrix. Furthermore,

    r_asym(W) = ρ( W − 11^⊤/n ),    (12)
    r_step(W) = ‖ W − 11^⊤/n ‖_2.    (13)
According to the above theorem, 1/√n is a left and right eigenvector of W associated with the eigenvalue one, and the magnitude of all other eigenvalues is strictly less than one. Note finally that, since W is symmetric, the asymptotic convergence factor coincides with the per-step convergence factor, which implies that the relations (12) and (13) are equivalent.

We now give an alternate proof of the above theorem that illustrates the importance of the second largest eigenvalue in the convergence rate. We expand the initial state vector x_0 in the orthogonal eigenbasis Q of W; that is,

    x_0 = ν_1 1/√n + Σ_{j=2}^{n} ν_j q_j,

where ν_1 = ⟨1/√n, x_0⟩ and ν_j = ⟨q_j, x_0⟩. We further assume that ν_1 ≠ 0. Then, eq. (4) implies that

    x_t = W^t x_0 = W^t ( ν_1 1/√n + Σ_{j=2}^{n} ν_j q_j ) = ν_1 1/√n + Σ_{j=2}^{n} ν_j W^t q_j = ν_1 1/√n + Σ_{j=2}^{n} ν_j λ_j^t q_j.

Observe now that if |λ_j| < 1 for all j ≥ 2, then in the limit the second term in the above equation decays and

    x* = lim_{t→∞} x_t = ν_1 1/√n = µ1.

We see that the smaller the value of λ_2(W), the faster the convergence rate.
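As a concrete illustration of the above (a minimal sketch, not taken from the paper; the 4-node network and its weights are arbitrary illustrative choices), the following numpy snippet runs the linear iteration of eq. (3) on a small symmetric weight matrix satisfying conditions (9)-(11) and shows the error decaying at a rate governed by λ_2(W):

    import numpy as np

    # Hypothetical 4-node path network with a symmetric, doubly-stochastic weight matrix.
    W = np.array([
        [0.5, 0.5, 0.0, 0.0],
        [0.5, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5],
        [0.0, 0.0, 0.5, 0.5],
    ])
    assert np.allclose(W, W.T) and np.allclose(W.sum(axis=1), 1.0)  # conditions (9)-(10)

    x = np.array([1.0, 3.0, 5.0, 7.0])                  # initial sensor values x_0
    mu = x.mean()                                       # target average
    lam2 = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]   # second largest |eigenvalue|

    for t in range(30):
        x = W @ x                                       # distributed iteration x_{t+1} = W x_t
    err = np.linalg.norm(x - mu)
    print(f"lambda_2 = {lam2:.3f}, error after 30 iterations = {err:.2e}")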
Analogous convergence results hold in the case of dynamic network topologies, discussed next.

2.2 Dynamic network topology

Let us now consider networks with random link failures, where the state of a link changes over the iterations. In particular, we use the random network model proposed in [9, 10]. We assume that the network at any arbitrary iteration t is G(t) = (V, E(t)), where E(t) denotes the edge set at iteration t, or equivalently at time instant t. Since the network is dynamic, the edge set changes over the iterations, as links fail at random. We assume that E(t) ⊆ E*, where E* ⊆ V × V is the set of realizable edges when there is no link failure.

We also assume that each link (i, j) fails with probability (1 − p_ij), independently of the other links. Two random edge sets E(t_1) and E(t_2) at different iterations t_1 and t_2 are independent. The probability of forming a particular E(t) is thus given by Π_{(i,j) ∈ E(t)} p_ij. We define the matrix P as

    P_ij = p_ij  if (i, j) ∈ E* and i ≠ j,  and  P_ij = 0  otherwise.    (14)

The matrix P is symmetric and its diagonal elements are zero, since it corresponds to a simple graph. It represents the probabilities of edge formation in the network, and the edge set E(t) is therefore a random subset of E* driven by the matrix P. Finally, the weight matrix W becomes dependent on the edge set, since only the weights of existing edges can take nonzero values.

In the dynamic case, the distributed linear iteration of eq. (2) becomes

    x_{t+1}(i) = W_{ii}(t) x_t(i) + Σ_{j ∈ N_i} W_{ij}(t) x_t(j),    (15)

or in compact form,

    x_{t+1} = W_t x_t,    (16)

where W_t denotes the weight matrix corresponding to the graph realization G(t) of iteration t. The iterative relation given by eq. (16) can be written as

    x_t = ( Π_{i=0}^{t-1} W_i ) x_0.

Clearly, x_t now represents a stochastic process, since the edges are drawn randomly. The convergence rate to the consensus solution therefore depends on the behavior of the product Π_{i=0}^{t-1} W_i. We say that the algorithm converges if

    ∀ x_0 ∈ R^n,  lim_{t→∞} E‖x_t − µ1‖ = 0.    (17)
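For illustration only (not the authors' code), one realization E(t) of this model and a corresponding weight matrix W_t can be generated as below; Metropolis weights on the realized graph are used here purely as one possible choice of the weight rule:

    import numpy as np

    def random_realization(P, rng):
        """Draw an edge set E(t): each feasible link (i,j) survives with probability P[i,j]."""
        n = P.shape[0]
        A = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                if P[i, j] > 0 and rng.random() < P[i, j]:
                    A[i, j] = A[j, i] = 1.0
        return A

    def metropolis_weights(A):
        """Example weight rule on the realized graph (one possible choice of W_t)."""
        n = A.shape[0]
        d = A.sum(axis=1)
        W = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if A[i, j] == 1.0:
                    W[i, j] = 1.0 / (1.0 + max(d[i], d[j]))
        W += np.diag(1.0 - W.sum(axis=1))   # make each row sum to one
        return W

    rng = np.random.default_rng(0)
    P = np.array([[0.0, 0.9, 0.5],          # hypothetical link formation probabilities
                  [0.9, 0.0, 0.7],
                  [0.5, 0.7, 0.0]])
    W_t = metropolis_weights(random_realization(P, rng))
    print(W_t)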
We now review some convergence results from [10].

Lemma 1 For any x_0 ∈ R^n,

    ‖x_{t+1} − µ1‖ ≤ ( Π_{i=0}^{t} ρ( W_i − 11^⊤/n ) ) ‖x_0 − µ1‖.

This leads to the following convergence theorem [10] for dynamic networks.
Theorem 2 If E[ ρ( W − 11^⊤/n ) ] < 1, the vector sequence {x_t}_{t=0}^{∞} converges in the sense of eq. (17).
We define the convergence factor in dynamic network topologies as

    r_d(W) = E[ ρ( W − 11^⊤/n ) ].

This factor depends in general on the spectral properties of the induced network matrix and drives the convergence rate of eq. (16). The authors in [14] further show that |λ_2(E[W])| < 1 is necessary and sufficient for reaching consensus almost surely when the weight matrices are drawn independently over the iterations.

3 Polynomial filtering for distributed consensus

As we have seen above, the convergence rate of distributed consensus algorithms depends in general on the spectral properties of an induced network matrix. This is the case for both fixed and random network topologies. Most of the research work has been devoted to finding a weight matrix W that accelerates the convergence to the consensus solution when sensors only use their current estimates. We choose a different approach, where we exploit the memory of the sensors, i.e., the values of previous estimates, in order to increase the convergence rate.

3.1 The Scalar Epsilon Algorithm

In our previous work [12], we proposed the Scalar Epsilon Algorithm (SEA) for accelerating the convergence rate to the consensus solution. SEA belongs to the family of extrapolation methods for accelerating vector sequences, such as eq. (3). These methods exploit the fact that the fixed point of the sequence belongs to the subspace spanned by any ℓ + 1 consecutive terms of it, where ℓ is the degree of the minimal polynomial of the sequence generator matrix (for more details, see [12] and references therein). SEA is a low complexity algorithm, which is ideal for sensor networks, and it is known to reach the consensus solution in 2ℓ steps. However, ℓ is unknown in practice, so one may use all the available terms of the vector sequence. Hence, the memory requirements of SEA are O(T), where T is the number of terms. Moreover, SEA assumes that the sequence generator matrix (e.g., W in the case of eq. (3)) is fixed, so that it does not adapt easily to dynamic network topologies.

In this paper, we propose a more flexible algorithm based on the polynomial filtering technique. Polynomial filtering permits to "shape" the spectrum of a symmetric weight matrix, in order to accelerate the convergence to the consensus solution. Similarly to SEA, it allows the sensors to use the values of their previous estimates. However, the polynomial filtering methodology introduced below presents three main advantages: (i) it is robust to dynamic topologies, (ii) it provides explicit control on the convergence rate, and (iii) its memory requirements can be adjusted to the memory constraints imposed by the sensors.

3.2 Polynomial filtering

Starting from a given (possibly optimal) weight matrix W, we propose the application of a polynomial filter on the spectrum of W in order to reduce the magnitude of λ_2(W), which mainly drives the convergence rate. Denote by p_k(λ) the polynomial filter of degree k that is applied on the spectrum of W,

    p_k(λ) = α_0 + α_1 λ + α_2 λ^2 + ... + α_k λ^k.    (18)

Accordingly, the matrix polynomial is given as

    p_k(W) = α_0 I + α_1 W + α_2 W^2 + ... + α_k W^k.    (19)

Observe now that

    p_k(W) = p_k(Q Λ Q^⊤) = α_0 I + α_1 (Q Λ Q^⊤) + ... + α_k (Q Λ^k Q^⊤) = Q p_k(Λ) Q^⊤,    (20)

which implies that the eigenvalues of p_k(W) are simply the polynomial filtered eigenvalues of W, i.e., p_k(λ_i(W)), i = 1, ..., n.

At the implementation level, working with p_k(W) implies a periodic update of the current sensor value with a linear combination of its previous values. To see why this is true, we observe that

    x_{t+k+1} = p_k(W) x_t = α_0 x_t + α_1 W x_t + ... + α_k W^k x_t = α_0 x_t + α_1 x_{t+1} + ... + α_k x_{t+k}.    (21)
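Both observations can be checked numerically; the short sketch below (with an arbitrary, hypothetical filter of degree k = 2 satisfying p_k(1) = 1) verifies that the eigenvalues of p_k(W) are the filtered eigenvalues of eq. (20), and that p_k(W) x_t equals the combination of buffered iterates in eq. (21):

    import numpy as np

    # Reuse any symmetric weight matrix W with W @ 1 = 1 (here the 4-node example from above).
    W = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    alpha = np.array([-0.2, 0.1, 1.1])           # hypothetical coefficients, sum to 1 so p_k(1) = 1
    x_t = np.array([1.0, 3.0, 5.0, 7.0])

    # Eigenvalues of p_k(W) are p_k applied to the eigenvalues of W (eq. (20)).
    pW = sum(a * np.linalg.matrix_power(W, i) for i, a in enumerate(alpha))
    lam = np.linalg.eigvalsh(W)
    assert np.allclose(np.sort(np.linalg.eigvalsh(pW)),
                       np.sort(np.polyval(alpha[::-1], lam)))

    # p_k(W) x_t equals the weighted combination of the k+1 buffered iterates (eq. (21)).
    iterates = [x_t]
    for _ in range(len(alpha) - 1):
        iterates.append(W @ iterates[-1])        # x_{t+1}, x_{t+2}, ...
    assert np.allclose(pW @ x_t, sum(a * xi for a, xi in zip(alpha, iterates)))
    print("filtered eigenvalues:", np.round(np.polyval(alpha[::-1], lam), 3))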
A careful design of p_k may impact the convergence rate dramatically. Each sensor then applies polynomial filtering for distributed consensus by following the main steps tabulated in Algorithm 1. Note that, for both the fixed and the random network topology cases, the α_i's are computed off-line, assuming that W (respectively E[W]) is known a priori. In what follows, we propose different approaches for computing the coefficients α_i of the filter p_k.

Algorithm 1 Polynomial Filtered Distributed Consensus
  Input: polynomial coefficients α_0, ..., α_k, tolerance ε.
  Output: average estimate µ̄.
  Initialization: µ̄_0 = x_0(i). Set the iteration index t = 1.
  repeat
    if mod(t, k+1) == 0 then
      x_t(i) = α_0 x_{t-k-1}(i) + α_1 x_{t-k}(i) + α_2 x_{t-k+1}(i) + ... + α_k x_{t-1}(i)   {polynomial filtered update}
      x_t(i) = W_{ii} x_t(i) + Σ_{j ∈ N_i} W_{ij} x_t(j)
    else
      x_t(i) = W_{ii} x_{t-1}(i) + Σ_{j ∈ N_i} W_{ij} x_{t-1}(j)
    end if
    Increase the iteration index t = t + 1.
    µ̄_t = x_t(i)
  until |µ̄_t − µ̄_{t-1}| < ε
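A centralized emulation of Algorithm 1 (a sketch under the assumption that all sensors share the same off-line coefficients and a common iteration counter; the matrix and coefficients are the illustrative ones used earlier) looks as follows:

    import numpy as np

    def polynomial_filtered_consensus(W, x0, alpha, num_iters):
        """Run eq. (3) but, every k+1 iterations, first replace the state by the
        filtered combination of the k+1 most recent iterates (eq. (21))."""
        k = len(alpha) - 1
        history = [x0.copy()]     # full history kept for simplicity; a sensor only needs its k+1 latest values
        x = x0.copy()
        for t in range(1, num_iters + 1):
            if t % (k + 1) == 0:
                x = sum(a * xi for a, xi in zip(alpha, history[-(k + 1):]))  # filtered update
            x = W @ x             # standard neighbor averaging step
            history.append(x.copy())
        return x

    W = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    x0 = np.array([1.0, 3.0, 5.0, 7.0])
    alpha = np.array([-0.2, 0.1, 1.1])            # same hypothetical coefficients as above
    print(polynomial_filtered_consensus(W, x0, alpha, 30), "target:", x0.mean())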
One simple and rather intuitive approach to the design of the polynomial p_k(λ) is to use Hermite interpolation. Recall that the objective is to dampen the smallest eigenvalues λ_2, ..., λ_n of W, while keeping the eigenvalue one intact. Therefore, we assume that the spectrum of W lies in an interval [a, 1] and we impose smoothness constraints on p_k at the left endpoint a. In particular, the polynomial p_k(λ) : [a, 1] → R that we seek is determined by the following constraints:

    p_k(a) = 0,
    p_k^{(i)}(a) = 0,  i = 1, ..., k − 1,    (23)
    p_k(1) = 1,

where p_k^{(i)}(a) denotes the i-th derivative of p_k(λ) evaluated at a. The dampening is achieved by imposing smoothness constraints on the polynomial at the left endpoint of the interval. The computed polynomial has degree equal to k. Finally, the coefficients of the p_k that satisfies the above constraints can be computed by Newton's divided differences table.

Figure 1: Newton's polynomial of various degrees k.

Figure 1 shows the form of p_k(λ) for a = 0 and different values of the degree k. As k increases, the dampening of the small eigenvalues becomes more effective. It is worth mentioning that the design of Newton's polynomial does not depend on the network size or on the size of the weight matrix W. All that is needed is a left endpoint a that encloses the spectrum of W, as well as the desired degree k, which moreover may be imposed by memory constraints. This feature of Newton's polynomial is particularly appealing in the case of dynamic network topologies. However, the above polynomial design is mostly driven by intuitive arguments, which tend to produce small eigenvalues for faster convergence. In the following section, we provide an alternative technique for computing the polynomial filter that optimizes the convergence rate.
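Since a degree-k polynomial with a k-fold root at a and value one at λ = 1 is uniquely p_k(λ) = ((λ − a)/(1 − a))^k, the coefficients α_0, ..., α_k can also be obtained by simply expanding this power. The sketch below does so with numpy for an arbitrary illustrative endpoint a; it is not the authors' implementation of the divided-differences table, but it produces the same polynomial:

    import numpy as np
    from numpy.polynomial import polynomial as P

    def newton_filter_coefficients(a, k):
        """Coefficients alpha_0..alpha_k of p_k(lambda) = ((lambda - a)/(1 - a))**k,
        which satisfies p_k(a) = 0, p_k^(i)(a) = 0 for i < k, and p_k(1) = 1."""
        base = np.array([-a, 1.0]) / (1.0 - a)    # (lambda - a)/(1 - a), ascending powers
        coeffs = np.array([1.0])
        for _ in range(k):
            coeffs = P.polymul(coeffs, base)      # repeated multiplication = k-th power
        return coeffs                             # ascending order: alpha_0, ..., alpha_k

    alpha = newton_filter_coefficients(a=-0.5, k=3)   # a chosen arbitrarily for illustration
    print(alpha)
    lam = np.linspace(-0.5, 1.0, 7)
    print(np.polyval(alpha[::-1], lam))               # small eigenvalues are strongly dampened, p(1) = 1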
4 Optimal polynomial filters

4.1 Static network topology

We are now interested in finding the polynomial that leads to the fastest convergence of the linear iteration described in eq. (3), for a given weight matrix W and a given degree k. For notational ease, we write W_p = p_k(W) = Σ_{i=0}^{k} α_i W^i. According to Theorem 1, the optimal polynomial is the one that minimizes the second largest eigenvalue of W_p, i.e., ρ( W_p − 11^⊤/n ). Therefore, we need to solve an optimization problem where the optimization variables are the k+1 polynomial coefficients α_0, ..., α_k and the objective function is the spectral radius of W_p − 11^⊤/n.

Optimization problem: OPT1
  min_{α ∈ R^{k+1}}  ρ( Σ_{i=0}^{k} α_i W^i − 11^⊤/n )
  subject to  ( Σ_{i=0}^{k} α_i W^i ) 1 = 1.

Interestingly, the optimization problem OPT1 is convex. First, its objective function is convex, as stated in Lemma 2.
Lemma 2 For a given symmetric weight matrix W and degree k, ρ( Σ_{i=0}^{k} α_i W^i − 11^⊤/n ) is a convex function of the polynomial coefficients α_i.

Proof: Let β, γ ∈ R^{k+1} and 0 ≤ θ ≤ 1.
Since W is symmetric, Σ_{i=0}^{k} α_i W^i is also symmetric. Hence, the spectral radius is equal to the matrix 2-norm. Thus, we have

    ρ( Σ_{i=0}^{k} (θβ_i + (1−θ)γ_i) W^i − 11^⊤/n )
      = ‖ Σ_{i=0}^{k} (θβ_i + (1−θ)γ_i) W^i − 11^⊤/n ‖_2
      = ‖ θ( Σ_{i=0}^{k} β_i W^i − 11^⊤/n ) + (1−θ)( Σ_{i=0}^{k} γ_i W^i − 11^⊤/n ) ‖_2
      ≤ θ ‖ Σ_{i=0}^{k} β_i W^i − 11^⊤/n ‖_2 + (1−θ) ‖ Σ_{i=0}^{k} γ_i W^i − 11^⊤/n ‖_2
      = θ ρ( Σ_{i=0}^{k} β_i W^i − 11^⊤/n ) + (1−θ) ρ( Σ_{i=0}^{k} γ_i W^i − 11^⊤/n ),

which proves the lemma. In addition, the constraint of OPT1 is linear, which implies that the set of feasible α_i's is convex. As OPT1 minimizes a convex function over a convex set, the optimization problem is indeed convex.

In order to solve OPT1, we use an auxiliary variable s to bound the objective function, and we express the spectral radius constraint as a linear matrix inequality (LMI). Thus, we need to solve the following optimization problem.

Optimization problem: OPT2
  min_{s ∈ R, α ∈ R^{k+1}}  s
  subject to  −sI ⪯ Σ_{i=0}^{k} α_i W^i − 11^⊤/n ⪯ sI,
              ( Σ_{i=0}^{k} α_i W^i ) 1 = 1.

Recall that, since W is symmetric, W_p = Σ_{i=0}^{k} α_i W^i is symmetric as well. Hence, the constraint W_p 1 = 1 is sufficient to ensure that 1 is also a left eigenvector of W_p. The spectral radius constraint

    −sI ⪯ W_p − 11^⊤/n ⪯ sI

ensures that all the eigenvalues of W_p other than the first one are less than or equal to s in magnitude. Due to the LMI, the above optimization problem is equivalent to a semi-definite program (SDP) [16]. SDPs are convex problems and can be globally and efficiently solved. The solution to OPT2 is therefore computed efficiently in practice, as the SDP only has a moderate number of k + 2 unknowns (including s).
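The authors solve these programs in Matlab with SeDuMi; purely as an illustration, the sketch below expresses the same optimization in Python with the cvxpy modeling package (an assumption on our side, not the authors' toolchain). Since p_k(W) − 11^⊤/n is symmetric, its spectral radius equals its spectral norm, which cvxpy models directly; OPT2 is the epigraph form of this objective.

    import numpy as np
    import cvxpy as cp

    def optimal_filter_coefficients(W, k):
        """Minimize the spectral radius of p_k(W) - 11^T/n subject to p_k(W) 1 = 1,
        for a symmetric weight matrix W (the problem solved by OPT1/OPT2)."""
        n = W.shape[0]
        powers = [np.linalg.matrix_power(W, i) for i in range(k + 1)]
        J = np.ones((n, n)) / n
        alpha = cp.Variable(k + 1)
        Wp = sum(alpha[i] * powers[i] for i in range(k + 1))
        objective = cp.Minimize(cp.norm(Wp - J, 2))        # spectral norm = spectral radius here
        constraints = [Wp @ np.ones(n) == np.ones(n)]      # preserve the consensus eigenvector
        prob = cp.Problem(objective, constraints)
        prob.solve()
        return alpha.value, prob.value

For the dynamic case, the same routine would simply be called with the average matrix W̄ = E[W] in place of W, which corresponds to the problems OPT3/OPT4 introduced next.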
4.2 Dynamic network topology

We now extend the idea of polynomial filtering to dynamic network topologies. Theorem 2 suggests that the convergence rate in the random network topology case is governed by E[ ρ( W − 11^⊤/n ) ]. Since W depends on a dynamic edge set, p_k(W) now becomes stochastic. Following the same intuition as above, we could form an optimization problem, similar to OPT1, whose objective function would be E[ ρ( Σ_{i=0}^{k} α_i W^i − 11^⊤/n ) ]. Although this objective function can be shown to be convex, its evaluation is hard and typically requires several Monte Carlo simulation steps.

Recall that the convergence rate of eq. (16) is related to the second largest eigenvalue of E[W], which is much easier to evaluate. Let W̄ denote the average weight matrix E[W]. We then observe that

    E[ ρ( W − 11^⊤/n ) ] ≥ ρ( W̄ − 11^⊤/n ),

which follows from Lemma 2 and Jensen's inequality. The above inequality implies that the spectral radius ρ( W̄ − 11^⊤/n ) must be small in order for E[ ρ( W − 11^⊤/n ) ] to be small too. Additionally, the authors provide experimental evidence in [10] indicating that ρ( W̄ − 11^⊤/n ) is closely related to the convergence rate of eq. (16).

Based on the above facts, we propose to build our polynomial filter from W̄. Hence, we formulate the following optimization problem for computing the polynomial coefficients α_i in the random network topology case.

Optimization problem: OPT3
  min_{α ∈ R^{k+1}}  ρ( Σ_{i=0}^{k} α_i W̄^i − 11^⊤/n )
  subject to  ( Σ_{i=0}^{k} α_i W̄^i ) 1 = 1.

OPT3 can be viewed as the analog of OPT1 for the case of dynamic network topologies. The main difference is that we work with W̄, whose eigenvalues can be easily obtained. Using again the auxiliary variable s, we reach the following formulation for obtaining the α_i's.

Optimization problem: OPT4
  min_{s ∈ R, α ∈ R^{k+1}}  s
  subject to  −sI ⪯ Σ_{i=0}^{k} α_i W̄^i − 11^⊤/n ⪯ sI,
              ( Σ_{i=0}^{k} α_i W̄^i ) 1 = 1.

Once W̄ has been computed, this optimization problem is solved efficiently by an SDP, similarly to the case of static networks.

5 Simulation results

In this section, we provide simulation results that show the effectiveness of the polynomial filtering methodology. First, we introduce a few weight matrices that have been extensively used in the distributed averaging literature. Suppose that d(i) denotes the degree of the i-th sensor. It has been shown in [5, 6] that iterating eq. (3) with the following matrices leads to convergence to µ1.

• Maximum-degree weights. The Maximum-degree weight matrix is

    W_ij = 1/n,            if (i, j) ∈ E,
           1 − d(i)/n,     if i = j,
           0,              otherwise.    (24)

• Metropolis weights. The Metropolis weight matrix is

    W_ij = 1/(1 + max{d(i), d(j)}),      if (i, j) ∈ E,
           1 − Σ_{(i,k) ∈ E} W_ik,       if i = j,
           0,                            otherwise.    (25)

• Laplacian weights. Suppose that A is the adjacency matrix of G and D is a diagonal matrix which holds the vertex degrees. The Laplacian matrix is defined as L = D − A and the Laplacian weight matrix is defined as

    W = I − γL,    (26)

where the scalar γ must satisfy γ < 1/d_max [5].

The sensor networks are built using the random geographic graph model [19]. In particular, we place n nodes uniformly at random in the two-dimensional unit square. Two nodes are adjacent if their Euclidean distance is smaller than r = √(log(n)/n), in order to guarantee connectedness of the graph with high probability [19].

Finally, the SDP programs for optimizing the polynomial filters are solved in Matlab using the SeDuMi solver, publicly available at http://sedumi.mcmaster.ca/.
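The experimental setup just described can be reproduced along the following lines (a sketch: the connectivity radius and the Laplacian weight rule follow the text above, while specific constants such as γ = 0.9/d_max are illustrative assumptions):

    import numpy as np

    def random_geometric_graph(n, rng):
        """Place n nodes uniformly in the unit square; connect pairs closer than
        r = sqrt(log(n)/n), which keeps the graph connected with high probability [19]."""
        pos = rng.random((n, 2))
        r = np.sqrt(np.log(n) / n)
        d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        A = ((d < r) & ~np.eye(n, dtype=bool)).astype(float)
        return A

    def laplacian_weights(A, gamma=None):
        """Laplacian weight matrix W = I - gamma * L of eq. (26), with gamma < 1/d_max."""
        d = A.sum(axis=1)
        L = np.diag(d) - A
        if gamma is None:
            gamma = 0.9 / d.max()          # illustrative choice satisfying the bound
        return np.eye(A.shape[0]) - gamma * L

    rng = np.random.default_rng(1)
    A = random_geometric_graph(50, rng)
    W = laplacian_weights(A)
    print("second largest |eigenvalue|:", np.sort(np.abs(np.linalg.eigvalsh(W)))[-2])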
We illustrate first the effect of polynomial filtering on the spectrum of W. We build a network of n = 50 sensors and we apply polynomial filtering on the Maximum-degree weight matrix W given in (24). We use k = 4 and we solve the optimization problem OPT2 using the Maximum-degree matrix W as input. Figure 2(a) shows the obtained polynomial filter p_k(λ) for λ ∈ [λ_min(W), 1], and Figure 2(b) shows the spectrum of W before (star-solid line) and after (circle-solid line) polynomial filtering, versus the eigenvalue index. Observe that polynomial filtering dramatically increases the spectral gap |1 − λ_2|, which further leads to accelerating the distributed consensus, as we show in the simulations that follow.

Figure 2: Effect of polynomial filtering on the spectrum of the Maximum-degree matrix of eq. (24): (a) SDP polynomial filter p_k(λ) on [λ_min(W), 1]; (b) effect on the spectrum of W.

Then we compare the performance of the different distributed consensus algorithms with all the aforementioned weight matrices, that is, the Maximum-degree, Metropolis and Laplacian weight matrices for distributed averaging. We compare both Newton's polynomial and the SDP polynomial (obtained from the solution of OPT2) with the standard iterative method, which is based on successive iterations of eq. (3). For the sake of completeness, we also provide the results of the scalar epsilon algorithm (SEA) that uses all previous estimates [12].

First, we explore the behavior of the polynomial filtering methods for a degree k varying from 2 to 6 with step 2. We use the Laplacian weight matrix for this experiment. Figures 3(a) and 3(b) illustrate the evolution of the absolute error ‖x_t − µ1‖ versus the iteration index t, for polynomial filtering with the SDP and Newton's polynomials respectively. We also provide the curve of the standard iterative method as a baseline. Observe first that both polynomial filtering methods outperform the standard method by exhibiting faster convergence rates, across all values of k. Notice also that the degree k governs the convergence rate, since a larger k implies more effective filtering and therefore faster convergence. Finally, the stagnation of the convergence process of the SDP polynomial filtering for large values of k is due to the limited accuracy of the SDP solver.

Next, we show the results obtained with the other two weight matrices on the same sensor network. Figures 4(a) and 4(b) show the convergence behavior of all methods for the Maximum-degree and Metropolis matrices respectively. For both polynomial filtering methods we use a representative value of k, namely k = 4.
Figure 3: Behavior of polynomial filtering for variable degree k on fixed topology using the Laplacian weight matrix: (a) SDP polynomial filtering; (b) Newton polynomial filtering.
Figure 4: Comparison between Newton's polynomial and the SDP polynomial (k = 4) on fixed topology: (a) Maximum-degree weight matrix; (b) Metropolis weight matrix.

Notice again that polynomial filtering accelerates the convergence relative to the standard iterative method. As expected, the optimal polynomial computed with the SDP outperforms Newton's polynomial, which is based on intuitive arguments only.

Finally, we can see from Figures 3 and 4 that in some cases the convergence rate is comparable for SEA and SDP polynomial filtering. Note however that the former uses all previous iterates, in contrast to the latter, which uses only the k + 1 most recent ones. Hence, the memory requirements are smaller for polynomial filtering, since they are directly driven by k. This moreover allows more direct control on the convergence rate, as we have seen in Figure 3. Interestingly, we see that the convergence process is smoother with polynomial filtering, which further permits an easy extension to dynamic network topologies.
We now study the performance of polynomial filtering for dynamic network topologies. We build a sequence of random networks of n = 50 sensors, and we assume that at each iteration the network topology changes independently from the previous iterations with probability q, while with probability 1 − q it remains the same as in the previous iteration. We compare all methods for different values of the probability q. We use the Laplacian weight matrix (26). In the SDP polynomial filtering method, we solve the SDP program OPT4 (see Sec. 4.2). Figure 5 shows the average performance of polynomial filtering for some representative values of the degree k and the probability q. The average performance is computed using the median over 100 experiments. We have not reported the performance of the SEA algorithm, since it is not robust to changes of the network topology.

Figure 5: Random network topology simulations, where q denotes the probability that the network changes at each iteration: (a) k = 1, q = 1; (b)-(d) k = 2 for different values of q.

Notice that when k = 1 (i.e., each sensor uses only its current value and the immediately preceding one) polynomial filtering accelerates the convergence over the standard method. At the same time, it stays robust to network topology changes. Also, observe that in this case the SDP polynomial outperforms Newton's polynomial. However, when k = 2, the roles of the two polynomial filtering methods are reversed as the probability q increases.
For instance, when q = 0.8, the SDP method even diverges. This is expected if we consider that the coefficients of Newton's polynomial are computed using Hermite interpolation on a given interval and do not depend on the specific realization of the underlying weight matrix. Thus, they are more generic than those of the SDP polynomial, which takes W̄ into account, and therefore less sensitive to the actual topology realization. Algorithms based on optimal polynomial filtering become inefficient in a highly dynamic network, whose topology changes very frequently.

6 Related work

Several works have studied the convergence rate of distributed consensus algorithms. In particular, the authors in [5] and [9, 10] have shown that the convergence rate depends on the second largest eigenvalue of the network weight matrix, for fixed and random networks respectively. They both use semi-definite programs to compute the optimal weight matrix and the optimal topology.

Other works have addressed the consensus problem, and we mention here only the most relevant ones. A. Olshevsky and J. N. Tsitsiklis in [8] propose two consensus algorithms for fixed network topologies, which build on the "agreement algorithm". The proposed algorithms make use of spanning trees, and the authors bound their worst-case convergence rate. For dynamic network topologies, they propose an algorithm which builds on a previously known distributed load balancing algorithm. In this case, the authors show that the algorithm has a polynomial bound on the convergence time (ε-convergence).

The authors in [25] study the convergence properties of agreement over random networks following the Erdős-Rényi random graph model. According to this model, each edge of the graph exists with probability p, independently of the other edges, and the value of p is the same for all edges. By agreement, we refer to the case where all nodes of the graph agree on a particular value. The authors employ results from stochastic stability in order to establish convergence of agreement over random networks. It is also shown that the rate of convergence is governed by the expectation of an exponential factor, which involves the second smallest eigenvalue of the Laplacian of the graph.

Gossip algorithms have also been applied successfully to solving distributed averaging problems. The authors in [15] provide convergence results on randomized gossip algorithms in both synchronous and asynchronous settings. Based on the obtained results, they optimize the network topology (edge formation probabilities) in order to maximize the convergence rate of randomized gossip. This optimization problem is also formulated as a semi-definite program (SDP). In a recent study, the authors in [20] have been able to improve the standard gossip protocols in cases where the sensors know their geometric positions. The main idea is to exploit geographic routing in order to aggregate values among random nodes that are far away in the network.

Under the same assumption of knowing the geometric positions of the sensors, the authors in [21] propose a fast consensus algorithm for geographic random graphs. In particular, they utilize the location information of the sensors in order to construct a nonreversible lifted Markov chain that mixes faster than the corresponding reversible chains. The main idea of lifting is to distinguish the graph nodes from the states of the Markov chain and to "split" the states into virtual states that are connected in such a way that permits faster mixing.
The lifted graph is then "projected" back to the original graph, where the dynamics of the lifted Markov chain are simulated subject to the original graph topology. However, the proposed algorithm is not applicable in the case where the nodes' geographic locations are not available.

In [22] the authors propose a cluster-based distributed averaging algorithm, applicable to both fixed linear iterations and random gossiping. The induced overlay graph that is constructed by clustering the nodes is better connected than the original graph; hence, the random walk on the overlay graph mixes faster than the corresponding walk on the original graph. Along the same lines, K. Jung et al. in [23] have used nonreversible lifted Markov chains to accelerate consensus. They use the lifting scheme of [24] and propose a deterministic gossip algorithm based on a set of disjoint maximal matchings, in order to simulate the dynamics of the lifted Markov chain.

Finally, even if we have mostly considered synchronous algorithms in this paper, it is worth mentioning that the authors in [26] propose two asynchronous algorithms for distributed averaging. The first algorithm is based on blocking (that is, when two nodes update their values they block until the update has been completed), and the other algorithm drops the blocking assumption. The authors show the convergence of both algorithms under very general asynchronous timing assumptions. Moreover, the authors in [27] propose consensus propagation, an asynchronous distributed protocol that is a special case of belief propagation. In the case of singly-connected graphs (i.e., connected graphs with no loops), synchronous consensus propagation converges in a number of iterations equal to the diameter of the graph. The authors also provide a convergence analysis for regular graphs.

7 Conclusions

In this paper, we proposed a polynomial filtering methodology in order to accelerate distributed average consensus in both fixed and random network topologies. The main idea of polynomial filtering is to shape the spectrum of the polynomial weight matrix in order to minimize its second largest eigenvalue and subsequently increase the convergence rate. We have constructed semi-definite programs to compute the optimal polynomial coefficients in both static and dynamic networks. Simulation results with several common weight matrices have shown that the convergence rate is much higher than for state-of-the-art algorithms in most scenarios, except in the specific case of highly dynamic networks and small-memory sensors.
Acknowledgments

The first author would like to thank Prof. Yousef Saad for valuable and insightful discussions on polynomial filtering.
References

[1] D. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice Hall, 1989.
[2] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in Ad Hoc WSNs with Noisy Links - Part I: Distributed Estimation of Deterministic Signals," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350-364, January 2008.
[3] M. Rabbat, J. Haupt, A. Singh, and R. Nowak, "Decentralized compression and predistribution via randomized gossiping," pp. 51-59, April 2006.
[4] V. D. Blondel, J. M. Hendrickx, A. Olshevsky, and J. N. Tsitsiklis, "Convergence in multiagent coordination, consensus and flocking," IEEE Conf. on Decision and Control, and the European Control Conference, pp. 2996-3000, December 2005.
[5] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems and Control Letters, no. 53, pp. 65-78, February 2004.
[6] L. Xiao, S. Boyd, and S. Lall, "A scheme for robust distributed sensor fusion based on average consensus," Int. Conf. on Information Processing in Sensor Networks, pp. 63-70, April 2005, Los Angeles.
[7] ——, "Distributed average consensus with time-varying Metropolis weights," Automatica, June 2006, submitted.
[8] A. Olshevsky and J. Tsitsiklis, "Convergence rates in distributed consensus and averaging," IEEE Conference on Decision and Control, December 2006, San Diego, CA.
[9] S. Kar and J. M. F. Moura, "Distributed average consensus in sensor networks with random link failures," IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), April 2007.
[10] ——, "Sensor networks with random links: Topology design for distributed consensus," IEEE Transactions on Signal Processing, April 2007, submitted.
[11] E. Kokiopoulou and Y. Saad, "Polynomial filtering in latent semantic indexing for information retrieval," 2004.
[12] E. Kokiopoulou and P. Frossard, "Accelerating distributed consensus using extrapolation," IEEE Signal Processing Letters, vol. 14, no. 10, pp. 665-668, October 2007.
[13] S. Sundaram and C. N. Hadjicostis, "Distributed consensus and linear functional calculation in networks: An observability perspective," April 25-27, 2007.
[14] A. Tahbaz-Salehi and A. Jadbabaie, "On consensus over random networks," in Proceedings of the 44th Annual Allerton Conference on Communication, Control and Computing, pp. 1315-1321, September 2006.
[15] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE Transactions on Information Theory, vol. 52, pp. 2508-2530, June 2006.
[16] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[17] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. SIAM, 2003.
[18] J. F. Sturm, "Implementation of interior point methods for mixed semidefinite and second order cone optimization problems," EconPapers 73, August 2002, Tilburg University, Center for Economic Research.
[19] P. Gupta and P. R. Kumar, "The capacity of wireless networks," IEEE Trans. on Information Theory, vol. 46, no. 2, pp. 388-404, March 2000.
[20] A. Dimakis, A. D. Sarwate, and M. J. Wainwright, "Geographic gossip: Efficient averaging for sensor networks," IEEE Trans. on Signal Processing, to appear.
[21] W. Li and H. Dai, "Location-aided fast distributed consensus," IEEE Transactions on Information Theory, June 2007, submitted.
[22] ——, "Cluster-based fast distributed consensus," IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), April 2007.
[23] K. Jung and D. Shah, "Fast gossip via nonreversible random walk," IEEE Information Theory Workshop (ITW), March 2006.
[24] F. Chen, L. Lovász, and I. Pak, "Lifting Markov chains to speed up mixing," STOC '99: Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, pp. 275-281, 1999.
[25] Y. Hatano and M. Mesbahi, "Agreement over random networks," IEEE Conf. on Decision and Control, vol. 2, pp. 2010-2015, December 2004.
[26] M. Mehyar, D. Spanos, J. Pongsajapan, S. H. Low, and R. M. Murray, "Asynchronous distributed averaging on communication networks," IEEE/ACM Transactions on Networking, vol. 15, no. 3, pp. 512-520, June 2007.
[27] C. C. Moallemi and B. V. Roy, "Consensus propagation."