Polynomial Filtering for Fast Convergence in Distributed Consensus
Effrosyni Kokiopoulou and Pascal Frossard
Ecole Polytechnique Fédérale de Lausanne (EPFL), Signal Processing Laboratory (LTS4)
Lausanne - 1015, Switzerland
{effrosyni.kokiopoulou, pascal.frossard}@epfl.ch

This work has been partly supported by the Swiss NSF, under NCCR IM2.

Abstract
In the past few years, the problem of distributed consensus has received a lot of attention, particularly in the framework of ad hoc sensor networks. Most methods proposed in the literature address the consensus averaging problem by distributed linear iterative algorithms, with asymptotic convergence to the consensus solution. The convergence rate of such distributed algorithms typically depends on the network topology and on the weights given to the edges between neighboring sensors, as described by the network matrix. In this paper, we propose to accelerate the convergence rate for given network matrices by the use of polynomial filtering algorithms. The main idea of the proposed methodology is to apply a polynomial filter on the network matrix that shapes its spectrum in order to increase the convergence rate. Such an algorithm is equivalent to periodic updates in each of the sensors by aggregating a few of its previous estimates. We formulate the computation of the coefficients of the optimal polynomial as a semi-definite program that can be efficiently and globally solved for both static and dynamic network topologies. We finally provide simulation results that demonstrate the effectiveness of the proposed solutions in accelerating the convergence of distributed consensus averaging problems.
1 Introduction

We consider the problem of distributed consensus [1], which has recently attracted significant interest, especially in the context of ad hoc sensor networks. In particular, the problem of distributed average consensus has attracted a lot of research effort due to its numerous applications in diverse areas. A few examples include distributed estimation [2], distributed compression [3], coordination of networks of autonomous agents [4], and computation of averages and least-squares in a distributed fashion (see e.g., [5, 6, 7, 8] and references therein).

In general, the main goal of distributed consensus is to reach a global solution using only local computation and communication, while staying robust to changes in the network topology. Given the initial values at the sensors, the problem of distributed averaging is to compute their average at each sensor using distributed linear iterations. Each distributed iteration involves local communication among the sensors. In particular, each sensor updates its own local estimate of the average by a weighted linear combination of the corresponding estimates of its neighbors. The weights, gathered in a weight matrix W, typically drive the importance of the measurements of the different neighbors.

One of the important characteristics of distributed consensus algorithms is the rate of convergence to the asymptotic solution. In many cases, the average consensus solution can be reached by successive multiplications of W with the vector of initial sensor values. Furthermore, it has been shown in [5] that, in the case of a fixed network topology, the convergence rate depends on the second largest eigenvalue of W, λ_2(W). In particular, the convergence is faster when the value of λ_2(W) is small. Similar convergence results have been proposed recently in the case of random network topologies [9, 10], where the convergence rate is governed by the expected value of λ_2(W), E[λ_2(W)].

The main research direction so far focuses on the computation of the optimal weights W that yield the fastest convergence rate to the consensus solution [5, 6, 7]. In this work, we diverge from methods that are based on successive multiplications of W, and we rather allow the sensors to use their previous estimates in order to accelerate the convergence rate. This is similar in spirit to the works proposed in [12, 13] that reach the consensus solution in a finite number of steps; they use respectively extrapolation methods and a linear dynamical system formulation for fixed network topologies. In order to address more generic network topologies, we propose here to use a matrix polynomial p applied on the weight matrix W in order to shape its spectrum. Given the fact that the convergence rate is driven by λ_2(W), it is therefore possible to influence the convergence rate by a careful design of the polynomial p. From the implementation viewpoint, working with p(W) is equivalent to each sensor aggregating its value periodically using its own previous estimates. We further formulate the problem of the computation of the optimal coefficients for both static and dynamic network topologies. We show that this problem can be efficiently and globally solved in both cases through the definition of a semi-definite program (SDP).

The rest of this paper is organized as follows. In Section 2 we review the main convergence results of average consensus in both fixed and dynamic random network topologies.
Next, in Section 3 we introduce the polynomial filtering methodology and discuss its implementation for distributed consensus problems. We compute the optimal polynomial filter in Section 4 for both static and dynamic network topologies. In Section 5 we provide simulation results that verify the validity and the effectiveness of our method. Related work is finally presented in Section 6.

2 Distributed consensus averaging

Let us first define formally the problem of distributed consensus averaging. Assume that initially each sensor i reports a scalar value x_0(i) ∈ R. We denote by x_0 = [x_0(1), ..., x_0(n)]^⊤ ∈ R^n the vector of initial values on the network. Denote by

    µ = (1/n) Σ_{i=1}^{n} x_0(i)    (1)

the average of the initial values of the sensors. However, one rarely has a complete view of the network. The problem of distributed averaging therefore becomes typically to compute µ at each sensor by distributed linear iterations. In what follows we review the main convergence results for distributed consensus algorithms on both fixed and random network topologies.

2.1 Static network topology

We model the static network topology as an undirected graph G = (V, E) with nodes V = {1, ..., n} corresponding to sensors. An edge (i, j) ∈ E is drawn if and only if sensor i can communicate with sensor j. We denote the set of neighbors of node i as N_i = {j | (i, j) ∈ E}. Unless otherwise stated, we assume that each graph is simple, i.e., no loops or multiple edges are allowed.

In this work, we consider distributed linear iterations of the following form

    x_{t+1}(i) = W_{ii} x_t(i) + Σ_{j ∈ N_i} W_{ij} x_t(j),    (2)

for i = 1, ..., n, where x_t(j) represents the value computed by sensor j at iteration t. Since the sensors communicate in each iteration t, we assume that they are synchronized. The parameters W_{ij} denote the edge weights of G. Since each sensor communicates only with its direct neighbors, W_{ij} = 0 when (i, j) ∉ E. The above iteration can be compactly written in the following form

    x_{t+1} = W x_t,    (3)

or more generally

    x_t = ( Π_{i=0}^{t-1} W ) x_0 = W^t x_0.    (4)

We call the matrix W that gathers the edge weights W_{ij} the weight matrix. Note that W is a sparse matrix whose sparsity pattern is driven by the network topology. We assume that W is symmetric, and we denote its eigenvalue decomposition as W = Q Λ Q^⊤. The (real) eigenvalues can further be arranged as follows:

    λ_1(W) ≥ λ_2(W) ≥ ... ≥ λ_{n-1}(W) ≥ λ_n(W).    (5)

The distributed linear iteration given in eq. (3) converges to the average if and only if

    lim_{t→∞} W^t = 11^⊤/n,    (6)

where 1 is the vector of ones [5]. Indeed, notice that in this case x* = lim_{t→∞} x_t = lim_{t→∞} W^t x_0 = (11^⊤/n) x_0 = µ1.

It has been shown that for a fixed network topology the convergence rate of eq. (3) depends on the magnitude of the second largest eigenvalue λ_2(W) [5]. The asymptotic convergence factor is defined as

    r_asym(W) = sup_{x_0 ≠ µ1} lim_{t→∞} ( ‖x_t − µ1‖ / ‖x_0 − µ1‖ )^{1/t},    (7)

and the per-step convergence factor is written as

    r_step(W) = sup_{x_t ≠ µ1} ‖x_{t+1} − µ1‖ / ‖x_t − µ1‖.    (8)

Furthermore, it has been shown that the convergence rate relates to the spectrum of W, as given by the following theorem [5].

Theorem 1 The convergence given by eq. (6) is guaranteed if and only if

    1^⊤ W = 1^⊤,    (9)
    W 1 = 1,    (10)
    ρ( W − 11^⊤/n ) < 1,    (11)

where ρ(·) denotes the spectral radius of a matrix. Furthermore,

    r_asym(W) = ρ( W − 11^⊤/n ),    (12)
    r_step(W) = ‖ W − 11^⊤/n ‖_2.    (13)
According to the above theorem, 1/√n is a left and right eigenvector of W associated with the eigenvalue one, and the magnitude of all other eigenvalues is strictly less than one. Note finally that, since W is symmetric, the asymptotic convergence factor coincides with the per-step convergence factor, which implies that the relations (12) and (13) are equivalent.

We now give an alternate proof of the above theorem that illustrates the importance of the second largest eigenvalue in the convergence rate. We expand the initial state vector x_0 in the orthogonal eigenbasis Q of W; that is,

    x_0 = ν_1 1/√n + Σ_{j=2}^{n} ν_j q_j,

where ν_1 = ⟨1/√n, x_0⟩ and ν_j = ⟨q_j, x_0⟩. We further assume that ν_1 ≠ 0. Then, eq. (4) implies that

    x_t = W^t x_0 = W^t ( ν_1 1/√n + Σ_{j=2}^{n} ν_j q_j ) = ν_1 1/√n + Σ_{j=2}^{n} ν_j W^t q_j = ν_1 1/√n + Σ_{j=2}^{n} ν_j λ_j^t q_j.

Observe now that if |λ_j| < 1 for all j ≥ 2, then in the limit the second term in the above equation decays and

    x* = lim_{t→∞} x_t = ν_1 1/√n = µ1.

We see that the smaller the value of λ_2(W), the faster the convergence rate.
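As a concrete illustration of the above (a minimal sketch, not taken from the paper; the 4-node network and its weights are arbitrary illustrative choices), the following numpy snippet runs the linear iteration of eq. (3) on a small symmetric weight matrix satisfying conditions (9)-(11) and shows the error decaying at a rate governed by λ_2(W):

    import numpy as np

    # Hypothetical 4-node path network with a symmetric, doubly-stochastic weight matrix.
    W = np.array([
        [0.5, 0.5, 0.0, 0.0],
        [0.5, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5],
        [0.0, 0.0, 0.5, 0.5],
    ])
    assert np.allclose(W, W.T) and np.allclose(W.sum(axis=1), 1.0)  # conditions (9)-(10)

    x = np.array([1.0, 3.0, 5.0, 7.0])                  # initial sensor values x_0
    mu = x.mean()                                       # target average
    lam2 = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]   # second largest |eigenvalue|

    for t in range(30):
        x = W @ x                                       # distributed iteration x_{t+1} = W x_t
    err = np.linalg.norm(x - mu)
    print(f"lambda_2 = {lam2:.3f}, error after 30 iterations = {err:.2e}")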
Analogous convergence results hold in the case of dynamic network topologies, discussed next.

2.2 Dynamic network topology

Let us now consider networks with random link failures, where the state of a link changes over the iterations. In particular, we use the random network model proposed in [9, 10]. We assume that the network at any arbitrary iteration t is G(t) = (V, E(t)), where E(t) denotes the edge set at iteration t, or equivalently at time instant t. Since the network is dynamic, the edge set changes over the iterations, as links fail at random. We assume that E(t) ⊆ E*, where E* ⊆ V × V is the set of realizable edges when there is no link failure.

We also assume that each link (i, j) fails with probability (1 − p_ij), independently of the other links. Two random edge sets E(t_1) and E(t_2) at different iterations t_1 and t_2 are independent. The probability of forming a particular E(t) is thus given by Π_{(i,j) ∈ E(t)} p_ij. We define the matrix P as

    P_ij = p_ij  if (i, j) ∈ E* and i ≠ j,  and  P_ij = 0  otherwise.    (14)

The matrix P is symmetric and its diagonal elements are zero, since it corresponds to a simple graph. It represents the probabilities of edge formation in the network, and the edge set E(t) is therefore a random subset of E* driven by the matrix P. Finally, the weight matrix W becomes dependent on the edge set, since only the weights of existing edges can take nonzero values.

In the dynamic case, the distributed linear iteration of eq. (2) becomes

    x_{t+1}(i) = W_{ii}(t) x_t(i) + Σ_{j ∈ N_i} W_{ij}(t) x_t(j),    (15)

or in compact form,

    x_{t+1} = W_t x_t,    (16)

where W_t denotes the weight matrix corresponding to the graph realization G(t) of iteration t. The iterative relation given by eq. (16) can be written as

    x_t = ( Π_{i=0}^{t-1} W_i ) x_0.

Clearly, x_t now represents a stochastic process, since the edges are drawn randomly. The convergence rate to the consensus solution therefore depends on the behavior of the product Π_{i=0}^{t-1} W_i. We say that the algorithm converges if

    ∀ x_0 ∈ R^n,  lim_{t→∞} E‖x_t − µ1‖ = 0.    (17)
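For illustration only (not the authors' code), one realization E(t) of this model and a corresponding weight matrix W_t can be generated as below; Metropolis weights on the realized graph are used here purely as one possible choice of the weight rule:

    import numpy as np

    def random_realization(P, rng):
        """Draw an edge set E(t): each feasible link (i,j) survives with probability P[i,j]."""
        n = P.shape[0]
        A = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                if P[i, j] > 0 and rng.random() < P[i, j]:
                    A[i, j] = A[j, i] = 1.0
        return A

    def metropolis_weights(A):
        """Example weight rule on the realized graph (one possible choice of W_t)."""
        n = A.shape[0]
        d = A.sum(axis=1)
        W = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if A[i, j] == 1.0:
                    W[i, j] = 1.0 / (1.0 + max(d[i], d[j]))
        W += np.diag(1.0 - W.sum(axis=1))   # make each row sum to one
        return W

    rng = np.random.default_rng(0)
    P = np.array([[0.0, 0.9, 0.5],          # hypothetical link formation probabilities
                  [0.9, 0.0, 0.7],
                  [0.5, 0.7, 0.0]])
    W_t = metropolis_weights(random_realization(P, rng))
    print(W_t)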
We now review some convergence results from [10].

Lemma 1 For any x_0 ∈ R^n,

    ‖x_{t+1} − µ1‖ ≤ ( Π_{i=0}^{t} ρ( W_i − 11^⊤/n ) ) ‖x_0 − µ1‖.

This leads to the following convergence theorem [10] for dynamic networks.
Theorem 2 If E[ ρ( W − 11^⊤/n ) ] < 1, the vector sequence {x_t}_{t=0}^{∞} converges in the sense of eq. (17).
We define the convergence factor in dynamic network topologies as

    r_d(W) = E[ ρ( W − 11^⊤/n ) ].

This factor depends in general on the spectral properties of the induced network matrix and drives the convergence rate of eq. (16). The authors in [14] further show that |λ_2(E[W])| < 1 is necessary and sufficient for reaching consensus almost surely when the weight matrices are drawn independently over the iterations.

3 Polynomial filtering for distributed consensus

As we have seen above, the convergence rate of distributed consensus algorithms depends in general on the spectral properties of an induced network matrix. This is the case for both fixed and random network topologies. Most of the research work has been devoted to finding a weight matrix W that accelerates the convergence to the consensus solution when sensors only use their current estimates. We choose a different approach, where we exploit the memory of the sensors, i.e., the values of previous estimates, in order to increase the convergence rate.

3.1 The Scalar Epsilon Algorithm

In our previous work [12], we proposed the Scalar Epsilon Algorithm (SEA) for accelerating the convergence rate to the consensus solution. SEA belongs to the family of extrapolation methods for accelerating vector sequences, such as eq. (3). These methods exploit the fact that the fixed point of the sequence belongs to the subspace spanned by any ℓ + 1 consecutive terms of it, where ℓ is the degree of the minimal polynomial of the sequence generator matrix (for more details, see [12] and references therein). SEA is a low complexity algorithm, which is ideal for sensor networks, and it is known to reach the consensus solution in 2ℓ steps. However, ℓ is unknown in practice, so one may use all the available terms of the vector sequence. Hence, the memory requirements of SEA are O(T), where T is the number of terms. Moreover, SEA assumes that the sequence generator matrix (e.g., W in the case of eq. (3)) is fixed, so that it does not adapt easily to dynamic network topologies.

In this paper, we propose a more flexible algorithm based on the polynomial filtering technique. Polynomial filtering permits to "shape" the spectrum of a symmetric weight matrix, in order to accelerate the convergence to the consensus solution. Similarly to SEA, it allows the sensors to use the values of their previous estimates. However, the polynomial filtering methodology introduced below presents three main advantages: (i) it is robust to dynamic topologies, (ii) it provides explicit control on the convergence rate, and (iii) its memory requirements can be adjusted to the memory constraints imposed by the sensors.

3.2 Polynomial filtering

Starting from a given (possibly optimal) weight matrix W, we propose the application of a polynomial filter on the spectrum of W in order to reduce the magnitude of λ_2(W), which mainly drives the convergence rate. Denote by p_k(λ) the polynomial filter of degree k that is applied on the spectrum of W,

    p_k(λ) = α_0 + α_1 λ + α_2 λ^2 + ... + α_k λ^k.    (18)

Accordingly, the matrix polynomial is given as

    p_k(W) = α_0 I + α_1 W + α_2 W^2 + ... + α_k W^k.    (19)

Observe now that

    p_k(W) = p_k(Q Λ Q^⊤) = α_0 I + α_1 (Q Λ Q^⊤) + ... + α_k (Q Λ^k Q^⊤) = Q p_k(Λ) Q^⊤,    (20)

which implies that the eigenvalues of p_k(W) are simply the polynomial filtered eigenvalues of W, i.e., p_k(λ_i(W)), i = 1, ..., n.

At the implementation level, working with p_k(W) implies a periodic update of the current sensor value with a linear combination of its previous values. To see why this is true, we observe that

    x_{t+k+1} = p_k(W) x_t = α_0 x_t + α_1 W x_t + ... + α_k W^k x_t = α_0 x_t + α_1 x_{t+1} + ... + α_k x_{t+k}.    (21)
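Both observations can be checked numerically; the short sketch below (with an arbitrary, hypothetical filter of degree k = 2 satisfying p_k(1) = 1) verifies that the eigenvalues of p_k(W) are the filtered eigenvalues of eq. (20), and that p_k(W) x_t equals the combination of buffered iterates in eq. (21):

    import numpy as np

    # Reuse any symmetric weight matrix W with W @ 1 = 1 (here the 4-node example from above).
    W = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    alpha = np.array([-0.2, 0.1, 1.1])           # hypothetical coefficients, sum to 1 so p_k(1) = 1
    x_t = np.array([1.0, 3.0, 5.0, 7.0])

    # Eigenvalues of p_k(W) are p_k applied to the eigenvalues of W (eq. (20)).
    pW = sum(a * np.linalg.matrix_power(W, i) for i, a in enumerate(alpha))
    lam = np.linalg.eigvalsh(W)
    assert np.allclose(np.sort(np.linalg.eigvalsh(pW)),
                       np.sort(np.polyval(alpha[::-1], lam)))

    # p_k(W) x_t equals the weighted combination of the k+1 buffered iterates (eq. (21)).
    iterates = [x_t]
    for _ in range(len(alpha) - 1):
        iterates.append(W @ iterates[-1])        # x_{t+1}, x_{t+2}, ...
    assert np.allclose(pW @ x_t, sum(a * xi for a, xi in zip(alpha, iterates)))
    print("filtered eigenvalues:", np.round(np.polyval(alpha[::-1], lam), 3))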
A careful design of p_k may impact the convergence rate dramatically. Each sensor then applies polynomial filtering for distributed consensus by following the main steps tabulated in Algorithm 1. Note that, for both the fixed and the random network topology cases, the α_i's are computed off-line, assuming that W (respectively E[W]) is known a priori. In what follows, we propose different approaches for computing the coefficients α_i of the filter p_k.

Algorithm 1 Polynomial Filtered Distributed Consensus
  Input: polynomial coefficients α_0, ..., α_k, tolerance ε.
  Output: average estimate µ̄.
  Initialization: µ̄_0 = x_0(i). Set the iteration index t = 1.
  repeat
    if mod(t, k+1) == 0 then
      x_t(i) = α_0 x_{t-k-1}(i) + α_1 x_{t-k}(i) + α_2 x_{t-k+1}(i) + ... + α_k x_{t-1}(i)   {polynomial filtered update}
      x_t(i) = W_{ii} x_t(i) + Σ_{j ∈ N_i} W_{ij} x_t(j)
    else
      x_t(i) = W_{ii} x_{t-1}(i) + Σ_{j ∈ N_i} W_{ij} x_{t-1}(j)
    end if
    Increase the iteration index t = t + 1.
    µ̄_t = x_t(i)
  until |µ̄_t − µ̄_{t-1}| < ε
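A centralized emulation of Algorithm 1 (a sketch under the assumption that all sensors share the same off-line coefficients and a common iteration counter; the matrix and coefficients are the illustrative ones used earlier) looks as follows:

    import numpy as np

    def polynomial_filtered_consensus(W, x0, alpha, num_iters):
        """Run eq. (3) but, every k+1 iterations, first replace the state by the
        filtered combination of the k+1 most recent iterates (eq. (21))."""
        k = len(alpha) - 1
        history = [x0.copy()]     # full history kept for simplicity; a sensor only needs its k+1 latest values
        x = x0.copy()
        for t in range(1, num_iters + 1):
            if t % (k + 1) == 0:
                x = sum(a * xi for a, xi in zip(alpha, history[-(k + 1):]))  # filtered update
            x = W @ x             # standard neighbor averaging step
            history.append(x.copy())
        return x

    W = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
    x0 = np.array([1.0, 3.0, 5.0, 7.0])
    alpha = np.array([-0.2, 0.1, 1.1])            # same hypothetical coefficients as above
    print(polynomial_filtered_consensus(W, x0, alpha, 30), "target:", x0.mean())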
One simple and rather intuitive approach to the design of the polynomial p_k(λ) is to use Hermite interpolation. Recall that the objective is to dampen the smallest eigenvalues λ_2, ..., λ_n of W, while keeping the eigenvalue one intact. Therefore, we assume that the spectrum of W lies in an interval [a, 1] and we impose smoothness constraints on p_k at the left endpoint a. In particular, the polynomial p_k(λ) : [a, 1] → R that we seek is determined by the following constraints:

    p_k(a) = 0,
    p_k^{(i)}(a) = 0,  i = 1, ..., k − 1,    (23)
    p_k(1) = 1,

where p_k^{(i)}(a) denotes the i-th derivative of p_k(λ) evaluated at a. The dampening is achieved by imposing smoothness constraints on the polynomial at the left endpoint of the interval. The computed polynomial has degree equal to k. Finally, the coefficients of the p_k that satisfies the above constraints can be computed by Newton's divided differences table.

Figure 1: Newton's polynomial of various degrees k.

Figure 1 shows the form of p_k(λ) for a = 0 and different values of the degree k. As k increases, the dampening of the small eigenvalues becomes more effective. It is worth mentioning that the design of Newton's polynomial does not depend on the network size or on the size of the weight matrix W. All that is needed is a left endpoint a that encloses the spectrum of W, as well as the desired degree k, which moreover may be imposed by memory constraints. This feature of Newton's polynomial is particularly appealing in the case of dynamic network topologies. However, the above polynomial design is mostly driven by intuitive arguments, which tend to produce small eigenvalues for faster convergence. In the following section, we provide an alternative technique for computing the polynomial filter that optimizes the convergence rate.
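Since a degree-k polynomial with a k-fold root at a and value one at λ = 1 is uniquely p_k(λ) = ((λ − a)/(1 − a))^k, the coefficients α_0, ..., α_k can also be obtained by simply expanding this power. The sketch below does so with numpy for an arbitrary illustrative endpoint a; it is not the authors' implementation of the divided-differences table, but it produces the same polynomial:

    import numpy as np
    from numpy.polynomial import polynomial as P

    def newton_filter_coefficients(a, k):
        """Coefficients alpha_0..alpha_k of p_k(lambda) = ((lambda - a)/(1 - a))**k,
        which satisfies p_k(a) = 0, p_k^(i)(a) = 0 for i < k, and p_k(1) = 1."""
        base = np.array([-a, 1.0]) / (1.0 - a)    # (lambda - a)/(1 - a), ascending powers
        coeffs = np.array([1.0])
        for _ in range(k):
            coeffs = P.polymul(coeffs, base)      # repeated multiplication = k-th power
        return coeffs                             # ascending order: alpha_0, ..., alpha_k

    alpha = newton_filter_coefficients(a=-0.5, k=3)   # a chosen arbitrarily for illustration
    print(alpha)
    lam = np.linspace(-0.5, 1.0, 7)
    print(np.polyval(alpha[::-1], lam))               # small eigenvalues are strongly dampened, p(1) = 1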
4 Optimal polynomial filters

4.1 Static network topology

We are now interested in finding the polynomial that leads to the fastest convergence of the linear iteration described in eq. (3), for a given weight matrix W and a given degree k. For notational ease, we write W_p = p_k(W) = Σ_{i=0}^{k} α_i W^i. According to Theorem 1, the optimal polynomial is the one that minimizes the second largest eigenvalue of W_p, i.e., ρ( W_p − 11^⊤/n ). Therefore, we need to solve an optimization problem where the optimization variables are the k+1 polynomial coefficients α_0, ..., α_k and the objective function is the spectral radius of W_p − 11^⊤/n.

Optimization problem: OPT1
  min_{α ∈ R^{k+1}}  ρ( Σ_{i=0}^{k} α_i W^i − 11^⊤/n )
  subject to  ( Σ_{i=0}^{k} α_i W^i ) 1 = 1.

Interestingly, the optimization problem OPT1 is convex. First, its objective function is convex, as stated in Lemma 2.
Lemma 2 For a given symmetric weight matrix W and degree k, ρ( Σ_{i=0}^{k} α_i W^i − 11^⊤/n ) is a convex function of the polynomial coefficients α_i.

Proof: Let β, γ ∈ R^{k+1} and 0 ≤ θ ≤ 1.
Since W is symmetric, Σ_{i=0}^{k} α_i W^i is also symmetric. Hence, the spectral radius is equal to the matrix 2-norm. Thus, we have

    ρ( Σ_{i=0}^{k} (θβ_i + (1−θ)γ_i) W^i − 11^⊤/n )
      = ‖ Σ_{i=0}^{k} (θβ_i + (1−θ)γ_i) W^i − 11^⊤/n ‖_2
      = ‖ θ( Σ_{i=0}^{k} β_i W^i − 11^⊤/n ) + (1−θ)( Σ_{i=0}^{k} γ_i W^i − 11^⊤/n ) ‖_2
      ≤ θ ‖ Σ_{i=0}^{k} β_i W^i − 11^⊤/n ‖_2 + (1−θ) ‖ Σ_{i=0}^{k} γ_i W^i − 11^⊤/n ‖_2
      = θ ρ( Σ_{i=0}^{k} β_i W^i − 11^⊤/n ) + (1−θ) ρ( Σ_{i=0}^{k} γ_i W^i − 11^⊤/n ),

which proves the lemma. In addition, the constraint of OPT1 is linear, which implies that the set of feasible α_i's is convex. As OPT1 minimizes a convex function over a convex set, the optimization problem is indeed convex.

In order to solve OPT1, we use an auxiliary variable s to bound the objective function, and we express the spectral radius constraint as a linear matrix inequality (LMI). Thus, we need to solve the following optimization problem.

Optimization problem: OPT2
  min_{s ∈ R, α ∈ R^{k+1}}  s
  subject to  −sI ⪯ Σ_{i=0}^{k} α_i W^i − 11^⊤/n ⪯ sI,
              ( Σ_{i=0}^{k} α_i W^i ) 1 = 1.

Recall that, since W is symmetric, W_p = Σ_{i=0}^{k} α_i W^i is symmetric as well. Hence, the constraint W_p 1 = 1 is sufficient to ensure that 1 is also a left eigenvector of W_p. The spectral radius constraint

    −sI ⪯ W_p − 11^⊤/n ⪯ sI

ensures that all the eigenvalues of W_p other than the first one are less than or equal to s in magnitude. Due to the LMI, the above optimization problem is equivalent to a semi-definite program (SDP) [16]. SDPs are convex problems and can be globally and efficiently solved. The solution to OPT2 is therefore computed efficiently in practice, as the SDP only has a moderate number of k + 2 unknowns (including s).
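The authors solve these programs in Matlab with SeDuMi; purely as an illustration, the sketch below expresses the same optimization in Python with the cvxpy modeling package (an assumption on our side, not the authors' toolchain). Since p_k(W) − 11^⊤/n is symmetric, its spectral radius equals its spectral norm, which cvxpy models directly; OPT2 is the epigraph form of this objective.

    import numpy as np
    import cvxpy as cp

    def optimal_filter_coefficients(W, k):
        """Minimize the spectral radius of p_k(W) - 11^T/n subject to p_k(W) 1 = 1,
        for a symmetric weight matrix W (the problem solved by OPT1/OPT2)."""
        n = W.shape[0]
        powers = [np.linalg.matrix_power(W, i) for i in range(k + 1)]
        J = np.ones((n, n)) / n
        alpha = cp.Variable(k + 1)
        Wp = sum(alpha[i] * powers[i] for i in range(k + 1))
        objective = cp.Minimize(cp.norm(Wp - J, 2))        # spectral norm = spectral radius here
        constraints = [Wp @ np.ones(n) == np.ones(n)]      # preserve the consensus eigenvector
        prob = cp.Problem(objective, constraints)
        prob.solve()
        return alpha.value, prob.value

For the dynamic case, the same routine would simply be called with the average matrix W̄ = E[W] in place of W, which corresponds to the problems OPT3/OPT4 introduced next.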
4.2 Dynamic network topology

We now extend the idea of polynomial filtering to dynamic network topologies. Theorem 2 suggests that the convergence rate in the random network topology case is governed by E[ ρ( W − 11^⊤/n ) ]. Since W depends on a dynamic edge set, p_k(W) now becomes stochastic. Following the same intuition as above, we could form an optimization problem, similar to OPT1, whose objective function would be E[ ρ( Σ_{i=0}^{k} α_i W^i − 11^⊤/n ) ]. Although this objective function can be shown to be convex, its evaluation is hard and typically requires several Monte Carlo simulation steps.

Recall that the convergence rate of eq. (16) is related to the second largest eigenvalue of E[W], which is much easier to evaluate. Let W̄ denote the average weight matrix E[W]. We then observe that

    E[ ρ( W − 11^⊤/n ) ] ≥ ρ( W̄ − 11^⊤/n ),

which follows from Lemma 2 and Jensen's inequality. The above inequality implies that the spectral radius ρ( W̄ − 11^⊤/n ) must be small in order for E[ ρ( W − 11^⊤/n ) ] to be small too. Additionally, the authors provide experimental evidence in [10] indicating that ρ( W̄ − 11^⊤/n ) is closely related to the convergence rate of eq. (16).

Based on the above facts, we propose to build our polynomial filter from W̄. Hence, we formulate the following optimization problem for computing the polynomial coefficients α_i in the random network topology case.

Optimization problem: OPT3
  min_{α ∈ R^{k+1}}  ρ( Σ_{i=0}^{k} α_i W̄^i − 11^⊤/n )
  subject to  ( Σ_{i=0}^{k} α_i W̄^i ) 1 = 1.

OPT3 can be viewed as the analog of OPT1 for the case of dynamic network topologies. The main difference is that we work with W̄, whose eigenvalues can be easily obtained. Using again the auxiliary variable s, we reach the following formulation for obtaining the α_i's.

Optimization problem: OPT4
  min_{s ∈ R, α ∈ R^{k+1}}  s
  subject to  −sI ⪯ Σ_{i=0}^{k} α_i W̄^i − 11^⊤/n ⪯ sI,
              ( Σ_{i=0}^{k} α_i W̄^i ) 1 = 1.

Once W̄ has been computed, this optimization problem is solved efficiently by an SDP, similarly to the case of static networks.

5 Simulation results

In this section, we provide simulation results that show the effectiveness of the polynomial filtering methodology. First, we introduce a few weight matrices that have been extensively used in the distributed averaging literature. Suppose that d(i) denotes the degree of the i-th sensor. It has been shown in [5, 6] that iterating eq. (3) with the following matrices leads to convergence to µ1.

• Maximum-degree weights. The Maximum-degree weight matrix is

    W_ij = 1/n,            if (i, j) ∈ E,
           1 − d(i)/n,     if i = j,
           0,              otherwise.    (24)

• Metropolis weights. The Metropolis weight matrix is

    W_ij = 1/(1 + max{d(i), d(j)}),      if (i, j) ∈ E,
           1 − Σ_{(i,k) ∈ E} W_ik,       if i = j,
           0,                            otherwise.    (25)

• Laplacian weights. Suppose that A is the adjacency matrix of G and D is a diagonal matrix which holds the vertex degrees. The Laplacian matrix is defined as L = D − A and the Laplacian weight matrix is defined as

    W = I − γL,    (26)

where the scalar γ must satisfy γ < 1/d_max [5].

The sensor networks are built using the random geographic graph model [19]. In particular, we place n nodes uniformly at random in the two-dimensional unit square. Two nodes are adjacent if their Euclidean distance is smaller than r = √(log(n)/n), in order to guarantee connectedness of the graph with high probability [19].

Finally, the SDP programs for optimizing the polynomial filters are solved in Matlab using the SeDuMi solver, publicly available at http://sedumi.mcmaster.ca/.
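The experimental setup just described can be reproduced along the following lines (a sketch: the connectivity radius and the Laplacian weight rule follow the text above, while specific constants such as γ = 0.9/d_max are illustrative assumptions):

    import numpy as np

    def random_geometric_graph(n, rng):
        """Place n nodes uniformly in the unit square; connect pairs closer than
        r = sqrt(log(n)/n), which keeps the graph connected with high probability [19]."""
        pos = rng.random((n, 2))
        r = np.sqrt(np.log(n) / n)
        d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        A = ((d < r) & ~np.eye(n, dtype=bool)).astype(float)
        return A

    def laplacian_weights(A, gamma=None):
        """Laplacian weight matrix W = I - gamma * L of eq. (26), with gamma < 1/d_max."""
        d = A.sum(axis=1)
        L = np.diag(d) - A
        if gamma is None:
            gamma = 0.9 / d.max()          # illustrative choice satisfying the bound
        return np.eye(A.shape[0]) - gamma * L

    rng = np.random.default_rng(1)
    A = random_geometric_graph(50, rng)
    W = laplacian_weights(A)
    print("second largest |eigenvalue|:", np.sort(np.abs(np.linalg.eigvalsh(W)))[-2])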
We illustrate first the effect of polynomial filtering on the spectrum of W. We build a network of n = 50 sensors and we apply polynomial filtering on the Maximum-degree weight matrix W given in (24). We use k = 4 and we solve the optimization problem OPT2 using the Maximum-degree matrix W as input. Figure 2(a) shows the obtained polynomial filter p_k(λ) for λ ∈ [λ_min(W), 1], and Figure 2(b) shows the spectrum of W before (star-solid line) and after (circle-solid line) polynomial filtering, versus the eigenvalue index. Observe that polynomial filtering dramatically increases the spectral gap |1 − λ_2|, which further leads to accelerating the distributed consensus, as we show in the simulations that follow.

Figure 2: Effect of polynomial filtering on the spectrum of the Maximum-degree matrix of eq. (24): (a) SDP polynomial filter p_k(λ) on [λ_min(W), 1]; (b) effect on the spectrum of W.

Then we compare the performance of the different distributed consensus algorithms with all the aforementioned weight matrices, that is, the Maximum-degree, Metropolis and Laplacian weight matrices for distributed averaging. We compare both Newton's polynomial and the SDP polynomial (obtained from the solution of OPT2) with the standard iterative method, which is based on successive iterations of eq. (3). For the sake of completeness, we also provide the results of the scalar epsilon algorithm (SEA) that uses all previous estimates [12].

First, we explore the behavior of the polynomial filtering methods for a degree k varying from 2 to 6 with step 2. We use the Laplacian weight matrix for this experiment. Figures 3(a) and 3(b) illustrate the evolution of the absolute error ‖x_t − µ1‖ versus the iteration index t, for polynomial filtering with the SDP and Newton's polynomials respectively. We also provide the curve of the standard iterative method as a baseline. Observe first that both polynomial filtering methods outperform the standard method by exhibiting faster convergence rates, across all values of k. Notice also that the degree k governs the convergence rate, since a larger k implies more effective filtering and therefore faster convergence. Finally, the stagnation of the convergence process of the SDP polynomial filtering for large values of k is due to the limited accuracy of the SDP solver.

Next, we show the results obtained with the other two weight matrices on the same sensor network. Figures 4(a) and 4(b) show the convergence behavior of all methods for the Maximum-degree and Metropolis matrices respectively. For both polynomial filtering methods we use a representative value of k, namely k = 4.
Figure 3: Behavior of polynomial filtering for variable degree k on fixed topology using the Laplacian weight matrix: (a) SDP polynomial filtering; (b) Newton polynomial filtering.
Figure 4: Comparison between Newton's polynomial and the SDP polynomial (k = 4) on fixed topology: (a) Maximum-degree weight matrix; (b) Metropolis weight matrix.

Notice again that polynomial filtering accelerates the convergence relative to the standard iterative method. As expected, the optimal polynomial computed with the SDP outperforms Newton's polynomial, which is based on intuitive arguments only.

Finally, we can see from Figures 3 and 4 that in some cases the convergence rate is comparable for SEA and SDP polynomial filtering. Note however that the former uses all previous iterates, in contrast to the latter, which uses only the k + 1 most recent ones. Hence, the memory requirements are smaller for polynomial filtering, since they are directly driven by k. This moreover allows more direct control on the convergence rate, as we have seen in Figure 3. Interestingly, we see that the convergence process is smoother with polynomial filtering, which further permits an easy extension to dynamic network topologies.
We now study the performance of polynomial filtering for dynamic network topologies. We build a sequence of random networks of n = 50 sensors, and we assume that at each iteration the network topology changes independently from the previous iterations with probability q, while with probability 1 − q it remains the same as in the previous iteration. We compare all methods for different values of the probability q. We use the Laplacian weight matrix (26). In the SDP polynomial filtering method, we solve the SDP program OPT4 (see Sec. 4.2). Figure 5 shows the average performance of polynomial filtering for some representative values of the degree k and the probability q. The average performance is computed using the median over 100 experiments. We have not reported the performance of the SEA algorithm, since it is not robust to changes of the network topology.

Figure 5: Random network topology simulations, where q denotes the probability that the network changes at each iteration: (a) k = 1, q = 1; (b)-(d) k = 2 for different values of q.

Notice that when k = 1 (i.e., each sensor uses only its current value and the immediately preceding one) polynomial filtering accelerates the convergence over the standard method. At the same time, it stays robust to network topology changes. Also, observe that in this case the SDP polynomial outperforms Newton's polynomial. However, when k = 2, the roles of the two polynomial filtering methods are reversed as the probability q increases.
For instance, when q = 0.8, the SDP method even diverges. This is expected if we consider that the coefficients of Newton's polynomial are computed using Hermite interpolation on a given interval and do not depend on the specific realization of the underlying weight matrix. Thus, they are more generic than those of the SDP polynomial, which takes W̄ into account, and therefore less sensitive to the actual topology realization. Algorithms based on optimal polynomial filtering become inefficient in a highly dynamic network, whose topology changes very frequently.

6 Related work

Several works have studied the convergence rate of distributed consensus algorithms. In particular, the authors in [5] and [9, 10] have shown that the convergence rate depends on the second largest eigenvalue of the network weight matrix, for fixed and random networks respectively. They both use semi-definite programs to compute the optimal weight matrix and the optimal topology.

Other works have addressed the consensus problem, and we mention here only the most relevant ones. A. Olshevsky and J. N. Tsitsiklis in [8] propose two consensus algorithms for fixed network topologies, which build on the "agreement algorithm". The proposed algorithms make use of spanning trees, and the authors bound their worst-case convergence rate. For dynamic network topologies, they propose an algorithm which builds on a previously known distributed load balancing algorithm. In this case, the authors show that the algorithm has a polynomial bound on the convergence time (ε-convergence).

The authors in [25] study the convergence properties of agreement over random networks following the Erdős-Rényi random graph model. According to this model, each edge of the graph exists with probability p, independently of the other edges, and the value of p is the same for all edges. By agreement, we refer to the case where all nodes of the graph agree on a particular value. The authors employ results from stochastic stability in order to establish convergence of agreement over random networks. It is also shown that the rate of convergence is governed by the expectation of an exponential factor, which involves the second smallest eigenvalue of the Laplacian of the graph.

Gossip algorithms have also been applied successfully to solving distributed averaging problems. The authors in [15] provide convergence results on randomized gossip algorithms in both synchronous and asynchronous settings. Based on the obtained results, they optimize the network topology (edge formation probabilities) in order to maximize the convergence rate of randomized gossip. This optimization problem is also formulated as a semi-definite program (SDP). In a recent study, the authors in [20] have been able to improve the standard gossip protocols in cases where the sensors know their geometric positions. The main idea is to exploit geographic routing in order to aggregate values among random nodes that are far away in the network.

Under the same assumption of knowing the geometric positions of the sensors, the authors in [21] propose a fast consensus algorithm for geographic random graphs. In particular, they utilize the location information of the sensors in order to construct a nonreversible lifted Markov chain that mixes faster than the corresponding reversible chains. The main idea of lifting is to distinguish the graph nodes from the states of the Markov chain and to "split" the states into virtual states that are connected in such a way that permits faster mixing.
The lifted graph is then "projected" back to the original graph, where the dynamics of the lifted Markov chain are simulated subject to the original graph topology. However, the proposed algorithm is not applicable in the case where the nodes' geographic locations are not available.

In [22] the authors propose a cluster-based distributed averaging algorithm, applicable to both fixed linear iterations and random gossiping. The induced overlay graph that is constructed by clustering the nodes is better connected than the original graph; hence, the random walk on the overlay graph mixes faster than the corresponding walk on the original graph. Along the same lines, K. Jung et al. in [23] have used nonreversible lifted Markov chains to accelerate consensus. They use the lifting scheme of [24] and propose a deterministic gossip algorithm based on a set of disjoint maximal matchings, in order to simulate the dynamics of the lifted Markov chain.

Finally, even if we have mostly considered synchronous algorithms in this paper, it is worth mentioning that the authors in [26] propose two asynchronous algorithms for distributed averaging. The first algorithm is based on blocking (that is, when two nodes update their values they block until the update has been completed), and the other algorithm drops the blocking assumption. The authors show the convergence of both algorithms under very general asynchronous timing assumptions. Moreover, the authors in [27] propose consensus propagation, an asynchronous distributed protocol that is a special case of belief propagation. In the case of singly-connected graphs (i.e., connected graphs with no loops), synchronous consensus propagation converges in a number of iterations equal to the diameter of the graph. The authors also provide a convergence analysis for regular graphs.

7 Conclusions

In this paper, we proposed a polynomial filtering methodology in order to accelerate distributed average consensus in both fixed and random network topologies. The main idea of polynomial filtering is to shape the spectrum of the polynomial weight matrix in order to minimize its second largest eigenvalue and subsequently increase the convergence rate. We have constructed semi-definite programs to compute the optimal polynomial coefficients in both static and dynamic networks. Simulation results with several common weight matrices have shown that the convergence rate is much higher than for state-of-the-art algorithms in most scenarios, except in the specific case of highly dynamic networks and small-memory sensors.
Acknowledgments

The first author would like to thank Prof. Yousef Saad for valuable and insightful discussions on polynomial filtering.
References

[1] D. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice Hall, 1989.
[2] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in Ad Hoc WSNs with Noisy Links - Part I: Distributed Estimation of Deterministic Signals," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350-364, January 2008.
[3] M. Rabbat, J. Haupt, A. Singh, and R. Nowak, "Decentralized compression and predistribution via randomized gossiping," pp. 51-59, April 2006.
[4] V. D. Blondel, J. M. Hendrickx, A. Olshevsky, and J. N. Tsitsiklis, "Convergence in multiagent coordination, consensus and flocking," IEEE Conf. on Decision and Control, and the European Control Conference, pp. 2996-3000, December 2005.
[5] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems and Control Letters, no. 53, pp. 65-78, February 2004.
[6] L. Xiao, S. Boyd, and S. Lall, "A scheme for robust distributed sensor fusion based on average consensus," Int. Conf. on Information Processing in Sensor Networks, pp. 63-70, April 2005, Los Angeles.
[7] ——, "Distributed average consensus with time-varying Metropolis weights," Automatica, June 2006, submitted.
[8] A. Olshevsky and J. Tsitsiklis, "Convergence rates in distributed consensus and averaging," IEEE Conference on Decision and Control, December 2006, San Diego, CA.
[9] S. Kar and J. M. F. Moura, "Distributed average consensus in sensor networks with random link failures," IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), April 2007.
[10] ——, "Sensor networks with random links: Topology design for distributed consensus," IEEE Transactions on Signal Processing, April 2007, submitted.
[11] E. Kokiopoulou and Y. Saad, "Polynomial filtering in latent semantic indexing for information retrieval," 2004.
[12] E. Kokiopoulou and P. Frossard, "Accelerating distributed consensus using extrapolation," IEEE Signal Processing Letters, vol. 14, no. 10, pp. 665-668, October 2007.
[13] S. Sundaram and C. N. Hadjicostis, "Distributed consensus and linear functional calculation in networks: An observability perspective," April 25-27, 2007.
[14] A. Tahbaz-Salehi and A. Jadbabaie, "On consensus over random networks," in Proceedings of the 44th Annual Allerton Conference on Communication, Control and Computing, pp. 1315-1321, September 2006.
[15] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE Transactions on Information Theory, vol. 52, pp. 2508-2530, June 2006.
[16] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[17] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. SIAM, 2003.
[18] J. F. Sturm, "Implementation of interior point methods for mixed semidefinite and second order cone optimization problems," EconPapers 73, August 2002, Tilburg University, Center for Economic Research.
[19] P. Gupta and P. R. Kumar, "The capacity of wireless networks," IEEE Trans. on Information Theory, vol. 46, no. 2, pp. 388-404, March 2000.
[20] A. Dimakis, A. D. Sarwate, and M. J. Wainwright, "Geographic gossip: Efficient averaging for sensor networks," IEEE Trans. on Signal Processing, to appear.
[21] W. Li and H. Dai, "Location-aided fast distributed consensus," IEEE Transactions on Information Theory, June 2007, submitted.
[22] ——, "Cluster-based fast distributed consensus," IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), April 2007.
[23] K. Jung and D. Shah, "Fast gossip via nonreversible random walk," IEEE Information Theory Workshop (ITW), March 2006.
[24] F. Chen, L. Lovász, and I. Pak, "Lifting Markov chains to speed up mixing," STOC '99: Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, pp. 275-281, 1999.
[25] Y. Hatano and M. Mesbahi, "Agreement over random networks," IEEE Conf. on Decision and Control, vol. 2, pp. 2010-2015, December 2004.
[26] M. Mehyar, D. Spanos, J. Pongsajapan, S. H. Low, and R. M. Murray, "Asynchronous distributed averaging on communication networks," IEEE/ACM Transactions on Networking, vol. 15, no. 3, pp. 512-520, June 2007.
[27] C. C. Moallemi and B. V. Roy, "Consensus propagation."