Gossip Algorithms for Distributed Signal Processing
Alexandros G. Dimakis, Soummya Kar, José M.F. Moura, Michael G. Rabbat, Anna Scaglione
Abstract
Gossip algorithms are attractive for in-network processing in sensor networks because they do not require any specialized routing, there is no bottleneck or single point of failure, and they are robust to unreliable wireless network conditions. Recently, there has been a surge of activity in the computer science, control, signal processing, and information theory communities, developing faster and more robust gossip algorithms and deriving theoretical performance guarantees. This article presents an overview of recent work in the area. We describe convergence rate results, which are related to the number of transmitted messages and thus the amount of energy consumed in the network for gossiping. We discuss issues related to gossiping over wireless links, including the effects of quantization and noise, and we illustrate the use of gossip algorithms for canonical signal processing tasks including distributed estimation, source localization, and compression.
I. INTRODUCTION
Collaborative in-network processing is a major tenet of wireless sensor networking, and has received much attention from the signal processing, control, and information theory communities during the past decade [1]. Early research in this area considered applications such as detection, classification, tracking, and pursuit [2]–[5]. By exploiting local computation resources at each node, it is possible to reduce the amount of data that needs to be transmitted out of the network, thereby saving bandwidth and energy, extending the network lifetime, and reducing latency.
Manuscript received November 16, 2009; revised March 26, 2010.
A.G. Dimakis is with the Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089 USA (e-mail: [email protected]).
S. Kar and J.M.F. Moura are with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: [email protected]; [email protected]).
M.G. Rabbat is with the Department of Electrical and Computer Engineering, McGill University, Montréal, QC H3A 2A7, Canada (e-mail: [email protected]).
A. Scaglione is with the Department of Electrical and Computer Engineering, University of California, Davis, CA 95616 USA (e-mail: [email protected]).
The work of Kar and Moura was partially supported by the NSF under grants ECS-0225449 and CNS-0428404, and by the Office of Naval Research under MURI N000140710747. The work of Rabbat was partially supported by the NSERC under grant RGPIN 341596-2007, by MITACS, and by FQRNT under grant 2009-NC-126057. The work of Scaglione is supported by the NSF under grant CCF-0729074.
In addition to having on-board sensing and processing capabilities, the archetypal sensor network node is battery-powered and uses a wireless radio to communicate with the rest of the network. Since each wireless transmission consumes bandwidth and, on common platforms, also consumes considerably more energy than processing data locally [6], [7], reducing the amount of data transmitted can significantly prolong battery life. In applications where the phenomenon being sensed varies slowly in space, the measurements at nearby sensors will be highly correlated. In-network processing can compress the data to avoid wasting transmissions on redundant information. In other applications, rather than collecting data from each node, the goal of the system may be to compute a function of the data such as estimating parameters, fitting a model, or detecting an event. In-network processing can be used to carry out the computation within the network so that, instead of transmitting raw data to a fusion center, only the results of the computation are transmitted to the end-user. In many situations, in-network computation leads to considerable energy savings over the centralized approach [8], [9].

Many previous approaches to in-network processing assume that the network can provide specialized routing services. For example, some schemes require the existence of a cyclic route through the network that passes through every node precisely one time [9]–[11]. Others are based on forming a spanning tree rooted at the fusion center or information sink, and then aggregating data up the tree [8], [12], [13]. Although using a fixed routing scheme is intuitive, there are many drawbacks to this approach in wireless networking scenarios. Aggregating data towards a fusion center at the root of a tree can cause a bottleneck in communications near the root and creates a single point of failure.
Moreover, wireless links are unreliable, and in dynamic environments, a significant amount of undesirable overhead traffic may be generated just to establish and maintain routes.

A. Gossip Algorithms for In-Network Processing
This article presents an overview of gossip algorithms and issues related to their use for in-network processing in wireless sensor networks. Gossip algorithms have been widely studied in the computer science community for information dissemination and search [14]–[16]. More recently, they have been developed and studied for information processing in sensor networks. They have the attractive property that no specialized routing is required. Each node begins with a subset of the data in the network. At each iteration, information is exchanged between a subset of nodes, and then this information is processed by the receiving nodes to compute a local update.

Gossip algorithms for in-network processing have primarily been studied as solutions to consensus problems, which capture the situation where a network of agents must achieve a consistent opinion through local information exchanges with their neighbors. Early work includes that of Tsitsiklis et al. [17], [18]. Consensus problems have arisen in numerous applications including: load balancing [19]; alignment, flocking, and multi-agent collaboration [20], [21]; vehicle formation [22]; tracking and data fusion [23]; and distributed inference [24].

(Footnote: a cyclic route through every node, as mentioned above, is a Hamiltonian cycle, in graph-theoretic terms.)

The canonical example of a gossip algorithm for information aggregation is a randomized protocol for distributed averaging. The problem setup is such that each node in an n-node network initially has a scalar measurement value,
and the goal is to have every node compute the average of all n initial values, often referred to as the average consensus. In pairwise randomized gossiping [25], each node maintains an estimate of the network average, which it initializes with its own measurement value. Let x(t) denote the vector of estimates of the global average after the t-th gossip round, where x(0) is the vector of initial measurements; that is, x_i(t) is the estimate at node i after t iterations. In one iteration, a randomly selected pair of neighboring nodes in the network exchange their current estimates, and then update their estimates by setting x_i(t+1) = x_j(t+1) = (x_i(t) + x_j(t))/2. A straightforward analysis of such an algorithm shows that the estimates at each node are guaranteed to converge to the average, x_ave = (1/n) Σ_{i=1}^n x_i(0), as long as the network is connected (information can flow between all pairs of nodes), and as long as each pair of neighboring nodes gossips frequently enough; this is made more precise in Section II below. Note that the primitive described above can be used to compute any function of the form (1/n) Σ_{i=1}^n f_i(x_i(0)) by properly setting the initial value at each node, and while this is not the most general type of query, many useful computations can be reduced to this form, as will be further highlighted in Sections IV and V.

Gossip algorithms can be classified as being randomized or deterministic. The scheme described above is randomized and asynchronous, since at each iteration a random pair of nodes is active. In deterministic, synchronous gossip algorithms, at each iteration node i updates x_i(t+1) with a convex combination of its own value and the values received from all of its neighbors, e.g., as discussed in [26].
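As a concrete illustration, the pairwise randomized update described above can be simulated in a few lines. This is a minimal sketch of ours, not code from the article; the ring topology and all names are our own choices.

```python
import random

def pairwise_gossip(x, edges, rounds, seed=0):
    """Randomized pairwise gossip: each round, one edge (i, j) is chosen
    uniformly at random and both endpoints replace their estimates with
    the pairwise average (x_i(t) + x_j(t)) / 2."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2
    return x

# Ring of four nodes: the network average 2.5 is preserved at every
# round, and all estimates converge toward it.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
x = pairwise_gossip([1.0, 2.0, 3.0, 4.0], edges, rounds=500)
print(round(sum(x) / 4, 6))  # 2.5
```

Note that the update only ever replaces two entries with their mean, which is why the global average is invariant at every round.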
Asynchronous gossip is much better suited to wireless sensor network applications, where synchronization itself is a challenging task. Asynchronous gossip can be implemented using the framework described in [18], [27]. Each node runs an independent Poisson clock, and when node i's clock "ticks", it randomly selects and gossips with one neighbor. In this formulation, denoting the probability that node i chooses a neighbor j by P_{i,j}, conditions for convergence can be expressed directly as properties of these probabilities. Gossip and consensus algorithms have also been the subject of study within the systems and control community, with a focus on characterizing conditions for convergence and stability of synchronous gossiping, as well as optimization of the algorithm parameters P_{i,j}; see the excellent surveys by Olfati-Saber and Murray [28], and Ren et al. [29], and references therein.

B. Paper Outline
Our overview of gossip algorithms begins on the theoretical side and progresses towards sensor network applications. Each gossip iteration requires wireless transmission and thus consumes valuable bandwidth and energy resources. (Footnote: throughout, we will sometimes alternatively refer to the estimates x_i(t) as states, and to nodes as agents.) Section II discusses techniques for bounding rates of convergence for gossip, and thus the number of transmissions required. Because standard pairwise gossip converges slowly on wireless network topologies, a large body of work has focused on developing faster gossip algorithms for wireless networks, and this work is also described. When transmitting over a wireless channel, one must also consider issues such as noise and coding. Section III discusses the effects of finite transmission rates and quantization on convergence of gossip algorithms. Finally, Section IV illustrates how gossip algorithms can be applied to accomplish distributed signal processing
tasks such as distributed estimation and compression.

II. RATES OF CONVERGENCE AND FASTER GOSSIP
Gossip algorithms are iterative, and the number of wireless messages transmitted is proportional to the number of iterations executed. Thus, it is important to characterize the rate of convergence of gossip and to understand what factors influence these rates. This section surveys convergence results, describing the connection between the rate of convergence and the underlying network topology, and then describes developments that have been made in the area of fast gossip algorithms for wireless sensor networks.
A. Analysis of Gossip Algorithms
In pairwise gossip, only two nodes exchange information at each iteration. More generally, a subset of nodes may average their information. All the gossip algorithms that we will be interested in can be described by an equation of the form

x(t+1) = W(t) x(t),     (1)

where the W(t) are randomly selected averaging matrices, selected independently across time, and x(t) ∈ R^n is the vector of gossip states after t iterations. When restricted to pairwise averaging algorithms, in each gossip round only the values of two nodes i, j are averaged (as in [25]) and the corresponding W(t) matrices have 1/2 in the coordinates (i,i), (i,j), (j,i), (j,j) and a diagonal identity for every other node. When pairwise gossip is performed on a graph G = (V, E), only the matrices that average nodes that are neighbors on G (i.e., (i,j) ∈ E) are selected with non-zero probability. More generally, we will be interested in matrices that average sets of node values and leave the remaining nodes unchanged. A matrix W(t) acting on a vector x(t) is a set-averaging matrix for a set S of nodes if

x_i(t+1) = (1/|S|) Σ_{j∈S} x_j(t),  for i ∈ S,     (2)

and x_i(t+1) = x_i(t) for i ∉ S. Such matrices therefore have entry 1/|S| at the coordinates corresponding to the set S and a diagonal identity for all other entries. It is therefore easy to see that all such matrices will have the following properties:

1^T W(t) = 1^T,    W(t) 1 = 1,     (3)

which respectively ensure that the average is preserved at every iteration, and that 1, the vector of ones, is a fixed point. Further, any set-averaging matrix W is symmetric and doubly stochastic. A matrix is doubly stochastic if its rows sum to unity, and its columns also sum to unity, as implied in (3).
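To make the structure of these matrices concrete, the following sketch (our own illustration, using NumPy) builds a set-averaging matrix for a set S and checks the properties in (3), along with symmetry and the projection property discussed next:

```python
import numpy as np

def set_averaging_matrix(n, S):
    """Set-averaging matrix of (2): entry 1/|S| in every coordinate
    (i, j) with i and j in S, identity on all other coordinates."""
    W = np.eye(n)
    S = sorted(S)
    for i in S:
        W[i, :] = 0.0
        W[i, S] = 1.0 / len(S)
    return W

W = set_averaging_matrix(4, {1, 2})
ones = np.ones(4)
# Properties (3): column sums are one (the average is preserved) and
# the all-ones vector is a fixed point.
print(np.allclose(ones @ W, ones), np.allclose(W @ ones, ones))
# W is also symmetric and a projection (W @ W == W), hence PSD.
print(np.allclose(W, W.T), np.allclose(W @ W, W))
```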
The well-known Birkhoff–von Neumann theorem states that a matrix is doubly stochastic if and only if it is a convex combination of permutation matrices. In the context of gossip, the only permutation matrices which contribute to the convex combination are those which permute nodes in S to other nodes in S, and keep all other nodes not in S fixed. The matrix W must also be a
projection matrix; i.e., W² = W, since averaging the same set twice no longer changes the vector x(t). It then follows that W must also be positive semidefinite.

We are now ready to understand the evolution of the estimate vector x(t) through the product of these randomly selected set-averaging matrices:

x(t+1) = W(t) x(t) = ∏_{k=0}^{t} W(k) x(0).     (4)

Since the W(t) are selected independently across time, E[W(t)] = E[W(0)], and we can drop the time index and simply refer to the expected averaging matrix E[W], which is the average of symmetric, doubly stochastic, positive semidefinite matrices and therefore also has these properties. The desired behavior is that x(t+1) → x_ave 1, which is equivalent to asking that

∏_{k=0}^{t} W(k) → (1/n) 1 1^T.     (5)

B. Expected behavior
We start by looking at the expected evolution of the random vector x(t) by taking expectations on both sides of (4):

E[x(t+1)] = E( ∏_{k=0}^{t} W(k) ) x(0) = (E[W])^{t+1} x(0),     (6)

where the second equality is true because the matrices are selected independently. Since E[W] is a convex combination of the matrices W(t), which all satisfy the conditions (3), it is clear that E[W] is also a doubly stochastic matrix. We can see that the expected evolution of the estimation vector follows a Markov chain that has the x_ave 1 vector as its stationary distribution. In other words, 1 is an eigenvector of E[W] with eigenvalue 1. Therefore, if the Markov chain corresponding to E[W] is irreducible and aperiodic, our estimate vector will converge in expectation to the desired average. Let λ₂(E[W]) be the second largest eigenvalue of E[W]. If condition (3) holds and if λ₂(E[W]) < 1, then x(t) converges to x_ave 1 in expectation and in mean square. Further precise conditions for convergence in expectation and in mean square can be found in [30].

C. Convergence rate
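The condition λ₂(E[W]) < 1, on which both convergence and its rate hinge, is easy to check numerically. The sketch below is our own (using NumPy; the ring topology is an arbitrary choice): it forms E[W] for uniform pairwise gossip on a small ring and verifies that 1 is the top eigenvalue while the second eigenvalue is strictly smaller.

```python
import numpy as np

n = 6
# Uniform pairwise gossip on a ring: each of the n edges is activated
# with probability 1/n per round.
edges = [(i, (i + 1) % n) for i in range(n)]

def pair_matrix(n, i, j):
    """Pairwise averaging matrix: 1/2 in coordinates (i, i), (i, j),
    (j, i), (j, j), identity for every other node."""
    W = np.eye(n)
    W[i, i] = W[j, j] = W[i, j] = W[j, i] = 0.5
    return W

EW = sum(pair_matrix(n, i, j) for i, j in edges) / len(edges)
lam = np.sort(np.linalg.eigvalsh(EW))[::-1]  # eigenvalues, descending
print(round(lam[0], 6))    # 1.0: the all-ones vector is a fixed point
print(bool(lam[1] < 1.0))  # True: lambda_2 < 1, so convergence holds
```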
The problem with the expectation analysis is that it gives no estimate of the rate of convergence, a key parameter for applications. Since the algorithms are randomized, we need to specify what we mean by convergence. One notion that yields clean theoretical results involves defining convergence as the first time the normalized error is small with high probability, and controlling both error and probability with one parameter, ε.

Definition 1 (ε-averaging time T_ave(ε)): Given ε > 0, the ε-averaging time is the earliest gossip round in which the vector x(t) is ε-close to the normalized true average with probability greater than 1 − ε:

T_ave(ε) = sup_{x(0)} inf{ t = 0, 1, 2, ... : P( ||x(t) − x_ave 1|| / ||x(0)|| ≥ ε ) ≤ ε }.     (7)

Observe that the convergence time is defined for the worst case over the initial vector of measurements x(0). This definition was first used in [25] (see also [31] for a related analysis).
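Definition 1 can also be probed empirically. The sketch below is our own approximation: for one fixed x(0) it estimates T_ave(ε) by the (1 − ε)-quantile of per-run hitting times, which is justified because the normalized error is non-increasing under pairwise averaging.

```python
import math
import random

def hitting_time(x0, edges, eps, rng):
    """First gossip round at which ||x(t) - x_ave 1|| / ||x(0)|| < eps
    along a single random trajectory of pairwise gossip."""
    x = list(x0)
    xave = sum(x) / len(x)
    norm0 = math.sqrt(sum(v * v for v in x))
    t = 0
    while math.sqrt(sum((v - xave) ** 2 for v in x)) / norm0 >= eps:
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2
        t += 1
    return t

rng = random.Random(1)
edges = [(i, (i + 1) % 8) for i in range(8)]  # ring of 8 nodes
x0 = [float(i) for i in range(8)]
times = sorted(hitting_time(x0, edges, 0.01, rng) for _ in range(200))
# The (1 - eps)-quantile of hitting times approximates T_ave(0.01)
# for this particular initial vector x0.
print(times[int(0.99 * len(times)) - 1] > 0)  # True
```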
The key technical theorem used in the analysis of gossip algorithms is the following connection between the averaging time and the second largest eigenvalue of E[W]:

Theorem 1: For any gossip algorithm that uses set-averaging matrices and converges in expectation, the averaging time is bounded by

T_ave(ε, E[W]) ≤ log ε⁻¹ / log( λ₂(E[W])⁻¹ ) ≤ log ε⁻¹ / (1 − λ₂(E[W])).     (8)

This theorem is a slight generalization of Theorem 3 from [25] for non-pairwise averaging gossip algorithms. There is also a lower bound of the same order, which implies that T_ave(ε, E[W]) = Θ( log ε⁻¹ / (1 − λ₂(E[W])) ).

The topology of the network influences the convergence time of the gossip algorithm, and using this theorem this is precisely quantified; the matrix E[W] is completely specified by the network topology and the selection probabilities of which nodes gossip. The rate at which the spectral gap 1 − λ₂(E[W]) approaches zero, as n increases, controls the ε-averaging time T_ave. The spectral gap is related to the mixing time (see, e.g., [32]) of a random walk on the network topology. Roughly, the gossip averaging time is the mixing time of the simple random walk on the graph times a factor of n. One therefore would like to understand how the spectral gap scales for different models of networks and gossip algorithms.

This was first analyzed for the complete graph and uniform pairwise gossiping [15], [25], [30]. For this case it was shown that λ₂(E[W]) = 1 − 1/n and therefore T_ave = Θ(n log ε⁻¹). Since only nearest neighbors interact, each gossip round costs two transmitted messages, and therefore Θ(n log ε⁻¹) gossip messages need to be exchanged to converge to the global average within ε accuracy. This yields Θ(n log n) messages to have a vanishing error with probability 1/n, an excellent performance for a randomized algorithm with no coordination that averages n nodes on the complete graph.
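The Θ(1/n) scaling of the spectral gap on the complete graph is easy to check numerically. The sketch below is ours (using NumPy): it builds E[W] for uniform pairwise gossip on complete graphs of increasing size and verifies that n times the gap stays bounded.

```python
import numpy as np
from itertools import combinations

def expected_W_complete(n):
    """E[W] for uniform pairwise gossip on the complete graph: each of
    the n(n-1)/2 node pairs is activated with equal probability."""
    EW = np.zeros((n, n))
    pairs = list(combinations(range(n), 2))
    for i, j in pairs:
        W = np.eye(n)
        W[i, i] = W[j, j] = W[i, j] = W[j, i] = 0.5
        EW += W
    return EW / len(pairs)

# The spectral gap 1 - lambda_2 scales as Theta(1/n), which gives
# T_ave = Theta(n log eps^-1) rounds on the complete graph.
for n in (10, 20, 40):
    lam2 = np.sort(np.linalg.eigvalsh(expected_W_complete(n)))[-2]
    print(n, round(n * (1 - lam2), 3))  # n * gap stays near 1
```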
For other well-connected graphs (including expanders and small-world graphs), uniform pairwise gossip converges very quickly, asymptotically requiring the same number of messages, Θ(n log ε⁻¹), as the complete graph. Note that any algorithm that averages n numbers with a constant error and constant probability of success should require Ω(n) messages.

(Footnote: Because our primary interest is in understanding scaling laws, i.e., how many messages are needed as the network size grows, our discussion centers on the order-wise behavior of gossip algorithms. Recall the Landau or "big O" notation: a function f is asymptotically bounded above by g, written f(n) = O(g(n)), if there exist constants N > 0 and c > 0 such that f(n) ≤ c g(n) for all n ≥ N; f is asymptotically bounded below by g, written f(n) = Ω(g(n)), if there exist constants c > 0 and N > 0 such that f(n) ≥ c g(n) for n ≥ N; and f is asymptotically bounded above and below by g, written f(n) = Θ(g(n)), if c₁ g(n) ≤ f(n) ≤ c₂ g(n) for all n ≥ N.)

If the network topology is fixed, one can ask what selection of pairwise gossiping probabilities maximizes the convergence rate (i.e., maximizes the spectral gap). This problem is equivalent to designing a Markov chain which approaches stationarity optimally fast and, interestingly, it can be formulated as a semidefinite program
(SDP) which can be solved efficiently [25], [26], [33]. Unfortunately, for random geometric graphs and grids, which are the relevant topologies for large wireless ad hoc and sensor networks, even the optimized version of pairwise gossip is extremely wasteful in terms of communication requirements. For example, for a grid topology, the number of required messages scales like Θ(n² log ε⁻¹) [25], [35]. Observe that this is of the same order as the energy required for every node to flood its estimate to all other nodes. On the contrary, the obvious solution of averaging numbers on a spanning tree and flooding back the average to all the nodes requires only O(n) messages. Constructing and maintaining a spanning tree in dynamic and ad hoc networks introduces significant overhead and complexity, but a quadratic number of messages is a high price to pay for fault tolerance.

D. Faster Gossip Algorithms
Pairwise gossip converges very slowly on grids and random geometric graphs because of its diffusive nature. Information from nodes is essentially performing random walks, and, as is well known, a random walk on the two-dimensional lattice has to perform d² steps to cover distance d. One approach to gossiping faster is to modify the algorithm so that there is some directionality in the underlying diffusion of information. Assuming that nodes have knowledge of their geographic location, we can use a modified algorithm called geographic gossip [35]. The idea of geographic gossip is to combine gossip with greedy geographic routing towards a randomly selected location. If each node has knowledge of its own location and under some mild assumptions on the network topology, greedy geographic routing can be used to build an overlay network where any pair of nodes can communicate. The overlay network is a complete graph on which pairwise uniform gossip converges in Θ(n log ε⁻¹) iterations. At each iteration, we perform greedy routing, which costs Θ(√(n / log n)) messages on a random geometric graph (also the order of the diameter of the network). In total, geographic gossip thus requires Θ(n^1.5 log ε⁻¹ / √(log n)) messages. The technical part of the analysis involves understanding how this can be done with only local information: assuming that each node only knows its own location, routing towards a randomly selected location is not identical to routing towards a randomly selected node. If the nodes are evenly spaced, however, these two processes are almost the same and the Θ(n^1.5) message scaling still holds [35].

Li and Dai [36], [37] recently proposed Location-Aided Distributed Averaging (LADA), a scheme that uses partial locations and Markov chain lifting to create fast gossiping algorithms. Lifting of gossip algorithms is based on the seminal work of Diaconis et al. [38] and Chen et al. [39] on lifting Markov chain samplers to accelerate convergence rates.
(Footnote: The family of random geometric graphs with n nodes and connectivity radius r, denoted G(n, r), is obtained by placing n nodes uniformly at random in the unit square, and placing an edge between two nodes if their Euclidean distance is no more than r. In order to process data in the entire network, it is important that the network be connected (i.e., there is a route between every pair of nodes). A fundamental result due to Gupta and Kumar [34] states that the critical connectivity threshold for G(n, r) is r_con(n) = Θ(√(log n / n)); that is, if r does not scale as fast as r_con(n), then the network is not connected with high probability, and if r scales at least as fast as r_con(n), then the network is connected with high probability. Throughout this paper, when using random geometric graphs it is implied that we are using G(n, r_con(n)), in order to ensure that information flows across the entire network.)

The basic idea is to lift the original chain to one with additional states; in the context of gossiping, this corresponds to replicating each node and associating all replicas of a node with the original. LADA creates
one replica of a node for each neighbor and associates the policy of a node, given it receives a message from the neighbor, with that particular lifted state. In this manner, LADA suppresses the diffusive nature of reversible Markov chains that causes pairwise randomized gossip to be slow. The cluster-based LADA algorithm performs slightly better than geographic gossip, requiring Θ(n^1.5 log ε⁻¹ / (log n)^1.5) messages for random geometric graphs. While the theoretical machinery is different, LADA algorithms also use directionality to accelerate gossip, but can operate even with partial location information and have smaller total delay compared to geographic gossip, at the cost of a somewhat more complicated algorithm. A related scheme based on lifting was proposed concurrently by Jung, Shah, and Shin [40]. Mosk-Aoyama and Shah [41] use an algorithm based on the work of Flajolet and Martin [42] to compute averages and bound the averaging time in terms of a "spreading time" associated with the communication graph, with a similar scaling for the number of messages on grids and RGGs.

Just as algorithms based on lifting incorporate additional memory at each node (by way of additional states in the lifted Markov chain), another collection of algorithms seeks to accelerate gossip computations by having nodes remember a few previous state values and incorporate these values into the updates at each iteration. These memory-based schemes can be viewed as predicting the trajectory as seen by each node, and using this prediction to accelerate convergence. The schemes are closely related to shift-register methods studied in numerical analysis to accelerate linear system solvers. The challenge of this approach is to design local predictors that provide speedups without creating instabilities.
Empirical evidence that such schemes can accelerate convergence rates is shown in [43], and numerical methods for designing linear prediction filters are presented in [44], [45]. Recent work of Oreshkin et al. [46] shows that improvements in convergence rate on par with those of geographic gossip are achieved by a deterministic, synchronous gossip algorithm using only one extra tap of memory at each node. Extending these theoretical results to asynchronous gossip algorithms remains an open area of research.

The geographic gossip algorithm uses location information to route packets on long paths in the network. One natural extension of the algorithm is to allow all the nodes on the routed path to be averaged jointly. This can be easily performed by aggregating the sum and the hop length while routing. As long as the information of the average can be routed back on the same path, all the intermediate nodes can replace their estimates with the updated value. This modified algorithm is called geographic gossip with path averaging. It was recently shown [47] that this algorithm converges much faster, requiring only Θ(√n log ε⁻¹) gossip interactions and Θ(n log ε⁻¹) messages, which is clearly minimal.

A related distributed algorithm was introduced by Savas et al. [48], using multiple random walks that merge in the network. The proposed algorithm does not require any location information and uses the minimal number of messages, Θ(n log n), to average on grid topologies with high probability. The coalescence of information reduces the number of nodes that update information, resulting in optimal communication requirements but also less fault tolerance.
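A single interaction of the path-averaging extension described above is simple to express. The sketch below is ours; the routing step is abstracted away as a given node sequence.

```python
def path_average(x, path):
    """One interaction of geographic gossip with path averaging: the
    sum and hop count are accumulated along the routed path, and the
    resulting average is routed back along the same path so that every
    node on it adopts the updated value. The global average of x is
    unchanged by the update."""
    total = sum(x[i] for i in path)  # accumulated while routing
    avg = total / len(path)          # computed at the destination
    for i in path:                   # routed back along the path
        x[i] = avg
    return x

x = [1.0, 5.0, 3.0, 7.0, 9.0]
path_average(x, [0, 2, 3])  # nodes 0, 2, 3 now all hold (1 + 3 + 7) / 3
print(x[1], x[4])  # nodes off the path keep their values: 5.0 9.0
```

Because an entire routed path is averaged in one interaction, each interaction mixes information much faster than a single pairwise exchange, at the same per-interaction message cost as plain geographic gossip.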
In most gossip algorithms all nodes keep updating their information, which, as we discuss in the next section, adds robustness with respect to changes to the network and noise in communications.

Finally, we note the recent development of schemes that exploit the broadcast nature of wireless communications in order to accelerate gossip rates of convergence [49], [50], either by having all neighbors that overhear a transmission
execute a local update, or by having nodes eavesdrop on their neighbors' communication and then use this information to strategically select which neighbor to gossip with next. The next section discusses issues arising when gossiping specifically over wireless networks.

III. RATE LIMITATIONS IN GOSSIP ALGORITHMS
Rate limitations are relevant due to the bandwidth restrictions and the power limitations of nodes. Finite transmission rates imply that nodes learn of their neighbors' states with finite precision; if the distortion is measured by the MSE, then it is well established that the operational distortion-rate function decays exponentially with the number of bits [51], which implies that the precision doubles for each additional bit of representation. For example, in an AWGN channel with path loss inversely proportional to the distance squared, r², the rate R needs to be below the capacity bound R < C = (1/2) log₂(1 + γ r⁻²). Then, at a fixed power budget, every bit of additional precision requires approximately shrinking the range by half; i.e., fixing γ, the channel capacity increases as the inter-node distance decreases. For a uniform network deployment, this would reduce the size of each node's neighborhood by about 75%, decreasing the network connectivity and therefore the convergence speed. This simple argument illustrates the importance of understanding whether the performance of gossip algorithms degrades gracefully as the communication rate of each link decreases.

Before summarizing the key findings of selected literature on the subject of average consensus under communication constraints, we explain why some papers care about this issue and some do not.

A. Are Rate Constraints Significant?
In most sensor network architectures today, the overhead of packet headers and reliable communication is so great that using a few bytes to encode the gossip state variables exchanged leads to negligible additional cost while practically giving a precision that can be seen as infinite. Moreover, we can ignore bit errors in transmissions, which very rarely go undetected thanks to CRC bits. It is natural to ask: why should one bother studying rate constraints at all?
One should bother because existing sensor network modems are optimized to transmit long messages, infrequently, to nearby neighbors, in order to promote spatial bandwidth reuse, and were not designed with decentralized iterative computation in mind. Transmission rates are calculated amortizing the overhead of establishing the link over the duration of very long transmission sessions.

Optimally encoding for computation in general (and for gossiping in particular) is an open problem; very few have treated the subject of communication for computation in an information-theoretic sense (see, e.g., [52], [53]), and consensus gossiping is nearly absent in the landscape of network information theory. This is not an accident. Broken up in parts, consensus gossip contains the elements of complex classical problems in information theory, such as multi-terminal source coding, the two-way channel, the feedback channel, the multiple access of correlated sources, and the relay channel [54]; this is a frightening collection of open questions. However, as the number of possible applications of consensus gossip primitives expands, designing source and channel encoders to solve precisely
this class of problems more efficiently, even though perhaps not optimally, is a worthy task. Desired features are efficiency in exchanging, frequently and possibly in an optimal order, few correlated bits, and exchanging with nodes that are (at least occasionally) very far, to promote rapid diffusion. Such forms of communication are very important in sensor networks and network control.

Even if fundamental limits are hard to derive, there are several heuristics that have been applied to the problem to yield some achievable bound. Numerous papers have studied the effects of intermittent or lossy links in the context of gossip algorithms (i.i.d. and correlated models, symmetric and asymmetric) [55]–[64]. In these models, lossy links correspond to masking some edges from the topology at each iteration, and, as we have seen above, the topology directly affects the convergence rate. Interestingly, a common thread running through all of the work in this area is that so long as the network remains connected on average, convergence of gossip algorithms is not affected by lossy or intermittent links, and convergence speeds degrade gracefully.

Another aspect that has been widely studied is that of source coding for average consensus, which we consider next in Section III-B. It is fair to say that, particularly in wireless networks, the problem of channel coding is essentially open, as we will discuss in Section III-C.
B. Quantized consensus
Quantization maps the state variable $x_j(t)$ exchanged by node $j$ onto codes that correspond to discrete points, $Q_{t,j}(x_j(t)) = q_j(t) \in \mathcal{Q}_{t,j} \subset \mathbb{R}$. The set $\mathcal{Q}_{t,j}$ is referred to as the code used at time $t$ by node $j$; the points $q_j(t)$ are used to generate an approximation $\hat{x}_j(t)$ of the state $x_j(t) \in \mathbb{R}$ that each node needs to transmit; the quantizer rate, in bits, is $R_{t,j} = \log |\mathcal{Q}_{t,j}|$, where $|\mathcal{A}|$ denotes the cardinality of the set $\mathcal{A}$. Clearly, under the constraints specified previously on the network update matrix $W(t)$, the consensus states $\{c\mathbf{1} : c \in \mathbb{R}\}$ are fixed points. The evolution of the nodes' quantized states is that of an automaton; under asynchronous random exchanges, the network state forms a Markov chain with $\prod_{j=1}^{n} |\mathcal{Q}_{t,j}|$ possible states, in which the consensus states $\{c\mathbf{1} : c \in \mathbb{R}\}$ are absorbing states. The cumulative number of bits that quantized consensus diffuses throughout the network asymptotically is

$R^{\infty}_{tot} = \sum_{t=1}^{\infty} R_{t,tot} = \sum_{t=1}^{\infty} \sum_{j=1}^{n} R_{t,j}.$   (9)

The first simple question is: for a fixed uniform quantizer with step-size $\Delta$, i.e., $\hat{x}_j(t) = \mathrm{uni}_\Delta(x_j(t)) = \arg\min_{q \in \mathcal{Q}} |x_j(t) - q|$, where $\mathcal{Q} = \{0, \pm\Delta, \pm 2\Delta, \ldots, \pm(2^{R-1} - 1)\Delta\}$, do the states $x(t)$ always converge (in a probabilistic sense) to the fixed points $c\mathbf{1}$? The second is: what is the distortion $d\left( \lim_{t\to\infty} x(t), \frac{1}{n}\sum_{i=1}^{n} x_i(0)\,\mathbf{1} \right)$ due to a limited $R_{t,i}$ or a total budget $R_{tot}$? Fig. 1 illustrates the basic answers through numerical simulation. Interestingly, with a synchronous gossip update, quantization introduces new fixed points other than consensus (Fig. 1(a)), while asynchronous gossiping in general reaches consensus, but without guarantees on the location of the outcome (Fig. 1(b)).

Kashyap et al. [65] first considered a fixed-code quantized consensus algorithm, which preserves the network average at every iteration.
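A minimal sketch of this fixed-code idea, with integer states and a pairwise load-balancing update (the topology and the initial values are hypothetical):

```python
import random

def quantized_pairwise_gossip(x, edges, iters=100000, seed=1):
    """Pairwise quantized consensus with integer states: each meeting
    rebalances the pair as evenly as integers allow, so the network sum
    (and hence the average) is preserved exactly at every iteration."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(iters):
        i, j = rng.choice(edges)
        s = x[i] + x[j]
        x[i], x[j] = s // 2, s - s // 2
    return x

n = 8
complete = [(i, j) for i in range(n) for j in range(i + 1, n)]
x0 = [0, 0, 0, 0, 0, 0, 0, 11]   # sum S = 11, so L = floor(S/n) = 1
xT = quantized_pairwise_gossip(x0, complete)
```

Every node ends at either $L$ or $L+1$ while the sum stays exact, mirroring the load-balancing behavior discussed next.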
In their paper, the authors draw an analogy between quantization and load balancing
Fig. 1. Quantized consensus over a random geometric graph with $n = 50$ nodes, transmission radius $r$, and initial states in $[0, 1]$, with uniform quantization with 128 quantization levels. Synchronous updates (a) and pairwise exchange (b).
among processors, which naturally comes with an integer constraint, since the total number of tasks is finite and divisible only by integers (see, e.g., [19], [66], [67]). Distributed policies to attain a balance among loads were previously proposed in [68], [69]. Assuming that the average can be written as $\frac{1}{n}\sum_{j=1}^{n} x_j(0) = S/n$ and denoting $L \triangleq \lfloor S/n \rfloor$, it is proven in [65] that, under these updates, any algorithm meeting the aforementioned conditions makes every node converge to either $L$ or $L+1$, thereby approximating the average. The random gossip algorithm analyzed in [70] leads to a similar result, where the final consensus state differs by at most one bin from the true average; the same authors discuss bounds on the rate of convergence in [71]. In these protocols the agents will be uncertain about which interval contains the actual average: the nodes whose final value is $L$ will conclude that the average is in $[L-1, L+1]$, and those who end with $L+1$ will think that the average is in $[L, L+2]$. Benezit et al. [72] proposed a slight modification of the policy, considering a fixed-rate class of quantization strategies that are based on voting, requiring only 2 bits of memory per agent and attaining a consensus on the interval that contains the actual average.

To overcome the fact that not all nodes end up having the same quantized value, a simple variant on the quantized consensus problem that guarantees almost sure convergence to a unique consensus point was proposed concurrently in [73] and [74]. The basic idea is to dither the state variables by adding a uniform random variable $u \sim \mathcal{U}(-\frac{\Delta}{2}, \frac{\Delta}{2})$ prior to quantizing the states, i.e., $\hat{x}_i(t) = \mathrm{uni}_\Delta(x_i(t) + u)$. This modest change enables gossip to converge to a consensus almost surely, as shown in [75].
This guarantees that the nodes will make exactly the same decision. However, the algorithm can deviate more from the actual average than the quantized consensus policies considered in [65]. The advantage of using a fixed code is its low complexity; with a relatively modest additional cost, however, the performance can improve considerably.

Carli et al. [76] noticed that the issue of quantizing for consensus averaging has analogies with the problem of stabilizing a system using quantized feedback [77], which amounts to partitioning the state space into sets whose points can be mapped to an identical feedback control signal. Hence, the authors resorted to control-theoretic tools to infer effective strategies for quantization. In particular, instead of using a static mapping, they model the quantizer $Q_t(x_i(t))$ at each node $i$ as a dynamical system with internal state $\xi_i(t)$, which is coupled with the consensus update through a quantized error variable $q_i(t)$ (see Fig. 2). They study two particular strategies. They refer to the first as the zoom in--zoom out uniform coder/decoder, where they adaptively quantize the state as follows. The node states are defined as

$\xi_i(t) = (\hat{x}_{-1,i}(t), f_i(t)).$   (10)

The quantized feedback and its update are

$\hat{x}_i(t) = \hat{x}_{-1,i}(t+1) = \hat{x}_{-1,i}(t) + f_i(t)\, q_i(t);$   (11)

$q_i(t) = \mathrm{uni}_\Delta\!\left( \frac{x_i(t) - \hat{x}_{-1,i}(t)}{f_i(t)} \right),$   (12)
Fig. 2. Quantized consensus: node $i$ encoder and node $j$ decoder, with memory.

which is basically a differential encoding, and $f_i(t)$ is the stepsize, updated according to

$f_i(t+1) = \begin{cases} k_{in}\, f_i(t) & \text{if } |q_i(t)| < 1, \\ k_{out}\, f_i(t) & \text{if } |q_i(t)| = 1, \end{cases}$   (13)

which allows the encoder to adaptively zoom in and out, depending on the range of $q_i(t)$. The second strategy has the same node states but uses a logarithmic quantizer,

$\hat{x}_i(t) = \xi_i(t+1) = \xi_i(t) + q_i(t);$   (14)

$q_i(t) = \log_\delta(x_i(t) - \xi_i(t)),$   (15)

where the logarithmic quantization amounts to

$q_i(t) = \mathrm{sign}(x_i(t) - \xi_i(t)) \left( \frac{1-\delta}{1+\delta} \right)^{\ell_i(t)},$   (16)

with $\ell_i(t)$ such that

$\frac{1}{1+\delta} \le |x_i(t) - \xi_i(t)| \left( \frac{1-\delta}{1+\delta} \right)^{-\ell_i(t)} \le \frac{1}{1-\delta}.$
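A single-link sketch of the zoom-in/zoom-out coder (11)--(13); the decoder runs the identical recursion, so only the tracking error matters. The stepsizes, the unit-normalized quantizer range, and the test trajectory are all hypothetical choices.

```python
def uni(v, delta=0.25):
    """Uniform quantizer with unit-normalized saturation range, as in (12)."""
    q = round(v / delta) * delta
    return max(-1.0, min(1.0, q))

def zoom_coder(signal, delta=0.25, k_in=0.5, k_out=2.0, f0=1.0):
    xhat, f = 0.0, f0                 # shared encoder/decoder state (x-hat, f)
    errors = []
    for x in signal:
        q = uni((x - xhat) / f, delta)   # (12): quantize the scaled innovation
        xhat = xhat + f * q              # (11): differential (predictive) update
        f = k_in * f if abs(q) < 1 else k_out * f   # (13): zoom in / zoom out
        errors.append(abs(x - xhat))
    return errors

# Track a geometrically converging state, like a node approaching consensus:
# the stepsize shrinks with the innovation and grows again on saturation.
errors = zoom_coder([2.0 * 0.9 ** t for t in range(60)])
```

The adaptive stepsize lets a finite-rate code follow a state whose fluctuations shrink over time, which is exactly the situation in averaging.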
In [76], numerical results are provided for the convergence of the zoom-in/zoom-out quantizer, while the properties of the logarithmic quantizer are studied analytically. Remarkably, the authors prove that if the state average is preserved and if $0 < \delta < \frac{1-\lambda_{\min}(W)}{3-\lambda_{\min}(W)}$, then the network asymptotically reaches exactly the same state as the unquantized average consensus. In other words, for all $i$, $\lim_{t\to\infty} x_i(t) = \frac{1}{n}\sum_{i=1}^{n} x_i(0)$. One should observe that the logarithmic quantizer replaces state values in an uncountable set $\mathbb{R}$ with discrete countable outputs $\ell \in \mathbb{N}$, in the most efficient
Fig. 3. Node $j$ decoder, with memory and side information.

way [77], but there are still infinitely many such sets; in other words, the logarithmic quantizer has unlimited range, and therefore $R_{t,j} = \infty$. Hence, in practice, one will have to accept a penalty in accuracy when its range is limited.

The vast signal processing literature on sampling and quantization can obviously be applied to the consensus problem as well, to find heuristics. It is not hard to recognize that the quantizers analyzed in [76] are equivalent to predictive quantizers. Noting that the states are both temporally and spatially correlated, it is clear that encoding using the side information that is available at both transmitter and receiver can yield improved performance at lower cost; this is the tenet of the work in [78], [79], which analyzed a more general class of quantizers. They can be captured in a similar framework as that of [76] by adding an auxiliary state variable $\zeta_{ij}(t)$, which affects the state of the decoder only (see the decoder in Fig. 3). The idea is similar, since $\hat{x}_{-1,i}(t+1)$ in (11) is replaced in [79] by the optimum linear minimum mean-squared error prediction, performed using $k$ previous states, $\hat{x}_{-1,i}(t) = \sum_{l=1}^{k} a_{i,k}(l)\, \hat{x}_i(t-l)$. Similarly, the receiver state is introduced to utilize the idea of coding with side information [80], where the side information about $x_i(t)$ that is available at, say, receiver $j$ consists of the receiver's present state $x_j(t)$, as well as possibly its own past states and those of neighbors in communication with node $j$.
The decoder's augmented state $(\zeta_{ij}(t), \xi_i(t))$ in [79] is useful to reap the benefits of the refinement in the prediction of $x_i(t)$ that the decoder can obtain using its own side information. This prediction $\hat{x}_{-1,ij}(t)$ can more closely approximate the true state $x_i(t)$ than the transmitter's $\hat{x}_{-1,i}(t)$, and this, in turn, means that (12) can be replaced by a nested quantizer, such as, for example, the nested lattice quantizers in [81]. In practice, to keep the complexity at bay, one can use a static nested lattice quantizer at the transmitter without any memory, while using the current local state as the node-$j$ decoder state, i.e., $\zeta_{ij}(t) = x_j(t)$. The main analytical result in [79] is the conclusion that, even with the lowest complexity (i.e., prediction memory $k = 1$ only, or $\zeta_{ij}(t) = x_j(t)$ and no memory), one needs only a finite $R^{\infty}_{tot} < \infty$ to guarantee that the network will reach consensus with a bounded error $d\left( \lim_{t\to\infty} x(t), \frac{1}{n}\sum_{i=1}^{n} x_i(0)\,\mathbf{1} \right) \le D^{\infty}_{tot}$ that decreases as a function of $R^{\infty}_{tot}$. This is useful to establish since one may argue that, as long as the network is finite, flooding each value from each node, rather than gossiping, would require a total cost in terms of transmitted bits that is finite, and which can also be reduced via optimal joint source and network coding methods. It is meaningful to ask if gossiping can also lead to a similar rate-distortion tradeoff, and the result in [79] suggests that this is, indeed, the case.
Recent work has begun to investigate information-theoretic performance bounds for gossip. These bounds characterize the rate-distortion tradeoff either (i) as a function of the underlying network topology, assuming that each link has a finite capacity [82], or (ii) as a function of the rate of the information source providing new measurements to each sensor [83].
C. Wireless channel coding for average consensus
Quantization provides a source code, but equally important is the channel code that is paired with it. First, the separation of source and channel coding in wireless networks is not optimal in general. Second, and more intuitively, in a wireless network there is a variety of rates that can be achieved by a variety of nodes under different traffic conditions. The two key elements that determine what communications can take place are scheduling and channel coding. Theoretically, there is no fixed-range communication; any range can be reached, albeit with lower capacity. Also, there is no such thing as a collision; rather, there is a tradeoff among the rates at which multiple users can simultaneously access the channel.

The computational codes proposed in [84] aim to strike a near-optimal tradeoff for each gossip iteration, by utilizing the additive-noise multiple access channel as a tool for directly computing the average of the neighborhood. The idea advocated by the authors echoes their previous work [53]: nodes send lattice codes that, when added through the channel, result in a lattice point that encodes a specific algebraic sum of the inputs. Owing to the algebraic structure of the channel codes and the linearity of the channel, each recipient directly decodes the linear combination of the neighbors' states, which provides a new estimate of the network average when added to the local state. The only drawbacks of this approach are that 1) it requires channel state information at the transmitter, and 2) only one recipient can be targeted at a time. The scenario considered is closer to that in [83], since a stream of data needs to be averaged, and a finite round is dedicated to each input. The key result proven is that the number of rounds of gossip grows as $O(\log n / r)$, where $r$ is the radius of the neighborhood.

IV. SENSOR NETWORK APPLICATIONS OF GOSSIP
This section illustrates how gossip algorithms can be applied to solve representative problems in wireless sensor networks. Of course, gossip algorithms are not suited to all distributed signal processing tasks. They have proven useful, so far, for problems that involve computing functions that are linear combinations of data or statistics at each node. Two straightforward applications arise from distributed inference and distributed detection. When sensors make conditionally independent observations, the log-likelihood function conditioned on a hypothesis $H_j$ is simply the sum of local log-likelihood functions, $\sum_{i=1}^{n} \log p(x_i \mid H_j)$, and so gossip can be used for distributed detection (see also [85], [86]). Similarly, if sensor readings can be modeled as i.i.d. Gaussian with unknown mean, distributed inference of the mean boils down to computing the average of the sensor measurements, and again gossip can be applied. Early papers that made a broader connection are those of Saligrama et al. [86], and Moallemi and Van Roy [87], which both discuss connections between gossip algorithms and belief propagation.
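A sketch of this reduction for a binary Gaussian detection problem (the means, network size, and topology are hypothetical): each node holds its local log-likelihood ratio, gossip averages them, and scaling by $n$ recovers the centralized statistic at every node.

```python
import random

def pairwise_gossip(x, edges, iters=20000, seed=0):
    """Plain randomized pairwise gossip toward the network average."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(iters):
        i, j = rng.choice(edges)
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x

# H1: unit-variance Gaussian with mean 1; H0: mean 0. Data drawn under H1.
rng = random.Random(42)
n = 20
obs = [1.0 + rng.gauss(0.0, 1.0) for _ in range(n)]
llr = [x - 0.5 for x in obs]      # log p(x|H1) - log p(x|H0) = x - 1/2
complete = [(i, j) for i in range(n) for j in range(i + 1, n)]
avg = pairwise_gossip(llr, complete)
global_llr = n * avg[0]           # any node recovers the sum statistic
```

After gossiping, each node can threshold `global_llr` locally, so no fusion center is needed.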
Below we consider three additional example applications. Section IV-A describes a gossip algorithm for distributed linear parameter estimation that uses stochastic approximation to overcome quantization noise effects. Sections IV-B and IV-C illustrate how gossip can be used for distributed source localization and distributed compression, respectively. We also note that gossip algorithms have recently been applied to problems in camera networks for distributed pose estimation [88], [89].
A. Robust Gossip for Distributed Linear Parameter Estimation
The present section focuses on robust gossiping for distributed linear parameter estimation of a vector of parameters with low-dimensional observations at each sensor. We describe the common assumptions on sensing, the network topology, and the gossiping protocols. Although we focus on estimation, the formulation is quite general and applies to many inference problems, including distributed detection and distributed localization.
1) Sensing/Observation Model:
Let $\theta \in \mathbb{R}^{m\times 1}$ be an $m$-dimensional parameter that is to be estimated by a network of $n$ sensors. We refer to $\theta$ as a parameter, although it is a vector of $m$ parameters. For definiteness we assume the following observation model for the $i$-th sensor:

$z_i(t) = H_i \theta + w_i(t),$   (17)

where $\{z_i(t) \in \mathbb{R}^{m_i \times 1}\}_{t \ge 0}$ is the i.i.d. observation sequence for the $i$-th sensor, and $\{w_i(t)\}_{t \ge 0}$ is a zero-mean i.i.d. noise sequence of bounded variance. For most practical sensor network applications, each sensor observes only a subset of $m_i$ of the components of $\theta$, with $m_i \ll m$. Under such conditions, in isolation, each sensor can estimate at most only a part of the parameter. Since we are interested in obtaining a consistent estimate of the entire parameter $\theta$ at each sensor, we need some type of observability condition. We assume the matrix

$\sum_{i=1}^{n} H_i^T H_i$   (18)

is full rank. Note that this invertibility is required even by a centralized estimator (one which has access to data from all sensors at all times) to get a consistent estimate of $\theta$. It turns out that, under reasonable assumptions on the network connectivity, this necessary condition for centralized observability is sufficient for distributed observability, i.e., for each sensor to obtain a consistent estimate of $\theta$. It is not necessary to restrict to time-invariant observation matrices, and the $H_i$'s can be random and time-varying [90], as would be required in most regression-based analyses. In general, the observations need not come from a linear statistical model and may be distributions parameterized by $\theta$. Distributed observability would then correspond to asymptotic distinguishability of the collection of these distributions over the network.
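The observability condition (18) is easy to check numerically. In the following sketch (dimensions are hypothetical), each sensor sees a single component of $\theta$, so every individual $H_i$ is rank deficient while the network Gramian is full rank:

```python
import numpy as np

m, n = 4, 6
H = []
for i in range(n):
    Hi = np.zeros((1, m))
    Hi[0, i % m] = 1.0        # sensor i observes component (i mod m) of theta
    H.append(Hi)

G = sum(Hi.T @ Hi for Hi in H)                    # the matrix in (18)
ranks = [np.linalg.matrix_rank(Hi) for Hi in H]   # each sensor alone: rank 1
full = np.linalg.matrix_rank(G) == m              # network-wide observability
```

No single sensor can estimate $\theta$, but the rank of the summed Gramian certifies that the network as a whole can.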
A generic formulation in such a setting requires the notion of separably estimable observation models (see [91]).

An equivalent formulation of the estimation problem in the setting considered above comes from the distributed least mean square (LMS) adaptive filtering framework [92]–[94]. The objective here is slightly different. While we are interested in consistent estimates of the entire parameter at each sensor, the LMS formulations require the network, in a distributed way, to adapt to the environment and produce a desired response at each sensor, and the observability issue
is not of primary importance. A generic framework for distributed estimation, both in the static parameter case and when the parameter is non-stationary, is addressed in [95]. An important aspect of algorithm design in these cases is the choice of the inter-sensor weight sequence for fusing data or estimates. In the static parameter case, where the objective is to drive all the sensors to the true parameter value, the weight sequence necessarily decays over time to overcome the accumulation of observation and other forms of noise, whereas, in the dynamic parameter estimation case, the weight sequence is required to remain bounded away from zero, so that the algorithm possesses tracking abilities. We direct the reader to the recent article [96] for a discussion along these lines. In the dynamic case, we also point to the significant literature on distributed Kalman filtering (see, e.g., [97]–[101] and the references therein), where the objective is not consensus seeking among the local estimates, but, in general, optimizing fusion strategies to minimize the mean-squared error at each sensor.

It is important to note here that average consensus is a specific case of a distributed parameter estimation model, where each sensor initially takes a single measurement, and sensing of the field thereafter is not required for the duration of the gossip algorithm. Several distributed inference protocols (for example, [24], [102], [103]) are based on this approach, where either the sensors take a single snapshot of the field at the start and then initiate distributed consensus protocols (or, more generally, distributed optimization, as in [103]) to fuse the initial estimates, or the observation rate of the sensors is assumed to be much slower than the inter-sensor communication rate, thus permitting a separation of the two time scales.
2) Distributed linear parameter estimation:
We now briefly discuss distributed parameter estimation in the linear observation model (17). Starting from an initial deterministic estimate of the parameters (the initial states may be random; we assume deterministic for notational simplicity), $x_i(0) \in \mathbb{R}^{m\times 1}$, each sensor generates, by a distributed iterative algorithm, a sequence of estimates $\{x_i(t)\}_{t \ge 0}$. To simplify the discussion in this section, we assume a synchronous update model where all nodes exchange information and update their local estimates at each iteration. The parameter estimate $x_i(t+1)$ at the $i$-th sensor at time $t+1$ is a function of: 1) its previous estimate; 2) the communicated quantized estimates at time $t$ of its neighboring sensors; and 3) the new observation $z_i(t)$. The data is subtractively dithered quantized, i.e., there exist a vector quantizer $Q(\cdot)$ and a family $\{\nu^l_{ij}(t)\}$ of i.i.d. random variables uniformly distributed on $[-\Delta/2, \Delta/2)$ such that the quantized data received by the $i$-th sensor from the $j$-th sensor at time $t$ is $Q(x_j(t) + \nu_{ij}(t))$, where $\nu_{ij}(t) = [\nu^1_{ij}(t), \cdots, \nu^m_{ij}(t)]^T$. It then follows that the quantization error, $\varepsilon_{ij}(t) \in \mathbb{R}^{m\times 1}$, is a random vector whose components are i.i.d. uniform on $[-\Delta/2, \Delta/2)$ and independent of $x_j(t)$.
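A quick sketch of why subtractive dithering is used (the step size and sample count are hypothetical): after the receiver removes the shared dither, the residual quantization error is uniform on $[-\Delta/2, \Delta/2)$ with variance $\Delta^2/12$, regardless of the transmitted state.

```python
import random

DELTA = 0.1

def uni_quant(v, delta=DELTA):
    """Uniform (mid-tread) quantizer with step delta; unbounded range here."""
    return delta * round(v / delta)

rng = random.Random(7)
errors = []
for _ in range(20000):
    x = rng.uniform(-3.0, 3.0)                  # arbitrary sender state
    nu = rng.uniform(-DELTA / 2, DELTA / 2)     # dither shared with the receiver
    eps = uni_quant(x + nu) - nu - x            # receiver subtracts the dither
    errors.append(eps)

mean = sum(errors) / len(errors)
var = sum(e * e for e in errors) / len(errors)  # should approach DELTA**2 / 12
```

This whitening of the quantization error is exactly what lets the stochastic approximation analysis below treat it as bounded zero-mean noise.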
3) Stochastic approximation algorithm:
Let $\mathcal{N}_i(t)$ denote the neighbors of node $i$ at iteration $t$; that is, $j \in \mathcal{N}_i(t)$ if $i$ can receive a transmission from $j$ at time $t$. In this manner, we allow the connectivity of the network to vary with time. Based on the current state, $x_i(t)$, the quantized exchanged data $\{Q(x_j(t) + \nu_{ij}(t))\}_{j \in \mathcal{N}_i(t)}$, and the
observation $z_i(t)$, the updated estimate at node $i$ is

$x_i(t+1) = x_i(t) - \alpha(t) \left[ b \sum_{j \in \mathcal{N}_i(t)} \left( x_i(t) - Q(x_j(t) + \nu_{ij}(t)) \right) - H_i^T \left( z_i(t) - H_i x_i(t) \right) \right].$   (19)

In (19), $b > 0$ is a constant and $\{\alpha(t)\}_{t \ge 0}$ is a sequence of weights satisfying the persistence condition:

$\alpha(t) \ge 0, \quad \sum_t \alpha(t) = \infty, \quad \sum_t \alpha^2(t) < \infty.$   (20)

(We need the $\alpha(t)$ to sum to infinity, so that the algorithm `persists' and does not stop; on the other hand, the $\alpha$ sequence should be square summable to prevent the build-up of noise over time.) Algorithm (19) is distributed because, for sensor $i$, it involves only the data from the sensors in its neighborhood $\mathcal{N}_i(t)$.

The following result from [91] characterizes the desired statistical properties of the distributed parameter estimation algorithm just described. The flavor of these results is common to other stochastic approximation algorithms [104]. First, we have a law of large numbers-like result, which guarantees that the estimates at each node converge to the true parameter value:

$\mathbb{P}\left( \lim_{t\to\infty} x_i(t) = \theta, \ \forall i \right) = 1.$   (21)

If, in addition to the conditions mentioned above, the weight sequence is taken to be

$\alpha(t) = \frac{a}{t+1}$   (22)

for some constant $a > 0$, we also obtain a central limit theorem-like result, describing the distribution of the estimation error over time. Specifically, for $a$ sufficiently large, the error $\sqrt{t}\left( x(t) - \mathbf{1} \otimes \theta \right)$ converges in distribution to a zero-mean multivariate normal with a covariance matrix that depends on the observation matrices, the quantization parameters, the variance of the measurement noise $w_i(t)$, and the constants $a$ and $b$. The two most common techniques for analyzing stochastic approximation algorithms are stochastic Lyapunov functions and the ordinary differential equations method [104]. For the distributed estimation algorithm (19), the results just mentioned can be derived using the Lyapunov approach [91], [105].

Performance analysis of the algorithm for an example network is illustrated in Figure 4. An example network of $n = 45$ sensors is deployed randomly on a grid, where sensors communicate within a fixed radius and are further constrained to a maximum number of neighbors per node. The true parameter is $\theta^* \in \mathbb{R}^{45}$. Each node is associated with a single component of $\theta^*$. For the experiment, each component of $\theta^*$ is generated by an instantiation of a zero-mean Gaussian random variable of variance 25. (Note that the parameter $\theta^*$ here has a physical significance and may represent the state of the field to be estimated. In this example, the field is assumed to be white and stationary, and hence each sample of the field has the same Gaussian distribution and is independent of the others. More generally, the components of $\theta^*$ may correspond to random field samples, as dictated by the sensor deployment, representing a discretization of the PDE governing the field.) Each sensor observes the corresponding field component in additive Gaussian noise; for example, sensor 1 observes $z_1(t) = \theta^*_1 + w_1(t)$, where $w_1(t)$ is zero-mean Gaussian. Clearly, such a model satisfies the distributed observability condition

$G = \sum_i H_i^T H_i = I = G^{-1}.$   (23)

(Note that here $H_i = e_i^T$, where $e_i$ is the standard unit vector with 1 at the $i$-th component and zeros elsewhere.)

Fig. 4. Illustration of distributed linear parameter estimation. (a) Example network deployment of 45 nodes. (b) Convergence of normalized estimation error at each sensor.

Fig. 4(a) shows the network topology, and Fig. 4(b) shows the normalized error of each sensor plotted against the iteration index $t$ for an instantiation of the algorithm. The normalized error for the $i$-th sensor at time $t$ is given by the quantity $\|x_i(t) - \theta^*\|/m$, i.e., the estimation error normalized by the dimension of $\theta^*$. We note that the errors converge to zero, as established by the theoretical findings. The decrease is rapid at the beginning and slows down as $t$ increases. This is a standard property of stochastic approximation based algorithms and is attributed to the decreasing weight sequence $\alpha(t)$ required for convergence. It is interesting to note that, although the individual sensors suffer from low-rank observations of the true parameter, by collaborating, each of them can reconstruct the true parameter value.
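A compact simulation of update (19) in the spirit of the example above, with $H_i = e_i^T$ so that each node observes one component of $\theta$ in noise (all constants, the three-node topology, and the parameter values are hypothetical):

```python
import random

def distributed_estimation(theta, adj, steps=100000, a=2.0, b=0.5,
                           delta=0.05, noise=0.1, seed=3):
    """Sketch of (19) with H_i = e_i^T: node i observes only component i,
    exchanges dither-quantized states, and mixes a consensus term with a
    local innovation term using the decaying weights alpha(t) = a/(t+1)."""
    rng = random.Random(seed)
    m = len(theta)
    x = [[0.0] * m for _ in range(m)]     # one estimate vector per node

    def q(v):  # dithered uniform quantizer applied to each transmitted entry
        return delta * round((v + rng.uniform(-delta / 2, delta / 2)) / delta)

    for t in range(steps):
        alpha = a / (t + 1)                               # weight (22)
        z = [theta[i] + rng.gauss(0.0, noise) for i in range(m)]
        new = []
        for i in range(m):
            upd = []
            for k in range(m):
                cons = b * sum(x[i][k] - q(x[j][k]) for j in adj[i])
                innov = (z[i] - x[i][i]) if k == i else 0.0  # H_i^T(z_i - H_i x_i)
                upd.append(x[i][k] - alpha * (cons - innov))
            new.append(upd)
        x = new
    return x

theta = [1.0, -2.0, 0.5]
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # complete graph on three nodes
est = distributed_estimation(theta, adj)
worst = max(abs(est[i][k] - theta[k]) for i in range(3) for k in range(3))
```

Despite the rank-one observations and the quantized exchanges, every node's estimate approaches the full vector $\theta$, slowly, as the decaying weights dictate.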
The asymptotic normality shows that the estimation error at each sensor decays as $1/\sqrt{t}$, the decay rate being similar to that of a centralized estimator having access to all the sensor observations at all times. The efficiency of the distributed estimator is measured in terms of its asymptotic variance, the lower limit being the Fisher information rate of the corresponding centralized estimator. As expected, because of the distributed nature of the protocol (information needs to disseminate across the entire network) and the quantized (noisy) inter-sensor communication, the achieved asymptotic variance is larger than the
centralized Fisher information rate. In the absence of quantization (perfect communication), it can be shown that the parameter $a$ in eqn. (22) can be designed appropriately so that the asymptotic variance of the decentralized estimator matches the centralized Fisher information rate, showing that the distributed estimator described above is efficient with respect to the centralized estimation problem. An example of interest, with Gaussian observation noise, is studied in [91], where it is shown that the asymptotic variance attainable by the distributed algorithm is the same as that of the optimum (in the sense of Cramér-Rao) centralized estimator having access to all information simultaneously. This is an interesting result, as it holds irrespective of the network topology. Such a phenomenon is attributed to a time-scale separation between the consensus potential and the innovation rate (the rate of new information entering the network) when inter-sensor communication is unquantized (perfect), with possible link failures.

As noted before, the observation model need not be linear for distributed parameter estimation. In [91], a large class of nonlinear observation models was considered, and a notion of distributed nonlinear observability, called separably estimable observation models, was introduced. Under the separably estimable condition, there exist local transforms under which the updates can be made linear. However, such a state transformation induces different time scales on the consensus potential and the innovation update, giving the algorithm a mixed time-scale behavior (see [91], [106] for details). This mixed time-scale behavior and the effect of biased perturbations lead to the inapplicability of standard stochastic approximation techniques.

B. Source Localization
A canonical problem, encompassing many of the challenges which commonly arise in wireless sensor network applications, is that of estimating the location of an energy-emitting source [1]. Patwari et al. [107] present an excellent overview of the many approaches that have been developed for this problem. The aim in this section is to illustrate how gossip algorithms can be used for source localization using received signal strength (RSS) measurements.

Let $\theta \in \mathbb{R}^2$ denote the coordinates of the unknown source, and for $i = 1, \ldots, n$, let $y_i \in \mathbb{R}^2$ denote the location of the $i$th sensor. The RSS measurement at node $i$ is modeled as

$f_i = \frac{\alpha}{\|y_i - \theta\|^\beta} + w_i,$   (24)

where $\alpha > 0$ is the signal strength emitted at the source, $\beta$ is the path-loss coefficient, and $w_i$ is additive white Gaussian noise. Typical values of $\beta$ are between 2 and 4. This model was validated experimentally in [108]. Centralized maximum likelihood estimators for single- and multiple-source scenarios based on this model are presented in [109] and [110]. Because the maximum likelihood problem is, in general, non-linear and non-convex, it is challenging to solve in a decentralized fashion. Distributed approaches based on finding a cyclic route through the network are presented in [9], [10].

An alternative approach, using gossip algorithms [111], forms a location estimate $\hat{\theta}$ by taking a linear combination of the sensor locations weighted by a function of their RSS measurements,

$\hat{\theta} = \frac{\sum_{i=1}^{n} y_i K(f_i)}{\sum_{i=1}^{n} K(f_i)},$   (25)
where $K : \mathbb{R}_+ \to \mathbb{R}_+$ is a monotone increasing function satisfying $K(0) = 0$ and $\lim_{f\to\infty} K(f) < \infty$. Intuitively, nodes that are close to the source measure high RSS values, so their locations should be given more weight than nodes that are further away, which will measure lower RSS values. Taking $K(f) = \mathbf{1}\{f \ge \gamma\}$, where $\gamma > 0$ is a positive threshold and $\mathbf{1}\{\cdot\}$ is the indicator function, (25) reduces to

$\hat{\theta} = \frac{\sum_{i=1}^{n} y_i \mathbf{1}\{\|y_i - \theta\| \le (\alpha/\gamma)^{1/\beta}\}}{\sum_{i=1}^{n} \mathbf{1}\{\|y_i - \theta\| \le (\alpha/\gamma)^{1/\beta}\}},$   (26)

which is simply the centroid of the locations of sensors that are no further than $(\alpha/\gamma)^{1/\beta}$ from the source. In [111], it was shown that this estimator benefits from some attractive properties. First, if the sensor locations $y_i$ are modeled as uniform and random over the region being sensed, then $\hat{\theta}$ is a consistent estimator as the number of sensors grows. It is interesting to note that one does not necessarily need to know the parameters $\alpha$ or $\beta$ precisely to implement this estimator. In particular, because (25) is self-normalizing, the estimator automatically adapts to the source signal strength $\alpha$. In addition, [111] shows that this estimator is robust to the choice of $\gamma$. In particular, even if $\beta$ is not known precisely, the performance of (25) degrades gracefully. On the other hand, the maximum likelihood approach is very sensitive to model mismatch, and estimating $\alpha$ and $\beta$ can be challenging.

Note that (25) is a ratio of linear functions of the measurements at each node. To compute (25), we run two parallel instances of gossip over the network, one each for the numerator and the denominator. If each node initializes $x^N_i(0) = y_i K(f_i)$ and $x^D_i(0) = K(f_i)$, then executing gossip iterations will cause the values at each node to converge to $\lim_{t\to\infty} x^N_i(t) = \frac{1}{n}\sum_{j=1}^{n} y_j K(f_j)$ and $\lim_{t\to\infty} x^D_i(t) = \frac{1}{n}\sum_{j=1}^{n} K(f_j)$.
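The parallel gossip instances can be sketched as follows; the geometry, $\alpha$, $\beta$, $\gamma$, and the noiseless-measurement simplification ($w_i = 0$) are all hypothetical choices.

```python
import math, random

def pairwise_gossip(x, edges, iters=20000, seed=0):
    rng = random.Random(seed)
    x = list(x)
    for _ in range(iters):
        i, j = rng.choice(edges)
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x

rng = random.Random(1)
n, alpha, beta, gamma = 200, 10.0, 2.0, 250.0
theta = (0.5, 0.5)                                   # unknown source location
ys = [(rng.random(), rng.random()) for _ in range(n)]
f = [alpha / max(math.dist(y, theta), 1e-9) ** beta for y in ys]  # (24), noiseless
K = [1.0 if fi >= gamma else 0.0 for fi in f]        # threshold kernel

edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
num_x = pairwise_gossip([ys[i][0] * K[i] for i in range(n)], edges, seed=2)
num_y = pairwise_gossip([ys[i][1] * K[i] for i in range(n)], edges, seed=3)
den = pairwise_gossip(K, edges, seed=4)

est = (num_x[0] / den[0], num_y[0] / den[0])         # node 0 evaluates (25)
```

Because the ratio is formed only after both averages have converged, every node ends up with the same centroid estimate.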
Of course, in a practical implementation one would stop gossiping after a fixed number of iterations, $t_{\text{stop}}$, which depends on the desired accuracy and network topology. Then, each node can locally compute the estimate $x_i^N(t_{\text{stop}}) / x_i^D(t_{\text{stop}})$ of the source's location. Note that throughout this section it was assumed that each node knows its own location. This can also be accomplished using a gossip-style algorithm, as described in [112].

C. Distributed Compression and Field Estimation
Extracting information in an energy-efficient and communication-efficient manner is a fundamental challenge in wireless sensor network systems. In many cases, users are interested in gathering data to see an "image" of activity or sensed values over the entire region. Let $f_i \in \mathbb{R}$ denote the measurement at node $i$, and let $f \in \mathbb{R}^n$ denote the network signal obtained by stacking these values into a vector. Having each sensor transmit $f_i$ directly to an information sink is inefficient in many situations. In particular, when the values at different nodes are correlated or the signal is compressible, one can transmit less data without losing the salient information in the signal. Distributed source coding approaches attempt to reduce the total number of bits transmitted by leveraging the celebrated results of Slepian and Wolf [113] to code with side information [114], [115]. These approaches make assumptions about statistical characteristics of the underlying data distribution that may be difficult to verify in practice. An alternative approach is based on linear transform coding, gossip algorithms, and compressive sensing. It has been observed that many natural signals are compressible under some linear transformation. That is, although $f$
may have energy in all locations (i.e., $f_i > 0$ for all $i$), there is a linear basis transformation matrix $T \in \mathbb{R}^{n \times n}$ such that, when $f$ is represented in terms of the basis $T$ by computing $\theta = T^T f$, the transformed signal $\theta$ is compressible (i.e., $\theta_j \approx 0$ for many $j$). For example, it is well known that smooth one-dimensional signals are well approximated using the Fourier basis, and piece-wise smooth images with smooth boundaries (a reasonable model for images) are well approximated using wavelet bases [116]. To formally capture the notion of compressibility using ideas from the theory of nonlinear approximation [117], we reorder the coefficients $\theta_j$ in order of decreasing magnitude,

$$|\theta_{(1)}| \geq |\theta_{(2)}| \geq |\theta_{(3)}| \geq \cdots \geq |\theta_{(n)}|, \quad (27)$$

and then define the best $m$-term approximation of $f$ in $T$ as $f^{(m)} = \sum_{j=1}^m \theta_{(j)} T_{:,(j)}$, where $T_{:,j}$ denotes the $j$th column of $T$.

Fig. 5. Example illustrating compression of a smooth signal. Panel (a) shows the original smooth signal, which is sampled at 500 random node locations; nodes are connected as in a random geometric graph. Panel (b) illustrates the $m$-term approximation error decay in both the original basis and using the eigenvectors of the graph Laplacian as a transform, which is analogous to taking a Fourier transform of signals supported on the network. Panel (c) illustrates the reconstruction error after gossiping on random linear combinations of the sensor measurements and reconstructing using compressed sensing techniques. Note that using more random linear projections (larger $k$) gives lower error, but the number of projections used is much smaller than the network size.
This is analogous to projecting $f$ onto the $m$-dimensional subspace of $T$ that captures the most energy in $f$. We then say that $f$ is $\alpha$-compressible in $T$, for $\alpha \geq 0$, when the mean squared approximation error behaves like

$$\frac{1}{n} \|f - f^{(m)}\|^2 \leq C m^{-\alpha}, \quad (28)$$

for some constant $C > 0$. Since the error exhibits a power-law decay in $m$ for compressible signals, it is possible to achieve a small mean squared approximation error while only computing and/or communicating the few most significant coefficients $\theta_{(1)}, \dots, \theta_{(m)}$. Figure 5 shows an example where 500 nodes forming a random geometric graph sample a smooth function. As a compressing basis $T$, we use the eigenvectors of the normalized graph Laplacian (a function of the network topology), which are analogous to the Fourier basis vectors for signals supported on $G$ [118].
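The $m$-term approximation error decay of Fig. 5(b) can be reproduced in miniature as follows; the smooth field, network size, and connectivity radius are illustrative assumptions, with the eigenvectors of the normalized graph Laplacian serving as the transform $T$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: n nodes at random locations sample a smooth field.
n = 300
pos = rng.random((n, 2))
f = np.cos(2 * np.pi * pos[:, 0]) * np.cos(2 * np.pi * pos[:, 1])

# Random geometric graph and its normalized Laplacian I - D^{-1/2} A D^{-1/2}.
dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=2)
A = ((dist > 0) & (dist < 0.15)).astype(float)
dinv = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1.0))
L = np.eye(n) - dinv[:, None] * A * dinv[None, :]

# Columns of T (eigenvectors of L) act as a graph Fourier basis: theta = T^T f.
_, T = np.linalg.eigh(L)
theta = T.T @ f

# Best m-term approximation: keep the m largest-magnitude coefficients, as in
# (27), and measure the mean squared error appearing in (28).
order = np.argsort(-np.abs(theta))

def mterm_mse(m):
    mask = np.zeros(n)
    mask[order[:m]] = 1.0
    return np.mean((f - T @ (theta * mask)) ** 2)

print([float(mterm_mse(m)) for m in (10, 50, 150)])  # error decays with m
```

Since the eigenvector basis is orthonormal, the error in (28) is exactly the energy of the dropped coefficients, so it is non-increasing in $m$ and small whenever the field is smooth over the graph.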
Observe that each coefficient $\theta_j$ is a linear function of the data at each node, and so one could conceivably compute these coefficients using gossip algorithms. Assuming that each node $i$ knows the values $\{T_{i,j}\}_{j=1}^n$, i.e., its own component of each basis vector, to compute $\theta_j$ we can initialize $x_i(0) = n T_{i,j} f_i$, and, by gossiping, each node will compute $\lim_{t \to \infty} x_i(t) = \sum_{k=1}^n T_{k,j} f_k = \theta_j$. The main challenge with this approach is that the indices of the most significant coefficients are very signal-specific, and are generally not known in advance. We can avoid this issue by making use of the recent theory of compressive sensing [119]–[121], which says that one can recover sparse signals from a small collection of random linear combinations of the measurements. In the present setting, to implement the gathering of $k$ compressive sensing measurements using gossip algorithms, each node initializes $k$ parallel instances of gossip with $x_{i,j}(0) = n A_{i,j} f_i$, $j = 1, \dots, k$, where the $A_{i,j}$ are, e.g., i.i.d. zero-mean normal random variables with variance $1/n$. Let $\bar{x}_j$ denote the limiting value of the $j$th gossip instance at each node. Stacking these into the vector $\bar{x}$, any node can recover an estimate of the signal $f$ by solving the optimization

$$\min_\theta \|\bar{x} - A^T T \theta\|^2 + \tau \sum_{i=1}^n |\theta_i|, \quad (29)$$

where $\tau > 0$ is a regularization parameter. In practice, the values $A_{i,j}$ can be pseudo-randomly generated at each node using a predefined seeding mechanism. Then, any user can retrieve the gossip values $\{x_{i,j}(t)\}_{j=1}^k$ from any node $i$ and solve the reconstruction. Moreover, note that the compressing transformation $T$ only needs to be known at reconstruction time; to initialize the gossip instances, each node only needs its measurement and the pseudo-randomly generated values $A_{i,j}$.
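A minimal sketch of the reconstruction step (29): the gossip limits $\bar{x}$ are computed exactly below, standing in for a converged gossip phase, and the $\ell_1$-regularized least-squares problem is solved with plain iterative soft-thresholding (ISTA). The dimensions, sparsity level, and $\tau$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setup: f = T theta_true is s-sparse in an orthonormal basis T.
n, k, s = 200, 60, 5
T, _ = np.linalg.qr(rng.standard_normal((n, n)))     # orthonormal basis
theta_true = np.zeros(n)
theta_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
f = T @ theta_true

# Node i holds A[i, j]; gossip instance j, initialized at n*A[i,j]*f_i,
# converges to xbar_j = (1/n) sum_i n A[i,j] f_i = (A^T f)_j.  Here these
# limits are computed exactly rather than by running gossip iterations.
A = rng.standard_normal((n, k)) / np.sqrt(n)         # i.i.d. N(0, 1/n)
xbar = A.T @ f

# Solve (29) via ISTA: a gradient step on the quadratic term followed by
# soft-thresholding (up to a constant rescaling of tau).
Phi = A.T @ T
step = 1.0 / np.linalg.norm(Phi, 2) ** 2             # 1 / Lipschitz constant
tau = 0.005
theta_hat = np.zeros(n)
for _ in range(2000):
    z = theta_hat - step * (Phi.T @ (Phi @ theta_hat - xbar))
    theta_hat = np.sign(z) * np.maximum(np.abs(z) - step * tau, 0.0)

rel_err = np.linalg.norm(T @ theta_hat - f) / np.linalg.norm(f)
print(rel_err)
```

With $k$ comfortably larger than the sparsity level $s$, the recovered signal is close to $f$ even though only $k \ll n$ aggregates were disseminated.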
In general, there is a tradeoff between 1) $k$, the number of compressed sensing measurements collected; 2) $\epsilon$, the accuracy to which the gossip algorithm is run; 3) the number of transmissions required for this computation; and 4) the average reconstruction accuracy available at each node. For an $\alpha$-compressible signal $f$, compressed sensing theory provides bounds on the mean squared reconstruction error as a function of $k$ and $\alpha$, assuming the values $\bar{x}$ are calculated precisely. Larger $k$ corresponds to lower error, and the error decays rapidly with $k$ (similar to the $m$-term approximation), so one can obtain a very accurate estimate of $f$ with $k \ll n$ measurements. Inaccurate computation of the compressed sensing values $\bar{x}$, due to gossiping for a finite number of iterations, can be thought of as adding noise to the values $\bar{x}$, and increases the overall reconstruction error. Figure 5(c) illustrates, via numerical simulation, the tradeoff between varying $k$ and the number of gossip iterations. For more on the theoretical performance guarantees achievable in this formulation, see [122], [123]. A gossip-based approach to solving the reconstruction problem in a distributed fashion is described in [124]. For an alternative approach to using gossip for distributed field estimation, see [125].

V. CONCLUSION AND FUTURE DIRECTIONS
Because of their simplicity and robustness, gossip algorithms are an attractive approach to distributed in-network processing in wireless sensor networks, and this article surveyed recent results in this area. A major concern in sensor networks revolves around conserving limited bandwidth and energy resources, and in the context of iterative gossip algorithms, this is directly related to the rate of convergence. One thread of the discussion covered fast
gossiping in wireless network topologies. Another thread focused on understanding and designing for the effects of wireless transmission, including source and channel coding. Finally, we have illustrated how gossip algorithms can be used for a diverse range of tasks, including estimation and compression. Currently, this research is branching into a number of directions. One area of active research is investigating gossip algorithms that go beyond computing linear functions and averages. Just as the average can be viewed as the minimizer of a quadratic cost function, researchers are studying what other classes of functions can be optimized within the gossip framework [126]. A related direction is investigating the connections between gossip algorithms and message-passing algorithms for distributed inference and information fusion, such as belief propagation [87], [127]. While it is clear that computing pairwise averages is similar to the sum-product algorithm for computing marginals of distributions, there is no explicit connection between these families of distributed algorithms. It would be interesting to demonstrate that pairwise gossip and its generalizations correspond to messages of the sum-product (or max-product) algorithm for an appropriate Markov random field. Such potentials would guarantee convergence (which is not guaranteed for general iterative message-passing) and further establish explicit convergence and message scheduling results. Another interesting research direction involves understanding the effects of intermittent links and dynamic topologies, and in particular the effects of node mobility. Early work [128] has analyzed i.i.d. mobility models and shown that mobility can greatly benefit convergence under some conditions.
Generalizing to more realistic mobility models seems to be a very interesting research direction that would also be relevant in practice, since gossip algorithms are most useful in such dynamic environments. Gossip algorithms are also relevant in other applications that arise in social networks and in the interaction of mobile devices with social networks. Distributed inference and information fusion in such dynamic networked environments will certainly pose substantial challenges for future research.

REFERENCES

[1] F. Zhao and L. Guibas,
Wireless Sensor Networks: An Information Processing Approach . Morgan Kaufmann, 2004.[2] F. Zhao, J. Liu, J. Liu, L. Guibas, and J. Reich, “Collaborative signal and information processing: An information-directed approach,”
Proc. IEEE , vol. 91, no. 8, pp. 1199–1209, Aug. 2003.[3] B. Sinopoli, C. Sharp, L. Schenato, S. Schaffert, and S. Sastry, “Distributed control applications within sensor networks,”
Proc. IEEE ,vol. 91, no. 8, pp. 1235–1246, Aug. 2003.[4] C. Chong and S. Kumar, “Sensor networks: Evolution, opportunities, and challenges,”
Proc. IEEE , vol. 91, no. 8, pp. 1247–1256, Aug.2003.[5] R. Brooks, P. Ramanathan, and A. Sayeed, “Distributed target classification and tracking in sensor networks,”
Proc. IEEE , vol. 91, no. 8,pp. 1163–1171, Aug. 2003.[6] G. Pottie and W. Kaiser, “Wireless integrated network sensors,”
Communications of the ACM , vol. 43, no. 5, pp. 51–58, 2000.[7] V. Shnayder, M. Hempstead, B. Chen, G. Werner-Allen, and M. Welsh, “Simulating the power consumption of large-scale sensor networkapplications,” in
Proc. ACM Conf. on Embedded Networked Sensor Systems , Baltimore, Nov. 2004.[8] Y. Yu, B. Krishnamachari, and V. Prasanna, “Energy-latency tradeoffs for data gathering in wireless sensor networks,” in
IEEE Infocom ,Hong Kong, March 2004.[9] M. Rabbat and R. Nowak, “Distributed optimization in sensor networks,” in
Proc. IEEE/ACM Symposium on Information Processing inSensor Networks , Berkeley, CA, April 2004.
[10] D. Blatt and A. Hero, "Energy based sensor network source localization via projection onto convex sets (POCS),"
IEEE Transactions onSignal Processing , vol. 54, no. 9, pp. 3614–3619, 2006.[11] S. Son, M. Chiang, S. Kulkarni, and S. Schwartz, “The value of clustering in distributed estimation for sensor networks,” in
IEEEWirelesscom , Maui, June 2005.[12] A. Ciancio, S. Pattem, A. Ortega, and B. Krishnamachari, “Energy-efficient data representation and routing for wireless sensor networksbased on a distributed wavelet compression algorithm,” in
Proc. IEEE/ACM Symposium on Information Processing in Sensor Networks ,Nashville, Apr. 2006.[13] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and F. Yu, “Data-centric storage in sensornets with GHT, a geographichash table,”
Mobile Nets. and Apps. , vol. 8, no. 4, pp. 427–442, 2003.[14] R. Karp, C. Schindelhauer, S. Shenker, and B. Vocking, “Randomized rumor spreading,” in
Annual Symp. on Foundations of ComputerScience , vol. 41, 2000, pp. 565–574.[15] D. Kempe, A. Dobra, and J. Gehrke, “Computing aggregate information using gossip,” in
Proc. Foundations of Computer Science ,Cambridge, MA, Oct. 2003.[16] P. Levis, N. Patel, D. Culler, and S. Shenker, “Trickle: A self-regulating algorithm for code propagation and maintenance in wirelesssensor networks,” in
Proc. USENIX/ACM Symp. on Networked Systems Design and Implementation , vol. 246, 2004.[17] J. Tsitsiklis, “Problems in decentralized decision making and computation,” Ph.D. dissertation, Massachusetts Institute of Tech., Nov.1984.[18] J. Tsitsiklis, D. Bertsekas, and M. Athans, “Distributed asynchronous deterministic and stochastic gradient optimization algorithms,”
IEEETrans. Automatic Control , vol. AC-31, no. 9, pp. 803–812, Sep. 1986.[19] G. V. Cybenko, “Dynamic load balancing for distributed memory multiprocessors,”
Journal on Parallel and Distributed Computing , vol. 7,pp. 279–301, 1989.[20] A. Jadbabaie, J. Lin, and A. S. Morse, “Coordination of groups of mobile autonomous agents using nearest neighbor rules,”
IEEETransactions on Automatic Control , vol. AC-48, no. 6, pp. 988–1001, June 2003.[21] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,”
IEEE Trans.Automat. Contr. , vol. 49, no. 9, pp. 1520–1533, Sept. 2004.[22] J. A. Fax and R. M. Murray, “Information flow and cooperative control of vehicle formations,”
IEEE Transactions on Automatic Control ,vol. 49, no. 9, pp. 1465–1476, Sep. 2004.[23] V. Saligrama and D. Castanon, “Reliable distributed estimation with intermittent communications,” in , San Diego, CA, Dec. 2006, pp. 6763–6768.[24] S. Kar, S. A. Aldosari, and J. M. F. Moura, “Topology for distributed inference on graphs,”
IEEE Transactions on Signal Processing ,vol. 56, no. 6, pp. 2609–2613, June 2008.[25] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms.”
IEEE Trans. Inf. Theory , vol. 52, no. 6, pp. 2508–2530,Jun. 2006.[26] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” in
Proc. IEEE Conf. on Decision and Control , Hawaii, Dec. 2003.[27] D. Bertsekas and J. Tsitsiklis,
Parallel and Distributed Computation: Numerical Methods . Athena Scientific, 1997.[28] R. Olfati-Saber, J. Fax, and R. Murray, “Consensus and cooperation in networked multi-agent systems,”
Proc. IEEE , vol. 95, no. 1, pp.215–233, Jan. 2007.[29] W. Ren, R. Beard, and E. Atkins, “Information consensus in multivehicle cooperative control,”
IEEE Control Systems Magazine , vol. 27,no. 2, pp. 71–82, April 2007.[30] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Analysis and optimization of randomized gossip algorithms,” in
Proceedings of the 43rdConference on Decision and Control (CDC 2004) , 2004.[31] F. Fagnani and S. Zampieri, “Randomized consensus algorithms over large scale networks,” in
IEEE J. on Selected Areas ofCommunications, to appear , 2008.[32] A. Sinclair, “Improved bounds for mixing rates of markov chains and multicommodity flow,” in
Combinatorics, Probability and Computing ,vol. 1, 1992.[33] S. Boyd, P. Diaconis, and L. Xiao, “Fastest mixing markov chain on a graph,”
SIAM REVIEW , vol. 46, pp. 667–689, 2003.
[34] P. Gupta and P. R. Kumar, "Critical power for asymptotic connectivity in wireless networks," in
Stochastic Analysis, Control, Optimization,and Applications , Boston, 1998, pp. 1106–1110.[35] A. G. Dimakis, A. D. Sarwate, and M. J. Wainwright, “Geographic gossip: efficient aggregation for sensor networks,” in
ACM/IEEESymposium on Information Processing in Sensor Networks , 2006.[36] W. Li and H. Dai, “Location-aided fast distributed consensus,” in
IEEE Transactions on Information Theory, submitted , 2008.[37] ——, “Cluster-based distributed consensus,”
IEEE Trans. Wireless Communications , vol. 8, no. 1, pp. 28–31, Jan. 2009.[38] P. Diaconis, S. Holmes, and R. Neal, “Analysis of a nonreversible markov chain sampler,”
Annals of Applied Probability , pp. 726–752,2000.[39] F. Chen, L. Lovasz, and I. Pak, “Lifting markov chains to speed up mixing,” in
Proceedings of the thirty-first annual ACM symposiumon Theory of computing . ACM, 1999, pp. 275–281.[40] K. Jung, D. Shah, and J. Shin, “Fast gossip through lifted Markov chains,” in
Proc. Allerton Conf. on Comm., Control, and Comp. ,Urbana-Champaign, IL, Sep. 2007.[41] D. Mosk-Aoyama and D. Shah, “Information dissemination via gossip: Applications to averaging and coding,” April 2005,http://arxiv.org/cs.NI/0504029.[42] P. Flajolet and G. Martin, “Probabilistic counting algorithms for data base applications,”
Journal of Computer and System Sciences ,vol. 31, no. 2, pp. 182–209, 1985.[43] M. Cao, D. A. Spielman, and E. M. Yeh, “Accelerated gossip algorithms for distributed computation,” in
Proc. 44th Annual AllertonConf. Comm., Control, and Comp. , Monticello, IL, Sep. 2006.[44] E. Kokiopoulou and P. Frossard, “Polynomial filtering for fast convergence in distributed consensus,”
IEEE Trans. Signal Processing ,vol. 57, no. 1, pp. 342–354, Jan. 2009.[45] B. Johansson and M. Johansson, “Faster linear iterations for distributed averaging,” in
Proc. IFAC World Congress , Seoul, South Korea,Jul. 2008.[46] B. Oreshkin, M. Coates, and M. Rabbat, “Optimization and analysis of distributed averaging with short node memory,” to appear,
IEEETrans. Signal Processing , Jul. 2010.[47] F. Benezit, A. G. Dimakis, P. Thiran, and M. Vetterli, “Gossip along the way: Order-optimal consensus through randomized path averaging,”in
Proceedings of Allerton Conference, Monticello, IL , 2007.[48] O. Savas, M. Alanyali, and V. Saligrama, “Efficient in-network processing through local ad-hoc information coalescence,” in
DCOSS ,2006, pp. 252–265.[49] T. Aysal, M. Yildiz, A. Sarwate, and A. Scaglione, “Broadcast gossip algorithms for consensus,”
Signal Processing, IEEE Transactionson , vol. 57, no. 7, pp. 2748–2761, July 2009.[50] D. Ustebay, B. Oreshkin, M. Coates, and M. Rabbat, “Rates of convergence for greedy gossip with eavesdropping,” in
Proc. AllertonConf. on Communication, Control, and Computing , Monticello, 2008.[51] A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression,”
Signal Processing Magazine, IEEE , vol. 15,no. 6, pp. 23–50, Nov 1998.[52] A. Orlitsky and A. El Gamal, “Average and randomized communication complexity,”
Information Theory, IEEE Transactions on , vol. 36,no. 1, pp. 3–16, Jan 1990.[53] B. Nazer and M. Gastpar, “Computation over multiple-access channels,”
Information Theory, IEEE Transactions on , vol. 53, no. 10, pp.3498–3516, Oct. 2007.[54] T. M. Cover and J. A. Thomas,
Elements of information theory . John Wiley and Sons, Inc., 1991.[55] S. Kar and J. M. F. Moura, “Sensor networks with random links: Topology design for distributed consensus,”
IEEE Transactions onSignal Processing , vol. 56, no. 7, pp. 3315–3326, July 2008.[56] Y. Hatano and M. Mesbahi, “Agreement over random networks,” in , vol. 2, Dec. 2004,pp. 2010–2015.[57] M. G. Rabbat, R. D. Nowak, and J. A. Bucklew, “Generalized consensus computation in networked systems with erasure links,” in
Proc.of the 6th Intl. Wkshp. on Sign. Proc. Adv. in Wireless Communications , New York, NY, 2005, pp. 1088–1092.[58] S. Patterson and B. Bamieh, “Distributed consensus with link failures as a structured stochastic uncertainty problem,” in , Monticello, IL, Sept. 2008.
[59] C. Wu, "Synchronization and convergence of linear dynamics in random directed networks,"
IEEE Transactions on Automatic Control ,vol. 51, no. 7, pp. 1207–1210, July 2006.[60] M. Porfiri and D. Stilwell, “Stochastic consensus over weighted directed networks,” in
Proceedings of the 2007 American ControlConference , New York City, USA, July 11-13 2007.[61] D. Jakovetic, J. Xavier, and J. M. F. Moura, “Weight optimization for consensus algorithms with correlated switching topology,”
IEEETransactions on Signal Processing , vol. abs/0906.3736, June 2009, submitted.[62] A. T. Salehi and A. Jadbabaie, “On consensus in random networks,” in
The Allerton Conference on Communication, Control, andComputing , Allerton House, IL, September 2006.[63] S. Kar and J. M. F. Moura, “Distributed consensus algorithms in sensor networks with imperfect communication: Link failures andchannel noise,”
IEEE Transactions on Signal Processing , vol. 57, no. 1, pp. 355–369, January 2009.[64] A. Nedic, A. Ozdaglar, and P. Parrilo, “Constrained consensus and optimization in multi-agent networks,”
IEEE Transactions on AutomaticControl , 2009, to appear.[65] A. Kashyap, T. Basar, and R. Srikant, “Quantized consensus,”
Automatica , vol. 43, no. 7, pp. 1192 – 1203, 2007.[66] R. Subramanian and I. D. Scherson, “An analysis of diffusive load-balancing,” in
SPAA ’94: Proceedings of the sixth annual ACMsymposium on Parallel algorithms and architectures . New York, NY, USA: ACM, 1994, pp. 220–225.[67] Y. Rabani, A. Sinclair, and R. Wanka, “Local divergence of Markov chains and the analysis of iterative load-balancing schemes,” in
InProceedings of the 39th IEEE Symposium on Foundations of Computer Science (FOCS 98 , 1998, pp. 694–703.[68] W. Aiello, B. Awerbuch, B. Maggs, and S. Rao, “Approximate load balancing on dynamic and asynchronous networks,” in
STOC ’93:Proceedings of the twenty-fifth annual ACM symposium on Theory of computing . New York, NY, USA: ACM, 1993, pp. 632–641.[69] B. Ghosh and S. Muthukrishnan, “Dynamic load balancing by random matchings,”
J. Comput. Syst. Sci. , vol. 53, no. 3, pp. 357–370,1996.[70] J. Lavaei and R. Murray, “On quantized consensus by means of gossip algorithm - part i: Convergence proof,” in
American ControlConference, 2009. ACC ’09. , June 2009, pp. 394–401.[71] ——, “On quantized consensus by means of gossip algorithm - part ii: Convergence time,” in
American Control Conference, 2009. ACC’09. , June 2009, pp. 2958–2965.[72] F. Benezit, P. Thiran, and M. Vetterli, “Interval consensus: From quantized gossip to voting,”
Acoustics, Speech, and Signal Processing,IEEE International Conference on , vol. 0, pp. 3661–3664, 2009.[73] S. Kar and J. M. F. Moura, “Distributed consensus algorithms in sensor networks: Quantized data and random link failures,”
Acceptedfor publication in the IEEE Transactions on Signal Processing , September 2009. [Online]. Available: http://arxiv.org/abs/0712.1609[74] T. C. Aysal, M. Coates, and M. Rabbat, “Distributed average consensus using probabilistic quantization,” in
Statistical Signal Processing,2007. SSP ’07. IEEE/SP 14th Workshop on , Aug. 2007, pp. 640–644.[75] T. Aysal, M. Coates, and M. Rabbat, “Distributed average consensus with dithered quantization,”
IEEE Trans. Signal Processing , vol. 56,no. 10, pp. 4905–4918, Oct. 2008.[76] R. Carli, F. Fagnani, A. Speranzon, and S. Zampieri, “Communication constraints in the average consensus problem,”
Automatica , vol. 44,no. 3, pp. 671–684, 2008.[77] N. Elia and S. Mitter, “Stabilization of linear systems with limited information,”
Automatic Control, IEEE Transactions on , vol. 46, no. 9,pp. 1384–1400, Sep 2001.[78] M. Yildiz and A. Scaglione, “Differential nested lattice encoding for consensus problems,” in
Proc. Information Processing in SensorNetworks , April 2007, pp. 89–98.[79] ——, “Coding with side information for rate-constrained consensus,”
Signal Processing, IEEE Transactions on , vol. 56, no. 8, pp.3753–3764, Aug. 2008.[80] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,”
Information Theory, IEEETransactions on , vol. 22, no. 1, pp. 1–10, Jan 1976.[81] R. Zamir, S. Shamai, and U. Erez, “Nested linear/lattice codes for structured multiterminal binning,”
Information Theory, IEEETransactions on , vol. 48, no. 6, pp. 1250–1276, Jun 2002.[82] O. Ayaso, D. Shah, and M. Dahleh, “Distributed computation under bit constraints,” in
Decision and Control, 2008. CDC 2008. 47thIEEE Conference on , Dec. 2008, pp. 4837–4842.
[83] A. El Gamal and H.-I. Su, "Distributed lossy averaging," in
Information Theory, 2009. ISIT 2009. IEEE International Symposium on , 282009-July 3 2009, pp. 1453–1457.[84] B. Nazer, A. G. Dimakis, and M. Gastpar, “Neighborhood gossip: Concurrent averaging through local interference,” in
Acoustics, Speechand Signal Processing, 2009. ICASSP 2009. IEEE International Conference on , April 2009, pp. 3657–3660.[85] S. Kar and J. Moura, “Consensus based detection in sensor networks: Topology optimization under practical constraints,” in
Proc.International Workshop on Information Theory in Sensor Networks , Santa Fe, NM, June 2007.[86] V. Saligrama, M. Alanyali, and O. Savas, “Distributed detection in sensor networks with packet loss and finite capacity links,”
IEEETrans. Signal Processing , vol. 54, no. 11, pp. 4118–4132, Nov. 2006.[87] C. Moallemi and B. Van Roy, “Consensus propagation,”
IEEE Transactions on Information Theory , vol. 52, no. 11, pp. 4753–4766, 2006.[88] R. Tron, R. Vidal, and A. Terzis, “Distributed pose estimation in camera networks via consensus on SE (3) ,” in Proc. IEEE Conf. onDistributed Smart Cameras , Palo Alto, Sep. 2008.[89] A. Jorstad, P. Burlina, I. Wang, D. Lucarelli, and D. DeMenthon, “Model-based pose estimation by consensus,” in
Proc. IntelligentSensors, Sensor Networks, and Inf. Processing , Sydney, Dec. 2008.[90] S. Kar and J. M. F. Moura, “A linear iterative algorithm for distributed sensor localization,” in , Pacific Grove, CA, Oct. 2008, pp. 1160–1164.[91] S. Kar, J. M. F. Moura, and K. Ramanan, “Distributed parameter estimation in sensor networks: Nonlinear observation models andimperfect communication,” Aug. 2008, submitted for publication. [Online]. Available: http://arxiv.org/abs/0809.0009[92] C. G. Lopes and A. H. Sayed, “Diffusion least-mean squares over adaptive networks: Formulation and performance analysis,”
IEEETrans. Signal Processing , vol. 56, no. 7, pp. 3122–3136, July 2008.[93] S. Stankovic, M. Stankovic, and D. Stipanovic, “Decentralized parameter estimation by consensus based stochastic approximation,” in , New Orleans, LA, USA, 12-14 Dec. 2007, pp. 1535–1540.[94] I. Schizas, G. Mateos, and G. Giannakis, “Stability analysis of the consensus-based distributed lms algorithm,” in
Proceedings of the33rd International Conference on Acoustics, Speech, and Signal Processing , Las Vegas, Nevada, USA, April 1-4 2008, pp. 3289–3292.[95] S. Ram, V. Veeravalli, and A. Nedic, “Distributed and recursive parameter estimation in parametrized linear state-space models,”
Submittedfor publication , April 2008.[96] S. S. Ram, V. V. Veeravalli, and A. Nedic, “Distributed and recursive nonlinear least square parameter estimation: Linear and separablemodels,” in
Sensor Networks: Where Theory Meets Practice , G. Ferrari, Ed. Springer-Verlag, 2009.[97] R. Olfati-Saber, “Distributed Kalman filter with embedded consensus filters,” in
ECC-CDC’05, 44th IEEE Conference on Decision andControl and European Control Conference , 2005.[98] S. Kirti and A. Scaglione, “Scalable distributed Kalman filtering through consensus,” in
Proceedings of the 33rd International Conferenceon Acoustics, Speech, and Signal Processing , Las Vegas, Nevada, USA, April 1-4 2008, pp. 2725–2728.[99] U. A. Khan and J. M. F. Moura, “Distributing the Kalman filter for large-scale systems,”
Accepted for publication, IEEE Transactionson Signal Processing , 2008.[100] A. Ribeiro, I. D. Schizas, S. I. Roumeliotis, and G. B. Giannakis, “Kalman filtering in wireless sensor networks: Incorporatingcommunication cost in state estimation problems,”
IEEE Control Systems Magazine , 2009, submitted.[101] R. Carli, A. Chiuso, L. Schenato, and S. Zampieri, “Distributed Kalman filtering using consensus strategies,”
IEEE Journal on SelectedAreas in Communications , vol. 26, no. 4, pp. 622–633, 2008.[102] A. Das and M. Mesbahi, “Distributed linear parameter estimation in sensor networks based on laplacian dynamics consensus algorithm,”in , vol. 2, Reston, VA, USA, 28-28Sept. 2006, pp. 440–449.[103] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, “Consensus in ad hoc wsns with noisy links - part i: Distributed estimation of deterministicsignals,”
IEEE Transactions on Signal Processing , vol. 56, no. 1, pp. 350–364, January 2008.[104] M. Nevel’son and R. Has’minskii,
Stochastic Approximation and Recursive Estimation . Providence, Rhode Island: American MathematicalSociety, 1973.[105] M. Huang and J. Manton, “Stochastic Lyapunov analysis for consensus algorithms with noisy measurements,” in
Proc. American ControlConf. , New York, Jul. 2007.[106] S. Kar and J. M. F. Moura, “A mixed time scale algorithm for distributed parameter estimation: nonlinear observation models and
imperfect communication," in
Proceedings of the 34th International Conference on Acoustics, Speech, and Signal Processing , Taipei,Taiwan, April 2009, pp. 3669–3672.[107] N. Patwari, J. Ash, S. Kyperountas, A. Hero, R. Moses, and N. Correal, “Locating the nodes: Cooperative localization in wireless sensornetworks,”
IEEE Signal Processing Magazine , vol. 22, no. 4, pp. 54–69, July 2005.[108] D. Li and Y. Hu, “Energy based collaborative source localization using acoustic micro-sensor array,”
J. EUROSIP Applied SignalProcessing , vol. 2003, no. 4, pp. 321–337, 2003.[109] X. Sheng and Y. Hu, “Energy based acoustic source localization,” in
Proc. ACM/IEEE Int. Conf. on Information Processing in SensorNetworks , Palo Alto, April 2003.[110] ——, “Maximum likelihood multiple-source localization using acoustic energy measurements with wireless sensor networks,”
IEEETransactions on Signal Processing , vol. 53, no. 1, pp. 44–53, Jan. 2005.[111] M. Rabbat, R. Nowak, and J. Bucklew, “Robust decentralized source localization via averaging,” in
Proc. IEEE ICASSP , Phil., PA, Mar.2005.[112] U. Khan, S. Kar, and J. Moura, “Distributed sensor localization in random environments using minimal number of anchor nodes,”
IEEETrans. Signal Processing , vol. 57, no. 5, pp. 2000–2016, May 2009.[113] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,”
IEEE Trans. Inf. Theory , vol. 19, no. 4, pp. 471–480, 1973.[114] S. Servetto, “On the feasibility of large-scale wireless sensor networks,”
Proc. Allerton Conf. on Comm., Control, and Computing , 2002.[115] S. Pradhan, J. Kusuma, and K. Ramchandran, “Distributed compression in a dense microsensor network,”
IEEE Signal ProcessingMagazine , vol. 19, no. 2, pp. 51–60, March 2002.[116] S. Mallat,
A Wavelet Tour of Signal Processing . Academic Press, 1999.[117] R. DeVore, “Nonlinear approximation,”
Acta numerica , vol. 7, pp. 51–150, 1998.[118] F. Chung,
Spectral Graph Theory . American Math. Society, 1997.[119] E. J. Candes and T. Tao, “Decoding by linear programming,”
IEEE Trans. Inform. Theory , vol. 51, no. 12, pp. 4203–4215, Dec. 2005.[120] D. L. Donoho, “Compressed sensing,”
IEEE Trans. Inform. Theory , vol. 52, no. 4, pp. 1289–1306, Apr. 2006.[121] E. J. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?”
IEEE Trans. Inform.Theory , vol. 52, no. 12, pp. 5406–5425, Dec. 2006.[122] M. Rabbat, J. Haupt, A. Singh, and R. Nowak, “Decentralized compression and predistribution via randomized gossiping,” in
Proc. Information Processing in Sensor Networks , Nashville, TN, Apr. 2006.[123] J. Haupt, W. Bajwa, M. Rabbat, and R. Nowak, “Compressed sensing for networked data,”
IEEE Signal Processing Magazine , vol. 25,no. 2, pp. 92–101, Mar. 2008.[124] A. Schmidt and J. Moura, “A distributed sensor fusion algorithm for the inversion of sparse fields,” in
Proc. Asilomar Conf. on Signals,Systems, and Computers , Pacific Grove, CA, Nov. 2009.[125] R. Sarkar, X. Zhu, and J. Gao, “Hierarchical spatial gossip for multi-resolution representations in sensor networks,” in
Proc. Int. Conf. on Information Processing in Sensor Networks, April 2007, pp. 420–429. [126] S. S. Ram, A. Nedić, and V. Veeravalli, "Asynchronous gossip algorithms for stochastic optimization," in
Proc. IEEE Conf. on Decisionand Control , Shanghai, China, Dec. 2009.[127] M. Cetin, L. Chen, J. Fisher, A. Ihler, R. Moses, M. Wainwright, and A. Willsky, “Distributed fusion in sensor networks,”
IEEE SignalProcessing Magazine , vol. 23, no. 4, pp. 42–55, 2006.[128] A. Sarwate and A. Dimakis, “The impact of mobility on gossip algorithms,”