An Event-based Diffusion LMS Strategy
Yuan Wang, Wee Peng Tay, and Wuhua Hu

Y. Wang and W. P. Tay are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. E-mails: [email protected], [email protected]. W. Hu was with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, and is now with the Department of Artificial Intelligence for Applications, SF Technology Co. Ltd, Shenzhen, China. E-mail: [email protected].
Abstract
We consider a wireless sensor network consisting of cooperative nodes, each of which keeps adapting to streaming data to perform a least-mean-squares estimation, while also exchanging information with neighboring nodes in order to improve performance. To reduce communication overhead and prolong battery life while preserving the benefits of diffusion cooperation, we propose an energy-efficient diffusion strategy that adopts an event-based communication mechanism, which allows nodes to cooperate with their neighbors only when necessary. We also study the performance of the proposed algorithm, and show that its network mean error and mean-square deviation (MSD) are bounded in steady state. Numerical results demonstrate that the proposed method can effectively reduce the network energy consumption without significantly sacrificing steady-state network MSD performance.
I. INTRODUCTION
In the era of big data and the Internet-of-Things (IoT), ubiquitous smart devices continuously sense the environment and rapidly generate large amounts of data. To better address the real-time challenges arising from online inference, optimization and learning, distributed adaptation algorithms have become especially promising and popular compared with traditional centralized solutions. As computation and data storage resources are distributed to every sensor node in the network, information can be processed and fused through local cooperation among neighboring nodes, thereby reducing system latency and improving robustness and scalability. Among the various implementations of distributed adaptation solutions [1]–[6], diffusion strategies are particularly advantageous for continuous adaptation using constant step sizes, thanks to their low complexity, good mean-square deviation (MSD) performance, and stability [7]–[12]. Diffusion strategies have therefore attracted much research interest in recent years, both for single-task scenarios where nodes share a common parameter of interest [13]–[19], and for multi-task networks where the parameters of interest differ among nodes or groups of nodes [20]–[24].
In diffusion strategies, each sensor communicates local information to its neighboring sensors in each iteration. However, in IoT networks, devices or nodes usually have limited energy budgets and communication bandwidth, which prevents them from frequently exchanging information with neighboring sensors. Several methods to improve the energy efficiency of diffusion have been proposed in the literature, and these can be divided into two main categories: reducing the number of neighbors to cooperate with [25]–[27], and reducing the dimension of the local information to be transmitted [28]–[30]. These methods either rely on additional optimization procedures, or use auxiliary selection or projection matrices, which require more computation resources to implement.

Unlike time-driven communication, where nodes exchange information at every iteration, event-based communication mechanisms allow nodes to trigger communication with their neighbors only upon the occurrence of certain meaningful events. This can significantly reduce energy consumption by avoiding unnecessary information exchange, especially once the system has reached steady state. It also allows every node in the network to share the limited bandwidth resource, so that channel efficiency is improved. Such mechanisms have been developed for state estimation, filtering, and distributed control over wireless sensor networks [31]–[38], but have not been fully investigated in the context of diffusion adaptation. In [39], the authors propose a diffusion strategy where every entry of the local intermediate estimate is quantized into one of multiple levels before being transmitted to neighbors, and communication is triggered once the quantized local information goes through a quantization level crossing. The performance of this method relies largely on the precision of the selected quantization scheme. However, choosing a suitable quantization scheme with the desired precision, and requiring every node to be aware of the same quantization scheme, is practically difficult for online adaptation where the parameter of interest and the environment may change over time.

In this paper, we propose an event-based diffusion strategy to reduce communication among neighboring nodes while preserving the advantages of diffusion strategies. Specifically, each node monitors the difference between the full vector of its current local update and the most recent intermediate estimate transmitted to its neighbors. A communication is triggered only if this difference is sufficiently large. We provide a sufficient condition for the mean error stability of our proposed strategy, and an upper bound on its steady-state network mean-square deviation (MSD). Simulations demonstrate that our event-based strategy achieves a similar steady-state network MSD as the popular adapt-then-combine (ATC) diffusion strategy, but with a significantly lower communication rate.

The rest of this paper is organized as follows.
In Section II, we introduce the network model and problem formulation, and discuss prior work. In Section III, we describe our proposed event-based diffusion LMS strategy, and in Section IV we analyze its performance. Simulation results are presented in Section V, followed by concluding remarks in Section VI.

Notations.
Throughout this paper, we use boldface characters for random variables, and plain characters for realizations of the corresponding random variables as well as for deterministic quantities. In addition, we use upper-case characters for matrices and lower-case ones for vectors and scalars. The notation $I_N$ is an $N \times N$ identity matrix. The matrix $A^T$ is the transpose of the matrix $A$, and $\lambda_n(A)$ and $\lambda_{\min}(A)$ are the $n$-th eigenvalue and the smallest eigenvalue of the matrix $A$, respectively. Besides, $\rho(A)$ is the spectral radius of $A$. The operation $A \otimes B$ denotes the Kronecker product of the two matrices $A$ and $B$. The notation $\|\cdot\|$ is the Euclidean norm, $\|\cdot\|_{b,\infty}$ denotes the block maximum norm [11], while $\|x\|_{\Sigma}^2 \triangleq x^T \Sigma x$ is the weighted squared norm. We use $\mathrm{diag}\{\cdot\}$ to denote a matrix whose main diagonal is given by its arguments, and $\mathrm{col}\{\cdot\}$ to denote a column vector formed by its arguments. The notation $\mathrm{vec}(\cdot)$ represents a column vector consisting of the columns of its matrix argument stacked on top of each other. If $\sigma = \mathrm{vec}(\Sigma)$, we let $\|\cdot\|_{\sigma}^2 = \|\cdot\|_{\Sigma}^2$, and use either notation interchangeably.

II. DATA MODELS AND PRELIMINARIES
In this section, we first present our network and data model assumptions. We then give a brief description of the ATC diffusion strategy.
A. Network and Data Model
Consider a network represented by an undirected graph $G = (V, E)$, where $V = \{1, 2, \cdots, N\}$ denotes the set of nodes, and $E$ is the set of edges. Any two nodes are said to be connected if there is an edge between them. The neighborhood of each node $k$ is denoted by $\mathcal{N}_k$, which consists of node $k$ and all the nodes connected with node $k$. Since the network is assumed to be undirected, if node $k$ is a neighbor of node $\ell$, then node $\ell$ is also a neighbor of node $k$. Without loss of generality, we assume that the network is connected.

Every node in the network aims to estimate an unknown parameter vector $w^\circ \in \mathbb{R}^{M \times 1}$. At each time instant $i \geq 0$, each node $k$ observes data $d_k(i) \in \mathbb{R}$ and $u_k(i) \in \mathbb{R}^{M \times 1}$, which are related through the following linear regression model:
$$d_k(i) = u_k^T(i) w^\circ + v_k(i), \quad (1)$$
where $v_k(i)$ is an additive observation noise. We make the following assumptions.

Assumption 1.
The regression process $\{u_k(i)\}$ is zero-mean, spatially independent and temporally white. The regressor $u_k(i)$ has positive definite covariance matrix $R_{u,k} = \mathbb{E}[u_k(i) u_k^T(i)]$.
Assumption 2. The noise process $\{v_k(i)\}$ is spatially independent and temporally white. The noise $v_k(i)$ has variance $\sigma_{v,k}^2$, and is assumed to be independent of the regressors $u_\ell(j)$ for all $\{k, \ell, i, j\}$.

B. ATC Diffusion Strategy

To estimate the parameter $w^\circ$, the network solves the following least mean-squares (LMS) problem:
$$\min_w \sum_{k=1}^N J_k(w), \quad (2)$$
where for each $k \in V$,
$$J_k(w) = \mathbb{E}\left| d_k(i) - u_k^T(i) w \right|^2. \quad (3)$$
The ATC diffusion strategy [7], [11] is a distributed optimization procedure that attempts to solve (2) iteratively by performing the following local updates at each node $k$ at each time instant $i$:
$$\psi_k(i) = w_k(i-1) + \mu_k u_k(i) \left( d_k(i) - u_k^T(i) w_k(i-1) \right), \quad (4)$$
$$w_k(i) = \sum_{\ell \in \mathcal{N}_k} a_{\ell k} \psi_\ell(i), \quad (5)$$
where $\mu_k > 0$ is a chosen step size. The procedure in (4) is referred to as the adaptation step and (5) is the combination step. The combination weights $\{a_{\ell k}\}$ are non-negative scalars and satisfy:
$$a_{\ell k} \geq 0, \quad \sum_{\ell=1}^N a_{\ell k} = 1, \quad a_{\ell k} = 0 \ \text{if}\ \ell \notin \mathcal{N}_k. \quad (6)$$
The local estimates $w_k(i)$ in the ATC strategy are shown to converge in mean to the true parameter $w^\circ$ if the step sizes $\mu_k$ are chosen to be below a particular threshold [7], [11].
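As an illustration of how (1), (4), and (5) fit together, the following Python/NumPy sketch simulates ATC diffusion LMS on a small network. It is only a minimal sketch: the ring topology, uniform averaging weights, step size, noise level, and network size are our own assumed values, not parameters taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 10, 4, 2000          # nodes, parameter dimension, iterations (assumed)
mu = 0.01                      # assumed common step size, well below 2 / lambda_max(R_u)
w_true = rng.standard_normal(M)

# Assumed ring topology: each node cooperates with itself and two neighbors.
A = np.zeros((N, N))
for k in range(N):
    for l in (k - 1, k, k + 1):
        A[l % N, k] = 1.0
A /= A.sum(axis=0)             # columns sum to 1, satisfying (6)

w = np.zeros((N, M))           # holds w_k(i-1) for every node
for i in range(T):
    psi = np.empty((N, M))
    for k in range(N):         # adaptation step (4)
        u = rng.standard_normal(M)                     # regressor with R_u = I_M
        d = u @ w_true + 0.1 * rng.standard_normal()   # data model (1)
        psi[k] = w[k] + mu * u * (d - u @ w[k])
    w = A.T @ psi              # combination step (5): w_k = sum_l a_{lk} psi_l

msd = np.mean(np.sum((w - w_true) ** 2, axis=1))
print(f"network MSD after {T} iterations: {msd:.2e}")
```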
III. EVENT-BASED DIFFUSION

We consider a modification of the ATC strategy so that the local intermediate estimate $\psi_k(i)$ of each node $k$ is communicated to its neighbors only at certain trigger time instants $s_k^n$, $n = 1, 2, \ldots$. Let $\bar{\psi}_k(i)$ be the last local intermediate estimate that node $k$ transmitted to its neighbors up to time instant $i$, i.e.,
$$\bar{\psi}_k(j) = \psi_k(s_k^n), \quad \text{for } j \in [s_k^n, s_k^{n+1}). \quad (7)$$
Let $\epsilon_k^-(i)$ be the a priori gap defined as
$$\epsilon_k^-(i) = \psi_k(i) - \bar{\psi}_k(i-1). \quad (8)$$
Let $f(\epsilon_k^-(i)) = \|\epsilon_k^-(i)\|_{Y_k}^2$, where $Y_k$ is a positive semi-definite weighting matrix. For each node $k$, transmission of its local intermediate estimate $\psi_k(i)$ is triggered whenever
$$f(\epsilon_k^-(i)) > \delta_k(i) > 0, \quad (9)$$
where $\delta_k(i)$ is the threshold adopted by node $k$ at time $i$. In this paper, we allow the thresholds to be time-varying. We further assume that the thresholds $\{\delta_k(i)\}$ of each node $k$ are upper bounded, and let
$$\bar{\delta}_k = \sup\{\delta_k(i) \mid i \geq 0\}. \quad (10)$$
In addition, we define binary variables $\{\gamma_k(i)\}$ such that $\gamma_k(i) = 1$ if node $k$ transmits at time instant $i$, and $0$ otherwise. The sequence of triggering time instants $0 \leq s_k^1 \leq s_k^2 \leq \ldots$ can then be defined recursively as
$$s_k^{n+1} = \min\{i \in \mathbb{N} \mid i > s_k^n,\ \gamma_k(i) = 1\}. \quad (11)$$
For every node in the network, we apply the event-based adapt-then-combine (EB-ATC) strategy detailed in Algorithm 1. Note that every node always combines its own intermediate estimate regardless of the triggering status. A succinct form of EB-ATC is given by the following equations:
$$\psi_k(i) = w_k(i-1) + \mu_k u_k(i) \left( d_k(i) - u_k^T(i) w_k(i-1) \right), \quad (12)$$
$$w_k(i) = a_{kk} \psi_k(i) + \sum_{\ell \in \mathcal{N}_k \setminus \{k\}} a_{\ell k} \bar{\psi}_\ell(i). \quad (13)$$
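The sketch below extends the ATC simulation above with the event-based trigger (8)–(9) and the combination rule (13), assuming $Y_k = I_M$ and a constant threshold $\delta_k(i) = \delta$; again, the topology and all numeric values are illustrative assumptions rather than choices made in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, T = 10, 4, 2000
mu, delta = 0.01, 1e-4         # assumed step size and constant threshold
w_true = rng.standard_normal(M)

A = np.zeros((N, N))           # same assumed ring topology as before
for k in range(N):
    for l in (k - 1, k, k + 1):
        A[l % N, k] = 1.0
A /= A.sum(axis=0)

w = np.zeros((N, M))
psi_bar = np.zeros((N, M))     # last transmitted estimates, one per node
triggers = 0
for i in range(T):
    psi = np.empty((N, M))
    for k in range(N):
        u = rng.standard_normal(M)
        d = u @ w_true + 0.1 * rng.standard_normal()
        psi[k] = w[k] + mu * u * (d - u @ w[k])   # adaptation (12)
        gap = psi[k] - psi_bar[k]                 # a priori gap (8), with Y_k = I_M
        if gap @ gap > delta:                     # trigger rule (9)
            psi_bar[k] = psi[k]                   # broadcast to neighbors
            triggers += 1
    # combination (13): own fresh psi_k, neighbors' last transmitted psi_bar
    w = A.T @ psi_bar + np.diag(A)[:, None] * (psi - psi_bar)

print(f"average triggering rate: {triggers / (N * T):.2f}")
print(f"network MSD: {np.mean(np.sum((w - w_true) ** 2, axis=1)):.2e}")
```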
IV. PERFORMANCE ANALYSIS

In this section, we study the mean and mean-square error behavior of the EB-ATC diffusion strategy.
A. Network Error Recursion Model
In order to facilitate the analysis of the error behavior, we first define some necessary symbols and derive the recursive equations of the errors across the network. To begin with, the error vectors of each node $k$ at time instant $i$ are given by
$$\tilde{\psi}_k(i) = w^\circ - \psi_k(i), \quad (14)$$
$$\tilde{w}_k(i) = w^\circ - w_k(i). \quad (15)$$
Algorithm 1 Event-based ATC Diffusion Strategy (EB-ATC)

for every node $k$ at each time instant $i$ do
  Local Update: obtain the intermediate estimate $\psi_k(i)$ using (4).
  Event-based Triggering: compute $\epsilon_k^-(i)$ and $f(\epsilon_k^-(i))$.
  if $f(\epsilon_k^-(i)) > \delta_k(i)$ then
    (i) Trigger the communication: broadcast the local update $\psi_k(i)$ to every neighbor $\ell \in \mathcal{N}_k$.
    (ii) Mark $\gamma_k(i) = 1$, and update $\bar{\psi}_k(i) = \psi_k(i)$.
  else
    (i) Keep silent.
    (ii) Mark $\gamma_k(i) = 0$, and update $\bar{\psi}_k(i) = \bar{\psi}_k(i-1)$.
  end if
  Diffusion Combination: $w_k(i) = a_{kk} \psi_k(i) + \sum_{\ell \in \mathcal{N}_k \setminus \{k\}} a_{\ell k} \bar{\psi}_\ell(i)$.
end for

Recall that under EB-ATC each node only combines the local updates $\{\bar{\psi}_\ell(i) \mid \ell \in \mathcal{N}_k\}$ that were previously received from its neighbors. Therefore, we also introduce the a posteriori gap $\epsilon_k(i)$, defined as
$$\epsilon_k(i) = \psi_k(i) - \bar{\psi}_k(i), \quad (16)$$
to capture the discrepancy between the local intermediate estimate $\psi_k(i)$ and the estimate $\bar{\psi}_k(i)$ that is available at the neighboring nodes. We have
$$\epsilon_k(i) = \begin{cases} 0, & \text{if } \|\epsilon_k^-(i)\|_{Y_k}^2 > \delta_k(i), \\ \epsilon_k^-(i), & \text{otherwise}. \end{cases} \quad (17)$$
From (17), we have the following result.

Lemma 1.
The a posteriori gap $\epsilon_k(i)$ is bounded, and $\|\epsilon_k(i)\|^2 \leq \bar{\delta}_k / \lambda_{\min}(Y_k)$.

Proof. See Appendix A.
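The bound in Lemma 1 is easy to check numerically. The following sketch samples candidate gaps, keeps those that would not trigger under (17), and compares the largest squared Euclidean norm against $\bar{\delta}_k / \lambda_{\min}(Y_k)$; the matrix $Y_k$, the threshold, and the sampling distribution are arbitrary assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)
M, delta_bar = 4, 0.5                     # assumed dimension and threshold bound

B = rng.standard_normal((M, M))
Y = B @ B.T + 0.1 * np.eye(M)             # arbitrary positive definite Y_k
lam_min = np.linalg.eigvalsh(Y).min()

eps = 0.3 * rng.standard_normal((200000, M))     # candidate a posteriori gaps
w_norm = np.einsum('ij,jk,ik->i', eps, Y, eps)   # ||eps||_Y^2 for every sample
kept = eps[w_norm <= delta_bar]           # non-triggered case of (17)

print(f"max ||eps||^2 observed : {(kept ** 2).sum(axis=1).max():.4f}")
print(f"Lemma 1 bound          : {delta_bar / lam_min:.4f}")
```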
Collecting the iterates $\tilde{\psi}_k(i)$, $\tilde{w}_k(i)$, and $\epsilon_k(i)$ across all nodes, we have
$$\tilde{\psi}(i) = \mathrm{col}\{(\tilde{\psi}_k(i))_{k=1}^N\}, \quad (18)$$
$$\tilde{w}(i) = \mathrm{col}\{(\tilde{w}_k(i))_{k=1}^N\}, \quad (19)$$
$$\epsilon(i) = \mathrm{col}\{(\epsilon_k(i))_{k=1}^N\}. \quad (20)$$
Subtracting both sides of (12) from $w^\circ$, and applying the data model (1), we obtain the following error recursion for each node $k$:
$$\tilde{\psi}_k(i) = \left( I_M - \mu_k u_k(i) u_k^T(i) \right) \tilde{w}_k(i-1) - \mu_k u_k(i) v_k(i). \quad (21)$$
Note that by resorting to (16), the local combination step (13) can be expressed as
$$w_k(i) = a_{kk} \psi_k(i) + \sum_{\ell \in \mathcal{N}_k \setminus \{k\}} a_{\ell k} \left( \psi_\ell(i) - \epsilon_\ell(i) \right); \quad (22)$$
then, subtracting both sides of the above equation from $w^\circ$, we obtain
$$\tilde{w}_k(i) = \sum_{\ell \in \mathcal{N}_k} a_{\ell k} \tilde{\psi}_\ell(i) + \sum_{\ell \in \mathcal{N}_k \setminus \{k\}} a_{\ell k} \epsilon_\ell(i). \quad (23)$$
Let $A$ be the matrix whose $(\ell, k)$-th entry is the weight $a_{\ell k}$, and introduce the matrix $C = A - \mathrm{diag}\{(a_{kk})_{k=1}^N\}$. Then relating (19), (20), (21), and (23) yields the following recursion:
$$\tilde{w}(i) = \mathcal{B}(i) \tilde{w}(i-1) - \mathcal{A}^T \mathcal{M} s(i) + \mathcal{C}^T \epsilon(i), \quad (24)$$
where
$$\mathcal{A} = A \otimes I_M, \quad \mathcal{C} = C \otimes I_M, \quad (25)$$
$$\mathcal{B}(i) = \mathcal{A}^T (I_{MN} - \mathcal{M} \mathcal{R}_u(i)), \quad (26)$$
$$\mathcal{R}_u(i) = \mathrm{diag}\{(u_k(i) u_k^T(i))_{k=1}^N\}, \quad (27)$$
$$\mathcal{M} = \mathrm{diag}\{(\mu_k I_M)_{k=1}^N\}, \quad (28)$$
$$s(i) = \mathrm{col}\{(u_k(i) v_k(i))_{k=1}^N\}. \quad (29)$$

B. Mean Error Analysis
Suppose that Assumptions 1 and 2 hold. Then, taking expectations on both sides of (24), we have the following recursion model for the network mean error:
$$\mathbb{E}[\tilde{w}(i)] = \mathcal{B} \, \mathbb{E}[\tilde{w}(i-1)] + \mathcal{C}^T \mathbb{E}[\epsilon(i)], \quad (30)$$
where
$$\mathcal{B} = \mathbb{E}[\mathcal{B}(i)] = \mathcal{A}^T (I_{MN} - \mathcal{M} \mathcal{R}_u), \quad (31)$$
$$\mathcal{R}_u = \mathbb{E}[\mathcal{R}_u(i)] = \mathrm{diag}\{(R_{u,k})_{k=1}^N\}. \quad (32)$$
We have the following result on the asymptotic behavior of the mean error.

Theorem 1. (Mean Error Stability)
Suppose that Assumptions 1 and 2 hold. Then, the network mean error vector of EB-ATC, i.e., $\mathbb{E}[\tilde{w}(i)]$, is bounded-input bounded-output (BIBO) stable in steady state if the step size $\mu_k$ is chosen such that
$$\mu_k < \frac{2}{\lambda_{\max}(R_{u,k})}. \quad (33)$$
In addition, the block maximum norm of the network mean error is asymptotically upper bounded by
$$\frac{\alpha}{1 - \beta} \cdot \max_{1 \leq k \leq N} \left( \frac{\bar{\delta}_k}{\lambda_{\min}(Y_k)} \right)^{1/2}, \quad (34)$$
where
$$\alpha = \max_{1 \leq k \leq N} (1 - a_{kk}), \quad \beta = \| I_{MN} - \mathcal{M} \mathcal{R}_u \|_{b,\infty}. \quad (35)$$
Proof. See Appendix B.
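The following sketch illustrates how the quantities in Theorem 1 might be evaluated for a toy configuration: it checks the step-size condition (33) via $\beta < 1$ and evaluates the bound (34). The covariances, the weights $a_{kk}$, the thresholds, and $Y_k$ are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 10, 4

# Assumed per-node regressor covariances R_{u,k} (diagonal for simplicity).
R = [np.diag(rng.uniform(1.0, 2.0, M)) for _ in range(N)]

# Condition (33): mu_k < 2 / lambda_max(R_{u,k}); here we take half that bound.
mu = np.array([1.0 / np.linalg.eigvalsh(Rk).max() for Rk in R])

# beta in (35): for the block-diagonal matrix I - M R_u with symmetric blocks,
# the block maximum norm equals the largest spectral radius of I - mu_k R_{u,k}.
beta = max(
    np.abs(np.linalg.eigvals(np.eye(M) - mu[k] * R[k])).max() for k in range(N)
)
print(f"beta = {beta:.3f} (< 1 implies BIBO mean stability)")

# Bound (34) with assumed a_kk = 0.5, Y_k = I_M, and bar-delta_k = 1e-3.
alpha, delta_bar, lam_min_Y = 0.5, 1e-3, 1.0
bound = alpha / (1 - beta) * np.sqrt(delta_bar / lam_min_Y)
print(f"steady-state mean-error bound (34): {bound:.4f}")
```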
C. Mean-square Error Analysis
Due to the triggering mechanism and the resulting a posteriori gap, (20) is correlated with the error vectors (18) and (19), and explicitly characterizing the exact network MSD of EB-ATC is technically difficult. Instead, we study an upper bound on the network MSD. First, we derive the MSD recursions as follows. From the recursion (24), we have the following for any compatible non-negative definite matrix $\Sigma$:
$$\|\tilde{w}(i)\|_\Sigma^2 = \tilde{w}(i-1)^T \mathcal{B}(i)^T \Sigma \mathcal{B}(i) \tilde{w}(i-1) + s(i)^T \mathcal{M}^T \mathcal{A} \Sigma \mathcal{A}^T \mathcal{M} s(i) + \epsilon(i)^T \mathcal{C} \Sigma \mathcal{C}^T \epsilon(i) + 2 \tilde{w}(i-1)^T \mathcal{B}(i)^T \Sigma \mathcal{C}^T \epsilon(i) - 2 s(i)^T \mathcal{M}^T \mathcal{A} \Sigma \mathcal{C}^T \epsilon(i) - 2 \tilde{w}(i-1)^T \mathcal{B}(i)^T \Sigma \mathcal{A}^T \mathcal{M} s(i). \quad (36)$$
Taking expectations on both sides of the above expression, the last term evaluates to zero under Assumptions 1–2, and we have
$$\mathbb{E}\|\tilde{w}(i)\|_\Sigma^2 = \mathbb{E}\|\tilde{w}(i-1)\|_{\Sigma'}^2 + t_1 + t_2 + 2 t_3 - 2 t_4, \quad (37)$$
where the weighting matrix $\Sigma'$ is
$$\Sigma' = \mathbb{E}[\mathcal{B}(i)^T \Sigma \mathcal{B}(i)], \quad (38)$$
and the last four terms in (37) are given as follows:
$$t_1 = \mathbb{E}[s(i)^T \mathcal{M} \mathcal{A} \Sigma \mathcal{A}^T \mathcal{M} s(i)], \quad (39)$$
$$t_2 = \mathbb{E}[\epsilon(i)^T \mathcal{C} \Sigma \mathcal{C}^T \epsilon(i)], \quad (40)$$
$$t_3 = \mathbb{E}[\tilde{w}(i-1)^T \mathcal{B}(i)^T \Sigma \mathcal{C}^T \epsilon(i)], \quad (41)$$
$$t_4 = \mathbb{E}[s(i)^T \mathcal{M}^T \mathcal{A} \Sigma \mathcal{C}^T \epsilon(i)]. \quad (42)$$
Further, let $\sigma = \mathrm{vec}(\Sigma)$ and $\sigma' = \mathrm{vec}(\Sigma')$. We then have $\sigma' = \mathcal{E} \sigma$, where
$$\mathcal{E} = \mathbb{E}[\mathcal{B}(i)^T \otimes \mathcal{B}(i)^T] = \left[ I_{M^2 N^2} - I_{MN} \otimes \mathcal{M} \mathcal{R}_u - \mathcal{M} \mathcal{R}_u \otimes I_{MN} + (\mathcal{M} \otimes \mathcal{M}) \, \mathbb{E}(\mathcal{R}_u(i) \otimes \mathcal{R}_u(i)) \right] \mathcal{A} \otimes \mathcal{A}, \quad (43)$$
so that (37) can be rewritten as
$$\mathbb{E}\|\tilde{w}(i)\|_\sigma^2 = \mathbb{E}\|\tilde{w}(i-1)\|_{\mathcal{E}\sigma}^2 + t_1 + t_2 + 2 t_3 - 2 t_4. \quad (44)$$
Next, we derive expressions and bounds for the terms $t_1$ to $t_4$.
1) Term $t_1$: For the term $t_1$, we have
$$t_1 = \mathbb{E}\left[\mathrm{Tr}\left(\mathcal{A}^T \mathcal{M} s(i) s(i)^T \mathcal{M} \mathcal{A} \Sigma\right)\right] = \mathrm{Tr}\left[\mathcal{A}^T \mathcal{M} \, \mathbb{E}\left(s(i) s(i)^T\right) \mathcal{M} \mathcal{A} \Sigma\right] = \mathrm{Tr}\left(\mathcal{A}^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A} \Sigma\right) = \mathrm{vec}\left(\mathcal{A}^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}\right)^T \sigma, \quad (45)$$
where the last equality in (45) follows from the identity $\mathrm{Tr}(AB) = \mathrm{vec}(A^T)^T \mathrm{vec}(B)$, and
$$\mathcal{S} = \mathrm{diag}\{(\sigma_{v,k}^2 R_{u,k})_{k=1}^N\}. \quad (46)$$
2) Term $t_2$: Similarly, we have the following for the term $t_2$:
$$t_2 = \mathrm{Tr}\left[\mathcal{C}^T \mathbb{E}\left(\epsilon(i) \epsilon(i)^T\right) \mathcal{C} \Sigma\right] = \mathrm{vec}(\mathcal{C})^T \left[\Sigma \otimes \mathbb{E}\left(\epsilon(i) \epsilon(i)^T\right)\right] \mathrm{vec}(\mathcal{C}). \quad (47)$$
Moreover, it can be verified that the relationship $y y^T \leq y^T y \, I_N$ holds for any vector $y \in \mathbb{R}^N$, and thus $\epsilon(i) \epsilon(i)^T \leq \epsilon(i)^T \epsilon(i) \, I_{MN}$ follows immediately, so that we have
$$\mathbb{E}\left(\epsilon(i) \epsilon(i)^T\right) \leq \mathbb{E}\left[\epsilon(i)^T \epsilon(i)\right] I_{MN} = \sum_{k=1}^N \mathbb{E}\|\epsilon_k(i)\|^2 \, I_{MN} \leq \sum_{k=1}^N \left( \frac{\bar{\delta}_k}{\lambda_{\min}(Y_k)} \right) I_{MN}. \quad (48)$$
Now, letting
$$\Delta = \sum_{k=1}^N \frac{\bar{\delta}_k}{\lambda_{\min}(Y_k)}, \quad (49)$$
the following result follows from $\Sigma \geq 0$:
$$\Sigma \otimes \left[\mathbb{E}\left(\epsilon(i) \epsilon(i)^T\right) - \Delta I_{MN}\right] \leq 0, \quad (50)$$
and therefore
$$\mathrm{vec}(\mathcal{C})^T \left\{\Sigma \otimes \left[\mathbb{E}\left(\epsilon(i) \epsilon(i)^T\right) - \Delta I_{MN}\right]\right\} \mathrm{vec}(\mathcal{C}) \leq 0, \quad (51)$$
or equivalently,
$$\mathrm{vec}(\mathcal{C})^T \left[\Sigma \otimes \mathbb{E}\left(\epsilon(i) \epsilon(i)^T\right)\right] \mathrm{vec}(\mathcal{C}) \leq \Delta \cdot \mathrm{vec}(\mathcal{C})^T (\Sigma \otimes I_{MN}) \mathrm{vec}(\mathcal{C}) = \Delta \cdot \mathrm{Tr}\left(\mathcal{C}^T \mathcal{C} \Sigma\right), \quad (52)$$
which further implies that
$$t_2 \leq \Delta \cdot \mathrm{vec}\left(\mathcal{C}^T \mathcal{C}\right)^T \sigma. \quad (53)$$
3) Term $t_3$: Since the matrix $\Sigma$ is positive semi-definite, we can write $\Sigma = \Theta \Theta^T$. Then, let
$$P = \tilde{w}(i-1)^T \mathcal{B}(i)^T \Theta, \quad Q = \epsilon(i)^T \mathcal{C} \Theta. \quad (54)$$
From the fact that $(P - Q)(P - Q)^T \geq 0$, we have the following:
$$P Q^T + Q P^T \leq P P^T + Q Q^T. \quad (55)$$
Substituting (54) into the above inequality and taking expectations on both sides gives
$$2 t_3 \leq \mathbb{E}\left[\tilde{w}(i-1)^T \mathcal{B}(i)^T \Sigma \mathcal{B}(i) \tilde{w}(i-1)\right] + \mathbb{E}\left[\epsilon(i)^T \mathcal{C} \Sigma \mathcal{C}^T \epsilon(i)\right] = \mathbb{E}\|\tilde{w}(i-1)\|_{\Sigma'}^2 + t_2. \quad (56)$$
4) Term $t_4$: Applying manipulations similar to those for $t_1$, we have
$$t_4 = \mathrm{Tr}\left[\mathcal{C}^T \mathbb{E}\left(\epsilon(i) s(i)^T\right) \mathcal{M} \mathcal{A} \Sigma\right] = \mathrm{vec}\left(\mathcal{C}^T \mathbb{E}\left(\epsilon(i) s(i)^T\right) \mathcal{M} \mathcal{A}\right)^T \sigma. \quad (57)$$
To facilitate the evaluation of the covariance matrix $\mathbb{E}(\epsilon(i) s(i)^T)$, we derive its $(k, \ell)$-th block entry, i.e., $\mathbb{E}[\epsilon_k(i) u_\ell^T(i) v_\ell(i)]$. To this end, substituting (1) into (12), we can express $\psi_k(i)$ as follows:
$$\psi_k(i) = w_k(i-1) + \mu_k u_k(i) u_k^T(i) \tilde{w}_k(i-1) + \mu_k u_k(i) v_k(i), \quad (58)$$
so that we have
$$\mathbb{E}\left[\psi_k(i) u_\ell^T(i) v_\ell(i)\right] = \mathbb{E}\left[w_k(i-1) u_\ell^T(i) v_\ell(i)\right] + \mu_k \mathbb{E}\left[u_k(i) u_k^T(i) \tilde{w}_k(i-1) u_\ell^T(i) v_\ell(i)\right] + \mu_k \mathbb{E}\left[u_k(i) v_k(i) u_\ell^T(i) v_\ell(i)\right]. \quad (59)$$
Note that (59) evaluates to zero if $\ell \neq k$; when $\ell = k$, the first two terms in (59) evaluate to zero, while the last term equals $\mu_k \sigma_{v,k}^2 R_{u,k}$. In addition, $\mathbb{E}[\bar{\psi}_k(i) u_\ell^T(i) v_\ell(i)] = 0$ for all $\{k, \ell\} \in V$. Therefore, at a particular time instant $i$, by conditioning on $\gamma_k(i)$ for every $k$, from (8) and (17) we conclude that
$$\mathbb{E}\left[\epsilon_k(i) u_\ell^T(i) v_\ell(i)\right] = \begin{cases} 0, & \text{if } \ell \neq k, \\ \mu_k \sigma_{v,k}^2 R_{u,k}, & \text{if } \ell = k \text{ and } \gamma_k(i) = 0, \end{cases}$$
so that the term $t_4$ can be expressed as
$$t_4 = -\mathrm{vec}\left(\mathcal{C}^T \mathcal{G}(i) \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}\right)^T \sigma, \quad (60)$$
where the matrix $\mathcal{S}$ is given in (46) and
$$\mathcal{G}(i) = \mathbb{E}\,\mathrm{diag}\{(\gamma_k(i) I_M)_{k=1}^N\} - I_{MN}. \quad (61)$$
Therefore, substituting (45), (53), (56), and (60) into (44), we have the following bound for the network MSD at time instant $i$:
$$\mathbb{E}\|\tilde{w}(i)\|_\sigma^2 \leq \mathbb{E}\|\tilde{w}(i-1)\|_{\mathcal{D}\sigma}^2 + \left[f_1 + f_2 + f_3(i)\right]^T \sigma, \quad (62)$$
where $\mathcal{D} = 2\mathcal{E}$ with the matrix $\mathcal{E}$ given in (43), and
$$f_1 = \mathrm{vec}\left(\mathcal{A}^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}\right), \quad f_2 = 2\Delta \cdot \mathrm{vec}\left(\mathcal{C}^T \mathcal{C}\right), \quad f_3(i) = 2\,\mathrm{vec}\left(\mathcal{C}^T \mathcal{G}(i) \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}\right). \quad (63)$$

Assumption 3.
Each node $k$ has a regressor covariance matrix $R_{u,k}$ whose eigenvalues satisfy
$$\lambda_{\max}(R_{u,k}) < \frac{2 + \sqrt{2}}{2 - \sqrt{2}} \, \lambda_{\min}(R_{u,k}). \quad (64)$$

Theorem 2. (Mean-square Error Behavior)
Suppose that Assumptions 1–2 hold. Then, as $i \to \infty$, the network MSD of EB-ATC, i.e., $\mathbb{E}\|\tilde{w}(i)\|^2 / N$, has a finite constant upper bound if the step sizes $\{\mu_k\}$ are chosen such that $\rho(\mathcal{D}) < 1$ is satisfied. In addition, the matrix $\mathcal{D}$ can be approximated as $\mathcal{D} = \mathcal{F} + O(\mathcal{M}^2)$, where
$$\mathcal{F} = 2 \mathcal{B}^T \otimes \mathcal{B}^T, \quad (65)$$
so that if Assumption 3 also holds and the $\{\mu_k\}$ also satisfy
$$\frac{2 - \sqrt{2}}{2 \lambda_{\min}(R_{u,k})} < \mu_k < \frac{2 + \sqrt{2}}{2 \lambda_{\max}(R_{u,k})}, \quad (66)$$
an upper bound on the network MSD in steady state is given by
$$\frac{1}{N} \left[ (f_1 + f_2)^T \left(I_{M^2 N^2} - \mathcal{F}\right)^{-1} + f_{3,\infty}^T \right] \mathrm{vec}(I_{MN}) + O(\mu_{\max}^2), \quad (67)$$
where
$$\mu_{\max} = \max_{1 \leq k \leq N} \{\mu_k\}, \quad (68)$$
$$f_{3,\infty}^T = \lim_{i \to \infty} \sum_{j=0}^{i-1} f_3(i-j)^T \mathcal{F}^j. \quad (69)$$
Proof. See Appendix C.
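As a quick numerical reading of Assumption 3 and condition (66), the sketch below computes the admissible step-size interval for one node given an assumed covariance matrix $R_{u,k}$; the matrix itself is an arbitrary assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
M = 4

# Assumed regressor covariance with a mild eigenvalue spread.
B = rng.standard_normal((M, M))
R = B @ B.T / M + np.eye(M)
lam = np.linalg.eigvalsh(R)
lam_min, lam_max = lam.min(), lam.max()

# Assumption 3 / (64): eigenvalue spread must stay below (2+sqrt(2))/(2-sqrt(2)).
ratio_cap = (2 + np.sqrt(2)) / (2 - np.sqrt(2))
print(f"spread {lam_max / lam_min:.2f} vs cap {ratio_cap:.2f}")

# Condition (66): admissible step-size interval for this node.
lo = (2 - np.sqrt(2)) / (2 * lam_min)
hi = (2 + np.sqrt(2)) / (2 * lam_max)
if lam_max / lam_min < ratio_cap:
    print(f"mu_k may be chosen in ({lo:.4f}, {hi:.4f})")
else:
    print("interval empty: Assumption 3 violated")
```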
Remark 1.
Assumption 3 is additionally needed to ensure that the set of step sizes $\mu_k$ in (66) is non-empty. Note that if $R_{u,k}$ is chosen to be $R_{u,k} = \sigma_{u,k}^2 I_M$, the above assumption (64) is automatically met, and condition (66) becomes
$$\frac{2 - \sqrt{2}}{2 \sigma_{u,k}^2} < \mu_k < \frac{2 + \sqrt{2}}{2 \sigma_{u,k}^2}. \quad (70)$$
Besides, although diffusion adaptation strategies [7]–[12] usually do not require lower bounds on the step sizes for the stability of the network MSD, the condition (66) is a sufficient condition to ensure that the upper bound on the network MSD in (62) converges in steady state; hence (66) is only sufficient (but not necessary) for the stability of the exact network MSD in steady state. Indeed, numerical studies also suggest that dropping Assumption 3 and choosing a step size even smaller than the lower bound in (66) does not cause divergence of the network MSD in steady state.

Fig. 1: Simulation results for the network. (a) Network topology; (b) MSD performance; (c) Average ENTR.
V. SIMULATION RESULTS
In this section, numerical examples are provided to illustrate the MSD performance and energy efficiency of the proposed EB-ATC strategy, and to compare it against ATC and the non-cooperative LMS algorithm. We performed simulations on a network with $N = 60$ nodes, as depicted in Fig. 1(a). The measurement noise powers $\{\sigma_{v,k}^2\}$ are generated from a uniform distribution over $[-\cdot, -\cdot]$ dB. We consider a parameter of interest $w^\circ$ with dimension $M = 10$, and suppose that the zero-mean regressor $u_k(i)$ has covariance $R_{u,k} = \sigma_{u,k}^2 I_M$, where the coefficients $\{\sigma_{u,k}^2\}$ are drawn uniformly from the interval $[1, \cdot]$. For ease of implementation, we adopt constant and uniform triggering thresholds $\delta_k(i) = \delta$ and the identity weighting matrix $Y_k = I_M$ for the event triggering function of every node. Moreover, we use the Metropolis rule [11] for the diffusion combination (13). All simulation results are averaged over 200 Monte Carlo runs.

From Fig. 1(b), it can be observed that, compared with the ATC strategy, the steady-state MSD of the proposed EB-ATC is higher by a few dB, but still much lower than that of the non-cooperative LMS algorithm, which demonstrates the capability of EB-ATC to preserve the benefits of diffusion cooperation. On the other hand, the convergence of EB-ATC is relatively slower. This is because, in the transient phase, the event-based communication mechanism of EB-ATC restricts the frequency of exchanging the newest local intermediate estimates $\{\psi_k(i)\}$ for the purpose of energy saving. This leads to inferior transient performance compared to ATC.

On the other hand, EB-ATC achieves significant communication overhead savings compared to ATC. To visualize this, we define the expected network triggering rate (ENTR) as follows:
$$\mathrm{ENTR}(i) = \frac{1}{N} \sum_{k=1}^N \mathbb{E}[\gamma_k(i)]. \quad (71)$$
The ENTR at time instant $i$ captures how frequently communication is triggered by each node at that time instant, on average. The ENTR is directly proportional to the average communication overhead incurred by the nodes in the network at each time instant. From (71), it is clear that $0 \leq \mathrm{ENTR}(i) \leq 1$, so a smaller value of $\mathrm{ENTR}(i)$ implies a lower energy consumption. Note that ATC has $\mathrm{ENTR}(i) = 1$ for all time instants $i$. From Fig. 1(c), we observe that the ENTR of EB-ATC decays rapidly over time during the transient phase, and for all the triggering thresholds we tested, EB-ATC uses less than 30% of the communication overhead of ATC after the time instant $i \approx \cdot$, which is the average time at which the MSD of ATC is within 90% of its steady-state value. This demonstrates that even though EB-ATC has not reached steady state by this time instant, communication between nodes is not triggered very frequently, as the intermediate estimates do not change significantly after this time instant. Furthermore, in steady state, although each node maintains estimates that are close to the true parameter value, communication triggering does not completely stop. This is due to occasional abrupt changes in the random noise and regressors, which can make the local estimate update deviate significantly. This is in the same spirit as the reason why the MSD does not converge to zero.

It is also worth mentioning that, although in theory the methods in the literature [28]–[30] can save more energy by transmitting only a few entries or compressed values, for real-time applications they may not be as reliable as EB-ATC under the same channel conditions, especially when the SNR is poor. To guarantee successful diffusion cooperation within a neighborhood, a higher channel SNR or a more robust encoding scheme is required for [28]–[30], whereas EB-ATC is simpler yet effective.
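For completeness, the sketch below shows one way to build Metropolis combination weights (as used in this section) for an arbitrary undirected graph, and to estimate the ENTR in (71) from recorded trigger flags. The random graph and the placeholder trigger record are assumptions for illustration, not the actual setup of Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 20                                   # assumed network size for illustration

# Random undirected graph (each pair connected with probability 0.2).
adj = rng.random((N, N)) < 0.2
adj = np.triu(adj, 1)
adj = adj | adj.T

# Metropolis rule [11]: a_{lk} = 1 / max(n_k, n_l) for neighbors l != k,
# a_{kk} = 1 - sum of the other weights, where n_k = |N_k| (including node k).
deg = adj.sum(axis=1) + 1
A = np.zeros((N, N))
for k in range(N):
    for l in np.flatnonzero(adj[k]):
        A[l, k] = 1.0 / max(deg[k], deg[l])
    A[k, k] = 1.0 - A[:, k].sum()

assert np.allclose(A.sum(axis=0), 1.0)   # columns satisfy (6)

# ENTR (71) estimated from a record of trigger flags gamma_k(i); the flags
# here are random placeholders standing in for an actual EB-ATC run.
gamma = rng.random((N, 1000)) < 0.3
entr = gamma.mean(axis=0)                # average over nodes at each instant i
print(f"ENTR at the final instant: {entr[-1]:.2f}")
```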
VI. CONCLUSION

We have proposed an event-based diffusion ATC strategy in which communication among neighboring nodes is triggered only when significant changes occur in the local updates. The proposed algorithm not only significantly reduces communication overhead, but also maintains good steady-state MSD performance compared with the conventional diffusion ATC strategy. Future research includes analyzing the expected triggering rate theoretically, characterizing the rate of convergence, and establishing their relationship with the triggering thresholds, so that the thresholds can be selected to optimize performance.

APPENDIX A
PROOF OF LEMMA 1

The matrix $Y_k$ is positive semi-definite, and therefore real symmetric, so that there exists a unitary matrix $U$ such that
$$Y_k = U \,\mathrm{diag}\{(\lambda_m(Y_k))_{m=1}^M\}\, U^T. \quad (72)$$
Let $\phi_m$, $m = 1, 2, \ldots, M$, be the eigenvectors of $Y_k$, so that we have
$$U = [\phi_1, \phi_2, \cdots, \phi_M]. \quad (73)$$
Recall that any vector $x \in \mathbb{R}^M$ can be expressed as
$$x = \sum_m (\phi_m^T x) \phi_m; \quad (74)$$
therefore, it is easy to verify that
$$\frac{\|x\|_{Y_k}^2}{\|x\|^2} = \frac{x^T Y_k x}{x^T x} = \frac{\sum_m \lambda_m(Y_k) (\phi_m^T x)^2}{\sum_m (\phi_m^T x)^2} \geq \lambda_{\min}(Y_k), \quad (75)$$
which implies
$$\lambda_{\min}(Y_k) \cdot \|\epsilon_k(i)\|^2 \leq \|\epsilon_k(i)\|_{Y_k}^2. \quad (76)$$
Besides, from (17), we can conclude that
$$\|\epsilon_k(i)\|_{Y_k}^2 \leq \|\epsilon_k^-(i)\|_{Y_k}^2 \leq \delta_k(i). \quad (77)$$
Therefore, we have
$$\lambda_{\min}(Y_k) \cdot \|\epsilon_k(i)\|^2 \leq \delta_k(i) \leq \bar{\delta}_k, \quad (78)$$
which gives
$$\|\epsilon_k(i)\|^2 \leq \frac{\bar{\delta}_k}{\lambda_{\min}(Y_k)}. \quad (79)$$
The proof is complete.

APPENDIX B
PROOF OF THEOREM 1

Applying the block maximum norm $\|\cdot\|_{b,\infty}$ to $\mathbb{E}[\epsilon(i)]$, since every norm is a convex function of its argument, by Jensen's inequality and Lemma 1 we have
$$\|\mathbb{E}[\epsilon(i)]\|_{b,\infty} \leq \mathbb{E}\left[\|\epsilon(i)\|_{b,\infty}\right] \quad (80)$$
$$= \mathbb{E}\left[\max_{1 \leq k \leq N} \|\epsilon_k(i)\|\right] \quad (81)$$
$$\leq \max_{1 \leq k \leq N} \left(\frac{\bar{\delta}_k}{\lambda_{\min}(Y_k)}\right)^{1/2}, \quad (82)$$
where we have used the definition of the block maximum norm in [11] for the equality (81), and (82) follows from Lemma 1. The right-hand side (R.H.S.) of (82) is a finite constant scalar, which implies that the input signal to the recursion (30), i.e., $\mathbb{E}[\epsilon(i)]$, is bounded. Therefore, the recursion (30) is BIBO stable if $\rho(\mathcal{B}) < 1$.

In addition, since the matrix $A$ is left-stochastic, by applying Lemmas D.5 and D.6 in [11], we have the following from (31):
$$\rho(\mathcal{B}) = \rho\left(\mathcal{A}^T (I_{MN} - \mathcal{M} \mathcal{R}_u)\right) \quad (83)$$
$$\leq \rho(I_{MN} - \mathcal{M} \mathcal{R}_u) \quad (84)$$
$$= \|I_{MN} - \mathcal{M} \mathcal{R}_u\|_{b,\infty}. \quad (85)$$
Therefore, we conclude that the network mean error is BIBO stable if
$$\|I_{MN} - \mathcal{M} \mathcal{R}_u\|_{b,\infty} < 1, \quad (86)$$
which further yields the condition (33). To establish the upper bound (34), we iterate (30) from $i = 0$, which gives
$$\mathbb{E}[\tilde{w}(i)] = \mathcal{B}^i \mathbb{E}[\tilde{w}(0)] + \sum_{j=0}^{i-1} \mathcal{B}^j \mathcal{C}^T \mathbb{E}[\epsilon(i-j)]. \quad (87)$$
Then, applying the block maximum norm $\|\cdot\|_{b,\infty}$ on both sides of the above equation, by the properties of vector norms and induced matrix norms, it can be obtained that
$$\|\mathbb{E}[\tilde{w}(i)]\|_{b,\infty} \leq \|\mathcal{B}^i\|_{b,\infty} \cdot \|\mathbb{E}[\tilde{w}(0)]\|_{b,\infty} + \sum_{j=0}^{i-1} \|\mathcal{B}^j\|_{b,\infty} \cdot \|\mathcal{C}^T \mathbb{E}[\epsilon(i-j)]\|_{b,\infty} \quad (88)$$
$$\leq \|\mathcal{A}^T\|_{b,\infty}^i \cdot \|I_{MN} - \mathcal{M}\mathcal{R}_u\|_{b,\infty}^i \cdot \|\mathbb{E}[\tilde{w}(0)]\|_{b,\infty} \quad (89)$$
$$+ \sum_{j=0}^{i-1} \|\mathcal{A}^T\|_{b,\infty}^j \cdot \|I_{MN} - \mathcal{M}\mathcal{R}_u\|_{b,\infty}^j \cdot \|\mathcal{C}^T\|_{b,\infty} \cdot \|\mathbb{E}[\epsilon(i-j)]\|_{b,\infty}. \quad (90)$$
Let $\alpha = \|\mathcal{C}^T\|_{b,\infty}$; from Lemma D.3 of [11] we have
$$\alpha = \|C^T\|_\infty = \max_{1 \leq k \leq N} (1 - a_{kk}). \quad (91)$$
Moreover, since the matrix $A$ is left-stochastic, we have $\|\mathcal{A}^T\|_{b,\infty} = 1$ by Lemma D.4 of [11]. Let $\beta = \|I_{MN} - \mathcal{M}\mathcal{R}_u\|_{b,\infty}$; then, substituting (82) into (90), we obtain
$$\|\mathbb{E}[\tilde{w}(i)]\|_{b,\infty} \leq \|\mathbb{E}[\tilde{w}(0)]\|_{b,\infty} \cdot \beta^i + \alpha \cdot \max_{1 \leq k \leq N} \left(\frac{\bar{\delta}_k}{\lambda_{\min}(Y_k)}\right)^{1/2} \cdot \sum_{j=0}^{i-1} \beta^j. \quad (92)$$
If the step sizes $\mu_k$ are chosen such that $0 \leq \beta < 1$, then letting $i \to \infty$ on both sides of (92), we arrive at the following inequality:
$$\lim_{i \to \infty} \|\mathbb{E}[\tilde{w}(i)]\|_{b,\infty} \leq \frac{\alpha}{1 - \beta} \cdot \max_{1 \leq k \leq N} \left(\frac{\bar{\delta}_k}{\lambda_{\min}(Y_k)}\right)^{1/2}, \quad (93)$$
and the proof is complete.

APPENDIX C
PROOF OF THEOREM 2

Iterating (62) from $i = 1$, we have
$$\mathbb{E}\|\tilde{w}(i)\|_\sigma^2 \leq \mathbb{E}\|\tilde{w}(0)\|_{\mathcal{D}^i \sigma}^2 + (f_1 + f_2)^T \sum_{j=0}^{i-1} \mathcal{D}^j \sigma + \sum_{j=0}^{i-1} f_3(i-j)^T \mathcal{D}^j \sigma, \quad (94)$$
where the vectors $f_1$, $f_2$, and $f_3(i)$ are given in (63). Letting $i \to \infty$, the first term on the R.H.S. of the above inequality converges to zero, and the second term converges to the finite value $(f_1 + f_2)^T (I_{M^2 N^2} - \mathcal{D})^{-1} \sigma$, if and only if $\mathcal{D}^i \to 0$ as $i \to \infty$, i.e., $\rho(\mathcal{D}) < 1$. From (61) and (63), $f_3(i)$ is bounded because every entry of the matrix $\mathcal{G}(i)$ is bounded. Moreover, if $\rho(\mathcal{D}) < 1$, there exists a norm $\|\cdot\|_\zeta$ such that $\|\mathcal{D}\|_\zeta < 1$; therefore we have
$$\left|f_3(i-j)^T \mathcal{D}^j \sigma\right| \leq a \cdot \|\mathcal{D}\|_\zeta^j \quad (95)$$
for some positive constant $a$. Since $\|\mathcal{D}\|_\zeta^j \to 0$ as $j \to \infty$, the series
$$\sum_{j=0}^{i-1} \left|f_3(i-j)^T \mathcal{D}^j \sigma\right| \quad (96)$$
converges as $i \to \infty$, which implies the absolute convergence of the third term on the R.H.S. of (94).

Besides, note that the matrix $\mathcal{F}$ given in (65) can be explicitly expressed as
$$\mathcal{F} = 2 \mathcal{B}^T \otimes \mathcal{B}^T = 2 \left[ I_{M^2 N^2} - I_{MN} \otimes \mathcal{M}\mathcal{R}_u - \mathcal{M}\mathcal{R}_u \otimes I_{MN} + (\mathcal{M} \otimes \mathcal{M})(\mathcal{R}_u \otimes \mathcal{R}_u) \right] \mathcal{A} \otimes \mathcal{A}. \quad (97)$$
Substituting (43) into $\mathcal{D} = 2\mathcal{E}$ and comparing with the above (97), we have
$$\mathcal{D} = \mathcal{F} + O(\mathcal{M}^2), \quad (98)$$
where
$$O(\mathcal{M}^2) = 2 (\mathcal{M} \otimes \mathcal{M}) \left\{ \mathbb{E}[\mathcal{R}_u(i) \otimes \mathcal{R}_u(i)] - \mathcal{R}_u \otimes \mathcal{R}_u \right\} \mathcal{A} \otimes \mathcal{A}, \quad (99)$$
so that substituting (98) into the R.H.S. of (94) gives
$$\mathbb{E}\|\tilde{w}(i)\|_\sigma^2 \leq \mathbb{E}\|\tilde{w}(0)\|_{\mathcal{F}^i \sigma}^2 + (f_1 + f_2)^T \sum_{j=0}^{i-1} \mathcal{F}^j \sigma + \sum_{j=0}^{i-1} f_3(i-j)^T \mathcal{F}^j \sigma + \mathbb{E}\|\tilde{w}(i-1)\|_{O(\mathcal{M}^2)\sigma}^2 + g(i)^T O(\mathcal{M}^2) \sigma, \quad (100)$$
where
$$g(i) = f_1 + f_2 + \sum_{j=0}^{i-1} f_3(j). \quad (101)$$
Since the vector $g(i)$ is bounded, if $\rho(\mathcal{D}) < 1$ so that $\mathbb{E}\|\tilde{w}(i-1)\|_\sigma^2$ is bounded, then the last two terms on the R.H.S. of (100) are negligible for sufficiently small step sizes $\{\mu_k\}$, which means that the matrix $\mathcal{D}$ can be approximated by $\mathcal{D} \approx \mathcal{F}$ if the $\{\mu_k\}$ are sufficiently small and also satisfy $\rho(\mathcal{D}) < 1$. Therefore, (100) can be further expressed as
$$\mathbb{E}\|\tilde{w}(i)\|_\sigma^2 \leq \mathbb{E}\|\tilde{w}(0)\|_{\mathcal{F}^i \sigma}^2 + (f_1 + f_2)^T \sum_{j=0}^{i-1} \mathcal{F}^j \sigma + \sum_{j=0}^{i-1} f_3(i-j)^T \mathcal{F}^j \sigma + O(\mu_{\max}^2). \quad (102)$$
Choosing $\sigma = \mathrm{vec}(I_{MN})/N$ and using arguments similar to those for (94), as $i \to \infty$ the R.H.S. of (102) converges to
$$\frac{1}{N} \left[ (f_1 + f_2)^T \left(I_{M^2 N^2} - \mathcal{F}\right)^{-1} + f_{3,\infty}^T \right] \mathrm{vec}(I_{MN}), \quad (103)$$
if and only if $\mathcal{F}$ is stable, i.e., $\rho(\mathcal{F}) < 1$, where $f_{3,\infty}$ is given in (69). Since
$$\rho(\mathcal{F}) = \rho\left(2 \mathcal{B}^T \otimes \mathcal{B}^T\right) = 2 \rho(\mathcal{B})^2, \quad (104)$$
a sufficient condition to guarantee $\rho(\mathcal{F}) < 1$ is $\rho(\mathcal{B}) < 1/\sqrt{2}$. By Lemma D.5 in [11], we have
$$\rho(\mathcal{B}) \leq \rho(I_{MN} - \mathcal{M}\mathcal{R}_u) = \max_{1 \leq k \leq N} \rho(I_M - \mu_k R_{u,k}). \quad (105)$$
Thus, to have $\rho(\mathcal{B}) < 1/\sqrt{2}$, we need
$$\max_{1 \leq k \leq N} \rho(I_M - \mu_k R_{u,k}) < \frac{1}{\sqrt{2}}, \quad (106)$$
which requires each node $k$ to satisfy
$$\max_{1 \leq m \leq M} |1 - \mu_k \lambda_m(R_{u,k})| < \frac{1}{\sqrt{2}}, \quad (107)$$
and this is equivalent to requiring that
$$|1 - \mu_k \lambda_m(R_{u,k})| < \frac{1}{\sqrt{2}} \quad (108)$$
holds for each eigenvalue $\lambda_m(R_{u,k})$ of $R_{u,k}$. From (108), we obtain that $\mu_k$ needs to satisfy
$$\frac{2 - \sqrt{2}}{2 \lambda_m(R_{u,k})} < \mu_k < \frac{2 + \sqrt{2}}{2 \lambda_m(R_{u,k})} \quad (109)$$
for each of $\{\lambda_m(R_{u,k}) \mid 1 \leq m \leq M\}$. In addition, suppose that for each $R_{u,k}$ we have
$$\lambda_{\max}(R_{u,k}) < \frac{2 + \sqrt{2}}{2 - \sqrt{2}} \, \lambda_{\min}(R_{u,k}); \quad (110)$$
then requiring $\mu_k$ to satisfy (109) for every $\lambda_m(R_{u,k})$ yields
$$\frac{2 - \sqrt{2}}{2 \lambda_{\min}(R_{u,k})} < \mu_k < \frac{2 + \sqrt{2}}{2 \lambda_{\max}(R_{u,k})},$$
which is the condition (66).

REFERENCES
[1] D. Bertsekas, “A new class of incremental gradient methods for least squares problems,” SIAM J. Optim., vol. 7, no. 4, pp. 913–926, 1997.
[2] M. G. Rabbat and R. D. Nowak, “Quantized incremental algorithms for distributed optimization,” IEEE J. Sel. Areas Commun., vol. 23, no. 4, pp. 798–808, April 2005.
[3] N. Bogdanović, J. Plata-Chaves, and K. Berberidis, “Distributed incremental-based LMS for node-specific adaptive parameter estimation,” IEEE Trans. Signal Process., vol. 62, no. 20, pp. 5382–5397, Oct 2014.
[4] L. Xiao, S. Boyd, and S. Lall, “A space-time diffusion scheme for peer-to-peer least-squares estimation,” in Proc. Int. Conf. on Info. Process. in Sensor Networks, 2006, pp. 168–176.
[5] A. Nedic, A. Ozdaglar, and P. A. Parrilo, “Constrained consensus and optimization in multi-agent networks,” IEEE Trans. Autom. Control, vol. 55, no. 4, pp. 922–938, April 2010.
[6] K. Srivastava and A. Nedic, “Distributed asynchronous constrained stochastic optimization,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 4, pp. 772–790, Aug 2011.
[7] F. S. Cattivelli and A. H. Sayed, “Diffusion LMS strategies for distributed estimation,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1035–1048, March 2010.
[8] X. Zhao and A. H. Sayed, “Performance limits for distributed estimation over LMS adaptive networks,” IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5107–5124, Oct 2012.
[9] S. Y. Tu and A. H. Sayed, “Diffusion strategies outperform consensus strategies for distributed estimation over adaptive networks,” IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6217–6234, Dec 2012.
[10] A. H. Sayed, S. Y. Tu, J. Chen, X. Zhao, and Z. J. Towfic, “Diffusion strategies for adaptation and learning over networks: an examination of distributed strategies and network behavior,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 155–171, May 2013.
[11] A. H. Sayed, “Diffusion adaptation over networks,” in Academic Press Library in Signal Processing. Elsevier, 2014, vol. 3, pp. 323–453.
[12] ——, “Adaptive networks,” Proc. IEEE, vol. 102, no. 4, pp. 460–497, April 2014.
[13] W. Hu and W. P. Tay, “Multi-hop diffusion LMS for energy-constrained distributed estimation,” IEEE Trans. Signal Process., vol. 63, no. 15, pp. 4022–4036, Aug 2015.
[14] Y. Zhang, C. Wang, L. Zhao, and J. A. Chambers, “A spatial diffusion strategy for tap-length estimation over adaptive networks,” IEEE Trans. Signal Process., vol. 63, no. 17, pp. 4487–4501, Sept 2015.
[15] R. Abdolee and B. Champagne, “Diffusion LMS strategies in sensor networks with noisy input data,” IEEE/ACM Trans. Netw., vol. 24, no. 1, pp. 3–14, Feb 2016.
[16] S. Ghazanfari-Rad and F. Labeau, “Formulation and analysis of LMS adaptive networks for distributed estimation in the presence of transmission errors,” IEEE Internet Things J., vol. 3, no. 2, pp. 146–160, April 2016.
[17] M. J. Piggott and V. Solo, “Diffusion LMS with correlated regressors I: Realization-wise stability,” IEEE Trans. Signal Process., vol. 64, no. 21, pp. 5473–5484, Nov 2016.
[18] K. Ntemos, J. Plata-Chaves, N. Kolokotronis, N. Kalouptsidis, and M. Moonen, “Secure information sharing in adversarial adaptive diffusion networks,” IEEE Trans. Signal Inf. Process. Netw., vol. PP, no. 99, pp. 1–1, 2017.
[19] C. Wang, Y. Zhang, B. Ying, and A. H. Sayed, “Coordinate-descent diffusion learning by networked agents,” IEEE Trans. Signal Process., vol. 66, no. 2, pp. 352–367, Jan 2018.
[20] J. Plata-Chaves, N. Bogdanović, and K. Berberidis, “Distributed diffusion-based LMS for node-specific adaptive parameter estimation,” IEEE Trans. Signal Process., vol. 63, no. 13, pp. 3448–3460, July 2015.
[21] R. Nassif, C. Richard, A. Ferrari, and A. H. Sayed, “Multitask diffusion adaptation over asynchronous networks,” IEEE Trans. Signal Process., vol. 64, no. 11, pp. 2835–2850, June 2016.
[22] J. Chen, C. Richard, and A. H. Sayed, “Multitask diffusion adaptation over networks with common latent representations,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 3, pp. 563–579, April 2017.
[23] Y. Wang, W. P. Tay, and W. Hu, “A multitask diffusion strategy with optimized inter-cluster cooperation,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 3, pp. 504–517, April 2017.
[24] J. Fernandez-Bes, J. Arenas-García, M. T. M. Silva, and L. A. Azpicueta-Ruiz, “Adaptive diffusion schemes for heterogeneous networks,” IEEE Trans. Signal Process., vol. 65, no. 21, pp. 5661–5674, Nov 2017.
[25] X. Zhao and A. H. Sayed, “Single-link diffusion strategies over adaptive networks,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Process., March 2012, pp. 3749–3752.
[26] R. Arablouei, S. Werner, K. Doğançay, and Y.-F. Huang, “Analysis of a reduced-communication diffusion LMS algorithm,” Signal Processing, vol. 117, pp. 355–361, 2015.
[27] W. Huang, X. Yang, and G. Shen, “Communication-reducing diffusion LMS algorithm over multitask networks,” Information Sciences, vol. 382, pp. 115–134, 2017.
[28] R. Arablouei, S. Werner, Y. F. Huang, and K. Doğançay, “Distributed least mean-square estimation with partial diffusion,” IEEE Trans. Signal Process., vol. 62, no. 2, pp. 472–484, Jan 2014.
[29] M. O. Sayin and S. S. Kozat, “Compressive diffusion strategies over distributed networks for reduced communication load,” IEEE Trans. Signal Process., vol. 62, no. 20, pp. 5308–5323, Oct 2014.
[30] I. E. K. Harrane, R. Flamary, and C. Richard, “Doubly compressed diffusion LMS over adaptive networks,” in Proc. 50th Asilomar Conf. on Signals, Sys. and Comp., Nov 2016, pp. 987–991.
[31] J. Wu, Q. S. Jia, K. H. Johansson, and L. Shi, “Event-based sensor data scheduling: Trade-off between communication rate and estimation quality,” IEEE Trans. Autom. Control, vol. 58, no. 4, pp. 1041–1046, April 2013.
[32] D. Han, Y. Mo, J. Wu, S. Weerakkody, B. Sinopoli, and L. Shi, “Stochastic event-triggered sensor schedule for remote state estimation,” IEEE Trans. Autom. Control, vol. 60, no. 10, pp. 2661–2675, Oct 2015.
[33] Q. Liu, Z. Wang, X. He, and D. H. Zhou, “Event-based recursive distributed filtering over wireless sensor networks,” IEEE Trans. Autom. Control, vol. 60, no. 9, pp. 2470–2475, Sept 2015.
[34] A. Mohammadi and K. N. Plataniotis, “Event-based estimation with information-based triggering and adaptive update,” IEEE Trans. Signal Process., vol. 65, no. 18, pp. 4924–4939, Sept 2017.
[35] G. S. Seyboth, D. V. Dimarogonas, and K. H. Johansson, “Event-based broadcasting for multi-agent average consensus,” Automatica, vol. 49, no. 1, pp. 245–252, 2013.
[36] E. Garcia, Y. Cao, and D. W. Casbeer, “Decentralized event-triggered consensus with general linear dynamics,” Automatica, vol. 50, no. 10, pp. 2633–2640, 2014.
[37] W. Hu, L. Liu, and G. Feng, “Consensus of linear multi-agent systems by distributed event-triggered strategy,” IEEE Trans. Cybern., vol. 46, no. 1, pp. 148–157, Jan 2016.
[38] L. Xing, C. Wen, F. Guo, Z. Liu, and H. Su, “Event-based consensus for linear multiagent systems without continuous communication,” IEEE Trans. Cybern., vol. 47, no. 8, pp. 2132–2142, Aug 2017.
[39] I. Utlu, O. F. Kilic, and S. S. Kozat, “Resource-aware event triggered distributed estimation over adaptive networks,”