Stochastic Alignment Processes
Amos Korman and Robin Vacus

February 3, 2021
Abstract
The tendency to align to others is inherent to social behavior, including in animal groups, and flocking in particular. Here we introduce the Stochastic Alignment problem, aiming to study basic algorithmic aspects that govern alignment processes in unreliable stochastic environments. Consider $n$ birds that aim to maintain a cohesive direction of flight. In each round, each bird receives a noisy measurement of the average direction of others in the group, and consequently updates its orientation. Then, before the next round begins, the orientation is perturbed by random drift (modelling, e.g., the effects of wind). We assume that both noise in measurements and drift follow Gaussian distributions. Upon receiving a measurement, what should be the orientation adjustment policy of birds if their goal is to minimize the average (or maximal) expected deviation of a bird's direction from the average direction? We prove that a distributed weighted-average algorithm, termed $W^\star$, that at each round balances between the current orientation of a bird and the measurement it receives, maximizes the social welfare. Interestingly, the optimality of this simple distributed algorithm holds even assuming that birds can freely communicate to share their gathered knowledge regarding their past and current measurements. We find this result surprising since it can be shown that birds other than a given $i$ can collectively gather information that is relevant to bird $i$, yet not processed by it when running a weighted-average algorithm. Intuitively, it seems that optimality is nevertheless achieved, since, when running $W^\star$, the birds other than $i$ somehow manage to collectively process the aforementioned information in a way that benefits bird $i$, by turning the average direction towards it. Finally, we also consider the game-theoretic framework, proving that $W^\star$ is the only weighted-average algorithm that is at Nash equilibrium.

Keywords: Noisy communication, Kalman filter, Flocking, Biological distributed algorithms, Distributed signal processing, Clock synchronization, Weighted-average algorithms.

1 Introduction
Reaching agreement, or approximate agreement, is fundamental to many distributed systems, including, e.g., computer networks, mobile sensor systems, animal groups, and neural networks [11, 8, 28, 23, 27, 7]. In the natural world, one of the most beautiful manifestations of approximate agreement happens when flocking birds (or, e.g., schooling fish) manage to maintain a cohesive direction of movement by constantly aligning themselves to others [28, 27, 7]. In this context, as well as in multiple other contexts, such as during clock synchronization [26, 25], the space in which the approximate agreement process occurs is continuous, measurements are noisy, and the output needs to be maintained over time despite drift.

This paper introduces the Stochastic Alignment problem, aiming to capture some of the basic algorithmic challenges involved in reaching approximate agreement under such stochastic conditions. Informally, the problem considers a group of $n$ agents positioned on the real line, aiming to be located as close as possible to one another. Initially, agents' positions are sampled from a Gaussian distribution around 0. In each round, each agent receives a noisy measurement of its current deviation from the average position of others. Then, governed by the rules of its algorithm, each agent performs a move to re-adjust its position. Subsequently, before the next round begins, the position of each agent is perturbed following random drift. Both noises in measurements and random drifts are governed by Gaussian distributions. We are mostly interested in the following questions:

• Which re-adjustment rule should agents adopt if their goal is to minimize the maximal (or average) expected distance of an agent from the center of mass?

• Could further communication between agents (e.g., by sharing measurements) help?

• What would be the impact on the global alignment when each agent aims to minimize its own distance from the center of mass?

Importantly, we assume that agents are unaware of the actual value of their current positions, and of the realizations of the random drifts, and instead must base their movement decisions only on noisy measurements of relative positions. This lack of a global "sense of orientation" prevents the implementation of the trivial distributed protocol in which all agents simply move to a predetermined point, say 0.

One trivial algorithm is the "fully responsive" protocol, where in each round, each agent moves all the way to its current measurement of the average position of others. This alignment protocol was assumed in various models that consider alignment, including in the celebrated flocking model by Vicsek et al. [27]. When drift is large, measurement noise is negligible, and the number of agents is large, this protocol is expected to be highly efficient. However, when measurement noise is non-negligible, it is expected that incorporating past measurements could enhance the cohesion, even though drift may have changed the configuration considerably.

Perhaps the simplest algorithms that take such past information into account are weighted-average algorithms. By weighing the current position against the measured position in a non-trivial way, such algorithms can potentially exploit the fact that the current position implicitly encodes information from past measurements. Indeed, in a centralized setting, when a single agent aims to estimate a fixed target relying on noisy Gaussian measurements, a weighted-average algorithm is known to be optimal [1]. However, here the setting is more complex since it is distributed, and the objective goal is to estimate (and get closer to) the center of mass, which is a function of the agents' decisions.
We consider $n$ agents located on the real line $\mathbb{R}$. Let $I = \{1, \ldots, n\}$ be the set of agents. We denote by $\theta_i^{(t)} \in \mathbb{R}$ the position of Agent $i$ at round $t$, where it is assumed that initially agents are normally distributed around 0 with variance $\sigma_0^2$, that is, for each agent $i$, $\theta_i^{(0)} \sim \mathcal{N}(0, \sigma_0^2)$. (Depending on the application, the actual domain may be bounded, or periodic. For example, when modeling directions, the domain is $[-\pi, \pi]$, and when modeling clock synchronization, the domain may be $[0, T]$ for some phase duration $T$. Since we are interested in the cases where agents are more or less aligned, approximating an interval domain with the real line is not expected to reduce the generality of our results.)

Execution proceeds in discrete rounds. At round $t$, each agent $i$ receives a noisy measurement of the deviation from the current average position of all other agents. Specifically, denote the average of the positions of all agents except $i$ by:

$\langle \theta_{-i}^{(t)} \rangle = \frac{1}{n-1} \sum_{j=1, j \neq i}^{n} \theta_j^{(t)}$.

Let $\bar\theta_i^{(t)} = \langle \theta_{-i}^{(t)} \rangle - \theta_i^{(t)}$ denote the stretch of Agent $i$. At any round $t$, for every $i \in I$, a noisy measurement of the stretch of Agent $i$ is sampled:

$Y_i^{(t)} = \bar\theta_i^{(t)} + N_{m,i}^{(t)}$,   (1)

where $N_{m,i}^{(t)} \sim \mathcal{N}(0, \sigma_m^2)$. In response, Agent $i$ makes a move $d\theta_i^{(t)}$ and may update its memory state (if it has any). Finally, the position of Agent $i$ at the next round is obtained by adding a drift:

$\theta_i^{(t+1)} = \theta_i^{(t)} + d\theta_i^{(t)} + N_{d,i}^{(t)}$,   (2)

where $N_{d,i}^{(t)} \sim \mathcal{N}(0, \sigma_d^2)$. All random perturbations $(N_{m,i}^{(t)})_{i \in I}$ and $(N_{d,i}^{(t)})_{i \in I}$ are mutually independent, and we assume that $\sigma_m, \sigma_d > 0$.

The cost of Agent $i$ at a given time $t$ is the expected absolute value of its stretch at that time, i.e., $C_i^{(t)} := \mathbb{E}\left(\left|\bar\theta_i^{(t)}\right|\right)$. (Another natural cost measure is the expected deviation from the average position of all agents, including the agent itself, i.e., $C_i^{(t)\prime} := \mathbb{E}\left(\left|\frac{1}{n}\sum_{j=1}^{n} \theta_j^{(t)} - \theta_i^{(t)}\right|\right)$. These two measures are effectively equivalent. Indeed, $C_i^{(t)\prime} = \frac{n-1}{n} C_i^{(t)}$, thus an algorithm minimizing one measure will also minimize the other.) Note that the cost depends on the algorithm used by $i$ but also on the algorithms used by others. As these algorithms will be clear from the context, we typically omit mentioning them in notations.

Definition 1 (Optimality). We say that an algorithm is optimal if, for every $i \in \{1, \ldots, n\}$ and every round $t$, no algorithm can achieve a strictly smaller cost $C_i^{(t)}$.
Weighted-average algorithms. Perhaps the simplest algorithms that one may consider are weighted-average algorithms. Such an algorithm is characterized by a responsiveness parameter $\rho^{(t)}$ for each round $t$, indicating the weight given to the measurement at that round. Formally, an agent $i$ following the weighted-average algorithm $W(\rho^{(t)})$ at round $t$, sets

$d\theta_i^{(t)} = \rho^{(t)} Y_i^{(t)}$.   (3)
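To make the dynamics concrete, the following minimal Python sketch (ours, not part of the paper's formal development; all variable names are our own) simulates Eqs. (1)-(3) when every agent runs $W(\rho)$ with a fixed responsiveness.

    import numpy as np

    def simulate_W(n=50, rounds=1000, rho=0.5, sigma_0=1.0,
                   sigma_m=1.0, sigma_d=1.0, seed=0):
        rng = np.random.default_rng(seed)
        theta = rng.normal(0.0, sigma_0, size=n)          # theta_i^(0) ~ N(0, sigma_0^2)
        for _ in range(rounds):
            avg_others = (theta.sum() - theta) / (n - 1)  # <theta_{-i}^(t)>
            stretch = avg_others - theta                  # stretch bar(theta)_i^(t)
            Y = stretch + rng.normal(0.0, sigma_m, size=n)    # Eq. (1): noisy measurement
            theta = theta + rho * Y                           # Eq. (3): weighted-average move
            theta = theta + rng.normal(0.0, sigma_d, size=n)  # Eq. (2): drift
        return (theta.sum() - theta) / (n - 1) - theta    # stretches after the last round

Setting rho=1 recovers the "fully responsive" protocol mentioned in the introduction; intermediate values of rho implement the weighted-average behavior studied below.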
Full communication model. When executing a weighted-average algorithm, an agent bases its decisions solely on its own measurements. A main question we ask is whether, and if so to what extent, performances can be improved if agents could communicate with each other to share their measurements. In order to study the impact of communication, we compare the performances of the best weighted-average algorithm to the performances of the best algorithm in a full-communication setting, where agents are free to share their measurements with all other agents at no cost. In the case that agents have identities, this setting is essentially equivalent to the following centralized setting: Consider a master agent that is external to the system. The master agent receives, at any round $t$, the stretch measurements of all agents, i.e., the collection $\{Y_j^{(t)}\}_{j=1}^{n}$, where these measurements are noisy in the same manner as described in Eq. (1). Analyzing these measurements at round $t$, the master agent then instructs each agent $i$ to move by a quantity $d\theta_i^{(t)}$. After moving, the agents are subject to drift, as described in Eq. (2). Note that the master agent is unable to "see" the actual positions of the agents, and must base its instructions on the noisy measurements only.

Observe that the measurements received by a given Agent $i$ contain strictly less information about the agent's relative position to the center of mass than the information contained in the collection of all measurements $\{Y_j^{(t)}\}_{j \neq i}$. Indeed, although the $\{Y_j^{(t)}\}_{j \neq i}$ measurements are not centered around the stretch $\bar\theta_i^{(t)}$ of Agent $i$, they still contain useful information. For example, it can be shown that

$-\sum_{j=1, j\neq i}^{n} Y_j^{(t)} = \bar\theta_i^{(t)} - \sum_{j=1, j\neq i}^{n} N_{m,j}^{(t)}$,   (4)

thus representing an additional "fresh" estimation of $\bar\theta_i^{(t)}$. Therefore, it may appear that in the centralized setting, the stretch of agents could potentially be reduced by letting the master agent process all measurements.
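Eq. (4) is a purely algebraic identity, resting on the fact (proved as Lemma 20 in Appendix A.1) that the stretches always sum to zero. A short numerical check (ours) for a single round:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5
    theta = rng.normal(size=n)                    # arbitrary positions
    noise = rng.normal(size=n)                    # measurement noise N_{m,j}
    avg_others = (theta.sum() - theta) / (n - 1)
    stretch = avg_others - theta                  # stretches sum to 0 (Lemma 20)
    Y = stretch + noise
    i = 2
    lhs = -(Y.sum() - Y[i])                       # -sum_{j != i} Y_j
    rhs = stretch[i] - (noise.sum() - noise[i])   # right-hand side of Eq. (4)
    assert np.isclose(lhs, rhs)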
Weighted-average algorithms. We first investigate weighted-average algorithms in which all agents have the same responsiveness $\rho$, that furthermore remains fixed throughout the execution (see Eq. (3)). The proof of the following theorem is deferred to Appendix A.1.
Theorem 2. Assume that all agents execute $W(\rho)$, for a fixed $0 \leq \rho \leq 1$. Then for every $i \in \{1, \ldots, n\}$ and every $t \in \mathbb{N}$, the stretch $\bar\theta_i^{(t)}$ is normally distributed, and

$\lim_{t \to +\infty} \mathrm{Var}\left(\bar\theta_i^{(t)}\right) = \frac{\frac{n}{n-1}(\rho^2 \sigma_m^2 + \sigma_d^2)}{1 - \left(1 - \frac{n}{n-1}\rho\right)^2}$,   (5)

with the convention that $\lim_{t \to +\infty} \mathrm{Var}\left(\bar\theta_i^{(t)}\right) = +\infty$ if the denominator $1 - \left(1 - \frac{n}{n-1}\rho\right)^2 = 0$.

If all agents run $W(\rho)$, then the extent to which they are aligned with each other asymptotically is captured by $\mathrm{Var}(\rho) := \lim_{t \to +\infty} \mathrm{Var}\left(\bar\theta_i^{(t)}\right)$. Indeed, for every $i$, since $\bar\theta_i^{(t)}$ is normally distributed,

$\lim_{t \to +\infty} \mathbb{E}\left(\left|\bar\theta_i^{(t)}\right|\right) = \sqrt{\frac{2}{\pi}\mathrm{Var}(\rho)}$.

The minimal value of this is achieved when taking $\mathrm{argmin}_\rho \mathrm{Var}(\rho)$ as the responsiveness parameter. The proof of the following theorem is deferred to Appendix A.1.
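As a sanity check (ours, not from the paper), the steady-state prediction of Eq. (5) can be compared against an empirical estimate; with the parameters below the two agree to within Monte Carlo error.

    import numpy as np

    n, rho, sigma_m, sigma_d, T = 10, 0.5, 1.0, 1.0, 2000
    rng = np.random.default_rng(0)
    theta = rng.normal(0.0, 1.0, size=n)
    samples = []
    for t in range(T):
        stretch = (theta.sum() - theta) / (n - 1) - theta
        if t > 100:                          # discard burn-in before steady state
            samples.extend(stretch)
        Y = stretch + rng.normal(0.0, sigma_m, size=n)
        theta += rho * Y + rng.normal(0.0, sigma_d, size=n)
    c = n / (n - 1)
    predicted = c * (rho**2 * sigma_m**2 + sigma_d**2) / (1 - (1 - c * rho)**2)
    print(np.var(samples), predicted)        # empirical variance vs. Eq. (5)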
Theorem 3. The weighted-average algorithm that optimizes group variance among all weighted-average algorithms $W(\rho)$ (that use the same responsiveness parameter $\rho$ at all rounds) is $W(\rho^\star)$, where

$\rho^\star = \frac{\sigma_d \sqrt{\sigma_m^2 + \left(\frac{n}{2(n-1)} \sigma_d\right)^2} - \frac{n}{2(n-1)} \sigma_d^2}{\sigma_m^2}$.   (6)

When $n$ is large, Eq. (6) becomes

$\rho^\star \approx \frac{\sigma_d \sqrt{\sigma_m^2 + \frac{\sigma_d^2}{4}} - \frac{\sigma_d^2}{2}}{\sigma_m^2}$.

For example, if $\sigma_m \gg \sigma_d$, then $\rho^\star \approx 0$. However, if $\sigma_m \ll \sigma_d$ then $\rho^\star \approx 1$. Interestingly, if $\sigma_m = \sigma_d$ then $\rho^\star \approx \frac{\sqrt{5}-1}{2}$, which is highly related to the golden ratio. Moreover, for large $n$, the minimal $\mathrm{Var}(\rho)$ is

$\mathrm{Var}(\rho^\star) = \frac{1}{2}\sigma_d\left(\sqrt{4\sigma_m^2 + \sigma_d^2} + \sigma_d\right)$.   (7)

Note that when the measurements are perfect, i.e., $\sigma_m = 0$, we have $\mathrm{Var}(\rho^\star) = \sigma_d^2$, which is the best achievable value that an agent can hope for, since no strategy can overcome the drift-noise.
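The closed form in Eq. (6) is easy to validate numerically (our sketch): a brute-force grid minimization of Var(ρ) from Eq. (5) recovers ρ*.

    import numpy as np

    def var_rho(rho, n, sigma_m, sigma_d):
        c = n / (n - 1)
        return c * (rho**2 * sigma_m**2 + sigma_d**2) / (1 - (1 - c * rho)**2)

    def rho_star(n, sigma_m, sigma_d):
        k = n / (2 * (n - 1)) * sigma_d
        return (sigma_d * np.sqrt(sigma_m**2 + k**2) - k * sigma_d) / sigma_m**2

    n, sigma_m, sigma_d = 100, 1.0, 1.0
    grid = np.linspace(1e-3, 1.0, 100000)
    print(grid[np.argmin(var_rho(grid, n, sigma_m, sigma_d))])  # numerical argmin
    print(rho_star(n, sigma_m, sigma_d))                        # Eq. (6); ~0.615 here

With sigma_m = sigma_d and n large, both printed values approach (sqrt(5)-1)/2 ≈ 0.618, the golden-ratio constant mentioned above.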
The impact of communication. Our next goal is to understand whether, and if so, to what extent, performances can be improved if further communication between agents is allowed. For this purpose, we compare the performances of $W(\rho^\star)$ to the performances of the best algorithm in a centralized (full-communication) setting.

A natural candidate for an optimal algorithm in the centralized setting is the "meet at the center" algorithm. This algorithm first obtains, for each agent, the best possible estimate of the distance from the agent's position to the center of mass $\langle\theta\rangle$, based on all measurements, and then instructs the agent to move by this quantity (towards the estimated center of mass). However, it is not immediate to figure out the distances to the center of mass, and furthermore, to quantify the performances of this algorithm. To this end, we adapt the celebrated Kalman filter tool, commonly used in statistics and control theory [1], to our setting. Solving the Kalman filter system associated with the centralized version of our alignment problem, we obtain an estimate of the relative distance of each agent $i$ from the center of mass (based on all measurements). To describe these estimates we first define the following.
Definition 4. We inductively define the sequence $(\alpha_t)_{t=0}^{\infty}$. Let $\alpha_0 = n\sigma_0^2/(n-1)$, and for every integer $t$, let

$\alpha_{t+1} = \frac{\sigma_m^2 \alpha_t}{\frac{n}{n-1}\alpha_t + \sigma_m^2} + \frac{n}{n-1}\sigma_d^2$.

Definition 5. For every integer $t$, let

$\rho_\star^{(t)} = \frac{\alpha_t}{\frac{n}{n-1}\alpha_t + \sigma_m^2}$.

At each round $t$, the Kalman filter returns an estimate of the relative distance of each agent $i$ from the center of mass, which turns out to be $\frac{n-1}{n}\rho_\star^{(t)}\left(Y_i^{(t)} - \frac{1}{n-1}\sum_{j \neq i} Y_j^{(t)}\right)$. As guaranteed by the properties of the Kalman filter, these estimates minimize the expected sum of square-errors, which can be translated to our desired measure of minimizing the agents' costs. The "meet at the center" algorithm is given by Algorithm 1 below, and the following theorem stating its optimality is proved in Section 3.
Algorithm 1: Meet at the center

foreach round $t$ do
    Consider all measurements at round $t$, $\{Y_j^{(t)} \mid 1 \leq j \leq n\}$;
    foreach agent $i$ do
        Set $d\theta_i^{(t)} = \frac{n-1}{n}\rho_\star^{(t)}\left(Y_i^{(t)} - \frac{1}{n-1}\sum_{j \neq i} Y_j^{(t)}\right)$;   /* Output an estimate of $\langle\theta\rangle - \theta_i$ */
    end
end

Theorem 6. Algorithm 1 is optimal in the centralized setting.
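A compact implementation sketch (ours; the function names responsiveness_schedule and meet_at_the_center_move are our own) of Definitions 4-5 and Algorithm 1. Note that the responsiveness sequence depends only on n, sigma_m, sigma_d and sigma_0, so it can be precomputed offline.

    import numpy as np

    def responsiveness_schedule(T, n, sigma_m, sigma_d, sigma_0):
        c = n / (n - 1)
        alpha = n * sigma_0**2 / (n - 1)                # alpha_0 (Definition 4)
        rhos = []
        for _ in range(T):
            rhos.append(alpha / (c * alpha + sigma_m**2))   # rho_star^(t) (Definition 5)
            alpha = sigma_m**2 * alpha / (c * alpha + sigma_m**2) + c * sigma_d**2
        return rhos

    def meet_at_the_center_move(Y, rho_t):
        n = len(Y)
        avg_others = (Y.sum() - Y) / (n - 1)            # (1/(n-1)) sum_{j != i} Y_j
        return (n - 1) / n * rho_t * (Y - avg_others)   # Algorithm 1's move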
Quite remarkably, another solution that follows from the Kalman filter estimations is in the form of a weighted-average algorithm. The proof of the following theorem is given in Section 3.

Theorem 7. The weighted-average algorithm $W^\star := W(\rho_\star^{(t)})$ is optimal in the centralized setting.

Note that by the strong definition of optimality (Definition 1), for any given $i$, no algorithm in the centralized setting can achieve a better cost for agent $i$ (for any round $t$). We find this optimality result surprising since, as mentioned, agents other than a given $i$ can collectively gather information that is relevant to agent $i$, yet not processed by it when running a weighted-average algorithm (see Eq. (4)). Intuitively, it seems that optimality is nevertheless achieved, since, when running $W^\star$, the agents other than $i$ somehow manage to collectively process the aforementioned information in a way that benefits agent $i$, by shifting the center of mass towards it.

In contrast to $W(\rho^\star)$, Algorithm $W^\star$ uses a different responsiveness $\rho_\star^{(t)}$ at each round $t$. We next argue that the sequences $(\alpha_t)$ and $(\rho_\star^{(t)})$ converge. Not surprisingly, at the limit, we recover the optimal responsiveness as stated in Theorem 3. The proof of the following claim is deferred to Appendix A.2.

Figure 1: Position of the center of mass of a group with $n = 3$ agents over time, when $W^\star$ is used (red), and when "meet at the center" is used (blue), while both algorithms face the same randomness, in both measurement noise and drift, and initialization of positions. Parameters are $\sigma_m = \sigma_d = 1$.
Claim 8. The sequence $\alpha_t$ converges to

$\alpha_\infty := \lim_{t \to +\infty} \alpha_t = \sigma_d \sqrt{\sigma_m^2 + \left(\frac{n}{2(n-1)}\sigma_d\right)^2} + \frac{n}{2(n-1)}\sigma_d^2$.

Moreover, $\lim_{t \to +\infty} \rho_\star^{(t)} = \rho^\star$.

Note that in the centralized setting, once we have an optimal algorithm $A$, we can derive another optimal algorithm $B$ by simply shifting all agents, at each round $t$, by a fixed quantity $\lambda_t$. Indeed, such shifts do not influence the relative positions between the agents. Conversely, we prove in Appendix B.2 that all (optimal) deterministic algorithms in the centralized setting are, in fact, shifts of one another, though, we stress that the shifts $\lambda_t$ are not necessarily the same for all rounds $t$. In particular, Algorithm $W^\star$ can be obtained by adding the shift $\lambda_t = \frac{1}{n}\rho_\star^{(t)}\sum_{i=1}^{n} Y_i^{(t)}$ to the agents in Algorithm 1 (see Appendix B.3). Figure 1 depicts the trajectory of the center of mass of the group, when $W^\star$ is used, and when the "meet at the center" algorithm is used, while facing the same randomness instantiations.
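The comparison shown in Figure 1 is straightforward to reproduce (our sketch; the helper run is our own name): run both algorithms on the same noise realizations and track the center of mass.

    import numpy as np

    n, T, sigma_0, sigma_m, sigma_d = 3, 200, 1.0, 1.0, 1.0
    c = n / (n - 1)
    rng = np.random.default_rng(42)
    theta0 = rng.normal(0.0, sigma_0, size=n)
    meas_noise = rng.normal(0.0, sigma_m, size=(T, n))
    drift = rng.normal(0.0, sigma_d, size=(T, n))

    def run(centralized):
        theta = theta0.copy()
        alpha = n * sigma_0**2 / (n - 1)
        centers = []
        for t in range(T):
            rho = alpha / (c * alpha + sigma_m**2)                        # rho_star^(t)
            alpha = sigma_m**2 * alpha / (c * alpha + sigma_m**2) + c * sigma_d**2
            stretch = (theta.sum() - theta) / (n - 1) - theta
            Y = stretch + meas_noise[t]
            if centralized:   # Algorithm 1 ("meet at the center")
                move = (n - 1) / n * rho * (Y - (Y.sum() - Y) / (n - 1))
            else:             # W*: weighted-average with responsiveness rho_star^(t)
                move = rho * Y
            theta = theta + move + drift[t]
            centers.append(theta.mean())
        return centers

    centers_W = run(False)   # red curve in Figure 1
    centers_C = run(True)    # blue curve in Figure 1

Consistent with the shift characterization above, the two runs differ only by the per-round shift lambda_t, which is visible as the two center-of-mass trajectories drifting apart while the internal configurations stay identical.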
Game theory. Finally, we investigate the game-theoretic setting, in which each agent aims to minimize its own deviation from the center of mass, and identify the specific role played by $W^\star$ also in this setting.

Definition 9. We say that a profile of algorithms $(A_i)_{i \in I}$ for the agents is a strong Nash equilibrium if for every $i \in I$, and for every round $t$, no algorithm $A'_i$ for Agent $i$ yields a smaller cost $C_i^{(t)}$, if each other agent $j$ keeps using $A_j$.

The proof of the following theorem is deferred to Appendix A.3.

Theorem 10.
Algorithm $W^\star$ is a (symmetric) strong Nash equilibrium. Moreover, if all agents are restricted to execute weighted-average algorithms, then $W^\star$ is the only strong Nash equilibrium.

Kalman filter and noisy self-organizing systems.
The Kalman filter algorithm is a prolific tool commonly used in control theory and statistics, with numerous technological applications [1]. This algorithm receives a sequence of noisy measurements, and produces estimates of unknown variables, by estimating a joint probability distribution over the variables for each time step. In this paper, we use the Kalman filter to investigate the centralized setting, where all relative measurements are gathered at one master agent that processes them and instructs agents how to move. Fusing relative measurements of multiple agents is often referred to in the literature as distributed Kalman filter, see surveys in [4, 21, 22]. However, there, the typical setting is that each agent invokes a separate Kalman filter to process its measurements, and then the resulting outputs are integrated using communication. Moreover, works in that domain often assume that observations are attained by static observers (i.e., the agents are residing on a fixed underlying graph) and that the measured target is external to the system. In contrast, here we consider a self-organizing system, with mobile agents that measure a target (the center of mass) that is a function of their positions.

The study of flocking was originally based on computer simulations, rather than on rigorous analysis [7, 2, 27]. In recent years, more attention has been given to such self-organizing processes by control theoreticians [20, 18], physicists [28], and computer scientists [5]. Instead of considering all components of flocking (typically assumed to be attraction, repulsion, and alignment), here we focus on the alignment component, and the ability to reach cohesion while avoiding excessive communication.

Another related self-organization problem is clock synchronization, where the goal is to maintain a common notion of time in the absence of a global source of real time. The main difficulty is handling the fact that clocks count time at slightly different rates and that messages arrive with some delays. Variants of this problem were studied in wireless network contexts, mostly by control theoreticians [24, 26, 25]. A common technique uses oscillating models [19, 14]. A recent trend in the engineering community is to study the clock synchronization problem from a signal processing perspective, while adopting tools from information theory [31]. However, so far this perspective has hardly received any attention from distributed computing theoreticians.
Distributed computing studies on stochastic systems with noisy communication. The problems of consensus and clock synchronization were also extensively studied in the discipline of theoretical distributed computing [8, 16, 11]. However, the corresponding works almost exclusively assume that, although the processes themselves may be faulty, the communication is nevertheless reliable, that is, not subject to noise. Moreover, the typical setting is adversarial, and, for example, very few studies in this discipline consider the clock synchronization problem with random delays [17, 10].

In recent years, more attention has been given in the distributed computing discipline to studying stochastic processes under noisy communication. Following [9], such processes were studied in [3, 6, 12], under the assumption that each message is a bit that can be flipped with some small probability. A model concerning a group of individuals that aim to estimate an external signal relying on noisy real-valued measurements was studied in [15]. Despite the differences between models, similarly to our paper, the findings in [15] emphasize the effectiveness of performing a weighted-average between the current opinion of the individual and the sample it receives. Nevertheless, we note that in the context of the model in [15], this result is less surprising since it does not involve any drift, and since each individual communicates with only one individual at a round.
2 Preliminaries: the Kalman filter

We follow the notations of [30]. Denote by $\mathcal{N}(\mu, \Sigma)$ the multivariate normal distribution with mean vector $\mu \in \mathbb{R}^n$ and co-variance matrix $\Sigma \in \mathbb{R}^{n \times n}$, and by $I$ the identity matrix.

2.1 The system

Time proceeds in discrete rounds. The problem is to estimate a vector $x_t \in \mathbb{R}^n$ at each round $t$. Informally, in each round $t$, a vector of measurements for $x_t$ is given as input, and the output is an estimation $\hat{x}_t$ of $x_t$. The vector $x_t$ is updated by some known linear transformation that is subject to noise. Then the next round starts with new measurements for the updated vector $x_{t+1}$, and so forth.

Formally, consider round $t$. Given matrices $A_t, B_t, H_t, Q_t$ and $R_t \in \mathbb{R}^{n \times n}$, the measurement vector $z_t$ is given by

$z_t = H_t x_t + v_t$,   (8)

where $v_t$ is a normally distributed noise, $v_t \sim \mathcal{N}(0, R_t)$. The update at the end of the $t$'th round is given by:

$x_{t+1} = A_t x_t + B_t u_t + w_t$,   (9)

where $u_t$ is an arbitrary quantity, which is called here a move, known by the Kalman filter, and $w_t$ is another noise factor, called here a drift, which is distributed by $w_t \sim \mathcal{N}(0, Q_t)$. The definition of $u_t$ can depend on the estimation of the Kalman filter at round $t$. The noise vectors $w_t$ and $v_t$ that perturb the process and the measurements are assumed to be independent of each other, and independent across rounds.

2.2 The Kalman filter estimator

We define $\hat{x}_t$ as the estimate of $x_t$ after the measurements at round $t$ were obtained (Eq. (8)) and before the update of round $t$ occurs (Eq. (9)). Let $P_t$ denote the error co-variance matrix associated with this estimate. Specifically,

$P_t = \mathbb{E}\left[(x_t - \hat{x}_t)(x_t - \hat{x}_t)^\top\right]$.

We add the superscript "-" to these notations (for example $P^-_t$), to denote the same quantities before the measurement obtained at round $t$. So, in a sense, round $t$ can be divided into the following four consecutive time steps: (a) the filter produces an estimation $\hat{x}^-_t$ of $x_t$, (b) a new measurement vector of $x_t$ is obtained, (c) the filter produces an estimation $\hat{x}_t$ of $x_t$ given the new measurement, and (d) $x_t$ is updated to $x_{t+1}$.
Measurement update. In order to produce (c), the filter incorporates the measurement in (b) into the estimation in (a). Specifically, the filter first computes a quantity called the "Kalman gain":

$K_t = P^-_t H_t^\top \left(H_t P^-_t H_t^\top + R_t\right)^{-1}$.   (10)

Then, it produces the following estimate, as required in step (c):

$\hat{x}_t = \hat{x}^-_t + K_t (z_t - H_t \hat{x}^-_t)$.   (11)

The new error co-variance matrix can then be written as:

$P_t = (I - K_t H_t) P^-_t$.   (12)
Time update. The estimation for (a) for the following round is then given by:

$\hat{x}^-_{t+1} = A_t \hat{x}_t + B_t u_t$.   (13)

Finally, the new error co-variance matrix that will be used in the Kalman gain corresponding to the $(t+1)$'th round (Eq. (10)) is:

$P^-_{t+1} = A_t P_t A_t^\top + Q_t$.   (14)
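For concreteness, here is a minimal generic implementation (ours) of one round of Eqs. (10)-(14); np.linalg.solve is used in place of an explicit matrix inverse.

    import numpy as np

    def kalman_round(x_pred, P_pred, z, u, A, B, H, Q, R):
        # Measurement update: Eqs. (10)-(12).
        S = H @ P_pred @ H.T + R
        K = np.linalg.solve(S.T, (P_pred @ H.T).T).T    # K = P^- H^T S^{-1}, Eq. (10)
        x_est = x_pred + K @ (z - H @ x_pred)           # Eq. (11)
        P_est = (np.eye(len(x_pred)) - K @ H) @ P_pred  # Eq. (12)
        # Time update: Eqs. (13)-(14).
        x_pred_next = A @ x_est + B @ u                 # Eq. (13)
        P_pred_next = A @ P_est @ A.T + Q               # Eq. (14)
        return x_pred_next, P_pred_next, x_est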
For the system described in Section 2.1, the Kalman filter produces two estimates at each round. We are mainly interested in the first of these estimates, $\hat{x}^-_t$, that is, the one corresponding to step (a), which is established before the $t$'th measurement. We shall use the well-known fact that these estimates are optimal, in the sense that they minimize the expected mean square distance to $x_t$. To state this more formally, we need to define a general estimator for such a system.

Definition 11. Consider the system described in Section 2.1. An estimator at round $t$ is given a sequence of measurements $(z_s)_{s \leq t-1}$ and a sequence of moves $(u_s)_{s \leq t-1}$, and produces an estimation $\tilde{x}_t$ for $x_t$. Note that with this definition, the Kalman filter restricted to the outputs $\hat{x}^-_t$ is an estimator.
Definition 12. An estimator at round $t$ is said to be optimal if for every sequence of $t-1$ measurements $(z_s)_{s \leq t-1}$ and every sequence of $t-1$ moves $(u_s)_{s \leq t-1}$, it minimizes

$\mathbb{E}\left(\sum_{i=1}^{n} (x_{t,i} - \tilde{x}_{t,i})^2 \,\middle|\, (z_s, u_s)_{s \leq t-1}\right)$,

where the expectation is taken with respect to the noise in the measurements and the drifts, the initialization of positions, and the coin tosses by the estimator (in case it is probabilistic). It is said to be optimal w.r.t. the $i$'th coordinate if it minimizes $\mathbb{E}\left((x_{t,i} - \tilde{x}_{t,i})^2 \mid (z_s, u_s)_{s \leq t-1}\right)$.

The following is a well-known result, see e.g., [1].

Theorem 13. For every round $t$, the estimator $\hat{x}^-_t$, given by the Kalman filter, is optimal. Moreover, this is the only deterministic optimal estimator.

At this point, we note that optimality (as stated in Definition 12) implies optimality for each coordinate.
Corollary 14. For every round $t$, and for every $i \in \{1, \ldots, n\}$, the estimator $\hat{x}^-_{t,i}$, produced by the Kalman filter, is optimal w.r.t. the $i$'th coordinate. Moreover, this is the only deterministic optimal estimator w.r.t. this coordinate.

Proof. Fix $i \in \{1, \ldots, n\}$. Consider an alternative estimator $\tilde{x}_{t,i} \neq \hat{x}^-_{t,i}$ at round $t$, and a sequence $(z_s, u_s)_{s \leq t-1}$ of measurements and updates. If $\mathbb{E}\left((x_{t,i} - \tilde{x}_{t,i})^2 \mid (z_s, u_s)_{s \leq t-1}\right) \leq \mathbb{E}\left((x_{t,i} - \hat{x}^-_{t,i})^2 \mid (z_s, u_s)_{s \leq t-1}\right)$, then

$\mathbb{E}\left((x_{t,i} - \tilde{x}_{t,i})^2 + \sum_{j \neq i} (x_{t,j} - \hat{x}^-_{t,j})^2 \,\middle|\, (z_s, u_s)_{s \leq t-1}\right) \leq \mathbb{E}\left(\sum_{j=1}^{n} (x_{t,j} - \hat{x}^-_{t,j})^2 \,\middle|\, (z_s, u_s)_{s \leq t-1}\right)$,

contradicting Theorem 13 (more precisely, the uniqueness of the deterministic optimal estimator).

Next, we show that if one wishes to choose the move $u_{t-1}$ in order to minimize $x_t$, then, whenever possible, the best choice is to set $u_{t-1}$ such that the Kalman filter would produce $\hat{x}^-_t = 0$ (see Eq. (13)).
Proposition 15. Fix $i \in \{1, \ldots, n\}$ and an integer $t \geq 1$, and consider a sequence of moves $(u_s)_{s \leq t-2}$, and a sequence of measurements $(z_s)_{s \leq t-1}$. If there exists a move $u_{t-1}$ at round $t-1$ such that the Kalman filter produces $\hat{x}^-_{t,i} = 0$ on input $(z_s, u_s)_{s \leq t-1}$, then for every other move $u'_{t-1}$,

$\mathbb{E}\left(x_{t,i}^2 \mid (z_s)_{s \leq t-1}, (u_s)_{s \leq t-2}, u'_{t-1}\right) \geq \mathbb{E}\left(x_{t,i}^2 \mid (z_s)_{s \leq t-1}, (u_s)_{s \leq t-1}\right)$,

with equality if and only if the corresponding estimate $\hat{x}^{-\prime}_{t,i}$ produced by the Kalman filter is equal to 0.

Proof. Consider a round $t$, and a history $H_t = \{(z_s)_{s \leq t-1}, (u_s)_{s \leq t-2}\}$ of $t-1$ measurements and $t-2$ moves (if $t = 1$, then we consider that $(u_s)_{s \leq t-2}$ is an empty sequence). We assume that there exists a move $u_{t-1}$ at round $t-1$ such that the Kalman filter produces $\hat{x}^-_{t,i} = 0$ on input $(z_s, u_s)_{s \leq t-1}$. Consider an alternative move $u'_{t-1}$ at round $t-1$, and denote by $\hat{x}^{-\prime}_{t,i}$ the estimation produced by the Kalman filter estimator on input $(H_t, u'_{t-1})$. Our goal is to show that

$\mathbb{E}\left(x_{t,i}^2 \mid H_t, u'_{t-1}\right) \geq \mathbb{E}\left(x_{t,i}^2 \mid H_t, u_{t-1}\right)$,   (15)

with equality in Eq. (15) if and only if $\hat{x}^{-\prime}_{t,i} = 0$.

Let $c$ be the $i$'th coordinate of the vector $B_t(u_{t-1} - u'_{t-1})$. By Eq. (9), $c$ is the difference between the position of Agent $i$ in the beginning of round $t$ if it moves by $u_{t-1}$ instead of moving by $u'_{t-1}$ (conditioning on having the same drift at the end of round $t-1$). Hence,

$\mathbb{E}\left(x_{t,i}^2 \mid H_t, u'_{t-1}\right) = \mathbb{E}\left((x_{t,i} - c)^2 \mid H_t, u_{t-1}\right)$,   (16)

and, by Eq. (13),

$\hat{x}^{-\prime}_{t,i} = \hat{x}^-_{t,i} - c$.   (17)

First, let us consider the case where $\hat{x}^-_{t,i} = \hat{x}^{-\prime}_{t,i} = 0$. In this case, by Eq. (17), $c = 0$, and so, by Eq. (16), $\mathbb{E}\left(x_{t,i}^2 \mid H_t, u'_{t-1}\right) = \mathbb{E}\left(x_{t,i}^2 \mid H_t, u_{t-1}\right)$.

Next, let us consider the case that $\hat{x}^{-\prime}_{t,i} \neq 0$. By Eq. (17), $c \neq \hat{x}^-_{t,i}$. We thus have

$\mathbb{E}\left(x_{t,i}^2 \mid H_t, u'_{t-1}\right) = \mathbb{E}\left((x_{t,i} - c)^2 \mid H_t, u_{t-1}\right)$   (Eq. (16))
$> \mathbb{E}\left((x_{t,i} - \hat{x}^-_{t,i})^2 \mid H_t, u_{t-1}\right)$   (by Corollary 14 and because $c \neq \hat{x}^-_{t,i}$)
$= \mathbb{E}\left(x_{t,i}^2 \mid H_t, u_{t-1}\right)$,   (because $\hat{x}^-_{t,i} = 0$)

which concludes the proof.

3 The centralized setting

Letting $\mathbf{1}$ be the matrix whose coefficients are all equal to 1, we denote

$M(a, b) = b\mathbf{1} + (a - b)I$,

the matrix having diagonal coefficients equal to $a$, and other coefficients equal to $b$.

3.1 Rephrasing the alignment problem as a linear filtering problem

First, we write our equations in matrix form to allow us to apply the Kalman filter straightforwardly. Let $\bar\theta^{(t)} = \left(\bar\theta_1^{(t)}, \ldots, \bar\theta_n^{(t)}\right)$, $d\theta^{(t)} = \left(d\theta_1^{(t)}, \ldots, d\theta_n^{(t)}\right)$, $Y^{(t)} = \left(Y_1^{(t)}, \ldots, Y_n^{(t)}\right)$, $N_m^{(t)} = \left(N_{m,1}^{(t)}, \ldots, N_{m,n}^{(t)}\right)$ and $N_d^{(t)} = \left(N_{d,1}^{(t)}, \ldots, N_{d,n}^{(t)}\right)$.
Measurement rule. We recall the equation giving the measurement of Agent $i$ at time $t$: $Y_i^{(t)} = \bar\theta_i^{(t)} + N_{m,i}^{(t)}$. We simply rewrite this equation using vectors:

$Y^{(t)} = \bar\theta^{(t)} + N_m^{(t)}$,   (18)

where, by definition, $N_m^{(t)} \sim \mathcal{N}(0, \sigma_m^2 I)$.
Update rule. We recall the update equation of the stretch, which follows from Eq. (25):

$\bar\theta_i^{(t+1)} = \bar\theta_i^{(t)} - d\theta_i^{(t)} - N_{d,i}^{(t)} + \left(\langle\theta_{-i}^{(t+1)}\rangle - \langle\theta_{-i}^{(t)}\rangle\right) = \bar\theta_i^{(t)} - d\theta_i^{(t)} - N_{d,i}^{(t)} + \frac{1}{n-1}\sum_{j=1, j\neq i}^{n}\left(d\theta_j^{(t)} + N_{d,j}^{(t)}\right)$.   (19)

We define the matrix

$M_n = M\left(-1, \frac{1}{n-1}\right)$.

Let $\tilde{N}_d^{(t)} = M_n N_d^{(t)}$. It follows from these definitions and Eq. (19) that

$\bar\theta^{(t+1)} = \bar\theta^{(t)} + M_n\, d\theta^{(t)} + \tilde{N}_d^{(t)}$.   (20)

By definition, $N_d^{(t)} \sim \mathcal{N}(0, \sigma_d^2 I)$, so by Claim 26, $\tilde{N}_d^{(t)} \sim \mathcal{N}(0, Q)$ where $Q = \sigma_d^2 \cdot M_n I M_n^\top = \sigma_d^2 M_n^2$.

At this point, we note that Equations (18) and (20) correspond to Equations (8) and (9), with $A_t = I$, $B_t = M_n$, $H_t = I$, $v_t = N_m^{(t)}$, $w_t = \tilde{N}_d^{(t)}$, $R_t = \sigma_m^2 I$ and $Q_t = Q = \sigma_d^2 M_n^2$.

Let $\hat\theta^-_t$, $\hat\theta_t$ denote the estimates of the stretch produced by the Kalman filter, before and after the measurement at round $t$, respectively.
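The matrix identity $M_n^2 = -\frac{n}{n-1}M_n$ (Claim 27), used repeatedly below, is easy to check numerically (our sketch):

    import numpy as np

    def M(a, b, n):
        # M(a, b): diagonal entries a, off-diagonal entries b
        return b * np.ones((n, n)) + (a - b) * np.eye(n)

    n = 6
    Mn = M(-1.0, 1.0 / (n - 1), n)
    assert np.allclose(Mn @ Mn, -n / (n - 1) * Mn)   # Claim 27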
Definition 16. We say that an algorithm for the alignment problem is Kalman-perfect if it always produces a sequence of moves $(d\theta^{(t)})_{t \geq 0}$, such that for every integer $t \geq 1$, the estimate $\hat\theta^-_t$ by the Kalman filter corresponding to this process is equal to 0.

The following proposition follows directly from Proposition 15.

Proposition 17. If there exists a Kalman-perfect algorithm for the alignment problem, then this algorithm is optimal in the centralized setting (in the sense of Definition 1). Moreover, any other optimal (deterministic) algorithm is Kalman-perfect.
Recall that we denote by $K_t$ the Kalman gain at round $t$, and by $P^-_t$ and $P_t$ the error co-variance matrices before and after the measurement at round $t$, respectively. Recall also that at round 0, i.e., at the initialization stage, the agents are normally distributed around 0. For technical reasons, we define the Kalman filter estimate at round 0 to be zero, that is, $\hat\theta^-_0 = 0$.

Measurement update. In our case, the Kalman gain (Eq. (10)) writes

$K_t = P^-_t \left(P^-_t + \sigma_m^2 I\right)^{-1}$.

We have the following expression for the estimate (Eq. (11)):

$\hat\theta_t = \hat\theta^-_t + K_t\left(Y^{(t)} - \hat\theta^-_t\right)$.

Eventually, the update equation for the error co-variance is (Eq. (12)):

$P_t = (I - K_t) P^-_t$.

Time update. We have (Eq. (13))

$\hat\theta^-_{t+1} = \hat\theta_t + M_n\, d\theta^{(t)}$,

and (Eq. (14))

$P^-_{t+1} = P_t + \sigma_d^2 M_n^2$.

Recall the sequences $(\alpha_t)_{t \geq 0}$ and $(\rho_\star^{(t)})_{t \geq 0}$ introduced in Definitions 4 and 5.
Lemma 18. For every $t \in \mathbb{N}$,

$P^-_t = M\left(\alpha_t, -\frac{\alpha_t}{n-1}\right) = -\alpha_t M_n$  and  $K_t = \frac{-\alpha_t M_n}{\frac{n}{n-1}\alpha_t + \sigma_m^2} = -\rho_\star^{(t)} M_n$.

Proof. We prove the first part of the claim by induction, and prove that for every round $t$, the second part of the claim (regarding $K_t$) follows from the first part (regarding $P^-_t$).

By Claim 26, $P^-_0 = \sigma_0^2 M_n^2$. By Claim 27,

$M_n^2 = M\left(\frac{n}{n-1}, -\frac{n}{(n-1)^2}\right) = -\frac{n}{n-1} M_n$.

Therefore, $P^-_0 = -\frac{n}{n-1}\sigma_0^2 M_n$, and so the first part of the claim holds at round 0 since $\alpha_0 = \frac{n}{n-1}\sigma_0^2$.

Now, let us assume that the first part of the claim holds for some $t \in \mathbb{N}$. It follows that

$P^-_t + \sigma_m^2 I = M\left(\alpha_t + \sigma_m^2, -\frac{\alpha_t}{n-1}\right)$.

By Claim 28, since $\sigma_m > 0$, we have

$\left(P^-_t + \sigma_m^2 I\right)^{-1} = \frac{M\left(\frac{\alpha_t}{n-1} + \sigma_m^2, \frac{\alpha_t}{n-1}\right)}{\sigma_m^2\left(\frac{n}{n-1}\alpha_t + \sigma_m^2\right)}$.

By Claim 27 again, we can compute the Kalman gain:

$K_t = P^-_t\left(P^-_t + \sigma_m^2 I\right)^{-1} = \frac{M\left(-\sigma_m^2\alpha_t, \frac{\sigma_m^2\alpha_t}{n-1}\right)}{\sigma_m^2\left(\frac{n}{n-1}\alpha_t + \sigma_m^2\right)} = \frac{-\alpha_t M_n}{\frac{n}{n-1}\alpha_t + \sigma_m^2} = -\rho_\star^{(t)} M_n$,

as needed. Next, we compute the error co-variance matrix after the measurement:

$P_t = (I - K_t) \cdot P^-_t = \frac{-\frac{n-1}{n}\sigma_m^2\alpha_t}{\alpha_t + \frac{n-1}{n}\sigma_m^2} M_n$.

Eventually, we compute the error co-variance matrix at round $t+1$, before the measurement:

$P^-_{t+1} = P_t + \sigma_d^2 M_n^2 = P_t - \frac{n}{n-1}\sigma_d^2 M_n = -\left(\frac{\frac{n-1}{n}\sigma_m^2\alpha_t}{\frac{n-1}{n}\sigma_m^2 + \alpha_t} + \frac{n}{n-1}\sigma_d^2\right) M_n = -\alpha_{t+1} M_n$,

which concludes the induction proof.
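Lemma 18 can also be validated numerically (our sketch): iterate the exact covariance recursions of Eqs. (12) and (14) for this system, and compare with $-\alpha_t M_n$ and $-\rho_\star^{(t)} M_n$.

    import numpy as np

    n, sigma_0, sigma_m, sigma_d, T = 5, 1.0, 1.0, 1.0, 20
    c = n / (n - 1)
    I = np.eye(n)
    Mn = (1.0 / (n - 1)) * np.ones((n, n)) - c * I     # M(-1, 1/(n-1))
    P_pred = sigma_0**2 * Mn @ Mn                      # P^-_0 (Claim 26)
    alpha = n * sigma_0**2 / (n - 1)                   # alpha_0
    for t in range(T):
        assert np.allclose(P_pred, -alpha * Mn)       # Lemma 18, covariance part
        K = P_pred @ np.linalg.inv(P_pred + sigma_m**2 * I)
        rho = alpha / (c * alpha + sigma_m**2)        # rho_star^(t), Definition 5
        assert np.allclose(K, -rho * Mn)              # Lemma 18, gain part
        P_est = (I - K) @ P_pred                      # Eq. (12)
        P_pred = P_est + sigma_d**2 * Mn @ Mn         # Eq. (14) with A = I, Q = sigma_d^2 Mn^2
        alpha = sigma_m**2 * alpha / (c * alpha + sigma_m**2) + c * sigma_d**2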
Our next goal is to prove that Algorithm 1 is optimal. We then explain why this algorithm is called "meet at the center".

Theorem 19. Algorithm 1 is optimal in the centralized setting.

Proof. Our goal is to prove that Algorithm 1 is Kalman-perfect, that is, for every integer $t \geq 1$, the Kalman filter associated with the moves produced by Algorithm 1 gives the estimate $\hat\theta^-_t = 0$. This would conclude the proof of the theorem, by Proposition 17.

For this purpose, assume that all agents run Algorithm 1. We prove by induction that for every integer $t \geq 0$, the Kalman filter produces the estimate $\hat\theta^-_t = 0$. The base case, where $t = 0$, holds since we assumed that the Kalman filter estimates zero at round zero, i.e., $\hat\theta^-_0 = 0$. Next, let us assume that $\hat\theta^-_t = 0$ for some integer $t \geq 0$. We have by definition,

$\hat\theta^-_{t+1} = \hat\theta_t + M_n\, d\theta^{(t)}$,   (21)

and

$\hat\theta_t = \hat\theta^-_t + K_t\left(Y^{(t)} - \hat\theta^-_t\right) = K_t Y^{(t)}$,   (22)

where the second equality in Eq. (22) is by the induction hypothesis. By Lemma 18, $K_t = -\rho_\star^{(t)} M_n$. Note that, by definition of Algorithm 1,

$d\theta^{(t)} = -\frac{n-1}{n}\rho_\star^{(t)} M_n Y^{(t)} = \frac{n-1}{n}\hat\theta_t$.

Hence,

$\hat\theta^-_{t+1} = \left(I + \frac{n-1}{n} M_n\right)\hat\theta_t = \left(I + \frac{n-1}{n} M_n\right) K_t Y^{(t)} = -\left(M_n + \frac{n-1}{n} M_n^2\right)\rho_\star^{(t)} Y^{(t)}$.

Since $M_n^2 = -\frac{n}{n-1} M_n$, we have $\hat\theta^-_{t+1} = 0$. This concludes the induction, and completes the proof of the theorem.

Next, we wish to explain why we refer to Algorithm 1 as the "meet at the center" algorithm. Let $i \in \{1, \ldots, n\}$. Since $\hat\theta_{t,i}$ is produced by the Kalman filter, it is the optimal estimate of the stretch $\bar\theta_i^{(t)}$ of agent $i$, given the previous measurements and moves. Next, recall that $\langle\theta_t\rangle$ denotes the center of mass of all agents, and that $\langle\theta_t\rangle - \theta_i^{(t)} = \frac{n-1}{n}\bar\theta_i^{(t)}$. Therefore, $d\theta_i^{(t)} = \frac{n-1}{n}\hat\theta_{t,i}$ is the optimal estimate of $\langle\theta_t\rangle - \theta_i^{(t)}$ (given the previous measurements and moves). Consequently, instructing Agent $i$ to move by $d\theta_i^{(t)}$ makes this agent be located as close as possible to the center of mass. This observation justifies calling Algorithm 1 "meet at the center".

$W^\star$ is optimal in the centralized setting

Our next goal is to prove Theorem 7.
Theorem 7 (restated). The weighted-average algorithm $W^\star$ is optimal in the centralized setting.

Proof. (The proof follows the same line of arguments as the proof of Theorem 19.) Our goal is to prove that Algorithm $W^\star$ is Kalman-perfect, that is, for every integer $t \geq 1$, the Kalman filter associated with the moves produced by Algorithm $W^\star$ gives the estimate $\hat\theta^-_t = 0$. This would conclude the proof of the theorem, by Proposition 17.

For this purpose, assume that all agents run Algorithm $W^\star$. We prove by induction that for every integer $t \geq 0$, the Kalman filter produces the estimate $\hat\theta^-_t = 0$. The base case, where $t = 0$, holds since we assumed that the Kalman filter estimates zero at round zero, i.e., $\hat\theta^-_0 = 0$. Next, let us assume that $\hat\theta^-_t = 0$ for some integer $t \geq 0$, and consider $t + 1$. We have by definition,

$\hat\theta^-_{t+1} = \hat\theta_t + M_n\, d\theta^{(t)}$,   (23)

and

$\hat\theta_t = \hat\theta^-_t + K_t\left(Y^{(t)} - \hat\theta^-_t\right) = K_t Y^{(t)}$,   (24)

where the second equality in Eq. (24) is by the induction hypothesis. By Lemma 18, $K_t = -\rho_\star^{(t)} M_n$. Hence, Eq. (23) rewrites

$\hat\theta^-_{t+1} = M_n\left(-\rho_\star^{(t)} Y^{(t)} + d\theta^{(t)}\right)$.

Finally, by the definition of $W^\star$, we have $d\theta^{(t)} = \rho_\star^{(t)} Y^{(t)}$, so $\hat\theta^-_{t+1} = 0$, concluding the induction step.

Discussion

In our setting, the cost of an individual is defined as its expected distance from the average position of other agents at steady state. Minimizing this quantity is equivalent to minimizing the expected distance from the average position of all agents (see the discussion of the cost measure in the model section). Another interesting measure is the expected diameter of the group, defined as the maximal distance between two agents, at steady state. It would not come as a surprise if Algorithm $W^\star$ would turn out to be optimal also with respect to this measure; however, analyzing its expected diameter would require handling further dependencies between agents, and therefore remains for future work.

Finally, we conclude with a philosophical remark concerning the dichotomy between conformity and individuality. When individuals have a priori conflicting preferences, it is natural to assume that the individual responsiveness to the group would be moderated [29]. Without such preferences, when the goal is to purely conform, existing models in collective behavior typically assume that whenever an individual receives a measurement of the group's average it tries to align itself with it as much as it can [13, 27]. However, we show here that due to noise, each individual should actually moderate its social responsiveness, by weighing its current direction in a non-trivial manner. This insight suggests that the dichotomy between conformity and individuality (manifested here as persistency) might be more subtle than commonly perceived.

Acknowledgment.
We would like to thank Yongcan Cao for helpful discussions regarding related work on distributed Kalman filters.
References

[1] Brian D. O. Anderson and John B. Moore. Optimal Filtering. Courier Corporation, 2012.

[2] Ichiro Aoki. A simulation study on the schooling mechanism in fish. Nippon Suisan Gakkaishi, 48(8):1081-1088, 1982.

[3] Lucas Boczkowski, Ofer Feinerman, Amos Korman, and Emanuele Natale. Limits for rumor spreading in stochastic populations. In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of LIPIcs, pages 49:1-49:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.

[4] Yongcan Cao, Wenwu Yu, Wei Ren, and Guanrong Chen. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Transactions on Industrial Informatics, 9(1):427-438, 2012.

[5] Bernard Chazelle. Natural algorithms. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 422-431. SIAM, 2009.

[6] Andrea E. F. Clementi, Luciano Gualà, Emanuele Natale, Francesco Pasquale, Giacomo Scornavacca, and Luca Trevisan. Consensus vs broadcast, with and without noise (extended abstract). In Thomas Vidick, editor, 11th Innovations in Theoretical Computer Science Conference (ITCS 2020), volume 151 of LIPIcs, pages 42:1-42:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.

[7] Iain D. Couzin, Jens Krause, Richard James, Graeme D. Ruxton, and Nigel R. Franks. Collective memory and spatial sorting in animal groups. Journal of Theoretical Biology, 218(1):1-11, 2002.

[8] Rui Fan and Nancy A. Lynch. Gradient clock synchronization. Distributed Computing, 18(4):255-266, 2006.

[9] Ofer Feinerman, Bernhard Haeupler, and Amos Korman. Breathe before speaking: efficient information dissemination despite noisy, limited and anonymous communication. Distributed Computing, 30(5):339-355, 2017.

[10] Ofer Feinerman and Amos Korman. Clock synchronization and estimation in highly dynamic networks: An information theoretic approach. In Christian Scheideler, editor, Structural Information and Communication Complexity - 22nd International Colloquium, SIROCCO 2015, Montserrat, Spain, July 14-16, 2015, Post-Proceedings, volume 9439 of Lecture Notes in Computer Science, pages 16-30. Springer, 2015.

[11] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2):374-382, 1985.

[12] Pierre Fraigniaud and Emanuele Natale. Noisy rumor spreading and plurality consensus. Distributed Computing, 32(4):257-276, 2019.

[13] Aviram Gelblum, Itai Pinkoviezky, Ehud Fonio, Abhijit Ghosh, Nir Gov, and Ofer Feinerman. Ant groups optimally amplify the effect of transiently informed individuals. Nature Communications, 6:7729, 2015.

[14] Yao-Win Hong and Anna Scaglione. A scalable synchronization protocol for large scale sensor networks and its applications. IEEE Journal on Selected Areas in Communications, 23(5):1085-1099, 2005.

[15] Amos Korman, Efrat Greenwald, and Ofer Feinerman. Confidence sharing: An economic strategy for efficient information flows in animal groups. PLoS Computational Biology, 10(10), 2014.

[16] Christoph Lenzen, Thomas Locher, and Roger Wattenhofer. Tight bounds for clock synchronization. Journal of the ACM, 57(2), 2010.

[17] Christoph Lenzen, Philipp Sommer, and Roger Wattenhofer. PulseSync: An efficient and scalable clock synchronization protocol. IEEE/ACM Transactions on Networking, 23(3):717-727, 2015.

[18] Shukai Li, Xinzhi Liu, Wansheng Tang, and Jianxiong Zhang. Flocking of multi-agents following a leader with adaptive protocol in a noisy environment. Asian Journal of Control, 16(6):1771-1778, 2014.

[19] Renato E. Mirollo and Steven H. Strogatz. Synchronization of pulse-coupled biological oscillators. SIAM Journal on Applied Mathematics, 50(6):1645-1662, 1990.

[20] Reza Olfati-Saber. Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Transactions on Automatic Control, 51(3):401-420, 2006.

[21] Reza Olfati-Saber. Distributed Kalman filtering for sensor networks. In 2007 46th IEEE Conference on Decision and Control, pages 5492-5498. IEEE, 2007.

[22] Wei Ren, Randal W. Beard, and Ella M. Atkins. A survey of consensus problems in multi-agent coordination. In Proceedings of the 2005 American Control Conference, pages 1859-1864. IEEE, 2005.

[23] Craig W. Reynolds. Flocks, herds and schools: A distributed behavioral model. Computer Graphics (SIGGRAPH '87), volume 21. ACM, 1987.

[24] Osvaldo Simeone, Umberto Spagnolini, Yeheskel Bar-Ness, and Steven H. Strogatz. Distributed synchronization in wireless networks. IEEE Signal Processing Magazine, 25(5):81-97, 2008.

[25] Fikret Sivrikaya and Bülent Yener. Time synchronization in sensor networks: a survey. IEEE Network, 18(4):45-50, 2004.

[26] Bharath Sundararaman, Ugo Buy, and Ajay D. Kshemkalyani. Clock synchronization for wireless sensor networks: a survey. Ad Hoc Networks, 3(3):281-323, 2005.

[27] Tamás Vicsek, András Czirók, Eshel Ben-Jacob, Inon Cohen, and Ofer Shochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75:1226-1229, 1995.

[28] Tamás Vicsek and Anna Zafeiris. Collective motion. Physics Reports, 517(3):71-140, 2012.

[29] Ashley J. W. Ward, James E. Herbert-Read, David J. T. Sumpter, and Jens Krause. Fast and accurate decisions through collective vigilance in fish shoals. Proceedings of the National Academy of Sciences, 108(6):2312-2315, 2011.

[30] Greg Welch, Gary Bishop, et al. An introduction to the Kalman filter. Technical report, University of North Carolina at Chapel Hill, 1995.

[31] Yik-Chung Wu, Qasim Chaudhari, and Erchin Serpedin. Clock synchronization of wireless sensor networks. IEEE Signal Processing Magazine, 28(1):124-138, 2010.
A More Results Regarding Weighted-average Algorithms
A.1 An optimal weighted-average algorithm (at steady state)
Our goal in this section is to prove Theorems 2 and 3. We start with the following observation.
Lemma 20. For any $t$, and whatever the positions of the agents, we have

$\sum_{i=1}^{n} \bar\theta_i^{(t)} = 0$.

Proof.

$\sum_{i=1}^{n} \bar\theta_i^{(t)} = \sum_{i=1}^{n}\left(\langle\theta_{-i}^{(t)}\rangle - \theta_i^{(t)}\right) = \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1, j\neq i}^{n} \theta_j^{(t)} - \sum_{i=1}^{n}\theta_i^{(t)} = \frac{1}{n-1}\cdot(n-1)\sum_{i=1}^{n}\theta_i^{(t)} - \sum_{i=1}^{n}\theta_i^{(t)} = 0$.

Next, we compute how the stretch $\bar\theta_i^{(t)}$ of Agent $i$ changes when all agents perform weighted-average moves with constant responsiveness parameter $\rho$.
Lemma 21. Assume that all agents execute $W(\rho)$, for some $0 \leq \rho \leq 1$. Let $E_j^{(t)} = \rho N_{m,j}^{(t)} + N_{d,j}^{(t)}$. Then for every $i \in \{1, \ldots, n\}$ and every $t \in \mathbb{N}$,

$\bar\theta_i^{(t+1)} = \left(1 - \frac{n}{n-1}\rho\right)\bar\theta_i^{(t)} + \frac{1}{n-1}\sum_{j=1, j\neq i}^{n} E_j^{(t)} - E_i^{(t)}$.

Proof. The stretch of Agent $i$ at the $(t+1)$'st round is given by:

$\bar\theta_i^{(t+1)} = \langle\theta_{-i}^{(t+1)}\rangle - \theta_i^{(t+1)}$   (by definition)
$= \langle\theta_{-i}^{(t+1)}\rangle - \left(\theta_i^{(t)} + d\theta_i^{(t)} + N_{d,i}^{(t)}\right)$   (by Eq. (2))
$= \left(\langle\theta_{-i}^{(t+1)}\rangle - \langle\theta_{-i}^{(t)}\rangle\right) + \left(\langle\theta_{-i}^{(t)}\rangle - \theta_i^{(t)}\right) - d\theta_i^{(t)} - N_{d,i}^{(t)}$
$= \left(\langle\theta_{-i}^{(t+1)}\rangle - \langle\theta_{-i}^{(t)}\rangle\right) + \bar\theta_i^{(t)} - d\theta_i^{(t)} - N_{d,i}^{(t)}$.   (25)

Let us break down the first term:

$\langle\theta_{-i}^{(t+1)}\rangle - \langle\theta_{-i}^{(t)}\rangle = \frac{1}{n-1}\sum_{j=1, j\neq i}^{n}\left(d\theta_j^{(t)} + N_{d,j}^{(t)}\right) = \frac{1}{n-1}\sum_{j=1, j\neq i}^{n}\left(\rho\left(\bar\theta_j^{(t)} + N_{m,j}^{(t)}\right) + N_{d,j}^{(t)}\right)$,

where the second equality is because $d\theta_j^{(t)} = \rho Y_j^{(t)} = \rho\left(\bar\theta_j^{(t)} + N_{m,j}^{(t)}\right)$. By Lemma 20, $\sum_{j \neq i}\bar\theta_j^{(t)} = -\bar\theta_i^{(t)}$, so we can rewrite the last equation as

$\langle\theta_{-i}^{(t+1)}\rangle - \langle\theta_{-i}^{(t)}\rangle = -\frac{\rho}{n-1}\bar\theta_i^{(t)} + \frac{1}{n-1}\sum_{j=1, j\neq i}^{n}\left(\rho N_{m,j}^{(t)} + N_{d,j}^{(t)}\right) = -\frac{\rho}{n-1}\bar\theta_i^{(t)} + \frac{1}{n-1}\sum_{j=1, j\neq i}^{n} E_j^{(t)}$.   (26)

Plugging Eq. (26) into Eq. (25) gives

$\bar\theta_i^{(t+1)} = \left(1 - \frac{\rho}{n-1}\right)\bar\theta_i^{(t)} + \frac{1}{n-1}\sum_{j=1, j\neq i}^{n} E_j^{(t)} - d\theta_i^{(t)} - N_{d,i}^{(t)}$.   (27)

By assumption, $d\theta_i^{(t)} = \rho Y_i^{(t)} = \rho\left(\bar\theta_i^{(t)} + N_{m,i}^{(t)}\right)$. Plugging this expression into Eq. (27) gives the result.

Now we can prove that the stretch of each agent is normally distributed at every round, and compute its variance.

Lemma 22.
Assume that all agents execute $W(\rho)$, for some $0 \leq \rho \leq 1$. Then, for every $i \in \{1, \ldots, n\}$ and every $t \in \mathbb{N}$, the stretch $\bar\theta_i^{(t)}$ is normally distributed. Moreover, $\mathbb{E}\left(\bar\theta_i^{(t)}\right) = 0$, and

$\mathrm{Var}\left(\bar\theta_i^{(t+1)}\right) = \left(1 - \frac{n}{n-1}\rho\right)^2\mathrm{Var}\left(\bar\theta_i^{(t)}\right) + \frac{n}{n-1}\left(\rho^2\sigma_m^2 + \sigma_d^2\right)$.

Proof. We prove that the stretch is normally distributed with mean 0 by induction on $t$. By construction, for every $i$, $\bar\theta_i^{(0)}$ is normally distributed with mean 0. Let us assume that $\bar\theta_i^{(t)}$ is normally distributed with mean 0 for some round $t$, and consider round $t+1$. Recall that Lemma 21 gives

$\bar\theta_i^{(t+1)} = \left(1 - \frac{n}{n-1}\rho\right)\bar\theta_i^{(t)} + \frac{1}{n-1}\sum_{j=1, j\neq i}^{n} E_j^{(t)} - E_i^{(t)}$.   (28)

Since by definition, for every $j$, $E_j^{(t)}$ is normally distributed around 0, and by induction $\bar\theta_i^{(t)}$ is normally distributed around 0, then $\bar\theta_i^{(t+1)}$ is also normally distributed around 0. This concludes the induction.

Moreover, note that $\mathrm{Var}\left(E_j^{(t)}\right) = \rho^2\sigma_m^2 + \sigma_d^2$, so

$\mathrm{Var}\left(\frac{1}{n-1}\sum_{j=1, j\neq i}^{n} E_j^{(t)} - E_i^{(t)}\right) = \frac{1}{(n-1)^2}\sum_{j=1, j\neq i}^{n}\mathrm{Var}\left(E_j^{(t)}\right) + \mathrm{Var}\left(E_i^{(t)}\right) = \frac{n}{n-1}\left(\rho^2\sigma_m^2 + \sigma_d^2\right)$.

Hence, by Eq. (28),

$\mathrm{Var}\left(\bar\theta_i^{(t+1)}\right) = \left(1 - \frac{n}{n-1}\rho\right)^2\mathrm{Var}\left(\bar\theta_i^{(t)}\right) + \frac{n}{n-1}\left(\rho^2\sigma_m^2 + \sigma_d^2\right)$,

which concludes the proof.

Before proving Theorem 2, we need a small technical result, which we prove next for the sake of completeness.

Claim 23.
Let $a, b \geq 0$. Consider the sequence $\{u_n\}_{n=0}^{\infty}$ defined by letting $u_0 \in \mathbb{R}$ and for every integer $n$, $u_{n+1} = a u_n + b$. If $a < 1$, then $\{u_n\}_{n=0}^{\infty}$ converges and $\lim_{n \to +\infty} u_n = b/(1-a)$. If $a = 1$ and $b > 0$, then $\lim_{n \to +\infty} u_n = +\infty$.

Proof. First, consider the case that $a < 1$. Let $\lambda = b/(a-1)$, and define the sequence $v_n = u_n + \lambda$. We have

$v_{n+1} = u_{n+1} + \lambda = a u_n + b + \lambda = a u_n + (a-1)\lambda + \lambda = a u_n + a\lambda = a(u_n + \lambda) = a v_n$.

Since $0 \leq a < 1$, $\lim_{n \to +\infty} v_n = 0$, and so $\lim_{n \to +\infty} u_n = -\lambda = b/(1-a)$.

Now, if $a = 1$, then we have for every $n \in \mathbb{N}$, $u_n = u_0 + nb$. If $b > 0$, it follows that $\lim_{n \to +\infty} u_n = +\infty$.

The next theorem follows directly from Lemma 22 and Claim 23.

Theorem 2 (restated).
Assume that all agents execute $W(\rho)$, for a fixed $0 \leq \rho \leq 1$. Then for every $i \in \{1, \ldots, n\}$ and every $t \in \mathbb{N}$, the stretch $\bar\theta_i^{(t)}$ is normally distributed, and

$\lim_{t \to +\infty}\mathrm{Var}\left(\bar\theta_i^{(t)}\right) = \frac{\frac{n}{n-1}(\rho^2\sigma_m^2 + \sigma_d^2)}{1 - \left(1 - \frac{n}{n-1}\rho\right)^2}$,

with the convention that $\lim_{t \to +\infty}\mathrm{Var}\left(\bar\theta_i^{(t)}\right) = +\infty$ if the denominator $1 - \left(1 - \frac{n}{n-1}\rho\right)^2 = 0$.

Proof. We apply Lemma 22. Hence, by Claim 23, $\mathrm{Var}\left(\bar\theta_i^{(t)}\right)$ converges to the limit as stated. Note that the variance is infinite if (1) the responsiveness is equal to 0, in which case the drift adds up endlessly, or (2) the responsiveness is equal to 1 and $n = 2$, in which case the agents "swap" at each round, producing the same result.

Theorem 3 (restated).
The weighted-average algorithm that optimizes group variance among all weighted-average algorithms $W(\rho)$ (that use the same responsiveness parameter $\rho$ at all rounds) is $W(\rho^\star)$, where

$\rho^\star = \frac{\sigma_d\sqrt{\sigma_m^2 + \left(\frac{n}{2(n-1)}\sigma_d\right)^2} - \frac{n}{2(n-1)}\sigma_d^2}{\sigma_m^2}$.

Proof. Consider the function

$\mathrm{Var}(\rho) = \frac{\frac{n}{n-1}(\rho^2\sigma_m^2 + \sigma_d^2)}{1 - \left(1 - \frac{n}{n-1}\rho\right)^2}$.

Note that this function evaluates to $+\infty$ when $\rho = 0$, or when $\rho = 1$ and $n = 2$. This function can be rewritten as

$\mathrm{Var}(\rho) = \frac{\rho^2\sigma_m^2 + \sigma_d^2}{2\rho - \frac{n}{n-1}\rho^2}$.

One can compute the derivative in a straightforward manner:

$\mathrm{Var}'(\rho) = \frac{2\rho\sigma_m^2\left(2\rho - \frac{n}{n-1}\rho^2\right) - \left(\rho^2\sigma_m^2 + \sigma_d^2\right)\left(2 - \frac{2n}{n-1}\rho\right)}{\left(2\rho - \frac{n}{n-1}\rho^2\right)^2}$.

We have that

$\mathrm{Var}'(\rho) = 0 \iff \rho\sigma_m^2\left(2\rho - \frac{n}{n-1}\rho^2\right) - \left(\rho^2\sigma_m^2 + \sigma_d^2\right)\left(1 - \frac{n}{n-1}\rho\right) = 0 \iff \rho^2\sigma_m^2 + \frac{n}{n-1}\sigma_d^2\rho - \sigma_d^2 = 0$.

This equation has a unique solution in the interval $[0, 1]$:

$\mathrm{Var}'(\rho) = 0,\ \rho \in [0, 1] \iff \rho = \frac{\sigma_d\sqrt{\sigma_m^2 + \left(\frac{n}{2(n-1)}\sigma_d\right)^2} - \frac{n}{2(n-1)}\sigma_d^2}{\sigma_m^2}$.

We check that this corresponds to a minimum to conclude the proof.
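The stationarity computation above can be double-checked symbolically (our sketch, using sympy): the numerator of Var'(ρ) should reduce to the quadratic ρ²σ_m² + (n/(n-1))σ_d²ρ - σ_d².

    import sympy as sp

    rho, sm, sd, n = sp.symbols('rho sigma_m sigma_d n', positive=True)
    c = n / (n - 1)
    Var = (rho**2 * sm**2 + sd**2) / (2 * rho - c * rho**2)
    numerator = sp.simplify(sp.diff(Var, rho) * (2 * rho - c * rho**2)**2 / 2)
    # The difference below simplifies to 0, confirming the stationarity condition:
    print(sp.simplify(numerator - (rho**2 * sm**2 + c * sd**2 * rho - sd**2)))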
A.2 The asymptotic behavior of $W^\star$

The goal of this section is to prove Claim 8.

Claim 8 (restated). The sequence $\alpha_t$ converges to

$\alpha_\infty := \lim_{t \to +\infty}\alpha_t = \sigma_d\sqrt{\sigma_m^2 + \left(\frac{n}{2(n-1)}\sigma_d\right)^2} + \frac{n}{2(n-1)}\sigma_d^2$.

Moreover, $\lim_{t \to +\infty}\rho_\star^{(t)} = \rho^\star$.

Proof. For $a, b > 0$, define $f_{a,b} : \mathbb{R}^+ \to \mathbb{R}^+$ such that

$f_{a,b}(x) = \frac{a x}{x + a} + b$.

Solving $f_{a,b}(\ell) = \ell$ on $\mathbb{R}^+$ gives

$\ell = \frac{1}{2}\left(\sqrt{b}\sqrt{4a + b} + b\right)$.

For every $x \in \mathbb{R}^+$:

$f_{a,b}(x) - f_{a,b}(\ell) = a\left(\frac{x}{x+a} - \frac{\ell}{\ell+a}\right) = a \cdot \frac{x(\ell+a) - \ell(x+a)}{(x+a)(\ell+a)} = a^2\frac{x-\ell}{(x+a)(\ell+a)}$.

Hence, for every $x \geq 0$,

$|f_{a,b}(x) - f_{a,b}(\ell)| = \left|a^2\frac{x-\ell}{(x+a)(\ell+a)}\right| \leq \frac{a^2}{a(a+\ell)}|x-\ell| = \frac{a}{a+\ell}|x-\ell|$.   (29)

Claim 24. Let $(u_t)_{t \in \mathbb{N}}$ be a sequence defined by $u_0 \in \mathbb{R}^+$ and for every integer $t$, $u_{t+1} = f_{a,b}(u_t)$. Then, $(u_t)$ converges, and $\lim_{t \to \infty} u_t = \ell$.

Proof. Let $k = \frac{a}{a+\ell}$. Since $a, b > 0$, we have $\ell > 0$, and so $k < 1$. Let us show by induction that for every $t$, $|u_t - \ell| \leq k^t \cdot |u_0 - \ell|$. This inequality is trivial for $t = 0$. Assuming that it holds for some $t \in \mathbb{N}$, we have

$|u_{t+1} - \ell| = |f_{a,b}(u_t) - f_{a,b}(\ell)|$   (by definition of $u_t$ and $\ell$)
$\leq k \cdot |u_t - \ell|$   (by Eq. (29))
$\leq k^{t+1} \cdot |u_0 - \ell|$,   (by the induction hypothesis)

concluding the induction. Since $k < 1$, it implies that $\lim_{t \to \infty}|u_t - \ell| = 0$, and so $\lim_{t \to \infty} u_t = \ell$.

Applying Claim 24 to $(\alpha_t)_{t \in \mathbb{N}}$ with $a = \frac{n-1}{n}\sigma_m^2$ and $b = \frac{n}{n-1}\sigma_d^2$ gives $\lim_{t \to \infty}\alpha_t = \alpha_\infty$, as stated.

Next, we show that $\lim_{t \to \infty}\rho_\star^{(t)} = \rho^\star$. By letting $t$ tend to $+\infty$ in Definition 4, we obtain

$\alpha_\infty = \frac{\frac{n-1}{n}\sigma_m^2\alpha_\infty}{\frac{n-1}{n}\sigma_m^2 + \alpha_\infty} + \frac{n}{n-1}\sigma_d^2$.   (30)

Doing the same in Definition 5, we get

$\lim_{t \to +\infty}\rho_\star^{(t)} = \frac{\frac{n-1}{n}\alpha_\infty}{\frac{n-1}{n}\sigma_m^2 + \alpha_\infty} = \frac{1}{\sigma_m^2}\cdot\frac{\frac{n-1}{n}\sigma_m^2\alpha_\infty}{\frac{n-1}{n}\sigma_m^2 + \alpha_\infty}$.

By Eq. (30), this gives

$\lim_{t \to +\infty}\rho_\star^{(t)} = \frac{1}{\sigma_m^2}\left(\alpha_\infty - \frac{n}{n-1}\sigma_d^2\right)$.

Plugging in the expression of $\alpha_\infty$ mentioned in the claim, we find that

$\lim_{t \to +\infty}\rho_\star^{(t)} = \frac{\sigma_d\sqrt{\sigma_m^2 + \left(\frac{n}{2(n-1)}\sigma_d\right)^2} - \frac{n}{2(n-1)}\sigma_d^2}{\sigma_m^2} = \rho^\star$,

which establishes the proof.

A.3 Game-theoretic considerations
Theorem 10 (restated).
Algorithm W (cid:63) is a (symmetric) strong Nash-equilibrium. Moreover, if all agentsare restricted to execute weighted-average algorithms, then W (cid:63) is the only strong Nash equilibrium. Proof.
The fact that Algorithm W (cid:63) is a (symmetric) strong Nash-equilibrium is a direct consequence ofTheorem 7 and our definition of optimality (Definition 1).Let us now prove the uniqueness result. Consider the case that the agents all use some weighted-averagealgorithm W ( ρ ( t ) ), and that this is a strong Nash equilibrium. We shall show that it implies that, for everyround t , ρ ( t ) = ρ ( t ) (cid:63) .Fix i ∈ I . First, let us investigate the best response for Agent i to the behavior of others. Let˜ N ( t ) = − N ( t ) d,i + 1 n − n (cid:88) j =1 j (cid:54) = i (cid:16) ρ ( t ) N ( t ) m,j + N ( t ) d,j (cid:17) .
By Lemma 21,
\begin{align*}
\theta^{(t+1)}_i &= \left(1 - \frac{\rho^{(t)}}{n-1}\right)\theta^{(t)}_i + \frac{1}{n-1}\sum_{\substack{j=1 \\ j\neq i}}^{n}\left(\rho^{(t)} N^{(t)}_{m,j} + N^{(t)}_{d,j}\right) - N^{(t)}_{d,i} - d\theta^{(t)}_i \\
&= \left(1 - \frac{\rho^{(t)}}{n-1}\right)\theta^{(t)}_i + \tilde{N}^{(t)} - d\theta^{(t)}_i.
\end{align*}
Note that $\tilde{N}^{(t)}$ is normally distributed, with mean 0, and that
\[
\mathrm{Var}\left(\tilde{N}^{(t)}\right) = \frac{1}{n-1}\left(\rho^{(t)2}\sigma_m^2 + \sigma_d^2\right) + \sigma_d^2.
\]
These equations show that the evolution of the stretch of Agent $i$ can be written as in Eq. (9), with the $u_t$ variables being the moves $d\theta^{(t)}_i$ of the agent. More precisely, in this case, we have $A_t = 1 - \frac{\rho^{(t)}}{n-1}$, $B_t = -1$, $Q_t = \frac{1}{n-1}\left(\rho^{(t)2}\sigma_m^2 + \sigma_d^2\right) + \sigma_d^2$, $H_t = 1$, and $R_t = \sigma_m^2$. (Note that all of these are scalars, since the filtering problem corresponding to Agent $i$ is one-dimensional.) This implies that this agent faces a discrete linear filtering problem, which we tackle using the Kalman filter algorithm.

For every round $t$, and for every sequence of moves $(d\theta^{(s)}_i)_{s\leq t}$, we can compute the variance of the Kalman filter estimate, before and after the $t$'th measurement, as well as the corresponding Kalman gain (see Section 2.2). By definition of the Alignment problem, we have $P^-_0 = n\sigma_0^2/(n-1)$, where $\sigma_0^2$ denotes the variance of the agents' initial positions. Following the definitions in Section 2.2, we have
\[
K_t = \frac{P^-_t}{P^-_t + \sigma_m^2}, \tag{31}
\]
and
\[
P^-_{t+1} = \left(1 - \frac{\rho^{(t)}}{n-1}\right)^2 \frac{P^-_t\,\sigma_m^2}{P^-_t + \sigma_m^2} + \frac{1}{n-1}\left(\rho^{(t)2}\sigma_m^2 + \sigma_d^2\right) + \sigma_d^2. \tag{32}
\]

Claim 25.
Fix $i$, and assume that all agents $j \neq i$ execute $W(\rho^{(t)})$. The Kalman filter corresponding to $\theta^{(t)}_i$ produces $\hat\theta^-_t = 0$ for every round $t$ if and only if Agent $i$ chooses $d\theta^{(t)}_i = \left(1 - \frac{\rho^{(t)}}{n-1}\right)\frac{P^-_t}{P^-_t + \sigma_m^2}\, Y^{(t)}_i$ for every round $t$.

Proof. We prove the claim by induction on $t$. Specifically, our goal is to prove that, for every round $t$, the Kalman filter produces $\hat\theta^-_s = 0$ for every round $s \leq t$ if and only if Agent $i$ chooses
\[
d\theta^{(s)}_i = \left(1 - \frac{\rho^{(s)}}{n-1}\right)\frac{P^-_s}{P^-_s + \sigma_m^2}\, Y^{(s)}_i
\]
for every round $s \leq t - 1$.
This is trivially true for $t = 0$, since it is assumed that the Kalman filter always produces $\hat\theta^-_0 = 0$. Now, let us assume that this holds for some round $t$, and consider round $t + 1$. As a consequence of the definition of the Kalman filter, by plugging Eq. (11) into Eq. (13), we have
\[
\hat\theta^-_{t+1} = \left(1 - \frac{\rho^{(t)}}{n-1}\right)\left(\hat\theta^-_t + K_t\left(Y^{(t)}_i - \hat\theta^-_t\right)\right) - d\theta^{(t)}_i. \tag{33}
\]
The Kalman filter produces $\hat\theta^-_s = 0$ for every round $s \leq t + 1$ if and only if (1) it produces $\hat\theta^-_s = 0$ for every round $s \leq t$, and (2) it produces $\hat\theta^-_{t+1} = 0$. By Eq. (33), this occurs if and only if (1) it produces $\hat\theta^-_s = 0$ for every round $s \leq t$, and (2)
\[
\left(1 - \frac{\rho^{(t)}}{n-1}\right) K_t\, Y^{(t)}_i - d\theta^{(t)}_i = 0. \tag{34}
\]
By the induction hypothesis, and the computation of the Kalman gain $K_t$ in Eq. (31), these two conditions hold if and only if Agent $i$ chooses $d\theta^{(s)}_i = \left(1 - \frac{\rho^{(s)}}{n-1}\right)\frac{P^-_s}{P^-_s + \sigma_m^2}\, Y^{(s)}_i$ for every round $s \leq t$, which concludes the induction proof.

By the assumption that this is a strong Nash equilibrium, Algorithm $W(\rho^{(t)})$ is a best response for Agent $i$. According to Claim 25, and by Proposition 15, this implies that
\[
d\theta^{(t)}_i = \left(1 - \frac{\rho^{(t)}}{n-1}\right)\frac{P^-_t}{P^-_t + \sigma_m^2}\, Y^{(t)}_i \tag{35}
\]
for every round $t$. Because Agent $i$ was assumed to follow Algorithm $W(\rho^{(t)})$, we also have $d\theta^{(t)}_i = \rho^{(t)} Y^{(t)}_i$ for every round $t$. Therefore, Eq. (35) rewrites as
\[
\rho^{(t)} = \left(1 - \frac{\rho^{(t)}}{n-1}\right)\frac{P^-_t}{P^-_t + \sigma_m^2}, \tag{36}
\]
which, by rearranging, yields
\[
\rho^{(t)} = \frac{P^-_t}{\frac{n}{n-1} P^-_t + \sigma_m^2}. \tag{37}
\]
We can use Eq. (36) to simplify the first term in Eq. (32), to get
\[
P^-_{t+1} = \left(1 - \frac{\rho^{(t)}}{n-1}\right)\rho^{(t)}\sigma_m^2 + \frac{1}{n-1}\left(\rho^{(t)2}\sigma_m^2 + \sigma_d^2\right) + \sigma_d^2 = \rho^{(t)}\sigma_m^2 + \frac{n}{n-1}\sigma_d^2.
\]
Replacing $\rho^{(t)}$ in the last equation by its expression in Eq. (37) gives
\[
P^-_{t+1} = \frac{\sigma_m^2\, P^-_t}{\frac{n}{n-1} P^-_t + \sigma_m^2} + \frac{n}{n-1}\sigma_d^2. \tag{38}
\]
Note that $P^-_0 = \alpha_0 = n\sigma_0^2/(n-1)$, and that Eq. (38) matches the recursive definition of $\alpha_t$ (Definition 4); hence, by induction, $P^-_t = \alpha_t$ for every round $t$. Moreover, Eq. (37) matches Definition 5, so we actually have, for every round $t$, $\rho^{(t)} = \rho^{(t)}_\star$, which concludes the proof.
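The algebra leading from Eqs. (32) and (36) to Eq. (38) can also be verified numerically. In the sketch below (ours, with arbitrary parameters), we iterate Eq. (32) with $\rho^{(t)}$ chosen according to Eq. (37), and check at every round that the result coincides with the simplified recursion (38):

```python
n, sigma_m, sigma_d, sigma_0 = 4, 1.0, 0.5, 1.5   # arbitrary parameters

P = n * sigma_0**2 / (n - 1)    # P_0^- for the full recursion (32)
P38 = P                         # same initialization for the recursion (38)

for t in range(50):
    rho = P / (n / (n - 1) * P + sigma_m**2)     # Eq. (37)
    P = ((1 - rho / (n - 1))**2 * P * sigma_m**2 / (P + sigma_m**2)
         + (rho**2 * sigma_m**2 + sigma_d**2) / (n - 1) + sigma_d**2)  # Eq. (32)
    P38 = (sigma_m**2 * P38 / (n / (n - 1) * P38 + sigma_m**2)
           + n / (n - 1) * sigma_d**2)           # Eq. (38)
    assert abs(P - P38) < 1e-12                  # the two recursions agree
```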
B More proofs related to the Centralized Setting

B.1 Useful linear algebra claims
We first recall the following well-known property.

Claim 26. If $X \sim \mathcal{N}(\mu, \Sigma)$, then for every $c \in \mathbb{R}^n$ and $B \in \mathbb{R}^{n\times n}$, $c + BX \sim \mathcal{N}\left(c + B\mu,\; B\Sigma B^\top\right)$.

In addition, we give two useful results about matrices of the form $M(a, b)$.

Claim 27.
For every $a, b, a', b' \in \mathbb{R}$,
\[
M(a, b)\, M(a', b') = M\!\left(aa' + (n-1)bb',\; ab' + a'b + (n-2)bb'\right).
\]
In particular, $M(a, b)\, M(a', b') = M(a', b')\, M(a, b)$.

Claim 28.
For every $a, b \in \mathbb{R}$ such that $a \neq b$ and $a \neq -(n-1)b$, the matrix $M(a, b)$ is invertible, and
\[
M(a, b)^{-1} = \frac{M\!\left(a + (n-2)b,\; -b\right)}{(a - b)\left(a + (n-1)b\right)}.
\]

Proof. Let $J$ denote the $n \times n$ all-ones matrix, and note that $J^2 = nJ$. Let $A = M(a, b) = bJ + (a - b)I$. We have
\begin{align*}
A^2 &= \left(bJ + (a - b)I\right)^2 = b^2 J^2 + 2b(a - b)J + (a - b)^2 I \\
&= \left(nb + 2(a - b)\right)(bJ) + (a - b)^2 I \\
&= \left(2a + (n-2)b\right)\left(bJ + (a - b)I\right) - \left(nb + 2(a - b)\right)(a - b)I + (a - b)^2 I \\
&= \left(2a + (n-2)b\right) A + (a - b)\left(-nb - 2a + 2b + a - b\right) I \\
&= \left(2a + (n-2)b\right) A - (a - b)\left(a + (n-1)b\right) I.
\end{align*}
Hence
\[
A\left(A - \left(2a + (n-2)b\right) I\right) = -(a - b)\left(a + (n-1)b\right) I,
\]
from which we conclude (provided that $a \neq b$ and $a \neq -(n-1)b$) that
\[
A^{-1} = \frac{\left(2a + (n-2)b\right) I - A}{(a - b)\left(a + (n-1)b\right)}.
\]
Since $\left(2a + (n-2)b\right) I - A = M\!\left(a + (n-2)b, -b\right)$, this establishes the claim.
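Both claims are straightforward to confirm numerically, taking $M(a, b)$ to be the $n \times n$ matrix with $a$ on the diagonal and $b$ elsewhere (the form consistent with the identities above). A short sketch of ours, with arbitrary test values:

```python
import numpy as np

n = 5

def M(a, b):
    # n x n matrix with a on the diagonal and b off the diagonal
    return b * np.ones((n, n)) + (a - b) * np.eye(n)

a, b, a2, b2 = 1.7, -0.3, 0.4, 2.1   # arbitrary test values

# Claim 27: the product of two such matrices is again of the form M(., .).
lhs = M(a, b) @ M(a2, b2)
rhs = M(a * a2 + (n - 1) * b * b2, a * b2 + a2 * b + (n - 2) * b * b2)
assert np.allclose(lhs, rhs)

# Claim 28: closed-form inverse (valid since a != b and a != -(n-1)b here).
inv = M(a + (n - 2) * b, -b) / ((a - b) * (a + (n - 1) * b))
assert np.allclose(M(a, b) @ inv, np.eye(n))
```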
B.2 All optimal algorithms are shifts of one another

In this section, we characterize all optimal (deterministic) algorithms for the Alignment problem in the centralized setting. We show that each of these algorithms can be obtained from $W^\star$ by shifting all the agents by the same quantity $\lambda_t$, though we stress that the shifts $\lambda_t$ are not necessarily the same in all rounds $t$.

Theorem 29.
A deterministic algorithm is optimal in the centralized setting if and only if, for every round $t$, there exists $\lambda_t$ such that, for every $i \in \{1, \ldots, n\}$, $d\theta^{(t)}_i = \rho^{(t)}_\star Y^{(t)}_i + \lambda_t$.

Proof. We have already established that the (deterministic) weighted-average algorithm $W^\star$ is a Kalman-perfect algorithm. Therefore, by Proposition 17, any other deterministic algorithm is optimal in the centralized setting if and only if it is Kalman-perfect. In other words, it is optimal if and only if it produces a sequence of moves such that, for every round $t$, the Kalman-filter estimator operating on the corresponding process yields
\begin{align*}
\hat\theta^-_{t+1} = 0 &\iff \hat\theta_t + M_n\, d\theta^{(t)} = 0 \\
&\iff K_t Y^{(t)} + M_n\, d\theta^{(t)} = 0 \\
&\iff -\rho^{(t)}_\star M_n Y^{(t)} + M_n\, d\theta^{(t)} = 0 \\
&\iff M_n\left(-\rho^{(t)}_\star Y^{(t)} + d\theta^{(t)}\right) = 0 \\
&\iff -\rho^{(t)}_\star Y^{(t)} + d\theta^{(t)} \in \ker(M_n).
\end{align*}
Writing $\mathbb{1}$ to denote the vector whose coefficients are all equal to 1, we observe that $\mathbb{1} \in \ker(M_n)$. Since $\mathrm{rank}(M_n) = n - 1$, we have $\dim(\ker(M_n)) = 1$, so, for every round $t$,
\begin{align*}
-\rho^{(t)}_\star Y^{(t)} + d\theta^{(t)} \in \ker(M_n) &\iff \exists \lambda_t \in \mathbb{R},\; -\rho^{(t)}_\star Y^{(t)} + d\theta^{(t)} = \lambda_t \cdot \mathbb{1} \\
&\iff \exists \lambda_t \in \mathbb{R},\; d\theta^{(t)} = \lambda_t \cdot \mathbb{1} + \rho^{(t)}_\star Y^{(t)}.
\end{align*}
This concludes the proof.
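The two facts about $\ker(M_n)$ used above can be checked numerically. The sketch below assumes that $M_n = M(1, -1/(n-1))$, i.e., the linear map sending the vector of positions to the vector of stretches; this identification is our reading of the notation and is stated here as an assumption.

```python
import numpy as np

n = 6
# Assumed form of M_n (hypothetical here): the stretch operator M(1, -1/(n-1)).
M_n = -1 / (n - 1) * np.ones((n, n)) + (1 + 1 / (n - 1)) * np.eye(n)

assert np.linalg.matrix_rank(M_n) == n - 1     # rank(M_n) = n - 1
assert np.allclose(M_n @ np.ones(n), 0)        # the all-ones vector spans ker(M_n)
```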
B.3 Computing the shifts between $W^\star$ and Algorithm 1

In this section, we consider one execution of the process when $W^\star$ is used, and one execution when Algorithm 1 (meet at the center) is used. The variables involved in the execution of $W^\star$ are denoted with $[\cdot]_{W^\star}$, while the variables involved in the execution of Algorithm 1 are denoted with $[\cdot]_{\mathrm{MatC}}$.

We assume that the randomness is the same for both algorithms, that is, the initialization of the agents is the same, and, for every round $t$ and every $i \in I$, we have
\[
\left[N^{(t)}_{m,i}\right]_{W^\star} = \left[N^{(t)}_{m,i}\right]_{\mathrm{MatC}}, \quad \left[N^{(t)}_{d,i}\right]_{W^\star} = \left[N^{(t)}_{d,i}\right]_{\mathrm{MatC}} \quad \text{and} \quad \left[\theta^{(0)}_i\right]_{W^\star} = \left[\theta^{(0)}_i\right]_{\mathrm{MatC}}.
\]

Claim 30.
For every round $t$, and for every $i \in \{1, \ldots, n\}$,
• $\left[\theta^{(t)}_i\right]_{W^\star} = \left[\theta^{(t)}_i\right]_{\mathrm{MatC}}$,
• $\left[d\theta^{(t)}_i\right]_{W^\star} - \left[d\theta^{(t)}_i\right]_{\mathrm{MatC}} = \frac{1}{n}\rho^{(t)}_\star \sum_{i=1}^{n}\left[Y^{(t)}_i\right]_{W^\star} = \frac{1}{n}\rho^{(t)}_\star \sum_{i=1}^{n}\left[Y^{(t)}_i\right]_{\mathrm{MatC}}$.

Proof.
The proof proceeds by induction. More precisely, we prove the first item of the claim by induction on $t$. For any round $t$, we prove that the second item follows from the first one. Then, in the induction step, when proving that the first item holds at time $t + 1$, we use the second item at the previous time $t$.

The base case of the first item, i.e., $[\theta^{(0)}_i]_{W^\star} = [\theta^{(0)}_i]_{\mathrm{MatC}}$ for every $i \in I$, holds by assumption. Next, let us assume that, for some round $t$, we have $[\theta^{(t)}_i]_{W^\star} = [\theta^{(t)}_i]_{\mathrm{MatC}}$ for every $i \in I$. Since the measurement noises are equal, the induction hypothesis implies that the measurements are also equal, that is, for every $i \in I$,
\[
\left[Y^{(t)}_i\right]_{W^\star} = \left[\theta^{(t)}_i + N^{(t)}_{m,i}\right]_{W^\star} = \left[\theta^{(t)}_i + N^{(t)}_{m,i}\right]_{\mathrm{MatC}} = \left[Y^{(t)}_i\right]_{\mathrm{MatC}} =: Y^{(t)}_i.
\]
Thus,
\begin{align*}
\left[d\theta^{(t)}_i\right]_{W^\star} - \left[d\theta^{(t)}_i\right]_{\mathrm{MatC}} &= \rho^{(t)}_\star Y^{(t)}_i - \left(\frac{n-1}{n}\rho^{(t)}_\star Y^{(t)}_i - \frac{1}{n}\rho^{(t)}_\star\sum_{j\neq i} Y^{(t)}_j\right) && \text{(by definition)} \\
&= \frac{1}{n}\rho^{(t)}_\star Y^{(t)}_i + \frac{1}{n}\rho^{(t)}_\star \sum_{j\neq i} Y^{(t)}_j = \frac{1}{n}\rho^{(t)}_\star \sum_{i=1}^{n} Y^{(t)}_i =: \lambda_t.
\end{align*}
Finally,
\begin{align*}
\left[\theta^{(t+1)}_i\right]_{W^\star} &= \left[\theta^{(t)}_i - d\theta^{(t)}_i - N^{(t)}_{d,i} + \frac{1}{n-1}\sum_{j\neq i}\left(d\theta^{(t)}_j + N^{(t)}_{d,j}\right)\right]_{W^\star} && \text{(by Eq. (19))} \\
&= \left[\theta^{(t)}_i - N^{(t)}_{d,i}\right]_{\mathrm{MatC}} - \left[d\theta^{(t)}_i\right]_{W^\star} + \frac{1}{n-1}\sum_{j\neq i}\left(\left[d\theta^{(t)}_j\right]_{W^\star} + \left[N^{(t)}_{d,j}\right]_{\mathrm{MatC}}\right) && \text{(1st item of the claim)} \\
&= \left[\theta^{(t)}_i - \left(d\theta^{(t)}_i + \lambda_t\right) - N^{(t)}_{d,i} + \frac{1}{n-1}\sum_{j\neq i}\left(d\theta^{(t)}_j + \lambda_t + N^{(t)}_{d,j}\right)\right]_{\mathrm{MatC}} && \text{(2nd item of the claim)} \\
&= \left[\theta^{(t)}_i - d\theta^{(t)}_i - N^{(t)}_{d,i} + \frac{1}{n-1}\sum_{j\neq i}\left(d\theta^{(t)}_j + N^{(t)}_{d,j}\right)\right]_{\mathrm{MatC}} = \left[\theta^{(t+1)}_i\right]_{\mathrm{MatC}},
\end{align*}
where the last equality holds because the $\lambda_t$ terms cancel: $-\lambda_t + \frac{1}{n-1}(n-1)\lambda_t = 0$. This concludes the induction, and the proof.
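Claim 30 can be illustrated by coupling the two executions in simulation. In the sketch below (ours, with arbitrary parameters), the move of Algorithm 1 is the expression read off from the "by definition" step in the proof above, so it should be viewed as our reconstruction rather than the algorithm's authoritative statement. Both runs share the same noise realizations, and the stretch vectors must then coincide at every round:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 30
sigma_m, sigma_d, sigma_0 = 1.0, 0.5, 2.0      # arbitrary parameters

a = (n - 1) / n * sigma_m**2
b = n / (n - 1) * sigma_d**2
alpha = n * sigma_0**2 / (n - 1)               # drives rho_star(t)

def stretch(x):
    # stretch of agent i: its position minus the average position of the others
    return x - (x.sum() - x) / (n - 1)

x_w = rng.normal(0, sigma_0, n)                # positions under W*
x_m = x_w.copy()                               # positions under MatC

for t in range(T):
    rho = alpha / (sigma_m**2 + n / (n - 1) * alpha)   # rho_star(t)
    alpha = a * alpha / (alpha + a) + b                # alpha_{t+1}
    N_m = rng.normal(0, sigma_m, n)            # shared measurement noise
    N_d = rng.normal(0, sigma_d, n)            # shared drift
    Y_w = stretch(x_w) + N_m
    Y_m = stretch(x_m) + N_m
    d_w = rho * Y_w                                              # W* move
    d_m = (n - 1) / n * rho * Y_m - rho / n * (Y_m.sum() - Y_m)  # MatC move
    x_w = x_w - d_w - N_d
    x_m = x_m - d_m - N_d
    assert np.allclose(stretch(x_w), stretch(x_m))     # first item of Claim 30
    assert np.allclose(d_w - d_m, rho * Y_w.mean())    # second item of Claim 30
```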