Universal Quantile Estimation with Feedback in the Communication-Constrained Setting
Ram Rajagopal, Martin J. Wainwright
Department of Electrical Engineering and Computer Science, Department of Statistics
University of California, Berkeley
{ramr,wainwrig}@eecs.berkeley.edu

Abstract
We consider the following problem of decentralized statistical inference: given i.i.d. samples from an unknown distribution, estimate an arbitrary quantile subject to limits on the number of bits exchanged. We analyze a standard fusion-based architecture, in which each of m sensors transmits a single bit to the fusion center, which in turn is permitted to send some number k of bits of feedback. Supposing that each of m sensors receives n observations, the optimal centralized protocol yields mean-squared error decaying as O(1/[nm]). We develop and analyze the performance of various decentralized protocols in comparison to this centralized gold standard. First, we describe a decentralized protocol based on k = log(m) bits of feedback that is strongly consistent, and achieves the same asymptotic MSE as the centralized optimum. Second, we describe and analyze a decentralized protocol based on only a single bit (k = 1) of feedback. For step sizes independent of m, it achieves an asymptotic MSE of order O(1/[n√m]), whereas for step sizes decaying as 1/√m, it achieves the same O(1/[nm]) decay in MSE as the centralized optimum. Our theoretical results are complemented by simulations, illustrating the tradeoffs between these different protocols.

Keywords:
Decentralized inference; communication constraints; distributed estimation; non-parametric estimation; quantiles; sensor networks; stochastic approximation.

I. INTRODUCTION
Whereas classical statistical inference is performed in a centralized manner, many modern scientific problems and engineering systems are inherently decentralized: data are distributed, and cannot be aggregated due to various forms of communication constraints. An important example of such a decentralized system is a sensor network [6]: a set of spatially-distributed sensors collect data about the environmental state (e.g., temperature, humidity or light). Typically, these networks are based on ad hoc deployments, in which the individual sensors are low-cost, and must operate under very severe power constraints (e.g., limited battery life). In statistical terms, such communication constraints imply that the individual sensors cannot transmit the raw data; rather, they must compress or quantize the data—for instance, by reducing a continuous-valued observation to a single bit—and can transmit only this compressed representation back to the fusion center.

Portions of this work were presented at the International Symposium on Information Theory, Seattle, WA, July 2006.

November 19, 2018 DRAFT

By now, there is a rich literature in both information theory and statistical signal processing on problems of decentralized statistical inference. A number of researchers, dating back to the seminal paper of Tenney and Sandell [16], have studied the problem of hypothesis testing under communication constraints; see the survey papers [17], [18], [4], [19], [5] and references therein for overviews of this line of work. The hypothesis-testing problem has also been studied in the information theory community, where the analysis is asymptotic and Shannon-theoretic in nature [1], [11]. A parallel line of work deals with the problem of decentralized estimation. Work in signal processing typically formulates it as a quantizer design problem and considers finite-sample behavior [2], [8]; in contrast, the information-theoretic approach is asymptotic in nature, based on rate-distortion theory [20], [10]. In much of the literature on decentralized statistical inference, it is assumed that the underlying distributions are known with a specified parametric form (e.g., Gaussian).
More recent work has addressed non-parametric and data-driven formulations of these problems, in which the decision-maker is simply provided samples from the unknown distribution [14], [13], [9]. For instance, Nguyen et al. [14] established statistical consistency for non-parametric approaches to decentralized hypothesis testing based on reproducing kernel Hilbert spaces. Luo [13] analyzed a non-parametric formulation of decentralized mean estimation, in which a fixed but unknown parameter is corrupted by noise with bounded support but otherwise arbitrary distribution, and showed that decentralized approaches can achieve error rates that are order-optimal with respect to the centralized optimum.

This paper addresses a different problem in decentralized non-parametric inference—namely, that of estimating an arbitrary quantile of an unknown distribution. Since there exists no unbiased estimator based on a single sample, we consider the performance of a network of m sensors, each of which collects a total of n observations in a sequential manner. Our analysis treats the standard fusion-based architecture, in which each of the m sensors transmits information to the fusion center via a communication-constrained channel. More concretely, at each observation round, each sensor is allowed to transmit a single bit to the fusion center, which in turn is permitted to send some number k of bits of feedback. For a decentralized protocol with k = log(m) bits of feedback, we prove that the algorithm achieves the order-optimal rate of the best centralized method (i.e., one with access to the full collection of raw data). We also consider a protocol that permits only a single bit of feedback, and establish that it achieves the same rate.
This single-bit protocol is advantageous in that, for a fixed target mean-squared error of the quantile estimate, it yields longer sensor lifetimes than either the centralized or full-feedback protocols.

The remainder of the paper is organized as follows. We begin in Section II with background on quantile estimation, and optimal rates in the centralized setting. We then describe two algorithms for solving the corresponding decentralized version, based on log(m) and 1 bit of feedback respectively, and provide an asymptotic characterization of their performance. These theoretical results are complemented with empirical simulations. Section III contains the analysis of these two algorithms. In Section IV, we consider various extensions, including the case of feedback bits ℓ varying between the two extremes, and the effect of noise on the feedforward link. We conclude in Section V with a discussion.

II. PROBLEM SET-UP AND DECENTRALIZED ALGORITHMS
In this section, we begin with some background material on (centralized) quantile estimation, before introducing our decentralized algorithms, and stating our main theoretical results.
A. Centralized Quantile Estimation
We begin with classical background on the problem of quantile estimation (see Serfling [15] for further details). Given a real-valued random variable X, let F(x) := P[X ≤ x] be its cumulative distribution function (CDF), which is non-decreasing and right-continuous. For any 0 < α < 1, the αth quantile of X is defined as
\[ F^{-1}(\alpha) = \theta(\alpha) := \inf\{ x \in \mathbb{R} \mid F(x) \ge \alpha \}. \]
Moreover, if F is continuous at θ(α), then we have α = F(θ(α)). As a particular example, for α = 0.5, the associated quantile is simply the median.

Now suppose that for a fixed level α* ∈ (0, 1), we wish to estimate the quantile θ* = θ(α*). Rather than impose a particular parameterized form on F, we work in a non-parametric setting, in which we assume only that the distribution function F is differentiable, so that X has the density function p_X(x) = F′(x) (w.r.t. Lebesgue measure), and moreover that p_X(x) > 0 for all x ∈ R. In this setting, a standard estimator for θ* is the sample quantile ξ_N(α*) := F_N^{-1}(α*), where F_N denotes the empirical distribution function based on i.i.d. samples (X_1, ..., X_N). Under the conditions given above, it can be shown [15] that ξ_N(α*) is strongly consistent for θ* (i.e., ξ_N → θ* almost surely), and moreover that asymptotic normality holds:
\[ \sqrt{N}\,(\xi_N - \theta^*) \overset{d}{\to} N\Big( 0, \; \frac{\alpha^*(1-\alpha^*)}{p_X(\theta^*)^2} \Big), \tag{1} \]
Fig. 1.
Sensor network for quantile estimation with m sensors. Each sensor is permitted to transmit a 1-bit message to the fusion center; in turn, the fusion center is permitted to broadcast k bits of feedback.

so that the asymptotic MSE decreases as O(1/N), where N is the total number of samples. Although this 1/N rate is optimal, the precise form of the asymptotic variance (1) need not be optimal in general; see Zielinski [21] for an in-depth discussion of the optimal asymptotic variances that can be obtained with variants of this basic estimator under different conditions.

B. Distributed Quantile Estimation
We consider the standard network architecture illustrated in Figure 1. There are m sensors, each of which has a dedicated two-way link to a fusion center. We assume that each sensor i ∈ {1, ..., m} collects independent samples X(i) of the random variable X ∈ R with distribution function F(θ) := P[X ≤ θ]. We consider a sequential version of the quantile estimation problem, in which sensor i receives measurements X_n(i) at time steps n = 0, 1, 2, ..., and the fusion center forms an estimate θ_n of the quantile. The key condition—giving rise to the decentralized nature of the problem—is that communication between each sensor and the central processor is constrained, so that the sensor cannot simply relay its measurement X(i) to the central location, but rather must perform local computation, and then transmit a summary statistic to the fusion center. More concretely, we impose the following restrictions on the protocol. First, at each time step n = 0, 1, 2, ..., each sensor i = 1, ..., m can transmit a single bit Y_n(i) to the fusion center. Second, the fusion center can broadcast k bits back to the sensor nodes at each time step. We analyze two distinct protocols, depending on whether k = log(m) or k = 1.

C. Protocol specification
For each protocol, all sensors are initialized with some fixed θ_0. The algorithms are specified in terms of a constant K > 0 and step sizes ε_n > 0 that satisfy the conditions
\[ \sum_{n=0}^{\infty} \epsilon_n = \infty \quad \text{and} \quad \sum_{n=0}^{\infty} \epsilon_n^2 < \infty. \tag{2} \]
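The two conditions in (2) can be seen numerically for the step-size choice ε_n = 1/n adopted in the paper; the following added sketch (not from the original) computes the partial sums, which diverge and converge respectively:

```python
import math

def partial_sums(N):
    """Partial sums of eps_n = 1/n and eps_n^2 = 1/n^2 up to N terms."""
    s1 = sum(1.0 / n for n in range(1, N + 1))        # grows like log N
    s2 = sum(1.0 / n**2 for n in range(1, N + 1))     # converges to pi^2/6
    return s1, s2

s1_small, s2_small = partial_sums(1_000)
s1_big, s2_big = partial_sums(1_000_000)
# s1 keeps growing without bound, while s2 stabilizes near pi^2/6 ~ 1.6449
```

The divergent first sum is what allows the iterates to travel an unbounded total distance, while the summable squared steps keep the accumulated noise variance finite.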
The first condition ensures infinite travel (i.e., that the sequence θ_n can reach θ* from any starting condition), whereas the second condition (which implies that ε_n → 0) is required for variance reduction. A standard choice satisfying these conditions—and the one that we assume herein—is ε_n = 1/n. With this set-up, the log(m)-bit scheme consists of the steps given in Table I.

Algorithm: Decentralized quantile estimation with log(m)-bit feedback

Given K > 0 and variable step sizes ε_n > 0:
(a) Local decision: each sensor computes the binary decision
\[ Y_{n+1}(i) \equiv Y_{n+1}(i; \theta_n) := I( X_{n+1}(i) \le \theta_n ), \tag{3} \]
and transmits it to the fusion center.
(b) Parameter update: the fusion center updates its current estimate θ_{n+1} of the quantile parameter as follows:
\[ \theta_{n+1} = \theta_n + \epsilon_n K \Big( \alpha^* - \frac{1}{m} \sum_{i=1}^m Y_{n+1}(i) \Big). \tag{4} \]
(c) Feedback: the fusion center broadcasts the m received bits {Y_{n+1}(1), ..., Y_{n+1}(m)} back to the sensors. Each sensor can then compute the updated parameter θ_{n+1}.

TABLE I: Description of the log(m)-bf algorithm.

Although the most straightforward feedback protocol is to broadcast back the m received bits {Y_{n+1}(1), ..., Y_{n+1}(m)}, as described in step (c), in fact it suffices to transmit only the log(m) bits required to perfectly describe the binomial random variable Σ_{i=1}^m Y_{n+1}(i) in order to update θ_n. In either case, after the feedback step, each sensor knows the value of the sum Σ_{i=1}^m Y_{n+1}(i), which (in conjunction with knowledge of m, α* and ε_n) allows it to compute the updated parameter θ_{n+1}. Finally, knowledge of θ_{n+1} allows each sensor to then compute the local decision (3) in the following round.

The 1-bit feedback scheme detailed in Table II is similar, except that it requires broadcasting only a single bit (Z_{n+1}), and involves an extra step size parameter K_m, which is specified in the statement of Theorem 2. After the feedback step of the 1-bf algorithm, each sensor has knowledge of the aggregate decision Z_{n+1}, which (in conjunction with ε_n and the constant β) allows it to compute the updated parameter θ_{n+1}. Knowledge of this parameter suffices to compute the local decision (5).

D. Convergence results
We now state our main results on the convergence behavior of these two distributed protocols. In all cases, we assume the step size choice ε_n = 1/n. Given fixed α* ∈ (0, 1), we use θ* to denote the
Algorithm: Decentralized quantile estimation with 1-bit feedback

Given K_m > 0 (possibly depending on the number of sensors m) and variable step sizes ε_n > 0:
(a) Local decision: each sensor computes the binary decision
\[ Y_{n+1}(i) = I( X_{n+1}(i) \le \theta_n ) \tag{5} \]
and transmits it to the fusion center.
(b) Aggregate decision and parameter update: the fusion center computes the aggregate decision
\[ Z_{n+1} = I\Big( \frac{1}{m} \sum_{i=1}^m Y_{n+1}(i) \le \alpha^* \Big), \tag{6} \]
and uses it to update the parameter according to
\[ \theta_{n+1} = \theta_n + \epsilon_n K_m ( Z_{n+1} - \beta ), \tag{7} \]
where the constant β is chosen as
\[ \beta = \sum_{i=0}^{\lfloor m\alpha^* \rfloor} \binom{m}{i} (\alpha^*)^i (1-\alpha^*)^{m-i}. \tag{8} \]
(c) Feedback: the fusion center broadcasts the aggregate decision Z_{n+1} back to the sensor nodes (one bit of feedback). Each sensor can then compute the updated parameter θ_{n+1}.
TABLE II: Description of the 1-bf algorithm.

α*-level quantile (i.e., such that P(X ≤ θ*) = α*); note that our assumption of a strictly positive density guarantees that θ* is unique.

Theorem 1 (m-bit feedback): For any α* ∈ (0, 1), consider a random sequence {θ_n} generated by the m-bit feedback protocol. Then:
(a) For all initial conditions θ_0, the sequence θ_n converges almost surely to the α*-quantile θ*.
(b) Moreover, if the constant K is chosen to satisfy 2 K p_X(θ*) > 1, then
\[ \sqrt{n}\,(\theta_n - \theta^*) \overset{d}{\to} N\Big( 0, \; \frac{K^2 \alpha^*(1-\alpha^*)}{[\,2 K p_X(\theta^*) - 1\,]\, m} \Big), \tag{9} \]
so that the asymptotic MSE is O(1/(mn)).

Remarks:
After n steps of this decentralized protocol, a total of N = nm observations have been made, so that our discussion in Section II-A dictates (see equation (1)) that the optimal asymptotic MSE is O(1/(nm)). Interestingly, then, the log(m)-bit feedback decentralized protocol is order-optimal with respect to the centralized gold standard.
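As a numerical illustration of both the protocol of Table I and the variance formula (9) (an added sketch, not from the paper; the Uniform[0,1] source and all parameter values are hypothetical), one can run many replications of the update (4) and compare the empirical variance of √n(θ_n − θ*) with the theoretical prediction:

```python
import numpy as np

def mbf_final_errors(m, n_steps, alpha, K, runs, seed=0):
    """Run `runs` independent replications of the log(m)-bf update (4) in
    parallel for a Uniform[0,1] source, and return sqrt(n) * (theta_n - theta*)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(runs)
    for n in range(1, n_steps + 1):
        X = rng.uniform(0.0, 1.0, size=(runs, m))       # fresh samples, all sensors
        Ybar = (X <= theta[:, None]).mean(axis=1)       # fraction of 1-bits, eq. (3)
        theta += (K / n) * (alpha - Ybar)               # fusion-center update, eq. (4)
    return np.sqrt(n_steps) * (theta - alpha)           # theta* = alpha for uniform

errs = mbf_final_errors(m=50, n_steps=1500, alpha=0.5, K=1.0, runs=300)
empirical = errs.var()
predicted = 1.0**2 * 0.25 / ((2 * 1.0 * 1.0 - 1) * 50)  # eq. (9): K^2 a / ((2Kp-1) m)
```

For a uniform source, p_X(θ*) = 1, so the choice K = 1 satisfies the stability condition 2Kp_X(θ*) > 1, and the empirical variance should match the prediction up to Monte Carlo error.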
Before stating the analogous result for the 1-bit feedback protocol, we begin by introducing some useful notation. First, we define for any fixed θ ∈ R the random variable
\[ \bar{Y}(\theta) := \frac{1}{m} \sum_{i=1}^m Y(i; \theta) = \frac{1}{m} \sum_{i=1}^m I( X(i) \le \theta ). \]
Note that for each fixed θ, the distribution of m·Ȳ(θ) is binomial with parameters m and F(θ). It is convenient to define the function
\[ G_m(r, y) := \sum_{i=0}^{\lfloor my \rfloor} \binom{m}{i} r^i (1-r)^{m-i}, \tag{10} \]
with domain (r, y) ∈ [0,1] × [0,1]. With this notation, we have P(Ȳ(θ) ≤ y) = G_m(F(θ), y). Again, we fix an arbitrary α* ∈ (0,1) and let θ* be the associated α*-quantile satisfying P(X ≤ θ*) = α*.

Theorem 2 (1-bit feedback): Given a random sequence {θ_n} generated by the 1-bit feedback protocol, we have:
(a) For any initial condition, the sequence θ_n → θ* almost surely.
(b) Suppose that the step size K_m is chosen such that
\[ K_m > \frac{\sqrt{2\pi\,\alpha^*(1-\alpha^*)}}{2\, p_X(\theta^*) \sqrt{m}}, \]
or equivalently such that
\[ \gamma_m(\theta^*) := K_m \left| \left. \frac{\partial G_m}{\partial r}(r, \alpha^*) \right|_{r=\alpha^*} \right| p_X(\theta^*) > \frac{1}{2}; \tag{11} \]
then
\[ \sqrt{n}\,(\theta_n - \theta^*) \overset{d}{\to} N\Big( 0, \; \frac{K_m^2\, G_m(\alpha^*, \alpha^*)\,[1 - G_m(\alpha^*, \alpha^*)]}{2\gamma_m(\theta^*) - 1} \Big). \tag{12} \]
(c) If we choose a constant step size K_m = K, then as n → ∞, the asymptotic variance behaves as
\[ \frac{K^2 \sqrt{2\pi\,\alpha^*(1-\alpha^*)}}{8 K p_X(\theta^*) \sqrt{m} - 4\sqrt{2\pi\,\alpha^*(1-\alpha^*)}}, \tag{13} \]
so that the asymptotic MSE is O(1/(n√m)).
(d) If we choose a decaying step size K_m = K/√m, then the asymptotic variance behaves as
\[ \frac{1}{m} \cdot \frac{K^2 \sqrt{2\pi\,\alpha^*(1-\alpha^*)}}{8 K p_X(\theta^*) - 4\sqrt{2\pi\,\alpha^*(1-\alpha^*)}}, \tag{14} \]
so that the asymptotic MSE is O(1/(nm)).
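Theorem 2(a) can also be checked by direct simulation of Table II (an added sketch with illustrative parameters, not code from the paper); the offset β is computed exactly from equation (8), and the Uniform[0,1] source makes the true quantile θ* equal to α*:

```python
import numpy as np
from math import comb, floor

def beta_offset(m, alpha):
    """beta = P(Bin(m, alpha) <= floor(m * alpha)), eq. (8)."""
    k = floor(m * alpha)
    return sum(comb(m, i) * alpha**i * (1 - alpha)**(m - i) for i in range(k + 1))

def run_1bf(m, n_steps, alpha, Km=1.0, theta0=0.0, seed=0):
    """Simulate the 1-bf protocol of Table II for a Uniform[0,1] source."""
    rng = np.random.default_rng(seed)
    beta = beta_offset(m, alpha)
    theta = theta0
    for n in range(1, n_steps + 1):
        X = rng.uniform(0.0, 1.0, size=m)
        Z = float((X <= theta).mean() <= alpha)   # aggregate decision, eq. (6)
        theta += (Km / n) * (Z - beta)            # fusion-center update, eq. (7)
    return theta

theta_hat = run_1bf(m=25, n_steps=4000, alpha=0.5)
# theta_hat should settle near the true median theta* = 0.5
```

Note that K_m = 1 comfortably exceeds the stability threshold of Theorem 2(b) for these values, since p_X(θ*) = 1 for the uniform source.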
E. Comparative Analysis
It is interesting to compare the performance of each proposed decentralized algorithm to the centralized performance. Considering first the log(m)-bf scheme, suppose that we set K = 1/p_X(θ*). Using the formula (9) from Theorem 1, we obtain that the asymptotic variance of the m-bf scheme with this choice of K is given by α*(1−α*)/[p_X(θ*)² mn], thus matching the asymptotics of the centralized quantile estimator (1). In fact, it can be shown that the choice K = 1/p_X(θ*) is optimal in the sense of minimizing the asymptotic variance for our scheme, when K is constrained by the stability criterion in Theorem 1. In practice, however, the value p_X(θ*) is typically not known, so that it may not be possible to implement exactly this scheme. An interesting question is whether an adaptive scheme could be used to estimate p_X(θ*) (and hence the optimal K simultaneously), thereby achieving this optimal asymptotic variance. We leave this question open as an interesting direction for future work.

Turning now to the algorithm 1-bf, if we make the substitution K̄ = K/√(2πα*(1−α*)) in equation (14), then we obtain the asymptotic variance
\[ \frac{\pi}{2} \cdot \frac{\bar{K}^2 \alpha^*(1-\alpha^*)}{[\,2\bar{K} p_X(\theta^*) - 1\,]\, m}. \tag{15} \]
Since the stability criterion is the same as that for m-bf, the optimal choice is K̄ = 1/p_X(θ*). Consequently, while the O(1/[mn]) rate is the same as both the centralized and decentralized m-bf protocols, the pre-factor for the 1-bf algorithm is π/2 ≈ 1.57 times larger than that of the optimized m-bf scheme. However, despite this loss in the pre-factor, the 1-bf protocol has substantial advantages over the m-bf; in particular, the network lifetime scales as O(m) compared to O(m/log(m)) for the log(m)-bf scheme.

F. Simulation example
We now provide some simulation results in order to illustrate the two decentralized protocols, and the agreement between theory and practice. In particular, we consider the quantile estimation problem when the underlying distribution (which, of course, is unknown to the algorithm) is uniform on [0, 1]. In this case, we have p_X(x) = 1 uniformly for all x ∈ [0, 1], so that taking the constant K = 1 ensures that the stability conditions in both Theorems 1 and 2 are satisfied. We simulate the behavior of both algorithms for a fixed quantile level α* over a range of choices for the network size m. Figure 2(a) illustrates several sample paths of the m-bit feedback protocol, showing the convergence to the correct θ*.

For comparison to our theory, we measure the empirical variance by averaging the error ê_n = √n(θ_n − θ*) over L = 20 runs. The normalization by √n is used to isolate the effect of increasing m, the number of nodes in the network. We estimate the variance by running the algorithm for n = 2000 steps, and computing the empirical variance of ê_n for time steps n = 1800 through to n = 2000. Figure 2(b) shows these empirically computed variances, and a comparison to the theoretical predictions of Theorems 1 and 2 for constant step size; note the excellent agreement between theory and practice. Panel (c) shows the comparison between the log(m)-bf algorithm, and the 1-bf algorithm with decaying 1/√m step size. Here the asymptotic MSE of both algorithms decays like 1/m for log(m) up to a certain size; after this point, our fixed choice of n is insufficient to reveal the asymptotic behavior.

Fig. 2. (a) Convergence of θ_n to θ* with m = 11 nodes. (b) Log-log plots of the variance against m for both algorithms (log(m)-bf and 1-bf) with constant step sizes, and comparison to the theoretically-predicted rate (solid straight lines). (c) Log-log plots of log(m)-bf with constant step size versus the 1-bf algorithm with decaying step size.

III. ANALYSIS
In this section, we turn to the proofs of Theorems 1 and 2, which exploit results from the stochastic approximation literature [12], [3]. In particular, both types of parameter updates (4) and (7) can be written in the general form
\[ \theta_{n+1} = \theta_n + \epsilon_n H(\theta_n, Y_{n+1}), \tag{16} \]
where Y_{n+1} = (Y_{n+1}(1), ..., Y_{n+1}(m)). Note that the step size choice ε_n = 1/n satisfies the conditions in equation (2). Moreover, the sequence (θ_n, Y_{n+1}) is Markov, since θ_n and Y_{n+1} depend on the past only via θ_{n−1} and Y_n. We begin by stating some known results from stochastic approximation, applicable to such Markov sequences, that will be used in our analysis.

For each fixed θ ∈ R, let μ_θ(·) denote the distribution of Y conditioned on θ. A key quantity in the analysis of stochastic approximation algorithms is the averaged function
\[ h(\theta) := \int H(\theta, y)\, \mu_\theta(dy) = E[ H(\theta, Y) \mid \theta ]. \tag{17} \]
We assume (as is true for our cases) that this expectation exists. Now the differential equation method dictates that under suitable conditions, the asymptotic behavior of the update (16) is determined essentially by the behavior of the ODE dθ/dt = h(θ(t)).

Almost sure convergence:
Suppose that the following attractiveness condition
\[ h(\theta)\,[\theta - \theta^*] < 0 \quad \text{for all } \theta \ne \theta^* \tag{18} \]
is satisfied. If, in addition, the variance R(θ) := Var[H(θ; Y) | θ] is bounded, then we are guaranteed that θ_n → θ* almost surely (see the stochastic approximation references [12], [3]).

Asymptotic normality:
In our updates, the random variables Y_{n+1} take the form Y_{n+1} = g(X_{n+1}, θ_n), where the X_n are i.i.d. random variables. Suppose that the following stability condition is satisfied:
\[ \gamma(\theta^*) := -\frac{dh}{d\theta}(\theta^*) > \frac{1}{2}. \tag{19} \]
Then we have
\[ \sqrt{n}\,(\theta_n - \theta^*) \overset{d}{\to} N\Big( 0, \; \frac{R(\theta^*)}{2\gamma(\theta^*) - 1} \Big). \tag{20} \]
(See the stochastic approximation references [12], [3].)

A. Proof of Theorem 1

(a) The m-bit feedback algorithm is a special case of the general update (16), with ε_n = 1/n and H(θ_n, Y_{n+1}) = K[α* − (1/m)Σ_{i=1}^m Y_{n+1}(i; θ_n)]. Computing the averaged function (17), we have
\[ h(\theta) = K\, E\Big[ \alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i) \,\Big|\, \theta_n = \theta \Big] = K\,( \alpha^* - F(\theta) ), \]
where F(θ) = P(X ≤ θ). We then observe that θ* satisfies the attractiveness condition (18), since
\[ [\theta - \theta^*]\, h(\theta) = K\, [\theta - \theta^*]\,[\alpha^* - F(\theta)] < 0 \quad \text{for all } \theta \ne \theta^*, \]
by the monotonicity of the cumulative distribution function. Finally, we compute the conditional variance of H as follows:
\[ R(\theta) = K^2\, \mathrm{Var}\Big[ \alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i) \,\Big|\, \theta \Big] = \frac{K^2}{m}\, F(\theta)\,[1 - F(\theta)] \le \frac{K^2}{m}, \tag{21} \]
using the fact that Σ_i Y_{n+1}(i) is a sum of m Bernoulli variables that are conditionally i.i.d. (given θ_n). Thus, we can conclude that θ_n → θ* almost surely.

(b) Note that γ(θ*) = −(dh/dθ)(θ*) = K p_X(θ*) > 1/2 under the assumption of the theorem, so that the stability condition (19) holds. Applying the asymptotic normality result (20) with the variance R(θ*) = (K²/m) α*(1−α*) (computed from equation (21)) yields the claim. ∎

B. Proof of Theorem 2
This argument involves additional analysis, due to the aggregate decision (6) taken by the fusion center. Since the decision Z_{n+1} is a Bernoulli random variable, we begin by computing its parameter. Each transmitted bit Y_{n+1}(i) is Ber(F(θ_n)), where we recall the notation F(θ) := P(X ≤ θ). Using the definition (10), we have the equivalences
\[ P(Z_{n+1} = 1) = G_m(F(\theta_n), \alpha^*) \tag{22a} \]
\[ \beta = G_m(\alpha^*, \alpha^*) = G_m(F(\theta^*), \alpha^*). \tag{22b} \]
We start with the following result:

Lemma 1:
For fixed x ∈ [0, 1], the function f(r) := G_m(r, x) is non-negative, differentiable and monotonically decreasing.

Proof:
Non-negativity and differentiability are immediate. To establish monotonicity, note that f(r) = P(Σ_{i=1}^m Y_i ≤ xm), where the Y_i are i.i.d. Ber(r) variates. Consider a second Ber(r′) sequence Y′_i with r′ > r. Then the sum Σ_{i=1}^m Y′_i stochastically dominates Σ_{i=1}^m Y_i, so that f(r′) < f(r), as required. ∎

To establish almost sure convergence, we use a similar approach as in the previous theorem. Using the equivalences (22), we compute the function h as follows:
\[ h(\theta) = K_m\, E[ Z_{n+1} - \beta \mid \theta ] = K_m\, [ G_m(F(\theta), \alpha^*) - G_m(F(\theta^*), \alpha^*) ]. \]
Next we establish the attractiveness condition (18). In particular, for any θ such that F(θ) ≠ F(θ*), we calculate that h(θ)[θ − θ*] is given by
\[ K_m \big\{ G_m(F(\theta), \alpha^*) - G_m(F(\theta^*), \alpha^*) \big\}\, [\theta - \theta^*] < 0, \]
where the inequality follows from the fact that G_m(r, x) is monotonically decreasing in r for each fixed x ∈ [0, 1] (using Lemma 1), and that the function F is monotonically increasing. Finally, computing the variance R(θ) := Var[H(θ, Y) | θ], we have
\[ R(\theta) = K_m^2\, G_m(F(\theta), \alpha^*)\,[1 - G_m(F(\theta), \alpha^*)] \le K_m^2, \]
since (conditioned on θ) the decision Z_{n+1} is Bernoulli with parameter G_m(F(θ), α*). Thus, we can conclude that θ_n → θ* almost surely.

(b) To show asymptotic normality, we need to verify the stability condition. By the chain rule, we have
\[ \frac{dh}{d\theta}(\theta^*) = K_m \left. \frac{\partial G_m}{\partial r}(r, \alpha^*) \right|_{r = F(\theta^*)} p_X(\theta^*). \]
From Lemma 1, we have ∂G_m/∂r(F(θ*), α*) < 0, so that the stability condition holds as long as γ_m(θ*) > 1/2 (where γ_m is defined in the statement of the theorem). Thus, asymptotic normality holds.

In order to compute the asymptotic variance, we need to investigate the behavior of R(θ*) and γ_m(θ*) as m → +∞.
First examining R(θ*), the central limit theorem guarantees that
\[ G_m(F(\theta^*), y) \to \Phi\Big( \sqrt{m}\, \frac{y - \alpha^*}{\sqrt{\alpha^*(1-\alpha^*)}} \Big). \]
Consequently, we have
\[ R(\theta^*) = K_m^2\, G_m(F(\theta^*), \alpha^*)\,[1 - G_m(F(\theta^*), \alpha^*)] \to \frac{K_m^2}{4}. \]
We now turn to the behavior of γ_m(θ*). We first prove a lemma to characterize the asymptotic behavior of G_m(r, α*):

Lemma 2: (a) The partial derivative of G_m(r, x) with respect to r is given by
\[ \frac{\partial G_m(r, x)}{\partial r} = \frac{E[ X\, I(X \le xm) ] - E[X]\, E[ I(X \le xm) ]}{r(1-r)}, \tag{23} \]
where X is binomial with parameters (m, r), and mean E[X] = rm.
(b) Moreover, as m → +∞, we have
\[ \left. \frac{\partial G_m(r, \alpha^*)}{\partial r} \right|_{r = \alpha^*} \to -\sqrt{\frac{m}{2\pi\,\alpha^*(1-\alpha^*)}}. \]

Proof: (a) Computing the partial derivative, we have
\[ \frac{\partial G_m(r, x)}{\partial r} = \sum_{i=0}^{\lfloor mx \rfloor} \binom{m}{i} \big[\, i\, r^{i-1}(1-r)^{m-i} - (m-i)\, r^i (1-r)^{m-i-1} \,\big] = \frac{1}{r(1-r)} \sum_{i=0}^{\lfloor mx \rfloor} \binom{m}{i} (i - mr)\, r^i (1-r)^{m-i} \]
\[ = \frac{1}{r(1-r)} \Big( \sum_{i=0}^{\lfloor mx \rfloor} i \binom{m}{i} r^i (1-r)^{m-i} - mr \sum_{i=0}^{\lfloor mx \rfloor} \binom{m}{i} r^i (1-r)^{m-i} \Big) = \frac{1}{r(1-r)} \big( E[ X\, I(X \le mx) ] - E[X]\, E[ I(X \le mx) ] \big), \]
as claimed.
(b) We derive this limiting behavior by applying classical asymptotics to the form of ∂G_m(r, α*)/∂r given in part (a). Defining Z_m = (X − α*m)/√m, the central limit theorem yields that
\[ Z_m \overset{d}{\to} Z \sim N(0, a), \qquad a := \alpha^*(1-\alpha^*). \tag{24} \]
Moreover, in this binomial case, we actually have E[|Z_m|] → E[|Z|] = √(2a/π).

First, since E[X] = α*m and E[I(X ≤ α*m)] → 1/2 by the CLT, we have
\[ E[X]\, E[ I(X \le \alpha^* m) ] \to \frac{\alpha^* m}{2}. \tag{25} \]
Let us now re-write the first term in the representation (23) of ∂G_m(r, α*)/∂r as
\[ E[ X\, I(X \le \alpha^* m) ] = \alpha^* m\, E[ I(X \le \alpha^* m) ] + \sqrt{m}\, E[ Z_m\, I(Z_m \le 0) ] \to \frac{\alpha^* m}{2} - \sqrt{m}\,\sqrt{\frac{a}{2\pi}}, \tag{26} \]
since E[I(X ≤ α*m)] → 1/2 and
\[ E[ Z_m\, I(Z_m \le 0) ] \to E[ Z\, I(Z \le 0) ] = -\tfrac{1}{2} E[|Z|] = -\sqrt{\frac{a}{2\pi}}. \]
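Both Lemma 1 and the limit in Lemma 2(b) can be sanity-checked numerically from definition (10) and the covariance form (23); the following added sketch (with arbitrary illustrative values of m and α*) computes the binomial pmf in log space for stability:

```python
from math import lgamma, log, exp, pi, sqrt, floor

def binom_pmf(m, r, i):
    """Binomial pmf computed via log-gamma for numerical stability."""
    return exp(lgamma(m + 1) - lgamma(i + 1) - lgamma(m - i + 1)
               + i * log(r) + (m - i) * log(1 - r))

def G(m, r, x):
    """G_m(r, x) = P(Bin(m, r) <= floor(m x)), eq. (10)."""
    return sum(binom_pmf(m, r, i) for i in range(floor(m * x) + 1))

def dG_dr(m, r, x):
    """Exact derivative via the covariance form, eq. (23), X ~ Bin(m, r)."""
    k = floor(m * x)
    pmf = [binom_pmf(m, r, i) for i in range(m + 1)]
    EI = sum(pmf[: k + 1])                        # E[1{X <= mx}]
    EXI = sum(i * pmf[i] for i in range(k + 1))   # E[X 1{X <= mx}]
    return (EXI - m * r * EI) / (r * (1 - r))

vals = [G(30, r, 0.4) for r in (0.2, 0.4, 0.6, 0.8)]   # decreasing in r (Lemma 1)

alpha, m = 0.5, 400
exact = dG_dr(m, alpha, alpha)
limit = -sqrt(m / (2 * pi * alpha * (1 - alpha)))       # Lemma 2(b)
# exact and limit agree to within a few percent at this m
```

The agreement improves as m grows, consistent with the O(1/√m) discreteness corrections hidden in the CLT approximation.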
Putting together the limits (25) and (26), we conclude that ∂G_m(r, α*)/∂r |_{r=α*} converges to
\[ \frac{1}{\alpha^*(1-\alpha^*)} \left[ \left( \frac{\alpha^* m}{2} - \sqrt{m}\,\sqrt{\frac{\alpha^*(1-\alpha^*)}{2\pi}} \right) - \frac{\alpha^* m}{2} \right] = -\sqrt{\frac{m}{2\pi\,\alpha^*(1-\alpha^*)}}, \]
as claimed. ∎

Returning now to the proof of the theorem, we use Lemma 2 and put the pieces together to obtain that
\[ \frac{R(\theta^*)}{2 K_m \left| \left. \frac{\partial G_m}{\partial r}(r, \alpha^*) \right|_{r=\alpha^*} \right| p_X(\theta^*) - 1} \;\to\; \frac{K_m^2/4}{2 K_m \sqrt{\frac{m}{2\pi\,\alpha^*(1-\alpha^*)}}\; p_X(\theta^*) - 1}, \]
which, for the decaying step size K_m = K/√m, equals
\[ \frac{1}{m} \cdot \frac{K^2 \sqrt{2\pi\,\alpha^*(1-\alpha^*)}}{8 K p_X(\theta^*) - 4\sqrt{2\pi\,\alpha^*(1-\alpha^*)}}, \]
with K > √(2πα*(1−α*)) / (2 p_X(θ*)) for stability, thus completing the proof of the theorem. ∎

IV. SOME EXTENSIONS
In this section, we consider some extensions of the algorithms and analysis from the preceding sections, including variations in the number of feedback bits, and the effects of noise.
A. Different levels of feedback
We first consider the generalization of the preceding analysis to the case when the fusion center feeds back some number of bits between 1 and m. The basic idea is to apply a quantizer with 2ℓ levels, corresponding to log(2ℓ) bits, on the update of the stochastic gradient algorithm. Note that the extremes ℓ = 1 and ℓ ≍ m/2 correspond to the previously studied protocols. Given 2ℓ levels, we partition the real line as
\[ -\infty = s_{-\ell} < s_{-\ell+1} < \ldots < s_{\ell-1} < s_\ell = +\infty, \tag{27} \]
where the remaining breakpoints {s_k} are to be specified. With this partition fixed, we define a quantization function Q_ℓ by
\[ Q_\ell(X) := r_k \quad \text{if } X \in (s_k, s_{k+1}] \;\text{ for } k = -\ell, \ldots, \ell-1, \tag{28} \]
where the 2ℓ quantized values (r_{−ℓ}, ..., r_{ℓ−1}) are to be chosen. In the setting of the algorithm to be proposed, the quantizer is applied to binomial random variables X with parameters (m, r). Recall the function G_m(r, x), as defined in equation (10), corresponding to the probability P[X ≤ mx]. Let us define a new function G_{m,ℓ}, corresponding to the expected value of the quantizer when applied to such a binomial variate, as follows:
\[ G_{m,\ell}(r, x) := \sum_{k=-\ell}^{\ell-1} r_k \big\{ G_m(r, x - s_k) - G_m(r, x - s_{k+1}) \big\}. \tag{29} \]
With these definitions, the general log(2ℓ)-bit feedback algorithm takes the form shown in Table III. In order to understand the choice of the offset parameter β defined in equation (33), we compute the expected value of the quantizer function, when θ_n = θ*, as follows:
\[ E\Big[ Q_\ell\Big( \alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i) \Big) \,\Big|\, \theta_n = \theta^* \Big] = \sum_{k=-\ell}^{\ell-1} r_k\, P\big[ (\alpha^* - s_{k+1}) < \bar{Y}(\theta^*) \le (\alpha^* - s_k) \big] \]
\[ = \sum_{k=-\ell}^{\ell-1} r_k \big[ G_m(F(\theta^*), \alpha^* - s_k) - G_m(F(\theta^*), \alpha^* - s_{k+1}) \big] = G_{m,\ell}(F(\theta^*), \alpha^*). \]
The following result, analogous to Theorem 2, characterizes the behavior of this general protocol:
Theorem 3 (General feedback scheme):
Given a random sequence {θ_n} generated by the general log(2ℓ)-bit feedback protocol, there exist choices of partition {s_k} and quantization levels {r_k} such that:
(a) For any initial condition, the sequence θ_n → θ* almost surely.
Algorithm: Decentralized quantile estimation with log(2ℓ)-bit feedback

Given K_m > 0 (possibly depending on the number of sensors m) and variable step sizes ε_n > 0:
(a) Local decision: each sensor computes the binary decision
\[ Y_{n+1}(i) = I( X_{n+1}(i) \le \theta_n ) \tag{30} \]
and transmits it to the fusion center.
(b) Aggregate decision and parameter update: the fusion center computes the quantized aggregate decision variable
\[ Z_{n+1} = Q_\ell\Big( \alpha^* - \frac{1}{m}\sum_{i=1}^m Y_{n+1}(i) \Big), \tag{31} \]
and uses it to update the parameter according to
\[ \theta_{n+1} = \theta_n + \epsilon_n K_m ( Z_{n+1} - \beta ), \tag{32} \]
where the constant β is chosen as
\[ \beta := G_{m,\ell}(F(\theta^*), \alpha^*). \tag{33} \]
(c) Feedback: the fusion center broadcasts the aggregate quantized decision Z_{n+1} back to the sensor nodes, using its log(2ℓ) bits of feedback. The sensor nodes can then compute the updated parameter θ_{n+1}.
TABLE III: Description of the general algorithm, with log(2ℓ) bits of feedback.

(b) There exists a choice of decaying step size (i.e., K_m ≍ 1/√m) such that the asymptotic variance of the protocol is given by κ(α*, Q_ℓ)/(mn), where the constant has the form
\[ \kappa(\alpha^*, Q_\ell) := \frac{2\pi \Big( \sum_{k=-\ell}^{\ell-1} r_k^2\, \Delta G_m(s_k, s_{k+1}) - \beta^2 \Big)}{\Big( \sum_{k=-\ell}^{\ell-1} r_k\, \Delta_m(s_k, s_{k+1}) \Big)^2}, \tag{34} \]
with
\[ \Delta G_m(s_k, s_{k+1}) = G_m(F(\theta^*), \alpha^* - s_k) - G_m(F(\theta^*), \alpha^* - s_{k+1}), \quad \text{and} \tag{35a} \]
\[ \Delta_m(s_k, s_{k+1}) = \exp\Big( -\frac{m s_k^2}{2\alpha^*(1-\alpha^*)} \Big) - \exp\Big( -\frac{m s_{k+1}^2}{2\alpha^*(1-\alpha^*)} \Big). \tag{35b} \]
We provide a formal proof of Theorem 3 in the Appendix. Figure 3(a) illustrates how the constant factor κ, as defined in equation (34), decreases as the number of levels ℓ in a uniform quantizer is increased. In order to provide a comparison with results from the previous section, let us see how the two extreme cases (1-bit and m-bit feedback) can be obtained as special cases. For the 1-bit case, the quantizer has 2ℓ = 2 levels with breakpoints s_{−1} = −∞, s_0 = 0, s_1 = +∞, and quantizer outputs r_{−1} = 0 and r_0 = 1. By making the appropriate substitutions, we obtain:
\[ \kappa(\alpha^*, Q_1) = \frac{2\pi \big( \Delta G_m(s_0, s_1) - \beta^2 \big)}{\big( \Delta_m(s_0, s_1) \big)^2}, \]
with β = G_{m,ℓ}(F(θ*), α*), ΔG_m(s_0, s_1) = G_{m,ℓ}(F(θ*), α*) and Δ_m(s_0, s_1) = 1. By applying the central limit theorem, we conclude that
\[ \Delta G_m(s_0, s_1) - \beta^2 = G_{m,\ell}(F(\theta^*), \alpha^*)\,\big( 1 - G_{m,\ell}(F(\theta^*), \alpha^*) \big) \to 1/4, \]
as established earlier. Thus κ(α*, Q_1) → π/2 as m → ∞, recovering the result of Theorem 2. Similarly, the results for m-bf can be recovered by setting the parameters
\[ r_{k-\ell} = \alpha^* - \frac{m-k}{m}, \;\text{ for } k = 0, \ldots, m, \quad \text{and} \quad s_i = r_i. \tag{36} \]
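To make the construction concrete, the following added sketch (with assumed, illustrative parameter choices, not the paper's) implements the quantizer (28) and verifies numerically that the one-bit specialization of (34) recovers κ(α*, Q_1) → π/2:

```python
from math import lgamma, log, exp, pi, floor, inf

def binom_cdf(m, r, k):
    """P(Bin(m, r) <= k), computed in log space for numerical stability."""
    total = 0.0
    for i in range(k + 1):
        total += exp(lgamma(m + 1) - lgamma(i + 1) - lgamma(m - i + 1)
                     + i * log(r) + (m - i) * log(1 - r))
    return total

def make_quantizer(breaks, values):
    """Q_l(x) = r_k for x in (s_k, s_{k+1}]; len(breaks) = len(values) + 1, eq. (28)."""
    def Q(x):
        for s_lo, s_hi, r in zip(breaks, breaks[1:], values):
            if s_lo < x <= s_hi:
                return r
        raise ValueError("x not covered by the partition")
    return Q

# One-bit special case: breakpoints (-inf, 0, +inf), outputs (0, 1).
Q1 = make_quantizer([-inf, 0.0, inf], [0.0, 1.0])

# kappa(alpha*, Q_1) = 2*pi*(beta - beta^2) / 1, with beta = G_m(alpha*, alpha*):
m, alpha = 2000, 0.3
beta = binom_cdf(m, alpha, floor(m * alpha))
kappa_1bit = 2 * pi * (beta - beta**2)
# kappa_1bit approaches pi/2 as m grows, recovering Theorem 2
```

Since β → 1/2 by the central limit theorem, β − β² → 1/4 and the rescaled constant approaches π/2, matching the 1-bf pre-factor of Section II-E.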
Fig. 3. (a) Plots of the asymptotic variance $\kappa(\alpha^*, Q_\ell)$ defined in equation (34) versus the number of levels $\ell$ in a uniform quantizer, corresponding to $\log(2\ell)$ bits of feedback, for a sensor network with $m = 4000$ nodes. The plots show the asymptotic variance rescaled by the centralized gold standard, so that it starts at $\pi/2$ for $\ell = 1$, and decreases towards $1$ as $\ell$ is increased towards $m/2$. (b) Plots of the asymptotic variances $V_m(\epsilon)$ and $V_1(\epsilon)$ defined in equation (39) as the feedforward noise parameter $\epsilon$ is increased from $0$ towards $1/2$.

B. Extensions to noisy links
We now briefly consider the effect of communication noise on our algorithms. There are two types of noise to consider: (a) feedforward, meaning noise in the links from the sensor nodes to the fusion center, and (b) feedback, meaning noise in the feedback link from the fusion center to the sensor nodes. Here we show that feedforward noise can be handled in a relatively straightforward way in our algorithmic framework. On the other hand, feedback noise requires a different analysis, as the different sensors may lose synchronicity in their updating procedure. Although a thorough analysis of such asynchronicity is an interesting topic for future research, we note that assuming noiseless feedback is not unreasonable, since the fusion center typically has greater transmission power.

Focusing then on the case of feedforward noise, let us assume that the link between each sensor and the fusion center acts as a binary symmetric channel (BSC) with crossover probability $\epsilon \in [0, 1/2)$. More precisely, if a bit $x \in \{0, 1\}$ is transmitted, then the received bit $y$ has the (conditional) distribution
$$P(y \mid x) = \begin{cases} 1 - \epsilon & \text{if } x = y \\ \epsilon & \text{if } x \neq y. \end{cases} \tag{37}$$
With this bit-flipping noise, the updates (both equations (4) and (7)) need to be modified so as to correct for the bias introduced by the channel noise. If $\alpha^*$ denotes the desired quantile, then in the presence of BSC($\epsilon$) noise, both algorithms should be run with the modified parameter
$$\tilde{\alpha}(\epsilon) := (1 - 2\epsilon)\alpha^* + \epsilon. \tag{38}$$
Note that $\tilde{\alpha}(\epsilon)$ ranges from $\alpha^*$ (for the noiseless case $\epsilon = 0$) to a quantity arbitrarily close to $1/2$, as the channel approaches the extreme of pure noise ($\epsilon = 1/2$). The following result shows that for all $\epsilon < 1/2$, the adjustment (38) suffices to correct the algorithm. Moreover, it specifies how the resulting asymptotic variance depends on the noise parameter:

Proposition 1:
Suppose that each of the $m$ feedforward links from sensor to fusion center is modeled as an i.i.d. BSC with crossover probability $\epsilon \in [0, 1/2)$. Then the $m$-bf and 1-bf algorithms, run with the adjusted $\tilde{\alpha}(\epsilon)$, are strongly consistent in computing the $\alpha^*$-quantile. Moreover, with appropriate step size choices, their asymptotic MSEs scale as $1/(mn)$, with respective pre-factors given by
$$V_m(\epsilon) := \frac{K^2 \, \tilde{\alpha}(\epsilon)(1 - \tilde{\alpha}(\epsilon))}{2K(1 - 2\epsilon)\, p_X(\theta^*) - 1}, \tag{39a}$$
$$V_1(\epsilon) := \frac{K^2 \sqrt{2\pi \, \tilde{\alpha}(\epsilon)(1 - \tilde{\alpha}(\epsilon))}}{8K(1 - 2\epsilon)\, p_X(\theta^*) - 4\sqrt{2\pi \, \tilde{\alpha}(\epsilon)(1 - \tilde{\alpha}(\epsilon))}}. \tag{39b}$$
In both cases, the asymptotic MSE is minimal for $\epsilon = 0$.

Proof:
If sensor node $i$ transmits a bit $Y_{n+1}(i)$ at round $n+1$, then the fusion center receives the random variable
$$\tilde{Y}_{n+1}(i) = Y_{n+1}(i) \oplus W_{n+1}(i),$$
where $W_{n+1}(i)$ is Bernoulli with parameter $\epsilon$, and $\oplus$ denotes addition modulo two. Since $W_{n+1}(i)$ is independent of the transmitted bit (which is Bernoulli with parameter $F(\theta_n)$), the received value $\tilde{Y}_{n+1}(i)$ is also Bernoulli, with parameter
$$\epsilon * F(\theta_n) = \epsilon(1 - F(\theta_n)) + (1 - \epsilon) F(\theta_n) = \epsilon + (1 - 2\epsilon) F(\theta_n). \tag{40}$$
Consequently, if we set $\tilde{\alpha}(\epsilon)$ according to equation (38), both algorithms have their unique fixed point when $F(\theta) = \alpha^*$, and so compute the $\alpha^*$-quantile of $X$. The claimed form of the asymptotic variances follows by performing calculations analogous to the proofs of Theorems 1 and 2. In particular, the partial derivative with respect to $\theta$ now has a multiplicative factor $(1 - 2\epsilon)$, arising from equation (40) and the chain rule. To establish that the asymptotic variance is minimized at $\epsilon = 0$, it suffices to note that the derivative of the MSE with respect to $\epsilon$ is positive, so that it is an increasing function of $\epsilon$. $\blacksquare$

Of course, both algorithms fail, as would be expected, when $\epsilon = 1/2$, corresponding to pure noise. However, as summarized in Proposition 1, as long as $\epsilon < 1/2$, feedforward noise does not affect the asymptotic rate itself, but only the pre-factor in front of the $1/(mn)$ rate. Figure 3(b) shows how the asymptotic variances $V_m(\epsilon)$ and $V_1(\epsilon)$ behave as $\epsilon$ is increased towards $\epsilon = 1/2$.

V. DISCUSSION
In this paper, we have proposed and analyzed different approaches to the problem of decentralized quantile estimation under communication constraints. Our analysis focused on the fusion-centric architecture, in which a set of $m$ sensor nodes each collect an observation at each time step. After $n$ rounds of this process, a centralized oracle would be able to estimate an arbitrary quantile with mean-squared error of the order $O(1/(mn))$. In the decentralized formulation considered here, each sensor node is allowed to transmit only a single bit of information to the fusion center. We then considered a range of decentralized algorithms, indexed by the number of feedback bits that the fusion center is allowed to transmit back to the sensor nodes. In the simplest case, we showed that a $\log m$-bit feedback algorithm achieves the same asymptotic variance $O(1/(mn))$ as the centralized estimator. More interestingly, we also showed that a 1-bit feedback scheme, with suitably designed step sizes, can achieve the same asymptotic variance as the centralized oracle. We also showed that using intermediate amounts of feedback (between $1$ and $\log m$ bits) does not alter the scaling behavior, but improves the constant. Finally, we showed how our algorithm can be adapted to the case of noise in the feedforward links from sensor nodes to the fusion center, and quantified the resulting effect on the asymptotic variance.
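The feedforward-noise correction summarized above is easy to check in simulation. The sketch below is illustrative only: it uses an $m$-bf-style update $\theta \leftarrow \theta + (\mathrm{gain}/n)(\mathrm{target} - \bar{Y})$ as a plausible stand-in for equation (4) (which lies outside this excerpt), standard normal data, and hypothetical parameter choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def adjusted_target(alpha, eps):
    """Bias correction of equation (38): a received bit is Bernoulli with
    parameter eps + (1 - 2*eps)*F(theta), so aiming for this value makes
    the fixed point F(theta) = alpha."""
    return (1 - 2 * eps) * alpha + eps

def quantile_through_bsc(target, eps, m, n_rounds, sample, gain=3.0):
    """Stochastic-approximation quantile update driven by bits observed
    through independent BSC(eps) uplinks."""
    theta = 0.0
    for n in range(1, n_rounds + 1):
        y = (sample(m) <= theta).astype(float)  # transmitted bits
        flips = rng.random(m) < eps             # BSC(eps) bit flips
        y_recv = np.where(flips, 1.0 - y, y)    # received bits
        theta += (gain / n) * (target - y_recv.mean())
    return theta
```

Running with the naive target $\alpha^*$ under BSC($\epsilon$) noise converges to the wrong point $F^{-1}((\alpha^* - \epsilon)/(1 - 2\epsilon))$, whereas running with `adjusted_target(alpha, eps)` recovers the true $\alpha^*$-quantile, as Proposition 1 asserts.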
Our analysis in the current paper has focused only on the fusion center architecture illustrated inFigure 1. A natural generalization is to consider a more general communication network, specified by anundirected graph on the sensor nodes. One possible formulation is to allow only pairs of sensor nodesconnected by an edge in this communication graph to exchange a bit of information at each round. Inthis framework, the problem considered in this paper effectively corresponds to the complete graph, inwhich every node communicates with every other node at each round. This more general formulationraises interesting questions as to the effect of graph topology on the achievable rates and asymptoticvariances.
Acknowledgements
We would like to thank Prof. Pravin Varaiya for discussions that led to the initial ideas on this subject. RR was supported by the California Department of Transportation through the California PATH program. MJW was partially supported by NSF grant DMS-0605165 and an NSF CAREER award CCF-0545862.

APPENDIX
Proof of Theorem 3:
We proceed in an analogous manner to the proof of Theorem 1:
Lemma 3:
For fixed $x \in [0, 1]$, the function $G_{m,\ell}(r, x)$ is non-negative, differentiable, and monotonically decreasing in $r$.

Proof:
First notice that by definition,
$$G_{m,\ell}(r, x) = E\left[ Q_\ell\left( x - \frac{X}{m} \right) \right], \tag{41}$$
where $X$ is a $\mathrm{Bin}(r, m)$ random variable. Note that if $X' \sim \mathrm{Bin}(r', m)$, with $r' > r$, then certainly $P(X' \le n) \le P(X \le n)$—meaning that $X'$ stochastically dominates $X$. For any constant $x$, it follows that $P\left( x - \frac{X'}{m} \le s \right) \ge P\left( x - \frac{X}{m} \le s \right)$. Furthermore, the quantizer is, by definition, a monotonically non-decreasing function. Consequently, a standard result on stochastic domination [7] implies that $G_{m,\ell}(r, x) \ge G_{m,\ell}(r', x)$. Differentiability follows from the definition of the function. $\blacksquare$

The finiteness of the variance of the quantization step is clear by construction; more specifically, a crude upper bound is $r_\ell^2$. Thus, analogous to the previous theorems, Lemma 3 is used to establish almost sure convergence.
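The monotonicity claimed in Lemma 3 is easy to verify numerically. The following Monte Carlo sketch (our illustration; the sign quantizer and all parameters are hypothetical) estimates $G_{m,\ell}(r, x) = E[Q_\ell(x - X/m)]$ and checks that it decreases in $r$:

```python
import numpy as np

rng = np.random.default_rng(3)

def G(Q, r, x, m, n_mc=100_000):
    """Monte Carlo estimate of G_{m,l}(r, x) = E[Q(x - X/m)], X ~ Bin(r, m)."""
    return np.mean(Q(x - rng.binomial(m, r, size=n_mc) / m))

# A 1-bit (sign) quantizer: monotonically non-decreasing, as the lemma requires.
sign_Q = lambda u: (u >= 0).astype(float)

# With this Q, G(r, x) = P(X <= m*x), which falls as r grows.
values = [G(sign_Q, r, 0.5, 100) for r in (0.3, 0.45, 0.55, 0.7)]
```

Here `values` comes out strictly decreasing, matching the stochastic-domination argument: increasing $r$ shifts $X$ stochastically upward, and the non-decreasing quantizer preserves the resulting ordering of expectations.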
Now, some straightforward algebra using the results of Lemma 2 shows that the partial derivative $\partial G_{m,\ell}(r, x)/\partial r$ is given by
$$\frac{\partial G_{m,\ell}(r, x)}{\partial r} = \frac{1}{r(1-r)} \sum_{k=-\ell}^{\ell-1} r_k \left\{ E\left[ X \, I\left( x - s_{k+1} \le \frac{X}{m} \le x - s_k \right) \right] - E[X] \, P\left[ x - s_{k+1} \le \frac{X}{m} \le x - s_k \right] \right\}. \tag{42}$$
This will be used next. To compute the asymptotic variance, we again exploit asymptotic normality (see equation (24)) as before. Writing $Z := (X - \alpha^* m)/\sqrt{m}$ and $a := \alpha^*(1 - \alpha^*)$, we have
$$E\left[ X \, I\left( m(\alpha^* - s_{k+1}) \le X \le m(\alpha^* - s_k) \right) \right] = E\left[ X \, I\left( -\sqrt{m}\, s_{k+1} \le \frac{X - \alpha^* m}{\sqrt{m}} \le -\sqrt{m}\, s_k \right) \right]$$
$$= \sqrt{m}\, E\left[ (Z + \alpha^* \sqrt{m}) \, I\left( -\sqrt{m}\, s_{k+1} \le Z \le -\sqrt{m}\, s_k \right) \right]$$
$$= \sqrt{m}\, E\left[ Z \, I\left( -\sqrt{m}\, s_{k+1} \le Z \le -\sqrt{m}\, s_k \right) \right] + S$$
$$\to -\sqrt{m} \int_{\sqrt{m}\, s_k}^{\sqrt{m}\, s_{k+1}} \frac{z \exp\left( -\frac{z^2}{2a} \right)}{\sqrt{2\pi a}} \, dz + S, \qquad S := E[X] \, P\left( m(x - s_{k+1}) \le X \le m(x - s_k) \right).$$
Now make the definition, which corresponds to solving the integral above,
$$\Delta_m(s_k, s_{k+1}) = \exp\left( -\frac{m s_k^2}{2\alpha^*(1-\alpha^*)} \right) - \exp\left( -\frac{m s_{k+1}^2}{2\alpha^*(1-\alpha^*)} \right).$$
Thus, plugging into equation (42) and noticing that $S$ cancels,
$$\left. \frac{\partial G_{m,\ell}(r, \alpha^*)}{\partial r} \right|_{r = F(\theta^*)} \to -\sqrt{\frac{m}{2\pi \alpha^*(1-\alpha^*)}} \sum_{k=-\ell}^{\ell-1} r_k \, \Delta_m(s_k, s_{k+1}).$$
A side note is that if one chooses $s_0 = 0$, we are guaranteed that at least one $\Delta_m(s_k, s_{k+1})$ does not go to zero for a fixed quantizer (i.e., a quantizer where the levels $s_k$ do not depend on $m$). But the correction factor expression, and, as a matter of fact, the optimal quantization of a Gaussian, suggests that the levels $s_k$ scale as $1/\sqrt{m}$. In this case, the factor is a constant, independent of $m$.

We now need to compute $R(\theta^*)$ for the quantized update. It is straightforward to see that this quantity is given by
$$R(\theta^*) = K_m^2 \left[ \sum_{k=-\ell}^{\ell-1} r_k \left( G_m(F(\theta^*), \alpha^* - s_k) - G_m(F(\theta^*), \alpha^* - s_{k+1}) \right) - \beta^2 \right].$$
Putting everything together, the asymptotic variance estimate for the more general quantizer converges to
$$\frac{R(\theta^*)}{2 K_m \left| \left. \frac{\partial G_{m,\ell}(r, \alpha^*)}{\partial r} \right|_{r = F(\theta^*)} \right| p_X(\theta^*) - 1} \to \frac{K_m^2 \left[ \sum_{k=-\ell}^{\ell-1} r_k \left( G_m(F(\theta^*), \alpha^* - s_k) - G_m(F(\theta^*), \alpha^* - s_{k+1}) \right) - \beta^2 \right]}{\dfrac{2 K_m \sqrt{m} \sum_{k=-\ell}^{\ell-1} r_k \, \Delta_m(s_k, s_{k+1}) \, p_X(\theta^*)}{\sqrt{2\pi \alpha^*(1-\alpha^*)}} - 1}.$$
Setting the gain
$$K_1 = \frac{K_m \sqrt{m} \sum_{k=-\ell}^{\ell-1} r_k \, \Delta_m(s_k, s_{k+1})}{\sqrt{2\pi \alpha^*(1-\alpha^*)}},$$
we have the final expression for the variance:
$$2\pi \, \frac{\sum_{k=-\ell}^{\ell-1} r_k \, \Delta G_m(s_k, s_{k+1}) - \beta^2}{\left( \sum_{k=-\ell}^{\ell-1} r_k \, \Delta_m(s_k, s_{k+1}) \right)^2} \cdot \frac{K_1^2 \, \alpha^*(1-\alpha^*)}{\left( 2 K_1 \, p_X(\theta^*) - 1 \right) m},$$
where $\Delta G_m(s_k, s_{k+1}) = G_m(\alpha^*, \alpha^* - s_k) - G_m(\alpha^*, \alpha^* - s_{k+1})$. The constant $\kappa(\alpha^*, Q_\ell)$ defines the performance of the algorithm for different quantization choices:
$$\kappa(\alpha^*, Q_\ell) = 2\pi \, \frac{\sum_{k=-\ell}^{\ell-1} r_k \, \Delta G_m(s_k, s_{k+1}) - \beta^2}{\left( \sum_{k=-\ell}^{\ell-1} r_k \, \Delta_m(s_k, s_{k+1}) \right)^2}.$$
The rate with respect to $m$ is the same, independent of the quantization. It is clear from the previous analysis that if the best quantizers are chosen, then $1 \le \kappa(\alpha^*, Q_\ell) \le \pi/2$. Obviously, $\kappa(\alpha^*, Q_\ell)$ over the class of optimal quantizers is a decreasing function of $\ell$.

REFERENCES

[1] S. Amari and T. S. Han. Statistical inference under multiterminal rate restrictions: A differential geometric approach.
IEEE Trans. Info. Theory, 35(2):217–227, March 1989.
[2] E. Ayanoglu. On optimal quantization of noisy sources. IEEE Trans. Info. Theory, 36(6):1450–1452, 1990.
[3] A. Benveniste, M. Metivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer-Verlag, New York, NY, 1990.
[4] R. S. Blum, S. A. Kassam, and H. V. Poor. Distributed detection with multiple sensors: Part II—advanced topics. Proceedings of the IEEE, 85:64–79, January 1997.
[5] J. F. Chamberland and V. V. Veeravalli. Asymptotic results for decentralized detection in power constrained wireless sensor networks. IEEE Journal on Selected Areas in Communication, 22(6):1007–1015, August 2004.
[6] C. Chong and S. P. Kumar. Sensor networks: Evolution, opportunities, and challenges. Proceedings of the IEEE, 91:1247–1256, 2003.
[7] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes. Oxford Science Publications, Clarendon Press, Oxford, 1992.
[8] J. A. Gubner. Decentralized estimation and quantization. IEEE Trans. Info. Theory, 39(4):1456–1459, 1993.
[9] J. Han, P. K. Varshney, and V. C. Vannicola. Some results on distributed nonparametric detection. In Proc. 29th Conf. on Decision and Control, pages 2698–2703, 1990.
[10] T. S. Han and S. Amari. Statistical inference under multiterminal data compression. IEEE Trans. Info. Theory, 44(6):2300–2324, October 1998.
[11] T. S. Han and K. Kobayashi. Exponential-type error probabilities for multiterminal hypothesis testing. IEEE Trans. Info. Theory, 35(1):2–14, January 1989.
[12] H. J. Kushner and G. G. Yin. Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York, NY, 1997.
[13] Z. Q. Luo. Universal decentralized estimation in a bandwidth-constrained sensor network. IEEE Trans. Info. Theory, 51(6):2210–2219, 2005.
[14] X. Nguyen, M. J. Wainwright, and M. I. Jordan. Nonparametric decentralized detection using kernel methods. IEEE Trans. Signal Processing, 53(11):4053–4066, November 2005.
[15] R. J. Serfling. Approximation Theorems of Mathematical Statistics. Wiley Series in Probability and Statistics. Wiley, 1980.
[16] R. R. Tenney and N. R. Sandell, Jr. Detection with distributed sensors. IEEE Trans. Aero. Electron. Sys., 17:501–510, 1981.
[17] J. N. Tsitsiklis. Decentralized detection. In Advances in Statistical Signal Processing, pages 297–344. JAI Press, 1993.
[18] V. V. Veeravalli, T. Basar, and H. V. Poor. Decentralized sequential detection with a fusion center performing the sequential test. IEEE Trans. Info. Theory, 39(2):433–442, 1993.
[19] R. Viswanathan and P. K. Varshney. Distributed detection with multiple sensors: Part I—fundamentals. Proceedings of the IEEE, 85:54–63, January 1997.
[20] Z. Zhang and T. Berger. Estimation via compressed information. IEEE Trans. Info. Theory, 34(2):198–211, 1988.
[21] R. Zielinski. Optimal quantile estimators: Small sample approach. Technical report, Inst. of Math. Pol. Academy of Sci., 2004.