High-Rate Vector Quantization for the Neyman-Pearson Detection of Correlated Processes
TO APPEAR IN THE IEEE TRANSACTIONS ON INFORMATION THEORY
Joffrey Villard, Student Member, IEEE, and Pascal Bianchi, Member, IEEE
Abstract
This paper investigates the effect of quantization on the performance of the Neyman-Pearson test. It is assumed that a sensing unit observes samples of a correlated stationary ergodic multivariate process. Each sample is passed through an N-point quantizer and transmitted to a decision device which performs a binary hypothesis test. For any false alarm level, it is shown that the miss probability of the Neyman-Pearson test converges to zero exponentially as the number of samples tends to infinity, assuming that the observed process satisfies certain mixing conditions. The main contribution of this paper is to provide a compact closed-form expression of the error exponent in the high-rate regime, i.e., when the number N of quantization levels tends to infinity, generalizing previous results of Gupta and Hero to the case of non-independent observations. If d represents the dimension of one sample, it is proved that the error exponent converges at rate N^{-2/d} to the one obtained in the absence of quantization. As an application, relevant high-rate quantization strategies which lead to a large error exponent are determined. Numerical results indicate that the proposed quantization rule can yield better performance than existing ones in terms of detection error.

Index Terms
Binary hypothesis testing, compression, error exponents, hidden Markov models, stochastic processes, vector quantization.
J. Villard is with the Department of Telecommunications, SUPELEC, 91192 Gif-sur-Yvette, France (e-mail: [email protected]). P. Bianchi is with Telecom ParisTech, 75634 Paris Cedex 13, France (e-mail: [email protected]). The work of J. Villard is supported by DGA (French Armement Procurement Agency).
I. INTRODUCTION
Consider a sensing unit which transmits a sequence of measurements to a decision device (DD) whose mission is to detect a given signal. For example, a CCTV camera in a surveillance system transmits its data to a remote controller interested in the detection of a particular object in its field of view. This situation also arises in the context of wireless sensor networks (WSN), where a fusion center collects the individual measurements of a large number of identical sensors and processes these measurements in order to detect abnormal events [1], [2]. In such applications, due to bandwidth, delay or storage limitations, transmitted data rates are often limited. Therefore, measurements must be quantized prior to transmission. As a matter of fact, this quantization step may severely degrade the overall detection performance of the system.

In this paper, we consider that a binary hypothesis test is performed at the DD. The available data set corresponds to a quantized version of a stationary ergodic discrete-time multivariate process. Our aim is to quantify the detection performance of a given quantizer and characterize quantization strategies which guarantee attractive performance at the DD.

In the past decades, numerous papers were dedicated to the search for relevant quantization strategies and their practical design [3]. The most popular criterion used to select quantizers is the mean square error (MSE) between the quantized signal and the initial source [4]. An analytical characterization of quantizers minimizing the MSE is difficult in the general case. Bennett [5] pioneered the study of high-rate (or high-resolution) quantization for the reconstruction of scalar signals. The idea of Bennett was to study the MSE in the asymptotic regime where the number of quantization levels tends to infinity. A closed-form expression of the (properly normalized) MSE can be determined in that case, and the families of quantizers minimizing the asymptotic MSE can be directly characterized.
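The flavor of this asymptotic analysis can be seen on the simplest possible instance (an illustrative sketch, not taken from the paper): for a uniform source on [0, 1] and an N-level uniform quantizer with centroid reproduction points, the MSE equals 1/(12N²) exactly, so the normalized quantity N²·MSE is constant in the high-rate regime.

```python
# High-rate behavior of the MSE for an N-level uniform quantizer on [0, 1].
# For a uniform source, each cell [j/N, (j+1)/N] contributes
# integral of (y - centroid)^2 dy = width^3 / 12, so the total MSE is
# 1 / (12 N^2) and N^2 * MSE stays constant as N grows.

def uniform_quantizer_mse(n_levels: int) -> float:
    """Exact MSE of the N-level uniform quantizer for a uniform source on [0, 1]."""
    width = 1.0 / n_levels
    return n_levels * width**3 / 12.0  # n_levels identical cells

for n in (4, 16, 64, 256):
    print(n, n**2 * uniform_quantizer_mse(n))  # constant: 1/12
```

For non-uniform sources and quantizers, Bennett's analysis replaces the constant cell width by a local cell density, which is exactly the role played by the point density introduced later in this paper.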
Extension of the work of Bennett to vector-valued observations was later achieved in [6]. However, the MSE criterion is especially relevant when the aim is to reconstruct the source. On the other hand, it can be inappropriate as far as other applications are concerned. For this reason, various distortion measures have been proposed in the literature in a task-oriented setting for estimation, classification and detection [7]–[18]. In particular, considerable attention has been paid to optimal quantization for hypothesis testing. Poor and Thomas [12] used Ali-Silvey distances between densities. Later, Poor [13] proposed the generalized f-divergence and studied this distortion measure in the high-rate regime. Picinbono and Duvaut [14] considered a deflection criterion and proved that the corresponding optimal procedure corresponds to the scalar quantization of the likelihood ratio. Tsitsiklis [15] studied the properties of such quantizers with respect to several distortion measures. More recently, following the initial works of Tenney and Sandell [16] and Tsitsiklis [17], Gupta and Hero [18] investigated the selection of high-rate quantizers for binary hypothesis tests. In their setting, the decision device gathers a sequence of n independent and identically distributed (i.i.d.) variables, each of these variables being passed through a fixed quantizer. The probability density function (pdf) of the samples is assumed to be known both under the null hypothesis and the alternative. In this case, it is well known that a uniformly most powerful test is obtained by the Neyman-Pearson (NP) procedure, which consists in rejecting the null hypothesis when the log-likelihood ratio (LLR) exceeds a certain threshold [19].
The threshold is usually chosen in such a way that the probability of false alarm of the test (that is, the probability of deciding the alternative under the null hypothesis) is fixed to a specified level, say α. The performance of the NP test of level α can be evaluated in terms of the miss probability (that is, the probability of deciding the null hypothesis under the alternative). In our case, the miss probability clearly depends on the quantizer used by the sensing unit. Thus, a natural approach would be to select the quantizer which minimizes the miss probability. Unfortunately, the miss probability does not admit any tractable expression as a function of the quantizer. To circumvent this issue, it is convenient to study the miss probability in the case where the number n of available snapshots tends to infinity. In case of i.i.d. observations, the celebrated Stein's lemma [20] states that the miss probability tends to zero exponentially in n. Based on this result, it is relevant to select the quantizers which yield a large value of the error exponent. Unfortunately, the maximization of the error exponent as a function of the quantizer is impractical. Following the idea of [5], [6], Gupta and Hero restrict their attention to high-rate quantizers and manage to obtain a compact expression of the error exponent loss induced by quantization.

Most of these works address the case where observations are independent random variables. However, the detection of a correlated process is a crucial issue in many applications [21]–[24]. In this case, fewer results are available in the literature. Chamberland and Veeravalli [21] analyze the impact of the density of sensors in a WSN on the detection performance when observations are correlated. Willett et al. [22] study the one-bit quantization of a pair of dependent Gaussian random variables. In case of the detection of a Gauss-Markov signal in noise, Sung et al.
[23] prove that, for a fixed false alarm level, the miss probability of the NP test converges exponentially to zero, and provide a closed-form expression of the error exponent. Hachem et al. [24] later extended the results of [23] to irregularly sampled Gaussian diffusion processes. However, [23], [24] assume that the DD has perfect access to the observations of the sensing unit, and do not address quantization issues.

In this paper, we study the performance of the Neyman-Pearson test based on a quantized version of a stationary ergodic multivariate process. We generalize the work of Gupta and Hero [18] to the case where the observed process is non-i.i.d. (either under the null hypothesis, the alternative, or both). In this situation, Stein's lemma does not directly apply. The error exponent no longer admits a closed-form expression, and the determination of relevant quantizers is therefore a more difficult task. Provided that the process of interest satisfies certain forgetting properties (present observations should become nearly independent of past observations after a sufficient amount of time), we prove that the miss probability of the NP test of level α tends exponentially to zero as the number of observations tends to infinity. Our main contribution is to provide a compact closed-form expression of the error exponent in case of high-rate quantizers. If N denotes the number of quantization levels (or equivalently, if each measurement is quantized on log_2(N) bits), we prove that the error exponent achieved when using quantized observations converges, as N tends to infinity, to the ideal error exponent that one would obtain if perfect (unquantized) measurements were available at the DD. More precisely, we prove that the error exponent loss tends to zero at speed N^{-2/d}, where d represents the dimension of each individual measurement.
The asymptotic error exponent depends on the process distributions under both hypotheses. It also depends on the quantization strategy through the so-called model point density and model covariation profile. The model point density can be interpreted as the asymptotic density of cells in the neighborhood of each point of the observation space. The model covariation profile captures the shape of the cells. As a consequence, the selection of relevant high-rate quantizers reduces to the determination of the point densities and covariation profiles minimizing the asymptotic error exponent loss. In case of scalar quantization (d = 1), our compact expression immediately yields a simple characterization of optimal high-rate quantizers. In case of vector quantization (d ≥ 2), an exact characterization of optimal quantizers is more difficult. Following the approach of [18] once again, we nevertheless determine relevant families of quantizers with attractive error exponent. Note that our theoretical results hold under the assumption that the observed process “forgets” past observations fast enough. As a special case, we prove that our assumptions hold for a general class of hidden Markov models verifying a certain contraction property. Numerical illustrations are provided in the case where the measurements correspond to a modulated signal in the In-phase/Quadrature plane.

The paper is organized as follows. In Section II, we describe the observation model. We also review some known results on Neyman-Pearson tests, and we derive the associated error exponent in the ideal case where the DD has perfect access to the measurements. The vector quantization framework is introduced in Section III. In Section IV, the impact of quantization on the error exponent is evaluated in the high-rate regime, and we determine relevant quantization strategies that reduce this degradation.
Section V is devoted to the proof of the main result. In Section VI, we illustrate our findings in the special case of hidden Markov processes and give sufficient conditions on the transition and observation kernels ensuring that our results apply. Section VII is dedicated to numerical illustrations.
Notation
For any sequence (y_i)_{i∈Z} and any integers k ≤ ℓ, the notation y_{k:ℓ} stands for the collection (y_k, y_{k+1}, …, y_ℓ), and the notation y_Z is used to designate the whole sequence. If y is a vector with dimension d, we denote by y^{(i)} its i-th component and by ‖y‖ its Euclidean norm. We denote by ‖A‖ the spectral norm of any square matrix A. The notation ·^T stands for the transpose operator. A real-valued function f : y_{k:ℓ} ↦ f(y_{k:ℓ}) on S ⊂ R^d × ⋯ × R^d is said to be of class C³ on S if it is three times continuously differentiable on S. We denote by ∇_{y_m} f(y_{k:ℓ}) its gradient w.r.t. y_m at point y_{k:ℓ}. When no variable is specified, ∇g(y) simply denotes the (d-dimensional) gradient of the real-valued single-variable function y ↦ g(y) defined on Y ⊂ R^d. We define the Hessian matrix of f by [∇²_{y_m,y_n} f]_{i,j} = ∂²f / (∂y_m^{(i)} ∂y_n^{(j)}) for all i, j ∈ {1, …, d}. Moreover, the notation ∇²_{y_m} stands for ∇²_{y_m,y_m}.

The notation B(X) stands for the Borel σ-field on X. The notation σ(Y_{1:n}) stands for the sub-σ-field of B(Y^Z) associated with the random vector Y_{1:n}. Convergence in probability as n → ∞ is denoted by →P, and convergence in the L^r-norm w.r.t. a probability P, as n → ∞, is denoted L^r(P)-convergence.

The notation ∘ stands for the composition operator, i.e., for any arbitrary functions f and g, f ∘ g(x) = f(g(x)). The notation o_N(·) is a little-o notation as N tends to infinity.
II. NEYMAN-PEARSON DETECTION WITH PERFECT OBSERVATIONS
A. Observation Model
Consider two probability measures P_0 and P_1 on a relevant probability space. Denote by (Y_k)_{k∈Z} a stationary ergodic process for both P_0 and P_1, taking its values in a bounded convex subset Y of R^d. We associate a hypothesis (H_0 and H_1, respectively) with each of the two probability measures P_0 and P_1, and investigate the problem of the detection of H_1 vs. H_0 based on a set of n observations Y_{1:n} = (Y_1, …, Y_n).

For each i ∈ {0,1}, we assume that P_i is the probability distribution of the coordinate process (Y_k)_{k∈Z} on the canonical space (Y^Z, B(Y^Z)). We denote by P_{i,n} the restriction of P_i to σ(Y_{1:n}). We denote by E_0 and E_1 the expectations associated with P_0 and P_1, respectively. We introduce the reference measure µ, which coincides with the d-dimensional Lebesgue measure restricted to Y.

Assumption 1:
The following properties hold true for each i ∈ {0,1}.
1) For each n ≥ 1, P_{i,n} admits a density p_i w.r.t. µ^{⊗n}.
2) p_i(y_{1:n}) > 0 for each y_{1:n} ∈ Y^n.
3) E_i |log p_i(Y_1)| < ∞.

The density p_i of P_{i,n} depends of course on n, but we drop the index n to simplify the notation. For each i ∈ {0,1}, we also define p_i(y_n | y_{1:n−1}) = p_i(y_{1:n}) / p_i(y_{1:n−1}), with the convention that p_i(y_n | y_{1:n−1}) = p_i(y_1) when n = 1 (that is, when y_{1:n−1} is a void vector). Assumption 1-2) implies that the distributions P_{0,n} and P_{1,n} are absolutely continuous w.r.t. each other.

B. Likelihood Ratio Test
We now investigate the detection of H_1 vs. H_0 based on the perfect observation of n measurements Y_{1:n}. The log-likelihood ratio (LLR) writes:

L_n = log [ p_1(Y_{1:n}) / p_0(Y_{1:n}) ] .    (1)

The NP test rejects the null hypothesis when L_n is larger than a threshold, say γ. For each α ∈ (0,1), we define the miss probability of the NP test of level α by:

β_n(α) = inf_γ P_1[ L_n < γ ] ,

where the infimum is taken w.r.t. all γ such that the probability of false alarm does not exceed α, i.e., all γ s.t. P_0[ L_n > γ ] ≤ α.

For each n ≥ 1 and each α ∈ (0,1), due to the celebrated Neyman-Pearson lemma, β_n(α) is the lowest achievable miss probability among all binary tests of level α which are based on the observation of Y_{1:n}. The quantity β_n(α) is therefore a key metric in order to characterize the performance of the hypothesis test. Unfortunately, it usually does not admit any tractable closed-form expression. In the sequel, we study the asymptotic behaviour of β_n(α) as the number of observations n tends to infinity. In this regime, it can be shown that, under certain assumptions,

β_n(α) ≈ exp(−n K)    (2)

for some constant K given below, which we shall refer to as the error exponent.

C. Error Exponent with Perfect Observations
The evaluation of the error exponent K in Equation (2) fundamentally relies on the following lemma.

Lemma 1 ([25]):
Assume that a binary test is performed on a sequence Y̌_{1:n} = (Y̌_1, …, Y̌_n) of n observed random variables. Denote by p̌_0 and p̌_1 the densities of Y̌_{1:n} under H_0 and H_1, respectively (w.r.t. any common reference measure). Assume that, under H_1,

(1/n) log [ p̌_1(Y̌_{1:n}) / p̌_0(Y̌_{1:n}) ]  →P  κ  as n → ∞,

for some deterministic constant κ such that 0 < κ ≤ ∞. Then, for any α ∈ (0,1), the miss probability β_n(α) of the Neyman-Pearson test of level α is such that

lim_{n→∞} (1/n) log β_n(α) = −κ .

Lemma 1 implies that the error exponent, if it exists, coincides with the limit in probability (under P_1) of (1/n) L_n, where L_n is the LLR defined by (1). The existence of the error exponent is directly obtained from the following assumption, which will be discussed later on.

Assumption 2:
For each i ∈ {0,1}, (log p_i(Y_1 | Y_{−m:−1}))_{m≥1} is a convergent sequence in L^1(P_1).

We are now in position to study the limit of the LLR L_n and prove the following result, which provides the general form of the error exponent.
Theorem 1:
Under Assumptions 1 and 2,

lim_{n→∞} (1/n) log β_n(α) = −K ,

where K is the constant defined by

K = lim_{m→∞} E_1 [ log (p_1/p_0)(Y_1 | Y_{−m:−1}) ] .    (3)

Proof:
Using the chain rule, we first write L_n under the form:

L_n = −∑_{k=1}^{n} log (p_0/p_1)(Y_k | Y_{1:k−1}) .

Denote by Υ the limit in L^1(P_1) of the sequence (log (p_0/p_1)(Y_1 | Y_{−m:−1}))_{m≥1}. The main point is the study of the difference log (p_0/p_1)(Y_k | Y_{1:k−1}) − Υ∘θ^k, where θ is the shift operator. We can write:

E_1 | (1/n) L_n + (1/n) ∑_{k=1}^{n} Υ∘θ^k |
  ≤(a)  (1/n) ∑_{k=1}^{n} E_1 | log (p_0/p_1)(Y_k | Y_{1:k−1}) − Υ∘θ^k |
  ≤(b)  (1/n) ∑_{k=1}^{n} E_1 | log (p_0/p_1)(Y_1 | Y_{−k+1:−1}) − Υ | ,

where step (a) comes from the triangle inequality and step (b) is a consequence of the stationarity of the process (Y_k)_{k∈Z} under P_1. The right-hand side of the above inequality can be interpreted as a Cesàro mean and thus converges to zero by definition of Υ. We thus write:

−(1/n) L_n = (1/n) ∑_{k=1}^{n} Υ∘θ^k + ε_n ,

where ε_n represents a term which converges in probability (under P_1) to zero as n → ∞. As P_1 is stationary ergodic, we conclude using the ergodic theorem that −(1/n) L_n converges in probability to E_1(Υ) = −K under P_1. This result together with Lemma 1 proves Theorem 1.

Remark 1:
Let us make some remarks on the above Assumptions 1 and 2. Assumption 1 is an extension of those made by Gupta and Hero [18, Section III, pp. 1956]. Assumption 2 does not appear in [18] since it is obviously verified by i.i.d. processes; in this case, Theorem 1 is known as Stein's lemma. Assumption 2 is trivially satisfied by short-dependent (m-dependent) processes such as moving average processes, for instance [26]. In this case, the present observation Y_1 is independent of the past observations Y_{−m−1}, Y_{−m−2}, … as soon as m is large enough. As explained in Section VI, Assumption 2 is also satisfied by a wide class of hidden Markov models.

(Footnote to the proof of Theorem 1: Recall that we are considering probability measures defined on the canonical space Y^Z. For any ω ∈ Y^Z, we may write ω = (…, ω_{−1}, ω_0, ω_1, …). The k-th-time shifted version of ω is then given by θ^k ω = (…, ω_{k−1}, ω_k, ω_{k+1}, …), and the notation Υ∘θ^k represents the measurable function Υ∘θ^k(ω) = Υ(θ^k ω). Recall that the process Y_Z is defined as the coordinate process, i.e., Y_n(ω) = ω_n for each n. As a consequence, the measurable function log (p_0/p_1)(Y_k | Y_{1:k−1}) − Υ∘θ^k at point ω is equal to the measurable function log (p_0/p_1)(Y_1 | Y_{−k+1:−1}) − Υ at point θ^k ω.)

Remark 2:
In order that (log p_0(Y_1 | Y_{−m:−1}))_{m≥1} is a convergent sequence in L^1(P_1), it is sufficient to check that (E_1 log p_0(Y_1 | Y_{−m:−1}))_{m≥1} is a bounded sequence. This claim is a consequence of Moy [27] (see Theorem 4 therein). In practical situations, this remark provides us with a convenient way to check whether Assumption 2 is verified for i = 0. On the other hand, the validation of Assumption 2 for i = 1 generally requires more effort in practice: one should be able to prove that (log p_1(Y_1 | Y_{−m:−1}))_{m≥1} is a Cauchy sequence in L^1(P_1).

Remark 3:
When P_1 is a finite-order Markovian measure, Assumption 2 can simply be reduced to the assumption that the sequence (E_1 log (p_1/p_0)(Y_1 | Y_{−m:−1}))_{m≥1} is bounded. Indeed, due to Moy [27], this hypothesis directly implies the convergence of the sequence (log (p_1/p_0)(Y_1 | Y_{−m:−1}))_{m≥1} in L^1(P_1) and thus yields Theorem 1.

III. QUANTIZATION
A. Definitions
Consider a fixed integer N ≥ 1. An N-point quantizer is a triplet (C_N, Ξ_N, ξ_N), where C_N = {C_{N,1}, …, C_{N,N}} is a set of N cells (Borel sets of Y with non-zero volume) which form a partition of Y, where Ξ_N = {ξ_{N,1}, …, ξ_{N,N}} is an arbitrary set of distinct elements, and where ξ_N : Y → Ξ_N is a function s.t. ξ_N(y) = ξ_{N,j} whenever y ∈ C_{N,j}.

For each N, k, we introduce Z_{N,k} = ξ_N(Y_k), the quantized measurement on log_2(N) bits. We assume that the quantizer (C_N, Ξ_N, ξ_N) is known at the decision device. The aim is to decide between hypotheses H_0 and H_1 based on the observation of Z_{N,1:n}.

Note that in our model, each individual measurement is quantized based on the same quantization rule, as in the traditional framework of vector quantization [3]. This is also relevant in the case of WSN when samples are collected using identical sensors.
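The triplet definition above can be made concrete with a toy scalar example (a sketch with illustrative choices, not taken from the paper): the cells are N intervals partitioning Y = [0, 1), the alphabet Ξ_N is the set of cell midpoints, and ξ_N maps a sample to the representative of the cell containing it.

```python
# Sketch of an N-point quantizer (C_N, Xi_N, xi_N) for Y = [0, 1):
# cells C_{N,j} are the intervals [(j-1)/N, j/N), and each representative
# xi_{N,j} is the midpoint (= centroid) of its cell.

def make_uniform_quantizer(n_levels: int):
    edges = [j / n_levels for j in range(n_levels + 1)]              # cell boundaries
    reps = [(edges[j] + edges[j + 1]) / 2 for j in range(n_levels)]  # alphabet Xi_N

    def xi(y: float) -> float:
        """Map y in [0, 1) to the representative of the cell containing it."""
        j = min(int(y * n_levels), n_levels - 1)  # cell index
        return reps[j]

    return edges, reps, xi

edges, reps, xi = make_uniform_quantizer(4)
print(xi(0.1), xi(0.6))  # 0.125 0.625
```

Non-uniform quantizers only change the list of cell boundaries; the mapping structure is the same.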
B. Error Exponent
Assume that the number of quantization levels N is fixed. For a given number n of quantized observations, we define the LLR based on quantized measurements by:

L_{n,N} = log [ p_{1,N}(Z_{N,1:n}) / p_{0,N}(Z_{N,1:n}) ] ,

where, for each i ∈ {0,1} and for any set of quantization points ξ_{N,j_{1:n}} = (ξ_{N,j_1}, …, ξ_{N,j_n}) ∈ Ξ_N^n,

p_{i,N}(ξ_{N,j_{1:n}}) = P_{i,n}( C_{N,j_1} × ⋯ × C_{N,j_n} )

is the probability that the measurements Y_1, …, Y_n respectively fall into the cells C_{N,j_1}, …, C_{N,j_n} associated with the observed points ξ_{N,j_1}, …, ξ_{N,j_n} (n.b. the function p_{i,N} depends on n, but we omit the index n to simplify the notation). We define similarly:

p_{i,N}(ξ_{N,j_n} | ξ_{N,j_{1:n−1}}) = p_{i,N}(ξ_{N,j_{1:n}}) / p_{i,N}(ξ_{N,j_{1:n−1}}) .

For each α ∈ (0,1), we denote by β_{n,N}(α) the miss probability of the NP test of level α when quantization is applied, i.e., the infimum of P_1[L_{n,N} < γ] w.r.t. all γ s.t. P_0[L_{n,N} > γ] ≤ α. The error exponent associated with β_{n,N}(α) is provided by the following result, whose proof is similar to the one of Theorem 1.

Corollary 1:
Consider a fixed N ≥ 1. If Assumption 1 holds and if (log p_{i,N}(Z_{N,1} | Z_{N,−m:−1}))_{m≥1} is a convergent sequence in L^1(P_1) for each i ∈ {0,1}, then

lim_{n→∞} (1/n) log β_{n,N}(α) = −K_N ,

where K_N is the constant defined by:

K_N = lim_{m→∞} E_1 [ log (p_{1,N}/p_{0,N})(Z_{N,1} | Z_{N,−m:−1}) ] .    (4)

The above result provides the error exponent K_N associated with the NP test on quantized observations. A natural question is: how does the choice of the quantizer affect the error exponent? Unfortunately, the expression of the error exponent does not immediately allow us to evaluate the impact of the quantizer. In the sequel, we thus follow the approach of [6], [18] and focus on the case where the order N of the quantizer tends to infinity. We refer to such quantizers as high-rate quantizers. This approach leads to a convenient and informative asymptotic expression of K_N. In particular, it will be shown that, under some assumptions on the process (Y_k)_{k∈Z} and on the sequence of quantizers (C_N, Ξ_N, ξ_N)_{N≥1}, the above error exponent K_N converges to K as N tends to infinity.
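In the special case of i.i.d. observations, the conditional expectation in (4) reduces to the single-letter Kullback-Leibler divergence K_N = Σ_j p_{1,N}(ξ_{N,j}) log[ p_{1,N}(ξ_{N,j}) / p_{0,N}(ξ_{N,j}) ], and (1/n)L_{n,N} converges to K_N by the law of large numbers. The sketch below illustrates this mechanism with an arbitrary pair of densities on [0,1] (p_0 uniform and p_1(y) = (1+y)/1.5 are purely illustrative choices) and a 4-cell uniform quantizer:

```python
import math
import random

N = 4                                   # number of quantization cells on [0, 1]
edges = [j / N for j in range(N + 1)]

# Cell probabilities under P0 (uniform density) and P1 (density p1(y) = (1+y)/1.5).
F1 = lambda y: (y + y * y / 2) / 1.5    # cdf of p1
q0 = [1 / N] * N
q1 = [F1(edges[j + 1]) - F1(edges[j]) for j in range(N)]

# Error exponent of the quantized test: single-letter KL divergence.
K_N = sum(a * math.log(a / b) for a, b in zip(q1, q0))

# Empirical check: under P1, (1/n) L_{n,N} -> K_N.
random.seed(0)
n = 200_000
llr = 0.0
for _ in range(n):
    y = -1 + math.sqrt(1 + 3 * random.random())   # inverse-cdf sampling from p1
    j = min(int(y * N), N - 1)                    # index of the cell containing y
    llr += math.log(q1[j] / q0[j])
print(K_N, llr / n)   # the two values agree closely
```

The same ergodic-average mechanism is what drives the general, correlated case of Corollary 1, with conditional cell probabilities replacing the marginal ones.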
IV. PERFORMANCE OF HIGH-RATE VECTOR QUANTIZERS
A. Notation and Assumptions
For each N, we remark that the error exponent K_N does not depend on the particular choice of the quantization alphabet Ξ_N. For the sake of simplicity, we assume with no loss of generality that

ξ_{N,j} = ( ∫_{C_{N,j}} y dy ) / ( ∫_{C_{N,j}} dy ) ,

i.e., each ξ_{N,j} coincides with the centroid of cell C_{N,j}. We respectively define the volume and the diameter of cell j by V_{N,j} = ∫_{C_{N,j}} dy and d_{N,j} = sup_{u,v ∈ C_{N,j}} ‖u − v‖. We introduce the specific point density ζ_N and the specific covariation profile M_N as the piecewise constant functions on Y respectively defined as follows, for any y ∈ C_{N,j} (j ∈ {1, …, N}):

ζ_N(y) = ζ_{N,j} = 1 / (N V_{N,j}) ,

M_N(y) = M_{N,j} = ( 1 / V_{N,j}^{1+2/d} ) ∫_{C_{N,j}} (y − ξ_{N,j})(y − ξ_{N,j})^T dy .

Now consider a family of quantizers (C_N, Ξ_N, ξ_N)_{N≥1}. We make the following assumption.

Assumption 3:
The following properties hold true.
1) As N → ∞, ζ_N converges uniformly to a continuous function ζ such that inf_{y∈Y} ζ(y) > 0.
2) As N → ∞, M_N converges uniformly to a continuous (matrix-valued) function M such that sup_{y∈Y} ‖M(y)‖ < ∞.
3) There exists a constant C_d such that, for all N, sup_j d_{N,j} ≤ C_d / N^{1/d}.

We will refer to ζ as the model point density of the family (C_N, Ξ_N, ξ_N)_{N≥1}. It represents the fraction of cells in the neighborhood of a given point y. The function M will be referred to as the model covariation profile. For each y ∈ Y, M(y) is a non-negative d × d matrix. In the literature, the function y ↦ Tr(M(y)) is usually referred to as the inertial profile [3], [6], [18]. The function M provides information about the shape of the cells.

(Footnotes: The value of the log-likelihood ratio (and a fortiori the value of the error exponent) remains unchanged by any one-to-one transformation of the quantized observations. Otherwise stated, the particular definition of the quantization alphabet has no impact on the corresponding Neyman-Pearson test, provided that the latter quantization alphabet is composed of N distinct elements. The i-th component of ξ_{N,j} is defined as ξ_{N,j}^{(i)} ≜ ( ∫_{C_{N,j}} y^{(i)} dy ) / ( ∫_{C_{N,j}} dy ).)
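For scalar companders (discussed in Remark 4 below), the model point density of Assumption 3-1) can be computed explicitly: if the cells are C_{N,j} = [G^{-1}((j−1)/N), G^{-1}(j/N)] for an increasing bijection G of [0,1] (the compressor), then ζ_N converges pointwise to ζ = G′. A numerical sketch with an arbitrary compressor (the choice of G below is purely illustrative):

```python
import math

# Compressor G and its inverse; the induced model point density is zeta = G'.
G = lambda y: (math.exp(y) - 1) / (math.e - 1)      # increasing bijection of [0, 1]
G_inv = lambda u: math.log(1 + u * (math.e - 1))
zeta = lambda y: math.exp(y) / (math.e - 1)         # G'(y)

def specific_point_density(y: float, n_cells: int) -> float:
    """zeta_N(y) = 1 / (N * V_{N,j}) for the compander cell C_{N,j} containing y."""
    j = math.ceil(n_cells * G(y))                   # index of the cell containing y
    width = G_inv(j / n_cells) - G_inv((j - 1) / n_cells)
    return 1.0 / (n_cells * width)

for n in (10, 100, 1000):
    print(n, specific_point_density(0.5, n), zeta(0.5))  # converges to zeta(0.5)
```

Because G′ here is bounded and bounded away from zero, cell widths shrink at speed 1/N everywhere, in agreement with Assumption 3-3) for d = 1.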
Intuitively, high-rate quantizers should be constructed in such a way that ζ(y) is large at those points y for which a fine quantization is essential to discriminate the two hypotheses. Theorem 2 below provides a more rigorous formulation of this intuition.

Remark 4:
Assumption 3 is essentially the same as the one traditionally made in the high-rate quantization framework [3], [6], [18]. The main difference lies in Assumption 3-3): usually, the volume of each cell vanishes at speed 1/N while the diameter tends to zero. Our assumption introduces a constraint on the speed of convergence of the sequence of diameters {d_{N,j}}, which ensures that cells shrink at the same speed (1/N^{1/d}) in each dimension. Assumption 3 is for instance valid for sequences of quantizers constructed as companders [3], [5]. Such quantizers can be written as the composition of an invertible function (the so-called compressor) and a uniform quantizer. Since [5], it is known that any scalar quantizer can be written as a compander. Under mild conditions on the compressor, it can be shown that any sequence of companders with a given fixed compressor satisfies Assumption 3 (in this case, the model point density ζ is fully determined by the first-order derivative of the compressor). This point is discussed in Section IV-C.

B. Error Exponent in the High-Rate Regime
Before stating the main result, we need further assumptions. For each m ≥ 1 and each i ∈ {0,1}, define:

η_i(m) = sup_{m′≥m} E_1 | log p_i(Y_1 | Y_{−m:−1}) − log p_i(Y_1 | Y_{−m′:−1}) | ,    (5)

η_{i,N}(m) = sup_{m′≥m} E_1 | log p_{i,N}(Z_{N,1} | Z_{N,−m:−1}) − log p_{i,N}(Z_{N,1} | Z_{N,−m′:−1}) | .

Note that we already assumed in Theorem 1 and Corollary 1 that the sequences log p_i(Y_1 | Y_{−m:−1}) and log p_{i,N}(Z_{N,1} | Z_{N,−m:−1}) converge in L^1(P_1) as m → ∞, meaning that η_i(m) and η_{i,N}(m) tend to zero. The coefficients η_i(m) and η_{i,N}(m) now characterize the speed at which log p_i(Y_1 | Y_{−m:−1}) and log p_{i,N}(Z_{N,1} | Z_{N,−m:−1}) converge to their limits. They are therefore related to the mixing properties of the processes Y_Z and Z_{N,Z} (this point is discussed below in Remark 8). In the sequel, we will need to ensure that these limits are reached fast enough (see Assumption 4-3) below).

Assumption 4:
The following properties hold true.
1) For any n ≥ 1, y_{1:n} ↦ p_i(y_{1:n}) is of class C³ on Y^n.
2) sup { | ∂³ log p_i / (∂y_k^{(h)} ∂y_ℓ^{(i′)} ∂y_r^{(j′)}) (y_{1:n}) | : n ≥ 1, y_{1:n} ∈ Y^n, 1 ≤ k, ℓ, r ≤ n, 1 ≤ h, i′, j′ ≤ d } < ∞.
3) There exist two constants C_e, ε > 0 such that, for each i ∈ {0,1}, N ≥ 1 and m ≥ 1,

max( η_i(m), η_{i,N}(m) ) ≤ C_e / (1 + m)^{2+ε} .    (6)

4) For each i ∈ {0,1} and all integers m, m′, k such that −m′ ≤ −m ≤ 0 ≤ k:

E_1 ‖ ∇_{y_0} log p_i(Y_k | Y_{−m:k−1}) − ∇_{y_0} log p_i(Y_k | Y_{−m′:k−1}) ‖ ≤ ϕ_m ,    (7)

E_1 ‖ ∇_{y_0} log p_i(Y_k | Y_{−m:k−1}) ‖ ≤ ψ_k ,    (8)

where ∑_k ϕ_k and ∑_k ψ_k are convergent series.

Assumption 4 will be discussed in detail at the end of the present subsection. Particular examples of processes satisfying the above assumption are provided in Section VI and in the numerical results of Section VII. We are now in position to state our main result. Recall that p_1(y) is the pdf of Y_1 under P_1, and that K and K_N are the error exponents associated with the NP test in the absence and in the presence of quantization, respectively, given by (3) and (4). Note that Assumption 4-3) implies that both sequences η_i(m) and η_{i,N}(m) tend to zero. This guarantees that, under Assumption 1, the conclusions of Theorem 1 and Corollary 1 hold true, i.e., the error exponents K and K_N do exist.

Theorem 2:
Under Assumptions 1, 3, 4, the following statement holds true: as N tends to infinity, N^{2/d} (K − K_N) converges to a constant D_e given by

D_e = (1/2) ∫ p_1(y) F(y) ζ(y)^{−2/d} dy ,    (9)

where the function F is given by

F(y) = E_1 [ ℓ(Y_Z)^T M(Y_0) ℓ(Y_Z) | Y_0 = y ] ,    (10)

and the random variable ℓ(Y_Z) is the limit in L²(P_1) of the sequence ( ∇_{y_0} log (p_1/p_0)(Y_{−k:k}) )_{k≥0}.

The proof of Theorem 2 is given in Section V. Theorem 2 states that, when the order of the quantizer tends to infinity, the error exponent K_N associated with the NP test converges at speed N^{−2/d} to the error exponent K that one would have obtained in the absence of quantization. Loosely speaking, if β_{n,N}(α) represents the miss probability of the NP test of level α, the approximation

β_{n,N}(α) ≈ exp( −n ( K − D_e / N^{2/d} ) )    (11)

holds when both the number n of observations and the order N of quantization are large. The quantity D_e represents the (normalized) loss in error exponent between the quantized and the unquantized cases, in the high-rate quantization regime. Note that Equation (9) resembles Bennett's formula [5, Equation (1.6)] and its vector extension for r-th-power distortion [6, Equation (7)].

Remark 5:
As a first consequence of Theorem 2, under some assumptions on the process, classical quantizers such as those designed from an MSE perspective lead to an error exponent $K_N$ which converges to $K$ as $N$ tends to infinity, at speed $N^{-2/d}$ (see Equation (11) above).

Remark 6:
The particular situation where the measurements $(Y_k)_{k\geq 1}$ are i.i.d. under both hypotheses was studied by Gupta and Hero [18]. In this case, function $F(y)$ reduces to
$$F(y) = \nabla \Lambda(y)^T M(y)\, \nabla \Lambda(y),$$
where $\Lambda(y) = \log \frac{p_1}{p_0}(y)$ is the single-sample LLR. The expression (9) of $D_e$ is then consistent with the results of Gupta and Hero (see in particular [18, Equation (20)]). Note that we assume that each joint density $p_0(y_n)$ and $p_1(y_n)$ is of class $C^3$ on $\mathcal{Y}^n$. Gupta and Hero's assumption is weaker, since they only assume that "the densities are twice continuously differentiable on an open set of probability 1" [18, page 1956]. In fact, we need conditions on the third derivatives of the logarithm of the densities in order to find relevant upper bounds on the Taylor-Lagrange remainders in the expansion of the joint densities $p_i(y_{-m:u})$ in the general case (see the detailed proof in Section V).

Remark 7:
We now provide some insight into the meaning of Assumption 4 and the class of stationary processes which satisfy it. Assumptions 4-1) and 4-2) are mild technical conditions on the smoothness of the pdf of the observations. They encompass a large family of stochastic processes and are generally simple to validate on a case-by-case basis. As explained above, Assumption 4-3) can be interpreted as a condition on the speed at which past observations are forgotten. The quantities $\eta_i(m)$ and $\eta_{i,N}(m)$ can be interpreted as conditional mixing coefficients associated with the unquantized and quantized processes $(Y_k)_k$ and $(Z_{N,k})_k$ respectively (see Remark 8 below). Past observations must be forgotten at least at a polynomial speed faster than $m^{-6}$. Assumption 4-4) can be interpreted similarly as a forgetting property, which no longer involves the logarithm of the density of the observations, but its derivative. For instance, Assumption 4 is simple to verify for short-dependent processes (such as moving average processes) provided that the density of the observations is smooth enough. A similar remark holds for a wide class of Markov chains: in this case, Assumption 4 essentially reduces to a smoothness assumption on the density of the transition kernel. More generally, we prove in Section VI that Assumption 4 holds for a wide class of hidden Markov models, by providing sufficient conditions on the transition kernel. See also the numerical results in Section VII.
Remark 8:
It is worth making some remarks on the link between Assumption 4 and the standard mixing conditions used in the literature on mixing processes [26], [28], [29]. The mixing property closest to our setting is the notion of $\psi$-mixing. For two $\sigma$-fields $\mathcal{U}$ and $\mathcal{V}$, define the following coefficient [26], [28]:
$$\psi(\mathcal{U}, \mathcal{V}) = \sup_{\substack{U \in \mathcal{U},\, V \in \mathcal{V} \\ P(U) > 0,\, P(V) > 0}} \left| 1 - \frac{P(U \cap V)}{P(U)\, P(V)} \right|.$$
Recall that a stochastic process $(Y_k)_{k\in\mathbb{Z}}$ is said to be $\psi$-mixing when the sequence of $\psi$-mixing coefficients $\psi(\sigma(Y_{n+1:\infty}), \sigma(Y_{-\infty:0}))$ converges to zero. The classical $\psi$-mixing condition translates the fact that, loosely speaking, current samples at time $n$ tend to become independent of the past samples $Y_0, Y_{-1}, \ldots$ as $n$ increases. In our case, we need to ensure that current samples become independent of past ones conditionally on the intermediate values $Y_{1:n}$. The usual $\psi$-mixing coefficients do not fully capture this property. In [30], we introduced the following conditional $\psi$-mixing coefficient for $\sigma$-fields $\mathcal{U}$, $\mathcal{V}$ and $\mathcal{W}$:
$$\bar\psi_i(\mathcal{U}, \mathcal{V} \mid \mathcal{W}) = \sup_{U \in \mathcal{U},\, V \in \mathcal{V}} \operatorname{ess\,sup} \left| 1 - \frac{P_i(U \cap V \mid \mathcal{W})}{P_i(U \mid \mathcal{W})\, P_i(V \mid \mathcal{W})} \right|,$$
where the essential supremum is taken w.r.t. $P_i$ and where we use the convention $0/0 = 0$. The above coefficient can be interpreted as a measure of dependence (under $P_i$) between $\mathcal{U}$ and $\mathcal{V}$ conditionally on $\mathcal{W}$. In particular, it coincides with the traditional $\psi$-mixing coefficient $\psi(\mathcal{U}, \mathcal{V})$ when $\mathcal{W}$ is the trivial $\sigma$-field on $\mathcal{Y}^{\mathbb{Z}}$ and $P_i = P$. For each $n \geq 1$, we further define $\bar\psi_i(n) = \bar\psi_i(\sigma(Y_{n+1:\infty}), \sigma(Y_{-\infty:0}) \mid \sigma(Y_{1:n}))$, and $\bar\psi_i(0) = \bar\psi_i(\sigma(Y_{1:\infty}), \sigma(Y_{-\infty:0}))$ when $n = 0$. There exists a close link between the above conditional mixing coefficients and the set of coefficients $\eta_i(m)$ defined in (5).
In particular, Theorem 2 is valid when Assumption 4-3) is replaced by the assumption that the sequences $\bar\psi_i(n)$ and $\bar\psi_{i,N}(n) = \bar\psi_i(\sigma(Z_{N,n+1:\infty}), \sigma(Z_{N,-\infty:0}) \mid \sigma(Z_{N,1:n}))$ converge to zero at speed $n^{-(6+\epsilon)}$. We refer to [30] for details.
The asymptotic loss in error exponent $D_e$ depends on the quantizer through its model point density $\zeta$ and its model covariation profile $M$. In the sequel, we study the values of these parameters which attenuate the loss $D_e$ as much as possible.

C. Determination of Relevant High-Rate Quantizers: Scalar Case ($d = 1$)

We first address the case where the measurements $(Y_k)_{k\geq 1}$ are real-valued. Assume, without much loss of generality, that each cell is connected (cells are intervals), i.e., the quantizer is regular [4]. In this case, a straightforward derivation leads to $M_N(y) = 1/12$ for each $y$ and each $N$. Therefore, function $F$ simplifies to:
$$F(y) = \frac{1}{12}\, \mathbb{E}\left[ \ell(Y_{\mathbb{Z}})^2 \,\middle|\, Y_0 = y \right] = \frac{1}{12} \lim_{k\to\infty} \mathbb{E}\left[ \left( \frac{\partial}{\partial y_0} \log \frac{p_1}{p_0}(Y_{-k:k}) \right)^2 \,\middle|\, Y_0 = y \right].$$
Using Hölder's inequality on (9), it is straightforward to prove the following result.
Corollary 2:
Assume that $d = 1$ and that the cells are intervals. The error exponent loss $D_e$ satisfies
$$D_e \geq \frac{1}{2} \left( \int [p_0(y)\, F(y)]^{1/3}\, dy \right)^3, \quad (12)$$
where equality holds in (12) when the model point density coincides with
$$\zeta^*(y) = \frac{[p_0(y)\, F(y)]^{1/3}}{\int [p_0(s)\, F(s)]^{1/3}\, ds}.$$
The above corollary provides the optimal high-rate quantization rule for the initial hypothesis testing problem. Note that expression (12) is quite similar to [31, Equation (15)], which gives "the minimum distortion resulting with optimum level spacing" in an MSE perspective.
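To make the Hölder argument behind Corollary 2 concrete, the short numerical sketch below discretizes the scalar loss $D_e = \frac{1}{2}\int p_0 F \zeta^{-2}$ and checks that the point density $\zeta^* \propto (p_0 F)^{1/3}$ attains the bound (12) and beats a competing density. The Gaussian $p_0$ and the quadratic profile $F$ here are illustrative assumptions, not quantities from the paper.

```python
# Numerical check of Corollary 2 (a sketch; p0 and F below are toy choices).
import numpy as np

y = np.linspace(-4.0, 4.0, 4001)
dy = y[1] - y[0]

p0 = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)   # toy density under H0
F = 1.0 + y**2                                 # toy positive profile F(y)

def loss(zeta):
    """Discretized D_e = (1/2) * integral of p0 * F * zeta^{-2} (case d = 1)."""
    return 0.5 * np.sum(p0 * F / zeta**2) * dy

g = (p0 * F) ** (1.0 / 3.0)
zeta_star = g / (np.sum(g) * dy)               # optimal density of Corollary 2

# A competing point density, e.g. the MSE-like choice zeta ~ p0^{1/3}:
g_mse = p0 ** (1.0 / 3.0)
zeta_mse = g_mse / (np.sum(g_mse) * dy)

holder_bound = 0.5 * (np.sum(g) * dy) ** 3     # r.h.s. of (12), discretized
print(loss(zeta_star), loss(zeta_mse), holder_bound)
```

As expected, `loss(zeta_star)` matches the Hölder bound up to rounding, while any other normalized density (here the MSE-like one) incurs a strictly larger loss.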
Remark 9:
In practice, an $N$-point scalar quantizer achieving a given model point density $\zeta$ can easily be implemented by means of a compander. Recall that a compander is defined as the composition of an invertible continuous function $\phi$ (the so-called compressor) and a uniform quantizer [3], [5]. To that end, it is sufficient to define the compressor $\phi$ as the primitive of $\zeta$ on the observation space. For example, if $\mathcal{Y}$ is the segment $[a, b] \subset \mathbb{R}$, define $\phi(x) = \int_a^x \zeta(t)\, dt$. The output of the compressor is then quantized using a uniform $N$-point quantizer on the interval $[0, 1]$. Under the assumption that $\zeta$ is a Lipschitz function, it is straightforward to show that the resulting sequence of quantizers satisfies Assumption 3, i.e., it achieves the model point density $\zeta$.
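The compander construction of Remark 9 can be sketched in a few lines. The triangular point density $\zeta(t) = 2t$ on $\mathcal{Y} = [0,1]$ and the choice of the cell center (in the compressed domain) as reproduction point are our own illustrative assumptions.

```python
# Sketch of the compander of Remark 9 (scalar case, Y = [0, 1]).
import numpy as np

N = 8                                   # number of quantization levels

def zeta(t):                            # model point density (integrates to 1)
    return 2.0 * t

def phi(x):                             # compressor: primitive of zeta on [0, 1]
    return x ** 2

def phi_inv(u):                         # inverse of the compressor
    return np.sqrt(u)

def compander_quantize(x):
    """Compress, quantize uniformly on [0, 1], map the cell center back."""
    j = np.clip(np.floor(N * phi(x)).astype(int), 0, N - 1)  # cell index
    return phi_inv((j + 0.5) / N), j

# Cell boundaries in the original domain: preimages of the uniform breakpoints.
edges = phi_inv(np.arange(N + 1) / N)
widths = np.diff(edges)

# High-rate heuristic (rough at N = 8): widths behave like 1 / (N * zeta(y)).
mid = phi_inv((np.arange(N) + 0.5) / N)
print(np.max(np.abs(widths * N * zeta(mid) - 1.0)))
```

The cells are narrow where $\zeta$ is large, exactly as the high-rate theory prescribes; increasing $N$ tightens the agreement between cell widths and $1/(N\zeta)$.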
D. Determination of Relevant High-Rate Quantizers: Vector Case ($d \geq 2$)

In the vector case, the determination of optimal high-rate quantization rules involves the joint minimization of expression (9) w.r.t. both functions $\zeta$ and $M$. Unfortunately, as remarked in [3], [32], it is not known which functions $M$ are admissible as covariation profiles. The determination of the set of admissible couples $(\zeta, M)$ is an open problem, which is beyond the scope of this paper. However, when $M$ is fixed, the point density $\zeta$ which minimizes $D_e$ can easily be expressed as a function of $M$. Once again, this is a consequence of Hölder's inequality:
$$D_e \geq \frac{1}{2} \left( \int [p_0(y)\, F(y)]^{\frac{d}{d+2}}\, dy \right)^{\frac{d+2}{d}},$$
where equality is achieved when the point density coincides with
$$\zeta^*(y) = \frac{[p_0(y)\, F(y)]^{\frac{d}{d+2}}}{\int [p_0(s)\, F(s)]^{\frac{d}{d+2}}\, ds}. \quad (13)$$
In other words, one can easily provide the optimal high-rate quantization rule for a given limiting covariation profile. Following the approach of [18], we study two special cases of covariation profile:
1) Congruent cells with minimum moment of inertia:
In this paragraph, we focus on congruent cells with minimum moment of inertia, i.e., we assume that
$$\forall y \in \mathcal{Y}, \quad M(y) = \nu I_d, \quad (14)$$
for some $\nu > 0$, where $I_d$ represents the $d \times d$ identity matrix. Recall that Gersho [33] made the now widely accepted conjecture that when $N$ tends to infinity, most cells (i.e., all cells except those close to the boundary of the considered domain) of a $d$-dimensional MSE-optimal quantizer become congruent to some tessellating $d$-dimensional polytope $H_d^*$. In such a case, $M(y)$ is independent of $y$. Furthermore, Zamir and Feder [34, Lemma 1] proved that the cells of MSE-optimal lattice quantizers become "closer" to balls, i.e., with minimum moment of inertia, as the dimension $d$ grows. For quantizers with covariation profile given by (14), the optimal point density (13) becomes:
$$\zeta^*(y) = \frac{\big[p_0(y)\, \bar F(y)\big]^{\frac{d}{d+2}}}{\int \big[p_0(s)\, \bar F(s)\big]^{\frac{d}{d+2}}\, ds}, \quad (15)$$
where function $\bar F$ is defined by
$$\bar F(y) = \mathbb{E}\left[ \|\ell(Y_{\mathbb{Z}})\|^2 \,\middle|\, Y_0 = y \right] = \lim_{k\to\infty} \mathbb{E}\left[ \left\| \nabla_{y_0} \log \frac{p_1}{p_0}(Y_{-k:k}) \right\|^2 \,\middle|\, Y_0 = y \right]. \quad (16)$$

Design Algorithm:
In practice, one would like to design an $N$-point quantizer whose point density approximately equals (15) for some finite $N$. This can be achieved by means of well-established algorithms, the most popular being the Linde-Buzo-Gray (LBG) algorithm [35]. This algorithm is an iterative method which computes a Voronoi tessellation and yields an MSE-optimal $N$-point quantizer from a training set of data with some pdf $p(y)$. An ($N$-point) MSE-optimal quantizer for density $p(y)$ minimizes $\mathbb{E}\big[ \|Y - \xi_N(Y)\|^2 \big]$. As the number of quantization points $N$ tends to infinity, such a quantizer has the following model point density [3], [6]:
$$\zeta_{\mathrm{MSE}}(y) = \frac{p(y)^{\frac{d}{d+2}}}{\int p(s)^{\frac{d}{d+2}}\, ds}. \quad (17)$$
Comparing Equations (15) and (17), we deduce that the proposed quantizer, whose model point density $\zeta^*$ is given by Equation (15), can be obtained in practice by simply feeding the classical LBG algorithm with a training set of data drawn from the following pdf:
$$q^*(y) = \frac{p_0(y)\, \bar F(y)}{\int p_0(s)\, \bar F(s)\, ds}.$$
Section VII provides numerical illustrations of this approach.
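The design procedure above can be sketched as follows: draw a training set whose distribution approximates the tilted pdf $q^*(y) \propto p_0(y)\bar F(y)$ (here by self-normalized importance resampling) and feed it to a plain LBG/$k$-means loop. The two-component Gaussian mixture playing the role of $p_0$ and the quadratic surrogate for $\bar F$ are illustrative assumptions; in an application, $\bar F$ would be estimated from the model.

```python
# Sketch of the proposed design: LBG on a training set drawn from q* ~ p0 * Fbar.
import numpy as np

rng = np.random.default_rng(0)
N = 16                                           # quantizer size
centers = np.array([[-1.0, -1.0], [1.0, 1.0]])   # toy p0: mixture of 2 Gaussians

def fbar(y):                                     # assumed surrogate for (16)
    return 1.0 + np.sum(y ** 2, axis=1)

# Sample from p0, then importance-resample with weights Fbar(y): this yields
# an approximate sample from the tilted density q*.
y0 = centers[rng.integers(0, 2, size=20000)] + 0.3 * rng.standard_normal((20000, 2))
w = fbar(y0)
train = y0[rng.choice(len(y0), size=10000, p=w / w.sum())]

# LBG / k-means iterations: nearest-codeword partition, then centroid update.
code = train[rng.choice(len(train), size=N, replace=False)]
for _ in range(50):
    d2 = ((train[:, None, :] - code[None, :, :]) ** 2).sum(-1)
    lab = d2.argmin(1)
    for j in range(N):
        if np.any(lab == j):
            code[j] = train[lab == j].mean(0)

d2 = ((train[:, None, :] - code[None, :, :]) ** 2).sum(-1)
mse = d2.min(1).mean()
print(code.shape, mse)
```

Because LBG is MSE-optimal for the density it is trained on, its limiting point density is (17) applied to $q^*$, which is precisely (15).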
2) Ellipsoidal cells:
In order to gain some insight into the general shape of the cells, and following [18], we focus in this paragraph on ellipsoidal cells. Cells of this kind cannot partition the considered convex subset $\mathcal{Y}$ of $\mathbb{R}^d$, but for large dimension $d$, in analogy with the spherical cell approximation [3], [34], [36], we may assume that almost all cells of a given quantizer are close to ellipsoids. Such an ellipsoidal cell in the neighborhood of point $y$ writes $C = \{x : (x-y)^T R(y)(x-y) \leq 1\}$ for some symmetric positive definite matrix $R(y)$. The corresponding covariation profile writes $M(y) = \nu\, |R(y)|^{1/d} R(y)^{-1}$ [18], [37], for some $\nu > 0$, and has an eigendecomposition
$$M(y) = U(y)\, \Phi(y)\, U(y)^T,$$
where $\Phi(y) = \mathrm{Diag}(\phi_1(y), \ldots, \phi_d(y))$ and $U(y)$ is an orthogonal matrix. Note that the (positive) eigenvalues $(\phi_i(y))_{i\in\{1,\ldots,d\}}$ of $M(y)$ capture the relative importance of the axes of the ellipsoid $C$, while the columns $(u_i(y))_{i\in\{1,\ldots,d\}}$ of $U(y)$, i.e., the eigenvectors of $M(y)$, indicate their respective directions. In this paragraph, we assume that the eigenvalues $(\phi_i)_{i\in\{1,\ldots,d\}}$ are fixed, constant w.r.t. $y$ and, without loss of generality, sorted in increasing order, i.e., $0 < \phi_1 \leq \phi_2 \leq \cdots \leq \phi_d$. We want to find the best orthogonal matrix $U(y)$, i.e., the one which minimizes function $F(y)$, given at Equation (10), in order to minimize the error exponent loss $D_e$ (9). In other words, for a given "shape" of (non-degenerate) ellipsoid, we look for the best directions of its axes. Function $F(y)$ writes:
$$F(y) = \mathbb{E}\left[ \ell(Y_{\mathbb{Z}})^T M(Y_0)\, \ell(Y_{\mathbb{Z}}) \,\middle|\, Y_0 = y \right] = \mathrm{Tr}\big( U(y)\, \Phi\, U(y)^T \bar L(y) \big), \quad (18)$$
where $\bar L(y) = \mathbb{E}\left[ \ell(Y_{\mathbb{Z}})\, \ell(Y_{\mathbb{Z}})^T \,\middle|\, Y_0 = y \right]$.
Now write the eigendecomposition of the positive definite matrix $\bar L(y)$:
$$\bar L(y) = V(y)\, \Delta(y)\, V(y)^T,$$
where $\Delta(y) = \mathrm{Diag}(\lambda_1(y), \ldots, \lambda_d(y))$, the (positive) eigenvalues $(\lambda_i(y))_{i\in\{1,\ldots,d\}}$ of $\bar L(y)$ are sorted in increasing order, i.e., $0 < \lambda_1(y) \leq \lambda_2(y) \leq \cdots \leq \lambda_d(y)$, and $V(y)$ is an orthogonal matrix. Equation (18) thus writes:
$$F(y) = \mathrm{Tr}\big( U(y)\, \Phi\, U(y)^T V(y)\, \Delta(y)\, V(y)^T \big) \geq \sum_{i=1}^{d} \phi_i\, \lambda_{d-i+1}(y),$$
where the last inequality follows from a well-known trace inequality for positive semidefinite Hermitian matrices [38], [39, Section 9-H]. The above lower bound is furthermore achieved by choosing matrix $U(y)$ such that $U(y)^T V(y)$ is the anti-diagonal matrix with ones on its anti-diagonal, i.e., defining the $i$th column of matrix $U(y)$ as the $(d-i+1)$th column of matrix $V(y)$, or equivalently eigenvector $u_i(y)$ of matrix $M(y)$ as eigenvector $v_{d-i+1}(y)$ of matrix $\bar L(y)$.

(For any given $d$-dimensional vector $x \in \mathbb{R}^d$, $\mathrm{Diag}(x)$ represents the $d$-by-$d$ diagonal matrix with diagonal coefficients $(x_1, x_2, \ldots, x_d)$.)
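The trace inequality and the alignment rule can be checked numerically: for a fixed spectrum $\Phi$, pairing the columns of $U$ with the eigenvectors of $\bar L$ in reverse order attains the lower bound $\sum_i \phi_i \lambda_{d-i+1}$, and no other orthogonal matrix does better. The random positive definite matrix below is an illustrative stand-in for $\bar L(y)$.

```python
# Numeric check of the axis-alignment rule minimizing Tr(U Phi U^T L).
import numpy as np

rng = np.random.default_rng(1)
d = 4
phi = np.sort(rng.uniform(0.5, 2.0, d))          # 0 < phi_1 <= ... <= phi_d

A = rng.standard_normal((d, d))
L = A @ A.T + 0.1 * np.eye(d)                    # positive definite stand-in
lam, V = np.linalg.eigh(L)                       # eigenvalues, increasing order

U_opt = V[:, ::-1]                               # u_i = v_{d-i+1}
f_opt = np.trace(U_opt @ np.diag(phi) @ U_opt.T @ L)
bound = np.sum(phi * lam[::-1])                  # sum_i phi_i * lam_{d-i+1}

# Random orthogonal matrices should never beat the bound.
worst_gap = 0.0
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    worst_gap = min(worst_gap, np.trace(Q @ np.diag(phi) @ Q.T @ L) - bound)
print(f_opt, bound, worst_gap)
```

In words: the smallest $\phi_i$ (the minor axis of the cell) is matched with the largest eigenvalue of $\bar L(y)$, which is the statement following the derivation above.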
From the above derivations, we conclude that if a cell is a non-degenerate ellipsoid around $y$, then its axes should be aligned with those of matrix $\bar L(y)$ in reverse order. In particular, its minor axis should be aligned with the principal eigenvector of matrix $\bar L(y)$.

V. PROOF OF THEOREM 2

A. Preliminaries
Recall that $V_{N,j} = \int_{C_{N,j}} dy$ is the volume of cell $C_{N,j}$ ($j \in \{1, \ldots, N\}$). For each $i \in \{0,1\}$ and each set of quantization points $\xi_{N,j_{1:n}} = (\xi_{N,j_1}, \ldots, \xi_{N,j_n}) \in \Xi_N^n$, define the following rescaled pdf of $Z_{N,1:n}$:
$$\bar p_{i,N}(\xi_{N,j_{1:n}}) = \frac{1}{V_{N,j_1} \times \cdots \times V_{N,j_n}}\, p_{i,N}(\xi_{N,j_{1:n}}) = \frac{1}{V_{N,j_1} \times \cdots \times V_{N,j_n}}\, P_{i,n}\big( C_{N,j_1} \times \cdots \times C_{N,j_n} \big). \quad (19)$$
The above definition is convenient because $\bar p_{i,N}(\xi_{N,j_{1:n}}) \simeq p_i(\xi_{N,j_{1:n}})$ for large values of $N$. This approximation will be of prime importance in the sequel. We define the conditional version $\bar p_{i,N}(\xi_{N,j_n} \mid \xi_{N,j_{1:n-1}})$ similarly. For each $i \in \{0,1\}$ and each integer $\ell \geq 0$, we introduce the following functions:
$$\forall y_{-\ell:0} \in \mathcal{Y}^{\ell+1}, \quad \mathcal{L}_i(y_{-\ell:0}) = \log p_i(y_0 \mid y_{-\ell:-1}),$$
$$\forall z_{-\ell:0} \in \Xi_N^{\ell+1}, \quad \mathcal{L}_{i,N}(z_{-\ell:0}) = \log \bar p_{i,N}(z_0 \mid z_{-\ell:-1}).$$
Due to Assumptions 1-3) and 4-3) (which ensures that $\eta_i(0) < \infty$), the random sequence $(\mathcal{L}_i(Y_{-\ell:0}))_{\ell\geq 0}$ lies in $L^1(P_0)$. Moreover, Assumption 4-3) for large $m$ ensures that $(\mathcal{L}_i(Y_{-\ell:0}))_{\ell\geq 0}$ is a Cauchy sequence in $L^1(P_0)$. Denote by $\mathcal{L}_i(Y_{-\infty:0})$ its limit. From Assumption 4-3) once again, the following holds for any $\ell \geq 0$:
$$\mathbb{E}\, \big| \mathcal{L}_i(Y_{-\ell:0}) - \mathcal{L}_i(Y_{-\infty:0}) \big| \leq C_e (1+\ell)^{-(6+\epsilon)}. \quad (20)$$
A similar result holds for the sequence $(\mathcal{L}_{i,N}(Z_{N,-\ell:0}))_{\ell\geq 0}$, which converges in $L^1(P_0)$ to some random variable $\mathcal{L}_{i,N}(Z_{N,-\infty:0})$ and verifies, for any $\ell \geq 0$,
$$\mathbb{E}\, \big| \mathcal{L}_{i,N}(Z_{N,-\ell:0}) - \mathcal{L}_{i,N}(Z_{N,-\infty:0}) \big| \leq C_e (1+\ell)^{-(6+\epsilon)}. \quad (21)$$
Our aim is to study the difference $K - K_N$ between the error exponents associated with the ideal and quantized cases respectively.
We may write the difference as
$$K - K_N = (K_1 - K_{1,N}) - (K_0 - K_{0,N}), \quad (22)$$
where we defined, for each $i \in \{0,1\}$,
$$K_i = \mathbb{E}[\mathcal{L}_i(Y_{-\infty:0})], \qquad K_{i,N} = \mathbb{E}[\mathcal{L}_{i,N}(Z_{N,-\infty:0})].$$
In the sequel, we focus on the study of $K_0 - K_{0,N}$, the study of $K_1 - K_{1,N}$ being similar. We now proceed with the proof. Choose any $\epsilon'$ such that $0 < \epsilon' < \frac{\epsilon}{3d(6+\epsilon)}$. Define the sequence of integers $m = m(N) = \big\lfloor N^{\frac{1}{3d} - \epsilon'} \big\rfloor$. We shall remember that with this definition,
$$\lim_{N\to\infty} \frac{m^3}{N^{1/d}} = 0. \quad (23)$$
The following decomposition holds true:
$$K_{0,N} = K_0 + T_N + U_N + \delta_N,$$
where we defined:
$$T_N = \mathbb{E}[\mathcal{L}_{0,N}(Z_{N,-m:0}) - \mathcal{L}_0(Z_{N,-m:0})],$$
$$U_N = \mathbb{E}[\mathcal{L}_0(Z_{N,-m:0}) - \mathcal{L}_0(Y_{-m:0})],$$
$$\delta_N = \mathbb{E}[\mathcal{L}_{0,N}(Z_{N,-\infty:0}) - \mathcal{L}_{0,N}(Z_{N,-m:0})] + \mathbb{E}[\mathcal{L}_0(Y_{-m:0}) - \mathcal{L}_0(Y_{-\infty:0})].$$
Using Equations (20) and (21), it is straightforward to show that
$$N^{2/d}\, |\delta_N| \leq 2 C_e N^{2/d} (1+m)^{-(6+\epsilon)}.$$
By definition of $m = m(N)$, we deduce that $N^{2/d} |\delta_N|$ converges to zero as $N \to \infty$. As a consequence, the asymptotic analysis of the quantity $N^{2/d}(K_{0,N} - K_0)$ reduces to the study of $T_N$ and $U_N$. As $\mathcal{Y}$ is a bounded set, Assumption 4-2) implies the following bounds on the derivatives of density $p_0$, which will be of permanent use in the sequel:
$$\sup_{\{y_n \in \mathcal{Y}^n,\ 1 \leq k \leq n\}} \big\| \nabla_{y_k} \log p_0(y_n) \big\| \leq C_1, \quad (24)$$
$$\sup_{\{y_n \in \mathcal{Y}^n,\ 1 \leq k \leq n\}} \big\| \nabla^2_{y_k} \log p_0(y_n) \big\| \leq C_2, \quad (25)$$
for some constants $C_1$ and $C_2$.

B. Study of $T_N$

We expand $T_N$ as follows:
$$T_N = \mathbb{E}\left[ \log \frac{\bar p_{0,N}(Z_{N,-m:0})}{p_0(Z_{N,-m:0})} \right] - \mathbb{E}\left[ \log \frac{\bar p_{0,N}(Z_{N,-m:-1})}{p_0(Z_{N,-m:-1})} \right]. \quad (26)$$
We now study each term on the r.h.s. of the above equality. Consider $u \in \{-1, 0\}$. Writing the Taylor-Lagrange expansion of the function $y_{-m:u} \mapsto p_0(y_{-m:u})$ at point $\xi_{N,j_{-m:u}}$, and using Assumptions 3-3), 4 and the properties of the quantizer sequence, we prove the following lemma (the detailed proof is given in Appendix A).

Lemma 2:
For each $j_{-m:u} \in \{1, \ldots, N\}^{u+m+1}$, the following expansion holds true:
$$\frac{\bar p_{0,N}(\xi_{N,j_{-m:u}})}{p_0(\xi_{N,j_{-m:u}})} = 1 + \frac{1}{2 N^{2/d}} \sum_{k=-m}^{u} \mathrm{Tr}\left( \frac{\nabla^2_{y_k} p_0(\xi_{N,j_{-m:u}})}{p_0(\xi_{N,j_{-m:u}})}\, \frac{M_{N,j_k}}{\zeta_{N,j_k}^{2/d}} \right) + \epsilon_{N,j_{-m:u}},$$
where $|\epsilon_{N,j_{-m:u}}| \leq c_T\, (m+1)^3 N^{-3/d}$ for some constant $c_T$.

Plugging the above equation into (26), using $|\log(1+x) - x| \leq x^2$ in a neighborhood of zero, Assumptions 3, 4-2) and Equation (23), we obtain:
$$T_N = T_N(0) - T_N(-1) + o_N(N^{-2/d}), \quad (27)$$
where, for each $u \in \{-1, 0\}$,
$$T_N(u) = \frac{1}{2 N^{2/d}} \sum_{k=-m}^{u} \mathbb{E}\left[ \mathrm{Tr}\left( \frac{\nabla^2_{y_k} p_0(Z_{N,-m:u})}{p_0(Z_{N,-m:u})}\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \right) \right]. \quad (28)$$

C. Study of $U_N$

We expand $U_N$ as follows:
$$U_N = \mathbb{E}[\log p_0(Z_{N,-m:0}) - \log p_0(Y_{-m:0})] - \mathbb{E}[\log p_0(Z_{N,-m:-1}) - \log p_0(Y_{-m:-1})], \quad (29)$$
and study each term on the r.h.s. of the above equality. For each $u \in \{-1,0\}$ and each $j_{-m:u} \in \{1, \ldots, N\}^{u+m+1}$, we expand the function $y_{-m:u} \mapsto \log p_0(y_{-m:u})$ at point $\xi_{N,j_{-m:u}}$:
$$\log p_0(y_{-m:u}) = \log p_0(\xi_{N,j_{-m:u}}) + \sum_{k=-m}^{u} \nabla_{y_k} \log p_0(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k}) + \frac{1}{2} \sum_{k,\ell=-m}^{u} (y_k - \xi_{N,j_k})^T\, \nabla^2_{y_k,y_\ell} \log p_0(\xi_{N,j_{-m:u}})\, (y_\ell - \xi_{N,j_\ell}) + \epsilon'_N(y_{-m:u}). \quad (30)$$
Under Assumptions 3-3) and 4-2), for each $y_{-m:u} \in C_{N,j_{-m}} \times \cdots \times C_{N,j_u}$, the remainder is such that
$$|\epsilon'_N(y_{-m:u})| \leq c'\, (m+1)^3 N^{-3/d},$$
for some constant $c'$. By Equation (23), the r.h.s. of the above inequality converges to zero faster than $N^{-2/d}$ as $N$ tends to infinity. Plugging the Taylor expansion (30) into the expression (29) of $U_N$, we obtain:
$$U_N = U_N(0) - U_N(-1) + o_N(N^{-2/d}), \quad (31)$$
where, for each $u \in \{-1, 0\}$,
$$U_N(u) = -\sum_{k=-m}^{u} \mathbb{E}\big[ \nabla_{y_k} \log p_0(Z_{N,-m:u})^T (Y_k - Z_{N,k}) \big] - \frac{1}{2} \sum_{k,\ell=-m}^{u} \mathbb{E}\big[ (Y_k - Z_{N,k})^T\, \nabla^2_{y_k,y_\ell} \log p_0(Z_{N,-m:u})\, (Y_\ell - Z_{N,\ell}) \big]. \quad (32)$$
The next step is to study each dominant term on the r.h.s. of (32). The proof of the following lemma is provided in Appendix B.

Lemma 3:
The following equality holds true for each $u \in \{-1, 0\}$:
$$U_N(u) = A_N(u) + B_N(u) + o_N(N^{-2/d}),$$
where $A_N$ and $B_N$ are defined as follows:
$$A_N(u) = -\frac{1}{N^{2/d}} \sum_{k=-m}^{u} \mathbb{E}\left[ \nabla_{y_k} \log p_0(Z_{N,-m:u})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}}\, \nabla_{y_k} \log p_0(Y_{-m:u}) \right], \quad (33)$$
$$B_N(u) = -\frac{1}{2 N^{2/d}} \sum_{k=-m}^{u} \mathbb{E}\left[ \mathrm{Tr}\left( \nabla^2_{y_k} \log p_0(Z_{N,-m:u})\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \right) \right].$$
Now we expand the term $\nabla^2_{y_k} \log p_0$ as follows:
$$\nabla^2_{y_k} \log p_0(y_{-m:u}) = \frac{\nabla^2_{y_k} p_0(y_{-m:u})}{p_0(y_{-m:u})} - \frac{\nabla_{y_k} p_0(y_{-m:u})\, \nabla_{y_k} p_0(y_{-m:u})^T}{\big( p_0(y_{-m:u}) \big)^2}.$$
From the above decomposition and Equation (28), we can split $B_N(u)$ into two terms:
$$B_N(u) = \frac{1}{2 N^{2/d}} \sum_{k=-m}^{u} \mathbb{E}\left[ \mathrm{Tr}\left( \nabla_{y_k} \log p_0(Z_{N,-m:u})\, \nabla_{y_k} \log p_0(Z_{N,-m:u})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \right) \right] - T_N(u).$$
Expanding the function $\nabla_{y_k} \log p_0$ in the above equation and in (33), we can write the dominant terms in a simpler form, i.e., replace each $Z_N$ by $Y$. Under Assumption 3, from Equations (25) and (23), we can easily prove that the corresponding remainders are $o_N(N^{-2/d})$.
Putting all the pieces together, we obtain
$$U_N(u) = -\frac{1}{N^{2/d}} \sum_{k=-m}^{u} \mathbb{E}\left[ \nabla_{y_k} \log p_0(Y_{-m:u})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}}\, \nabla_{y_k} \log p_0(Y_{-m:u}) \right] + \frac{1}{2 N^{2/d}} \sum_{k=-m}^{u} \mathbb{E}\left[ \nabla_{y_k} \log p_0(Y_{-m:u})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}}\, \nabla_{y_k} \log p_0(Y_{-m:u}) \right] - T_N(u) + o_N(N^{-2/d}). \quad (34)$$

D. End of the Proof
From the results of Sections V-B and V-C, we can easily prove the following lemma.
Lemma 4:
The following holds true:
$$N^{2/d}(K - K_N) = \mathbb{E}[H_{N,0}(Y_{-m:0})] + \sum_{k=-m}^{-1} \mathbb{E}\big[ H_{N,k}(Y_{-m:0}) - H_{N,k}(Y_{-m:-1}) \big] + o_N(1), \quad (35)$$
where for each $u \in \{-1, 0\}$, each $m \geq 0$ and each $k \in \{-m, \ldots, u\}$:
$$H_{N,k}(Y_{-m:u}) = \frac{1}{2}\, \nabla_{y_k} \log \frac{p_1}{p_0}(Y_{-m:u})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}}\, \nabla_{y_k} \log \frac{p_1}{p_0}(Y_{-m:u}). \quad (36)$$

Proof:
Recalling the decomposition $K_{0,N} = K_0 + T_N + U_N + o_N(N^{-2/d})$ and gathering Equations (27), (31), (34), it is straightforward to prove the following equality:
$$N^{2/d}(K_{0,N} - K_0) = -\sum_{k=-m}^{0} \mathbb{E}\left[ \nabla_{y_k} \log p_0(Y_{-m:0})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}}\, \nabla_{y_k} \log p_0(Y_{-m:0}) \right] + \frac{1}{2} \sum_{k=-m}^{0} \mathbb{E}\left[ \nabla_{y_k} \log p_0(Y_{-m:0})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}}\, \nabla_{y_k} \log p_0(Y_{-m:0}) \right] + \sum_{k=-m}^{-1} \mathbb{E}\left[ \nabla_{y_k} \log p_0(Y_{-m:-1})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}}\, \nabla_{y_k} \log p_0(Y_{-m:-1}) \right] - \frac{1}{2} \sum_{k=-m}^{-1} \mathbb{E}\left[ \nabla_{y_k} \log p_0(Y_{-m:-1})^T\, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}}\, \nabla_{y_k} \log p_0(Y_{-m:-1}) \right] + o_N(1).$$
A similar expression holds for $N^{2/d}(K_{1,N} - K_1)$: replace all $p_0$ by $p_1$ in the above equation. Lemma 4 then follows from the decomposition (22).
We now study the series (35). From Assumptions 3, 4-2) and 4-4), the following forgetting properties hold true for any positive integers $\ell', \ell$ and any integers $k$, $u$ such that $-\ell' \leq -\ell \leq k \leq u$:
$$\mathbb{E}\, \big| H_{N,k}(Y_{-\ell:u}) - H_{N,k}(Y_{-\ell':u}) \big| \leq c_h\, \varphi_{\ell - |k|}, \quad (37)$$
$$\mathbb{E}\, \big| H_{N,k}(Y_{-\ell:0}) - H_{N,k}(Y_{-\ell:-1}) \big| \leq c_h\, \psi_{|k|}, \quad (38)$$
for some constant $c_h$. It is clear from (37) that the sequence $(H_{N,k}(Y_{-\ell:u}))_{\ell \geq -u}$ is a Cauchy sequence in $L^1(P_0)$. We simply denote its limit by $H_{N,k}(Y_{-\infty:u})$. Inequalities (37) and (38) provide the main tools for the asymptotic analysis of the series (35). The proof of the following lemma is given in Appendix C.

Lemma 5:
The following holds true:
$$N^{2/d}(K - K_N) = \mathbb{E}[H_{N,0}(Y_{-\infty:0})] + \sum_{k=-\infty}^{-1} \mathbb{E}\big[ H_{N,k}(Y_{-\infty:0}) - H_{N,k}(Y_{-\infty:-1}) \big] + o_N(1).$$
As the process $(Y_k)_{k\in\mathbb{Z}}$ is stationary, the expectation $\mathbb{E}$ enclosed in the sum of the above equation is invariant w.r.t. a time-shift. Using this remark, we obtain after some algebra
$$N^{2/d}(K - K_N) = \lim_{k\to\infty} \mathbb{E}[H_{N,0}(Y_{-\infty:k})] + o_N(1). \quad (39)$$
For a fixed $k \geq 0$, Equation (7) ensures that the sequence $\big( \nabla_{y_0} \log\frac{p_1}{p_0}(Y_{-m:k}) \big)_{m\geq 0}$ is a Cauchy sequence in $L^1(P_0)$. Denote its limit by $\ell_k(Y_{-\infty:k})$. The upper bound of Equation (8) is uniform in $m$. Consequently, it also holds for the sequence $(\ell_k(Y_{-\infty:k}))_{k\geq 0}$:
$$\mathbb{E}\, \big\| \ell_k(Y_{-\infty:k}) - \ell_{k-1}(Y_{-\infty:k-1}) \big\| \leq \psi_k.$$
Under Assumption 4-4), $\sum_k \psi_k$ is a convergent series. The sequence $(\ell_k(Y_{-\infty:k}))_{k\geq 0}$ is thus a Cauchy sequence in $L^1(P_0)$. Denote its limit by $\ell(Y_{\mathbb{Z}})$. Moreover, the upper bound of Equation (7) (resp. Equation (8)) is uniform in $m'$ (resp. $m$). It is then straightforward to prove that $\ell(Y_{\mathbb{Z}})$ coincides with the $L^1(P_0)$-limit of the sequence $\big( \nabla_{y_0} \log\frac{p_1}{p_0}(Y_{-k:k}) \big)_{k\geq 0}$. From Equation (24) and its counterpart for density $p_1$, the quantity $\nabla_{y_0} \log\frac{p_1}{p_0}(Y_{-k:k})$ is uniformly bounded. Consequently, the above limit also holds in the $L^2(P_0)$ sense:
$$\nabla_{y_0} \log\frac{p_1}{p_0}(Y_{-k:k}) \xrightarrow[k\to\infty]{L^2(P_0)} \ell(Y_{\mathbb{Z}}). \quad (40)$$
Plugging Equations (36) and (40) into Equation (39) and letting $N$ tend to $\infty$ completes the proof of Theorem 2.
VI. ILLUSTRATION: CASE OF A HIDDEN MARKOV PROCESS
In this section, we translate our assumptions to the case of (discrete-time) hidden Markov models. For such models, they reduce to simpler conditions on the transition kernel of the underlying Markov chain and on the observation kernel. This context, where the measurements are noisy samples of a certain Markov source, has raised deep interest in the recent literature on sensor networks (see [23], [24] and references therein). Consider a stationary Markov process $(X_k)_{k\in\mathbb{Z}}$ taking its values in an arbitrary state space $\mathcal{X}$ and playing the role of a source signal to be detected. For each $i \in \{0,1\}$ and each integer $t$, we assume that the (iterated) transition kernel $P_i[X_{k+t} \in \cdot \mid X_k = x]$ admits a density $x' \mapsto q_i^t(x, x')$ w.r.t. some probability measure $\lambda$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. Assume that there exist an integer $m_0$ and two real numbers $\sigma_-, \sigma_+$ such that, for each $i \in \{0,1\}$ and each $(x, x') \in \mathcal{X}^2$, $0 < \sigma_- \leq q_i^{m_0}(x, x') \leq \sigma_+$. In particular, this assumption implies that the Markov chain $(X_k)_{k\in\mathbb{Z}}$ has bounded support. If the state space $\mathcal{X}$ is finite, the above conditions hold if the Markov chain $(X_k)_{k\in\mathbb{Z}}$ is irreducible and aperiodic, choosing $\lambda$ as the (normalized) counting measure on $\mathcal{X}$. In this case, the chain indeed admits a stationary distribution, and $q_i^{m_0}(x, x') > 0$ for each $x$, $x'$ and some integer $m_0$ [40, Section 8]. The states $X_k$ of the above Markov source are supposed to be hidden. However, a "noisy" version $Y_k$ ($\in \mathcal{Y} \subset \mathbb{R}^d$) of $X_k$ is available at the $k$th sensor. We assume that the distribution $P[Y_k \in \cdot \mid X_k = x]$ does not depend on the hypothesis $H_0$ or $H_1$, and admits a density $y \mapsto g(x, y)$ w.r.t. the $d$-dimensional Lebesgue measure $\mu$ restricted to $\mathcal{Y}$, such that $0 < \inf_{x,y} g(x,y) \leq \sup_{x,y} g(x,y) < \infty$.
We furthermore assume that this density verifies some smoothness conditions: for each $x \in \mathcal{X}$, $y \mapsto g(x, y)$ is of class $C^3$ on $\mathcal{Y}$, and
$$\sup_{\{x \in \mathcal{X},\, y \in \mathcal{Y},\, 1 \leq h, \bar\imath, \bar\jmath \leq d\}} \left| \frac{\partial^3 g}{\partial y^{(h)}\, \partial y^{(\bar\imath)}\, \partial y^{(\bar\jmath)}}(x, y) \right| < \infty.$$
The situation is depicted in Figure 1. A similar assumption was recently introduced in [41], [42] in order to study the asymptotic behaviour of the log-likelihood $\log p_i(Y_{1:n})$ as $n$ tends to infinity. In particular, it was shown that
$$\big| \log p_i(Y_0 \mid Y_{-m:-1}) - \log p_i(Y_0 \mid Y_{-m':-1}) \big| \leq \frac{\sigma_+}{\sigma_-} \left( 1 - \frac{\sigma_-}{\sigma_+} \right)^{m-1}$$
for each $m' \geq m \geq 1$. This clearly proves that the sequence $\log p_i(Y_0 \mid Y_{-m:-1})$ converges in $L^1(P_0)$ as $m \to \infty$, which yields Assumption 2. Moreover, the convergence holds at exponential speed, so that the quantities $\eta_i(m)$ defined by Equation (5) vanish faster than any polynomial rate, in particular faster than the rate required in (6). The same claim holds as well for the quantities $\eta_{i,N}(m)$, without need for any special condition on the quantizer (quantization preserves the hidden Markov nature of the original process $(Y_k)_{k\in\mathbb{Z}}$). This yields Assumption 4-3). Assumptions 4-1) and 4-2) are direct consequences of the above smoothness conditions on density $g$. Assumption 4-4) can be derived following the arguments of [41], [42]. The following proposition then follows from the results of [41], [42]; the proof is therefore omitted.

Figure 1: Detection of a discrete-time Markov process based on noisy observations.

Proposition 1:
All conditions given by Assumptions 1 and 4 hold true for the particular process $(Y_k)_{k\in\mathbb{Z}}$ described in this section.

As a consequence, if the family of quantizers moreover verifies Assumption 3, then the conclusions of Theorems 1 and 2 hold true. Section VII-A below provides a practical example of such a detection problem.

VII. NUMERICAL RESULTS
In this section, we provide numerical illustrations of the proposed quantization rule in terms of geometric properties and performance. Different contexts are considered and we compare several quantizers:

• The proposed quantizer, obtained using the approach described in Section IV-D1, whose model point density is given by (15).
• The MSE-optimal quantizer, which minimizes $\mathbb{E}\,\|Y_0 - Z_{N,0}\|^2$ and whose model point density is given by (17).

• The Gupta-Hero quantizer, introduced in [18]: in this case, the model point density is derived as if the observations were i.i.d., i.e., taking only the marginal distributions $p_0(y)$ and $p_1(y)$ into account.

• The uniform quantizer, with constant model point density.
A. Scenario
In this section, we provide an example of a hidden Markov model which verifies the assumptions given in Section VI, and detail how the approach described in Section IV-D1 can be used in this case for the design of practical quantizers.
1) Observation Model:
We consider the following model for vector observations of dimension $d = 2$:
$$Y_k = T(X_k) + W_k, \quad (41)$$
where $(X_k)_{k\in\mathbb{Z}}$ is a 2-bit message taking its values in $\mathcal{X} = \{0, 1, 2, 3\}$, $T(x)$ is the 2-D representation of state $x$ in the I-Q plane according to Figure 2, and $W_k \overset{\mathrm{i.i.d.}}{\sim} \mathcal{CN}(0, \sigma^2)$ represents a zero-mean circular Gaussian thermal noise with variance $\sigma^2$. The process $(X_k)_{k\in\mathbb{Z}}$ is i.i.d. and uniformly distributed under $H_0$, and forms a Markov chain under $H_1$. More precisely,
$$H_0:\ X_k \overset{\mathrm{i.i.d.}}{\sim} \mathcal{U}\{0,1,2,3\},$$
$$H_1:\ X_1 \sim \mathcal{U}\{0,1,2,3\}, \qquad P_1[X_{k+1} = x' \mid X_k = x] = q(x, x'),$$
where $q$ is the transition matrix of the Markov chain, which allows only transitions between adjacent constellation points (no diagonal transitions, see Figure 2(b)):
$$q = \begin{pmatrix} 1/3 & 1/3 & 0 & 1/3 \\ 1/3 & 1/3 & 1/3 & 0 \\ 0 & 1/3 & 1/3 & 1/3 \\ 1/3 & 0 & 1/3 & 1/3 \end{pmatrix}.$$
This situation arises when testing from noisy observations between two possible quaternary modulations, namely quadrature phase-shift keying (QPSK) and offset quadrature phase-shift keying (OQPSK), in the In-phase/Quadrature plane [43, Chapter 3]. The corresponding constellations are depicted in Figure 2.

(Here $T(0) = [-1; -1]$, $T(1) = [-1; 1]$, $T(2) = [1; 1]$, $T(3) = [1; -1]$.)
01 231 / / / / / / / / / / IQ
01 231 / / / / / / / / (a) (b)Figure 2: QPSK vs. OQPSK – Constellation diagrams and transitions probabilities for (a)QPSK, (b) OQPSK.In the observation model (41), densities have infinite support. We thus consider truncatedobservations on Y = [ − M ; M ] for some positive real number M [44, Section 10.1]. The new(truncated) model is a hidden Markov model with observation density g ( x, y ) given by: g ( x, y ) = [ − M ; M ] ( y ) C M ( σ ) exp (cid:18) − σ ( y − T ( x )) T ( y − T ( x )) (cid:19) , (42)where A stands for the indicator function of set A , and C M ( σ ) is a constant such that (cid:82) Y g ( x, y ) dy =1 , for each x ∈ { , , , } i.e. , C M ( σ ) = (cid:16)(cid:82) M − M exp (cid:16) − ( t − σ (cid:17) dt (cid:17) .The above hidden Markov model verifies the assumptions given at Section VI. From Propo-sition 1, if the family of quantizers verifies Assumption 3, then the conclusions of Theorems 1and 2 hold true.Note that the marginal pdf of the measurements ( Y k ) k ≥ (represented in Figure 3) writes p ( y ) = p ( y ) = 14 (cid:88) x =0 g ( x, y ) . (43)Since it does not depend on the hypothesis, Gupta-Hero quantizer [18], which minimizes theerror exponent loss in case of i.i.d. observations, is not defined. May 2011 DRAFTO APPEAR IN THE IEEE TRANSACTIONS ON INFORMATION THEORY 30 −3 −2 −1 0 1 2 3−3−2−10123 y (1) y (2) Figure 3: QPSK vs. OQPSK – Marginal pdf of the observations p ( y ) = p ( y ) ( M = 3 , σ = 0 . ).
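To make the model concrete, the observation model (41)-(43) can be simulated directly. A minimal sketch follows; the value of $\sigma$ and the entries of the OQPSK-like transition matrix `q` are illustrative assumptions (the paper's numerical values are not reproduced here), and the truncated-Gaussian density follows (42):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: the paper truncates on [-M, M]^2 with M = 3;
# sigma and the transition matrix q below are placeholders, not the
# paper's numerical values.
M, sigma = 3.0, 0.5
T = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [1.0, -1.0]])  # T(0)..T(3)
q = np.array([[0.25, 0.25, 0.0, 0.25],
              [0.25, 0.25, 0.25, 0.0],
              [0.0, 0.25, 0.25, 0.25],
              [0.25, 0.0, 0.25, 0.25]])
q /= q.sum(axis=1, keepdims=True)  # normalize the rows of the Markov kernel

def sample_states(n, h1):
    """X_k: i.i.d. uniform on {0,1,2,3} under H0, Markov chain q under H1."""
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(4)
    for k in range(1, n):
        x[k] = rng.choice(4, p=q[x[k - 1]]) if h1 else rng.integers(4)
    return x

def sample_obs(x):
    """Y_k = T(X_k) + W_k, with the noise truncated to [-M, M]^2 as in (42)."""
    y = np.empty((len(x), 2))
    for k, xk in enumerate(x):
        while True:  # rejection against the box <=> truncated Gaussian
            cand = T[xk] + sigma * rng.standard_normal(2)
            if np.all(np.abs(cand) <= M):
                y[k] = cand
                break
    return y

def marginal_pdf(y):
    """p0(y) = p1(y) = (1/4) sum_x g(x, y), Eq. (43), as a finite sum."""
    t = np.linspace(-M, M, 2001)
    c1d = np.exp(-(t - 1.0) ** 2 / (2 * sigma ** 2)).sum() * (t[1] - t[0])
    g = np.exp(-np.sum((y[None, :] - T) ** 2, axis=1) / (2 * sigma ** 2))
    return float(g.sum() / (4 * c1d ** 2))
```

Evaluating the marginal (43) as a finite sum over $\mathcal{X}$ mirrors how the expectations appearing later in this section are computed exactly.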
2) Examples of Quantizers:
Figure 4(a) represents the MSE-optimal $N$-cell quantizer obtained by the LBG algorithm with $M = 3$ and $\sigma = 0.$; Figure 4(b) represents the corresponding proposed quantizer. Our quantizer is significantly different from the MSE-optimal one: some low-probability points turn out to be significant for the considered detection problem. Details on how we obtained these quantizers are given below.

a) MSE-optimal quantizer: The MSE-optimal quantizer of Figure 4(a) was obtained by feeding the LBG algorithm with 20 000 samples following distribution $P_0$, i.e., i.i.d. with pdf $p_0(y)$ (see Figure 3).

b) Proposed quantizer: As noted in Section IV-D1, the proposed quantizer, whose model point density $\zeta$ is given by Equation (15), can be obtained by simply feeding the LBG algorithm with observations drawn from the following pdf:
$$q^*(y) = \frac{p(y)\, \bar{F}(y)}{\int p(s)\, \bar{F}(s)\, ds}.$$
We simulated 20 000 samples of this pdf using rejection sampling [45, Section 2.2]. In practice, we approximated the function $\bar{F}$ given by Equation (16) by
$$\bar{F}_k(y) = \frac{1}{n_{\mathrm{MC}}} \sum_{j=1}^{n_{\mathrm{MC}}} \left\| \nabla_{y_0} \log \frac{p_0}{p_1}\left( Y_{-k:-1}(j),\, y,\, Y_{1:k}(j) \right) \right\|^2, \qquad (44)$$
for $k = 3$ and $n_{\mathrm{MC}} = 1\,000$ replications $(Y_m(j))_{m \in \{-k, \dots, -1, 1, \dots, k\},\, j \in \{1, \dots, n_{\mathrm{MC}}\}}$, i.e., i.i.d. samples with pdf $p$. These values were chosen based on empirical observations.

Figure 4: QPSK vs. OQPSK – (a) MSE-optimal $N$-cell quantizer, (b) proposed $N$-cell quantizer ($M = 3$, $\sigma = 0.$, 20 000 samples).
The gradient in Equation (44) may be written as follows, after some derivations and using Equations (42), (43):
$$\nabla_{y_0} \log \frac{p_0}{p_1}(y_{-k:k}) = \nabla_{y_0} \log p_0(y_0) - \nabla_{y_0} \log p_1(y_{-k:k}) = \frac{1}{\sigma^2} \left( \frac{\mathbb{E}[T(X_0)\, g(X_0, y_0)]}{\mathbb{E}[g(X_0, y_0)]} - \frac{\mathbb{E}\left[ T(X_0) \prod_{j=-k}^{k} g(X_j, y_j) \right]}{\mathbb{E}\left[ \prod_{j=-k}^{k} g(X_j, y_j) \right]} \right).$$
As they are finite sums on $\mathcal{X}$ or $\mathcal{X}^{2k+1}$, the above four expectations are computed exactly at the time of the evaluation of $\bar{F}_k$ (44).

B. Scenario 2: Detection of an AR Structure
We consider the following model for vector observations with dimension $d = 2$:
$$Y_k = X_k + W_k,$$
where $W_k \overset{\text{i.i.d.}}{\sim} \mathcal{CN}(0, \sigma^2)$ represents a zero-mean circular Gaussian thermal noise with variance $\sigma^2$, and where $(X_k)_{k \in \mathbb{Z}}$ is a Gaussian process which is white under $H_0$ and correlated (AR-1) under $H_1$. More precisely,
$$H_0: \quad X_k \overset{\text{i.i.d.}}{\sim} \mathcal{CN}(0, 1),$$
$$H_1: \quad X_k = a X_{k-1} + \sqrt{1 - a^2}\, U_k,$$
where $a \in (0, 1)$ is the correlation coefficient and $U_k \overset{\text{i.i.d.}}{\sim} \mathcal{CN}(0, 1)$ is the innovation process. In particular, $(Y_k)_{k \in \mathbb{Z}}$ is a white Gaussian process under $H_0$ and is a hidden Markov process under $H_1$, with the particular property that the marginal distributions of single observations are identical under both hypotheses.

We mention that in the above model, densities have infinite support, so that the assumptions made in this paper are not satisfied (the observation set $\mathcal{Y}$ coincides with $\mathbb{R}^2$ and is thus unbounded). In particular, Theorem 2 does not apply. Nevertheless, in order to yield some insights on the design of practical quantizers for detection, we can still use the approach described in Section IV-D1 and compute the proposed model point density given by Equation (15).

Figure 5: Detection of an AR structure – (a) MSE-optimal $N$-cell quantizer, (b) proposed $N$-cell quantizer ($a = 0.$, $\sigma = 1$, 20 000 samples).

Figure 5(a) represents the MSE-optimal $N$-cell quantizer obtained by the LBG algorithm (with a
20 000-sample training set of data) and setting $\sigma = 1$. Figure 5(b) represents the corresponding proposed quantizer, obtained when setting $a = 0.$. Once again, our quantizer is significantly different from the MSE-optimal one; as a matter of fact, low-probability points seem to be significant for the considered detection problem.

Table I compares the latter two quantization rules and the uniform one (on the rectangle $[-8; 8]^2$) in terms of the quantity $D_e$ (9). As expected, the proposed quantization rule leads to the lowest value, which suggests that it also leads to higher detection performance.

Table I: Detection of an AR structure – Quantity $D_e$ for parameter values $a = 0.$ and $\sigma = 1$.
Quantization rule: Uniform on $[-8; 8]^2$ | MSE-optimal | Proposed
Quantity $D_e$:

C. Scenario 3: Detection of an MA Process
Denote by $Y_k$ the samples collected by a receiver which performs a binary test associated with the following hypotheses:
$$H_0: \quad Y_k = W_k,$$
$$H_1: \quad Y_k = \sum_{\ell=0}^{L} h_\ell\, U_{k-\ell} + W_k,$$
where $W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$ represents a thermal noise which is supposed to be real-valued for the sake of illustration. Here, $U_k$ represents a certain random source which is passed through a propagation channel with deterministic real coefficients $h_0, \dots, h_L$, where $L$ is an integer which represents the channel's memory. In the sequel, we set $L = 3$ and assume for instance that $U_k$ is Gaussian distributed, $U_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$. We investigate the case where the sensing unit performs a scalar quantization of the received signal before transmission to the decision device.

As in Section VII-B, densities in the above model have infinite support, so that the assumptions made in this paper are not satisfied. Once again, in order to yield some insights on the design of practical quantizers for detection, we can still use the approach described in Section IV-D1 and compute the proposed model point density given by Equation (15). In this case, we approximated the function $\bar{F}$ (16) for finite $k$ and computed the involved expectation exactly.

Figure 6: Detection of an MA process – Probability densities $p_0(y)$, $p_1(y)$ and model point densities of the MSE-optimal, Gupta-Hero, and proposed quantizers (one channel realization of $h$, $\sigma = 1$).

For the same reason, the result of Gupta and Hero [18, Equation (20)] does not apply, but we can compute the corresponding quantizer, whose model point density is given by [18, Equation (25)], as they did for their Gaussian examples in [18, Section V]. The performance depends on the noise variance $\sigma^2$ and on the particular value of the channel. Thus, we assumed that the channel coefficients $h_0, \dots, h_L$ are i.i.d. Gaussian distributed with zero mean and unit variance, and ran several simulations. Figure 6 represents the probability and model point densities for one channel realization and $\sigma = 1$.

Considering a system with $n = 80$ sensors, constructing $N$-cell quantizers for the different methods, and computing the corresponding quantized probability distributions under each hypothesis, we can compare the considered quantization rules in terms of detection performance through their respective receiver operating characteristics (ROC curves). Figure 7 represents such curves for the above channel realization. The uniform quantizer is used on the support $[-\sigma, \sigma]$, and each curve is plotted using 50 000 samples of the LLR under each hypothesis.

The proposed quantization rule improves the detection performance compared to the MSE-optimal quantizer. In this example, the ROC curve is close to that obtained using the Gupta-Hero quantizer. Recall however that in other contexts (e.g., in Sections VII-A and VII-B), the Gupta-Hero quantizer is not defined. Recall also that our theoretical results hold in the regime where $N$ and $n$ tend to infinity, that is, in the regime where the power of the test tends exponentially to one. In practice, the empirical validation of our result would thus require simulating rare events; this topic is out of the scope of this paper. Note that if we interchange $H_0$ and $H_1$, the proposed quantization rule will be different. This is due to the fact that the asymptotic regime we are interested in when dealing with error exponents, i.e., $n$ tends to infinity for a fixed type-I error $\alpha$, restricts attention to one point along the Neyman-Pearson ROC curve.
Figure 7: Detection of an MA process – ROC curves (true positive rate $1 - \beta$ versus false positive rate $\alpha$) for the unquantized test and the uniform, MSE-optimal, Gupta-Hero, and proposed quantizers (one channel realization of $h$, $\sigma = 1$, $n = 80$, $N = 4$, 100 000 samples).

VIII. CONCLUSION
We investigated the performance of the Neyman-Pearson detector operating on quantized versions of a correlated vector-valued stationary process. It was shown that, for a constant false alarm level, the miss probability of the test converges exponentially to zero. We determined the error exponent and provided a compact and informative expression for it in the context of high-rate quantization. In particular, it is proved that when the number $N$ of quantization levels tends to infinity, the error exponent converges at speed $N^{-2/d}$ to the ideal error exponent that one would obtain in the absence of quantization. In the case of scalar quantization, we analytically characterized the high-rate quantizers minimizing the error exponent loss. In the case of vector quantization, we proposed a method based on the LBG algorithm in order to construct practical quantizers with attractive performance.

We believe that there are many directions for extending these results and mention a few here. In this paper, observations have absolutely continuous probability distributions w.r.t. the Lebesgue measure. Following Graf and Luschgy [46, Section 6], who considered measures with both continuous and singular parts, we could think of an extension of our work to such cases. We moreover focused on constant false-alarm rate (CFAR) tests. Following the arguments developed in [18] and using the results of [25, Section III], it could be interesting to study the whole asymptotic ROC curve and use a global performance criterion like the area under the curve (AUC). However, this would require a nontrivial extension of Sanov's theorem [47] to non-i.i.d. time series. We furthermore think that the framework developed in this paper could be applied in the context of parameter estimation: the effect of quantization on performance, measured for instance by the Fisher information, could be studied, and the corresponding optimal vector quantizers could be described.
APPENDIX A
PROOF OF LEMMA 2

We consider the Taylor-Lagrange expansion of $y_{-m:u} \mapsto p(y_{-m:u})$ at point $\xi_{N,j_{-m:u}}$:
$$p(y_{-m:u}) = p(\xi_{N,j_{-m:u}}) + \sum_{k=-m}^{u} \nabla_{y_k} p(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k}) + \frac{1}{2} \sum_{k,\ell=-m}^{u} (y_k - \xi_{N,j_k})^T \nabla^2_{y_k, y_\ell} p(\xi_{N,j_{-m:u}}) (y_\ell - \xi_{N,j_\ell}) + \epsilon_N(y_{-m:u}), \qquad (45)$$
where
$$\epsilon_N(y_{-m:u}) = \frac{1}{6} \sum_{k,\ell,r=-m}^{u} \sum_{h,\bar{\imath},\bar{\jmath}=1}^{d} (y_k^{(h)} - \xi_{N,j_k}^{(h)}) (y_\ell^{(\bar{\imath})} - \xi_{N,j_\ell}^{(\bar{\imath})}) (y_r^{(\bar{\jmath})} - \xi_{N,j_r}^{(\bar{\jmath})}) \, \frac{\partial^3 p}{\partial y_k^{(h)} \partial y_\ell^{(\bar{\imath})} \partial y_r^{(\bar{\jmath})}} (\theta y_{-m:u} + (1 - \theta) \xi_{N,j_{-m:u}}),$$
for a given $\theta \in [0, 1]$ (see [48]). Plugging expansion (45) into (19) leads to:
$$\frac{\bar{p}_N(\xi_{N,j_{-m:u}})}{p(\xi_{N,j_{-m:u}})} = 1 + \sum_{k=-m}^{u} \int_{C_{N,j_k}} \frac{\nabla_{y_k} p(\xi_{N,j_{-m:u}})^T}{p(\xi_{N,j_{-m:u}})} (y_k - \xi_{N,j_k}) \, \frac{dy_k}{V_{N,j_k}} + \frac{1}{2} \sum_{k=-m}^{u} \int_{C_{N,j_k}} (y_k - \xi_{N,j_k})^T \frac{\nabla^2_{y_k} p(\xi_{N,j_{-m:u}})}{p(\xi_{N,j_{-m:u}})} (y_k - \xi_{N,j_k}) \, \frac{dy_k}{V_{N,j_k}}$$
$$\qquad + \frac{1}{2} \sum_{k \neq \ell} \int_{C_{N,j_k}} \int_{C_{N,j_\ell}} (y_k - \xi_{N,j_k})^T \frac{\nabla^2_{y_k, y_\ell} p(\xi_{N,j_{-m:u}})}{p(\xi_{N,j_{-m:u}})} (y_\ell - \xi_{N,j_\ell}) \, \frac{dy_k}{V_{N,j_k}} \frac{dy_\ell}{V_{N,j_\ell}} + \epsilon_{N,j_{-m:u}}, \qquad (46)$$
where
$$\epsilon_{N,j_{-m:u}} = \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} \frac{\epsilon_N(y_{-m:u})}{p(\xi_{N,j_{-m:u}})} \, \frac{dy_{-m:u}}{\prod_{i=-m}^{u} V_{N,j_i}}.$$
We now determine an estimate for this remainder term. For each $y_{-m:u} \in C_{N,j_{-m}} \times \dots \times C_{N,j_u}$,
$$\frac{\epsilon_N(y_{-m:u})}{p(\xi_{N,j_{-m:u}})} = \frac{1}{6} \sum_{k,\ell,r=-m}^{u} \sum_{h,\bar{\imath},\bar{\jmath}=1}^{d} (y_k^{(h)} - \xi_{N,j_k}^{(h)}) (y_\ell^{(\bar{\imath})} - \xi_{N,j_\ell}^{(\bar{\imath})}) (y_r^{(\bar{\jmath})} - \xi_{N,j_r}^{(\bar{\jmath})}) \left[ \frac{1}{p} \frac{\partial^3 p}{\partial y_k^{(h)} \partial y_\ell^{(\bar{\imath})} \partial y_r^{(\bar{\jmath})}} \right] (\theta y_{-m:u} + (1 - \theta) \xi_{N,j_{-m:u}}) \times \frac{p(\theta y_{-m:u} + (1 - \theta) \xi_{N,j_{-m:u}})}{p(\xi_{N,j_{-m:u}})}. \qquad (47)$$
First, we find a bound for the last factor. To that end, we expand the function $z_{-m:u} \mapsto \log p(z_{-m:u})$ at point $\xi_{N,j_{-m:u}}$:
$$\log p(z_{-m:u}) = \log p(\xi_{N,j_{-m:u}}) + \sum_{k=-m}^{u} \nabla_{y_k} \log p(\theta' z_{-m:u} + (1 - \theta') \xi_{N,j_{-m:u}})^T (z_k - \xi_{N,j_k}),$$
for a given $\theta' \in [0, 1]$. From Equation (24), the following inequality holds:
$$\left| \log \frac{p(z_{-m:u})}{p(\xi_{N,j_{-m:u}})} \right| \leq \sum_{k=-m}^{u} \left\| \nabla_{y_k} \log p(\theta' z_{-m:u} + (1 - \theta') \xi_{N,j_{-m:u}}) \right\| \, \| z_k - \xi_{N,j_k} \| \leq C \sum_{k=-m}^{u} \| z_k - \xi_{N,j_k} \|.$$
Applying the above upper bound at point $z_{-m:u} = \theta y_{-m:u} + (1 - \theta) \xi_{N,j_{-m:u}}$ and using Assumption 3-3), we find
$$\left| \log \frac{p(\theta y_{-m:u} + (1 - \theta) \xi_{N,j_{-m:u}})}{p(\xi_{N,j_{-m:u}})} \right| \leq C (m + 1) \frac{C_d}{N^{1/d}},$$
for each $y_{-m:u} \in C_{N,j_{-m}} \times \dots \times C_{N,j_u}$. According to the definition of the sequence $m(N)$ (see Equation (23)), the r.h.s. of the above equation vanishes as $N$ tends to infinity. Consequently, the term $p(\theta y_{-m:u} + (1 - \theta) \xi_{N,j_{-m:u}}) / p(\xi_{N,j_{-m:u}})$ in Equation (47) is bounded. This result, together with Assumption 4-2), gives the following upper bound:
$$\left| \epsilon_{N,j_{-m:u}} \right| \leq c_T \left( \frac{m + 1}{N^{1/d}} \right)^3,$$
for some constant $c_T$. Let us now examine the dominant terms of Equation (46). Recall that $\xi_{N,j}$ is defined as the centroid of cell $C_{N,j}$:
$$\xi_{N,j} = \int_{C_{N,j}} y \, \frac{dy}{V_{N,j}}.$$
It is straightforward to prove the following two equalities, for any $j \in \{1, \dots, N\}$ and any $d$-by-$d$ matrix $A$:
$$\int_{C_{N,j}} (y - \xi_{N,j}) \, \frac{dy}{V_{N,j}} = 0, \qquad \int_{C_{N,j}} (y - \xi_{N,j})^T A \, (y - \xi_{N,j}) \, \frac{dy}{V_{N,j}} = \mathrm{Tr}(A M_{N,j}) \, V_{N,j}^{2/d}.$$
Plugging the above identities into Equation (46) and recalling that $\zeta_{N,j} = (N V_{N,j})^{-1}$ proves Lemma 2.
APPENDIX B
PROOF OF LEMMA 3

Define the function $V_N$ on $\mathcal{Y}$ by $V_N(y) = V_{N,j}$ whenever $y \in C_{N,j}$.

Lemma 6: For each $k \in \{-m, \dots, u\}$, the following equality holds true:
$$\mathbb{E}\left[ \nabla_{y_k} \log p(Z_{N,-m:u})^T (Y_k - Z_{N,k}) \right] = \frac{1}{N^{2/d}} \, \mathbb{E}\left[ \nabla_{y_k} \log p(Z_{N,-m:u})^T \, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \, \frac{\nabla_{y_k} p(Y_{-m:k-1}, Z_{N,k}, Y_{k+1:u})}{p(Y_{-m:u})} \right] + \bar{\epsilon}_{N,k},$$
where $|\bar{\epsilon}_{N,k}| \leq c' / N^{3/d}$ for some constant $c'$.

Proof: We expand the expectation:
$$\mathbb{E}\left[ \nabla_{y_k} \log p(Z_{N,-m:u})^T (Y_k - Z_{N,k}) \right] = \sum_{j_{-m:u}} \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} \nabla_{y_k} \log p(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k}) \, p(y_{-m:u}) \, dy_{-m:u}, \qquad (48)$$
where $\sum_{j_{-m:u}}$ is a summation over all index vectors $j_{-m:u} \in \{1, \dots, N\}^{u+m+1}$. For each $j_k \in \{1, \dots, N\}$, we then consider the Taylor-Lagrange expansion of $y_k \mapsto p(y_{-m:u})$ at point $\xi_{N,j_k}$:
$$p(y_{-m:u}) = p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) + \nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u})^T (y_k - \xi_{N,j_k}) + \epsilon_{N,k}(y_{-m:u}), \qquad (49)$$
where
$$\epsilon_{N,k}(y_{-m:u}) = \frac{1}{2} (y_k - \xi_{N,j_k})^T \, \nabla^2_{y_k} p(y_{-m:k-1}, \theta y_k + (1 - \theta) \xi_{N,j_k}, y_{k+1:u}) \, (y_k - \xi_{N,j_k}),$$
for a given $\theta \in [0, 1]$. Under Assumption 4-2), from the counterparts of Equations (24), (25) for the density $p$ and following the argument of Lemma 2 (see Appendix A), we can find a bound for this remainder: for each $y_{-m:u} \in C_{N,j_{-m}} \times \dots \times C_{N,j_u}$,
$$|\epsilon_{N,k}(y_{-m:u})| \leq \| y_k - \xi_{N,j_k} \|^2 \left\| \nabla^2_{y_k} p(y_{-m:k-1}, \theta y_k + (1 - \theta) \xi_{N,j_k}, y_{k+1:u}) \right\| \leq c \, \| y_k - \xi_{N,j_k} \|^2 \, p(y_{-m:u}), \qquad (50)$$
for some constant $c$. Plugging expansion (49) into (48) leads to two dominant terms $D_{N,1}$ and $D_{N,2}$ and a remainder $r_N$:
$$\mathbb{E}\left[ \nabla_{y_k} \log p(Z_{N,-m:u})^T (Y_k - Z_{N,k}) \right] = D_{N,1} + D_{N,2} + r_N.$$
We successively study each of them. The first dominant term is
$$D_{N,1} = \sum_{j_{-m:u}} \nabla_{y_k} \log p(\xi_{N,j_{-m:u}})^T \int \dots \int_{\{C_{N,j_i}\}_{i \neq k}} \left( \int_{C_{N,j_k}} (y_k - \xi_{N,j_k}) \, dy_k \right) p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, \{dy_i\}_{i \neq k} = 0,$$
where $\{dy_i\}_{i \neq k}$ stands for $\prod_{i=-m, i \neq k}^{u} dy_i$. The last equality holds true since we have chosen the quantization level $\xi_{N,j}$ to be the centroid of cell $C_{N,j}$. The second dominant term is
$$D_{N,2} = \sum_{j_{-m:u}} \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} \nabla_{y_k} \log p(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k})(y_k - \xi_{N,j_k})^T \, \nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, dy_{-m:u}$$
$$= \sum_{j_{-m:u}} \nabla_{y_k} \log p(\xi_{N,j_{-m:u}})^T \left( \int_{C_{N,j_k}} (y_k - \xi_{N,j_k})(y_k - \xi_{N,j_k})^T \, dy_k \right) \int \dots \int_{\{C_{N,j_i}\}_{i \neq k}} \nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, \{dy_i\}_{i \neq k}$$
$$= \sum_{j_{-m:u}} \nabla_{y_k} \log p(\xi_{N,j_{-m:u}})^T \, M_{N,j_k} V_{N,j_k}^{1 + 2/d} \int \dots \int_{\{C_{N,j_i}\}_{i \neq k}} \nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, \{dy_i\}_{i \neq k}. \qquad (51)$$
We now write this equality in a simpler form. Obviously, under Assumption 1-2), we can write
$$\nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) = \frac{\nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u})}{p(y_{-m:u})} \, p(y_{-m:u}).$$
Note that the above expression is independent of $y_k \in C_{N,j_k}$, so we can also write
$$\nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) = \int_{C_{N,j_k}} \frac{\nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u})}{p(y_{-m:u})} \, p(y_{-m:u}) \, \frac{dy_k}{V_{N,j_k}}.$$
Equation (51) thus becomes
$$D_{N,2} = \sum_{j_{-m:u}} \nabla_{y_k} \log p(\xi_{N,j_{-m:u}})^T \, M_{N,j_k} V_{N,j_k}^{2/d} \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} \frac{\nabla_{y_k} p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u})}{p(y_{-m:u})} \, p(y_{-m:u}) \, dy_{-m:u}$$
$$= \frac{1}{N^{2/d}} \, \mathbb{E}\left[ \nabla_{y_k} \log p(Z_{N,-m:u})^T \, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \, \frac{\nabla_{y_k} p(Y_{-m:k-1}, Z_{N,k}, Y_{k+1:u})}{p(Y_{-m:u})} \right],$$
where the last line comes from $\zeta_{N,j} = (N V_{N,j})^{-1}$. We complete the proof with a bound on the remainder term:
$$|r_N| = \left| \sum_{j_{-m:u}} \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} \nabla_{y_k} \log p(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k}) \, \epsilon_{N,k}(y_{-m:u}) \, dy_{-m:u} \right| \overset{(a)}{\leq} C c \int \dots \int \| y_k - \xi_N(y_k) \|^3 \, p(y_{-m:u}) \, dy_{-m:u} \overset{(b)}{\leq} C c \left( \frac{C_d}{N^{1/d}} \right)^3 = \frac{c'}{N^{3/d}},$$
where inequality (a) is obtained from Equations (24), (50), and (b) is a consequence of Assumption 3-3). Putting all the pieces together proves Lemma 6.

Lemma 7:
There exists a constant $c'$ such that, for each $k \neq \ell \in \{-m, \dots, u\}$,
$$\left| \mathbb{E}\left[ (Y_k - Z_{N,k})^T \, \nabla^2_{y_k, y_\ell} \log p(Z_{N,-m:u}) \, (Y_\ell - Z_{N,\ell}) \right] \right| \leq \frac{c'}{N^{3/d}}.$$
Proof: For each $k \neq \ell$, we expand the expectation:
$$\mathbb{E}\left[ (Y_k - Z_{N,k})^T \, \nabla^2_{y_k, y_\ell} \log p(Z_{N,-m:u}) \, (Y_\ell - Z_{N,\ell}) \right] = \sum_{j_{-m:u}} \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} (y_k - \xi_{N,j_k})^T \, \nabla^2_{y_k, y_\ell} \log p(\xi_{N,j_{-m:u}}) \, (y_\ell - \xi_{N,j_\ell}) \, p(y_{-m:u}) \, dy_{-m:u} \qquad (52)$$
and consider the expansion of $y_k \mapsto p(y_{-m:u})$ at point $\xi_{N,j_k}$:
$$p(y_{-m:u}) = p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) + \epsilon'_{N,k}(y_{-m:u}), \qquad (53)$$
where, from the counterpart of Equation (24) for the density $p$ and following the argument leading to Equation (50), $|\epsilon'_{N,k}(y_{-m:u})| \leq c' \, \| y_k - \xi_{N,j_k} \| \, p(y_{-m:u})$ for some constant $c'$.

Plugging expansion (53) into (52) leads to a dominant term and a remainder. The dominant term is
$$\sum_{j_{-m:u}} \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} (y_k - \xi_{N,j_k})^T \, \nabla^2_{y_k, y_\ell} \log p(\xi_{N,j_{-m:u}}) \, (y_\ell - \xi_{N,j_\ell}) \, p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, dy_{-m:u}$$
$$= \sum_{j_{-m:u}} \int \dots \int_{\{C_{N,j_i}\}_{i \neq k}} \left( \int_{C_{N,j_k}} (y_k - \xi_{N,j_k}) \, dy_k \right)^T \nabla^2_{y_k, y_\ell} \log p(\xi_{N,j_{-m:u}}) \, (y_\ell - \xi_{N,j_\ell}) \, p(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, \{dy_i\}_{i \neq k} = 0.$$
Using Equation (25) and Assumption 3-3), we find a bound for the remainder term:
$$\left| \sum_{j_{-m:u}} \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} (y_k - \xi_{N,j_k})^T \, \nabla^2_{y_k, y_\ell} \log p(\xi_{N,j_{-m:u}}) \, (y_\ell - \xi_{N,j_\ell}) \, \epsilon'_{N,k}(y_{-m:u}) \, dy_{-m:u} \right| \leq C c' \left( \frac{C_d}{N^{1/d}} \right)^3 = \frac{c'}{N^{3/d}}. \qquad (54)$$

Lemma 8:
For each $k \in \{-m, \dots, u\}$,
$$\mathbb{E}\left[ (Y_k - Z_{N,k})^T \, \nabla^2_{y_k} \log p(Z_{N,-m:u}) \, (Y_k - Z_{N,k}) \right] = \frac{1}{N^{2/d}} \, \mathbb{E}\left[ \mathrm{Tr}\left( \nabla^2_{y_k} \log p(Z_{N,-m:u}) \, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \right) \frac{p(Y_{-m:k-1}, Z_{N,k}, Y_{k+1:u})}{p(Y_{-m:u})} \right] + \bar{\epsilon}'_{N,k},$$
where $|\bar{\epsilon}'_{N,k}| \leq c' / N^{3/d}$.

Proof: For each $k$, we expand the expectation:
$$\mathbb{E}\left[ (Y_k - Z_{N,k})^T \, \nabla^2_{y_k} \log p(Z_{N,-m:u}) \, (Y_k - Z_{N,k}) \right] = \sum_{j_{-m:u}} \int \dots \int_{C_{N,j_{-m}} \times \dots \times C_{N,j_u}} (y_k - \xi_{N,j_k})^T \, \nabla^2_{y_k} \log p(\xi_{N,j_{-m:u}}) \, (y_k - \xi_{N,j_k}) \, p(y_{-m:u}) \, dy_{-m:u}. \qquad (55)$$
Plugging expansion (53) into (55) leads to a dominant term and a remainder. The study of the dominant term uses the same arguments as Lemma 6. The final expression comes from the following equality:
$$\int_{C_{N,j}} (y - \xi_{N,j})^T A \, (y - \xi_{N,j}) \, dy = \mathrm{Tr}(A M_{N,j}) \, V_{N,j}^{1 + 2/d},$$
for any $d$-by-$d$ matrix $A$, and the definition of the specific point density $\zeta_{N,j} = (N V_{N,j})^{-1}$. Equation (54) is also valid when $k = \ell$, i.e., for the remainder considered here. This proves Lemma 8.

Gathering Equation (32) and Lemmas 6, 7, 8 results in
$$U_N(u) = -\frac{1}{N^{2/d}} \sum_{k=-m}^{u} \mathbb{E}\left[ \nabla_{y_k} \log p(Z_{N,-m:u})^T \, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \, \frac{\nabla_{y_k} p(Y_{-m:k-1}, Z_{N,k}, Y_{k+1:u})}{p(Y_{-m:u})} \right] - \frac{1}{2 N^{2/d}} \sum_{k=-m}^{u} \mathbb{E}\left[ \mathrm{Tr}\left( \nabla^2_{y_k} \log p(Z_{N,-m:u}) \, \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \right) \frac{p(Y_{-m:k-1}, Z_{N,k}, Y_{k+1:u})}{p(Y_{-m:u})} \right] + \bar{\epsilon}_N(u),$$
where $|\bar{\epsilon}_N(u)| \leq c_U \, m / N^{3/d}$ for some constant $c_U$. Expanding $\nabla_{y_k} p$ and $p$ once again, under Assumptions 3 and 4-2), it is straightforward to write the dominant term in a simple form, i.e., to replace each $Z_{N,k}$ by $Y_k$. From Equation (23), the remainder term is a little-o of $N^{-2/d}$. This proves Lemma 3.

APPENDIX C
PROOF OF LEMMA 5

Define
$$\Sigma_N = \mathbb{E}[\mathcal{H}_{N,0}(Y_{-\infty:0})] + \sum_{k=-\infty}^{-1} \mathbb{E}[\mathcal{H}_{N,k}(Y_{-\infty:0}) - \mathcal{H}_{N,k}(Y_{-\infty:-1})].$$
Using Equation (35), the approximation of $N^{2/d}(K - K_N)$ by the series $\Sigma_N$ leads to the following remainder:
$$\left| N^{2/d}(K - K_N) - \Sigma_N \right| \leq \sum_{k=-m}^{0} \mathbb{E}\left| \Delta^{(k)}_N \right| + \sum_{k=-\infty}^{-m-1} \mathbb{E}\left| \Upsilon^{(k)}_N \right| + \check{\epsilon}_N, \qquad (56)$$
where $\Delta^{(0)}_N = \mathcal{H}_{N,0}(Y_{-m:0}) - \mathcal{H}_{N,0}(Y_{-\infty:0})$ and
$$\Delta^{(k)}_N = \mathcal{H}_{N,k}(Y_{-m:0}) - \mathcal{H}_{N,k}(Y_{-m:-1}) - \mathcal{H}_{N,k}(Y_{-\infty:0}) + \mathcal{H}_{N,k}(Y_{-\infty:-1}) \quad (\forall k \leq -1),$$
$$\Upsilon^{(k)}_N = \mathcal{H}_{N,k}(Y_{-\infty:0}) - \mathcal{H}_{N,k}(Y_{-\infty:-1}) \quad (\forall k \leq -m-1),$$
and where $\check{\epsilon}_N \to 0$ as $N \to \infty$. Using the triangular inequality, we obtain for each $k \leq -1$:
$$\mathbb{E}\left| \Delta^{(k)}_N \right| \leq \mathbb{E}\left| \mathcal{H}_{N,k}(Y_{-m:0}) - \mathcal{H}_{N,k}(Y_{-\infty:0}) \right| + \mathbb{E}\left| \mathcal{H}_{N,k}(Y_{-m:-1}) - \mathcal{H}_{N,k}(Y_{-\infty:-1}) \right|.$$
Using (37), this leads to $\mathbb{E}|\Delta^{(k)}_N| \leq 2 c_h \varphi_{m-|k|}$. From the triangular inequality once again,
$$\mathbb{E}\left| \Delta^{(k)}_N \right| \leq \mathbb{E}\left| \mathcal{H}_{N,k}(Y_{-m:0}) - \mathcal{H}_{N,k}(Y_{-m:-1}) \right| + \mathbb{E}\left| \mathcal{H}_{N,k}(Y_{-\infty:0}) - \mathcal{H}_{N,k}(Y_{-\infty:-1}) \right|,$$
and using (38), this leads to $\mathbb{E}|\Delta^{(k)}_N| \leq 2 c_h \psi_{|k|}$. After some algebra, there exists a constant $c_\Delta$ such that
$$\sum_{k=-m}^{-1} \mathbb{E}\left| \Delta^{(k)}_N \right| \leq c_\Delta \sum_{k=-m}^{-1} \varphi_{m-|k|} \wedge \psi_{|k|} \leq c_\Delta \left( \sum_{k=-\lfloor m/2 \rfloor}^{-1} \varphi_{m-|k|} + \sum_{k=-m}^{-\lfloor m/2 \rfloor} \psi_{|k|} \right) \leq c_\Delta \left( \sum_{k=\lceil m/2 \rceil}^{\infty} \varphi_k + \sum_{k=\lfloor m/2 \rfloor}^{\infty} \psi_k \right) \leq c_\Delta T^{(m)}_\Delta,$$
where $(T^{(m)}_\Delta)_{m \geq 1}$ is a sequence of positive numbers such that $T^{(m)}_\Delta \to 0$ as $m \to \infty$. The last inequality holds true under Assumption 4-4), since $\sum \varphi_k$ and $\sum \psi_k$ are convergent series. Similarly, $\mathbb{E}|\Delta^{(0)}_N| \leq c_h \varphi_m$. The last series in (56) can be bounded using (38):
$$\sum_{k=-\infty}^{-m-1} \mathbb{E}\left| \Upsilon^{(k)}_N \right| \leq c_h \sum_{k=-\infty}^{-m-1} \psi_{|k|} = c_\Upsilon T^{(m)}_\Upsilon,$$
for some constant $c_\Upsilon$ and a given sequence $(T^{(m)}_\Upsilon)_{m \geq 1}$ such that $T^{(m)}_\Upsilon \to 0$ as $m \to \infty$. Putting all the pieces together, Equation (56) leads to:
$$\left| N^{2/d}(K - K_N) - \Sigma_N \right| \leq c_h \varphi_m + c_\Delta T^{(m)}_\Delta + c_\Upsilon T^{(m)}_\Upsilon + \check{\epsilon}_N.$$
The r.h.s. of the above inequality tends to zero as $m, N \to \infty$. This proves Lemma 5.
ACKNOWLEDGMENT
The authors would like to thank Prof. Eric Moulines for helpful comments and for bringing useful references to their attention. They are also grateful to Dr. Walid Hachem and Dr. Pablo Piantanida for fruitful discussions.

REFERENCES

[1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: a survey,"
Computer Networks, vol. 38, no. 4, pp. 393–422, 2002.
[2] B. Chen, L. Tong, and P. Varshney, "Channel-aware distributed detection in wireless sensor networks," IEEE Signal Process. Mag., pp. 16–26, 2006.
[3] R. Gray and D. Neuhoff, "Quantization," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2325–2383, 1998.
[4] A. Gersho and R. Gray, Vector Quantization and Signal Compression. Kluwer, 1992.
[5] W. Bennett, "Spectra of quantized signals," Bell System Technical Journal, vol. 27, pp. 446–472, 1948.
[6] S. Na and D. Neuhoff, "Bennett's integral for vector quantizers," IEEE Trans. Inf. Theory, vol. 41, no. 4, pp. 886–900, 1995.
[7] T. Han and S. Amari, "Statistical inference under multiterminal data compression," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2300–2324, 1998.
[8] V. Misra, V. Goyal, and L. Varshney, "Distributed functional scalar quantization: High-resolution analysis and extensions," arXiv:0811.3617 [cs.IT], 2008.
[9] J.-J. Xiao, A. Ribeiro, Z.-Q. Luo, and G. Giannakis, "Distributed compression-estimation using wireless sensor networks," IEEE Signal Process. Mag., vol. 23, no. 4, pp. 27–41, 2006.
[10] K. Perlmutter, S. Perlmutter, R. Gray, R. Olshen, and K. Oehler, "Bayes risk weighted vector quantization with posterior estimation for image compression and classification," IEEE Trans. Image Process., vol. 5, no. 2, pp. 347–360, 1996.
[11] S. Kassam, "Optimum quantization for signal detection," IEEE Trans. Commun., vol. 25, no. 5, pp. 479–484, 1977.
[12] H. Poor and J. Thomas, "Applications of Ali-Silvey distance measures in the design of generalized quantizers for binary decision systems," IEEE Trans. Commun., vol. 25, no. 9, pp. 893–900, 1977.
[13] H. Poor, "Fine quantization in signal detection and estimation," IEEE Trans. Inf. Theory, vol. 34, no. 5, pp. 960–972, 1988.
[14] B. Picinbono and P. Duvaut, "Optimum quantization for detection," IEEE Trans. Commun., vol. 36, no. 11, pp. 1254–1258, 1988.
[15] J. Tsitsiklis, "Extremal properties of likelihood-ratio quantizers," IEEE Trans. Commun., vol. 41, no. 4, pp. 550–558, 1993.
[16] R. Tenney and N. Sandell, "Detection with distributed sensors," IEEE Trans. Aerosp. Electron. Syst., vol. 17, no. 4, pp. 501–510, 1981.
[17] J. Tsitsiklis, "Decentralized detection by a large number of sensors," Mathematics of Control, Signals, and Systems, vol. 1, no. 2, pp. 167–182, 1988.
[18] R. Gupta and A. Hero, "High-rate vector quantization for detection," IEEE Trans. Inf. Theory, vol. 49, no. 8, pp. 1951–1969, 2003.
[19] E. Lehmann and J. Romano, Testing Statistical Hypotheses (3rd Ed). Springer Texts in Statistics, 2005.
[20] T. Cover and J. Thomas, Elements of Information Theory (2nd Ed). Wiley-Interscience, 2006.
[21] J.-F. Chamberland and V. Veeravalli, "How dense should a sensor network be for detection with correlated observations?" IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5099–5106, 2006.
[22] P. Willett, P. Swaszek, and R. Blum, "The good, bad, and ugly: Distributed detection of a known signal in dependent Gaussian noise," IEEE Trans. Signal Process., vol. 48, no. 12, pp. 3266–3279, 2000.
[23] Y. Sung, L. Tong, and H. Poor, "Neyman-Pearson detection of Gauss-Markov signals in noise: closed-form error exponent and properties," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1354–1365, 2006.
[24] W. Hachem, E. Moulines, and F. Roueff, "Error exponents for Neyman-Pearson detection of a continuous-time Gaussian Markov process from noisy irregular samples," arXiv cs.IT, 2009, submitted to IEEE Trans. Inf. Theory.
[25] P.-N. Chen, "General formulas for the Neyman-Pearson type-II error exponent subject to fixed and exponential type-I error bounds," IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 316–323, 1996.
[26] R. Bradley, "Basic properties of strong mixing conditions. A survey and some open questions," Probability Surveys, vol. 2, pp. 107–144, 2005.
[27] S. Moy, "Generalizations of Shannon-McMillan theorem," Pacific J. Math., vol. 11, no. 2, pp. 705–714, 1961.
[28] P. Doukhan, Mixing: Properties and Examples. Springer, 1994.
[29] D. Bosq, Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. Springer Verlag, 1998.
[30] J. Villard and P. Bianchi, "High-rate vector quantization for the Neyman-Pearson detection of some stationary mixing processes," in ISIT, Austin, Texas, USA, 2010.
[31] P. Panter and W. Dite, "Quantization distortion in pulse-count modulation with nonuniform spacing of levels," Proceedings of the IRE, vol. 39, no. 1, pp. 44–48, 1951.
[32] D. Neuhoff, "On the asymptotic distribution of the errors in vector quantization," IEEE Trans. Inf. Theory, vol. 42, no. 2, pp. 461–468, 1996.
[33] A. Gersho, "Asymptotically optimal block quantization," IEEE Trans. Inf. Theory, vol. 25, no. 4, pp. 373–380, 1979.
[34] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inf. Theory, vol. 42, no. 4, pp. 1152–1159, 1996.
[35] Y. Linde, A. Buzo, and R. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. 28, no. 1, pp. 84–95, 1980.
[36] J. Conway and N. Sloane, Sphere Packings, Lattices, and Groups (3rd Ed). Springer-Verlag, 1999.
[37] R. Gupta, "Quantization strategies for low-power communications," Ph.D. dissertation, The University of Michigan, 2001.
[38] J. Lasserre, "A trace inequality for matrix product," IEEE Trans. Autom. Control, vol. 40, no. 8, pp. 1500–1501, 1995.
[39] A. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications. Academic Press, New York, 1979.
[40] P. Billingsley, Probability and Measure (3rd Ed). John Wiley & Sons, 1995.
[41] R. Douc, E. Moulines, and T. Ryden, "Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regime,"
The Annals of Statistics , vol. 32, no. 5, pp. 2254–2304, 2004.[42] O. Cappé, E. Moulines, and T. Ryden,
Inference in Hidden Markov Models . Springer series in statistics, 2007.[43] J. Proakis and M. Salehi,
Digital communications (5th Ed) . McGraw-Hill, 2007.[44] N. Johnson, S. Kotz, and N. Balakrishnan,
Continuous univariate distributions, vol. 1 (2nd Ed) . Wiley-Interscience, 1994.[45] J. Liu,
Monte Carlo strategies in scientific computing . Springer Verlag, 2001.[46] S. Graf and H. Luschgy,
Foundations of quantization for probability distributions . Springer, 2000.[47] A. Dembo and O. Zeitouni,
Large deviations techniques and applications (2nd Ed) . Springer Verlag, 1998.[48] S. Lang,
Calculus of several variables . Addison-Wesley, 1973.
Joffrey Villard (S'09) was born in Saint-Étienne, France, in 1985. He received the Dipl.Ing. degree in digital communication and electronics, and the M.Sc. degree in wireless communication systems, both from Supélec, Gif-sur-Yvette, France, in 2008. He is currently working towards the Ph.D. degree at the Department of Telecommunications of Supélec. His research interests include information theory, source coding, statistical inference, and signal processing for wireless sensor networks.
Pascal Bianchi (M'06) was born in 1977 in Nancy, France. He received the M.Sc. degree of Supélec-Paris XI in 2000 and the Ph.D. degree of the University of Marne-la-Vallée in 2003. From 2003 to 2009, he was an Associate Professor at the Telecommunication Department of Supélec. In 2009, he joined the Statistics and Applications group at LTCI-Telecom ParisTech. His current research interests are in the area of statistical signal processing for sensor networks. They include decentralized detection, quantization, stochastic optimization, and applications of random matrix theory.