New Perspectives on Multiple Source Localization in Wireless Sensor Networks
Thi Le Thu Nguyen, Francois Septier, Harizo Rajaona, Gareth W. Peters, Ido Nevat, Yves Delignon
Institut Mines-Télécom / Télécom Lille, Villeneuve d'Ascq, France. CRIStAL UMR CNRS 9189, Villeneuve d'Ascq, France. CEA, DAM, DIF, 91297 Arpajon, France. Department of Statistical Sciences, University College London (UCL), London, UK. Institute for Infocomm Research, Singapore.
Abstract
In this paper we address the challenging problem of multiple source localization in Wireless Sensor Networks (WSN). We develop an efficient statistical algorithm, based on the novel application of Sequential Monte Carlo (SMC) sampler methodology, that is able to deal with an unknown number of sources given quantized data obtained at the fusion center from different sensors with imperfect wireless channels. We also derive the Posterior Cramér-Rao Bound (PCRB) of the source location estimate. The PCRB is used to analyze the accuracy of the proposed SMC sampler algorithm and the impact that quantization has on the accuracy of location estimates of the sources. Extensive experiments show the benefits of the proposed scheme, in terms of the accuracy of both the model selection (i.e., the number of sources) and the estimation of the source characteristics, compared to the classical importance sampling method.
Keywords:
Wireless sensor networks, localization, multiple sources, quantized data, Sequential Monte Carlo sampler, Bayesian inference
I. INTRODUCTION
Wireless sensor networks (WSN) are composed of large numbers of low-cost, low-power, densely distributed, and possibly heterogeneous sensors. WSN increasingly attract considerable research attention due to the large number of applications, such as environmental monitoring [1], weather forecasts [2]–[5], surveillance [6], [7], health care [8], structural safety and building monitoring [9] and home automation [5], [10]. We consider WSN which consist of a set of spatially distributed sensors that may have limited resources, such as energy and communication bandwidth. These sensors monitor a spatial physical phenomenon containing some desired attribute (e.g., pressure, temperature, concentrations of substances, sound intensity, radiation levels, pollution concentrations, seismic activity, etc.) and regularly communicate their observations to a Fusion Centre
September 4, 2018 DRAFT

(FC) in a wireless manner (for example, as in [11]–[17]). The FC collects these observations and fuses them in order to reconstruct the signal of interest, based on which effective management actions are made [10].

In this paper, we study the source localization problem, which is one important problem that arises in WSN; see for instance the overviews in [18] and [19].
A. Existing works on source localization from WSN
A number of works have addressed different aspects of this source localization problem. For instance, in a distributed sensor localization setting, the work of [18] studied the problem of sensor localization and the accuracy of such estimation in ad-hoc WSN, based on measurements and statistical model design for WSN measurements such as time of arrival, angle of arrival and received signal strength. The observations utilized to make this inference typically come from a WSN in which a large number of inexpensive sensors are densely deployed in a region of interest (ROI). Generally, this makes it possible to accurately perform energy-based target localization. Signal intensity measurements are very convenient and economical for localizing a target, since no additional sensor functionalities and measurement features, such as direction of arrival (DOA) or time-delay of arrival (TDOA), are required.

Such energy-based methods, which exploit the fact that the intensity (energy) of the signal attenuates as a function of distance from the source, have been proposed and developed in [20]–[26]. More precisely, [21] developed a least-squares method to perform the task of localization for a single source based on the energy ratios between sensors. This was then extended under a maximum likelihood (ML) based framework for multiple source localization in [22]. In this second work, the proposed method uses acoustic signal energy measurements taken at individual sensors to estimate the locations of multiple acoustic sources. By assuming that the number of acoustic sources is known in advance, their estimation approach involved a combination of a multiresolution search algorithm and the use of the expectation-maximization (EM) algorithm.

However, in both [21], [22], analog measurements from sensors are required to estimate the source location. For a typical WSN with limited resources (energy and bandwidth), it is important to limit the communication within the network.
Therefore, it is often desirable that only binary or multiple-bit quantized data be transmitted from local sensors to the fusion center (processing node). Motivated by such constraints, several papers have more recently proposed source localization techniques using only quantized data [24]–[26]. In [24], an ML-based approach has been proposed by using multi-bit (M-bit) sensor measurements transmitted to the fusion center. In [26], the authors have also developed for the same problem an alternative solution based on an importance sampler, which was utilized to approximate the posterior distribution of the single source given the quantized data. However, in both works, perfect communication channels between sensors and the fusion center are assumed. Usually, in a target localization scenario, a large number of sensors are deployed in some area where a line-of-sight between sensors and the FC cannot always be guaranteed. In [25], an extension of the ML-based approach previously derived in [24] has been proposed in order to incorporate the imperfect nature of the wireless communication channels.
B. Contribution
In this paper, we generalize previous source localization works by proposing a localization algorithm for an unknown number of sources given some quantized data obtained at the fusion center from different sensors with imperfect wireless channels. The statistical approach we derive is based on the recent and efficient sampling framework known as Sequential Monte Carlo samplers (SMC samplers) [27], [28], and is able to jointly estimate the unknown number of sources as well as their associated parameters (locations and transmitted powers) by providing all the information included in the approximated posterior distribution. In addition, we also derive the PCRB, which provides a theoretical performance limit for the Bayesian estimator of the locations as well as the transmitted powers of the $K$ sources. We demonstrate that the proposed framework provides significant improvement over classical importance sampling type methods that have been used for a single source context in [26] and adapted here for multiple sources.

II. PROBLEM FORMULATION
In this section we first present the system model, and then develop the Bayesian framework for jointly estimating the unknown number of sources as well as their locations and transmitted powers.
A. Wireless Sensor Network System Model
As illustrated in Fig. 1, we are interested in localizing an unknown number of targets in a wireless sensor environment where homogeneous and low-cost wireless sensors are utilized. All the sensors report to a fusion center which then performs the estimation of the target locations based on local sensor observations. Sensors can be deployed in any manner since our approach is capable of handling any kind of deployment as long as the location information for each sensor is available at the fusion center.

Each target is assumed to be a source that follows the power attenuation model. We thus use a signal attenuation model to represent the observed power that is emitted by each target [24]. This signal attenuation model is based on the fact that an omnidirectional point source emits signals that attenuate at a rate inversely proportional to the distance from the source, for instance a traveling wave propagating through a ground surface or an acoustic pressure wave traveling through a free-space medium.
Fig. 1: Example of two targets in a grid deployed sensor field.

In this work, as in [22], we will further assume that the intensities of the $K$ sources are linearly superimposed without any interaction between them. The received signal amplitude at the $i$-th sensor ($i = 1, \ldots, N$) is thus given by

$$ s_i = a_i + n_i, \qquad (1) $$

where the measurement noise term, $n_i$, is modeled as an additive white Gaussian noise (AWGN), i.e., $n_i \sim \mathcal{N}(0, \sigma^2)$, which represents the cumulative effects of sensor background noise and the modeling error of signal parameters (the Gaussian assumption is generally admitted since the central limit theorem can be applied to a processed signal resulting from the average of the samples received during a time period). The true signal amplitude $a_i$ from all the targets is defined as [22]:

$$ a_i = \sum_{k=1}^{K} P_k^{1/2} \left( \frac{d_0}{d_{i,k}} \right)^{n/2}, \qquad (2) $$

where $P_k$ denotes the $k$-th source signal power at a reference distance $d_0$. The signal decay exponent $n$ is approximately 2 when the detection distance is less than 1 km [21]. Finally, $d_{i,k}$ corresponds to the distance between the $i$-th sensor and the $k$-th target:

$$ d_{i,k} = \sqrt{(x_k - c_{x,i})^2 + (y_k - c_{y,i})^2}, \qquad (3) $$

where $(c_{x,i}, c_{y,i})$ and $(x_k, y_k)$ are the coordinates of the $i$-th sensor and the $k$-th target, respectively. In this work, we assume that sensor noises as well as wireless links between the sensors and the fusion center are independent across sensors, and that $\sigma^2$ is known (although this is not required for our proposed approach to work; it could indeed be embedded in the parameters to be inferred).
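As a concrete illustration, the measurement model of Eqs. (1)-(3) can be sketched in a few lines of Python. The function names and the default values ($d_0 = 1$, $n = 2$, and the superposition over a list of targets) are our own illustrative choices, not part of the paper:

```python
import numpy as np

def signal_amplitude(sensor_xy, targets, d0=1.0, n=2.0):
    """True amplitude a_i at each sensor under the superimposed
    attenuation model of Eqs. (1)-(3) (illustrative sketch).

    sensor_xy : (N, 2) sensor coordinates (c_x, c_y)
    targets   : iterable of (P_k, x_k, y_k) source powers/locations
    d0        : reference distance; n : decay exponent (approx. 2)
    """
    sensor_xy = np.asarray(sensor_xy, dtype=float)
    a = np.zeros(len(sensor_xy))
    for P, x, y in targets:
        d = np.hypot(sensor_xy[:, 0] - x, sensor_xy[:, 1] - y)  # Eq. (3)
        a += np.sqrt(P) * (d0 / d) ** (n / 2.0)                 # Eq. (2)
    return a

def noisy_measurements(sensor_xy, targets, sigma, rng=None, **kw):
    """s_i = a_i + n_i with n_i ~ N(0, sigma^2), Eq. (1)."""
    rng = np.random.default_rng() if rng is None else rng
    a = signal_amplitude(sensor_xy, targets, **kw)
    return a + sigma * rng.standard_normal(a.shape)
```

For instance, a source of power 4 located at (3, 4) produces an amplitude of $\sqrt{4}\,(1/5) = 0.4$ at a sensor placed at the origin, since the distance is 5.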
Fig. 2: Illustration of the system model.

As illustrated in Fig. 2, at each sensor, the received signal is quantized before being sent to the fusion center. Quantization is done locally at the sensors in order to decrease the communication bandwidth of the sensors, thereby reducing energy consumption. The data is quantized using an $M$-bit quantizer ($M \ge 1$) which takes values from 0 to $L - 1$, where $L = 2^M$ is the number of quantization levels. The quantizer of the $i$-th sensor transforms its input $s_i$ to its output $b_i$ through a mapping $\mathbb{R} \mapsto \{0, \ldots, L-1\}$ such that

$$ b_i = \begin{cases} 0 & \lambda_{i,0} \le s_i < \lambda_{i,1}, \\ 1 & \lambda_{i,1} \le s_i < \lambda_{i,2}, \\ \vdots & \vdots \\ L-1 & \lambda_{i,L-1} \le s_i < \lambda_{i,L}, \end{cases} \qquad (4) $$

with $\lambda_{i,0} = -\infty$ and $\lambda_{i,L} = +\infty$. Let $\theta_K = [P_1, x_1, y_1, \ldots, P_K, x_K, y_K]^T$ denote all the $K$ source locations and their associated transmitted powers. Under the Gaussian assumption on the measurement noise, the probability that $b_i$ takes a specific value $l \in \{0, \ldots, L-1\}$ is:

$$ p(b_i = l \mid \theta_K) = Q\!\left(\frac{\lambda_{i,l} - a_i}{\sigma}\right) - Q\!\left(\frac{\lambda_{i,l+1} - a_i}{\sigma}\right), \qquad (5) $$

where $Q(\cdot)$ is the complementary distribution function of the standard Gaussian distribution, defined as:

$$ Q(x) = \int_x^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt. \qquad (6) $$

Finally, the quantized observations are transmitted to the fusion center through an imperfect channel which may introduce transmission errors. Let $z = [z_1, \ldots, z_N]$ denote the observations collected at the fusion center via independent channels from the $N$ sensors. As in [11], [25], [29], the probability of a received observation $z_i$ taking a specific value $j$, given the targets' parameters $\theta_K$, can be written as:

$$ p(z_i = j \mid \theta_K) = \sum_{m=0}^{L-1} p(z_i = j \mid b_i = m)\, p(b_i = m \mid \theta_K), \qquad (7) $$

where $p_{j,m} := p(z_i = j \mid b_i = m)$ represents the transition probabilities of the wireless channel; see [11], [25], [29].

Since sensor noises and wireless links are assumed to be independent, the likelihood function at the fusion center can be written as:

$$ p(z \mid \theta_K) = \prod_{i=1}^{N} p(z_i \mid \theta_K) = \prod_{i=1}^{N} \left[ \sum_{m=0}^{L-1} p(z_i \mid b_i = m)\, p(b_i = m \mid \theta_K) \right]. \qquad (8) $$

Concerning the prior information related to the parameters of interest $\theta_K$, we use in this work:

$$ p(\theta_K) = \prod_{k=1}^{K} p(x_k, y_k)\, p(P_k), \qquad (9) $$

where

$$ p(x_k, y_k) = \mathcal{N}(\mu_p, \Sigma_p), \qquad p(P_k) = \mathcal{IG}(a, b), \qquad (10) $$

with $\mu_p$ set as the center of the ROI and $\Sigma_p = \mathrm{diag}([\sigma^2_{p,x}\ \sigma^2_{p,y}])$ the covariance matrix, which is very coarse so that its 99% confidence region covers the entire ROI.
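The local quantizer (4) and the quantization probabilities (5)-(6) can be sketched as follows, with the $Q$-function computed via the complementary error function; the function names are illustrative:

```python
import math
import numpy as np

def Q(x):
    """Gaussian tail probability of Eq. (6): Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def quantize(s, thresholds):
    """M-bit quantizer of Eq. (4). thresholds = [lam_1, ..., lam_{L-1}]
    (lam_0 = -inf and lam_L = +inf are implicit); returns b in {0,...,L-1}."""
    return int(np.searchsorted(thresholds, s, side='right'))

def quant_prob(l, a_i, sigma, thresholds):
    """p(b_i = l | theta) of Eq. (5), with the infinite end thresholds implicit."""
    lam = [-math.inf] + list(thresholds) + [math.inf]
    return Q((lam[l] - a_i) / sigma) - Q((lam[l + 1] - a_i) / sigma)
```

By construction the probabilities in (5) sum to one over $l$, since the $Q$ terms telescope from $Q(-\infty) = 1$ down to $Q(+\infty) = 0$.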
$\mathcal{IG}(a, b)$ corresponds to the inverse-gamma distribution with $a$ and $b$ being the shape and the scale parameter, respectively. Note that the proposed inference algorithm does not require the prior distributions to be Gaussian and inverse-gamma and will work with other prior distribution choices as required for a given application.

B. Multiple Source Localization in a Bayesian Framework
In this work, we are interested in estimating the unknown number of sources as well as their parameters (locations and transmitted powers). This problem can therefore be seen as a joint model selection and parameter estimation task. We have a collection of $K_{\max}$ competing models $\{\mathcal{M}_k\}_{k \in \{1, \ldots, K_{\max}\}}$ (corresponding in our case to the number of sources in the ROI) and one of them generates the observations obtained at the fusion center. Associated with each model, there is a vector of parameters $\theta_k \in \Theta_k$, where $\Theta_k$ denotes the parameter space of the model $\mathcal{M}_k$. The objective is to identify the true model as well as to estimate the parameters, $\theta_k = [P_1, x_1, y_1, \ldots, P_k, x_k, y_k]^T$, associated with this model.
Bayesian inference proceeds from a prior distribution over the collection of models, $p(\mathcal{M}_k)$, a prior distribution for the parameters of each model, $p(\theta_k \mid \mathcal{M}_k)$, and the likelihood under each model, $p(z \mid \theta_k, \mathcal{M}_k)$. In order to solve this joint model selection and parameter estimation problem under this Bayesian framework, we first employ a Maximum A-Posteriori (MAP) model selection rule and then provide a parameter estimate under the selected model. Following the model selection step is the inference step for the model parameters, which can then be deduced from the posterior distribution associated with the model $\mathcal{M}_{k^*}$, i.e. $p(\theta_{k^*} \mid z, \mathcal{M}_{k^*})$. This procedure is summarised as follows:

1) Model selection:

$$ k^* = \arg\max_k \{ p(\mathcal{M}_k \mid z) \} = \arg\max_k \{ p(z \mid \mathcal{M}_k)\, p(\mathcal{M}_k) \}, \qquad (11) $$

2) Model parameter estimation via Bayes' rule:

$$ p(\theta_{k^*} \mid z, \mathcal{M}_{k^*}) = \frac{p(z \mid \theta_{k^*}, \mathcal{M}_{k^*})\, p(\theta_{k^*} \mid \mathcal{M}_{k^*})}{p(z \mid \mathcal{M}_{k^*})}. \qquad (12) $$

Deriving the expressions in (11)-(12) involves calculating the evidence of the $k$-th model $\mathcal{M}_k$:

$$ \begin{aligned} p(z \mid \mathcal{M}_k) &= \int_{\Theta_k} p(z \mid \theta_k, \mathcal{M}_k)\, p(\theta_k \mid \mathcal{M}_k)\, d\theta_k \\ &= \int_{\Theta_k} \prod_{i=1}^{N} \left[ \sum_{m=0}^{L-1} p(z_i \mid b_i = m)\, p(b_i = m \mid \theta_k) \right] \prod_{n=1}^{k} p(x_n, y_n \mid \mathcal{M}_k)\, p(P_n \mid \mathcal{M}_k)\, d\theta_k \\ &= \int_{\Theta_k} \prod_{i=1}^{N} \left[ \sum_{m=0}^{L-1} p_{z_i,m} \left( Q\!\left(\frac{\lambda_{i,m} - a_i}{\sigma}\right) - Q\!\left(\frac{\lambda_{i,m+1} - a_i}{\sigma}\right) \right) \right] \prod_{n=1}^{k} \mathcal{N}\!\left([x_n\ y_n]^T; \mu_p, \Sigma_p\right) \mathcal{IG}(P_n; a, b)\, d\theta_k. \qquad (13) \end{aligned} $$

Unfortunately, owing to the highly nonlinear observation function of the parameters of interest in Equations (1)-(2), the integral in (13) is intractable. As a result, $\forall k \in \{1, \ldots, K_{\max}\}$, both the evidence $p(z \mid \mathcal{M}_k)$ and the posterior distribution of the parameters $p(\theta_k \mid z, \mathcal{M}_k)$ are intractable. In this work, we propose to use an SMC sampler in order to obtain an accurate approximation of both quantities.

III. PROPOSED BAYESIAN ALGORITHM FOR MULTIPLE SOURCE LOCALIZATION IN WSN

In this section we first introduce the general principle of SMC samplers, then develop the SMC sampler for multiple source localisation, and finally we derive the point estimate for the parameters of interest.
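Before introducing the SMC sampler, it may help to see the simplest possible baseline for the intractable evidence (13): naive Monte Carlo over the prior, i.e. $p(z \mid \mathcal{M}_k) = \mathbb{E}_{p(\theta_k \mid \mathcal{M}_k)}[p(z \mid \theta_k, \mathcal{M}_k)]$. The sketch below (illustrative names, caller-supplied toy densities) shows this estimator in log space; its variance grows quickly when the likelihood concentrates in a small region of the prior, which is precisely what motivates the tempered SMC approach developed next:

```python
import numpy as np

def naive_evidence(log_lik, prior_sampler, n_samples=10_000, rng=None):
    """Crude Monte Carlo estimate of log p(z | M_k) = log E_prior[ p(z | theta) ]:
    draw theta ~ p(theta | M_k) and average the likelihood, using a
    log-sum-exp shift for numerical stability."""
    rng = np.random.default_rng(0) if rng is None else rng
    ll = np.array([log_lik(prior_sampler(rng)) for _ in range(n_samples)])
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))
```

With a constant likelihood the estimator is exact; for a peaked likelihood most prior draws contribute essentially nothing, and the effective number of samples collapses.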
A. General Principle of SMC Samplers
Sequential Monte Carlo (SMC) methods are a class of sampling algorithms which combine importance sampling and resampling. They have been primarily used as "particle filters" to solve optimal filtering problems; see, for example, [30] and [31] for recent reviews. In this context, SMC methods/particle filters have enjoyed widespread use in various applications (tracking, computer vision, digital communications) due to the fact that they provide a simple way of approximating complex filtering distributions sequentially in time. However, in [27], [28], there have been a range of developments to create a general framework that allows SMC to be used to simulate from a single and static target distribution, thus becoming an interesting alternative to standard MCMC methods as well as to population-based MCMC algorithms. Finally, let us note that there exist a few other SMC methods appropriate for static inference, such as annealed importance sampling [32], the sequential particle filter of [33] and population Monte Carlo [34], but all of these methods can be regarded as special cases of the SMC sampler framework.

The SMC sampler is based on two main ideas:

a) Rather than sampling directly from the complex distribution of interest, a sequence of intermediate target distributions, $\{\pi_t\}_{t=1}^{T}$, is designed, that transitions smoothly from a simpler distribution to the one of interest. In Bayesian inference problems, the target distribution is the posterior $\pi_T(\theta) = p(\theta \mid z)$; thus a natural choice for such a sequence of intermediate distributions is the following [32]:

$$ \pi_t(\theta) = \frac{\gamma_t(\theta)}{Z_t} \propto p(\theta)\, p(z \mid \theta)^{\phi_t}, \qquad (14) $$

where $\{\phi_t\}$ is a non-decreasing temperature schedule with $\phi_1 = 0$ and $\phi_T = 1$, $\gamma_t(\theta)$ corresponds to the unnormalized target distribution (i.e. $\gamma_t(\theta) = p(\theta)\, p(z \mid \theta)^{\phi_t}$), and $Z_t = \int_\Theta p(\theta)\, p(z \mid \theta)^{\phi_t}\, d\theta$ is the normalization constant.
We initially target the prior distribution $\pi_1 = p(\theta)$, which is generally easy to sample from directly, and then introduce the effect of the likelihood gradually in order to obtain at the end, $t = T$, the complex posterior distribution of interest $\pi_T(\theta) = p(\theta \mid z)$ as target distribution.

b) The idea is to transform this problem into the standard SMC filtering framework, where the sequence of target distributions on the path space, denoted by $\{\widetilde{\pi}_t\}_{t=1}^{T}$, which admits $\pi_t(\theta_t)$ as marginals, is defined on the product space, i.e., $\mathrm{supp}(\widetilde{\pi}_t) = \Theta \times \Theta \times \cdots \times \Theta = \Theta^t$. This novel sequence of joint target distributions $\widetilde{\pi}_t$ is defined as follows:

$$ \widetilde{\pi}_t(\theta_{1:t}) = \frac{\widetilde{\gamma}_t(\theta_{1:t})}{Z_t}, \qquad (15) $$

where

$$ \widetilde{\gamma}_t(\theta_{1:t}) = \gamma_t(\theta_t) \prod_{k=1}^{t-1} \mathcal{L}_k(\theta_{k+1}, \theta_k), \qquad (16) $$

in which the artificial kernels $\{\mathcal{L}_k\}_{k=1}^{t-1}$ are called backward Markov kernels, since $\mathcal{L}_t(\theta_{t+1}, \theta_t)$ denotes the probability density of moving back from $\theta_{t+1}$ to $\theta_t$. By using such a sequence of extended target distributions $\{\widetilde{\pi}_t\}_{t=1}^{T}$ based on the introduction of backward kernels $\{\mathcal{L}_k\}_{k=1}^{t-1}$, sequential importance sampling can thus be utilized in the same manner as in standard SMC filtering algorithms. Within this framework, one may then work with the constructed sequence of distributions, $\widetilde{\pi}_t$, under the standard SMC algorithm [35].
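A minimal numerical illustration of the tempered sequence (14), on a toy one-dimensional problem with a Gaussian prior and Gaussian likelihood entirely of our own choosing: at $\phi = 0$ the tempered density is the prior, and as $\phi$ increases its mode migrates toward the posterior mode.

```python
import numpy as np

def tempered_logpdf(theta, log_prior, log_lik, phi):
    """log gamma_t of Eq. (14): log p(theta) + phi * log p(z | theta)."""
    return log_prior(theta) + phi * log_lik(theta)

# Toy 1D bridge (assumed densities): N(0, 4) prior moved toward a
# likelihood centred at 3 with variance 0.25.
grid = np.linspace(-10.0, 10.0, 2001)
log_prior = lambda th: -0.5 * th**2 / 4.0
log_lik = lambda th: -0.5 * (th - 3.0)**2 / 0.25
modes = {}
for phi in (0.0, 0.25, 1.0):
    lg = tempered_logpdf(grid, log_prior, log_lik, phi)
    modes[phi] = grid[np.argmax(lg)]   # mode slides from 0 toward ~2.82
```

At $\phi = 1$ the product of the two Gaussians has precision $1/4 + 4 = 4.25$ and mean $12/4.25 \approx 2.82$, which the grid mode recovers.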
In summary, the SMC sampler algorithm therefore involves three stages:

1) Mutation, where the particles are moved from $\theta_{t-1}$ to $\theta_t$ via a mutation kernel $\mathcal{K}_t(\theta_{t-1}, \theta_t)$, also called the forward kernel;
2) Correction, where the particles are reweighted with respect to $\pi_t$ via the incremental importance weight (Equation (20)); and
3) Selection, where, according to some measure of particle diversity, such as the effective sample size, the weighted particles may be resampled in order to reduce the variability of the importance weights.

In more detail, suppose that at time $t-1$ we have a set of weighted particles $\{\theta^{(m)}_{1:t-1}, \widetilde{W}^{(m)}_{t-1}\}_{m=1}^{N}$ that approximates $\widetilde{\pi}_{t-1}$ via the empirical measure

$$ \widetilde{\pi}^N_{t-1}(d\theta_{1:t-1}) = \sum_{m=1}^{N} \widetilde{W}^{(m)}_{t-1}\, \delta_{\theta^{(m)}_{1:t-1}}(d\theta_{1:t-1}). \qquad (17) $$

These particles are first propagated to the next distribution $\widetilde{\pi}_t$ using a Markov kernel $\mathcal{K}_t(\theta_{t-1}, \theta_t)$ to obtain the set of particles $\{\theta^{(m)}_{1:t}\}_{m=1}^{N}$. Importance sampling (IS) is then used to correct for the discrepancy between the sampling distribution $\eta_t(\theta_{1:t})$, defined as

$$ \eta_t(\theta^{(m)}_{1:t}) = \eta_1(\theta^{(m)}_1) \prod_{k=2}^{t} \mathcal{K}_k(\theta^{(m)}_{k-1}, \theta^{(m)}_k), \qquad (18) $$

and $\widetilde{\pi}_t(\theta_{1:t})$. In this case the new expression for the unnormalized importance weights is given by

$$ W^{(m)}_t \propto \frac{\widetilde{\pi}_t(\theta^{(m)}_{1:t})}{\eta_t(\theta^{(m)}_{1:t})} = \frac{\pi_t(\theta^{(m)}_t) \prod_{s=1}^{t-1} \mathcal{L}_s(\theta^{(m)}_{s+1}, \theta^{(m)}_s)}{\eta_1(\theta^{(m)}_1) \prod_{k=2}^{t} \mathcal{K}_k(\theta^{(m)}_{k-1}, \theta^{(m)}_k)} \propto w_t(\theta^{(m)}_{t-1}, \theta^{(m)}_t)\, W^{(m)}_{t-1}, \qquad (19) $$

where the $w_t$, termed the (unnormalized) incremental weights, are calculated as

$$ w_t(\theta^{(m)}_{t-1}, \theta^{(m)}_t) = \frac{\gamma_t(\theta^{(m)}_t)\, \mathcal{L}_{t-1}(\theta^{(m)}_t, \theta^{(m)}_{t-1})}{\gamma_{t-1}(\theta^{(m)}_{t-1})\, \mathcal{K}_t(\theta^{(m)}_{t-1}, \theta^{(m)}_t)}. \qquad (20) $$

However, as in the particle filter, since the discrepancy between the target distribution $\widetilde{\pi}_t$ and the proposal $\eta_t$ increases with $t$, the variance of the unnormalized importance weights tends to increase as well, leading to a degeneracy of the particle approximation. A common criterion used in practice to check this problem is the effective sample size (ESS), which can be computed by:

$$ \mathrm{ESS}_t = \left[ \sum_{m=1}^{N} (\widetilde{W}^{(m)}_t)^2 \right]^{-1} = \frac{\left( \sum_{m=1}^{N} \widetilde{W}^{(m)}_{t-1}\, w_t(\theta^{(m)}_{t-1}, \theta^{(m)}_t) \right)^2}{\sum_{j=1}^{N} \left(\widetilde{W}^{(j)}_{t-1}\right)^2 \left(w_t(\theta^{(j)}_{t-1}, \theta^{(j)}_t)\right)^2}. \qquad (21) $$

If the degeneracy is too high, i.e., $\mathrm{ESS}_t$ is below a prespecified threshold $\overline{\mathrm{ESS}}$, then a resampling step is performed. The particles with low weights are discarded, whereas particles with high weights are duplicated. After resampling, the particles are equally weighted. To sum up, the algorithm proceeds as shown in Algorithm 1.
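The ESS criterion of Eq. (21) and one standard unbiased resampling scheme (systematic resampling, one possible choice among several) can be sketched as:

```python
import numpy as np

def effective_sample_size(W):
    """ESS of Eq. (21): 1 / sum(W^2) for normalized weights W."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()
    return 1.0 / np.sum(W**2)

def systematic_resample(W, rng=None):
    """One unbiased resampling scheme (systematic); returns ancestor indices
    drawn from a single uniform offset, so duplicates concentrate on the
    high-weight particles."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(W)
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(W), positions)
```

Equal weights give $\mathrm{ESS} = N$; a single dominant weight gives $\mathrm{ESS} \approx 1$, triggering resampling when below the threshold $\overline{\mathrm{ESS}}$.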
Algorithm 1 Generic SMC Sampler Algorithm

Initialize particle system: sample $\{\theta^{(m)}_1\}_{m=1}^N \sim \eta_1(\cdot)$, compute $\widetilde{W}^{(m)}_1 = \left(\frac{\gamma_1(\theta^{(m)}_1)}{\eta_1(\theta^{(m)}_1)}\right) \left[\sum_{j=1}^N \frac{\gamma_1(\theta^{(j)}_1)}{\eta_1(\theta^{(j)}_1)}\right]^{-1}$, and resample if $\mathrm{ESS} < \overline{\mathrm{ESS}}$
for $t = 2, \ldots, T$ do
  Mutation: for each $m = 1, \ldots, N$: sample $\theta^{(m)}_t \sim \mathcal{K}_t(\theta^{(m)}_{t-1}; \cdot)$, where $\mathcal{K}_t(\cdot; \cdot)$ is a $\pi_t(\cdot)$-invariant Markov kernel
  Computation of the weights: for each $m = 1, \ldots, N$: $W^{(m)}_t = \widetilde{W}^{(m)}_{t-1} \dfrac{\gamma_t(\theta^{(m)}_t)\, \mathcal{L}_{t-1}(\theta^{(m)}_t, \theta^{(m)}_{t-1})}{\gamma_{t-1}(\theta^{(m)}_{t-1})\, \mathcal{K}_t(\theta^{(m)}_{t-1}, \theta^{(m)}_t)}$
  Normalization of the weights: $\widetilde{W}^{(m)}_t = W^{(m)}_t \left[\sum_{j=1}^N W^{(j)}_t\right]^{-1}$
  Selection: if $\mathrm{ESS}_t < \overline{\mathrm{ESS}}$ then resample
end for

Let us mention two interesting estimates from SMC samplers. First, since $\widetilde{\pi}_t$ admits $\pi_t$ as marginal by construction, for any $1 \le t \le T$, the SMC sampler provides an estimate of this distribution,

$$ \pi^N_t(d\theta) = \sum_{m=1}^{N} \widetilde{W}^{(m)}_t\, \delta_{\theta^{(m)}_t}(d\theta), \qquad (22) $$

and an estimate of the expectation of any integrable function $\varphi(\cdot)$ with respect to this distribution,

$$ \mathbb{E}_{\pi^N_t}[\varphi(\theta)] = \sum_{m=1}^{N} \widetilde{W}^{(m)}_t\, \varphi(\theta^{(m)}_t). \qquad (23) $$

Secondly, the estimated ratio of normalizing constants $\frac{Z_t}{Z_{t-1}} = \frac{\int \gamma_t(\theta)\, d\theta}{\int \gamma_{t-1}(\theta)\, d\theta}$ is given by

$$ \widehat{\frac{Z_t}{Z_{t-1}}} = \sum_{m=1}^{N} \widetilde{W}^{(m)}_{t-1}\, w_t(\theta^{(m)}_{t-1}, \theta^{(m)}_t). \qquad (24) $$
Consequently, the estimate of $\frac{Z_t}{Z_1}$ is

$$ \widehat{\frac{Z_t}{Z_1}} = \prod_{k=2}^{t} \widehat{\frac{Z_k}{Z_{k-1}}} = \prod_{k=2}^{t} \sum_{m=1}^{N} \widetilde{W}^{(m)}_{k-1}\, w_k(\theta^{(m)}_{k-1}, \theta^{(m)}_k). \qquad (25) $$

If the resampling scheme used is unbiased, then (25) is also unbiased, whatever the number of particles used [36]. Moreover, the complexity of this algorithm is $\mathcal{O}(N)$ and it can easily be parallelized.

To conclude this section, let us summarize the advantages of SMC samplers over traditional and population-based MCMC methods. First, unlike MCMC, SMC methods do not require any burn-in period and do not face the sometimes contentious issue of diagnosing convergence of a Markov chain. Secondly, as discussed in [37], compared to population-based MCMC methods, the SMC sampler is a richer class of methods, since there is substantially more freedom in specifying the mutation kernels in SMC: kernels do not need to be reversible or even Markov (and hence can be time adaptive). Finally, unlike MCMC, SMC samplers provide an unbiased estimate of the normalizing constant of the posterior distribution, which will be one of the quantities of interest in the inference problem tackled in this paper, related to finding the number of targets present in the ROI.

B. Proposed SMC Samplers for Bayesian Multiple Source Localization
Since the evidence of the model $\mathcal{M}_k$ corresponds to the normalizing constant of the posterior distribution of the parameters associated with this model, i.e.:

$$ p(\theta_k \mid z, \mathcal{M}_k) = \frac{p(z \mid \theta_k, \mathcal{M}_k)\, p(\theta_k \mid \mathcal{M}_k)}{p(z \mid \mathcal{M}_k)} = \frac{p(z \mid \theta_k, \mathcal{M}_k)\, p(\theta_k \mid \mathcal{M}_k)}{\int_{\Theta_k} p(z \mid \theta_k, \mathcal{M}_k)\, p(\theta_k \mid \mathcal{M}_k)\, d\theta_k}, \qquad (26) $$

we propose to use the following procedure:

1) For each model $\mathcal{M}_k$, $k \in \{1, \ldots, K_{\max}\}$: approximate the conditional parameter posterior distribution $p(\theta_k \mid z, \mathcal{M}_k)$ as well as the marginal likelihood $p(z \mid \mathcal{M}_k)$ using $K_{\max}$ SMC sampler algorithms run in parallel (one for each model $\{\mathcal{M}_k\}_{k \in \{1, \ldots, K_{\max}\}}$), using Equations (22) and (25), respectively.

2) Perform the MAP model selection rule:

$$ k^* = \arg\max_k \{ \hat{p}(z \mid \mathcal{M}_k)\, p(\mathcal{M}_k) \}, \qquad (27) $$

where $\hat{p}(z \mid \mathcal{M}_k)$ corresponds to the unbiased estimate obtained from the $k$-th SMC sampler.

3) Provide a parameter estimate under the selected model, e.g. the minimum mean square error (MMSE) estimate, using the empirical approximation of the posterior distribution $p(\theta_{k^*} \mid z, \mathcal{M}_{k^*})$ given by the $k^*$-th SMC sampler.
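The generic machinery that steps 1)-3) rely on can be sketched end to end on a toy one-dimensional model of our own construction, whose evidence is known in closed form: a $\mathcal{N}(0,1)$ prior, a Gaussian likelihood $p(z \mid \theta) = \mathcal{N}(z; \theta, s^2)$, a fixed linear temperature schedule (instead of the adaptive one described next), the simple tempered incremental weight $p(z \mid \theta)^{\phi_t - \phi_{t-1}}$, and a random-walk Metropolis mutation. All names and settings are illustrative:

```python
import numpy as np

def log_lik(theta, z=2.0, s=0.5):
    """Toy Gaussian likelihood p(z | theta) = N(z; theta, s^2)."""
    return -0.5 * np.log(2 * np.pi * s**2) - 0.5 * (z - theta)**2 / s**2

def smc_sampler(n_particles=2000, n_steps=20, n_mcmc=5, seed=0):
    """Annealed SMC sampler on the toy model. Returns (log Z_hat, particles,
    normalized weights); log Z_hat accumulates the ratios of Eq. (24)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_particles)      # sample the prior (phi = 0)
    logW = np.full(n_particles, -np.log(n_particles))
    log_Z = 0.0
    phis = np.linspace(0.0, 1.0, n_steps + 1)
    for phi_prev, phi in zip(phis[:-1], phis[1:]):
        # Correction: tempered incremental weight p(z|theta)^(phi - phi_prev)
        log_w = (phi - phi_prev) * log_lik(theta)
        # Accumulate log(Z_t / Z_{t-1}) = log sum_m W_{t-1} w_t
        log_Z += np.logaddexp.reduce(logW + log_w)
        logW = logW + log_w
        logW -= np.logaddexp.reduce(logW)         # normalize
        # Selection: multinomial resampling when ESS drops below N/2
        ess = 1.0 / np.sum(np.exp(2 * logW))
        if ess < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=np.exp(logW))
            theta = theta[idx]
            logW = np.full(n_particles, -np.log(n_particles))
        # Mutation: random-walk MH, invariant for prior * likelihood^phi
        for _ in range(n_mcmc):
            prop = theta + 0.5 * rng.standard_normal(n_particles)
            log_acc = (-0.5 * prop**2 + phi * log_lik(prop)) \
                      - (-0.5 * theta**2 + phi * log_lik(theta))
            accept = np.log(rng.random(n_particles)) < log_acc
            theta = np.where(accept, prop, theta)
    return log_Z, theta, np.exp(logW)
```

For this conjugate toy model the true evidence is $\mathcal{N}(z; 0, 1 + s^2)$ and the true posterior mean is $z/(1 + s^2) \cdot 1/(s^2/(1+s^2))^{-1}$-style shrinkage, both of which the sampler recovers to within Monte Carlo error; the localization model of Section II simply swaps in the likelihood (8) and the prior (9)-(10) over $\theta_k$.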
Let us now describe in more detail the different choices required in the design of each SMC sampler: the appropriate sequence of distributions $\{\pi_t\}_{1 \le t \le T}$, and the choice of both the mutation kernels $\{\mathcal{K}_t\}_{2 \le t \le T}$ and the backward mutation kernels $\{\mathcal{L}_{t-1}\}_{t=2}^{T}$.
1) Sequence of distributions $\pi_{k,t}$: An annealing procedure which progressively introduces the effect of the likelihood function is chosen as the sequence of intermediate target distributions, i.e. for the $k$-th SMC sampler dealing with model $\mathcal{M}_k$:

$$ \pi_{k,t}(\theta_{k,t}) = \frac{\gamma_{k,t}(\theta_{k,t})}{Z_{k,t}} \propto p(\theta_{k,t} \mid \mathcal{M}_k)\, p(z \mid \theta_{k,t}, \mathcal{M}_k)^{\phi_{k,t}}, \qquad (28) $$

where $\{\phi_{k,t}\}$ is a non-decreasing temperature schedule with $\phi_{k,1} = 0$ and $\phi_{k,T} = 1$. The question that arises is how to choose this non-decreasing temperature schedule $\{\phi_{k,t}\}_{t=1,\ldots,T}$. Several statistical approaches have been proposed in order to automatically obtain such a schedule via the optimization of some criterion; these are known as on-line schemes. [38] proposed an adaptive selection method based on controlling the rate of the effective sample size ($\mathrm{ESS}_{k,t}$), defined in (21). This scheme thus provides an automatic method to obtain the tempering schedule such that the ESS decays in a regular, predefined way. However, one major drawback of such an approach is that the $\mathrm{ESS}_{k,t}$ of the current sample weights corresponds to an empirical measure of the accumulated discrepancy between the proposal and the target distribution since the last resampling time. As a consequence, it does not really represent the dissimilarity between each pair of successive distributions unless resampling is conducted after every iteration.

In order to handle this problem, [39] proposed a slight modification of the ESS, named the conditional ESS (CESS), obtained by considering how good an importance sampling proposal $\pi_{k,t-1}$ would be for the estimation of an expectation under $\pi_{k,t}$. At the $t$-th iteration, this quantity is defined as follows:

$$ \mathrm{CESS}_{k,t} = \frac{\left( \sum_{i=1}^{N} \widetilde{W}^{(i)}_{k,t-1}\, w^{(i)}_{k,t} \right)^2}{\sum_{j=1}^{N} \frac{1}{N}\, \widetilde{W}^{(j)}_{k,t-1}\, \left(w^{(j)}_{k,t}\right)^2}. \qquad (29) $$

In this work, the CESS proposed in [39] will be used in all the $K_{\max}$ SMC samplers that are run in parallel for each model, in order to have an automatic specification of their individual temperature schedules. Owing to the on-line nature of this CESS-based strategy, the total number of iterations performed by each sampler is not fixed and does not need to be specified prior to the simulation.
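A sketch of the CESS of Eq. (29) and of the binary search used to pick the next temperature (the automatic cooling step that appears later in Algorithm 3); function names, the tolerance, and the search bracket are our own choices:

```python
import numpy as np

def cess(logW_prev, log_incr):
    """CESS of Eq. (29): N (sum W w)^2 / sum W w^2, computed from the
    normalized previous log-weights and the log incremental weights.
    The value is invariant to a constant rescaling of w."""
    N = len(logW_prev)
    W = np.exp(logW_prev - np.logaddexp.reduce(logW_prev))
    w = np.exp(log_incr - np.max(log_incr))
    return N * np.sum(W * w)**2 / np.sum(W * w**2)

def next_temperature(phi_prev, logW_prev, loglik, target, tol=1e-6):
    """Largest phi in (phi_prev, 1] whose incremental weights
    w = p(z|theta)^(phi - phi_prev) keep CESS at the target level;
    CESS is N at phi = phi_prev and decreases as phi grows."""
    if cess(logW_prev, (1.0 - phi_prev) * loglik) >= target:
        return 1.0                      # can jump straight to the posterior
    lo, hi = phi_prev, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cess(logW_prev, (mid - phi_prev) * loglik) >= target:
            lo = mid
        else:
            hi = mid
    return lo
```

When the particle log-likelihoods are nearly constant, the CESS stays close to $N$ and the schedule takes large steps; spread-out log-likelihoods force small steps.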
2) Sequence of mutation kernels $\mathcal{K}_{k,t}$: The performance of the SMC sampler depends heavily upon the selection of the transition kernels $\{\mathcal{K}_{k,t}\}_{t=2}^{T}$ and the auxiliary backward kernels $\{\mathcal{L}_{k,t-1}\}_{t=2}^{T}$. There are many possible choices for $\mathcal{K}_{k,t}$, which have been discussed in [27], [28]. In this study, we propose to employ MCMC kernels of invariant distribution $\pi_{k,t}$ for $\mathcal{K}_{k,t}$. This is an attractive strategy since we can use the vast literature on the design of efficient MCMC algorithms to build effective and efficient importance distributions [40].
More precisely, in this work, since we are interested in a complex model with a potentially high-dimensional and multimodal posterior distribution, a series of Metropolis-within-Gibbs kernels with local moves [40] will be employed in order to successively move the $B_k$ sub-blocks of the state of interest, $\theta_{k,t} = [\varrho_{k,t,1}, \varrho_{k,t,2}, \cdots, \varrho_{k,t,B_k}]$. In this work, each sub-block corresponds to the parameters associated with one target, i.e. $\varrho_{k,t,b} = [P_b, x_b, y_b]^T$ for $b = 1, \ldots, B_k = k$. A random walk proposal distribution is used for each sub-block, with a multivariate Gaussian distribution as proposal at the $i$-th iteration of the forward kernel:

$$ \varrho^{*}_{k,t,b} = \varrho^{i-1}_{k,t,b} + \varepsilon^{i}_{b}, \qquad (30) $$

in which $\varepsilon^{i}_{b}$ is a Gaussian random variable with zero mean and $3 \times 3$ covariance matrix $\Sigma$. The Metropolis-within-Gibbs kernel used in the implementation of the SMC sampler in this paper is summarized in Algorithm 2.

Algorithm 2 Metropolis-within-Gibbs Kernel $\mathcal{K}_{k,t}(\cdot; \cdot)$ for the $m$-th particle

Initialization: set $[\varrho^{0}_{k,t,1}, \ldots, \varrho^{0}_{k,t,k}] = \theta^{(m)}_{k,t-1}$
for $i = 1, \ldots, N_{\mathrm{MCMC}}$ do
  for $b = 1, \ldots, k$ do
    Sample $\varrho^{*}_{k,t,b} \sim \mathcal{N}(\varrho^{i-1}_{k,t,b}, \Sigma)$
    Compute the acceptance ratio:
    $$ \alpha(\theta^{*}_k, \theta_k) = \min\left\{ 1, \frac{p(z \mid \theta^{*}_k, \mathcal{M}_k)^{\phi_{k,t}}\, p(\theta^{*}_k)}{p(z \mid \theta_k, \mathcal{M}_k)^{\phi_{k,t}}\, p(\theta_k)} \right\} $$
    with $\theta^{*}_k = [\varrho^{i}_{k,t,1}, \ldots, \varrho^{i}_{k,t,b-1}, \varrho^{*}_{b}, \varrho^{i-1}_{k,t,b+1}, \ldots, \varrho^{i-1}_{k,t,k}]$ and $\theta_k = [\varrho^{i}_{k,t,1}, \ldots, \varrho^{i}_{k,t,b-1}, \varrho^{i-1}_{b}, \varrho^{i-1}_{k,t,b+1}, \ldots, \varrho^{i-1}_{k,t,k}]$
    Sample a random variate $u$ from $\mathcal{U}(0, 1)$
    if $u \le \alpha(\theta^{*}_k, \theta_k)$ then $\varrho^{i}_{k,t,b} = \varrho^{*}_{k,t,b}$ else $\varrho^{i}_{k,t,b} = \varrho^{i-1}_{k,t,b}$
  end for
end for
Set the new particle value at time $t$ as $\theta^{(m)}_{k,t} = [\varrho^{N_{\mathrm{MCMC}}}_{k,t,1}, \ldots, \varrho^{N_{\mathrm{MCMC}}}_{k,t,k}]$
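A compact sketch of Algorithm 2 for one particle, assuming the log-prior and the log-likelihood are supplied as callables over the full $(k, 3)$ block of per-target parameters $[P_b, x_b, y_b]$; the names are illustrative and the proposal covariance is passed via its Cholesky factor:

```python
import numpy as np

def mwg_kernel(theta, log_prior, log_lik, phi, prop_cov_chol, n_mcmc, rng):
    """Metropolis-within-Gibbs mutation (sketch of Algorithm 2): theta is a
    (k, 3) array of per-target sub-blocks; each block in turn receives a
    Gaussian random-walk proposal accepted against the tempered posterior
    p(theta) * p(z | theta)^phi."""
    theta = theta.copy()
    cur = log_prior(theta) + phi * log_lik(theta)
    for _ in range(n_mcmc):
        for b in range(theta.shape[0]):
            prop = theta.copy()
            prop[b] += prop_cov_chol @ rng.standard_normal(3)   # Eq. (30)
            new = log_prior(prop) + phi * log_lik(prop)
            if np.log(rng.random()) < new - cur:                # MH accept
                theta, cur = prop, new
    return theta
```

Because each sub-block move is a standard Metropolis-Hastings step on the tempered target, the composed kernel is $\pi_{k,t}$-invariant, as the SMC sampler requires.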
3) Sequence of backward kernels $\mathcal{L}_{k,t}$: The backward kernel $\mathcal{L}_{k,t}$ is arbitrary; however, as discussed in [28], it should be optimized with respect to the mutation kernel $\mathcal{K}_{k,t}$ to obtain good performance. In [27], [28], it was established that the backward kernels which minimize the variance of the unnormalized importance weights, $W_{k,t}$, are given by

$$ \mathcal{L}^{\mathrm{opt}}_{k,t}(\theta_{k,t+1}, \theta_{k,t}) = \frac{\eta_{k,t}(\theta_{k,t})\, \mathcal{K}_{k,t+1}(\theta_{k,t}, \theta_{k,t+1})}{\eta_{k,t+1}(\theta_{k,t+1})}. \qquad (31) $$

However, it is typically impossible to use these optimal kernels, as they rely on marginal distributions $\eta_{k,t}(\theta_{k,t})$ which do not admit any closed-form expression, especially if an MCMC kernel with $\pi_{k,t}$-invariant distribution is used as $\mathcal{K}_{k,t}$. Thus we can either choose to approximate $\mathcal{L}^{\mathrm{opt}}_{k,t}$, or choose kernels $\mathcal{L}_{k,t}$ so that the importance weights are easily calculated or have a familiar form. If an MCMC kernel is used as forward mutation kernel, the following $\mathcal{L}_{k,t}$ is employed:

$$ \mathcal{L}_{k,t-1}(\theta_{k,t}, \theta_{k,t-1}) = \frac{\pi_{k,t}(\theta_{k,t-1})\, \mathcal{K}_{k,t}(\theta_{k,t-1}, \theta_{k,t})}{\pi_{k,t}(\theta_{k,t})}, \qquad (32) $$

which is a good approximation of the optimal backward kernel if the discrepancy between $\pi_{k,t}$ and $\pi_{k,t-1}$ is small; note that (32) is the reversal Markov kernel associated with $\mathcal{K}_{k,t}$. In this case, the unnormalized incremental weight for the SMC sampler associated with the $k$-th model becomes

$$ w^{(m)}_{k,t}(\theta^{(m)}_{k,t-1}, \theta^{(m)}_{k,t}) = \frac{\gamma_{k,t}(\theta^{(m)}_{k,t-1})}{\gamma_{k,t-1}(\theta^{(m)}_{k,t-1})} = p(z \mid \theta^{(m)}_{k,t-1}, \mathcal{M}_k)^{(\phi_{k,t} - \phi_{k,t-1})}, \qquad (33) $$

where $p(z \mid \theta^{(m)}_{k,t-1}, \mathcal{M}_k)$ is defined in Eq. (8). Expression (33) is remarkably easy to compute and valid regardless of the MCMC kernel adopted. Note that $\phi_{k,t} - \phi_{k,t-1}$ is the step length of the cooling schedule of the likelihood at time $t$ for the $k$-th sampler. As this step becomes larger, the discrepancy between $\pi_{k,t}$ and $\pi_{k,t-1}$ increases, leading to an increase in the variance of the importance sampling approximation.
Thus, it is important to construct a smooth sequence of distributions $\{\pi_{k,t}\}_{1 \le t \le T}$ through a judicious choice of the associated real sequence $\{\phi_{k,t}\}_{t=0}^{T}$, as discussed in the previous section.

Let us remark that when such a backward kernel is used, the unnormalized incremental weight in (33) at time $t$ does not depend on the particle value at time $t$ but only on the previous particle set. It is known that in such cases the particles $\{\theta^{(m)}_{k,t}\}$ should be sampled after the weights $\{W^{(m)}_{k,t}\}$ have been computed and after the particle approximation $\{\widetilde{W}^{(m)}_{k,t}, \theta^{(m)}_{k,t-1}\}$ has possibly been resampled.

Based on this discussion regarding the different choices, the SMC sampler used in this paper is summarized in Algorithm 3.

C. Point Estimate for the Parameters of Interest
Once the model has been selected using the MAP criterion described in (27), the MMSE estimate of the parameters for the $k^*$-th model is obtained using (23):
$$\hat{\theta}_{k^*} = \mathbb{E}_{\pi^N_{k^*,T}}[\varphi(\theta)] = \sum_{m=1}^{N} \widetilde{W}^{(m)}_{k^*,T}\, \varphi(\theta^{(m)}_{k^*,T}), \quad (34)$$
Algorithm 3: $k$-th SMC Sampler Algorithm targeting the posterior distribution $p(\theta_k \mid z, \mathcal{M}_k)$

Initialize the particle system (set $t = 1$ and $\phi_{k,1} = 0$): sample $\{\theta^{(m)}_{k,1}\}_{m=1}^{N} \sim p(\theta_k \mid \mathcal{M}_k)$ and set $\widetilde{W}^{(m)}_{k,1} = 1/N$
while $\phi_{k,t} < 1$ do
  $t = t + 1$
  Automatic cooling procedure: use a binary search to find $\phi^*$ such that $\mathrm{CESS}^{\phi^*}_{k,t} = \mathrm{CESS}^*$, and set $\phi_{k,t} = \phi^*$ if $\phi^* < 1$, otherwise $\phi_{k,t} = 1$
  Computation of the weights: for each $m = 1, \dots, N$: $W^{(m)}_{k,t} = \widetilde{W}^{(m)}_{k,t-1}\, p(z \mid \theta^{(m)}_{k,t-1}, \mathcal{M}_k)^{(\phi_{k,t} - \phi_{k,t-1})}$
  Normalization of the weights: $\widetilde{W}^{(m)}_{k,t} = W^{(m)}_{k,t} \big[\sum_{j=1}^{N} W^{(j)}_{k,t}\big]^{-1}$
  Selection: if $\mathrm{ESS}_{k,t} < \mathrm{ESS}^*$ then resample
  Mutation: for each $m = 1, \dots, N$: sample $\theta^{(m)}_{k,t} \sim K_{k,t}(\theta^{(m)}_{k,t-1}; \cdot)$, where $K_{k,t}(\cdot;\cdot)$ is a $\pi_{k,t}(\cdot)$-invariant Markov kernel described in more detail in Algo. 2
end while
Output:
Unbiased approximation of the marginal likelihood: $p(z \mid \mathcal{M}_k) \approx \prod_{n=2}^{t} \widehat{\left[\frac{Z_{k,n}}{Z_{k,n-1}}\right]} = \prod_{n=2}^{t} \sum_{m=1}^{N} \widetilde{W}^{(m)}_{k,n-1}\, w_{k,n}(\theta^{(m)}_{k,n-1}, \theta^{(m)}_{k,n})$
Empirical approximation of the posterior distribution: $p(\theta_k \mid z, \mathcal{M}_k) \approx \pi^N_{k,t}(d\theta_k) = \sum_{m=1}^{N} \widetilde{W}^{(m)}_{k,t}\, \delta_{\theta^{(m)}_{k,t}}(d\theta_k)$

where $T$ denotes the last iteration of the $k^*$-th SMC sampler, since in this last iteration the system of weighted particles represents an empirical approximation of the target posterior distribution, i.e.:
$$p(\theta_{k^*} \mid z, \mathcal{M}_{k^*}) \approx \pi^N_{k^*,T}(d\theta_{k^*}) = \sum_{m=1}^{N} \widetilde{W}^{(m)}_{k^*,T}\, \delta_{\theta^{(m)}_{k^*,T}}(d\theta_{k^*}). \quad (35)$$
Unfortunately, owing to the non-identifiability of the target label in the likelihood and to the use of the same prior for each target, the posterior distribution is multimodal (as illustrated in Fig. 5). The posterior is indeed invariant under permutations of the source parameters, i.e.,
$$p(\theta_{k^*} \mid z, \mathcal{M}_{k^*}) = p(\vartheta(\theta_{k^*}) \mid z, \mathcal{M}_{k^*}), \quad (36)$$
where $\vartheta(\cdot) \in \mathcal{P}$ denotes any permutation for which the posterior is invariant and $\mathcal{P}$ is the set of these permutations. In such a case, the MMSE estimate would lead to very poor performance if selected as a point estimate of the source parameters. The problem of having a Monte-Carlo algorithm that approximates such a multimodal target posterior, invariant under permutation, is known in the literature as the label switching problem [41].
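Algorithm 3 can be sketched end-to-end for a generic model. Everything here is an illustrative assumption: the CESS expression (taken in a standard form), the resampling rule, and all names; the paper's model-specific likelihood would replace `log_like`, and the MWG move of Algo. 2 would replace `mutate`.

```python
import numpy as np

def cess(W, log_like, dphi):
    """Conditional ESS of a candidate tempering step dphi (one standard
    definition; assumed to match the paper's CESS criterion)."""
    u = np.exp(dphi * (log_like - log_like.max()))   # shift-invariant ratio
    return len(W) * (W @ u) ** 2 / (W @ u ** 2)

def smc_sampler(log_like, prior_sample, mutate, N, cess_target, rng):
    """Sketch of Algorithm 3: adaptive cooling by binary search on the CESS,
    reweighting with Eq. (33) BEFORE mutation (the incremental weight only
    involves the previous particles), and a running evidence estimate."""
    theta = prior_sample(N)
    W = np.full(N, 1.0 / N)
    phi, log_Z = 0.0, 0.0
    while phi < 1.0:
        ll = log_like(theta)
        # automatic cooling: largest dphi with CESS >= target
        lo, hi = 0.0, 1.0 - phi
        if cess(W, ll, hi) >= cess_target:
            dphi = hi
        else:
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if cess(W, ll, mid) >= cess_target else (lo, mid)
            dphi = lo
        phi += dphi
        # incremental weights of Eq. (33) and evidence update
        log_w = dphi * ll
        m = log_w.max()
        inc = W @ np.exp(log_w - m)          # estimate of Z_t / Z_{t-1} (times e^m)
        log_Z += m + np.log(inc)
        W = W * np.exp(log_w - m) / inc
        # resample when the ESS degenerates, then move with the MCMC kernel
        if 1.0 / np.sum(W ** 2) < N / 2:
            idx = rng.choice(N, size=N, p=W / W.sum())
            theta, W = theta[idx], np.full(N, 1.0 / N)
        theta = mutate(theta, phi)
    return theta, W, log_Z
```

On a toy one-dimensional model (Gaussian prior and likelihood) the returned `log_Z` can be checked against the analytically known evidence.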
There exist many algorithms that have been proposed to deal with this label switching problem within the class of Monte-Carlo algorithms; a recent and detailed review of these techniques can be found in [42]. Here, we are interested only in post-processing techniques, in order to extract an accurate point estimate of the state of interest from our particle approximation of the posterior distribution. One of the most commonly used relabeling algorithms is the one proposed in [41].

Let us denote by $\tilde{\theta} = \{\theta^{(1)}_{k^*}, \dots, \theta^{(N)}_{k^*}\}$ the unweighted set of particles obtained at the last iteration of the SMC sampler that targets the posterior distribution of the selected model. In the algorithm proposed in [41], one performs inference tasks (e.g. point estimation) as usual but with the relabeled samples, defined as:
$$\vartheta(\tilde{\theta}) = \big(\vartheta_1(\theta^{(1)}), \dots, \vartheta_N(\theta^{(N)})\big), \quad (37)$$
where
$$\vartheta = (\vartheta_1, \dots, \vartheta_N) = \arg\max_{\mathcal{P} \times \dots \times \mathcal{P}} \mathcal{L}(\tilde{\theta}, \vartheta), \quad (38)$$
and $\mathcal{L}(\cdot)$ is a user-defined cost function, generally chosen as:
$$\mathcal{L}(\tilde{\theta}, \vartheta) = \prod_{i=1}^{N} \mathcal{N}\big(\vartheta_i(\theta^{(i)}) \mid \mu^\vartheta_N, \Sigma^\vartheta_N\big), \quad (39)$$
with
$$\mu^\vartheta_N = \frac{1}{N} \sum_{i=1}^{N} \vartheta_i(\theta^{(i)}), \quad (40)$$
$$\Sigma^\vartheta_N = \frac{1}{N} \sum_{i=1}^{N} \big(\vartheta_i(\theta^{(i)}) - \mu^\vartheta_N\big)\big(\vartheta_i(\theta^{(i)}) - \mu^\vartheta_N\big)^T. \quad (41)$$
The Gaussian cost function in (39) encodes the idea that the relabeled sample should be as Gaussian as possible among its permutations $\vartheta(\tilde{\theta})$, $\vartheta \in \mathcal{P}^N$, so that $\vartheta(\tilde{\theta})$ looks as unimodal as possible. However, this technique is particularly costly since it involves a combinatorial optimization over $\mathcal{P}^N$, which is infeasible in practice: here the posterior is defined on $\mathbb{R}^{3K}$ and $\mathcal{P}$ is the group formed by the permutations of $K$ elements, so $\mathcal{P}^N$ has cardinality $(K!)^N$. As a consequence, in this work we use the online version of this algorithm proposed in [43], which has a final cost of $N(K!)$.
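The gap between the batch and online costs is worth making concrete. With hypothetical values of $K = 4$ sources and $N = 1000$ particles (both chosen only for illustration), the joint optimization of Eq. (38) searches over $(K!)^N$ label assignments, while the online variant [43] evaluates only $N \cdot K!$ permutations:

```python
import math

K, N = 4, 1000                       # hypothetical: 4 sources, 1000 particles
joint = math.factorial(K) ** N       # assignments searched by the batch scheme, (K!)^N
online = N * math.factorial(K)       # cost of the online variant, N * K!
# joint has well over a thousand decimal digits; online is just 24000
```

This is why the batch relabeling is infeasible even for modest $N$, while the online scheme remains cheap.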
To avoid using resampling to obtain this set of unweighted particles $\tilde{\theta}$, we propose an adaptation of this algorithm, described in Algo. 4, that directly uses the set of weighted particles provided by the SMC sampler.

IV. POSTERIOR CRAMÉR-RAO BOUND FOR MULTIPLE SOURCE LOCALIZATION
In this section, we derive the posterior Cramér-Rao bound (PCRB) as an estimation benchmark for the parameters. We thus assume in this setting that we condition on the number of sources. This PCRB
Algorithm 4: Online post-processing relabeling algorithm

Input: set of weighted particles from the last iteration of the SMC sampler, $\{\theta^{(m)}_{k^*,T}, \widetilde{W}^{(m)}_{k^*,T}\}_{m=1}^{N}$
Sort the particles by descending order of their associated weights
Set $\mu_1 = \theta^{(1)}_{k^*,T}$ and $\theta^{(1)}_{relabel} = \theta^{(1)}_{k^*,T}$, and initialize $\Sigma_1$
for $n = 2, \dots, N$ do
  Find $\vartheta_n = \arg\max_{\vartheta \in \mathcal{P}} \mathcal{N}\big(\vartheta(\theta^{(n)}_{k^*,T}) \mid \mu_{n-1}, \Sigma_{n-1}\big)$
  Set $\theta^{(n)}_{relabel} = \vartheta_n(\theta^{(n)}_{k^*,T})$
  Set $\alpha(i) = \widetilde{W}^{(i)}_{k^*,T} \big[\sum_{j=1}^{n} \widetilde{W}^{(j)}_{k^*,T}\big]^{-1}$ for $i = 1, \dots, n$ and compute:
    $\mu_n = \sum_{i=1}^{n} \alpha(i)\, \theta^{(i)}_{relabel}$
    $\Sigma_n = \sum_{i=1}^{n} \alpha(i)\, \big(\theta^{(i)}_{relabel} - \mu_n\big)\big(\theta^{(i)}_{relabel} - \mu_n\big)^T$
end for
Use the relabeled collection of weighted particles $\{\theta^{(m)}_{relabel}, \widetilde{W}^{(m)}_{k^*,T}\}_{m=1}^{N}$ to compute the point estimate, e.g. the MMSE: $\hat{\theta}_{k^*} = \sum_{i=1}^{N} \widetilde{W}^{(i)}_{k^*,T}\, \theta^{(i)}_{relabel}$

will thus provide a theoretical performance limit for the Bayesian estimator of the locations as well as the transmitted powers of the $K$ sources, given the observations $z$ obtained at the fusion center. Let us remark that in [25], the authors derived the Cramér-Rao bound for the single-source problem with quantized data and an imperfect channel between the sensors and the fusion center.
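The online relabeling of Algorithm 4 can be sketched as follows. The array layout, the initial covariance, and the small regularization added before inversion are illustrative assumptions; comparing the quadratic forms is equivalent to comparing the Gaussian densities since the covariance is shared across permutations of the same particle.

```python
import itertools
import numpy as np

def online_relabel(particles, weights):
    """Sketch of Algorithm 4. `particles` has shape (N, K, d): N weighted
    samples of K exchangeable source-parameter blocks. Each particle is
    relabeled with the permutation maximizing a Gaussian density whose
    moments are updated online from the already-relabeled weighted samples."""
    order = np.argsort(weights)[::-1]              # descending weights
    particles, weights = particles[order], weights[order]
    N, K, d = particles.shape
    perms = list(itertools.permutations(range(K)))
    relabeled = np.empty_like(particles)
    relabeled[0] = particles[0]
    mu = particles[0].reshape(-1)
    Sigma = np.eye(K * d)                          # initial covariance (assumed)
    for n in range(1, N):
        # regularized inverse shared by all candidate permutations
        Sinv = np.linalg.inv(Sigma + 1e-3 * np.eye(K * d))
        best, best_lp = None, -np.inf
        for p in perms:
            x = particles[n][list(p)].reshape(-1) - mu
            lp = -0.5 * x @ Sinv @ x               # log-density up to a constant
            if lp > best_lp:
                best, best_lp = p, lp
        relabeled[n] = particles[n][list(best)]
        # weighted online update of mu and Sigma over the first n+1 samples
        a = weights[: n + 1] / weights[: n + 1].sum()
        X = relabeled[: n + 1].reshape(n + 1, -1)
        mu = a @ X
        Sigma = (X - mu).T @ (a[:, None] * (X - mu))
    return relabeled, weights
```

On a synthetic two-mode, two-source example (blocks randomly swapped between particles), the relabeled first block should end up on a single mode.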
Here, we propose to generalize this result by considering $\theta_K$ as a random variable (the Bayesian framework, which leads to the posterior CRB) and by letting $\theta_K$ be composed of $K$ multiple sources. Indeed, the PCRB gives a lower bound on the error covariance matrix [44]:
$$\mathbb{E}\left[\big(\hat{\theta}_K(z) - \theta_K\big)\big(\hat{\theta}_K(z) - \theta_K\big)^T\right] \ge J^{-1}, \quad (42)$$
where $J$ is the $3K \times 3K$ Fisher information matrix (FIM):
$$J = \mathbb{E}\left[\nabla_\theta \log p(z, \theta_K \mid \mathcal{M}_K)\, \nabla^T_\theta \log p(z, \theta_K \mid \mathcal{M}_K)\right] = -\mathbb{E}\left[\Delta^\theta_\theta \log p(z, \theta_K \mid \mathcal{M}_K)\right], \quad (43)$$
where $\Delta^\theta_\theta := \nabla_\theta \nabla^T_\theta$ is the second-derivative operator and $\nabla_\theta$ is the gradient operator with respect to $\theta$. Using the fact that $p(z, \theta_K \mid \mathcal{M}_K) = p(z \mid \theta_K, \mathcal{M}_K)\, p(\theta_K \mid \mathcal{M}_K)$, the FIM in (43) can be expressed as:
$$J = -\mathbb{E}\left[\Delta^\theta_\theta \log p(z \mid \theta_K, \mathcal{M}_K)\right] - \mathbb{E}\left[\Delta^\theta_\theta \log p(\theta_K \mid \mathcal{M}_K)\right] = J_d + J_p, \quad (44)$$
where $J_p$ represents the a priori information and $J_d$ is the "standard" FIM (used in the derivation of the CRB) averaged over the prior of the locations and powers of the $K$ sources:
$$J_d = \int_{\Theta_K} J_d(\theta_K)\, p(\theta_K \mid \mathcal{M}_K)\, d\theta_K. \quad (45)$$
As demonstrated in the Appendix, this standard FIM is defined for this problem as follows:
$$J_d(\theta_K) = \sum_{i=1}^{N} \sum_{j=0}^{L-1} \frac{\nabla_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)\, \nabla^T_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)}{p(z_i = j \mid \theta_K, \mathcal{M}_K)}, \quad (46)$$
with the gradient operator given by:
$$\nabla_\theta = \left[\frac{\partial}{\partial P_1}\ \frac{\partial}{\partial x_1}\ \frac{\partial}{\partial y_1}\ \cdots\ \frac{\partial}{\partial P_K}\ \frac{\partial}{\partial x_K}\ \frac{\partial}{\partial y_K}\right]^T. \quad (47)$$
Using (7), the gradient term in (46) is expressed as:
$$\nabla_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K) = \sum_{l=0}^{L-1} p(z_i = j \mid b_i = l)\, \nabla_\theta p(b_i = l \mid \theta_K, \mathcal{M}_K), \quad (48)$$
in which, for $k = 1, \dots$
, K$, this produces:
$$\frac{\partial p(b_i = l \mid \theta_K, \mathcal{M}_K)}{\partial P_k} = \left(\frac{d_0}{d_{i,k}}\right)^{n/2} \frac{\rho_{i,l}}{2\sqrt{2\pi}\,\sigma\sqrt{P_k}},$$
$$\frac{\partial p(b_i = l \mid \theta_K, \mathcal{M}_K)}{\partial x_k} = \left(\frac{d_0}{d_{i,k}}\right)^{n/2} \frac{n P_k^{1/2} d_{i,k}^{-2}\, \rho_{i,l}\, (p_{x,i} - x_k)}{2\sqrt{2\pi}\,\sigma}, \quad (49)$$
$$\frac{\partial p(b_i = l \mid \theta_K, \mathcal{M}_K)}{\partial y_k} = \left(\frac{d_0}{d_{i,k}}\right)^{n/2} \frac{n P_k^{1/2} d_{i,k}^{-2}\, \rho_{i,l}\, (p_{y,i} - y_k)}{2\sqrt{2\pi}\,\sigma},$$
and
$$\rho_{i,l} = \exp\left(-\frac{(\lambda_{i,l} - a_i)^2}{2\sigma^2}\right) - \exp\left(-\frac{(\lambda_{i,l+1} - a_i)^2}{2\sigma^2}\right). \quad (50)$$
Although an analytical expression for $J_d(\theta_K)$ has been derived, in order to obtain the $J_d$ involved in the computation of the FIM defined in (44), we need to resort to numerical techniques to approximate the integral that defines this quantity in (45). The procedure we use is a simple Monte-Carlo integration:
1) Draw $N_{MC}$ realizations of the state from the prior: $\{\theta^i_K\}_{i=1}^{N_{MC}} \sim p(\theta_K)$.
2) Approximate the quantity of interest by:
$$J_d \approx \frac{1}{N_{MC}} \sum_{i=1}^{N_{MC}} J_d(\theta^i_K). \quad (51)$$
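The two-step Monte-Carlo integration of Eq. (51) is generic, so it can be sketched independently of the model: both callables below are placeholders for the model-specific per-state FIM of Eq. (46) and the prior.

```python
import numpy as np

def approx_Jd(fim_single, prior_sample, n_mc, rng):
    """Monte-Carlo approximation of Eq. (45): J_d ≈ (1/N_MC) Σ_i J_d(θ^i),
    with θ^i drawn from the prior p(θ_K | M_K). `fim_single` evaluates the
    analytical J_d(θ_K); `prior_sample(n, rng)` draws n prior realizations."""
    draws = prior_sample(n_mc, rng)
    return np.mean([fim_single(th) for th in draws], axis=0)
```

As a sanity check, averaging $\theta\theta^T$ over a standard normal prior should converge to the identity matrix.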
Finally, the second term, representing the a priori information in (44), is a $3K \times 3K$ block-diagonal matrix defined as:
$$J_p = \mathrm{blockdiag}\big(\xi, \Sigma_p^{-1}, \dots, \xi, \Sigma_p^{-1}\big), \quad (52)$$
with $\xi = \dfrac{a(a+1)(a+3)}{b^2}$.

Proof:
See the Appendix.

V. NUMERICAL SIMULATIONS
In all the experiments, we consider a signal-decay exponent $n = 2$ and a reference distance $d_0 = 1$. The ROI is an m × m field in which 100 sensors are deployed on a grid, where the location of each sensor is assumed to be known. The thresholds of the $M$-bit quantizer defined in (4) are the same for each sensor and are equally spaced between $\lambda_{i,1} = 0$ and $\lambda_{i,L-1} = 22$, $\forall i = 1, \dots, 100$. The hyper-parameters in the prior distribution of the transmitted power of each source defined in (10) are $a = 50$ and $b = 2.\times$. A uniform distribution is used as the prior over the collection of models, i.e., $\forall k \in \{1, \dots, K_{max}\}$ we have $p(\mathcal{M}_k) = 1/K_{max}$. All the results have been obtained using $N_{MCMC} = 5$ in the MWG kernel (summarized in Algo. 2) used as the forward kernel of the SMC sampler. In order to illustrate the benefit of using the SMC sampler, we have adapted, to the problem considered in this paper, the importance sampler (IS) that was proposed for single-source localization in [26]. For each model $\mathcal{M}_k$, this IS algorithm simply consists in sampling $N_{IS}$ particles from the prior distribution in (10) and in assigning to each of these particles a weight proportional to the likelihood given in (8). In order to have a fair comparison between the two algorithms, since the proposed SMC sampler adapts the number of iterations on-line using the procedure described in Section III-B1, the number of particles used in the IS algorithm is set to $N_{IS} = \widetilde{T} N$, with $\widetilde{T}$ the total number of SMC iterations averaged over multiple runs for the configuration under study. As a consequence, the complexity of the two algorithms is equivalent, since the number of particles generated for all the results described below is the same for both schemes.

A. Accuracy of the Estimators
We first study the robustness and accuracy of the estimators of the two main quantities of interest: the model posterior probabilities and the MMSE estimate of the parameters under each model. In order to perform this analysis, both schemes have been run 100 times on the same realization of observations from a scenario with 4 sources. From the theory, we know that both the IS and SMC algorithms provide, whatever the number
of particles used, an unbiased estimate of $\log p(z \mid \mathcal{M}_k)$, which corresponds, as discussed in (11), to the only unknown quantity in the model posterior distribution. However, as depicted in Fig. 3a, the variance of the estimator of this quantity obtained from these two algorithms is significantly different. While both algorithms perform similarly for one source, the SMC sampler significantly outperforms the IS algorithm as the number of sources increases. The same remark holds for the variance of the MMSE estimator shown in Fig. 3b, which is quite remarkable since the MMSE with the SMC sampler is computed with only $N$ particles instead of $N\widetilde{T}$ with the IS. The same remarks hold even for an increasing number of particles, as illustrated in Fig. 4. To understand these results, we present in Table I the effective sample size (ESS), defined in (21), which represents the number of particles that really contribute to the final estimator. We can see from these ESS results that the IS algorithm completely collapses when the dimension of the model (i.e. the number of sources in the ROI) increases. As an example, for model $\mathcal{M}_4$, only 3-4 particles over the $N_{IS} = N\widetilde{T} = 1700$ really contribute to the MMSE estimator by having non-negligible importance weights. Conversely, the ESS from the SMC sampler is quite stable with the dimension of the model. All these results clearly illustrate the significant gain provided by the proposed SMC sampler and thus the benefit of gradually introducing the likelihood information through the successive iterations of the algorithm.

(Fig. 3 panels: (a) variance of the log-evidence estimator vs. model; (b) variance of the MMSE estimator vs. model, for IS with $N_{IS} \in \{50\widetilde{T}, 100\widetilde{T}\}$ and SMC with $N \in \{50, 100\}$.)

Fig.
3: Comparison of the variance of both the log-evidence estimator, $\log p(z \mid \mathcal{M}_k)$, in (a) and the MMSE estimator in (b) provided by the proposed SMC sampler and the IS algorithm. The parameter of the adaptive cooling schedule is set as CESS = 0.N, leading to the following average numbers of iterations: $\widetilde{T}_{\mathcal{M}_1} = 8$, $\widetilde{T}_{\mathcal{M}_2} = 12$, $\widetilde{T}_{\mathcal{M}_3} = 15$, $\widetilde{T}_{\mathcal{M}_4} = 17$ and $\widetilde{T}_{\mathcal{M}_5} = 18$, for both $N = 50$ and $N = 100$ (number of quantization levels: $L = 4$ and $\sigma = 1$).

Let us now illustrate, with a 2-target scenario, the label-switching problem discussed in Section III-C and the importance of having a relabeling algorithm in order to provide a point estimate. In Fig. 5, we present

TABLE I: Comparison of the average effective sample size defined in (21), scaled by the number of particles, for both SMC with $N = 100$ and IS for the different models:
       M1      M2      M3      M4      M5
IS:    0.0836  0.0082  0.0030  0.0019  0.0018
SMC:   0.6180  0.6290  0.6297  0.6276  0.6319
Fig. 4: Evolution of the variance of the estimators of both the log evidence and the MMSE provided by the IS and the SMC for model $\mathcal{M}_4$, as a function of $N_{IS} = N\widetilde{T}$. For the SMC, the results have been obtained with $N = 100$ particles by varying the CESS, thus leading to a different average number of iterations, $\widetilde{T}$ (number of quantization levels: $L = 4$ and $\sigma = 1$).

the marginal posterior distribution obtained with the proposed SMC sampler. We can first remark that the algorithm is clearly able to capture the multimodality of each marginal. However, if the MMSE is directly computed from this approximation, the estimated $y$-coordinate for both targets will be approximately 50, instead of 55 and 45. The proposed relabeling algorithm described in Algo. 4 allows both modes to be isolated by finding the best permutation for each particle. The MMSE estimate obtained from the particle system after relabeling will therefore provide an accurate point estimate, close to the truth. Moreover, from the estimates of the posterior distribution in this figure, we can remark that the algorithm is able to detect that there are 2 targets in the scene.

Let us now illustrate, with Fig. 6, the challenging problem of having two targets of interest that are placed very close to each other. We remark from the model posterior probabilities that the algorithm is able to capture the uncertainty, arising in this scenario, between the models with 1 and 2 targets. The ability of the algorithm to distinguish two close targets will clearly be a function of many parameters of the sensor network: the distance between sensors, the variance of the observation noise, the number of quantization levels, as well as the quality of the wireless channel between the sensors and the fusion center. The approximations of the posterior marginal distributions obtained from the SMC sampler under model $\mathcal{M}_2$ are both unimodal, owing to the relatively small distance between the 2 sources.
Once again, in this case we can see the benefit of using the relabeling algorithm for computing the final MMSE point estimate.
(Fig. 5 panels: (a) posterior marginals before relabeling; (b) posterior marginals after relabeling; (c) model posterior probabilities. Curves show $p(y_1 \mid z)$, $p(y_2 \mid z)$ and the true values.)

Fig. 5: Illustration of the relabeling and the model posterior obtained with the proposed SMC sampler ($N = 200$ and CESS = 0.N) in a scenario with two targets located at $[50, 55]$ and $[50, 45]$ (number of quantization levels: $L = 8$ and $\sigma = 0.$).

Next, we compare the performance of the proposed algorithm when 4 sources are in the ROI. In order to obtain the following results, 100 realizations of the four sources and associated observations have been drawn from the prior and likelihood defined in Section II-A. Table II illustrates the ability of the proposed algorithm to detect the correct number of targets. The correct number of targets (i.e. model $\mathcal{M}_4$) is chosen more often with the proposed SMC sampler than with the IS algorithm over the 100 scenarios. Even if both provide an
(Fig. 6 panels: (a) posterior marginals before relabeling; (b) posterior marginals after relabeling; (c) model posterior probabilities. Curves show $p(y_1 \mid z)$, $p(y_2 \mid z)$ and the true values.)

Fig. 6: Illustration of the relabeling and the model posterior obtained with the proposed SMC sampler ($N = 200$ and CESS = 0.N) in a scenario with two close targets (number of quantization levels: $L = 8$ and $\sigma = 0.$).

unbiased estimate of the evidence of each model, the significantly lower variance of the estimator provided by the SMC sampler, illustrated previously in Fig. 3, allows a more efficient and accurate model decision step.

TABLE II: Number of times each model has been selected from the estimated model posterior probabilities given by both the proposed SMC sampler and the IS algorithm over 100 realizations ($N = 100$, CESS = 0.N, $L = 4$ and $\sigma = 1$):
       M1  M2  M3  M4  M5
IS:     0   0  13  85   2
SMC:    0   0   3  96   1

B. Localization Performance and PCRB
In Fig. 7 we present the performance of the proposed SMC sampler (and of the IS algorithm) in terms of the mean squared error between the point estimate $\hat{\theta}_p$ of the algorithm and the true locations $\theta_p$ of the four sources:
$$MSE = \mathrm{trace}\left\{\mathbb{E}\left[(\hat{\theta}_p - \theta_p)(\hat{\theta}_p - \theta_p)^T\right]\right\}, \quad (53)$$
where $\hat{\theta}_p = [\hat{x}_1\ \hat{y}_1 \cdots \hat{x}_4\ \hat{y}_4]^T$ and $\theta_p = [x_1\ y_1 \cdots x_4\ y_4]^T$ represent the estimated (using the relabeling algorithm) and the true locations of the four targets, respectively. We also plot the associated PCRB derived in Section IV. In order to obtain the results, we use 100 realizations of the different source characteristics and associated observations (avoiding the case in which two targets are very close). The results depicted in Figures 7 and 8 clearly demonstrate the good localization performance of the proposed algorithm and the significant gain compared to the IS algorithm, which completely fails to localize four targets. As expected, the localization accuracy improves with an increase in either the number of sensors or the number of quantization levels, as well as with a decrease in the measurement noise variance.

(Fig. 7 panels: (a) $\sigma = 1$; (b) $\sigma = 0.$ — mean squared error (position) vs. number of quantization levels $L$ for IS, the proposed SMC and the PCRB.)

Fig. 7: Evolution of the mean squared error for the source locations as a function of the number of quantization levels $L$, for two different values of the measurement noise $\sigma$ ($N = 100$ and CESS = 0.N).
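The empirical counterpart of the MSE in Eq. (53), averaged over independent realizations, can be computed as follows (the array layout is an assumption: one row per realization, stacked positions per column):

```python
import numpy as np

def position_mse(theta_hat, theta_true):
    """Empirical MSE of Eq. (53): trace of the error covariance of the
    stacked position vector, averaged over independent realizations.
    Both inputs have shape (n_runs, 2K); here K would be 4 sources."""
    err = theta_hat - theta_true
    return np.trace(err.T @ err) / err.shape[0]
```

The trace of the averaged outer product equals the average squared Euclidean error, so this is the quantity compared against the PCRB.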
(Fig. 8 panels: (a) $\sigma = 1$; (b) $\sigma = 0.$ — mean squared error (position) vs. number of sensors for IS, the proposed SMC and the PCRB.)

Fig. 8: Evolution of the mean squared error for the source locations as a function of the number of sensors in the ROI, for two different values of the measurement noise $\sigma$ (number of quantization levels: $L = 8$, $N = 100$ and CESS = 0.N).

VI. CONCLUSION
In this paper, we addressed the problem of localizing an unknown number of energy-emitting sources in wireless sensor networks with quantized data. We provided a generalization of recent existing works that consider a single source. We proposed a Bayesian solution, based on an SMC sampler, for the joint estimation of the unknown number of sources as well as their associated parameters. Then, we derived the posterior Cramér-Rao bound for the estimation of the characteristics of these multiple energy-emitting sources. Numerical simulations clearly illustrated the ability of the proposed SMC sampler to perform this challenging joint estimation. The different experiments showed that the proposed scheme based on the novel SMC sampler improves quite significantly the accuracy of the estimators required for model selection (i.e., the number of sources) and for the estimation of the source characteristics, compared to the more classical importance sampling method.

APPENDIX
PROOF OF THE POSTERIOR CRAMÉR-RAO BOUND
As presented in Section IV, the Fisher information matrix (FIM) for the posterior Cramér-Rao bound (PCRB) can be decomposed as follows:
$$J = \underbrace{\int_{\Theta_K} J_d(\theta_K)\, p(\theta_K \mid \mathcal{M}_K)\, d\theta_K}_{J_d} + \underbrace{\mathbb{E}\left[-\Delta^\theta_\theta \log p(\theta_K \mid \mathcal{M}_K)\right]}_{J_p}, \quad (54)$$
where $\Delta^\theta_\theta := \nabla_\theta \nabla^T_\theta$ is the second-derivative operator and $\nabla_\theta$ is the gradient operator with respect to $\theta$. In this appendix, we derive $J_d(\theta_K)$ and $J_p$ respectively.

The information matrix $J_d(\theta_K)$ is defined as:
$$J_d(\theta_K) = -\mathbb{E}_{z \mid \theta_K}\left[\Delta^\theta_\theta \log p(z \mid \theta_K, \mathcal{M}_K)\right]. \quad (55)$$
The first derivative of the log-likelihood is given by:
$$\nabla^T_\theta \log p(z \mid \theta_K, \mathcal{M}_K) = \sum_{i=1}^{N} \nabla^T_\theta \log p(z_i \mid \theta_K, \mathcal{M}_K) = \sum_{i=1}^{N} \frac{\nabla^T_\theta p(z_i \mid \theta_K, \mathcal{M}_K)}{p(z_i \mid \theta_K, \mathcal{M}_K)}. \quad (56)$$
Therefore, the second derivative can be written as:
$$\Delta^\theta_\theta \log p(z \mid \theta_K, \mathcal{M}_K) = \sum_{i=1}^{N} \left[\frac{\nabla_\theta \nabla^T_\theta p(z_i \mid \theta_K, \mathcal{M}_K)}{p(z_i \mid \theta_K, \mathcal{M}_K)} - \frac{\nabla_\theta p(z_i \mid \theta_K, \mathcal{M}_K)\, \nabla^T_\theta p(z_i \mid \theta_K, \mathcal{M}_K)}{p(z_i \mid \theta_K, \mathcal{M}_K)^2}\right]. \quad (57)$$
To obtain (55), we now take the negative expectation of this second derivative with respect to $p(z_i \mid \theta_K, \mathcal{M}_K)$:
$$J_d(\theta_K) = \sum_{i=1}^{N} \sum_{j=0}^{L-1} p(z_i = j \mid \theta_K, \mathcal{M}_K) \left\{-\frac{\nabla_\theta \nabla^T_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)}{p(z_i = j \mid \theta_K, \mathcal{M}_K)} + \frac{\nabla_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)\, \nabla^T_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)}{p(z_i = j \mid \theta_K, \mathcal{M}_K)^2}\right\}$$
$$= \sum_{i=1}^{N} \sum_{j=0}^{L-1} \frac{\nabla_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)\, \nabla^T_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)}{p(z_i = j \mid \theta_K, \mathcal{M}_K)} - \nabla_\theta \nabla^T_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K). \quad (58)$$
The second term is equal to 0, since:
$$\sum_{i=1}^{N} \sum_{j=0}^{L-1} \nabla_\theta \nabla^T_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K) = \sum_{i=1}^{N} \nabla_\theta \nabla^T_\theta \underbrace{\sum_{j=0}^{L-1} p(z_i = j \mid \theta_K, \mathcal{M}_K)}_{=1} = 0. \quad (59)$$
As a consequence, we finally obtain:
$$J_d(\theta_K) = \sum_{i=1}^{N} \sum_{j=0}^{L-1} \frac{\nabla_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)\, \nabla^T_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K)}{p(z_i = j \mid \theta_K, \mathcal{M}_K)}. \quad (60)$$
Using (7), the gradient term involved in this expression can be expressed as:
$$\nabla_\theta p(z_i = j \mid \theta_K, \mathcal{M}_K) = \sum_{l=0}^{L-1} p(z_i = j \mid b_i = l)\, \nabla_\theta p(b_i = l \mid \theta_K, \mathcal{M}_K), \quad (61)$$
with
$$p(b_i = l \mid \theta_K, \mathcal{M}_K) = Q\left(\frac{\lambda_{i,l} - a_i}{\sigma}\right) - Q\left(\frac{\lambda_{i,l+1} - a_i}{\sigma}\right). \quad (62)$$
As a consequence, since the $Q$-function is the complementary Gaussian cumulative distribution, we can easily remark that:
$$\nabla_\theta p(b_i = l \mid \theta_K, \mathcal{M}_K) = \frac{1}{\sqrt{2\pi}\,\sigma} \underbrace{\left(e^{-\frac{(\lambda_{i,l} - a_i)^2}{2\sigma^2}} - e^{-\frac{(\lambda_{i,l+1} - a_i)^2}{2\sigma^2}}\right)}_{\rho_{i,l}} \nabla_\theta a_i. \quad (63)$$
Finally, from the definition of $a_i$ in (2), we obtain, for $k = 1, \dots, K$:
$$\frac{\partial p(b_i = l \mid \theta_K, \mathcal{M}_K)}{\partial P_k} = \left(\frac{d_0}{d_{i,k}}\right)^{n/2} \frac{\rho_{i,l}}{2\sqrt{2\pi}\,\sigma\sqrt{P_k}},$$
$$\frac{\partial p(b_i = l \mid \theta_K, \mathcal{M}_K)}{\partial x_k} = \left(\frac{d_0}{d_{i,k}}\right)^{n/2} \frac{n P_k^{1/2} d_{i,k}^{-2}\, \rho_{i,l}\, (p_{x,i} - x_k)}{2\sqrt{2\pi}\,\sigma}, \quad (64)$$
$$\frac{\partial p(b_i = l \mid \theta_K, \mathcal{M}_K)}{\partial y_k} = \left(\frac{d_0}{d_{i,k}}\right)^{n/2} \frac{n P_k^{1/2} d_{i,k}^{-2}\, \rho_{i,l}\, (p_{y,i} - y_k)}{2\sqrt{2\pi}\,\sigma},$$
which completes the analytical calculation of $J_d(\theta_K)$.

Finally, we derive the a priori information matrix, given by:
$$J_p = \mathbb{E}\left[-\Delta^\theta_\theta \log p(\theta_K \mid \mathcal{M}_K)\right]. \quad (65)$$
From the prior distributions in (9)-(10), each target's location and power are independent and identically distributed. $J_p$ is therefore a block-diagonal matrix, with the information associated to the location and the power defined respectively as:
$$\mathbb{E}\left[-\Delta^{[x_k, y_k]^T}_{[x_k, y_k]^T} \log \mathcal{N}([x_k, y_k]^T \mid \mu_p, \Sigma_p)\right] = \Sigma_p^{-1}, \quad (66)$$
and
$$\mathbb{E}\left[-\Delta^{P_k}_{P_k} \log \mathcal{IG}(P_k \mid a, b)\right] = \mathbb{E}\left[-\frac{\partial^2}{\partial P_k^2} \log\left\{\frac{b^a}{\Gamma(a)} P_k^{-a-1} \exp\left(-\frac{b}{P_k}\right)\right\}\right] = \mathbb{E}\left[\frac{\partial^2}{\partial P_k^2}\left((a+1)\log(P_k) + \frac{b}{P_k}\right)\right] = \mathbb{E}\left[2bP_k^{-3} - (a+1)P_k^{-2}\right] = 2b\,\mathbb{E}\left[P_k^{-3}\right] - (a+1)\,\mathbb{E}\left[P_k^{-2}\right]. \quad (67)$$
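This last expression can be checked numerically against the closed form $\xi = a(a+1)(a+3)/b^2$ obtained below, by evaluating the inverse-gamma moments through Gamma-function ratios (the test values of $a$ and $b$ are arbitrary):

```python
import math

def xi_closed_form(a, b):
    """ξ = a(a+1)(a+3)/b², the result of Eq. (71)."""
    return a * (a + 1) * (a + 3) / b ** 2

def xi_from_moments(a, b):
    """Same quantity evaluated directly as 2b·E[P^{-3}] − (a+1)·E[P^{-2}],
    using E[P^{-n}] = Γ(a+n) / (Γ(a)·b^n) for P ~ IG(a, b) (Eq. (68))."""
    m2 = math.gamma(a + 2) / (math.gamma(a) * b ** 2)
    m3 = math.gamma(a + 3) / (math.gamma(a) * b ** 3)
    return 2 * b * m3 - (a + 1) * m2
```

Algebraically, $2(a+2) - (a+1) = a + 3$, which is exactly how the two factorizations agree.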
Let us now derive the two moments involved in this expression. We have, for $n > 0$:
$$\mathbb{E}\left[P_k^{-n}\right] = \int_0^{+\infty} P_k^{-n}\, \frac{b^a}{\Gamma(a)}\, P_k^{-a-1} \exp\left(-\frac{b}{P_k}\right) dP_k = \frac{b^a}{\Gamma(a)} \int_0^{+\infty} P_k^{-(a+n)-1} \exp\left(-\frac{b}{P_k}\right) dP_k = \frac{b^a}{\Gamma(a)}\, \frac{\Gamma(a+n)}{b^{a+n}}. \quad (68)$$
The last expression is obtained from the normalizing constant of an inverse-gamma distribution, $\mathcal{IG}(a+n, b)$. By using the property of the Gamma function, $\Gamma(a+1) = a\Gamma(a)$, we obtain:
$$\mathbb{E}\left[P_k^{-2}\right] = \frac{(a+1)a}{b^2}, \quad (69)$$
$$\mathbb{E}\left[P_k^{-3}\right] = \frac{(a+2)(a+1)a}{b^3}. \quad (70)$$
By plugging these expressions into (67), the prior information for the power is given by:
$$\mathbb{E}\left[-\Delta^{P_k}_{P_k} \log \mathcal{IG}(P_k \mid a, b)\right] = \frac{a(a+1)(a+3)}{b^2} = \xi, \quad (71)$$
leading to
$$J_p = \mathrm{blockdiag}\big(\xi, \Sigma_p^{-1}, \dots, \xi, \Sigma_p^{-1}\big). \quad (72)$$

REFERENCES

[1] J. K. Hart and K. Martinez, "Environmental sensor networks: A revolution in the earth system science?"
Earth-Science Reviews ,vol. 78, no. 3, pp. 177–191, 2006.
[2] S. Rajasegarar, P. Zhang, Y. Zhou, S. Karunasekera, C. Leckie, and M. Palaniswami, "High resolution spatio-temporal monitoring of air pollutants using wireless sensor networks," in
Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2014 IEEE Ninth International Conference on. IEEE, 2014, pp. 1–6. [3] A. Kottas, Z. Wang, and A. Rodríguez, "Spatial modeling for risk assessment of extreme values from environmental time series: a Bayesian nonparametric approach,"
Environmetrics , vol. 23, no. 8, pp. 649–662, 2012.[4] C. Fonseca and H. Ferreira, “Stability and contagion measures for spatial extreme value analyses,” arXiv preprintarXiv:1206.1228 , 2012.[5] J. P. French and S. R. Sain, “Spatio-temporal exceedance locations and confidence regions,”
Annals of Applied Statistics. Prepress ,2013.[6] P. Zhang, J. Y. Koh, S. Lin, and I. Nevat, “Distributed event detection under byzantine attack in wireless sensor networks,”in
Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2014 IEEE Ninth International Conference on .IEEE, 2014, pp. 1–6.[7] K. Sohraby, D. Minoli, and T. Znati,
Wireless sensor networks: technology, protocols, and applications . John Wiley & Sons,2007.[8] K. Lorincz, D. J. Malan, T. R. Fulford-Jones, A. Nawoj, A. Clavel, V. Shnayder, G. Mainland, M. Welsh, and S. Moulton,“Sensor networks for emergency response: challenges and opportunities,”
Pervasive Computing, IEEE , vol. 3, no. 4, pp. 16–23,2004.[9] K. Chintalapudi, T. Fu, J. Paek, N. Kothari, S. Rangwala, J. Caffrey, R. Govindan, E. Johnson, and S. Masri, “Monitoring civilstructures with a wireless sensor network,”
Internet Computing, IEEE , vol. 10, no. 2, pp. 26–34, 2006.[10] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “Wireless sensor networks: a survey,”
Computer networks , vol. 38,no. 4, pp. 393–422, 2002.[11] I. Nevat, G. W. Peters, and I. B. Collings, “Random field reconstruction with quantization in wireless sensor networks,”
IEEE Transactions on Signal Processing, vol. 61, pp. 6020–6033, 2013. [12] ——, "Distributed detection in sensor networks over fading channels with multiple antennas at the fusion centre,"
IEEE Transactions on Signal Processing, vol. 62, no. 1-4, pp. 671–683, 2014. [13] F. Fazel, M. Fazel, and M. Stojanovic, "Random access sensor networks: Field reconstruction from incomplete data," in
IEEE Information Theory and Applications Workshop (ITA), 2012, pp. 300–305. [14] J. Matamoros, F. Fabbri, C. Antón-Haro, and D. Dardari, "On the estimation of randomly sampled 2d spatial fields under bandwidth constraints,"
IEEE Transactions on Wireless Communications, , vol. 10, no. 12, pp. 4184–4192, 2011.[15] O. Schabenberger and F. J. Pierce,
Contemporary Statistical Models for the Plant and Soil Sciences . CRC Press, 2002.[16] I. Akyildiz, M. Vuran, and O. Akan, “On exploiting spatial and temporal correlation in wireless sensor networks,”
Proceedingsof WiOpt04: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks , pp. 71–80, 2004.[17] M. C. Vuran, O. B. Akan, and I. F. Akyildiz, “Spatio-temporal correlation: theory and applications for wireless sensor networks,”
Computer Networks Journal, Elsevier , vol. 45, pp. 245–259, 2004.[18] N. Patwari, J. N. Ash, S. Kyperountas, A. O. Hero, R. L. Moses, and N. S. Correal, “Locating the nodes: cooperative localizationin wireless sensor networks,”
Signal Processing Magazine, IEEE , vol. 22, no. 4, pp. 54–69, 2005.[19] G. Mao, B. Fidan, and B. Anderson, “Wireless sensor network localization techniques,”
Computer networks , vol. 51, no. 10,pp. 2529–2553, 2007.[20] D. Li, K. D. Wong, Y. H. Hu, and A. M. Sayeed, “Detection, classification and tracking of targets,”
IEEE Signal ProcessingMagazine , vol. 19, no. 3, pp. 17–29, Mar. 2002.[21] D. Li and Y. H. Hu, “Energy based collaborative source localization using acoustic microsensor array,”
EURASIP Journal onApplied Signal Processing , no. 4, pp. 321–337, 2003.
[22] X. Sheng and Y. H. Hu, "Maximum likelihood multiple-source localization using acoustic energy measurements with wireless sensor networks,"
IEEE Transactions on Signal Processing , vol. 53, no. 1, pp. 44–53, Jan. 2005.[23] D. Blatt and A. O. Hero, “Energy-based sensor network source localization via projection onto convex sets,”
IEEE Transactionson Signal Processing , vol. 54, no. 9, pp. 3614–3619, 2006.[24] R. Niu and P. K. Varshney, “Target Location Estimation in Sensor Networks With Quantized Data,”
IEEE Transactions on SignalProcessing , vol. 54, no. 12, pp. 4519–4528, Dec. 2006.[25] O. Ozdemir, R. Niu, and P. K. Varshney, “Channel Aware Target Localization With Quantized Data in Wireless Sensor Networks,”
IEEE Transactions on Signal Processing , vol. 57, no. 3, pp. 1190–1202, March 2009.[26] E. Masazade, R. Niu, P. K. Varshney, and M. Keskinoz, “Energy Aware Iterative Source Localization for Wireless SensorNetworks,”
IEEE Transactions on Signal Processing , vol. 58, no. 9, pp. 4824–4835, Sept. 2010.[27] G. W. Peters, “Topics in Sequential Monte Carlo Samplers,” Master’s thesis, University of Cambridge, Jan. 2005.[28] P. Del Moral, A. Doucet, and A. Jasra, “Sequential Monte Carlo samplers,”
Journal of the Royal Statistical Society: Series B(Statistical Methodology) , vol. 68, no. 3, pp. 411–436, 2006.[29] I. Nevat, O. Eger, G. W. Peters, and F. Septier, “NEPS: ”Narrowband Efficient Positioning System” for Delivering ResourceEfficient GNSS Receivers,” in
IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and InformationProcessing (ISSNIP) , Singapore, April 2014.[30] O. Capp´e, S. J. Godsill, and E. Moulines, “An overview of existing methods and recent advances in sequential monte carlo,”
Proceedings of the IEEE , vol. 95, no. 5, pp. 899–924, 2007.[31] A. Doucet and A. M. Johansen, “A tutorial on particle filtering and smoothing: Fifteen years later,”
Handbook of NonlinearFiltering , vol. 12, pp. 656–704, 2009.[32] R. M. Neal, “Annealed importance sampling,”
Statistics and Computing , vol. 11, no. 2, pp. 125–139, 2001.[33] N. Chopin, “A sequential particle filter method for static models,”
Biometrika , vol. 89, no. 3, pp. 539–552, 2002.[34] O. Capp´e, A. Guillin, J.-M. Marin, and C. P. Robert, “Population monte carlo,”
Journal of Computational and GraphicalStatistics , vol. 13, no. 4, 2004.[35] A. Doucet and N. De Freitas,
Sequential Monte Carlo methods in practice . Springer, 2001, vol. 1.[36] P. Del Moral and L. Miclo, “Branching and interacting particle systems approximations of feynman-kac formulae withapplications to non-linear filtering,”
Seminaire de Probabilites XXXIV, Lecture notes in Mathematics , pp. 1–145, 2000.[37] A. Jasra, D. Stephens, and C. Holmes, “On population-based simulation for static inference,”
Statistics and Computing , vol. 17,no. 3, pp. 263–279, 2007.[38] A. Jasra, D. A. Stephens, A. Doucet, and T. Tsagaris, “Inference for l´evy-driven stochastic volatility models via adaptivesequential monte carlo,”
Scandinavian Journal of Statistics , vol. 38, no. 1, pp. 1–22, 2011.[39] Y. Zhou, A. M. Johansen, and J. A. Aston, “Towards automatic model comparison: An adaptive sequential monte carlo approach,” arXiv preprint arXiv:1303.3123 , 2013.[40] C. P. Robert and G. Casella,
Monte Carlo statistical methods , 2nd ed. Springer, 2004.[41] M. Stephens, “Dealing with label switching in mixture models,”
Journal of the Royal Statistical Society, Series B , vol. 62, pp.795–809, 2000.[42] R. Bardenet, “Towards adaptive learning and inference: Applications to hyperparameter tuning and astroparticle physics,” Ph.D.dissertation, Universite Paris Sud XI, 2012.[43] G. Celeux, “Bayesian inference for mixtures: The label-switching problem,” in
In R. Payne and P. Green, editors, COMPSTAT98. Physica-Verlag , 1998.[44] H. L. Van Trees,
Detection, Estimation, and Modulation Theory, Part I . New York: Wiley, 1968.. New York: Wiley, 1968.