Sensor Selection for Estimation with Correlated Measurement Noise
JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. XX, XX 2016
Sijia Liu, Student Member, IEEE, Sundeep Prabhakar Chepuri, Student Member, IEEE, Makan Fardad, Member, IEEE, Engin Masazade, Member, IEEE, Geert Leus, Fellow, IEEE, and Pramod K. Varshney, Fellow, IEEE
Abstract—In this paper, we consider the problem of sensor selection for parameter estimation with correlated measurement noise. We seek optimal sensor activations by formulating an optimization problem, in which the estimation error, given by the trace of the inverse of the Bayesian Fisher information matrix, is minimized subject to energy constraints. Fisher information has been widely used as an effective sensor selection criterion. However, existing information-based sensor selection methods are limited to the case of uncorrelated or weakly correlated noise due to the use of approximate metrics. By contrast, here we derive the closed form of the Fisher information matrix with respect to sensor selection variables that is valid for any arbitrary noise correlation regime, and develop both a convex relaxation approach and a greedy algorithm to find near-optimal solutions. We further extend our framework of sensor selection to solve the problem of sensor scheduling, where a greedy algorithm is proposed to determine non-myopic (multi-time-step-ahead) sensor schedules. Lastly, numerical results are provided to illustrate the effectiveness of our approach, and to reveal the effect of noise correlation on estimation performance.
Index Terms—Sensor selection, sensor scheduling, parameter estimation, correlated noise, convex relaxation.
I. INTRODUCTION

Wireless sensor networks consisting of a large number of spatially distributed sensors have been widely used for environmental monitoring, source localization, and target tracking [1]-[3]. In these applications, sensors observe an unknown parameter or state of interest and transmit their measurements to a fusion center, which then determines the global estimate. However, due to the constraints on the communication bandwidth and sensor battery life, it may not be desirable to have all the sensors
report their measurements at all time instants. Therefore, the problem of sensor selection/scheduling arises, which aims to strike a balance between estimation accuracy and sensor activations over space and/or time. The importance of sensor selection has been discussed extensively in the context of various applications, such as target tracking [4], bit allocation [5], field monitoring [6], [7], optimal control [8], power allocation [9], [10], optimal experiment design [11], and leader selection in consensus networks [12].

In this paper, we focus on the problem of sensor selection/scheduling for parameter estimation similar to [12]-[15], but with a key difference in that the measurement noise is correlated in the problem formulation.

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].
S. Liu, M. Fardad and P. K. Varshney are with the Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13244 USA. Email: {sliu17, makan, varshney}@syr.edu.
S. P. Chepuri and G. Leus are with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, The Netherlands. Email: {s.p.chepuri, g.j.t.leus}@tudelft.nl.
E. Masazade is with the Department of Electrical and Electronics Engineering, Yeditepe University, Istanbul, 34755, Turkey. Email: [email protected].
The work of S. Liu and P. K. Varshney was supported by the U.S. Air Force Office of Scientific Research (AFOSR) under grant FA9550-10-1-0458. The work of M. Fardad was supported by the National Science Foundation (NSF) under awards EAGER ECCS-1545270 and CNS-1329885. The work of S. P. Chepuri and G. Leus is supported by NWO-STW under the VICI program (10382). The work of E. Masazade was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under grant 113E220.
In [13], the sensor selection problem was elegantly formulated under linear measurement models, and solved via convex optimization. In [14], the problem of sensor selection was generalized to nonlinear measurement models by using the Cramér-Rao bound as the sensor selection criterion. In [12], a particular class of sensor selection problems was transformed into the problem of leader selection in dynamical networks. In [15], the problem of non-myopic scheduling that determines sensor activations over multiple future time steps was addressed for nonlinear filtering with quantized measurements.

In the existing literature [12]-[15], the study of sensor selection/scheduling problems hinges on the assumption of uncorrelated measurement noise, which implies that sensor observations are conditionally independent given the underlying parameter. Due to conditional independence, each measurement contributes to the Fisher information (equivalently, the inverse of the Cramér-Rao bound on the error covariance matrix) in an additive manner [16]. Accordingly, the Fisher information becomes a linear function with respect to the sensor selection variables (which characterize the subset of sensors we select), and thus the resulting selection problem can be efficiently handled via convex optimization [13], [14]. However, the sensed data is often corrupted by correlated noise due to the nature of the monitored physical environment [17]. Therefore, the development of sensor selection schemes for correlated measurements is a critical task.

Recently, it has been shown in [18]-[21] that the presence of correlated noise makes optimal sensor selection/scheduling problems more challenging, since the Fisher information is no longer a linear function with respect to the selection variables. In [18]-[20], the problem of sensor selection with correlated noise was formulated so as to minimize an approximate expression of the estimation error subject to an energy constraint, or to minimize the energy consumption subject to an approximate estimation constraint. In [21], a reformulation of the multi-step Kalman filter was introduced to schedule sensors for linear dynamical systems with correlated noise.

Different from [18]-[21], here we derive the closed-form expression of the estimation error with respect to sensor selection variables under correlated measurement noise, which is valid for any arbitrary noise correlation matrix. This expression is optimized via a convex relaxation method to determine the optimal sensor selection scheme. We also propose a greedy algorithm to solve the corresponding sensor selection problem, where we show that when an inactive sensor is made active, the increase in Fisher information yields an information gain in terms of a rank-one matrix. The proposed sensor selection framework yields a more accurate sensor selection scheme than those presented in [18]-[20], because the schemes of [18]-[20] consider an approximate formulation where the noise covariance matrix is assumed to be independent of the sensor selection variables. We further demonstrate that the prior formulations for sensor selection are valid only when measurement noises are weakly correlated. In this scenario, maximization of the trace of the Fisher information matrix used in [20] is equivalent to the problem of maximizing a convex quadratic function over a bounded polyhedron. The resulting problem structure enables the use of optimization methods with reduced computational complexity.

Compared to [21], we adopt the recursive Fisher information to measure the estimation performance of sensor scheduling. However, for non-myopic (multi-time-step-ahead) schedules, the Fisher information matrices at consecutive time steps are coupled with each other.
Due to this coupling, expressing the Fisher information matrices in closed form is intractable. Therefore, we propose a greedy algorithm to seek non-myopic sensor schedules subject to cumulative and individual energy constraints. Numerical results show that our approach yields better estimation performance than that of [21] for state tracking.

In a preliminary version of this paper [22], we studied the problem of sensor selection using the same framework as in [18]-[20]. Compared to [22], we make the following new contributions in this paper.
• We propose a more general but tractable sensor selection framework that is valid for an arbitrary noise correlation matrix, and present a suite of efficient optimization algorithms.
• We reveal drawbacks of the existing formulations in [18]-[20] for sensor selection, and demonstrate their validity in only the weak noise correlation regime.
• We extend the proposed sensor selection approach to address the problem of non-myopic sensor scheduling, where the length of the time horizon and energy constraints on individual sensors are taken into account.

The rest of the paper is organized as follows. In Section II, we formulate the problem of sensor selection with correlated noise. In Section III, we present a convex relaxation approach and a greedy algorithm to solve the problem of sensor selection with an arbitrary noise correlation matrix. In Section IV, we present a sensor selection approach with weakly correlated noise. In Section V, we extend our framework to solve the problem of non-myopic sensor scheduling. In Section VI, we provide numerical results to illustrate the effectiveness of our proposed methods. In Section VII, we summarize our work and discuss future research directions.

II. PROBLEM FORMULATION
We wish to estimate a random vector x ∈ R^n with a Gaussian prior probability density function (PDF) N(μ, Σ). Observations of x from m sensors are corrupted by correlated measurement noise. To strike a balance between estimation accuracy and sensor activations, we formulate the problem of sensor selection, where the estimation error is minimized subject to a constraint on the total number of sensor activations.

Consider a linear system

y = Hx + v,   (1)

where y ∈ R^m is the measurement vector whose i-th entry corresponds to a scalar observation from the i-th sensor, H ∈ R^{m×n} is the observation matrix, and v ∈ R^m is the measurement noise vector that follows a Gaussian distribution with zero mean and covariance matrix R. We assume that x and v are mutually independent random variables, and that the noise covariance matrix R is positive definite and thus invertible. We note that R is not restricted to being diagonal, so that the measurement noise can be correlated among the sensors. We also note that in practice, the first two moments of x can be learnt from a parametric covariance model, such as a power exponential model, together with a training dataset of the parameter [23].

The task of sensor selection is to determine the best subset of sensors to activate in order to minimize the estimation error, subject to a constraint on the number of activations. We introduce a sensor selection vector to represent the activation scheme

w = [w_1, w_2, ..., w_m]^T,  w_i ∈ {0, 1},   (2)

where w_i indicates whether or not the i-th sensor is selected. For example, if the i-th sensor reports a measurement then w_i = 1, otherwise w_i = 0.
In other words, the active sensor measurements can be compactly expressed as

y_w = Φ_w y = Φ_w H x + Φ_w v,   (3)

where y_w ∈ R^{‖w‖_0} is the vector of measurements of the selected sensors, ‖w‖_0 is the ℓ_0-norm of w, which yields the total number of sensor activations, Φ_w ∈ {0, 1}^{‖w‖_0 × m} is the submatrix of diag(w) after all rows corresponding to the unselected sensors have been removed, and diag(w) is a diagonal matrix whose diagonal entries are given by w. We note that Φ_w and w are linked as

Φ_w Φ_w^T = I_w  and  Φ_w^T Φ_w = diag(w),   (4)

where I_w denotes an identity matrix of dimension ‖w‖_0.
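As a concrete illustration, the selection matrix Φ_w of (3) and the identities in (4) can be checked numerically. This is a minimal numpy sketch; the helper name `phi_from_w` is ours, not from the paper.

```python
import numpy as np

def phi_from_w(w):
    """Rows of the m x m identity matrix kept for the selected sensors,
    i.e. diag(w) with the all-zero rows removed (the Phi_w of eq. (3))."""
    w = np.asarray(w)
    return np.eye(len(w))[w == 1]

w = np.array([1, 0, 1, 1, 0])      # select sensors 1, 3 and 4 (m = 5)
Phi = phi_from_w(w)                # shape ||w||_0 x m = 3 x 5

# Identities (4): Phi_w Phi_w^T = I_{||w||_0} and Phi_w^T Phi_w = diag(w)
assert np.allclose(Phi @ Phi.T, np.eye(int(w.sum())))
assert np.allclose(Phi.T @ Phi, np.diag(w))
```

Applying Φ_w to a vector keeps exactly the entries of the selected sensors, which is why y_w has length ‖w‖_0.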
A. Minimum mean-squared estimation error
We employ the minimum mean square error (MMSE) estimator to estimate the unknown parameter under the Bayesian setup. It is worth mentioning that the use of the Bayesian estimation framework ensures the validity of parameter estimation for an underdetermined system, in which the number of selected sensors is less than the dimension of the parameter to be estimated, namely, ‖w‖_0 < n.

Given the Gaussian linear measurement model (1), the prior distribution of the unknown parameter x, and the active sensor measurements (3), the error covariance matrix of the MMSE estimate of x is given by [24, Theorem 12.1]

P_w = (Σ^{-1} + H^T Φ_w^T R_w^{-1} Φ_w H)^{-1},   (5)

where the matrix Φ_w H comprises the rows of H for the active sensors, and R_w denotes the submatrix of R after all rows and columns corresponding to the inactive sensors have been removed, i.e.,

R_w = Φ_w R Φ_w^T.   (6)

It is clear from (5) that due to the presence of the prior knowledge about Σ, the MSE matrix P_w is always well defined, even if the matrix H^T Φ_w^T R_w^{-1} Φ_w H is not invertible for an underdetermined system with ‖w‖_0 < n.

It is known from [16] that the MSE matrix P_w is the inverse of the Bayesian Fisher information matrix J_w under the linear Gaussian measurement model with a Gaussian prior distribution. We thus obtain

J_w = P_w^{-1} = Σ^{-1} + H^T Φ_w^T R_w^{-1} Φ_w H,   (7)

where the second term is related to the sensor selection scheme. In this paper, for clarity of presentation, we choose to work with J_w rather than P_w.

It is clear from (6) and (7) that the dependence of J_w on w is through Φ_w. This dependency does not lend itself to easy optimization of scalar-valued functions of J_w with respect to w. In what follows, we rewrite J_w as an explicit function of the selection vector w.

B. Fisher information J_w as an explicit function of w

The key idea in expressing (7) as an explicit function of w is to replace Φ_w with w based on their relationship given by (4).
Consider a decomposition of the noise covariance matrix [25]

R = aI + S,   (8)

where the positive scalar a is chosen such that the matrix S is positive definite, and I is the identity matrix. We remark that the decomposition in (8) is readily obtained through an eigenvalue decomposition of the positive definite matrix R (any 0 < a < λ_min(R) suffices), and it helps us in deriving the closed form of the Fisher information matrix with respect to w.

Substituting (8) into (6), we obtain

R_w = Φ_w (aI + S) Φ_w^T = a I_w + Φ_w S Φ_w^T,   (9)

where the last equality holds due to (4). Using (9), we can rewrite a part of the second term on the right-hand side of (7) as

Φ_w^T R_w^{-1} Φ_w = Φ_w^T (a I_w + Φ_w S Φ_w^T)^{-1} Φ_w
  = S^{-1} − S^{-1} (S^{-1} + a^{-1} Φ_w^T Φ_w)^{-1} S^{-1}   [step 1]
  = S^{-1} − S^{-1} (S^{-1} + a^{-1} diag(w))^{-1} S^{-1},   (10)   [step 2]

where step 1 is obtained from the matrix inversion lemma, and step 2 holds due to (4). Substituting (10) into (7), the Fisher information matrix can be expressed as

J_w = Σ^{-1} + H^T S^{-1} H − H^T S^{-1} (S^{-1} + a^{-1} diag(w))^{-1} S^{-1} H.   (11)

It is clear from (11) that the decomposition of R in (8), together with equations (9)-(10), allows us to make explicit and isolate the dependence of J_w on w. We also remark that the positive scalar a and the positive definite matrix S can be arbitrarily chosen, and have no effect on the performance of the sensor selection algorithms proposed later on.

C. Formulation of the optimal sensor selection problem
We now state the main optimization problem considered in this work as

minimize_w  tr(J_w^{-1})
subject to  1^T w ≤ s,  w ∈ {0, 1}^m,   (P0)

where J_w ∈ R^{n×n} is given by (11), and s ≤ m is a prescribed energy budget given by the maximum number of sensors to be activated. We recall that n is the dimension of the parameter to be estimated, and m is the number of sensors.

We note that (P0) is a nonconvex optimization problem due to the presence of Boolean selection variables. Moreover, if we drop the source statistics Σ from the MSE matrix (5) and impose the assumption s ≥ n, the proposed formulation (P0) is applicable to sensor selection in a non-Bayesian framework, where the unknown parameter is estimated through the best linear unbiased estimator [24].

In what follows, we discuss two special cases of the sensor selection problem under two different structures of the noise covariance matrix R: a) R is diagonal, and b) R has small off-diagonal entries.

D. Formulation for two special cases
(Footnote: For appropriate matrices A, B, C and D, the matrix inversion lemma states that (A + BCD)^{-1} = A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}, which yields B (C^{-1} + D A^{-1} B)^{-1} D = A − A (A + BCD)^{-1} A.)

When measurement noises are uncorrelated, the noise covariance matrix R becomes diagonal. From (6) and (7), the
Fisher information matrix in the objective function of (P0) simplifies to

J_w = Σ^{-1} + H^T Φ_w^T Φ_w R^{-1} Φ_w^T Φ_w H
    = Σ^{-1} + H^T diag(w) R^{-1} diag(w) H
    = Σ^{-1} + Σ_{i=1}^m w_i R_ii^{-1} h_i h_i^T,   (12)

where h_i^T denotes the i-th row of H, R_ii denotes the i-th diagonal entry of R, and the last equality holds due to the fact that

w_i^2 = w_i,  i = 1, 2, ..., m.   (13)

It is clear from (12) that each sensor contributes to the Fisher information in an additive manner. As demonstrated in [13] and [14], the linearity of the inverse mean squared error (Fisher information) with respect to w enables the use of convex optimization to solve the problem of sensor selection.

When measurement noises are weakly correlated (namely, R has small off-diagonal entries), it will be shown in Sec. IV that the Fisher information matrix can be approximately expressed as

Ĵ_w := Σ^{-1} + H^T (w w^T ∘ R^{-1}) H,   (14)

where ∘ stands for the Hadamard (elementwise) product. The problem of sensor selection with weakly correlated noise then becomes

minimize_w  tr[(Σ^{-1} + H^T (w w^T ∘ R^{-1}) H)^{-1}]
subject to  1^T w ≤ s,  w ∈ {0, 1}^m.   (P1)

Compared to the generalized formulation (P0), the objective function of (P1) is convex with respect to the rank-one matrix w w^T. Such structure introduces computational benefits while solving (P1). We emphasize that (P1) has been formulated in [18]-[20] for sensor selection with correlated noise; however, using this formulation without acknowledging that it is only valid when the correlation is weak can lead to incorrect sensor selection results. We elaborate on the problem of sensor selection with weakly correlated noise in Sec. IV.

III. GENERAL CASE: PROPOSED OPTIMIZATION METHODS FOR SENSOR SELECTION
In this section, we present two methods to solve (P0): the first is based on convex relaxation techniques, and the second is based on a greedy algorithm. First, we show that after relaxing the Boolean constraints, the selection problem can be cast as a standard semidefinite program (SDP). Given the solution of the relaxed (P0), we then use a randomization method to generate a near-optimal selection scheme. Next, we show that given a subset of sensors, activating a new sensor always improves the estimation performance. Motivated by this, we present a greedy algorithm that scales gracefully with the problem size to obtain locally optimal solutions of (P0).
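Before turning to these methods, the objective tr(J_w^{-1}) of (P0) can be evaluated directly on a toy instance, both via the truncate-then-invert form (6)-(7) and via the closed form (11); exhaustive search over all feasible Boolean patterns then gives a ground-truth optimum against which the relaxation and greedy methods can be compared. This is an illustrative numpy sketch with random data; the function names are ours.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 6, 2, 3
H = rng.standard_normal((m, n))
Sigma = np.eye(n)
M = rng.standard_normal((m, m))
R = M @ M.T + m * np.eye(m)            # positive definite, correlated noise

# Decomposition (8): any a in (0, lambda_min(R)) keeps S = R - a I pos. def.
a = 0.5 * np.linalg.eigvalsh(R).min()
S = R - a * np.eye(m)

def J_direct(w):
    """Fisher information via (6)-(7): truncate R, then invert."""
    idx = np.flatnonzero(w)
    Hw, Rw = H[idx], R[np.ix_(idx, idx)]
    return np.linalg.inv(Sigma) + Hw.T @ np.linalg.solve(Rw, Hw)

def J_closed(w):
    """Closed form (11), an explicit function of the selection vector w."""
    Sinv = np.linalg.inv(S)
    inner = np.linalg.inv(Sinv + np.diag(w) / a)
    return np.linalg.inv(Sigma) + H.T @ (Sinv - Sinv @ inner @ Sinv) @ H

# Ground truth for (P0) by brute force over all (m choose s) patterns
best = min((np.trace(np.linalg.inv(J_direct(np.array(w)))), w)
           for w in itertools.product([0, 1], repeat=m) if sum(w) == s)

# The two expressions of J_w agree, as derived in (9)-(11)
w_test = np.array([1, 1, 0, 0, 1, 0])
assert np.allclose(J_direct(w_test), J_closed(w_test))
```

Brute force is only viable for small m, which is precisely why the convex relaxation and greedy methods below are needed.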
A. Convex relaxation
Substituting the expression of the Fisher information (11) into problem (P0), we obtain

minimize_w  tr[(C − B^T (S^{-1} + a^{-1} diag(w))^{-1} B)^{-1}]
subject to  1^T w ≤ s,  w ∈ {0, 1}^m,   (15)

where for notational simplicity we have defined C := Σ^{-1} + H^T S^{-1} H and B := S^{-1} H.

Problem (15) can be equivalently transformed to [26]

minimize_{w,Z}  tr(Z)
subject to  C − B^T (S^{-1} + a^{-1} diag(w))^{-1} B ⪰ Z^{-1},
            1^T w ≤ s,  w ∈ {0, 1}^m,   (16)

where Z ∈ S^n is an auxiliary variable, S^n represents the set of n × n symmetric matrices, and the notation X ⪰ Y (or X ⪯ Y) indicates that the matrix X − Y (or Y − X) is positive semidefinite. The first inequality constraint in (16) is obtained from

(C − B^T (S^{-1} + a^{-1} diag(w))^{-1} B)^{-1} ⪯ Z,

which implicitly adds the additional constraint Z ⪰ 0, since the left-hand side of the above inequality is the inverse of the Fisher information matrix.

We further introduce another auxiliary variable V ∈ S^n such that the first matrix inequality of (16) is expressed as

C − V ⪰ Z^{-1},   (17)

and

V ⪰ B^T (S^{-1} + a^{-1} diag(w))^{-1} B.   (18)

Note that the minimization of tr(Z) with inequalities (17) and (18) forces the variable V to achieve its lower bound. In other words, problem (16) is equivalent to the problem in which the inequality constraint in (16) is replaced by the two inequalities (17) and (18). Finally, employing the Schur complement, the inequalities (17) and (18) can be rewritten as the following linear matrix inequalities (LMIs)

[ C − V   I ]         [ V    B^T                      ]
[ I       Z ] ⪰ 0,    [ B    S^{-1} + a^{-1} diag(w)  ] ⪰ 0.   (19)

Substituting (19) into (16), the sensor selection problem becomes

minimize_{w,Z,V}  tr(Z)
subject to  LMIs in (19),  1^T w ≤ s,  w ∈ {0, 1}^m.   (20)

Problem (20) has the form of an SDP except for the last Boolean constraints.
As shown in [13], one possibility is to relax each Boolean variable to its convex hull to obtain w ∈ [0, 1]^m. In this case, we can choose the s active sensors given by the s largest entries of the solution of the relaxed problem, or employ a randomized rounding algorithm [14, Algorithm 3] to generate a Boolean selection vector.
Rather than directly relaxing the Boolean selection variables to continuous variables, we can use semidefinite relaxation (SDR) [27], which refers to problems in which the relaxation of a rank constraint leads to an SDP, to better overcome the difficulties posed by the nonconvex constraints of (20). The Boolean constraint (13) on the entries of w can be enforced by

diag(w w^T) = w,   (21)

where, with an abuse of notation, diag(·) returns in vector form the diagonal entries of its matrix argument. By introducing an auxiliary variable W together with the rank-one constraint

W = w w^T,   (22)

the energy and Boolean constraints in (20) can be expressed as

tr(W) ≤ s,  diag(W) = w.   (23)

After relaxing the (nonconvex) rank-one constraint (22) to W ⪰ w w^T, we reach the SDP

minimize_{w,W,Z,V}  tr(Z)
subject to  LMIs in (19),  tr(W) ≤ s,  diag(W) = w,
            [ W     w ]
            [ w^T   1 ] ⪰ 0,   (24)

where the last inequality is derived through the application of a Schur complement to W ⪰ w w^T.

We can use an interior-point algorithm to solve the SDP (24). In practice, if the dimension of the unknown parameter vector is much less than the number of sensors, the computational complexity of solving the SDP is roughly O(m^{4.5}) [28]. Once the SDP (24) is solved, we employ a randomization method to generate a near-optimal sensor selection scheme; the effectiveness of the randomization method has been confirmed in our extensive numerical experiments. We refer the readers to [27] for more details on the motivation and benefits of randomization used in SDR. The aforementioned procedure is summarized in Algorithm 1, which includes the randomization procedure described in Algorithm 2.

Algorithm 1
SDR with randomization for sensor selection
Require: prior information Σ, decomposition R = aI + S as in (8), observation matrix H, and energy budget s
1: solve the SDP (24) and obtain the solution pair (w, W)
2: call Algorithm 2 to obtain a Boolean solution
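The randomization step that follows the SDP solve can be sketched as below. Since no SDP solver is invoked here, the relaxed pair (w_rel, W_rel) is fabricated (any pair with W_rel ⪰ w_rel w_rel^T works for illustration), and the objective is a hypothetical stand-in for the cost in (15); only the rounding logic mirrors Algorithm 2.

```python
import numpy as np

rng = np.random.default_rng(1)
m, s, N = 6, 3, 50

# Fabricated relaxed solution pair with W_rel - w_rel w_rel^T pos. semidef.
w_rel = np.clip(rng.random(m), 0.05, 0.95)
C = rng.standard_normal((m, m))
W_rel = np.outer(w_rel, w_rel) + 0.01 * (C @ C.T)

def objective(w):
    # Hypothetical score standing in for the cost in (15), for demo only
    return -w @ np.arange(1, m + 1)

candidates = []
for _ in range(N):
    # Draw xi ~ N(w_rel, W_rel - w_rel w_rel^T), then keep the s largest
    xi = rng.multivariate_normal(w_rel, W_rel - np.outer(w_rel, w_rel))
    thresh = np.sort(xi)[-s]            # s-th largest entry of xi
    candidates.append((xi >= thresh).astype(int))

# Keep the rounded vector with the smallest objective value
w_best = min(candidates, key=objective)
assert int(w_best.sum()) == s           # exactly s sensors are activated
```

Each rounded candidate is feasible by construction, so the final pick satisfies the energy budget with equality.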
We begin by showing in Proposition 1 that even in the presence of correlated measurement noise, the Fisher information increases if an inactive sensor is made active.
Proposition 1: Let w and w̃ represent two sensor selection vectors, where w̃_i = w_i for i ∈ {1, 2, ..., m} \ {j}, w_j = 0, and w̃_j = 1. Then the resulting Fisher information matrices satisfy J_w̃ ⪰ J_w. More precisely,

J_w̃ − J_w = c_j α_j α_j^T,   (25)

and

tr(J_w^{-1}) − tr(J_w̃^{-1}) = [c_j α_j^T J_w^{-2} α_j] / [1 + c_j α_j^T J_w^{-1} α_j] ≥ 0,   (26)

where c_j is a positive scalar given by

c_j = R_jj^{-1} if w = 0, and c_j = (R_jj − r_j^T R_w^{-1} r_j)^{-1} otherwise,   (27)

and

α_j = h_j if w = 0, and α_j = H^T Φ_w^T R_w^{-1} r_j − h_j otherwise.   (28)

In (27)-(28), R_jj is the j-th diagonal entry of R, r_j represents the covariance vector between the measurement noise of the j-th sensor and that of the active sensors in w, h_j^T is the j-th row of H, and Φ_w and R_w are given by (3) and (6), respectively.

Proof: See Appendix A. □

Algorithm 2 Randomization method [27]
Require: solution pair (w, W) from the SDP (24)
1: for l = 1, 2, ..., N do
2:   draw a random vector ξ^(l) ~ N(w, W − w w^T)
3:   map ξ^(l) to a sub-optimal sensor selection scheme w^(l):
       w^(l)_j = 1 if ξ^(l)_j ≥ [ξ^(l)]_s, and 0 otherwise, for j = 1, 2, ..., m,
     where w^(l)_j is the j-th element of w^(l), and [ξ^(l)]_s denotes the s-th largest entry of ξ^(l)
4: end for
5: choose the vector in {w^(l)}_{l=1}^N which yields the smallest objective value of (15)
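The rank-one update of Proposition 1 can be verified numerically, and it immediately yields a greedy selection loop in the spirit of Algorithm 3 below. The data here are random and the helper names are ours; the formulas implemented are (25)-(28) and the trace gain (26).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 7, 3
H = rng.standard_normal((m, n))
Sigma = np.eye(n)
M = rng.standard_normal((m, m))
R = M @ M.T + m * np.eye(m)            # correlated, positive definite

def fisher(active):
    """J_w of (7) for an active-sensor index set."""
    J = np.linalg.inv(Sigma)
    if active:
        idx = sorted(active)
        Hw, Rw = H[idx], R[np.ix_(idx, idx)]
        J = J + Hw.T @ np.linalg.solve(Rw, Hw)
    return J

def rank_one_gain(active, j):
    """c_j and alpha_j of (27)-(28) for activating sensor j."""
    if not active:
        return 1.0 / R[j, j], H[j]
    idx = sorted(active)
    r_j = R[idx, j]                    # noise covariance with active sensors
    Rw_inv_r = np.linalg.solve(R[np.ix_(idx, idx)], r_j)
    c = 1.0 / (R[j, j] - r_j @ Rw_inv_r)
    alpha = H[idx].T @ Rw_inv_r - H[j]
    return c, alpha

# Rank-one identity (25): J_wtilde - J_w = c_j alpha_j alpha_j^T
active, j = {0, 3}, 5
c, alpha = rank_one_gain(active, j)
lhs = fisher(active | {j}) - fisher(active)
assert np.allclose(lhs, c * np.outer(alpha, alpha))

# Trace identity (26), via the Sherman-Morrison formula
Jinv = np.linalg.inv(fisher(active))
gain = c * (alpha @ Jinv @ Jinv @ alpha) / (1 + c * alpha @ Jinv @ alpha)
direct = np.trace(Jinv) - np.trace(np.linalg.inv(fisher(active | {j})))
assert np.isclose(gain, direct)

# Greedy loop: repeatedly add the sensor with the largest trace gain
s, sel = 3, set()
for _ in range(s):
    def trace_gain(j, sel=sel):
        cj, aj = rank_one_gain(sel, j)
        Ji = np.linalg.inv(fisher(sel))
        return cj * (aj @ Ji @ Ji @ aj) / (1 + cj * aj @ Ji @ aj)
    sel.add(max(set(range(m)) - sel, key=trace_gain))
```

Because each candidate's gain only needs c_j, α_j, and the current J_w^{-1}, no full Fisher matrix has to be rebuilt per candidate.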
It is clear from (25) that when an inactive sensor is made active, the increase in Fisher information leads to an information gain in terms of the rank-one matrix c_j α_j α_j^T. Such a phenomenon was also discovered in the calculation of sensor utility for adaptive signal estimation [29] and leader selection in stochastically forced consensus networks [12]. Since activating a new sensor does not degrade the estimation performance, the inequality (energy) constraint in (P0) can be reformulated as an equality constraint.

In a greedy algorithm, we iteratively select a new sensor which gives the largest performance improvement until the energy constraint is satisfied with equality. The greedy algorithm is attractive due to its simplicity, and has been employed in a variety of applications [12], [29], [30]. In particular, a greedy algorithm was proposed in [30] for sensor selection under the assumption of uncorrelated measurement noise. We generalize the framework of [30] by taking noise correlation into account. In each iteration of the greedy algorithm, the newly activated sensor is the one that maximizes the performance improvement characterized by tr(J_w^{-1}) − tr(J_w̃^{-1}) in (26). We summarize the greedy algorithm in Algorithm 3.
Algorithm 3 Greedy algorithm for sensor selection
Require: w = 0, I = {1, 2, ..., m}, and J_w = Σ^{-1}
1: for l = 1, 2, ..., s do
2:   given w, enumerate all the inactive sensors in I to determine j ∈ I such that tr(J_w^{-1}) − tr(J_w̃^{-1}) in (26) is maximized
3:   update w by setting w_j = 1, and update J_w by adding c_j α_j α_j^T as in (25)
4:   remove j from I
5: end for

In Step 2 of Algorithm 3, we search over O(m) sensors to achieve the largest performance improvement. In (26), the computation of J_w^{-1} incurs a complexity of O(n^{2.373}) [31]. Since Algorithm 3 terminates after s iterations, its overall complexity is given by O(smn^2 + sn^{2.373}), where at each iteration the calculation of J_w^{-1} is independent of the search for the new active sensor. If the dimension of x is much less than the number of sensors, the complexity of Algorithm 3 reduces to O(sm). Our extensive numerical experiments show that the greedy algorithm is able to yield good locally optimal sensor selection schemes.

IV. SPECIAL CASE: SENSOR SELECTION WITH WEAK NOISE CORRELATION
In this section, we show that the existing sensor selection model in [18]-[20] is invalid for an arbitrary noise covariance matrix. We establish that, in contrast to the approach proposed in this paper, the existing model in [18]-[20] is only valid when measurement noises are weakly correlated. In this scenario, the proposed sensor selection problem (P0) simplifies to (P1). Moreover, if the trace of the Fisher information matrix (also known as the information gain defined in [20]) is adopted as the performance measure for sensor selection, we show that the resulting optimization problem can be cast as a special problem of maximizing a convex quadratic function over a bounded polyhedron.
A. Drawbacks of existing formulation
In [18]-[20], several variations of sensor selection problems with correlated noise have been studied, based on whether the quantity to be estimated is a random parameter or a random process, and whether the cost function is energy or estimation error. The common feature in [18]-[20] is that the information matrix was approximated by (14); we repeat equation (14) here for convenience

Ĵ_w = Σ^{-1} + H^T (w w^T ∘ R^{-1}) H.   (29)

Compared to our formulation (7), the noise covariance matrix appearing in (29) is independent of the sensor selection variables. In fact, Ĵ_w can be thought of as the Fisher information under the measurement model

y = Φ_w^T Φ_w H x + v,   (30)

where Φ_w was defined in (3). Different from (3), the noise from the unselected sensors is spread across the selected sensors. As a result, the measurement model (30) yields y_i = v_i if the i-th sensor is inactive. This contradicts the fact that an inactive sensor should keep silent and thus have no effect on the estimation task.

The Fisher information in (29) can also be interpreted as [18, Sec. 3]

Ĵ_w = Σ^{-1} + Σ_{i,j ∈ S} R̄_ij h_i h_j^T = Σ^{-1} + H^T Φ_w^T (Φ_w R^{-1} Φ_w^T) Φ_w H,   (31)

where S is the set of selected sensors, and R̄_ij denotes the (i, j)-th entry of R^{-1}. In (31), R^{-1} is computed first and then truncated according to the sensor selection scheme. This is an incorrect way of modeling the noise covariance matrix for the active sensors, since the matrix R should be truncated first and then inverted, as demonstrated in (7).

Both of the interpretations (30) and (31) indicate that the existing formulation in [18]-[20] is inaccurate for modeling the problem of sensor selection with correlated noise. A natural question that arises from the preceding discussion is whether there exists a condition that ensures the validity of the Fisher information matrix (29) as presented in [18]-[20].
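The truncate-then-invert versus invert-then-truncate distinction is easy to exhibit numerically: for a strongly correlated R, the approximation (29) (equivalently (31)) disagrees with the correct expression (7). Random illustrative data below.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 2
H = rng.standard_normal((m, n))
Sigma = np.eye(n)
u = rng.standard_normal(m)
R = np.outer(u, u) + 0.1 * np.eye(m)   # strongly correlated covariance
w = np.array([1, 0, 1, 1, 0])
idx = np.flatnonzero(w)

# Correct Fisher information (7): truncate R, then invert
J = np.linalg.inv(Sigma) + H[idx].T @ np.linalg.solve(R[np.ix_(idx, idx)], H[idx])

# Approximation (29): Hadamard form with the full R^{-1}
Rinv = np.linalg.inv(R)
J_hat = np.linalg.inv(Sigma) + H.T @ (np.outer(w, w) * Rinv) @ H

# Equivalent truncated-inverse form (31): invert R, then truncate
J_hat2 = np.linalg.inv(Sigma) + H[idx].T @ Rinv[np.ix_(idx, idx)] @ H[idx]

assert np.allclose(J_hat, J_hat2)      # (29) and (31) coincide
assert not np.allclose(J, J_hat)       # but both differ from (7) here
```

With a nearly diagonal R the two expressions nearly agree, which previews the weak-correlation result of the next subsection.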
We will show next that the formulation reported in [18]-[20] becomes valid only when sensor selection is restricted to the weak noise correlation regime.

B. Validity of existing formulation: weak correlation

We consider the scenario of weakly correlated noise, in which the noise covariance matrix R has small off-diagonal entries, namely, the noises are weakly correlated across the sensors. For ease of presentation, we express the noise covariance matrix as

R = Λ + εΥ,   (32)

where Λ is a diagonal matrix which consists of the diagonal entries of R, εΥ is a symmetric matrix whose diagonal entries are zero and whose off-diagonal entries correspond to those of R, the parameter ε is introduced to govern the strength of the noise correlation across the sensors, and Λ and Υ are independent of ε. Clearly, the covariance of weakly correlated noises can be described by (32) for some small value of ε, since Υ is ε-independent. As ε → 0, the off-diagonal entries of R are forced to go to zero.

Proposition 2 below shows that the correct expression (7) of the Fisher information is equal to the expression (29), as presented in [18]-[20], up to first order in ε as ε → 0.

Proposition 2:
If measurement noises are weakly correlated and R = Λ + εΥ, then the Fisher information matrix (7) can be expressed as

J_w = Ĵ_w + O(ε²)  as ε → 0,

where Ĵ_w is given by (29).

Proof:
See Appendix B. □
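The O(ε²) rate of Proposition 2 can be checked numerically: with R = Λ + εΥ, shrinking ε by a factor of 10 should shrink the gap between (7) and (29) by roughly a factor of 100. Random illustrative data; the Σ^{-1} term cancels in the gap and is omitted.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 2
H = rng.standard_normal((m, n))
Lam = np.diag(rng.uniform(1.0, 2.0, m))    # diagonal part of R
U = rng.standard_normal((m, m))
Ups = (U + U.T) / 2
np.fill_diagonal(Ups, 0.0)                 # off-diagonal perturbation only
w = np.array([1, 1, 0, 1, 0])
idx = np.flatnonzero(w)

def gap(eps):
    """Norm of the difference between (7) and (29) for R = Lam + eps*Ups."""
    R = Lam + eps * Ups
    J = H[idx].T @ np.linalg.solve(R[np.ix_(idx, idx)], H[idx])
    J_hat = H[idx].T @ np.linalg.inv(R)[np.ix_(idx, idx)] @ H[idx]
    return np.linalg.norm(J - J_hat)

# Gap ~ C * eps^2, so the ratio below should be close to 100
ratio = gap(1e-3) / gap(1e-4)
assert 50 < ratio < 200
```

The loose bracket on the ratio allows for the higher-order terms that remain at finite ε.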
It is clear from Proposition 2 that (P1) is valid only when the noise correlation is weak. Proceeding with the same logic as in the previous section for the introduction of the constraints (22)-(23), we relax (P1) to the SDP

minimize_{w,W,Z}  tr(Z)
subject to  [ Σ^{-1} + H^T (W ∘ R^{-1}) H    I ]
            [ I                              Z ] ⪰ 0,
            tr(W) ≤ s,  diag(W) = w,
            [ W     w ]
            [ w^T   1 ] ⪰ 0,   (33)

where Z ∈ S^n is an auxiliary optimization variable. Given the solution pair (w, W) of problem (33), we can use the randomization method in Algorithm 2 to construct a near-optimal sensor selection scheme. The computational complexity of solving problem (33) is close to that of solving the SDP (24). However, as will become evident, the sensor selection problem with weakly correlated noise can be further simplified if the trace of the Fisher information matrix is used as the performance measure. In this scenario, the resulting problem structure enables the use of more computationally inexpensive algorithms, e.g., bilinear programming, to solve the sensor selection problem.

C. Sensor selection by maximizing the trace of the Fisher information
Instead of minimizing the estimation error, the trace of the Fisher information (so-called T-optimality [32]) has also been used as a performance metric in sensor selection problems [20], [33], [34]. According to [35, Lemma 1], the trace of the Fisher information yields a lower bound on the trace of the error covariance matrix J_w^{-1} in (7). That is,

tr(J_w^{-1}) ≥ n² / tr(J_w).   (34)

Motivated by (34) and the generalized information gain used in [20], we propose to minimize this lower bound on the objective function of (P1), which leads to the problem

maximize_w   tr( Σ^{-1} + H^T (w w^T ∘ R^{-1}) H )
subject to   1^T w ≤ s,  w ∈ {0,1}^m.   (P2)

It is worth mentioning that the sensor selection scheme obtained from (P2) may not be optimal in the MMSE sense. However, the trace operator is linear and brings computational benefits in optimization. Reference [20] has shown that (P2) is not convex even if the Boolean selection variables are relaxed. However, no theoretical justification or analysis of the problem structure was provided in [20]. In what follows, we demonstrate that the Boolean constraint in (P2) can be replaced by its convex hull w ∈ [0,1]^m without loss of performance, to obtain an equivalent optimization problem.

Proposition 3: (P2) is equivalent to

maximize_w   w^T Ω w
subject to   1^T w ≤ s,  w ∈ [0,1]^m,   (35)

where Ω is a positive semidefinite matrix given by A (R^{-1} ⊗ I_n) A^T, ⊗ denotes the Kronecker product, A ∈ R^{m×mn} is a block-diagonal matrix whose diagonal blocks are given by {h_i^T}_{i=1}^m, and h_i^T denotes the ith row of the measurement matrix H.

Proof:
See Appendix C. ∎
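The identity behind Proposition 3 is easy to sanity-check numerically. The sketch below (randomly generated H and a generic positive definite R, both hypothetical) builds Ω = A (R^{-1} ⊗ I_n) A^T and verifies that tr(H^T (ww^T ∘ R^{-1}) H) = w^T Ω w and that Ω is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions and data (not the paper's experimental setup).
m, n = 4, 2
H = rng.standard_normal((m, n))
B = rng.standard_normal((m, m))
R = B @ B.T + m * np.eye(m)        # a generic positive definite noise covariance
Rinv = np.linalg.inv(R)

# Block-diagonal A in R^{m x mn}, with the ith diagonal block equal to h_i^T.
A = np.zeros((m, m * n))
for i in range(m):
    A[i, i * n:(i + 1) * n] = H[i]

Omega = A @ np.kron(Rinv, np.eye(n)) @ A.T   # Omega = A (R^{-1} kron I_n) A^T

# Identity behind (50): tr(H^T (ww^T o R^{-1}) H) = w^T Omega w.
w = np.array([1.0, 0.0, 1.0, 1.0])
lhs = np.trace(H.T @ (np.outer(w, w) * Rinv) @ H)
rhs = w @ Omega @ w
identity_gap = abs(lhs - rhs)
min_eig = np.linalg.eigvalsh(Omega).min()    # Omega should be PSD
```

Entry (i, j) of Ω is R̄_ij h_i^T h_j, exactly the coefficient of w_i w_j in the expanded trace, which is why the two sides match for any w.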
It is clear from Proposition 3 that (P2) reduces to the problem of maximizing a convex quadratic function over a bounded polyhedron. It is known [36] that finding a globally optimal solution of (35) is NP-hard. Therefore, we resort to local optimization methods, such as bilinear programming and SDR, to solve problem (35). To be specific, bilinear programming is a special case of alternating convex optimization, where at each iteration we solve two linear programs. Since bilinear programming is based on linear programming, it scales gracefully with the problem size, but it may only find local optima. If we rewrite the constraints of problem (35) as quadratic forms in w, (P2) can be further transformed into a nonconvex homogeneous quadratically constrained quadratic program (QCQP), i.e., a QCQP without linear terms in the optimization variables. In this scenario, SDR can be applied to solve the problem. Compared to the application of SDR in (33), the homogeneous QCQP leads to an SDP of smaller size. We refer the reader to [22, Sec. V] and [20, Sec. V] for more details on the application of bilinear programming and SDR.

V. NON-MYOPIC SENSOR SCHEDULING
In this section, we extend the sensor selection framework with correlated noise to the problem of non-myopic sensor scheduling, which determines sensor activations for multiple future time steps. Since the Fisher information matrices at consecutive time steps are coupled with each other, expressing them in closed form with respect to the sensor selection variables becomes intractable. Therefore, we employ a greedy algorithm to seek locally optimal solutions of the non-myopic sensor scheduling problem.

Consider a discrete-time dynamical system

x_{t+1} = F_t x_t + u_t   (36)
y_t = H_t x_t + v_t,   (37)

where x_t ∈ R^n is the target state at time t, y_t ∈ R^m is the measurement vector whose ith entry corresponds to a scalar observation from the ith sensor at time t, F_t is the state transition matrix from time t to time t+1, and H_t denotes the observation matrix at time t. The inputs u_t and v_t are white, Gaussian, zero-mean random vectors with covariance matrices Q and R, respectively. We note that the covariance matrix R may not be diagonal, since the noises experienced by different sensors could be spatially correlated. We also remark that although the dynamical system (36)-(37) is assumed to be linear, it will become evident later that the proposed sensor scheduling framework is also applicable to non-linear dynamical systems.

The PDF of the initial state x_0 at time step t_0 is assumed to be Gaussian with mean x̂_0 and covariance matrix P̂_0, where x̂_0 and P̂_0 are estimates of the initial state and error covariance from the previous measurements, obtained using filtering algorithms such as a particle filter or a Kalman filter [37], [38]. At time step t_0, we aim to find the optimal sensor schedule over the next τ time steps t_0+1, t_0+2, ..., t_0+τ. Hereafter, for notational simplicity, we assume t_0 = 0. The sensor schedule can be represented by a vector of binary variables w = [w_1^T, w_2^T, ...
, w_τ^T]^T ∈ {0,1}^{τm},   (38)

where w_t = [w_{t,1}, w_{t,2}, ..., w_{t,m}]^T characterizes the sensor schedule at time t, 1 ≤ t ≤ τ. In what follows, we assume that τ > 1. If τ = 1, the non-myopic sensor scheduling problem reduces to the sensor selection problem for one snapshot, the so-called myopic scheduling problem. This case has been studied in the previous sections.

In the context of state tracking [16], [39], the Fisher information matrix has the following recursive form

J_t = (Q + F_{t-1} J_{t-1}^{-1} F_{t-1}^T)^{-1} + G_t   (39)
G_t = H_t^T Φ_{w_t}^T (Φ_{w_t} R Φ_{w_t}^T)^{-1} Φ_{w_t} H_t,   (40)

for t = 1, 2, ..., τ, where J_t denotes the Fisher information at time t, G_t denotes the part of the Fisher information matrix that incorporates the updated measurement, and Φ_{w_t} is a submatrix of diag(w_t) in which all the rows corresponding to the unselected sensors are removed. It is clear from (10) that the term involving Φ_{w_t} in (40) can be expressed in explicit form with respect to w_t.

Remark 1:
In the case of non-linear measurement models, the term G_t in the Fisher information matrix becomes

G_t = E_{x_t}[ (∇_{x_t^T} h)^T Φ_{w_t}^T (Φ_{w_t} R Φ_{w_t}^T)^{-1} Φ_{w_t} (∇_{x_t^T} h) ],

where h(·) is a nonlinear measurement function, and ∇_{x_t^T} h is the Jacobian matrix of h with respect to x_t. In this equation, the expectation with respect to x_t is commonly calculated with the help of the prediction state x̂_t := F_{t-1} F_{t-2} ··· F_0 x̂_0 [38], [40]. To be concrete, we approximate the PDF of x_t by p(x_t) = δ(x_t − x̂_t), where δ(·) is the δ-function. The matrix G_t is then given by

G_t = Ĥ_t^T Φ_{w_t}^T (Φ_{w_t} R Φ_{w_t}^T)^{-1} Φ_{w_t} Ĥ_t,   (41)

where Ĥ_t := ∇_{x_t^T} h(x̂_t).

We note that the Fisher information matrices at consecutive time steps are coupled with each other due to the recursive structure in (39). Therefore, J_t is a function of all the selection variables {w_k}_{k=1}^t. This recursive structure makes a closed form of the Fisher information intractable, in sharp contrast with the problem of myopic sensor selection, where expressing the Fisher information matrix in closed form is possible. We now pose the non-myopic sensor scheduling problem

minimize_w   (1/τ) Σ_{t=1}^τ tr(J_t^{-1})
subject to   1^T w ≤ s,   (42a)
             Σ_{t=1}^τ w_{t,i} ≤ s_i,  i = 1, 2, ..., m,   (42b)
             w ∈ {0,1}^{mτ},

where J_t is determined by (39)-(40), the cumulative energy constraint (42a) restricts the total number of activations of all sensors over the entire time horizon, and the individual energy constraint (42b) implies that the ith sensor can report at most s_i measurements over τ time steps.

To solve problem (42) in a numerically efficient manner, we employ a greedy algorithm that iteratively activates one sensor at a time until the energy constraints are satisfied with equality. The proposed greedy algorithm can be viewed as a generalization of Algorithm 3 that incorporates the length of the time horizon and the individual energy constraints. We elaborate on the greedy algorithm.
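Before detailing the greedy steps, the information recursion (39)-(40) and the resulting scheduling objective of (42) can be sketched as follows. This is a toy system with assumed dimensions and covariances (not the paper's simulation setup); it evaluates the average error trace for a given 0/1 schedule:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy system (dimensions and covariances are assumptions, not the paper's setup).
m, n, tau = 3, 2, 4
F = np.eye(n)                                   # state transition F_t (held static)
Q = 0.1 * np.eye(n)                             # process noise covariance
Hmat = rng.standard_normal((m, n))              # observation matrix H_t (held static)
R = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.2, 0.2],
              [0.1, 0.2, 0.9]])                 # correlated measurement noise
J0 = np.eye(n)                                  # prior information

def schedule_cost(W):
    """Average tr(J_t^{-1}) over the horizon, with J_t from (39)-(40);
    W is a tau x m matrix of 0/1 sensor activations."""
    J, cost = J0, 0.0
    for t in range(tau):
        idx = W[t] > 0
        J_pred = np.linalg.inv(Q + F @ np.linalg.inv(J) @ F.T)  # prediction term in (39)
        if idx.any():
            Hw, Rw = Hmat[idx], R[np.ix_(idx, idx)]
            G = Hw.T @ np.linalg.inv(Rw) @ Hw                   # measurement term (40)
        else:
            G = np.zeros((n, n))
        J = J_pred + G
        cost += np.trace(np.linalg.inv(J))
    return cost / tau

all_on = schedule_cost(np.ones((tau, m)))       # every sensor active at every step
all_off = schedule_cost(np.zeros((tau, m)))     # no sensor ever active
```

Since G_t ⪰ 0, activating more sensors can only increase the information at each step, so the all-on schedule yields a lower average error trace than the all-off schedule; a greedy scheduler repeatedly calls such a cost evaluation for each candidate activation.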
In the initial step, we set w = 0 and split the set of indices of w into m subsets {I_i}_{i=1}^m, where the entries of the set I_i keep track of all the time instants at which the ith sensor is inactive. The set I_i is initially given by {i, i+m, ..., i+(τ−1)m} for i = 1, 2, ..., m. There exists a one-to-one correspondence between an index j ∈ I_i and a time instant t ∈ {1, 2, ..., τ} at which the ith sensor can be scheduled, where j = i + (t−1)m. At every iteration of the greedy optimization algorithm, we update I_i for i = 1, 2, ..., m such that it only contains the indices of zero entries of w. The quantity τ − |I_i| gives the number of times that the ith sensor has been used, where |·| denotes the cardinality of a set. The condition τ − |I_i| ≥ s_i indicates a violation of the individual energy constraint. Note that the union I_1 ∪ I_2 ∪ ... ∪ I_m gives all the remaining time instants at which the sensors can be activated. We enumerate all the indices in this union to determine the index j* such that the objective function of (42) is minimized when w_{j*} = 1. We summarize the greedy algorithm for non-myopic sensor scheduling in Algorithm 4.

Algorithm 4
Greedy algorithm for sensor scheduling
Require: w = 0 and I_i = {i, i+m, ..., i+(τ−1)m} for i = 1, 2, ..., m.
1: for l = 1, 2, ..., min{s, Σ_{i=1}^m s_i} do
2:   if τ − |I_i| ≥ s_i, then replace I_i with an empty set, for i = 1, 2, ..., m,
3:   enumerate the indices of w in I_1 ∪ I_2 ∪ ... ∪ I_m to select j* such that the objective function of (42) is minimized when w_{j*} = 1,
4:   remove j* from I_{i*}, where i* is given by the remainder of j*/m when this remainder is nonzero, and i* = m otherwise.
5: end for

The computational complexity of Algorithm 4 is dominated by Step 3. Specifically, the objective function of (42) is evaluated O(τm) times per iteration, and each evaluation of the Fisher information requires a complexity of O(τ m^{2.373}), where O(τ) accounts for the number of recursions and O(m^{2.373}) is the complexity of the matrix inversion in (41) [31]. We emphasize that, unlike in Proposition 1, a closed form for the performance improvement in the greedy step is intractable here, since the Fisher information matrices are coupled with each other over the time horizon. Therefore, the computation cost of Algorithm 4 is O(τ² m^{3.373}) per iteration.

For additional perspective, we compare the computational complexity of Algorithm 4 with the method in [21], where reweighted ℓ_1-based quadratic programming (QP) was used to obtain locally optimal sensor schedules under linear (or linearized) dynamical systems with correlated noise. It was shown in [21] that the computational complexity of the QP is ideally O(m³ τ³) for every reweighted ℓ_1 iteration. We note that the computational complexity of the greedy algorithm grows slightly faster in the network size, by a factor of m^{0.373}, while it grows significantly more slowly in the length of the time horizon, by a factor of τ.

VI. NUMERICAL RESULTS
In this section, we demonstrate the effectiveness of the proposed approach for sensor selection/scheduling with correlated measurement noise. In our numerical examples, we assume that the sensors are randomly deployed in a square region, where each of them provides a measurement of an unknown parameter or state. For parameter estimation, we use the linear MMSE estimator [24, Sec. 12] to estimate the unknown parameter. For state tracking, we use the extended Kalman filter [24, Sec. 13] to track the target state.
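For reference, the linear MMSE estimator used here has a standard closed form [24]; the following sketch (randomly generated H and R with hypothetical sizes) confirms that its error covariance coincides with the inverse of the Fisher information in (7) when all sensors are active:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes; H and R are randomly generated for illustration.
m, n = 5, 2
H = rng.standard_normal((m, n))
Sigma = np.eye(n)                       # prior covariance of x
B = rng.standard_normal((m, m))
R = B @ B.T + m * np.eye(m)             # correlated measurement noise covariance

# LMMSE (Bayesian Gauss-Markov) error covariance [24, Sec. 12].
K = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + R)
P_lmmse = (np.eye(n) - K @ H) @ Sigma

# Inverse of the Fisher information (7) with all sensors selected.
P_fisher = np.linalg.inv(np.linalg.inv(Sigma) + H.T @ np.linalg.inv(R) @ H)

agreement = np.linalg.norm(P_lmmse - P_fisher)
```

The two covariances agree by the matrix inversion lemma, which is why minimizing tr(J^{-1}) directly targets the empirical MSE reported below.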
Sensor selection for parameter estimation:
We consider a network with m ∈ {20, 50} sensors to estimate the vector of parameters x ∈ R^n with n = 2, where the sensors are randomly deployed over a square lattice. The prior PDF of x is given by x ∼ N(µ, Σ) with Σ = I. For simplicity, the row vectors of the measurement matrix H are chosen randomly, and independently, from the distribution N(0, I/√n) [13]. The covariance matrix of the measurement noise is set by an exponential model [41],

R_ij = cov(v_i, v_j) = σ_v² e^{−ρ ‖β_i − β_j‖},   (43)

for i, j = 1, 2, ..., m, where σ_v = 1, β_i ∈ R² is the location of the ith sensor in the 2D plane, ‖·‖ denotes the Euclidean norm, and ρ is the correlation parameter that governs the strength of spatial correlation; namely, a larger (or smaller) ρ corresponds to a weaker (or stronger) correlation.

We choose N = 100 while performing the randomization method. Also, we employ an exhaustive search that enumerates all possible sensor selection schemes to obtain the globally optimal solution of (P0). The estimation performance is measured through the empirical MSE, which is averaged over independent numerical trials.

In Fig. 1, we present the MSE as a function of the energy budget obtained by solving (P0) for a fixed correlation parameter ρ. In Fig. 1-(a), to keep the exhaustive search tractable, we consider a small network with m = 20 sensors. We compare the performance of the proposed greedy algorithm and SDR with randomization to that of SDR without randomization and exhaustive search. In particular, the right plots of Fig. 1-(a) show the performance gaps of the obtained locally optimal solutions relative to the globally optimal solutions resulting from the exhaustive search. We observe that the SDR method with randomization outperforms the greedy algorithm and yields optimal solutions. The randomization method also significantly improves the performance of SDR in sensor selection.
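The covariance model (43) is easy to reproduce; a small sketch with assumed, randomly drawn sensor positions illustrates how ρ controls the correlation strength:

```python
import numpy as np

rng = np.random.default_rng(4)

m = 10
beta = 10 * rng.random((m, 2))          # assumed sensor positions in a square region

def noise_cov(rho, sigma_v=1.0):
    """Exponential model (43): R_ij = sigma_v^2 * exp(-rho * ||beta_i - beta_j||)."""
    D = np.linalg.norm(beta[:, None, :] - beta[None, :, :], axis=2)
    return sigma_v**2 * np.exp(-rho * D)

R_strong = noise_cov(rho=0.01)          # small rho: strong spatial correlation
R_weak = noise_cov(rho=10.0)            # large rho: weak spatial correlation

offdiag_strong = np.abs(R_strong - np.diag(np.diag(R_strong))).max()
offdiag_weak = np.abs(R_weak - np.diag(np.diag(R_weak))).max()
```

With σ_v = 1 the diagonal entries are all one, so only the off-diagonal mass changes with ρ, which is the quantity the correlation regimes in Figs. 1-3 vary.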
This is not surprising; our numerical observations agree with the literature [27], [42], which demonstrates the power and utility of randomization in SDR.

Fig. 1: MSE versus energy budget with a fixed correlation parameter ρ: (a) m = 20; (b) m = 50.

In Fig. 1-(b), we present the MSE as a function of the energy budget for a relatively large network (m = 50). Similar to the results of Fig. 1-(a), the SDR method with randomization yields the lowest estimation error. We also observe that the MSE ceases to decrease significantly once the energy budget is sufficiently large. This indicates that a subset of sensors suffices to provide satisfactory estimation performance, since the presence of correlation among the sensors introduces information redundancy and makes the observations less diverse.

In Fig. 2, we solve the problem of sensor selection with weak noise correlation, and present the MSE as a function of the energy budget s. We compare the performance of three optimization approaches: SDR with randomization for solving (P1), bilinear programming (BP) for solving (P2), and SDR with randomization for solving (P2). We recall that (P1) minimizes the trace of the error covariance matrix, whereas (P2) maximizes the trace of the Fisher information. As we can see, the approaches that maximize the trace of the Fisher information yield worse estimation performance than those that minimize the estimation error. This is because (P2) ignores the contribution of the prior information Σ in sensor selection. We also note that although BP (a linear programming based approach) has the lowest computational complexity, it leads to the worst optimization performance.
Fig. 2: MSE versus energy budget for sensor selection with weak noise correlation.

Fig. 3: MSE versus the strength of correlation for s ∈ {7, 13}.

In Fig. 3, we present the MSE as a function of the correlation parameter ρ, where m = 50 and s ∈ {7, 13}. We consider the sensor selection schemes obtained by using SDR with randomization to solve problems (P0) and (P1), respectively. For comparison, we also present the estimation performance when all the sensors are selected. As demonstrated in Fig. 3, we consider two correlation regimes: weak correlation and strong correlation. We observe that in the weak correlation regime, the solutions of both (P0) and (P1) yield the same estimation performance. In the strong correlation regime, the solutions of (P1) can lead to worse estimation performance for sensor selection. We also observe that the sensitivity to the sensor selection strategy is reduced when the correlation becomes extremely strong. More interestingly, the estimation performance improves as the correlation becomes stronger. This is because, for strongly correlated noise, noise cancellation can be achieved by subtracting one observation from another [43]. Further, if we fix the value of ρ, the estimation error decreases as the energy budget increases, and the performance gap between the solutions of (P0) and (P1) shrinks.

Sensor scheduling for state tracking:
In this example, we track a target with m = 30 sensors over multiple time steps. We assume that the target state is a 4×1 vector x_t = [x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}]^T, where (x_{t,1}, x_{t,2}) and (x_{t,3}, x_{t,4}) denote the target location and velocity at time step t. The state equation (36) follows a white noise acceleration model [38],

F_t = [ I_2 , Δ I_2 ; 0 , I_2 ],   Q = q [ (Δ³/3) I_2 , (Δ²/2) I_2 ; (Δ²/2) I_2 , Δ I_2 ],

where Δ and q denote the sampling interval and the process noise parameter, respectively. In our simulations, we set Δ = 1 and use a small process noise parameter q. The prior PDF of the initial state is assumed to be Gaussian with mean x̂_0 and covariance Σ̂_0. The measurement equation follows a power attenuation model [44],

h_i(x_t) = √( P_0 / ( (x_{t,1} − β_{i,1})² + (x_{t,2} − β_{i,2})² ) ),   (44)

for i = 1, 2, ..., m, where P_0 = 10 is the signal power of the source, and the pair (β_{i,1}, β_{i,2}) is the position of the ith sensor. The covariance matrix of the measurement noise is given by the exponential model (43).
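The EKF linearization of Remark 1 requires the Jacobian Ĥ_t of the measurement model (44). A sketch with hypothetical sensor positions, checking the analytic Jacobian against central finite differences:

```python
import numpy as np

P0 = 10.0                               # source signal power, as in (44)
beta = np.array([[2.0, 3.0],
                 [6.0, 1.0],
                 [4.0, 7.0]])           # assumed sensor positions (hypothetical)

def h(x):
    """Power attenuation model (44): signal amplitude received at each sensor
    from a source located at (x[0], x[1]); x[2], x[3] are velocities."""
    d2 = (x[0] - beta[:, 0])**2 + (x[1] - beta[:, 1])**2
    return np.sqrt(P0 / d2)

def jacobian(x):
    """Analytic Jacobian of h with respect to the state [x, y, vx, vy];
    the velocity components do not enter the measurement."""
    d2 = (x[0] - beta[:, 0])**2 + (x[1] - beta[:, 1])**2
    g = -0.5 * np.sqrt(P0) * d2**(-1.5)
    Jh = np.zeros((len(beta), 4))
    Jh[:, 0] = g * 2 * (x[0] - beta[:, 0])
    Jh[:, 1] = g * 2 * (x[1] - beta[:, 1])
    return Jh

# Check the analytic Jacobian against central finite differences.
x = np.array([1.0, 2.0, 0.5, 0.5])
eps = 1e-6
fd = np.zeros((len(beta), 4))
for k in range(4):
    dx = np.zeros(4)
    dx[k] = eps
    fd[:, k] = (h(x + dx) - h(x - dx)) / (2 * eps)

jac_err = np.abs(jacobian(x) - fd).max()
```

Evaluating this Jacobian at the prediction state x̂_t yields the Ĥ_t that enters (41); note the two velocity columns are identically zero, since (44) depends on location only.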
Fig. 4:
MSE versus individual energy budget in target tracking.
In the sensor scheduling problem (42), we assume s = Σ_{i=1}^m s_i and s_1 = s_2 = ··· = s_m. In order to implement the proposed greedy algorithm and the existing method in [21], the nonlinear measurement function (44) is linearized at the prediction state x̂_t = F_{t-1} F_{t-2} ··· F_0 x̂_0, as suggested in Remark 1. We determine sensor schedules for every τ = 6 future time steps, and then update the estimate of the target
state based on the selected measurements via an extended Kalman filter [45]. The estimation performance is measured through the empirical MSE, which is obtained by averaging the estimation error over all time steps and simulation trials.

Fig. 5: Sensor schedules when s_i = 2: (a) t = 10, (b) t = 24.

In Fig. 4, we present the MSE as a function of the individual energy budget. We compare the performance of our proposed greedy algorithm with that of the sensor scheduling method in [21]. We remark that the method in [21] relies on a reformulation of linearized dynamical systems and an ℓ_1 relaxation in optimization. In contrast, the proposed greedy algorithm is independent of the dynamical system model and of convex relaxations. We observe that the greedy algorithm outperforms the method in [21]. This result, together with the earlier results in Figs. 1 and 2, implies that the greedy algorithm can yield satisfactory estimation performance.

Sensor schedules at time steps t = 10 and t = 24 are shown in Fig. 5. We observe that some of the sensors closest to the target are selected due to their high signal power. However, from the point of view of the entire network, the active sensors tend to be spatially distributed rather than aggregated in a small neighborhood around the target. This is because observations from neighboring sensors are strongly correlated in space and may lead to information redundancy in target tracking.

VII. CONCLUSION
In this paper, we studied the problem of sensor selection/scheduling with correlated measurement noise. We proposed a general but tractable framework to design optimal sensor activations. We pointed out drawbacks of the existing frameworks for sensor selection with correlated noise, and showed that the existing formulation is valid only in the special case of weak noise correlation. Further, we extended our framework to the problem of non-myopic sensor scheduling, where a greedy algorithm was developed to design non-myopic sensor schedules. Numerical results were provided to illustrate the effectiveness of our approach and the impact of noise correlation on the performance of sensor selection.

In future work, we will study applications of sensor selection with correlated noise, such as localization in multipath environments, sensor collaboration in distributed estimation, and clock synchronization in wireless sensor networks. It would also be of interest to seek theoretical guarantees for the performance of the greedy algorithm. Furthermore, in order to reduce the computational burden at the fusion center, developing a decentralized architecture, in which the optimization procedure can be carried out in a distributed way by the sensors themselves, is another direction of future research.

APPENDIX A
PROOF OF PROPOSITION 1

Given the selection vector w̃ obtained from w by additionally activating sensor j, it is clear from (7) that the Fisher information can be written as

J_w̃ = Σ^{-1} + [H_w^T, h_j] R_w̃^{-1} [ H_w ; h_j^T ],   R_w̃ := [ R_w , r_j ; r_j^T , r_jj ],   (45)

where H_w := Φ_w H.

If w ≠ 0, the inverse of R_w̃ in (45) is given by

R_w̃^{-1} = c_j [ c_j^{-1} R_w^{-1} + R_w^{-1} r_j r_j^T R_w^{-1} , −R_w^{-1} r_j ; −r_j^T R_w^{-1} , 1 ],   (46)

where c_j := 1/(r_jj − r_j^T R_w^{-1} r_j), and c_j > 0 follows from the Schur complement of R_w̃. Substituting (46) into (45), we obtain

J_w̃ = J_w + c_j α_j α_j^T,   (47)

where J_w = Σ^{-1} + H_w^T R_w^{-1} H_w as indicated by (7), and α_j := H_w^T R_w^{-1} r_j − h_j.

If w = 0, namely J_w = Σ^{-1}, we immediately obtain from (45) that

J_w̃ = J_w + (1/r_jj) h_j h_j^T.   (48)

Equations (47) and (48) imply that J_w̃ − J_w ⪰ 0, since c_j > 0. We apply the matrix inversion lemma to (47). This yields

J_w̃^{-1} = [J_w + c_j α_j α_j^T]^{-1} = J_w^{-1} − (c_j J_w^{-1} α_j α_j^T J_w^{-1}) / (1 + c_j α_j^T J_w^{-1} α_j).
The improvement in estimation error is then given by

tr(J_w^{-1}) − tr(J_w̃^{-1}) = (c_j α_j^T J_w^{-2} α_j) / (1 + c_j α_j^T J_w^{-1} α_j).  ∎

APPENDIX B
PROOF OF PROPOSITION 2

Using the decomposition (32), we have

R_w^{-1} = (Φ_w R Φ_w^T)^{-1}
        = (Φ_w Λ Φ_w^T + ε Φ_w Υ Φ_w^T)^{-1}
    (1) = (I + ε Φ_w Λ^{-1} Φ_w^T Φ_w Υ Φ_w^T)^{-1} Φ_w Λ^{-1} Φ_w^T
    (2) = (I − ε Φ_w Λ^{-1} Φ_w^T Φ_w Υ Φ_w^T) Φ_w Λ^{-1} Φ_w^T + O(ε²)   (as ε → 0)
    (3) = Φ_w Λ^{-1} Φ_w^T − ε Φ_w Λ^{-1} D_w Υ D_w Λ^{-1} Φ_w^T + O(ε²)   (as ε → 0),   (49)

where D_w := diag(w). In (49), step (1) holds since Λ is a diagonal matrix and (Φ_w Λ Φ_w^T)^{-1} = Φ_w Λ^{-1} Φ_w^T; step (2) is obtained from the Taylor series expansion (I + εX)^{-1} = Σ_{i=0}^∞ (−εX)^i as ε → 0 (namely, the spectrum of εX is contained inside the open unit disk); and step (3) is true since Φ_w^T Φ_w = D_w, as in (4).

Substituting (49) into (7), we obtain

J_w = Σ^{-1} + H^T Φ_w^T Φ_w Λ^{-1} Φ_w^T Φ_w H − ε H^T Φ_w^T Φ_w Λ^{-1} D_w Υ D_w Λ^{-1} Φ_w^T Φ_w H + O(ε²)   (as ε → 0)
  (1) = Σ^{-1} + H^T (D_w Λ^{-1} D_w − ε D_w Λ^{-1} Υ Λ^{-1} D_w) H + O(ε²)   (as ε → 0)
      = Σ^{-1} + H^T D_w (Λ^{-1} − ε Λ^{-1} Υ Λ^{-1}) D_w H + O(ε²)   (as ε → 0)
  (2) = Σ^{-1} + H^T D_w R^{-1} D_w H + O(ε²)   (as ε → 0)
  (3) = Σ^{-1} + H^T (w w^T ∘ R^{-1}) H + O(ε²)   (as ε → 0),

where step (1) is achieved by using the fact that D_w Λ^{-1} = Λ^{-1} D_w = D_w Λ^{-1} D_w, step (2) holds due to R^{-1} = Λ^{-1} − ε Λ^{-1} Υ Λ^{-1} + O(ε²), and step (3) is true since D_w is diagonal and has only binary elements. ∎

APPENDIX C
PROOF OF PROPOSITION 3

Consider the objective function of (P2),

φ(w) := tr(Σ^{-1}) + tr( (w w^T ∘ R^{-1}) H H^T )
      = tr(Σ^{-1}) + Σ_{i=1}^m Σ_{j=1}^m w_i w_j R̄_ij h_i^T h_j
      = tr(Σ^{-1}) + w^T Ω w,   (50)

where R̄_ij is the (i,j)th entry of R^{-1}, and R̄_ij h_i^T h_j corresponds to the (i,j)th entry of Ω, which yields the succinct form

Ω = A (R^{-1} ⊗ I_n) A^T.   (51)
In (51), ⊗ denotes the Kronecker product, A ∈ R^{m×mn} is a block-diagonal matrix whose diagonal blocks are given by {h_i^T}_{i=1}^m, and Ω ⪰ 0 due to R^{-1} ⊗ I_n ⪰ 0.

According to (50), (P2) can be rewritten as

maximize_w   w^T Ω w
subject to   1^T w ≤ s,  w ∈ {0,1}^m.   (52)

Next, we prove that problem (35) is equivalent to problem (52). We recall that the former is a relaxation of the latter, and that it entails the maximization of a convex quadratic function over the bounded polyhedron P := {w | 1^T w ≤ s, w ∈ [0,1]^m}. It has been shown in [46] that optimal solutions of such a problem occur at vertices of the polyhedron P, which are zero-one vectors. This indicates that solutions of problem (35) are feasible for problem (52). Therefore, solutions of (35) are solutions of (52), and vice versa. ∎

REFERENCES

[1] L. Oliveira and J. Rodrigues, "Wireless sensor networks: a survey on environmental monitoring," Journal of Communications, vol. 6, no. 2, 2011.
[2] Y. Zou and K. Chakrabarty, "Sensor deployment and target localization in distributed sensor networks," ACM Transactions on Embedded Computing Systems, vol. 3, no. 1, pp. 61–91, Feb. 2004.
[3] T. He, P. Vicaire, T. Yan, L. Luo, L. Gu, G. Zhou, S. Stoleru, Q. Cao, J. A. Stankovic, and T. Abdelzaher, "Achieving real-time target tracking using wireless sensor networks," in Proceedings of IEEE Real Time Technology and Applications Symposium, 2006, pp. 37–48.
[4] F. Zhao, J. Shin, and J. Reich, "Information-driven dynamic sensor collaboration," IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 61–72, Mar. 2002.
[5] E. Masazade, R. Niu, and P. K. Varshney, "Dynamic bit allocation for object tracking in wireless sensor networks," IEEE Transactions on Signal Processing, vol. 60, no. 10, pp. 5048–5063, Oct. 2012.
[6] H. Zhang, J. Moura, and B. Krogh, "Dynamic field estimation using wireless sensor networks: Tradeoffs between estimation error and communication cost," IEEE Transactions on Signal Processing, vol. 57, no. 6, pp. 2383–2395, June 2009.
[7] S. Liu, A. Vempaty, M. Fardad, E. Masazade, and P. K. Varshney, "Energy-aware sensor selection in field reconstruction," IEEE Signal Processing Letters, vol. 21, no. 12, pp. 1476–1480, 2014.
[8] S. Liu, M. Fardad, P. K. Varshney, and E. Masazade, "Optimal periodic sensor scheduling in networks of dynamical systems," IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3055–3068, June 2014.
[9] G. Thatte and U. Mitra, "Sensor selection and power allocation for distributed estimation in sensor networks: Beyond the star topology," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 2649–2661, July 2008.
[10] S. Liu, S. Kar, M. Fardad, and P. K. Varshney, "Sparsity-aware sensor collaboration for linear coherent estimation," IEEE Transactions on Signal Processing, vol. 63, no. 10, pp. 2582–2596, May 2015.
[11] K. Chaloner and I. Verdinelli, "Bayesian experimental design: A review," Statistical Science, vol. 10, no. 3, pp. 273–304, 1995.
[12] F. Lin, M. Fardad, and M. R. Jovanović, "Algorithms for leader selection in stochastically forced consensus networks," IEEE Transactions on Automatic Control, vol. 59, no. 7, pp. 1789–1802, July 2014.
[13] S. Joshi and S. Boyd, "Sensor selection via convex optimization," IEEE Transactions on Signal Processing, vol. 57, no. 2, pp. 451–462, Feb. 2009.
[14] S. P. Chepuri and G. Leus, "Sparsity-promoting sensor selection for non-linear measurement models," IEEE Transactions on Signal Processing, vol. 63, no. 3, pp. 684–698, Feb. 2015.
[15] E. Masazade, R. Niu, and P. K. Varshney, "An approximate dynamic programming based non-myopic sensor selection method for target tracking," in Proc. the 46th Annual Conf. Information Sciences and Systems, March 2012, pp. 1–6.
[16] H. L. Van Trees and K. L. Bell, Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking, Wiley-IEEE Press, 2007.
[17] A. Jindal and K. Psounis, "Modeling spatially correlated data in sensor networks," ACM Transactions on Sensor Networks, vol. 2, pp. 466–499, Nov. 2006.
[18] E. Rigtorp, "Sensor selection with correlated noise," M.S. thesis, KTH Royal Institute of Technology, Aug. 2010.
[19] H. Jamali-Rad, A. Simonetto, G. Leus, and X. Ma, "Sparsity-aware sensor selection for correlated noise," in Proceedings of 17th International Conference on Information Fusion (FUSION), July 2014, pp. 1–7.
[20] X. Shen and P. K. Varshney, "Sensor selection based on generalized information gain for target tracking in large sensor networks," IEEE Transactions on Signal Processing, vol. 62, no. 2, pp. 363–375, Jan. 2014.
[21] Y. Mo, R. Ambrosino, and B. Sinopoli, "Sensor selection strategies for state estimation in energy constrained wireless sensor networks," Automatica, vol. 47, no. 7, pp. 1330–1338, 2011.
[22] S. Liu, E. Masazade, M. Fardad, and P. K. Varshney, "Sensor selection with correlated measurements for target tracking in wireless sensor networks," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2015, pp. 4030–4034.
[23] J. O. Berger, V. De Oliveira, and B. Sanso, "Objective Bayesian analysis of spatially correlated data," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1361–1374, Nov. 2001.
[24] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Prentice Hall, Englewood Cliffs, NJ, 1993.
[25] S. P. Chepuri and G. Leus, "Sparse sensing for distributed Gaussian detection," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2015, pp. 2394–2398.
[26] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004.
[27] Z.-Q. Luo, W.-K. Ma, A. M.-C. So, Y. Ye, and S. Zhang, "Semidefinite relaxation of quadratic optimization problems," IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 20–34, May 2010.
[28] A. Nemirovski, "Interior point polynomial time methods in convex programming," 2012. [Online].
[29] A. Bertrand, J. Szurley, P. Ruckebusch, I. Moerman, and M. Moonen, "Efficient calculation of sensor utility and sensor removal in wireless sensor networks for adaptive signal estimation and beamforming," IEEE Transactions on Signal Processing, vol. 60, no. 11, pp. 5857–5869, Nov. 2012.
[30] M. Shamaiah, S. Banerjee, and H. Vikalo, "Greedy sensor selection: Leveraging submodularity," in Proc. of 49th IEEE Conference on Decision and Control (CDC), Dec. 2010, pp. 2572–2577.
[31] V. V. Williams, "Multiplying matrices faster than Coppersmith-Winograd," in Proc. the 44th Symposium on Theory of Computing, 2012, pp. 887–898.
[32] F. E. Udwadia, "Methodology for optimum sensor locations for parameter identification in dynamic systems," Journal of Engineering Mechanics, vol. 120, no. 2, pp. 368–390, Feb. 1994.
[33] X. Shen, S. Liu, and P. K. Varshney, "Sensor selection for nonlinear systems in large sensor networks," IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 4, pp. 2664–2678, Oct. 2014.
[34] T. H. McLoughlin and M. Campbell, "Solutions to periodic sensor scheduling problems for formation flying missions in deep space," IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 2, pp. 1351–1368, April 2011.
[35] J. Fang and H. Li, "Power constrained distributed estimation with correlated sensor data," IEEE Transactions on Signal Processing, vol. 57, no. 8, pp. 3292–3297, Aug. 2009.
[36] P. M. Pardalos, "Global optimization algorithms for linearly constrained indefinite quadratic problems," Computers & Mathematics with Applications, vol. 21, no. 6-7, pp. 87–97, 1991.
[37] E. Masazade, R. Niu, and P. K. Varshney, "Dynamic bit allocation for object tracking in wireless sensor networks," IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5048–5063, Oct. 2012.
[38] E. Masazade, M. Fardad, and P. K. Varshney, "Sparsity-promoting extended Kalman filtering for target tracking in wireless sensor networks," IEEE Signal Processing Letters, vol. 19, no. 12, pp. 845–848, Dec. 2012.
[39] P. Tichavsky, C. H. Muravchik, and A. Nehorai, "Posterior Cramér-Rao bounds for discrete-time nonlinear filtering," IEEE Transactions on Signal Processing, vol. 46, no. 5, pp. 1386–1396, May 1998.
[40] S. P. Chepuri and G. Leus, "Sparsity-promoting adaptive sensor selection for non-linear filtering," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2014, pp. 5080–5084.
[41] S. Liu, E. Masazade, M. Fardad, and P. K. Varshney, "Sparsity-aware field estimation via ordinary Kriging," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2014, pp. 3976–3980.
[42] A. d'Aspremont and S. Boyd, "Relaxations and randomized methods for nonconvex QCQPs," Stanford, CA: Stanford Univ., Autumn 2003. [Online]. Available: http://web.stanford.edu/class/ee392o/relaxations.pdf.
[43] F. Peng and B. Chen, "Decentralized estimation with correlated additive noise: Does dependency always imply redundancy?," in Proceedings of Asilomar Conference on Signals, Systems and Computers, Nov. 2013, pp. 677–681.
[44] R. Niu and P. K. Varshney, "Target location estimation in sensor networks with quantized data," IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4519–4528, Dec. 2006.
[45] B. M. Yu, K. V. Shenoy, and M. Sahani, "Derivation of extended Kalman filtering and smoothing equations," 2004.
[46] H. Konno, "Maximization of a convex quadratic function under linear constraints," Mathematical Programming, vol. 11, no. 1, pp. 117–127, 1976.