Decentralized Estimation over Orthogonal Multiple-access Fading Channels in Wireless Sensor Networks - Optimal and Suboptimal Estimators
Xin Wang and Chenyang Yang
Abstract—Optimal and suboptimal decentralized estimators in wireless sensor networks (WSNs) over orthogonal multiple-access fading channels are studied in this paper. Considering multiple-bit quantization before digital transmission, we develop maximum likelihood estimators (MLEs) with both known and unknown channel state information (CSI). When training symbols are available, we derive an MLE that is a special case of the MLE with unknown CSI. It implicitly uses the training symbols to estimate the channel coefficients and exploits the estimated CSI in an optimal way. To reduce the computational complexity, we propose suboptimal estimators. These estimators exploit both signal-level and data-level redundant information to improve the estimation performance. The proposed MLEs reduce to traditional fusion-based or diversity-based estimators when communications or observations are perfect. By introducing a general message function, the proposed estimators can be applied when various analog or digital transmission schemes are used. Simulations show that the estimators using digital communications with multiple-bit quantization outperform the estimator using amplify-and-forward transmission in fading channels. Under total bandwidth and energy constraints, the MLE using multiple-bit quantization is superior to that using binary quantization at medium and high observation signal-to-noise ratio levels.
I. INTRODUCTION
Wireless sensor networks (WSNs) consist of a number of sensors deployed in a field to collect information, e.g., measuring physical parameters such as temperature and humidity. Since the sensors are usually powered by batteries and have very limited processing and communication abilities [1], the parameters are often estimated in a decentralized way. In typical WSNs for decentralized estimation, there exists a fusion center (FC). The sensors transmit their locally processed observations to the FC without inter-sensor communications, and the FC generates the final estimate based on the received signals [2].

Both observation noise and communication error deteriorate the performance of decentralized estimation. Traditional fusion-based estimators are able to minimize the mean square error (MSE) of the parameter estimation by assuming perfect communication links (see [3] and references therein). They reduce the observation noise by exploiting the redundant observations provided by multiple sensors. However, their performance degrades dramatically when communication errors cannot be ignored or corrected. On the other hand, various wireless communication technologies aimed at achieving transmission capacity or improving reliability do not necessarily minimize the MSE of the parameter estimation. For example, although diversity combining reduces the bit error rate (BER) of communications, it requires that the signals transmitted from the sensors be identical, which is not true in the context of WSNs due to the observation noise at the sensors. This motivates the joint optimization of the communication-oriented diversity combiner and the data-fusion-oriented estimator at the FC under realistic observation and channel models, using the MSE of the parameter estimation as the performance metric.

The bandwidth and energy constraints are two of the most important issues addressed in WSNs. Under a strict bandwidth constraint, decentralized estimation when the sensors transmit only one bit (binary quantization) per observation is studied in [4]–[9]. Among them, [4], [5] introduce the maximum likelihood estimation (MLE) and discuss the optimal quantization when the communication channels are noiseless. Also considering noiseless channels, [6] proposes a universal and isotropic quantization rule, and [8], [9] study adaptive binary quantization methods.

This work was supported by the National Nature Science Foundation of China under Grant 60672103. Parts of this work were presented at IEEE Globecom'07, Washington, DC, United States, Nov. 2007. The authors are with Group 203, School of Electronics and Information Engineering, Beihang University, Beijing, 100191, China. Email: {athody, cyyangbuaa}@vip.sina.com. Tel: +86-10-8231-7213 ext. 603/101. Fax: +86-10-8231-7213 ext. 201.
When channels are noisy, [7] studies the MLE in additive white Gaussian noise (AWGN) channels and introduces several low-complexity suboptimal estimators. It has been found that binary quantization is sufficient for decentralized estimation at low observation signal-to-noise ratio (SNR), whereas the sensors need to transmit a few extra bits when the observation SNR is high [4]. When the energy constraint and general multi-level quantizers are considered, decentralized estimation is studied under various channels. When communications are error-free, quantization at the sensors is designed in [10]–[14]. The optimal trade-off between the number of active sensors and the quantization bit-rate of each sensor is investigated under a total energy constraint in [15]. In binary symmetric channels (BSCs), power scheduling is proposed to reduce the estimation MSE when the best linear unbiased estimator (BLUE) and a quasi-BLUE, where quantization noise is taken into account, are used at the FC [16], [17]. To the best of the authors' knowledge, the optimal decentralized estimator using multiple-bit quantization in fading channels is still not available. Although the MLE proposed in AWGN channels [7] can be applied to fading channels if the channel state information (CSI) is known at the FC, it only considers binary quantization.

Besides decentralized estimation based on digital communications, estimation based on analog communications has received considerable attention due to the important conclusions drawn from studies of the multi-terminal coding problem [18], [19]. The most popular scheme is amplify-and-forward (AF) transmission, which is proved to be optimal in quadratic Gaussian sensor networks with AWGN multiple-access channels (MACs) [20]. The power scheduling and energy efficiency of AF transmission are studied under AWGN channels in [21] and [22].
It is shown that AF transmission is more energy-efficient than digital communications with certain coding and modulation schemes. In fading channels, AF transmission is no longer optimal, both in orthogonal MACs [23]–[25] and in non-orthogonal MACs [26]. The outage laws of the estimation diversity with AF transmission in fading channels are studied in [24] and [25] in different asymptotic regimes. These studies, especially the results in [23], indicate that a separate source-channel coding scheme outperforms AF transmission, which is a simple joint source-channel coding scheme, and is actually optimal in fading channels with orthogonal multiple-access protocols.

In this paper, we develop optimal and suboptimal decentralized estimators for a deterministic parameter in digital communication systems. The observations of the sensors are quantized, coded and modulated, then transmitted to the FC over orthogonal MACs with Rayleigh fading. Uniform quantization is used since it is optimal for deterministic parameters. Because binary quantization is only applicable at low observation SNR levels [4], [15], a general multi-bit quantizer is considered.

We strive to derive the MLE and feasible suboptimal estimators when different local processing and communication strategies are used. To this end, we first present a general message function to represent various quantization and transmission schemes. We then derive the MLE for an unknown parameter with known CSI at the FC. In typical WSNs, the sensors usually cannot transmit many training symbols for the receiver to estimate or track channel coefficients, due to both energy and bandwidth constraints. Therefore, we also consider the case where the CSI is unknown at the FC and no or only a few training symbols are available, which is of practical significance.
To reduce the computational complexity, we introduce suboptimal estimators following the hint provided by the structure of the MLEs.

Our contributions are summarized as follows. We develop the decentralized MLEs with known and unknown CSI at the FC over orthogonal MACs with Rayleigh fading. The performance of the MLEs serves as a practical performance lower bound for decentralized estimation in orthogonal MACs. To provide feasible estimators with affordable complexity, we propose a suboptimal algorithm, which can be viewed as a modified expectation-maximization (EM) algorithm [27]. By studying the special cases of error-free communications or noiseless observations, we show that the MLEs degenerate into the well-known centralized fusion estimator, the BLUE; into the maximal ratio combiner (MRC) based estimator when CSI is known; and into a subspace-based estimator when CSI is unknown. This indicates that our estimators can exploit both the data-level redundancy and the signal-level redundancy provided by the multiple sensors when both observation noise and communication errors are present. By introducing a general message function that can describe various quantization and transmission schemes, the proposed decentralized estimators can also be applied to WSNs where AF transmission or digital transmission with binary quantization is used. Therefore, our estimators bridge the gap between the estimators using these two extreme-case quantizations.

The rest of the paper is organized as follows. Section II describes the system model we consider. Section III presents the MLEs with known and unknown CSI, and Section IV introduces the suboptimal estimators. In Section V, we analyze several special cases of the MLEs. In Section VI, we discuss the codebook design issue, the computational complexity, and the asymptotic performance of the presented MLEs. Simulations are provided in Section VII, and conclusions are given in Section VIII.

II. SYSTEM MODEL
We consider a typical kind of WSN that consists of $N$ sensors and an FC to measure an unknown deterministic parameter $\theta$, where there are no inter-sensor communications among the sensors. The sensors transmit their quantized observations to the FC over Rayleigh fading channels. Assume that the sensors use ideal orthogonal multiple-access protocols, such as TDMA and FDMA, to transmit their signals to the FC. Then the FC can separate the received signals from different sensors without inducing interference.

Figure 1 is a diagram of the decentralized estimation system considered. The sensors process their observations of the parameter $\theta$ before transmission. For digital communications, the processing includes quantization, channel coding and modulation, etc. For analog communications, the processing may simply be amplification before transmission. A function $c(x)$, named the messaging function, is used to describe the local processing for both digital and analog communication systems. The transmitted signals of the sensors arrive at the FC through independent Rayleigh fading channels, and the FC uses the received signals to estimate $\theta$.

[Figure: each sensor $i$ observes $x_i$, applies the messaging function $c(x)$ (quantization, coding and modulation for digital communications) to produce $s_i$; the FC receives $Y = [y_1, \cdots, y_N]$ and forms the estimate $\hat{\theta}$.]

Fig. 1. The diagram of the decentralized estimation system we considered.
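The pipeline of Fig. 1 can be sketched numerically. The following minimal example (all parameter values and the BPSK natural-binary codebook are illustrative assumptions, not choices from the paper) generates noisy observations, quantizes and modulates them, and passes them through independent Rayleigh fading channels, matching the models introduced in the following subsections:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not from the paper)
N, M, W, sigma_s, sigma_c2, E_d = 10, 8, 4.0, 0.5, 0.1, 1.0
L = int(np.log2(M))                 # one BPSK symbol per quantization bit
Delta = 2 * W / (M - 1)             # quantization interval
S = Delta * np.arange(M) - W        # quantized values S_m

def c(x):
    """Messaging function: round to the nearest S_m, then map its natural
    binary code to normalized BPSK symbols so that c(x)^H c(x) = 1."""
    m = int(np.clip(np.round((x + W) / Delta), 0, M - 1))
    bits = np.array([(m >> k) & 1 for k in range(L)])
    return (1 - 2 * bits) / np.sqrt(L)

theta = 1.2                                   # unknown parameter
x = theta + sigma_s * rng.standard_normal(N)  # noisy observations
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
Y = np.stack([
    np.sqrt(E_d) * h[i] * c(x[i])
    + np.sqrt(sigma_c2 / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
    for i in range(N)
])                                            # received signals at the FC
print(Y.shape)  # (10, 3)
```

The matrix `Y` collects the $N$ length-$L$ received vectors on which all the estimators of the following sections operate.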
In the subsequent sections, we will first derive the decentralized estimators considering digital communication, then extend the results to the case using analog communication.
A. Observation Model
The observation of the unknown parameter $\theta$ provided by the $i$-th sensor is
$$x_i = \theta + n_{s,i}, \quad i = 1, \cdots, N, \qquad (1)$$
where $n_{s,i}$ is independent identically distributed (i.i.d.) Gaussian observation noise with zero mean and variance $\sigma_s^2$, and $\theta$ is bounded within a dynamic range $[-V, +V]$.

B. Quantization, Coding and Modulation
We use the messaging function $c(x): \mathbb{R} \to \mathbb{C}^L$, which maps the observations to transmission symbols, to represent all the processing at the sensors, including quantization, coding and modulation. To facilitate the analysis, the energy of the transmission symbols is normalized to 1,
$$c(x)^H c(x) = 1, \quad \forall x \in \mathbb{R}. \qquad (2)$$
We consider uniform quantization, which is optimal for deterministic parameters or for parameters with unknown statistics. For an $M$-level uniform quantizer, define the granular region of the quantizer as $[-W, +W]$; then all the possible quantized values of the observations can be written as
$$S_m = m\Delta - W, \quad m = 0, \cdots, M-1, \qquad (3)$$
where $\Delta = 2W/(M-1)$ is the quantization interval.

The observations are rounded to the nearest $S_m$; therefore $c(x)$ is a piecewise constant function described as
$$c(x) = \begin{cases} c_0, & -\infty < x \le S_0 + \frac{\Delta}{2} \\ c_m, & S_m - \frac{\Delta}{2} < x \le S_m + \frac{\Delta}{2} \\ c_{M-1}, & S_{M-1} - \frac{\Delta}{2} < x < +\infty, \end{cases} \qquad (4)$$
where $c_m = [c_{m,1}, \cdots, c_{m,L}]^T$ is the vector of $L$ symbols corresponding to the quantized observation $S_m$ to be transmitted, $m = 0, \cdots, M-1$.

Under the assumption that $W$ is much larger than the dynamic range of $\theta$, the probability that $|x_i| > W$ can be ignored. Then $c(x)$ is simplified as
$$c(x) = c_m, \quad S_m - \frac{\Delta}{2} < x \le S_m + \frac{\Delta}{2}. \qquad (5)$$
Define the transmission codebook as
$$C_t = [c_0, \cdots, c_{M-1}] \in \mathbb{C}^{L \times M}, \qquad (6)$$
which can describe any coding and modulation scheme following the $M$-level quantization.

The sensors can use various codes, such as natural binary codes, to represent the quantized observations. Since the focus of this paper is to design the decentralized estimators, we will not optimize the transmission codebook for the parameter estimation.

C. Received Signals
Since we consider orthogonal MACs, we assume that the FC can perfectly separate and synchronize to the received signals from different sensors. Assume that the channels are block fading, i.e., the channel coefficients are invariant during the period in which a sensor transmits the $L$ symbols representing one observation. After matched filtering and symbol-rate sampling, the $L$ received samples corresponding to the $L$ transmitted symbols from the $i$-th sensor can be expressed as
$$y_i = \sqrt{E_d}\, h_i c(x_i) + n_{c,i}, \quad i = 1, \cdots, N, \qquad (7)$$
where $y_i = [y_{i,1}, \cdots, y_{i,L}]^T$, $h_i$ is the channel coefficient, which follows a complex Gaussian distribution with zero mean and unit variance, $n_{c,i}$ is the vector of thermal noise at the receiver, which follows a complex Gaussian distribution with zero mean and covariance matrix $\sigma_c^2 I$, and $E_d$ is the transmission energy for each observation.

III. OPTIMAL ESTIMATORS WITH OR WITHOUT CSI

In this section, we derive the MLEs when the CSI is known or unknown at the receiver of the FC, respectively. The MLE using training symbols in the transmission codebook is also studied as a special form of the MLE with unknown CSI.
A. MLE with Known CSI
Given $\theta$, the received signals from different sensors are statistically independent. If the CSI is known at the receiver of the FC, the log-likelihood function is
$$\log p(Y|h,\theta) = \sum_{i=1}^{N} \log p(y_i|h_i,\theta) = \sum_{i=1}^{N} \log\left(\int_{-\infty}^{+\infty} p(y_i|h_i,x)\, p(x|\theta)\, \mathrm{d}x\right), \qquad (8)$$
where $h = [h_1, \cdots, h_N]^T$ is the vector of channel coefficients, and $p(x|\theta)$ is the conditional probability density function (PDF) of the observation given $\theta$. Following the observation model in (1), we have
$$p(x|\theta) = \frac{1}{\sqrt{2\pi}\,\sigma_s} \exp\left(-\frac{(x-\theta)^2}{2\sigma_s^2}\right). \qquad (9)$$
According to the received signal model in (7), the PDF of the received signals given the CSI and the observation of the sensor is
$$p(y_i|h_i,x) = \frac{1}{(\pi\sigma_c^2)^L} \exp\left(-\frac{\|y_i - \sqrt{E_d}\, h_i c(x)\|^2}{\sigma_c^2}\right), \qquad (10)$$
where $\|z\| = (z^H z)^{1/2}$ is the $l_2$ norm of the vector $z$.

Substituting (9) and (10) into (8), the log-likelihood function becomes
$$\log p(Y|h,\theta) = \sum_{i=1}^{N} \log\left(\int_{-\infty}^{+\infty} \exp\left(-\frac{(x-\theta)^2}{2\sigma_s^2} - \frac{\|y_i - \sqrt{E_d}\, h_i c(x)\|^2}{\sigma_c^2}\right) \mathrm{d}x\right) + a, \qquad (11)$$
where $a$ is a normalization constant that does not affect the estimation. From now on, we omit such constants when writing likelihood functions, for simplicity.

Now consider the form of the likelihood function for digital communications, where $c(x)$ is the piecewise constant function described in (5). Substituting (5) into (11), we have
$$\log p(Y|h,\theta) = \sum_{i=1}^{N} \log\left(\sum_{m=0}^{M-1} p(y_i|h_i,c_m)\, p(S_m|\theta)\right), \qquad (12)$$
where $p(y_i|h_i,c_m)$ is the PDF of the received signals given the CSI and the transmitted symbols of the sensor,
$$p(y_i|h_i,c_m) = \frac{1}{(\pi\sigma_c^2)^L} \exp\left(-\frac{\|y_i - \sqrt{E_d}\, h_i c_m\|^2}{\sigma_c^2}\right), \qquad (13)$$
and $p(S_m|\theta)$ is the probability mass function (PMF) of the quantized observation given $\theta$,
$$p(S_m|\theta) = Q\!\left(\frac{S_m - \frac{\Delta}{2} - \theta}{\sigma_s}\right) - Q\!\left(\frac{S_m + \frac{\Delta}{2} - \theta}{\sigma_s}\right), \qquad (14)$$
where $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp\left(-\frac{t^2}{2}\right) \mathrm{d}t$.

The MLE that maximizes (12) is then
$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N} \log\left(\sum_{m=0}^{M-1} \exp\left(-\frac{\|y_i - \sqrt{E_d}\, h_i c_m\|^2}{\sigma_c^2}\right)\left(Q\!\left(\frac{S_m - \frac{\Delta}{2} - \theta}{\sigma_s}\right) - Q\!\left(\frac{S_m + \frac{\Delta}{2} - \theta}{\sigma_s}\right)\right)\right). \qquad (15)$$
The log-likelihood function in (15) is non-concave and has multiple extrema. It is difficult to find a closed-form expression for $\hat{\theta}$ or to compute $\hat{\theta}$ using highly efficient numerical methods.

B. MLE with Unknown CSI
When the CSI is unknown at the FC, the log-likelihood function is
$$\log p(Y|\theta) = \sum_{i=1}^{N} \log\left(\int_{-\infty}^{+\infty} p(y_i|x)\, p(x|\theta)\, \mathrm{d}x\right), \qquad (16)$$
which has a form similar to the likelihood function with known CSI in (8).

According to the received signal model in (7), given $x$, $y_i$ follows a zero-mean complex Gaussian distribution, i.e.,
$$p(y_i|x) = \frac{1}{\pi^L \det R_y} \exp\left(-y_i^H R_y^{-1} y_i\right), \qquad (17)$$
where $R_y$ is the covariance matrix of $y_i$,
$$R_y = \sigma_c^2 I + E_d\, c(x) c(x)^H. \qquad (18)$$
It is readily found that one eigenvalue of $R_y$ equals $E_d + \sigma_c^2$, and all other eigenvalues equal $\sigma_c^2$. Thereby the determinant of $R_y$ is
$$\det R_y = (E_d + \sigma_c^2)\, \sigma_c^{2(L-1)}. \qquad (19)$$
Following the Matrix Inversion Lemma, we have
$$R_y^{-1} = \frac{1}{\sigma_c^2} I - \frac{E_d}{\sigma_c^2 (E_d + \sigma_c^2)}\, c(x) c(x)^H. \qquad (20)$$
Substituting (19) and (20) into (17), $p(y_i|x)$ becomes
$$p(y_i|x) = \alpha \exp\left(-\frac{\|y_i\|^2}{\sigma_c^2} + \frac{E_d\, y_i^H c(x) c(x)^H y_i}{\sigma_c^2 (E_d + \sigma_c^2)}\right) = \alpha \exp\left(-\frac{\|y_i\|^2}{\sigma_c^2} + \frac{E_d\, |y_i^H c(x)|^2}{\sigma_c^2 (E_d + \sigma_c^2)}\right), \qquad (21)$$
where $\alpha$ is a constant.

Upon substituting (21) and (9) into (16), the log-likelihood function becomes
$$\log p(Y|\theta) = \sum_{i=1}^{N} \log\left(\int_{-\infty}^{+\infty} \exp\left(-\frac{(x-\theta)^2}{2\sigma_s^2} - \frac{\|y_i\|^2}{\sigma_c^2} + \frac{E_d\, |y_i^H c(x)|^2}{\sigma_c^2 (E_d + \sigma_c^2)}\right) \mathrm{d}x\right). \qquad (22)$$
Then the MLE is obtained as
$$\hat{\theta} = \arg\max_{\theta} \log p(Y|\theta). \qquad (23)$$

C. MLE with Unknown CSI using Training Symbols
In typical communication systems, the transmitted symbols may include training symbols to facilitate channel estimation. In this subsection, we analyze the MLE for such transmission schemes.

Define $c_p$ as a vector consisting of $L_p$ training symbols. Each transmission of an observation begins with transmitting $c_p$, followed by the data symbols $c_d(x)$. Thus the messaging function becomes
$$c(x) = \begin{pmatrix} c_p \\ c_d(x) \end{pmatrix}. \qquad (24)$$
Upon substituting this expression into (22) and ignoring all terms that do not affect the estimation, we obtain the likelihood function
$$\log p(Y|\theta) = \sum_{i=1}^{N} \log\left(\int_{-\infty}^{+\infty} \exp\left(-\frac{(x-\theta)^2}{2\sigma_s^2} + \beta |y_{i,d}^H c_d(x)|^2 + 2\beta\, \Re\{c_p^H y_{i,p}\, y_{i,d}^H c_d(x)\}\right) \mathrm{d}x\right), \qquad (25)$$
where $y_{i,p}$ and $y_{i,d}$ are the received signals corresponding to the training symbols and the data symbols, respectively, and $\beta = E_d / \left(\sigma_c^2 (E_d + \sigma_c^2)\right)$ is a constant.

Now we show that $c_p^H y_{i,p}$ in (25) can be regarded as the minimum mean square error (MMSE) estimate of the channel coefficient $h_i$ up to a constant factor. Since both $h_i$ and the receiver thermal noise are complex Gaussian distributed, the MMSE estimate of $h_i$ is equivalent to its linear MMSE estimate,
$$\hat{h}_i = (R_{y_p}^{-1} r_{yh})^H y_{i,p}, \qquad (26)$$
where $r_{yh} = \mathrm{E}[y_{i,p} h_i^*]$ and $R_{y_p}$ is the covariance matrix of $y_{i,p}$. According to the received signal model, we have
$$r_{yh} = \mathrm{E}[(\sqrt{E_d}\, h_i c_p + n_{ci,p}) h_i^*] = \sqrt{E_d}\, c_p, \qquad (27)$$
and
$$R_{y_p} = E_d\, c_p c_p^H + \sigma_c^2 I. \qquad (28)$$
With the Matrix Inversion Lemma, $R_{y_p}^{-1}$ is expressed as
$$R_{y_p}^{-1} = \frac{1}{\sigma_c^2}\left(I - \frac{E_d\, c_p c_p^H}{\sigma_c^2 + E_d\, c_p^H c_p}\right). \qquad (29)$$
Upon substituting (28) and (29) into (26), the MMSE channel estimate becomes
$$\hat{h}_i = \frac{\sqrt{E_d}\, L}{L\sigma_c^2 + L_p E_d}\, c_p^H y_{i,p}. \qquad (30)$$
Let $\kappa = \frac{\sqrt{E_d}\, L}{L\sigma_c^2 + L_p E_d}$; then $c_p^H y_{i,p} = \hat{h}_i / \kappa$. Substituting this into (25), we obtain
$$\log p(Y|\theta) = \sum_{i=1}^{N} \log\left(\int_{-\infty}^{+\infty} \exp\left(-\frac{(x-\theta)^2}{2\sigma_s^2} + \beta |y_{i,d}^H c_d(x)|^2 + \frac{2\beta}{\kappa}\, \Re\{\hat{h}_i\, y_{i,d}^H c_d(x)\}\right) \mathrm{d}x\right). \qquad (31)$$
In the following, we show that the MLE in this case is equivalent to a two-stage estimator. In the first stage, the FC uses (30) to obtain the MMSE estimate of $h_i$. It can be modeled as $\hat{h}_i = h_i + \epsilon_{h_i}$, where $\epsilon_{h_i}$ is the estimation error, which is complex Gaussian distributed with zero mean and variance $L\sigma_c^2 / (L\sigma_c^2 + L_p E_d)$. In the second stage, the FC conducts the MLE using $\hat{h}_i$.

Substituting $\hat{h}_i$ into (7), the received signal of the data symbols becomes
$$y_{i,d} = \sqrt{E_d}\, \hat{h}_i c_d(x) - \sqrt{E_d}\, \epsilon_{h_i} c_d(x) + n_{ci,d}, \qquad (32)$$
where $n_{ci,d}$ is the receiver thermal noise.

Deriving the conditional PDF $p(y_{i,d}|\hat{h}_i, x)$ from (32), we can obtain a likelihood function that is exactly the same as (31). This implies that the MLE with unknown CSI implicitly exploits the available training symbols to provide the optimal channel estimate and then uses it to provide the optimal estimate of $\theta$.

Note that the likelihood function in (31) differs from the likelihood function that treats the estimated CSI as the true value of the channel coefficients, which is
$$\log p(Y|h_i = \hat{h}_i, \theta) = \sum_{i=1}^{N} \log\left(\int_{-\infty}^{+\infty} \exp\left(-\frac{(x-\theta)^2}{2\sigma_s^2} + \frac{2\sqrt{E_d}}{\sigma_c^2}\, \Re\{\hat{h}_i\, y_{i,d}^H c_d(x)\}\right) \mathrm{d}x\right). \qquad (33)$$
This is a coherent estimator. By contrast, both a coherent term $\Re\{\hat{h}_i\, y_{i,d}^H c_d(x)\}$ and a non-coherent term $|y_{i,d}^H c_d(x)|^2$ appear in (31). This means that the MLE in (31) uses the channel estimate as "partial" CSI after accounting for the channel estimation errors. The true value of the channel coefficients contained in the channel estimate corresponds to the coherent term of the log-likelihood function, whereas the uncertainty in the channel estimate, i.e., the estimation error, leads to the non-coherent term.
We will compare the performance of the two estimators through simulations in Section VII.

IV. SUBOPTIMAL ESTIMATOR
In the previous section, we developed the optimal estimators for the considered decentralized estimation systems, which are not feasible for practical systems due to their prohibitive computational complexity. Nevertheless, their performance can serve as a practical bound when both observation noise and communication errors are present, and their structure provides hints for deriving low-complexity suboptimal estimators. In this section, we take the suboptimal estimator with known CSI as an example. The estimator with unknown CSI can be obtained following the same principle.

We first consider an approximation of the PMF $p(S_m|\theta)$. Following the Lagrange Mean Value Theorem, there exists $\xi$ in the interval $\left[\frac{S_m - \frac{\Delta}{2} - \theta}{\sigma_s}, \frac{S_m + \frac{\Delta}{2} - \theta}{\sigma_s}\right]$ that satisfies
$$p(S_m|\theta) = -Q'(\xi)\, \frac{\Delta}{\sigma_s} = \frac{\Delta}{\sqrt{2\pi}\,\sigma_s} \exp\left(-\frac{\xi^2}{2}\right). \qquad (34)$$
If the quantization interval $\Delta$ is small enough, we can let $\xi$ equal the middle value of the interval, i.e., $\xi = (S_m - \theta)/\sigma_s$, and obtain an approximate expression of the PMF as
$$p(S_m|\theta) \approx p_A(S_m|\theta) \triangleq \frac{\Delta}{\sqrt{2\pi}\,\sigma_s} \exp\left(-\frac{(S_m - \theta)^2}{2\sigma_s^2}\right). \qquad (35)$$
Substituting (35) into (12) and computing its partial derivative with respect to $\theta$, the likelihood equation is simplified to
$$\theta = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{m=0}^{M-1} p(y_i|h_i,c_m)\, p_A(S_m|\theta)\, S_m}{\sum_{m=0}^{M-1} p(y_i|h_i,c_m)\, p_A(S_m|\theta)}, \qquad (36)$$
which is a necessary condition for the MLE of $\theta$.

Unfortunately, we cannot obtain an explicit estimator for $\theta$ from this equation because the right-hand side of the likelihood equation also contains $\theta$. However, considering the property of the conditional PDF, (36) becomes
$$\theta = \frac{1}{N} \sum_{i=1}^{N} \left(\sum_{m=0}^{M-1} p(S_m|y_i,h_i,\theta)\, S_m\right) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{E}[S_{m_i}|y_i,h_i,\theta]. \qquad (37)$$
If we assume that $\theta$ on the right-hand side of the likelihood equation is known, the right-hand side of (37) is actually the MMSE estimator of $S_{m_i}$, i.e., $\hat{S}_{m_i} = \mathrm{E}[S_{m_i}|y_i,h_i,\theta]$. This indicates that the MLE can be regarded as a two-stage estimator. In the first stage, it estimates $S_{m_i}$, $i = 1, \cdots, N$, from the received signals of each sensor. In the second stage, it combines the $\hat{S}_{m_i}$ by a sample-mean estimator. These two stages are consistent with the two steps of the EM algorithm [27]. The first stage is the expectation step (E-step) and the second stage is the maximization step (M-step) of the algorithm. The set of quantized observations $S_{m_i}$, which is a sufficient statistic for estimating $\theta$, is the complete data of the EM algorithm.

We present a suboptimal estimator with a similar two-stage structure. This estimator can be viewed as a modified EM algorithm. Because the likelihood function in (12) has multiple extrema and the equation in (36) is only a necessary condition, the initial value of the iterative computation is critical to the convergence of the iterative algorithm. To obtain a good initial value, the suboptimal estimator estimates $S_{m_i}$ by assuming it to be uniformly distributed. Furthermore, since the estimate quality of the first stage is available, we use the BLUE to obtain $\hat{\theta}$ in order to exploit this quality information, instead of using the MLE in the M-step as in the standard EM algorithm.

In the first stage of the iterative computation, the suboptimal algorithm estimates $S_{m_i}$ under the MMSE criterion. This estimator requires the a priori probability of $S_{m_i}$, which depends on the unknown parameter $\theta$. The initial distribution of $S_{m_i}$ is set to the uniform distribution. After obtaining a temporary estimate of $\theta$, we can apply it to update the a priori probability of $S_{m_i}$ and estimate $S_{m_i}$ iteratively.
The MMSE estimator of the first stage is as follows:
$$\hat{S}_{m_i} = \mathrm{E}[S_{m_i}|h_i, y_i] = \sum_{m_i=0}^{M-1} p(S_{m_i}|h_i, y_i)\, S_{m_i} = \frac{\sum_{m_i=0}^{M-1} p(y_i|h_i, S_{m_i})\, \hat{p}(S_{m_i})\, S_{m_i}}{\sum_{m_i=0}^{M-1} p(y_i|h_i, S_{m_i})\, \hat{p}(S_{m_i})}, \qquad (38)$$
where $p(y_i|h_i, S_{m_i})$ equals $p(y_i|h_i, c_m)$ in (13), and $\hat{p}(S_{m_i})$ is the estimate of the a priori PMF of $S_{m_i}$. Once $\hat{\theta}$ is obtained, we use $p_A(S_{m_i}|\hat{\theta})$ to update $\hat{p}(S_{m_i})$, i.e., we let $\hat{p}(S_{m_i}) = p_A(S_{m_i}|\hat{\theta})$. In this section, we omit $\theta$ in $\hat{p}(S_{m_i})$ for notational simplicity, though it depends on $\theta$.

Now we derive the mean and variance of $\hat{S}_{m_i}$, which will be used in the BLUE of $\theta$. If $\hat{p}(S_{m_i})$ equals its true value, the MMSE estimator in (38) is unbiased, because
$$\mathrm{E}[\hat{S}_{m_i}|h_i] = \int_{\mathbb{C}^L} \hat{S}_{m_i}\, p(y_i|h_i)\, \mathrm{d}y_i = \int_{\mathbb{C}^L} \sum_{m=0}^{M-1} p(y_i|h_i, S_{m_i})\, \hat{p}(S_{m_i})\, S_{m_i}\, \mathrm{d}y_i = \sum_{m=0}^{M-1} \hat{p}(S_{m_i})\, S_{m_i} = \mathrm{E}[S_{m_i}|h_i]. \qquad (39)$$
However, $\hat{p}(S_{m_i})$ in our algorithm is inaccurate since we use $\hat{\theta}$ instead of $\theta$. The MMSE estimate may therefore be biased, but this bias is hard to obtain in practical systems. We regard the MMSE estimate as unbiased in our suboptimal algorithm.

Given $h_i$ and $y_i$, the variance of the MMSE estimate can be derived as
$$\mathrm{Var}[\hat{S}_{m_i}|h_i, y_i] = \mathrm{E}[S_{m_i}^2|h_i, y_i] - \mathrm{E}^2[S_{m_i}|h_i, y_i] = \frac{\sum_{m_i=0}^{M-1} S_{m_i}^2\, p(y_i|h_i, S_{m_i})\, \hat{p}(S_{m_i})}{\sum_{m_i=0}^{M-1} p(y_i|h_i, S_{m_i})\, \hat{p}(S_{m_i})} - \hat{S}_{m_i}^2. \qquad (40)$$
Then the BLUE for estimating $\theta$ is
$$\hat{\theta} = \left(\sum_{j=1}^{N} \frac{1}{\sigma_s^2 + \mathrm{Var}[\hat{S}_{m_j}|h_j, y_j]}\right)^{-1} \sum_{i=1}^{N} \frac{\hat{S}_{m_i}}{\sigma_s^2 + \mathrm{Var}[\hat{S}_{m_i}|h_i, y_i]}. \qquad (41)$$
The iterative algorithm can be summarized as follows:
S1) Let $\hat{p}(S_{m_i}) = 1/M$ as the initial value.
S2) Compute $\hat{S}_{m_i}$, $i = 1, \cdots, N$, and its variance with (38) and (40).
S3) Substitute $\hat{S}_{m_i}$ and its variance into (41) to get $\hat{\theta}$.
S4) Update $\hat{p}(S_{m_i})$ using $p_A(S_{m_i}|\hat{\theta})$.
S5) Repeat steps S2)–S4) until the algorithm converges or a predetermined number of iterations is reached.

Note that this suboptimal algorithm differs from the one proposed in [7], which applies the maximum a posteriori (MAP) criterion to detect the binary observations of the sensors and then uses the results as the true values of the observations in the MLE derived for noise-free channels. Our suboptimal algorithm inherits the structure of the MLE developed for fading channels: it gives "soft" estimates of the quantized observations first, and combines them with a linear optimal estimator afterward. By conducting these two stages iteratively, the estimation accuracy improves rapidly. Although the suboptimal algorithm may converge to a local optimum due to the non-convexity of the original optimization problem, it still performs fairly well, as will be shown in the simulation results. The convergence of the algorithm will be studied by simulations in Section VII.

V. SPECIAL CASES OF THE MLEs

To gain some insight into the decentralized MLE, in this section we first study two special cases of the MLEs, where either the observations of the sensors or the communications are perfect. After that, we discuss the form of the MLE with known CSI under two extreme-case quantizations: AF transmission (infinitesimal quantization resolution) and 1-bit quantization (the coarsest resolution). This will provide connections between the derived MLE and existing well-studied optimal estimators in these special cases.
A. Ideal Observations and Ideal Communications
Considering the approximate expression of the PMF in (35), the log-likelihood function with known CSI is approximated as
$$\log p(Y|h,\theta) \approx \sum_{i=1}^{N} \log\left(\sum_{m=0}^{M-1} \exp\left(-\frac{\|y_i - \sqrt{E_d}\, h_i c_m\|^2}{\sigma_c^2} - \frac{(S_m - \theta)^2}{2\sigma_s^2}\right)\right). \qquad (42)$$
This tells us that the MLE exploits both the signal-level information $y_i$ and the data-level information $S_m$ when the quantization interval is small enough. If the observations are perfect, i.e., there is no observation noise, we will show that the MLE degenerates into a signal-level optimal combiner, the MRC, followed by data demodulation and parameter reconstruction. On the other hand, if the communications are perfect, we will show that the MLE reduces to a data-level optimal fusion estimator, the BLUE.

When CSI is unknown, we draw similar conclusions, except that the MLE becomes a subspace-based estimator followed by data demodulation and parameter reconstruction when there is no observation noise.
1) Noiseless Observations:
When the observations of the sensors are ideal, i.e., $x_i = \theta$, $\forall i = 1, \cdots, N$, we have
$$p(x|\theta) = \delta(x - \theta), \qquad (43)$$
where $\delta(x)$ is the Dirac delta function.

We first consider the MLE with known CSI. Substituting (43) into (8) and ignoring all terms that do not affect the estimation, the log-likelihood function is simplified to
$$\log p(Y|h,\theta) = -\sum_{i=1}^{N} \frac{\|y_i - \sqrt{E_d}\, h_i c(\theta)\|^2}{\sigma_c^2}. \qquad (44)$$
Since $c(\theta)$ is a piecewise constant function that is not differentiable, we cannot compute the partial derivative of (44) with respect to $\theta$. Instead, we first regard $c(\theta)$ as the parameter under estimation and obtain the MLE of $c(\theta)$. The log-likelihood function in (44) is concave with respect to $c(\theta)$, and its unique maximum can be obtained by solving the equation $\partial \log p(Y|h,\theta) / \partial c(\theta) = 0$, which gives
$$\hat{c}(\theta) = \frac{1}{\sqrt{E_d} \sum_{j=1}^{N} |h_j|^2} \sum_{i=1}^{N} h_i^* y_i. \qquad (45)$$
We can then use this as a decision variable to detect the transmitted symbols and reconstruct $\theta$ according to the quantization rule with the detection results.

This shows that when the observations are perfect, the structure of the MLE is the MRC concatenated with data demodulation and parameter reconstruction. This is no surprise: in this case the signals transmitted by different sensors are all identical, so the receiver at the FC can use traditional diversity technology to reduce communication errors. Meanwhile, it is unnecessary to use the redundant observations for data fusion.

We then consider the MLE with unknown CSI. Upon substituting (43) into (16) and ignoring all terms that do not affect the estimation, the log-likelihood function becomes
$$\log p(Y|\theta) = \sum_{i=1}^{N} |y_i^H c(\theta)|^2 = c(\theta)^H \left(\sum_{i=1}^{N} y_i y_i^H\right) c(\theta). \qquad (46)$$
Again, we regard $c(\theta)$ as the parameter to be estimated. Recall that the energy of $c(\theta)$ is normalized.
The problem of finding the $\mathbf{c}(\theta)$ that maximizes (46) is then a solvable quadratically constrained quadratic program (QCQP) [28],

$$\begin{aligned}\max_{\mathbf{c}(\theta)}\quad &\mathbf{c}(\theta)^{H}\left(\sum_{i=1}^{N}\mathbf{y}_i\mathbf{y}_i^{H}\right)\mathbf{c}(\theta)\\ \text{s.t.}\quad &\|\mathbf{c}(\theta)\|^2=1.\end{aligned} \tag{47}$$

The solution of (47) follows from the results on QCQPs in [28],

$$\hat{\mathbf{c}}(\theta)=\mathbf{v}_{\max}\left(\sum_{i=1}^{N}\mathbf{y}_i\mathbf{y}_i^{H}\right), \tag{48}$$

where $\mathbf{v}_{\max}(\mathbf{M})$ is the eigenvector corresponding to the maximal eigenvalue of the matrix $\mathbf{M}$.

This shows that when CSI is unknown at the FC and the observations are noise-free, the MLE becomes a subspace-based estimator.
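Both special cases are straightforward to prototype. The sketch below forms the MRC decision variable of (45) and the principal-eigenvector estimate of (48) with NumPy; the codeword, channel gains, and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, Ed = 10, 4, 1.0             # sensors, symbols per observation, energy

# Hypothetical unit-energy codeword: with noiseless observations every
# sensor quantizes theta to the same c(theta).
c_theta = np.array([1, -1, 1, 1], dtype=complex) / np.sqrt(L)

h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
nc = 0.05 * (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L)))
Y = np.sqrt(Ed) * h[:, None] * c_theta[None, :] + nc   # rows are y_i^T

# Known CSI, Eq. (45): MRC combining of the N received vectors.
c_hat_mrc = (h.conj() @ Y) / (np.sqrt(Ed) * np.sum(np.abs(h) ** 2))

# Unknown CSI, Eq. (48): principal eigenvector of sum_i y_i y_i^H.
R = Y.T @ Y.conj()                # equals sum_i y_i y_i^H
w, V = np.linalg.eigh(R)          # R is Hermitian
c_hat_sub = V[:, -1]              # eigenvector of the largest eigenvalue

# The subspace estimate recovers c(theta) only up to a phase rotation.
print(abs(np.vdot(c_hat_mrc, c_theta)), abs(np.vdot(c_hat_sub, c_theta)))
```

Detection would then proceed by matching the decision vector against the codebook; the phase ambiguity of the subspace estimate is exactly the issue discussed later in Section VI-A.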
2) Noiseless Communications:
When $\sigma_c^2\to 0$, we have $\mathbf{y}_i=\sqrt{E_d}\,h_i\mathbf{c}_{m_i}$. This means that $\mathbf{y}_i$ is determined solely by $\mathbf{c}_{m_i}$, or equivalently by $S_{m_i}$. The log-likelihood function then becomes a function of the quantized observations $S_{m_i}$.

We first consider the MLE with known CSI. The log-likelihood function with ideal communications is

$$\log p(\mathbf{Y}\mid\mathbf{h},\theta)\to\log p(\mathbf{S}\mid\mathbf{h},\theta)=\sum_{i=1}^{N}\log\left(Q\!\left(\frac{S_{m_i}-\frac{\Delta}{2}-\theta}{\sigma_s}\right)-Q\!\left(\frac{S_{m_i}+\frac{\Delta}{2}-\theta}{\sigma_s}\right)\right), \tag{49}$$

where $\mathbf{S}=[S_{m_1},\cdots,S_{m_N}]^{T}$. Computing the derivative of (49), we have the likelihood equation

$$\sum_{i=1}^{N}\frac{\exp\!\left(-\frac{(S_{m_i}-\frac{\Delta}{2}-\theta)^2}{2\sigma_s^2}\right)-\exp\!\left(-\frac{(S_{m_i}+\frac{\Delta}{2}-\theta)^2}{2\sigma_s^2}\right)}{Q\!\left(\frac{S_{m_i}-\frac{\Delta}{2}-\theta}{\sigma_s}\right)-Q\!\left(\frac{S_{m_i}+\frac{\Delta}{2}-\theta}{\sigma_s}\right)}=0. \tag{50}$$

In general, this likelihood equation has no closed-form solution. Nonetheless, a closed-form solution can be obtained when the quantization noise is very small, i.e., $\Delta\to 0$. Under this condition, $S_{m_i}\to x_i$ and (50) becomes

$$\lim_{\Delta\to 0}\frac{\partial\log p(\mathbf{S}\mid\mathbf{h},\theta)}{\partial\theta}=\sum_{i=1}^{N}\frac{x_i-\theta}{\sigma_s^2}=0. \tag{51}$$

The MLE obtained from (51) is

$$\hat{\theta}=\frac{1}{N}\sum_{i=1}^{N}x_i. \tag{52}$$

It is again no surprise that the MLE reduces to the BLUE, which is often applied in centralized estimation [17], where the FC can obtain all raw observations of the sensors.

When the CSI is unknown at the FC, the receiver of the FC can recover the quantized observations of the sensors error-free if a proper codebook, which will be discussed in Section VI-A, is applied. Then the MLE with unknown CSI also degenerates into the BLUE shown in (52). This is reasonable, since only the communication structure depends on the channel information.

The special cases of the MLEs with noiseless observations or noiseless communications are summarized in Table I.

TABLE I
THE SPECIAL CASES OF THE MLES WITH KNOWN AND UNKNOWN CSI

MLE          | Noiseless Obser.          | Noiseless Comm. & ∆→0
Known CSI    | MRC                       | BLUE
Unknown CSI  | Subspace-based estimator  | BLUE (proper codebook)
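The chain from (49) to (52) can be checked numerically. In the sketch below, the parameter values and the mid-tread uniform quantizer are illustrative assumptions; a grid search maximizes (49), and the sample mean is its $\Delta\to 0$ limit (52).

```python
import numpy as np
from math import erfc

def Q(t):  # Gaussian tail probability Q(t)
    return 0.5 * erfc(t / np.sqrt(2))

rng = np.random.default_rng(1)
theta_true, sigma_s, Delta, N = 0.30, 0.5, 0.25, 500

x = theta_true + sigma_s * rng.standard_normal(N)  # noisy observations
S = Delta * np.round(x / Delta)                    # mid-tread uniform quantizer

def loglik(theta):
    # log p(S | theta) under noiseless communications, Eq. (49)
    p = [Q((s - Delta / 2 - theta) / sigma_s) - Q((s + Delta / 2 - theta) / sigma_s)
         for s in S]
    return float(np.sum(np.log(np.maximum(p, 1e-300))))

grid = np.linspace(-1.0, 1.0, 801)
theta_mle = grid[int(np.argmax([loglik(t) for t in grid]))]
theta_blue = x.mean()                              # Delta -> 0 limit, Eq. (52)
print(theta_mle, theta_blue)
```

Both estimates land near the true value; their gap shrinks as $\Delta$ shrinks, matching the limit argument of (51).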
B. AF Transmission and Binary Quantization

1) AF Transmission:
Although the estimators derived so far assume digital communications, they also apply to AF transmission, because the messaging function introduced in Section II can describe AF transmission as well.

The messaging function for AF transmission is

$$c(x)=\alpha x, \tag{53}$$

where $\alpha$ is the amplification gain. Since $\mathbf{c}(x)$ reduces to a scalar, we write it as $c(x)$.

For AF transmission, we rewrite the energy normalization condition as $\mathrm{E}[c(x)^{H}c(x)]=1$, which is satisfied when $\alpha=1/\sqrt{\mathrm{E}[\theta^2]+\sigma_s^2}$. Because $\theta$ is an unknown non-random parameter, we cannot obtain $\mathrm{E}[\theta^2]$. To solve this problem, we assume that $\theta$ is a random variable uniformly distributed in $[-V,+V]$, so that $\mathrm{E}[\theta^2]=V^2/3$.

The received signal at the FC is

$$y_i=\sqrt{E_d}\,h_i c(x_i)+n_{c,i}=\sqrt{E_d}\,h_i\alpha(\theta+n_{s,i})+n_{c,i}. \tag{54}$$

Substituting (53) and (54) into the log-likelihood function with known CSI shown in (11), we can obtain the MLE with AF transmission. The derivation along this route is rather involved due to the cross-correlation between the real and imaginary parts of the received signals. In the following, we give a simpler alternative derivation.

We first find a vector of sufficient statistics, then derive the log-likelihood function using this vector as the observation vector. When $h_1,\cdots,h_N$ are known at the receiver, it is not hard to show that $\mathbf{Y}_h=[h_1^{*}y_1,\cdots,h_N^{*}y_N]$ is a sufficient statistic of $\mathbf{Y}=[y_1,\cdots,y_N]$ for estimating $\theta$, where

$$h_i^{*}y_i=\sqrt{E_d}\,|h_i|^2\alpha(\theta+n_{s,i})+h_i^{*}n_{c,i},\quad i=1,\cdots,N. \tag{55}$$

The real and imaginary parts of $h_i^{*}y_i$ are statistically independent and Gaussian distributed. The mean and variance of the real part are $\sqrt{E_d}\,|h_i|^2\alpha\theta$ and $E_d|h_i|^4\alpha^2\sigma_s^2+|h_i|^2\sigma_c^2/2$, and the mean and variance of the imaginary part are zero and $|h_i|^2\sigma_c^2/2$.
Ignoring the constants that do not affect the MLE, we obtain the log-likelihood function

$$\log p(\mathbf{Y}_h\mid\theta)=-\sum_{i=1}^{N}\left(\frac{\left(\Re\{h_i^{*}y_i\}-\sqrt{E_d}\,|h_i|^2\alpha\theta\right)^2}{|h_i|^2\sigma_c^2+2E_d\alpha^2|h_i|^4\sigma_s^2}+\frac{\left(\Im\{h_i^{*}y_i\}\right)^2}{|h_i|^2\sigma_c^2}\right), \tag{56}$$

and the likelihood equation

$$\frac{\partial\log p(\mathbf{Y}_h\mid\theta)}{\partial\theta}=\sum_{i=1}^{N}\left(\frac{\sqrt{E_d}\,\alpha\,\Re\{h_i^{*}y_i\}}{\sigma_c^2+2E_d\alpha^2|h_i|^2\sigma_s^2}-\frac{E_d\alpha^2|h_i|^2\theta}{\sigma_c^2+2E_d\alpha^2|h_i|^2\sigma_s^2}\right)=0, \tag{57}$$

where $\Re\{z\}$ and $\Im\{z\}$ denote the real and imaginary parts of $z$, respectively.

The log-likelihood function has only one maximum, which is obtained by solving the likelihood equation:

$$\hat{\theta}=\frac{\displaystyle\sum_{i=1}^{N}\frac{\Re\{h_i^{*}y_i\}}{\sigma_c^2+2E_d\alpha^2|h_i|^2\sigma_s^2}}{\displaystyle\sum_{i=1}^{N}\frac{\sqrt{E_d}\,\alpha|h_i|^2}{\sigma_c^2+2E_d\alpha^2|h_i|^2\sigma_s^2}}, \tag{58}$$

which degenerates to the optimal estimator proposed in [24] under the assumptions therein.

The asymptotic performance of AF transmission over fading orthogonal MACs is analyzed in [24] and [25]. In the same way, the messaging function $c(x)$ can also be used to represent other modulations such as quadrature amplitude modulation (QAM).
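A compact numerical check of the closed form (58) is given below; the parameter values, fading model, and seed are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Ed, V = 20, 1.0, 1.0
sigma_s2, sigma_c2 = 0.01, 0.1
theta = 0.4                                   # true parameter in [-V, V]
alpha = 1.0 / np.sqrt(V**2 / 3 + sigma_s2)    # normalization, E[theta^2] = V^2/3

# Rayleigh fading, observation noise, and complex channel noise
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
ns = np.sqrt(sigma_s2) * rng.standard_normal(N)
nc = np.sqrt(sigma_c2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = np.sqrt(Ed) * h * alpha * (theta + ns) + nc          # Eq. (54)

g = np.abs(h) ** 2
D = sigma_c2 + 2 * Ed * alpha**2 * g * sigma_s2          # common denominator in (57)
theta_hat = np.sum(np.real(h.conj() * y) / D) / np.sum(np.sqrt(Ed) * alpha * g / D)
print(theta_hat)
```

Note how (58) weights each sensor by its channel quality: a sensor in a deep fade (small $|h_i|^2$) contributes little to either sum.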
2) Binary Quantization:
Considering the stringent bandwidth constraints of WSNs, many contributions assume that the sensors use a binary quantizer as the local processor. Our estimators also apply when a binary quantizer is used. Based on the system model in Section II, the messaging function with a binary quantizer and binary phase-shift keying (BPSK) modulation is

$$c(x)=\begin{cases}-1, & x\le\tau\\ +1, & x>\tau,\end{cases} \tag{59}$$

where $\tau$ is the quantization threshold, which equals 0 for the uniform quantizer we consider.

Substituting $c(x)$ into the likelihood function shown in (11), we have

$$p(\mathbf{Y}\mid\mathbf{h},\theta)=\prod_{i=1}^{N}\left(\exp\!\left(-\frac{|y_i+\sqrt{E_d}\,h_i|^2}{\sigma_c^2}\right)F_{n_s}(\tau-\theta)+\exp\!\left(-\frac{|y_i-\sqrt{E_d}\,h_i|^2}{\sigma_c^2}\right)\left(1-F_{n_s}(\tau-\theta)\right)\right), \tag{60}$$

where $F_{n_s}(x)$ is the cumulative distribution function (CDF) of the observation noise. Define

$$a(y_i)=\exp\!\left(-\frac{|y_i+\sqrt{E_d}\,h_i|^2}{\sigma_c^2}\right)-\exp\!\left(-\frac{|y_i-\sqrt{E_d}\,h_i|^2}{\sigma_c^2}\right) \tag{61}$$

and

$$b(y_i)=\exp\!\left(-\frac{|y_i-\sqrt{E_d}\,h_i|^2}{\sigma_c^2}\right); \tag{62}$$

then (60) simplifies to

$$p(\mathbf{Y}\mid\mathbf{h},\theta)=\prod_{i=1}^{N}\left[a(y_i)F_{n_s}(\tau-\theta)+b(y_i)\right]. \tag{63}$$

This is the same as the likelihood function shown in (9) of [7], except for the presence of the channel coefficients, since we consider fading channels.

VI. DISCUSSIONS
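The binary-quantization likelihood (63) is cheap to evaluate directly, since $a(y_i)$ and $b(y_i)$ do not depend on $\theta$. The sketch below (all constants are illustrative assumptions) recovers $\theta$ by a grid search over (63):

```python
import numpy as np
from math import erfc

def F_ns(x, sigma_s):             # Gaussian CDF of the observation noise
    return 1.0 - 0.5 * erfc(x / (np.sqrt(2) * sigma_s))

rng = np.random.default_rng(3)
N, Ed, sigma_s, sigma_c2, tau = 200, 1.0, 0.5, 0.1, 0.0
theta = 0.3

h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
bits = np.where(theta + sigma_s * rng.standard_normal(N) > tau, 1.0, -1.0)  # Eq. (59)
nc = np.sqrt(sigma_c2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = np.sqrt(Ed) * h * bits + nc

# a(y_i) and b(y_i) of (61)-(62) are theta-independent: precompute once.
e_m = np.exp(-np.abs(y + np.sqrt(Ed) * h) ** 2 / sigma_c2)   # symbol -1 sent
e_p = np.exp(-np.abs(y - np.sqrt(Ed) * h) ** 2 / sigma_c2)   # symbol +1 sent
a, b = e_m - e_p, e_p

grid = np.linspace(-1.0, 1.0, 2001)
ll = [float(np.sum(np.log(a * F_ns(tau - t, sigma_s) + b))) for t in grid]
theta_hat = grid[int(np.argmax(ll))]
```

Because a single threshold comparison carries little amplitude information, the estimate is noticeably coarser than with multiple-bit quantization, which is consistent with the simulation results in Section VII.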
A. Transmission Codebook Issues
When digital communications are used, the transmission codebooks can represent various quantization, coding, and modulation schemes. Here we discuss the impact of the codebook on the decentralized MLEs.

We rewrite the conditional PDF with known CSI shown in (10) as

$$p(\mathbf{y}_i\mid h_i,x)=\frac{1}{(\pi\sigma_c^2)^L}\exp\!\left(-\frac{\|\mathbf{y}_i-\sqrt{E_d}\,h_i\mathbf{c}(x)\|^2}{\sigma_c^2}\right)=\frac{1}{(\pi\sigma_c^2)^L}\exp\!\left(-\frac{E_d|h_i|^2}{\sigma_c^2}\right)\exp\!\left(-\frac{\|\mathbf{y}_i\|^2}{\sigma_c^2}+\frac{2\sqrt{E_d}\,\Re\{h_i\mathbf{y}_i^{H}\mathbf{c}(x)\}}{\sigma_c^2}\right). \tag{64}$$

Comparing the conditional PDF with unknown CSI, $p(\mathbf{y}_i\mid x)$ shown in (21), with $p(\mathbf{y}_i\mid h_i,x)$ shown in (64), we see that both PDFs depend on the correlation between the received signal $\mathbf{y}_i$ and the transmitted symbols $\mathbf{c}(x)$. With known CSI, the optimal estimator is a coherent algorithm, since (64) relies on the real part of the correlation $\mathbf{y}_i^{H}\mathbf{c}(x)$. With unknown CSI, the optimal estimator is a non-coherent algorithm, since (21) depends on the squared norm of $\mathbf{y}_i^{H}\mathbf{c}(x)$. Because $\mathbf{y}_i^{H}\mathbf{c}(x)=\sqrt{E_d}\,h_i^{*}\mathbf{c}^{H}(x_i)\mathbf{c}(x)+\mathbf{n}_{c,i}^{H}\mathbf{c}(x)$, both MLEs depend on the cross-correlation of the transmission symbols, $\mathbf{c}^{H}(x_i)\mathbf{c}(x)$.

Taking digital communications as an example, if there exist two transmission symbols $\mathbf{c}_m$ and $\mathbf{c}_n$ in the transmission codebook that differ only by a phase rotation,

$$\mathbf{c}_m=\mathbf{c}_n e^{j\phi}, \tag{65}$$

then $p(\mathbf{y}_i\mid x)$ will have two identical extrema, since the MLE with unknown CSI depends only on $|\mathbf{y}_i^{H}\mathbf{c}(x)|$. Such a phase ambiguity leads to severe performance degradation of the decentralized estimator. Therefore, the auto-correlation matrix of the codebook plays a critical role in the performance of the MLE, especially when CSI is unknown.

Many transmission schemes suffer from this phase ambiguity problem. For example, when the natural binary code and BPSK modulation are applied to represent and transmit each quantized observation, then for any $\mathbf{c}_m$ in such a transmission codebook, denoted $\mathcal{C}_{tn}$, there exists $\mathbf{c}_{m'}$ in $\mathcal{C}_{tn}$ that satisfies $\mathbf{c}_{m'}=-\mathbf{c}_m$.
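The ambiguity is visible in a few lines of code: for any received vector, the non-coherent metric $|\mathbf{y}^{H}\mathbf{c}|$ cannot separate a BPSK codeword from its negative (the codeword below is an arbitrary illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(5)
K = 3
c = (2 * rng.integers(0, 2, K) - 1) / np.sqrt(K)            # a codeword from C_tn
y = rng.standard_normal(K) + 1j * rng.standard_normal(K)    # any received vector

# |y^H c| is invariant to a sign flip (a phase rotation by pi), so the
# non-coherent MLE with unknown CSI cannot tell c from -c.
print(abs(np.vdot(y, c)), abs(np.vdot(y, -c)))
```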
Therefore, $\mathcal{C}_{tn}$ is not a proper codebook. To cope with the phase ambiguity inherent in the codebook $\mathcal{C}_{tn}$, we can simply insert training symbols into the transmission symbols. Though heuristic, this approach provides fairly good performance because, as we have shown, the MLE can exploit the training symbols to estimate the channel coefficients implicitly.

Since the MLEs are associated with the auto-correlation matrix of the transmission codebook, we can enhance the performance of the estimators by systematically designing the codebook. Nonetheless, this is out of the scope of this paper. Some preliminary results on optimizing the transmission codebooks are given in [29].

B. Asymptotic Performance of the MLEs with respect to N

We first consider the Cramér-Rao lower bound (CRLB) when CSI is unknown at the FC, which is
$$\mathrm{Var}[\hat{\theta}]\ge\left(-\mathrm{E}\!\left[\frac{\partial^2\log p(\mathbf{Y}\mid\theta)}{\partial\theta^2}\right]\right)^{-1}=\left(\sum_{i=1}^{N}\int_{\mathbb{C}^L}\left(\frac{\left(\sum_{m=0}^{M-1}p(\mathbf{y}_i\mid\mathbf{c}_m)p'(S_m\mid\theta)\right)^2}{\sum_{m=0}^{M-1}p(\mathbf{y}_i\mid\mathbf{c}_m)p(S_m\mid\theta)}-\sum_{m=0}^{M-1}p(\mathbf{y}_i\mid\mathbf{c}_m)p''(S_m\mid\theta)\right)\mathrm{d}\mathbf{y}_i\right)^{-1}$$
$$=\frac{1}{N}\left(\int_{\mathbb{C}^L}\frac{\left(\sum_{m=0}^{M-1}p(\mathbf{y}\mid\mathbf{c}_m)p'(S_m\mid\theta)\right)^2}{\sum_{m=0}^{M-1}p(\mathbf{y}\mid\mathbf{c}_m)p(S_m\mid\theta)}\,\mathrm{d}\mathbf{y}-\sum_{m=0}^{M-1}p''(S_m\mid\theta)\right)^{-1}, \tag{66}$$

where $p'(S_m\mid\theta)$ and $p''(S_m\mid\theta)$ are the first- and second-order partial derivatives of $p(S_m\mid\theta)$ with respect to $\theta$, respectively.

This shows that the CRLB of the MLE with unknown CSI decreases with a factor of $1/N$, the same as the BLUE lower bound of centralized estimation [17]. This is due to the fact that, given $\theta$, the received signals $\mathbf{y}_i$ from different sensors are statistically identically distributed and mutually independent, so all of them contribute equally to reducing the estimation error.

When CSI is available at the FC, given $h_i$, the received signals are no longer identically distributed. In this case, the CRLB depends on the channel realization, which is very hard to derive. However, since more information can be exploited for estimation, we can infer that the CRLB with known CSI is always lower than that with unknown CSI. In other words, the asymptotic performance of the MLE with known CSI will be no worse than that of the MLE with unknown CSI.

C. Computational Complexity

1) MLE:
We take the MLE with known CSI as an example to analyze the computational complexity. The analysis for the MLE with unknown CSI is similar.

The MLE can be found by exhaustive search. To make the MSE introduced by the discrete search negligible, we let the search step-size be less than $\Delta/N$. Then we need to evaluate the likelihood function at least $M\times N$ times to obtain the MLE.

The FC applies (12), (13), and (14) to compute the values of the likelihood function for different $\theta$. The exponential term in (13) is independent of $\theta$, so it can be computed before the search and stored for later use.

Given $\theta$, we still need to compute $p(S_m\mid\theta)$, $m=0,\cdots,M-1$, whose complexity is $O(M)$, and then conduct $M$ additions and $M$ multiplications per sensor to obtain each value of the likelihood function. Thus the computational complexity of getting one value of $\log p(\mathbf{Y}\mid\mathbf{h},\theta)$ is $O(MN)$.

After accounting for the $M\times N$ evaluations required by the exhaustive search, the overall complexity of the MLE is $O(M^2N^2)$.
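A schematic sketch of the exhaustive search follows. The exact forms of (12)-(14) are not reproduced here; we assume the generic mixture likelihood $p(\mathbf{y}_i\mid h_i,\theta)=\sum_m p(\mathbf{y}_i\mid h_i,\mathbf{c}_m)\,p(S_m\mid\theta)$ that underlies them, and all codebook and parameter choices (natural binary code with BPSK, the quantizer range, the noise levels) are illustrative assumptions.

```python
import numpy as np
from math import erfc

def Q(t):  # Gaussian tail probability
    return 0.5 * erfc(t / np.sqrt(2))

rng = np.random.default_rng(4)
K, N, Ed = 2, 20, 1.0
M = 2 ** K                                   # quantization levels
sigma_s, sigma_c2 = 0.25, 0.05
W, Delta = 1.0, 0.5                          # quantizer covers [-W, W] (assumed)
levels = -W + Delta / 2 + Delta * np.arange(M)   # reconstruction points S_m

# Hypothetical codebook: natural binary code, BPSK, unit energy per codeword.
bits = (np.arange(M)[:, None] >> np.arange(K)[::-1]) & 1
C = (2 * bits - 1) / np.sqrt(K)

theta = 0.15
x = theta + sigma_s * rng.standard_normal(N)
m_idx = np.clip(np.round((x - levels[0]) / Delta).astype(int), 0, M - 1)
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
nc = np.sqrt(sigma_c2 / 2) * (rng.standard_normal((N, K))
                              + 1j * rng.standard_normal((N, K)))
Y = np.sqrt(Ed) * h[:, None] * C[m_idx] + nc

# Channel term p(y_i | h_i, c_m): theta-independent, computed once and reused.
d2 = np.abs(Y[:, None, :] - np.sqrt(Ed) * h[:, None, None] * C[None, :, :]) ** 2
chan = np.exp(-d2.sum(axis=2) / sigma_c2)    # N x M matrix

def p_S(t):  # P(S_m | theta), with the tails folded into the edge bins
    p = np.array([Q((s - Delta / 2 - t) / sigma_s)
                  - Q((s + Delta / 2 - t) / sigma_s) for s in levels])
    p[0] += 1.0 - Q((levels[0] - Delta / 2 - t) / sigma_s)
    p[-1] += Q((levels[-1] + Delta / 2 - t) / sigma_s)
    return p

grid = np.arange(-W, W, Delta / N)           # search step-size Delta / N
ll = [float(np.sum(np.log(chan @ p_S(t)))) for t in grid]
theta_hat = grid[int(np.argmax(ll))]
```

Precomputing the $\theta$-independent channel term plays the role of caching the exponential in (13); each grid point then costs $O(MN)$.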
2) Suboptimal Estimator:
The estimator presented in Section IV uses an iterative algorithm. In each iteration, we obtain $\hat{S}_{m_i}$ and its variance with (38) and (40), then obtain the estimate of $\theta$ with (41). The complexity is similar to that of computing the log-likelihood function once, which is $O(MN)$. If the algorithm converges after $I_t$ iterations, the complexity of the suboptimal estimator is $O(I_t MN)$.

VII. SIMULATIONS
We use the MSE of the estimate of $\theta$ as the metric to evaluate the performance of the estimators. The observation SNR considered in the simulations is defined as [13],

$$\gamma_s=20\log_{10}\left(\frac{W}{\sigma_s}\right). \tag{67}$$

We use $E_d$, the energy consumed by each sensor to transmit one observation, to define the communication SNR, in order to fairly compare the energy efficiency of the estimators under different transmission schemes. The communication SNR is then

$$\gamma_c=10\log_{10}\left(\frac{E_d}{N_0}\right). \tag{68}$$

The codebooks used in the simulations are summarized in Table II. Considering the general features of WSNs, namely that short data packets are usually transmitted and each sensor must be of low cost, we use a simple error-control coding scheme, a cyclic redundancy check (CRC) code with generator polynomial $G(x)=x+x+1$, as an example of coded transmission. Its codebook is denoted $\mathcal{C}_{tc}$. For comparison, uncoded transmission is also evaluated, whose codebook is denoted $\mathcal{C}_{tn}$. We use BPSK modulation to generate all codebooks. Because the code length of the uncoded transmission is shorter than that of the coded transmission, the energy to transmit each symbol is higher for a given $E_d$. Due to the phase ambiguity problem discussed in Section VI-A, we use the codebook with training symbols (TS), $\mathcal{C}_{tp}$, whenever we evaluate the estimators with unknown CSI, unless otherwise specified.

TABLE II
THE SUMMARY OF THE CODEBOOKS CONSIDERED

Codebook | ECC | TS  | Modulation
C_tn     | No  | No  | BPSK
C_tc     | CRC | No  | BPSK
C_tp     | No  | Yes | BPSK
Two estimators with ideal communications are shown as baselines, whose MSEs serve as performance lower bounds: the BLUE, whose MSE is $\sigma_s^2/N$, and the Quasi-BLUE, which accounts for the quantization noise [17]. The Quasi-BLUE bound is the more practical lower bound for comparison, since we consider quantization for all estimators in this paper except the estimator with AF transmission.

A. Influence of the Quantization Bit-Rate
We first examine the impact of the quantization bit-rate of the sensors. Three WSNs are considered, whose sensors use different quantization bit-rates, $K=1$, 2, and 4, respectively. The sensors apply $\mathcal{C}_{tn}$ as the transmission codebook, where the length of the transmitted symbols is $L=K$. In this simulation, we let both the total energy and the total bandwidth consumed by the networks be identical for the different quantization levels. Due to the total network bandwidth constraint, the numbers of active sensors for $K=1$, 2, and 4 are 40, 20, and 10, respectively. Due to the total network energy constraint, the energy consumed by each sensor to transmit one observation also differs: if the transmit energy of a sensor is $E_d$ when $N=40$, it is $2E_d$ and $4E_d$ when $N=20$ and 10, respectively.

We compare the MSEs of the MLE with known CSI and the Quasi-BLUE lower bound for different quantization bit-rates in Fig. 2. It is shown that a low quantization bit-rate is only preferable at extremely low observation SNR. At medium and high observation SNR levels, the optimal estimator with 1- or 2-bit quantization is inferior to that with 4-bit quantization.
Fig. 2. The MSEs of the MLE with different $K$. The MSE of the MLE with known CSI is marked as MLE CSI in the legend, and the Quasi-BLUE lower bound is marked as Q-BLUE. The communication SNR is 6 dB for 4-bit quantization, 3 dB for 2-bit quantization, and 0 dB for binary quantization.

B. Convergence of the Suboptimal Estimators
We then study the convergence of the suboptimal estimators. Figure 3 depicts the MSEs of the suboptimal estimators as a function of the number of iterations. It is shown that the MSEs of the suboptimal estimators converge after two iterations at all considered communication SNRs, whether or not CSI is known.
Fig. 3. The convergence of the suboptimal estimators when $\gamma_s=20\,\mathrm{dB}$ and $N=10$. In the legend, NoCSI indicates the suboptimal estimator with unknown CSI, and CSI stands for the suboptimal estimator with known CSI. The communication SNRs are 3 dB, 6 dB and 9 dB, as marked in the legend.

C. MSE versus the Communication SNR
Figure 4 depicts the MSEs of the estimators with known CSI. Except for the estimator using AF transmission, all estimators use digital communications with a 4-bit uniform quantizer ($M=16$).

To demonstrate the performance gain of the proposed estimators, which jointly optimize demodulation and parameter estimation, two traditional fusion based estimators and an MRC based estimator are simulated. In the fusion based estimators, the FC first demodulates the transmitted data from each sensor, then reconstructs the observation of each sensor from the demodulated symbols following the quantization rule, and finally combines these estimated observations with the BLUE fusion rule to produce the final estimate of $\theta$. When ECCs are applied at the sensors, the receiver at the FC exploits their error detection ability to discard the data that cannot pass the error check. In the MRC based estimator, the FC first combines the received signals from all sensors, then demodulates the transmitted symbols, and finally obtains the estimate of $\theta$ from the detected symbols according to the quantization rule.

Except for the fusion based estimator with ECC, which uses codebook $\mathcal{C}_{tc}$, all estimators use codebook $\mathcal{C}_{tn}$ in this simulation.
Fig. 4. The MSEs of the estimators with known CSI as a function of communication SNR when $N=10$ and $\gamma_s=20\,\mathrm{dB}$. In the legend, "Fusion-CRC" and "Fusion-NoECC" stand for the two fusion based estimators using the codebooks $\mathcal{C}_{tc}$ and $\mathcal{C}_{tn}$, "MRC" stands for the MRC based estimator, "Analog AF" stands for the MLE with AF transmission, "MLE" and "Subopt" denote the MLE and suboptimal estimators, respectively, and "Q-BLUE Bound" and "BLUE Bound" stand for the two lower bounds with ideal communications.
It is shown that the MLE and suboptimal estimators outperform both the MRC based and the fusion based estimators. The MSEs of the MLE and suboptimal estimator approach the Quasi-BLUE lower bound rapidly as the communication SNR increases, whereas the suboptimal estimator degrades slightly at low SNR. The MSE of the MLE using AF transmission is larger than that using digital transmission, since AF transmission is no longer optimal in fading channels.

According to the performance analysis of BPSK modulation in Rayleigh fading channels [30], the BER of the transmission scheme with codebook $\mathcal{C}_{tn}$ is high at low communication SNR. ECC can improve the transmission performance at high communication SNR, but it causes more errors at low SNR. For the transmission schemes using CRC, the BER is even worse because the longer codes reduce the transmission energy per symbol. At such high BERs, the fusion based estimators, especially those with ECCs, do not perform well: most of the demodulated data are dropped by the error check, so the fusion estimators do not have enough information to exploit, which finally leads to worse MSE performance.

The performance of the estimator based on MRC is much worse than that of the proposed estimators, which shows the significant impact of the observation noise.
Fig. 5. The MSEs of the MLEs with different levels of channel information, where $N=10$ and $\gamma_s=20\,\mathrm{dB}$. In the legend, "MLE NoCSI NoTS", "MLE NoCSI TS2" and "MLE NoCSI TS5" stand for the MLE with unknown CSI and different numbers of training symbols. "MLE EstCH" denotes the MLE with known CSI that applies $\hat{h}_i$ as the true value of $h_i$. "MLE CSI (ref.)" is the MLE with known CSI, shown as a reference.
Fig. 6. The MSEs of the MLE and suboptimal estimators with training symbols and with known CSI, where $N=10$ and $\gamma_s=20\,\mathrm{dB}$. In the legend, MLE and Subopt denote the MLE and suboptimal estimators, respectively.
In Fig. 5, the MSEs of the MLEs with unknown CSI are shown. Two MLEs, which use the training symbols differently, are considered: one is the MLE with training symbols as shown in (31), and the other is the estimator in (33), which uses the estimated channel coefficients as their true values. Besides the codebook $\mathcal{C}_{tn}$ without training symbols, we also evaluate the codebooks with 2 and 5 training symbols. It is shown that if $\mathcal{C}_{tn}$ is applied as the codebook of the MLE with unknown CSI, the MLE exhibits a rather high MSE that cannot be improved by increasing the communication SNR. This validates our analysis in Section VI-A that the phase ambiguity of $\mathcal{C}_{tn}$ leads to severe performance degradation of the estimator. When we insert training symbols, the performance of the MLE with unknown CSI improves significantly, but it is still much worse than that of the MLE with known CSI at low communication SNR levels. Interestingly, using more training symbols does not improve the performance of the MLE as one might expect. This is because the energy for transmitting an observation is fixed, so inserting training symbols reduces the energy available for the data symbols. Our simulations show that the best performance is obtained when $L_p=2$. This is consistent with the observation in [31], where the optimal $L_p$ equals $\sqrt{K}$.

To further observe the impact of different levels of CSI on the optimal and suboptimal estimators, Fig. 6 shows the MSEs of the MLE and suboptimal estimators with known CSI and with unknown CSI but using two training symbols. Similar to the case with known CSI, the suboptimal estimator with training symbols is inferior to the MLE at low communication SNR. However, the performance of the suboptimal estimator degrades less than that of the MLE due to channel estimation errors.

D. MSE versus the Number of Sensors
Fig. 7. The MSEs of the estimators with known CSI, where $\gamma_c=6\,\mathrm{dB}$ and $\gamma_s=20\,\mathrm{dB}$. The meaning of the legend is the same as in Fig. 4.

Fig. 8. The MSEs of the estimators with training symbols and with estimated CSI when $\gamma_c=6\,\mathrm{dB}$ and $\gamma_s=20\,\mathrm{dB}$. The legend is the same as in Fig. 5.

Figure 7 and Fig. 8 show the MSEs of the estimators with known CSI and unknown CSI as functions of the number of sensors, $N$. We can see that the MSEs of all the estimators decrease at a rate of $1/N$ for large enough $N$, but they cannot approach the lower bound due to communication errors. Comparing the MSEs of the MLEs, we can see that the results validate our asymptotic performance analysis for the MLEs with both known and unknown CSI in Section VI-B. From Fig. 7, we can observe that the proposed estimators perform much better than the fusion based estimators and the MRC based estimators. This means that networks using the traditional approaches must activate more sensors to achieve the same MSE performance as those using our estimators, leading to lower energy and bandwidth efficiency.

E. Computational Complexity of the Estimators
To evaluate the computational complexity, we record the time consumed by 10,000 Monte-Carlo runs of the proposed estimators with known CSI. Table III shows the computation time in seconds at different communication SNR levels. The step-size of the exhaustive search of the MLE is set to $\Delta/N$. The number of iterations of the suboptimal estimator is set to 2 according to the convergence analysis. It is shown that the computation time of the suboptimal estimator is much less than that of the MLE, and is almost invariant with the communication SNR since the number of iterations is fixed. The computation time consumed by the MLE varies slightly, which comes from the implementation of the truncated exponential function in the simulation code.

TABLE III
THE COMPUTATION TIME IN SECONDS CONSUMED BY SIMULATING THE ESTIMATORS WITH KNOWN CSI, FOR DIFFERENT $\gamma_c$.

VIII. CONCLUSION
In this paper, we studied the decentralized estimation of a deterministic parameter using digital communications over orthogonal multiple-access fading channels with a uniform multiple-bit quantizer. By introducing a general messaging function, the proposed estimators can be applied to digital communication systems using various quantization, coding and modulation schemes, and to analog communication systems such as those using the well-studied AF transmission.

We derived the MLEs with known and unknown CSI. When training symbols are inserted before the data symbols, the MLE with unknown CSI estimates the channels implicitly and exploits the channel estimates in an optimal way. Following the structure of the MLE, we designed a suboptimal estimator that has affordable complexity and converges rapidly. It performs as well as the MLE at high communication SNR and has a minor performance loss at low communication SNR.

Simulation results show that both the MLEs and the suboptimal estimators outperform the traditional MRC based and fusion based estimators, and that the estimators using digital communications outperform those using AF transmission in Rayleigh fading channels. Compared with a WSN using binary quantization for decentralized estimation, a system using multiple-bit quantization has superior energy and bandwidth efficiency. Therefore, even under strict bandwidth constraints, we suggest that WSNs use multiple-bit quantization rather than binary quantization when the observation SNR is relatively high.

REFERENCES

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: A survey," Computer Networks, vol. 38, no. 4, pp. 393–422, Mar. 2002.
[2] J.-J. Xiao, A. Ribeiro, Z.-Q. Luo, and G. B. Giannakis, "Distributed compression-estimation using wireless sensor networks," IEEE Signal Processing Magazine, vol. 23, no. 7, pp. 27–41, July 2006.
[3] X. R. Li, Y. Zhu, J. Wang, and C. Han, "Optimal linear estimation fusion—part I: Unified fusion rules," IEEE Transactions on Information Theory, vol. 49, no. 9, pp. 2192–2208, Sept. 2003.
[4] A. Ribeiro and G. B. Giannakis, "Bandwidth-constrained distributed estimation for wireless sensor networks—part I: Gaussian case," IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 1131–1143, Mar. 2006.
[5] ——, "Bandwidth-constrained distributed estimation for wireless sensor networks—part II: Unknown probability density function," IEEE Transactions on Signal Processing, vol. 54, no. 7, pp. 2784–2796, July 2006.
[6] Z.-Q. Luo, "An isotropic universal decentralized estimation scheme for a bandwidth constrained ad hoc sensor network," IEEE Journal on Selected Areas in Communications, vol. 23, no. 4, pp. 735–744, Apr. 2005.
[7] T. Aysal and K. Barner, "Constrained decentralized estimation over noisy channels for sensor networks," IEEE Transactions on Signal Processing, vol. 56, no. 4, pp. 1398–1410, Apr. 2008.
[8] H. Li and J. Fang, "Distributed adaptive quantization and estimation for wireless sensor networks," IEEE Signal Processing Letters, vol. 14, no. 10, pp. 669–672, Oct. 2007.
[9] J. Fang and H. Li, "Distributed adaptive quantization for wireless sensor networks: From delta modulation to maximum likelihood," IEEE Transactions on Signal Processing, vol. 56, no. 10, pp. 5246–5257, 2008.
[10] W. M. Lam and A. R. Reibman, "Design of quantizers for decentralized estimation systems," IEEE Transactions on Communications, vol. 41, no. 11, pp. 1602–1605, Nov. 1993.
[11] H. C. Papadopoulos, G. W. Wornell, and A. V. Oppenheim, "Sequential signal encoding from noisy measurements using quantizers with dynamic bias control," IEEE Transactions on Information Theory, vol. 47, no. 3, pp. 978–1002, Mar. 2001.
[12] X. Luo and G. B. Giannakis, "Energy-constrained optimal quantization for wireless sensor networks," in First Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2004, pp. 272–278.
[13] J.-J. Xiao and Z.-Q. Luo, "Decentralized estimation in an inhomogeneous sensing environment," IEEE Transactions on Information Theory, vol. 51, no. 10, pp. 3564–3575, Oct. 2005.
[14] P. Venkitasubramaniam, L. Tong, and A. Swami, "Score-function quantization for distributed estimation," in , Mar. 2006, pp. 369–374.
[15] J. Li and G. AlRegib, "Distributed estimation in energy-constrained wireless sensor networks," IEEE Transactions on Signal Processing, vol. 57, no. 10, pp. 3746–3758, 2009.
[16] J.-J. Xiao, S. Cui, Z.-Q. Luo, and A. J. Goldsmith, "Joint estimation in sensor networks under energy constraints," in IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2004, pp. 264–271.
[17] ——, "Power scheduling of universal decentralized estimation in sensor networks," IEEE Transactions on Signal Processing, vol. 54, no. 2, pp. 413–422, Feb. 2006.
[18] M. Gastpar, "To code or not to code," Ph.D. dissertation, Ecole Polytechnique Fédérale de Lausanne (EPFL), Dec. 2002.
[19] M. Gastpar and M. Vetterli, "Source-channel communication in sensor networks," Lecture Notes in Computer Science, vol. 2634, pp. 162–177, 2003.
[20] M. Gastpar, "Uncoded transmission is exactly optimal for a simple Gaussian 'sensor' network," in Proc. 2007 Information Theory and Applications Workshop, Jan. 2007, pp. 5247–5251.
[21] S. Cui, J.-J. Xiao, A. J. Goldsmith, Z.-Q. Luo, and H. V. Poor, "Energy-efficient joint estimation in sensor networks: Analog vs. digital," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), vol. IV, 2005, pp. 745–748.
[22] J.-J. Xiao, Z.-Q. Luo, S. Cui, and A. J. Goldsmith, "Power-efficient analog forwarding transmission in an inhomogeneous Gaussian sensor network," in IEEE 6th Workshop on Signal Processing Advances in Wireless Communications, 2005, pp. 121–125.
[23] J.-J. Xiao and Z.-Q. Luo, "Multiterminal source-channel communication over an orthogonal multiple access channel," IEEE Transactions on Information Theory, vol. 53, no. 9, pp. 3255–3264, Sept. 2007.
[24] S. Cui, J.-J. Xiao, A. J. Goldsmith, Z.-Q. Luo, and H. V. Poor, "Estimation diversity and energy efficiency in distributed sensing," IEEE Transactions on Signal Processing, vol. 55, no. 9, pp. 4683–4695, Sept. 2007.
[25] K. Bai, H. Senol, and C. Tepedelenlioğlu, "Outage scaling laws and diversity for distributed estimation over parallel fading channels," IEEE Transactions on Signal Processing, vol. 57, no. 8, pp. 3182–3192, 2009.
[26] K. Liu, H. El Gamal, and A. M. Sayeed, "On optimal parametric field estimation in sensor networks," in , July 2005, pp. 1170–1175.
[27] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[28] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[29] X. Wang and C. Yang, "Optimal transmission codebook design in fading channels for decentralized estimation in wireless sensor networks," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '09), Apr. 2009, pp. 2293–2296.
[30] J. G. Proakis, Digital Communications, 4th ed. McGraw-Hill, 2001.
[31] M. Wang and C. Yang, "Distributed estimation in wireless sensor networks with imperfect channel estimation," in 9th International Conference on Signal Processing (ICSP '08).