[PDF] A Lower Bound on the Bayesian MSE Based on the Optimal Bias Function

Abstract

A lower bound on the minimum mean-squared error (MSE) in a Bayesian estimation problem is proposed in this paper. This bound utilizes a well-known connection to the deterministic estimation setting. Using the prior distribution, the bias function which minimizes the Cramer-Rao bound can be determined, resulting in a lower bound on the Bayesian MSE. The bound is developed for the general case of a vector parameter with an arbitrary probability distribution, and is shown to be asymptotically tight in both the high and low signal-to-noise ratio regimes. A numerical study demonstrates several cases in which the proposed technique is both simpler to compute and tighter than alternative methods.

Full PDF

aa r X i v : . [ c s . I T ] M a y A Lower Bound on the Bayesian MSEBased on the Optimal Bias Function

Zvika Ben-Haim,

Student Member, IEEE, and Yonina C. Eldar,

Senior Member, IEEE

Abstract — A lower bound on the minimum mean-squarederror (MSE) in a Bayesian estimation problem is proposed inthis paper. This bound utilizes a well-known connection to thedeterministic estimation setting. Using the prior distribution, thebias function which minimizes the Cram´er–Rao bound can bedetermined, resulting in a lower bound on the Bayesian MSE.The bound is developed for the general case of a vector parameterwith an arbitrary probability distribution, and is shown to beasymptotically tight in both the high and low signal-to-noise ratioregimes. A numerical study demonstrates several cases in whichthe proposed technique is both simpler to compute and tighterthan alternative methods.

Index Terms — Bayesian bounds, Bayesian estimation, mini-mum mean-squared error estimation, optimal bias, performancebounds.

I. I

NTRODUCTION

The goal of estimation theory is to infer the value ofan unknown parameter based on observations. A commonapproach to this problem is the Bayesian framework, in whichthe estimate is constructed by combining the measurementswith prior information about the parameter [1]. In this setting,the parameter θ is random, and its distribution describesthe a priori knowledge of the unknown value. In addition,measurements x are obtained, whose conditional distribution,given θ , provides further information about the parameter. Theobjective is to construct an estimator ˆ θ , which is a functionof the measurements, so that ˆ θ is close to θ in some sense. Acommon measure of the quality of an estimator is its mean-squared error (MSE), given by E {k θ − ˆ θ k } .It is well-known that the posterior mean E { θ | x } is thetechnique minimizing the MSE. Thus, from a theoreticalperspective, there is no difﬁculty in ﬁnding the minimumMSE (MMSE) estimator in any given problem. In practice,however, the complexity of computing the posterior meanis often prohibitive. As a result, various alternatives, suchas the maximum a posteriori (MAP) technique, have beendeveloped [2]. The purpose of such methods is to approach theperformance of the MMSE estimator with a computationallyefﬁcient algorithm.An important goal is to quantify the performance degra-dation resulting from the use of these suboptimal techniques.One way to do this is to compare the MSE of the methodused in practice with the MMSE. Unfortunately, computationof the MMSE is itself infeasible in many cases. This has led The authors are with the Department of Electrical Engineering,Technion—Israel Institute of Technology, Haifa 32000, Israel (e-mail:[email protected]; [email protected]). This work was supportedin part by the Israel Science Foundation under Grant no. 1081/07 and by theEuropean Commission in the framework of the FP7 Network of Excellencein Wireless COMmunications NEWCOM++ (contract no. 216715). to a large body of work seeking to ﬁnd simple lower boundson the MMSE in various estimation problems [3]–[12].Generally speaking, previous bounds can be divided intotwo categories. The Weiss–Weinstein family is based on acovariance inequality and includes the Bayesian Cram´er–Raobound [3], the Bobrovski–Zakai bound [8], and the Weiss–Weinstein bound [9], [10]. The Ziv–Zakai family of boundsis based on comparing the estimation problem to a relateddetection scenario. This family includes the Ziv–Zakai bound[4] and its improvements, notably the Bellini–Tartara bound[6], the Chazan–Zakai–Ziv bound [7], and the generalizationof Bell et al. [11]. Recently, Renaux et al. have combined bothapproaches [12].The accuracy of the bounds described above is usuallytested numerically in particular estimation settings. Few ofthe previous results provide any sort of analytical proofof accuracy, even under asymptotic conditions. Bellini andTartara [6] brieﬂy discuss performance of their bound at highsignal-to-noise ratio (SNR), and Bell et al. [11] prove that theirbound converges to the true value at low SNR for a particularfamily of Gaussian-like probability distributions. To the bestof our knowledge, there are no other results concerning theasymptotic performance of Bayesian bounds.A different estimation setting arises when one considers θ as a deterministic unknown parameter. In this case, too,a common goal is to construct an estimator having low MSE.However, the term MSE has a very different meaning in thedeterministic setting, since in this case, the expectation is takenonly over the random variable x . One elementary differencewith far-reaching implications is that in the Bayesian case, theMSE is a single real number, whereas the deterministic MSEis a function of the unknown parameter θ [13]–[15].Many lower bounds have been developed for the determin-istic setting, as well. These include classical results such asthe Cram´er–Rao [16], [17], Hammersley–Chapman–Robbins[18], [19], Bhattacharya [20], and Barankin [21] bounds, aswell as more recent results [22]–[27]. By far the simplest andmost commonly used of these approaches is the Cram´er–Raobound (CRB). Like most other deterministic bounds, the CRBdeals explicitly with unbiased estimators, or, equivalently,with estimators having a speciﬁc, pre-speciﬁed bias function.Two exceptions are the uniform CRB [23], [25] and theminimax linear-bias bound [26], [27]. The CRB is known tobe asymptotically tight in many cases, even though many laterbounds are sharper than it [14], [25], [28].Although the deterministic and Bayesian settings stem fromdifferent points of view, there exist insightful relations betweenthe two approaches. The basis for this connection is the factthat by adding a prior distribution for θ , any deterministic problem can be transformed to a corresponding Bayesian set-ting. Several theorems relate the performance of correspondingBayesian and deterministic scenarios [13]. As a consequence,numerous bounds have both a deterministic and a Bayesianversion [3], [10], [12], [29].The simplicity and asymptotic tightness of the deterministicCRB motivate its use in problems in which θ is random.Such an application was described by Young and Westerberg[5], who considered the case of a scalar θ constrained tothe interval [ θ , θ ] . They used the prior distribution of θ todetermine the optimal bias function for use in the biased CRB,and thus obtained a Bayesian bound. It should be noted thatthis result differs from the Bayesian CRB of Van Trees [3];the two bounds are compared in Section II-C. We refer tothe result of Young and Westerberg as the optimal-bias bound(OBB), since it is based on choosing the bias function whichoptimizes the CRB using the given prior distribution.This paper provides an extension and a deeper analysisof the OBB. Speciﬁcally, we generalize the bound to anarbitrary n -dimensional estimation setting [30]. The boundis determined by ﬁnding the solution to a certain partialdifferential equation. Using tools from functional analysis, wedemonstrate that a unique solution exists for this differentialequation. Under suitable symmetry conditions, it is shown thatthe method can be reduced to the solution of an ordinarydifferential equation and, in some cases, presented in closedform.The mathematical tools employed in this paper are also usedfor characterizing the performance of the OBB. Speciﬁcally, itis demonstrated analytically that the proposed bound is asymp-totically tight for both high and low SNR values. Furthermore,the OBB is compared with several other bounds; in theexamples considered, the OBB is both simpler computationallyand more accurate than all relevant alternatives.The remainder of this paper is organized as follows. In Sec-tion II, we derive the OBB for a vector parameter. Section IIIdiscusses some mathematical concepts required to ensure theexistence of the OBB. In Section IV, a practical technique forcalculating the bound is developed using variational calculus.In Section V, we demonstrate some properties of the OBB,including its asymptotic tightness. Finally, in Section VI, wecompare the performance of the bound with that of otherrelevant techniques.II. T HE O PTIMAL -B IAS B OUND

In this section, we derive the OBB for the general vectorcase. To this end, we ﬁrst examine the relation between theBayesian and deterministic estimation settings (Section II-A).Next, we focus on the deterministic case and review the basicproperties of the CRB (Section II-B). Finally, the OBB isderived from the CRB (Section II-C).The focus of this paper is the Bayesian estimation prob-lem, but the bound we propose stems from the theory ofdeterministic estimation. To avoid confusion, we will indicatethat a particular quantity refers to the deterministic settingby appending the symbol ; θ to it. For example, the notation E {·} denotes expectation over both θ and x , i.e., expectation in the Bayesian sense, while expectation solely over x (inthe deterministic setting) is denoted by E {· ; θ } . The notation E {· | θ } indicates Bayesian expectation conditioned on θ .Some further notation used throughout the paper is as fol-lows. Lowercase boldface letters signify vectors and uppercaseboldface letters indicate matrices. The i th component of avector v is denoted v i , while v (1) , v (2) , . . . signiﬁes a sequenceof vectors. The derivative ∂f /∂ v of a function f ( v ) is avector function whose i th element is ∂f /∂v i . Similarly, givena vector function b ( θ ) , the derivative ∂ b /∂ θ is deﬁned as thematrix function whose ( i, j ) th entry is ∂b i /∂θ j . The squaredEuclidean norm v T v of a vector v is denoted k v k , whilethe squared Frobenius norm Tr(

M M T ) of a matrix M isdenoted k M k F . In Section III, we will also deﬁne somefunctional norms, which will be of use later in the paper. A. The Bayesian–Deterministic Connection

We now review a fundamental relation between the Bayes-ian and deterministic estimation settings. Let θ be an unknownrandom vector in R n and let x be a measurement vector.The joint probability density function (pdf) of θ and x is p x , θ ( x , θ ) = p x | θ ( x | θ ) p θ ( θ ) , where p θ is the prior distri-bution of θ and p x | θ is the conditional distribution of x given θ . For later use, deﬁne the set Θ of feasible parameter valuesby Θ = { θ ∈ R n : p θ ( θ ) > } . (1)Suppose ˆ θ = ˆ θ ( x ) is an estimator of θ . Its (Bayesian) MSEis given by MSE = E n k ˆ θ − θ k o = Z k ˆ θ − θ k p x , θ ( x , θ ) d x d θ . (2)By the law of total expectation, we have MSE =

Z (cid:18)Z k ˆ θ − θ k p x | θ ( x | θ ) d x (cid:19) p θ ( θ ) d θ = E n E n k ˆ θ − θ k (cid:12)(cid:12)(cid:12) θ oo . (3)Now consider a deterministic estimation setting, i.e., sup-pose θ is a deterministic unknown which is to be estimatedfrom random measurements x . Let the distribution p x ; θ of x (as a function of θ ) be given by p x ; θ ( x ; θ ) = p x | θ ( x | θ ) ,i.e., the distribution of x in the deterministic case equalsthe conditional distribution in the corresponding Bayesianproblem.The estimator ˆ θ deﬁned above is simply a function of themeasurements, and can therefore be applied in the determin-istic case as well. Its deterministic MSE is given by E n k ˆ θ − θ k ; θ o = Z k ˆ θ − θ k p x ; θ ( x ; θ ) d x (4)Since p x ; θ ( x ; θ ) = p x | θ ( x | θ ) , we have E n k ˆ θ − θ k ; θ o = E n k ˆ θ − θ k (cid:12)(cid:12)(cid:12) θ o . (5)Combining this fact with (3), we ﬁnd that the BayesianMSE equals the expectation of the MSE of the correspondingdeterministic problem, i.e. E n k ˆ θ − θ k o = E n E n k ˆ θ − θ k ; θ oo . (6)This relation will be used to construct the OBB in Section II-C. B. The Deterministic Cram´er–Rao Bound

Before developing the OBB, we review some basic results inthe deterministic estimation setting. Suppose θ is a determinis-tic parameter vector and let x be a measurement vector havingpdf p x ; θ ( x ; θ ) . Denote by Θ ⊆ R n the set of all possiblevalues of θ . We assume for technical reasons that Θ is anopen set. Let ˆ θ be an estimator of θ from the measurements x . Werequire the following regularity conditions to ensure that theCRB holds [31, § p x ; θ ( x ; θ ) is continuously differentiable with respect to θ . This condition is required to ensure the existence ofthe Fisher information.2) The Fisher information matrix J ( θ ) , deﬁned by [ J ( θ )] ij = E (cid:26) ∂ log p x ; θ ∂θ i ∂ log p x ; θ ∂θ j ; θ (cid:27) (7)is bounded and positive deﬁnite for all θ ∈ Θ . Thisensures that the measurements contain data about theunknown parameter.3) Exchanging the integral and derivative in the equation Z t ( x ) ∂∂θ i p x ; θ ( x ; θ ) d x = ∂∂θ i Z t ( x ) p x ; θ ( x ; θ ) d x (8)is justiﬁed for any measurable function t ( x ) , in the sensethat, if one side exists, then the other exists and the twosides are equal. A sufﬁcient condition for this to hold isthat the support of p x ; θ does not depend on θ .4) All estimators ˆ θ are Borel measurable functions whichsatisfy (cid:13)(cid:13)(cid:13)(cid:13) ∂p x ; θ ∂ θ ˆ θ T (cid:13)(cid:13)(cid:13)(cid:13) F ≤ g ( x ) for all θ (9)for some integrable function g ( x ) . This technical re-quirement is needed in order to exclude certain patholog-ical estimators whose statistical behavior is insufﬁcientlysmooth to allow the application of the CRB.The bias of an estimator ˆ θ is deﬁned as b ( θ ) = E n ˆ θ ; θ o − θ . (10)Under the above assumptions, it can be shown that the biasof any estimator is continuously differentiable [5, Lemma 2].Furthermore, under these assumptions, the CRB holds, andthus, for any estimator having bias b ( θ ) , we have E n k θ − ˆ θ k ; θ o ≥ CRB[ b , θ ] , Tr "(cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T + k b ( θ ) k . (11)A more common form of the CRB is obtained by restrictingattention to unbiased estimators (i.e., techniques for which This is required in order to ensure that one can discuss differentiabilityof p x ; θ with respect to θ at any point θ ∈ Θ . In the Bayesian setting towhich we will return in Section II-C, Θ is deﬁned by (1); in this case, addinga boundary to Θ essentially leaves the setting unchanged, as long as theprior probability for θ to be on the boundary of Θ is zero. Therefore, thisrequirement is of little practical relevance. b ( θ ) = ). Under the unbiasedness assumption, the boundsimpliﬁes to MSE ≥ Tr( J − ( θ )) . However, in the sequel wewill make use of the general form (11). C. A Bayesian Bound from the CRB

The OBB of Young and Westerberg [5] is based on apply-ing the Bayesian–deterministic connection described in Sec-tion II-A to the deterministic CRB (11). Speciﬁcally, returningnow to the Bayesian setting, one can combine (6) and (11) toobtain that, for any estimator ˆ θ with bias function b ( θ ) , E n k θ − ˆ θ k o ≥ Z [ b ] , Z Θ CRB[ b , θ ] p θ ( d θ ) (12)where the expectation is now performed over both θ and x .Note that (12) describes the Bayesian MSE as a function ofa deterministic property (the bias) of ˆ θ . Since any estimatorhas some bias function, and since all bias functions arecontinuously differentiable in our setting, minimizing Z [ b ] over all continuously differentiable functions b yields a lowerbound on the MSE of any Bayesian estimator. Thus, underthe regularity conditions of Section II-B, a lower bound onthe Bayesian MSE is given by s = inf b ∈ C Z Θ " k b ( θ ) k +Tr (cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T ! p θ ( d θ ) (13)where C is the space of continuously differentiable functions f : Θ → R n .Note that the OBB differs from the Bayesian CRB ofVan Trees [3]. Van Trees’ result is based on applying theCauchy–Schwarz inequality to the joint pdf p x , θ , whereas thedeterministic CRB is based on applying a similar procedureto p x ; θ . As a consequence, the regularity conditions requiredfor the Bayesian CRB are stricter, requiring that p x , θ be twicedifferentiable with respect to θ . By contrast, the OBB requiresdifferentiability only of the conditional pdf p x | θ . An examplein which this difference is important is the case in which theprior distribution p θ is discontinuous, e.g., when p θ is uniform.The performance of the OBB in this setting will be examinedin Section VI.In the next section, we will see that it is advantageous toperform the minimization (13) over a somewhat modiﬁed classof functions. This will allow us to prove the unique existenceof a solution to the optimization problem, a result which willbe of use when examining the properties of the bound later inthe paper. III. M ATHEMATICAL S AFEGUARDS

In the previous section, we saw that a lower bound on theMMSE can be obtained by solving the minimization problem(13). However, at this point, we have no guarantee that thesolution s of (13) is anywhere near the true value of theMMSE. Indeed, at ﬁrst sight, it may appear that s = 0 forany estimation setting. To see this, note that Z [ b ] is a sumof two components, a bias gradient part and a squared bias Fig. 1. A sequence of continuous functions for which both | b ( θ ) | and | b ′ ( θ ) | tend to zero for almost every value of θ . part. Both parts are nonnegative, but the former is zero whenthe bias gradient is − I , while the latter is zero when thebias is zero. No differentiable function b satisﬁes these twoconstraints simultaneously for all θ , since if the squared bias iseverywhere zero, then the bias gradient is also zero. However,it is possible to construct a sequence of functions b ( i ) forwhich both the bias gradient and the squared bias norm tendto zero for almost every value of θ . An example of such a se-quence in a one-dimensional setting is plotted in Fig. 1. Here,a sequence b ( i ) of smooth, periodic functions is presented. Thefunction period tends to zero, and the percentage of the cyclein which the derivative equals − increases as i increases.Thus, the pointwise limit of the function sequence is zeroalmost everywhere, and the pointwise limit of the derivativeis − almost everywhere.In the speciﬁc case shown in Fig. 1, it can be shown that thevalue of Z [ b ( i ) ] does not tend to zero; in fact, Z [ b ( i ) ] tendsto inﬁnity in this situation. However, our example illustratesthat care must be taken when applying concepts from ﬁnite-dimensional optimization problems to variational calculus.The purpose of this section is to show that s > , so thatthe bound is meaningful, for any problem setting satisfying theregularity conditions of Section II-B. (This question was notaddressed by Young and Westerberg [5].) While doing so, wedevelop some abstract concepts which will also be used whenanalyzing the asymptotic properties of the OBB in Section V.As often happens with variational problems, it turns outthat the minimum of (13) is not necessarily achieved by anycontinuously differentiable function. In order to guarantee anachievable minimum, one must instead minimize (13) over aslightly modiﬁed space, which is deﬁned below. As explainedin Section II-B, all bias functions are continuously differen-tiable, so that the minimizing function ultimately obtained, ifit is not differentiable, will not be the bias of any estimator.However, as we will see, the minimum value of our newoptimization problem is identical to the inﬁmum of (13). Furthermore, this approach allows us to demonstrate severalimportant theoretical properties of the OBB.Let L be the space of p θ -measurable functions b : Θ → R n such that Z Θ k b ( θ ) k p θ ( d θ ) < ∞ . (14)Deﬁne the associated inner product D b (1) , b (2) E L , n X i =1 Z Θ b (1) i ( θ ) b (2) i ( θ ) p θ ( d θ ) (15)and the corresponding norm k b k L , h b , b i L . Any function b ∈ L has a derivative in the distributional sense, but thisderivative might not be a function. For example, discontinuousfunctions have distributional derivatives which contain a Diracdelta. If, for every i , the distributional derivative ∂b i /∂ θ of b is a function in L , then b is said to be weakly differentiable[32], and its weak derivative is the matrix function ∂ b /∂ θ .Roughly speaking, a function is weakly differentiable if it iscontinuous and its derivative exists almost everywhere.The space of all weakly differentiable functions in L iscalled the ﬁrst-order Sobolev space [32], and is denoted H .Deﬁne an inner product on H as D b (1) , b (2) E H , D b (1) , b (2) E L + n X j =1 * ∂b (1) j ∂ θ , ∂b (2) j ∂ θ + L . (16)The associated norm is k b k H , h b , b i H . An importantproperty which will be used extensively in our analysis is that H is a Hilbert space.Note that since Θ is an open set, not all functions in C arein H . For example, in the case Θ = R n , the function b ( θ ) = k , for some nonzero constant k , is continuously differentiablebut not integrable. Thus b is in C but not in H , nor evenin L . However, any measurable function which is not in H has k b k H = ∞ , meaning that either b or ∂ b /∂ θ has inﬁnite L norm. Consequently, either the bias norm part or the biasgradient part of Z [ b ] is inﬁnite. It follows that performing theminimization (13) over C ∩ H , rather than over C , doesnot change the minimum value. On the other hand, C ∩ H isdense in H , and Z [ b ] is continuous, so that minimizing (13)over H rather than C ∩ H also does not alter the minimum.Consequently, we will henceforth consider the problem s = inf b ∈ H Z [ b ] . (17)The advantage of including weakly differentiable functionsin the minimization is that a unique minimizer can now beguaranteed, as demonstrated by the following result. Proposition 1:

Consider the problem ¯ b = arg min b ∈ H Z [ b ] (18)where Z [ b ] is given by (12) and J ( θ ) is positive deﬁniteand bounded with probability 1. This problem is well-deﬁned,i.e., there exists a unique ¯ b ∈ H which minimizes Z [ b ] .Furthermore, the minimum value s = Z [¯ b ] is ﬁnite andnonzero.Proving the unique existence of a minimizer for (17) is atechnical exercise in functional analysis which can be found in Appendix II. However, once the existence of such a minimizeris demonstrated, it is not difﬁcult to see that < s < ∞ . Tosee that s < ∞ , we must ﬁnd a function b for which Z [ b ] < ∞ . One such function is b = , for which Z [ b ] is ﬁnite since J ( θ ) is bounded. Now suppose by contradiction that s = 0 ,which implies that there exists a function ¯ b ∈ H such that Z [¯ b ] = 0 . Therefore, both the bias gradient and the squaredbias parts of Z [¯ b ] are zero. In particular, since the squared biaspart equals zero, we have k ¯ b k L = 0 . Hence, ¯ b = , because L is a normed space. But then, by the deﬁnition (12) of Z [ · ] , Z [¯ b ] = Z Θ Tr( J − ( θ )) p θ ( d θ ) (19)which is positive; this is a contradiction.Note that functions in H are deﬁned up to changes on a sethaving zero measure. In particular, the fact that b (0) is uniquedoes not preclude functions which are identical to b (0) almosteverywhere (which obviously have the same value Z [ b ] ).Summarizing the discussion of the last two sections, wehave the following theorem. Theorem 1:

Let θ be an unknown random vector with pdf p θ ( θ ) > over the open set Θ ⊆ R n , and let x be ameasurement vector whose pdf, conditioned on θ , is given by p x | θ ( x | θ ) . Assume the regularity conditions of Section II-Bhold. Then, for any estimator ˆ θ , E n k θ − ˆ θ k o ≥ min b ∈ H Z Θ CRB[ b , θ ] p θ ( θ ) d θ . (20)The minimum in (20) is nonzero and ﬁnite. Furthermore, thisminimum is achieved by a function ¯ b ∈ H , which is uniqueup to changes having zero probability.Two remarks are in order concerning Theorem 1. First,the function b solving (20) might not be the bias of anyestimator; indeed, under our assumptions, all bias functions arecontinuously differentiable, whereas b need only be weaklydifferentiable. Nevertheless, (20) is still a lower bound onthe MMSE. Another important observation is that Theorem 1arises from the deterministic CRB; hence, there are no require-ments on the prior distribution p θ ( θ ) . In particular, p θ ( θ ) canbe discontinuous or have bounded support. By contrast, manyprevious Bayesian bounds do not apply in such circumstances.IV. C ALCULATING THE B OUND

In ﬁnite-dimensional convex optimization problems, therequirement of a vanishing ﬁrst derivative results in a setof equations, whose solution is the global minimum. Analo-gously, in the case of convex functional optimization problemssuch as (20), the optimum is given by the solution of a set ofdifferential equations. The following theorem, whose proof canbe found in Appendix III, speciﬁes the differential equationrelevant to our optimization problem.In this section and in the remainder of the paper, we willconsider the case in which the set

Θ = { θ : p θ ( θ ) > } isbounded. From a practical point of view, even when Θ consistsof the entire set R n , it can be approximated by a bounded setcontaining only those values of θ for which p θ ( θ ) > ǫ . Theorem 2:

Under the conditions of Theorem 1, suppose Θ is a bounded subset of R n with a smooth boundary Λ . Then, the optimal b ( θ ) of (20) is given by the solution to the systemof partial differential equations p θ ( θ ) b i ( θ ) = p θ ( θ ) X j,k ∂ b i ∂θ j ∂θ k ( J − ) jk + X j,k (cid:18) δ ik + ∂b i ∂θ k (cid:19) (cid:18) ( J − ) jk ∂p θ ∂θ j + p θ ( θ ) ∂ ( J − ) jk ∂θ j (cid:19) (21)for i = 1 , . . . n , within the range θ ∈ Θ , which satisﬁes theNeumann boundary condition (cid:18) I + ∂ b ∂ θ (cid:19) J − ν ( θ ) = (22)for all points θ ∈ Λ . Here, ν ( θ ) is a normal to the boundaryat θ . All derivatives in this system of equations are to beinterpreted in the weak sense.Note that Theorem 1 guarantees the existence of a uniquesolution in H to the differential equation (21) with theboundary conditions (22).The bound of Young and Westerberg [5] is a special caseof Theorem 2, and is given here for completeness. Corollary 1:

Under the settings of Theorem 1, suppose

Θ = ( θ , θ ) is a bounded interval in R . Then, the biasfunction b ( θ ) minimizing (20) is a solution to the second-orderordinary differential equation J ( θ ) b ( θ ) = b ′′ ( θ ) + (1 + b ′ ( θ )) (cid:18) d log p θ dθ − d log Jdθ (cid:19) (23)within the range θ ∈ Θ , subject to the boundary conditions b ′ ( θ ) = b ′ ( θ ) = − .Theorem 2 can be solved numerically, thus obtaining abound for any problem satisfying the regularity conditions.However, directly solving (21) becomes increasingly complexas the dimension of the problem increases. Instead, in manycases, symmetry relations in the problem can be used tosimplify the solution. As an example, the following sphericallysymmetric case can be reduced to a one-dimensional settingequivalent to that of Corollary 1. The proof of this theoremcan be found in Appendix IV. Theorem 3:

Under the setting of Theorem 1, suppose that

Θ = { θ : k θ k < r } is a sphere centered on the origin, p θ ( θ ) = q ( k θ k ) is spherically symmetric, and J ( θ ) = J ( k θ k ) I , where J : R → R is a scalar function. Then, the optimal-bias bound(20) is given by E n k θ − ˆ θ k o ≥ π n/ Γ( n/ Z r " b ( ρ ) + (1 + b ′ ( ρ )) J ( ρ )+ n − J ( ρ ) (cid:18) b ( ρ ) ρ (cid:19) q ( ρ ) ρ n − dρ. (24)Here, Γ( · ) is the Gamma function, and b ( ρ ) is a solution tothe ODE J ( θ ) b ( θ ) = b ′′ ( θ ) + ( n − (cid:18) b ′ ( θ ) θ − b ( θ ) θ (cid:19) + (1 + b ′ ( θ )) (cid:18) d log qdθ − d log Jdθ (cid:19) (25) subject to the boundary conditions b (0) = 0 , b ′ ( r ) = − . Thebias function for which the bound is achieved is given by b ( θ ) = b ( k θ k ) θ k θ k . (26)In this theorem, the requirement J ( θ ) = J ( k θ k ) I indicatesthat the Fisher information matrix is diagonal and that itscomponents are spherically symmetric. Parameters having adiagonal matrix J are sometimes referred to as orthogonal .The simplest case of orthogonality occurs when, to eachparameter θ i , there corresponds a measurement x i , in sucha way that the random variables x i | θ are independent. Otherorthogonal scenarios can often be constructed by an appropri-ate parametrization [33].The requirement that J have spherically symmetric compo-nents occurs, for example, in location problems, i.e., situationsin which the measurements have the form x = θ + w , where w is additive noise which is independent of θ . Indeed, undersuch conditions, J is constant in θ [31, § θ are correlated;thus, the MMSE in this situation is lower than the sum ofthe components’ MMSE. An example of such a setting ispresented in Section VI.V. P ROPERTIES

In this section, we examine several properties of the OBB.We ﬁrst demonstrate that the optimal bias function has zeromean, a property which also characterizes the bias function ofthe MMSE estimator. Next, we prove that, under very generalconditions, the resulting bound is tight at both low and highSNR values. This is an important result, since a desirableproperty of a Bayesian bound is that it provides an accurateestimate of the ambiguity region between high and low SNR[11]. Reliable estimation at the two extremes increases thelikelihood that the transition between these two regimes willbe correctly identiﬁed.

A. Optimal Bias Has Zero Mean

In any Bayesian estimation problem, the bias of the MMSEestimator ˆ θ opt = E { θ | x } has zero mean: E n ˆ θ opt o = E { E { θ | x }} = E { θ } (27)so that E n b (ˆ θ opt ) o = E { E { θ | x } − θ } = . (28)Thus, it is interesting to ask whether the optimal bias whichminimizes (20) also has zero mean. This is indeed the case,as shown by the following theorem. Theorem 4:

Let b ( θ ) be the solution to (20). Then, E { b ( θ ) } = . (29) Proof:

Assume by contradiction that b ( θ ) has nonzeromean E { b ( θ ) } = µ = . Deﬁne b ( θ ) , b ( θ ) − µ . From (11), we then have CRB[ b , θ ] − CRB[ b , θ ] = k b ( θ ) k − k b ( θ ) k = k µ k − µ T b ( θ ) . (30)Using the functional Z [ · ] deﬁned in (12), we obtain Z [ b ] − Z [ b ] = E (cid:8) k µ k − µ T b ( θ ) (cid:9) = k µ k − µ T E { b ( θ ) } = −k µ k < . (31)Thus Z [ b ] < Z [ b ] , contradicting the fact that b ( θ ) minimizes(20). B. Tightness at Low SNR

Bell et al. [11] examined the performance of the extendedZiv–Zakai bound at low SNR and demonstrated that, fora particular family of distributions, the extended Ziv–Zakaibound achieves the MSE of the optimal estimator as the SNRtends to . We now examine the low-SNR performance of theOBB, and demonstrate tightness for a much wider range ofproblem settings.Bell et al. did not deﬁne the general meaning of a low SNRvalue, and only stated that “[a]s observation time and/or SNRbecome very small, the observations become useless . . . [and]the minimum MSE estimator converges to the a priori mean.”This statement clearly does not apply to all estimation prob-lems, since it is not always clear what parameter correspondsto the observation time or the SNR. We propose to deﬁnethe zero SNR case more generally as any situation in which J ( θ ) = with probability 1. This deﬁnition implies that themeasurements do not contain information about the unknownparameter, which is the usual informal meaning of zero SNR.In the case J ( θ ) = , it can be shown that the MMSEestimator is the prior mean, so that our deﬁnition implies thestatement of Bell et al. The OBB is inapplicable when J ( θ ) = , since the CRBis based on the assumption that J ( θ ) is positive deﬁnite. Toavoid this singularity, we consider a sequence of estimationsettings which converge to zero SNR. More speciﬁcally, werequire all eigenvalues of J ( θ ) to decrease monotonically tozero for p θ -almost all θ . The following theorem, the proof ofwhich can be found in Appendix V, demonstrates the tightnessof the OBB in this low-SNR setting. Theorem 5:

Let θ be a random vector whose pdf p θ ( θ ) isnonzero over an open set Θ ⊆ R n . Let x (1) , x (2) , . . . be a se-quence of observation vectors having ﬁnite Fisher informationmatrices J (1) ( θ ) , J (2) ( θ ) , . . . , respectively. Suppose that, forall N , the matrix J ( N ) ( θ ) is positive deﬁnite for p θ -almost all θ , and that all eigenvalues of J ( N ) ( θ ) decrease monotonicallyto zero as N → ∞ for p θ -almost all θ . Let β N denote theoptimal-bias bound for estimating θ from x ( N ) . Then, lim N →∞ β N = E n k θ − E { θ }k o . (32) C. Tightness at High SNR

We now examine the performance of the OBB for highSNR values. To formally deﬁne the high SNR regime, weconsider a sequence of measurements x (1) , x (2) , . . . of a singleparameter vector θ . It is assumed that, when conditioned on θ , all measurements x ( i ) are identically and independentlydistributed (IID). Furthermore, we assume that the Fisher in-formation matrix of a single observation J ( θ ) is well-deﬁned,positive deﬁnite and ﬁnite for p θ -almost all θ . We considerthe problem of estimating θ from the set of measurements { x (1) , . . . , x ( N ) } , for a given value of N . The high SNRregime is obtained when N is large.When N tends to inﬁnity, the MSE of the optimal estimatortends to zero. An important question, however, concerns therate of convergence of the minimum MSE. More precisely,given the optimal estimator ˆ θ ( N ) of θ from { x (1) , . . . , x ( N ) } ,one would like to determine the asymptotic distribution of √ N (ˆ θ ( N ) − θ ) , conditioned on θ . A fundamental result ofasymptotic estimation theory can be loosely stated as follows[28, § III.3], [13, § √ N (ˆ θ ( N ) − θ ) ,conditioned on θ , does not depend on the prior distribution p θ ; rather, √ N (ˆ θ ( N ) − θ ) | θ converges in distribution toa Gaussian random vector with mean zero and covariance J − ( θ ) . It follows that lim N →∞ N E n k ˆ θ ( N ) − θ k o = E (cid:8) Tr[ J − ( θ )] (cid:9) . (33)Since the minimum MSE tends to zero at high SNR,any lower bound on the minimum MSE must also tend tozero as N → ∞ . However, one would further expect agood lower bound to follow the behavior of (33). In otherwords, if β N represents the lower bound for estimating θ from { x (1) , . . . , x ( N ) } , a desirable property is N β N → E (cid:8) Tr[ J − ( θ )] (cid:9) . The following theorem, whose proof isfound in Appendix V, demonstrates that this is indeed thecase for the OBB.Except for a very brief treatment by Bellini and Tartara[6], no previous Bayesian bound has shown such a result.Although it appears that the Ziv–Zakai and Weiss–Weinsteinbounds may also satisfy this property, this has not been provenformally. It is also known that the Bayesian CRB is not asymptotically tight in this sense [34, Eqs. (37)–(39)]. Theorem 6:

Let θ be a random vector whose pdf p θ ( θ ) is nonzero over an open set Θ ⊆ R n . Let x (1) , x (2) , . . . be asequence of measurement vectors, such that x (1) | θ , x (2) | θ , . . . are IID. Let J ( θ ) be the Fisher information matrix forestimating θ from x (1) , and suppose J ( θ ) is ﬁnite and positivedeﬁnite for p θ -almost all θ . Let β N be the optimal-biasbound (20) for estimating θ from the observation sequence { x (1) , . . . , x ( N ) } . Then, lim N →∞ N β N = E (cid:8) Tr( J − ( θ )) (cid:9) . (34)Note that for Theorem 6 to hold, we require only that J ( θ ) be ﬁnite and positive deﬁnite. By contrast, the varioustheorems guaranteeing asymptotic efﬁciency of Bayesian esti-mators all require substantially stronger regularity conditions [28, § III.3], [13, § ˆ θ conditioned on eachpossible value of θ , and is thus a stronger result than theasymptotic Bayesian MSE of (33).VI. E XAMPLE : U

NIFORM P RIOR

The original bound of Young and Westerberg [5] predatesmost Bayesian bounds, and, surprisingly, it has never beencited by or compared with later results. In this section, wemeasure the performance of the original bound and of itsextension to the vector case against that of various othertechniques. We consider the case in which θ is uniformlydistributed over an n -dimensional open ball Θ = { θ : k θ k r , the integration canbe limited to the range [0 , r ] . Thus, the extended Ziv–Zakaibound is given by E n k θ − ˆ θ k o ≥ Z r n V Cn ( r, h ) V n ( r ) Q (cid:18) h σ (cid:19) h dh. (48)We now compute the Weiss–Weinstein bound for the settingat hand. This bound is given by E n k θ − ˆ θ k o ≥ Tr( HG − H T ) (49) where H = [ h , . . . , h m ] is a matrix containing an arbitrarynumber m of test vectors and G is a matrix whose elementsare given by G ij = E { r ( x , θ ; h i , s i ) r ( x , θ ; h j , s j ) } E { L s i ( x ; θ + h i , θ ) } E { L s j ( x ; θ + h j , θ ) } (50)in which r ( x , θ ; h i , s i ) , L s i ( x ; θ + h i , θ ) − L − s i ( x ; θ − h i , θ ) (51)and L ( x ; θ , θ ) , p θ ( θ ) p x | θ ( x | θ ) p θ ( θ ) p x | θ ( x | θ ) . (52)The vectors h , . . . , h m and the scalars s , . . . , s m are arbi-trary, and can be optimized to maximize the bound (49). Toavoid a multidimensional nonconvex optimization problem, werestrict attention to m = n , h i = h e i , and s i = 1 / , assuggested by [10]. This results in a dependency on a singlescalar parameter h .Under these conditions, G ij can be written as G ij = 1 M ( h i ) M ( h j ) (cid:2) ˜ M ( h i − h j , − h j ) + ˜ M ( h i − h j , h i ) − ˜ M ( h i + h j , h j ) − ˜ M ( h i + h j , h i ) (cid:3) (53)where M ( h ) , E n L / ( x ; θ + h , θ ) o (54)and ˜ M ( h , h ) , E n L / ( x ; θ + h , θ ) Θ+ h o . (55)Note that we have used the corrected version of the Weiss–Weinstein bound [37]. Substituting the probability distributionof x and θ into the deﬁnitions of M ( h ) and ˜ M ( h , h ) , wehave M ( h ) = E n e −k θ + h − x k / σ e k θ − x k / σ Θ+ h o = V Cn ( r, k h k ) V n ( r ) e −k h k / σ (56)and, similarly, ˜ M ( h , h ) = e −k h k / σ V n ( r ) Z Θ Θ+ h Θ+ h d θ . (57)Thus, M ( h ) is a function only of k h k , and ˜ M ( h , h ) is afunction only of k h k , k h k , and k h − h k . Since h i = h e i ,it follows that, for i = j , the numerator of (53) vanishes. Thus, G is a diagonal matrix, whose diagonal elements equal G ii = 2 ˜ M (0 , h e ) − ˜ M (2 h e , h e ) M ( h e ) . (58)The Weiss–Weinstein bound is given by substituting this resultinto (49) and maximizing over h , i.e., E n k θ − ˆ θ k o ≥ max h ∈ [0 , r ] nh M ( h e )2[ ˜ M (0 , h e ) − ˜ M (2 h e , h e )] . (59)The value of h yielding the tightest bound can be determinedby performing a grid search. −20 −10 0 10 200.050.10.150.20.250.3 SNR (dB) M SE Actual MMSEOptimal−bias boundWeinstein−WeissBellini−Tartara (a) −20 −10 0 10 200.750.80.850.90.951 SNR (dB) R a t i o be t w een bound and a c t ua l MM SE Optimal−bias boundWeiss−WeinsteinBellini−Tartara (b)Fig. 2. Comparison of the MSE bounds and the minimum achievable MSE in a one-dimensional setting for which θ ∼ U [ − r, r ] and x | θ ∼ N ( θ, σ ) . To compare the OBB with the alternative approaches de-veloped above, we ﬁrst consider the one-dimensional case inwhich θ is uniformly distributed in the range Θ = ( − r, r ) . Let x = θ + w be a single noisy observation, where w is zero-mean Gaussian noise, independent of θ , with variance σ . Wewish to bound the MSE of an estimator of θ from x .The optimal bias function is given by (39). Using the factthat I / ( t ) = p /π sinh( t ) / √ t , we obtain b ( θ ) = − σ sinh( θ/σ )cosh( r/σ ) (60)which also follows [5] from Corollary 1. Substituting thisexpression into (20), we have that, for any estimator ˆ θ , E n ( θ − ˆ θ ) o ≥ σ (cid:18) − tanh( r/σ ) r/σ (cid:19) . (61)Apart from the reduction in computational complexity, thesimplicity of (61) also emphasizes several features of theestimation problem. First, the dependence of the problemon the dimensionless quantity r/σ , rather than on r and σ separately, is clear. This is to be expected, as a change in unitsof measurement would multiply both r and σ by a constant.Second, the asymptotic properties demonstrated in Theorems5 and 6 can be easily veriﬁed. For r ≫ σ , the bound convergesto the noise variance σ , corresponding to an uninformativeprior whose optimal estimator is ˆ θ = x ; whereas, for σ ≫ r ,a Taylor expansion of tanh( z ) /z immediately shows thatthe bound converges to r / , corresponding to the case ofuninformative measurements, where the optimal estimator is ˆ θ = 0 . Thus, the bound (61) is tight both for very low and forvery high SNR, as expected.In the one-dimensional case, we have V ( r ) = 2 r and V C ( r, h ) = max(2 r − h, , so that the extended Ziv–Zakaibound (48) and the Weiss–Weinstein bound (59) can also besimpliﬁed somewhat. In particular, the extended Ziv–Zakai bound (48) can be written as E n k θ − ˆ θ k o ≥ Z r (cid:18) − h r (cid:19) hQ (cid:18) h σ (cid:19) dh. (62)Using integration by parts, (62) becomes E n k θ − ˆ θ k o ≥ r Q (cid:16) rσ (cid:17) + σ (cid:20) Γ / (cid:18) r σ (cid:19) − √ π σr Γ (cid:18) r σ (cid:19)(cid:21) (63)where Γ a ( z ) = (1 / Γ( a )) R z e − t t a − dt is the incompleteGamma function. Like the expression (61) for the OBB, thisbound can be shown to converge to the noise variance σ when r ≫ σ and to the prior variance r / when σ ≫ r . However,while the convergence of the OBB to these asymptotic valueshas been demonstrated in general in Theorems 5 and 6, theasymptotic tightness of the Ziv–Zakai bound in the generalcase remains an open question.The Weiss–Weinstein bound (59) can likewise be simpliﬁedfurther in the one-dimensional case, yielding E n k θ − ˆ θ k o ≥ max h ∈ [0 , r ] h e − h / σ (cid:0) − h r (cid:1) (cid:0) − h r − max (cid:0) , − hr (cid:1) e − h / σ (cid:1) . (64)However, calculating this bound still requires a numericalsearch for the optimal value of h .These bounds are compared with the exact value of theMMSE in Fig. 2. In this ﬁgure, the SNR is deﬁned as SNR(dB) = 10 log (cid:18) Var( θ )Var( w ) (cid:19) = 10 log (cid:18) r σ (cid:19) . (65)The MMSE was computed by Monte Carlo approximation ofthe error of the optimal estimator E { θ | x } , which was itselfcomputed by numerical integration. Fig. 2(a) plots the MMSEand the values obtained by the aforementioned bounds, while −20 −15 −10 −5 0 5 10 15 200.10.20.30.40.50.6 SNR (dB) M SE Actual MSEOptimal−bias boundWeinstein−WeissExtended Ziv−Zakai (a) −20 −15 −10 −5 0 5 10 150.70.750.80.850.90.951 SNR (dB) R a t i o be t w een bound and op t i m a l M SE Optimal bias boundWeiss−WeinsteinExtended Ziv−Zakai (b)Fig. 3. Comparison of the MSE bounds and the minimum achievable MSE in a three-dimensional setting for which θ is uniformly distributed over a ballof radius r and x | θ ∼ N ( θ , σ I ) . Fig. 2(b) plots the ratio between each of the bounds and theactual MMSE in order to emphasize the difference in accuracybetween the various bounds. As can be seen from this ﬁgure,the OBB is closer to the true MSE than all other bounds, forall tested SNR values.The improvements provided by the OBB continue to holdin higher dimensions as well, although in this case it is notpossible to provide a closed form for any of the bounds. Forexample, Fig. 3 compares the aforementioned bounds with thetrue MMSE in the three-dimensional case. In this case the SNRis given by

SNR(dB) = 10 log (cid:18) Var( θ )Var( w ) (cid:19) = 10 log (cid:18) r σ (cid:19) . (66)Here, computation of the minimum MSE requires multi-dimensional numerical integration, and is by far more compu-tationally complex than the calculation of the bounds. Again, itis evident from this ﬁgure that the OBB is a very tight boundin all ranges of operation, and is considerably closer to thetrue value than either of the alternative approaches.VII. C ONCLUSION

Although often considered distinct settings, there are in-sightful connections between the Bayesian and deterministicestimation problems. One such relation is the use of thedeterministic CRB in a Bayesian problem. The applicationof this deterministic bound to the problem of estimating theminimum Bayesian MSE results in a Bayesian bound whichis provably tight at both high and low SNR values. Numericalsimulation of the location estimation problem demonstratesthat the technique is both simpler and tighter than alternativeapproaches. A

CKNOWLEDGEMENT

The authors are grateful to Dr. Volker Pohl for fruitfuldiscussions concerning many of the mathematical aspects of the paper. The authors would also like to thank the anonymousreviewers for their many constructive comments.A

PPENDIX IS OME T ECHNICAL L EMMAS

The proof of several theorems in the paper relies on thefollowing technical results.

Lemma 1:

Consider the minimization problems M ℓ = inf b ∈ S Z ℓ [ b ] , ℓ = 1 , , (67)where J ( θ ) is positive deﬁnite and bounded a.e. ( p θ ), Z [ b ] , Z Θ k b ( θ ) k p θ ( d θ ) Z [ b ] , Z Θ Tr (cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T ! p θ ( d θ ) Z [ b ] , Z [ b ] + Z [ b ] (68)and S ⊂ H is convex, closed, and bounded under the H norm (16). Then, for each ℓ , there exists a function b (0) ∈ S such that Z [ b (0) ] = M ℓ . If ℓ = 1 or ℓ = 3 , then the minimizerof (67) is unique.Note that Z [ b ] equals Z [ b ] of (12); the notation Z [ b ] isintroduced for simplicity. Also note that under mild regularityassumptions on J ( θ ) , uniqueness can be demonstrated for ℓ =2 as well, but this is not necessary for our purposes. Proof:

The space H is a Cartesian product of n Sobolevspaces H (Θ) , each of which is a separable Hilbert space[38, § H is also a separable Hilbert space.It follows from the Banach–Alaoglu theorem [39, § H have weakly convergent subse-quences [32, § f (1) , f (2) , . . . ∈ H is said to converge weakly to f (0) ∈ H (denoted f ( i ) ⇀ f (0) ) if L [ f ( j ) ] → L [ f (0) ] (69) for all continuous linear functionals L [ · ] [32, § ℓ ∈ { , , } , let b ( i ) be a sequenceof functions in S such that Z ℓ [ b ( i ) ] → M ℓ . This is a boundedsequence since S is bounded, and therefore there exists asubsequence b ( i k ) which converges weakly to some b ( ℓ ) opt ∈ H .Furthermore, since S is closed, we have b ( ℓ ) opt ∈ S . We willnow show that Z ℓ [ b ( ℓ ) opt ] = M ℓ .To this end, it sufﬁces to show that Z ℓ [ · ] is weakly lowersemicontinuous, i.e., for any sequence f ( i ) ∈ H whichconverges weakly to f (0) ∈ H , we must show that Z ℓ [ f (0) ] ≤ lim inf i →∞ Z ℓ [ f ( i ) ] . (70)Consider a weakly convergent sequence f ( j ) ⇀ f (0) . Then,(69) holds for any continuous linear functional L [ · ] . Speciﬁ-cally, choose the continuous linear functional L [ f ] = Z Θ f (0) ( θ ) f ( θ ) p θ ( d θ ) . (71)We then have Z [ f (0) ] = L [ f (0) ]= lim j →∞ L [ f ( j ) ]= lim j →∞ Z Θ n X i =1 f (0) i ( θ ) f ( j ) i ( θ ) p θ ( d θ ) ≤ lim inf j →∞ sZ Θ k f (0) ( θ ) k p θ ( d θ ) · Z Θ k f ( j ) ( θ ) k p θ ( d θ )= q Z [ f (0) ] lim inf j →∞ q Z [ f ( j ) ] (72)where we have used the Cauchy–Schwarz inequality. It followsthat q Z [ f (0) ] ≤ lim inf j →∞ q Z [ f ( j ) ] (73)and therefore Z [ f (0) ] ≤ lim inf j →∞ Z [ f ( j ) ] , so that Z [ · ] is weakly lower semicontinuous.Similarly, consider the continuous linear functional L [ f ] = Z Θ Tr I + ∂ f (0) ∂ θ ! J − ( θ ) (cid:18) I + ∂ f ∂ θ (cid:19) T ! p θ ( d θ ) (74)for which we have Z [ f (0) ] = L [ f (0) ]= lim j →∞ L [ f ( j ) ]= lim j →∞ Z Θ Tr  I + ∂ f (0) ∂ θ ! J − ( θ ) · I + ∂ f ( j ) ∂ θ ! T  p θ ( d θ ) . (75) In fact, we require that S be “weakly closed” in the sense that weaklyconvergent sequences in S converge to an element in S . However, since S is convex, this notion is equivalent to the ordinary deﬁnition of closure [39, § Note that, for any positive deﬁnite matrix W , Tr(

AW B T ) is an inner product of the two matrices A and B . Therefore,by the Cauchy–Schwarz inequality, Tr(

AW B T ) ≤ q Tr(

AW A T ) Tr( BW B T ) . (76)Applying this to (75), we have Z [ f (0) ] ≤ lim inf j →∞ Z Θ vuuut Tr  I + ∂ f (0) ∂ θ ! J − ( θ ) I + ∂ f (0) ∂ θ ! T  · vuuut Tr  I + ∂ f ( j ) ∂ θ ! J − ( θ ) I + ∂ f ( j ) ∂ θ ! T  p θ ( d θ ) . (77)Once again using the Cauchy–Schwarz inequality results in Z [ f (0) ] ≤ lim inf j →∞ q Z [ f (0) ] Z [ f ( j ) ] (78)and therefore Z [ f (0) ] ≤ lim inf j →∞ Z [ f ( j ) ] , so that Z [ · ] isweakly lower semicontinuous. Since Z [ f ] = Z [ f ] + Z [ f ] ,it follows that Z [ · ] is also weakly lower semicontinuous.Now recall that b ( i k ) ⇀ b ( ℓ ) opt and Z ℓ [ b ( i k ) ] → M ℓ . By thedeﬁnition (70) of lower semicontinuity, it follows that Z ℓ [ b ( ℓ ) opt ] ≤ lim inf k →∞ Z ℓ [ b ( i k ) ] = M ℓ (79)and since M ℓ is the inﬁmum of Z ℓ [ b ] , we obtain Z [ b ( ℓ ) opt ] = M .Thus b ( ℓ ) opt is a minimizer of (67).It remains to show that for ℓ ∈ { , } , the minimizerof (67) is unique. To this end, we ﬁrst show that Z [ · ] isstrictly convex. Let b (0) , b (1) ∈ S be two essentially differentfunctions, i.e., p θ (cid:16)n θ ∈ Θ : b (0) ( θ ) = b (1) ( θ ) o(cid:17) > . (80)Let b (2) ( θ ) = λ b (0) ( θ ) + (1 − λ ) b (1) ( θ ) for some < λ < ,so that b (2) ∈ S by convexity. We then have Z [ b (2) ] = Z Q (cid:13)(cid:13)(cid:13) λ b (0) ( θ ) + (1 − λ ) b (1) ( θ ) (cid:13)(cid:13)(cid:13) p θ ( d θ )+ Z Θ \ Q (cid:13)(cid:13)(cid:13) λ b (0) ( θ ) + (1 − λ ) b (1) ( θ ) (cid:13)(cid:13)(cid:13) p θ ( d θ ) < Z Q h λ k b (0) ( θ ) k + (1 − λ ) k b (1) ( θ ) k i p θ ( θ )+ Z Θ \ Q h λ k b (0) ( θ ) k + (1 − λ ) k b (1) ( θ ) k i p θ ( θ )= λZ [ b (0) ] + (1 − λ ) Z [ b (1) ] (81)where the inequality follows from strict convexity of thesquared Euclidean norm k x k . Thus Z [ · ] is strictly convex,and hence has a unique minimum.Note that Z [ b ] = Z [ b ] + Z [ b ] . Since Z [ · ] is strictlyconvex and Z [ · ] is convex, it follows that Z [ · ] is strictlyconvex, and thus also has a unique minimum. This completesthe proof. The following lemma can be thought of as a triangleinequality for a normed space of matrix functions over Θ . Lemma 2:

Let p θ be a probability measure over Θ , and let M : Θ → R n × n be a matrix function. Suppose Z Θ k I + M ( θ ) k F p θ ( d θ ) ≤ α (82)for some constant α . It follows that Z Θ k M ( θ ) k F p θ ( d θ ) ≤ ( √ α + √ n ) . (83) Proof:

By the triangle inequality, k M ( θ ) k F = k M ( θ ) + I − I k F ≤ k M ( θ ) + I k F + k I k F . (84)Since k I k F = n , we have Z Θ k M ( θ ) k F p θ ( d θ ) ≤ Z Θ h k I + M ( θ ) k F + n + 2 √ n k I + M ( θ ) k F i p θ ( d θ ) . (85)Using the fact that Z Θ k I + M ( θ ) k F p θ ( d θ ) ≤ sZ Θ k I + M ( θ ) k F p θ ( d θ ) (86)and combining with (82), it follows that Z Θ k M ( θ ) k F p θ ( d θ ) ≤ α + n + 2 √ nα (87)which completes the proof.A PPENDIX

IIP

ROOF OF P ROPOSITION

Proof: [Proof of Proposition 1] Recall that Z [ b ] of (68)equals Z [ b ] . Thus, we would like to apply Lemma 1 (with ℓ = 3 ) to prove the unique existence of a minimizer of(17). However, Lemma 1 requires that the minimization beperformed over a closed, bounded, and convex set S , whereas(17) is performed over the unbounded set H . To resolvethis issue, we must show that the minimization (17) can bereformulated as a minimization over a closed, bounded, andconvex set S .To this end, note that Z [ ] = Z Θ Tr( J − ( θ )) p θ ( d θ ) , U (88)and therefore M ≤ U < ∞ . Thus, it sufﬁces to perform theminimization (17) over those functions for which Z [ b ] ≤ U .We now show that this can be achieved by minimizing overa closed, bounded, and convex set S . First, note that Z [ b ] ≥k b k L , so that one may choose to minimize (17) only overfunctions b for which k b k L ≤ U. (89) Similarly, we have Z [ b ] ≥ Z Θ Tr (cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T ! p θ ( d θ ) (90)so that it sufﬁces to minimize (17) over functions b for which Z Θ Tr (cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T ! p θ ( d θ ) ≤ U. (91)Note that J ( θ ) is bounded a.e., and therefore λ min ( J − ) ≥ /K a.e., for some constant K . It follows that Tr (cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T ! ≥ K (cid:13)(cid:13)(cid:13)(cid:13) I + ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F a.e. ( p θ ) . (92)Combining with (91) yields Z Θ (cid:13)(cid:13)(cid:13)(cid:13) I + ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≤ KU. (93)From Lemma 2, we then have Z Θ (cid:13)(cid:13)(cid:13)(cid:13) ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≤ (cid:16) √ n + √ KU (cid:17) . (94)From (89) and (94) it follows that the minimization (17) canbe limited to the closed, bounded, convex set S = (cid:26) b ∈ H : k b k H ≤ U + (cid:16) √ KU + √ n (cid:17) (cid:27) . (95)Applying Lemma 1 proves the unique existence of a minimizerof (17). The proof that < s < ∞ appears immediately afterthe statement of Proposition 1.A PPENDIX

IIIP

ROOF OF T HEOREM

Proof: [Proof of Theorem 2] Consider the more generalproblem of minimizing the functional Z [ b ] = Z Θ F [ b , θ ] d θ (96)where F [ b , θ ] is smooth and convex in b : Θ → R n , and Θ ⊂ R n is a bounded set with a smooth boundary Λ . Then, Z [ b ] is also smooth and convex in b , so that b is a globalminimum of Z [ b ] if and only if the differential δZ [ h ] equalszero at b for all admissible functions h : Θ → R n [40].By a standard technique [40, § δZ [ h ] = ǫ X i Z Θ  ∂F∂b i − X j ∂∂θ j ∂F∂b ( j ) i  h i ( θ ) d θ + ǫ X i Z Λ ∂F∂b (1) i , . . . , ∂F∂b ( n ) i ! T ν ( θ ) h i ( θ ) dσ (97) where ǫ is an inﬁnitesimal quantity, b ( j ) i = ∂b i /∂θ j , and ν ( θ ) is an outward-pointing normal at the boundary point θ ∈ Λ . We now seek conditions for which δZ [ h ] = 0 forall h ( θ ) . Consider ﬁrst functions h ( θ ) which equal zero onthe boundary Λ . In this case, the second integral vanishes, andwe obtain the Euler–Lagrange equations ∀ i, ∂F∂b i − X j ∂∂θ j ∂F∂b ( j ) i = 0 . (98)Substituting this result back into (97), and again using the factthat δZ [ h ] = 0 for all h , we obtain the boundary condition ∀ i, ∀ θ ∈ Λ , ∂F∂b (1) i , . . . , ∂F∂b ( n ) i ! T ν ( θ ) = 0 . (99)Plugging F [ b , θ ] = CRB[ b , θ ] p θ ( θ ) into (98) and (99) pro-vides the required result.A PPENDIX

IVP

ROOF OF T HEOREM

Lemma 3:

Under the conditions of Theorem 3, the func-tional Z [ b ] of (12) is rotation and reﬂection invariant, i.e., Z [ b ] = Z [ U b ] for any unitary matrix U . Proof:

We ﬁrst demonstrate that Z [ b ] is rotation invari-ant. From the deﬁnitions of Z [ b ] and CRB[ b , θ ] , we have Z [ b ] = Z Θ Tr "(cid:18) I + ∂ b ∂ θ (cid:19) (cid:18) I + ∂ b ∂ θ (cid:19) T q ( k θ k ) J ( k θ k ) d θ + Z Θ k b ( θ ) k q ( k θ k ) d θ . (100)The second integral is clearly rotation invariant, since arotation of b does not alter its norm. It remains to show that theﬁrst integral, which we denote by I [ b ] , does not change when b is rotated. To this end, we begin by considering a rotationabout the ﬁrst two coordinates, such that b is transformed to ˜ b , R φ b , where the rotation matrix R φ is deﬁned such that R φ b = ( b cos φ + b sin φ, − b sin φ + b cos φ, b , . . . , b n ) T . (101)We must thus show that I [ b ] = I [˜ b ] . Let us perform thechange of variables θ ˜ θ , where ˜ θ = R ( − φ ) θ . Rewritingthe trace in (100) as a sum, we have I [˜ b ] = Z Θ X i,j δ ij + ∂ ˜ b i ∂θ j ! q ( k ˜ θ k ) J ( k ˜ θ k ) d ˜ θ (102)where we have used the facts that k θ k = k ˜ θ k and that Θ doesnot change under the change of variables. We now demonstrate some properties of the transformationof b and θ . First, we have, for any j , ∂ ˜ b ∂θ j ! + ∂ ˜ b ∂θ j ! = (cid:18) ∂b ∂θ j cos φ + ∂b ∂θ j sin φ (cid:19) + (cid:18) − ∂b ∂θ j sin φ + ∂b ∂θ j cos φ (cid:19) = (cid:18) ∂b ∂θ j (cid:19) + (cid:18) ∂b ∂θ j (cid:19) . (103)Also, for any i , (cid:18) ∂b i ∂ ˜ θ (cid:19) + (cid:18) ∂b i ∂ ˜ θ (cid:19) = (cid:18) ∂b i ∂θ ∂θ ∂ ˜ θ + ∂b i ∂θ ∂θ ∂ ˜ θ (cid:19) + (cid:18) ∂b i ∂θ ∂θ ∂ ˜ θ + ∂b i ∂θ ∂θ ∂ ˜ θ (cid:19) = (cid:18) ∂b i ∂θ (cid:19) + (cid:18) ∂b i ∂θ (cid:19) (104)where we used the fact that θ = R φ ˜ θ . Third, we have ∂ ˜ b ∂θ = ∂b ∂ ˜ θ cos φ + ∂b ∂ ˜ θ sin φ cos φ + ∂b ∂ ˜ θ sin φ cos φ + ∂b ∂ ˜ θ sin φ,∂ ˜ b ∂θ = ∂b ∂ ˜ θ sin φ − ∂b ∂ ˜ θ sin φ cos φ − ∂b ∂ ˜ θ sin φ cos φ + ∂b ∂ ˜ θ cos φ, (105)so that ∂ ˜ b ∂θ + ∂ ˜ b ∂θ = ∂b ∂ ˜ θ + ∂b ∂ ˜ θ . (106)We now show that X i,j δ ij + ∂ ˜ b i ∂θ j ! = X i,j δ ij + ∂b i ∂ ˜ θ j ! . (107)For terms with i, j ≥ , we have b i = ˜ b i and θ j = ˜ θ j , so thatreplacing ˜ b with b and θ with ˜ θ does not change the result.The terms with i = 1 , and j ≥ do not change because of(103), while the terms with i ≥ and j = 1 , do not changebecause of (104). It remains to show that the terms i, j = 1 , do not modify the sum. To this end, we write out these four terms as ∂ ˜ b ∂θ ! + ∂ ˜ b ∂θ ! + ∂ ˜ b ∂θ ! + ∂ ˜ b ∂θ ! = 2 + 2 ∂ ˜ b ∂θ + 2 ∂ ˜ b ∂θ + ∂ ˜ b ∂θ ! + ∂ ˜ b ∂θ ! + ∂ ˜ b ∂θ ! + ∂ ˜ b ∂θ ! = 2 + 2 ∂b ∂ ˜ θ + 2 ∂b ∂ ˜ θ + (cid:18) ∂b ∂ ˜ θ (cid:19) + (cid:18) ∂b ∂ ˜ θ (cid:19) + (cid:18) ∂b ∂ ˜ θ (cid:19) + (cid:18) ∂b ∂ ˜ θ (cid:19) = (cid:18) ∂b ∂ ˜ θ (cid:19) + (cid:18) ∂b ∂ ˜ θ (cid:19) + (cid:18) ∂b ∂ ˜ θ (cid:19) + (cid:18) ∂b ∂ ˜ θ (cid:19) (108)where, in the second transition, we have used (103), (104),and (106). It follows that I [˜ b ] of (102) is equal to I [ b ] , andhence Z [ b ] = Z [˜ b ] . The result similarly holds for rotationsabout any other two coordinates. Since any rotation can bedecomposed into a sequence of two-coordinate rotations, weconclude that Z [ b ] is rotation invariant.Next, we prove that Z [ b ] is invariant to reﬂections throughhyperplanes containing the origin. Since Z [ b ] is invariant torotations, it sufﬁces to choose a single hyperplane, say { θ : θ = 0 } . Let ˜ b , ( − b ( θ ) , b ( θ ) , . . . , b n ( θ )) T (109)be the reﬂection of b , and consider the corresponding changeof variables ˜ θ , ( − θ , θ , . . . , θ n ) T . (110)By the symmetry assumptions, p θ and J are unaffected by thechange of variables; furthermore, ∂ ˜ b /∂ ˜ θ = ∂ b /∂ θ . It followsthat CRB[˜ b , ˜ θ ] = CRB[ b , θ ] , and therefore Z [ b ] = Z [˜ b ] . Lemma 4:

Suppose b ( θ ) is radial and rotation invariant,i.e., b ( θ ) = t ( k θ k ) θ for some function t ∈ H . Alsosuppose that J ( θ ) = J ( k θ k ) I , where J ( · ) is a scalar function.Then, CRB[ b , θ ] of (11) is rotation invariant in θ , i.e., CRB[ b , Rθ ] = CRB[ b , θ ] for any rotation matrix R . Proof:

We will show that

CRB[ b , θ ] depends on θ onlythrough k θ k , and is therefore rotation invariant. For the givenvalue of b ( θ ) and J ( θ ) , we have CRB[ b , θ ]= k b ( θ ) k + Tr "(cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T = t k θ k + 1 J ( k θ k ) Tr "(cid:18) I + ∂t θ ∂ θ (cid:19) (cid:18) I + ∂t θ ∂ θ (cid:19) T (111)where, for notational convenience, we have omitted the de-pendence of t on k θ k . It remains to show that the trace inthe above expression is a function of θ only through k θ k . To this end, we note that ∂b i ∂θ j = tδ ij + t ′ θ i ∂ k θ k ∂θ j = tδ ij + 2 t ′ θ i θ j (112)where δ ij is the Kronecker delta. It follows that (cid:18) δ ij + ∂b i ∂θ j (cid:19) = (1 + t ) δ ij + 4(1 + t ) t ′ θ i θ j δ ij + 4 t ′ θ i θ j . (113)Therefore Tr "(cid:18) I + ∂ b ∂ θ (cid:19) (cid:18) I + ∂ b ∂ θ (cid:19) T = X i,j (cid:18) δ ij + ∂b i ∂θ j (cid:19) = n (1 + t ) + 4 t ′ X i,j θ i θ j + 4(1 + t ) t ′ X i θ i = n (1 + t ) + 4 t ′ k θ k + 4(1 + t ) t ′ k θ k . (114)Thus, CRB[ b , θ ] depends on θ only through k θ k , completingthe proof. Proof: [Proof of Theorem 3] We have seen in Theorem 2that the solution of (20) is unique. Now suppose that theoptimum b is not rotation invariant, i.e., there exists a rotationmatrix R such that Rb ( θ ) is not identical to b ( θ ) . ByLemma 3, Rb ( θ ) is also optimal, which is a contradiction.Furthermore, suppose that b is not radial, i.e., for some valueof θ , b ( θ ) contains a component perpendicular to the vector θ . Consider a hyperplane passing through the origin, whosenormal is the aforementioned perpendicular component. ByLemma 3, The reﬂection of b through this hyperplane is alsoan optimal solution of (20), which is again a contradiction.Therefore, the optimum b is spherically symmetric and radial,so that it can be written as b ( θ ) = b ( k θ k ) θ k θ k (115)where b ( · ) is a scalar function.To determine the value of b ( · ) , it sufﬁces to analyze thedifferential equation (21) along a straight line from the originto the boundary. We choose a line along the θ axis, and beginby calculating the derivatives of b ( θ ) , q ( k θ k ) , and J ( k θ k ) along this axis. The derivative of q ( k θ k ) is given by ∂q∂θ j = q ′ ( ρ ) θ j ρ (116)where we have denoted ρ = k θ k , so that ρ is weaklydifferentiable and ∂ρ∂θ j = θ j ρ . (117)Along the θ axis, we have θ = ρ while θ = · · · = θ n = 0 ,so that ∂q∂θ j (cid:12)(cid:12)(cid:12)(cid:12) θ = ρ e = q ′ ( ρ ) δ j . (118)Similarly, since J ( θ ) = J ( ρ ) I , ∂ ( J − ) jk ∂θ j = − J ′ ( ρ ) J ( ρ ) θ j ρ δ jk (119)so that along the θ axis ∂ ( J − ) jk ∂θ j (cid:12)(cid:12)(cid:12)(cid:12) θ = ρ e = − J ′ ( ρ ) J ( ρ ) δ jk δ j . (120) From (115), we have ∂b i ∂θ j = b ′ ( ρ ) θ i θ j ρ + b ( ρ ) ρ (cid:18) δ ij − θ i θ j ρ (cid:19) . (121)Thus, on the θ axis, we have ∂b ∂θ j (cid:12)(cid:12)(cid:12)(cid:12) θ = ρ e = b ′ ( ρ ) δ j . (122)The second derivative of b i ( θ ) can be shown to equal ∂ b i ∂θ j ∂θ k = b ′′ ( ρ ) θ i θ j θ k ρ + (cid:18) b ′ ( ρ ) ρ − b ( ρ ) ρ (cid:19) (cid:18) θ i ρ δ jk + θ j ρ δ ik + θ k ρ δ ij − θ i θ j θ k ρ (cid:19) . (123)Therefore, on the θ axis ∂ b ∂θ (cid:12)(cid:12)(cid:12)(cid:12) θ = ρ e = b ′′ ( ρ ) ∂ b ∂θ j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = ρ e = b ′ ( ρ ) ρ − b ( ρ ) ρ ( j = 1) ∂ b ∂θ j ∂θ k (cid:12)(cid:12)(cid:12)(cid:12) θ = ρ e = 0 ( j, k = 1) . (124)Substituting these derivatives into (21), we obtain q ( ρ ) b ( ρ ) = q ( ρ ) J ( ρ ) (cid:18) b ′′ ( ρ ) + ( n − b ′ ( ρ ) ρ − ( n − b ( ρ ) ρ (cid:19) + (1 + b ′ ( ρ )) (cid:18) q ′ ( ρ ) J ( ρ ) − q ( ρ ) J ′ ( ρ ) J ( ρ ) (cid:19) (125)which is equivalent to (25).To obtain the boundary conditions, observe that Lemma 3implies b ( ) = , whence we conclude that b (0) = 0 . Next,evaluate the boundary condition (22) at boundary point θ = r e , where the surface normal ν ( θ ) equals e , so that b ′ ( ρ ) = 1 + ∂b ∂θ = 0 , θ = r e (126)which is equivalent to the boundary condition b ′ ( r ) = − .To ﬁnd the OBB (24), we must now calculate Z [ b ] forthe obtained bias function (115). To this end, note that, byLemma 4, CRB[ b , θ ] is rotation invariant in θ for the required b ( θ ) . Thus, the integrand CRB[ b , θ ] q ( k θ k ) is constant on any ( n − -sphere centered on the origin, so that Z [ b ] = Z r CRB[ b , ρ e ] q ( ρ ) S n ( ρ ) dρ (127)where S n ( ρ ) = 2 π n/ Γ( n/ ρ n − (128)is the hypersurface area of an ( n − -sphere of radius ρ [35].It thus sufﬁces to calculate the value of CRB[ b , θ ] at pointsalong the θ axis. From (121), it follows that ∂ b ∂ θ (cid:12)(cid:12)(cid:12)(cid:12) θ = ρ e = diag (cid:18) b ′ ( ρ ) , b ( ρ ) ρ , . . . , b ( ρ ) ρ (cid:19) . (129) Substituting this into the deﬁnition of CRB[ b , θ ] , we obtain CRB[ b , ρ e ]= b ( ρ ) + 1 J ( ρ ) (1 + b ′ ( ρ )) + n − J ( ρ ) (cid:18) b ( ρ ) ρ (cid:19) . (130)Combining (130) with (127) yields (24), as required.A PPENDIX VP ROOFS OF A SYMPTOTIC P ROPERTIES

Theorems 5 and 6 demonstrate asymptotic tightness of theOBB. The proofs of these two theorems follow.

Proof: [Proof of Theorem 5] We begin the proof bystudying a certain optimization problem, whose relevance willbe demonstrated shortly. Let t ≥ be a constant and considerthe problem u ( t ) = inf b ∈ H Z Θ (cid:13)(cid:13)(cid:13)(cid:13) I + ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) s.t. Z Θ k b ( θ ) k p θ ( d θ ) ≤ t. (131)Notice that u ( t ) ≤ n for all t , since an objective having a valueof n is achieved by the function b ( θ ) = . Thus, it sufﬁcesto perform the minimization (131) over functions b ∈ H satisfying Z Θ (cid:13)(cid:13)(cid:13)(cid:13) I + ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≤ n. (132)It follows from Lemma 2 that such functions also satisfy Z Θ (cid:13)(cid:13)(cid:13)(cid:13) ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≤ (2 √ n ) = 4 n. (133)Therefore, (131) is equivalent to the minimization u ( t ) = inf b ∈ S t Z Θ (cid:13)(cid:13)(cid:13)(cid:13) I + ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) (134)where S t = (cid:26) b ∈ H : Z Θ k b ( θ ) k p θ ( d θ ) ≤ t, Z Θ (cid:13)(cid:13)(cid:13)(cid:13) ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≤ n (cid:27) . (135)The set S t is convex, closed, and bounded in H . ApplyingLemma 1 (with ℓ = 2 ) implies that there exists a function b opt ∈ S t which minimizes (134), and hence also minimizes(131).Note that the objective in (131) is zero if and only if ∂ b opt ∂ θ = − I a.e. ( p θ ) . (136)The only functions in H satisfying this requirement are thefunctions b ( θ ) = k − θ a.e. ( p θ ) (137)for some constant k ∈ R n . Let µ , E { θ } and deﬁne v , E (cid:8) k θ − E { θ } k (cid:9) . (138) For functions of the form (137), the constraint of (131) is givenby Z Θ k k − θ k p θ ( d θ ) = Z Θ k k − µ + µ − θ k p θ ( d θ )= k k − µ k + v ≥ v. (139)In (139), equality is obtained if and only if k = µ . Therefore,if t < v , no functions satisfying (136) are feasible, and thus u ( t ) = 0 if t ≥ v,u ( t ) > if t < v. (140)We now return to the setting of Theorem 5. We must showthat β N → v as N → ∞ . We denote functions correspondingto the problem of estimating θ from x ( N ) with a superscript ( N ) . Thus, for example, Z ( N ) [ b ] denotes the functional Z [ b ] of (12) for the problem corresponding to the measurementvector x ( N ) .Since all eigenvalues of J ( N ) ( θ ) decrease monotonicallywith N for p θ -almost all θ , we have CRB ( N ) [ b , θ ] ≤ CRB ( N +1) [ b , θ ] (141)for any b ∈ H , for p θ -almost all θ , and for all N . Therefore Z ( N ) [ b ] ≤ Z ( N +1) [ b ] . (142)for any b ∈ H and for all N . It follows that for all Nβ N = min b ∈ H Z ( N ) [ b ] ≤ min b ∈ H Z ( N +1) [ b ] = β N +1 (143)so that β N is a non-decreasing sequence. Furthermore, notethat Z ( N ) [ µ − θ ] = v for all N (144)where v is given by (138). Therefore, β N ≤ v for all N . Thus β N converges to some value q , and we have β N ≤ q ≤ v for all N. (145)To prove the theorem, it remains to show that q = v .Let b ( N ) be the minimizer of (17) when θ is estimated from x ( N ) ; this minimizer exists by virtue of Proposition 1. We thenhave β N = Z ( N ) [ b ( N ) ] ≤ q (146)and therefore Z Θ k b ( N ) ( θ ) k p θ ( d θ ) ≤ q. (147)It follows that b ( N ) satisﬁes the constraint of the optimizationproblem (131) with t = q . As a consequence, we have Z Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) I + ∂ b ( N ) ∂ θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≥ u ( q ) . (148)Deﬁne λ N , ess sup θ ∈ Θ λ max ( J ( N ) ( θ )) (149) and note that λ N > for all N , since J ( N ) ( θ ) is positivedeﬁnite. Thus Z ( N ) [ b ( N ) ] ≥ Z Θ Tr  I + ∂ b ( N ) ∂ θ ! (cid:16) J ( N ) ( θ ) (cid:17) − · I + ∂ b ( N ) ∂ θ ! T  p θ ( d θ ) ≥ λ N Z Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) I + ∂ b ( N ) ∂ θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≥ u ( q ) λ N . (150)Assume by contradiction that q < v . From (140), it thenfollows that u ( q ) > . Since all eigenvalues of J ( N ) ( θ ) decrease to zero, we have λ N → , and thus β N ≥ u ( q ) λ N → ∞ . (151)This contradicts the fact (145) that β N ≤ v . We conclude that q = v , as required. Proof: [Proof of Theorem 6] The proof is analogous tothat of Theorem 5. We begin by considering the optimizationproblem inf b ∈ H Z Θ k b ( θ ) k p θ ( d θ ) s.t. Z Θ Tr (cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T ! p θ ( d θ ) ≤ t (152)for some constant t ≥ . Denote the minimum value of (152)by w ( t ) . Let µ = E { θ } and note that b ( θ ) = µ − θ satisﬁesthe constraint in (152) for any t ≥ , and has an objectiveequal to v of (138). Thus, to determine w ( t ) , it sufﬁces tominimize (152) over the set S t = (cid:26) b ∈ H : Z Θ k b ( θ ) k p θ ( d θ ) ≤ v, Z Θ Tr (cid:18) I + ∂ b ∂ θ (cid:19) J − ( θ ) (cid:18) I + ∂ b ∂ θ (cid:19) T ! p θ ( d θ ) ≤ t (cid:27) . Deﬁne λ , ess sup θ ∈ Θ λ max ( J ( θ )) . (153)Since J ( θ ) is positive deﬁnite almost everywhere, we have λ > . For any b ∈ S t , we have λ Z Θ (cid:13)(cid:13)(cid:13)(cid:13) I + ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≤ t (154)and therefore, by Lemma 2, Z Θ (cid:13)(cid:13)(cid:13)(cid:13) ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≤ (cid:16) √ tλ + √ n (cid:17) . (155) Hence, for any b ∈ S t , k b k H = Z Θ k b ( θ ) k p θ ( d θ ) + Z Θ (cid:13)(cid:13)(cid:13)(cid:13) ∂ b ∂ θ (cid:13)(cid:13)(cid:13)(cid:13) F p θ ( d θ ) ≤ v + (cid:16) √ tλ + √ n (cid:17) . (156)Thus S t is bounded for all t . It is straightforward to show that S t is also closed and convex. Therefore, employing Lemma 1(with ℓ = 1 ) ensures that there exists a (unique) b opt ∈ S t minimizing (152).Note that the objective in (152) is 0 if and only if b opt ( θ ) = almost everywhere. So, if ∈ S t , we have w ( t ) = 0 , andotherwise w ( t ) > . Let us deﬁne s , E (cid:8) Tr( J − ( θ )) (cid:9) (157)and note that ∈ S t if and only if t ≥ s . Thus w ( t ) = 0 for t ≥ sw ( t ) > otherwise. (158)Let us now return to the setting of Theorem 6. For sim-plicity, we denote functions corresponding to the problemof estimating θ from { x (1) , . . . , x ( N ) } with a superscript ( N ) . For example, from the additive property of the Fisherinformation [2, § J ( N ) ( θ ) = N J ( θ ) . (159)It follows that ( N + 1)CRB ( N +1) [ b , θ ] ≥ N CRB ( N ) [ b , θ ] (160)for all b ∈ H , all θ ∈ Θ , and all N . Therefore ( N + 1) Z ( N +1) [ b ] ≥ N Z ( N ) [ b ] (161)for all b ∈ H , and hence ( N + 1) β N +1 = min b ∈ H (cid:16) ( N + 1) Z ( N +1) [ b ] (cid:17) ≥ min b ∈ H (cid:16) N Z ( N ) [ b ] (cid:17) = N β N . (162)Thus { N β N } is a non-decreasing sequence. Furthermore, wehave N Z ( N ) [ ] = s (163)so that N β N ≤ s for all N . It follows that { N β N } is non-decreasing and bounded, and therefore converges to somevalue r such that N β N ≤ r ≤ s for all N. (164)To prove the theorem, we must show that r = s .Let b ( N ) ∈ H denote the minimizer of (17) when θ is estimated from { x (1) , . . . , x ( N ) } (the existence of b ( N ) is guaranteed by Proposition 1). We then have N β N = N Z ( N ) [ b ( N ) ] ≤ r , so that Z Θ Tr  I + ∂ b ( N ) ∂ θ ! J − ( θ ) I + ∂ b ( N ) ∂ θ ! T  p θ ( d θ ) ≤ r. (165) Thus, b ( N ) satisﬁes the constraint of (152) with t = r . As aconsequence, we have Z Θ k b ( N ) ( θ ) k p θ ( d θ ) ≥ w ( r ) (166)and therefore N β N = N Z ( N ) [ b ( N ) ] ≥ N Z Θ k b ( N ) ( θ ) k p θ ( d θ ) ≥ N w ( r ) . (167)Now suppose by contradiction that r < s . It follows from(158) that w ( r ) > . Hence, by (167), N β N → ∞ , whichcontradicts the fact that N β N is bounded. We conclude that r = s , as required. R EFERENCES[1] J. O. Berger,

Statistical Decision Theory and Bayesian Analysis , 2nd ed.New York, NY: Springer-Verlag, 1985.[2] S. M. Kay,

Fundamentals of Statistical Signal Processing: EstimationTheory . Englewood Cliffs, NJ: Prentice Hall, 1993.[3] H. L. Van Trees,

Detection, Estimation, and Modulation Theory . NewYork: Wiley, 1968, vol. 1.[4] J. Ziv and M. Zakai, “Some lower bounds on signal parameter estima-tion,”

IEEE Trans. Inf. Theory , vol. 15, no. 3, pp. 386–391, May 1969.[5] T. Y. Young and R. A. Westerberg, “Error bounds for stochasticestimation of signal parameters,”

IEEE Trans. Inf. Theory , vol. 17, no. 5,pp. 549–557, Sep. 1971.[6] S. Bellini and G. Tartara, “Bounds on error in signal parameter estima-tion,”

IEEE Trans. Commun. , vol. 22, no. 3, pp. 340–342, 1974.[7] D. Chazan, M. Zakai, and J. Ziv, “Improved lower bounds on signalparameter estimation,”

IEEE Trans. Inf. Theory , vol. 21, no. 1, pp. 90–93, 1975.[8] B. Z. Bobrovski and M. Zakai, “A lower bound on the estimation errorfor certain diffusion problems,”

IEEE Trans. Inf. Theory , vol. 22, no. 1,pp. 45–52, Jan. 1976.[9] A. J. Weiss and E. Weinstein, “A lower bound on the mean-square errorin random parameter estimation,”

IEEE Trans. Inf. Theory , vol. 31, no. 5,pp. 680–682, Sep. 1985.[10] E. Weinstein and A. J. Weiss, “A general class of lower bounds inparameter estimation,”

IEEE Trans. Inf. Theory , vol. 34, no. 2, pp. 338–342, Mar. 1988.[11] K. L. Bell, Y. Steinberg, Y. Ephraim, and H. L. Van Trees, “ExtendedZiv–Zakai lower bound for vector parameter estimation,”

IEEE Trans.Inf. Theory , vol. 43, no. 2, pp. 624–637, 1997.[12] A. Renaux, P. Forster, P. Larzabal, and C. Richmond, “The BayesianAbel bound on the mean square error,” in

Proc. Int. Conf. Acoust.,Speech and Signal Processing (ICASSP 2006) , vol. III, Toulouse, France,May 2006, pp. 9–12.[13] E. L. Lehmann and G. Casella,

Theory of Point Estimation , 2nd ed.New York: Springer, 1998.[14] Y. C. Eldar, “Rethinking biased estimation: Improving maximum like-lihood and the Cram´er–Rao bound,”

Foundations and Trends in SignalProcessing , vol. 1, no. 4, pp. 305–449, 2008.[15] S. M. Kay and Y. C. Eldar, “Rethinking biased estimation,”

IEEE SignalProcess. Mag. , vol. 25, no. 3, pp. 133–136, May 2008.[16] H. Cram´er, “A contribution to the theory of statistical estimation,”

Skand.Akt. Tidskr. , vol. 29, pp. 85–94, 1945.[17] C. R. Rao, “Information and accuracy attainable in the estimation ofstatistical parameters,”

Bull. Calcutta Math. Soc. , vol. 37, pp. 81–91,1945.[18] J. M. Hammersley, “On estimating restricted parameters,”

J. Roy. Statist.Soc. B , vol. 12, no. 2, pp. 192–240, 1950.[19] D. G. Chapman and H. Robbins, “Minimum variance estimation withoutregularity assumptions,”

Ann. Math. Statist. , vol. 22, no. 4, pp. 581–586,Dec. 1951.[20] P. K. Bhattacharya, “Estimating the mean of a multivariate normalpopulation with general quadratic loss function,”

Ann. Math. Statist. ,vol. 37, no. 6, pp. 1819–1824, Dec. 1966.[21] E. W. Barankin, “Locally best unbiased estimates,”

Ann. Math. Statist. ,vol. 20, no. 4, pp. 477–501, Dec. 1949. [22] J. S. Abel, “A bound on mean-square-estimate error,” IEEE Trans. Inf.Theory , vol. 39, no. 5, pp. 1675–1680, 1993.[23] A. O. Hero, J. A. Fessler, and M. Usman, “Exploring estimator bias-variance tradeoffs using the uniform CR bound,”

IEEE Trans. SignalProcess. , vol. 44, no. 8, pp. 2026–2041, 1996.[24] P. Forster and P. Larzabal, “On lower bounds for deterministic parameterestimation,” in

Proc. Int. Conf. Acoust., Speech and Signal Processing(ICASSP 2002) , vol. 2, Orlando, FL, May 2002, pp. 1137–1140.[25] Y. C. Eldar, “Minimum variance in biased estimation: Bounds andasymptotically optimal estimators,”

IEEE Trans. Signal Process. , vol. 52,no. 7, pp. 1915–1930, 2004.[26] ——, “Uniformly improving the Cram´er-Rao bound and maximum-likelihood estimation,”

IEEE Trans. Signal Process. , vol. 54, no. 8, pp.2943–2956, 2006.[27] ——, “MSE bounds with afﬁne bias dominating the Cram´er–Raobound,”

IEEE Trans. Signal Process. , vol. 56, no. 8, pp. 3824–3836,Aug. 2008.[28] I. A. Ibragimov and R. Z. Has’minskii,

Statistical Estimation: Asymp-totic Theory . New York: Springer, 1981.[29] A. Renaux, “Contribution `a l’analyse des performances d’estimationen traitement statistique du signal,” Ph.D. dissertation, `EcoleNormale Superieure de Cachan, 2006. [Online]. Available:http://tel.archives-ouvertes.fr/tel-00129527/[30] Z. Ben-Haim and Y. C. Eldar, “A Bayesian estimation bound based onthe optimal bias function,” in

Proc. 2nd Int. Workshop on ComputationalAdv. in Multi-Sensor Adapt. Process. (CAMSAP 2007) , St. Thomas, U.S.Virgin Islands, Dec. 2007.[31] J. Shao,

Mathematical Statistics , 2nd ed. New York: Springer, 2003.[32] E. H. Lieb and M. Loss,

Analysis , 2nd ed. American MathematicalSociety, 2001.[33] D. R. Cox and N. Reid, “Parameter orthogonality and approximateconditional inference,”

J. Roy. Statist. Soc. B , vol. 49, no. 1, pp. 1–39,1987.[34] H. L. Van Trees and K. L. Bell,

Bayesian Bounds for ParameterEstimation and Nonlinear Filtering/Tracking . New York: Wiley, 2007.[35] I. M. Vinogradov, Ed.,

Encyclopaedia of Mathematics . Dordrecht, TheNetherlands: Kluwer, 1995.[36] M. Abramowitz and I. A. Stegun,

Handbook of Mathematical Functionswith Formulas, Graphs, and Mathematical Tables . New York: Dover,1964.[37] Z. Ben-Haim and Y. C. Eldar, “A comment on the use of the Weiss–Weinstein bound with constrained parameter sets,”

IEEE Trans. Inf.Theory , vol. 54, no. 10, pp. 4682–4684, Oct. 2008.[38] L. P. Lebedev and M. J. Cloud,

The Calculus of Variations andFunctional Analysis . New Jersey: World Scientiﬁc, 2003.[39] W. Rudin,

Functional Analysis . New York: McGraw-Hill, 1973.[40] I. M. Gelfand and S. V. Fomin,