A Lower Bound on the Estimator Variance for the Sparse Linear Model
Sebastian Schmutzhard, Alexander Jung, Franz Hlawatsch, Zvika Ben-Haim, Yonina C. Eldar
NuHAG, Faculty of Mathematics, University of Vienna, A-1090 Vienna, Austria; e-mail: [email protected]
Institute of Communications and Radio-Frequency Engineering, Vienna University of Technology, A-1040 Vienna, Austria; e-mail: {ajung, fhlawats}@nt.tuwien.ac.at
Technion—Israel Institute of Technology, Haifa 32000, Israel; e-mail: {zvikabh@tx, yonina@ee}.technion.ac.il

This work was supported by the FWF under Grants S10602-N13 (Signal and Information Representation) and S10603-N13 (Statistical Inference) within the National Research Network SISE, by the Israel Science Foundation under Grant 1081/07, and by the European Commission under the FP7 Network of Excellence in Wireless COMmunications NEWCOM++ (contract no. 216715).
Abstract—We study the performance of estimators of a sparse nonrandom vector based on an observation which is linearly transformed and corrupted by additive white Gaussian noise. Using the reproducing kernel Hilbert space framework, we derive a new lower bound on the estimator variance for a given differentiable bias function (including the unbiased case) and an almost arbitrary transformation matrix (including the underdetermined case considered in compressed sensing theory). For the special case of a sparse vector corrupted by white Gaussian noise—i.e., without a linear transformation—and unbiased estimation, our lower bound improves on previously proposed bounds.
Index Terms—Sparsity, parameter estimation, sparse linear model, denoising, variance bound, reproducing kernel Hilbert space, RKHS.
I. INTRODUCTION
We study the problem of estimating a nonrandom parameter vector $\mathbf{x} \in \mathbb{R}^N$ which is sparse, i.e., at most $S$ of its entries are nonzero, where $1 \le S < N$ (typically $S \ll N$). We thus have $\mathbf{x} \in \mathcal{X}_S$, with
\[ \mathcal{X}_S \triangleq \bigl\{ \mathbf{x}' \in \mathbb{R}^N \,\big|\, \|\mathbf{x}'\|_0 \le S \bigr\}, \qquad (1) \]
where $\|\mathbf{x}\|_0$ denotes the number of nonzero entries of $\mathbf{x}$. While the sparsity degree $S$ is assumed to be known, the set of positions of the nonzero entries of $\mathbf{x}$ (denoted by $\mathrm{supp}(\mathbf{x})$) is unknown. The estimation of $\mathbf{x}$ is based on the observed vector $\mathbf{y} \in \mathbb{R}^M$ given by
\[ \mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad (2) \]
with a known system matrix $\mathbf{H} \in \mathbb{R}^{M \times N}$ and white Gaussian noise $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$ with known variance $\sigma^2 > 0$. The matrix $\mathbf{H}$ is arbitrary except that it is assumed to satisfy the standard requirement
\[ \mathrm{spark}(\mathbf{H}) > S, \qquad (3) \]
where $\mathrm{spark}(\mathbf{H})$ denotes the minimum number of linearly dependent columns of $\mathbf{H}$ [1].
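For concreteness, the following Python sketch (a minimal example using NumPy; all dimensions, the sparsity degree, and the noise level are illustrative values, not taken from the paper) generates one realization of the model (2) with an $S$-sparse parameter vector. A Gaussian $\mathbf{H}$ is used merely as an example; such a matrix has $\mathrm{spark}(\mathbf{H}) = M + 1$ with probability one, so (3) holds whenever $M \ge S$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: ambient dimension, number of observations, sparsity degree
N, M, S = 20, 10, 3
sigma = 0.5                                   # noise standard deviation

# S-sparse parameter vector x with random support and nonzero values
x = np.zeros(N)
support = rng.choice(N, size=S, replace=False)
x[support] = rng.normal(size=S)

# Known system matrix H and observation y = H x + n, cf. model (2)
H = rng.normal(size=(M, N))
y = H @ x + sigma * rng.normal(size=M)
```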
The observation model (2) together with (1) will be referred to as the sparse linear model (SLM). Note that we also allow $M < N$ (this case is relevant to compressed sensing methods [1], [2]); however, condition (3) implies that $M \ge S$. The case of correlated Gaussian noise $\mathbf{n}$ with a known nonsingular correlation matrix can be reduced to the SLM by means of a noise whitening transformation. An important special case of the SLM is given by $\mathbf{H} = \mathbf{I}$ (so that $M = N$), i.e.,
\[ \mathbf{y} = \mathbf{x} + \mathbf{n}, \qquad (4) \]
where again $\mathbf{x} \in \mathcal{X}_S$ and $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. This will be referred to as the sparse signal in noise model (SSNM).

Lower bounds on the estimation variance for the SLM have been studied previously. In particular, the Cramér–Rao bound (CRB) for the SLM was derived in [3]. For the SSNM (4), lower and upper bounds on the minimum variance of unbiased estimators were derived in [4]. A problem with the lower bounds of [3] and [4] is the fact that they exhibit a discontinuity when passing from the case $\|\mathbf{x}\|_0 = S$ to the case $\|\mathbf{x}\|_0 < S$.

In this paper, we use the mathematical framework of reproducing kernel Hilbert spaces (RKHS) [5]–[7] to derive a novel lower variance bound for the SLM. The RKHS framework allows pleasing geometric interpretations of existing bounds, including the CRB, the Hammersley-Chapman-Robbins bound [8], and the Barankin bound [9]. The bound we derive here holds for estimators with a given differentiable bias function. For the SSNM, in particular, we obtain a lower bound for unbiased estimators which is tighter than the bounds in [4] and, moreover, everywhere continuous. As we will show, RKHS theory relates the bound for the SLM to that obtained for the linear model without a sparsity assumption. We note that the RKHS framework has been previously applied to estimation [6], [7] but, to the best of our knowledge, not to the SLM.

This paper is organized as follows. In Section II, we review some fundamentals of parameter estimation. Relevant elements of RKHS theory are summarized in Section III. In Section IV, we use RKHS theory to derive a lower variance bound for the SLM. Section V considers the special case of unbiased estimation within the SSNM. Section VI presents a numerical comparison of the new bound with the variance of two established estimation schemes.

II. BASIC CONCEPTS
We first review some basic concepts of parameter estimation [10]. Let $\mathbf{x} \in \mathcal{X} \subseteq \mathbb{R}^N$ be the nonrandom parameter vector to be estimated, $\mathbf{y} \in \mathbb{R}^M$ the observed vector, and $f(\mathbf{y}; \mathbf{x})$ the probability density function (pdf) of $\mathbf{y}$, parameterized by $\mathbf{x}$. For the SLM, $\mathcal{X} = \mathcal{X}_S$ as defined in (1) and
\[ f(\mathbf{y}; \mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{M/2}} \exp\!\left( -\frac{1}{2\sigma^2} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2 \right). \qquad (5) \]

A. Minimum-Variance Estimators
The estimation error incurred by an estimator $\hat{\mathbf{x}}(\mathbf{y})$ can be quantified by the mean squared error (MSE) $\varepsilon(\hat{\mathbf{x}}(\cdot); \mathbf{x}) \triangleq E_{\mathbf{x}}\{\|\hat{\mathbf{x}}(\mathbf{y}) - \mathbf{x}\|_2^2\}$, where the notation $E_{\mathbf{x}}\{\cdot\}$ indicates that the expectation is taken with respect to the pdf $f(\mathbf{y}; \mathbf{x})$ parameterized by $\mathbf{x}$. Note that $\varepsilon(\hat{\mathbf{x}}(\cdot); \mathbf{x})$ depends on the true parameter value, $\mathbf{x}$. The MSE can be decomposed as
\[ \varepsilon(\hat{\mathbf{x}}(\cdot); \mathbf{x}) \;=\; \|\mathbf{b}(\hat{\mathbf{x}}(\cdot); \mathbf{x})\|_2^2 \,+\, v(\hat{\mathbf{x}}(\cdot); \mathbf{x}), \qquad (6) \]
with the estimator bias $\mathbf{b}(\hat{\mathbf{x}}(\cdot); \mathbf{x}) \triangleq E_{\mathbf{x}}\{\hat{\mathbf{x}}(\mathbf{y})\} - \mathbf{x}$ and the estimator variance $v(\hat{\mathbf{x}}(\cdot); \mathbf{x}) \triangleq E_{\mathbf{x}}\bigl\{\|\hat{\mathbf{x}}(\mathbf{y}) - E_{\mathbf{x}}\{\hat{\mathbf{x}}(\mathbf{y})\}\|_2^2\bigr\}$. A standard approach to defining an optimum estimator is to fix the bias, i.e., require $\mathbf{b}(\hat{\mathbf{x}}(\cdot); \mathbf{x}) = \mathbf{c}(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}$, and to minimize the variance $v(\hat{\mathbf{x}}(\cdot); \mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}$ under this bias constraint. However, in many cases, such a "uniformly optimum" estimator does not exist. It is then natural to consider "locally optimum" estimators that minimize $v(\hat{\mathbf{x}}(\cdot); \mathbf{x})$ only at a given parameter value $\mathbf{x} = \mathbf{x}_0 \in \mathcal{X}$. This approach is taken here. Note that it follows from (6) that once the bias is fixed, minimizing the variance is equivalent to minimizing the MSE $\varepsilon(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0)$.
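To make the decomposition (6) concrete, here is a minimal Monte Carlo sketch (illustrative values only; the hard-thresholding estimator used as an example is introduced formally in Section VI, eq. (34)) that estimates bias, variance, and MSE for the SSNM and checks that they satisfy (6):

```python
import numpy as np

rng = np.random.default_rng(1)

# SSNM example (model (4)) at a fixed true parameter x0 (illustrative values)
N, sigma = 5, 1.0
x0 = np.zeros(N)
x0[0] = 2.0                                        # S = 1 sparse parameter

def estimator(y, T=3.0):
    # Example estimator: hard thresholding, cf. (34) in Section VI
    return np.where(np.abs(y) >= T, y, 0.0)

Y = x0 + sigma * rng.normal(size=(100000, N))      # observations y = x0 + n
Xhat = estimator(Y)

bias_sq  = np.sum((Xhat.mean(axis=0) - x0) ** 2)   # ||b(xhat(.); x0)||^2
variance = np.sum(Xhat.var(axis=0))                # v(xhat(.); x0)
mse      = np.mean(np.sum((Xhat - x0) ** 2, axis=1))
print(mse, bias_sq + variance)                     # the two agree, cf. (6)
```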
The bias constraint $\mathbf{b}(\hat{\mathbf{x}}(\cdot); \mathbf{x}) = \mathbf{c}(\mathbf{x})$ can be equivalently written as the mean constraint $E_{\mathbf{x}}\{\hat{\mathbf{x}}(\mathbf{y})\} = \boldsymbol{\gamma}(\mathbf{x})$, with $\boldsymbol{\gamma}(\mathbf{x}) \triangleq \mathbf{c}(\mathbf{x}) + \mathbf{x}$. Thus, we consider the constrained optimization problem
\[ \hat{\mathbf{x}}_{\mathbf{x}_0}(\cdot) \;=\; \arg\min_{\hat{\mathbf{x}}(\cdot) \in \mathcal{B}_{\gamma}} v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0), \qquad (7) \]
where $\mathcal{B}_{\gamma} \triangleq \bigl\{\hat{\mathbf{x}}(\cdot) \,\big|\, E_{\mathbf{x}}\{\hat{\mathbf{x}}(\mathbf{y})\} = \boldsymbol{\gamma}(\mathbf{x}),\ \forall \mathbf{x} \in \mathcal{X}\bigr\}$. The minimum variance achieved by the locally optimum estimator $\hat{\mathbf{x}}_{\mathbf{x}_0}(\cdot)$ at $\mathbf{x}_0$ will be denoted as $V_{\gamma}(\mathbf{x}_0) \triangleq v(\hat{\mathbf{x}}_{\mathbf{x}_0}(\cdot); \mathbf{x}_0) = \min_{\hat{\mathbf{x}}(\cdot) \in \mathcal{B}_{\gamma}} v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0)$. This is also known as the Barankin bound (for the prescribed mean $\boldsymbol{\gamma}(\mathbf{x})$) [9]. Using RKHS theory, it can be shown that $\hat{\mathbf{x}}_{\mathbf{x}_0}(\cdot)$ exists, i.e., the minimum in (7) exists and is unique, provided that there exists at least one estimator with mean $\boldsymbol{\gamma}(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}$ and finite variance at $\mathbf{x}_0$ (see also Section III). For unbiased estimation, i.e., $\boldsymbol{\gamma}(\mathbf{x}) \equiv \mathbf{x}$, $\hat{\mathbf{x}}_{\mathbf{x}_0}(\cdot)$ is called a locally minimum variance unbiased (LMVU) estimator. Unfortunately, $V_{\gamma}(\mathbf{x}_0)$ is difficult to compute in many cases, including the case of the SLM. Lower bounds on $V_{\gamma}(\mathbf{x}_0)$ are, e.g., the CRB and the Hammersley-Chapman-Robbins bound [8].

Let $x_k$, $\hat{x}_k(\mathbf{y})$, and $\gamma_k(\mathbf{x})$ denote the $k$th entries of $\mathbf{x}$, $\hat{\mathbf{x}}(\mathbf{y})$, and $\boldsymbol{\gamma}(\mathbf{x})$, respectively. We have $v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) = \sum_{k=1}^N v(\hat{x}_k(\cdot); \mathbf{x}_0)$ with $v(\hat{x}_k(\cdot); \mathbf{x}_0) \triangleq E_{\mathbf{x}_0}\bigl\{[\hat{x}_k(\mathbf{y}) - E_{\mathbf{x}_0}\{\hat{x}_k(\mathbf{y})\}]^2\bigr\}$. Thus, (7) is equivalent to the $N$ scalar optimization problems
\[ \hat{x}_{\mathbf{x}_0,k}(\cdot) \;=\; \arg\min_{\hat{x}_k(\cdot) \in \mathcal{B}_{\gamma_k}} v(\hat{x}_k(\cdot); \mathbf{x}_0), \qquad k = 1, \ldots, N, \qquad (8) \]
where $\mathcal{B}_{\gamma_k} \triangleq \bigl\{\hat{x}(\cdot) \,\big|\, E_{\mathbf{x}}\{\hat{x}(\mathbf{y})\} = \gamma_k(\mathbf{x}),\ \forall \mathbf{x} \in \mathcal{X}\bigr\}$. The minimum variance achieved by $\hat{x}_{\mathbf{x}_0,k}(\cdot)$ at $\mathbf{x}_0$ is denoted as
\[ V_{\gamma_k}(\mathbf{x}_0) \;\triangleq\; v(\hat{x}_{\mathbf{x}_0,k}(\cdot); \mathbf{x}_0) \;=\; \min_{\hat{x}_k(\cdot) \in \mathcal{B}_{\gamma_k}} v(\hat{x}_k(\cdot); \mathbf{x}_0). \qquad (9) \]

B. CRB of the Linear Gaussian Model
In our further development, we will make use of the CRB for the linear Gaussian model (LGM) defined by
\[ \mathbf{z} = \mathbf{A}\mathbf{s} + \mathbf{n}, \qquad (10) \]
with the nonrandom parameter $\mathbf{s} \in \mathbb{R}^S$ (not assumed sparse), the observation $\mathbf{z} \in \mathbb{R}^M$, the known matrix $\mathbf{A} \in \mathbb{R}^{M \times S}$, and white Gaussian noise $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. As before, we assume that $M \ge S$; furthermore, we assume that $\mathbf{A}$ has full column rank, i.e., $\mathbf{A}^T\mathbf{A} \in \mathbb{R}^{S \times S}$ is nonsingular. The relationship of this model with the SLM, as well as the different notation and different dimension ($S$ instead of $N$), will become clear in Section IV.

Consider estimators $\hat{s}_k(\mathbf{z})$ of the $k$th parameter component $s_k$ whose bias is equal to some prescribed differentiable function $\tilde{c}_k(\mathbf{s})$, i.e., $b(\hat{s}_k(\cdot); \mathbf{s}) = \tilde{c}_k(\mathbf{s})$ or, equivalently, $E_{\mathbf{s}}\{\hat{s}_k(\mathbf{z})\} = \tilde{\gamma}_k(\mathbf{s})$ with $\tilde{\gamma}_k(\mathbf{s}) \triangleq \tilde{c}_k(\mathbf{s}) + s_k$, for all $\mathbf{s} \in \mathbb{R}^S$. Let $V^{\mathrm{LGM}}_{\tilde{\gamma}_k}(\mathbf{s})$ denote the minimum variance achievable by such estimators at a given true parameter $\mathbf{s}$. The CRB $C^{\mathrm{LGM}}_{\tilde{\gamma}_k}(\mathbf{s})$ is the following lower bound on the minimum variance [10]:
\[ V^{\mathrm{LGM}}_{\tilde{\gamma}_k}(\mathbf{s}) \;\ge\; C^{\mathrm{LGM}}_{\tilde{\gamma}_k}(\mathbf{s}) \;\triangleq\; \sigma^2\, \tilde{\mathbf{r}}_k^T(\mathbf{s})\, (\mathbf{A}^T\mathbf{A})^{-1}\, \tilde{\mathbf{r}}_k(\mathbf{s}), \qquad (11) \]
where $\tilde{\mathbf{r}}_k(\mathbf{s}) \triangleq \partial \tilde{\gamma}_k(\mathbf{s}) / \partial \mathbf{s}$, i.e., $\tilde{\mathbf{r}}_k(\mathbf{s})$ is the vector of dimension $S$ whose $l$th entry is $\partial \tilde{\gamma}_k(\mathbf{s}) / \partial s_l$. We note that $V^{\mathrm{LGM}}_{\tilde{\gamma}_k}(\mathbf{s}) = C^{\mathrm{LGM}}_{\tilde{\gamma}_k}(\mathbf{s})$ if $\tilde{\gamma}_k(\mathbf{s})$ is an affine function of $\mathbf{s}$. In particular, this includes the unbiased case ($\tilde{\gamma}_k(\mathbf{s}) \equiv s_k$).
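A minimal NumPy sketch of (11) (the function name and example values are ours; for unbiased estimation of $s_k$ the gradient $\tilde{\mathbf{r}}_k$ is simply the $k$th unit vector, and (11) reduces to $\sigma^2 [(\mathbf{A}^T\mathbf{A})^{-1}]_{k,k}$):

```python
import numpy as np

def crb_lgm(A, sigma, r_k):
    """CRB (11) for the LGM z = A s + n: sigma^2 * r_k^T (A^T A)^{-1} r_k.

    A     : (M, S) known matrix with full column rank
    sigma : noise standard deviation
    r_k   : gradient of the prescribed mean function gamma_k with respect to s
            (for unbiased estimation of s_k, the k-th unit vector)
    """
    return sigma**2 * r_k @ np.linalg.inv(A.T @ A) @ r_k

# Unbiased estimation of the first component (illustrative values)
rng = np.random.default_rng(2)
A = rng.normal(size=(10, 3))
print(crb_lgm(A, sigma=0.5, r_k=np.array([1.0, 0.0, 0.0])))
```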
III. THE RKHS FRAMEWORK
In this section, we review some RKHS fundamentals which will provide a basis for our further development. Consider a set $\mathcal{X}$ (not necessarily a linear space) and a positive semidefinite "kernel" function $R(\mathbf{x}, \mathbf{x}'): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$; positive semidefiniteness means that for any finite set $\{\mathbf{x}_k\}_{k=1,\ldots,P}$ with $\mathbf{x}_k \in \mathcal{X}$, the matrix $\mathbf{R} \in \mathbb{R}^{P \times P}$ with entries $(\mathbf{R})_{k,l} \triangleq R(\mathbf{x}_k, \mathbf{x}_l)$ is positive semidefinite. For each fixed $\mathbf{x}' \in \mathcal{X}$, the function $f_{\mathbf{x}'}(\mathbf{x}) \triangleq R(\mathbf{x}, \mathbf{x}')$ maps $\mathcal{X}$ into $\mathbb{R}$. The RKHS $\mathcal{H}(R)$ is a Hilbert space of functions $f: \mathcal{X} \to \mathbb{R}$ which is defined as the closure of the linear span of the set of functions $\{f_{\mathbf{x}'}(\mathbf{x}) = R(\mathbf{x}, \mathbf{x}')\}_{\mathbf{x}' \in \mathcal{X}}$. This closure is taken with respect to the topology given by the scalar product $\langle \cdot, \cdot \rangle_{\mathcal{H}(R)}$, which is defined via the reproducing property [5]
\[ \bigl\langle f(\cdot), R(\cdot, \mathbf{x}') \bigr\rangle_{\mathcal{H}(R)} = f(\mathbf{x}'). \]
This relation holds for all $f \in \mathcal{H}(R)$ and $\mathbf{x}' \in \mathcal{X}$. The associated norm is given by $\|f\|_{\mathcal{H}(R)} = \langle f, f \rangle_{\mathcal{H}(R)}^{1/2}$.

We now consider the constrained optimization problem (8) for a given mean function $\gamma(\mathbf{x})$ (formerly denoted by $\gamma_k(\mathbf{x})$; we temporarily drop the subscript $k$ for better readability). According to [6], [7], for certain classes of parametrized pdfs $f(\mathbf{y}; \mathbf{x})$ (which include the Gaussian pdf in (5)), one can associate with this optimization problem an RKHS $\mathcal{H}(R_{\mathbf{x}_0})$ whose kernel $R_{\mathbf{x}_0}(\mathbf{x}, \mathbf{x}'): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is given by
\[ R_{\mathbf{x}_0}(\mathbf{x}, \mathbf{x}') \;\triangleq\; E_{\mathbf{x}_0}\!\left\{ \frac{f(\mathbf{y}; \mathbf{x})}{f(\mathbf{y}; \mathbf{x}_0)} \, \frac{f(\mathbf{y}; \mathbf{x}')}{f(\mathbf{y}; \mathbf{x}_0)} \right\} \;=\; \int_{\mathbb{R}^M} \frac{f(\mathbf{y}; \mathbf{x})\, f(\mathbf{y}; \mathbf{x}')}{f(\mathbf{y}; \mathbf{x}_0)} \, d\mathbf{y}. \]
It can be shown [6], [7] that $\gamma(\cdot) \in \mathcal{H}(R_{\mathbf{x}_0})$ if and only if there exists at least one estimator with mean $\gamma(\mathbf{x})$ for all $\mathbf{x}$ and finite variance at $\mathbf{x}_0$. Furthermore, under this condition, the minimum variance $V_{\gamma}(\mathbf{x}_0)$ in (9) is finite and allows the following expression involving the norm $\|\gamma\|_{\mathcal{H}(R_{\mathbf{x}_0})}$:
\[ V_{\gamma}(\mathbf{x}_0) \;=\; \|\gamma\|^2_{\mathcal{H}(R_{\mathbf{x}_0})} - \gamma^2(\mathbf{x}_0). \qquad (12) \]
This is an RKHS formulation of the Barankin bound. Unfortunately, the norm $\|\gamma\|_{\mathcal{H}(R_{\mathbf{x}_0})}$ is often difficult to compute.

For the SLM in (2), (1), (5), $\mathcal{X} = \mathcal{X}_S$; the kernel here is a mapping $\mathcal{X}_S \times \mathcal{X}_S \to \mathbb{R}$ which is easily shown to be given by
\[ R_{\mathbf{x}_0}(\mathbf{x}, \mathbf{x}') \;=\; \exp\!\left( \frac{1}{\sigma^2} (\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}^T \mathbf{H} (\mathbf{x}' - \mathbf{x}_0) \right), \qquad (13) \]
where $\mathbf{x}_0 \in \mathcal{X}_S$. An RKHS can also be defined for the LGM in (10). Here, $\mathcal{X} = \mathbb{R}^S$, and the kernel $R^{\mathrm{LGM}}_{\mathbf{s}_0}(\mathbf{s}, \mathbf{s}')$ with $\mathbf{s}_0 \in \mathbb{R}^S$ is a mapping $\mathbb{R}^S \times \mathbb{R}^S \to \mathbb{R}$ given by
\[ R^{\mathrm{LGM}}_{\mathbf{s}_0}(\mathbf{s}, \mathbf{s}') \;=\; \exp\!\left( \frac{1}{\sigma^2} (\mathbf{s} - \mathbf{s}_0)^T \mathbf{A}^T \mathbf{A} (\mathbf{s}' - \mathbf{s}_0) \right). \qquad (14) \]
Note that these kernels differ in their domain, which is $\mathcal{X}_S \times \mathcal{X}_S$ for $R_{\mathbf{x}_0}(\mathbf{x}, \mathbf{x}')$ and $\mathbb{R}^S \times \mathbb{R}^S$ for $R^{\mathrm{LGM}}_{\mathbf{s}_0}(\mathbf{s}, \mathbf{s}')$.
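As a quick numerical illustration of (13) (a sketch with arbitrary example points; the function name is ours), one can evaluate the SLM kernel on a few $S$-sparse points and verify that the resulting Gram matrix is positive semidefinite, as required of a kernel:

```python
import numpy as np

def kernel_slm(x, xp, x0, H, sigma):
    """Kernel R_{x0}(x, x') of the SLM RKHS, eq. (13)."""
    return np.exp((x - x0) @ H.T @ H @ (xp - x0) / sigma**2)

rng = np.random.default_rng(3)
N, M, S, sigma = 6, 4, 2, 1.0                 # illustrative values
H = rng.normal(size=(M, N))
x0 = np.zeros(N)
x0[:S] = [1.0, -0.5]                          # a point in X_S

# Gram matrix on a few S-sparse points: must be positive semidefinite
pts = []
for _ in range(4):
    p = np.zeros(N)
    p[rng.choice(N, S, replace=False)] = rng.normal(size=S)
    pts.append(p)
G = np.array([[kernel_slm(a, b, x0, H, sigma) for b in pts] for a in pts])
print(np.linalg.eigvalsh(G).min() >= -1e-9)   # True, up to numerical tolerance
```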
IV. A LOWER BOUND ON THE ESTIMATOR VARIANCE

We now continue our treatment of the SLM estimation problem. In what follows, $V_{\gamma}(\mathbf{x}_0)$ will be understood to denote the bias-constrained minimum variance (9) specifically for the SLM. This means, in particular, that $\mathcal{X} = \mathcal{X}_S$, and hence the set of admissible estimators is given by
\[ \mathcal{B}_{\gamma} \;=\; \bigl\{\hat{x}(\cdot) \,\big|\, E_{\mathbf{x}}\{\hat{x}(\mathbf{y})\} = \gamma(\mathbf{x}),\ \forall \mathbf{x} \in \mathcal{X}_S\bigr\}. \qquad (15) \]
We will next derive a lower bound on $V_{\gamma}(\mathbf{x}_0)$.

A. Relaxing the Bias Constraint
The first step in this derivation is to relax the bias constraint $\hat{x}(\cdot) \in \mathcal{B}_{\gamma}$. Let $\mathcal{K} \triangleq \{k_1, \ldots, k_S\}$ be a fixed set of $S$ different indices $k_i \in \{1, \ldots, N\}$ (not related to $\mathrm{supp}(\mathbf{x}_0)$), and let
\[ \mathcal{X}_S^{\mathcal{K}} \;\triangleq\; \{\mathbf{x} \in \mathcal{X}_S \mid \mathrm{supp}(\mathbf{x}) \subseteq \mathcal{K}\}. \]
Clearly, $\mathcal{X}_S^{\mathcal{K}} \subseteq \mathcal{X}_S$; however, contrary to $\mathcal{X}_S$, $\mathcal{X}_S^{\mathcal{K}}$ is a linear subspace of $\mathbb{R}^N$. Let $\mathcal{B}^{\mathcal{K}}_{\gamma}$ be the set of all estimators with mean $\gamma(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_S^{\mathcal{K}}$ (but not necessarily for all $\mathbf{x} \in \mathcal{X}_S$), i.e.,
\[ \mathcal{B}^{\mathcal{K}}_{\gamma} \;\triangleq\; \bigl\{\hat{x}(\cdot) \,\big|\, E_{\mathbf{x}}\{\hat{x}(\mathbf{y})\} = \gamma(\mathbf{x}),\ \forall \mathbf{x} \in \mathcal{X}_S^{\mathcal{K}}\bigr\}. \]
Comparing with (15), we see that $\mathcal{B}^{\mathcal{K}}_{\gamma} \supseteq \mathcal{B}_{\gamma}$.

Let us now consider the minimum variance among all estimators in $\mathcal{B}^{\mathcal{K}}_{\gamma}$, i.e.,
\[ V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0) \;\triangleq\; \min_{\hat{x}(\cdot) \in \mathcal{B}^{\mathcal{K}}_{\gamma}} v(\hat{x}(\cdot); \mathbf{x}_0). \qquad (16) \]
Because $\hat{x}(\cdot) \in \mathcal{B}^{\mathcal{K}}_{\gamma}$ is a less restrictive constraint than $\hat{x}(\cdot) \in \mathcal{B}_{\gamma}$ used in the definition of $V_{\gamma}(\mathbf{x}_0)$, we have
\[ V_{\gamma}(\mathbf{x}_0) \;\ge\; V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0), \qquad (17) \]
i.e., $V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0)$ is a lower bound on $V_{\gamma}(\mathbf{x}_0)$. A closed-form expression of $V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0)$ appears to be difficult to obtain in the general case, because $\mathbf{x}_0 \notin \mathcal{X}_S^{\mathcal{K}}$ in general.
Therefore, we will use RKHS theory to derive a lower bound on $V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0)$.

B. Two Isometric RKHSs
An RKHS for the SLM can also be defined on $\mathcal{X}_S^{\mathcal{K}}$, using a kernel $R^{\mathcal{K}}_{\mathbf{x}_0}: \mathcal{X}_S^{\mathcal{K}} \times \mathcal{X}_S^{\mathcal{K}} \to \mathbb{R}$ that is given by the right-hand side of (13) but whose arguments $\mathbf{x}, \mathbf{x}'$ are assumed to be in $\mathcal{X}_S^{\mathcal{K}}$ and not just in $\mathcal{X}_S$ (however, recall that $\mathbf{x}_0 \notin \mathcal{X}_S^{\mathcal{K}}$ in general).
This RKHS will be denoted $\mathcal{H}(R^{\mathcal{K}}_{\mathbf{x}_0})$. The minimum variance $V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0)$ in (16) can then be expressed as (cf. (12))
\[ V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0) \;=\; \|\gamma\|^2_{\mathcal{H}(R^{\mathcal{K}}_{\mathbf{x}_0})} - \gamma^2(\mathbf{x}_0). \qquad (18) \]
In order to develop this expression, we define some notation. Consider an index set $\mathcal{I} = \{k_1, \ldots, k_{|\mathcal{I}|}\} \subseteq \{1, \ldots, N\}$. We denote by $\mathbf{H}_{\mathcal{I}} \in \mathbb{R}^{M \times |\mathcal{I}|}$ the submatrix of our matrix $\mathbf{H} \in \mathbb{R}^{M \times N}$ whose $i$th column is given by the $k_i$th column of $\mathbf{H}$. Furthermore, for a vector $\mathbf{x} \in \mathbb{R}^N$, we denote by $\mathbf{x}_{\mathcal{I}} \in \mathbb{R}^{|\mathcal{I}|}$ the subvector whose $i$th entry is the $k_i$th entry of $\mathbf{x}$.

We now introduce a second RKHS. Consider the LGM in (10) with matrix $\mathbf{A} = \mathbf{H}_{\mathcal{K}} \in \mathbb{R}^{M \times S}$, and let $\mathcal{H}(R^{\mathrm{LGM}}_{\mathbf{s}_0})$ with $\mathbf{s}_0 \in \mathbb{R}^S$ denote the RKHS for that LGM as defined by the kernel $R^{\mathrm{LGM}}_{\mathbf{s}_0}: \mathbb{R}^S \times \mathbb{R}^S \to \mathbb{R}$ in (14). Exploiting the linear-subspace structure of $\mathcal{X}_S^{\mathcal{K}}$, it can be shown that our RKHS $\mathcal{H}(R^{\mathcal{K}}_{\mathbf{x}_0})$ for a given $\mathbf{x}_0$ is isometric to $\mathcal{H}(R^{\mathrm{LGM}}_{\mathbf{s}_0})$ with $\mathbf{s}_0$ chosen as
\[ \mathbf{s}_0 \;=\; \mathbf{H}^{\dagger}_{\mathcal{K}} \mathbf{H} \mathbf{x}_0. \qquad (19) \]
Here, $\mathbf{H}^{\dagger}_{\mathcal{K}} \triangleq (\mathbf{H}_{\mathcal{K}}^T \mathbf{H}_{\mathcal{K}})^{-1} \mathbf{H}_{\mathcal{K}}^T \in \mathbb{R}^{S \times M}$ is the pseudo-inverse of $\mathbf{H}_{\mathcal{K}}$ (recall that $M \ge S$, and note that $(\mathbf{H}_{\mathcal{K}}^T \mathbf{H}_{\mathcal{K}})^{-1}$ is guaranteed to exist because of our assumption (3)). More specifically, the isometry $\mathsf{J}: \mathcal{H}(R^{\mathcal{K}}_{\mathbf{x}_0}) \to \mathcal{H}(R^{\mathrm{LGM}}_{\mathbf{s}_0})$ mapping each $f \in \mathcal{H}(R^{\mathcal{K}}_{\mathbf{x}_0})$ to an $\tilde{f} \in \mathcal{H}(R^{\mathrm{LGM}}_{\mathbf{s}_0})$ is given by
\[ \mathsf{J}\{f(\mathbf{x})\} \;=\; \tilde{f}(\mathbf{x}_{\mathcal{K}}) \;=\; \beta_{\mathbf{x}_0} f(\mathbf{x}), \qquad \mathbf{x} \in \mathcal{X}_S^{\mathcal{K}}, \qquad (20) \]
where
\[ \beta_{\mathbf{x}_0} \;\triangleq\; \exp\!\left( -\frac{1}{2\sigma^2} \bigl\| (\mathbf{I} - \mathbf{P}_{\mathcal{K}}) \mathbf{H} \mathbf{x}_0 \bigr\|_2^2 \right). \qquad (21) \]
Here, $\mathbf{P}_{\mathcal{K}} \triangleq \mathbf{H}_{\mathcal{K}} \mathbf{H}^{\dagger}_{\mathcal{K}}$ is the orthogonal projection matrix on the range of $\mathbf{H}_{\mathcal{K}}$. The factor $\beta_{\mathbf{x}_0}$ can be interpreted as a measure of the distance between the point $\mathbf{H}\mathbf{x}_0$ and the subspace $\mathcal{X}_S^{\mathcal{K}}$ associated with the index set $\mathcal{K}$. We can write (20) as $\tilde{f}(\mathbf{s}) = \beta_{\mathbf{x}_0} f(\mathbf{x}(\mathbf{s}))$, $\mathbf{s} \in \mathbb{R}^S$, where $\mathbf{x}(\mathbf{s})$ denotes the $\mathbf{x} \in \mathcal{X}_S^{\mathcal{K}}$ for which $\mathbf{x}_{\mathcal{K}} = \mathbf{s}$ (i.e., the $S$ entries of $\mathbf{s}$ appear in $\mathbf{x}(\mathbf{s})$ at the appropriate positions within $\mathcal{K}$, and the $N - S$ remaining entries of $\mathbf{x}(\mathbf{s})$ are zero).

Consider now the image of $\gamma(\mathbf{x})$ under the mapping $\mathsf{J}$,
\[ \tilde{\gamma}(\mathbf{s}) \;\triangleq\; \mathsf{J}\{\gamma(\mathbf{x})\} \;=\; \beta_{\mathbf{x}_0} \gamma(\mathbf{x}(\mathbf{s})), \qquad \mathbf{s} \in \mathbb{R}^S. \qquad (22) \]
Since $\mathsf{J}$ is an isometry, we have $\|\tilde{\gamma}\|_{\mathcal{H}(R^{\mathrm{LGM}}_{\mathbf{s}_0})} = \|\gamma\|_{\mathcal{H}(R^{\mathcal{K}}_{\mathbf{x}_0})}$. Combining this identity with (18), we obtain
\[ V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0) \;=\; \|\tilde{\gamma}\|^2_{\mathcal{H}(R^{\mathrm{LGM}}_{\mathbf{s}_0})} - \gamma^2(\mathbf{x}_0). \qquad (23) \]

C. Lower Bound on $V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0)$

We will now use expression (23) to derive a lower bound on $V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0)$ in terms of the CRB for the LGM in (11). Consider the minimum estimator variance for the LGM under the constraint of the prescribed mean function $\tilde{\gamma}(\mathbf{s})$, $V^{\mathrm{LGM}}_{\tilde{\gamma}}(\mathbf{s}_0)$, still for $\mathbf{A} = \mathbf{H}_{\mathcal{K}}$ and for $\mathbf{s}_0$ given by (19). We have (cf. (12))
\[ V^{\mathrm{LGM}}_{\tilde{\gamma}}(\mathbf{s}_0) \;=\; \|\tilde{\gamma}\|^2_{\mathcal{H}(R^{\mathrm{LGM}}_{\mathbf{s}_0})} - \tilde{\gamma}^2(\mathbf{s}_0). \]
Combining with (23), we obtain the relation
\[ V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0) \;=\; V^{\mathrm{LGM}}_{\tilde{\gamma}}(\mathbf{s}_0) + \tilde{\gamma}^2(\mathbf{s}_0) - \gamma^2(\mathbf{x}_0). \]
Using the CRB $V^{\mathrm{LGM}}_{\tilde{\gamma}}(\mathbf{s}_0) \ge C^{\mathrm{LGM}}_{\tilde{\gamma}}(\mathbf{s}_0)$ (see (11)) yields
\[ V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0) \;\ge\; L^{\mathcal{K}}_{\gamma}(\mathbf{x}_0), \qquad (24) \]
with
\[ L^{\mathcal{K}}_{\gamma}(\mathbf{x}_0) \;\triangleq\; C^{\mathrm{LGM}}_{\tilde{\gamma}}(\mathbf{s}_0) + \tilde{\gamma}^2(\mathbf{s}_0) - \gamma^2(\mathbf{x}_0). \qquad (25) \]
Finally, using (22) and the implied CRB relation $C^{\mathrm{LGM}}_{\tilde{\gamma}}(\mathbf{s}_0) = \beta^2_{\mathbf{x}_0}\, C^{\mathrm{LGM}}_{\gamma(\mathbf{x}(\cdot))}(\mathbf{s}_0)$, the lower bound (25) can be reformulated as
\[ L^{\mathcal{K}}_{\gamma}(\mathbf{x}_0) \;=\; \beta^2_{\mathbf{x}_0} \bigl[ C^{\mathrm{LGM}}_{\gamma(\mathbf{x}(\cdot))}(\mathbf{s}_0) + \gamma^2(\mathbf{x}(\mathbf{s}_0)) \bigr] - \gamma^2(\mathbf{x}_0). \qquad (26) \]
Here, $C^{\mathrm{LGM}}_{\gamma(\mathbf{x}(\cdot))}(\mathbf{s}_0)$ denotes the CRB for the prescribed mean function $\gamma'(\mathbf{s}) = \gamma(\mathbf{x}(\mathbf{s}))$, which is given by (see (11))
\[ C^{\mathrm{LGM}}_{\gamma(\mathbf{x}(\cdot))}(\mathbf{s}_0) \;=\; \sigma^2\, \mathbf{r}^T(\mathbf{s}_0)\, (\mathbf{H}_{\mathcal{K}}^T \mathbf{H}_{\mathcal{K}})^{-1}\, \mathbf{r}(\mathbf{s}_0), \qquad (27) \]
where $\mathbf{r}(\mathbf{s}) \triangleq \partial \gamma(\mathbf{x}(\mathbf{s})) / \partial \mathbf{s}$ and $\mathbf{s}_0$ is related to $\mathbf{x}_0$ via (19).
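To illustrate how the closed-form bound might be evaluated, the following sketch implements (26), (27), and (19) for the unbiased case ($\gamma_k(\mathbf{x}) = x_k$); the function name and the 0-based indexing are our own conventions, not the paper's:

```python
import numpy as np

def lower_bound_LK_unbiased(H, x0, sigma, k, K):
    """L^K_{gamma_k}(x0) of (26)/(27) for unbiased estimation (gamma_k(x) = x_k).

    H : (M, N) system matrix,  x0 : true parameter,  sigma : noise std,
    k : component index (0-based),  K : sequence of S column indices defining X_S^K.
    """
    HK = H[:, list(K)]                                   # H_K
    gram_inv = np.linalg.inv(HK.T @ HK)
    HK_pinv = gram_inv @ HK.T                            # pseudo-inverse of H_K
    s0 = HK_pinv @ (H @ x0)                              # eq. (19)
    PK = HK @ HK_pinv                                    # projection onto range(H_K)
    resid = (np.eye(H.shape[0]) - PK) @ (H @ x0)
    beta_sq = np.exp(-np.dot(resid, resid) / sigma**2)   # beta_{x0}^2, cf. (21)

    # Unbiased case: gamma_k(x(s)) = s_i if k is the i-th element of K, else 0
    if k in K:
        i = list(K).index(k)
        r = np.eye(len(K))[i]                            # gradient r(s), eq. (27)
        gamma_xs0 = s0[i]
    else:
        r = np.zeros(len(K))
        gamma_xs0 = 0.0
    C = sigma**2 * r @ gram_inv @ r                      # CRB (27)
    return beta_sq * (C + gamma_xs0**2) - x0[k]**2       # eq. (26)
```

For the SSNM ($\mathbf{H} = \mathbf{I}$) with $k$ outside the support of $\mathbf{x}_0$ and $\mathcal{K}$ chosen as in Section V, this evaluates to $\sigma^2 e^{-\xi^2(\mathbf{x}_0)/\sigma^2}$, in agreement with (31) below.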
To summarize, we have the following chain of lower bounds on the bias-constrained variance at $\mathbf{x}_0$:
\[ v(\hat{x}(\cdot); \mathbf{x}_0) \;\overset{(9)}{\ge}\; V_{\gamma}(\mathbf{x}_0) \;\overset{(17)}{\ge}\; V^{\mathcal{K}}_{\gamma}(\mathbf{x}_0) \;\overset{(24)}{\ge}\; L^{\mathcal{K}}_{\gamma}(\mathbf{x}_0). \qquad (28) \]
While $L^{\mathcal{K}}_{\gamma}(\mathbf{x}_0)$ is the loosest of these bounds, it is attractive because of its closed-form expression in (26) (together with (27) and (19)). We note that the inequality (24) becomes an equality if $\tilde{\gamma}(\mathbf{s})$ is an affine function of $\mathbf{s}$, or equivalently (see (22)), if $\gamma(\mathbf{x})$ is an affine function of $\mathbf{x}$. In particular, this includes the unbiased case ($\gamma(\mathbf{x}) \equiv x_k$).

Recalling that $v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) = \sum_{k=1}^N v(\hat{x}_k(\cdot); \mathbf{x}_0)$ (we now reintroduce the subscript $k$), a lower bound on $v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0)$ is obtained from (28) as
\[ v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) \;\ge\; \sum_{k=1}^N L^{\mathcal{K}_k}_{\gamma_k}(\mathbf{x}_0). \]
For a high lower bound, the index sets $\mathcal{K}_k$ should in general be chosen such that the respective factors $\beta_{\mathbf{x}_0,k}$ in (26) are large. (This means that the "distances" between $\mathbf{H}\mathbf{x}_0$ and $\mathcal{X}_S^{\mathcal{K}_k}$ are small, see (21).) Formally using the optimum $\mathcal{K}_k$ for each $k$, we arrive at the main result of this paper.
Theorem. Let $\hat{\mathbf{x}}(\cdot)$ be an estimator for the SLM (2), (1) whose mean equals $\boldsymbol{\gamma}(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_S$. Then the variance of $\hat{\mathbf{x}}(\cdot)$ at a given parameter vector $\mathbf{x} = \mathbf{x}_0 \in \mathcal{X}_S$ satisfies
\[ v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) \;\ge\; \sum_{k=1}^N L^{*}_{\gamma_k}(\mathbf{x}_0), \qquad (29) \]
where $L^{*}_{\gamma_k}(\mathbf{x}_0) \triangleq \max_{\mathcal{K}_k : |\mathcal{K}_k| = S} L^{\mathcal{K}_k}_{\gamma_k}(\mathbf{x}_0)$, with $L^{\mathcal{K}_k}_{\gamma_k}(\mathbf{x}_0)$ given by (26) together with (27) and (19).
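For small $N$, the right-hand side of (29) can be evaluated by exhaustive search over the index sets $\mathcal{K}_k$; a sketch (reusing the hypothetical helper lower_bound_LK_unbiased from above, and hence again restricted to the unbiased case):

```python
from itertools import combinations

def theorem_bound_unbiased(H, x0, sigma, S):
    """Right-hand side of (29) for unbiased estimation, by brute force over all
    index sets K_k of size S (only feasible for small N)."""
    N = H.shape[1]
    return sum(
        max(lower_bound_LK_unbiased(H, x0, sigma, k, K)
            for K in combinations(range(N), S))
        for k in range(N)
    )
```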
V. SPECIAL CASE: UNBIASED ESTIMATION FOR THE SSNM

The SSNM in (4) is a special case of the SLM with $\mathbf{H} = \mathbf{I}$. We now consider unbiased estimation (i.e., $\boldsymbol{\gamma}(\mathbf{x}) \equiv \mathbf{x}$) for the SSNM. Since an unbiased estimator with uniformly minimum variance does not exist [4], we are interested in a lower variance bound at a fixed $\mathbf{x}_0 \in \mathcal{X}_S$. We denote by $\xi(\mathbf{x}_0)$ and $j(\mathbf{x}_0)$ the value and index, respectively, of the $S$-largest (in magnitude) entry of $\mathbf{x}_0$; note that this is the smallest (in magnitude) nonzero entry of $\mathbf{x}_0$ if $\|\mathbf{x}_0\|_0 = S$, and zero if $\|\mathbf{x}_0\|_0 < S$.

Consider an unbiased estimator $\hat{x}_k(\cdot)$. For $k \in \mathrm{supp}(\mathbf{x}_0)$, using the lower bound $L^{\mathcal{K}_k}_{\gamma_k}(\mathbf{x}_0)$ in (26) with any index set $\mathcal{K}_k$ of size $|\mathcal{K}_k| = S$ such that $\mathrm{supp}(\mathbf{x}_0) \subseteq \mathcal{K}_k$, one can show that
\[ v(\hat{x}_k(\cdot); \mathbf{x}_0) \;\ge\; \sigma^2, \qquad k \in \mathrm{supp}(\mathbf{x}_0). \qquad (30) \]
This bound is actually the minimum variance (i.e., the variance of the LMVU estimator) since it is achieved by the specific unbiased estimator $\hat{x}_k(\mathbf{y}) = y_k$ (which is the LMVU estimator for $k \in \mathrm{supp}(\mathbf{x}_0)$). On the other hand, for $k \notin \mathrm{supp}(\mathbf{x}_0)$, the lower bound $L^{\mathcal{K}_k}_{\gamma_k}(\mathbf{x}_0)$ with $\mathcal{K}_k = \bigl(\mathrm{supp}(\mathbf{x}_0) \setminus \{j(\mathbf{x}_0)\}\bigr) \cup \{k\}$ can be shown to lead to the inequality
\[ v(\hat{x}_k(\cdot); \mathbf{x}_0) \;\ge\; \sigma^2 e^{-\xi^2(\mathbf{x}_0)/\sigma^2}, \qquad k \notin \mathrm{supp}(\mathbf{x}_0). \qquad (31) \]
Combining (30) and (31), a lower bound on the overall variance $v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) = \sum_{k=1}^N v(\hat{x}_k(\cdot); \mathbf{x}_0)$ is obtained as
\[ v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) \;\ge\; \sum_{k \in \mathrm{supp}(\mathbf{x}_0)} \sigma^2 \;+\; \sum_{k \notin \mathrm{supp}(\mathbf{x}_0)} \sigma^2 e^{-\xi^2(\mathbf{x}_0)/\sigma^2}. \qquad (32) \]
Thus, recalling that $v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) = \varepsilon(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0)$ for unbiased estimators, we arrive at the following result.
Corollary. Let $\hat{\mathbf{x}}(\cdot)$ be an unbiased estimator for the SSNM in (4). Then the MSE of $\hat{\mathbf{x}}(\cdot)$ at a given $\mathbf{x} = \mathbf{x}_0 \in \mathcal{X}_S$ satisfies
\[ \varepsilon(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) \;\ge\; \bigl[ S + (N - S)\, e^{-\xi^2(\mathbf{x}_0)/\sigma^2} \bigr]\, \sigma^2. \qquad (33) \]
This lower bound is tighter (i.e., higher) than the lower bound derived in [4]. Furthermore, in contrast to the bound in [4], it is a function of $\mathbf{x}_0$ that is everywhere continuous. This fact is theoretically pleasing since the MSE of any estimator is a continuous function of $\mathbf{x}_0$ [11].

Let us consider the special case of $S = 1$. Here, $\xi(\mathbf{x}_0)$ and $j(\mathbf{x}_0)$ are simply the value and index, respectively, of the single nonzero entry of $\mathbf{x}_0$. Using RKHS theory, one can show that the estimator $\hat{\mathbf{x}}(\cdot)$ given componentwise by
\[ \hat{x}_k(\mathbf{y}) = \begin{cases} y_{j(\mathbf{x}_0)}, & k = j(\mathbf{x}_0) \\ \alpha(\mathbf{y}; \mathbf{x}_0)\, y_k, & \text{else}, \end{cases} \]
with $\alpha(\mathbf{y}; \mathbf{x}_0) \triangleq \exp\bigl( -\frac{1}{2\sigma^2} \bigl[ 2\, y_{j(\mathbf{x}_0)}\, \xi(\mathbf{x}_0) + \xi^2(\mathbf{x}_0) \bigr] \bigr)$, is the LMVU estimator at $\mathbf{x}_0$. That is, the estimator $\hat{\mathbf{x}}(\cdot)$ is unbiased and its MSE achieves the lower bound (33). This also means that (33) is actually the minimum MSE (achieved by the LMVU estimator). While $\hat{\mathbf{x}}(\cdot)$ is not very practical since it explicitly involves the unknown true parameter $\mathbf{x}_0$, its existence demonstrates the tightness of the bound (33).
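As a numerical sanity check (a sketch with illustrative values; note that this oracle estimator uses the true $\mathbf{x}_0$, as discussed above), one can compare the Monte Carlo MSE of the $S = 1$ LMVU estimator with the bound (33):

```python
import numpy as np

rng = np.random.default_rng(4)
N, sigma, xi, j = 5, 1.0, 2.0, 0                        # S = 1; illustrative values
x0 = np.zeros(N)
x0[j] = xi

Y = x0 + sigma * rng.normal(size=(200000, N))           # SSNM observations
alpha = np.exp(-(2.0 * Y[:, j] * xi + xi**2) / (2.0 * sigma**2))
Xhat = alpha[:, None] * Y                               # components k != j
Xhat[:, j] = Y[:, j]                                    # component k = j

mc_mse = np.mean(np.sum((Xhat - x0) ** 2, axis=1))
bound = (1 + (N - 1) * np.exp(-xi**2 / sigma**2)) * sigma**2   # corollary (33), S = 1
print(mc_mse, bound)                                    # the two should be close
```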
VI. NUMERICAL RESULTS

For the SSNM in (4), we will compute the lower variance bound $\sum_{k=1}^N L^{*}_{\gamma_k}(\mathbf{x}_0)$ (see (29)) and compare it with the variance of two established estimators, namely, the maximum likelihood (ML) estimator and the hard-thresholding (HT) estimator. The ML estimator is given by
\[ \hat{\mathbf{x}}_{\mathrm{ML}}(\mathbf{y}) \;\triangleq\; \arg\max_{\mathbf{x}' \in \mathcal{X}_S} f(\mathbf{y}; \mathbf{x}') \;=\; \mathsf{P}_S(\mathbf{y}), \]
where the operator $\mathsf{P}_S$ retains the $S$ largest (in magnitude) entries and zeros out all others. The HT estimator $\hat{\mathbf{x}}_{\mathrm{HT}}(\mathbf{y})$ is given by
\[ \hat{x}_{\mathrm{HT},k}(\mathbf{y}) = \begin{cases} y_k, & |y_k| \ge T \\ 0, & \text{else}, \end{cases} \qquad (34) \]
where $T$ is a fixed threshold.

For simplicity, we consider the SSNM for $S = 1$. In this case, the bound (29) can be shown to be
\[ v(\hat{\mathbf{x}}(\cdot); \mathbf{x}_0) \;\ge\; L^{\mathcal{K}_j}_{\gamma_j}(\mathbf{x}_0) + (N - 1)\, L^{\mathcal{K}_i}_{\gamma_i}(\mathbf{x}_0), \qquad (35) \]
where $j \triangleq j(\mathbf{x}_0)$, $i$ is any index different from $j(\mathbf{x}_0)$ (it can be shown that all such indices equally maximize the lower bound), $\mathcal{K}_j \triangleq \{j(\mathbf{x}_0)\}$, and $\mathcal{K}_i \triangleq \{i\}$. (We note that (35) simplifies to (32) for the special case of an unbiased estimator.) Since we compare the bound (35) to the ML and HT estimators, $\gamma(\mathbf{x})$ is set equal to the mean of the respective estimator (ML or HT).
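The two reference estimators are straightforward to implement; a sketch (function names are ours):

```python
import numpy as np

def ml_estimator(y, S):
    """ML estimator for the SSNM: keep the S largest-magnitude entries of y (operator P_S)."""
    xhat = np.zeros_like(y)
    keep = np.argsort(np.abs(y))[-S:]
    xhat[keep] = y[keep]
    return xhat

def ht_estimator(y, T):
    """Hard-thresholding estimator (34) with fixed threshold T."""
    return np.where(np.abs(y) >= T, y, 0.0)
```

The variance curves in Fig. 1 can then be approximated by Monte Carlo simulation, i.e., by estimating $E_{\mathbf{x}_0}\{\|\hat{\mathbf{x}}(\mathbf{y}) - E_{\mathbf{x}_0}\{\hat{\mathbf{x}}(\mathbf{y})\}\|_2^2\}$ from many noise realizations at each SNR.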
For a numerical evaluation, we generated parameter vectors $\mathbf{x}_0$ with $N = 5$, $S = 1$, $j(\mathbf{x}_0) = 1$, and different $\xi(\mathbf{x}_0)$. (The fixed choice $j(\mathbf{x}_0) = 1$ is justified by the fact that neither the variances of the ML and HT estimators nor the corresponding variance bounds depend on $j(\mathbf{x}_0)$.) In Fig. 1, we plot the variances $v(\hat{\mathbf{x}}_{\mathrm{ML}}(\cdot); \mathbf{x}_0)$ and $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot); \mathbf{x}_0)$ (the latter for three different choices of $T$ in (34)) along with the corresponding bounds (35), as a function of the signal-to-noise ratio (SNR) $\xi^2(\mathbf{x}_0)/\sigma^2$.

[Figure 1 appears here.]
Figure 1. Variance of the ML and HT estimators (the latter for $T = 3, 4, 5$) and corresponding lower bounds versus the SNR $\xi^2(\mathbf{x}_0)/\sigma^2$, for the SSNM with $N = 5$ and $S = 1$.

It is seen that for SNR larger than about 18 dB, all variances and bounds are effectively equal (for the HT estimator, this is true if $T$ is not too small). However, in the medium-SNR range, the variances of the ML and HT estimators are significantly higher than the corresponding lower bounds. We can conclude that there might exist estimators with the same mean as that of the ML or HT estimator but smaller variance. Note, however, that a positive statement regarding the existence of such estimators cannot be based on our analysis.

VII. CONCLUSION
Using the mathematical framework of reproducing kernel Hilbert spaces, we derived a novel lower bound on the variance of estimators of a sparse vector under a bias constraint. The observed vector was assumed to be a linearly transformed and noisy version of the sparse vector to be estimated. This setup includes the underdetermined case relevant to compressed sensing. In the special case of unbiased estimation of a noise-corrupted sparse vector, our bound improves on the best known lower bound. A comparison with the variance of two established estimators showed that there might exist estimators with the same bias but a smaller variance.

REFERENCES
[1] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inf. Theory, vol. 50, pp. 2231–2242, Oct. 2004.
[2] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, pp. 1289–1306, April 2006.
[3] Z. Ben-Haim and Y. C. Eldar, "The Cramér–Rao bound for estimating a sparse parameter vector," IEEE Trans. Signal Processing, vol. 58, pp. 3384–3389, June 2010.
[4] A. Jung, Z. Ben-Haim, F. Hlawatsch, and Y. C. Eldar, "On unbiased estimation of sparse vectors corrupted by Gaussian noise," in Proc. IEEE ICASSP-2010, (Dallas, TX), pp. 3990–3993, March 2010.
[5] N. Aronszajn, "Theory of reproducing kernels," Trans. Am. Math. Soc., vol. 68, pp. 337–404, May 1950.
[6] E. Parzen, "Statistical inference on time series by Hilbert space methods, I," Tech. Rep. 23, Appl. Math. Stat. Lab., Stanford University, Stanford, CA, Jan. 1959.
[7] D. D. Duttweiler and T. Kailath, "RKHS approach to detection and estimation problems – Part V: Parameter estimation," IEEE Trans. Inf. Theory, vol. 19, pp. 29–37, Jan. 1973.
[8] J. D. Gorman and A. O. Hero, "Lower bounds for parametric estimation with constraints," IEEE Trans. Inf. Theory, vol. 36, pp. 1285–1301, Nov. 1990.
[9] E. W. Barankin, "Locally best unbiased estimates," Ann. Math. Statist., vol. 20, no. 4, pp. 477–501, 1949.
[10] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice Hall, 1993.
[11] E. L. Lehmann and G. Casella, Theory of Point Estimation. New York: Springer, 1998.