Estimation of Block Sparsity in Compressive Sensing
Zhiyong Zhou and Jun Yu
Abstract
Explicitly using the block structure of the unknown signal can achieve better recovery performance in compressive sensing. An unknown signal with block structure can be accurately recovered from underdetermined linear measurements provided that it is sufficiently block sparse. However, in practice, the block sparsity level is typically unknown. In this paper, we consider a soft measure of block sparsity, $k_\alpha(x)=\left(\|x\|_{2,\alpha}/\|x\|_{2,1}\right)^{\frac{\alpha}{1-\alpha}}$, $\alpha\in[0,\infty]$, and propose a procedure to estimate it by using multivariate isotropic symmetric $\alpha$-stable random projections without sparsity or block sparsity assumptions. The limiting distribution of the estimator is given. Some simulations are conducted to illustrate our theoretical results.

Index Terms
Block sparsity; Multivariate isotropic symmetric $\alpha$-stable distribution; Compressive sensing; Characteristic function.

I. INTRODUCTION
Since its introduction a few years ago [4], [5], [6], [9], Compressive Sensing (CS) has attracted considerable interest (see the monographs [13], [16] for a comprehensive view). Formally, one considers the standard CS model,
$$y = Ax + \varepsilon, \qquad (1)$$
where $y\in\mathbb{R}^{m\times 1}$ is the vector of measurements, $A\in\mathbb{R}^{m\times N}$ is the measurement matrix, $x\in\mathbb{R}^N$ is the unknown signal, $\varepsilon$ is the measurement error, and $m\ll N$. The goal of CS is to recover the unknown signal $x$ by using only the underdetermined measurements $y$ and the matrix $A$. Under the assumption that the signal is sparse, that is, $x$ has only a few nonzero entries, and that the measurement matrix $A$ is properly chosen, $x$ can be recovered from $y$ by certain algorithms, such as Basis Pursuit (BP), i.e., the $\ell_1$-minimization approach, Orthogonal Matching Pursuit (OMP) [31], Compressive Sampling Matching Pursuit (CoSaMP) [25], and the Iterative Hard Thresholding algorithm [2]. Specifically, when the sparsity level of the signal $x$ is $s=\|x\|_0=\mathrm{card}\{j: x_j\neq 0\}$, if $m\geq Cs\ln(N/s)$ for some universal constant $C$ and $A$ is a subgaussian random matrix, then accurate or robust recovery can be guaranteed with high probability.

The sparsity level parameter $s$ plays a fundamental role in CS, as the number of measurements, the properties of the measurement matrix $A$, and even some recovery algorithms all involve it. However, the sparsity level of the signal is usually unknown in practice. To fill the gap between theory and practice, very recently [17], [18] proposed a numerically stable measure of sparsity, $s_\alpha(x)=\left(\frac{\|x\|_\alpha}{\|x\|_1}\right)^{\frac{\alpha}{1-\alpha}}$ with $\alpha\in[0,\infty]$, which is a ratio of norms. By random linear projections using i.i.d. univariate symmetric $\alpha$-stable random variables, the author constructed an estimating equation for $s_\alpha(x)$ with $\alpha\in(0,2]$ by adopting the characteristic function method and obtained the asymptotic normality of the estimator.

As a natural extension of sparsity with nonzero entries arbitrarily spread throughout the signal, we can consider sparse signals that exhibit additional structure, with the nonzero entries occurring in clusters. Such signals are referred to as block sparse [10], [12], [14]. The block sparse model appears in many practical scenarios, such as when dealing with multi-band signals [24], in measurements of gene expression levels [27], and in colour imaging [20]. Moreover, the block sparse model can be used to treat the problems of multiple measurement vectors (MMV) [7], [8], [14], [23] and of sampling signals that lie in a union of subspaces [3], [14], [24].

To make explicit use of the block structure and achieve better sparse recovery performance, extended versions of the sparse representation algorithms have been developed, such as the mixed $\ell_2/\ell_1$-norm recovery algorithm [12], [14], [30], the group lasso [33] or adaptive group lasso [19], iterative reweighted $\ell_2/\ell_1$ recovery algorithms [34], the block version of the OMP algorithm [12], and the extensions of the CoSaMP algorithm and of Iterative Hard Thresholding to the model-based setting [1], which includes the block sparse model as a special case. It was shown in [14] that if the measurement matrix $A$ has small block-restricted isometry constants, which generalize the conventional RIP notion, then the mixed $\ell_2/\ell_1$-norm recovery algorithm is guaranteed to recover any block sparse signal, irrespective of the locations of the nonzero blocks.
Furthermore, recovery is robust in the presence of noise and modeling errors (i.e., when the vector is not exactly block sparse). [1] showed that the block versions of CoSaMP and Iterative Hard Thresholding exhibit provable recovery guarantees and robustness properties. In addition, when the block-coherence of $A$ is small, robust recovery with the mixed $\ell_2/\ell_1$-norm method and with the block version of the OMP algorithm is guaranteed in [12].

The authors are with the Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, 901 87, Sweden (e-mail: [email protected], [email protected]).

The block sparsity level plays the same central role in the recovery of block sparse signals as the sparsity level does in the recovery of sparse signals. Namely, the required number of measurements, the properties of the measurement matrix (block RIP), and some recovery algorithms for signals with block structure all depend on the block sparsity level. However, in reality, the block sparsity level of the signal is also unknown. Obtaining an estimator for it is therefore important from both the theoretical and the practical point of view.
A. Contributions
First, as an extension of the soft sparsity measure $s_\alpha(x)=\left(\frac{\|x\|_\alpha}{\|x\|_1}\right)^{\frac{\alpha}{1-\alpha}}$ with $\alpha\in[0,\infty]$ in [17], [18], we propose a soft measure of block sparsity, namely $k_\alpha(x)=\left(\|x\|_{2,\alpha}/\|x\|_{2,1}\right)^{\frac{\alpha}{1-\alpha}}$.

Second, we obtain an estimator for the block sparsity by using multivariate isotropic symmetric $\alpha$-stable random projections. When the block size is 1, our estimation procedure reduces to the case considered in [18]. The asymptotic distributions of the estimators are obtained, similar to the results presented in [18].

Finally, a series of simulation experiments is conducted to illustrate our theoretical results.

B. Organization and Notations
The remainder of the paper is organized as follows. In Section II, we introduce the definition of block sparsity and a soft measure of block sparsity. In Section III, we present the estimation procedure for the block sparsity measure and obtain the asymptotic properties of the estimators. In Section IV, we conduct some simulations to illustrate the theoretical results. Section V is devoted to the conclusion. Finally, the proofs are postponed to the Appendix.

Throughout the paper, we denote vectors by boldface lower-case letters, e.g., $x$, and matrices by upper-case letters, e.g., $A$. Vectors are columns by default. $x^T$ is the transpose of the vector $x$. The notation $x_j$ denotes the $j$-th component of $x$. For any vector $x\in\mathbb{R}^N$, we denote the $\ell_p$-norm $\|x\|_p=(\sum_{j=1}^N |x_j|^p)^{1/p}$ for $p>0$. $I(\cdot)$ is the indicator function and $E$ is the expectation operator. $\lfloor\cdot\rfloor$ is the floor function, which returns the largest integer not exceeding its argument. $\mathrm{Re}(\cdot)$ is the real part function and $i$ is the imaginary unit. $\langle\cdot,\cdot\rangle$ is the inner product of two vectors. $\xrightarrow{p}$ indicates convergence in probability, while $\xrightarrow{d}$ indicates convergence in distribution.

II. BLOCK SPARSITY MEASURES
A. Definitions
We first introduce some basic concepts of block sparsity and propose a new soft measure of block sparsity. With $N=\sum_{j=1}^p d_j$, we define the $j$-th block $x[j]$ of a length-$N$ vector $x$ over $\mathcal{I}=\{d_1,\cdots,d_p\}$. The $j$-th block is of length $d_j$, and the blocks are formed sequentially, so that
$$x=(\underbrace{x_1\cdots x_{d_1}}_{x^T[1]}\ \underbrace{x_{d_1+1}\cdots x_{d_1+d_2}}_{x^T[2]}\ \cdots\ \underbrace{x_{N-d_p+1}\cdots x_N}_{x^T[p]})^T. \qquad (2)$$
Without loss of generality, we assume that $d_1=d_2=\cdots=d_p=d$, so that $N=pd$. A vector $x\in\mathbb{R}^N$ is called block $k$-sparse over $\mathcal{I}=\{d,\cdots,d\}$ if $x[j]$ is nonzero for at most $k$ indices $j$. In other words, by denoting the mixed $\ell_2/\ell_0$ norm
$$\|x\|_{2,0}=\sum_{j=1}^p I(\|x[j]\|_2>0),$$
a block $k$-sparse vector $x$ can be defined by $\|x\|_{2,0}\leq k$.

Despite the important theoretical role of the parameter $\|x\|_{2,0}$, it has the severe practical drawback of not being sensitive to small entries of $x$. For instance, if $x$ has $k$ large blocks and $p-k$ small blocks, then $\|x\|_{2,0}=p$ as soon as the small blocks are nonzero. To overcome this drawback, it is desirable to replace the mixed $\ell_2/\ell_0$ norm with a soft version. Specifically, we generalize the entropy-based sparsity measure to a block sparsity measure. Any non-zero signal $x$ given in (2) induces a distribution $\pi(x)\in\mathbb{R}^p$ on the set of block indices $\{1,\cdots,p\}$, assigning mass $\pi_j(x)=\|x[j]\|_2/\|x\|_{2,1}$ to index $j\in\{1,\cdots,p\}$, where $\|x\|_{2,1}=\sum_{j=1}^p \|x[j]\|_2$. The entropy-based block sparsity is then
$$k_\alpha(x)=\begin{cases}\exp(H_\alpha(\pi(x))) & \text{if } x\neq 0,\\ 0 & \text{if } x=0,\end{cases} \qquad (3)$$
where $H_\alpha$ is the Rényi entropy of order $\alpha\in[0,\infty]$ [18], [28], [32]. When $\alpha\notin\{0,1,\infty\}$, the Rényi entropy is given explicitly by $H_\alpha(\pi(x))=\frac{1}{1-\alpha}\ln\big(\sum_{j=1}^p \pi_j(x)^\alpha\big)$, and the cases $\alpha\in\{0,1,\infty\}$ are defined by evaluating limits, with $H_1$ being the ordinary Shannon entropy. Then, for $x\neq 0$ and $\alpha\notin\{0,1,\infty\}$, the measure of block sparsity can be written conveniently in terms of the mixed $\ell_2/\ell_\alpha$ norm as
$$k_\alpha(x)=\left(\frac{\|x\|_{2,\alpha}}{\|x\|_{2,1}}\right)^{\frac{\alpha}{1-\alpha}},$$
where the mixed $\ell_2/\ell_\alpha$ norm is $\|x\|_{2,\alpha}=\big(\sum_{j=1}^p \|x[j]\|_2^\alpha\big)^{1/\alpha}$ for $\alpha>0$. The cases $\alpha\in\{0,1,\infty\}$ are evaluated as limits: $k_0(x)=\lim_{\alpha\to 0}k_\alpha(x)=\|x\|_{2,0}$, $k_1(x)=\lim_{\alpha\to 1}k_\alpha(x)=\exp(H_1(\pi(x)))$, and $k_\infty(x)=\lim_{\alpha\to\infty}k_\alpha(x)=\frac{\|x\|_{2,1}}{\|x\|_{2,\infty}}$, where $\|x\|_{2,\infty}=\max_{1\leq j\leq p}\|x[j]\|_2$.

Fig. 1. Three vectors (red, green, blue) are plotted with the $\ell_2$ norms of their blocks in decreasing order; we set d = 5 and compare the values of $k_2(x)$ with $\|x\|_{2,0}$.
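To make these definitions concrete, the following short Python sketch (our own illustration, not part of the paper; the function names are hypothetical) computes the block norms and $k_\alpha(x)$, and reproduces the behaviour noted above for a signal with $k$ large blocks and $p-k$ small but nonzero blocks.

import numpy as np

def block_norms(x, d):
    # l2 norm of each length-d block of x (assumes len(x) = p*d)
    return np.linalg.norm(x.reshape(-1, d), axis=1)

def k_alpha(x, d, alpha):
    # soft block sparsity k_alpha(x) = (||x||_{2,alpha} / ||x||_{2,1})^(alpha/(1-alpha))
    b = block_norms(x, d)
    if alpha == 0:
        return np.sum(b > 0)                      # ||x||_{2,0}
    if alpha == np.inf:
        return b.sum() / b.max()                  # ||x||_{2,1} / ||x||_{2,inf}
    if alpha == 1:
        pi = b / b.sum()
        pi = pi[pi > 0]
        return np.exp(-np.sum(pi * np.log(pi)))   # exp(Shannon entropy)
    mixed_alpha = (b ** alpha).sum() ** (1.0 / alpha)
    return (mixed_alpha / b.sum()) ** (alpha / (1.0 - alpha))

p, d, k = 200, 5, 10
x = np.zeros(p * d)
x[: k * d] = 1.0      # k large blocks
x[k * d :] = 1e-4     # remaining blocks small but nonzero
print(k_alpha(x, d, 0))   # equals p: every nonzero block counts
print(k_alpha(x, d, 2))   # approximately k: small blocks barely contribute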
When the block size $d$ equals 1, our block sparsity measure $k_\alpha(x)$ reduces to the non-block sparsity measure $s_\alpha(x)=\left(\frac{\|x\|_\alpha}{\|x\|_1}\right)^{\frac{\alpha}{1-\alpha}}$ given in [18].

The fact that $k_\alpha(x)$ (here $\alpha=2$) is a sensible measure of block sparsity for non-idealized signals is illustrated in Figure 1. In the case that $x$ has $k$ large blocks and $p-k$ small blocks, we have $\|x\|_{2,0}=p$, whereas $k_2(x)\approx k$.

In addition, the quantity $k_\alpha(x)$ has some important properties, similar to those of $s_\alpha(x)$:
• Continuity: Unlike the mixed $\ell_2/\ell_0$ norm, the function $k_\alpha(\cdot)$ is continuous on $\mathbb{R}^N\setminus\{0\}$ for all $\alpha>0$. Thus, it is stable with respect to small perturbations of the signal.
• Range equal to $[0,p]$: For all $x\in\mathbb{R}^N$ given as in (2) and all $\alpha\in[0,\infty]$, we have $0\leq k_\alpha(x)\leq p$.
• Scale-invariance: For all $c\neq 0$, it holds that $k_\alpha(cx)=k_\alpha(x)$. Scale-invariance encodes the idea that block sparsity, like sparsity, should be based on relative (rather than absolute) magnitudes of the entries of the signal.
• Non-increasing in $\alpha$: For any $\alpha'\geq\alpha\geq 0$, we have $k_\infty(x)=\frac{\|x\|_{2,1}}{\|x\|_{2,\infty}}\leq k_{\alpha'}(x)\leq k_\alpha(x)\leq k_0(x)=\|x\|_{2,0}$.

B. Recovery results in terms of $k_2(x)$

Before presenting the estimation procedure for $\|x\|_{2,\alpha}^\alpha$ and $k_\alpha(x)$ with $\alpha\in(0,2]$, we give block sparse signal recovery results in terms of $k_2(x)$ for the mixed $\ell_2/\ell_1$-norm optimization algorithm. To recover a block sparse signal in the CS model (1), we use the following mixed $\ell_2/\ell_1$-norm optimization algorithm proposed in [12], [14]:
$$\hat{x}=\arg\min_{e\in\mathbb{R}^N}\|e\|_{2,1},\quad\text{subject to } \|y-Ae\|_2\leq\delta, \qquad (4)$$
where $\delta\geq 0$ is an upper bound on the noise level $\|\varepsilon\|_2$. Then, we have the following result concerning robust recovery of block sparse signals.

Lemma 1 ([14]). Let $y=Ax+\varepsilon$ be noisy measurements of a vector $x$ and fix a number $k\in\{1,\cdots,p\}$. Let $x_k$ denote the best block $k$-sparse approximation of $x$, i.e., $x_k$ is block $k$-sparse and minimizes $\|x-f\|_{2,1}$ over all block $k$-sparse vectors $f$, and let $\hat{x}$ be a solution to (4), with a random Gaussian matrix $A$ of size $m\times N$ with entries $A_{ij}\sim N(0,\frac{1}{m})$, and block sparse signals over $\mathcal{I}=\{d_1=d,\cdots,d_p=d\}$, where $N=pd$ for some integer $p$. Then, there are constants $c_1,c_2,c_3,c_4>0$ such that the following statement is true. If $m\geq c_1 k\ln(eN/(kd))$, then with probability at least $1-2\exp(-c_2 m)$, we have
$$\frac{\|\hat{x}-x\|_2}{\|x\|_2}\leq c_3\frac{\|x-x_k\|_{2,1}}{\sqrt{k}\,\|x\|_2}+c_4\frac{\delta}{\|x\|_2}. \qquad (5)$$
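For completeness, here is a minimal sketch of the mixed $\ell_2/\ell_1$ program (4) using the cvxpy modeling library; this is our own illustration (not code from the paper), and the problem sizes, tolerance and helper name are arbitrary choices.

import cvxpy as cp
import numpy as np

def block_l2_l1_recover(A, y, d, delta):
    # Mixed l2/l1-norm recovery (4): minimize the sum of block l2 norms
    # subject to ||y - A e||_2 <= delta.
    m, N = A.shape
    p = N // d
    e = cp.Variable(N)
    mixed_norm = cp.sum(cp.hstack([cp.norm(e[j * d:(j + 1) * d], 2) for j in range(p)]))
    prob = cp.Problem(cp.Minimize(mixed_norm), [cp.norm(y - A @ e, 2) <= delta])
    prob.solve()
    return e.value

# small synthetic check: 2 active blocks out of 40, Gaussian A with N(0, 1/m) entries
rng = np.random.default_rng(0)
p, d, m = 40, 5, 60
x = np.zeros(p * d); x[:2 * d] = rng.standard_normal(2 * d)
A = rng.standard_normal((m, p * d)) / np.sqrt(m)
y = A @ x
x_hat = block_l2_l1_recover(A, y, d, delta=1e-6)
print(np.linalg.norm(x_hat - x) / np.linalg.norm(x))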
Remark 1. Note that the first term in (5) reflects the fact that $x$ is not exactly block $k$-sparse, while the second term quantifies the recovery error due to the measurement noise. When the block size $d = 1$, this lemma reduces to the conventional CS result for sparse signals. Explicit use of block sparsity reduces the required number of measurements from $O(kd\ln(eN/(kd)))$ to $O(k\ln(eN/(kd)))$, i.e., by a factor of $d$.

The limitation of the previous bound is that the ratio term $\frac{\|x-x_k\|_{2,1}}{\sqrt{k}\|x\|_2}$ is typically unknown. Thus, it is not clear how large $m$ should be chosen to guarantee that the relative $\ell_2$-error $\frac{\|\hat{x}-x\|_2}{\|x\|_2}$ is small with high probability. Next, we present an upper bound on the relative $\ell_2$-error as an explicit function of $m$ and the newly proposed block sparsity measure $k_2(x)$, which is estimable. The following result is an extension of Proposition 1 in [18]. Its proof is left to the Appendix.

Lemma 2.
Let $y=Ax+\varepsilon$ be noisy measurements of a vector $x$, and let $\hat{x}$ be a solution to (4), with a random Gaussian matrix $A$ of size $m\times N$ with entries $A_{ij}\sim N(0,\frac{1}{m})$, and block sparse signals over $\mathcal{I}=\{d_1=d,\cdots,d_p=d\}$, where $N=pd$ for some integer $p$. Then, there are constants $\kappa_1,\kappa_2,\kappa_3,\kappa_4>0$ such that the following statement is true. If $m$ and $N$ satisfy $\kappa_1\ln(\kappa_1 eN/m)\leq m\leq N$, then with probability at least $1-2\exp(-\kappa_2 m)$, we have
$$\frac{\|\hat{x}-x\|_2}{\|x\|_2}\leq\kappa_3\sqrt{\frac{k_2(x)\,d\ln(eN/m)}{m}}+\kappa_4\frac{\delta}{\|x\|_2}. \qquad (6)$$

III. ESTIMATION METHOD FOR $\|x\|_{2,\alpha}^\alpha$ AND $k_\alpha(x)$

In this section, we mainly focus on the estimation of $k_\alpha(x)$ with $\alpha\in(0,2]$. There are two reasons for considering this interval. One is that a small $\alpha$ is usually a better block sparsity measure than a very large $\alpha$ in applications, and $\|x\|_{2,0}$ can be approximated with very small $\alpha$, as will be shown later. The other is that our estimation method relies on the $\alpha$-stable distribution, which requires $\alpha$ to lie in $(0,2]$. The core idea for obtaining the estimators of $\|x\|_{2,\alpha}^\alpha$ and $k_\alpha(x)$ with $\alpha\in(0,2]$ is to use random projections. In contrast to the conventional sparsity estimation with projections onto univariate symmetric $\alpha$-stable random variables [18], [35], we use projections onto multivariate centered isotropic symmetric $\alpha$-stable random vectors [26], [29] for block sparsity estimation.

A. Multivariate Isotropic Stable Distribution
We first give the definition of the multivariate centered isotropic symmetric $\alpha$-stable distribution.
Definition 1. For $d\geq 1$, a $d$-dimensional random vector $v$ has a centered isotropic symmetric $\alpha$-stable distribution if there are constants $\gamma>0$ and $\alpha\in(0,2]$ such that its characteristic function has the form
$$E[\exp(i u^T v)]=\exp(-\gamma^\alpha\|u\|_2^\alpha),\quad\text{for all } u\in\mathbb{R}^d. \qquad (7)$$
We denote the distribution by $v\sim S(d,\alpha,\gamma)$, and $\gamma$ is referred to as the scale parameter.

Remark 2.
Fig. 2. Perspective and contour plots of the bivariate centered isotropic symmetric stable densities. The top panels are for the bivariate independent Gaussian distribution, while the bottom panels are for the bivariate spherically symmetric Cauchy distribution.

The most well-known example of a multivariate isotropic symmetric stable distribution is the case $\alpha = 2$ (the multivariate independent Gaussian distribution), in which the components of the random vector are independent. Another case is $\alpha = 1$ (the multivariate spherically symmetric Cauchy distribution [29]); unlike the
multivariate independent Gaussian case, the components of the multivariate spherically symmetric Cauchy distribution are uncorrelated but dependent. The perspective and contour plots of the densities in these two cases are shown in Figure 2. The multivariate centered isotropic symmetric $\alpha$-stable random vector is a direct extension of the univariate symmetric $\alpha$-stable random variable, which is the special case with dimension parameter $d = 1$. In applications, to simulate a $d$-dimensional random vector $v$ from the multivariate centered isotropic symmetric $\alpha$-stable distribution $S(d,\alpha,\gamma)$, we can use the fact that $v=D^{1/2}q$, where $D\sim\tilde{S}(1,\alpha/2,\gamma^2[\cos(\pi\alpha/4)]^{2/\alpha})$ is an independent univariate positive $(\alpha/2)$-stable random variable and $q\sim N(0,I_d)$ is a standard $d$-dimensional Gaussian random vector; see [26] for more details.

B. Estimation Procedure
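As an illustration of this construction (our own sketch, not code from the paper), the following Python snippet draws isotropic stable projection vectors for the two cases with closed-form mixing variables, $\alpha=2$ (Gaussian) and $\alpha=1$ (spherical Cauchy), which are the cases used in the simulations of Section IV; the function name is hypothetical, and a general $\alpha$ would require drawing the positive $(\alpha/2)$-stable variable $D$ described above.

import numpy as np
from scipy.stats import levy

def isotropic_stable_rvs(alpha, gamma, d, size, rng):
    # Draw `size` i.i.d. d-dimensional isotropic symmetric alpha-stable vectors
    # S(d, alpha, gamma) via v = sqrt(D) * q with q ~ N(0, I_d).
    q = rng.standard_normal((size, d))
    if alpha == 2:
        D = np.full(size, 2.0 * gamma**2)        # constant mixing: char. fn. exp(-gamma^2 ||u||^2)
    elif alpha == 1:
        D = levy.rvs(scale=gamma**2, size=size, random_state=rng)  # positive 1/2-stable (Levy) mixing
    else:
        raise NotImplementedError("general alpha needs a positive (alpha/2)-stable draw")
    return np.sqrt(D)[:, None] * q

rng = np.random.default_rng(1)
v = isotropic_stable_rvs(1, 1.0, 5, 100000, rng)
# empirical check of E[exp(i u^T v)] = exp(-gamma ||u||_2) at one test point u
u = np.array([0.3, -0.2, 0.1, 0.0, 0.4])
print(np.mean(np.cos(v @ u)), np.exp(-np.linalg.norm(u)))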
By random projections onto i.i.d. multivariate centered isotropic symmetric $\alpha$-stable random vectors, we can obtain estimators for $\|x\|_{2,\alpha}^\alpha$ and $k_\alpha(x)$ with $\alpha\in(0,2]$, as follows. We estimate $\|x\|_{2,\alpha}^\alpha$ from the random linear projection measurements
$$y_i=\langle a_i,x\rangle+\sigma\varepsilon_i,\qquad i=1,2,\cdots,n, \qquad (8)$$
where the $a_i\in\mathbb{R}^N$ are i.i.d. random vectors, and $a_i=(a_{i1}^T,\cdots,a_{ip}^T)^T$ with $a_{ij}$, $j\in\{1,\cdots,p\}$, i.i.d. drawn from $S(d,\alpha,\gamma)$. The noise terms $\varepsilon_i$ are i.i.d. from a distribution $F$ whose characteristic function we denote by $\varphi$, and the sets $\{\varepsilon_1,\cdots,\varepsilon_n\}$ and $\{a_1,\cdots,a_n\}$ are independent. The $\varepsilon_i$, $i=1,2,\cdots,n$, are assumed to be symmetric about 0, with $0<E|\varepsilon_1|<\infty$, but they may have infinite variance. The assumption of symmetry is only for convenience; it is explained in [18] how to drop it. A minor technical condition we place on $F$ is that the roots of its characteristic function $\varphi$ are isolated (i.e., have no limit points). This condition is satisfied by many families of distributions, such as the Gaussian, Student's $t$, Laplace, uniform on $[a,b]$, and stable laws. For simplicity, the noise scale parameter $\sigma\geq 0$ and the distribution $F$ are treated as known. Since our work involves different choices of $\alpha$, we write $\gamma_\alpha$ instead of $\gamma$. The link with the norm $\|x\|_{2,\alpha}^\alpha$ then hinges on the following basic lemma.

Lemma 3.
Let $x=(x[1]^T,\cdots,x[p]^T)^T\in\mathbb{R}^N$ be fixed, and suppose $a=(a_1^T,\cdots,a_p^T)^T$ with $a_j$, $j\in\{1,\cdots,p\}$, i.i.d. drawn from $S(d,\alpha,\gamma_\alpha)$, where $\alpha\in(0,2]$ and $\gamma_\alpha>0$. Then the random variable $\langle a,x\rangle$ has the distribution $S(1,\alpha,\gamma_\alpha\|x\|_{2,\alpha})$.

Remark 3.
When $x=(x[1]^T,\cdots,x[p]^T)^T\in\mathbb{R}^N$ has blocks of different lengths $\{d_1,d_2,\cdots,d_p\}$, we need to choose the projection random vector $a=(a_1^T,\cdots,a_p^T)^T$ with $a_j$, $j\in\{1,\cdots,p\}$, independently drawn from $S(d_j,\alpha,\gamma_\alpha)$. In that case, the conclusion of the lemma and all the results that follow still hold without any modification. This lemma extends Lemma 1 in [18] from i.i.d. univariate symmetric $\alpha$-stable projections to i.i.d. multivariate isotropic symmetric $\alpha$-stable projections.

By this result, if we generate a set of i.i.d. measurement random vectors $\{a_1,\cdots,a_n\}$ as above and let $\tilde y_i=\langle a_i,x\rangle$, then $\{\tilde y_1,\cdots,\tilde y_n\}$ is an i.i.d. sample from the distribution $S(1,\alpha,\gamma_\alpha\|x\|_{2,\alpha})$. Hence, in the special case of random linear measurements without noise, estimating the norm $\|x\|_{2,\alpha}^\alpha$ reduces to estimating the scale parameter of a univariate stable distribution from an i.i.d. sample.

Next, we present the estimation procedure based on the characteristic function method [18], [21], [22]. We use two separate sets of measurements to estimate $\widehat{\|x\|_{2,1}}$ and $\widehat{\|x\|_{2,\alpha}^\alpha}$, with respective sample sizes $n_1$ and $n_\alpha$. To unify the discussion, we describe only the procedure for obtaining $\widehat{\|x\|_{2,\alpha}^\alpha}$ for any $\alpha\in(0,2]$, since $\alpha=1$ is a special case. The two estimators are combined into the estimator of $k_\alpha(x)$ as follows:
$$\hat k_\alpha(x)=\Big(\widehat{\|x\|_{2,\alpha}^\alpha}\Big)^{\frac{1}{1-\alpha}}\Big(\widehat{\|x\|_{2,1}}\Big)^{\frac{\alpha}{\alpha-1}}. \qquad (9)$$
In fact, the characteristic function of $y_i$ has the form
$$\Psi(t)=E[\exp(ity_i)]=\exp(-\gamma_\alpha^\alpha\|x\|_{2,\alpha}^\alpha|t|^\alpha)\cdot\varphi(\sigma t), \qquad (10)$$
where $t\in\mathbb{R}$. Then we have
$$\|x\|_{2,\alpha}^\alpha=-\frac{1}{\gamma_\alpha^\alpha|t|^\alpha}\log\Big|\mathrm{Re}\Big(\frac{\Psi(t)}{\varphi(\sigma t)}\Big)\Big|.$$
Using the empirical characteristic function
$$\hat\Psi_{n_\alpha}(t)=\frac{1}{n_\alpha}\sum_{i=1}^{n_\alpha}e^{ity_i}$$
to estimate $\Psi(t)$, we obtain the estimator of $\|x\|_{2,\alpha}^\alpha$ given by
$$\widehat{\|x\|_{2,\alpha}^\alpha}=:\hat v_\alpha(t)=-\frac{1}{\gamma_\alpha^\alpha|t|^\alpha}\log\Bigg|\mathrm{Re}\Bigg(\frac{\hat\Psi_{n_\alpha}(t)}{\varphi(\sigma t)}\Bigg)\Bigg|, \qquad (11)$$
when $t\neq 0$ and $\varphi(\sigma t)\neq 0$.
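A minimal Python sketch of the estimators (9) and (11) (our own illustration; the names are hypothetical, and the Gaussian noise characteristic function below is only one example of $\varphi$):

import numpy as np

def v_hat(y, t, gamma, alpha, phi_sigma):
    # Empirical characteristic function estimator (11) of ||x||_{2,alpha}^alpha:
    # v_hat = -1/(gamma^alpha |t|^alpha) * log| Re( Psi_hat(t) / phi(sigma t) ) |,
    # where phi_sigma(t) returns the characteristic function of the scaled noise sigma*eps at t.
    psi_hat = np.mean(np.exp(1j * t * y))
    return -np.log(np.abs(np.real(psi_hat / phi_sigma(t)))) / (gamma**alpha * np.abs(t)**alpha)

def k_hat(v_alpha, v_one, alpha):
    # Combine the two norm estimators as in (9), for alpha != 1:
    # k_hat = (v_alpha)^(1/(1-alpha)) * (v_one)^(alpha/(alpha-1))
    return v_alpha ** (1.0 / (1.0 - alpha)) * v_one ** (alpha / (alpha - 1.0))

# example noise model: standard Gaussian eps with scale sigma, so phi(sigma t) = exp(-(sigma t)^2 / 2)
sigma = 0.1
phi_sigma = lambda t: np.exp(-0.5 * (sigma * t) ** 2)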
C. Asymptotic Properties

Similar to Theorem 2 in [18], we have the following uniform central limit theorem (CLT) [11] for $\hat v_\alpha(t)$. Before presenting the result, we introduce the noise-to-signal ratio $\rho_\alpha=\frac{\sigma}{\gamma_\alpha\|x\|_{2,\alpha}}$.

Theorem 1 (Uniform CLT for the Mixed $\ell_2/\ell_\alpha$ Norm Estimator). Let $\alpha\in(0,2]$, and let $\hat t$ be any function of $\{y_1,\cdots,y_{n_\alpha}\}$ that satisfies
$$\gamma_\alpha\hat t\,\|x\|_{2,\alpha}\xrightarrow{p}c_\alpha \qquad (12)$$
as $(n_\alpha,N)\to\infty$, for some finite constant $c_\alpha\neq 0$ with $\varphi(\rho_\alpha c_\alpha)\neq 0$. Then, we have
$$\sqrt{n_\alpha}\Bigg(\frac{\hat v_\alpha(\hat t)}{\|x\|_{2,\alpha}^\alpha}-1\Bigg)\xrightarrow{d}N\big(0,\theta_\alpha(c_\alpha,\rho_\alpha)\big) \qquad (13)$$
as $(n_\alpha,N)\to\infty$, where the limiting variance $\theta_\alpha(c_\alpha,\rho_\alpha)$ is strictly positive and defined by
$$\theta_\alpha(c_\alpha,\rho_\alpha)=\frac{1}{|c_\alpha|^{2\alpha}}\Bigg(\frac{\exp(2|c_\alpha|^\alpha)}{2\varphi(\rho_\alpha|c_\alpha|)^2}+\frac{\varphi(2\rho_\alpha|c_\alpha|)\exp\big((2-2^\alpha)|c_\alpha|^\alpha\big)}{2\varphi(\rho_\alpha|c_\alpha|)^2}-1\Bigg). \qquad (14)$$

For simplicity, we use the pilot value $\hat t_{\mathrm{pilot}}$ instead of the optimal $\hat t_{\mathrm{opt}}$ of [18], since it is simple to implement and still gives a reasonably good estimator. To describe the pilot value, let $\eta_0>0$ be any number such that $\varphi(\eta)>0$ for all $\eta\in[0,\eta_0]$ (such an $\eta_0$ exists for any characteristic function). Also, define the median absolute deviation statistic $\hat m_\alpha=\mathrm{median}\{|y_1|,\cdots,|y_{n_\alpha}|\}$ and set $\hat t_{\mathrm{pilot}}=\min\{1/\hat m_\alpha,\ \eta_0/\sigma\}$. Then $\hat c_\alpha=\gamma_\alpha\hat t_{\mathrm{pilot}}[\hat v_\alpha(\hat t_{\mathrm{pilot}})]^{1/\alpha}$ is a consistent estimator of the constant $c_\alpha=\min\big(1/\mathrm{median}(|S_\alpha+\rho_\alpha\varepsilon_1|),\ \eta_0/\rho_\alpha\big)$, where $S_\alpha\sim S(1,\alpha,1)$ (see Proposition 3 in [18]), and $\hat\rho_\alpha=\frac{\sigma}{\gamma_\alpha[\hat v_\alpha(\hat t_{\mathrm{pilot}})]^{1/\alpha}}$ is a consistent estimator of $\rho_\alpha$. Therefore, $\theta_\alpha(\hat c_\alpha,\hat\rho_\alpha)$ is a consistent estimator of the limiting variance $\theta_\alpha(c_\alpha,\rho_\alpha)$, and we immediately have the following corollary, giving confidence intervals for $\|x\|_{2,\alpha}^\alpha$.

Corollary 1 (Confidence Interval for $\|x\|_{2,\alpha}^\alpha$). Under the conditions of Theorem 1, as $(n_\alpha,N)\to\infty$, we have
$$\sqrt{\frac{n_\alpha}{\theta_\alpha(\hat c_\alpha,\hat\rho_\alpha)}}\Bigg(\frac{\hat v_\alpha(\hat t_{\mathrm{pilot}})}{\|x\|_{2,\alpha}^\alpha}-1\Bigg)\xrightarrow{d}N(0,1). \qquad (15)$$
It follows that an asymptotic $1-\beta$ confidence interval for $\|x\|_{2,\alpha}^\alpha$ is
$$\Bigg[\Big(1-\sqrt{\tfrac{\theta_\alpha(\hat c_\alpha,\hat\rho_\alpha)}{n_\alpha}}\,z_{1-\beta/2}\Big)\hat v_\alpha(\hat t_{\mathrm{pilot}}),\ \Big(1+\sqrt{\tfrac{\theta_\alpha(\hat c_\alpha,\hat\rho_\alpha)}{n_\alpha}}\,z_{1-\beta/2}\Big)\hat v_\alpha(\hat t_{\mathrm{pilot}})\Bigg], \qquad (16)$$
where $z_{1-\beta/2}$ is the $(1-\beta/2)$-quantile of the standard normal distribution.

As a consequence, we can obtain a CLT and a confidence interval for $\hat k_\alpha(x)$ by combining the estimators $\hat v_\alpha$ and $\hat v_1$, each with its own $\hat t_{\mathrm{pilot}}$. Before presenting the main result, for each $\alpha\in(0,2]\setminus\{1\}$ we assume that there is a constant $\bar\pi_\alpha\in(0,1)$ such that, as $(n_1,n_\alpha,N)\to\infty$,
$$\pi_\alpha:=\frac{n_\alpha}{n_1+n_\alpha}=\bar\pi_\alpha+o(n_\alpha^{-1/2}).$$

Theorem 2 (Asymptotic Property of $\hat k_\alpha(x)$). Let $\alpha\in(0,2]\setminus\{1\}$ and let the conditions of Theorem 1 hold. Then, as $(n_1,n_\alpha,N)\to\infty$,
$$\sqrt{\frac{n_1+n_\alpha}{\hat w_\alpha}}\Bigg(\frac{\hat k_\alpha(x)}{k_\alpha(x)}-1\Bigg)\xrightarrow{d}N(0,1), \qquad (17)$$
where
$$\hat w_\alpha=\frac{\theta_\alpha(\hat c_\alpha,\hat\rho_\alpha)}{\pi_\alpha(1-\alpha)^2}+\frac{\theta_1(\hat c_1,\hat\rho_1)}{1-\pi_\alpha}\Big(\frac{\alpha}{1-\alpha}\Big)^2.$$
Consequently, an asymptotic $1-\beta$ confidence interval for $k_\alpha(x)$ is
$$\Bigg[\Big(1-\sqrt{\tfrac{\hat w_\alpha}{n_1+n_\alpha}}\,z_{1-\beta/2}\Big)\hat k_\alpha(x),\ \Big(1+\sqrt{\tfrac{\hat w_\alpha}{n_1+n_\alpha}}\,z_{1-\beta/2}\Big)\hat k_\alpha(x)\Bigg], \qquad (18)$$
where $z_{1-\beta/2}$ is the $(1-\beta/2)$-quantile of the standard normal distribution.
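To make the pilot rule and the interval (16) concrete, here is a short sketch (our own, with hypothetical names) that assumes the variance formula as reconstructed in (14); phi denotes the noise characteristic function $\varphi$.

import numpy as np

def t_pilot(y, sigma, eta0=1.0):
    # pilot evaluation point: min(1 / median|y_i|, eta0 / sigma)
    m = np.median(np.abs(y))
    return min(1.0 / m, eta0 / sigma)

def theta(c, rho, alpha, phi):
    # limiting variance (14) of the normalized estimator v_hat / ||x||_{2,alpha}^alpha
    ca = np.abs(c) ** alpha
    term1 = np.exp(2 * ca) / (2 * phi(rho * np.abs(c)) ** 2)
    term2 = phi(2 * rho * np.abs(c)) * np.exp((2 - 2 ** alpha) * ca) / (2 * phi(rho * np.abs(c)) ** 2)
    return (term1 + term2 - 1) / np.abs(c) ** (2 * alpha)

def confidence_interval(v, theta_hat, n, beta=0.05):
    # asymptotic (1 - beta) interval (16) around the point estimate v
    from scipy.stats import norm
    half = np.sqrt(theta_hat / n) * norm.ppf(1 - beta / 2)
    return (1 - half) * v, (1 + half) * v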
D. Estimating $\|x\|_{2,0}$ with $\hat k_\alpha(x)$ and Small $\alpha$

Next, we quantify how well $k_\alpha(x)$ approximates $\|x\|_{2,0}$ when $\alpha$ is close to 0. To state the result, we define the block dynamic range of a non-zero signal $x\in\mathbb{R}^N$ given in (2) as
$$\mathrm{BDNR}(x)=\frac{\|x\|_{2,\infty}}{|x|_{2,\min}}, \qquad (19)$$
where $|x|_{2,\min}$ is the smallest $\ell_2$ norm among the non-zero blocks of $x$, i.e., $|x|_{2,\min}=\min\{\|x[j]\|_2:x[j]\neq 0,\ j=1,\cdots,p\}$. When the block size $d = 1$, $\mathrm{BDNR}(x)$ reduces to the $\mathrm{DNR}(x)$ defined in [18]. The following result involves no randomness and applies to any estimator $\tilde k_\alpha(x)$.

Theorem 3.
Let $\alpha\in(0,1)$, let $x\in\mathbb{R}^N$ be a non-zero signal given as in (2), and let $\tilde k_\alpha(x)$ be any real number. Then, we have
$$\Bigg|\frac{\tilde k_\alpha(x)}{\|x\|_{2,0}}-1\Bigg|\le\Bigg|\frac{\tilde k_\alpha(x)}{k_\alpha(x)}-1\Bigg|+\frac{\alpha}{1-\alpha}\Big(\ln(\mathrm{BDNR}(x))+\alpha\ln(\|x\|_{2,0})\Big). \qquad (20)$$

Remark 4.
This theorem is a direct extension of Proposition 5 in [18], which corresponds to the special case of block size $d = 1$. When $\tilde k_\alpha(x)$ is chosen to be the proposed estimator $\hat k_\alpha(x)$, the first term in (20) is already controlled by Theorem 2. As pointed out in [18], the second term is an approximation error that improves for smaller choices of $\alpha$. When the $\ell_2$ norms of the signal blocks are similar, the quantity $\ln(\mathrm{BDNR}(x))$ is not too large; in this case the bound behaves well and estimating $\|x\|_{2,0}$ is of interest. On the other hand, if the $\ell_2$ norms of the signal blocks are very different, that is, if $\ln(\mathrm{BDNR}(x))$ is large, then $\|x\|_{2,0}$ may not be the best measure of block sparsity to estimate.
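As a quick numerical illustration of the second term in (20) (our own example, not from the paper): for a signal with $\mathrm{BDNR}(x)=\|x\|_{2,0}=10$, the approximation error term equals
$$\frac{\alpha}{1-\alpha}\Big(\ln(\mathrm{BDNR}(x))+\alpha\ln(\|x\|_{2,0})\Big)=\frac{\alpha(1+\alpha)}{1-\alpha}\ln 10\approx\begin{cases}0.28, & \alpha=0.1,\\ 0.024, & \alpha=0.01,\end{cases}$$
so the approximation error indeed shrinks rapidly as $\alpha\to 0$.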
IV. SIMULATION
In this section, we conduct some simulations to illustrate our theoretical results. We focus on $\alpha = 2$; that is, we use $\hat k_2(x)$ to estimate the block sparsity measure $k_2(x)$. Estimating $k_2(x)$ requires a set of $n_1$ measurements obtained with multivariate isotropic symmetric Cauchy projections and a set of $n_2$ measurements obtained with multivariate isotropic symmetric normal projections. We generate the samples $y_1\in\mathbb{R}^{n_1}$ and $y_2\in\mathbb{R}^{n_2}$ according to
$$y_1=A_1x+\sigma\varepsilon_1 \quad\text{and}\quad y_2=A_2x+\sigma\varepsilon_2, \qquad (21)$$
where $A_1=(a_1,\cdots,a_{n_1})^T\in\mathbb{R}^{n_1\times N}$, with $a_i\in\mathbb{R}^N$ i.i.d. random vectors and $a_i=(a_{i1}^T,\cdots,a_{ip}^T)^T$ with $a_{ij}$, $j\in\{1,\cdots,p\}$, i.i.d. drawn from $S(d,1,\gamma_1)$; we let $\gamma_1=1$. Similarly, $A_2=(b_1,\cdots,b_{n_2})^T\in\mathbb{R}^{n_2\times N}$, with $b_i\in\mathbb{R}^N$ i.i.d. random vectors and $b_i=(b_{i1}^T,\cdots,b_{ip}^T)^T$ with $b_{ij}$, $j\in\{1,\cdots,p\}$, i.i.d. drawn from $S(d,2,\gamma_2)$; we let $\gamma_2=\sqrt{2}$. The noise terms $\varepsilon_1$ and $\varepsilon_2$ are generated with i.i.d. entries from a standard normal distribution. We considered a sequence of pairs of sample sizes $(n_1,n_2)=(50,50),(100,100),(200,200),\cdots,(500,500)$. Each experiment is replicated 200 times, so we have 200 realizations of $\hat k_2(x)$ for each $(n_1,n_2)$. We then average the quantity $\big|\frac{\hat k_2(x)}{k_2(x)}-1\big|$ as an approximation of $E\big|\frac{\hat k_2(x)}{k_2(x)}-1\big|$.

A. Exactly Block Sparse Case
First, we let our signal $x$ be a very simple, exactly block sparse vector, $x=\big(\tfrac{1}{\sqrt{s}}\mathbf{1}_s^T,\ \mathbf{0}_{N-s}^T\big)^T$ with a fixed support size $s$, where $\mathbf{1}_q$ is a vector of length $q$ with all entries equal to one and $\mathbf{0}_q$ is the zero vector of length $q$. Then it is obvious that $\|x\|_{2,2}=\|x\|_2=1$, while $\|x\|_{2,0}$ and $k_2(x)$ depend on the block size $d$ that we choose.

a) We set $\eta_0 = 1$. The simulation is conducted under several choices of the parameters $N$, $d$ and $\sigma$, with each parameter corresponding to a separate plot in Figure 3. The signal dimension $N$ is set to 1000, except in the top left plot, where several values of $N$ are compared. We set $d = 5$ in all cases, except in the top right plot, where several block sizes $d$ are compared; the corresponding true value $k_2(x)$ equals $\|x\|_{2,0}$, the exact block sparsity level of $x$ for the given block size $d$. In turn, $\sigma$ is fixed at a small value in all cases, except in the bottom plot, where several noise levels are compared. In all three plots, the theoretical curves are computed in the following way. From Theorem 2, we have $\big|\frac{\hat k_2(x)}{k_2(x)}-1\big|\approx\frac{\sqrt{\omega_2}}{\sqrt{n_1+n_2}}|Z|$, where $Z$ is a standard Gaussian random variable and $\omega_2=\frac{\theta_2(c_2,\rho_2)}{\pi_2}+\frac{4\,\theta_1(c_1,\rho_1)}{1-\pi_2}$. Since $E|Z|=\sqrt{2/\pi}$, the theoretical curves are simply $\sqrt{2\omega_2/\pi}/\sqrt{n_1+n_2}$, as a function of $n_1+n_2$. Note that $\omega_2$ depends on $\sigma$ and $d$ but not on $N$, which is why there is only one theoretical curve in the top left plot for the error dependence on $N$.

From Figure 3, we can see that the black theoretical curves agree well with the colored empirical ones. In addition, the averaged relative error has no observable dependence on $N$ or $d$ (when $\sigma$ is fixed), as expected from Theorem 2, and the dependence on $\sigma$ is mild.

b) Next, a simulation study is conducted to illustrate the asymptotic normality of our estimators in Corollary 1 and Theorem 2. We use 1000 replications for these experiments; that is, we have 1000 samples of the standardized statistics
$$\mathrm{res1}=\sqrt{\tfrac{n_1}{\theta_1(\hat c_1,\hat\rho_1)}}\Big(\tfrac{\hat v_1(\hat t_{\mathrm{pilot}})}{\|x\|_{2,1}}-1\Big),\quad \mathrm{res2}=\sqrt{\tfrac{n_2}{\theta_2(\hat c_2,\hat\rho_2)}}\Big(\tfrac{\hat v_2(\hat t_{\mathrm{pilot}})}{\|x\|_{2,2}^2}-1\Big),\quad \mathrm{res}=\sqrt{\tfrac{n_1+n_2}{\hat w_2}}\Big(\tfrac{\hat k_2(x)}{k_2(x)}-1\Big).$$
We consider four cases, with $(n_1,n_2)=(500,500)$ or $(1000,1000)$ and with the noise either standard normal or $t(2)$, which has infinite variance. In all cases, we set $N = 1000$ and $d = 5$, and $\sigma$ is kept at the same small value as above.

Figure 4 shows that the density curves of the standardized statistics are all very close to the standard normal density curve, which verifies our theoretical results. These results hold even when the noise distribution is heavy-tailed. Comparing the four plots, we see that increasing the sample size $n_1+n_2$ and reducing the noise variance improve the normal approximation.

B. Nearly Block Sparse Case
Second, we consider a signal $x\in\mathbb{R}^N$ that is not exactly but only nearly block sparse: the entries of the $j$-th block are all equal, with a common value proportional to a negative power of $j$ (scaled by $1/\sqrt{d}$), and the constant of proportionality is chosen so that $\|x\|_{2,2}=\|x\|_2=1$. In this case, the $\ell_2$ norm of the blocks decays polynomially in $j$ for $j\in\{1,\cdots,p\}$. With the same settings as in the previous subsection, we obtain simulation results similar to the exactly block sparse case; see Figure 5 and Figure 6.

Fig. 3. The averaged relative error $|\hat k_2(x)/k_2(x)-1|$ depending on $N$, $d$ and $\sigma$ for the exactly block sparse case.

C. Estimating $\|x\|_{2,0}$ with $\hat k_\alpha(x)$ and Small $\alpha$

Third, we consider the estimation of the mixed $\ell_2/\ell_0$ norm $\|x\|_{2,0}$ by using $\hat k_\alpha(x)$ with a small value of $\alpha$. We consider signals $x\in\mathbb{R}^N$ of the form
$$c'\Big(\underbrace{\tfrac{1}{\sqrt d}\cdots\tfrac{1}{\sqrt d}}_{d}\ \underbrace{\tfrac{1/2}{\sqrt d}\cdots\tfrac{1/2}{\sqrt d}}_{d}\ \cdots\ \underbrace{\tfrac{1/\|x\|_{2,0}}{\sqrt d}\cdots\tfrac{1/\|x\|_{2,0}}{\sqrt d}}_{d}\ 0\cdots 0\Big)^T,$$
with $c'$ chosen so that $\|x\|_{2,2}=\|x\|_2=1$. In this experiment, we set $N = 1000$ and $d = 5$. To obtain $\hat k_\alpha(x)$, we generate the samples $y_1=A_1x+\sigma\varepsilon_1$ and $y_\alpha=A_\alpha x+\sigma\varepsilon_\alpha$, where $A_1=(a_1,\cdots,a_{n_1})^T\in\mathbb{R}^{n_1\times N}$, with $a_i\in\mathbb{R}^N$ i.i.d. random vectors and $a_i=(a_{i1}^T,\cdots,a_{ip}^T)^T$ with $a_{ij}$, $j\in\{1,\cdots,p\}$, i.i.d. drawn from $S(d,1,\gamma_1)$; we let $\gamma_1=1$. Similarly, $A_\alpha=(h_1,\cdots,h_{n_\alpha})^T\in\mathbb{R}^{n_\alpha\times N}$, with $h_i\in\mathbb{R}^N$ i.i.d. random vectors and $h_i=(h_{i1}^T,\cdots,h_{ip}^T)^T$ with $h_{ij}$, $j\in\{1,\cdots,p\}$, i.i.d. drawn from $S(d,\alpha,\gamma_\alpha)$; we let $\gamma_\alpha=1$. The noise terms $\varepsilon_1$ and $\varepsilon_\alpha$ are generated with i.i.d. entries from a standard normal distribution, and the noise level $\sigma$ is set to a small value. We considered a sequence of pairs of sample sizes $(n_1,n_\alpha)=(50,50),(100,100),(200,200),\cdots,(500,500)$. Each experiment is replicated 200 times, so we have 200 realizations of $\hat k_\alpha(x)$ for each $(n_1,n_\alpha)$. We varied $\|x\|_{2,0}$ and $\mathrm{BDNR}(x)$, and averaged the quantity $\big|\frac{\hat k_\alpha(x)}{\|x\|_{2,0}}-1\big|$. Specifically, we considered four cases in which $\|x\|_{2,0}=\mathrm{BDNR}(x)$, the smallest common value being 10.

Fig. 4. The density plots of the standardized statistics for the exactly block sparse case. The dashed black curve is the standard normal density in all four plots.

Figure 7 shows that $\hat k_\alpha(x)$ with small $\alpha$ estimates $\|x\|_{2,0}$ accurately over a wide range of the parameters $\|x\|_{2,0}$ and $\mathrm{BDNR}(x)$, and that these parameters have only a small effect on the relative estimation error $\big|\frac{\hat k_\alpha(x)}{\|x\|_{2,0}}-1\big|$, as expected from Theorem 3.

V. CONCLUSION
In this paper, we proposed a new soft measure of block sparsity and obtained an estimator for it by using multivariate centered isotropic symmetric $\alpha$-stable random projections. The asymptotic properties of the estimators were presented, and a series of simulation experiments illustrated our theoretical results.

There are some interesting issues left for future research. Throughout the paper, we assumed that the noise scale parameter $\sigma$ and the characteristic function $\varphi$ of the noise are known. In practice, however, they are usually unknown and need to be estimated. Although [18] considered the effect of plugging such estimates into the estimation procedure, how to estimate these parameters from the random linear projection measurements $y$ themselves is still unknown. In addition, we have considered sparsity and block sparsity estimation for real-valued signals only; it would be interesting to generalize the existing results to complex-valued signals.

APPENDIX A
PROOFS
Fig. 5. The averaged relative error $|\hat k_2(x)/k_2(x)-1|$ depending on $N$, $d$ and $\sigma$ for the nearly block sparse case.

Our main theoretical results, Theorem 1 and Theorem 2, follow from Theorem 2 and Corollary 1 in [18], since in both estimation procedures the noiseless measurements have univariate symmetric stable distributions that differ only in their scale parameters after the random projection: $\gamma_\alpha\|x\|_\alpha$ for sparsity estimation and $\gamma_\alpha\|x\|_{2,\alpha}$ for block sparsity estimation. Therefore, the asymptotic results for the scale parameter estimators obtained by the characteristic function method are essentially the same. In order not to repeat, the details are omitted. Next, we present only the proofs of Lemma 2, Lemma 3 and Theorem 3.

Proof of Lemma 2.
The proof follows the proof of Proposition 1 in [18], with some careful modifications. Let $c_1$ be as in Lemma 1, and let $\kappa_1\geq 1$ be any number such that
$$\frac{2\ln(\kappa_1)}{d\kappa_1}+\frac{2}{d\kappa_1}+\frac{2}{\kappa_1}\le\frac{1}{c_1}. \qquad (22)$$
Define the positive number $t_0=\frac{m}{\kappa_1 d\ln(eN/m)}$, and choose $k=\lfloor t_0\rfloor$ in Lemma 1. Note that when $m\leq N$, this choice of $k$ is clearly at most $p$, and hence lies in $\{1,\cdots,p\}$.

Fig. 6. The density plots of the standardized statistics for the nearly block sparse case. The dashed black curve is the standard normal density in all four plots.

Then we have
$$k\ln\Big(\frac{eN}{kd}\Big)\le(t_0+1)\ln\Big(\frac{eN}{t_0 d}\Big)=\Big(\frac{m}{\kappa_1 d\ln(eN/m)}+1\Big)\ln\Big(\frac{\kappa_1 eN}{m}\ln\Big(\frac{eN}{m}\Big)\Big)\le\Big(\frac{m}{\kappa_1 d\ln(eN/m)}+1\Big)\ln\Big[\Big(\frac{\kappa_1 eN}{m}\Big)^2\Big]=\frac{2m}{\kappa_1 d\ln(eN/m)}\Big(\ln(\kappa_1)+\ln\Big(\frac{eN}{m}\Big)\Big)+2\ln\Big(\frac{\kappa_1 eN}{m}\Big)\le\Big(\frac{2\ln(\kappa_1)}{d\kappa_1}+\frac{2}{d\kappa_1}+\frac{2}{\kappa_1}\Big)m\le\frac{m}{c_1},$$
by using our assumption $N\geq m\geq\kappa_1\ln(\kappa_1 eN/m)$. Hence, our choice of $\kappa_1$ ensures $m\geq c_1 k\ln(eN/(kd))$. To finish the proof, let $\kappa_2=c_2$ be as in Lemma 1, so that the bound (5) holds with probability at least $1-2\exp(-\kappa_2 m)$. Moreover, we have
$$\frac{\|x-x_k\|_{2,1}}{\sqrt{k}}\le\frac{\|x\|_{2,1}}{\sqrt{t_0}}=\frac{\sqrt{\kappa_1}}{\sqrt{m}}\sqrt{d\,\|x\|_{2,1}^2\ln(eN/m)}.$$

Fig. 7. The average relative error $|\hat k_\alpha(x)/\|x\|_{2,0}-1|$ for small $\alpha$, depending on $\|x\|_{2,0}$ and $\mathrm{BDNR}(x)$.

Let $c_3$ and $c_4$ be as in (5); then we have
$$\frac{\|\hat x-x\|_2}{\|x\|_2}\le c_3\frac{\|x-x_k\|_{2,1}}{\sqrt{k}\,\|x\|_2}+c_4\frac{\delta}{\|x\|_2}\le c_3\frac{\sqrt{\kappa_1}}{\sqrt{m}}\sqrt{\frac{d\,\|x\|_{2,1}^2}{\|x\|_2^2}\ln(eN/m)}+c_4\frac{\delta}{\|x\|_2},$$
and the proof is completed by setting $\kappa_3=c_3\sqrt{\kappa_1}$, $\kappa_4=c_4$ and noting that $\|x\|_2=\|x\|_{2,2}$, so that $\|x\|_{2,1}^2/\|x\|_{2,2}^2=k_2(x)$. $\square$

Proof of Lemma 3. By the independence of the $a_j$, $j\in\{1,\cdots,p\}$, for $t\in\mathbb{R}$ the characteristic function of $\langle a,x\rangle$ has the form
$$E[\exp(it\langle a,x\rangle)]=E\exp\Big(it\sum_{j=1}^p x[j]^T a_j\Big)=\prod_{j=1}^p E[\exp(it\,x[j]^T a_j)]=\prod_{j=1}^p\exp\big(-\gamma_\alpha^\alpha\|t\,x[j]\|_2^\alpha\big)=\exp\Big(-\gamma_\alpha^\alpha\sum_{j=1}^p\|x[j]\|_2^\alpha|t|^\alpha\Big)=\exp\big(-(\gamma_\alpha\|x\|_{2,\alpha})^\alpha|t|^\alpha\big).$$
Then the lemma follows from Definition 1. $\square$

Proof of Theorem 3. The triangle inequality implies
$$\frac{|\tilde k_\alpha(x)-\|x\|_{2,0}|}{\|x\|_{2,0}}\le\frac{|\tilde k_\alpha(x)-k_\alpha(x)|}{\|x\|_{2,0}}+\frac{|k_\alpha(x)-\|x\|_{2,0}|}{\|x\|_{2,0}}.$$
Since $k_\alpha(x)\le\|x\|_{2,0}$, we have
$$\Bigg|\frac{\tilde k_\alpha(x)}{\|x\|_{2,0}}-1\Bigg|\le\Bigg|\frac{\tilde k_\alpha(x)}{k_\alpha(x)}-1\Bigg|+\frac{|k_\alpha(x)-\|x\|_{2,0}|}{\|x\|_{2,0}}.$$
Thus, it suffices to bound the last term on the right-hand side. Since $k_0(x)=\|x\|_{2,0}$ and $k_u(x)$ is a non-increasing function of $u$, we have
$$|k_\alpha(x)-\|x\|_{2,0}|=\int_0^\alpha\Big|\frac{d}{du}k_u(x)\Big|\,du.$$
We now derive a bound on $|\frac{d}{du}k_u(x)|$. For $u\in(0,\alpha]$ and $\alpha\in(0,1)$, define the probability vector $\omega_j(x)=\pi_j(x)^u/\|\pi(x)\|_u^u$, $j=1,\cdots,p$, with $\pi_j(x)$ and $\pi(x)$ as in Section II-A. Then
$$\Big|\frac{d}{du}k_u(x)\Big|=-\frac{d}{du}k_u(x)=-\frac{d}{du}\exp(H_u(\pi(x)))=-k_u(x)\frac{d}{du}H_u(\pi(x))=-k_u(x)\Big(-\frac{1}{(1-u)^2}\sum_{j:\|x[j]\|_2\neq 0}\omega_j(x)\log\Big(\frac{\omega_j(x)}{\pi_j(x)}\Big)\Big)\le\frac{\|x\|_{2,0}}{(1-u)^2}\sum_{j:\|x[j]\|_2\neq 0}\omega_j(x)\log\Big(\frac{\omega_j(x)}{\pi_j(x)}\Big),$$
using $k_u(x)\le\|x\|_{2,0}$ and the formula for $\frac{d}{du}H_u(\pi(x))$. Next, since $\frac{\omega_j(x)}{\pi_j(x)}=\pi_j(x)^{u-1}k_u(x)^{-(1-u)}$, for every $j$ with $\|x[j]\|_2\neq 0$ we have
$$\frac{\omega_j(x)}{\pi_j(x)}\le\pi_j(x)^{u-1}k_\infty(x)^{-(1-u)},\ \text{since } k_\infty(x)\le k_u(x);\qquad \le\pi_j(x)^{-1}k_\infty(x)^{-1}\cdot k_\infty(x)^u,\ \text{since } \pi_j(x)\in(0,1];$$
$$=\frac{\|x\|_{2,1}}{\|x[j]\|_2}\cdot\frac{\|x\|_{2,\infty}}{\|x\|_{2,1}}\cdot k_\infty(x)^u,\ \text{since } k_\infty(x)=\frac{\|x\|_{2,1}}{\|x\|_{2,\infty}};\qquad \le\mathrm{BDNR}(x)\cdot\|x\|_{2,0}^u,\ \text{since } k_\infty(x)\le\|x\|_{2,0}.$$
As this bound does not depend on $j$ and the $\omega_j$ sum to 1, we obtain for all $u\in(0,\alpha]$ and $\alpha\in(0,1)$,
$$\Big|\frac{d}{du}k_u(x)\Big|\le\frac{\|x\|_{2,0}}{(1-u)^2}\big(\log\mathrm{BDNR}(x)+u\log(\|x\|_{2,0})\big)\le\frac{\|x\|_{2,0}}{(1-u)^2}\big(\log\mathrm{BDNR}(x)+\alpha\log(\|x\|_{2,0})\big). \qquad (23)$$
Finally, the basic integral $\int_0^\alpha(1-u)^{-2}\,du=\frac{\alpha}{1-\alpha}$ completes the proof. $\square$

ACKNOWLEDGMENT
This work is supported by the Swedish Research Council grant (Reg. No. 340-2013-5342).

REFERENCES
[1] Baraniuk, R. G., Cevher, V., Duarte, M. F., & Hegde, C. (2010). Model-based compressive sensing. IEEE Transactions on Information Theory.
[2] Blumensath, T., & Davies, M. E. (2009). Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis.
[3] Blumensath, T., & Davies, M. E. (2009). Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Transactions on Information Theory.
[4] Candes, E. J., Romberg, J. K., & Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics.
[5] Candes, E. J., & Tao, T. (2005). Decoding by linear programming. IEEE Transactions on Information Theory.
[6] Candes, E. J., & Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory.
[7] Chen, J., & Huo, X. (2006). Theoretical results on sparse representations of multiple-measurement vectors. IEEE Transactions on Signal Processing.
[8] Cotter, S. F., Rao, B. D., Engan, K., & Kreutz-Delgado, K. (2005). Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Transactions on Signal Processing.
[9] Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory.
[10] Duarte, M. F., & Eldar, Y. C. (2011). Structured compressed sensing: From theory to applications. IEEE Transactions on Signal Processing, 59(9).
[11] Dudley, R. M. Uniform Central Limit Theorems. Cambridge University Press.
[12] Eldar, Y. C., Kuppinger, P., & Bolcskei, H. (2010). Block-sparse signals: Uncertainty relations and efficient recovery. IEEE Transactions on Signal Processing.
[13] Eldar, Y. C., & Kutyniok, G. (2012). Compressed Sensing: Theory and Applications. Cambridge University Press.
[14] Eldar, Y. C., & Mishali, M. (2009). Robust recovery of signals from a structured union of subspaces. IEEE Transactions on Information Theory.
[15] Elhamifar, E., & Vidal, R. (2012). Block-sparse recovery via convex optimization. IEEE Transactions on Signal Processing.
[16] Foucart, S., & Rauhut, H. (2013). A Mathematical Introduction to Compressive Sensing. New York, NY, USA: Springer-Verlag.
[17] Lopes, M. E. (2013). Estimating unknown sparsity in compressed sensing. In ICML (3) (pp. 217-225).
[18] Lopes, M. E. (2016). Unknown sparsity in compressed sensing: Denoising and inference. IEEE Transactions on Information Theory.
[19] Lv, X., Bi, G., & Wan, C. (2011). The group lasso for stable recovery of block-sparse signal representations. IEEE Transactions on Signal Processing.
[20] Majumdar, A., & Ward, R. K. (2010). Compressed sensing of color images. Signal Processing.
[21] Markatou, M., & Horowitz, J. L. (1995). Robust scale estimation in the error-components model using the empirical characteristic function. Canadian Journal of Statistics.
[22] Markatou, M., Horowitz, J. L., & Lenth, R. V. (1995). Robust scale estimation based on the empirical characteristic function. Statistics & Probability Letters.
[23] Mishali, M., & Eldar, Y. C. (2008). Reduce and boost: Recovering arbitrary sets of jointly sparse vectors. IEEE Transactions on Signal Processing.
[24] Mishali, M., & Eldar, Y. C. (2009). Blind multiband signal reconstruction: Compressed sensing for analog signals. IEEE Transactions on Signal Processing.
[25] Needell, D., & Tropp, J. A. (2009). CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis.
[26] Nolan, J. P. (2013). Multivariate elliptically contoured stable distributions: Theory and estimation. Computational Statistics.
[27] Parvaresh, F., Vikalo, H., Misra, S., & Hassibi, B. (2008). Recovering sparse signals using sparse measurement matrices in compressed DNA microarrays. IEEE Journal of Selected Topics in Signal Processing.
[28] Plan, Y., & Vershynin, R. (2013). One-bit compressed sensing by linear programming. Communications on Pure and Applied Mathematics.
[29] Press, S. J. Multivariate stable distributions. Journal of Multivariate Analysis.
[30] Stojnic, M., Parvaresh, F., & Hassibi, B. (2010). On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Transactions on Signal Processing.
[31] Tropp, J. A., & Gilbert, A. C. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory.
[32] Vershynin, R. Estimation in high dimensions: A geometric perspective. In Sampling Theory, a Renaissance (pp. 3-66). Springer International Publishing.
[33] Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[34] Zeinalkhani, Z., & Banihashemi, A. H. (2015). Iterative reweighted $\ell_2/\ell_1$ recovery algorithms for compressed sensing of block sparse signals. IEEE Transactions on Signal Processing, 63(17).