The Gelfand widths of $\ell_p$-balls for $0 < p \le 1$
Simon Foucart, Alain Pajor, Holger Rauhut, Tino Ullrich

April 19, 2010
Abstract
We provide sharp lower and upper bounds for the Gelfand widths of $\ell_p$-balls in the $N$-dimensional $\ell_q^N$-space for $0 < p \le 1$ and $p < q \le 2$. Such estimates are highly relevant to the novel theory of compressive sensing, and our proofs rely on methods from this area.
Key Words:
Gelfand widths, compressive sensing, sparse recovery, $\ell_1$-minimization, $\ell_p$-minimization.
1 Introduction

Gelfand widths are an important concept in classical and modern approximation and complexity theory. They have found recent interest in the rapidly emerging field of compressive sensing [6, 14, 17] because they give general performance bounds for sparse recovery methods. Since vectors in $\ell_p$-balls, $0 < p \le 1$, can be well-approximated by sparse vectors, the Gelfand widths of such balls are particularly relevant in this context. In remarkable papers [26, 21, 19] from the 1970s and 80s due to Kashin, Gluskin, and Garnaev, upper and lower estimates for the Gelfand widths of $\ell_1$-balls are provided. In his seminal paper introducing compressive sensing [17], Donoho extends these estimates to the Gelfand widths of $\ell_p$-balls with $p < 1$. As we explain below, however, his argument for the lower bound relies on an extension of Carl's theorem to quasi-norm balls which is not known to hold. The main contribution of this paper is a complete proof of the lower bound, based on compressive sensing methods, which applies to all $0 < p \le 1$ and in particular recovers the known result for $p = 1$. For completeness, we also give a proof of the upper bound based again on compressive sensing arguments. These arguments also provide the same sharp asymptotic behavior for the Gelfand widths of weak-$\ell_p$-balls.

In this paper, we consider the finite-dimensional spaces $\ell_p^N$, that is, $\mathbb{R}^N$ endowed with the usual $\ell_p$-(quasi-)norm defined, for $x \in \mathbb{R}^N$, by
$$\|x\|_p := \Big(\sum_{\ell=1}^N |x_\ell|^p\Big)^{1/p}, \quad 0 < p < \infty, \qquad \|x\|_\infty := \max_{\ell=1,\dots,N} |x_\ell|.$$
For $1 \le p \le \infty$, this is a norm, while for $0 < p < 1$, it only satisfies the $p$-triangle inequality
$$\|x+y\|_p^p \le \|x\|_p^p + \|y\|_p^p, \qquad x, y \in \mathbb{R}^N. \tag{1.1}$$
Thus, $\|\cdot\|_p$ defines a quasi-norm with (optimal) quasi-norm constant $C = \max\{1, 2^{1/p-1}\}$. As a reminder, $\|\cdot\|_X$ is called a quasi-norm on $\mathbb{R}^N$ with quasi-norm constant $C \ge 1$ if
$$\|x+y\|_X \le C\,(\|x\|_X + \|y\|_X), \qquad x, y \in \mathbb{R}^N.$$
Other quasi-normed spaces considered in this paper are the spaces weak-$\ell_p^N$, that is, $\mathbb{R}^N$ endowed with the $\ell_{p,\infty}$-quasi-norm defined, for $x \in \mathbb{R}^N$, by
$$\|x\|_{p,\infty} := \max_{\ell=1,\dots,N} \ell^{1/p}\, x^*_\ell, \quad 0 < p \le \infty,$$
where $x^* \in \mathbb{R}^N$ is the non-increasing rearrangement of $(|x_1|, \dots, |x_N|)$. We shall investigate the Gelfand widths in $\ell_q^N$ of the unit balls
$$B_p^N := \{x \in \mathbb{R}^N : \|x\|_p \le 1\} \quad\text{and}\quad B_{p,\infty}^N := \{x \in \mathbb{R}^N : \|x\|_{p,\infty} \le 1\}$$
of $\ell_p^N$ and weak-$\ell_p^N$ for $0 < p \le 1$ and $p < q \le 2$. The Gelfand width of order $m$ of a subset $K$ of $\mathbb{R}^N$ in the (quasi-)normed space $(\mathbb{R}^N, \|\cdot\|_X)$ is defined as
$$d^m(K, X) := \inf_{A \in \mathbb{R}^{m\times N}}\ \sup_{v \in K \cap \ker A} \|v\|_X,$$
where $\ker A := \{v \in \mathbb{R}^N : Av = 0\}$ denotes the kernel of $A$. It is well known that the above infimum is actually realized [35]. Let us observe that $d^m(K, X) = 0$ for $m \ge N$ when $0 \in K$, so that we restrict our considerations to the case $m < N$ in the sequel. Let us also observe that the simple inclusion $B_p^N \subset B_{p,\infty}^N$ implies
$$d^m(B_p^N, \ell_q^N) \le d^m(B_{p,\infty}^N, \ell_q^N).$$
From this point on, we aim at finding a lower bound for $d^m(B_p^N, \ell_q^N)$ and an upper bound for $d^m(B_{p,\infty}^N, \ell_q^N)$ with the same asymptotic behaviors. Our main result is summarized below.

Theorem 1.1.
For $0 < p \le 1$ and $p < q \le 2$, there exist constants $c_{p,q}, C_{p,q} > 0$ depending only on $p$ and $q$ such that, if $m < N$, then
$$c_{p,q} \min\Big\{1, \frac{\ln(N/m)+1}{m}\Big\}^{1/p-1/q} \le d^m(B_p^N, \ell_q^N) \le C_{p,q} \min\Big\{1, \frac{\ln(N/m)+1}{m}\Big\}^{1/p-1/q}, \tag{1.2}$$
and, if $p < 1$,
$$c_{p,q} \min\Big\{1, \frac{\ln(N/m)+1}{m}\Big\}^{1/p-1/q} \le d^m(B_{p,\infty}^N, \ell_q^N) \le C_{p,q} \min\Big\{1, \frac{\ln(N/m)+1}{m}\Big\}^{1/p-1/q}. \tag{1.3}$$

In the case $p = 1$ and $q = 2$, the upper bound of (1.2) with a slightly worse log-term was shown by Kashin in [26] by considering Kolmogorov widths, which are dual to Gelfand widths [29, 35]. The lower bound and the optimal log-term for $p = 1$ and $1 < q \le 2$ were provided by Garnaev and Gluskin [19, 21]. An alternative proof of the lower bound for $p = 1$ was given by Carl and Pajor in [10]. They did not pass to Kolmogorov widths, but rather used Carl's theorem [9] (see also [11, 35]), which bounds in particular Gelfand numbers from below by entropy numbers; the latter are completely understood even for $p, q < 1$, see [38, 25, 27]. An upper bound for $p < 1$ and $q = 2$ was first provided by Donoho [17], with $\log(N)$ instead of $\log(N/m)$. With an adaptation of a method from [29], Vybíral [39, Lem. 4.11] also provided the upper bound of (1.2) when $0 < p \le 1$. In Section 3, we use compressive sensing techniques to give an alternative proof that provides the upper bound of (1.3).

Donoho's attempt to prove the lower bound of (1.2) for the case $0 < p < 1$ and $q = 2$ consists in applying Carl's theorem and then using known estimates for entropy numbers, similarly to the approach by Carl and Pajor for $p = 1$. However, it is unknown whether Carl's theorem extends to quasi-norm balls, in particular to $\ell_p$-balls with $p < 1$. The standard proof of Carl's theorem for Gelfand widths [11, 29] uses duality arguments, which are not available for quasi-Banach spaces. We believe that Carl's theorem actually fails for Gelfand widths of general quasi-norm balls, although it turns out to be a posteriori true in our specific situation due to Theorem 1.1.

We briefly comment on the case $q > 2$. Since then $\|v\|_q \le \|v\|_2$ for all $v \in \mathbb{R}^N$, we have the upper estimate
$$d^m(B_p^N, \ell_q^N) \le d^m(B_p^N, \ell_2^N) \le C_{p,2} \min\Big\{1, \frac{\ln(N/m)+1}{m}\Big\}^{1/p-1/2}. \tag{1.4}$$
The lower bound in (1.2) extends to $q > 2$, but is unlikely to be optimal in this case. It seems rather that (1.4) is close to the correct behavior. At least for $p = 1$ and $q > 2$, [20] gives lower estimates of related Kolmogorov widths, which then lead to (see also [39])
$$d^m(B_1^N, \ell_q^N) \ge c_q\, m^{-1/2}.$$
The latter matches (1.4) up to the log-factor. We expect a similar behavior for $p < 1$, but this fact remains to be proven.
Let us now outline the connection to compressive sensing. This emerging theory explores the recovery of vectors $x \in \mathbb{R}^N$ from incomplete linear information $y = Ax \in \mathbb{R}^m$, where $A \in \mathbb{R}^{m\times N}$ and $m < N$. Without additional information, reconstruction of $x$ from $y$ is clearly impossible since, even in the full rank case, the system $y = Ax$ has infinitely many solutions. Compressive sensing makes the additional assumption that $x$ is sparse or at least compressible. A vector $x \in \mathbb{R}^N$ is called $s$-sparse if at most $s$ of its coordinates are non-zero. The error of best $s$-term approximation is defined as
$$\sigma_s(x)_p := \inf\{\|x - z\|_p : z \text{ is } s\text{-sparse}\}.$$
Informally, a vector $x$ is called compressible if $\sigma_s(x)_p$ decays quickly in $s$. It is classical to show that, for $q > p$,
$$\sigma_s(x)_q \le s^{1/q-1/p}\,\|x\|_p, \tag{1.5}$$
$$\sigma_s(x)_q \le D_{p,q}\, s^{1/q-1/p}\,\|x\|_{p,\infty}, \qquad D_{p,q} := (q/p - 1)^{-1/q}. \tag{1.6}$$
This implies that the balls $B_p^N$ and $B_{p,\infty}^N$ with $p \le 1$ serve as good models for compressible vectors: the smaller $p$, the better a vector $x$ in $B_p^N$ or in $B_{p,\infty}^N$ is approximable in $\ell_q^N$ by $s$-sparse vectors.

The aim of compressive sensing is to find good pairs of linear measurement maps $A \in \mathbb{R}^{m\times N}$ and (non-linear) reconstruction maps $\Delta : \mathbb{R}^m \to \mathbb{R}^N$ that recover compressible vectors $x$ with small errors $x - \Delta(Ax)$. In order to measure the performance of a pair $(A, \Delta)$, one defines, for a subset $K$ of $\mathbb{R}^N$ and a (quasi-)norm $\|\cdot\|_X$ on $\mathbb{R}^N$,
$$E^m(K, X) := \inf_{A \in \mathbb{R}^{m\times N},\ \Delta : \mathbb{R}^m \to \mathbb{R}^N}\ \sup_{x \in K} \|x - \Delta(Ax)\|_X.$$
Quantities of this type play a crucial role in the modern field of information-based complexity [34]. In our situation, only linear information is allowed in order to recover $K$ uniformly. The quantities $E^m(K, X)$ are closely linked to the Gelfand widths, as stated in the following proposition [17, 14], see also [36, 33].
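Before stating it, we remark that the compressibility estimates (1.5) and (1.6) are easy to test numerically. The sketch below is our own illustration (the helper sigma_s is a hypothetical name): it evaluates the best $s$-term approximation error of a vector on the boundary of the weak-$\ell_p$ ball and checks it against the right-hand side of (1.6).

```python
import numpy as np

def sigma_s(x, s, q):
    """Best s-term approximation error sigma_s(x)_q: keep the s largest
    entries in modulus and measure the remaining tail in ell_q."""
    tail = np.sort(np.abs(x))[:-s] if s > 0 else np.abs(x)
    return (tail ** q).sum() ** (1 / q)

rng = np.random.default_rng(0)
N, p, q = 1000, 0.5, 2.0
x = rng.permutation(np.arange(1, N + 1) ** (-1 / p))  # ||x||_{p,inf} = 1
D = (q / p - 1) ** (-1 / q)
for s in [1, 10, 100]:
    print(s, sigma_s(x, s, q) <= D * s ** (1 / q - 1 / p))  # checks (1.6)
```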
Proposition 1.2.
Let $K \subset \mathbb{R}^N$ be such that $K = -K$ and $K + K \subset C_K K$ for some $C_K \ge 2$, and let $\|\cdot\|_X$ be a quasi-norm on $\mathbb{R}^N$ with quasi-norm constant $C_X$. Note that $C_K = 2$ if $K$ is a norm ball and that $C_X = 1$ if $\|\cdot\|_X$ is a norm. Then
$$C_K^{-1}\, d^m(K, X) \le E^m(K, X) \le C_X\, d^m(K, X).$$

Combining the previous proposition with Theorem 1.1 gives optimal performance bounds for the recovery of compressible vectors in $B_p^N$, $0 < p \le 1$, when the error is measured in $\ell_q^N$, $p < q \le 2$. Typically, the most interesting case is $q = 2$, for which we end up with
$$c_p \min\Big\{1, \frac{\ln(N/m)+1}{m}\Big\}^{1/p-1/2} \le E^m(B_p^N, \ell_2^N) \le C_p \min\Big\{1, \frac{\ln(N/m)+1}{m}\Big\}^{1/p-1/2}.$$
For practical purposes, it is of course desirable to find matrices $A \in \mathbb{R}^{m\times N}$ and efficiently implementable reconstruction maps $\Delta$ that realize the optimal bound above. For instance, Gaussian random matrices $A \in \mathbb{R}^{m\times N}$, i.e., matrices whose entries are independent copies of a zero-mean Gaussian variable, provide optimal measurement maps with high probability [8, 17, 1]. An optimal reconstruction map is obtained via basis pursuit [13, 17, 8], i.e., via the $\ell_1$-minimization mapping given by
$$\Delta_1(y) := \arg\min \|z\|_1 \quad \text{subject to } Az = y.$$
This mapping can be computed with efficient convex optimization methods [2], and works very well in practice. The proof of the lower bound in (1.2) will further involve $\ell_p$-minimization for $0 < p \le 1$, i.e., the mapping
$$\Delta_p(y) := \arg\min \|z\|_p \quad \text{subject to } Az = y.$$
A key concept in the analysis of sparse recovery via $\ell_p$-minimization is the restricted isometry property (RIP). This well-established concept in compressive sensing [8, 7] is the main tool for the proof of the upper bound in (1.3). We recall that the $s$-th order restricted isometry constant $\delta_s(A)$ of a matrix $A \in \mathbb{R}^{m\times N}$ is defined as the smallest $\delta \ge 0$ such that
$$(1-\delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\|x\|_2^2 \quad \text{for all } s\text{-sparse } x \in \mathbb{R}^N.$$
Small restricted isometry constants imply stable recovery by $\ell_1$-minimization, as well as by $\ell_p$-minimization for $0 < p < 1$. For later reference, we state the following result [7, 5, 18].

Theorem 1.3.
Let $0 < p \le 1$. If $A \in \mathbb{R}^{m\times N}$ has a restricted isometry constant $\delta_{2s} < \sqrt{2} - 1$, then, for all $x \in \mathbb{R}^N$,
$$\|x - \Delta_p(Ax)\|_p^p \le C\, \sigma_s(x)_p^p, \tag{1.7}$$
where $C > 0$ is a constant that depends only on $\delta_{2s}$. In particular, the reconstruction of $s$-sparse vectors is exact.

Given a prescribed $0 < \delta < 1$, it is known [1, 8, 31] that, if the entries of the matrix $A$ are independent copies of a zero-mean Gaussian variable with variance $1/m$, then there exist constants $C_1, C_2 > 0$ (depending only on $\delta$) such that $\delta_s(A) \le \delta$ holds with probability greater than $1 - e^{-C_1 m}$ provided that
$$m \ge C_2\, s \ln(eN/s). \tag{1.8}$$
In particular, there exists a matrix $A \in \mathbb{R}^{m\times N}$ such that the pair $(A, \Delta_1)$, and more generally $(A, \Delta_p)$ for $0 < p \le 1$, allows stable recovery in the sense of (1.7) as soon as the number of measurements satisfies (1.8). Vice versa, we will see in Theorem 2.7 that the existence of any pair $(A, \Delta)$ allowing such a stable recovery forces the number of measurements to satisfy (1.8).

Lemma 2.4, which is of independent interest, estimates the minimal number of measurements for the pair $(A, \Delta_p)$ to allow exact (but not necessarily stable) recovery of sparse vectors. Namely, we must have
$$m \ge c_1\, p\, s \ln\big(N/(c_2 s)\big) \tag{1.9}$$
for some explicitly given constants $c_1, c_2 > 0$. In the case $p = 1$, this result can also be obtained as a consequence of a corresponding lower bound on the neighborliness of centrosymmetric polytopes, see [16, 28]. Decreasing $p$ while keeping $N$ fixed shows that the bound (1.9) becomes in fact irrelevant for small $p$, since the bound $m \ge 2s$ holds as soon as there exists a pair $(A, \Delta)$ allowing exact recovery of all $s$-sparse vectors, see [14, Lem. 3.1]. Combining the two bounds, we see that $s$-sparse recovery by $\ell_p$-minimization forces
$$m \ge C_1\, s\, \big(p \ln(N/(C_2 s)) + 1\big)$$
for some constants $C_1, C_2 > 0$. Interestingly, if such an inequality is fulfilled (with possibly different constants $C_1, C_2$) and if $A$ is a Gaussian random matrix, then the pair $(A, \Delta_p)$ allows $s$-sparse recovery with high probability, see [12]. We note, however, that exact $\ell_p$-minimization with $p < 1$, as a non-convex optimization program, encounters significant difficulties of implementation. For more information on compressive sensing, we refer to [4, 6, 8, 14, 17, 37].
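As an illustration of the basis pursuit mapping $\Delta_1$, the program $\min \|z\|_1$ subject to $Az = y$ can be rewritten as a linear program and solved with standard software. The following Python sketch is ours, not the authors' (it assumes SciPy's linprog and is not tuned for efficiency); with a Gaussian matrix of a size in the spirit of (1.8), it typically recovers an $s$-sparse vector exactly, in accordance with Theorem 1.3.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Delta_1(y): minimize ||z||_1 subject to Az = y, recast as a linear
    program in (z, u) with -u <= z <= u, so the objective is sum(u)."""
    m, N = A.shape
    c = np.r_[np.zeros(N), np.ones(N)]
    I = np.eye(N)
    A_ub = np.block([[I, -I], [-I, -I]])       # z - u <= 0 and -z - u <= 0
    A_eq = np.hstack([A, np.zeros((m, N))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * N), A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]

rng = np.random.default_rng(1)
N, m, s = 120, 50, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)   # Gaussian, variance 1/m
x = np.zeros(N)
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
print(np.max(np.abs(basis_pursuit(A, A @ x) - x)))  # tiny: exact recovery
```

The mapping $\Delta_p$ for $p < 1$ has no such convex reformulation, which is exactly the implementation difficulty mentioned above.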
Acknowledgements

The first author is supported by the French National Research Agency (ANR) through the project ECHANGE (ANR-08-EMER-006). The third and fourth authors acknowledge support by the Hausdorff Center for Mathematics, University of Bonn. The third author acknowledges funding through the WWTF project SPORTS (MA07-004).
2 The lower bound

In this section, we use compressive sensing methods to establish the lower bound in (1.2), hence the lower bound in (1.3) as a by-product. Precisely, we show the following result, in which the restriction $q \le 2$ is removed.

Proposition 2.1. For $0 < p \le 1$ and $p < q \le \infty$, there exists a constant $c_{p,q} > 0$ such that
$$d^m(B_p^N, \ell_q^N) \ge c_{p,q} \min\Big\{1, \frac{\ln(eN/m)}{m}\Big\}^{1/p-1/q}, \qquad m < N. \tag{2.1}$$

The proof of Proposition 2.1 involves several auxiliary steps. We start with a result [23, 24] on the unique recovery of sparse vectors via $\ell_p$-minimization for $0 < p \le 1$. A proof is included for the reader's convenience. We point out that, given a subset $S$ of $[N] := \{1, \dots, N\}$ and a vector $v \in \mathbb{R}^N$, we denote by $v_S$ the vector that coincides with $v$ on $S$ and that vanishes on the complementary set $S^c := [N] \setminus S$.

Lemma 2.2.
Let $0 < p \le 1$ and $N, m, s \in \mathbb{N}$ with $m, s < N$. For a matrix $A \in \mathbb{R}^{m\times N}$, the following statements are equivalent:
(a) every $s$-sparse vector $x$ is the unique minimizer of $\|z\|_p$ subject to $Az = Ax$;
(b) $A$ satisfies the $p$-null space property of order $s$, i.e., for every $v \in \ker A \setminus \{0\}$ and every $S \subset [N]$ with $|S| \le s$,
$$\|v_S\|_p^p < \tfrac{1}{2}\, \|v\|_p^p.$$

Proof: (a) $\Rightarrow$ (b): Let $v \in \ker A \setminus \{0\}$ and $S \subset [N]$ with $|S| \le s$. Since $v = v_S + v_{S^c}$ satisfies $Av = 0$, we have $Av_S = A(-v_{S^c})$. Then, since $v_S$ is $s$-sparse, (a) implies
$$\|v_S\|_p^p < \|-v_{S^c}\|_p^p = \|v_{S^c}\|_p^p.$$
Adding $\|v_S\|_p^p$ on both sides and using $\|v_{S^c}\|_p^p + \|v_S\|_p^p = \|v\|_p^p$ gives (b).

(b) $\Rightarrow$ (a): Let $x$ be an $s$-sparse vector and let $S := \operatorname{supp} x$. Let further $z \ne x$ be such that $Az = Ax$. Then $v := x - z \in \ker A \setminus \{0\}$ and
$$\|x\|_p^p \le \|x_S - z_S\|_p^p + \|z_S\|_p^p = \|v_S\|_p^p + \|z_S\|_p^p, \tag{2.2}$$
where the first estimate is a consequence of the $p$-triangle inequality (1.1). Clearly, (b) implies $\|v_S\|_p^p < \|v_{S^c}\|_p^p$. Plugging this into (2.2) and using that $v_{S^c} = -z_{S^c}$ gives
$$\|x\|_p^p < \|v_{S^c}\|_p^p + \|z_S\|_p^p = \|z_{S^c}\|_p^p + \|z_S\|_p^p = \|z\|_p^p.$$
This proves that $x$ is the unique minimizer of $\|z\|_p$ subject to $Az = Ax$.
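Lemma 2.2 suggests a simple randomized sanity check of the $p$-null space property, sketched below in Python (our own illustration; null_space is SciPy's kernel-basis routine). Such sampling can only refute property (b), never certify it: certification is a non-convex optimization over the whole kernel.

```python
import numpy as np
from scipy.linalg import null_space

def nsp_counterexample(A, s, p, trials=10000, seed=2):
    """Search for v in ker A and |S| <= s violating the p-null space
    property (b) of Lemma 2.2.  For a fixed v, the worst set S collects
    the s largest values of |v_i|^p, so the test over S is exact; only
    the sampling of v is heuristic."""
    V = null_space(A)                      # orthonormal basis of ker A
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        v = V @ rng.standard_normal(V.shape[1])
        vp = np.abs(v) ** p
        if np.sort(vp)[-s:].sum() >= 0.5 * vp.sum():
            return v                       # a violating kernel vector
    return None                            # no violation found (no proof!)

A = np.random.default_rng(3).standard_normal((30, 60))
print(nsp_counterexample(A, s=4, p=0.5) is None)
```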
The next auxiliary step is a well-known combinatorial lemma, see for instance [32, 3, 22], [30, Lem. 3.6]. A proof that provides explicit constants is again included for the reader's convenience.

Lemma 2.3. Let $N, s \in \mathbb{N}$ with $s < N$. There exists a family $\mathcal{U}$ of subsets of $[N]$ such that:
(i) every set in $\mathcal{U}$ consists of exactly $s$ elements;
(ii) for all $I, J \in \mathcal{U}$ with $I \ne J$, it holds that $|I \cap J| < s/2$;
(iii) the family $\mathcal{U}$ is "large" in the sense that
$$|\mathcal{U}| \ge \Big(\frac{N}{4s}\Big)^{s/2}.$$

Proof: We may assume that $s \le N/4$, for otherwise we can take a family $\mathcal{U}$ consisting of just one element. Let us denote by $\mathcal{B}(N, s)$ the family of subsets of $[N]$ having exactly $s$ elements. This family has size $|\mathcal{B}(N, s)| = \binom{N}{s}$. We draw an arbitrary element $I_1 \in \mathcal{B}(N, s)$ and collect in a family $\mathcal{A}_1$ all the sets $J \in \mathcal{B}(N, s)$ such that $|I_1 \cap J| \ge s/2$. The family $\mathcal{A}_1$ has size at most
$$\sum_{k=\lceil s/2\rceil}^{s} \binom{s}{k}\binom{N-s}{s-k} \le 2^s \max_{\lceil s/2\rceil \le k \le s} \binom{N-s}{s-k} = 2^s \binom{N-s}{\lfloor s/2\rfloor}, \tag{2.3}$$
the latter inequality holding because $\lfloor s/2\rfloor \le (N-s)/2$ when $s \le N/2$. We throw away $\mathcal{A}_1$ and observe that every element $J \in \mathcal{B}(N, s) \setminus \mathcal{A}_1$ satisfies $|I_1 \cap J| < s/2$. Then we draw an arbitrary element $I_2 \in \mathcal{B}(N, s) \setminus \mathcal{A}_1$, provided that the latter is not empty. We repeat the procedure, i.e., we define a family $\mathcal{A}_2$ relative to $I_2$ and draw an arbitrary element $I_3 \in \mathcal{B}(N, s) \setminus (\mathcal{A}_1 \cup \mathcal{A}_2)$, and so forth until no more elements are left. The size of each set $\mathcal{A}_i$ can always be estimated from above by (2.3). This results in a collection $\mathcal{U} = \{I_1, \dots, I_L\}$ of subsets of $[N]$ satisfying (i) and (ii). We finally observe that
$$L \ge \frac{\binom{N}{s}}{2^s \binom{N-s}{\lfloor s/2\rfloor}} = \frac{1}{2^s}\, \frac{N(N-1)\cdots(N-s+1)}{(N-s)(N-s-1)\cdots(N-s-\lfloor s/2\rfloor+1)}\, \frac{1}{s(s-1)\cdots(\lfloor s/2\rfloor+1)} \ge \frac{1}{2^s}\, \frac{N(N-1)\cdots(N-\lceil s/2\rceil+1)}{s(s-1)\cdots(s-\lceil s/2\rceil+1)} \ge \frac{1}{2^s}\Big(\frac{N}{s}\Big)^{\lceil s/2\rceil} \ge \Big(\frac{N}{4s}\Big)^{s/2}.$$
This concludes the proof by establishing (iii).
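The greedy procedure in this proof is entirely constructive and can be run verbatim on small instances. The following Python sketch is our own illustration (it enumerates all $s$-subsets, so it is exponential in $N$ and meant for toy sizes only).

```python
from itertools import combinations

def greedy_packing(N, s):
    """Greedy construction from the proof of Lemma 2.3: pick any fresh
    s-subset of [N], discard every s-subset meeting it in >= s/2
    elements, and repeat until the pool is exhausted."""
    pool = [frozenset(c) for c in combinations(range(N), s)]
    family = []
    while pool:
        I = pool[0]
        family.append(I)
        pool = [J for J in pool if len(I & J) < s / 2]
    return family

N, s = 14, 4
U = greedy_packing(N, s)
print(len(U), (N / (4 * s)) ** (s / 2))   # |U| versus the guarantee (iii)
```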
We now use Lemma 2.3 for the final auxiliary result, which is quite interesting on its own. It gives an estimate of the minimal number of measurements for exact recovery of sparse vectors via $\ell_p$-minimization, where $0 < p \le 1$.

Lemma 2.4. Let $0 < p \le 1$ and $N, m, s \in \mathbb{N}$ with $m < N$ and $s < N/2$. If $A \in \mathbb{R}^{m\times N}$ is a matrix such that every $2s$-sparse vector $x$ is a minimizer of $\|z\|_p$ subject to $Az = Ax$, then
$$m \ge c_1\, p\, s \ln\Big(\frac{N}{c_2 s}\Big), \qquad \text{where } c_1 := \frac{1}{\ln 9} \approx 0.455 \text{ and } c_2 := 4.$$
Remark 2.5. Lemma 2.4 could be rephrased (with modified constants) by replacing $2s$-sparse vectors, $s \ge 1$, by $s$-sparse vectors, $s \ge 2$. In the case $s = 1$, it is possible for every $1$-sparse vector $x$ to be a (nonunique) minimizer of $\|z\|_p$ subject to $Az = Ax$, yet $m \ge c_1 p \ln(N/c_2)$ fails for all constants $c_1, c_2 > 0$. This can be verified by taking $m = 1$ and $A = [1\ 1\ \cdots\ 1]$.
Proof: We consider the quotient space $X := \mathbb{R}^N/\ker A = \{[x] := x + \ker A,\ x \in \mathbb{R}^N\}$, which has algebraic dimension $r := \operatorname{rank} A \le m$. It is a quasi-Banach space equipped with
$$\|[x]\|_{A,p} := \inf_{v \in \ker A} \|x + v\|_p.$$
Indeed, a simple computation reveals that $\|\cdot\|_{A,p}$ satisfies the $p$-triangle inequality, i.e.,
$$\|[x] + [y]\|_{A,p}^p \le \|[x]\|_{A,p}^p + \|[y]\|_{A,p}^p.$$
By assumption, the quotient map $[\cdot]$ preserves the norm of every $2s$-sparse vector. We now choose a family $\mathcal{U}$ of subsets of $[N]$ satisfying (i), (ii), (iii) of Lemma 2.3. For a set $I \in \mathcal{U}$, we define an element $x_I \in \ell_p^N$ with $\|x_I\|_p = 1$ by
$$x_I := \frac{1}{s^{1/p}} \sum_{i \in I} e_i, \tag{2.4}$$
where $(e_1, \dots, e_N)$ denotes the canonical basis of $\mathbb{R}^N$. For $I, J \in \mathcal{U}$, $I \ne J$, (ii) yields
$$\|x_I - x_J\|_p^p > \frac{2(s - s/2)}{s} = 1.$$
Since the vector $x_I - x_J$ is $2s$-sparse, we obtain
$$\|[x_I] - [x_J]\|_{A,p} = \|[x_I - x_J]\|_{A,p} = \|x_I - x_J\|_p > 1.$$
The $p$-triangle inequality implies that $\{[x_I] + (1/2)^{1/p} B_X,\ I \in \mathcal{U}\}$ is a disjoint collection of balls included in the ball $(3/2)^{1/p} B_X$, where $B_X$ denotes the unit ball of $(X, \|\cdot\|_{A,p})$. Let $\mathrm{vol}(\cdot)$ denote a volume form on $X$, that is, a translation-invariant measure satisfying $\mathrm{vol}(B_X) > 0$ and $\mathrm{vol}(\lambda B_X) = \lambda^r \mathrm{vol}(B_X)$ for all $\lambda > 0$ (recall that $X$ is isomorphic to $\mathbb{R}^r$). The volumes satisfy the relation
$$\sum_{I \in \mathcal{U}} \mathrm{vol}\big([x_I] + (1/2)^{1/p} B_X\big) \le \mathrm{vol}\big((3/2)^{1/p} B_X\big).$$
By translation invariance and homogeneity, we then derive
$$|\mathcal{U}|\, (1/2)^{r/p}\, \mathrm{vol}(B_X) \le (3/2)^{r/p}\, \mathrm{vol}(B_X).$$
As a result of (iii), we finally obtain
$$\Big(\frac{N}{4s}\Big)^{s/2} \le |\mathcal{U}| \le 3^{r/p} \le 3^{m/p}.$$
Taking the logarithm on both sides gives the desired result.
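For concreteness, the necessary condition of Lemma 2.4 is easy to evaluate. The sketch below is our own illustration (the name measurement_bound is hypothetical); it also shows numerically the point made around (1.9) that the bound fades as $p \to 0$, where the universal bound $m \ge 2(2s) = 4s$ takes over.

```python
import numpy as np

def measurement_bound(p, s, N, c1=1 / np.log(9), c2=4):
    """Necessary number of measurements from Lemma 2.4: exact recovery
    of all 2s-sparse vectors by ell_p-minimization forces
    m >= c1 * p * s * ln(N / (c2 * s))."""
    return c1 * p * s * np.log(N / (c2 * s))

N, s = 10**6, 50
for p in [1.0, 0.5, 0.1]:
    print(p, measurement_bound(p, s, N))  # compare with 4*s = 200
```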
Now we are ready to prove Proposition 2.1. The underlying idea is that a small Gelfand width would imply $2s$-sparse recovery for $s$ large enough to violate the conclusion of Lemma 2.4.

Proof: With $c := 2^{1/q - 2/p}$ (so that, in particular, $c \le (1/2)^{1/p-1/q}$) and $d := 2 c_1 p/(4 + c_1) \approx 0.2\, p$, we are going to prove that
$$d^m(B_p^N, \ell_q^N) \ge c\, \mu^{1/p-1/q}, \qquad \text{where } \mu := \min\Big\{1, \frac{d \ln(eN/m)}{m}\Big\}. \tag{2.5}$$
The desired result will follow with $c_{p,q} := c\, d^{1/p-1/q}$. By way of contradiction, we assume that
$$d^m(B_p^N, \ell_q^N) < c\, \mu^{1/p-1/q}.$$
This implies the existence of a matrix $A \in \mathbb{R}^{m\times N}$ such that, for all $v \in \ker A \setminus \{0\}$,
$$\|v\|_q < c\, \mu^{1/p-1/q}\, \|v\|_p.$$
For a fixed $v \in \ker A \setminus \{0\}$, in view of the inequalities $\|v\|_p \le N^{1/p-1/q}\|v\|_q$ and $c \le (1/2)^{1/p-1/q}$, we derive $1 < (\mu N/2)^{1/p-1/q}$, so that $1 \le 1/\mu < N/2$. We then define $s := \lfloor 1/\mu \rfloor \ge 1$, so that
$$\frac{1}{2\mu} < s \le \frac{1}{\mu}.$$
For $v \in \ker A \setminus \{0\}$ and $S \subset [N]$ with $|S| \le 2s$, we have
$$\|v_S\|_p \le (2s)^{1/p-1/q}\, \|v_S\|_q \le (2s)^{1/p-1/q}\, \|v\|_q < c\,(2s\mu)^{1/p-1/q}\, \|v\|_p \le \Big(\frac{1}{2}\Big)^{1/p} \|v\|_p,$$
where the last step uses $2s\mu \le 2$ and the choice of $c$. This shows that the $p$-null space property of order $2s$ is satisfied. Hence, Lemmas 2.2 and 2.4 imply
$$m \ge c_1\, p\, s \ln\Big(\frac{N}{c_2 s}\Big). \tag{2.6}$$
Besides, since the pair $(A, \Delta_p)$ allows exact recovery of all $2s$-sparse vectors, we have
$$m \ge 2\,(2s) = c_2\, s. \tag{2.7}$$
Using (2.7) in (2.6), it follows that
$$m \ge c_1 p s \ln\Big(\frac{N}{m}\Big) = c_1 p s \ln\Big(\frac{eN}{m}\Big) - c_1 p s > \frac{c_1 p}{2\mu} \ln\Big(\frac{eN}{m}\Big) - \frac{c_1}{4} m,$$
where we used $s > 1/(2\mu)$ as well as $s \le m/4$ and $p \le 1$. After rearrangement, we deduce
$$m > \frac{2 c_1 p}{4 + c_1}\, \frac{\ln(eN/m)}{\mu} = \frac{d \ln(eN/m)}{\mu} \ge d \ln(eN/m)\, \frac{m}{d \ln(eN/m)} = m.$$
This is the desired contradiction.
Remark 2.6.
When $m$ is close to $N$, the lower estimate (2.5) is rather poor. In this case, a nice and simple argument proposed to us by Vybíral gives the improved estimate
$$d^m(B_p^N, \ell_q^N) \ge \Big(\frac{1}{m+1}\Big)^{1/p-1/q}, \qquad m < N. \tag{2.8}$$
Indeed, for an arbitrary matrix $A \in \mathbb{R}^{m\times N}$, the kernel of $A$ and the $(m+1)$-dimensional space $\{x \in \mathbb{R}^N : x_i = 0 \text{ for all } i > m+1\}$ have a nontrivial intersection. We then choose a vector $v \ne 0$ in this intersection, and (2.8) follows from the inequality $\|v\|_p \le (m+1)^{1/p-1/q}\|v\|_q$.

We close this section with the important observation that any measurement/reconstruction scheme that provides $\ell_1$-stability requires a number of measurements scaling at least like the sparsity times a log-term. This may be viewed as a consequence of Propositions 1.2 and 2.1. Indeed, fixing $p < 1$, an inequality of the type (1.7) in $\ell_1$ together with (1.5) implies
$$d^m(B_p^N, \ell_1^N) \le 2^{1/p}\, E^m(B_p^N, \ell_1^N) \le 2^{1/p}\, C \sup_{x \in B_p^N} \sigma_s(x)_1 \le 2^{1/p}\, C\, s^{1-1/p}.$$
The lower bound (2.1) for the Gelfand width then yields, for some constant $c > 0$,
$$c \min\Big\{1, \frac{\ln(eN/m)}{m}\Big\} \le \frac{1}{s}.$$
We derive either $s \le 1/c$ or $m \ge c s \ln(eN/m)$. In short, if $s > 1/c$, then $\ell_1$-stability implies $m \ge c s \ln(eN/m)$ — which can be shown to imply in turn $m \ge c' s \ln(eN/s)$. We provide below a direct argument that removes the restriction $s > 1/c$. It uses Lemma 2.3 and works also for $\ell_p$-stability with $p < 1$. It borrows ideas from a paper by Do Ba et al. [15, Thm. 3.1], which contains the case $p = 1$ in a stronger non-uniform version.

Theorem 2.7. Let $N, m, s \in \mathbb{N}$ with $m, s < N$. Suppose that a measurement matrix $A \in \mathbb{R}^{m\times N}$ and a reconstruction map $\Delta : \mathbb{R}^m \to \mathbb{R}^N$ are stable in the sense that, for all $x \in \mathbb{R}^N$,
$$\|x - \Delta(Ax)\|_p^p \le C\, \sigma_s(x)_p^p$$
for some constant $C > 0$ and some $0 < p \le 1$. Then there exists a constant $C' > 0$ depending only on $C$ such that
$$m \ge C'\, p\, s \ln(eN/s).$$

Proof:
We consider again a family $\mathcal{U}$ of subsets of $[N]$ given by Lemma 2.3. For each $I \in \mathcal{U}$, we define an $s$-sparse vector $x_I$ with $\|x_I\|_p = 1$ as in (2.4). With $\rho := (2(C+1))^{-1/p}$, we claim that $\{A(x_I + \rho B_p^N),\ I \in \mathcal{U}\}$ is a disjoint collection of subsets of $A(\mathbb{R}^N)$, which has algebraic dimension $r \le m$. Suppose indeed that there exist $I, J \in \mathcal{U}$ with $I \ne J$ and $z, z' \in \rho B_p^N$ such that $A(x_I + z) = A(x_J + z')$. A contradiction follows from
$$1 < \|x_I - x_J\|_p^p \le \|x_I + z - \Delta(A(x_I+z))\|_p^p + \|x_J + z' - \Delta(A(x_J+z'))\|_p^p + \|z\|_p^p + \|z'\|_p^p \le C \sigma_s(x_I+z)_p^p + C \sigma_s(x_J+z')_p^p + \|z\|_p^p + \|z'\|_p^p \le C\|z\|_p^p + C\|z'\|_p^p + \|z\|_p^p + \|z'\|_p^p \le 2(C+1)\rho^p = 1.$$
We now observe that the collection $\{A(x_I + \rho B_p^N),\ I \in \mathcal{U}\}$ is contained in $(1 + \rho^p)^{1/p} A(B_p^N)$. As in the proof of Lemma 2.4, we use a standard volumetric argument to derive
$$|\mathcal{U}|\, \rho^r\, \mathrm{vol}\big(A(B_p^N)\big) = \sum_{I \in \mathcal{U}} \mathrm{vol}\big(A(x_I + \rho B_p^N)\big) \le \mathrm{vol}\big((1+\rho^p)^{1/p} A(B_p^N)\big) = (1+\rho^p)^{r/p}\, \mathrm{vol}\big(A(B_p^N)\big).$$
We deduce that
$$\Big(\frac{N}{4s}\Big)^{s/2} \le |\mathcal{U}| \le (\rho^{-p} + 1)^{r/p} \le (\rho^{-p} + 1)^{m/p} = (2C+3)^{m/p}.$$
Taking the logarithm on both sides yields
$$m \ge c\, p\, s \ln\big(N/(4s)\big), \qquad \text{with } c := \frac{1}{2\ln(2C+3)}.$$
Finally, noticing that $m \ge 2s$ because the pair $(A, \Delta)$ allows exact $s$-sparse recovery, we obtain
$$m \ge c p s \ln(eN/s) - c p s \ln(4e) \ge c p s \ln(eN/s) - \frac{c \ln(4e)}{2}\, m.$$
The desired result follows with $C' := (2c)/(2 + c\ln(4e))$.

3 The upper bound

In this section, we establish the upper bound in (1.3), hence the upper bound in (1.2) as a by-product. As already mentioned in the introduction, the bound for the Gelfand widths of $\ell_p$-balls was already provided by Vybíral in [39], but the bound for the Gelfand widths of weak-$\ell_p$-balls is indeed new.

Proposition 3.1.
For $0 < p < 1$ and $p < q \le 2$, there exists a constant $C_{p,q} > 0$ such that
$$d^m(B_{p,\infty}^N, \ell_q^N) \le C_{p,q} \min\Big\{1, \frac{\ln(eN/m)}{m}\Big\}^{1/p-1/q}, \qquad m < N. \tag{3.1}$$

The argument relies again on compressive sensing methods. According to Proposition 1.2, it is enough to establish the upper bound for the quantity $E^m(B_{p,\infty}^N, \ell_q^N)$. This is done in the following theorem, which we find rather illustrative because it shows that, even when $p < 1$, an optimal reconstruction map $\Delta$ for the realization of the number $E^m(B_{p,\infty}^N, \ell_q^N)$ can be chosen to be the $\ell_1$-minimization mapping, at least when $q \ge 1$. The argument is originally due to Donoho for $q = 2$ [17, Proof of Theorem 9] and can be extended to all $2 \ge q > p$.

Theorem 3.2. For $0 < p < 1$ and $p < q \le 2$, there exists a matrix $A \in \mathbb{R}^{m\times N}$ such that, with $r = \min\{1, q\}$,
$$\sup_{x \in B_{p,\infty}^N} \|x - \Delta_r(Ax)\|_q \le C_{p,q} \min\Big\{1, \frac{\ln(N/m)+1}{m}\Big\}^{1/p-1/q},$$
where $C_{p,q} > 0$ is a constant that depends only on $p$ and $q$.

Proof:
Let $C_2$ be the constant in (1.8) relative to the RIP associated with $\delta = 1/3$, say. We choose a constant $D$ such that
$$\frac{D}{2} > e \qquad \text{and} \qquad \frac{D/2}{1 + \ln(D/2)} > C_2.$$
We are going to prove that, for any $x \in B_{p,\infty}^N$,
$$\|x - \Delta_r(Ax)\|_q \le C'_{p,q} \min\Big\{1, \frac{D \ln(eN/m)}{m}\Big\}^{1/p-1/q} \tag{3.2}$$
for some constant $C'_{p,q} > 0$. This will imply the desired result with $C_{p,q} := C'_{p,q}\, D^{1/p-1/q}$.

Case 1: $m > D \ln(eN/m)$. We define $s := \lceil m/(2D\ln(eN/m)) \rceil$, so that
$$\frac{m}{2D \ln(eN/m)} \le s \le \frac{m}{D \ln(eN/m)}. \tag{3.3}$$
Putting $t = 2s$ and noticing that $t/m \le 2/D < 1/e$ (recall that $\ln(eN/m) \ge 1$) and that $u \mapsto u \ln(u)$ is decreasing on $(0, 1/e]$, we obtain
$$m \ge \frac{D}{2}\, t \ln\Big(\frac{eN}{m}\Big) = \frac{D}{2}\, t \ln\Big(\frac{eN}{t}\Big) + \frac{D}{2}\, m\, \frac{t}{m} \ln\Big(\frac{t}{m}\Big) \ge \frac{D}{2}\, t \ln\Big(\frac{eN}{t}\Big) - m \ln\Big(\frac{D}{2}\Big),$$
so that
$$m \ge \frac{D/2}{1 + \ln(D/2)}\, t \ln\Big(\frac{eN}{t}\Big) > C_2\, t \ln\Big(\frac{eN}{t}\Big).$$
It is then possible to find a matrix $A \in \mathbb{R}^{m\times N}$ with $\delta_t(A) = \delta_{2s}(A) \le \delta$; in particular, we also have $\delta_s(A) \le \delta$. Now, given $v := x - \Delta_r(Ax) \in \ker A$, we decompose $[N]$ as the disjoint union of sets $S_0, S_1, S_2, \dots$ of size $s$ (except maybe the last one) in such a way that $|v_i| \ge |v_j|$ for all $i \in S_{k-1}$, $j \in S_k$, and $k \ge 1$. This easily implies $\big(\|v_{S_k}\|_2^2/s\big)^{1/2} \le \big(\|v_{S_{k-1}}\|_r^r/s\big)^{1/r}$, i.e.,
$$\|v_{S_k}\|_2 \le s^{1/2 - 1/r}\, \|v_{S_{k-1}}\|_r, \qquad k \ge 1. \tag{3.4}$$
Using the $r$-triangle inequality, we have
$$\|v\|_q^r = \Big\|\sum_{k \ge 0} v_{S_k}\Big\|_q^r \le \sum_{k \ge 0} \|v_{S_k}\|_q^r \le \sum_{k \ge 0} \big(s^{1/q-1/2}\, \|v_{S_k}\|_2\big)^r \le \sum_{k \ge 0} \Big(\frac{s^{1/q-1/2}}{\sqrt{1-\delta}}\, \|Av_{S_k}\|_2\Big)^r.$$
The fact that $v \in \ker A$ implies $Av_{S_0} = -\sum_{k \ge 1} Av_{S_k}$. It follows that
$$\|v\|_q^r \le \Big(\frac{s^{1/q-1/2}}{\sqrt{1-\delta}}\Big)^r \Big(\sum_{k \ge 1} \|Av_{S_k}\|_2\Big)^r + \Big(\frac{s^{1/q-1/2}}{\sqrt{1-\delta}}\Big)^r \sum_{k \ge 1} \|Av_{S_k}\|_2^r \le 2\Big(\frac{s^{1/q-1/2}}{\sqrt{1-\delta}}\Big)^r \sum_{k \ge 1} \|Av_{S_k}\|_2^r \le \Big(2^{1/r} \sqrt{\frac{1+\delta}{1-\delta}}\, s^{1/q-1/2}\Big)^r \sum_{k \ge 1} \|v_{S_k}\|_2^r.$$
We then derive, using the inequality (3.4),
$$\|v\|_q^r \le \Big(2^{1/r} \sqrt{\frac{1+\delta}{1-\delta}}\, s^{1/q-1/r}\Big)^r \sum_{k \ge 0} \|v_{S_k}\|_r^r = \Big(2^{1/r} \sqrt{\frac{1+\delta}{1-\delta}}\, s^{1/q-1/r}\Big)^r\, \|v\|_r^r.$$
In view of the choice $\delta = 1/3$ and of the lower bound in (3.3), this yields
$$\|x - \Delta_r(Ax)\|_q \le 2^{1/r}\sqrt{2}\, \Big(\frac{2D \ln(eN/m)}{m}\Big)^{1/r-1/q}\, \|x - \Delta_r(Ax)\|_r. \tag{3.5}$$
Moreover, in view of $\delta_{2s} \le 1/3 < \sqrt{2}-1$, Theorem 1.3 applied with $p = r$ provides a constant $C > 0$ such that
$$\|x - \Delta_r(Ax)\|_r \le C^{1/r}\, \sigma_s(x)_r. \tag{3.6}$$
Finally, using (1.6) and the lower bound in (3.3), we have, for $x \in B_{p,\infty}^N$,
$$\sigma_s(x)_r \le D_{p,r}\, s^{1/r-1/p} \le D_{p,r}\, \Big(\frac{2D \ln(eN/m)}{m}\Big)^{1/p-1/r}. \tag{3.7}$$
Putting (3.5), (3.6), and (3.7) together, we obtain, for any $x \in B_{p,\infty}^N$,
$$\|x - \Delta_r(Ax)\|_q \le C''_{p,q}\, \Big(\frac{D \ln(eN/m)}{m}\Big)^{1/p-1/q} = C''_{p,q} \min\Big\{1, \frac{D \ln(eN/m)}{m}\Big\}^{1/p-1/q},$$
where $C''_{p,q} := 2^{1/r+1/2}\, C^{1/r}\, D_{p,r}\, 2^{1/p-1/q}$, and where the last equality holds because we are in Case 1.

Case 2: $m \le D \ln(eN/m)$. We simply choose the matrix $A \in \mathbb{R}^{m\times N}$ as $A = 0$, so that $\Delta_r(Ax) = 0$. Then, for any $x \in B_{p,\infty}^N$, we have
$$\|x - \Delta_r(Ax)\|_q = \|x\|_q \le C'''_{p,q}\, \|x\|_{p,\infty} \le C'''_{p,q}$$
for some constant $C'''_{p,q} > 0$. This yields
$$\|x - \Delta_r(Ax)\|_q \le C'''_{p,q} \min\Big\{1, \frac{D \ln(eN/m)}{m}\Big\}^{1/p-1/q}.$$
Both cases show that (3.2) is valid with $C'_{p,q} := \max\{C''_{p,q}, C'''_{p,q}\}$. This completes the proof.
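The block decomposition $S_0, S_1, \dots$ and the comparison inequality (3.4) at the heart of Case 1 are easy to reproduce numerically. The following Python sketch is our own illustration (any $0 < r \le 1$ is admissible, matching the $r$-triangle inequality used above).

```python
import numpy as np

def block_decompose(v, s):
    """Indices of [N] sorted by decreasing |v_i| and split into blocks
    S_0, S_1, ... of size s, as in the proof of Theorem 3.2."""
    order = np.argsort(-np.abs(v))
    return [order[k:k + s] for k in range(0, len(v), s)]

rng = np.random.default_rng(4)
v, s, r = rng.standard_normal(100), 10, 0.75
blocks = block_decompose(v, s)
for k in range(1, len(blocks)):
    lhs = np.linalg.norm(v[blocks[k]])
    rhs = s ** (0.5 - 1 / r) * (np.abs(v[blocks[k - 1]]) ** r).sum() ** (1 / r)
    assert lhs <= rhs + 1e-12                    # inequality (3.4)
print("inequality (3.4) verified on a random sample")
```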
Remark 3.3. The case $p = 1$, for which $r = 1$, is not covered by our arguments. Since $\sup_{x \in B_{1,\infty}^N} \sigma_s(x)_1 \asymp \log(N/s)$, the quantity $\sigma_s(x)_1$ cannot be bounded by a constant times $\|x\|_{1,\infty}$ in order to obtain (3.7). Instead, an additional log-factor $\log(N/m)$ appears on the right-hand side, and therefore in the upper estimate of (1.3) in the case $p = 1$. The correct behavior of the Gelfand widths of weak-$\ell_1$-balls does not seem to be known. Nonetheless, the inequality $\sigma_s(x)_1 \le \|x\|_1$ is always true. This yields the well-known upper estimate for the Gelfand widths of $\ell_1$-balls and hence completes the proof of Theorem 1.1.

References

[1] R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constr. Approx., 28(3):253–263, 2008.

[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge Univ. Press, 2004.
[3] H. Buhrman, P. Miltersen, J. Radhakrishnan, and S. Venkatesh. Are bitvectors optimal? In STOC '00: Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, pages 449–458. ACM, 2000.

[4] E. J. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, 2006.

[5] E. J. Candès. The restricted isometry property and its implications for compressed sensing. C. R. Acad. Sci. Paris Sér. I Math., 346:589–592, 2008.

[6] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.

[7] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.

[8] E. J. Candès and T. Tao. Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, 2006.

[9] B. Carl. Entropy numbers, s-numbers, and eigenvalue problems. J. Funct. Anal., 41:290–306, 1981.

[10] B. Carl and A. Pajor. Gel'fand numbers of operators with values in a Hilbert space. Invent. Math., 94(3):479–504, 1988.

[11] B. Carl and I. Stephani. Entropy, Compactness and the Approximation of Operators. Cambridge University Press, 1990.

[12] R. Chartrand and V. Staneva. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, 24(035020):1–14, 2008.

[13] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by Basis Pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1999.

[14] A. Cohen, W. Dahmen, and R. A. DeVore. Compressed sensing and best k-term approximation. J. Amer. Math. Soc., 22(1):211–231, 2009.

[15] K. Do Ba, P. Indyk, E. Price, and D. Woodruff. Lower bounds for sparse recovery. In Proc. SODA, 2010.

[16] D. L. Donoho. Neighborly polytopes and sparse solutions of underdetermined linear equations. Preprint, 2005.

[17] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.

[18] S. Foucart and M. Lai. Sparsest solutions of underdetermined linear systems via $\ell_q$-minimization for $0 < q \le 1$. Appl. Comput. Harmon. Anal., 26(3):395–407, 2009.

[19] A. Y. Garnaev and E. D. Gluskin. On widths of the Euclidean ball. Sov. Math., Dokl., 30:200–204, 1984.

[20] E. D. Gluskin. On some finite-dimensional problems in the theory of widths. Vestn. Leningr. Univ., Math., 14:163–170, 1982.

[21] E. D. Gluskin. Norms of random matrices and widths of finite-dimensional sets. Math. USSR-Sb., 48:173–182, 1984.

[22] R. Graham and N. Sloane. Lower bounds for constant weight codes. IEEE Trans. Inform. Theory, 26(1):37–43, 1980.

[23] R. Gribonval and M. Nielsen. Sparse representations in unions of bases. IEEE Trans. Inform. Theory, 49(12):3320–3325, 2003.

[24] R. Gribonval and M. Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure. Appl. Comput. Harmon. Anal., 22(3):335–355, 2007.

[25] O. Guédon and A. E. Litvak. Euclidean projections of a p-convex body. In Geometric Aspects of Functional Analysis, volume 1745 of Lecture Notes in Math. Springer, Berlin, 2000.

[26] B. S. Kashin. Diameters of some finite-dimensional sets and classes of smooth functions. Math. USSR, Izv., 11:317–333, 1977.

[27] T. Kühn. A lower estimate for entropy numbers. J. Approx. Theory, 110(1):120–124, 2001.

[28] N. Linial and I. Novik. How neighborly can a centrally symmetric polytope be? Discr. Comput. Geom., 36(2):273–281, 2006.

[29] G. Lorentz, M. von Golitschek, and Y. Makovoz. Constructive Approximation: Advanced Problems, volume 304 of Grundlehren der Mathematischen Wissenschaften. Springer, Berlin, 1996.

[30] S. Mendelson, A. Pajor, and M. Rudelson. The geometry of random $\{-1, 1\}$-polytopes. Discr. Comput. Geom., 34(3):365–379, 2005.

[31] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Uniform uncertainty principle for Bernoulli and subgaussian ensembles. Constr. Approx., 28(3):277–289, 2009.

[32] N. Nisan and A. Wigderson. Hardness vs randomness. Journal of Computer and System Sciences, 49(2):149–167, 1994.

[33] E. Novak. Optimal recovery and n-widths for convex classes of functions. J. Approx. Theory, 80(3):390–408, 1995.

[34] E. Novak and H. Woźniakowski. Tractability of Multivariate Problems. Vol. I: Linear Information. EMS, Zürich, 2008.

[35] A. Pinkus. n-Widths in Approximation Theory. Springer-Verlag, Berlin, 1985.

[36] A. Pinkus. N-widths and optimal recovery. In Proc. Symp. Appl. Math. 36, Lect. Notes AMS Short Course, pages 51–66, 1986.

[37] H. Rauhut. Compressive sensing and structured random matrices. In Theoretical Foundations and Numerical Methods for Sparse Recovery, Radon Series Comp. Appl. Math. deGruyter, in preparation.

[38] C. Schütt. Entropy numbers of diagonal operators between symmetric Banach spaces. J. Approx. Theory, 40(2):121–128, 1984.

[39] J. Vybíral. Widths of embeddings in function spaces. J. Complexity, 24(4):545–570, 2008.