A rigorous geometry-probability equivalence in characterization of ℓ1-optimization
Mihailo Stojnic
School of Industrial Engineering, Purdue University, West Lafayette, IN 47907; e-mail: [email protected]
Abstract
In this paper we consider under-determined systems of linear equations that have sparse solutions. This subject attracted an enormous amount of interest in recent years, primarily due to the influential works [8, 16]. In a statistical context it was rigorously established for the first time in [8, 16] that if the number of equations is smaller than but still linearly proportional to the number of unknowns, then a sparse vector of sparsity also linearly proportional to the number of unknowns can be recovered through a polynomial ℓ1-optimization algorithm (of course, this assuming that such a sparse solution vector exists). Moreover, the geometric approach of [16] produced the exact values for the proportionalities in question. In our recent work [43] we introduced an alternative statistical approach that produced attainable values of the proportionalities. Those happened to be in an excellent numerical agreement with the ones of [16]. In this paper we give a rigorous analytical confirmation that the results of [43] indeed match those from [16].

Index Terms: Linear systems; Neighborly polytopes; ℓ1-optimization.

1 Introduction

The main concern of this paper is an analytical study of under-determined systems of linear equations that have sparse solutions. To that end, let us assume that there is a k-sparse n-dimensional vector x such that

y = Ax    (1)

for an m × n (m < n) statistical matrix A and an m × 1 vector y (see Figure 1). Here and in the rest of the paper, under a k-sparse vector we assume a vector that has at most k nonzero components. Also, in the rest of the paper we will assume the so-called linear regime, i.e. we will assume that k = βn and that the number of equations is m = αn, where α and β are constants independent of n (more on the non-linear regime, i.e. on the regime when m is larger than linearly proportional to k, can be found in e.g. [12, 24, 25]). We then look at the inverse problem: given the A and y from (1), can one then recover the k-sparse x in (1)?

[Figure 1: Model of a linear system; vector x is k-sparse.]

There are of course many ways how one can attempt to recover the k-sparse x. If one has the freedom to design A in parallel with designing the recovery algorithm, then the results from [3, 32, 36] demonstrated that techniques from coding theory (based on the coding/decoding of Reed-Solomon codes) can be employed to determine any k-sparse x in (1) for any 0 < α ≤ 1 and any β ≤ α/2 in polynomial time. It is relatively easy to show that under the unique recoverability assumption β can not be greater than α/2. Therefore, as long as one is concerned with the unique recovery of the k-sparse x in (1) in polynomial time, the results from [3, 32, 36] are optimal. The complexity of the algorithms from [3, 32, 36] is roughly O(n³). In a similar fashion one can, instead of using coding/decoding techniques associated with Reed-Solomon codes, design the matrix A and the corresponding recovery algorithm based on techniques related to the coding/decoding of Expander codes (see e.g. [30, 31, 47] and references therein). In that case recovering x in (1) is significantly faster for large dimensions n. Namely, the complexity of the techniques from e.g. [30, 31, 47] (or their slight modifications) is usually O(n), which is clearly for large n significantly smaller than O(n³). However, the techniques based on coding/decoding of Expander codes usually do not allow for β to be as large as α/2.
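For concreteness, the linear regime described above is easy to instantiate numerically. The following minimal Python sketch generates an instance of (1); the i.i.d. Gaussian choice of A, the dimensions, and the random seed are our illustrative assumptions rather than anything prescribed in the cited works.

```python
import numpy as np

# Instance of (1) in the linear regime: m = alpha*n equations, k = beta*n nonzeros.
n = 1000
alpha, beta = 0.5, 0.15
m, k = int(alpha * n), int(beta * n)

rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))                  # "statistical" m x n matrix, m < n

x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)   # arbitrary but fixed support
x[support] = rng.standard_normal(k)              # k-sparse unknown vector

y = A @ x                                        # observed m x 1 vector
```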
If one has no freedom in the choice of the matrix A (instead, the matrix A is rather given to us), then the recovery problem (1) becomes NP-hard. The following two algorithms (and their different variations) are then of special interest (and certainly have been the subject of extensive research in recent years):

1. Orthogonal matching pursuit - OMP
2. Basis pursuit - ℓ1-optimization.

Under certain probabilistic assumptions on the elements of A it can be shown (see e.g. [35, 44, 45]) that if m = O(k log(n)) OMP (or slightly modified OMP) can recover x in (1) with complexity of recovery O(n²). On the other hand, a stage-wise OMP from [22] recovers x in (1) with complexity of recovery O(n log n). Somewhere in between OMP and BP are the recent improvements CoSaMP (see e.g. [34]) and Subspace pursuit (see e.g. [13]), which guarantee (assuming the linear regime) that the k-sparse x in (1) can be recovered in polynomial time with m = O(k) equations.

In this paper we will focus on the second of the two above mentioned algorithms, i.e. we will focus on the performance of ℓ1-optimization. (Variations of the standard ℓ1-optimization from e.g. [10, 11, 40], as well as those from [14, 23, 26-28, 39] related to ℓq-optimization, 0 < q < 1, are possible as well.) The basic ℓ1-optimization algorithm finds x in (1) by solving the following ℓ1-norm minimization problem:

min ‖x‖₁ subject to Ax = y.    (2)
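Problem (2) is a linear program in disguise. Below is a minimal sketch of one textbook LP rewrite (splitting x = u − v with u, v ≥ 0); the use of scipy.optimize.linprog is our illustrative choice and not an algorithm from the cited references.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to Ax = y via the split x = u - v, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)            # sum(u) + sum(v) equals ||x||_1 at the optimum
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

# With A, y, x generated as in the sketch above (beta = 0.15 lies below the weak
# threshold for alpha = 0.5 discussed later), the recovery is exact:
# x_hat = basis_pursuit(A, y); print(np.allclose(x_hat, x, atol=1e-6))
```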
In the seminal work [8], it was established that for any constant α ∈ (0, 1) and m = αn there is a constant β ∈ (0, α) and k = βn such that the solution of (2) is with overwhelming probability the k-sparse x in (1) (moreover, this remains true for any k-sparse x). (Under overwhelming probability we in this paper assume a probability that is no more than a number exponentially decaying in n away from 1.) The results of [8] rested on having the matrix A satisfy the restricted isometry property (RIP), which is only a sufficient condition for ℓ1-optimization to produce the solution of (1) (more on the RIP and its importance can be found in e.g. [1, 4, 7, 9, 38]).

Instead of characterizing the m × n matrix A through the RIP condition, in [15, 16] Donoho associates a certain polytope with the matrix A. Namely, [15, 16] consider the polytope obtained by projecting the regular n-dimensional cross-polytope by A. It turns out that a necessary and sufficient condition for (2) to produce the k-sparse solution of (1) is that this polytope associated with the matrix A is k-neighborly [15, 16, 18, 19]. Using the results of [2, 6, 33, 37, 41, 46], it is further shown in [16] that if A is a random m × n ortho-projector matrix then with overwhelming probability the polytope obtained by projecting the standard n-dimensional cross-polytope by A is k-neighborly. The precise relation between m and k in order for this to happen is characterized in [15, 16] as well.

It should be noted that one usually considers the success of (2) in recovering any given k-sparse x in (1). It is also of interest to consider the success of (2) in recovering almost any given x in (1). We below make a distinction between these cases and recall some of the definitions from [16, 18, 20, 21, 42, 43].

Clearly, for any given constant α ≤ 1 there is a maximum allowable value of β such that for any given k-sparse x in (1) the solution of (2) is exactly that given k-sparse x with overwhelming probability. We will refer to this maximum allowable value of β as the strong threshold (see [16]). Similarly, for any given constant α ≤ 1 and any given x with a given fixed location of non-zero components and a given fixed combination of the signs of its elements, there will be a maximum allowable value of β such that (2) finds that given x in (1) with overwhelming probability. We will refer to this maximum allowable value of β as the weak threshold and will denote it by β_w (see, e.g. [42, 43]). In this paper we will provide a rigorous proof that the β_w one can determine through Donoho's framework from [16] is exactly the same as the β_w determined in [43].

We organize the rest of the paper in the following way. In Section 2 we will first recall the basic ingredients of the analysis done in [16]. Using the insights from [43] we will then give a closed formula for the β_w computed in [16]. As hinted above, this formula will match the one computed in [43]. In Section 3 we will then specialize the results from Section 2 to the case when the nonzero components of the sparse vector x in (1) are positive (or, in general, with a priori known signs). Using again the insights from [43] we will then give a closed formula for the β_w computed for this case in [18]. This formula will match the corresponding one computed in [43]. Finally, in Section 4 we discuss the obtained results.

2 General x: ℓ1 and neighborliness of projected cross-polytope

In this section we show that the weak thresholds obtained in [43] are the same as the ones obtained in [16]. To that end, we start by recalling the basics of the analysis from [15, 16]. In his, now legendary, paper [15] Donoho took a geometric approach to the performance analysis of ℓ1-optimization and managed to connect the performance analysis of ℓ1-optimization to the concept of a polytope's neighborliness. The main recognition went along the following lines: 1) Let C_p^n be the regular n-dimensional cross-polytope and let AC_p^n be the polytope one obtains after projecting C_p^n by A; 2) Then the solution of (2) will be exactly the k-sparse solution of (1) if and only if the polytope AC_p^n is centrally k-neighborly (more on the definitions, importance, and many incredible properties of neighborliness can be found in e.g. [16, 29]). Here we just briefly recall the basic definitions of neighborliness and central neighborliness from [16]. Namely, a polytope is k-neighborly if every set of k+1 of its vertices spans a k-dimensional face. On the other hand, a polytope is centrally k-neighborly if every set of k+1 of its vertices that does not include any antipodal pair spans a k-dimensional face.

The above characterization then enables one to replace studying the success of ℓ1-optimization in solving an under-determined system by studying the neighborliness of projected cross-polytopes. Of course, a priori, it is not really clear that the latter problem is any easier than the former one.
2.1 Neighborliness condition

However, it turns out that it has been explored to some extent in the literature on the geometry of random high-dimensional polytopes. Using the "sum of angles" result from [2] (which at its core relies on [33, 41]) it was established in [16] that if A is a random ortho-projector then AC_p^n will be centrally k-neighborly with overwhelming probability if

n⁻¹ log(C_com C_int(T^k, T^m) C_ext(F^m, C_p^n)) < 0    (3)

where C_com = 2^(m−k) \binom{n−k−1}{m−k}, C_int(T^k, T^m) is the internal angle at face T^k of T^m, C_ext(F^m, C_p^n) is the external angle of C_p^n at any m-dimensional face F^m, and T^k and T^m are the standard k- and m-dimensional simplices, respectively (more on the definitions and meaning of the internal and external angles can be found in e.g. [29]). Donoho then proceeded by establishing that (3) is equivalent to the following inequality related to the sum/difference of the exponents of C_com, C_int, and C_ext:

Ψ_net = Ψ_com − Ψ_int − Ψ_ext < 0    (4)

where

Ψ_com = n⁻¹ log(C_com) = (α−β) log(2) + (1−β) H((α−β)/(1−β))
Ψ_int = −n⁻¹ log(C_int(T^k, T^m))
Ψ_ext = −n⁻¹ log(C_ext(F^m, C_p^n))    (5)

and H(p) = −p log(p) − (1−p) log(1−p) is the standard entropy function, with \binom{n}{pn} ≈ e^{nH(p)} being the standard approximation of the binomial coefficient by the entropy function in the limit n → ∞. The rest of Donoho's approach is the analysis of the closed form expressions for C_int(T^k, T^m) and C_ext(F^m, C_p^n) obtained/analyzed in various forms in [6, 37, 38]. In the following two subsections we will separately consider the results Donoho established for the internal and the external angle exponents. Relying on the insights from [43] we will provide neat characterizations of the exponents that will eventually help us establish the equivalence of the results from [16] and [43].

2.2 Internal angle

Starting from the explicit formulas for internal angles given in [6], Donoho in [16] through a saddle-point integral computation established the following procedure for determining the exponent of the internal angle Ψ_int. Let γ = β/α and for s ≥ 0 let

Φ(s) = (1/√(2π)) ∫_s^∞ e^{−x²/2} dx,  φ(s) = (1/√(2π)) e^{−s²/2}.    (6)

Then one has

Ψ_int(β, α) = (α−β) ξ_γ(y_γ) + (α−β) log(2)    (7)

where

y_γ = (γ/(1−γ)) s_γ
ξ_γ(y_γ) = −((1−γ)/(2γ)) y_γ² − ½ log(2/π) + log(y_γ/γ)    (8)

and s_γ ≥ 0 is the solution of

Φ(s) = (1−γ) φ(s)/s.    (9)

Now, if one can determine s_γ then a combination of (7) and (8) would give a convenient closed form expression for the exponent Ψ_int(β, α). Finding s_γ amounts to nothing but solving (9) over s, which for an unknown γ could be incredibly hard. At this point we will make a "bold" guess and say that t = (1−α)/(1−β) and

s_γ = √2 erfinv(t) = √2 erfinv((1−α)/(1−β))    (10)

where erfinv(·) is the inverse of the error function erf(·) associated with the standard normal random variable (erf(r) = (2/√π) ∫_0^r e^{−q²} dq). Of course, it is rather hard to believe that s_γ from (10) will be the solution of (9) for every γ = β/α. However, what we hope is that it may be the solution of (9) for the optimal γ = β_w/α, i.e. for the one for which the net exponent Ψ_net in (4) is zero (strictly speaking, instead of "zero" we should say "smaller than −ε where ε > 0 is arbitrarily small"; in an effort to make the writing and the main ideas clearer we will throughout the rest of the paper almost always ignore the ε's). Even a hope like this is fairly out of the blue and it would require an enormous amount of intuition for one to come up with a hopeful guess like the one from (10) just by staring at equations (7)-(9) and not knowing the results of [43].

Now what is left to do is to confirm that our guess is actually right. We start by noting that Φ(s) = ½(1 − erf(s/√2)). If (10) is to be correct then to satisfy (9) one must have

½(1 − t) = (1−γ) φ(s)/s |_{s=√2 erfinv(t)} = (1−γ) (1/√(2π)) e^{−(erfinv(t))²} / (√2 erfinv(t))

or in a more convenient algebraic form

((1−β)/α) √(2/π) e^{−(erfinv((1−α)/(1−β)))²} / (√2 erfinv((1−α)/(1−β))) = 1.    (11)

If it eventually turns out that for α and β for which (11) holds one also has that Ψ_net in (4) is zero then we could claim that the guess we made for s_γ in (10) is actually correct.

We now proceed with the evaluation of the "internal exponent" Ψ_int assuming that both (10) and (11) are correct. Plugging (10) back in (8) we obtain

y_γ = (γ/(1−γ)) s_γ = (β/(α−β)) √2 erfinv((1−α)/(1−β)).    (12)

Combining (8) and (12) further we have

ξ_γ(y_γ) = −((1−γ)/(2γ)) y_γ² − ½ log(2/π) + log(y_γ/γ)
= −(β/(α−β)) (√2 erfinv((1−α)/(1−β)))²/2 − ½ log(2/π) + log(α/(α−β)) + log(√2 erfinv((1−α)/(1−β))).    (13)

Finally, plugging the ξ_γ(y_γ) computed in (13) back in (7) we have for the exponent of the internal angle

Ψ_int = −β (√2 erfinv((1−α)/(1−β)))²/2 − ((α−β)/2) log(2/π) + (α−β) log(α) − (α−β) log(α−β) + (α−β) log(√2 erfinv((1−α)/(1−β))) + (α−β) log(2).    (14)

2.3 External angle

In this subsection we provide results for the external angle that are analogous to those provided in the previous subsection for the internal angle. In [16] Donoho established that the exponent of the external angle can be computed in the following way:

Ψ_ext(β, α) = min_{y≥0} (αy² − (1−α) log(erf(y))).    (15)

It was further shown in [16] that the function (αy² − (1−α) log(erf(y))) is smooth and convex. If one could solve the above minimization analytically then there would be a neat expression for the exponent of the external angle. As in the previous subsection, solving this minimization does not appear to be an easy task for any fixed α (β). However, we will again take a "bold" guess and assume that the solution of the above minimization is

y_ext = erfinv((1−α)/(1−β)).    (16)

It is of course unreasonable to expect that this choice of y would be the solution of the minimization problem in (15) for every given α. However, we do hope that it could be the solution for the optimal pair (α, β) (as stated above, the optimal pair (α, β) is the one that makes the net exponent Ψ_net in (4) equal to zero). If the y_ext defined above is to be the solution of the minimization problem in (15) for the optimal pair (α, β) then at the very least one has to have

d(αy² − (1−α) log(erf(y)))/dy |_{y=y_ext} = 0.    (17)

We proceed with checking whether (17) indeed holds. To that end we have:

d(αy² − (1−α) log(erf(y)))/dy |_{y=y_ext} = (2αy − ((1−α)/erf(y)) (d erf(y)/dy)) |_{y=y_ext}
= 2α erfinv((1−α)/(1−β)) − (1−β) (2/√π) e^{−y²} |_{y=y_ext}
= √2 (√2 α erfinv((1−α)/(1−β)) − (1−β) √(2/π) e^{−(erfinv((1−α)/(1−β)))²}) = 0    (18)

where the last equality follows by our assumption that (α, β) are optimal and therefore satisfy (11). Essentially, (18) shows that if (11) is correct then (16) is correct as well.

A combination of (15) and (16) then gives us the following convenient characterization of the "external exponent" Ψ_ext:

Ψ_ext = α y_ext² − (1−α) log(erf(y_ext)) = α (erfinv((1−α)/(1−β)))² − (1−α) log((1−α)/(1−β)).    (19)

2.4 Net exponent

In this subsection we combine the expressions for the "internal" and "external" exponents obtained in (14) and (19), respectively, with the expression for the "combinatorial" exponent given in (5). Before proceeding further with this exponent combination we first slightly modify the expression for the combinatorial exponent given in (5):

Ψ_com = (α−β) log(2) + (1−β) H((α−β)/(1−β))
= (α−β) log(2) + (1−β)(−((α−β)/(1−β)) log((α−β)/(1−β)) − ((1−α)/(1−β)) log((1−α)/(1−β)))
= (α−β) log(2) − (α−β) log((α−β)/(1−β)) − (1−α) log((1−α)/(1−β)).    (20)

Plugging the results from (14), (19), and (20) back in (4) one has

Ψ_net = Ψ_com − Ψ_int − Ψ_ext
= (α−β) log(2) − (α−β) log((α−β)/(1−β)) − (1−α) log((1−α)/(1−β))
− (−β (√2 erfinv((1−α)/(1−β)))²/2 − ((α−β)/2) log(2/π) + (α−β) log(α) − (α−β) log(α−β) + (α−β) log(√2 erfinv((1−α)/(1−β))) + (α−β) log(2))
− (α (erfinv((1−α)/(1−β)))² − (1−α) log((1−α)/(1−β))).    (21)
After canceling all terms that can be canceled one finally has

Ψ_net = (α−β) log((1−β)/α) + ((α−β)/2) log(2/π) − (α−β) log(√2 erfinv((1−α)/(1−β))) − (α−β) (erfinv((1−α)/(1−β)))²
= (α−β)(log((1−β)/α) + log(√(2/π)) − log(√2 erfinv((1−α)/(1−β))) + log(e^{−(erfinv((1−α)/(1−β)))²}))
= (α−β) log( ((1−β)/α) √(2/π) e^{−(erfinv((1−α)/(1−β)))²} / (√2 erfinv((1−α)/(1−β))) )
= 0    (22)

where the last equality follows by assumption (11). Since we obtained that Ψ_net = 0 and we never contradicted assumption (11), the assumption must be correct. To be completely rigorous one should add that if an α is given and β_w is such that the pair (α, β_w) satisfies (11), then for any β < β_w AC_p^n is centrally βn-neighborly (i.e. one needs β to be strictly less than β_w because (4) asserts that one actually needs Ψ_net < 0).

We summarize the results from this section in the following theorem.

Theorem 1. (Geometry-probability equivalence: general x) Let A in (1) be an m × n ortho-projector (or an m × n matrix with the null-space uniformly distributed in the Grassmanian). Let k, m, n be large and let α = m/n and β_w = k/n be constants independent of m and n. Let erfinv be the inverse of the standard error function associated with a zero-mean unit variance Gaussian random variable. Further, let α and β_w be such that

((1−β_w)/α) √(2/π) e^{−(erfinv((1−α)/(1−β_w)))²} / (√2 erfinv((1−α)/(1−β_w))) = 1.    (23)

Then with overwhelming probability the polytope AC_p^n will be centrally βn-neighborly for any β < β_w. Further, let the unknown x in (1) be k-sparse and let the location and the signs of the nonzero elements of x be arbitrarily chosen but fixed. Then, as shown in [43], for any β < β_w one with overwhelming probability has that the solution of (2) is exactly the βn-sparse x in (1).

Proof. Follows from the previous discussion through a combination of (11), (22), and the main results of [43].

The results for the weak threshold obtained from the above theorem have already been plotted in [43] and, as it was mentioned in [43], they were in an excellent numerical agreement with the ones obtained in [15, 16] (for completeness we present the results again in Figure 2). Finally, Theorem 1 rigorously establishes that the agreement is not only numerical but also analytical, and that the weak thresholds obtained in [16] and [43] are indeed exactly equal to each other.
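Theorem 1 reduces the computation of β_w for a given α to one-dimensional root finding. The following short Python sketch is ours (the solver, the bracketing interval, and the test point α = 0.5 are assumptions); it also checks that the collapsed net exponent from (22) indeed vanishes at the computed pair (α, β_w).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import erfinv

def weak_threshold(alpha):
    """Solve (23) for beta_w at a given alpha = m/n."""
    def f(beta):
        E = erfinv((1 - alpha) / (1 - beta))
        return ((1 - beta) / alpha) * np.sqrt(2 / np.pi) \
               * np.exp(-E ** 2) / (np.sqrt(2) * E) - 1
    # f > 0 for tiny beta while f -> -1 as beta -> alpha, so a root is bracketed
    return brentq(f, 1e-9, alpha - 1e-9)

alpha = 0.5
beta_w = weak_threshold(alpha)        # ~0.193, i.e. beta_w/alpha ~ 0.386

# Sanity check: the net exponent in (22) vanishes at (alpha, beta_w).
E = erfinv((1 - alpha) / (1 - beta_w))
psi_net = (alpha - beta_w) * (np.log((1 - beta_w) / alpha)
          + 0.5 * np.log(2 / np.pi) - np.log(np.sqrt(2) * E) - E ** 2)
print(beta_w, psi_net)                # psi_net ~ 0 up to the solver tolerance
```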
Figure 2: Weak threshold, ℓ1-optimization (β/α as a function of α; the curves of Donoho [16] and of the present paper coincide).
3 Nonnegative x: ℓ1 and neighborliness of projected simplices

In this section we consider a special case of (1). We will assume that the nonzero components of x in (1) are all of the same sign (say, they are all positive). If this is a priori known then instead of using (2) to recover the "nonnegative" x in (1) one can use (see, e.g. [18, 19, 43])

min ‖x‖₁ subject to Ax = y, x_i ≥ 0, 1 ≤ i ≤ n.    (24)
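Since the constraint x ≥ 0 turns ‖x‖₁ into the linear function Σᵢ xᵢ, (24) is already a linear program as stated. A minimal sketch (again with scipy.optimize.linprog as our illustrative solver choice):

```python
import numpy as np
from scipy.optimize import linprog

def nonneg_basis_pursuit(A, y):
    """Solve (24): min ||x||_1 subject to Ax = y and x >= 0."""
    n = A.shape[1]
    # for x >= 0 the objective ||x||_1 reduces to the linear function sum(x)
    res = linprog(np.ones(n), A_eq=A, b_eq=y, bounds=(0, None))
    return res.x
```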
Since, so to say, more structure is imposed on x (and this structure is made known to the system's solver), one would expect the recoverable thresholds to be higher in this case than they were in the case of "general" sparse vectors x. As demonstrated in [18, 19, 43], the thresholds are indeed higher. Moreover, although the results of [19] and [43] were obtained through completely different approaches, as demonstrated in [43] they happened to be in an excellent numerical agreement. In this section we will rigorously show that the agreement is not only numerical but also algebraic/analytical.

Before proceeding further we quickly recall and appropriately modify the definition of the weak threshold. The definition of the weak threshold was already introduced in Section 1. However, that definition was suited for the recovery of the general vectors x considered in the previous section. Here, we slightly modify it so that it fits the scenario of a priori known sign patterns of the nonzero elements of x. For any given constant α ≤ 1 and any given x with a given fixed location of nonzero components and for which it is known that its nonzero components are, say, positive, there will be a maximum allowable value of β such that (24) finds that given x in (1) with overwhelming probability. We will refer to this maximum allowable value of β as the nonnegative weak threshold and will denote it by β⁺_w.

The story again starts from Donoho's classic [15]. In a follow-up [19] Donoho and Tanner made the key observation that a majority of what was done in [15] and was related to the regular n-dimensional cross-polytope would continue to hold in a slightly modified way if translated to the standard n-dimensional simplex. In a more mundane language, in [19] Donoho and Tanner took again a geometric approach, but this time to the performance analysis of the ℓ1-optimization from (24), and again managed to establish a connection between the performance analysis of (24) and the concepts of a polytope's neighborliness. This time the main recognition went along slightly different lines: 1) Let T^n be the standard n-dimensional simplex and let AT^n be the polytope one obtains after projecting T^n by A; 2) Then the solution of (24) will be exactly the nonnegative k-sparse solution of (1) if and only if the polytope AT^n is k-neighborly. For completeness we just briefly recall that a polytope is k-neighborly if every set of k+1 of its vertices spans a k-dimensional face.

As in the previous section, the above characterization enables one to replace studying the success of (24) in recovering the nonnegative sparse solution of an under-determined system by studying the neighborliness of the projected standard simplex. Of course, as earlier, it is not, a priori, clear that the latter problem is any easier than the former one. However, knowing the results of the previous section (and ultimately of course those of [15]), one could now be tempted to believe that this polytope type of characterization could be manageable.

3.1 Neighborliness condition

As discovered in [18], it turns out that the neighborliness of randomly projected simplices has been explored to some extent in the literature on the geometry of random high-dimensional polytopes. As in the previous section, using the "sum of angles" result from [2, 33, 41] it was established in [18] that if A is a random ortho-projector then AT^n will be k-neighborly with overwhelming probability if

n⁻¹ log(C⁺_com C⁺_int(T^k, T^m) C⁺_ext(T^m, T^{n−1})) < 0    (25)

where C⁺_com = \binom{n−k−1}{m−k}, C⁺_int(T^k, T^m) is the internal angle at face T^k of T^m, C⁺_ext(T^m, T^{n−1}) is the external angle of T^{n−1} at face T^m, and T^k, T^m, and T^{n−1} are the standard k-, m-, and (n−1)-dimensional simplices, respectively. The authors in [18] then proceeded by establishing that (25) is equivalent to the following inequality related to the sum/difference of the exponents of C⁺_com, C⁺_int, and C⁺_ext:

Ψ⁺_net = Ψ⁺_com − Ψ⁺_int − Ψ⁺_ext < 0    (26)

where

Ψ⁺_com = n⁻¹ log(C⁺_com) = (1−β) H((α−β)/(1−β))
Ψ⁺_int = −n⁻¹ log(C⁺_int(T^k, T^m))
Ψ⁺_ext = −n⁻¹ log(C⁺_ext(T^m, T^{n−1}))    (27)

and, as earlier, H(p) = −p log(p) − (1−p) log(1−p) is the standard entropy function, with \binom{n}{pn} ≈ e^{nH(p)} being the standard approximation of the binomial coefficient by the entropy function in the limit n → ∞. The rest of the approach from [18] is the analysis of the closed form expressions for C⁺_int(T^k, T^m) and C⁺_ext(T^m, T^{n−1}) obtained/analyzed in various forms in [6, 37, 38]. In the following two subsections we will separately consider the results Donoho and Tanner established for the internal and the external angle exponents. Relying on the insights from [43] we will provide convenient characterizations of the exponents that will eventually help us establish the rigorous equivalence of the results from [18] and [43].

3.2 Internal angle

Just by simply looking at formulas (4) and (26) one can hardly see any difference between the definition of the internal angle that we have in this section and the one that we had in the previous section. The definitions are indeed the same and the angles' exponents are indeed the same. However, the characterizations that we will provide will differ.

To that end we recall the procedure for determining the exponent of the internal angle Ψ⁺_int (the procedure is of course the same as the one from the previous section and is ultimately the one introduced in [16]). As earlier, let γ = β/α and for s ≥ 0 let

Φ(s) = (1/√(2π)) ∫_s^∞ e^{−x²/2} dx,  φ(s) = (1/√(2π)) e^{−s²/2}.    (28)

Then one has

Ψ⁺_int(β, α) = (α−β) ξ⁺_γ(y⁺_γ) + (α−β) log(2)    (29)

where

y⁺_γ = (γ/(1−γ)) s⁺_γ
ξ⁺_γ(y⁺_γ) = −((1−γ)/(2γ)) (y⁺_γ)² − ½ log(2/π) + log(y⁺_γ/γ)    (30)

and s⁺_γ ≥ 0 is the solution of

Φ(s) = (1−γ) φ(s)/s.    (31)

As earlier, if one can determine s⁺_γ then a combination of (29) and (30) would give a convenient closed form expression for the exponent Ψ⁺_int(β, α). Finding s⁺_γ is equivalent to solving (31) over s, which for a generic γ seems to be possible only numerically. As we have done in the previous section, we will at this point make again a guess and say that t⁺ = 2(1−α)/(1−β) − 1 and

s⁺_γ = √2 erfinv(t⁺) = √2 erfinv(2(1−α)/(1−β) − 1)    (32)

where as earlier erfinv(·) is the inverse of the error function erf(·) associated with the standard normal random variable (i.e. erfinv(·) is the inverse of erf(r) = (2/√π) ∫_0^r e^{−q²} dq). Of course, as was the case earlier, the s⁺_γ from (32) will not be the solution of (31) for every γ = β/α. However, we will again hope that it may be the solution of (31) for the optimal γ = β⁺_w/α, i.e. for the one for which the net exponent Ψ⁺_net in (26) is zero (again, strictly speaking, instead of "zero" we should say "smaller than −ε where ε > 0 is arbitrarily small"). This hope does not seem any more likely to succeed than the one we made in the previous section unless one is aware of the results of [43].

As was the case in the previous section, we proceed by trying to confirm that our guess is actually right. We again start by noting that Φ(s) = ½(1 − erf(s/√2)). If (32) is to be correct then to satisfy (31) one must have

½(1 − t⁺) = (1−γ) φ(s)/s |_{s=√2 erfinv(t⁺)} = (1−γ) (1/√(2π)) e^{−(erfinv(t⁺))²} / (√2 erfinv(t⁺))

or in a more convenient algebraic form

((1−β)/α) (1/√(2π)) e^{−(erfinv(2(1−α)/(1−β)−1))²} / (√2 erfinv(2(1−α)/(1−β)−1)) = 1.    (33)

Our logic from the previous section still remains in place. Namely, if it eventually turns out that for α and β for which (33) holds one also has that Ψ⁺_net in (26) is zero then we could claim that the guess we made for s⁺_γ in (32) is actually correct.

We now proceed with the evaluation of the "internal exponent" Ψ⁺_int assuming that both (32) and (33) are correct. Plugging (32) back in (30) we obtain

y⁺_γ = (γ/(1−γ)) s⁺_γ = (β/(α−β)) √2 erfinv(2(1−α)/(1−β) − 1).    (34)

A further combination of (30) and (34) gives

ξ⁺_γ(y⁺_γ) = −((1−γ)/(2γ)) (y⁺_γ)² − ½ log(2/π) + log(y⁺_γ/γ)
= −(β/(α−β)) (√2 erfinv(2(1−α)/(1−β)−1))²/2 − ½ log(2/π) + log(α/(α−β)) + log(√2 erfinv(2(1−α)/(1−β)−1)).    (35)

Finally, plugging the ξ⁺_γ(y⁺_γ) computed in (35) back in (29) we have for the exponent of the internal angle

Ψ⁺_int = −β (√2 erfinv(2(1−α)/(1−β)−1))²/2 − ((α−β)/2) log(2/π) + (α−β) log(α) − (α−β) log(α−β) + (α−β) log(√2 erfinv(2(1−α)/(1−β)−1)) + (α−β) log(2).    (36)

3.3 External angle

In this subsection we provide the external angle counterparts to the results provided in the previous subsection for the internal angle. In [18] Donoho and Tanner established that the exponent of the external angle can be computed in the following way:

Ψ⁺_ext(β, α) = min_{y≥0} (αy² − (1−α) log(½(1 + erf(y)))).    (37)

It was also shown in [18] that the function (αy² − (1−α) log(½(1 + erf(y)))) is smooth and convex. As earlier, solving the above minimization analytically does not appear to be an easy task for a generic fixed α (β). However, we will again take a guess and assume that the solution of the above minimization is

y⁺_ext = erfinv(2(1−α)/(1−β) − 1).    (38)

It is again of course unreasonable to expect that this choice of y would be the solution of the minimization problem in (37) for every given α. However, we do hope that it could be the solution for the optimal pair (α, β) (as stated above, the optimal pair (α, β) is the one that makes the net exponent Ψ⁺_net in (26) equal to zero). If the y⁺_ext defined above is to be the solution of the minimization problem in (37) for the optimal pair (α, β), then at the very least one has to have

d(αy² − (1−α) log(½(1 + erf(y))))/dy |_{y=y⁺_ext} = 0.    (39)

To check whether (39) holds or not we write:

d(αy² − (1−α) log(½(1 + erf(y))))/dy |_{y=y⁺_ext} = (2αy − ((1−α)/(1 + erf(y))) (d erf(y)/dy)) |_{y=y⁺_ext}
= 2α erfinv(2(1−α)/(1−β)−1) − (1−β) (1/√π) e^{−y²} |_{y=y⁺_ext}
= √2 (√2 α erfinv(2(1−α)/(1−β)−1) − (1−β) (1/√(2π)) e^{−(erfinv(2(1−α)/(1−β)−1))²}) = 0    (40)

where the last equality follows by our assumption that (α, β) are optimal and therefore satisfy (33). Since (40) shows that (39) indeed holds, one then has that if (33) is correct then (38) is correct as well.

A combination of (37) and (38) then gives us the following convenient characterization of the "external exponent" Ψ⁺_ext:

Ψ⁺_ext = α (y⁺_ext)² − (1−α) log(½(1 + erf(y⁺_ext))) = α (erfinv(2(1−α)/(1−β)−1))² − (1−α) log((1−α)/(1−β)).    (41)

3.4 Net exponent

In this subsection we combine the expressions for the "internal" and "external" exponents obtained in (36) and (41), respectively, with the expression for the "combinatorial" exponent given in (27). Before proceeding further we first slightly modify the expression for the combinatorial exponent given in (27):

Ψ⁺_com = (1−β) H((α−β)/(1−β))
= (1−β)(−((α−β)/(1−β)) log((α−β)/(1−β)) − ((1−α)/(1−β)) log((1−α)/(1−β)))
= −(α−β) log((α−β)/(1−β)) − (1−α) log((1−α)/(1−β)).    (42)

Plugging the results from (36), (41), and (42) back in (26) one has

Ψ⁺_net = Ψ⁺_com − Ψ⁺_int − Ψ⁺_ext
= −(α−β) log((α−β)/(1−β)) − (1−α) log((1−α)/(1−β))
− (−β (√2 erfinv(2(1−α)/(1−β)−1))²/2 − ((α−β)/2) log(2/π) + (α−β) log(α) − (α−β) log(α−β) + (α−β) log(√2 erfinv(2(1−α)/(1−β)−1)) + (α−β) log(2))
− (α (erfinv(2(1−α)/(1−β)−1))² − (1−α) log((1−α)/(1−β))).    (43)

After canceling all terms that can be canceled one finally has

Ψ⁺_net = (α−β)(log((1−β)/α) + ½ log(1/(2π)) − log(√2 erfinv(2(1−α)/(1−β)−1)) − (erfinv(2(1−α)/(1−β)−1))²)
= (α−β)(log((1−β)/α) + log(1/√(2π)) − log(√2 erfinv(2(1−α)/(1−β)−1)) + log(e^{−(erfinv(2(1−α)/(1−β)−1))²}))
= (α−β) log( ((1−β)/α) (1/√(2π)) e^{−(erfinv(2(1−α)/(1−β)−1))²} / (√2 erfinv(2(1−α)/(1−β)−1)) )
= 0    (44)

where the last equality follows by assumption (33). Since we obtained that Ψ⁺_net = 0 and we never contradicted assumption (33), following the logic presented in the previous section, the assumption must be correct. Again, to be completely rigorous one should add that if an α is given and β⁺_w is such that the pair (α, β⁺_w) satisfies (33), then for any β < β⁺_w AT^n is βn-neighborly (i.e. one needs β to be strictly less than β⁺_w because (26) asserts that one actually needs Ψ⁺_net < 0).

We summarize the results from this section in the following theorem.

Theorem 2. (Geometry-probability equivalence: nonnegative x) Let A in (1) be an m × n ortho-projector (or an m × n matrix with the null-space uniformly distributed in the Grassmanian). Let k, m, n be large and let α = m/n and β⁺_w = k/n be constants independent of m and n. Let erfinv be the inverse of the standard error function associated with a zero-mean unit variance Gaussian random variable. Further, let α and β⁺_w be such that

((1−β⁺_w)/α) (1/√(2π)) e^{−(erfinv(2(1−α)/(1−β⁺_w)−1))²} / (√2 erfinv(2(1−α)/(1−β⁺_w)−1)) = 1.    (45)

Then with overwhelming probability the polytope AT^n will be βn-neighborly for any β < β⁺_w. Further, let the unknown x in (1) be k-sparse and nonnegative and let the location of the nonzero components of x be arbitrarily chosen but fixed. Then, as shown in [43], for any β < β⁺_w one with overwhelming probability has that the solution of (24) is exactly the nonnegative βn-sparse x in (1).

Proof. Follows from the previous discussion through a combination of (33), (44), and the main results of [43].

The results for the weak threshold obtained from the above theorem have already been plotted in [43] and, as it was mentioned in [43], they were in an excellent numerical agreement with the ones obtained in [18, 19] (for completeness we present the results again in Figure 3). Finally, Theorem 2 rigorously establishes that the agreement is not only numerical but also analytical, and that the weak thresholds obtained in [18] and [43] are indeed exactly equal to each other.
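As with (23), equation (45) determines β⁺_w by one-dimensional root finding. Below is a sketch under the same assumptions as before; the lower end of the bracket is chosen so that the argument of erfinv stays in (0, 1).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import erfinv

def weak_threshold_nonneg(alpha):
    """Solve (45) for beta_w^+ at a given alpha = m/n."""
    def f(beta):
        E = erfinv(2 * (1 - alpha) / (1 - beta) - 1)
        return ((1 - beta) / alpha) / np.sqrt(2 * np.pi) \
               * np.exp(-E ** 2) / (np.sqrt(2) * E) - 1
    lo = max(2 * alpha - 1, 0) + 1e-9     # keeps 2(1-alpha)/(1-beta) - 1 > 0
    return brentq(f, lo, alpha - 1e-9)

for a in (0.3, 0.5, 0.7):
    b = weak_threshold_nonneg(a)
    print(a, b, b / a)    # at alpha = 0.5 this gives beta/alpha ~ 0.558
```

As expected, the computed β⁺_w exceeds the β_w of Theorem 1 at every α, quantifying the benefit of the a priori known sign pattern.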
Figure 3: Weak threshold, ℓ1-optimization, signed x (β/α as a function of α; the curves of Donoho and Tanner [19] and of the present paper coincide).

4 Discussion

In this paper we considered under-determined systems of linear equations with sparse solutions. We focused on solving such systems via a classical polynomial-time ℓ1-optimization algorithm. We also focused on random systems, i.e. on systems where the system matrix is random.

Two different approaches, the geometric one from [16] and the probabilistic one from [43], were considered. These approaches were known to provide characterizations of ℓ1-optimization success that are in excellent numerical agreement. Here we provided a rigorous proof that the recovery thresholds that one can obtain through one of these approaches are exactly the same as the ones that can be obtained through the other.

We also showed that this remains true when one restricts to under-determined systems with nonnegative sparse solutions. Namely, we rigorously showed that the nonnegative recovery thresholds one can obtain through either of the approaches of [18] and [43] are exactly the same as the ones that can be obtained through the other.

An interesting bonus is the following connection with the recent works [5, 17]. Namely, in [17] a belief propagation type of algorithm is put forth as a faster alternative to the standard ℓ1-optimization. Its performance was analyzed through a state evolution formalism (which was later made rigorous in [5]) and the recovery thresholds were computed. Moreover, it was shown in [17] that these thresholds are the same as those computed in [43]. The result of this paper confirms that the sparsity recovery abilities of the belief propagation algorithm from [17] are exactly the same not only as those from [43] but also as those from [16] and [18] (an overwhelming numerical evidence of this was of course already presented in [17]).

References
[1] R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling. Preprint, 2009. available at arXiv:0904.4723.

[2] F. Afentranger and R. Schneider. Random projections of regular simplices. Discrete Comput. Geom., 7(3):219–226, 1992.

[3] M. Akcakaya and V. Tarokh. A frame construction and a universal distortion bound for sparse representations. IEEE Trans. on Signal Processing, 56(6), June 2008.

[4] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3), 2008.

[5] M. Bayati and A. Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. Preprint. available at arXiv:1001.3448.

[6] K. Böröczky and M. Henk. Random projections of regular polytopes. Arch. Math. (Basel), 73(6):465–473, 1999.

[7] E. Candes. The restricted isometry property and its implications for compressed sensing. Compte Rendus de l'Academie des Sciences, Paris, Series I, 346, pages 589–592, 2008.

[8] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. on Information Theory, 52(2):489–509, February 2006.

[9] E. Candes and T. Tao. Decoding by linear programming. IEEE Trans. on Information Theory, 51(12):4203–4215, December 2005.

[10] E. Candes, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted l1 minimization. J. Fourier Anal. Appl., 14(5):877–905, 2008.

[11] S. Chretien. An alternating l1 approach to the compressed sensing problem. Preprint, 2008. available at arXiv:0809.0660.

[12] G. Cormode and S. Muthukrishnan. Combinatorial algorithms for compressed sensing. SIROCCO, 13th Colloquium on Structural Information and Communication Complexity, pages 280–294, 2006.

[13] W. Dai and O. Milenkovic. Subspace pursuit for compressive sensing signal reconstruction. Preprint, March 2008. available at arXiv:0803.0811.

[14] M. E. Davies and R. Gribonval. Restricted isometry constants where ℓp sparse recovery can fail for 0 < p ≤ 1. IEEE Trans. on Information Theory, 55(5):2203–2214, May 2009.

[15] D. Donoho. Neighborly polytopes and sparse solutions of underdetermined linear equations. Technical report, Department of Statistics, Stanford University, 2005.

[16] D. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Disc. Comput. Geometry, 35(4):617–652, 2006.

[17] D. Donoho, A. Maleki, and A. Montanari. Message-passing algorithms for compressed sensing. Proc. National Academy of Sciences, 106(45):18914–18919, Nov. 2009.

[18] D. Donoho and J. Tanner. Neighborliness of randomly-projected simplices in high dimensions. Proc. National Academy of Sciences, 102(27):9452–9457, 2005.

[19] D. Donoho and J. Tanner. Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proc. National Academy of Sciences, 102(27):9446–9451, 2005.

[20] D. Donoho and J. Tanner. Thresholds for the recovery of sparse solutions via ℓ1 minimization. Proc. Conf. on Information Sciences and Systems, March 2006.

[21] D. Donoho and J. Tanner. Counting faces of randomly projected polytopes when the projection radically lowers dimension. J. Amer. Math. Soc., 22:1–53, 2009.

[22] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck. Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit. Preprint, 2006.

[23] S. Foucart and M. J. Lai. Sparsest solutions of underdetermined linear systems via ℓq-minimization for 0 < q ≤ 1. Appl. Comput. Harm. Anal., 26(3):395–407, 2009.

[24] A. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin. Algorithmic linear dimension reduction in the ℓ1 norm for sparse vectors. Proc. 44th Annual Allerton Conference on Communication, Control, and Computing, 2006.

[25] A. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin. One sketch for all: fast algorithms for compressed sensing. ACM STOC, pages 237–246, 2007.

[26] R. Gribonval and M. Nielsen. Sparse representations in unions of bases. IEEE Trans. Inform. Theory, 49(12):3320–3325, December 2003.

[27] R. Gribonval and M. Nielsen. On the strong uniqueness of highly sparse expansions from redundant dictionaries. In Proc. Int. Conf. Independent Component Analysis (ICA'04), LNCS. Springer-Verlag, September 2004.

[28] R. Gribonval and M. Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure. Appl. Comput. Harm. Anal., 22(3):335–355, May 2007.

[29] B. Grunbaum. Convex polytopes, volume 221 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 2003.

[30] P. Indyk. Explicit constructions for compressed sensing of sparse signals. SODA, 2008.

[31] S. Jafarpour, W. Xu, B. Hassibi, and R. Calderbank. Efficient and robust compressed sensing using optimized expander graphs. IEEE Trans. on Information Theory, 55(9):4299–4308, September 2009.

[32] I. Maravic and M. Vetterli. Sampling and reconstruction of signals with finite rate of innovation in the presence of noise. IEEE Trans. on Signal Processing, 53(8):2788–2805, August 2005.

[33] P. McMullen. Non-linear angle-sum relations for polyhedral cones and polytopes. Math. Proc. Cambridge Philos. Soc., 78(2):247–261, 1975.

[34] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301–321, 2009.

[35] D. Needell and R. Vershynin. Uniform uncertainty principles and signal recovery via regularized orthogonal matching pursuit. Foundations of Computational Mathematics, 9(3):317–334, 2009.

[36] F. Parvaresh and B. Hassibi. Explicit measurements with almost optimal thresholds for compressed sensing. IEEE ICASSP, March-April 2008.

[37] H. Ruben. On the geometrical moments of skew regular simplices in hyperspherical space; with some applications in geometry and mathematical statistics. Acta Math. (Uppsala), 103:1–23, 1960.

[38] M. Rudelson and R. Vershynin. Geometric approach to error correcting codes and reconstruction of signals. International Mathematical Research Notices, 64:4019–4041, 2005.

[39] R. Saab, R. Chartrand, and O. Yilmaz. Stable sparse approximation via nonconvex optimization. ICASSP, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, April 2008.

[40] V. Saligrama and M. Zhao. Thresholded basis pursuit: Quantizing linear programming solutions for optimal support recovery and approximation in compressed sensing. Preprint, 2008. available on arXiv.

[41] L. A. Santalo. Geometria integral en espacios de curvatura constante. Rep. Argentina Publ. Com. Nac. Energia Atomica, Ser. Mat., 1952.

[42] M. Stojnic. A simple performance analysis of ℓ1-optimization in compressed sensing. ICASSP, International Conference on Acoustics, Signal and Speech Processing, April 2009.

[43] M. Stojnic. Various thresholds for ℓ1-optimization in compressed sensing. Submitted to IEEE Trans. on Information Theory, 2009. available at arXiv:0907.3666.

[44] J. Tropp and A. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. on Information Theory, 53(12):4655–4666, 2007.

[45] J. A. Tropp. Greed is good: algorithmic results for sparse approximations. IEEE Trans. on Information Theory, 50(10):2231–2242, 2004.

[46] A. M. Vershik and P. V. Sporyshev. Asymptotic behavior of the number of faces of random polyhedra and the neighborliness problem. Selecta Mathematica Sovietica, 11(2), 1992.

[47] W. Xu and B. Hassibi. Efficient compressive sensing with deterministic guarantees using expander graphs. IEEE Information Theory Workshop, September 2007.