Stability of low-rank matrix recovery and its connections to Banach space geometry
JAVIER ALEJANDRO CH ´AVEZ-DOM´INGUEZ AND DENKA KUTZAROVA
Abstract.
There are well-known relationships between compressed sensing and the geometry of the finite-dimensional $\ell_p$ spaces. A result of Kashin and Temlyakov [20] can be described as a characterization of the stability of the recovery of sparse vectors via $\ell_1$-minimization in terms of the Gelfand widths of certain identity mappings between finite-dimensional $\ell_1$ and $\ell_2$ spaces, whereas a more recent result of Foucart, Pajor, Rauhut and Ullrich [16] proves an analogous relationship even for $\ell_p$ spaces with $p < 1$. In this paper we prove what we call matrix (or noncommutative) versions of these results: we characterize the stability of low-rank matrix recovery via Schatten $p$-(quasi-)norm minimization in terms of the Gelfand widths of certain identity mappings between finite-dimensional Schatten $p$-spaces.

1. Introduction
A mathematical problem that appears often in real-world situations is the following: we wish to recover a high-dimensional vector $x \in \mathbb{R}^N$ from a measurement $Ax$, where $A : \mathbb{R}^N \to \mathbb{R}^m$ is a linear map and $m$ is smaller than $N$. As stated, the problem of course cannot be solved, but that changes if we add the condition that the unknown vector $x$ is sparse, i.e. it has a small number of non-zero coordinates. This is the subject matter of compressed sensing, a very active area of research with numerous applications; the book [17] is a recent comprehensive reference. Formally, this sparse recovery problem can be stated as
$$(1.1) \qquad \min \|x\|_0 \quad \text{subject to} \quad Ax = y,$$
where $\|\cdot\|_0$ denotes the number of nonzero coordinates of a vector. This is an NP-hard [27] and non-convex problem, so we are interested in conditions (especially on the map $A$) that allow us to solve an easier problem and still arrive at the right solution. In that spirit, a basic technique in compressed sensing is that of $\ell_1$-minimization: if the vector $x$ is sparse enough, then minimizing $\|x'\|_{\ell_1}$ over all vectors $x'$ such that $Ax' = Ax$ actually allows us to recover $x$. Formally, instead of problem (1.1) we consider its convex relaxation
$$(1.2) \qquad \min \|x\|_{\ell_1} \quad \text{subject to} \quad Ax = y.$$
Additionally, one can consider the analogous problem of $\ell_p$-minimization. In practice the unknown vectors are not necessarily sparse, but are close to sparse ones. Thus for any method of recovery it is of utmost importance to investigate its stability, that is, to control the distance between the original vector and its reconstruction in terms of the distance from the original vector to the sparse vectors. It turns out that the stability of sparse vector recovery through $\ell_p$-minimization has connections to the Banach-space geometry of finite-dimensional $\ell_p$-spaces.
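The convex relaxation (1.2) can be solved as a linear program via the standard splitting $x = u - v$ with $u, v \ge 0$. The following is a minimal illustrative sketch, not from the paper; all function names and the chosen dimensions are our own.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to Ax = y via the LP reformulation
    x = u - v with u, v >= 0 (illustrative sketch)."""
    m, N = A.shape
    c = np.ones(2 * N)                      # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])               # equality constraint: A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
    u, v = res.x[:N], res.x[N:]
    return u - v

rng = np.random.default_rng(0)
m, N, s = 20, 40, 3
A = rng.standard_normal((m, N)) / np.sqrt(m)
x0 = np.zeros(N)
x0[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x0
x_hat = basis_pursuit(A, y)
# the minimizer is feasible and its l1 norm cannot exceed that of x0
assert np.allclose(A @ x_hat, y, atol=1e-5)
assert np.sum(np.abs(x_hat)) <= np.sum(np.abs(x0)) + 1e-6
```

The assertions check only what is guaranteed deterministically (feasibility and minimality of the $\ell_1$ norm); exact recovery of $x_0$ is the high-probability phenomenon the theory addresses.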
More generally, it is known that there are connections between recovery (in particular the compressed sensing model) and geometric quantities called Gelfand widths; see e.g. [28, 10, 9, 20]. In many practical situations, there is extra structure in the space of unknown vectors. A good example is the famous matrix completion problem (also known as the Netflix problem), where the unknown is a matrix and the measurement map gives us a subset of its entries. In this case sparsity gets replaced by the more natural condition of having low rank, and the last few years have witnessed an explosion of work in this area. In what follows, $M_N$ will denote the space of $N \times N$ real matrices. We now consider a linear operator $\mathcal{A} : M_N \to \mathbb{R}^m$ and a fixed vector $y \in \mathbb{R}^m$. The low-rank recovery problem can thus be stated as the problem of finding the solution to
$$(1.3) \qquad \min \operatorname{rank}(X) \quad \text{subject to} \quad \mathcal{A}X = y.$$
This is again an NP-hard problem, so once again we would like to replace it by another one which is simpler to solve but has the same solution. In noncommutative functional analysis the Schatten $p$-spaces are usually considered to be the counterparts of the classical $\ell_p$ spaces (recall that the Schatten $p$-norm of a matrix $X$ is the $\ell_p$-norm of its vector of singular values), so from that point of view it is natural to wonder whether the Schatten $p$-norm minimization approach can work in the matrix context. We would like to consider operators $\mathcal{A}$ for which the previous problem is equivalent to
$$(1.4) \qquad \min \|X\|_{S_1} \quad \text{subject to} \quad \mathcal{A}X = y,$$
where $\|X\|_{S_p}$ denotes the Schatten $p$-norm of the matrix $X \in M_N$. This has already been studied in several situations of interest, with the idea going back to the Ph.D. thesis of M. Fazel [13]. Schatten 1-norm (also known as nuclear norm) minimization in the particular case of the matrix completion problem was studied by Candès and Recht [4] (and later on Candès and Tao [7] gave optimality results quantifying the minimum number of entries needed to recover a matrix of low rank exactly by any method whatsoever, and showed that nuclear norm minimization is nearly optimal).
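The Schatten $p$-norm just described is simply the $\ell_p$ norm of the singular values; a quick sketch of its computation (our own illustration, not from the paper):

```python
import numpy as np

def schatten_norm(X, p):
    """Schatten p-(quasi-)norm: the l_p norm of the vector of
    singular values of X (illustrative helper)."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(sigma ** p) ** (1.0 / p))

X = np.diag([3.0, 4.0])                      # singular values {4, 3}
assert np.isclose(schatten_norm(X, 1), 7.0)  # nuclear norm: 3 + 4
assert np.isclose(schatten_norm(X, 2), 5.0)  # Frobenius: sqrt(9 + 16)
```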
Plenty of concepts from the classical theory of compressed sensing have found matrix counterparts: Candès and Recht [4] use the idea of coherence; Recht, Fazel and Parrilo [32] used the matrix version of the restricted isometry property; both Recht, Xu and Hassibi [33] and Fornasier, Rauhut and Ward [14] consider null-space conditions; the spherical section property was used by Dvijotham and Fazel [11] and by Oymak, Mohan, Fazel and Hassibi [29]. One thing that does not appear to have been explicitly studied in the matrix context is the aforementioned relationship to Gelfand widths. Recall that the Gelfand $k$-width of a subset $K$ of a normed space $E$ is defined as
$$d^k(K, E) := \inf\Big\{ \sup_{x \in K \cap L} \|x\|_E : L \text{ subspace of } E \text{ with } \operatorname{codim}(L) \le k \Big\}.$$
A closely related concept that is more commonly used in Banach space theory is that of a Gelfand number: if $T : X \to Y$ is a linear operator between normed spaces, its $k$-th Gelfand number is defined by
$$c_k(T) := \inf\Big\{ \sup_{x \in L,\ \|x\| \le 1} \|Tx\| : L \text{ subspace of } X \text{ with } \operatorname{codim}(L) < k \Big\}.$$
The speed of convergence to zero of the sequence of Gelfand numbers $(c_k(T))_{k=1}^\infty$ is a measure of the compactness of the operator $T$, and is an example of a sequence of $s$-numbers; see [30, 22] for more details. In the cases under consideration in this paper the concepts of Gelfand numbers and Gelfand widths actually coincide (up to a small shift in the index), so we will freely use them both depending on the particular context. It should be mentioned that there is a general concept of Gelfand width for a linear map that is not always the same as the corresponding Gelfand number (see [31, Sec. 6.2.6] for the details), but both concepts do coincide in nice situations (see [12]).

The work of Kashin and Temlyakov [20] made more precise the already-known connection between compressed sensing and the Kashin-Garnaev-Gluskin result [19, 18] that calculates the $m$-th Gelfand numbers of the identity map from $\ell_1^N$ to $\ell_2^N$, namely
$$c_{m+1}\big(id : \ell_1^N \to \ell_2^N\big) \le C\,\sqrt{\frac{\ln(eN/m)}{m}}.$$
In a nutshell, the main result of Kashin and Temlyakov shows that the stability of sparse recovery via $\ell_1$-minimization is equivalent to the kernel of the measurement map being a "good" subspace where the Gelfand number of a certain order is achieved. This idea was taken further by Foucart, Pajor, Rauhut and Ullrich [16], who used compressed sensing ideas to calculate the Gelfand numbers of identity maps from $\ell_p^N$ to $\ell_q^N$ for $0 < p \le 1$ and $p < q \le 2$; in this paper we prove analogous results for the Schatten $p$-spaces. As far as we know, the only part of our results that is already written down in the literature is the following analogue of the Kashin-Garnaev-Gluskin result due to Carl and Defant [8, p. 252], namely the calculation of the $m$-th Gelfand numbers of the identity map from $S_1^N$ to $S_2^N$: for $1 \le m \le N^2$,
$$c_m\big(id : S_1^N \to S_2^N\big) \asymp \min\Big\{1, \frac{N}{m}\Big\}^{1/2}.$$
Here and in the rest of the paper, the symbol $\asymp$ means that the quantities on the left and the right are equivalent up to universal constants. If we want to emphasize the dependence of the constants on some parameters, those will appear as subindices of the equivalence symbol ($\asymp_{p,q}$, for example).

The rest of this paper is organized as follows. In Section 2 we introduce our notation and state several known results that will be needed in the sequel. In Section 3 we show the first relationships between the stability of low-rank matrix recovery and the geometry of Banach spaces, by proving a matrix version of the Kashin-Temlyakov theorem. Section 4 contains a technical result, a matrix version of the main theorem from [15], that gives conditions on the measurement map $\mathcal{A}$ guaranteeing the stability of the Schatten $p$-minimization scheme. A very similar theorem was recently obtained independently by Liu, Huang and Chen [25], though our proof is different and we require a weaker hypothesis. In the final section, the technical result from Section 4 is used to calculate the Gelfand numbers of the identity maps from $S_p^N$ to $S_q^N$ for $0 < p \le 1$ and $p < q \le 2$.

2. Notation and preliminaries
In this paper we will only consider square matrices, but all the results can be adapted to rectangular ones. For $p > 0$, we denote by $S_p^N$ the space of $N \times N$ matrices with the Schatten $p$-(quasi-)norm, given by
$$\|X\|_{S_p} = \Big( \sum_{i=1}^N |\sigma_i|^p \Big)^{1/p},$$
where $(\sigma_i)_{i=1}^N$ is the vector of singular values of the matrix $X$. Similarly, $S_{p,\infty}^N$ will denote the space of $N \times N$ matrices with the weak-Schatten-$p$-quasi-norm given by
$$\|X\|_{S_{p,\infty}} = \max_{1 \le k \le N} k^{1/p} |\sigma_k^*|,$$
where $(\sigma_i^*)_{i=1}^N$ is the non-increasing rearrangement of $(\sigma_i)_{i=1}^N$. For any quasi-normed space $X$, $B_X$ will denote its unit ball. We will need to consider the best rank-$s$ approximation error in the Schatten $p$-quasi-norm,
$$\rho_s(X)_{S_p} := \inf\big\{ \|X - Y\|_{S_p} : \operatorname{rank}(Y) \le s \big\}.$$
It is well known that the infimum is actually attained at the $s$-spectral truncation $Y = X_{[s]}$ (that is, keeping only the $s$ largest singular values in the singular value decomposition). Given a linear map $\mathcal{A} : M_N \to \mathbb{R}^m$ and a vector $y \in \mathbb{R}^m$, for $0 < p \le 1$ we denote by $\Delta_p(y)$ a solution to
$$\text{minimize } \|Z\|_{S_p} \quad \text{subject to} \quad \mathcal{A}Z = y.$$
That is, $\Delta_p$ is the Schatten $p$-quasi-norm minimization reconstruction map. The map $\Delta_p$ of course depends on the measuring map $\mathcal{A}$, but for simplicity we do not make this dependence explicit in the notation.

2.1. The Restricted Isometry Property.
The Restricted Isometry Property (RIP) for a linear map $A : \mathbb{R}^N \to \mathbb{R}^m$ was introduced by Candès and Tao [5], and quickly became a key concept in the analysis of sparse recovery via $\ell_p$-norm minimization. The $s$-th order restricted isometry constant of such a map is the smallest $\delta > 0$ such that for all $x \in \mathbb{R}^N$ of sparsity at most $s$,
$$(1-\delta)\|x\|_{\ell_2}^2 \le \|Ax\|_{\ell_2}^2 \le (1+\delta)\|x\|_{\ell_2}^2.$$
The importance of the RIP stems from the fact that small restricted isometry constants imply exact recovery via $\ell_p$-quasi-norm minimization for $0 < p \le 1$, and it should be noted that it is well known that random choices of the matrix $A$ give small RIP constants of order $s$, as long as $m$ is at least of the order of $s \ln(eN/s)$ [6, 1, 26]. The version of the RIP for matrix recovery was introduced by Recht, Fazel and Parrilo [32], and is as follows: a linear map $\mathcal{A} : M_N \to \mathbb{R}^m$ is said to have the Restricted Isometry Property of rank $s$ with constant $\delta > 0$ if for all $Z \in M_N$ of rank at most $s$,
$$(1-\delta)\|Z\|_{S_2}^2 \le \|\mathcal{A}Z\|_{\ell_2}^2 \le (1+\delta)\|Z\|_{S_2}^2.$$
The best such constant is denoted by $\delta_s(\mathcal{A})$. Just as in the vector case, random constructions give small RIP constants. The next result follows from [3, Thm. 2.3], and will be very important for us in the sequel.

Theorem 2.1.
Given a prescribed $\delta \in (0, 1)$, there is a constant $C_\delta$ such that if the entries of the map $\mathcal{A}$ (seen as a matrix with respect to the canonical bases in $M_N$ and $\mathbb{R}^m$) are independent Gaussians with mean zero and variance $1/m$, then with positive (even overwhelming) probability $\delta_s(\mathcal{A}) \le \delta$ holds provided that
$$(2.1) \qquad m \ge C_\delta\, sN.$$
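The Gaussian measurement model of Theorem 2.1 is easy to simulate. The following loose numerical sanity check (our own illustration, not a proof of the RIP) shows that such a map approximately preserves the $S_2$ (Frobenius) norm of a low-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, s = 10, 2000, 2
# measurement map A : M_N -> R^m as an m x N^2 matrix whose entries are
# independent Gaussians with mean zero and variance 1/m
A = rng.standard_normal((m, N * N)) / np.sqrt(m)
# a random rank-s matrix Z
Z = rng.standard_normal((N, s)) @ rng.standard_normal((s, N))
ratio = np.linalg.norm(A @ Z.ravel()) / np.linalg.norm(Z)
# concentration: the ratio should be close to 1 (very loose bracket)
assert 0.5 < ratio < 1.5
```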
3. A noncommutative Kashin-Temlyakov theorem

We will prove a matrix version of the Kashin-Temlyakov characterization of the stability of sparse recovery via $\ell_1$-norm minimization in terms of widths. To this end, we define three properties modeled after the ones studied in [20].

Definition 3.1.
Let $N^2 > m$ and $\mathcal{A} : M_N \to \mathbb{R}^m$ a linear operator. We say that $\mathcal{A}$ has the:
(a) Matrix Strong Compressed Sensing Property (MSCSP) if for any $X \in M_N$ we have $\|X - \Delta_1(\mathcal{A}X)\|_{S_2} \le Cs^{-1/2}\rho_s(X)_{S_1}$ for $s \asymp m/N$.
(b) Matrix Weak Compressed Sensing Property (MWCSP) if for any $X \in M_N$ we have $\|X - \Delta_1(\mathcal{A}X)\|_{S_2} \le Cs^{-1/2}\|X\|_{S_1}$ for $s \asymp m/N$.
(c) Matrix Width Property (MWP) if for any $X \in \ker(\mathcal{A})$,
$$(3.1) \qquad \|X\|_{S_2} \le C\,(N/m)^{1/2}\,\|X\|_{S_1}.$$
Notice that the MSCSP is a weakening of condition (i) in [29, Lemma 8], since we are only considering $X' = \Delta_1(\mathcal{A}X)$. Also, the name of the MWP comes from its clear relationship to the definition of the Gelfand numbers/widths. The following theorem is a matrix version of the Kashin-Temlyakov theorem [20, Thm. 2.2]:

Theorem 3.2.
For a linear operator $\mathcal{A} : M_N \to \mathbb{R}^m$, the MSCSP, MWCSP and MWP are equivalent (up to a change in the constants).

Proof. The MSCSP trivially implies the MWCSP, since $\rho_s(X)_{S_1} \le \|X - 0\|_{S_1} = \|X\|_{S_1}$. Assume that $\mathcal{A}$ has the MWCSP. Given $X \in \ker(\mathcal{A})$, note that $\mathcal{A}X = 0 = \mathcal{A}0$, so clearly $0 = \Delta_1(0) = \Delta_1(\mathcal{A}X)$, and thus from the MWCSP we have
$$\|X\|_{S_2} \le Cs^{-1/2}\|X\|_{S_1};$$
since $s \asymp m/N$, this gives the MWP. Assume now that we have the MWP, that is, that inequality (3.1) holds. If $s < C^{-2}m/N$, from [29, Thm. 2] (which is a matrix version of [20, Thm. 2.1]) we obtain
$$\|X - \Delta_1(\mathcal{A}X)\|_{S_1} \le C'\rho_s(X)_{S_1}$$
for $C' = 2\big(1 - C\sqrt{sN/m}\big)^{-1}$. Since $X - \Delta_1(\mathcal{A}X) \in \ker\mathcal{A}$, the previous inequality together with (3.1) implies the MSCSP. □
The aforementioned Kashin-Temlyakov theorem says, in a nutshell, that the stability of sparse-vector recovery via $\ell_1$-minimization has limits imposed by the geometry of Banach spaces, encoded in the appropriate Gelfand widths. In the previous proposition, we showed a similar relationship relating the stability of low-rank recovery via nuclear norm minimization with some other Gelfand widths. As in the vector case, following [17, Cor. 10.6], there is a relationship between the geometry of $S_1^N$ and the stability of compressed sensing by any method. See Theorem 5.5 below for the precise statement.

4. Stability of low-rank matrix recovery through Schatten $p$-quasi-norm minimization

In this technical section we prove a general result (a matrix version of the main theorem in [15]) that gives RIP-style conditions on the measuring map $\mathcal{A}$ that guarantee the stability of the Schatten $p$-norm minimization scheme. For that we will need some notation: let $\alpha_s, \beta_s > 0$ be constants such that
$$\alpha_s\|Z\|_{S_2} \le \|\mathcal{A}Z\|_{\ell_2} \le \beta_s\|Z\|_{S_2} \qquad \text{whenever } \operatorname{rank}(Z) \le s.$$
The results will be stated in terms of a quantity invariant under the change $\mathcal{A} \leftarrow c\mathcal{A}$, namely
$$\gamma_s := \frac{\beta_s}{\alpha_s} \ge 1.$$
Note that this constant is related to the RIP constant; in fact
$$\gamma_s^2 = \frac{1+\delta_s}{1-\delta_s}.$$
Unlike in the rest of the paper, we will consider the more general situation of approximate recovery when the measurements are moderately flawed, namely the problem
$$(P_{p,\theta}) \qquad \text{minimize } \|Z\|_{S_p} \quad \text{subject to} \quad \|\mathcal{A}Z - y\|_{\ell_2} \le \beta_s\cdot\theta.$$
For simplicity, we will write $(P_p)$ instead of $(P_{p,0})$. Note that by a compactness argument, a solution of $(P_{p,\theta})$ exists for any $0 < p \le 1$ and $\theta \ge 0$. The following theorem is a matrix version of [15, Thm. 3.1]. It gives conditions (in the spirit of the RIP) that guarantee not only the stability but also the robustness (that is, resistance to errors in the measurements) of Schatten $p$-quasi-norm minimization for low-rank matrix recovery.

Theorem 4.1.
Given $0 < p \le 1$, suppose that for some integer $t \ge s$,
$$(4.1) \qquad \gamma_{2t}^2 - 1 < 4(\sqrt{2} - 1)\Big(\frac{t}{s}\Big)^{1/p - 1/2}.$$
Then a solution $X^*$ of $(P_{p,\theta})$ approximates any matrix $X$ satisfying $\|\mathcal{A}X - y\|_{\ell_2} \le \beta_s\theta$ with errors
$$(4.2) \qquad \|X - X^*\|_{S_p} \le C_1\,\rho_s(X)_{S_p} + D_1\, s^{1/p-1/2}\,\theta,$$
$$(4.3) \qquad \|X - X^*\|_{S_2} \le C_2\,\frac{\rho_s(X)_{S_p}}{t^{1/p-1/2}} + D_2\,\theta,$$
where the constants $C_1$, $C_2$, $D_1$ and $D_2$ depend only on $p$, $\gamma_{2t}$ and the ratio $s/t$.
Proof.
We will need to recall some properties of the $S_p$-quasi-norm. Namely, for any matrices $U$ and $V$,
$$(4.4) \qquad \|U\|_{S_2} \le \|U\|_{S_p}, \qquad \|U\|_{S_p} \le N^{1/p-1/2}\|U\|_{S_2}, \qquad \|U+V\|_{S_p}^p \le \|U\|_{S_p}^p + \|V\|_{S_p}^p.$$

STEP 1: Consequence of the assumption on $\gamma_{2t}$.

We will consider certain matrix decompositions similar to the ones in [21]. Consider the singular value decomposition of $X$, given by
$$X = U \operatorname{diag}(\lambda_i(X))\, V^T,$$
where $U$, $V$ are unitary matrices and $\lambda(X) = (\lambda_1(X), \ldots, \lambda_N(X))$ are the singular values of $X$ arranged in decreasing order. For any matrix $Z \in M_N$, we will consider a block decomposition of $Z$ with respect to $X$ as follows: let $U^T Z V$ have the block form
$$U^T Z V = \begin{pmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{pmatrix},$$
where $Z_{11}$, $Z_{12}$, $Z_{21}$, $Z_{22}$ are of sizes $s \times s$, $s \times (N-s)$, $(N-s) \times s$, $(N-s) \times (N-s)$, respectively. We then decompose $Z$ as $Z = Z_{(s)} + Z_{c(s)}$, where
$$Z_{(s)} = U\begin{pmatrix} Z_{11} & Z_{12} \\ Z_{21} & 0 \end{pmatrix}V^T, \qquad Z_{c(s)} = U\begin{pmatrix} 0 & 0 \\ 0 & Z_{22} \end{pmatrix}V^T.$$
Furthermore, we now consider the singular value decomposition of $Z_{22}$, given by $Z_{22} = P\operatorname{diag}(\lambda(Z_{22}))Q^T$, with $P$ and $Q$ being $(N-s) \times (N-s)$ unitary matrices and $\lambda(Z_{22})$ the vector of the $N-s$ singular values of $Z_{22}$ arranged in decreasing order. We decompose $\lambda(Z_{22})$ as a sum of vectors $\lambda_{T_i}(Z_{22})$, each of sparsity at most $t$, where $T_1$ corresponds to the locations of the $t$ largest entries of $\lambda(Z_{22})$, $T_2$ to the locations of the next $t$ largest entries, and so on. For $i \ge 1$ we set
$$Z_{T_i} = U\begin{pmatrix} 0 & 0 \\ 0 & P\operatorname{diag}(\lambda_{T_i}(Z_{22}))Q^T \end{pmatrix}V^T,$$
and denote $Z_{T_0} := Z_{(s)}$. We first observe that
$$\|Z_{T_0}\|_{S_2}^2 + \|Z_{T_1}\|_{S_2}^2 = \|Z_{T_0}+Z_{T_1}\|_{S_2}^2 \le \frac{1}{\alpha_{2t}^2}\|\mathcal{A}(Z_{T_0}+Z_{T_1})\|_{\ell_2}^2,$$
and, since $Z = \sum_{k \ge 0} Z_{T_k}$,
$$(4.5) \qquad \|\mathcal{A}(Z_{T_0}+Z_{T_1})\|_{\ell_2}^2 = \big\langle \mathcal{A}Z, \mathcal{A}(Z_{T_0}+Z_{T_1})\big\rangle + \sum_{k\ge2}\big[\big\langle \mathcal{A}(-Z_{T_k}), \mathcal{A}Z_{T_0}\big\rangle + \big\langle \mathcal{A}(-Z_{T_k}), \mathcal{A}Z_{T_1}\big\rangle\big].$$
Let us renormalize the matrices $-Z_{T_k}$ and $Z_{T_0}$ so that their $S_2$-norms equal one, by setting $Y_k := -Z_{T_k}/\|Z_{T_k}\|_{S_2}$ and $Y := Z_{T_0}/\|Z_{T_0}\|_{S_2}$. We then obtain, using the polarization identity and the orthogonality $\|Y_k \pm Y\|_{S_2}^2 = 2$,
$$\frac{\big\langle \mathcal{A}(-Z_{T_k}), \mathcal{A}Z_{T_0}\big\rangle}{\|Z_{T_k}\|_{S_2}\|Z_{T_0}\|_{S_2}} = \langle \mathcal{A}Y_k, \mathcal{A}Y\rangle = \frac{1}{4}\big[\|\mathcal{A}(Y_k+Y)\|_{\ell_2}^2 - \|\mathcal{A}(Y_k-Y)\|_{\ell_2}^2\big] \le \frac{1}{4}\big[\beta_{2t}^2\|Y_k+Y\|_{S_2}^2 - \alpha_{2t}^2\|Y_k-Y\|_{S_2}^2\big] = \frac{1}{2}\big[\beta_{2t}^2 - \alpha_{2t}^2\big].$$
An analogous argument with $T_1$ in place of $T_0$ allows us to conclude
$$(4.6) \qquad \big\langle \mathcal{A}(-Z_{T_k}), \mathcal{A}Z_{T_0}\big\rangle + \big\langle \mathcal{A}(-Z_{T_k}), \mathcal{A}Z_{T_1}\big\rangle \le \frac{\beta_{2t}^2 - \alpha_{2t}^2}{2}\|Z_{T_k}\|_{S_2}\big[\|Z_{T_0}\|_{S_2} + \|Z_{T_1}\|_{S_2}\big].$$
On the other hand, we have
$$(4.7) \qquad \big\langle \mathcal{A}Z, \mathcal{A}(Z_{T_0}+Z_{T_1})\big\rangle \le \|\mathcal{A}Z\|_{\ell_2}\cdot\|\mathcal{A}(Z_{T_0}+Z_{T_1})\|_{\ell_2} \le \|\mathcal{A}Z\|_{\ell_2}\cdot\beta_{2t}\big[\|Z_{T_0}\|_{S_2} + \|Z_{T_1}\|_{S_2}\big].$$
Substituting the inequalities (4.6) and (4.7) into (4.5), we have
$$\|Z_{T_0}\|_{S_2}^2 + \|Z_{T_1}\|_{S_2}^2 \le \Big[\frac{\gamma_{2t}^2}{\beta_{2t}}\|\mathcal{A}Z\|_{\ell_2} + \frac{\gamma_{2t}^2-1}{2}\sum_{k\ge2}\|Z_{T_k}\|_{S_2}\Big]\big[\|Z_{T_0}\|_{S_2} + \|Z_{T_1}\|_{S_2}\big].$$
If we set $c := \|\mathcal{A}Z\|_{\ell_2}\cdot\gamma_{2t}^2/\beta_{2t}$, $d := (\gamma_{2t}^2-1)/2$ and $\Sigma := \sum_{k\ge2}\|Z_{T_k}\|_{S_2}$, the previous inequality is
$$\|Z_{T_0}\|_{S_2}^2 - (c+d\Sigma)\|Z_{T_0}\|_{S_2} + \|Z_{T_1}\|_{S_2}^2 - (c+d\Sigma)\|Z_{T_1}\|_{S_2} \le 0,$$
or equivalently,
$$\Big[\|Z_{T_0}\|_{S_2} - \frac{c+d\Sigma}{2}\Big]^2 + \Big[\|Z_{T_1}\|_{S_2} - \frac{c+d\Sigma}{2}\Big]^2 \le \frac{(c+d\Sigma)^2}{2}.$$
Getting rid of the second squared term, this easily implies
$$(4.8) \qquad \|Z_{T_0}\|_{S_2} \le \frac{c+d\Sigma}{2} + \frac{c+d\Sigma}{\sqrt{2}} = \frac{1+\sqrt{2}}{2}(c+d\Sigma).$$
By Hölder's inequality (see (4.4)) we get
$$(4.9) \qquad \|Z_{T_0}\|_{S_p} \le s^{1/p-1/2}\|Z_{T_0}\|_{S_2} \le s^{1/p-1/2}\,\frac{1+\sqrt{2}}{2}(c+d\Sigma).$$
We now proceed to bound $\Sigma$. For $k \ge 2$, let $\eta$, $\eta'$ be singular values of $Z_{T_k}$, $Z_{T_{k-1}}$, respectively. By definition, we must have $\eta \le \eta'$. Raising to the $p$-th power and averaging over all singular values of $Z_{T_{k-1}}$,
$$\eta^p \le t^{-1}\|Z_{T_{k-1}}\|_{S_p}^p, \quad \text{and hence} \quad \eta \le t^{-1/p}\|Z_{T_{k-1}}\|_{S_p}.$$
Adding over all singular values of $Z_{T_k}$ and taking the square root, this yields
$$\|Z_{T_k}\|_{S_2} \le t^{1/2-1/p}\|Z_{T_{k-1}}\|_{S_p}.$$
Therefore,
$$\Sigma = \sum_{k\ge2}\|Z_{T_k}\|_{S_2} \le t^{1/2-1/p}\sum_{k\ge1}\|Z_{T_k}\|_{S_p} \le t^{1/2-1/p}\Big[\sum_{k\ge1}\|Z_{T_k}\|_{S_p}^p\Big]^{1/p} = t^{1/2-1/p}\,\|Z_{c(s)}\|_{S_p}.$$
Combining the above inequality with (4.9), we obtain
$$(4.10) \qquad \|Z_{(s)}\|_{S_p} \le \frac{\lambda}{\beta_{2t}}\cdot\|\mathcal{A}Z\|_{\ell_2}\cdot s^{1/p-1/2} + \mu\cdot\|Z_{c(s)}\|_{S_p},$$
where the constants $\lambda$ and $\mu$ are given by
$$\lambda := \frac{(1+\sqrt{2})\,\gamma_{2t}^2}{2} \qquad \text{and} \qquad \mu := \frac{1}{4}(1+\sqrt{2})\big(\gamma_{2t}^2-1\big)\Big(\frac{s}{t}\Big)^{1/p-1/2}.$$
Note that the assumption (4.1) on $\gamma_{2t}$ translates precisely into the inequality $\mu < 1$.
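The block decomposition $Z = Z_{(s)} + Z_{c(s)}$ used above can be made concrete. A minimal sketch (function names ours), assuming the decomposition is taken with respect to the SVD of $X$:

```python
import numpy as np

def block_decomposition(X, Z, s):
    """Split Z = Z_(s) + Z_c(s) with respect to the SVD X = U diag(.) V^T:
    Z_c(s) keeps only the lower-right (N-s) x (N-s) block of U^T Z V."""
    U, _, Vt = np.linalg.svd(X)
    V = Vt.T
    W = U.T @ Z @ V
    W_c = np.zeros_like(W)
    W_c[s:, s:] = W[s:, s:]
    Z_c = U @ W_c @ V.T
    return Z - Z_c, Z_c  # (Z_(s), Z_c(s))

rng = np.random.default_rng(2)
N, s = 6, 2
X = rng.standard_normal((N, s)) @ rng.standard_normal((s, N))
Z = rng.standard_normal((N, N))
Z_s, Z_c = block_decomposition(X, Z, s)
assert np.allclose(Z_s + Z_c, Z)
# Z_(s) lives on the first s rows and first s columns (in the rotated
# basis), so its rank is at most 2s
assert np.linalg.matrix_rank(Z_s) <= 2 * s
```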
STEP 2: From now on let $Z := X - X^*$.

Because $X^*$ is a minimizer of $(P_{p,\theta})$ and $X$ is feasible for it, we have
$$(4.11) \qquad \|X^*\|_{S_p}^p \le \|X\|_{S_p}^p.$$
From [21, Lemma 2.2], whenever $B, C \in M_N$ satisfy $B^T C = 0$ and $BC^T = 0$, one has
$$\|B+C\|_{S_p}^p = \|B\|_{S_p}^p + \|C\|_{S_p}^p.$$
In particular, note that
$$(4.12) \qquad \|X\|_{S_p}^p = \|X_{(s)}\|_{S_p}^p + \|X_{c(s)}\|_{S_p}^p \qquad \text{and} \qquad \|X_{(s)} - Z_{c(s)}\|_{S_p}^p = \|X_{(s)}\|_{S_p}^p + \|Z_{c(s)}\|_{S_p}^p.$$
From the $p$-triangle inequality (see (4.4)), since $X_{(s)} - Z_{c(s)} = X - Z - X_{c(s)} + Z_{(s)} = X^* - X_{c(s)} + Z_{(s)}$, we get
$$\|X_{(s)} - Z_{c(s)}\|_{S_p}^p \le \|X^*\|_{S_p}^p + \|X_{c(s)}\|_{S_p}^p + \|Z_{(s)}\|_{S_p}^p.$$
Together with (4.11) and both equalities in (4.12), this yields
$$\|X_{(s)}\|_{S_p}^p + \|Z_{c(s)}\|_{S_p}^p \le \|X_{(s)}\|_{S_p}^p + \|X_{c(s)}\|_{S_p}^p + \|X_{c(s)}\|_{S_p}^p + \|Z_{(s)}\|_{S_p}^p.$$
After a cancellation, and noticing that $\|X_{c(s)}\|_{S_p}^p = \rho_s(X)_{S_p}^p$, we obtain
$$(4.13) \qquad \|Z_{c(s)}\|_{S_p}^p \le 2\rho_s(X)_{S_p}^p + \|Z_{(s)}\|_{S_p}^p.$$
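The orthogonality lemma invoked above (for matrices $B$, $C$ with $B^TC = 0$ and $BC^T = 0$ the Schatten $p$-quasi-norm is $p$-additive) can be checked on a small example; an illustrative sketch of ours, not from [21]:

```python
import numpy as np

def schatten_p_pow(X, p):
    """||X||_{S_p}^p: sum of p-th powers of the singular values of X."""
    return float(np.sum(np.linalg.svd(X, compute_uv=False) ** p))

# B and C supported on complementary diagonal blocks, so B^T C = 0, B C^T = 0
B = np.zeros((4, 4)); B[:2, :2] = np.array([[1.0, 2.0], [0.0, 3.0]])
C = np.zeros((4, 4)); C[2:, 2:] = np.array([[4.0, 0.0], [1.0, 1.0]])
assert np.allclose(B.T @ C, 0) and np.allclose(B @ C.T, 0)
p = 0.5
assert np.isclose(schatten_p_pow(B + C, p),
                  schatten_p_pow(B, p) + schatten_p_pow(C, p))
```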
STEP 3: Error estimates.

We first note the bound
$$\|\mathcal{A}Z\|_{\ell_2} = \|\mathcal{A}X - \mathcal{A}X^*\|_{\ell_2} \le \|\mathcal{A}X - y\|_{\ell_2} + \|y - \mathcal{A}X^*\|_{\ell_2} \le 2\beta_s\cdot\theta \le 2\beta_{2t}\cdot\theta.$$
For the $S_p$-error, we combine the estimates in (4.10) and (4.13) to obtain
$$\|Z_{c(s)}\|_{S_p}^p \le 2\rho_s(X)_{S_p}^p + (2\lambda)^p\cdot s^{1-p/2}\cdot\theta^p + \mu^p\cdot\|Z_{c(s)}\|_{S_p}^p.$$
As a consequence of $\mu < 1$, we have
$$\|Z_{c(s)}\|_{S_p}^p \le \frac{2}{1-\mu^p}\,\rho_s(X)_{S_p}^p + \frac{(2\lambda)^p}{1-\mu^p}\cdot s^{1-p/2}\cdot\theta^p.$$
Using the estimate (4.10) once again, we can derive
$$\|Z\|_{S_p}^p \le \|Z_{(s)}\|_{S_p}^p + \|Z_{c(s)}\|_{S_p}^p \le (1+\mu^p)\cdot\|Z_{c(s)}\|_{S_p}^p + (2\lambda)^p\cdot s^{1-p/2}\cdot\theta^p \le \frac{2(1+\mu^p)}{1-\mu^p}\,\rho_s(X)_{S_p}^p + \frac{2(2\lambda)^p}{1-\mu^p}\cdot s^{1-p/2}\cdot\theta^p,$$
and hence, using the inequality $(a^p + b^p)^{1/p} \le 2^{1/p-1}(a+b)$ for $a, b \ge 0$,
$$\|Z\|_{S_p} \le 2^{1/p-1}\Big[\frac{2^{1/p}(1+\mu^p)^{1/p}}{(1-\mu^p)^{1/p}}\,\rho_s(X)_{S_p} + \frac{2^{1/p}\cdot2\lambda}{(1-\mu^p)^{1/p}}\cdot s^{1/p-1/2}\cdot\theta\Big].$$
The desired estimate (4.2) follows with
$$C_1 := 2^{2/p-1}\,\frac{(1+\mu^p)^{1/p}}{(1-\mu^p)^{1/p}}, \qquad D_1 := 2^{2/p}\,\frac{\lambda}{(1-\mu^p)^{1/p}}.$$
For the $S_2$-error, let us observe that the bound in (4.8) also holds if we replace $\|Z_{T_0}\|_{S_2}$ by $\|Z_{T_1}\|_{S_2}$, and hence
$$\|Z\|_{S_2} = \Big[\sum_{k\ge0}\|Z_{T_k}\|_{S_2}^2\Big]^{1/2} \le \sum_{k\ge0}\|Z_{T_k}\|_{S_2} \le (1+\sqrt{2})\cdot(c+d\Sigma) + \Sigma \le \nu\cdot\Sigma + 4\lambda\cdot\theta,$$
where $\nu := (1+\sqrt{2})d + 1 = \lambda + \frac{1-\sqrt{2}}{2}$, and we used $(1+\sqrt{2})c = 2\lambda\,\|\mathcal{A}Z\|_{\ell_2}/\beta_{2t} \le 4\lambda\theta$. We also have that
$$\Sigma \le t^{1/2-1/p}\,\|Z_{c(s)}\|_{S_p} \le t^{1/2-1/p}\Big[\frac{2}{1-\mu^p}\,\rho_s(X)_{S_p}^p + \frac{(2\lambda)^p}{1-\mu^p}\cdot s^{1-p/2}\cdot\theta^p\Big]^{1/p} \le t^{1/2-1/p}\,2^{1/p-1}\Big[\frac{2^{1/p}}{(1-\mu^p)^{1/p}}\,\rho_s(X)_{S_p} + \frac{2\lambda}{(1-\mu^p)^{1/p}}\cdot s^{1/p-1/2}\cdot\theta\Big],$$
and hence, since $s \le t$, we conclude that
$$\|Z\|_{S_2} \le 2^{2/p-1}\,\frac{\nu}{(1-\mu^p)^{1/p}}\cdot\frac{\rho_s(X)_{S_p}}{t^{1/p-1/2}} + \Big[2^{1/p}\,\frac{\lambda\nu}{(1-\mu^p)^{1/p}} + 4\lambda\Big]\theta.$$
This gives the estimate (4.3) with
$$C_2 := 2^{2/p-1}\,\frac{\lambda + (1-\sqrt{2})/2}{(1-\mu^p)^{1/p}}, \qquad D_2 := 2^{1/p}\,\frac{\lambda\big(\lambda + (1-\sqrt{2})/2\big)}{(1-\mu^p)^{1/p}} + 4\lambda. \qquad \square$$

As consequences of Theorem 4.1, we obtain two corollaries that are matrix versions of the ones in [15]. The first one corresponds to the case of exact recovery.
Corollary 4.2.
Given $0 < p \le 1$, if
$$\gamma_{2t}^2 - 1 < 4(\sqrt{2}-1)\Big(\frac{t}{s}\Big)^{1/p-1/2}$$
for some integer $t \ge s$, then every rank-$s$ matrix is exactly and stably recovered by solving $(P_p)$.
TABILITY OF LOW-RANK MATRIX RECOVERY 11
Corollary 4.3.
Under the assumption that $\gamma_{2s} < \sqrt{4\sqrt{2}-3} \approx 1.63$, every rank-$s$ matrix is exactly and stably recovered by solving $(P_1)$.

This last Corollary is clearly related to existing results on the RIP: since $\gamma_{2s}^2 = (1+\delta_{2s})/(1-\delta_{2s})$, i.e. $\delta_{2s} = (\gamma_{2s}^2-1)/(\gamma_{2s}^2+1)$, it corresponds to the condition $\delta_{2s} < 2(3-\sqrt{2})/7 \approx 0.4531$. It is known that for $p = 1$ this condition is not the best possible: a very recent result of Cai and Zhang [2] shows that the optimal condition to have exact recovery of rank-$s$ matrices via nuclear norm minimization is in fact $\delta_{2s} < 1/\sqrt{2} \approx 0.7071$. Note that, by Theorem 2.1, random constructions provide maps $\mathcal{A}$ satisfying the hypothesis of Theorem 4.1.

5. The Gelfand widths of $S_p$-balls for $0 < p \le 1$

In this section we calculate the Gelfand numbers $c_m(id : S_p^N \to S_q^N)$ for $0 < p \le 1$ and $p < q \le 2$. This can be considered as a noncommutative version of the results from [16], where compressed sensing ideas are used to calculate the corresponding Gelfand numbers $c_m(id : \ell_p^N \to \ell_q^N)$. Inspired by their approach, our proof is based on low-rank matrix recovery ideas. Our main result is the following (compare to [16, Thm. 1.1]).
Theorem 5.1.
For $0 < p \le 1$ and $p < q \le 2$, if $1 \le m < N^2$, then
$$d^m(B_{S_p^N}, S_q^N) \asymp_{p,q} \min\Big\{1, \frac{N}{m}\Big\}^{1/p-1/q}$$
and, if $p < 1$,
$$d^m(B_{S_{p,\infty}^N}, S_q^N) \asymp_{p,q} \min\Big\{1, \frac{N}{m}\Big\}^{1/p-1/q}.$$

Before the proof, let us go through some preliminaries. Recall that it is classical to show that, for $q > p$,
$$(5.1) \qquad \rho_s(X)_{S_q} \le s^{-(1/p-1/q)}\,\|X\|_{S_p},$$
$$(5.2) \qquad \rho_s(X)_{S_q} \le D_{p,q}\, s^{-(1/p-1/q)}\,\|X\|_{S_{p,\infty}}, \qquad D_{p,q} := (q/p-1)^{-1/q}.$$
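Inequality (5.1) and the spectral truncation attaining $\rho_s$ can be sanity-checked numerically; a small sketch of ours (function names hypothetical):

```python
import numpy as np

def rho(X, s, q):
    """Best rank-s approximation error in S_q: the S_q norm of the tail
    singular values, attained at the s-spectral truncation X_[s]."""
    sigma = np.linalg.svd(X, compute_uv=False)  # already sorted decreasingly
    return float(np.sum(sigma[s:] ** q) ** (1.0 / q))

def check_tail_bound(X, s, p, q):
    """Check rho_s(X)_{S_q} <= s^{-(1/p - 1/q)} ||X||_{S_p}, i.e. (5.1)."""
    sigma = np.linalg.svd(X, compute_uv=False)
    rhs = s ** (-(1.0 / p - 1.0 / q)) * np.sum(sigma ** p) ** (1.0 / p)
    return rho(X, s, q) <= rhs + 1e-12

# Eckart-Young example: X = diag(5, 3, 1), rho_1(X)_{S_2} = sqrt(3^2 + 1^2)
X = np.diag([5.0, 3.0, 1.0])
assert np.isclose(rho(X, 1, 2), np.sqrt(10.0))

rng = np.random.default_rng(3)
Y = rng.standard_normal((8, 8))
assert check_tail_bound(Y, s=2, p=1.0, q=2.0)
assert check_tail_bound(Y, s=3, p=0.5, q=2.0)
```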
5.1. Lower bounds. In this subsection we prove a result that will easily imply the desired lower bounds in Theorem 5.1. It is a matrix version of [16, Thm. 2.1] and, just like in their result, we note that the restriction $q \le 2$ is not needed here.

Proposition 5.2.
For $0 < p \le 1$ and $p < q \le \infty$, there exists a constant $c_{p,q} > 0$ such that
$$d^m(B_{S_p^N}, S_q^N) \ge c_{p,q}\,\min\Big\{1, \frac{N}{m}\Big\}^{1/p-1/q}.$$
With $c := (1/2)^{2/p-1/q}$ and $\mu := \min\{1, N/(2m)\}$, we are going to prove that $d^m(B_{S_p^N}, S_q^N) \ge c\mu^{1/p-1/q}$. We proceed by contradiction, assuming that $d^m(B_{S_p^N}, S_q^N) < c\mu^{1/p-1/q}$. This implies the existence of a linear map $\mathcal{A} : M_N \to \mathbb{R}^m$ such that for all $V \in \ker(\mathcal{A}) \setminus \{0\}$,
$$\|V\|_{S_q} < c\mu^{1/p-1/q}\|V\|_{S_p}.$$
For a fixed $V \in \ker(\mathcal{A}) \setminus \{0\}$, in view of the inequalities $\|V\|_{S_p} \le N^{1/p-1/q}\|V\|_{S_q}$ and $c \le (1/2)^{1/p-1/q}$, we derive $1 < (\mu N/2)^{1/p-1/q}$, so $1 \le 1/\mu < N/2$. We then define $s := \lfloor 1/\mu \rfloor \ge 1$, so $2s < N$ and $\frac{1}{2\mu} < s \le \frac{1}{\mu}$. Now for $V \in \ker(\mathcal{A}) \setminus \{0\}$,
$$\|V_{[2s]}\|_{S_p} \le (2s)^{1/p-1/q}\|V_{[2s]}\|_{S_q} \le (2s)^{1/p-1/q}\|V\|_{S_q} < c\,(2s\mu)^{1/p-1/q}\|V\|_{S_p} \le 2^{-1/p}\|V\|_{S_p},$$
and therefore, using that $\|V\|_{S_p}^p = \|V_{[2s]}\|_{S_p}^p + \|V - V_{[2s]}\|_{S_p}^p$, we conclude
$$\|V_{[2s]}\|_{S_p}^p \le \|V - V_{[2s]}\|_{S_p}^p.$$
This means that $\mathcal{A}$ satisfies the sufficient condition in [29, Thm. 3], which implies that Schatten $p$-quasi-norm minimization gives exact recovery of rank-$s$ matrices. By well-known arguments (see, for example, the discussion after the statement of Theorem 2.3 in [3]), this gives
$$m \ge Ns > \frac{N}{2\mu} \ge m,$$
a blatant contradiction. □

5.2. Upper bounds.
In this subsection we establish a result from which the desired upper bounds in Theorem 5.1 will follow easily. The proof relies on low-rank matrix recovery methods, and the reader will notice similarities with the proof of Theorem 4.1. It should be mentioned that, for $0 < p < 1$, the reconstruction map realizing the bound for $E_m(B_{S_{p,\infty}^N}, S_q^N)$ can be chosen to be the $S_1$-minimization mapping, at least when $q \ge 1$.

Theorem 5.3.
For $0 < p < 1$ and $p < q \le 2$, there exists a linear map $\mathcal{A} : M_N \to \mathbb{R}^m$ such that, with $r = \min\{1, q\}$,
$$\sup_{X \in B_{S_{p,\infty}^N}} \|X - \Delta_r(\mathcal{A}X)\|_{S_q} \le C_{p,q}\,\min\Big\{1, \frac{N}{m}\Big\}^{1/p-1/q},$$
where $C_{p,q} > 0$ is a constant that depends only on $p$ and $q$.
Proof.
Let $C_1$ be the constant in (2.1) relative to the RIP associated with $\delta = 1/3$, say.

Case 1: $m \ge 2C_1 N$. We define $s := \lfloor m/(2C_1 N) \rfloor \ge 1$, so that
$$(5.3) \qquad \frac{m}{4C_1 N} < s \le \frac{m}{2C_1 N}.$$
Let $t = 2s$. It is then possible to find a linear map $\mathcal{A} : M_N \to \mathbb{R}^m$ with $\delta_t(\mathcal{A}) \le \delta$; in particular, we also have $\delta_s(\mathcal{A}) \le \delta$. Now, given $Z := X - \Delta_r(\mathcal{A}X) \in \ker\mathcal{A}$, we decompose $Z$ into matrices $Z_{T_1}, Z_{T_2}, Z_{T_3}, \ldots$ of rank at most $s$ by taking the $s$ largest singular values of $Z$ for $Z_{T_1}$, then the next $s$ largest ones for $Z_{T_2}$, and so on. This easily implies $\big(\|Z_{T_k}\|_{S_2}^2/s\big)^{1/2} \le \big(\|Z_{T_{k-1}}\|_{S_r}^r/s\big)^{1/r}$, i.e.,
$$(5.4) \qquad \|Z_{T_k}\|_{S_2} \le s^{-(1/r-1/2)}\,\|Z_{T_{k-1}}\|_{S_r}, \qquad k \ge 2.$$
Using the $r$-triangle inequality, we have
$$\|Z\|_{S_q}^r = \Big\|\sum_{k\ge1} Z_{T_k}\Big\|_{S_q}^r \le \sum_{k\ge1}\|Z_{T_k}\|_{S_q}^r \le \sum_{k\ge1}\big(s^{1/q-1/2}\|Z_{T_k}\|_{S_2}\big)^r \le \sum_{k\ge1}\Big(\frac{s^{1/q-1/2}}{\sqrt{1-\delta}}\,\|\mathcal{A}Z_{T_k}\|_{\ell_2}\Big)^r.$$
The fact that $Z \in \ker\mathcal{A}$ implies $\mathcal{A}Z_{T_1} = -\sum_{k\ge2}\mathcal{A}Z_{T_k}$. It follows that
$$\|Z\|_{S_q}^r \le \Big(\frac{s^{1/q-1/2}}{\sqrt{1-\delta}}\Big)^r\Big(\sum_{k\ge2}\|\mathcal{A}Z_{T_k}\|_{\ell_2}\Big)^r + \Big(\frac{s^{1/q-1/2}}{\sqrt{1-\delta}}\Big)^r\sum_{k\ge2}\|\mathcal{A}Z_{T_k}\|_{\ell_2}^r \le 2\Big(\frac{s^{1/q-1/2}}{\sqrt{1-\delta}}\Big)^r\sum_{k\ge2}\|\mathcal{A}Z_{T_k}\|_{\ell_2}^r \le 2\Big(\sqrt{\frac{1+\delta}{1-\delta}}\,s^{1/q-1/2}\Big)^r\sum_{k\ge2}\|Z_{T_k}\|_{S_2}^r.$$
We then derive, using the inequality (5.4),
$$\|Z\|_{S_q}^r \le 2\Big(\sqrt{\frac{1+\delta}{1-\delta}}\,s^{-(1/r-1/q)}\Big)^r\sum_{k\ge1}\|Z_{T_k}\|_{S_r}^r.$$
In view of the choice $\delta = 1/3$ (so that $\sqrt{(1+\delta)/(1-\delta)} = \sqrt{2}$) and of (5.3), this yields
$$(5.5) \qquad \|X - \Delta_r(\mathcal{A}X)\|_{S_q} \le 2^{1/r}\sqrt{2}\,\Big(\frac{4C_1 N}{m}\Big)^{1/r-1/q}\|X - \Delta_r(\mathcal{A}X)\|_{S_r}.$$
Moreover, in view of $\delta_s \le 1/3$, there is a constant $C_3 > 0$ such that
$$(5.6) \qquad \|X - \Delta_r(\mathcal{A}X)\|_{S_r} \le (C_3)^{1/r}\,\rho_s(X)_{S_r}.$$
Finally, using (5.2) and (5.3), we have, for $X \in B_{S_{p,\infty}^N}$,
$$(5.7) \qquad \rho_s(X)_{S_r} \le D_{p,r}\,s^{-(1/p-1/r)} \le D_{p,r}\Big(\frac{4C_1 N}{m}\Big)^{1/p-1/r}.$$
Putting (5.5), (5.6), and (5.7) together, we obtain, for any $X \in B_{S_{p,\infty}^N}$,
$$\|X - \Delta_r(\mathcal{A}X)\|_{S_q} \le 2^{1/r}\sqrt{2}\,(C_3)^{1/r}\,D_{p,r}\Big(\frac{4C_1 N}{m}\Big)^{1/p-1/q}.$$

Case 2: $m < 2C_1 N$. We simply choose the map $\mathcal{A}$ as the zero map, so that $\Delta_r(\mathcal{A}X) = 0$. Then, for any $X \in B_{S_{p,\infty}^N}$, we have
$$\|X - \Delta_r(\mathcal{A}X)\|_{S_q} = \|X\|_{S_q} \le D_{p,q}\,\|X\|_{S_{p,\infty}} \le D_{p,q}$$
for some constant $D_{p,q} > 0$, while $\min\{1, N/m\}$ is bounded below by $\min\{1, 1/(2C_1)\}$; the desired bound follows. □
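The rank-$s$ splitting of $Z$ used in the proof can be sketched concretely (function names ours): each piece takes $s$ consecutive singular values, the pieces sum back to $Z$, and each has rank at most $s$.

```python
import numpy as np

def rank_blocks(Z, s):
    """Split Z into Z_{T_1}, Z_{T_2}, ... of rank at most s, taking the s
    largest singular values for Z_{T_1}, the next s for Z_{T_2}, and so on."""
    U, sigma, Vt = np.linalg.svd(Z)
    blocks = []
    for k in range(0, len(sigma), s):
        sig_k = np.zeros_like(sigma)
        sig_k[k:k + s] = sigma[k:k + s]
        blocks.append(U @ np.diag(sig_k) @ Vt)
    return blocks

rng = np.random.default_rng(4)
Z = rng.standard_normal((6, 6))
blocks = rank_blocks(Z, 2)
assert np.allclose(sum(blocks), Z)
assert all(np.linalg.matrix_rank(B) <= 2 for B in blocks)
```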
When $p = 1$, the same proof but using inequality (5.1) instead of (5.2) gives the following: for $1 < q \le 2$, there exists a linear map $\mathcal{A} : M_N \to \mathbb{R}^m$ such that
$$\sup_{X \in B_{S_1^N}} \|X - \Delta_1(\mathcal{A}X)\|_{S_q} \le C_q\,\min\Big\{1, \frac{N}{m}\Big\}^{1-1/q},$$
where $C_q > 0$ is a constant depending only on $q$.

5.3. Proof of Theorem 5.1.
Proof.
First, an observation. As in the vector case, the simple inclusion $B_{S_p^N} \subseteq B_{S_{p,\infty}^N}$ implies
$$d^m(B_{S_p^N}, S_q^N) \le d^m(B_{S_{p,\infty}^N}, S_q^N),$$
hence it suffices to show lower bounds for $d^m(B_{S_p^N}, S_q^N)$ and upper bounds for $d^m(B_{S_{p,\infty}^N}, S_q^N)$. The lower bounds follow immediately from Proposition 5.2. When $0 < p < 1$, the upper bounds follow from Theorem 5.3. For $p = 1$, the upper bound when $1 \le m \le N$ follows from the trivial inequality $\|X\|_{S_q} \le \|X\|_{S_1}$, whereas when $N \le m \le N^2$ it follows from Remark 5.4. □

5.4. Relation to compressive widths.
As promised after the proof of our matrix version of the Kashin-Temlyakov theorem, the relationship between the Banach space geometry of the finite-dimensional Schatten $p$-classes and matrix recovery goes beyond the norm minimization scheme. Below we use the notation from [17, Sec. 10.1]: the quantities $E_m$ and $E_m^{\mathrm{ada}}$ measure the worst-case reconstruction errors of optimal measurement/reconstruction schemes in the nonadaptive and adaptive settings, respectively.

Theorem 5.5.
For $0 < p \le 1$ and $p < q \le 2$, if $1 \le m < N^2$, then the adaptive and nonadaptive compressive widths satisfy
$$E_m^{\mathrm{ada}}(B_{S_p^N}, S_q^N) \asymp_{p,q} E_m(B_{S_p^N}, S_q^N) \asymp_{p,q} \min\Big\{1, \frac{N}{m}\Big\}^{1/p-1/q}.$$

Proof. Since $-B_{S_p^N} = B_{S_p^N}$ and $B_{S_p^N} + B_{S_p^N} \subseteq 2^{1/p} B_{S_p^N}$, [17, Thm. 10.4] implies
$$d^m(B_{S_p^N}, S_q^N) \le E_m^{\mathrm{ada}}(B_{S_p^N}, S_q^N) \le E_m(B_{S_p^N}, S_q^N) \le 2^{1/p}\, d^m(B_{S_p^N}, S_q^N).$$
But now, since $d^m(B_{S_p^N}, S_q^N) = c_{m+1}(id : S_p^N \to S_q^N)$, an appeal to Theorem 5.1 finishes the proof. □

In the $\ell_p$ case the lower estimate is of particular importance in compressed sensing, since it allows one to prove lower bounds for the number of measurements required to stably recover $s$-sparse vectors in $\mathbb{R}^N$. In the matrix case, that is no longer so: the same argument only gives that (under certain conditions) the minimum number of measurements $m$ required to stably recover rank-$s$ matrices in $M_N$ is at least $CNs$, which is not an improvement over the information-theoretical limit. The reason behind this is that, unlike in the $\ell_p$ case, there are compressed sensing algorithms (including norm minimization) that give stability with a number of measurements of that order [3].

Acknowledgements
We would like to thank Rachel Ward for suggesting the reference [15], and also the Workshop in Analysis and Probability at Texas A&M University. The first author was partially supported by NSF grant DMS-1400588.
References

[1] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the restricted isometry property for random matrices. Constr. Approx., 28(3):253–263, 2008.
[2] T. Tony Cai and Anru Zhang. Sparse representation of a polytope and recovery of sparse signals and low-rank matrices. IEEE Trans. Inform. Theory, 60(1):122–132, 2014.
[3] Emmanuel J. Candès and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory, 57(4):2342–2359, 2011.
[4] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717–772, 2009.
[5] Emmanuel J. Candès and Terence Tao. Decoding by linear programming. IEEE Trans. Inform. Theory, 51(12):4203–4215, 2005.
[6] Emmanuel J. Candès and Terence Tao. Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, 2006.
[7] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inform. Theory, 56(5):2053–2080, 2010.
[8] Bernd Carl and Andreas Defant. Asymptotic estimates for approximation quantities of tensor product identities. J. Approx. Theory, 88(2):228–256, 1997.
[9] Albert Cohen, Wolfgang Dahmen, and Ronald DeVore. Compressed sensing and best k-term approximation. J. Amer. Math. Soc., 22(1):211–231, 2009.
[10] David L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.
[11] K. Dvijotham and Maryam Fazel. A nullspace analysis of the nuclear norm heuristic for rank minimization. In Proc. of ICASSP 2010, Dallas, TX, March 2010.
[12] David E. Edmunds and Jan Lang. Gelfand numbers and widths. J. Approx. Theory, 166:78–84, 2013.
[13] Maryam Fazel. Matrix rank minimization with applications. PhD thesis, Stanford University, 2002.
[14] Massimo Fornasier, Holger Rauhut, and Rachel Ward. Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM J. Optim., 21(4):1614–1640, 2011.
[15] Simon Foucart and Ming-Jun Lai. Sparsest solutions of underdetermined linear systems via ℓq-minimization for 0 < q ≤ 1. Appl. Comput. Harmon. Anal., 26(3):395–407, 2009.
[16] Simon Foucart, Alain Pajor, Holger Rauhut, and Tino Ullrich. The Gelfand widths of ℓp-balls for 0 < p ≤ 1. J. Complexity, 26(6):629–640, 2010.
[17] Simon Foucart and Holger Rauhut. A mathematical introduction to compressive sensing. Applied and Numerical Harmonic Analysis. Birkhäuser/Springer, New York, 2013.
[18] A. Yu. Garnaev and E. D. Gluskin. The widths of a Euclidean ball. Dokl. Akad. Nauk SSSR, 277(5):1048–1052, 1984.
[19] B. S. Kashin. The widths of certain finite-dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR Ser. Mat., 41(2):334–351, 478, 1977.
[20] B. S. Kashin and V. N. Temlyakov. A remark on the problem of compressed sensing. Mat. Zametki, 82(6):829–837, 2007.
[21] Lingchen Kong and Naihua Xiu. Exact low-rank matrix recovery via nonconvex Schatten p-minimization. Asia-Pacific Journal of Operational Research, pages 1340010-1–1340010-13, 2013.
[22] Hermann König. Eigenvalues of operators and applications. In Handbook of the geometry of Banach spaces, Vol. I, pages 941–974. North-Holland, Amsterdam, 2001.
[23] Kiryung Lee and Yoram Bresler. Guaranteed minimum rank approximation from linear observations by nuclear norm minimization with an ellipsoidal constraint. ArXiv preprint arXiv:0903.4742.
[24] Kiryung Lee and Yoram Bresler. ADMiRA: atomic decomposition for minimum rank approximation. IEEE Trans. Inform. Theory, 56(9):4402–4416, 2010.
[25] Lu Liu, Wei Huang, and Di-Rong Chen. Exact minimum rank approximation via Schatten p-norm minimization. J. Comput. Appl. Math., 267:218–227, 2014.
[26] Shahar Mendelson, Alain Pajor, and Nicole Tomczak-Jaegermann. Uniform uncertainty principle for Bernoulli and subgaussian ensembles. Constr. Approx., 28(3):277–289, 2008.
[27] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Comput., 24(2):227–234, 1995.
[28] Erich Novak. Optimal recovery and n-widths for convex classes of functions. J. Approx. Theory, 80(3):390–408, 1995.
[29] Samet Oymak, Karthik Mohan, Maryam Fazel, and Babak Hassibi. A simplified approach to recovery conditions for low rank matrices. In , pages 2318–2322, Piscataway, NJ, 2011. IEEE.
[30] Albrecht Pietsch. Eigenvalues and s-numbers, volume 13 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1987.
[31] Albrecht Pietsch. History of Banach spaces and linear operators. Birkhäuser Boston, Inc., Boston, MA, 2007.
[32] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev., 52(3):471–501, 2010.
[33] Benjamin Recht, Weiyu Xu, and Babak Hassibi. Null space conditions and thresholds for rank minimization. Math. Program., 127(1, Ser. B):175–202, 2011.
Department of Mathematics, University of Texas at Austin, 2515 Speedway Stop C1200,Austin, TX 78712-1202.
Current address: Instituto de Ciencias Matemáticas, CSIC-UAM-UC3M-UCM, C/ Nicolás Cabrera, n◦

E-mail address: [email protected]

Institute of Mathematics, Bulgarian Academy of Sciences, Sofia, Bulgaria.

Current address: Department of Mathematics, University of Illinois at Urbana-Champaign, 1409 W. Green Street, Urbana, IL 61801.

E-mail address: