A Geometric Approach to Low-Rank Matrix Completion
Wei Dai∗, Ely Kerman∗∗, Olgica Milenkovic∗
∗Department of Electrical and Computer Engineering, ∗∗Department of Mathematics
University of Illinois at Urbana-Champaign
Email: {weidai07,ekerman,milenkov}@illinois.edu
Abstract—The low-rank matrix completion problem can be succinctly stated as follows: given a subset of the entries of a matrix, find a low-rank matrix consistent with the observations. While several low-complexity algorithms for matrix completion have been proposed so far, it remains an open problem to devise search procedures with provable performance guarantees for a broad class of matrix models. The standard approach to the problem, which involves the minimization of an objective function defined using the Frobenius metric, has inherent difficulties: the objective function is not continuous and the solution set is not closed. To address this problem, we consider an optimization procedure that searches for a column (or row) space that is geometrically consistent with the partial observations. The geometric objective function is continuous everywhere and the solution set is the closure of the solution set of the Frobenius metric. We also preclude the existence of local minimizers, and hence establish strong performance guarantees, for special completion scenarios, which do not require matrix incoherence or large matrix size.
I. INTRODUCTION
In many practical applications of data acquisition, the signals of interest have a sparse representation in some basis. That is, they can be well approximated using only a few basis elements. This allows for efficient sampling and reconstruction of signals [1], [2], [3], [4], [5], [6]. More precisely, the number of linear measurements required to capture a sparse signal can be much smaller than the number of inherent dimensions of the signal, and various polynomial-time algorithms are known for accurately reconstructing the sparse signal based on these linear measurements. Due to the significant reduction in sampling resources and the modest requirements for computational resources, sparse signal processing has been studied intensively [1], [2], [3], [4], [5], [6].

There are two categories of sparse signals which frequently arise in applications. In the first category, the sparse signal can be modeled as a vector with only a small fraction of non-zero entries. Compressive sensing is the framework for sampling and recovering such signals. In the second category, the signals are represented by matrices whose ranks are much smaller than either of their dimensions. In this second setting, one of the fundamental problems of sparse signal processing is the low-rank matrix completion problem: to determine when and how one can recover a low-rank matrix based on only a subset of its entries [5], [6], [7].

Scores of methods and algorithms have been proposed for low-rank matrix completion. Many of them are based on similarities between compressive sensing reconstruction and low-rank matrix completion. In general, both reconstruction tasks are ill-posed and computationally intractable. Nevertheless, exact recovery in an efficient manner is possible for both signal categories provided that the signal is sufficiently sparse or sufficiently densely sampled. Casting the sparse signal recovery problem as an optimization problem, $\ell_1$-minimization has been proposed for compressive sensing signal reconstruction [1], [2], [3]. Following the same idea, methods based on nuclear norm minimization have been developed for low-rank matrix completion [5], [6], [8], [9]. In terms of greedy algorithms, many of the approaches for low-rank completion can be viewed as generalizations of their counterparts for compressive sensing reconstruction. In particular, the ADMiRA algorithm [10] is a counterpart of the subspace pursuit (SP) [11] and CoSaMP [12] algorithms, while the singular value projection (SVP) method [13] extends the iterative hard thresholding (IHT) [14] approach. There are also other approaches that utilize special structural properties of low-rank matrices. Examples include the power factorization algorithm [15], the OptSpace algorithm [16], and the subspace evolution and transfer algorithm [17].

Nevertheless, there is a fundamental problem in low-rank matrix completion which has not been successfully addressed yet: how to search for a low-rank matrix consistent with partial observations. The fundamental difference between compressive sensing and low-rank matrix completion lies in the knowledge of the "sparse basis". In compressive sensing, the basis under which the signal is sparse is known a priori. In principle, the support set of the nonzero entries can be found by exhaustive search. However, in low-rank matrix completion, the corresponding "sparse basis" is not known. Note that the set of all possible bases forms a continuous space, in which "exhaustive" search is impossible.
Moreover, we shall show, in Example 1 of Section III, that a direct gradient-descent search does not work either.

The understanding of the search for consistent matrices is incomplete. There are two special cases where specially designed algorithms can guarantee a consistent low-rank solution. The first case is when the low-rank matrix is fully sampled. The consistent low-rank solution is simply the observation matrix itself, and the corresponding "sparse basis" (singular vectors) can be easily obtained by a singular value decomposition. The other case is when the rank equals one. Given an arbitrary sampling pattern, one simply looks at the ratios between the revealed entries in the same column and uses these ratios to construct a column vector that represents the column space. This method is guaranteed to find a consistent solution for rank-one matrices. However, it remains an open problem how to extend this method to general ranks; hence, such an approach is not universal. On the other hand, none of the existing general algorithms provides a performance guarantee even for the rank-one case. The performance guarantee of nuclear norm minimization is built on incoherence conditions, which only hold with high probability when the low-rank matrix is drawn randomly from certain ensembles and when the size of the matrix is sufficiently large. Our understanding of low-rank matrix completion is far from complete.

Our approach to these issues is summarized as follows.

1) We provide a framework for searching for a low-rank matrix that is consistent with the partial observations. There is no requirement that such a matrix is unique: if there is a unique low-rank solution, we should be able to find this unique matrix; otherwise, it suffices to find just one solution that agrees with the revealed entries. In our approach, we assume that the rank of the underlying low-rank matrix is known a priori. Finding a consistent low-rank matrix is equivalent to finding a consistent column/row space. This is different from the OptSpace algorithm in [16], where the search is performed on both column and row spaces simultaneously.

2) We propose a geometric performance metric to measure the consistency between the estimated column space and the partial observations. In the literature, the standard approach is to minimize an objective function that is defined via the Frobenius norm. As we shall illustrate with explicit examples, this objective function may have singularities, and therefore the corresponding solution set may not be closed. Hence, we introduce a new formulation where consistency is defined in geometric terms. This allows us to address the difficulties related to the Frobenius metric. In particular, we show that our geometric objective function is always continuous, and that the set of the corresponding consistent solutions is the closure of the set corresponding to the Frobenius norm. This new metric allows for provably strong performance guarantees, described below.

3) We provide strong performance guarantees for special completion scenarios: rank-one matrices with arbitrary sampling patterns, and fully sampled matrices of arbitrary rank. For these two scenarios, a gradient descent search starting from a random point will converge to a global minimum with probability one.¹ More importantly, if the partial observations admit a unique consistent solution, this search procedure finds this unique solution with probability one.
The performance guarantees are different from those previously established in the literature. Roughly speaking, previous performance guarantees require large matrix sizes and only hold with high probability. Ours hold with probability one regardless of matrix size. It is also worth noting that we do not require incoherence conditions, which are essential for the performance guarantees of nuclear norm minimization. Unfortunately, we are presently unable to obtain performance guarantees for more general cases.

¹For fully sampled matrices, even though a simple singular value decomposition produces a consistent column space, it is not clear that a randomly initialized search would converge to a consistent column space. In what follows, we prove that this is the case.

The paper is organized as follows. In Section II we introduce the low-rank matrix completion problem, and some background material regarding Grassmann manifolds and their geometry. In Section III we show that formulating the low-rank matrix completion problem as an optimization problem using the Frobenius norm may yield singularities which can obstruct standard minimization algorithms. We then propose a new geometric formulation of the problem as a remedy for this difficulty. This new formulation allows for strong performance guarantees that are presented in Section IV. Section V summarizes the main contributions of the work. Proofs of the main results are presented in the Appendices.

II. LOW-RANK MATRIX COMPLETION AND PRELIMINARIES
Let $X \in \mathbb{R}^{m \times n}$ be an unknown matrix with rank $r \le \min(m, n)$, and let $\Omega \subset [m] \times [n]$ be the set of indices of the observed entries, where $[K] = \{1, 2, \cdots, K\}$. Define the projection operator $\mathcal{P}_{\Omega}$ by
$$\mathcal{P}_{\Omega}: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}^{m \times n}, \quad \mathcal{P}_{\Omega}(X) \triangleq X_{\Omega}, \quad \text{where } (X_{\Omega})_{i,j} = \begin{cases} X_{i,j} & \text{if } (i,j) \in \Omega, \\ 0 & \text{if } (i,j) \notin \Omega. \end{cases}$$
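The operator $\mathcal{P}_{\Omega}$ is straightforward to realize numerically. The following minimal sketch (NumPy, with helper names of our own choosing) applies it to a matrix given a boolean mask encoding $\Omega$:

```python
import numpy as np

def P_Omega(X, Omega):
    """Entrywise mask: keep X[i, j] where Omega[i, j] is True, zero elsewhere."""
    return np.where(Omega, X, 0.0)

# Example: a rank-one matrix observed on roughly half of its entries.
rng = np.random.default_rng(0)
X = np.outer(rng.standard_normal(5), rng.standard_normal(4))  # rank(X) = 1
Omega = rng.random((5, 4)) < 0.5                               # boolean index mask
X_Omega = P_Omega(X, Omega)
```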
The consistent matrix completion problem is to find one rank-$r$ matrix $X'$ that is consistent with the observations $X_{\Omega}$, i.e.,
$$(P0):\ \text{find } X' \text{ such that } \operatorname{rank}(X') = r \text{ and } \mathcal{P}_{\Omega}(X') = \mathcal{P}_{\Omega}(X) = X_{\Omega}. \quad (1)$$
By definition, this problem is well defined, since $X_{\Omega}$ is obtained from some rank-$r$ matrix $X$, which is therefore a solution. As in other works [10], [15], [16], we assume that the rank $r$ is given. In practice, one may sequentially guess a rank bound until a satisfactory solution has been found.

We also introduce the (standard) projection operator $\mathcal{P}$,
$$\mathcal{P}: \mathbb{R}^{m} \times \mathbb{R}^{m \times k} \rightarrow \mathbb{R}^{m}, \quad \mathcal{P}(\mathbf{x}, U) \triangleq \mathbf{y} = U U^{\dagger} \mathbf{x},$$
where $1 \le k \le m$, and where the superscript $\dagger$ denotes the pseudoinverse of a matrix. Let $\operatorname{span}(U)$ denote the subspace spanned by the columns of the matrix $U$, i.e.,
$$\operatorname{span}(U) = \{\mathbf{v} \in \mathbb{R}^{m}: \mathbf{v} = U\mathbf{w} \text{ for some } \mathbf{w} \in \mathbb{R}^{k}\}.$$
One can describe $\mathcal{P}(\mathbf{x}, U)$, in geometric terms, as the projection of the vector $\mathbf{x}$ onto $\operatorname{span}(U)$. It should be observed that $U^{\dagger}\mathbf{x}$ is the global minimizer of the quadratic optimization problem $\min_{\mathbf{w} \in \mathbb{R}^{k}} \|\mathbf{x} - U\mathbf{w}\|^{2}$.
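The projection $\mathcal{P}(\mathbf{x}, U)$ is likewise easy to compute; rather than forming the pseudoinverse explicitly, one may solve the underlying least-squares problem. A minimal sketch (NumPy, our own helper name):

```python
import numpy as np

def P(x, U):
    """Orthogonal projection of x onto span(U), i.e. U @ pinv(U) @ x.
    The coefficient vector w = pinv(U) @ x is obtained by least squares."""
    w, *_ = np.linalg.lstsq(U, x, rcond=None)
    return U @ w
    # Equals U @ np.linalg.pinv(U) @ x up to floating-point error.
```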
A. Search for a consistent column space

We now show that the problem $(P0)$ is equivalent to finding a column space consistent with the observed entries of $X$. Let $\mathcal{U}_{m,r}$ be the set of $m \times r$ matrices with $r$ orthonormal columns, i.e., $\mathcal{U}_{m,r} = \{U \in \mathbb{R}^{m \times r}: U^{T}U = I_{r}\}$. Define the function $f_{F}: \mathcal{U}_{m,r} \rightarrow \mathbb{R}$ by setting
$$f_{F}(U) = \min_{W \in \mathbb{R}^{n \times r}} \left\| X_{\Omega} - \mathcal{P}_{\Omega}(U W^{T}) \right\|_{F}^{2}, \quad (2)$$
where $\|\cdot\|_{F}$ denotes the Frobenius norm. This function measures the consistency between the matrix $U$ and the observations $X_{\Omega}$. In particular, if $f_{F}(U) = 0$, then there exists a matrix $W$ such that the rank-$r$ matrix $UW^{T}$ satisfies $\mathcal{P}_{\Omega}(UW^{T}) = X_{\Omega}$. Hence, the consistent matrix completion problem is equivalent to
$$(P1):\ \text{find } U \in \mathcal{U}_{m,r} \text{ such that } f_{F}(U) = 0. \quad (3)$$
In fact, $f_{F}(U)$ depends only on the subspace $\operatorname{span}(U)$, since the columns of a matrix of the form $UW^{T}$ all lie in $\operatorname{span}(U)$. Hence, to solve the consistent matrix completion problem, it suffices to find a column space that is consistent with the observed entries. Note that the same conclusion holds for the row space as well. For simplicity, we restrict our attention to the column space only.
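Since the squared Frobenius norm separates over columns (a fact used formally in Section III-A), the inner minimization over $W$ in (2) reduces to $n$ independent least-squares problems over the observed rows. The following sketch evaluates $f_{F}(U)$ this way (NumPy; function and variable names are ours):

```python
import numpy as np

def f_F(U, X_Omega, Omega):
    """Frobenius consistency metric of eq. (2): sum over columns of the
    squared residual of fitting U (restricted to the observed rows of
    that column) to the observed entries."""
    total = 0.0
    for i in range(X_Omega.shape[1]):
        rows = np.flatnonzero(Omega[:, i])   # Omega_i: observed rows of column i
        if rows.size == 0:
            continue                         # an unobserved column contributes 0
        U_obs = U[rows, :]
        x_obs = X_Omega[rows, i]
        w, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
        total += float(np.sum((x_obs - U_obs @ w) ** 2))
    return total
```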
B. Grassmann Manifolds

The set of column spaces of elements in $\mathcal{U}_{m,r}$ can be identified with the Grassmann manifold $\mathcal{G}_{m,r}$, the set of $r$-dimensional subspaces of the $m$-dimensional Euclidean space $\mathbb{R}^{m}$. This is a smooth compact manifold of dimension $r(m-r)$. Conversely, every element $\mathcal{U} \in \mathcal{G}_{m,r}$ can be represented by a generator matrix $U \in \mathcal{U}_{m,r}$ satisfying $\operatorname{span}(U) = \mathcal{U}$. However, this representation of $\mathcal{U}$ by a generator matrix is clearly not unique. Nevertheless, it follows from the discussion in the previous section that the function $f_{F}$ descends to a function on $\mathcal{G}_{m,r}$. Thus, problem $(P1)$ can be viewed as an optimization problem on the compact manifold $\mathcal{G}_{m,r}$.

In this section we recall some facts concerning the geometry of Grassmann manifolds which will be useful in addressing this and similar optimization problems. For the proofs of these facts the reader is referred to [18]. We begin by recalling the construction of the standard Riemannian metric, $g_{m,r}$, on $\mathcal{G}_{m,r}$. Note that the group $\mathcal{U}_{m,m}$ of orthogonal $m \times m$ matrices acts transitively on $\mathcal{G}_{m,r}$ (by multiplication on generator matrices). More precisely, $\mathcal{G}_{m,r}$ can be described as a quotient of $\mathcal{U}_{m,m}$, i.e.,
$$\mathcal{G}_{m,r} = \mathcal{U}_{m,m}/(\mathcal{U}_{m-r,m-r} \times \mathcal{U}_{r,r}).$$
Now, as a compact Lie group, $\mathcal{U}_{m,m}$ has a standard (bi-invariant) Riemannian metric, which can be defined by an inner product on the tangent space. This descends to the quotient $\mathcal{G}_{m,r}$ as the metric $g_{m,r}$. By construction, $g_{m,r}$ is invariant under the action of $\mathcal{U}_{m,m}$.

The metric $g_{m,r}$ determines a chordal distance function and geodesic curves on $\mathcal{G}_{m,r}$ which will play an important role in what follows. To obtain the relevant formulas for these objects we require the notion of the principal angles between two subspaces [19], [20]. Consider the subspaces $\operatorname{span}(U)$ and $\operatorname{span}(V)$ of $\mathbb{R}^{m}$ for some $U \in \mathcal{U}_{m,p}$ and $V \in \mathcal{U}_{m,q}$. The principal angles between these two subspaces can be defined in the following constructive manner. Without loss of generality, assume that $1 \le p \le q \le m$. Let $\mathbf{u}_{1} \in \operatorname{span}(U)$ and $\mathbf{v}_{1} \in \operatorname{span}(V)$ be unit-length vectors such that $|\mathbf{u}_{1}^{T}\mathbf{v}_{1}|$ is maximal. Inductively, let $\mathbf{u}_{k} \in \operatorname{span}(U)$ and $\mathbf{v}_{k} \in \operatorname{span}(V)$ be unit vectors such that $\mathbf{u}_{k}^{T}\mathbf{u}_{j} = 0$ and $\mathbf{v}_{k}^{T}\mathbf{v}_{j} = 0$ for all $1 \le j < k$ and $|\mathbf{u}_{k}^{T}\mathbf{v}_{k}|$ is maximal. The principal angles are then defined as $\alpha_{k} = \arccos|\mathbf{u}_{k}^{T}\mathbf{v}_{k}|$ for $k = 1, 2, \cdots, p$.

Alternatively, the principal angles can be computed via a singular value decomposition. Consider the singular value decomposition $UU^{T}VV^{T} = \bar{U}\Lambda\bar{V}^{T}$, where $\bar{U} \in \mathcal{U}_{m,p}$ and $\bar{V} \in \mathcal{U}_{m,p}$ contain the first $p$ left and right singular vectors, respectively, and $\Lambda \in \mathbb{R}^{p \times p}$ is a diagonal matrix comprised of the singular values $\lambda_{1} \ge \cdots \ge \lambda_{p}$. Then the $k$th columns of $\bar{U}$ and $\bar{V}$ correspond to the vectors $\mathbf{u}_{k}$ and $\mathbf{v}_{k}$ used in the constructive definition, and the $k$th singular value $\lambda_{k}$ defines the $k$th principal angle $\alpha_{k}$ via $\cos\alpha_{k} = \lambda_{k}$.

Chordal distance on $\mathcal{G}_{m,r}$. For $U_{0}$ and $U_{1}$ in $\mathcal{U}_{m,r}$, the chordal distance between the two subspaces $\operatorname{span}(U_{0})$ and $\operatorname{span}(U_{1})$ in $\mathcal{G}_{m,r}$ is given, in terms of the $r$ principal angles between them, via the formula
$$\sqrt{\sum_{k=1}^{r} \sin^{2}\alpha_{k}}.$$
The chordal distance can also be expressed in terms of singular values as
$$\sqrt{\sum_{k=1}^{r} (1 - \lambda_{k}^{2})}.$$
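For orthonormal generator matrices $U_{0}, U_{1} \in \mathcal{U}_{m,r}$, the cosines of the principal angles can equivalently be read off from the singular values of $U_{0}^{T}U_{1}$, which gives a compact numerical recipe for the chordal distance. A sketch (NumPy, our own helper name):

```python
import numpy as np

def chordal_distance(U0, U1):
    """Chordal distance between span(U0) and span(U1) for orthonormal
    U0, U1: the singular values of U0.T @ U1 are cos(alpha_k), so
    d = sqrt(sum_k sin^2(alpha_k)) = sqrt(sum_k (1 - cos^2(alpha_k)))."""
    lam = np.linalg.svd(U0.T @ U1, compute_uv=False)
    lam = np.clip(lam, 0.0, 1.0)  # guard against tiny numerical overshoot
    return float(np.sqrt(np.sum(1.0 - lam**2)))
```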
Geodesics on $\mathcal{G}_{m,r}$. We will use the gradient descent method on $\mathcal{G}_{m,r}$ to search for consistent column spaces. This will require some information concerning the geodesics of the metric $g_{m,r}$ on $\mathcal{G}_{m,r}$, which we now recall.

Roughly speaking, a geodesic in a manifold is a generalization of the notion of a straight line in Euclidean space: given any two points in $\mathcal{G}_{m,r}$, among all curves that connect these two points, the one of shortest length is a geodesic. More precisely, fix a subspace $\mathcal{U}$ in $\mathcal{G}_{m,r}$ and a tangent vector $\mathcal{H}$ to $\mathcal{G}_{m,r}$ at $\mathcal{U}$. Let $U \in \mathcal{U}_{m,r}$ be a generator matrix for $\mathcal{U}$. The tangent space to $\mathcal{G}_{m,r}$ at $\mathcal{U}$ can be identified with the set of horizontal tangent vectors at $U$, i.e., the set of tangent vectors $W$ at $U$ which satisfy $U^{T}W = 0$ [18]. Let $H \in \mathbb{R}^{m \times r}$ be the horizontal tangent vector at $U$ which corresponds to $\mathcal{H}$, and set
$$U(t) = [U V_{H},\ U_{H}] \begin{bmatrix} \cos(S_{H} t) \\ \sin(S_{H} t) \end{bmatrix} V_{H}^{T}, \quad (4)$$
where $U_{H} S_{H} V_{H}^{T}$ is the compact singular value decomposition of $H$. Then $\operatorname{span}(U(t))$ is the unique geodesic of $g_{m,r}$ which starts at $\mathcal{U}$ with "initial velocity" $\mathcal{H}$.
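Formula (4) translates directly into code. The sketch below (NumPy, our own helper name) evaluates the geodesic at time $t$ given a generator matrix $U$ and a horizontal direction $H$ with $U^{T}H = 0$:

```python
import numpy as np

def grassmann_geodesic(U, H, t):
    """Point at time t on the geodesic of eq. (4): span(U(t)) starts at
    span(U) with horizontal initial velocity H (U.T @ H = 0).  Uses the
    compact SVD H = U_H @ diag(s) @ V_H.T."""
    U_H, s, V_H_T = np.linalg.svd(H, full_matrices=False)
    V_H = V_H_T.T
    return (U @ V_H) @ np.diag(np.cos(s * t)) @ V_H.T \
           + U_H @ np.diag(np.sin(s * t)) @ V_H.T
```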
We now use this general solution for the geodesic flow of $g_{m,r}$ to establish the following technical result concerning geodesics between a given pair of subspaces.

Lemma 1: Fix two elements $U_{0}$ and $U_{1}$ of $\mathcal{U}_{m,r}$. Let $V_{0}\Lambda V_{1}^{T}$ be the singular value decomposition of the matrix $U_{0}^{T}U_{1}$, and denote the $i$th singular value by $\lambda_{i} = \cos\alpha_{i}$. Set $\bar{U}_{0} = U_{0}V_{0}$ and $\bar{U}_{1} = U_{1}V_{1}$, and note that $\bar{U}_{0}^{T}\bar{U}_{1} = \Lambda$.
1) Consider the path
$$U(t) = [\bar{U}_{0},\ G] \begin{bmatrix} \operatorname{diag}([\cdots, \cos\alpha_{i}t, \cdots]) \\ \operatorname{diag}([\cdots, \sin\alpha_{i}t, \cdots]) \end{bmatrix} V_{0}^{T}, \quad (5)$$
where the columns of $G = [\cdots, \mathbf{g}_{i}, \cdots] \in \mathbb{R}^{m \times r}$ are defined as follows:
$$\mathbf{g}_{i} = \begin{cases} \dfrac{\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}}{\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|} & \text{if } \lambda_{i} \ne 1, \\ \mathbf{0} & \text{if } \lambda_{i} = 1. \end{cases}$$
Here, the subscript $:i$ denotes the $i$th column of the corresponding matrix. Then the path $\operatorname{span}(U(t))$ is a geodesic of $g_{m,r}$ such that $\operatorname{span}(U(0)) = \operatorname{span}(U_{0})$ and $\operatorname{span}(U(1)) = \operatorname{span}(U_{1})$.

2) Let $\bar{\mathbf{x}} \in \operatorname{span}(U_{1})$ be a unit-norm vector. Clearly, there exists a unique $\bar{\mathbf{w}} \in \mathcal{U}_{r,1}$ such that $\bar{\mathbf{x}} = \bar{U}_{1}\bar{\mathbf{w}}$. Suppose that $\bar{\mathbf{x}} \notin \operatorname{span}(\bar{U}_{0})$. Let $k$ be the number of singular values of $\bar{U}_{0}^{T}\bar{U}_{1}$ that equal one. Then $k < r$ and there exists an index $j \in [r]$ such that $k < j \le r$ and $\bar{w}_{j} \ne 0$.

Proof:
Clearly, $U(0) = \bar{U}_{0}$, so $\operatorname{span}(U(0)) = \operatorname{span}(U_{0})$. Since $\bar{U}_{0}^{T}\bar{U}_{1} = \Lambda$, we have
$$\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|^{2} = 1 - 2\lambda_{i}\langle \bar{U}_{0,:i}, \bar{U}_{1,:i} \rangle + \lambda_{i}^{2} = 1 - \lambda_{i}^{2}.$$
Thus, we have
$$\begin{aligned} U(1) &= [\cdots,\ \bar{U}_{0,:i}\cos\alpha_{i} + \mathbf{g}_{i}\sin\alpha_{i},\ \cdots]\,V_{0}^{T} \\ &= \left[\cdots,\ \bar{U}_{0,:i}\lambda_{i} + \mathbf{g}_{i}\sqrt{1-\lambda_{i}^{2}},\ \cdots\right]V_{0}^{T} \\ &= \left[\cdots,\ \bar{U}_{0,:i}\lambda_{i} + \mathbf{g}_{i}\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|,\ \cdots\right]V_{0}^{T} \\ &= \left(\bar{U}_{0}\Lambda + (\bar{U}_{1} - \bar{U}_{0}\Lambda)\right)V_{0}^{T} = U_{1}V_{1}V_{0}^{T}. \end{aligned}$$
Hence, $\operatorname{span}(U(1)) = \operatorname{span}(U_{1})$. To prove the first part of the lemma it remains to show that $\operatorname{span}(U(t))$ is a geodesic. Setting $H = \dot{U}(0)$ we have
$$H = G\operatorname{diag}([\cdots, \alpha_{i}, \cdots])V_{0}^{T}. \quad (6)$$
We first verify that the tangent vector $H$ is horizontal, which is equivalent to showing that $U_{0}^{T}H = 0$. According to the definition of the vectors $\mathbf{g}_{i}$, when $\lambda_{i} \ne 1$, one has
$$\bar{U}_{0}^{T}\mathbf{g}_{i} = \frac{1}{\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|}\,\bar{U}_{0}^{T}(\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}) = \frac{1}{\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|}\,(\lambda_{i}\mathbf{e}_{i} - \lambda_{i}\mathbf{e}_{i}) = \mathbf{0},$$
while $\mathbf{g}_{i} = \mathbf{0}$ when $\lambda_{i} = 1$. Hence, $U_{0}^{T}G = V_{0}\bar{U}_{0}^{T}G = 0$. By (6), this implies that $U_{0}^{T}H = 0$, as desired. Note that equation (6) can also be viewed as an expression for the compact singular value decomposition of $H$. It then follows directly from (4) that $\operatorname{span}(U(t))$ is indeed a geodesic.

To prove the second part of the lemma, let $\mathbf{u}_{0,1}, \cdots, \mathbf{u}_{0,r}$ and $\mathbf{u}_{1,1}, \cdots, \mathbf{u}_{1,r}$ be the column vectors of the matrices $\bar{U}_{0}$ and $\bar{U}_{1}$, respectively. By assumption, $\lambda_{1} = \cdots = \lambda_{k} = 1$ and $1 > \lambda_{k+1} \ge \cdots \ge \lambda_{r}$. Hence,
$$\mathbf{u}_{0,j} = \mathbf{u}_{1,j} \text{ for all } j \le k, \quad\text{and}\quad \langle \mathbf{u}_{0,j}, \mathbf{u}_{1,j} \rangle = \lambda_{j} < 1 \text{ for all } k < j \le r.$$
Suppose that $k = r$. Then $\bar{\mathbf{x}} = \bar{U}_{1}\bar{\mathbf{w}} = \bar{U}_{0}\bar{\mathbf{w}} \in \operatorname{span}(\bar{U}_{0})$, which contradicts the assumption that $\bar{\mathbf{x}} \notin \operatorname{span}(\bar{U}_{0})$. Hence, we have $k < r$. Now suppose that $\bar{w}_{k+1} = \cdots = \bar{w}_{r} = 0$. Then
$$\bar{\mathbf{x}} = \sum_{j=1}^{k} \mathbf{u}_{1,j}\bar{w}_{j} = \sum_{j=1}^{k} \mathbf{u}_{0,j}\bar{w}_{j} \in \operatorname{span}(\bar{U}_{0}),$$
which again contradicts the assumption that $\bar{\mathbf{x}} \notin \operatorname{span}(\bar{U}_{0})$. Hence, there exists a $j$ such that $k < j \le r$ and $\bar{w}_{j} \ne 0$. This completes the proof.

An invariant measure on $\mathcal{G}_{m,r}$. The space $\mathcal{U}_{m,m}$ admits a standard invariant measure (the Haar measure) [21]. This descends to a measure $\mu$ on $\mathcal{G}_{m,r}$ which is also invariant in the following sense: for any measurable set $\mathcal{M} \subset \mathcal{G}_{m,r}$ and any $A \in \mathcal{U}_{m,m}$, one has $\mu(\mathcal{M}) = \mu(A\mathcal{M})$, where $A\mathcal{M} = \{\operatorname{span}(AU): U \in \mathcal{U}_{m,r},\ \operatorname{span}(U) \in \mathcal{M}\}$ [21], [20]. This invariant measure defines the uniform/isotropic distribution on the Grassmann manifold. Furthermore, let $\operatorname{span}(U) \in \mathcal{G}_{m,r}$ be fixed and let $\operatorname{span}(V) \in \mathcal{G}_{m,r}$ be drawn randomly from the isotropic distribution. The joint probability density function of the principal angles between the spans of $U$ and $V$ is explicitly given in [21], [22], [20], [23]. Two properties of this density function are relevant to our later analysis: first, it is independent of the choice of $U$; second, it has no mass points.
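Numerically, a subspace distributed according to this invariant measure can be produced by orthonormalizing a Gaussian matrix, since the rotation invariance of the Gaussian ensemble is inherited by the resulting span. A minimal sketch (NumPy, our own helper name):

```python
import numpy as np

def random_grassmann_point(m, r, rng=None):
    """Generator matrix of a subspace drawn from the invariant (isotropic)
    distribution on G_{m,r}: orthonormalize an m x r standard Gaussian
    matrix; rotation invariance of the Gaussian makes span(Q) Haar."""
    rng = np.random.default_rng(rng)
    Q, _ = np.linalg.qr(rng.standard_normal((m, r)))
    return Q
```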
III. FROM THE FROBENIUS NORM TO THE GEOMETRIC METRIC

In the previous section, we showed that the matrix completion problem reduces to a search for a consistent column space. In other words, one only needs to find a global minimum of the objective function $f_{F}(U)$, where
$$f_{F}(U) \triangleq \min_{W \in \mathbb{R}^{r \times n}} \|X_{\Omega} - \mathcal{P}_{\Omega}(UW)\|_{F}^{2}. \quad (7)$$
However, as we shall show in Section III-A, this approach has a serious drawback: the objective function (7) is not a continuous function of the variable $U$. The discontinuity of the objective function is due to the composition of the Frobenius norm with the projection operator $\mathcal{P}_{\Omega}$. It may prevent gradient-descent-based algorithms from converging to a global optimum (see [17] for a detailed example). To address this issue, we propose another objective function $f_{G}(U)$ based on the geometry of the problem, detailed in Section III-B. To solve the matrix completion problem, one then needs to solve the problem
$$(P2):\ \text{find } U \in \mathcal{U}_{m,r} \text{ such that } f_{G}(U) = 0, \quad (8)$$
where $f_{G}$ denotes the geometric metric, formally defined in Section III-B.

In the rest of this section, we shall show that the new objective function $f_{G}$ is a continuous function. Furthermore, we shall show that the preimage of $f_{G}(U) = 0$ is the closure of the preimage of $f_{F}(U) = 0$. Because of these properties of the geometric objective function, one can derive strong performance guarantees for gradient descent methods, as described in Section IV.

A. Why the Frobenius Norm Fails
We use an example to show that the objective function (7) based on the Frobenius norm is not continuous. Let $\mathbf{x}_{\Omega,i}$ be the $i$th column of the matrix $X_{\Omega}$. Let $\Omega_{i} \subset [m]$ be the set of indices of known entries in the $i$th column. We use $\mathcal{P}_{\Omega,i}$ to denote the projection operator corresponding to the index set $\Omega_{i}$. By additivity of the squared Frobenius norm, the objective function can be written as a sum of atomic functions, i.e.,
$$f_{F}(U) = \min_{W \in \mathbb{R}^{r \times n}} \|X_{\Omega} - \mathcal{P}_{\Omega}(UW)\|_{F}^{2} = \sum_{i=1}^{n} \underbrace{\min_{\mathbf{w}_{i} \in \mathbb{R}^{r}} \|\mathbf{x}_{\Omega,i} - \mathcal{P}_{\Omega,i}(U\mathbf{w}_{i})\|_{F}^{2}}_{f_{F,i}(U)}.$$
Denote the $i$th atomic function by $f_{F,i}(U)$. It can be verified that
$$f_{F,i}(U) = \min_{\mathbf{w}_{i} \in \mathbb{R}^{r}} \|\mathbf{x}_{\Omega,i} - \mathcal{P}_{\Omega,i}(U\mathbf{w}_{i})\|_{F}^{2} = \|\mathbf{x}_{\Omega,i} - \mathcal{P}(\mathbf{x}_{\Omega,i}, U_{\Omega_{i}})\|_{F}^{2},$$
where $U_{\Omega_{i}} = [\mathcal{P}_{\Omega,i}(\mathbf{u}_{1}), \cdots, \mathcal{P}_{\Omega,i}(\mathbf{u}_{r})]$ and $\mathbf{u}_{1}, \cdots, \mathbf{u}_{r}$ are the column vectors of the matrix $U$. We show in the next example that an atomic function, say $f_{F,1}(U)$, may not be continuous.

Example 1:
Suppose that $\mathbf{x}_{\Omega,1} = [0, 1, 1]^{T}$ and $\Omega_{1} = \{2, 3\}$. Let $U$ be of the form
$$U = \left[\sqrt{1 - 2\epsilon^{2}},\ \epsilon,\ \epsilon\right]^{T} \in \mathcal{U}_{3,1},$$
where $\epsilon \in \left[-1/\sqrt{2},\ 1/\sqrt{2}\right]$. For a given $U$, the atomic function $f_{F,1}(U)$ is given by
$$f_{F,1}(U) = \min_{w \in \mathbb{R}} \left\| [0, 1, 1]^{T} - \mathcal{P}_{\Omega,1}(Uw) \right\|_{F}^{2}.$$
This is a quadratic optimization problem and can be easily solved. The optimal $w^{*}$ is given by
$$w^{*} = \begin{cases} 1/\epsilon & \text{if } \epsilon \ne 0, \\ 0 & \text{if } \epsilon = 0. \end{cases}$$
Hence, one has
$$f_{F,1}(U(\epsilon)) = \begin{cases} 0 & \text{if } \epsilon \in \left[-\tfrac{1}{\sqrt{2}}, 0\right) \cup \left(0, \tfrac{1}{\sqrt{2}}\right], \\ 2 & \text{if } \epsilon = 0, \end{cases}$$
which shows that $f_{F,1}(U(\epsilon))$ has a singular point at $\epsilon = 0$.

Figure 1. Contours projected onto the $(u_{2}, u_{3})$ plane. The left panel depicts the contours of the squared Frobenius norm; the right corresponds to the chordal distance.

It is straightforward to verify that the overall objective function (7) is also a discontinuous function of $U$. As we argued in [17], this discontinuity creates so-called barriers, which may prevent gradient-descent algorithms from converging to a global minimum. Hence, one seeks an optimization criterion that allows for a continuous objective function and, consequently, no search-path barriers.
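The jump at $\epsilon = 0$ is easy to reproduce numerically. In the sketch below (NumPy; the helper name is ours), the least-squares solver returns $w = 0$ for the all-zero matrix $U_{\Omega_{1}}$ at $\epsilon = 0$, and the residual jumps from $0$ to $2$:

```python
import numpy as np

def f_F1(eps):
    """Atomic objective of Example 1: x_Omega,1 = [0, 1, 1]^T observed on
    rows {2, 3}, U(eps) = [sqrt(1 - 2 eps^2), eps, eps]^T."""
    x_obs = np.array([1.0, 1.0])        # entries on Omega_1 = {2, 3}
    U_obs = np.array([[eps], [eps]])    # U restricted to Omega_1
    w, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
    return float(np.sum((x_obs - U_obs @ w) ** 2))

print(f_F1(0.1), f_F1(1e-9), f_F1(0.0))  # ~0, ~0, then a jump to 2.0
```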
B. A Geometric Metric

To address the problems caused by the singularities of the objective function, we propose to replace the Frobenius norm by a geometric performance metric. In this case, the objective function is defined as
$$f_{G}(U) = \sum_{i=1}^{n} f_{G,i}(U),$$
where $f_{G,i}(U)$ denotes the geometric metric corresponding to the $i$th column, defined as follows. If $\mathbf{x}_{\Omega,i} = \mathbf{0}$, we set $f_{G,i}(U) = 0$. Henceforth, we only consider the case where $\mathbf{x}_{\Omega,i} \ne \mathbf{0}$. For any $\mathbf{x}_{\Omega,i} \ne \mathbf{0}$, let $\bar{\mathbf{x}}_{\Omega,i} = \mathbf{x}_{\Omega,i}/\|\mathbf{x}_{\Omega,i}\|_{F}$ be the normalized vector $\mathbf{x}_{\Omega,i}$. Let $\Omega_{i}^{c} = \{1, 2, \cdots, m\} \backslash \Omega_{i}$ be the complement of $\Omega_{i}$. Let $\mathbf{e}_{k} \in \mathbb{R}^{m}$ be the $k$th natural basis vector, i.e., the $k$th entry of $\mathbf{e}_{k}$ equals one and all other entries are zero. Define
$$B_{i} = [\bar{\mathbf{x}}_{\Omega,i},\ \mathbf{e}_{k_{1}},\ \cdots,\ \mathbf{e}_{k_{\ell}}], \quad (9)$$
where $\{k_{1}, \cdots, k_{\ell}\} = \Omega_{i}^{c}$. Let $\lambda_{\max}(B_{i}^{T}U)$ be the largest singular value of the matrix $B_{i}^{T}U$. Then
$$f_{G,i}(U) = 1 - \lambda_{\max}^{2}(B_{i}^{T}U). \quad (10)$$
This expression is closely related to the chordal distance between two subspaces, as described in Section II-B. We henceforth refer to the function (10) either as the geometric metric, or, with a slight abuse of terminology, as the chordal distance.

One advantage of the chordal distance is its continuity. This follows directly from the continuity of the singular values of the underlying matrix. Recall Example 1. In Fig. 1, we illustrate the differences between $f_{F,1}$ and $f_{G,1}$ by projecting their contours of constant value onto the $(u_{2}, u_{3})$ plane.
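For concreteness, the following sketch (NumPy, our own helper names) builds $B_{i}$ per (9) from the observed entries of one column and evaluates $f_{G,i}(U)$ per (10):

```python
import numpy as np

def f_G_column(x_obs, obs_rows, U):
    """Geometric metric of eq. (10) for one column.  Builds
    B_i = [x_bar, e_k for k in the complement of Omega_i] per eq. (9)
    and returns 1 - lambda_max(B_i^T U)^2."""
    m = U.shape[0]
    x = np.zeros(m)
    x[obs_rows] = x_obs                  # zero-padded observed column
    nrm = np.linalg.norm(x)
    if nrm == 0.0:
        return 0.0                       # convention: f_{G,i} = 0 if x_Omega = 0
    comp = np.setdiff1d(np.arange(m), obs_rows)
    B = np.column_stack([x / nrm, np.eye(m)[:, comp]])
    lam_max = np.linalg.svd(B.T @ U, compute_uv=False)[0]
    return float(1.0 - min(lam_max, 1.0) ** 2)
```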
More importantly, the following theorem shows that the preimage of $f_{G,i}(U) = 0$ is actually the closure of the preimage of $f_{F,i}(U) = 0$.

Theorem 1: Given $\mathbf{x}_{\Omega,i} \in \mathbb{R}^{m}$ and $\Omega_{i} \subset [m]$, let $U_{\Omega_{i}} \in \mathbb{R}^{m \times r}$ be such that $(U_{\Omega_{i}})_{k,\ell} = U_{k,\ell}$ if $k \in \Omega_{i}$ and $(U_{\Omega_{i}})_{k,\ell} = 0$ if $k \notin \Omega_{i}$. Define
$$\mathcal{U}_{F,i} = \left\{ U \in \mathcal{U}_{m,r}:\ f_{F,i}(U) = \|\mathbf{x}_{\Omega,i} - \mathcal{P}(\mathbf{x}_{\Omega,i}, U_{\Omega_{i}})\|^{2} = 0 \right\}$$
and
$$\mathcal{U}_{G,i} = \left\{ U \in \mathcal{U}_{m,r}:\ f_{G,i}(U) = 1 - \lambda_{\max}^{2}(B_{i}^{T}U) = 0 \right\}.$$
Then $\mathcal{U}_{G,i}$ is the closure of $\mathcal{U}_{F,i}$, i.e., $\mathcal{U}_{G,i} = \overline{\mathcal{U}_{F,i}}$.

The proof is given in Appendix A. Although this theorem deals with only one column of the observed matrix, the result can be easily extended to the whole matrix $X_{\Omega}$: let $\mathcal{U}_{F} = \bigcap_{i=1}^{n} \mathcal{U}_{F,i}$ and
$$\mathcal{U}_{G} = \bigcap_{i=1}^{n} \mathcal{U}_{G,i} = \left\{ U \in \mathcal{U}_{m,r}:\ \lambda_{\max}(U^{T}B_{i}) = 1 \text{ for all } i \right\}; \quad (11)$$
then $\mathcal{U}_{G} = \overline{\mathcal{U}_{F}}$.

Example 1 (Continued):
It can be seen that
$$B_{1} = \begin{bmatrix} 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ 1 & 0 & 0 \end{bmatrix}^{T}.$$
Hence,
$$f_{G,1}(U) = 1 - \lambda_{\max}^{2}\left( \begin{bmatrix} \sqrt{2}\,\epsilon \\ \sqrt{1 - 2\epsilon^{2}} \end{bmatrix} \right) = 0.$$
As a result,
$$\mathcal{U}_{F,1} = \left\{ \left[\sqrt{1-2\epsilon^{2}}, \epsilon, \epsilon\right]^{T}:\ \epsilon^{2} \le \tfrac{1}{2} \text{ and } \epsilon \ne 0 \right\} \cup \left\{ \left[-\sqrt{1-2\epsilon^{2}}, \epsilon, \epsilon\right]^{T}:\ \epsilon^{2} \le \tfrac{1}{2} \text{ and } \epsilon \ne 0 \right\},$$
and
$$\mathcal{U}_{G,1} = \left\{ \left[\sqrt{1-2\epsilon^{2}}, \epsilon, \epsilon\right]^{T}:\ \epsilon^{2} \le \tfrac{1}{2} \right\} \cup \left\{ \left[-\sqrt{1-2\epsilon^{2}}, \epsilon, \epsilon\right]^{T}:\ \epsilon^{2} \le \tfrac{1}{2} \right\}.$$
Clearly, $\mathcal{U}_{G,1} = \overline{\mathcal{U}_{F,1}}$.
C. Computations Related to the Chordal Distance

For a given performance metric, the computational complexity of the supporting optimization procedure is an important factor for assessing its practical value. In this subsection, we show that, besides being continuous, the chordal distance and the related gradient can be computed efficiently. Hence, all the algorithmic solutions using gradient descent methods can be easily modified to accommodate the geometric distortion measure.

The principal angle $\theta_{i}$ and the chordal distance $\sin^{2}\theta_{i}$ can be computed using the singular value decomposition. Given the $i$th column of the observed matrix, one can form $B_{i}$ easily. Let $\lambda_{i}$ be the largest singular value of the matrix $B_{i}B_{i}^{T}U$, and let $\mathbf{b}_{i}$ and $\mathbf{v}_{i}$ be the corresponding left and right singular vectors, respectively.² Following the definition of the chordal distance, one has
$$f_{G,i}(U) = \sin^{2}\theta_{i} = 1 - \lambda_{i}^{2}.$$
Let $G_{i} \in \mathbb{R}^{m \times r}$ be the matrix such that
$$(G_{i})_{k,\ell} = \frac{\partial}{\partial U_{k,\ell}} f_{G,i}(U) = -2\cos\theta_{i}\,\frac{\partial \cos\theta_{i}}{\partial U_{k,\ell}}.$$
It can be verified that
$$G_{i} = -2\lambda_{i}\,\mathbf{b}_{i}\mathbf{v}_{i}^{T}. \quad (12)$$
Note that in the matrix completion problem, one only needs to search for a column space $\operatorname{span}(U)$ consistent with the observations. Taking this fact into consideration, we have [18]
$$\nabla_{U} f_{G} = \sum_{i=1}^{n} \nabla_{U} f_{G,i} = \left(I - UU^{T}\right) \sum_{i=1}^{n} G_{i}. \quad (13)$$

²For convenience, we use the following convention regarding the singular vectors $\mathbf{b}_{i}$ and $\mathbf{v}_{i}$: we let the first nonzero entry of $\mathbf{v}_{i}$ be positive; otherwise, we let $\mathbf{v}'_{i} = -\mathbf{v}_{i}$ and $\mathbf{b}'_{i} = -\mathbf{b}_{i}$, and use $\mathbf{v}'_{i}$ and $\mathbf{b}'_{i}$ for the singular value decomposition. The simultaneous change of signs affects neither the singular value decomposition nor the computation of the gradient.

Switching from the Frobenius norm to the chordal distance does not introduce extra computational cost. Due to the particular structure of $B_{i}$, the matrix multiplication $B_{i}B_{i}^{T}U$ can be executed in $O(mr)$ steps. The resulting matrix has dimensions $m \times r$, where typically $r \ll m$. The major computational burden is incurred by the singular value decomposition. Computing the largest singular value and the corresponding singular vectors of an $m \times r$ matrix essentially reduces to computing the largest eigenvalue of an $r \times r$ matrix and the corresponding eigenvector. Hence, the overall complexity of computing $f_{G,i}$ is $O(mr^{2} + r^{3}) = O(mr^{2})$, where the $O(mr^{2})$ and $O(r^{3})$ terms come from the matrix multiplication and the eigenvalue computation, respectively. In comparison, solving the least-squares problem in the definition of $f_{F,i}$ has an $O(mr^{2})$ cost as well.
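As a sketch of these formulas (NumPy; helper names ours, and with a full SVD standing in for the cheaper largest-singular-triple computation discussed above), the gradient (13) can be assembled as follows:

```python
import numpy as np

def grad_f_G(U, B_list):
    """Gradient of f_G via eqs. (12)-(13): for each column, take the top
    singular triple (lambda, b, v) of B_i @ B_i.T @ U, accumulate
    G_i = -2 * lambda * b @ v.T, then project onto the horizontal space."""
    m, r = U.shape
    G_sum = np.zeros((m, r))
    for B in B_list:
        M = B @ (B.T @ U)                 # B_i B_i^T U
        Bu, s, vT = np.linalg.svd(M, full_matrices=False)
        G_sum += -2.0 * s[0] * np.outer(Bu[:, 0], vT[0, :])
    return (np.eye(m) - U @ U.T) @ G_sum  # eq. (13)
```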
IV. PERFORMANCE GUARANTEES

Consider the matrix completion problem described in (8). The following theorem describes completion scenarios for which a global optimum can be found with probability one.
Theorem 2:
Consider the following cases:

1) (rank-one matrices with arbitrary sampling): Let $X_{\Omega} = \mathcal{P}_{\Omega}(X)$ for some unknown matrix $X$ with rank equal to one. Here, $\Omega \subset [m] \times [n]$ can be arbitrary.

2) (full sampling with arbitrary rank matrices): Let $X_{\Omega} = X$, i.e., $\Omega = [m] \times [n]$.

Suppose that $r = \operatorname{rank}(X)$ is given. Let $\mathcal{U}_{G} \subset \mathcal{U}_{m,r}$ be the preimage of $f_{G}(U) = 0$ (also defined in (11)). Let $U_{0}$ be randomly generated from the isotropic distribution on $\mathcal{U}_{m,r}$ and used as the initial point of the search procedure. With probability one, there exists a continuous path $U(t)$, $t \in [0, 1]$, such that $U(0) = U_{0}$, $U(1) \in \mathcal{U}_{G}$ and
$$\frac{d}{dt} f_{G} \le 0 \quad \text{for all } t \in (0, 1),$$
where the equality holds if and only if $U \in \mathcal{U}_{G}$.

The proof of the theorem is outlined in Section IV-A. It is worth noting that almost all starting points are good: the starting point is certainly good if it is a consistent solution; otherwise, there exists a continuous path from this starting point to a global optimum along which the objective function keeps decreasing. The performance guarantee provided in Theorem 2 is strong in the sense that it requires neither incoherence conditions nor large matrix sizes.

A simple corollary of Theorem 2 is the following result: suppose that the partial observations $X_{\Omega}$ admit a unique consistent solution in terms of the Frobenius norm; then a gradient search procedure using the geometric metric finds this unique solution with probability one. This conclusion follows from the fact that the solution set under the Frobenius norm contains only a single point, and therefore $\mathcal{U}_{G} = \overline{\mathcal{U}_{F}} = \mathcal{U}_{F}$.

For the more general case where $r > 1$ and $\Omega \ne [m] \times [n]$, we cannot prove the same performance guarantees. Nevertheless, in Section IV-B, we present a collection of results that may be helpful for future exploration.

A. Proof of Theorem 2
For our proof techniques, we need the following two assumptions.
Assumption I: There exists a global optimum $U_{X} \in \mathcal{U}_{m,r}$ such that $f_{G}(U_{X}) = 0$ and all the $r$ principal angles between $\operatorname{span}(U_{X})$ and $\operatorname{span}(U_{0})$ are less than $\pi/2$. That is, all the singular values of $U_{X}^{T}U_{0}$ are strictly positive.

Assumption II: All of the $\theta_{i}$'s (the smallest principal angle between $\operatorname{span}(U_{0})$ and $\operatorname{span}(B_{i})$) are less than $\pi/2$.

Remark 1:
Suppose that the matrix $U_{0}$ is randomly drawn from the uniform (isotropic) distribution on $\mathcal{U}_{m,r}$. Then $U_{0}$ satisfies both assumptions with probability one. This can be easily verified using the probability density function of the principal angles [21], [22], [20], [23].

Assuming that these two assumptions are satisfied, we have the following two theorems, corresponding to the two cases in Theorem 2, respectively.

Theorem 3: (Rank-One Case)
Let $X_{\Omega}$ be the partial observation matrix generated from a rank-one matrix. Let $\mathbf{u}_{0} \in \mathcal{U}_{m,1}$ be an estimate of the column space that satisfies Assumptions I and II. Suppose that $\sum_{i=1}^{n} \sin^{2}\theta_{i} \ne 0$. Then there exists a continuous path $\mathbf{u}(t) \in \mathcal{U}_{m,1}$ such that $\mathbf{u}(0) = \mathbf{u}_{0}$, $\mathbf{u}(1) \in \mathcal{U}_{G}$, and $\frac{d}{dt}\big|_{t=0}\sin^{2}\theta_{i} \le 0$ for all $i \in [n]$, where equality holds if and only if $\theta_{i}(0) = 0$.

Theorem 4: (Full-Sampling Case)
Let $X \in \mathbb{R}^{m \times n}$ be a rank-$r$ matrix. Let $U_{0} \in \mathcal{U}_{m,r}$ satisfy Assumptions I and II. Suppose that $\sum_{i=1}^{n} \sin^{2}\theta_{i} \ne 0$. Then there exists a $U(t) \in \mathcal{U}_{m,r}$ such that $U(0) = U_{0}$, $U(1) \in \mathcal{U}_{G}$ and $\frac{d}{dt}\big|_{t=0}\sin^{2}\theta_{i} \le 0$ for all $i \in [n]$, where equality holds if and only if $\theta_{i}(0) = 0$.

The proofs of Theorems 3 and 4 are given in Appendices B and C, respectively. Since the proof techniques differ significantly, we present the two theorems/proofs separately.

Both theorems are stated for derivatives taken at $t = 0$. Nevertheless, the analysis can be extended to arbitrary $t \in [0, 1]$, that is, $\frac{d}{dt}\sin^{2}\theta_{i} \le 0$ for all $t \in [0, 1]$, where the equality holds if and only if $\theta_{i}(t) = 0$. To show that this is the case, note that in proving both Theorem 3 and Theorem 4, we constructed a continuous path $U(t)$ such that $U(0) = U_{0}$ and $U(1) \in \mathcal{U}_{G}$. Fixing this continuous path, we observe the following:

1) All the $r$ principal angles between $\operatorname{span}(U_{0})$ and $\operatorname{span}(U(1))$ decrease monotonically as $t$ increases to one. This implies that Assumption I holds for all $t \in [0, 1]$.

2) We have $\theta_{i}(t) < \pi/2$ for all $i \in [n]$ and for all $t \in [0, \epsilon)$ for some sufficiently small $\epsilon > 0$. This claim can be verified by invoking the facts that $\theta_{i}(0) < \pi/2$ for all $i \in [n]$ and that $\theta_{i}$ is a continuous function for all $i \in [n]$. As a result, all $U(t)$'s, where $t \in [0, \epsilon)$, satisfy Assumptions I and II.

3) For every $t$ in the interval $[0, \epsilon)$, $U(t)$ is the starting point of the geodesic path from $U(t)$ to $U(1)$, which is a part of the geodesic path from $U(0)$ to $U(1)$. Using the same proof techniques as in Appendices B and C, it is clear that $\frac{d}{dt}\sin^{2}\theta_{i}(t) \le 0$ for all $t \in [0, \epsilon)$. Hence, $\theta_{i}(t) \le \theta_{i}(0) < \pi/2$ for all $i \in [n]$ and for all $t \in [0, \epsilon)$.

4) The above arguments can be extended: it can be verified that $\theta_{i}(t) \le \theta_{i}(0) < \pi/2$ for all $i \in [n]$ and for all $t \in [0, 1]$. This implies that $U(t)$ satisfies Assumptions I and II for all $t \in [0, 1]$. Hence, $\frac{d}{dt}\sin^{2}\theta_{i}(t) \le 0$ for all $i \in [n]$ and all $t \in [0, 1]$, where the equality holds if and only if $\theta_{i}(t) = 0$. Theorem 2 therefore holds.

A direct consequence of Theorem 2 is that for almost all $U_{0} \in \mathcal{U}_{m,r}$, there exists a continuous path leading to a global minimizer. However, one does not know this path in advance when solving the matrix completion problem. A practical approach is to use a gradient descent method. We consider the following randomized gradient descent algorithm (a code sketch follows the list). Let $U^{(i)} \in \mathcal{U}_{m,r}$, $i = 1, 2, \cdots$, be the starting point of the $i$th iteration. Clearly, $U^{(i)}$, $i \ge 2$, is also the end point of the $(i-1)$th iteration. We generate the sequence of $U^{(i)}$'s in the following manner.

1) Let $U^{(1)}$ be randomly generated from the isotropic distribution.

2) Set $i = 1$. Execute the following iterative process.

a) Compute the gradient $\nabla_{U^{(i)}} f_{G}$.

b) Let $U^{(i)}(t)$ be the geodesic curve starting at $U^{(i)}(0) = U^{(i)}$ with direction $H = -\nabla_{U^{(i)}} f_{G}$.

c) Let $t_{*}^{(i)}$ be such that $\frac{d}{dt} f_{G}(t_{*}^{(i)}) = 0$ and $\frac{d}{dt} f_{G}(t) < 0$ for all $t < t_{*}^{(i)}$.

d) Randomly generate a $t^{(i)}$ from the uniform distribution on $(0, t_{*}^{(i)})$.

e) Let $U^{(i+1)} = U^{(i)}(t^{(i)})$. Let $i = i + 1$. Go to Step (a).

Due to the randomness of $U^{(i)}$, all $U^{(i)}$'s satisfy Assumptions I and II with probability one. The objective function decreases after each iteration. This gradient descent procedure converges to a global minimum as the number of iterations approaches infinity.
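A minimal sketch of this loop is given below (NumPy). It reuses the hypothetical helpers random_grassmann_point, grad_f_G and grassmann_geodesic from the earlier sketches, and replaces the exact step $t_{*}^{(i)}$ and the uniform draw $t^{(i)}$ with a simple halving line search, which preserves the monotone decrease but is otherwise an idealization:

```python
import numpy as np

def f_G_total(U, B_list):
    """f_G(U) = sum_i (1 - lambda_max(B_i^T U)^2)."""
    return sum(1.0 - min(np.linalg.svd(B.T @ U, compute_uv=False)[0], 1.0) ** 2
               for B in B_list)

def descend(B_list, m, r, iters=200, seed=None):
    """Geodesic gradient descent on U_{m,r} from an isotropic random start.
    The halving line search below idealizes steps c)-d) of the algorithm."""
    U = random_grassmann_point(m, r, np.random.default_rng(seed))
    for _ in range(iters):
        H = -grad_f_G(U, B_list)          # descent direction, eq. (13)
        f0, t = f_G_total(U, B_list), 1.0
        while t > 1e-12:
            # re-orthonormalize to counter floating-point drift; f_G depends
            # only on span(U), so the QR sign ambiguity is harmless
            U_t, _ = np.linalg.qr(grassmann_geodesic(U, H, t))
            if f_G_total(U_t, B_list) < f0:
                U = U_t
                break
            t *= 0.5
    return U
```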
Remark 2: Denote the obtained global minimum by $\hat{U}$. It may happen that $\hat{U} \in \mathcal{U}_{G} \backslash \mathcal{U}_{F}$. In this case, the solution is inconsistent with respect to the standard Frobenius norm. One can then use perturbation techniques to move $\hat{U}$ from the boundary of $\mathcal{U}_{F}$ into the interior of $\mathcal{U}_{F}$.

B. The General Framework
For the cases that are not described in Theorem 2, we have the following corollary.
Corollary 1: (General Cases)
Let $X \in \mathbb{R}^{m \times n}$ be a rank-$r$ matrix. Let $U_{X} \in \mathcal{U}_{G}$ be a global minimum. For each $i \in [n]$, the following statements are true. Let $\mathbf{u}_{X,i} \in \operatorname{span}(U_{X}) \cap \operatorname{span}(B_{i})$ be a unit-norm vector. Let $U_{0} \in \mathcal{U}_{m,r}$ and $\mathbf{w}_{i} \in \mathcal{U}_{r,1}$ be randomly drawn from the corresponding isotropic distributions. Then, with probability one, the vector $\mathbf{u}_{0,i} \triangleq U_{0}\mathbf{w}_{i}$ is not orthogonal to $\mathbf{u}_{X,i}$. Suppose that this is the case. Define $\theta_{i} = \cos^{-1}\|\mathcal{P}(\mathbf{u}_{i}(t), B_{i})\|$. Then there exists a continuous path $\mathbf{u}_{i}(t) \in \mathcal{U}_{m,1}$ such that $\mathbf{u}_{i}(0) = \mathbf{u}_{0,i}$, $\mathbf{u}_{i}(1) \in \operatorname{span}(\mathbf{u}_{X,i}) \cap \mathcal{U}_{m,1}$, and $\frac{d}{dt}\sin^{2}\theta_{i} \le 0$, where the equality holds if and only if $\theta_{i}(t) = 0$.

Proof:
Without loss of generality, we assume that $\langle \mathbf{u}_{0,i}, \mathbf{u}_{X,i} \rangle > 0$. The desired continuous path is given by
$$\mathbf{u}_{i}(t) = \frac{(1-t)\mathbf{u}_{0,i} + t\mathbf{u}_{X,i}}{\|(1-t)\mathbf{u}_{0,i} + t\mathbf{u}_{X,i}\|}, \quad t \in [0, 1].$$
The detailed arguments are the same as those in the proof of Theorem 3, and are therefore omitted.
Remark 3:
This corollary is similar to Theorems 3 and 4 in the sense that there exist continuous paths along which the atomic functions decrease.

At the same time, Corollary 1 differs from Theorems 3 and 4 in two respects. First, the paths $\mathbf{u}_{i}(t)$ in Corollary 1 may be different for different $i$'s, while in Theorems 3 and 4, a single continuous path $U(t)$ is constructed. Second, the angle $\theta_{i}$ in Corollary 1 is essentially the principal angle between the one-dimensional subspace $\operatorname{span}(\mathbf{u}_{i}(t))$ and the subspace $\operatorname{span}(B_{i})$. In contrast, Theorems 3 and 4 involve the minimum principal angle between the $r$-dimensional subspace $\operatorname{span}(U(t))$ and the subspace $\operatorname{span}(B_{i})$.

V. CONCLUSION
We considered the problem of searching for a consistent completion of a low-rank matrix. We showed that the Frobenius norm, combined with a projection operator, results in a discontinuous objective function, which can cause gradient descent approaches to fail. We proposed to replace the Frobenius norm with the chordal distance. The chordal distance is the "best" smooth version of the Frobenius norm in the sense that the solution set of the former is the closure of the solution set of the latter. Based on the chordal distance, we derived strong performance guarantees for two completion scenarios. The derived performance guarantees do not rely on incoherence conditions or large matrix sizes, and they hold with probability one.
APPENDIX
A. Proof of Theorem 1
We omit the subscript $i$ to simplify notation. The proof consists of two parts, showing that:

1) $\mathcal{U}_{F} \subset \mathcal{U}_{G}$;

2) for any given $U \in \mathcal{U}_{G}$, there exists a sequence $\{U^{(n)}\} \subset \mathcal{U}_{F}$ such that $\lim_{n \to \infty} \|U - U^{(n)}\|_{F} = 0$.

We start by proving that $\mathcal{U}_{F} \subset \mathcal{U}_{G}$. For any given $U \in \mathcal{U}_{F}$, there exists a nonzero vector $\mathbf{w} \in \mathbb{R}^{r}$ such that $U_{\Omega}\mathbf{w} = \mathbf{x}_{\Omega}$. Let $\mathbf{b} = U\mathbf{w}/\|\mathbf{w}\|$. Clearly, $\|\mathbf{b}\|_{F} = 1$. Recall the formula (9) for $B_{\mathbf{x}_{\Omega}}$. We can write $\mathbf{b}$ as a linear combination of the columns of $B_{\mathbf{x}_{\Omega}}$:
$$\mathbf{b} = \frac{1}{\|\mathbf{w}\|}\mathbf{x}_{\Omega} + \sum_{j \in \Omega^{c}} b_{j}\mathbf{e}_{j} = \frac{\|\mathbf{x}_{\Omega}\|}{\|\mathbf{w}\|}\bar{\mathbf{x}}_{\Omega} + \sum_{j \in \Omega^{c}} b_{j}\mathbf{e}_{j}.$$
As a result,
$$\left\| B_{\mathbf{x}_{\Omega}}^{T}\mathbf{b} \right\|_{F} = \left\| B_{\mathbf{x}_{\Omega}}^{T}\frac{U\mathbf{w}}{\|\mathbf{w}\|} \right\|_{F} = 1.$$
It follows that the largest singular value of $B_{\mathbf{x}_{\Omega}}^{T}U$ is one. Therefore, $U \in \mathcal{U}_{G}$, and we thus have $\mathcal{U}_{F} \subset \mathcal{U}_{G}$.

To prove the second part, we make use of the following notation. For any given $U \in \mathcal{U}_{G}$, let $\mathbf{u}_{1}, \cdots, \mathbf{u}_{r}$ be the left singular vectors of the matrix $UU^{T}B_{\mathbf{x}_{\Omega}}$, where $\mathbf{u}_{i}$ corresponds to the $i$th largest singular value. Let $k$ be the multiplicity of the singular value one, i.e., the number of singular values that equal one. Let $U_{1:k} = [\mathbf{u}_{1}, \cdots, \mathbf{u}_{k}]$ and $U_{k+1:r} = [\mathbf{u}_{k+1}, \cdots, \mathbf{u}_{r}]$. Clearly, $\lambda_{\max}(U_{k+1:r}^{T}B_{\mathbf{x}_{\Omega}}) < 1$.

It suffices to focus on $U' = [\mathbf{u}_{1}, \cdots, \mathbf{u}_{r}]$ instead of $U$. That is, to prove the second part, it suffices to find a sequence in $\mathcal{U}_{F}$ converging to $U'$. To verify this claim, let $V = U'^{T}U$. Then $V \in \mathcal{U}_{r,r}$ and $U = U'V$. Suppose that $\{U^{(n)}\} \subset \mathcal{U}_{F}$ is a sequence such that $U^{(n)} \to U'$. It is clear that $U^{(n)}V \to U'V = U$. Furthermore, since
$$\mathbf{x}_{\Omega} = U_{\Omega}^{(n)}\mathbf{w}^{(n)} = U_{\Omega}^{(n)}V\left(V^{T}\mathbf{w}^{(n)}\right) = \left(U^{(n)}V\right)_{\Omega}\mathbf{w}'^{(n)},$$
one has $U^{(n)}V \in \mathcal{U}_{F}$. The sequence $\{U^{(n)}V\} \subset \mathcal{U}_{F}$ is then the desired sequence converging to $U$. It is also important to note that $U' \in \mathcal{U}_{G}$, since
$$\lambda_{\max}\left(U'U'^{T}B_{\mathbf{x}_{\Omega}}\right) = \lambda_{\max}\left(U'VV^{T}U'^{T}B_{\mathbf{x}_{\Omega}}\right) = \lambda_{\max}\left(UU^{T}B_{\mathbf{x}_{\Omega}}\right).$$

We claim that
$$U' \in \mathcal{U}_{F} \text{ if and only if } U'_{1:k,\Omega} \ne 0. \quad (14)$$
To prove this claim, we shall show that
$$U'_{1:k,\Omega} \ne 0 \Rightarrow U' \in \mathcal{U}_{F} \quad (15)$$
and
$$U'_{1:k,\Omega} = 0 \Rightarrow U' \notin \mathcal{U}_{F}. \quad (16)$$
To prove (15), suppose that $U'_{1:k,\Omega} \ne 0$. Without loss of generality, let $\mathbf{u}_{1,\Omega} \ne \mathbf{0}$. Since $\mathbf{u}_{1}$ is a left singular vector corresponding to a singular value equal to one, $\mathbf{u}_{1}$ can be written as a linear combination of the columns of $B_{\mathbf{x}_{\Omega}}$: $\mathbf{u}_{1} = a_{0}\bar{\mathbf{x}}_{\Omega} + \sum_{j \in \Omega^{c}} a_{j}\mathbf{e}_{j}$. Since $\mathbf{u}_{1,\Omega} = a_{0}\bar{\mathbf{x}}_{\Omega} \ne \mathbf{0}$, one has $a_{0} \ne 0$. As a result, $\mathbf{x}_{\Omega} = a\mathbf{u}_{1,\Omega}$ for some constant $a \ne 0$. Hence, $\|\mathbf{x}_{\Omega} - \mathcal{P}(\mathbf{x}_{\Omega}, U'_{\Omega})\| = 0$ and $U' \in \mathcal{U}_{F}$.

To prove (16), assume that $U'_{1:k,\Omega} = 0$. Since $\mathcal{P}(\mathbf{x}_{\Omega}, U'_{\Omega}) = \mathcal{P}(\mathbf{x}_{\Omega}, U'_{k+1:r,\Omega})$, proving that $U' \notin \mathcal{U}_{F}$ is equivalent to proving that $\mathbf{x}_{\Omega} - \mathcal{P}(\mathbf{x}_{\Omega}, U'_{k+1:r,\Omega}) \ne \mathbf{0}$. This inequality can be proved by contradiction. Suppose that we have an equality. Then there exists a vector $\mathbf{w} \in \mathbb{R}^{r-k}$ such that $U'_{k+1:r,\Omega}\mathbf{w} = \mathbf{x}_{\Omega}$. Let $\mathbf{b} = U'_{k+1:r}\mathbf{w}/\|\mathbf{w}\|$. It is straightforward to show (using arguments similar to the ones used for proving $\mathcal{U}_{F} \subset \mathcal{U}_{G}$) that $\mathbf{b} \in \operatorname{span}(B_{\mathbf{x}_{\Omega}})$ and the largest singular value of $U'^{T}_{k+1:r}B_{\mathbf{x}_{\Omega}}$ is one. This contradicts the fact that $\lambda_{\max}(U'^{T}_{k+1:r}B_{\mathbf{x}_{\Omega}}) < 1$.

Now we are ready to construct a sequence in $\mathcal{U}_{F}$ converging to $U'$. If $U'_{1:k,\Omega} \ne 0$, then $U' \in \mathcal{U}_{F}$ and it is trivial to find such a sequence. It remains to find a sequence $\{U^{(n)}\} \subset \mathcal{U}_{F}$ that converges to $U'$ when $U'_{1:k,\Omega} = 0$. Define $\mathbf{x}_{r} = \mathbf{x}_{\Omega} - \mathcal{P}(\mathbf{x}_{\Omega}, U'_{\Omega})$. Since $U'_{1:k,\Omega} = 0$, one has $U' \notin \mathcal{U}_{F}$ and $\mathbf{x}_{r} \ne \mathbf{0}$.
Note that $\mathbf{x}_{r,\Omega^{c}} = \mathbf{0}$ and that $\mathbf{x}_{r,\Omega} \perp \mathbf{u}_{i,\Omega}$ for all $i \in [r]$. It can be verified that $\mathbf{x}_{r} \perp \mathbf{u}_{1}, \cdots, \mathbf{x}_{r} \perp \mathbf{u}_{r}$. Let
$$U_{\epsilon} = \left[ \frac{\mathbf{u}_{1} + \epsilon\mathbf{x}_{r}}{\sqrt{1 + \epsilon^{2}\|\mathbf{x}_{r}\|^{2}}},\ \mathbf{u}_{2},\ \cdots,\ \mathbf{u}_{r} \right].$$
It can be verified that $U_{\epsilon} \in \mathcal{U}_{m,r}$. Furthermore,
$$\mathcal{P}(\mathbf{x}_{\Omega}, U_{\epsilon,\Omega}) = \mathcal{P}(\mathbf{x}_{\Omega}, [\mathbf{x}_{r}, U'_{k+1:r,\Omega}]) = \mathbf{x}_{\Omega},$$
and therefore $U_{\epsilon} \in \mathcal{U}_{F}$ for all $\epsilon \ne 0$. Now choose the sequence $\{U^{(n)}\} = \{U_{1/n}\}$. It is a sequence in $\mathcal{U}_{F}$ and it converges to $U'$. This completes the proof.

B. Proof of Theorem 3
Since $X_{\Omega}$ is generated from a rank-one matrix, there exists a $\mathbf{u}_{X} \in \mathcal{U}_{m,1}$ such that $\mathbf{u}_{X} \in \operatorname{span}(B_{i})$ for all $i \in [n]$. Without loss of generality, we assume $\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle > 0$: by Assumption I, $\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle \ne 0$; if $\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle < 0$, we replace $\mathbf{u}_{X}$ with $-\mathbf{u}_{X}$. Now define
$$\mathbf{u}(t) = \frac{(1-t)\mathbf{u}_{0} + t\mathbf{u}_{X}}{\|(1-t)\mathbf{u}_{0} + t\mathbf{u}_{X}\|} = \frac{(1-t)\mathbf{u}_{0} + t\mathbf{u}_{X}}{L(t)},$$
where $L(t) \triangleq \|(1-t)\mathbf{u}_{0} + t\mathbf{u}_{X}\|$. Clearly $\mathbf{u}(0) = \mathbf{u}_{0}$ and $\mathbf{u}(t) \in \mathcal{U}_{m,1}$ in a neighborhood of $t = 0$.

For every $i \in [n]$, we shall show that
$$\frac{d}{dt}\bigg|_{t=0} \sin^{2}\theta_{i} = -2\,\frac{d}{dt}\bigg|_{t=0} \left(\frac{1}{2}\cos^{2}\theta_{i}\right) \le 0, \quad (17)$$
where the equality holds if and only if $\theta_{i} = 0$. Let $\mathcal{P}_{i}\mathbf{u}$ denote the vector $\mathcal{P}(\mathbf{u}, B_{i}) = B_{i}B_{i}^{T}\mathbf{u}$. Since $\mathbf{u}_{X} \in \operatorname{span}(B_{i})$, one has
$$\mathcal{P}_{i}\mathbf{u} = \frac{1}{L(t)}\left((1-t)\mathcal{P}_{i}\mathbf{u}_{0} + t\mathbf{u}_{X}\right).$$
We then have
$$\begin{aligned} \frac{d}{dt}\bigg|_{t=0} \left(\frac{1}{2}\cos^{2}\theta_{i}\right) &= \frac{d}{dt}\bigg|_{t=0} \frac{1}{2}\|\mathcal{P}_{i}\mathbf{u}\|^{2} \\ &= \frac{d}{dt}\bigg|_{t=0} \left[ \frac{1}{2}\left(\frac{1-t}{L(t)}\right)^{2}\|\mathcal{P}_{i}\mathbf{u}_{0}\|^{2} + \frac{1}{2}\left(\frac{t}{L(t)}\right)^{2} + \frac{t-t^{2}}{L^{2}(t)}\langle \mathcal{P}_{i}\mathbf{u}_{0}, \mathbf{u}_{X} \rangle \right] \\ &= (-1 - L'(0))\|\mathcal{P}_{i}\mathbf{u}_{0}\|^{2} + \langle \mathcal{P}_{i}\mathbf{u}_{0}, \mathbf{u}_{X} \rangle. \end{aligned}$$
Note that
$$\langle \mathcal{P}_{i}\mathbf{u}_{0}, \mathbf{u}_{X} \rangle = \mathbf{u}_{X}^{T}B_{i}B_{i}^{T}\mathbf{u}_{0} = \langle \mathbf{u}_{0}, \mathcal{P}_{i}\mathbf{u}_{X} \rangle = \langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle.$$
Consequently,
$$\frac{d}{dt}\bigg|_{t=0} \left(\frac{1}{2}\cos^{2}\theta_{i}\right) = (-1 - L'(0))\|\mathcal{P}_{i}\mathbf{u}_{0}\|^{2} + \langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle. \quad (18)$$
The term $L'(0)$ can be computed as follows. Note that
$$L^{2}(t) = (1-t)^{2}\|\mathbf{u}_{0}\|^{2} + t^{2}\|\mathbf{u}_{X}\|^{2} + 2(t-t^{2})\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle = 1 - 2t + 2t^{2} + 2(t-t^{2})\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle.$$
Therefore,
$$\frac{d}{dt}\bigg|_{t=0} L^{2}(t) = -2 + 2\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle = 2L(0)L'(0).$$
As a result,
$$L'(0) = -1 + \langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle. \quad (19)$$
Substituting (19) into (18), one can see that
$$\frac{d}{dt}\bigg|_{t=0} \left(\frac{1}{2}\cos^{2}\theta_{i}\right) = \langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle \left(1 - \|\mathcal{P}_{i}\mathbf{u}_{0}\|^{2}\right) \ge 0,$$
where the equality holds if and only if $\|\mathcal{P}_{i}\mathbf{u}_{0}\| = 1$, i.e., $\mathbf{u}_{0} \in \operatorname{span}(B_{i})$ and $\theta_{i} = 0$. This completes the proof.

C. Proof of Theorem 4
Let $U_{X} \in \mathcal{U}_{m,r}$ be such that every column of $X$ is in the subspace $\operatorname{span}(U_{X})$. Consider the compact singular value decomposition $UU^{T}U_{X}U_{X}^{T} = U'SU_{X}'^{T}$, where $S \in \mathbb{R}^{r \times r}$ is the diagonal matrix containing the singular values, and $U'$ and $U_{X}'$ are the left and right singular vector matrices, respectively. Clearly, $U$ and $U'$ generate the same subspace, and so do $U_{X}$ and $U_{X}'$. For simplicity, we present our proof for $U'$ and $U_{X}'$ and omit the primes. With this simplification, one has $U^{T}U_{X} = S = \operatorname{diag}([\lambda_{1}, \cdots, \lambda_{r}])$.

For the $i$th column of $X$, we compute $\nabla_{U}\cos^{2}\theta_{i}$. Since we are considering the full sampling case, we have $B_{i} = \bar{\mathbf{x}}_{i}$. Because $\bar{\mathbf{x}}_{i} \in \operatorname{span}(U_{X})$, there exists $\bar{\mathbf{w}} \in \mathcal{U}_{r,1}$ such that $\bar{\mathbf{x}}_{i} = U_{X}\bar{\mathbf{w}}$. To compute $\nabla_{U}\cos^{2}\theta_{i}$, we need the first left and the first right singular vectors of the matrix $\bar{\mathbf{x}}_{i}\bar{\mathbf{x}}_{i}^{T}U$. The first left singular vector is clearly $\bar{\mathbf{x}}_{i}$, and the first right singular vector is proportional to $U^{T}\bar{\mathbf{x}}_{i} = U^{T}U_{X}\bar{\mathbf{w}} = S\bar{\mathbf{w}}$. Hence,
$$\nabla_{U}\cos^{2}\theta_{i} = \left(I - UU^{T}\right)\bar{\mathbf{x}}_{i}\bar{\mathbf{w}}^{T}S^{T} = \left(I - UU^{T}\right)U_{X}\bar{\mathbf{w}}\bar{\mathbf{w}}^{T}S^{T}.$$
According to Lemma 1, $\left(I - UU^{T}\right)U_{X}$ can be written as $G\operatorname{diag}([\sin\alpha_{1}, \cdots, \sin\alpha_{r}])$, where $G = [\mathbf{g}_{1}, \cdots, \mathbf{g}_{r}] \in \mathcal{U}_{m,r}$, and the $\alpha_{i} = \cos^{-1}\lambda_{i}$, $i = 1, \cdots, r$, are the principal angles between $\operatorname{span}(U)$ and $\operatorname{span}(U_{X})$.

We consider the geodesic $U(t)$ from $U$ to $U_{X}$. In Lemma 1 (part 1), we showed that this geodesic is given by the $U(t)$ satisfying $U(0) = U$ and $\dot{U}(0) = G\operatorname{diag}([\alpha_{1}, \cdots, \alpha_{r}])$. Along this path, we have
$$\begin{aligned} \frac{d}{dt}\bigg|_{t=0} \cos^{2}\theta_{i} &= \left\langle \nabla_{U}\cos^{2}\theta_{i},\ G\operatorname{diag}([\alpha_{1}, \cdots, \alpha_{r}]) \right\rangle \\ &= \operatorname{trace}\left( \left(G\operatorname{diag}([\alpha_{1}, \cdots, \alpha_{r}])\right)^{T}\left(I - UU^{T}\right)U_{X}\bar{\mathbf{w}}\bar{\mathbf{w}}^{T}S^{T} \right) \\ &= \operatorname{trace}\left( \operatorname{diag}([\cdots, \alpha_{j}\sin\alpha_{j}, \cdots])\,\bar{\mathbf{w}}\bar{\mathbf{w}}^{T}S \right) \\ &= \sum_{j=1}^{r} \bar{w}_{j}^{2}\,\alpha_{j}\sin\alpha_{j}\cos\alpha_{j} \ge 0. \quad (20) \end{aligned}$$
We claim that under Assumption II, equality in (20) holds if and only if $\theta_{i} = 0$. If $\theta_{i} = 0$, then $\bar{\mathbf{x}}_{i} \in \operatorname{span}(U)$. According to Lemma 1 (part 2), $\bar{w}_{j} = 0$ for all $j$ such that $\alpha_{j} \ne 0$; the equality in (20) thus holds. Otherwise, if $\theta_{i} \ne 0$, then $\bar{\mathbf{x}}_{i} \notin \operatorname{span}(U)$. Again, according to Lemma 1 (part 2), there exists a $j \in [r]$ such that $\alpha_{j} > 0$ and $\bar{w}_{j} \ne 0$. Hence, we have a strict inequality in (20). Finally, note that
$$\frac{d}{dt}\bigg|_{t=0} \sin^{2}\theta_{i} = -\frac{d}{dt}\bigg|_{t=0} \cos^{2}\theta_{i} \le 0.$$
This proves the theorem.

REFERENCES
[1] D. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[2] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, 2006.
[3] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inform. Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[4] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," arXiv:0706.4138, 2007.
[5] E. Candès and B. Recht, "Exact matrix completion via convex optimization," arXiv:0805.4471, 2008.
[6] E. J. Candès and T. Tao, "The power of convex relaxation: near-optimal matrix completion," arXiv:0903.1476, Mar. 2009.
[7] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," arXiv:0906.2220, 2009.
[8] E. J. Candès and Y. Plan, "Matrix completion with noise," arXiv:0903.3131, Mar. 2009.
[9] J. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," arXiv:0810.3286, 2008.
[10] K. Lee and Y. Bresler, "ADMiRA: atomic decomposition for minimum rank approximation," arXiv:0905.0044, Apr. 2009.
[11] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inform. Theory, vol. 55, pp. 2230–2249, May 2009.
[12] D. Needell and J. A. Tropp, "CoSaMP: iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, pp. 301–321, May 2009.
[13] R. Meka, P. Jain, and I. S. Dhillon, "Guaranteed rank minimization via singular value projection," arXiv:0909.5457, 2009.
[14] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Applied and Computational Harmonic Analysis, vol. 27, pp. 265–274, Nov. 2009.
[15] J. Haldar and D. Hernando, "Rank-constrained solutions to linear matrix equations using PowerFactorization," IEEE Signal Processing Letters, vol. 16, pp. 584–587, 2009.
[16] R. H. Keshavan, A. Montanari, and S. Oh, "Matrix completion from a few entries," arXiv:0901.3150, 2009.
[17] W. Dai and O. Milenkovic, "SET: an algorithm for consistent matrix completion," in IEEE International Conf. on Acoustics, Speech, and Signal Processing (ICASSP), March 2010.
[18] A. Edelman, T. Arias, and S. T. Smith, "The geometry of algorithms with orthogonality constraints," SIAM Journal on Matrix Analysis and Applications, vol. 20, pp. 303–353, April 1999.
[19] J. H. Conway, R. H. Hardin, and N. J. A. Sloane, "Packing lines, planes, etc.: packings in Grassmannian spaces," Exper. Math., vol. 5, pp. 139–159, 1996.
[20] W. Dai, Y. Liu, and B. Rider, "Quantization bounds on Grassmann manifolds and applications to MIMO communications," IEEE Trans. Inform. Theory, vol. 54, pp. 1108–1123, March 2008.
[21] A. T. James, "Normal multivariate analysis and the orthogonal group," Ann. Math. Statist., vol. 25, no. 1, pp. 40–75, 1954.
[22] M. Adler and P. van Moerbeke, "Integrals over Grassmannians and random permutations,"
Advances in Mathematics, vol. 181, no. 1, pp. 190–249, 2004.
[23] W. Dai, B. Rider, and Y. Liu, "Volume growth and general rate quantization on Grassmann manifolds," in IEEE Global Telecommunications Conference (GLOBECOM), 2007.