A Geometric Approach to Low-Rank Matrix Completion
Wei Dai∗, Ely Kerman∗∗, Olgica Milenkovic∗
∗Department of Electrical and Computer Engineering, ∗∗Department of Mathematics
University of Illinois at Urbana-Champaign
Email: {weidai07,ekerman,milenkov}@illinois.edu
Abstract—The low-rank matrix completion problem can be succinctly stated as follows: given a subset of the entries of a matrix, find a low-rank matrix consistent with the observations. While several low-complexity algorithms for matrix completion have been proposed so far, it remains an open problem to devise search procedures with provable performance guarantees for a broad class of matrix models. The standard approach to the problem, which involves the minimization of an objective function defined using the Frobenius metric, has inherent difficulties: the objective function is not continuous and the solution set is not closed. To address this problem, we consider an optimization procedure that searches for a column (or row) space that is geometrically consistent with the partial observations. The geometric objective function is continuous everywhere and the solution set is the closure of the solution set of the Frobenius metric. We also preclude the existence of local minimizers, and hence establish strong performance guarantees, for special completion scenarios, which do not require matrix incoherence or large matrix size.
I. INTRODUCTION
In many practical applications of data acquisition, the signals of interest have a sparse representation in some basis. That is, they can be well approximated using only a few basis elements. This allows for efficient sampling and reconstruction of signals [1], [2], [3], [4], [5], [6]. More precisely, the number of linear measurements required to capture a sparse signal can be much smaller than the number of inherent dimensions of the signal, and various polynomial-time algorithms are known for accurately reconstructing the sparse signal based on these linear measurements. Due to the significant reduction in sampling resources and the modest requirements for computational resources, sparse signal processing has been studied intensively [1], [2], [3], [4], [5], [6].

There are two categories of sparse signals which frequently arise in applications. In the first category, the sparse signal can be modeled as a vector with only a small fraction of non-zero entries. Compressive sensing is the framework for sampling and recovering such signals. In the second category, the signals are represented by matrices whose ranks are much smaller than either of their dimensions. In this second setting, one of the fundamental problems of sparse signal processing is the low-rank matrix completion problem: to determine when and how one can recover a low-rank matrix based on only a subset of its entries [5], [6], [7].

Scores of methods and algorithms have been proposed for low-rank matrix completion. Many of them are based on similarities between compressive sensing reconstruction and low-rank matrix completion. In general, both reconstruction tasks are ill-posed and computationally intractable. Nevertheless, exact recovery in an efficient manner is possible for both signal categories provided that the signal is sufficiently sparse or sufficiently densely sampled. Casting the sparse signal recovery problem as an optimization problem, $\ell_1$-minimization has been proposed for compressive sensing signal reconstruction [1], [2], [3]. Following the same idea, methods based on nuclear norm minimization have been developed for low-rank matrix completion [5], [6], [8], [9]. In terms of greedy algorithms, many of the approaches for low-rank completion can be viewed as generalizations of their counterparts for compressive sensing reconstruction. In particular, the ADMiRA algorithm [10] is a counterpart of the subspace pursuit (SP) [11] and CoSaMP [12] algorithms, while the singular value projection (SVP) method [13] extends the iterative hard thresholding (IHT) [14] approach. There are also other approaches that utilize special structural properties of low-rank matrices. Examples include the power factorization algorithm [15], the OptSpace algorithm [16], and the subspace evolution and transfer algorithm [17].

Nevertheless, there is a fundamental problem in low-rank matrix completion which has not been successfully addressed yet: how to search for a low-rank matrix consistent with partial observations. The fundamental difference between compressive sensing and low-rank matrix completion lies in the knowledge of the "sparse basis". In compressive sensing, the basis under which the signal is sparse is known a priori. In principle, the support set of the nonzero entries can be found by exhaustive search. However, in low-rank matrix completion, the corresponding "sparse basis" is not known. Note that the set of all possible bases forms a continuous space, in which "exhaustive" search is impossible.
Moreover, we shall show, in Example 1 of Section III, that a direct gradient-descent search does not work either.

The understanding of the search for consistent matrices is incomplete. There are two special cases where specially designed algorithms can guarantee a consistent low-rank solution. The first case is when the low-rank matrix is fully sampled. The consistent low-rank solution is simply the observation matrix itself, and the corresponding "sparse basis" (singular vectors) can be easily obtained by a singular value decomposition. The other case is when the rank equals one. Given an arbitrary sampling pattern, one simply looks at the ratios between the revealed entries in the same column and uses these ratios to construct a column vector that represents the column space. This method is guaranteed to find a consistent solution for rank-one matrices. However, it remains an open problem how to extend this method to general ranks; hence, such an approach is not universal. On the other hand, none of the existing general algorithms provides a performance guarantee even for the rank-one case. The performance guarantee of nuclear norm minimization is built on incoherence conditions, which only hold with high probability when the low-rank matrix is drawn randomly from certain ensembles and when the size of the matrix is sufficiently large. Our understanding of low-rank matrix completion is far from complete.

Our approach to these issues is summarized as follows.

1) We provide a framework for searching for a low-rank matrix that is consistent with the partial observations. There is no requirement that such a matrix is unique: if there is a unique low-rank solution, we should be able to find this unique matrix; otherwise, it suffices to find just one solution that agrees with the revealed entries. In our approach, we assume that the rank of the underlying low-rank matrix is known a priori. Finding a consistent low-rank matrix is equivalent to finding a consistent column/row space. This is different from the OptSpace algorithm in [16], where the search is performed on both column and row spaces simultaneously.

2) We propose a geometric performance metric to measure the consistency between the estimated column space and the partial observations. In the literature, the standard approach is to minimize an objective function that is defined via the Frobenius norm. As we shall illustrate with explicit examples, this objective function may have singularities, and therefore the corresponding solution set may not be closed. Hence, we introduce a new formulation where consistency is defined in geometric terms. This allows us to address the difficulties related to the Frobenius metric. In particular, we show that our geometric objective function is always continuous, and that the set of the corresponding consistent solutions is the closure of the set corresponding to the Frobenius norm. This new metric allows for provably strong performance guarantees, described below.

3) We provide strong performance guarantees for special completion scenarios: rank-one matrices with arbitrary sampling patterns, and fully sampled matrices of arbitrary rank. For these two scenarios, a gradient descent search starting from a random point will converge to a global minimum with probability one.¹ More importantly, if the partial observations admit a unique consistent solution, this search procedure finds this unique solution with probability one.
The performance guarantees are different from those previously established in the literature. Roughly speaking, previous performance guarantees require large matrix sizes and only hold with high probability. Ours hold with probability one regardless of matrix size. It is also worth noting that we do not require incoherence conditions, which are essential for the performance guarantees of nuclear norm minimization. Unfortunately, we are presently unable to obtain performance guarantees for more general cases.

¹For fully sampled matrices, even though a simple singular value decomposition produces a consistent column space, it is not clear that a randomly initialized search would converge to a consistent column space. In what follows, we prove that this is the case.

The paper is organized as follows. In Section II we introduce the low-rank matrix completion problem, and some background material regarding Grassmann manifolds and their geometry. In Section III we show that formulating the low-rank matrix completion problem as an optimization problem using the Frobenius norm may yield singularities which can obstruct standard minimization algorithms. We then propose a new geometric formulation of the problem as a remedy for this difficulty. This new formulation allows for strong performance guarantees that are presented in Section IV. Section V summarizes the main contributions of the work. Proofs of the main results are presented in the Appendices.

II. LOW-RANK MATRIX COMPLETION AND PRELIMINARIES
Let $X \in \mathbb{R}^{m \times n}$ be an unknown matrix with rank $r \le \min(m, n)$, and let $\Omega \subset [m] \times [n]$ be the set of indices of the observed entries, where $[K] = \{1, 2, \cdots, K\}$. Define the projection operator $\mathcal{P}_{\Omega}$ by
$$\mathcal{P}_{\Omega}: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}^{m \times n}, \quad \mathcal{P}_{\Omega}(X) \triangleq X_{\Omega}, \quad \text{where } (X_{\Omega})_{i,j} = \begin{cases} X_{i,j} & \text{if } (i,j) \in \Omega, \\ 0 & \text{if } (i,j) \notin \Omega. \end{cases}$$
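The operator $\mathcal{P}_{\Omega}$ is straightforward to realize numerically. The following minimal sketch (NumPy, with helper names of our own choosing) applies it to a matrix given a boolean mask encoding $\Omega$:

```python
import numpy as np

def P_Omega(X, Omega):
    """Entrywise mask: keep X[i, j] where Omega[i, j] is True, zero elsewhere."""
    return np.where(Omega, X, 0.0)

# Example: a rank-one matrix observed on roughly half of its entries.
rng = np.random.default_rng(0)
X = np.outer(rng.standard_normal(5), rng.standard_normal(4))  # rank(X) = 1
Omega = rng.random((5, 4)) < 0.5                               # boolean index mask
X_Omega = P_Omega(X, Omega)
```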
The consistent matrix completion problem is to find one rank-$r$ matrix $X'$ that is consistent with the observations $X_{\Omega}$, i.e.,
$$(P0):\ \text{find } X' \text{ such that } \operatorname{rank}(X') = r \text{ and } \mathcal{P}_{\Omega}(X') = \mathcal{P}_{\Omega}(X) = X_{\Omega}. \quad (1)$$
By definition, this problem is well defined, since $X_{\Omega}$ is obtained from some rank-$r$ matrix $X$, which is therefore a solution. As in other works [10], [15], [16], we assume that the rank $r$ is given. In practice, one may sequentially guess a rank bound until a satisfactory solution has been found.

We also introduce the (standard) projection operator $\mathcal{P}$,
$$\mathcal{P}: \mathbb{R}^{m} \times \mathbb{R}^{m \times k} \rightarrow \mathbb{R}^{m}, \quad \mathcal{P}(\mathbf{x}, U) \triangleq \mathbf{y} = U U^{\dagger} \mathbf{x},$$
where $1 \le k \le m$, and where the superscript $\dagger$ denotes the pseudoinverse of a matrix. Let $\operatorname{span}(U)$ denote the subspace spanned by the columns of the matrix $U$, i.e.,
$$\operatorname{span}(U) = \{\mathbf{v} \in \mathbb{R}^{m}: \mathbf{v} = U\mathbf{w} \text{ for some } \mathbf{w} \in \mathbb{R}^{k}\}.$$
One can describe $\mathcal{P}(\mathbf{x}, U)$, in geometric terms, as the projection of the vector $\mathbf{x}$ onto $\operatorname{span}(U)$. It should be observed that $U^{\dagger}\mathbf{x}$ is the global minimizer of the quadratic optimization problem $\min_{\mathbf{w} \in \mathbb{R}^{k}} \|\mathbf{x} - U\mathbf{w}\|^{2}$.
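The projection $\mathcal{P}(\mathbf{x}, U)$ is likewise easy to compute; rather than forming the pseudoinverse explicitly, one may solve the underlying least-squares problem. A minimal sketch (NumPy, our own helper name):

```python
import numpy as np

def P(x, U):
    """Orthogonal projection of x onto span(U), i.e. U @ pinv(U) @ x.
    The coefficient vector w = pinv(U) @ x is obtained by least squares."""
    w, *_ = np.linalg.lstsq(U, x, rcond=None)
    return U @ w
    # Equals U @ np.linalg.pinv(U) @ x up to floating-point error.
```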
A. Search for a consistent column space

We now show that the problem $(P0)$ is equivalent to finding a column space consistent with the observed entries of $X$. Let $\mathcal{U}_{m,r}$ be the set of $m \times r$ matrices with $r$ orthonormal columns, i.e., $\mathcal{U}_{m,r} = \{U \in \mathbb{R}^{m \times r}: U^{T}U = I_{r}\}$. Define the function $f_{F}: \mathcal{U}_{m,r} \rightarrow \mathbb{R}$ by setting
$$f_{F}(U) = \min_{W \in \mathbb{R}^{n \times r}} \left\| X_{\Omega} - \mathcal{P}_{\Omega}(U W^{T}) \right\|_{F}^{2}, \quad (2)$$
where $\|\cdot\|_{F}$ denotes the Frobenius norm. This function measures the consistency between the matrix $U$ and the observations $X_{\Omega}$. In particular, if $f_{F}(U) = 0$, then there exists a matrix $W$ such that the rank-$r$ matrix $UW^{T}$ satisfies $\mathcal{P}_{\Omega}(UW^{T}) = X_{\Omega}$. Hence, the consistent matrix completion problem is equivalent to
$$(P1):\ \text{find } U \in \mathcal{U}_{m,r} \text{ such that } f_{F}(U) = 0. \quad (3)$$
In fact, $f_{F}(U)$ depends only on the subspace $\operatorname{span}(U)$, since the columns of a matrix of the form $UW^{T}$ all lie in $\operatorname{span}(U)$. Hence, to solve the consistent matrix completion problem, it suffices to find a column space that is consistent with the observed entries. Note that the same conclusion holds for the row space as well. For simplicity, we restrict our attention to the column space only.
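Since the squared Frobenius norm separates over columns (a fact used formally in Section III-A), the inner minimization over $W$ in (2) reduces to $n$ independent least-squares problems over the observed rows. The following sketch evaluates $f_{F}(U)$ this way (NumPy; function and variable names are ours):

```python
import numpy as np

def f_F(U, X_Omega, Omega):
    """Frobenius consistency metric of eq. (2): sum over columns of the
    squared residual of fitting U (restricted to the observed rows of
    that column) to the observed entries."""
    total = 0.0
    for i in range(X_Omega.shape[1]):
        rows = np.flatnonzero(Omega[:, i])   # Omega_i: observed rows of column i
        if rows.size == 0:
            continue                         # an unobserved column contributes 0
        U_obs = U[rows, :]
        x_obs = X_Omega[rows, i]
        w, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
        total += float(np.sum((x_obs - U_obs @ w) ** 2))
    return total
```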
B. Grassmann Manifolds

The set of column spaces of elements in $\mathcal{U}_{m,r}$ can be identified with the Grassmann manifold $\mathcal{G}_{m,r}$, the set of $r$-dimensional subspaces of the $m$-dimensional Euclidean space $\mathbb{R}^{m}$. This is a smooth compact manifold of dimension $r(m-r)$. Conversely, every element $\mathcal{U} \in \mathcal{G}_{m,r}$ can be represented by a generator matrix $U \in \mathcal{U}_{m,r}$ satisfying $\operatorname{span}(U) = \mathcal{U}$. However, this representation of $\mathcal{U}$ by a generator matrix is clearly not unique. Nevertheless, it follows from the discussion in the previous section that the function $f_{F}$ descends to a function on $\mathcal{G}_{m,r}$. Thus, problem $(P1)$ can be viewed as an optimization problem on the compact manifold $\mathcal{G}_{m,r}$.

In this section we recall some facts concerning the geometry of Grassmann manifolds which will be useful in addressing this and similar optimization problems. For the proofs of these facts the reader is referred to [18]. We begin by recalling the construction of the standard Riemannian metric, $g_{m,r}$, on $\mathcal{G}_{m,r}$. Note that the group $\mathcal{U}_{m,m}$ of orthogonal $m \times m$ matrices acts transitively on $\mathcal{G}_{m,r}$ (by multiplication on generator matrices). More precisely, $\mathcal{G}_{m,r}$ can be described as a quotient of $\mathcal{U}_{m,m}$, i.e.,
$$\mathcal{G}_{m,r} = \mathcal{U}_{m,m}/(\mathcal{U}_{m-r,m-r} \times \mathcal{U}_{r,r}).$$
Now, as a compact Lie group, $\mathcal{U}_{m,m}$ has a standard (bi-invariant) Riemannian metric, which can be defined by an inner product on the tangent space. This descends to the quotient $\mathcal{G}_{m,r}$ as the metric $g_{m,r}$. By construction, $g_{m,r}$ is invariant under the action of $\mathcal{U}_{m,m}$.

The metric $g_{m,r}$ determines a chordal distance function and geodesic curves on $\mathcal{G}_{m,r}$ which will play an important role in what follows. To obtain the relevant formulas for these objects we require the notion of the principal angles between two subspaces [19], [20]. Consider the subspaces $\operatorname{span}(U)$ and $\operatorname{span}(V)$ of $\mathbb{R}^{m}$ for some $U \in \mathcal{U}_{m,p}$ and $V \in \mathcal{U}_{m,q}$. The principal angles between these two subspaces can be defined in the following constructive manner. Without loss of generality, assume that $1 \le p \le q \le m$. Let $\mathbf{u}_{1} \in \operatorname{span}(U)$ and $\mathbf{v}_{1} \in \operatorname{span}(V)$ be unit-length vectors such that $|\mathbf{u}_{1}^{T}\mathbf{v}_{1}|$ is maximal. Inductively, let $\mathbf{u}_{k} \in \operatorname{span}(U)$ and $\mathbf{v}_{k} \in \operatorname{span}(V)$ be unit vectors such that $\mathbf{u}_{k}^{T}\mathbf{u}_{j} = 0$ and $\mathbf{v}_{k}^{T}\mathbf{v}_{j} = 0$ for all $1 \le j < k$ and $|\mathbf{u}_{k}^{T}\mathbf{v}_{k}|$ is maximal. The principal angles are then defined as $\alpha_{k} = \arccos|\mathbf{u}_{k}^{T}\mathbf{v}_{k}|$ for $k = 1, 2, \cdots, p$.

Alternatively, the principal angles can be computed via a singular value decomposition. Consider the singular value decomposition $UU^{T}VV^{T} = \bar{U}\Lambda\bar{V}^{T}$, where $\bar{U} \in \mathcal{U}_{m,p}$ and $\bar{V} \in \mathcal{U}_{m,p}$ contain the first $p$ left and right singular vectors, respectively, and $\Lambda \in \mathbb{R}^{p \times p}$ is a diagonal matrix comprised of the singular values $\lambda_{1} \ge \cdots \ge \lambda_{p}$. Then the $k$th columns of $\bar{U}$ and $\bar{V}$ correspond to the vectors $\mathbf{u}_{k}$ and $\mathbf{v}_{k}$ used in the constructive definition, and the $k$th singular value $\lambda_{k}$ defines the $k$th principal angle $\alpha_{k}$ via $\cos\alpha_{k} = \lambda_{k}$.

Chordal distance on $\mathcal{G}_{m,r}$. For $U_{0}$ and $U_{1}$ in $\mathcal{U}_{m,r}$, the chordal distance between the two subspaces $\operatorname{span}(U_{0})$ and $\operatorname{span}(U_{1})$ in $\mathcal{G}_{m,r}$ is given, in terms of the $r$ principal angles between them, via the formula
$$\sqrt{\sum_{k=1}^{r} \sin^{2}\alpha_{k}}.$$
The chordal distance can also be expressed in terms of singular values as
$$\sqrt{\sum_{k=1}^{r} (1 - \lambda_{k}^{2})}.$$
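For orthonormal generator matrices $U_{0}, U_{1} \in \mathcal{U}_{m,r}$, the cosines of the principal angles can equivalently be read off from the singular values of $U_{0}^{T}U_{1}$, which gives a compact numerical recipe for the chordal distance. A sketch (NumPy, our own helper name):

```python
import numpy as np

def chordal_distance(U0, U1):
    """Chordal distance between span(U0) and span(U1) for orthonormal
    U0, U1: the singular values of U0.T @ U1 are cos(alpha_k), so
    d = sqrt(sum_k sin^2(alpha_k)) = sqrt(sum_k (1 - cos^2(alpha_k)))."""
    lam = np.linalg.svd(U0.T @ U1, compute_uv=False)
    lam = np.clip(lam, 0.0, 1.0)  # guard against tiny numerical overshoot
    return float(np.sqrt(np.sum(1.0 - lam**2)))
```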
Geodesics on $\mathcal{G}_{m,r}$. We will use the gradient descent method on $\mathcal{G}_{m,r}$ to search for consistent column spaces. This will require some information concerning the geodesics of the metric $g_{m,r}$ on $\mathcal{G}_{m,r}$, which we now recall.

Roughly speaking, a geodesic in a manifold is a generalization of the notion of a straight line in Euclidean space: given any two points in $\mathcal{G}_{m,r}$, among all curves that connect these two points, the one of shortest length is a geodesic. More precisely, fix a subspace $\mathcal{U}$ in $\mathcal{G}_{m,r}$ and a tangent vector $\mathcal{H}$ to $\mathcal{G}_{m,r}$ at $\mathcal{U}$. Let $U \in \mathcal{U}_{m,r}$ be a generator matrix for $\mathcal{U}$. The tangent space to $\mathcal{G}_{m,r}$ at $\mathcal{U}$ can be identified with the set of horizontal tangent vectors at $U$, i.e., the set of tangent vectors $W$ at $U$ which satisfy $U^{T}W = 0$ [18]. Let $H \in \mathbb{R}^{m \times r}$ be the horizontal tangent vector at $U$ which corresponds to $\mathcal{H}$, and set
$$U(t) = [U V_{H},\ U_{H}] \begin{bmatrix} \cos(S_{H} t) \\ \sin(S_{H} t) \end{bmatrix} V_{H}^{T}, \quad (4)$$
where $U_{H} S_{H} V_{H}^{T}$ is the compact singular value decomposition of $H$. Then $\operatorname{span}(U(t))$ is the unique geodesic of $g_{m,r}$ which starts at $\mathcal{U}$ with "initial velocity" $\mathcal{H}$.
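Formula (4) translates directly into code. The sketch below (NumPy, our own helper name) evaluates the geodesic at time $t$ given a generator matrix $U$ and a horizontal direction $H$ with $U^{T}H = 0$:

```python
import numpy as np

def grassmann_geodesic(U, H, t):
    """Point at time t on the geodesic of eq. (4): span(U(t)) starts at
    span(U) with horizontal initial velocity H (U.T @ H = 0).  Uses the
    compact SVD H = U_H @ diag(s) @ V_H.T."""
    U_H, s, V_H_T = np.linalg.svd(H, full_matrices=False)
    V_H = V_H_T.T
    return (U @ V_H) @ np.diag(np.cos(s * t)) @ V_H.T \
           + U_H @ np.diag(np.sin(s * t)) @ V_H.T
```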
We now use this general solution for the geodesic flow of $g_{m,r}$ to establish the following technical result concerning geodesics between a given pair of subspaces.

Lemma 1: Fix two elements $U_{0}$ and $U_{1}$ of $\mathcal{U}_{m,r}$. Let $V_{0}\Lambda V_{1}^{T}$ be the singular value decomposition of the matrix $U_{0}^{T}U_{1}$, and denote the $i$th singular value by $\lambda_{i} = \cos\alpha_{i}$. Set $\bar{U}_{0} = U_{0}V_{0}$ and $\bar{U}_{1} = U_{1}V_{1}$, and note that $\bar{U}_{0}^{T}\bar{U}_{1} = \Lambda$.
1) Consider the path
$$U(t) = [\bar{U}_{0},\ G] \begin{bmatrix} \operatorname{diag}([\cdots, \cos\alpha_{i}t, \cdots]) \\ \operatorname{diag}([\cdots, \sin\alpha_{i}t, \cdots]) \end{bmatrix} V_{0}^{T}, \quad (5)$$
where the columns of $G = [\cdots, \mathbf{g}_{i}, \cdots] \in \mathbb{R}^{m \times r}$ are defined as follows:
$$\mathbf{g}_{i} = \begin{cases} \dfrac{\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}}{\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|} & \text{if } \lambda_{i} \ne 1, \\ \mathbf{0} & \text{if } \lambda_{i} = 1. \end{cases}$$
Here, the subscript $:i$ denotes the $i$th column of the corresponding matrix. Then the path $\operatorname{span}(U(t))$ is a geodesic of $g_{m,r}$ such that $\operatorname{span}(U(0)) = \operatorname{span}(U_{0})$ and $\operatorname{span}(U(1)) = \operatorname{span}(U_{1})$.

2) Let $\bar{\mathbf{x}} \in \operatorname{span}(U_{1})$ be a unit-norm vector. Clearly, there exists a unique $\bar{\mathbf{w}} \in \mathcal{U}_{r,1}$ such that $\bar{\mathbf{x}} = \bar{U}_{1}\bar{\mathbf{w}}$. Suppose that $\bar{\mathbf{x}} \notin \operatorname{span}(\bar{U}_{0})$. Let $k$ be the number of singular values of $\bar{U}_{0}^{T}\bar{U}_{1}$ that equal one. Then $k < r$ and there exists an index $j \in [r]$ such that $k < j \le r$ and $\bar{w}_{j} \ne 0$.

Proof:
Clearly, $U(0) = \bar{U}_{0}$, so $\operatorname{span}(U(0)) = \operatorname{span}(U_{0})$. Since $\bar{U}_{0}^{T}\bar{U}_{1} = \Lambda$, we have
$$\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|^{2} = 1 - 2\lambda_{i}\langle \bar{U}_{0,:i}, \bar{U}_{1,:i} \rangle + \lambda_{i}^{2} = 1 - \lambda_{i}^{2}.$$
Thus, we have
$$\begin{aligned} U(1) &= [\cdots,\ \bar{U}_{0,:i}\cos\alpha_{i} + \mathbf{g}_{i}\sin\alpha_{i},\ \cdots]\,V_{0}^{T} \\ &= \left[\cdots,\ \bar{U}_{0,:i}\lambda_{i} + \mathbf{g}_{i}\sqrt{1-\lambda_{i}^{2}},\ \cdots\right]V_{0}^{T} \\ &= \left[\cdots,\ \bar{U}_{0,:i}\lambda_{i} + \mathbf{g}_{i}\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|,\ \cdots\right]V_{0}^{T} \\ &= \left(\bar{U}_{0}\Lambda + (\bar{U}_{1} - \bar{U}_{0}\Lambda)\right)V_{0}^{T} = U_{1}V_{1}V_{0}^{T}. \end{aligned}$$
Hence, $\operatorname{span}(U(1)) = \operatorname{span}(U_{1})$. To prove the first part of the lemma it remains to show that $\operatorname{span}(U(t))$ is a geodesic. Setting $H = \dot{U}(0)$ we have
$$H = G\operatorname{diag}([\cdots, \alpha_{i}, \cdots])V_{0}^{T}. \quad (6)$$
We first verify that the tangent vector $H$ is horizontal, which is equivalent to showing that $U_{0}^{T}H = 0$. According to the definition of the vectors $\mathbf{g}_{i}$, when $\lambda_{i} \ne 1$, one has
$$\bar{U}_{0}^{T}\mathbf{g}_{i} = \frac{1}{\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|}\,\bar{U}_{0}^{T}(\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}) = \frac{1}{\|\bar{U}_{1,:i} - \lambda_{i}\bar{U}_{0,:i}\|}\,(\lambda_{i}\mathbf{e}_{i} - \lambda_{i}\mathbf{e}_{i}) = \mathbf{0},$$
while $\mathbf{g}_{i} = \mathbf{0}$ when $\lambda_{i} = 1$. Hence, $U_{0}^{T}G = V_{0}\bar{U}_{0}^{T}G = 0$. By (6), this implies that $U_{0}^{T}H = 0$, as desired. Note that equation (6) can also be viewed as an expression for the compact singular value decomposition of $H$. It then follows directly from (4) that $\operatorname{span}(U(t))$ is indeed a geodesic.

To prove the second part of the lemma, let $\mathbf{u}_{0,1}, \cdots, \mathbf{u}_{0,r}$ and $\mathbf{u}_{1,1}, \cdots, \mathbf{u}_{1,r}$ be the column vectors of the matrices $\bar{U}_{0}$ and $\bar{U}_{1}$, respectively. By assumption, $\lambda_{1} = \cdots = \lambda_{k} = 1$ and $1 > \lambda_{k+1} \ge \cdots \ge \lambda_{r}$. Hence,
$$\mathbf{u}_{0,j} = \mathbf{u}_{1,j} \text{ for all } j \le k, \quad\text{and}\quad \langle \mathbf{u}_{0,j}, \mathbf{u}_{1,j} \rangle = \lambda_{j} < 1 \text{ for all } k < j \le r.$$
Suppose that $k = r$. Then $\bar{\mathbf{x}} = \bar{U}_{1}\bar{\mathbf{w}} = \bar{U}_{0}\bar{\mathbf{w}} \in \operatorname{span}(\bar{U}_{0})$, which contradicts the assumption that $\bar{\mathbf{x}} \notin \operatorname{span}(\bar{U}_{0})$. Hence, we have $k < r$. Now suppose that $\bar{w}_{k+1} = \cdots = \bar{w}_{r} = 0$. Then
$$\bar{\mathbf{x}} = \sum_{j=1}^{k} \mathbf{u}_{1,j}\bar{w}_{j} = \sum_{j=1}^{k} \mathbf{u}_{0,j}\bar{w}_{j} \in \operatorname{span}(\bar{U}_{0}),$$
which again contradicts the assumption that $\bar{\mathbf{x}} \notin \operatorname{span}(\bar{U}_{0})$. Hence, there exists a $j$ such that $k < j \le r$ and $\bar{w}_{j} \ne 0$. This completes the proof.

An invariant measure on $\mathcal{G}_{m,r}$. The space $\mathcal{U}_{m,m}$ admits a standard invariant measure (the Haar measure) [21]. This descends to a measure $\mu$ on $\mathcal{G}_{m,r}$ which is also invariant in the following sense: for any measurable set $\mathcal{M} \subset \mathcal{G}_{m,r}$ and any $A \in \mathcal{U}_{m,m}$, one has $\mu(\mathcal{M}) = \mu(A\mathcal{M})$, where $A\mathcal{M} = \{\operatorname{span}(AU): U \in \mathcal{U}_{m,r},\ \operatorname{span}(U) \in \mathcal{M}\}$ [21], [20]. This invariant measure defines the uniform/isotropic distribution on the Grassmann manifold. Furthermore, let $\operatorname{span}(U) \in \mathcal{G}_{m,r}$ be fixed and let $\operatorname{span}(V) \in \mathcal{G}_{m,r}$ be drawn randomly from the isotropic distribution. The joint probability density function of the principal angles between the spans of $U$ and $V$ is explicitly given in [21], [22], [20], [23]. Two properties of this density function are relevant to our later analysis: first, it is independent of the choice of $U$; second, it has no mass points.
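Numerically, a subspace distributed according to this invariant measure can be produced by orthonormalizing a Gaussian matrix, since the rotation invariance of the Gaussian ensemble is inherited by the resulting span. A minimal sketch (NumPy, our own helper name):

```python
import numpy as np

def random_grassmann_point(m, r, rng=None):
    """Generator matrix of a subspace drawn from the invariant (isotropic)
    distribution on G_{m,r}: orthonormalize an m x r standard Gaussian
    matrix; rotation invariance of the Gaussian makes span(Q) Haar."""
    rng = np.random.default_rng(rng)
    Q, _ = np.linalg.qr(rng.standard_normal((m, r)))
    return Q
```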
III. FROM THE FROBENIUS NORM TO THE GEOMETRIC METRIC

In the previous section, we showed that the matrix completion problem reduces to a search for a consistent column space. In other words, one only needs to find a global minimum of the objective function $f_{F}(U)$, where
$$f_{F}(U) \triangleq \min_{W \in \mathbb{R}^{r \times n}} \|X_{\Omega} - \mathcal{P}_{\Omega}(UW)\|_{F}^{2}. \quad (7)$$
However, as we shall show in Section III-A, this approach has a serious drawback: the objective function (7) is not a continuous function of the variable $U$. The discontinuity of the objective function is due to the composition of the Frobenius norm with the projection operator $\mathcal{P}_{\Omega}$. It may prevent gradient-descent-based algorithms from converging to a global optimum (see [17] for a detailed example). To address this issue, we propose another objective function $f_{G}(U)$ based on the geometry of the problem, detailed in Section III-B. To solve the matrix completion problem, one then needs to solve the problem
$$(P2):\ \text{find } U \in \mathcal{U}_{m,r} \text{ such that } f_{G}(U) = 0, \quad (8)$$
where $f_{G}$ denotes the geometric metric, formally defined in Section III-B.

In the rest of this section, we shall show that the new objective function $f_{G}$ is a continuous function. Furthermore, we shall show that the preimage of $f_{G}(U) = 0$ is the closure of the preimage of $f_{F}(U) = 0$. Because of these properties of the geometric objective function, one can derive strong performance guarantees for gradient descent methods, as described in Section IV.

A. Why the Frobenius Norm Fails
We use an example to show that the objective function (7) based on the Frobenius norm is not continuous. Let $\mathbf{x}_{\Omega,i}$ be the $i$th column of the matrix $X_{\Omega}$. Let $\Omega_{i} \subset [m]$ be the set of indices of known entries in the $i$th column. We use $\mathcal{P}_{\Omega,i}$ to denote the projection operator corresponding to the index set $\Omega_{i}$. By additivity of the squared Frobenius norm, the objective function can be written as a sum of atomic functions, i.e.,
$$f_{F}(U) = \min_{W \in \mathbb{R}^{r \times n}} \|X_{\Omega} - \mathcal{P}_{\Omega}(UW)\|_{F}^{2} = \sum_{i=1}^{n} \underbrace{\min_{\mathbf{w}_{i} \in \mathbb{R}^{r}} \|\mathbf{x}_{\Omega,i} - \mathcal{P}_{\Omega,i}(U\mathbf{w}_{i})\|_{F}^{2}}_{f_{F,i}(U)}.$$
Denote the $i$th atomic function by $f_{F,i}(U)$. It can be verified that
$$f_{F,i}(U) = \min_{\mathbf{w}_{i} \in \mathbb{R}^{r}} \|\mathbf{x}_{\Omega,i} - \mathcal{P}_{\Omega,i}(U\mathbf{w}_{i})\|_{F}^{2} = \|\mathbf{x}_{\Omega,i} - \mathcal{P}(\mathbf{x}_{\Omega,i}, U_{\Omega_{i}})\|_{F}^{2},$$
where $U_{\Omega_{i}} = [\mathcal{P}_{\Omega,i}(\mathbf{u}_{1}), \cdots, \mathcal{P}_{\Omega,i}(\mathbf{u}_{r})]$ and $\mathbf{u}_{1}, \cdots, \mathbf{u}_{r}$ are the column vectors of the matrix $U$. We show in the next example that an atomic function, say $f_{F,1}(U)$, may not be continuous.

Example 1:
Suppose that $\mathbf{x}_{\Omega,1} = [0, 1, 1]^{T}$ and $\Omega_{1} = \{2, 3\}$. Let $U$ be of the form
$$U = \left[\sqrt{1 - 2\epsilon^{2}},\ \epsilon,\ \epsilon\right]^{T} \in \mathcal{U}_{3,1},$$
where $\epsilon \in \left[-1/\sqrt{2},\ 1/\sqrt{2}\right]$. For a given $U$, the atomic function $f_{F,1}(U)$ is given by
$$f_{F,1}(U) = \min_{w \in \mathbb{R}} \left\| [0, 1, 1]^{T} - \mathcal{P}_{\Omega,1}(Uw) \right\|_{F}^{2}.$$
This is a quadratic optimization problem and can be easily solved. The optimal $w^{*}$ is given by
$$w^{*} = \begin{cases} 1/\epsilon & \text{if } \epsilon \ne 0, \\ 0 & \text{if } \epsilon = 0. \end{cases}$$
Hence, one has
$$f_{F,1}(U(\epsilon)) = \begin{cases} 0 & \text{if } \epsilon \in \left[-\tfrac{1}{\sqrt{2}}, 0\right) \cup \left(0, \tfrac{1}{\sqrt{2}}\right], \\ 2 & \text{if } \epsilon = 0, \end{cases}$$
which shows that $f_{F,1}(U(\epsilon))$ has a singular point at $\epsilon = 0$.

Figure 1. Contours projected onto the $(u_{2}, u_{3})$ plane. The left panel depicts the contours of the squared Frobenius norm; the right corresponds to the chordal distance.

It is straightforward to verify that the overall objective function (7) is also a discontinuous function of $U$. As we argued in [17], this discontinuity creates so-called barriers, which may prevent gradient-descent algorithms from converging to a global minimum. Hence, one seeks an optimization criterion that allows for a continuous objective function and, consequently, no search-path barriers.
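The jump at $\epsilon = 0$ is easy to reproduce numerically. In the sketch below (NumPy; the helper name is ours), the least-squares solver returns $w = 0$ for the all-zero matrix $U_{\Omega_{1}}$ at $\epsilon = 0$, and the residual jumps from $0$ to $2$:

```python
import numpy as np

def f_F1(eps):
    """Atomic objective of Example 1: x_Omega,1 = [0, 1, 1]^T observed on
    rows {2, 3}, U(eps) = [sqrt(1 - 2 eps^2), eps, eps]^T."""
    x_obs = np.array([1.0, 1.0])        # entries on Omega_1 = {2, 3}
    U_obs = np.array([[eps], [eps]])    # U restricted to Omega_1
    w, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
    return float(np.sum((x_obs - U_obs @ w) ** 2))

print(f_F1(0.1), f_F1(1e-9), f_F1(0.0))  # ~0, ~0, then a jump to 2.0
```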
B. A Geometric Metric

To address the problems caused by the singularities of the objective function, we propose to replace the Frobenius norm by a geometric performance metric. In this case, the objective function is defined as
$$f_{G}(U) = \sum_{i=1}^{n} f_{G,i}(U),$$
where $f_{G,i}(U)$ denotes the geometric metric corresponding to the $i$th column, defined as follows. If $\mathbf{x}_{\Omega,i} = \mathbf{0}$, we set $f_{G,i}(U) = 0$. Henceforth, we only consider the case where $\mathbf{x}_{\Omega,i} \ne \mathbf{0}$. For any $\mathbf{x}_{\Omega,i} \ne \mathbf{0}$, let $\bar{\mathbf{x}}_{\Omega,i} = \mathbf{x}_{\Omega,i}/\|\mathbf{x}_{\Omega,i}\|_{F}$ be the normalized vector $\mathbf{x}_{\Omega,i}$. Let $\Omega_{i}^{c} = \{1, 2, \cdots, m\} \backslash \Omega_{i}$ be the complement of $\Omega_{i}$. Let $\mathbf{e}_{k} \in \mathbb{R}^{m}$ be the $k$th natural basis vector, i.e., the $k$th entry of $\mathbf{e}_{k}$ equals one and all other entries are zero. Define
$$B_{i} = [\bar{\mathbf{x}}_{\Omega,i},\ \mathbf{e}_{k_{1}},\ \cdots,\ \mathbf{e}_{k_{\ell}}], \quad (9)$$
where $\{k_{1}, \cdots, k_{\ell}\} = \Omega_{i}^{c}$. Let $\lambda_{\max}(B_{i}^{T}U)$ be the largest singular value of the matrix $B_{i}^{T}U$. Then
$$f_{G,i}(U) = 1 - \lambda_{\max}^{2}(B_{i}^{T}U). \quad (10)$$
This expression is closely related to the chordal distance between two subspaces, as described in Section II-B. We henceforth refer to the function (10) either as the geometric metric, or, with a slight abuse of terminology, as the chordal distance.

One advantage of the chordal distance is its continuity. This follows directly from the continuity of the singular values of the underlying matrix. Recall Example 1. In Fig. 1, we illustrate the differences between $f_{F,1}$ and $f_{G,1}$ by projecting their contours of constant value onto the $(u_{2}, u_{3})$ plane.
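For concreteness, the following sketch (NumPy, our own helper names) builds $B_{i}$ per (9) from the observed entries of one column and evaluates $f_{G,i}(U)$ per (10):

```python
import numpy as np

def f_G_column(x_obs, obs_rows, U):
    """Geometric metric of eq. (10) for one column.  Builds
    B_i = [x_bar, e_k for k in the complement of Omega_i] per eq. (9)
    and returns 1 - lambda_max(B_i^T U)^2."""
    m = U.shape[0]
    x = np.zeros(m)
    x[obs_rows] = x_obs                  # zero-padded observed column
    nrm = np.linalg.norm(x)
    if nrm == 0.0:
        return 0.0                       # convention: f_{G,i} = 0 if x_Omega = 0
    comp = np.setdiff1d(np.arange(m), obs_rows)
    B = np.column_stack([x / nrm, np.eye(m)[:, comp]])
    lam_max = np.linalg.svd(B.T @ U, compute_uv=False)[0]
    return float(1.0 - min(lam_max, 1.0) ** 2)
```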
More importantly, the following theorem shows that the preimage of $f_{G,i}(U) = 0$ is actually the closure of the preimage of $f_{F,i}(U) = 0$.

Theorem 1: Given $\mathbf{x}_{\Omega,i} \in \mathbb{R}^{m}$ and $\Omega_{i} \subset [m]$, let $U_{\Omega_{i}} \in \mathbb{R}^{m \times r}$ be such that $(U_{\Omega_{i}})_{k,\ell} = U_{k,\ell}$ if $k \in \Omega_{i}$ and $(U_{\Omega_{i}})_{k,\ell} = 0$ if $k \notin \Omega_{i}$. Define
$$\mathcal{U}_{F,i} = \left\{ U \in \mathcal{U}_{m,r}:\ f_{F,i}(U) = \|\mathbf{x}_{\Omega,i} - \mathcal{P}(\mathbf{x}_{\Omega,i}, U_{\Omega_{i}})\|^{2} = 0 \right\}$$
and
$$\mathcal{U}_{G,i} = \left\{ U \in \mathcal{U}_{m,r}:\ f_{G,i}(U) = 1 - \lambda_{\max}^{2}(B_{i}^{T}U) = 0 \right\}.$$
Then $\mathcal{U}_{G,i}$ is the closure of $\mathcal{U}_{F,i}$, i.e., $\mathcal{U}_{G,i} = \overline{\mathcal{U}_{F,i}}$.

The proof is given in Appendix A. Although this theorem deals with only one column of the observed matrix, the result can be easily extended to the whole matrix $X_{\Omega}$: let $\mathcal{U}_{F} = \bigcap_{i=1}^{n} \mathcal{U}_{F,i}$ and
$$\mathcal{U}_{G} = \bigcap_{i=1}^{n} \mathcal{U}_{G,i} = \left\{ U \in \mathcal{U}_{m,r}:\ \lambda_{\max}(U^{T}B_{i}) = 1 \text{ for all } i \right\}; \quad (11)$$
then $\mathcal{U}_{G} = \overline{\mathcal{U}_{F}}$.

Example 1 (Continued):
It can be seen that
$$B_{1} = \begin{bmatrix} 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ 1 & 0 & 0 \end{bmatrix}^{T}.$$
Hence,
$$f_{G,1}(U) = 1 - \lambda_{\max}^{2}\left( \begin{bmatrix} \sqrt{2}\,\epsilon \\ \sqrt{1 - 2\epsilon^{2}} \end{bmatrix} \right) = 0.$$
As a result,
$$\mathcal{U}_{F,1} = \left\{ \left[\sqrt{1-2\epsilon^{2}}, \epsilon, \epsilon\right]^{T}:\ \epsilon^{2} \le \tfrac{1}{2} \text{ and } \epsilon \ne 0 \right\} \cup \left\{ \left[-\sqrt{1-2\epsilon^{2}}, \epsilon, \epsilon\right]^{T}:\ \epsilon^{2} \le \tfrac{1}{2} \text{ and } \epsilon \ne 0 \right\},$$
and
$$\mathcal{U}_{G,1} = \left\{ \left[\sqrt{1-2\epsilon^{2}}, \epsilon, \epsilon\right]^{T}:\ \epsilon^{2} \le \tfrac{1}{2} \right\} \cup \left\{ \left[-\sqrt{1-2\epsilon^{2}}, \epsilon, \epsilon\right]^{T}:\ \epsilon^{2} \le \tfrac{1}{2} \right\}.$$
Clearly, $\mathcal{U}_{G,1} = \overline{\mathcal{U}_{F,1}}$.
C. Computations Related to the Chordal Distance

For a given performance metric, the computational complexity of the supporting optimization procedure is an important factor for assessing its practical value. In this subsection, we show that, besides being continuous, the chordal distance and the related gradient can be computed efficiently. Hence, all the algorithmic solutions using gradient descent methods can be easily modified to accommodate the geometric distortion measure.

The principal angle $\theta_{i}$ and the chordal distance $\sin^{2}\theta_{i}$ can be computed using the singular value decomposition. Given the $i$th column of the observed matrix, one can form $B_{i}$ easily. Let $\lambda_{i}$ be the largest singular value of the matrix $B_{i}B_{i}^{T}U$, and let $\mathbf{b}_{i}$ and $\mathbf{v}_{i}$ be the corresponding left and right singular vectors, respectively.² Following the definition of the chordal distance, one has
$$f_{G,i}(U) = \sin^{2}\theta_{i} = 1 - \lambda_{i}^{2}.$$
Let $G_{i} \in \mathbb{R}^{m \times r}$ be the matrix such that
$$(G_{i})_{k,\ell} = \frac{\partial}{\partial U_{k,\ell}} f_{G,i}(U) = -2\cos\theta_{i}\,\frac{\partial \cos\theta_{i}}{\partial U_{k,\ell}}.$$
It can be verified that
$$G_{i} = -2\lambda_{i}\,\mathbf{b}_{i}\mathbf{v}_{i}^{T}. \quad (12)$$
Note that in the matrix completion problem, one only needs to search for a column space $\operatorname{span}(U)$ consistent with the observations. Taking this fact into consideration, we have [18]
$$\nabla_{U} f_{G} = \sum_{i=1}^{n} \nabla_{U} f_{G,i} = \left(I - UU^{T}\right) \sum_{i=1}^{n} G_{i}. \quad (13)$$

²For convenience, we use the following convention regarding the singular vectors $\mathbf{b}_{i}$ and $\mathbf{v}_{i}$: we let the first nonzero entry of $\mathbf{v}_{i}$ be positive; otherwise, we let $\mathbf{v}'_{i} = -\mathbf{v}_{i}$ and $\mathbf{b}'_{i} = -\mathbf{b}_{i}$, and use $\mathbf{v}'_{i}$ and $\mathbf{b}'_{i}$ for the singular value decomposition. The simultaneous change of signs affects neither the singular value decomposition nor the computation of the gradient.

Switching from the Frobenius norm to the chordal distance does not introduce extra computational cost. Due to the particular structure of $B_{i}$, the matrix multiplication $B_{i}B_{i}^{T}U$ can be executed in $O(mr)$ steps. The resulting matrix has dimensions $m \times r$, where typically $r \ll m$. The major computational burden is incurred by the singular value decomposition. Computing the largest singular value and the corresponding singular vectors of an $m \times r$ matrix essentially reduces to computing the largest eigenvalue of an $r \times r$ matrix and the corresponding eigenvector. Hence, the overall complexity of computing $f_{G,i}$ is $O(mr^{2} + r^{3}) = O(mr^{2})$, where the $O(mr^{2})$ and $O(r^{3})$ terms come from the matrix multiplication and the eigenvalue computation, respectively. In comparison, solving the least-squares problem in the definition of $f_{F,i}$ has an $O(mr^{2})$ cost as well.
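As a sketch of these formulas (NumPy; helper names ours, and with a full SVD standing in for the cheaper largest-singular-triple computation discussed above), the gradient (13) can be assembled as follows:

```python
import numpy as np

def grad_f_G(U, B_list):
    """Gradient of f_G via eqs. (12)-(13): for each column, take the top
    singular triple (lambda, b, v) of B_i @ B_i.T @ U, accumulate
    G_i = -2 * lambda * b @ v.T, then project onto the horizontal space."""
    m, r = U.shape
    G_sum = np.zeros((m, r))
    for B in B_list:
        M = B @ (B.T @ U)                 # B_i B_i^T U
        Bu, s, vT = np.linalg.svd(M, full_matrices=False)
        G_sum += -2.0 * s[0] * np.outer(Bu[:, 0], vT[0, :])
    return (np.eye(m) - U @ U.T) @ G_sum  # eq. (13)
```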
IV. PERFORMANCE GUARANTEES

Consider the matrix completion problem described in (8). The following theorem describes completion scenarios for which a global optimum can be found with probability one.
Theorem 2:
Consider the following cases:

1) (rank-one matrices with arbitrary sampling): Let $X_{\Omega} = \mathcal{P}_{\Omega}(X)$ for some unknown matrix $X$ with rank equal to one. Here, $\Omega \subset [m] \times [n]$ can be arbitrary.

2) (full sampling with arbitrary rank matrices): Let $X_{\Omega} = X$, i.e., $\Omega = [m] \times [n]$.

Suppose that $r = \operatorname{rank}(X)$ is given. Let $\mathcal{U}_{G} \subset \mathcal{U}_{m,r}$ be the preimage of $f_{G}(U) = 0$ (also defined in (11)). Let $U_{0}$ be randomly generated from the isotropic distribution on $\mathcal{U}_{m,r}$ and used as the initial point of the search procedure. With probability one, there exists a continuous path $U(t)$, $t \in [0, 1]$, such that $U(0) = U_{0}$, $U(1) \in \mathcal{U}_{G}$ and
$$\frac{d}{dt} f_{G} \le 0 \quad \text{for all } t \in (0, 1),$$
where the equality holds if and only if $U \in \mathcal{U}_{G}$.

The proof of the theorem is outlined in Section IV-A. It is worth noting that almost all starting points are good: the starting point is certainly good if it is a consistent solution; otherwise, there exists a continuous path from this starting point to a global optimum along which the objective function keeps decreasing. The performance guarantee provided in Theorem 2 is strong in the sense that it requires neither incoherence conditions nor large matrix sizes.

A simple corollary of Theorem 2 is the following result: suppose that the partial observations $X_{\Omega}$ admit a unique consistent solution in terms of the Frobenius norm; then a gradient search procedure using the geometric metric finds this unique solution with probability one. This conclusion follows from the fact that the solution set under the Frobenius norm contains only a single point, and therefore $\mathcal{U}_{G} = \overline{\mathcal{U}_{F}} = \mathcal{U}_{F}$.

For the more general case where $r > 1$ and $\Omega \ne [m] \times [n]$, we cannot prove the same performance guarantees. Nevertheless, in Section IV-B, we present a collection of results that may be helpful for future exploration.

A. Proof of Theorem 2
For our proof techniques, we need the following two assumptions.
Assumption I: There exists a global optimum $U_{X} \in \mathcal{U}_{m,r}$ such that $f_{G}(U_{X}) = 0$ and all the $r$ principal angles between $\operatorname{span}(U_{X})$ and $\operatorname{span}(U_{0})$ are less than $\pi/2$. That is, all the singular values of $U_{X}^{T}U_{0}$ are strictly positive.

Assumption II: All of the $\theta_{i}$'s (the smallest principal angle between $\operatorname{span}(U_{0})$ and $\operatorname{span}(B_{i})$) are less than $\pi/2$.

Remark 1:
Suppose that the matrix $U_{0}$ is randomly drawn from the uniform (isotropic) distribution on $\mathcal{U}_{m,r}$. Then $U_{0}$ satisfies both assumptions with probability one. This can be easily verified using the probability density function of the principal angles [21], [22], [20], [23].

Assuming that these two assumptions are satisfied, we have the following two theorems, corresponding to the two cases in Theorem 2, respectively.

Theorem 3: (Rank-One Case)
Let $X_{\Omega}$ be the partial observation matrix generated from a rank-one matrix. Let $\mathbf{u}_{0} \in \mathcal{U}_{m,1}$ be an estimate of the column space that satisfies Assumptions I and II. Suppose that $\sum_{i=1}^{n} \sin^{2}\theta_{i} \ne 0$. Then there exists a continuous path $\mathbf{u}(t) \in \mathcal{U}_{m,1}$ such that $\mathbf{u}(0) = \mathbf{u}_{0}$, $\mathbf{u}(1) \in \mathcal{U}_{G}$, and $\frac{d}{dt}\big|_{t=0}\sin^{2}\theta_{i} \le 0$ for all $i \in [n]$, where equality holds if and only if $\theta_{i}(0) = 0$.

Theorem 4: (Full-Sampling Case)
Let $X \in \mathbb{R}^{m \times n}$ be a rank-$r$ matrix. Let $U_{0} \in \mathcal{U}_{m,r}$ satisfy Assumptions I and II. Suppose that $\sum_{i=1}^{n} \sin^{2}\theta_{i} \ne 0$. Then there exists a $U(t) \in \mathcal{U}_{m,r}$ such that $U(0) = U_{0}$, $U(1) \in \mathcal{U}_{G}$ and $\frac{d}{dt}\big|_{t=0}\sin^{2}\theta_{i} \le 0$ for all $i \in [n]$, where equality holds if and only if $\theta_{i}(0) = 0$.

The proofs of Theorems 3 and 4 are given in Appendices B and C, respectively. Since the proof techniques differ significantly, we present the two theorems/proofs separately.

Both theorems are stated for derivatives taken at $t = 0$. Nevertheless, the analysis can be extended to arbitrary $t \in [0, 1]$, that is, $\frac{d}{dt}\sin^{2}\theta_{i} \le 0$ for all $t \in [0, 1]$, where the equality holds if and only if $\theta_{i}(t) = 0$. To show that this is the case, note that in proving both Theorem 3 and Theorem 4, we constructed a continuous path $U(t)$ such that $U(0) = U_{0}$ and $U(1) \in \mathcal{U}_{G}$. Fixing this continuous path, we observe the following:

1) All the $r$ principal angles between $\operatorname{span}(U_{0})$ and $\operatorname{span}(U(1))$ decrease monotonically as $t$ increases to one. This implies that Assumption I holds for all $t \in [0, 1]$.

2) We have $\theta_{i}(t) < \pi/2$ for all $i \in [n]$ and for all $t \in [0, \epsilon)$ for some sufficiently small $\epsilon > 0$. This claim can be verified by invoking the facts that $\theta_{i}(0) < \pi/2$ for all $i \in [n]$ and that $\theta_{i}$ is a continuous function for all $i \in [n]$. As a result, all $U(t)$'s, where $t \in [0, \epsilon)$, satisfy Assumptions I and II.

3) For every $t$ in the interval $[0, \epsilon)$, $U(t)$ is the starting point of the geodesic path from $U(t)$ to $U(1)$, which is a part of the geodesic path from $U(0)$ to $U(1)$. Using the same proof techniques as in Appendices B and C, it is clear that $\frac{d}{dt}\sin^{2}\theta_{i}(t) \le 0$ for all $t \in [0, \epsilon)$. Hence, $\theta_{i}(t) \le \theta_{i}(0) < \pi/2$ for all $i \in [n]$ and for all $t \in [0, \epsilon)$.

4) The above arguments can be extended: it can be verified that $\theta_{i}(t) \le \theta_{i}(0) < \pi/2$ for all $i \in [n]$ and for all $t \in [0, 1]$. This implies that $U(t)$ satisfies Assumptions I and II for all $t \in [0, 1]$. Hence, $\frac{d}{dt}\sin^{2}\theta_{i}(t) \le 0$ for all $i \in [n]$ and all $t \in [0, 1]$, where the equality holds if and only if $\theta_{i}(t) = 0$. Theorem 2 therefore holds.

A direct consequence of Theorem 2 is that for almost all $U_{0} \in \mathcal{U}_{m,r}$, there exists a continuous path leading to a global minimizer. However, one does not know this path in advance when solving the matrix completion problem. A practical approach is to use a gradient descent method. We consider the following randomized gradient descent algorithm (a code sketch follows the list). Let $U^{(i)} \in \mathcal{U}_{m,r}$, $i = 1, 2, \cdots$, be the starting point of the $i$th iteration. Clearly, $U^{(i)}$, $i \ge 2$, is also the end point of the $(i-1)$th iteration. We generate the sequence of $U^{(i)}$'s in the following manner.

1) Let $U^{(1)}$ be randomly generated from the isotropic distribution.

2) Set $i = 1$. Execute the following iterative process.

a) Compute the gradient $\nabla_{U^{(i)}} f_{G}$.

b) Let $U^{(i)}(t)$ be the geodesic curve starting at $U^{(i)}(0) = U^{(i)}$ with direction $H = -\nabla_{U^{(i)}} f_{G}$.

c) Let $t_{*}^{(i)}$ be such that $\frac{d}{dt} f_{G}(t_{*}^{(i)}) = 0$ and $\frac{d}{dt} f_{G}(t) < 0$ for all $t < t_{*}^{(i)}$.

d) Randomly generate a $t^{(i)}$ from the uniform distribution on $(0, t_{*}^{(i)})$.

e) Let $U^{(i+1)} = U^{(i)}(t^{(i)})$. Let $i = i + 1$. Go to Step (a).

Due to the randomness of $U^{(i)}$, all $U^{(i)}$'s satisfy Assumptions I and II with probability one. The objective function decreases after each iteration. This gradient descent procedure converges to a global minimum as the number of iterations approaches infinity.
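A minimal sketch of this loop is given below (NumPy). It reuses the hypothetical helpers random_grassmann_point, grad_f_G and grassmann_geodesic from the earlier sketches, and replaces the exact step $t_{*}^{(i)}$ and the uniform draw $t^{(i)}$ with a simple halving line search, which preserves the monotone decrease but is otherwise an idealization:

```python
import numpy as np

def f_G_total(U, B_list):
    """f_G(U) = sum_i (1 - lambda_max(B_i^T U)^2)."""
    return sum(1.0 - min(np.linalg.svd(B.T @ U, compute_uv=False)[0], 1.0) ** 2
               for B in B_list)

def descend(B_list, m, r, iters=200, seed=None):
    """Geodesic gradient descent on U_{m,r} from an isotropic random start.
    The halving line search below idealizes steps c)-d) of the algorithm."""
    U = random_grassmann_point(m, r, np.random.default_rng(seed))
    for _ in range(iters):
        H = -grad_f_G(U, B_list)          # descent direction, eq. (13)
        f0, t = f_G_total(U, B_list), 1.0
        while t > 1e-12:
            # re-orthonormalize to counter floating-point drift; f_G depends
            # only on span(U), so the QR sign ambiguity is harmless
            U_t, _ = np.linalg.qr(grassmann_geodesic(U, H, t))
            if f_G_total(U_t, B_list) < f0:
                U = U_t
                break
            t *= 0.5
    return U
```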
Remark 2: Denote the obtained global minimum by $\hat{U}$. It may happen that $\hat{U} \in \mathcal{U}_{G} \backslash \mathcal{U}_{F}$. In this case, the solution is inconsistent with respect to the standard Frobenius norm. One can then use perturbation techniques to move $\hat{U}$ from the boundary of $\mathcal{U}_{F}$ into the interior of $\mathcal{U}_{F}$.

B. The General Framework
For the cases that are not described in Theorem 2, we have the following corollary.
Corollary 1: (General Cases)
Let $X \in \mathbb{R}^{m \times n}$ be a rank-$r$ matrix. Let $U_{X} \in \mathcal{U}_{G}$ be a global minimum. For each $i \in [n]$, the following statements are true. Let $\mathbf{u}_{X,i} \in \operatorname{span}(U_{X}) \cap \operatorname{span}(B_{i})$ be a unit-norm vector. Let $U_{0} \in \mathcal{U}_{m,r}$ and $\mathbf{w}_{i} \in \mathcal{U}_{r,1}$ be randomly drawn from the corresponding isotropic distributions. Then, with probability one, the vector $\mathbf{u}_{0,i} \triangleq U_{0}\mathbf{w}_{i}$ is not orthogonal to $\mathbf{u}_{X,i}$. Suppose that this is the case. Define $\theta_{i} = \cos^{-1}\|\mathcal{P}(\mathbf{u}_{i}(t), B_{i})\|$. Then there exists a continuous path $\mathbf{u}_{i}(t) \in \mathcal{U}_{m,1}$ such that $\mathbf{u}_{i}(0) = \mathbf{u}_{0,i}$, $\mathbf{u}_{i}(1) \in \operatorname{span}(\mathbf{u}_{X,i}) \cap \mathcal{U}_{m,1}$, and $\frac{d}{dt}\sin^{2}\theta_{i} \le 0$, where the equality holds if and only if $\theta_{i}(t) = 0$.

Proof:
Without loss of generality, we assume that $\langle \mathbf{u}_{0,i}, \mathbf{u}_{X,i} \rangle > 0$. The desired continuous path is given by
$$\mathbf{u}_{i}(t) = \frac{(1-t)\mathbf{u}_{0,i} + t\mathbf{u}_{X,i}}{\|(1-t)\mathbf{u}_{0,i} + t\mathbf{u}_{X,i}\|}, \quad t \in [0, 1].$$
The detailed arguments are the same as those in the proof of Theorem 3, and are therefore omitted.
Remark 3:
This corollary is similar to Theorems 3 and 4 in the sense that there exist continuous paths along which the atomic functions decrease.

At the same time, Corollary 1 differs from Theorems 3 and 4 in two respects. First, the paths $\mathbf{u}_{i}(t)$ in Corollary 1 may be different for different $i$'s, while in Theorems 3 and 4, a single continuous path $U(t)$ is constructed. Second, the angle $\theta_{i}$ in Corollary 1 is essentially the principal angle between the one-dimensional subspace $\operatorname{span}(\mathbf{u}_{i}(t))$ and the subspace $\operatorname{span}(B_{i})$. In contrast, Theorems 3 and 4 involve the minimum principal angle between the $r$-dimensional subspace $\operatorname{span}(U(t))$ and the subspace $\operatorname{span}(B_{i})$.

V. CONCLUSION
We considered the problem of searching for a consistent completion of a low-rank matrix. We showed that the Frobenius norm, combined with a projection operator, results in a discontinuous objective function, which can cause gradient descent approaches to fail. We proposed to replace the Frobenius norm with the chordal distance. The chordal distance is the "best" smooth version of the Frobenius norm in the sense that the solution set of the former is the closure of the solution set of the latter. Based on the chordal distance, we derived strong performance guarantees for two completion scenarios. The derived performance guarantees do not rely on incoherence conditions or large matrix sizes, and they hold with probability one.
APPENDIX
A. Proof of Theorem 1
We omit the subscript $i$ to simplify notation. The proof consists of two parts, showing that:

1) $\mathcal{U}_{F} \subset \mathcal{U}_{G}$;

2) for any given $U \in \mathcal{U}_{G}$, there exists a sequence $\{U^{(n)}\} \subset \mathcal{U}_{F}$ such that $\lim_{n \to \infty} \|U - U^{(n)}\|_{F} = 0$.

We start by proving that $\mathcal{U}_{F} \subset \mathcal{U}_{G}$. For any given $U \in \mathcal{U}_{F}$, there exists a nonzero vector $\mathbf{w} \in \mathbb{R}^{r}$ such that $U_{\Omega}\mathbf{w} = \mathbf{x}_{\Omega}$. Let $\mathbf{b} = U\mathbf{w}/\|\mathbf{w}\|$. Clearly, $\|\mathbf{b}\|_{F} = 1$. Recall the formula (9) for $B_{\mathbf{x}_{\Omega}}$. We can write $\mathbf{b}$ as a linear combination of the columns of $B_{\mathbf{x}_{\Omega}}$:
$$\mathbf{b} = \frac{1}{\|\mathbf{w}\|}\mathbf{x}_{\Omega} + \sum_{j \in \Omega^{c}} b_{j}\mathbf{e}_{j} = \frac{\|\mathbf{x}_{\Omega}\|}{\|\mathbf{w}\|}\bar{\mathbf{x}}_{\Omega} + \sum_{j \in \Omega^{c}} b_{j}\mathbf{e}_{j}.$$
As a result,
$$\left\| B_{\mathbf{x}_{\Omega}}^{T}\mathbf{b} \right\|_{F} = \left\| B_{\mathbf{x}_{\Omega}}^{T}\frac{U\mathbf{w}}{\|\mathbf{w}\|} \right\|_{F} = 1.$$
It follows that the largest singular value of $B_{\mathbf{x}_{\Omega}}^{T}U$ is one. Therefore, $U \in \mathcal{U}_{G}$, and we thus have $\mathcal{U}_{F} \subset \mathcal{U}_{G}$.

To prove the second part, we make use of the following notation. For any given $U \in \mathcal{U}_{G}$, let $\mathbf{u}_{1}, \cdots, \mathbf{u}_{r}$ be the left singular vectors of the matrix $UU^{T}B_{\mathbf{x}_{\Omega}}$, where $\mathbf{u}_{i}$ corresponds to the $i$th largest singular value. Let $k$ be the multiplicity of the singular value one, i.e., the number of singular values that equal one. Let $U_{1:k} = [\mathbf{u}_{1}, \cdots, \mathbf{u}_{k}]$ and $U_{k+1:r} = [\mathbf{u}_{k+1}, \cdots, \mathbf{u}_{r}]$. Clearly, $\lambda_{\max}(U_{k+1:r}^{T}B_{\mathbf{x}_{\Omega}}) < 1$.

It suffices to focus on $U' = [\mathbf{u}_{1}, \cdots, \mathbf{u}_{r}]$ instead of $U$. That is, to prove the second part, it suffices to find a sequence in $\mathcal{U}_{F}$ converging to $U'$. To verify this claim, let $V = U'^{T}U$. Then $V \in \mathcal{U}_{r,r}$ and $U = U'V$. Suppose that $\{U^{(n)}\} \subset \mathcal{U}_{F}$ is a sequence such that $U^{(n)} \to U'$. It is clear that $U^{(n)}V \to U'V = U$. Furthermore, since
$$\mathbf{x}_{\Omega} = U_{\Omega}^{(n)}\mathbf{w}^{(n)} = U_{\Omega}^{(n)}V\left(V^{T}\mathbf{w}^{(n)}\right) = \left(U^{(n)}V\right)_{\Omega}\mathbf{w}'^{(n)},$$
one has $U^{(n)}V \in \mathcal{U}_{F}$. The sequence $\{U^{(n)}V\} \subset \mathcal{U}_{F}$ is then the desired sequence converging to $U$. It is also important to note that $U' \in \mathcal{U}_{G}$, since
$$\lambda_{\max}\left(U'U'^{T}B_{\mathbf{x}_{\Omega}}\right) = \lambda_{\max}\left(U'VV^{T}U'^{T}B_{\mathbf{x}_{\Omega}}\right) = \lambda_{\max}\left(UU^{T}B_{\mathbf{x}_{\Omega}}\right).$$

We claim that
$$U' \in \mathcal{U}_{F} \text{ if and only if } U'_{1:k,\Omega} \ne 0. \quad (14)$$
To prove this claim, we shall show that
$$U'_{1:k,\Omega} \ne 0 \Rightarrow U' \in \mathcal{U}_{F} \quad (15)$$
and
$$U'_{1:k,\Omega} = 0 \Rightarrow U' \notin \mathcal{U}_{F}. \quad (16)$$
To prove (15), suppose that $U'_{1:k,\Omega} \ne 0$. Without loss of generality, let $\mathbf{u}_{1,\Omega} \ne \mathbf{0}$. Since $\mathbf{u}_{1}$ is a left singular vector corresponding to a singular value equal to one, $\mathbf{u}_{1}$ can be written as a linear combination of the columns of $B_{\mathbf{x}_{\Omega}}$: $\mathbf{u}_{1} = a_{0}\bar{\mathbf{x}}_{\Omega} + \sum_{j \in \Omega^{c}} a_{j}\mathbf{e}_{j}$. Since $\mathbf{u}_{1,\Omega} = a_{0}\bar{\mathbf{x}}_{\Omega} \ne \mathbf{0}$, one has $a_{0} \ne 0$. As a result, $\mathbf{x}_{\Omega} = a\mathbf{u}_{1,\Omega}$ for some constant $a \ne 0$. Hence, $\|\mathbf{x}_{\Omega} - \mathcal{P}(\mathbf{x}_{\Omega}, U'_{\Omega})\| = 0$ and $U' \in \mathcal{U}_{F}$.

To prove (16), assume that $U'_{1:k,\Omega} = 0$. Since $\mathcal{P}(\mathbf{x}_{\Omega}, U'_{\Omega}) = \mathcal{P}(\mathbf{x}_{\Omega}, U'_{k+1:r,\Omega})$, proving that $U' \notin \mathcal{U}_{F}$ is equivalent to proving that $\mathbf{x}_{\Omega} - \mathcal{P}(\mathbf{x}_{\Omega}, U'_{k+1:r,\Omega}) \ne \mathbf{0}$. This inequality can be proved by contradiction. Suppose that we have an equality. Then there exists a vector $\mathbf{w} \in \mathbb{R}^{r-k}$ such that $U'_{k+1:r,\Omega}\mathbf{w} = \mathbf{x}_{\Omega}$. Let $\mathbf{b} = U'_{k+1:r}\mathbf{w}/\|\mathbf{w}\|$. It is straightforward to show (using arguments similar to the ones used for proving $\mathcal{U}_{F} \subset \mathcal{U}_{G}$) that $\mathbf{b} \in \operatorname{span}(B_{\mathbf{x}_{\Omega}})$ and the largest singular value of $U'^{T}_{k+1:r}B_{\mathbf{x}_{\Omega}}$ is one. This contradicts the fact that $\lambda_{\max}(U'^{T}_{k+1:r}B_{\mathbf{x}_{\Omega}}) < 1$.

Now we are ready to construct a sequence in $\mathcal{U}_{F}$ converging to $U'$. If $U'_{1:k,\Omega} \ne 0$, then $U' \in \mathcal{U}_{F}$ and it is trivial to find such a sequence. It remains to find a sequence $\{U^{(n)}\} \subset \mathcal{U}_{F}$ that converges to $U'$ when $U'_{1:k,\Omega} = 0$. Define $\mathbf{x}_{r} = \mathbf{x}_{\Omega} - \mathcal{P}(\mathbf{x}_{\Omega}, U'_{\Omega})$. Since $U'_{1:k,\Omega} = 0$, one has $U' \notin \mathcal{U}_{F}$ and $\mathbf{x}_{r} \ne \mathbf{0}$.
Note that $\mathbf{x}_{r,\Omega^{c}} = \mathbf{0}$ and that $\mathbf{x}_{r,\Omega} \perp \mathbf{u}_{i,\Omega}$ for all $i \in [r]$. It can be verified that $\mathbf{x}_{r} \perp \mathbf{u}_{1}, \cdots, \mathbf{x}_{r} \perp \mathbf{u}_{r}$. Let
$$U_{\epsilon} = \left[ \frac{\mathbf{u}_{1} + \epsilon\mathbf{x}_{r}}{\sqrt{1 + \epsilon^{2}\|\mathbf{x}_{r}\|^{2}}},\ \mathbf{u}_{2},\ \cdots,\ \mathbf{u}_{r} \right].$$
It can be verified that $U_{\epsilon} \in \mathcal{U}_{m,r}$. Furthermore,
$$\mathcal{P}(\mathbf{x}_{\Omega}, U_{\epsilon,\Omega}) = \mathcal{P}(\mathbf{x}_{\Omega}, [\mathbf{x}_{r}, U'_{k+1:r,\Omega}]) = \mathbf{x}_{\Omega},$$
and therefore $U_{\epsilon} \in \mathcal{U}_{F}$ for all $\epsilon \ne 0$. Now choose the sequence $\{U^{(n)}\} = \{U_{1/n}\}$. It is a sequence in $\mathcal{U}_{F}$ and it converges to $U'$. This completes the proof.

B. Proof of Theorem 3
Since $X_{\Omega}$ is generated from a rank-one matrix, there exists a $\mathbf{u}_{X} \in \mathcal{U}_{m,1}$ such that $\mathbf{u}_{X} \in \operatorname{span}(B_{i})$ for all $i \in [n]$. Without loss of generality, we assume $\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle > 0$: by Assumption I, $\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle \ne 0$; if $\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle < 0$, we replace $\mathbf{u}_{X}$ with $-\mathbf{u}_{X}$. Now define
$$\mathbf{u}(t) = \frac{(1-t)\mathbf{u}_{0} + t\mathbf{u}_{X}}{\|(1-t)\mathbf{u}_{0} + t\mathbf{u}_{X}\|} = \frac{(1-t)\mathbf{u}_{0} + t\mathbf{u}_{X}}{L(t)},$$
where $L(t) \triangleq \|(1-t)\mathbf{u}_{0} + t\mathbf{u}_{X}\|$. Clearly $\mathbf{u}(0) = \mathbf{u}_{0}$ and $\mathbf{u}(t) \in \mathcal{U}_{m,1}$ in a neighborhood of $t = 0$.

For every $i \in [n]$, we shall show that
$$\frac{d}{dt}\bigg|_{t=0} \sin^{2}\theta_{i} = -2\,\frac{d}{dt}\bigg|_{t=0} \left(\frac{1}{2}\cos^{2}\theta_{i}\right) \le 0, \quad (17)$$
where the equality holds if and only if $\theta_{i} = 0$. Let $\mathcal{P}_{i}\mathbf{u}$ denote the vector $\mathcal{P}(\mathbf{u}, B_{i}) = B_{i}B_{i}^{T}\mathbf{u}$. Since $\mathbf{u}_{X} \in \operatorname{span}(B_{i})$, one has
$$\mathcal{P}_{i}\mathbf{u} = \frac{1}{L(t)}\left((1-t)\mathcal{P}_{i}\mathbf{u}_{0} + t\mathbf{u}_{X}\right).$$
We then have
$$\begin{aligned} \frac{d}{dt}\bigg|_{t=0} \left(\frac{1}{2}\cos^{2}\theta_{i}\right) &= \frac{d}{dt}\bigg|_{t=0} \frac{1}{2}\|\mathcal{P}_{i}\mathbf{u}\|^{2} \\ &= \frac{d}{dt}\bigg|_{t=0} \left[ \frac{1}{2}\left(\frac{1-t}{L(t)}\right)^{2}\|\mathcal{P}_{i}\mathbf{u}_{0}\|^{2} + \frac{1}{2}\left(\frac{t}{L(t)}\right)^{2} + \frac{t-t^{2}}{L^{2}(t)}\langle \mathcal{P}_{i}\mathbf{u}_{0}, \mathbf{u}_{X} \rangle \right] \\ &= (-1 - L'(0))\|\mathcal{P}_{i}\mathbf{u}_{0}\|^{2} + \langle \mathcal{P}_{i}\mathbf{u}_{0}, \mathbf{u}_{X} \rangle. \end{aligned}$$
Note that
$$\langle \mathcal{P}_{i}\mathbf{u}_{0}, \mathbf{u}_{X} \rangle = \mathbf{u}_{X}^{T}B_{i}B_{i}^{T}\mathbf{u}_{0} = \langle \mathbf{u}_{0}, \mathcal{P}_{i}\mathbf{u}_{X} \rangle = \langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle.$$
Consequently,
$$\frac{d}{dt}\bigg|_{t=0} \left(\frac{1}{2}\cos^{2}\theta_{i}\right) = (-1 - L'(0))\|\mathcal{P}_{i}\mathbf{u}_{0}\|^{2} + \langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle. \quad (18)$$
The term $L'(0)$ can be computed as follows. Note that
$$L^{2}(t) = (1-t)^{2}\|\mathbf{u}_{0}\|^{2} + t^{2}\|\mathbf{u}_{X}\|^{2} + 2(t-t^{2})\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle = 1 - 2t + 2t^{2} + 2(t-t^{2})\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle.$$
Therefore,
$$\frac{d}{dt}\bigg|_{t=0} L^{2}(t) = -2 + 2\langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle = 2L(0)L'(0).$$
As a result,
$$L'(0) = -1 + \langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle. \quad (19)$$
Substituting (19) into (18), one can see that
$$\frac{d}{dt}\bigg|_{t=0} \left(\frac{1}{2}\cos^{2}\theta_{i}\right) = \langle \mathbf{u}_{0}, \mathbf{u}_{X} \rangle \left(1 - \|\mathcal{P}_{i}\mathbf{u}_{0}\|^{2}\right) \ge 0,$$
where the equality holds if and only if $\|\mathcal{P}_{i}\mathbf{u}_{0}\| = 1$, i.e., $\mathbf{u}_{0} \in \operatorname{span}(B_{i})$ and $\theta_{i} = 0$. This completes the proof.

C. Proof of Theorem 4
Let $U_{X} \in \mathcal{U}_{m,r}$ be such that every column of $X$ is in the subspace $\operatorname{span}(U_{X})$. Consider the compact singular value decomposition $UU^{T}U_{X}U_{X}^{T} = U'SU_{X}'^{T}$, where $S \in \mathbb{R}^{r \times r}$ is the diagonal matrix containing the singular values, and $U'$ and $U_{X}'$ are the left and right singular vector matrices, respectively. Clearly, $U$ and $U'$ generate the same subspace, and so do $U_{X}$ and $U_{X}'$. For simplicity, we present our proof for $U'$ and $U_{X}'$ and omit the primes. With this simplification, one has $U^{T}U_{X} = S = \operatorname{diag}([\lambda_{1}, \cdots, \lambda_{r}])$.

For the $i$th column of $X$, we compute $\nabla_{U}\cos^{2}\theta_{i}$. Since we are considering the full sampling case, we have $B_{i} = \bar{\mathbf{x}}_{i}$. Because $\bar{\mathbf{x}}_{i} \in \operatorname{span}(U_{X})$, there exists $\bar{\mathbf{w}} \in \mathcal{U}_{r,1}$ such that $\bar{\mathbf{x}}_{i} = U_{X}\bar{\mathbf{w}}$. To compute $\nabla_{U}\cos^{2}\theta_{i}$, we need the first left and the first right singular vectors of the matrix $\bar{\mathbf{x}}_{i}\bar{\mathbf{x}}_{i}^{T}U$. The first left singular vector is clearly $\bar{\mathbf{x}}_{i}$, and the first right singular vector is proportional to $U^{T}\bar{\mathbf{x}}_{i} = U^{T}U_{X}\bar{\mathbf{w}} = S\bar{\mathbf{w}}$. Hence,
$$\nabla_{U}\cos^{2}\theta_{i} = \left(I - UU^{T}\right)\bar{\mathbf{x}}_{i}\bar{\mathbf{w}}^{T}S^{T} = \left(I - UU^{T}\right)U_{X}\bar{\mathbf{w}}\bar{\mathbf{w}}^{T}S^{T}.$$
According to Lemma 1, $\left(I - UU^{T}\right)U_{X}$ can be written as $G\operatorname{diag}([\sin\alpha_{1}, \cdots, \sin\alpha_{r}])$, where $G = [\mathbf{g}_{1}, \cdots, \mathbf{g}_{r}] \in \mathcal{U}_{m,r}$, and the $\alpha_{i} = \cos^{-1}\lambda_{i}$, $i = 1, \cdots, r$, are the principal angles between $\operatorname{span}(U)$ and $\operatorname{span}(U_{X})$.

We consider the geodesic $U(t)$ from $U$ to $U_{X}$. In Lemma 1 (part 1), we showed that this geodesic is given by the $U(t)$ satisfying $U(0) = U$ and $\dot{U}(0) = G\operatorname{diag}([\alpha_{1}, \cdots, \alpha_{r}])$. Along this path, we have
$$\begin{aligned} \frac{d}{dt}\bigg|_{t=0} \cos^{2}\theta_{i} &= \left\langle \nabla_{U}\cos^{2}\theta_{i},\ G\operatorname{diag}([\alpha_{1}, \cdots, \alpha_{r}]) \right\rangle \\ &= \operatorname{trace}\left( \left(G\operatorname{diag}([\alpha_{1}, \cdots, \alpha_{r}])\right)^{T}\left(I - UU^{T}\right)U_{X}\bar{\mathbf{w}}\bar{\mathbf{w}}^{T}S^{T} \right) \\ &= \operatorname{trace}\left( \operatorname{diag}([\cdots, \alpha_{j}\sin\alpha_{j}, \cdots])\,\bar{\mathbf{w}}\bar{\mathbf{w}}^{T}S \right) \\ &= \sum_{j=1}^{r} \bar{w}_{j}^{2}\,\alpha_{j}\sin\alpha_{j}\cos\alpha_{j} \ge 0. \quad (20) \end{aligned}$$
We claim that under Assumption II, equality in (20) holds if and only if $\theta_{i} = 0$. If $\theta_{i} = 0$, then $\bar{\mathbf{x}}_{i} \in \operatorname{span}(U)$. According to Lemma 1 (part 2), $\bar{w}_{j} = 0$ for all $j$ such that $\alpha_{j} \ne 0$; the equality in (20) thus holds. Otherwise, if $\theta_{i} \ne 0$, then $\bar{\mathbf{x}}_{i} \notin \operatorname{span}(U)$. Again, according to Lemma 1 (part 2), there exists a $j \in [r]$ such that $\alpha_{j} > 0$ and $\bar{w}_{j} \ne 0$. Hence, we have a strict inequality in (20). Finally, note that
$$\frac{d}{dt}\bigg|_{t=0} \sin^{2}\theta_{i} = -\frac{d}{dt}\bigg|_{t=0} \cos^{2}\theta_{i} \le 0.$$
This proves the theorem.

REFERENCES
[1] D. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[2] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, 2006.
[3] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inform. Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[4] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," arXiv:0706.4138, 2007.
[5] E. Candès and B. Recht, "Exact matrix completion via convex optimization," arXiv:0805.4471, 2008.
[6] E. J. Candès and T. Tao, "The power of convex relaxation: near-optimal matrix completion," arXiv:0903.1476, Mar. 2009.
[7] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," arXiv:0906.2220, 2009.
[8] E. J. Candès and Y. Plan, "Matrix completion with noise," arXiv:0903.3131, Mar. 2009.
[9] J. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," arXiv:0810.3286, 2008.
[10] K. Lee and Y. Bresler, "ADMiRA: atomic decomposition for minimum rank approximation," arXiv:0905.0044, Apr. 2009.
[11] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inform. Theory, vol. 55, pp. 2230–2249, May 2009.
[12] D. Needell and J. A. Tropp, "CoSaMP: iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, pp. 301–321, May 2009.
[13] R. Meka, P. Jain, and I. S. Dhillon, "Guaranteed rank minimization via singular value projection," arXiv:0909.5457, 2009.
[14] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Applied and Computational Harmonic Analysis, vol. 27, pp. 265–274, Nov. 2009.
[15] J. Haldar and D. Hernando, "Rank-constrained solutions to linear matrix equations using PowerFactorization," IEEE Signal Processing Letters, vol. 16, pp. 584–587, 2009.
[16] R. H. Keshavan, A. Montanari, and S. Oh, "Matrix completion from a few entries," arXiv:0901.3150, 2009.
[17] W. Dai and O. Milenkovic, "SET: an algorithm for consistent matrix completion," in IEEE International Conf. on Acoustics, Speech, and Signal Processing (ICASSP), March 2010.
[18] A. Edelman, T. Arias, and S. T. Smith, "The geometry of algorithms with orthogonality constraints," SIAM Journal on Matrix Analysis and Applications, vol. 20, pp. 303–353, April 1999.
[19] J. H. Conway, R. H. Hardin, and N. J. A. Sloane, "Packing lines, planes, etc.: packings in Grassmannian spaces," Exper. Math., vol. 5, pp. 139–159, 1996.
[20] W. Dai, Y. Liu, and B. Rider, "Quantization bounds on Grassmann manifolds and applications to MIMO communications," IEEE Trans. Inform. Theory, vol. 54, pp. 1108–1123, March 2008.
[21] A. T. James, "Normal multivariate analysis and the orthogonal group," Ann. Math. Statist., vol. 25, no. 1, pp. 40–75, 1954.
[22] M. Adler and P. van Moerbeke, "Integrals over Grassmannians and random permutations,"
Advances in Mathematics, vol. 181, no. 1, pp. 190–249, 2004.
[23] W. Dai, B. Rider, and Y. Liu, "Volume growth and general rate quantization on Grassmann manifolds," in IEEE Global Telecommunications Conference (GLOBECOM), 2007.