Greedy Approach for Low-Rank Matrix Recovery
A. Petukhov (contact author), Department of Mathematics, University of Georgia, Athens, GA 30602, USA, [email protected]
I. Kozlov, Algosoft Tech USA, Bishop, GA, USA, [email protected]
IPCV’13
Abstract — We describe the Simple Greedy Matrix Completion Algorithm, an efficient method for the restoration of low-rank matrices from incomplete corrupted entries. We provide numerical evidence that, even in its simplest implementation, the greedy approach may significantly increase the recovery capability of existing algorithms.
Keywords:
Low-Rank Matrix Completion, Compressed Sensing, Image Inpainting, Motion Tracking, Face Recognition
1. Introduction
We consider a greedy-strategy-based algorithm for the recovery of a low-rank matrix from incomplete corrupted samples. The problem of low-rank matrix completion is not new. However, it received a new impulse ([4], [2]) in connection with the development of compressed sensing theory and algorithms, and with the idea of using ℓ^1-minimization as a surrogate for finding the sparsest solution ([3], [6], [12]). This paper can be considered as a feasibility study for methods inspired by ideas from both low-rank matrix completion and our compressed-sensing-oriented ℓ^1-greedy algorithm ([7], [10], [11]).

The problem is set as follows. It is required to restore (complete) a matrix A ∈ R^{m×n} of rank r, r < min{m, n}, given by k of its entries, k < nm. The set of the given entries is Ω ⊂ {1, . . . , m} × {1, . . . , n}; |Ω| is the cardinality of Ω (which in our case is equal to k). We also introduce the notation d(Ω) for the density of the set Ω, d(Ω) := |Ω|/(nm). The complementary set Ω̄ is the set of erasures, and 1 − d(Ω) is the density of erasures. The theoretical bounds for recoverability of the matrix depend not only on the density of the samples but also on the matrix A and on the 2D geometry of Ω. A matrix with only one non-zero entry is the simplest example of a rank-1 matrix; it can be restored only if the value at the non-zero entry is known. Nevertheless, it turned out (cf. [2]) that under quite mild conditions random matrices of size n × n and rank r can be recovered from at most O(rn^{1.2} log n) entries as the matrix with the minimum nuclear norm.

The popularity of this problem can be explained by the enormous number of applied problems which can be formulated in terms of matrix completion. Among the many settings in different applied areas, we mention problems related to image processing: image inpainting (including, more particularly, image upsampling), face recognition, motion tracking, and segmentation in video are the most typical. While the problem of low-rank matrix completion was studied long ago, the theory got a big push due to the development of the Compressed Sensing / Compressive Sampling (CS) technique. After some simplification, the CS data decoding goal can be reduced to solving the underdetermined system

Ax = y + e,    (1)

where x ∈ R^n is a sparse vector of data "encoded" with the matrix A ∈ R^{m×n}, m < n, known to the decoder, and y ∈ R^m is the vector of measurements (of x), possibly corrupted by the vector e ∈ R^m. Here and below we assume that the sparse solution x exists and that the vector of errors e is also sparse. The sparsity of a ∈ R^N means that |a|_0 := |{i : a_i ≠ 0}| < N. The value |a|_0 is called the Hamming weight of the vector a. Since the problem of finding sparse solutions has non-polynomial complexity ([9]), mainstream CS research suggested replacing the minimization of |x|_0 (or |x|_0 + |e|_0) with the minimization of the ℓ^1-norm. It turned out that such an approach, based on convex optimization, gives the optimal sparse solution at least when |x|_0 is not very large (cf. [6], [3], [12] for the case e = 0). Thus, in some special cases the original non-convex problem can be reduced to convex programming. In what follows, in line with the notation |·|_0 for the Hamming weight, we use the notation |·|_p, 0 < p < ∞, for element-wise (quasi-)norms of vectors and matrices; say, for the matrix A, |A|_p := (Σ_{i,j} |A_{ij}|^p)^{1/p}. In particular, |·|_2 is the Frobenius norm.
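As a quick illustration of the ℓ^1 surrogate for problem (1) in the noiseless case e = 0, the following Matlab sketch recovers a sparse vector via linear programming; it assumes the Optimization Toolbox (linprog) is available, and all sizes are illustrative.

```matlab
% l1-minimization (basis pursuit) for problem (1) with e = 0: recover a
% sparse x from y = A*x via the split x = u - v with u, v >= 0, so that
% |x|_1 = sum(u + v). Requires linprog from the Optimization Toolbox.
m = 40; n = 100; k = 5;                       % underdetermined, k-sparse
A  = randn(m, n);
x0 = zeros(n, 1); x0(randperm(n, k)) = randn(k, 1);
y  = A * x0;                                  % noiseless measurements
f   = ones(2*n, 1);                           % objective: sum(u) + sum(v)
Aeq = [A, -A];                                % A*(u - v) = y
lb  = zeros(2*n, 1);                          % u, v >= 0
uv  = linprog(f, [], [], Aeq, y, lb, []);
x   = uv(1:n) - uv(n+1:end);
fprintf('relative recovery error: %.2e\n', norm(x - x0)/norm(x0));
```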
The inner product of two matrices A and B is defined as ⟨A, B⟩ := trace(A^T B); thus ⟨A, A⟩ = |A|_2^2. The notation ‖·‖_p is reserved for the operator norms of matrices.

CS results inspired the authors of [2] and [4] to replace the minimum-rank condition, which leads to non-polynomial complexity, with the minimization of the nuclear norm ‖A‖_* := Σ_i σ_i of the matrix A, where the σ_i are the singular values of A. To be more precise, the problem

‖A‖_* → min   subject to   A_ij = M_ij, (i, j) ∈ Ω,

where the M_ij are the known entries (measurements) of the matrix A, is considered as a relaxation of the rank-minimization problem above. Many different settings giving a solution of the original problem have been studied in recent years. In most cases, the intention of those studies was to find faster algorithms with higher recovery capability. Typically, modifications of the problem leading to unconstrained optimization were introduced.
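The nuclear norm itself is one line of Matlab; the sketch below computes it for a random low-rank matrix with a single svd call (sizes and rank are illustrative).

```matlab
% The nuclear norm ||A||_* is the sum of the singular values of A.
n = 100; r = 5;
A = randn(n, r) * randn(r, n);               % rank-r test matrix
s = svd(A);                                  % singular values, descending
fprintf('numerical rank: %d, ||A||_*: %.2f\n', sum(s > 1e-8*s(1)), sum(s));
```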
2. Basic Algorithm
For our experiments we need an algorithm providing convex minimization that recovers low-rank matrices from incomplete corrupted samples; it is used as the basic constructive block of our algorithm. The problem of restoring a matrix from corrupted entries is less studied than the simpler matrix completion problem, in which the available entries are not corrupted. Nevertheless, there are a few computationally efficient algorithms solving that problem (e.g., [5], [8], [14]). For our purposes, we selected the algorithm from [8], based on the method of Augmented Lagrange Multipliers (ALM) (e.g., [1]). Having corrupted samples, instead of finding the matrix with the sparsest set of singular values coinciding with the measurements on as large a set as possible, the algorithm finds the minimum of the functional

L(A, E, Y, μ) := ‖A‖_* + λ|E|_1 + ⟨Y, R⟩ + (μ/2)|R|_2^2,

where R = M − A − E is the residual of approximation of the measurements M by the estimate A of the unknown matrix and the estimate E of the unknown matrix of errors. The entries of the input matrix M on Ω̄ are unknown. It is assumed that R vanishes on Ω̄ and does not contribute to the third and the fourth terms, just as E does not contribute to the second term.

If it is known that the observed entries in M are not corrupted, the second term can be omitted. However, we assume that we never know whether the entries are corrupted, so in what follows we minimize the 4-term functional.

We will need the following notation:

S_ε[x] := x − ε if x > ε;  x + ε if x < −ε;  0 otherwise,

where x can be a number, a vector, or a matrix (with the rule applied element-wise). The operator S_ε is called the shrinkage operator. The norms ‖·‖_2, ‖·‖_∞ applied to matrices mean the operator norms.
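In Matlab, the shrinkage operator is a one-line element-wise function; the sketch below is a direct transcription of the definition above.

```matlab
% The shrinkage (soft-thresholding) operator S_eps, applied element-wise;
% its argument may be a number, a vector, or a matrix.
shrink = @(x, eps) sign(x) .* max(abs(x) - eps, 0);
```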
The minimization algorithm, as described in [8] and implemented in Matlab code, is as follows.

Algorithm ALM.
Input. Observation matrix M ∈ R^{m×n}, defined on Ω, and λ > 0.
Initialization. Y_0 = M / max{‖M‖_2, ‖M‖_∞/λ}, E_0 = 0, μ_0 > 0, ρ > 1, k = 0.
1. while not converged do
2.   (U, S, V) := svd(M − E_k + μ_k^{−1} Y_k);
3.   A_{k+1} := U S_{μ_k^{−1}}[S] V^T;
4.   E_{k+1} := S_{λμ_k^{−1}}[M − A_{k+1} + μ_k^{−1} Y_k];
5.   Y_{k+1} := Y_k + μ_k (M − A_{k+1} − E_{k+1});
6.   μ_{k+1} := ρμ_k, k := k + 1;
7. end while.
Output. A_{k+1}, E_{k+1}.
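A minimal runnable transcription of this loop is sketched below, assuming Ω is given as a logical mask and M is stored with zeros outside Ω. The function name alm_sketch, the constants μ_0 and ρ, the iteration cap, and the stopping test are illustrative placeholders, not the authors' settings (Section 4 describes their parameter choices).

```matlab
function [A, E] = alm_sketch(M, Omega, lambda)
% Inexact ALM sketch for the listing above. M: observation matrix, zero
% outside the logical mask Omega; lambda > 0.
shrink = @(X, eps) sign(X) .* max(abs(X) - eps, 0);
Y  = M / max(norm(M, 2), norm(M, inf)/lambda);      % dual initialization
E  = zeros(size(M));
mu = 0.3/norm(M, 2);  rho = 1.1;                    % illustrative mu0, rho
for k = 1:500
    [U, S, V] = svd(M - E + Y/mu, 'econ');          % step 2
    A = U * shrink(S, 1/mu) * V';                   % step 3: shrink singular values
    E = shrink(M - A + Y/mu, lambda/mu) .* Omega;   % step 4: errors live on Omega
    R = (M - A - E) .* Omega;                       % residual vanishes on ~Omega
    Y = Y + mu*R;                                   % step 5
    mu = rho*mu;                                    % step 6
    if norm(R, 'fro') <= 1e-7*norm(M, 'fro'), break; end
end
end
```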
3. Our Algorithm
Our modification of the algorithm above is inspired by the significant success reached by applying greedy ideas to solving underdetermined systems ([7], [10], [11]). The general greedy strategy in optimization algorithms consists in sequentially finding simple suboptimal solutions that give some information about the optimal solution. A greedy algorithm picks the most obvious features or elements of those solutions and gives them the privilege of being pivots for the next iteration of the suboptimal algorithm. Each iteration brings new pivot elements.

In the matrix recovery algorithm, the erasures from the set Ω̄ form such a group from the beginning, whereas the elements of Ω are merely suspected of being erroneous. If we have sufficient evidence that some element of Ω contains a random error independent of the content of the other entries from Ω, then the decision to move this element to Ω̄ is quite justifiable. While the independence condition is not always accurate, even in our experiments with artificially generated data, we use this strategy to estimate the capability of greedy ideas for matrix recovery.

Our greedy algorithm consists in iterating with updated sets Ω_k, whose complements (the erasure sets) grow monotonically. We will call it the Simple Greedy Matrix Completion Algorithm (SGMCA). Generally speaking, any matrix recovery algorithm able to fight the mixture of erasures and errors, including SGMCA itself, can be used as the basic block of SGMCA.

Formally, all our experiments can be described in the following way.

Algorithm SGMCA.
Input. M, Ω.
Initialization. λ > 0, Ω_1 := Ω, A_0 := M, E_0 = 0, 0 < q < 1, k = 0.
1. Set k := k + 1;
2. A_k = ALM(M, Ω_k);
3. if k = 1 then T_1 := q · max_{i,j} |A^1_{ij} − M_{ij}|, else T_k := q · T_{k−1};
4. Ω_{k+1} = Ω_k \ {(i, j) : |A^k_{ij} − M_{ij}| > T_k};
5. if not converged, go to step 1.
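Under the same conventions as the ALM sketch above (logical mask Ω, M zero outside Ω), the outer loop can be sketched as follows; this assumes the step-3 thresholds use the factor q declared in the initialization, and the fixed iteration count stands in for the convergence test.

```matlab
function [A, Omega] = sgmca_sketch(M, Omega, lambda, q, iters)
% SGMCA outer loop around alm_sketch above: after each run of the basic
% block, entries with a suspiciously large misfit are moved from Omega
% to the erasures.
for k = 1:iters
    A = alm_sketch(M .* Omega, Omega, lambda);   % step 2: basic block
    misfit = abs(A - M) .* Omega;                % residual on current samples
    if k == 1
        T = q * max(misfit(:));                  % step 3: initial threshold
    else
        T = q * T;                               % step 3: geometric decay
    end
    Omega = Omega & (misfit <= T);               % step 4: drop suspects
end
end
```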
4. Numerical Experiments
Since our intention was to conduct a feasibility study of the algorithm, the goal of this section is to give a comparison with the output of recently published algorithms and with pure ALM (one iteration of the algorithm above, with no update of Ω). The parameter μ_0 in the basic algorithm is taken inversely proportional to ‖M‖_2, and ρ is an increasing affine function of the sampling density |Ω|/(mn). The parameter λ is defined by the combination of d(Ω) and the density of errors in the Ω samples. The general trend can be characterized as follows: the higher the error rate, the smaller the value of λ that has to be used. In what follows, we do not use fine tuning of λ; the same λ is used for big groups of experiments. At the same time, tuning λ may bring a significant increase of the algorithm's efficiency. In this paper, we use values of λ of the order of a few hundredths. For our experiments, we used the Matlab implementation of the ALM algorithm available at http://perception.csl.illinois.edu/matrix-rank/home.html
We used the code for Matrix Completion via the Inexact ALM Method with our adaptation to input with errors. In all our experiments, A is a square m × n, m = n, matrix of rank r obtained as A = UV^T, where U, V ∈ R^{n×r}. The matrices U and V consist of independent Gaussian random values with zero expectation and variance 1. The coordinates of erasures were selected randomly. The error models below differ between experiments; a Matlab sketch of the data model is given after the caption of Fig. 1.

In the first experiment (Fig. 1), we demonstrate the advantage of the iterative SGMCA over ALM (one iteration of the same algorithm) for matrices of fixed size m × n, m = n, and rank r = 15. The solid curves in Fig. 1 correspond to SGMCA (up to 10 iterations) for n = 128 and two larger matrix sizes (from the bottom to the top). The corresponding graphs for the ALM algorithm are plotted with dashed curves.

Fig. 1. Greedy iterations vs. convex ALM optimization.
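For concreteness, a minimal Matlab sketch of this data model, with the additive standard-normal error model described in the next paragraph; all parameter values are illustrative.

```matlab
% Test-data model of this section: rank-r ground truth A = U*V' with
% i.i.d. N(0,1) factors, a random observation mask of density d, and
% additive standard-normal errors at a fraction p of the samples.
n = 128; r = 15; d = 0.5; p = 0.1;           % illustrative parameters
U = randn(n, r); V = randn(n, r);
A = U * V';                                  % rank-r matrix to recover
Omega = rand(n) < d;                         % observed entries, density d
err   = Omega & (rand(n) < p);               % corrupted subset of Omega
M = (A + randn(n) .* err) .* Omega;          % observed corrupted matrix
```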
The horizontal coordinate indicates the fraction of the matrix available for restoration (i.e., d(Ω)), while the vertical coordinate is the fraction of randomly corrupted entries in Ω. The magnitude of the corruption is drawn from the standard normal distribution. The curves define "phase transition" bounds. In our experiments, we ran 10 trials. At the points on the curves, all 10 attempts were accomplished with success, i.e., for the obtained estimate Â the relative error |A − Â|_2/|A|_2 fell below a fixed small tolerance (a trial sketch is given below), whereas for the points above the curves, at least one attempt failed. The regions under the curves are thus regions of "success".

The second experiment (Fig. 2 and Fig. 3) is devoted to comparison with the results from [5]. Unfortunately, we do not have full information about their error model, so we use the same additive model of errors as above, while all other parameters are taken from [5]. The matrix of rank 2 is constructed as above; its size changes from 100 to 3000. The experiment consists of two parts. The first plot (Fig. 2) contains the curves for the fixed erasure rate 0.1, i.e., d(Ω) = 0.9; however, the probability of errors in those entries varies. There are 3 graphs in Fig. 2: the solid line corresponds to 10 iterations of SGMCA, the dashed line corresponds to ALM, and the dotted curve corresponds to the result from [5]. The values defined by the curves give the maximum error probability admitting successful correction by the corresponding algorithm. If we were aware of the error model from [5], the dashed and dotted curves would have to coincide up to statistical discrepancy.

In the second plot (Fig. 3) the error rate is fixed and equal to 0.1. The graphs show the dependence of the maximum possible rate of erasures on the size of the matrix.
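A single phase-transition trial, continuing the sketches above, can be written as follows; the tolerance tol and the values of λ and q are assumed placeholders, not the authors' settings.

```matlab
% Phase-transition trial: declare success when the relative Frobenius
% error of the estimate is below tol (an assumed placeholder value).
tol = 1e-3;
[Ahat, ~] = sgmca_sketch(M, Omega, 0.02, 0.5, 10);   % lambda, q illustrative
if norm(A - Ahat, 'fro')/norm(A, 'fro') < tol
    disp('trial succeeded');
else
    disp('trial failed');
end
```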
Fig.2. Admissible error rate for erasure rate 0.1.
The efficiency of the algorithms is defined by the distance of the curve values in Fig. 2 and Fig. 3 from 1. It is easy to see that, when the error rate is fixed, the SGMCA curve is twice as close to 1 as the ALM curve. Hence, SGMCA restores low-rank matrices from only half of the entries necessary for the ALM minimization. For the fixed erasure rate, the number of uncorrupted entries sufficient for successful SGMCA restoration can be as little as one third of that necessary for recovery with ALM.
Fig.3. Admissible erasure rate for error rate 0.1.
The more precise value of the SGMCA graph at m = 3000 in Fig. 3 is 0.986; i.e., with error probability 0.1, the matrix can be restored from 1.4% of random entries.

In our last experiment (Fig. 4–6), we compare the output of SGMCA with the results of the RTRMC algorithm from [14], which provides very impressive recovery. However, for such successful recovery it requires a priori knowledge of the rank of the matrix A. The ALM-based algorithm used as the basic block of SGMCA does not require any knowledge of the rank, although such knowledge is useful for it. To provide equal opportunities, we applied internal fixation of the rank inside the ALM procedure.

The results for ranks r = 5, 15, 25 are given in Fig. 4–6, correspondingly; the matrices are square of a fixed size. The horizontal coordinate is d(Ω), whereas the vertical coordinate is the probability of errors in the coefficients available for reconstruction. In most cases, SGMCA outperforms RTRMC by 15–25% in the maximum admissible probability of errors. The reason why SGMCA loses on the leftmost interval of Fig. 4 is the parameter λ = 0.02, which was fixed for all 3 experiments. With a properly tuned λ on that interval, we would get the overwhelming advantage of SGMCA. We emphasize that when the rank is known in advance, the optimal parameter λ can be computed for each d(Ω); this would not contradict the equal opportunity of the two algorithms. Optimal selection of λ is a reserve not used in our experiments.

The data model in [14] is identical to the model described above, whereas the error model is different: the values of the corrupted entries are uniformly distributed between the minimum and the maximum of the uncorrupted values. We also use that error model in this experiment; a generation sketch follows the figure captions.

Fig. 4. SGMCA vs. RTRMC, r = 5.
Fig. 5. SGMCA vs. RTRMC, r = 15.
Fig. 6. SGMCA vs. RTRMC, r = 25.

The dashed lines corresponding to RTRMC are shorter than our solid lines since we used the data directly from [14].
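Continuing the data-model sketch from the previous section (reusing its A, Omega, and err), the uniform error model of this comparison can be generated as follows.

```matlab
% Error model of [14]: corrupted values are drawn uniformly between the
% minimum and the maximum of the uncorrupted observed values.
vals = A(Omega & ~err);                          % uncorrupted sample values
lo = min(vals);  hi = max(vals);
M = A .* Omega;                                  % start from clean samples
M(err) = lo + (hi - lo)*rand(nnz(err), 1);       % uniform corruption
```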
5. Future Studies
This study shows that even the simplest implementation of the greedy idea, in the form of SGMCA, significantly outperforms recent algorithms in recovering ability for very incomplete measurements with a high level of corruption. The results above show the feasibility of the idea, its perspectives, and a high level of expectation.

Because of its iterative nature, the algorithm has to repeat the basic step a few times. We restricted the number of iterations to 10; however, in most cases 5 iterations provided the necessary precision. Among possible directions for improvement, ways of acceleration need to be considered. One possible way is not to wait for the completion of each iteration, updating Ω within the internal iterations of the basic algorithm. In the simplest, but maybe less efficient, form this can be done even with no intrusion into the basic algorithm. For instance, when the greedy step looks for the coordinates of large errors, we do not need a high-precision output of the basic algorithm, so raising the precision from low to high over the iterations may accelerate the algorithm for data close to the limits of the potential recovering ability (the phase transition points). However, it should be noted that this modification may slightly slow down the recovery of data located far from the phase transition points. Another way of acceleration, skipped by us in this paper, is to use the estimate of A from the previous iteration as the starting point of the basic algorithm on the next iteration.

Now we discuss ways of increasing the capability of SGMCA in matrix recovery. First of all, in our experiments we practically did not use fine tuning of the weight λ: we used a fixed λ for a big range of input-data parameters, whereas just by replacing the value 0.02 with a tuned one, the results presented in Fig. 4–6 can be significantly improved for a high level of erasures (on the left side of the graphs). Indeed, when the rank of A and the model of errors are known, the optimal values of λ admitting the maximum density of errors can be found in advance for any number of erasures and used in the recovery procedure.

When the rank or the nature of possible corruption is unknown in advance, finding λ adaptively becomes a challenging problem. This problem has many features in common with the problem of finding sparse solutions of underdetermined linear systems with corrupted input. In that problem, the weight λ is defined by the interaction between the sparsity of the possible solution and the error vector. In the problem of matrix completion, the low rank plays the role of the solution sparsity in CS, so the methods (or at least the principles) developed in CS can be applied to the matrix completion problem. In [11], we showed that the sparsity of the solution can be reliably estimated from the dynamics of change of the ratio of two element-wise norms of x. The same characteristic of the intermediate estimates of the matrix A can be used in the matrix completion procedure. We also note that, generally speaking, for finding λ we do not need estimates of both the rank and the sparsity of errors. Indeed, if we know the sparsity of errors, we can evaluate the potential maximum rank r admitting recovery with the given algorithm provided that λ is optimal. Then that optimal λ will also provide recovery of matrices with rank less than r. Thus, an adaptive λ is one of the quite realistic sources of increased algorithm capability.
6. Conclusion
The paper presents a feasibility study for the Simple Greedy Matrix Completion Algorithm. We showed that it significantly outperforms recently developed matrix completion algorithms. We also discussed ways to further increase the efficiency of SGMCA.
References
[1] D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, 1982.
[2] E.J. Candès and B. Recht, Exact Matrix Completion via Convex Optimization, Comm. of the ACM, V. 55