Exact Reconstruction of Euclidean Distance Geometry Problem Using Low-rank Matrix Completion
11 Exact Reconstruction of Euclidean DistanceGeometry Problem Using Low-rank MatrixCompletion
Abiy Tasissa, Rongjie Lai
Abstract
The Euclidean distance geometry problem arises in a wide variety of applications, from determining molecular conformationsin computational chemistry to localization in sensor networks. When the distance information is incomplete, the problem can beformulated as a nuclear norm minimization problem. In this paper, this minimization program is recast as a matrix completionproblem of a low-rank r Gram matrix with respect to a suitable basis. The well known restricted isometry property can not besatisfied in this scenario. Instead, a dual basis approach is introduced to theoretically analyze the reconstruction problem. If theGram matrix satisfies certain coherence conditions with parameter ν , the main result shows that the underlying configuration of n points can be recovered with very high probability from O ( nr ν log ( n )) uniformly random samples. Computationally, simple andfast algorithms are designed to solve the Euclidean distance geometry problem. Numerical tests on di ff erent three dimensionaldata and protein molecules validate e ff ectiveness and e ffi ciency of the proposed algorithms. Index Terms
Euclidean distance geometry, low-rank matrix completion, nuclear norm minimization, dual basis, random matrices, Grammatrix.
I. I ntroduction
The Euclidean distance geometry problem, EDG hereafter, aims at constructing the configuration of points given partialinformation on pairwise inter-point distances. The problem has applications in diverse areas, such as molecular conformationin computational chemistry [1], [2], [3], dimensionality reduction in machine learning [4] and localization in sensor networks[5], [6]. For instance, in the molecular conformation problem, the goal is to determine the structure of protein given partial inter-atomic distances obtained from nuclear magnetic resonance (NMR) spectroscopy experiments. Since the structure of proteindetermines its physical and chemical properties, the molecular conformation problem is crucial for biological applications suchas drug design. In recent works, novel applications of EDG like solving partial di ff erential equations (PDEs) on manifoldsrepresented as incomplete inter-point distance information have been explored in [7].For the mathematical setup of the problem, consider a set of n points P = { p , p , ..., p n } T ∈ R n × r . The squared distancebetween any two points p i and p j is given by d i , j = || p i − p j || = p Ti p i + p Tj p j − p Ti p j . The Euclidean distance matrix D = [ d i , j ]can be written compactly as follows D = diag( P P T ) T + diag( P P T ) T − P P T (1)where is a column vector of ones and diag( · ) is a column vector of diagonal entries of the matrix in consideration. X = P P T is the inner product matrix well known as the Gram matrix. If the complete distance information is available, the coordinatescan be recovered from the eigendecomposition of the Gram matrix using classical multidimensional scaling (MDS) [8]. Itshould be noted that this solution is unique up to rigid transformations and translations.In practice, the distance matrix is incomplete and finding the underlying point coordinates with partial information is ingeneral not possible. One approach proposed in [7] is based on a matrix completion applied to the Gram matrix which webriefly summarize here. Assume that the entries of D are sampled uniformly at random. By construction, X is symmetric andpositive semidefinite. The solution for X is unique up to translations. This ambiguity is fixed by considering the constraintthat the centroid of the points is located at the origin, (cid:80) nk = p k = , which leads to X · = . Assuming that X is low-rank, r (cid:28) n , the work in [7] considers the following nuclear norm minimization program to recover X .minimize || X || ∗ subject to X i , i + X j , j − X i , j = D i , j ( i , j ) ∈ Ω (2) X · = ; X = X T ; X (cid:23) A. Tasissa is with the Department of Mathematics, Rensselaer Polytechnic Institute, Troy, NY 12180, ( [email protected] ).R. Lai is with the Department of Mathematics, Rensselaer Polytechnic Institute, Troy, NY 12180, U.S.A. ( [email protected] ). This work is partially supportedby NSF DMS–1522645 and an NSF CAREER award, DMS–1752934. a r X i v : . [ c s . I T ] O c t Here || X || ∗ denotes the nuclear norm and Ω ⊂ { ( i , j ) | i , j = , ..., n , i < j } , | Ω | = m , denotes the random set that consists of allthe sampled indices. 
One characterization of the Euclidean distance matrix, due to Gower [9], states that D is an Euclideandistance matrix if and only if D = X i , i + X j , j − X i , j = D i , j ∀ i , j for some positive semidefinite matrix X satisfying X · = .As such, the above nuclear norm minimization can be interpreted as a regularization of the rank with a prior assumption thatthe true Gram matrix is low rank. An alternative approach based on the matrix completion method is direct completion of thedistance matrix [10], [11], [12]. Compared to this approach, an advantage of the above minimization program can be seen bycomparing the rank of the Gram matrix X and the distance matrix D . Using (1), the rank of D is at most r + X is simply r . Using matrix completion theory, it can be surmised that relatively less number of samples are required forthe Gram matrix completion. Numerical experiments in [7] confirm this observation. In [13], the authors consider a theoreticalanalysis of a specific instance of localization problem and propose an algorithm similar to (2). The paper considers a randomgeometric model and derives interesting results of bound of errors in reconstructing point coordinates. While the localizationproblem and EDG problem share a common theme, we remark that the EDG problem is di ff erent and our analysis adopts thematrix completion framework. The main task of this paper is a theoretical analysis of the above minimization problem. Inparticular, under appropriate conditions, we will show that the above nuclear norm minimization recovers the underlying innerproduct matrix. a) Related Work: The EDG problem has been well studied in various contexts. Early works establish mathematicalproperties of Euclidean distance matrices (EDM) [9] and prove conditions for a symmetric hollow matrix to be EDM [14],[15]. Particularly, the important result due to Schoenberg states that a symmetric hollow matrix D is EDM if and only if thematrix − J DJ is positive semidefinite. J is the geometric centering matrix defined as J = I − n T [14]. In follow-uptheoretical papers, the EDM completion problem was considered. Given a symmetric hollow matrix with some entries specified,the EDM completion problem asks if the partial matrix can be completed to an EDM. A work of Hendrickson relates theEDM completion problem to a graph realization problem and provides necessary graph theoretic conditions on the partialmatrix that ensures a unique solution [16]. Bakonyi and Johnson establish a link between EDM completion and the graph ofpartial distance matrix. Specifically, they show that there is a completion to a distance matrix if the graph is chordal [17]. Theconnection between the EDM completion problem and the positive semidefinite matrix completion was established in [18] and[19]. In [20], Alfakih establishes necessary and su ffi cient conditions for the solution of EDM completion problem to be unique.The emphasis in the above theoretical works is analytic characterization of EDMs and the EDM completion problem mostlyemploying graph theoretic methods. In the numerical side, a variety of algorithms using di ff erent optimization techniques havebeen developed to solve the EDM completion problem [3], [21], [22], [23], [24], [25], [26]. The above review is by no meansexhaustive and we recommend the interested reader to refer to [27] and [28]. This paper adopts the low-rank matrix recoveryframework. 
While the well-known matrix completion theory can be applied to reconstruct D [10], [12], it can not be directlyused to analyze (2). In particular, the measurements in (2) are with respect to the Gram matrix while the measurements in thework of [10], [12] are entry wise sampling of the distance matrix. The emphasis in this paper is on theoretical understanding ofthe nuclear norm minimization formulation for the EDG problem as stated in (2). In particular, the goal is to provide rigorousprobabilistic guarantees that give precise estimates for the minimum number of measurements needed for a certain successprobability of the recovery algorithm. b) Main Challenges: The random linear constraints in (2) can be equivalently written as a linear map L acting on X . Afirst approach to showing uniqueness for the problem (2) is to check if L satisfies the restricted isometry property (RIP) [12].However, the RIP does not hold for the completion problem (2). This can be simply verified by choosing any ( i , j ) (cid:60) Ω and construct a matrix X with X i , j = X j , i = X i , i = X j , j = L ( X ) = implying that the RIP condition does not hold. In general, RIP based analysis is not suitable for deterministic structuredmeasurements. Adopting the framework introduced in [29], the nuclear norm minimization problem in (2) can be written as amatrix completion problem with respect to a basis { w α } . It, however, turns out that w α is not orthogonal. With non-orthogonalbasis, the measurements (cid:104) w α , X (cid:105) are not compatible with the expansion coe ffi cients of the true Gram matrix. A possibleremedy is orthogonalization of the basis, say using the Gram-Schmidt orthogonalization. Unfortunately, the orthogonalizationprocess does not preserve the structure of the basis w α . This has the implication that the modified minimization problem (2)no longer corresponds to the original problem. As such, the lack of orthogonality is critical in this problem. In addition, it isnecessary that the solution of (2) is symmetric positive semidefinite satisfying the constraint X · = . On the basis of theabove considerations, an alternative approach is considered to show that (2) admits a unique solution. The analysis presentedin this paper is not based on RIP but on the dual certificate approach introduced in [10]. Our proof was inspired by the workof David Gross [29] where the author generalizes the matrix completion problem to any orthonormal basis. In the case ofthe EDG problem, one main challenge is that the sampling operator, an important operator in matrix completion, is no longerself-adjoint. This necessitates modifications and alternative proofs to some of the technical statements that appear in [29]. c) Major Contributions: In this paper, a dual basis approach is introduced to show that (2) has a unique solution underappropriate sampling conditions. First, the minimization problem in (2) is written as matrix completion problem with respectto a basis w α . Second, by introducing a dual basis { v α } to { w α } , one can ensure that the measurements (cid:104) X , w α (cid:105) in (2) arecompatible with expansion coe ffi cients of the true Gram matrix M . The two main contributions of this paper are as follows.1) A dual basis approach is introduced to address the EDG problem. Under certain assumptions, we show that the nuclear norm minimization problem succeeds in recovering the underlying low rank solution. 
Precisely, the main result statesthat if | Ω | = m ≥ O ( nr log n ), the nuclear norm minimization program (2) recovers the underlying inner product matrixwith very high probability, 1 − n − β for β > ff bound. We would like toemphasize that this part of the proof and the estimate using Cherno ff bound is simple and could be useful in a broadersetting.2) We develop simple and fast algorithms to solve the EDG problem under two instances. The first instance considers thecase of exact partial information. The second instance considers the more realistic setup of a noisy partial information.Numerical tests on various data show that the algorithms are accurate and e ffi cient.The outline of the paper is as follows. In section II, we introduce a dual basis approach and formulate a well-defined matrixcompletion problem for the EDG problem. We conclude the section by proposing coherence conditions for the EDG problemand explaining the sampling scheme. In section III, the proof of exact reconstruction is presented. In brief terms, the mainparts are summarized as follows. From convex optimization theory, showing uniqueness of the EDG problem is equivalent toshowing that there exists a dual certificate, denoted by Y , satisfying certain conditions. Y is constructed using the golfingscheme proposed in [29]. Next, we show that these conditions hold with very high probability by employing concentrationinequalities. This implies that there is a unique solution with very high probability. In section IV, fast numerical algorithmsfor the EDG matrix completion problem are proposed. Section V validates the e ffi ciency and e ff ectiveness of the proposednumerical algorithms. Finally, we conclude the work in the last section. d) Notation: To make our notation consistent in the paper, we summarize notations used in this paper in Table I.
TABLE IN otations x Vector || x || l norm X Matrix || X || F Frobenius norm X Operator || X || Operator norm X T Transpose || X || ∗ Nuclear normTr( X ) Trace ||A|| sup || X || F = ||A X || F . (cid:104) X , Y (cid:105) Trace( X T Y ) λ max , λ min Maximum, Minimum eigenvalue A vector or matrix of ones Sgn
X U sign ( Σ ) V T ; here [ U , Σ , V ] = svd( X ) A vector or matrix of zeros Ω , I Random sampled set, Universal set
II. D ual basis formulation
The aim of this section is to show that the EDG problem (2) can be equivalently stated as a matrix completion problem withrespect to a special designed basis. The matrix completion problem is an instance of the low rank matrix recovery problemwhere the goal is to recover a low rank matrix given random partial linear observations. The natural minimization problemfor low rank recovery problem is rank minimization with linear constraints. Unfortunately, this problem is NP hard [12]motivating other solutions. In a series of remarkable theoretical papers [10], [12], [29], [30], [31], it was shown that, undercertain conditions, the solutions of the NP hard rank minimization problem can be obtained by solving the convex nuclearnorm minimization problem which is computationally tractable [32], [33]. Our theoretical analysis is inspired by the work [29]where the author extends the nuclear norm minimization formulation to recovering a low rank matrix given that a few of itscoe ffi cients in a fixed orthonormal basis are known. a) Dual basis: To write the EDG problem (2) as a matrix completion problem with respect to an appropriate operatorbasis, let us introduce few notations. We write I = { α = ( α , α ) | ≤ α < α ≤ n } and define w α = e α , α + e α , α − e α , α − e α , α (3)where e α , α is a matrix whose entries are all zero except a 1 in the ( α , α )-th entry. It is clear that { w α } forms a basis ofthe linear space S = { X ∈ R n × n | X = X T & X · = } and the number of basis is L = n ( n − . For conciseness and ease inlater analysis, we further define S + = S ∩ { X ∈ R n × n | X (cid:23) } . Therefore, for any given subsample Ω of I , the EDG problem(2) can be written as the following nuclear norm minimization problemminimize X ∈ S + || X || ∗ subject to (cid:104) w α , X (cid:105) = (cid:104) w α , M (cid:105) ∀ α ∈ Ω (4)where M is the true underlying rank r Gram matrix satisfying M i , i + M j , j − M i , j = D i , j ∀ ( i , j ). The EDG problem cannow be equivalently interpreted as a matrix completion problem with respect to the basis w α . By construction, w α is symmetric and satisfies w α · = . The latter condition naturally enforces the constraint X · = . Itis clear that any X ∈ S can be expanded in the basis { w α } as X = (cid:80) β ∈ I c β w β . After minor algebra, c β = (cid:80) α ∈ I (cid:104) X , w α (cid:105) H α , β where we define H α , β = (cid:104) w α , w β (cid:105) and H α , β = H − α , β . Note that, since { w α } is a basis, the inverse H − is well defined.This results in the following representation of X . X = (cid:88) β ∈ I (cid:88) α ∈ I H α , β (cid:104) w α , X (cid:105) w β (5)The crux of the dual basis approach is to simply consider (5) and rewrite it as follows X = (cid:88) α ∈ I (cid:104) X , w α (cid:105) v α (6)where v α = (cid:80) β H α , β w β . It can be easily verified that { v α } is a dual basis of { w α } satisfying (cid:104) v α , w β (cid:105) = δ α , β . Let W = [ w , w , ..., w L ] and V = [ v , v , ..., v L ] denote the matrix of vectorized basis matrices and vectorized dual basismatrices respectively. The following basic relations are useful in later analysis, H = W T W , H − = V T V and V = W H − .In the context of the EDG completion problem, the dual basis approach ensures that the expansion coe ffi cients match themeasurements while preserving the condition that the matrix in consideration is symmetric with zero row sums. With this, (4)turns into a well formulated matrix completion problem with respect to the basis w α . 
An alternative way to rewrite (4) makesuse of the sampling operator defined as follows. R Ω : X ∈ S −→ Lm (cid:88) α ∈ Ω (cid:104) X , w α (cid:105) v α (7)where we assume that Ω are sampled uniformly at random from I with replacement and m = | Ω | is the size of Ω . The scalingfactor Lm is for ease in later analysis. It can be easily verified that m L R Ω = mL R Ω . The adjoint operator of R Ω , R ∗ Ω , can besimply derived and is given by R ∗ Ω : X ∈ S −→ Lm (cid:88) α ∈ Ω (cid:104) X , v α (cid:105) w α (8)Using the sampling operator R Ω , we can write (4) as followsminimize X ∈ S + (cid:107) X (cid:107) ∗ subject to R Ω ( X ) = R Ω ( M ) (9)In addition to the sampling operator, the restricted frame operator appears in the analysis and is defined as follows F Ω : X ∈ S −→ Lm (cid:88) α ∈ Ω (cid:104) X , w α (cid:105) w α (10)Note that the restricted frame operator is self-adjoint and positive semidefinite. b) Coherence: Pathological situations can arise in (9) when M has very few non-zero coe ffi cients. In the context of EDG,in extreme cases, this happens if only the diagonal elements of D are sampled and / or there are many overlapping points. Thismotivates the notion of coherence first introduced in [10]. Let M = (cid:80) rk = λ k u k u Tk , where the eigenvectors have been chosen tobe orthonormal and λ k ≥ M ∈ S + . We write U = span { u , ..., u r } as the column space of M , U ⊥ = span { u r + , ..., u n } asthe orthogonal complement of U and further denote P U and P U ⊥ as the orthogonal projections onto U and U ⊥ , respectively.Let T = { U Z T + ZU T : Z ∈ R n × r } be the tangent space of the rank r matrix in S + at M . The orthogonal projection onto T is given by P T X = P U X + X P U − P U X P U (11)It can be readily verified that P T ⊥ X = X − P T X = P U ⊥ X P U ⊥ . The coherence conditions can now be defined as follows. Definition 1.
The aforementioned rank r matrix M ∈ S + has coherence ν with respect to basis { w α } α ∈ I if the followingestimates hold max α ∈ I (cid:88) β ∈ I (cid:104)P T w α , w β (cid:105) ≤ ν rn (12)max α ∈ I (cid:88) β ∈ I (cid:104)P T v α , w β (cid:105) ≤ ν rn (13)max α ∈ I (cid:104) w α , U U T (cid:105) ≤ ν rn (14) where { v α } α ∈ I is the dual basis of { w α } α ∈ I . Remark 1. If { w α } is an orthonormal basis, it follows trivially that the dual basis { v α } is also { w α } . For this specific case, theabove coherence conditions are equivalent, up to constants, to the coherence conditions in [29]. However, in the most generalsetting, the coherence conditions (12) and (13) depart from their orthonormal counterparts. This is because these conditionsdepend on the spectrum of the correlation matrix H . Since the analysis in the sequel makes repeated use of the coherenceconditions, the above equations are further simplified to convenient forms as allows. First, using (12) , consider a bound on ||P T w α || F . Using Lemma A.4 and Lemma A.6, one obtains ||P T w α || F ≤ max α ∈ I (cid:88) β ∈ I (cid:104)P T w α , w β (cid:105) ≤ ν rn −→ ||P T w α || F ≤ ν rnUsing the above inequality and Lemma A.4, one obtains the following bound for ||P T v α || F . ||P T v α || F ≤ (cid:88) β ∈ I || H α , β P T w β || F = (cid:88) β ∈ I | H α , β | ||P T w β || F ≤ (cid:114) ν rnNext, consider a bound on (cid:104) v α , U U T (cid:105) . Using (14) and Lemma A.4, one arrives at the following inequality. (cid:104) v α , U U T (cid:105) = (cid:104) (cid:88) β ∈ I H α , β w β , U U T (cid:105) ≤ max β ∈ I (cid:104) w β , U U T (cid:105) (cid:88) β ∈ I | H α , β | ≤ ν rn All in all, the simplified forms of the coherence conditions are summarized as follows. max α ∈ I ||P T w α || F ≤ ν rn (15)max α ∈ I ||P T v α || F ≤ ν rn (16)max α ∈ I (cid:104) v α , U U T (cid:105) ≤ ν rn (17)The simplified coherence conditions presented above are the same as the standard coherence assumptions up to constants (see[29], [31]). If the matrix M has coherence ν with respect to the standard basis, comparable bounds could be derived for theabove coherence conditions. This is true since the basis { w α } is not “far” from the standard basis. For a precise statement ofthis fact, we refer the interested reader to Lemma A.5. The implication of this fact is that an incoherent M with respect to thestandard basis is also incoherent with respect to the EDG basis. Intuitively, the coherence parameter is fundamentally aboutthe concentration of information in the underlying matrix. For a matrix with low coherence(incoherent), each measurement isequally as informative as the other. In contrast, the information is concentrated on few measurements for a coherent matrix. Remark 2 (Coherence and Geometry) . In the context of the EDG problem, an interesting question is the relationship, ifany, between coherence and the geometry of the underlying points. One analytical task is to relate the coherence, usingthe compressive sensing definition, of the Gram matrix to the coherence conditions (12) - (14) . Computationally, numericaltests indicate that points arising from random distributions or points which are a random sample of structured points havecomparable coherence values irrespective of the underlying distribution. However, since estimating coherence accurately ande ffi ciently for large n is computationally expensive, the numerical tests were limited to relatively small values of n. 
The questionof the geometry of points and how it relates to coherence is important to us and future work will explore this problem throughanalysis and extensive numerical tests.c) Sampling Model: The sampling scheme is an important element of the matrix completion problem. For the EDGcompletion problem, it is assumed that the basis vectors are sampled uniformly at random with replacement. Previous workshave considered uniform sampling with replacement [29], [31] and Bernoulli sampling [10]. The implication of our choiceis that the sampling process is independent. This property is essential as we make use of concentration inequalities for i.i.dmatrix valued random variables.
Remark 3.
For the EDG completion problem, we have considered uniform sampling with out replacement. In this case, thesampling process is no longer independent. The implication is that one could not simply use concentration inequalities for i.i.dmatrix valued random variables. However, as noted in the classic work of Hoe ff ding [34, section 6], the results derived for thecase of the sampling with replacement also hold true for the case of sampling without replacement. An analogous argumentfor matrix concentration inequalities, resulting from the operator Cherno ff bound technique [35], under uniform sampling without replacement is shown in [36]. The analysis based on this choice leads to a “slightly lower” number of measurementsm. Since the gain is minimal, which will be made apparent in later analysis, we use the uniform sampling with replacementmodel. III. P roof of M ain R esult The main goal of this section is to show that the nuclear norm minimization problem (9) admits a unique solution. Thus,the optimization problem provides the desired reconstruction. Under certain conditions, our main result guarantees a uniquesolution for the minimization problem (9). More precisely, we have the following theorem.
Theorem 1.
Let M ∈ R n × n be a matrix of rank r that obeys the coherence conditions (12) , (13) and (14) with coherence ν .Assume m measurements, {(cid:104) M , w α (cid:105)} α ∈ Ω , are sampled uniformly at random with replacement. For β > , ifm ≥ nr log (cid:16) √ Lrn (cid:17) (cid:34) (cid:32) ν + nr (cid:33) (cid:16) β log( n ) + log (cid:16) (4 L √ r ) (cid:17)(cid:17)(cid:35) the solution to (9) , equivalently (2) and (4) , is unique and equal to M with probability at least − n − β . The optimization problem in (9) is a convex minimization problem for which the optimality of a feasible solution X isequivalent to the su ffi cient KKT conditions. For details, we refer the interested reader to [10] where the the authors derive acompact form of these optimality conditions. The over all structure of the proof is as follows. First, we show that if certaindeterministic optimality and uniqueness conditions hold, then it certifies that M is a unique solution to the minimizationproblem. The remaining part of the proof will focus on showing that these conditions hold with very high probability undercertain conditions. This in turn will imply that M is a unique solution with very high probability for an appropriate choice of m .The proof of Theorem 1 closely follows the approach in [29]. For ease of presentation, the proof is divided into severalintermediate results. Readers familiar with matrix completion proof in [29], [31] will recall that one crucial part of the proof isbounding the spectral norm of ||P T R Ω P T −P T || . The interpretation of this bound is that the operator P T R Ω P T is almost isometricto P T . In the case of our problem, R Ω is no longer self-adjoint and the equivalent statement is a bound on ||P T R ∗ Ω R Ω P T − P T || .Unfortunately, the term ||P T R ∗ Ω R Ω P T − P T || is not amenable to simpler analysis as standard concentration inequalities resultsub-optimal success rates. The idea is instead to consider the operator P T F Ω P T and show that the minimum eigenvalue isbounded away from 0 with high probability using the operator Cherno ff bound. The implication is that the operator P T F Ω P T is full rank on the space T . For non orthogonal measurements, P T F Ω P T can be interpreted as the analogue of the operator P T R Ω P T . In [30], for the standard matrix completion problem with measurement basis e i j , it was argued theoretically that alower bound for m is O ( nr ν log n ). Theorem 1 requires on the order of nr ν log n measurements which is log( n ) multiplicativefactor away from this sharp lower bound. We remark that, despite the aforementioned technical challenges, our result is ofthe same order as the result in [29], [31] which consider the low rank recovery problem with any orthogonal basis and thematrix completion problem respectively. We start the proof by stating and proving Theorem 2 which rigorously shows that ifappropriate conditions as discussed earlier hold, then M is a unique solution to (9). Theorem 2.
Given X ∈ S + , we write ∆ = X − M as the deviation from the underlying low rank matrix M . Let ∆ T and ∆ T ⊥ be the orthogonal projection of ∆ to T and T ⊥ , respectively. For any given Ω with | Ω | = m, the following two statementshold.(a). If || ∆ T || F ≥ (cid:16) n √ L || ∆ T ⊥ || F (cid:17) and λ min ( P T F Ω P T ) > , then R Ω ∆ (cid:44) .(b). If || ∆ T || F < (cid:16) n √ L || ∆ T ⊥ || F (cid:17) for ∆ ∈ ker R Ω , and there exists a Y ∈ range R ∗ Ω satisfying, ||P T Y − Sgn M || F ≤ n √ L and ||P T ⊥ Y || ≤
12 (18) then (cid:107) X (cid:107) ∗ = (cid:107) M + ∆ (cid:107) ∗ > (cid:107) M (cid:107) ∗ . To interpret the above theorem, note that Theorem 2(a) states that any deviation from M fails to be in the null space of thesampling operator for “large” ∆ T . For “small” ∆ T , Theorem 2(b) claims that any deviation from M increases the nuclearnorm thereby violating the minimization condition. We would like to emphasize that this is a deterministic theorem whichdoes not depend on the random sampling of Ω as long as the conditions in the statement are satisfied. In fact, we will latershow that these conditions will hold with very high probability under certain sampling conditions and an appropriate choiceof m = | Ω | . A. Proof of Theorem 2Proof of Theorem 2(a).
Suppose (cid:107) ∆ T (cid:107) F ≥ (cid:16) n √ L (cid:107) ∆ T ⊥ (cid:107) F (cid:17) . To prove R Ω ∆ (cid:44)
0, it su ffi ces to show (cid:107)R Ω ∆ (cid:107) F >
0. Note that (cid:107)R Ω ∆ (cid:107) F = (cid:107)R Ω ∆ T + R Ω ∆ T ⊥ (cid:107) F ≥ (cid:107)R Ω ∆ T (cid:107) F − (cid:107)R Ω ∆ T ⊥ (cid:107) F . This motivates finding a lower bound for ||R Ω ∆ T || F and an upperbound for ||R Ω ∆ T ⊥ || F . For any X , ||R Ω X || F can be bounded as follows ||R Ω X || F = (cid:104)R Ω X , R Ω X (cid:105) = (cid:104) X , R ∗ Ω R Ω X (cid:105) = L m (cid:88) β ∈ Ω (cid:88) α ∈ Ω (cid:104) X , w α (cid:105)(cid:104) v α , v β (cid:105)(cid:104) X , w β (cid:105) = L m (cid:88) β ∈ Ω (cid:88) α ∈ Ω (cid:104) X , w α (cid:105) H α , β (cid:104) X , w β (cid:105) noting that (cid:104) v α , v β (cid:105) = H α , β . Using the min-max theorem in the above equation, one obtains L m λ min ( H − ) (cid:88) α ∈ Ω (cid:104) X , w α (cid:105) ≤ ||R Ω X || F ≤ L m λ max ( H − ) (cid:88) α ∈ Ω (cid:104) X , w α (cid:105) ≤ L m λ max ( H − ) (cid:88) α ∈ I (cid:104) X , w α (cid:105) (19)Using the fact λ max ( H − ) = (cid:80) α ∈ I (cid:104) X , w α (cid:105) ≤ n (cid:107) X (cid:107) F (Lemma A.6) and setting X = ∆ T ⊥ in the rightof the above inequality results ||R Ω ∆ T ⊥ || F ≤ L m nm || ∆ T ⊥ || F (20)In the above inequality, the constant m bounds the maximum number of repetitions for any given measurement. Next, thelower bound for ||R Ω ∆ T || F is considered using the left inequality in (19). ||R Ω ∆ T || F ≥ L m n (cid:88) α ∈ Ω (cid:104) ∆ T , w α (cid:105) = L mn (cid:104) ∆ T , Lm (cid:88) α ∈ Ω (cid:104) ∆ T , w α (cid:105) w α (cid:105) = L mn (cid:104) ∆ T , F Ω ∆ T (cid:105) (21)with the fact that λ min ( H − ) = n (Lemma A.4). Using the min-max theorem on the projection onto T of the restricted frameoperator, we have ||R Ω ∆ T || F ≥ L mn (cid:104) ∆ T , F Ω ∆ T (cid:105) = L mn (cid:104) ∆ T , P T F Ω P T ∆ T (cid:105) ≥ L mn λ min ( P T F Ω P T ) || ∆ T || F (22)Above, the first equality holds since P T ∆ T = ∆ T and P T is self-adjoint. The last inequality simply applies the min-maxtheorem on the self-adjoint operator P T F Ω P T . Using the assumption that λ min ( P T F Ω P T ) > , the inequality in (22) reducesto ||R Ω ∆ T || F > L mn || ∆ T || F (23)Combining (20) and (23), we have (cid:107)R Ω ∆ (cid:107) F > (cid:114) L mn || ∆ T || F − Lm √ n || ∆ T ⊥ || F ≥ (cid:114) L mn (cid:16) n √ L (cid:17) || ∆ T ⊥ || F − Lm √ nm || ∆ T ⊥ || F = (cid:3) Remark 4. ( ) The estimate of the upper bound of ||R Ω ∆ T ⊥ || F is not sharp. Specifically, the constant m can be much lowered.This can be achieved, for example, by employing standard concentration inequalities and arguing that the expected number ofduplicate measurements is substantially smaller than m. Alternatively, as noted earlier in the sampling model section, uniformsampling with out replacement can be adopted which has the implication that the constant m is not necessary. Both analysis leadto a better estimate of the upper bound. However, the gain in terms of measurements, m, is minimal, resulting lower constantsinside of a log . We use the current estimate, albeit not sharp, for sake of more compact analysis. ( ) Lower bounding the term (cid:80) α ∈ Ω (cid:104) ∆ T , w α (cid:105) is not amenable to simpler analysis. In fact, a mere application of the standard concentration inequalitiesresult an undesirable probability of failure that increases with n. Note that ||R Ω ∆ T || F = (cid:104) ∆ T , P T R ∗ Ω R Ω P T ∆ T − P T ∆ T (cid:105) + ||P T ∆ T || F . A lower bound for ||R Ω ∆ T || F motivates an upper bound of ||P T R ∗ Ω R Ω P T ∆ T − P T || . 
This bound can be achieved,albeit long calculations, but it relies on the special structure of w α . To make the technical analysis simple and more general,we use an alternative approach shown in the above proof. This is a di ff erent way to handle the lower bound estimation of ||R Ω ∆ T || F .Proof of Theorem 2(b). Consider a feasible solution X = M + ∆ ∈ S + satisfying || ∆ T || F < n √ L (cid:107) ∆ T ⊥ (cid:107) F and ∆ ∈ ker R Ω .We need to show that (cid:107) M + ∆ (cid:107) ∗ > (cid:107) M (cid:107) ∗ which implies that nuclear minimization is violated. The proof of this requiresthe construction of a dual certificate Y satisfying certain conditions. The proof is similar to the proof in section 2E of[29] but it is shown below for completeness and ease of reference in later analysis. Using the pinching inequality [37], (cid:107) M + ∆ (cid:107) ∗ ≥ ||P U ( M + ∆ ) P U || ∗ + ||P U ⊥ ( M + ∆ ) P U ⊥ || ∗ . Noting that P U M P U = M , P U ⊥ M P U ⊥ = and P U ⊥ ∆ P U ⊥ = ∆ T ⊥ ,the above inequality reduces to (cid:107) M + ∆ (cid:107) ∗ ≥ || M + P U ∆ P U || ∗ + || ∆ T ⊥ || ∗ (24)Note that || ∆ T ⊥ || ∗ = (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) . The first term on right hand side can be lower bounded using the fact that the nuclearnorm and the spectral norm are dual to one another. Stated precisely, || X || ∗ = sup || X ||≤ (cid:104) X , X (cid:105) for any X . Using this inequalitywith X = Sgn M and X = M + P U ∆ P U in (24) results || M + P U ∆ P U || ∗ + || ∆ T ⊥ || ∗ ≥ (cid:104) Sgn M , M + P U ∆ P U (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) = || M || ∗ + (cid:104) Sgn M , P U ∆ P U (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) Note that Sgn M ∈ T (see Lemma A.1). Using this fact, it can be verified that (cid:104) Sgn M , P U ∆ T ⊥ P U (cid:105) = (cid:104) Sgn M , P U ∆ T P U (cid:105) = (cid:104) Sgn M , ∆ T (cid:105) . The above inequality can now be written as || M + P U ∆ P U || ∗ + || ∆ T ⊥ || ∗ ≥ || M || ∗ + (cid:104) Sgn M , ∆ T (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) (25)Using (24) and (25), it can be concluded that || M + ∆ || ∗ > || M || ∗ if it can be shown that (cid:104) Sgn M , ∆ T (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) > (cid:104) Y , ∆ (cid:105) = Y ∈ range R ∗ Ω , we have (cid:104) Sgn M , ∆ T (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) = (cid:104) Sgn M , ∆ T (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) − (cid:104) Y , ∆ (cid:105) = (cid:104) Sgn M − P T Y , ∆ T (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) − (cid:104)P T ⊥ Y , ∆ T ⊥ (cid:105) Assuming the conditions in the statement of the theorem and considering the last equation, we obtain (cid:104)
Sgn M , ∆ T (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) ≥ − n √ L || ∆ T || F + || ∆ T ⊥ || ∗ − || ∆ T ⊥ || ∗ ≥ − n √ L || ∆ T || F + || ∆ T ⊥ || F > − n √ L (cid:16) n √ L (cid:107) ∆ T ⊥ (cid:107) F (cid:17) + || ∆ T ⊥ || F = || ∆ T ⊥ || F Above, the first inequality follows from the duality of the spectral norm and the nuclear norm. It has been shown that (cid:104)
Sgn M , ∆ T (cid:105) + (cid:104) Sgn ∆ T ⊥ , ∆ T ⊥ (cid:105) > (cid:107) M + ∆ (cid:107) ∗ > (cid:107) M (cid:107) ∗ concluding the proof of Theorem 2(b). (cid:3) Consequently, if the deterministic conditions in Theorem 2 hold, M is a unique solution to (9). We formally write it as acorollary which can help us to prove the probabilistic statement in Theorem 1. Corollary 1.
If the conditions of Theorem 2 hold, M is a unique solution to (9) .Proof. For any X ∈ S + , define ∆ = M − X . Using Theorem 2(a), R Ω ∆ (cid:44) if || ∆ T || F ≥ (cid:16) n √ L || ∆ T ⊥ || F (cid:17) . It then su ffi ces toconsider the case || ∆ T || F < (cid:16) n √ L || ∆ T ⊥ || F (cid:17) for ∆ ∈ ker R Ω . For this case, the proof of Theorem 2(b) shows that (cid:107) X (cid:107) ∗ > (cid:107) M (cid:107) ∗ .It can then be concluded that M is the unique solution to (9). (cid:3) B. Proof of Theorem 1
It follows that certifying that the two conditions in Theorem 2 hold implies a unique solution to (9). Hence, the main taskin the proof is to show that, under the assumptions of Theorem 1, these two conditions hold with very high probability. If thiscan be achieved, it implies that the conclusion of Theorem 1 holds true with the same high probability. The first condition inTheorem 2 is that λ min ( P T F Ω P T ) > . The goal is to show that the minimum eigenvalue of the operator P T F Ω P T is boundedaway from zero. This will be proven in Lemma 1 to follow shortly. The lemma makes use of the matrix Cherno ff bound in[38] and is restated below for convenience. Theorem 3.
Let {L k } be a finite sequence of independent, random, self-adjoint operators satisfying L k (cid:23) and ||L k || ≤ R almost surelywith the minimum and maximum eigenvalues of the sum of the expectations, µ min : = λ min (cid:88) k E [ L k ] and µ max : = λ max (cid:88) k E [ L k ] Then, we have Pr (cid:20) λ min (cid:88) k L k ≤ (1 − δ ) µ min (cid:21) ≤ n (cid:20) exp( − δ )(1 − δ ) − δ (cid:21) µ min R for δ ∈ [0 , δ ∈ [0 , − δ ), note that (1 − δ ) log(1 − δ ) ≥ − δ + δ . This results the following simplifiedestimate Pr (cid:20) λ min (cid:88) k L k ≤ (1 − δ ) µ min (cid:21) ≤ n exp (cid:18) − δ µ min R (cid:19) for δ ∈ [0 , Lemma 1.
Given the operator P T F Ω P T : T → T with κ = mnr , the following estimate holds.Pr (cid:32) λ min ( P T F Ω P T ) ≤ (cid:33) ≤ n exp (cid:18) − κ ν (cid:19) Proof.
Using the definition of F Ω ( X ) = Lm (cid:80) α ∈ Ω (cid:104) X , w α (cid:105) w α , for X ∈ T , P T F Ω P T X can be written as follows P T F Ω P T X = (cid:88) α ∈ Ω Lm (cid:104) X , P T w α (cid:105) P T w α Let L α denote the operator in the summand, L α = Lm (cid:104)· , P T w α (cid:105) P T w α . It can be easily verified that L α is positive semidefinite.The operator Cherno ff bound requires estimate of R and µ min . R can be estimated as follows. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Lm (cid:104)· , P T w α (cid:105) P T w α (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = Lm ||P T w α || F ≤ n ν rm The last inequality follows from the coherence estimate in (15). Therefore, R = n ν rm . Next, we consider µ min . The first step isto compute (cid:80) α ∈ Ω E [ L α ]. (cid:88) α ∈ Ω E [ L α ] = (cid:88) α ∈ Ω (cid:20) (cid:88) α m (cid:104)· , P T w α (cid:105) P T w α (cid:21) = (cid:88) α (cid:104)· , P T w α (cid:105) P T w α For any X ∈ T , lower bound (cid:104) X , (cid:88) α ∈ Ω E [ L α ]( X ) (cid:105) as follows. (cid:104) X , (cid:88) α ∈ Ω E [ L α ]( X ) (cid:105) = (cid:88) α (cid:104) X , w α (cid:105) ≥ || X || F The inequality results from Lemma A.4 and Lemma A.6. Noting that (cid:80) α ∈ Ω E [ L α ] is a self-adjoint operator, and using thevariational characterization of the minimum eigenvalue, it can be concluded that the minimum eigenvalue of (cid:80) α ∈ Ω E [ L α ]is at least 1. With this, set µ min =
1. Finally, apply the matrix Cherno ff bound with R = n ν rm and µ min =
1. Setting δ = , λ min ( P T F Ω P T ) > with probability of failure at most p given by p = n exp (cid:18) − κ ν (cid:19) This concludes the proof. (cid:3)
Under the assumptions of Theorem 1 and using Lemma 1, λ min ( P T F Ω P T ) > holds with probability 1 − p where theprobability of failure is p = n exp (cid:18) − κ ν (cid:19) with κ = mnr .The proof of the second condition in Theorem 2 is more technically involved and requires the construction of Y satisfyingthe conditions in (18). The construction follows the ingenious golfing scheme introduced in [29]. Partition the basis elementsin Ω into l batches with the i -th batch Ω i containing m i elements and l (cid:88) i = m i = m . The restriction operator for a batch Ω i isdefined as R i = Lm i (cid:88) α ∈ Ω i (cid:104) X , w α (cid:105) v α . Then Y = Y l is constructed using the following recursive scheme Q = sgn M , Y i = i (cid:88) j = R ∗ j Q j − , Q i = sgn M − P T Y i , i = , · · · , l (26)Using the golfing scheme, it will be shown that the conditions in (18) hold with very high probability. The technical analysisof the scheme, particularly certifying the first condition in (18), requires a probabilistic estimate of ||P T R ∗ Ω P T X − P T X || F ≥ t for a fixed matrix X ∈ S . Using Lemma 2 to follow, a suitable estimate can be achieved. The lemma uses the vector Bernsteininequality in [29]. An easier but slightly weaker version is restated below for convenience of reference. Theorem 4 (Vector Bernstein inequality) . Let x , ..., x m be independent zero-mean vector valued random variables. Assumingthat max i || x i || ≤ R and let (cid:80) i E [ || x i || ] ≤ σ , then for any t ≤ σ R , the following inequality holdsPr (cid:20)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) m (cid:88) i = x i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ t (cid:21) ≤ exp (cid:32) − t σ + (cid:33) , Lemma 2.
For a fixed matrix X ∈ S , the following estimate holdsPr ( ||P T R ∗ Ω P T X − P T X || F ≥ t || X || F ) ≤ exp − t κ (cid:16) ν + nr (cid:17) + (27) for all t ≤ with κ = mnr .Proof. It is clear that P T X ∈ S from the definition of P T . Thus, we can expand P T R ∗ Ω P T X as follows. P T R ∗ Ω P T X = Lm (cid:88) α ∈ Ω (cid:104)P T X , v α (cid:105) P T w α (28) With this, P T R ∗ Ω P T X − P T X can be written as follows P T R ∗ Ω P T X − P T X = (cid:88) α ∈ Ω (cid:20) Lm (cid:104)P T X , v α (cid:105) P T w α − m P T X (cid:21) (29)Let X α = Lm (cid:104)P T X , v α (cid:105)P T w α . Noting that E [ Lm (cid:104)P T X , v α (cid:105)P T w α ] = m P T X , let Y α = X α − E [ X α ] denote the summand.It is clear that E [ Y α ] = by construction. The vector Bernstein inequality requires a suitable bound for || Y α || F and E [ || Y α || F ].Without loss of generality, assume || X || F =
1. The first step is to bound || Y α || F . || Y α || F = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Lm (cid:104)P T X , v α (cid:105) P T w α − m P T X (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) F ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Lm (cid:104)P T X , v α (cid:105)P T w α (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) F + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) m P T X (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) F (30) < n m max α ∈ I ||P T w α || F max α ∈ I ||P T v α || F + m ≤ m (2 n ν r +
1) (31)Above, the last inequality follows from the coherence estimates (15) and (16). Using the above estimate, set R in the vectorBernstein inequality as R = m (2 n ν r + E [ || Y α || F ] ≤ E [ || X α || F ] + || E [ X α ] || F . With this, E [ || Y α || F ] can be upper bounded as follows E [ || Y α || F ] ≤ L m E [ (cid:104)P T X , v α (cid:105) ||P T w α || F ] + m ||P T X || F ≤ Lm max α ∈ I ||P T w α || F (cid:88) α ∈ I (cid:104)P T X , v α (cid:105) + m ≤ n m ν rn + m ≤ m (1 + n ν r )Above, the third inequality follows from the coherence estimates, (15) and (16), and Lemma A.6. Using the above estimate,set σ in vector Bernstein inequality as σ = m (1 + n ν r ). Finally, apply the vector Bernstein inequality with R = m (2 n ν r + σ = m (1 + n ν r ). For t ≤ σ R > ||P T R ∗ Ω P T X − P T X || F ≥ t ) ≤ exp − t κ (cid:16) ν + nr (cid:17) + (32)where κ = mnr . This concludes the proof of Lemma 2. (cid:3) Having proved Lemma 2, the next step is to show that the golfing scheme (26) certifies the conditions in (18) with veryhigh probability. The precise statement is stated in Lemma 3 below.
Lemma 3. Y l obtained from the golfing scheme (26) satisfies the conditions in (18) with failure probability at most p = l (cid:88) i = ( p ( i ) + p ( i ) + p ( i )) where p ( i ) = exp − κ i (cid:16) ν + nr (cid:17) + , p ( i ) = n exp (cid:32) − k i ν (cid:33) and p ( i ) = n exp − κ i (cid:16) ν + nr (cid:17) withk i = m i nr .Proof. Note that sgn M is symmetric. It is easy to verify that Q i is symmetric as Y i and P T Y i are symmetric. In addition,using Lemma A.1, Q i is in T since sgn M ∈ T . To show that the first condition in (18) holds, we derive a recursive formulafor Q i as follows. Q i = sgn M − P T i (cid:88) j = R ∗ j Q j − = sgn M − P T i − (cid:88) j = R ∗ j Q j − + R ∗ i Q i − = sgn M − P T i − (cid:88) j = R ∗ j Q j − − P T R ∗ i Q i − = sgn M − P T Y i − − P T R ∗ i Q i − = Q i − − P T R ∗ i Q i − = ( P T − P T R ∗ i P T ) Q i − (33)Observe that the first condition in (18) is equivalent to a bound on (cid:107) Q l (cid:107) F . Using the bound (cid:107) ( P T −P T R ∗ i P T ) Q i − (cid:107) F < t , i (cid:107) Q i − (cid:107) F with failure probability p ( i ) obtained from Lemma 2 and setting t , i = / p ( i ) = exp − κ i (cid:16) ν + nr (cid:17) + with κ i = m i nr , wehave the following bound of (cid:107) Q i (cid:107) F using the above recursive formula. (cid:107) Q i (cid:107) F < i (cid:89) k = t , k (cid:107) Q (cid:107) F = − i √ r (34) To satisfy the first condition in (18), set l = log (cid:16) n √ Lr (cid:17) . Using the union bound on the failure probabilities p ( i ), we have ||P T Y − Sgn M || F = || Q l || F ≤ √ r − l = n √ L holds true with failure probability at most (cid:80) li = p ( i ).To complete the proof, it remains to show that Y satisfies the second condition in (18). The condition is equivalentto controlling the operator norm of P T ⊥ Y l . First, it is clear that (cid:107)P T ⊥ Y l (cid:107) = (cid:107) (cid:80) lj = P T ⊥ R ∗ j Q j − (cid:107) ≤ (cid:80) lj = (cid:107)P T ⊥ R ∗ j Q j − (cid:107) . Thismotivates bounding the operator norm of (cid:107)P T ⊥ R ∗ j Q j − (cid:107) which will be the focus of Lemma 4 below. Before proving the lemma,we start by addressing an assumption the lemma entails on the size of η ( Q i ) = max β |(cid:104) Q i , v β (cid:105)| . Specifically, at the i -th stageof the golfing scheme, we need to show that the assumption η ( Q i ) ≤ ν n c i with || Q i || F ≤ c i holds with very high probability.To enforce this, let η ( Q i ) < t , i with probability 1 − p ( i ). We set t , i = η ( Q i − ) to obtain η ( Q i ) < − η ( Q i − ) < − i η (sgn M ) ≤ ν rn (2 − i ) = ν n (2 − i r )Above, the second inequality is obtained by applying the inequality η ( Q i ) < η ( Q i − ) recursively and the last inequalityfollows from the coherence condition in (17). Using (34), at the i -th stage of the golfing scheme, the above inequality preciselyenforces the condition that η ( Q i ) ≤ ν n c i with c i = − i √ r . Using Lemma A.8, noting that η ( Q i ) = η ( Q i − − P T R ∗ i Q i − ) , thefailure probability p ( i ) is given by p ( i ) = n exp − κ i (cid:16) ν + nr (cid:17) ∀ i ∈ [1 , l ]This concludes the analysis on the assumption of the size of η ( Q i ) which will be used freely in Lemma 4. 
Using the bound ||P T ⊥ R ∗ j Q j − || ≤ t , j c j − , where once again c j − satisfies || Q j − || F ≤ c j − = − ( j − , with failure probability p ( j ) obtained fromLemma 4 and setting t , j = √ r , p ( j ) = n exp (cid:32) − κ j ν (cid:33) with κ = m j nr , it follows that ||P T ⊥ Y l || ≤ l (cid:88) k = ||P T ⊥ R ∗ k Q k − || ≤ l (cid:88) k = t , k c k − = √ r l (cid:88) k = c k − = √ r l (cid:88) k = √ r − ( k − < || Q k − || F . Using the union bound of failure probabilities, ||P T ⊥ Y l || < holds true with failure probability which is at most (cid:80) lj = [ p ( j ) + p ( j )]. The second condition in (18) now holds with thisfailure probability. This completes the proof of Lemma 3. (cid:3) The proof of Lemma 4 uses the Bernstein inequality. A simplified version of the inequality was derived in [38]. Our analysisuses this simpler version and is restated below for ease of reference.
Theorem 5 (Bernstein inequality) . Consider a finite sequence { X i } of independent, random, self-adjoint matrices with dimensionn. Assuming that E [ X i ] = and λ max ( X i ) ≤ R almost surely.with the norm of the total variance σ = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) i E ( X i ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) , then, for all t ≥ , the following estimate holds:Pr (cid:20)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) i X i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > t (cid:21) ≤ n exp (cid:32) − t σ (cid:33) t ≤ σ R n exp (cid:32) − t R (cid:33) t ≥ σ R (35)Now we are ready to estimate ||P T ⊥ R ∗ j Q j − || formally described in the following lemma. Lemma 4.
Consider a fixed matrix G ∈ T . Assume that max β (cid:104) G , v β (cid:105) ≤ ν n c with c satisfying || G || F ≤ c . Then, the followingestimate holds for all t ≤ √ r with κ j = m j nr .Pr ( ||P T ⊥ R ∗ j G || > t c ) ≤ n exp (cid:32) − t κ j r ν (cid:33) Proof of Lemma 4.
We expand P T ⊥ R ∗ j G in the dual basis as P T ⊥ R ∗ j G = (cid:88) α ∈ Ω j Lm j (cid:104) G , v α (cid:105)P T ⊥ w α . Let X α denote the summand.The proof makes use of Bernstein inequality (35) which mandates appropriate bound on (cid:107) X α (cid:107) and (cid:107) E [ X α ] (cid:107) . First consider the bound on the latter term. Noting that X α = L m j (cid:104) G , v α (cid:105) ( P T ⊥ w α ) , we have the following bound for E [ X α ] using LemmaA.7 and the fact that w α is positive semidefinite. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E [ X α ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ n m j max (cid:107) ϕ (cid:107) = (cid:88) α ∈ I (cid:104) G , v α (cid:105) (cid:104) ϕ , w α ϕ (cid:105) ≤ n m j max α ∈ I (cid:104) v α , G (cid:105) max (cid:107) ϕ (cid:107) = (cid:104) ϕ , (cid:88) α ∈ I w α ϕ (cid:105) (36)Due to the special structure of w α in the EDG problem, it is straightforward to verify that (cid:88) α ∈ I w α = n I − T It now follows that λ max ( (cid:80) α ∈ I w α ) = n which implies that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E [ X α ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ n m j max α ∈ I (cid:104) v α , G (cid:105) ≤ n m j ν n c = n ν m j c Next, consider a bound on (cid:107) X α (cid:107) which results (cid:107) X α (cid:107) ≤ n m j | max α ∈ I (cid:104) v α , G (cid:105)| (cid:107)P T ⊥ w α (cid:107) ≤ n m j √ ν n (cid:107) w α (cid:107) c (37)We consider two cases. If ν ≥ r , (cid:107) X α (cid:107) ≤ n ν √ rm j c = R and if ν < r , (cid:107) X α (cid:107) < n √ ν m j c = R . Finally, we apply the Bernsteininequality with R , R and σ = n ν m j c . It can be easily verified that σ R = √ r c and σ R < √ r c . Therefore, for all t ≤ √ r c ,the following estimate holds. Pr( (cid:107)P T ⊥ R ∗ j G (cid:107) ≥ t ) ≤ n exp (cid:32) − t κ j r ν c (cid:33) (38)with κ j = m j nr . This concludes the proof of Lemma 4. (cid:3) In what follows, it will be argued that M is a unique solution to (9) with “very high” probability. It is su ffi ces to showall X ∈ S + − { M } are solutions with a very small probability by choosing Ω “su ffi ciently large”. For X ∈ S + , writethe deviation ∆ = X − M . We define S = (cid:8) X ∈ S + : || ∆ T || F ≥ (cid:18) n (cid:113) Lm || ∆ T ⊥ || F (cid:19) (cid:9) and S = (cid:8) X ∈ S + : || ∆ T || F < (cid:18) n (cid:113) Lm || ∆ T ⊥ || F (cid:19) & R Ω ( ∆ ) = } be two spaces defined based on deterministic comparisons of || ∆ T || F and || ∆ T ⊥ || F . Assumingthat Ω is sampled uniformly at random with replacement, we consider the following two cases.1) Choose | Ω | = m “su ffi ciently large” such that Pr (cid:16)(cid:110) Ω ⊂ I (cid:12)(cid:12)(cid:12) | Ω | = m , λ min ( P T F Ω P T ) > / (cid:111)(cid:17) ≥ − p based on Lemma 1.Using Theorem 2(a), all X ∈ S are feasible solutions to (9) with probability p .2) Recall that using the golfing scheme Pr (cid:16)(cid:110) Ω ⊂ I (cid:12)(cid:12)(cid:12) | Ω | = m , Y ∈ range R ∗ Ω & ||P T Y − Sgn M || F ≤ L & ||P T ⊥ Y || ≤ (cid:111)(cid:17) ≥ − (cid:15) with (cid:15) = l (cid:88) i = (cid:2) p ( i ) + p ( i ) + p ( i ) (cid:3) by choosing | Ω | = m “su ffi ciently large”. Using Theorem 2(b), all X ∈ S are notsolutions to (9) with probability (cid:15) .Using the union bound, X (cid:44) M is a solution to the EDG nuclear minimization problem with probability which is at most p = p + (cid:80) li = [ p ( i ) + p ( i ) + p ( i )]. 
In what follows, the goal is to ensure that p is “small” with m su ffi ciently large implyingthe uniqueness of M to (9) with very high probability. This is attained by a suitable choice of the parameters m , l and m i which are detailed below.The first failure probability is the failure that condition in Theorem 2(a) does not hold and is given by p = n exp (cid:18) − κ ν (cid:19) In the proof of Lemma 3, the failure probabilities p ( i ), p ( i ) and p ( i ) are given by p ( i ) = exp − κ i (cid:16) ν + nr (cid:17) + ; p ( i ) = n exp (cid:32) − κ i ν (cid:33) ; p ( i ) = n exp − κ i (cid:16) ν + nr (cid:17) ∀ i ∈ [1 , l ]To prove Theorem 1, it remains to specify k i and show that the total probability of failure is very “small”. In precise terms,this means that the probability of failure is bounded above by n − β for some β > k i is chosen in such a way that all the failureprobabilities, p , p ( i ) , p ( i ) and p ( i ), are at most 14 l n − β . With this, one appropriate choice for k i is k i = (cid:0) ν + nr (cid:1)(cid:0) β log( n ) + log(4 l ) (cid:1) . Using the union bound, it can be verified that the total failure is bounded above by n − β . The number of measurement m = lnrk i must be at least log (cid:16) n √ Lr (cid:17) nr (cid:32) (cid:0) ν + nr (cid:1)(cid:2) β log( n ) + log (cid:16) (4 L √ r ) (cid:17) (cid:3)(cid:33) (39)This finishes the proof of Theorem 1 and concludes that the minimization program in (9) recovers the true inner product matrixwith very high probability. C. Noisy EDG Completion
In a practical settings, the available partial information is not exact but noisy. For simplicity, consider an additive Gaussiannoise Z with mean µ and variance σ . The modified nuclear norm minimization for the EDG problem can now be written asminimize X ∈ S + (cid:107) X (cid:107) ∗ subject to ||R Ω ( X ) − R Ω ( M ) || F ≤ δ (40)where δ characterizes the level of noise. Unlike the noisy matrix completion problem, the noise parameters for the EDGproblem can not be chosen arbitrarily. The reason is that the perturbed distance matrix needs to be non-negative. In the contextof numerically solving the noisy EDG problem, details of how to set these parameters will be discussed in the next section.Under the assumption that µ and σ are chosen appropriately, we posit that the theoretical guarantees for the exact case extendto noisy EDG completion. Following the analysis in [39] and using the dual basis framework, it can be surmised that Theorem1 holds true with failure probability proportional to noise level δ . A sketch of such a theorem is stated below. Theorem 6.
Let M ∈ R n × n matrix of rank r that obeys the coherence conditions (12) , (13) and (14) with coherence ν . Assumem measurements (cid:104) M , w α (cid:105) , sampled uniformly at random with replacement are corrupted with Gaussian noise of mean µ ,variance σ and noise level δ . For any β > , ifm ≥ log (cid:16) √ Lrn (cid:17) nr (cid:32) ν + nr ][ β log( n ) + log (cid:16) (4 L √ r ) (cid:17) ] (cid:33) then || M − ¯ M || F ≤ f (cid:18) n , mn , δ (cid:19) where ¯ M is a solution to (40) with probability at least − n − β . The interpretation of the above theorem is that the EDG problem is stable to noise. Numerical experiments confirm thisconsideration. However, a precise analysis mandates characterization of the level of noise and an exact specification of f inthe above theorem. This is left for future work. IV. N umerical A lgorithms The theoretical analysis so far shows that the nuclear minimization approach yields a unique solution for the EDG problem.In this section, we aim at developing a practical algorithm to recover the inner product matrix X given partial pairwise distanceinformation supported on some random set Ω . Since the available partial information might be noisy in applications, we alsoextend the algorithm to this case. A. Exact partial information
An algorithm similar to ours appears in [7] where the authors design an algorithm employing the alternative directionalminimization (ADM) method to recover the Gram matrix. A crucial part of their algorithm uses the hard thresholding operatorwhich computes eigendecompositions at every iteration and is computationally intensive. Comparatively, an advantage of ouralgorithm is that it does not require an eigendecomposition making it fast and suitable for tests on large data. Since the nuclearnorm of a positive semidefinite matrix equates to its trace, we consider the following minimization problem.min ¯ X ∈ R n × n Trace ( ¯ X ) subject to R Ω ( ¯ X ) = R Ω ( M ) , ¯ X · = and ¯ X (cid:23) Consider a matrix C ∈ R n × ( n − satisfying C T C = I and C T = , we rewrite the above minimization problem as follows bya change of variable ¯ X = CXC T .min X ∈ R ( n − × ( n − Trace (
min_{X ∈ R^{(n-1)×(n-1)}} Trace(C X C^T)  subject to  R_Ω(C X C^T) = R_Ω(M),  X ⪰ 0.

Note that the centering constraint drops out since C^T 1 = 0, and C X C^T ⪰ 0 if X ⪰ 0. To enforce that X is positive semidefinite, let X = P̄ P̄^T where P̄ ∈ R^{(n-1)×q} with q unknown a priori. Since P̄ P̄^T has at most rank q, this requires a choice of q, which ideally should be a reasonable estimate of the rank. Due to the trace regularization, our algorithm only needs a rough guess of q. The above minimization problem now reduces to

min_{P̄ ∈ R^{(n-1)×q}} Trace(C P̄ P̄^T C^T)  subject to  R_Ω(C P̄ P̄^T C^T) = R_Ω(M).  (41)

With P = C P̄, consider the simplified minimization problem
min_{P ∈ R^{n×q}} Trace(P P^T)  subject to  R_Ω(P P^T) = R_Ω(M).  (42)

The above technique, the change of variables employing C, has been used previously [21], [27]. In [27], the authors remark that the reparametrization leads to numerically stable interior point algorithms. Note that C is simply an orthonormal basis for the space {x ∈ R^n | Σ_t x_t = 0}. For the EDG problem, the goal is to find the Gram matrix X̄ = C X C^T = C C^T P P^T C C^T, where C C^T is simply the orthogonal projection onto the aforementioned space, given by C C^T = I - (1/n) 1 1^T. Given X̄, classical MDS can then be used to find the point coordinates. Therefore, our focus is on solving for P in (42). We employ the augmented Lagrangian method to solve the minimization problem. The constraint R_Ω(P P^T) = R_Ω(M) can be written compactly using the linear operator A defined as A : R^{n×n} → R^{|Ω|} with A(X) = f ∈ R^{|Ω|×1}, f_i = ⟨X, w_{α_i}⟩ for α_i ∈ Ω. For later use, the adjoint of A, denoted A*, can be derived as follows: for y ∈ R^{|Ω|×1}, ⟨A(X), y⟩ = Σ_i ⟨X, w_{α_i}⟩ y_i = ⟨X, Σ_i y_i w_{α_i}⟩. It follows that A* y = Σ_i y_i w_{α_i}.
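Since ⟨X, w_α⟩ = X_ii + X_jj - 2X_ij for α = (i, j), both A and A* admit a direct implementation. The sketch below, with hypothetical helper names A_op and A_adj, illustrates this and numerically checks the adjoint identity ⟨A(X), y⟩ = ⟨X, A*(y)⟩ for a symmetric X.

```python
import numpy as np

def A_op(X, Omega):
    """A(X)_k = <X, w_{alpha_k}> = X_ii + X_jj - 2 X_ij for alpha_k = (i, j) in Omega."""
    i, j = Omega[:, 0], Omega[:, 1]
    return X[i, i] + X[j, j] - 2.0 * X[i, j]

def A_adj(y, Omega, n):
    """Adjoint A*(y) = sum_k y_k w_{alpha_k}, a symmetric n x n matrix."""
    i, j = Omega[:, 0], Omega[:, 1]
    S = np.zeros((n, n))
    np.add.at(S, (i, i), y)
    np.add.at(S, (j, j), y)
    np.add.at(S, (i, j), -y)
    np.add.at(S, (j, i), -y)
    return S

# adjoint check: <A(X), y> should equal <X, A*(y)> for symmetric X
rng = np.random.default_rng(1)
n = 20
X = rng.standard_normal((n, n)); X = X + X.T
Omega = np.array([(i, j) for i in range(n) for j in range(i + 1, n)])[:50]
y = rng.standard_normal(len(Omega))
print(np.allclose(A_op(X, Omega) @ y, np.sum(X * A_adj(y, Omega, n))))
```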
Thus, we write (42) as

min_P Trace(P P^T)  subject to  A(P P^T) = b,  (43)

where b = A(M) collects the sampled entries. The augmented Lagrangian is given by
L(P; Λ) = Trace(P P^T) + (r/2) ||A(P P^T) - b + Λ||²,

where Λ ∈ R^{|Ω|×1} denotes the Lagrange multiplier and r is the penalty parameter. The augmented Lagrangian step is simply P_k = argmin_P L(P; Λ_{k-1}), followed by an update of the multiplier Λ_k. To solve the first problem with respect to P, the Barzilai-Borwein steepest descent method [40] is employed with objective function Trace(P P^T) + (r/2) ||A(P P^T) - b + Λ_{k-1}||² and gradient 2P + 2r A*(A(P P^T) - b + Λ_{k-1}) P. The iterative scheme is summarized in Algorithm 1.
Algorithm 1 Augmented Lagrangian based scheme to solve (43)
Initialization: Set Λ_0 = 0, choose a rank estimate q, P_0 = rand(n, q), E^1_0 = E^Total_0 = 0. Set maxiterations, bbiterations, r, Tol.
for k = 1, 2, ..., maxiterations do
    Barzilai-Borwein (BB) descent for P_k = argmin_P L(P; Λ_{k-1})
    Λ_k = Λ_{k-1} + A(P_k (P_k)^T) - b
    E^1_k = (r/2) ||A(P_k (P_k)^T) - b||²
    E^Total_k = Trace(P_k (P_k)^T) + (r/2) ||A(P_k (P_k)^T) - b + Λ_k||²
    if E^1_k < Tol and |E^Total_k - E^Total_{k-1}| / E^Total_k < Tol then break
end for
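A minimal NumPy sketch of Algorithm 1 is given below. It is a simplified reading of the scheme: the penalty is written as (r/2)||·||², the inner Barzilai-Borwein iteration count, initial step size and stopping threshold are illustrative choices, and the helper names (A_op, A_adj, edg_trace_min) are hypothetical rather than the authors' code.

```python
import numpy as np

def A_op(X, Omega):
    """Sampling operator: A(X)_k = X_ii + X_jj - 2 X_ij for (i, j) in Omega."""
    i, j = Omega[:, 0], Omega[:, 1]
    return X[i, i] + X[j, j] - 2.0 * X[i, j]

def A_adj(y, Omega, n):
    """Adjoint of the sampling operator: a symmetric n x n matrix."""
    i, j = Omega[:, 0], Omega[:, 1]
    S = np.zeros((n, n))
    np.add.at(S, (i, i), y); np.add.at(S, (j, j), y)
    np.add.at(S, (i, j), -y); np.add.at(S, (j, i), -y)
    return S

def grad_aug_lag(P, Lam, b, Omega, rho):
    """Gradient of Trace(P P^T) + (rho/2) ||A(P P^T) - b + Lam||^2 with respect to P."""
    res = A_op(P @ P.T, Omega) - b + Lam
    return 2.0 * P + 2.0 * rho * A_adj(res, Omega, P.shape[0]) @ P

def bb_descent(P, Lam, b, Omega, rho, iters=100, alpha0=1e-4):
    """Inner Barzilai-Borwein steepest descent for P = argmin_P L(P; Lam)."""
    P_old, g_old, alpha = None, None, alpha0
    for _ in range(iters):
        g = grad_aug_lag(P, Lam, b, Omega, rho)
        if g_old is not None:
            s, y = (P - P_old).ravel(), (g - g_old).ravel()
            alpha = (s @ s) / (s @ y) if s @ y > 1e-16 else alpha0
        P_old, g_old = P.copy(), g.copy()
        P = P - alpha * g
    return P

def edg_trace_min(b, Omega, n, q=10, rho=1.0, outer=100, tol=1e-5):
    """Augmented Lagrangian scheme (sketch of Algorithm 1) for
       min Trace(P P^T)  s.t.  A(P P^T) = b,  with P in R^{n x q}."""
    rng = np.random.default_rng(0)
    P, Lam = rng.random((n, q)), np.zeros(len(b))
    for _ in range(outer):
        P = bb_descent(P, Lam, b, Omega, rho)
        res = A_op(P @ P.T, Omega) - b
        Lam = Lam + res                      # multiplier update
        if 0.5 * rho * (res @ res) < tol:    # E^1_k below tolerance
            break
    J = np.eye(n) - np.ones((n, n)) / n      # orthogonal projection I - (1/n) 1 1^T
    return J @ (P @ P.T) @ J                 # centered Gram matrix estimate
```

The returned matrix plays the role of X̄ = C C^T P P^T C C^T, from which classical MDS produces the point coordinates.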
B. Partial information with Gaussian noise

Assume that the available partial information is noisy. Formally, R_Ω(X) = R_Ω(M) + R_Ω(Z), where R_Ω(Z) is additive Gaussian noise. Proceeding analogously to the case of exact partial information, the following minimization problem is obtained:
min_P Trace(P P^T) + (λ/2) ||R_Ω(P P^T) - R_Ω(M)||²_F.  (44)

Using the operator A introduced earlier, (44) can be rewritten as
min_P Trace(P P^T) + (λ/2) ||A(P P^T) - b||²,  (45)

where b = A(M).
In analogy with the exact case, the augmented Lagrangian now reduces to the penalized objective L(P) = Trace(P P^T) + (λ/2) ||A(P P^T) - b||², since (45) has no explicit constraint and hence no multiplier. As before, P is computed using the Barzilai-Borwein steepest descent method with this objective and gradient 2P + 2λ A*(A(P P^T) - b) P. The iterative scheme is summarized in Algorithm 2.
Algorithm 2 Augmented Lagrangian based scheme to solve (45)
Initialization: Choose a rank estimate q, P_0 = rand(n, q), E^Total_0 = 0. Set maxiterations, bbiterations, λ, Tol.
for k = 1, 2, ..., maxiterations do
    Barzilai-Borwein (BB) descent for P_k
    E^Total_k = Trace(P_k (P_k)^T) + (λ/2) ||A(P_k (P_k)^T) - b||²
    if |E^Total_k - E^Total_{k-1}| / E^Total_k < Tol then break
end for
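A compact sketch of Algorithm 2 is given below; it minimizes the penalized objective directly by Barzilai-Borwein descent. The function name, default parameters and the (λ/2) scaling are illustrative assumptions rather than the reference implementation.

```python
import numpy as np

def noisy_edg_fit(b, Omega, n, q=10, lam=0.05, bb_iters=300, alpha0=1e-4):
    """Sketch of Algorithm 2: min_P Trace(P P^T) + (lam/2) ||A(P P^T) - b||^2,
    solved by Barzilai-Borwein descent; b holds the noisy squared distances."""
    i, j = Omega[:, 0], Omega[:, 1]

    def A_op(X):
        return X[i, i] + X[j, j] - 2.0 * X[i, j]

    def A_adj(y):
        S = np.zeros((n, n))
        np.add.at(S, (i, i), y); np.add.at(S, (j, j), y)
        np.add.at(S, (i, j), -y); np.add.at(S, (j, i), -y)
        return S

    def grad(P):
        res = A_op(P @ P.T) - b
        return 2.0 * P + 2.0 * lam * A_adj(res) @ P

    rng = np.random.default_rng(0)
    P, P_old, g_old, alpha = rng.random((n, q)), None, None, alpha0
    for _ in range(bb_iters):
        g = grad(P)
        if g_old is not None:
            s, y = (P - P_old).ravel(), (g - g_old).ravel()
            alpha = (s @ s) / (s @ y) if s @ y > 1e-16 else alpha0
        P_old, g_old = P.copy(), g
        P = P - alpha * g
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ (P @ P.T) @ J                 # centered Gram matrix estimate
```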
V. Numerical Results

In this section, we demonstrate the efficiency and accuracy of the proposed algorithms. All the numerical experiments were run in MATLAB on a laptop with an Intel Core i7 processor.

A. Experiments on synthetic and real data

We first test the Euclidean distance geometry problem on different three-dimensional objects: a sphere, a cow, and a map of a subset of US cities. Given n points from these objects, the full n × n distance matrix D has n(n - 1)/2 = L degrees of freedom. The objective is to recover the full point coordinates given γL entries of D, γ ∈ [0, 1]. The algorithms return P, from which X = P P^T is constructed. Classical MDS is then used to find the global coordinates.
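For completeness, a minimal version of this classical MDS step is sketched below; it reads coordinates off the recovered Gram matrix by keeping its top-r eigenpairs (the function name is illustrative).

```python
import numpy as np

def classical_mds(G, r=3):
    """Recover point coordinates (up to a rigid transformation) from a Gram
    matrix G: keep the top-r eigenpairs and set coords = U_r * sqrt(lambda_r)."""
    w, U = np.linalg.eigh((G + G.T) / 2.0)   # eigenvalues returned in ascending order
    idx = np.argsort(w)[::-1][:r]
    lam = np.clip(w[idx], 0.0, None)         # clip small negative eigenvalues
    return U[:, idx] * np.sqrt(lam)
```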
In all of the numerical experiments, a rank estimate of q = 10 is used. This choice shows that the algorithms recover the ground truth despite a rough estimate of the true rank. The stopping criterion is a tolerance on the relative total energy, defined as (E^Total_k - E^Total_{k-1}) / E^Total_k. For Algorithm 1, an additional stopping criterion is a tolerance on E^1_k = (r/2) ||A(P_k (P_k)^T) - b||². In all of the numerical experiments, the tolerance is set to 10^{-}. The maximum number of iterations is set to 100. Accuracy of the reconstruction is measured using the relative error of the inner product matrix, ||X - M||_F / ||M||_F, where X is the numerically computed inner product matrix and M is the ground truth. For different sampling rates, Figure 1 displays the reconstructed three-dimensional objects assuming that exact partial information is provided. For all experiments, the penalty term r is set to 1.0.
Table II shows the relative error of the inner product matrix for all the different cases. The algorithm provides very good reconstruction except for the 1% sphere. For this specific case, the distance matrix of the sphere is comparatively small, which means that a 1% sample provides very little information and more samples are needed for a reasonable reconstruction.
TABLE II. Relative error of the inner product matrix for the different three-dimensional objects (Sphere, Cow, US Cities) under sampling rates of 1%, 2%, 3% and 5%. The partial information is exact and the results are averages over repeated runs; the errors decrease from roughly 10^{-1} for the sphere at 1% to the order of 10^{-5} at a 5% sampling rate.

Fig. 1. Reconstruction results of different three-dimensional objects from exact partial information with sampling rates 1% (1st column), 2% (2nd column), 3% (3rd column) and 5% (4th column), respectively.

For the case of partial information with additive Gaussian noise N(µ, σ²), the noisy distance matrix can be written as D̄ = D + N(µ, σ²). The standard deviation σ is a critical parameter determining the extent to which the underlying information is noisy. In the EDG problem, the perturbed distance matrix D̄ must have non-negative entries, so µ and σ need to be chosen carefully to satisfy this condition. In our numerical experiments, σ is set to the minimum non-zero value of the partial distance matrix and µ is 3σ. It can be easily verified that, with very high probability, these choices ensure a non-negative noisy distance matrix. The choice of σ corresponds to the case where the measurement error is on the order of the minimum distance. While this choice might not reflect practical measurements, it allows us to test the extent to which the algorithm handles noisy data. The parameter λ is a penalty term which needs to be chosen carefully; it can be surmised that λ needs to increase with the sampling rate. For our numerical experiments, a simple heuristic is to set λ = γ. While a more sophisticated analysis might yield an optimal choice of λ, this simple choice is found to be sufficient and effective in the numerical tests. For the different sampling rates, Figure 2 displays the reconstructed three-dimensional objects assuming that noisy partial information is provided.

Fig. 2. Reconstruction results of different three-dimensional objects from noisy partial information with sampling rates 1% (1st column), 2% (2nd column), 3% (3rd column) and 5% (4th column), respectively.
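A small sketch of this noise model, applied directly to the observed squared distances with σ equal to the minimum non-zero observed value, µ = 3σ, and λ tied to the sampling rate γ as in the heuristic above, is given below; the function name and the clipping safeguard are illustrative assumptions.

```python
import numpy as np

def add_gaussian_noise(b, gamma, seed=0):
    """Perturb observed squared distances b by additive Gaussian noise with
    sigma = min non-zero observed value and mu = 3*sigma; also return the
    penalty parameter lam = gamma used by Algorithm 2."""
    rng = np.random.default_rng(seed)
    sigma = b[b > 0].min()
    mu = 3.0 * sigma
    b_noisy = b + rng.normal(mu, sigma, size=b.shape)
    b_noisy = np.maximum(b_noisy, 0.0)       # guard the rare negative draw
    return b_noisy, gamma
```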
Table III shows the relative error of the inner product matrix for all the different cases. The algorithm yields good reconstruction except for the 1% sphere; as discussed earlier, this specific case requires relatively more samples for a reasonable reconstruction.

TABLE III. Relative error of the inner product matrix for the different three-dimensional objects (Sphere, Cow, US Cities) under sampling rates of 1%, 2%, 3% and 5%. The relative error is an average over repeated runs and the partial information is noisy; the errors range from roughly 10^{-1} for the sphere at 1% down to the order of 10^{-3} at a 5% sampling rate.

We further apply the proposed algorithm to the molecular conformation problem [1]. In this problem, the aim is to determine the three-dimensional structure of a molecule given partial information on the pairwise distances between its atoms. An instance of this problem is determining the structure of proteins using nuclear magnetic resonance (NMR) spectroscopy or X-ray diffraction experiments. The problem is challenging since the partial distance matrix obtained from experiments is sparse, non-uniform, noisy and prone to outliers [1], [41]. We test our method on a simple version of the problem to illustrate that the algorithm can also work on real data. Our numerical experiment considers two protein molecules, identified as 1AX8 and 1RGS, obtained from the Protein Data Bank [42]. We use a pre-processed version of the data taken from [41]. Given the full distance matrix, the partial data is a random uniform sample with a 3% sampling rate. The setup of the numerical experiments is the same as before. Figure 3 displays the reconstructed three-dimensional structure of 1AX8 and 1RGS under exact partial information and noisy partial information. The results demonstrate that the algorithm provides good reconstruction of the underlying three-dimensional structures.
Fig. 3. Reconstructed three-dimensional structure of the proteins identified as 1AX8 and 1RGS in the Protein Data Bank. Exact partial information (3% sampling rate) for the first and third panels, noisy partial information for the second and fourth.

B. Computational comparisons of Algorithms

To evaluate the efficiency of the proposed algorithms, numerical tests are carried out using Algorithm 1 and Algorithm 2. For each test, the input is a random uniform sample of the exact or noisy distance matrix of one of the three-dimensional objects discussed earlier (a sphere, a cow and a map of a subset of US cities). The sampling rate is set to 5% and the algorithms are run until the stopping criterion discussed before is met. In what follows, the reported results are averages of 50 runs and the number of iterations is the ceiling of the average number of iterations. Table IV summarizes the results of the computational experiments for the three-dimensional objects. It can be concluded that the algorithms are fast and converge in a few iterations to the desired solution.

We also conduct comparisons with the algorithm for the EDG problem proposed in [7], which also employs the augmented Lagrangian method. As remarked earlier, the main difference between the two algorithms is the way the positive semidefinite condition is imposed on the inner product matrix X. In [7], the positive semidefinite condition is imposed on X directly, while Algorithm 1 uses the factorization P P^T to enforce this condition. To compare the two algorithms, we use the same input data, a random uniform sample of the distance matrix of one of the three-dimensional objects, and the sampling rate is set to 5%. For both algorithms, the stopping criterion is the relative error of the total energy and the tolerance is set to 10^{-}. Table V summarizes the comparison of these two algorithms; the reported results are averages of 10 runs. We see that our algorithm converges to the desired solution faster and with a significantly smaller number of iterations.

Finally, we consider the molecular conformation problem and compare our algorithm to the DISCO algorithm proposed in [41]. The DISCO algorithm is an SDP-based divide-and-conquer algorithm. In [41], the authors demonstrate that the algorithm is effective on sparse and highly noisy molecular conformation problems. For the numerical tests, the input for both algorithms is a protein molecule. The sampling rate is set to 5% and it is assumed that the underlying partial information is exact. We downloaded the DISCO code, a MATLAB mex code
version 1.4, from http://.../~mattohkc/disco.html, which provides an input file and executables. For our algorithm, the stopping criterion is a maximum number of iterations, set to 200. We emphasize that the rank estimate used by our method in all experiments is 5. This means our method has 5/…

TABLE IV. Computational summary of Algorithms 1 and 2 from 50 runs. The data are the different three-dimensional objects and the sampling rate is 5%.

    Data                 n      Time (s)   Iterations
    Sphere (exact)     1002        .27          9
    Cow (exact)        2601       68.08        26
    US Cities (exact)  2920      101.92        34
    Sphere (noisy)     1002       10.56        19
    Cow (noisy)        2601       62.63        17
    US Cities (noisy)  2920       62.33        18

TABLE V. Comparison of the proposed Algorithm 1 and the algorithm in [7] over 10 runs. The data are the different three-dimensional objects; it is assumed that the partial information is exact and the sampling rate is 5%.
    Data          n     Time, Alg. 1 (s)   Time, [7] (s)   Iterations, Alg. 1   Iterations, [7]
    Sphere      1002           .06             101.85                9                 204
    Cow         2601          51.40           1387.80               20                 257
    US Cities   2920          72.51           2417.30               23                 250
TABLE VI. Comparison of the proposed Algorithm 1 and the DISCO algorithm [41] on different protein molecules over repeated runs. It is assumed that the partial information is exact and the sampling rate is 5%. For each molecule, the table reports the runtime and the relative error of the reconstruction for both methods; the relative errors reach the order of 10^{-7} to 10^{-9}.

C. Phase transition of Algorithm 1
Last but not least, we numerically investigate the optimality of the proposed algorithm by plotting its phase transition. Given the number of points and the underlying rank, the theory provides the sampling rate which leads to successful recovery with very high probability (see Theorem 1). To investigate the optimality of Algorithm 1, the following numerical experiment was carried out. Consider sampling rates ranging from 1% to 100% and ranks ranging from 1 to 40. For each pair, Algorithm 1 is run 50 times. Successful recovery refers to the case where the relative error of the inner product matrix is within the tolerance; as remarked earlier, the tolerance is set to 10^{-}. Out of the 50 runs, the number of times the algorithm succeeds provides an empirical probability of success. This procedure is repeated for all combinations of sampling rate and rank. Figure 4 shows the result for Algorithm 1: for a large portion of the (sampling rate, rank) domain, the proposed algorithm provides successful reconstruction.

Fig. 4. Success probability of Algorithm 1 given the sampling rate and rank. For each sampling rate, Algorithm 1 is run for ranks ranging from 1 to 40. The average success probability over 50 runs is shown; white indicates perfect recovery and black indicates failure of the algorithm.
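A sketch of this phase-transition experiment, written in terms of the hypothetical make_edg_data and edg_trace_min helpers from the earlier sketches, is given below; the grid resolution, problem size and the tolerance used to declare success are illustrative choices.

```python
import numpy as np

def phase_transition(n=200, ranks=range(1, 41), gammas=np.linspace(0.01, 1.0, 25),
                     trials=50, tol=1e-2):
    """Empirical success probability over the (sampling rate, rank) grid."""
    ranks = list(ranks)
    prob = np.zeros((len(gammas), len(ranks)))
    for a, gamma in enumerate(gammas):
        for c, r in enumerate(ranks):
            successes = 0
            for t in range(trials):
                P, M, D, Omega, b = make_edg_data(n=n, r=r, gamma=gamma, seed=t)
                X = edg_trace_min(b, Omega, n, q=r + 5)
                successes += np.linalg.norm(X - M) / np.linalg.norm(M) < tol
            prob[a, c] = successes / trials
    return prob   # rows: sampling rates, columns: ranks; plot as a grayscale image
```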
VI. Conclusion
In this paper, we formulate the Euclidean distance geometry (EDG) problem as a low-rank matrix recovery problem. Adopting the matrix completion framework, our approach can be viewed as completing the Gram matrix with respect to a suitable basis given a few uniformly random distance samples. However, the existing analysis based on the restricted isometry property (RIP) does not hold for our problem. Instead, we carry out the analysis by introducing a dual basis approach to formulate the EDG problem. Our main result shows that the underlying configuration of points can be recovered with very high probability from O(nrν log²(n)) measurements if the underlying Gram matrix obeys the coherence condition with parameter ν. Numerical algorithms are designed to solve the EDG problem under two scenarios, exact and noisy partial information. Numerical experiments on various test data demonstrate that the algorithms are simple, fast and accurate. The technique in this paper is not specifically limited to the EDG problem; the extension of this result to the low-rank recovery of a matrix given a few measurements with respect to any non-orthogonal basis is a work in preparation.

VII. Acknowledgment

The authors would like to thank Dr. Jia Li for discussions in the early stage of this project. Abiy Tasissa would like to thank Professor David Gross for correspondence over email regarding the work in [29]. In particular, the proof of Lemma A.7 is a personal communication from Professor David Gross. The authors would also like to thank Professor Peter Kramer and Professor Alex Gittens for their comments and suggestions. The authors' gratitude is further extended to the anonymous reviewers for their valuable feedback, which has improved the manuscript.

References

[1] W. Glunt, T. Hayden, and M. Raydan, “Molecular conformations from distance matrices,”
Journal of Computational Chemistry , vol. 14, no. 1, pp.114–120, 1993.[2] M. W. Trosset,
Applications of multidimensional scaling to molecular conformation . Citeseer, 1997.[3] X. Fang and K.-C. Toh, “Using a distributed sdp approach to solve simulated protein molecular conformation problems,” in
Distance Geometry . Springer,2013, pp. 351–376.[4] J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” science , vol. 290, no. 5500,pp. 2319–2323, 2000.[5] Y. Ding, N. Krislock, J. Qian, and H. Wolkowicz, “Sensor network localization, euclidean distance matrix completions, and graph realization,”
Optimizationand Engineering , vol. 11, no. 1, pp. 45–66, 2010.[6] P. Biswas, T.-C. Lian, T.-C. Wang, and Y. Ye, “Semidefinite programming based algorithms for sensor network localization,”
ACM Transactions onSensor Networks (TOSN) , vol. 2, no. 2, pp. 188–220, 2006.[7] R. Lai and J. Li, “Solving partial di ff erential equations on manifolds from incomplete interpoint distance,” SIAM Journal on Scientific Computing , vol. 39,no. 5, pp. A2231–A2256, 2017. [8] W. S. Torgerson, “Multidimensional scaling: I. theory and method,” Psychometrika , vol. 17, no. 4, pp. 401–419, 1952.[9] J. C. Gower, “Properties of euclidean and non-euclidean distance matrices,”
Linear Algebra and its Applications , vol. 67, pp. 81–97, 1985.[10] E. J. Cand`es and B. Recht, “Exact matrix completion via convex optimization,”
Foundations of Computational mathematics , vol. 9, no. 6, pp. 717–772,2009.[11] N. Moreira, L. Duarte, C. Lavor, and C. Torezzan, “A novel low-rank matrix completion approach to estimate missing entries in euclidean distancematrices,” arXiv preprint arXiv:1711.06182 , 2017.[12] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,”
SIAM review ,vol. 52, no. 3, pp. 471–501, 2010.[13] A. Javanmard and A. Montanari, “Localization from incomplete noisy distance measurements,”
Foundations of Computational Mathematics , vol. 13,no. 3, pp. 297–345, 2013.[14] I. J. Schoenberg, “Remarks to maurice frechet’s article“sur la definition axiomatique d’une classe d’espace distances vectoriellement applicable surl’espace de hilbert,”
Annals of Mathematics , pp. 724–732, 1935.[15] G. Young and A. S. Householder, “Discussion of a set of points in terms of their mutual distances,”
Psychometrika , vol. 3, no. 1, pp. 19–22, 1938.[16] B. Hendrickson, “Conditions for unique graph realizations,”
SIAM journal on computing , vol. 21, no. 1, pp. 65–84, 1992.[17] M. Bakonyi and C. R. Johnson, “The euclidian distance matrix completion problem,”
SIAM Journal on Matrix Analysis and Applications , vol. 16, no. 2,pp. 646–654, 1995.[18] C. R. Johnson and P. Tarazaga, “Connections between the real positive semidefinite and distance matrix completion problems,”
Linear Algebra and itsApplications , vol. 223, pp. 375–391, 1995.[19] M. Laurent, “A connection between positive semidefinite and euclidean distance matrix completion problems,”
Linear Algebra and its applications , vol.273, no. 1-3, pp. 9–22, 1998.[20] A. Y. Alfakih, “On the uniqueness of euclidean distance matrix completions,”
Linear Algebra and its Applications , vol. 370, pp. 1–14, 2003.[21] A. Y. Alfakih, A. Khandani, and H. Wolkowicz, “Solving euclidean distance matrix completion problems via semidefinite programming,”
Computationaloptimization and applications , vol. 12, no. 1-3, pp. 13–30, 1999.[22] H.-r. Fang and D. P. O’Leary, “Euclidean distance matrix completion problems,”
Optimization Methods and Software , vol. 27, no. 4-5, pp. 695–717,2012.[23] B. Hendrickson, “The molecule problem: Exploiting structure in global optimization,”
SIAM Journal on Optimization , vol. 5, no. 4, pp. 835–857, 1995.[24] J. J. Mor´e and Z. Wu, “Global continuation for distance geometry problems,”
SIAM Journal on Optimization , vol. 7, no. 3, pp. 814–836, 1997.[25] M. W. Trosset, “Distance matrix completion by numerical optimization,”
Computational Optimization and Applications , vol. 17, no. 1, pp. 11–22, 2000.[26] Z. Zou, R. H. Bird, and R. B. Schnabel, “A stochastic / perturbation global optimization algorithm for distance geometry problems,” Journal of GlobalOptimization , vol. 11, no. 1, pp. 91–105, 1997.[27] I. Dokmanic, R. Parhizkar, J. Ranieri, and M. Vetterli, “Euclidean distance matrices: Essential theory, algorithms, and applications,”
IEEE SignalProcessing Magazine , vol. 32, no. 6, pp. 12–30, 2015.[28] L. Liberti, C. Lavor, N. Maculan, and A. Mucherino, “Euclidean distance geometry and applications,”
Siam Review , vol. 56, no. 1, pp. 3–69, 2014.[29] D. Gross, “Recovering low-rank matrices from few coe ffi cients in any basis,” Information Theory, IEEE Transactions on , vol. 57, no. 3, pp. 1548–1566,2011.[30] E. J. Cand`es and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,”
IEEE Transactions on Information Theory , vol. 56, no. 5,pp. 2053–2080, 2010.[31] B. Recht, “A simpler approach to matrix completion,”
The Journal of Machine Learning Research , vol. 12, pp. 3413–3430, 2011.[32] J.-F. Cai, E. J. Cand`es, and Z. Shen, “A singular value thresholding algorithm for matrix completion,”
SIAM Journal on Optimization , vol. 20, no. 4,pp. 1956–1982, 2010.[33] A. Carmi, L. Mihaylova, and S. Godsill,
Compressed Sensing & Sparse Filtering , ser. Signals and Communication Technology. Springer BerlinHeidelberg, 2013. [Online]. Available: https: // books.google.com / books?id = EAO4BAAAQBAJ[34] W. Hoe ff ding, “Probability inequalities for sums of bounded random variables,” Journal of the American statistical association , vol. 58, no. 301, pp.13–30, 1963.[35] R. Ahlswede and A. Winter, “Strong converse for identification via quantum channels,”
IEEE Transactions on Information Theory , vol. 48, no. 3, pp.569–579, 2002.[36] D. Gross and V. Nesme, “Note on sampling without replacing from a finite collection of matrices,” arXiv preprint arXiv:1001.2738 , 2010.[37] R. Bhatia,
Matrix Analysis , ser. Graduate Texts in Mathematics. Springer New York, 2013. [Online]. Available: https: // books.google.com / books?id = lh4BCAAAQBAJ[38] J. A. Tropp, “User-friendly tail bounds for sums of random matrices,” Foundations of computational mathematics , vol. 12, no. 4, pp. 389–434, 2012.[39] E. J. Candes and Y. Plan, “Matrix completion with noise,”
Proceedings of the IEEE , vol. 98, no. 6, pp. 925–936, 2010.[40] J. Barzilai and J. M. Borwein, “Two-point step size gradient methods,”
IMA Journal of Numerical Analysis , vol. 8, no. 1, pp. 141–148, 1988.[41] N.-H. Z. Leung and K.-C. Toh, “An sdp-based divide-and-conquer algorithm for large-scale noisy anchor-free graph realization,”
SIAM Journal onScientific Computing , vol. 31, no. 6, pp. 4351–4372, 2009.[42] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, “The protein data bank,”
Nucleic acidsresearch , vol. 28, no. 1, pp. 235–242, 2000.[43] R. Horn and C. Johnson,
Matrix Analysis . Cambridge University Press, 1990. [Online]. Available: https: // books.google.com / books?id = PlYQN0ypTwEC A ppendix Lemma A.1. If X ∈ S , Sgn X ∈ S . If X ∈ T , Sgn X ∈ T Proof.
Using the eigenvalue decomposition of X , X = U Σ U T . Sgn X is simply Sgn X = U (Sgn Σ ) U T = U DU T where D is the diagonal matrix resulting from applying the sign function to Σ . To show that Sgn X ∈ S , we need to verify Sgn X = (Sgn X ) T and Sgn X · = . Symmetry of Sgn X is apparent from its definition, (Sgn X ) T = U D T U T = U DU T = Sgn X .To show that Sgn X · = , consider X · = using the spectral decomposition of X . (cid:88) i λ i u i u Ti · = −→ u Tj (cid:88) i λ i u i u Ti · = −→ ( λ j u Tj ) · = The implication is that λ j = u Tj · = . With this, consider the spectral decomposition of the symmetric matrix Sgn X .Sgn X = (cid:88) j sgn ( λ j ) u j u Tj Sgn X · = follows in the following way. If λ j =
0, sgn ( λ j ) =
0. Otherwise, from above, u Tj · = . It can now beconcluded that Sgn X ∈ S . Next, we show that for X ∈ T , Sgn X ∈ T . Using the eigenvalue decomposition of Sgn X ,consider P T ⊥ Sgn X . P T ⊥ Sgn X = Sgn X − P T Sgn X = U DU T − [ P U Sgn X + Sgn X P U − P U Sgn X P U ] = U DU T − [ U U T U DU T + U DU T U U T − U U T U DU T U U T ] = where the last step simply follows from the fact that U T U = I . This confirms that sgn X ∈ T and concludes the proof. (cid:3) In the next two lemmas, we state and prove some facts about H and H − . We start with the latter first deriving an explicitform of H − . This will be the focus of Lemma A.3 to follow shortly. In the proof of Lemma A.3, a certain form of the basis v α is used. The form is conjectured by inspection of the basis v α for the case when n is small. We start by stating and provingthis form. Lemma A.2.
Given an index ( i , j ) with ≤ i < j ≤ n, the matrix v i , j has the following form. v i , j = ˜ w i , j + p i , j + q i , j The matrices ˜ w i j , p i , j and q i j are respectively defined as follows. ˜ w i , j = n − n e i , i + n − n e j , j − ( n − + n e i , j − ( n − + n e j , i where e α , α is a matrix of zeros except a at the location ( α , α ) . p i j has the following form. p i , j = (cid:88) t , t (cid:44) i , t (cid:44) j n − n e i , t + (cid:88) s , s (cid:44) i , s (cid:44) j n − n e s , j q i j is defined as follows. q i , j = (cid:88) ( t , s ) , ( t , s ) (cid:44) ( i , j ) − n e s , t where ( t , s ) (cid:44) ( i , j ) is defined as { t , s } ∩ { i , j } = ∅ .Proof. To make the proposed form concrete, before proceeding with the proof, consider the following example. a) Example : The form of v , : Consider the case where n =
5. Using the proposed form, the matrix v , can be writtenas follows v , = ˜ w , + p , + q , where ˜ w , = − − p , = . . .
50 0 1 . . . . . . . . . and q , = − − −
10 0 − − −
10 0 − − − Therefore, v , has the following explicit form v , = − . . . − . . . . . − − − . . − − − . . − − − As a first step simple check, consider (cid:104) v , , w , (cid:105) which results (8 + = v i , j is dual to w i , j . Since the dual basis is unique, establishing duality willconclude the proof. The result will be shown considering di ff erent cases.Case 1: (cid:104) v i , j , w i , j (cid:105) (cid:104) v i , j , w i , j (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , w i , j (cid:105) = (cid:104) ˜ w i , j , w i , j (cid:105) + (cid:104) p i , j , w i , j (cid:105) + (cid:104) q i , j , w i , j (cid:105) = n − n + n − + n + + = (cid:104) p i , j , w i , j (cid:105) =
0. and (cid:104) q i , j , w i , j (cid:105) = (cid:104) v i , j , w α, β (cid:105) for ( i , j ) (cid:44) ( α, β ) (cid:104) v i , j , w α, β (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , w α, β (cid:105) = (cid:104) ˜ w i , j , w α, β (cid:105) + (cid:104) p i , j , w α, β (cid:105) + (cid:104) q i , j , w α, β (cid:105) = + + = (cid:104) v i , j , w α, β (cid:105) for { i , j } ∩ { α, β } (cid:44) ∅ A. i = α and j (cid:44) β (cid:104) v i , j , w α, β (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , w α, β (cid:105) = (cid:104) ˜ w i , j , w α, β (cid:105) + (cid:104) p i , j , w α, β (cid:105) + (cid:104) q i , j , w α, β (cid:105) = n − n − n − n − n = i (cid:44) α and j = β (cid:104) v i , j , w α, β (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , w α, β (cid:105) = (cid:104) ˜ w i , j , w α, β (cid:105) + (cid:104) p i , j , w α, β (cid:105) + (cid:104) q i , j , w α, β (cid:105) = n − n − n − n − n = i = β and j (cid:44) α (cid:104) v i , j , w α, β (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , w α, β (cid:105) = (cid:104) ˜ w i , j , w α, β (cid:105) + (cid:104) p i , j , w α, β (cid:105) + (cid:104) q i , j , w α, β (cid:105) = n − n − n − n − n = i (cid:44) β and j = α (cid:104) v i , j , w α, β (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , w α, β (cid:105) = (cid:104) ˜ w i , j , w α, β (cid:105) + (cid:104) p i , j , w α, β (cid:105) + (cid:104) q i , j , w α, β (cid:105) = n − n − n − n − n = { v i , j } is dual to { w i , j } and the proposed form is established. (cid:3) The next Lemma uses the result of the above Lemma to derive an explicit form of the matrix H − . Lemma A.3.
The inverse of the matrix H , H − , has the following explicit form H α , β = (cid:104) v α , v β (cid:105) = ( n − + n if α = β & α = β n if α (cid:44) β & α (cid:44) β & α (cid:44) β & α (cid:44) β − n n otherwiseProof. We consider three di ff erent cases.Case 1: (cid:104) v i , j , v i , j (cid:105)(cid:104) v i , j , v i , j (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , ˜ w i , j + p i , j + q i , j (cid:105) = (cid:104) ˜ w i , j , ˜ w i , j (cid:105) + (cid:104) p i , j , p i , j (cid:105) + (cid:104) q i , j , q i , j (cid:105) = n − n + (cid:20) ( n − + n (cid:21) + (cid:32) n − n (cid:33) n − + n ( n − = n − n + n + (cid:32) n − n + n − n + n (cid:33) + n − n + n − n + n − n + n = − n + n = ( n − + n The second equality uses the fact that (cid:104) ˜ w i , j , p i , j (cid:105) = (cid:104) p i , j , q i , j (cid:105) = w i , j , q i , j =
0. The third inequality simply uses thedefinition of ˜ w i , j , p i , j and q i , j . The second to last result follows after some algebraic calculations.Case 2: (cid:104) v i , j , v α, β (cid:105) for ( i , j ) (cid:44) ( α, β ), { i , j } ∩ { α, β } = ∅ (cid:104) v i , j , v α, β (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , ˜ w α, β + p α, β + q α, β = (cid:104) ˜ w i , j , ˜ w α, β (cid:105) + (cid:104) ˜ w i , j , p α, β (cid:105) + (cid:104) ˜ w i , j , q α, β (cid:105) + (cid:104) p i , j , ˜ w α, β (cid:105) + (cid:104) p i , j , p α, β (cid:105) + (cid:104) p i , j , q α, β (cid:105) + (cid:104) q i , j , ˜ w α, β (cid:105) + (cid:104) q i , j , p α, β (cid:105) + (cid:104) q i , j , q α, β (cid:105) Each term will be evaluated separately.1. (cid:104) ˜ w i , j , ˜ w α, β (cid:105) = i , j ) (cid:44) ( α, β ), ˜ w i , j and ˜ w α, β have disjoint supports.2. (cid:104) ˜ w i , j , p α, β (cid:105) = i , j ) (cid:44) ( α, β ), ˜ w i , j and p α, β have disjoint supports.3. Consider (cid:104) ˜ w i , j , q α, β (cid:105) . (cid:104) ˜ w i , j , q α, β (cid:105) = [ ˜ w i , j ] i , i [ q α, β ] i , i + [ ˜ w i , j ] j , j [ q α, β ] j , j + [ ˜ w i , j ] i , j [ q α, β ] i , j + [ ˜ w i , j ] j , i [ q α, β ] j , i = − n − n + ( n − + n (cid:104) p i , j , ˜ w α, β (cid:105) = (cid:104) p i , j , p α, β (cid:105) . (cid:104) p i , j , p α, β (cid:105) = (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] i , s [ p α, β ] i , s + (cid:88) t , t (cid:44) i , t (cid:44) j [ p i , j ] j , t [ p α, β ] j , t + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] s , i [ p α, β ] s , i + (cid:88) t , t (cid:44) i , t (cid:44) j [ p i , j ] t , j [ p α, β ] t , j = [ p i , j ] i , α [ p α, β ] i , α + [ p i , j ] i , β [ p α, β ] i , β + [ p i , j ] j , α [ p α, β ] j , α + [ p i , j ] j , β [ p α, β ] j , β + [ p i , j ] α, i [ p α, β ] α, i + [ p i , j ] β, i [ p α, β ] β, i + [ p i , j ] α, j [ p α, β ] α, j + [ p i , j ] β, j [ p α, β ] β, j = (cid:32) n − n (cid:33) = n − n
6. Consider (cid:104) p i , j , q α, β (cid:105) . (cid:104) p i , j , q α, β (cid:105) = (cid:88) t , t (cid:44) i , t (cid:44) j [ p i , j ] i , t [ q α, β ] i , t + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] j , s [ q α, β ] j , s + (cid:88) t , t (cid:44) i , t (cid:44) j [ p i , j ] t , i [ q α, β ] t , i + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] s , j [ q α, β ] s , j = (4 n −
16) 2 n − n (cid:32) − n (cid:33) = − n − n − n The first equality follows since it su ffi ces to consider (cid:104) p i , j , q α, β (cid:105) on the support of p i , j . The second and last equality result fromthe following analysis. Restricted to the support of p i , j , the matrix q α, β is non-zero except at the entries [ q α, β ] i , α , [ q α, β ] i , β , [ q α, β ] α, i ,[ q α, β ] β, i , [ q α, β ] j , α , [ q α, β ] j , β , [ q α, β ] α, j and [ q α, β ] β, j . Using this and the fact that p i , j has 4( n −
2) entries, | supp( p i , j ) ∩ supp( q α, β ) | = n − − = n −
16. With this, the final form above follows.7. (cid:104) q i , j , ˜ w α, β (cid:105) = − n − n + ( n − + n follows by a similar argument as 3.8. (cid:104) q i , j , p α, β (cid:105) = − n − n − n follows by a similar argument as 6.9. Consider (cid:104) q i , j , q α, β (cid:105) . Since ( i , j ) (cid:44) ( α, β ), both q i , j and q α, β have non-zero entry at ( s , t ) if and only if s (cid:44) i , s (cid:44) j , s (cid:44) α, s (cid:44) β , t (cid:44) i , t (cid:44) j , t (cid:44) α and t (cid:44) β . Therefore, | supp( q i , j ) ∩ supp( q α, β ) | = ( n − n − = ( n − given the n − s and n − t . (cid:104) q i , j , q α, β (cid:105) can now be written as follows (cid:104) q i , j , q α, β (cid:105) = (cid:88) ( s , t ) ∈ supp( q i , j ) ∩ supp( q α,β ) [ q i , j ] s , t [ q α, β ] s , t = ( n − (cid:32) − n (cid:33) (cid:32) − n (cid:33) = ( n − n Therefore, (cid:104) v i , j , v α, β (cid:105) is the sum of the above terms. (cid:104) v i , j , v α, β (cid:105) = − n − n + n − + n + n − n − n − n − n + ( n − n = n The last equality follows after some algebraic manipulations. Case 3: (cid:104) v i , j , v α, β (cid:105) for { i , j } ∩ { α, β } (cid:44) ∅ A. i = α and j (cid:44) β . (cid:104) v i , j , v α, β (cid:105) = (cid:104) ˜ w i , j + p i , j + q i , j , ˜ w α, β + p α, β + q α, β (cid:105) = (cid:104) ˜ w i , j , ˜ w α, β (cid:105) + (cid:104) ˜ w i , j , p α, β (cid:105) + (cid:104) ˜ w i , j , q α, β (cid:105) + (cid:104) p i , j , ˜ w α, β (cid:105) + (cid:104) p i , j , p α, β (cid:105) + (cid:104) p i , j , q α, β (cid:105) + (cid:104) q i , j , ˜ w α, β (cid:105) + (cid:104) q i , j , p α, β (cid:105) + (cid:104) q i , j , q α, β (cid:105) Each term will be evaluated separately.1. (cid:104) ˜ w i , j , ˜ w α, β (cid:105) = [ ˜ w i , j ] i , i [ ˜ w α, β ] i , i = ( n − n .2. Consider (cid:104) ˜ w i , j , p α, β (cid:105) . (cid:104) ˜ w i , j , p α, β (cid:105) = [ ˜ w i , j ] i , j [ p i , j ] i , j + [ ˜ w i , j ] j , i [ p i , j ] j , i = − (cid:32) n − + n (cid:33) (cid:32) n − n (cid:33) = − ( n − n − + n (cid:104) ˜ w i , j , q α, β (cid:105) = [ ˜ w i , j ] j , j [ q α, β ] j , j = n − n (cid:32) − n (cid:33) = − n − n .4. (cid:104) p i , j , ˜ w α, β (cid:105) = − ( n − n − + n follows by a similar argument as 2.5. Consider (cid:104) p i , j , p α, β (cid:105) . (cid:104) p i , j , p α, β (cid:105) = (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] i , s [ p α, β ] i , s + (cid:88) t , t (cid:44) i , t (cid:44) j [ p i , j ] j , t [ p α, β ] j , t + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] s , i [ p α, β ] s , i + (cid:88) t , t (cid:44) i , t (cid:44) j [ p i , j ] t , j [ p α, β ] t , j = (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] i , s [ p α, β ] i , s + [ p i , j ] j , β [ p α, β ] j , β + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] s , i [ p α, β ] s , i + [ p i , j ] t , β [ p α, β ] t , β = ( n − (cid:32) n − n (cid:33) + n − n + ( n − (cid:32) n − n (cid:33) + n − n = n − n − (cid:32) n − n (cid:33) =
12 ( n − n The second line follows since [ p i , β ] j , t = t except t = β and t = i and [ p i , β ] t , j = t except t = β and t = i . Thethird line uses the fact that [ p i , β ] i , s (cid:44) t except t = i and t = β and [ p i , β ] s , i (cid:44) t except t = i and t = β . Thefinal equality results after some algebraic manipulations.6. Consider (cid:104) p i , j , q α, β (cid:105) . (cid:104) p i , j , q α, β (cid:105) = (cid:88) t , t (cid:44) i , t (cid:44) j [ p i , j ] i , t [ q α, β ] i , t + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] j , s [ q α, β ] j , s + (cid:88) t , t (cid:44) i , t (cid:44) j [ p i , j ] t , i [ q α, β ] t , i + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] s , j [ q α, β ] s , j = + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] j , s [ q α, β ] j , s + + (cid:88) s , s (cid:44) i , s (cid:44) j [ p i , j ] s , j [ q α, β ] s , j = n −
3) 2 n − n (cid:32) − n (cid:33) = − ( n − n − n The first equality follow since it su ffi ces to consider (cid:104) p i , j , q α, β (cid:105) on the support of p i , j . The second equality results since[ q i , β ] i , t = [ q i , β ] t , i = t . The third and last equality result from the following analysis. Restricted to the support of p i , j ,the matrix q i , β is zero except at the entries [ q i , β ] j , s and [ q i , β ] s , j for all j (cid:44) β . With this, | supp( p i , j ) ∩ supp( q α, β ) | = n − − = n −
3) and the final form follows.7. (cid:104) q i , j , ˜ w α, β (cid:105) = − n − n follows by a similar argument as 3.8. (cid:104) q i , j , p α, β (cid:105) = − ( n − n − n follows by a similar argument as 6.
9. Consider (cid:104) q i , j , q α, β (cid:105) . Since i = α and j (cid:44) ( α, β ), both q i , j and q α, β have a non-zero entry at ( s , t ) if and only if s (cid:44) i , s (cid:44) α, s (cid:44) β , t (cid:44) i , t (cid:44) j , and t (cid:44) β . Therefore, | supp( q i , j ) ∩ supp( q α, β ) | = ( n − n − = ( n − given the n − s and n − t . (cid:104) q i , j , q α, β (cid:105) can now be written as follows (cid:104) q i , j , q α, β (cid:105) = (cid:88) ( s , t ) ∈ supp( q i , j ) ∩ supp( q α,β ) [ q i , j ] s , t [ q α, β ] s , t = ( n − (cid:32) − n (cid:33) (cid:32) − n (cid:33) = ( n − n Therefore, (cid:104) v i , j , v α, β (cid:105) is the sum of the above terms. (cid:104) v i , j , v α, β (cid:105) = ( n − n + − ( n − n − + n − n − n +
12 ( n − n − n − n − n + ( n − n = − n n The last equality follows after some algebraic manipulations.Now the following three cases remain: i (cid:44) α and j = β , i = β and j (cid:44) α and i (cid:44) β and j = α . However, since v i , j is symmetricthe above argument can be adapted to these three cases by interchanging indices as appropriate. This concludes the proof. (cid:3) Remark 5.
A short proof of the form of H − might be plausible. The main technical challenge has been the locations of and ’s in H which does not lend itself to simple analysis. Next, in Lemma A.4, we state and prove the spectral properties of the matrix H . Lemma A.4.
The matrix H ∈ R L × L is symmetric and positive definite. The minimum eigenvalue of H is at least and themaximum eigenvalue is n. The absolute sum of each row of H − is given by − n + n .Proof. The symmetry of H is trivial as H α , β = (cid:104) w α , w β (cid:105) = (cid:104) w β , w α (cid:105) = H β , α . H is positive definite since H = W T W and W has linearly independent columns. With α = ( α , α ) and β = ( β , β ), H is defined as follows H α , β = (cid:104) w α , w β (cid:105) = α = β & α = β α (cid:44) β & α (cid:44) β & α (cid:44) β & α (cid:44) β H , the number of zeros in any given row is given by ( n − n − L − − = n ( n − − ( n − n − − = n − H , note that is an eigenvector of H with eigenvalue 2 n . From Gerschgorin theorem,the upper bound for an eigenvalue is simply 4 + (2 n − = n . It follows that the maximum eigenvalue of H is 2 n . Using theform of H − from Lemma A.3, consider the absolute sum of any row of H − , (cid:80) i | H − i j | . (cid:88) i | H − i , j | = n (cid:20) ( n − + (cid:21) + n (cid:20) ( n − n − (cid:21) + n (2 n − (cid:12)(cid:12)(cid:12) − n (cid:12)(cid:12)(cid:12) = − n + n Finally, we make use of a variant of Gerschgorin’s theorem to bound the maximum eigenvalue of H − . If P is invertible, P − H − P and H − have the same eigenvalues. For simplicity, let P = diag ( d , d , ..., d L ). In [43, p. 347], this fact wasused to state the following variant of Gerschgorin’s theorem. Below, we restate this result, albeit minor changes, for ease ofreference. Corollary 2 ([43, p. 347]) . Let A = [ a i j ] ∈ R L × L and let d , d , ..., d L be positive real numbers. Then all eigenvalues of A liein the region L (cid:91) i = (cid:26) z ∈ R : | z − a ii | ≤ d i L (cid:88) j = , j (cid:44) i d j | a i j | (cid:27) Using the corollary, set d = d = d = ... = d L =
3. Apply the corollary on the matrix H − . After minor calculation, wehave λ max ( H − ) ≤
1. We remark here that with suitable choice of ( d , d , ..., d L ), the bound can be tightened to λ max ( H − ) = but the current bound is su ffi cient for our analysis. (cid:3) Assume that the underlying inner product matrix M has coherence ν with respect to the standard basis. Let e i ∈ R n be thestandard vector, a vector of zeros except a 1 in the i th position. For all i , 1 ≤ i ≤ n , the coherence definition [10] states that ||P U e i || ≤ ν rn (46)It su ffi ces to consider the condition on U since M is symmetric. Given this, could one derive coherence conditions for theEDG problem? The answer is a ffi rmative and is given in Lemma A.5 below. Lemma A.5.
If the underlying inner product matrix M has coherence ν with respect to the standard basis, i.e. M satisfies (46) , then the following coherence conditions hold for the EDG problem. ||P T w α || F ≤ ν rn ; ||P T v α || F ≤ ν rnProof. Using the definition of P T and the fact that w α is symmetric for any α , we have ||P T w α || F = (cid:104) w α , U U T w α (cid:105) + (cid:104) w α , w α U U T (cid:105) − (cid:104) w α , U U T w α U U T (cid:105) = (cid:104) w α , U U T w α (cid:105) − (cid:104) U U T w α U U T , U U T w α U U T (cid:105) ≤ (cid:104) w α , U U T w α (cid:105) Note that (cid:104) w α , U U T w α (cid:105) = (cid:104) U U T w α , U U T w α (cid:105) ≥
0. Using the definition of (cid:104) X , w α (cid:105) and the fact that w α = w α forany α , (cid:104) w α , U U T w α (cid:105) = U U T ) α , α + ( U U T ) α , α − U U T ) α , α ] ≤ U U T ) α , α + ( U U T ) α , α + | ( U U T ) α , α | ] ≤ U U T ) α , α + ( U U T ) α , α ]. The last inequality holds since U U T is positive semidefinite. This motivates a bound onmax i j | U U T | i , j . | U U T | i , j = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) r (cid:88) k = U i , k U j , k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ r (cid:88) k = | U i , k | | U j , k | ≤ (cid:118)(cid:116) r (cid:88) k = U i , k (cid:118)(cid:116) r (cid:88) k = U j , k ≤ (cid:114) ν rn (cid:114) ν rn = ν rn Using the above bound, (cid:104) w α , U U T w α (cid:105) ≤ ν rn resulting the following bound for ||P T w α || F . ||P T w α || F ≤ ν rn To bound ||P T v α || F , note that ||P T v α || F ≤ (cid:80) β ∈ I || H α , β P T w β || F = (cid:80) β ∈ I | H α , β | ||P T w β || F . Using Lemma A.4 and the boundfor ||P T w α || F , ||P T v α || F ≤ ν rn (cid:3) In addition, the standard matrix completion analysis assumes that max i j | ( U U T ) i , j | ≤ µ √ rn for some constant µ [10]. If thisholds, for some α , it follows that (cid:104) U U T , w α (cid:105) = ( U U T ) α , α + ( U U T ) α , α − U U T ) α , α ≤ ( U U T ) α , α + ( U U T ) α , α + | ( U U T ) α , α |≤ U U T ) α , α + ( U U T ) α , α ] ≤ µ √ rn With this, it can be concluded that max α ∈ I (cid:104) w α , U U T (cid:105) ≤ µ rn (47)Another short calculation results a bound on (cid:104) v α , U U T (cid:105) (cid:104) v α , U U T (cid:105) = (cid:104) (cid:88) β ∈ I H α , β w β , U U T (cid:105) ≤ max β ∈ I (cid:104) w β , U U T (cid:105) (cid:88) β ∈ I | H α , β | ≤ µ rn The last inequality uses (47) and Lemma A.4. From the above inequality, it follows thatmax α ∈ I (cid:104) v α , U U T (cid:105) ≤ µ rn Lemma A.5 and the discussion above show that the coherence conditions with respect to standard basis lead to comparableEDG coherence conditions. Specifically, we obtain conditions equivalent up to constants to (15), (16) and (17). We remark herethat the condition (13) does not simply follow from the coherence conditions with respect to the standard basis. We speculatethat the equivalence is possible under certain assumptions but a rigorous analysis is left for future work. Lemma A.6.
Given any X ∈ S , the following estimates hold. n || X || F ≤ (cid:88) α ∈ I (cid:104) X , v α (cid:105) ≤ || X || F ; || X || F ≤ (cid:88) α ∈ I (cid:104) X , w α (cid:105) ≤ n || X || F Proof.
Vectorize the matrix X and each dual basis v α . It follows that (cid:88) α ∈ I (cid:104) X , v α (cid:105) = (cid:88) α ∈ I x T v α v T α x = x T V V T x Orthogonalize V , V = V ( √ H − ) − . With this, (cid:80) β ∈ I (cid:104) X , v β (cid:105) = x T V H − V T x . It follows that λ min ( H − ) || x || = n || x || ≤ (cid:88) β ∈ I (cid:104) X , v β (cid:105) ≤ λ max ( H − ) || x || ≤ || x || where the above result follows from the min-max theorem and Lemma A.4. Proceeding analogously as above, noting that λ max ( H ) = n and λ min ( H ) ≥ || X || F ≤ (cid:88) α ∈ I (cid:104) X , w α (cid:105) ≤ n || X || F This concludes the proof. (cid:3)
Lemma A.7.
Let c α ≥ . Then, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) α c α ( P T ⊥ w α ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) α c α w α (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Proof.
Using the definition of P T ⊥ w α , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) α c α ( P T ⊥ w α ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) can be written as follows. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) α c α ( P T ⊥ w α ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) α c α P U ⊥ w α P U ⊥ w α P U ⊥ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) P U ⊥ (cid:88) α c α w α P U ⊥ w α P U ⊥ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Using the fact that the operator norm is unitarily invariant and ||P X P|| ≤ || X || for any X and a projection P , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) α c α ( P T ⊥ w α ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) can be upper bounded as follows (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) α c α ( P T ⊥ w α ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) α c α w α P U ⊥ w α (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) α c α w α ( w α − P U w α ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) α [ c α w α − c α w α P U w α ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) where the first equality follows from the relation P U ⊥ = I − P U . Since w α = w T α and c α ≥ (cid:80) α c α w α is positivesemidefinite. Using the relation P U = P U and the assumption that c α ≥ (cid:80) α c α w α P U w α = (cid:80) α c α w α P U P U w α isalso positive semidefinite. Repeating the same argument, (cid:80) α c α w α P U ⊥ w α is also positive semidefinite. Finally, the norminequality, || A + B || ≥ max( || A || , || B || ), for positive semidefinite matrices A and B , concludes the proof. (cid:3) Lemma A.8.
Define η ( X ) = max β ∈ I |(cid:104) X , v β (cid:105)| . For a fixed X ∈ T , the following estimate holds.Pr (max β ∈ I |(cid:104)P T R ∗ j X − X , v β (cid:105)| ≥ t ) ≤ n exp − t κ j η ( X ) (cid:16) ν + nr (cid:17) (48) for all t ≤ η ( X ) with κ j = m j nr .Proof. For some v β , expand (cid:104)P T R ∗ j X − X , v β (cid:105) in the following way: (cid:104)P T R ∗ j X − X , v β (cid:105) = (cid:104) (cid:88) α ∈ Ω j Lm j (cid:104) X , v α (cid:105)P T w α − X , v β (cid:105) = (cid:88) α ∈ Ω j (cid:32) Lm j (cid:104) X , v α (cid:105)(cid:104)P T w α , v β (cid:105) − m j (cid:104) X , v β (cid:105) (cid:33) Note that the summand can be written as Y α = X α − E [ X α ]. By construction, E [ Y α ] =
0. To apply Bernstein inequality, itremains to compute a suitable bound for | Y α | and | E [ Y α ] | . | Y α | is bounded as follows. | Y α | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Lm j (cid:104) X , v α (cid:105)(cid:104)P T w α , v β (cid:105) − m j (cid:104) X , v β (cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ Lm j η ( X ) 4 ν rn + m j η ( X ) < m j η ( X ) (1 + nr ν ) Above, the first inequality follows from the coherence estimates (15) and (16). Next, we consider a bound | E [ Y α ] | . Since E [ Y α ] = E [ X α ] − ( E [ X α ]) , E [ Y α ] can be upper bounded as follows E [ Y α ] ≤ E (cid:20) L m j (cid:104) X , v α (cid:105) (cid:104)P T w α , v β (cid:105) (cid:21) + m j (cid:104) X , v β (cid:105) = Lm j (cid:88) α ∈ I (cid:104) X , v α (cid:105) (cid:104)P T w α , v β (cid:105) + m j (cid:104) X , v β (cid:105) ≤ η ( X ) n m j (cid:88) α ∈ I (cid:104)P T w α , v β (cid:105) + m j η ( X ) ≤ η ( X ) nr ν m j + m j η ( X ) = η ( X ) m j (1 + nr ν )Above, the last inequality follows from the coherence estimate (13). Finally, apply the scalar Bernstein inequality with R = η ( X ) (1 + nr ν ) m j and σ = η ( X ) (1 + nr ν ) m j . For t ≤ σ R > η ( X ),Pr( |(cid:104)P T R ∗ j X − X , v β (cid:105)| ≥ t ) ≤ exp − t κ j η ( X ) (cid:16) ν + nr (cid:17) (49)The proof of Lemma A.8 concludes by simply applying the union bound.(49)The proof of Lemma A.8 concludes by simply applying the union bound.