On characteristic rank for matrix and tensor completion
Alexander Shapiro, Yao Xie, and Rui Zhang

November 12, 2020
In this lecture note, we discuss a fundamental concept, referred to as the characteristic rank, which suggests a general framework for characterizing the basic properties of various low-dimensional models used in signal processing. Below, we illustrate this framework using two examples, the matrix and three-way tensor completion problems, and consider basic properties such as the identifiability of a matrix or tensor given partial observations. In this note, we consider cases without observation noise in order to illustrate the principle.
Characteristic rank provides a fundamental tool for determining the "order" of low-rank structures, such as the rank of low-rank matrices and the rank of three-way tensors. The concept of characteristic rank was introduced in [6], where it was used to establish necessary and sufficient conditions for the "recoverability" of low-rank matrices. The characteristic rank can also be applied more generally to determine the "intrinsic" degrees of freedom in other low-rank manifold structures. Such instances include determining the number of hidden nodes in one-layer neural networks and determining the number of sources in blind demixing problems, as shown in [7].

Prerequisite
To better comprehend the concepts discussed in this lecture-notes article, readers are expected to have a good background in linear algebra, multivariate calculus, and basic concepts of measure theory (which we will explain whenever we run into them). Suggested references are [4] and [3]. Below, we review some basic concepts necessary for the notes.
Manifold of low-rank matrices.
Consider the set of n_1 × n_2 matrices of rank r, denoted M_r. Note that the rank is no larger than the dimensions of the matrix: r ≤ min{n_1, n_2}. It is known that such a set of rank-r matrices M_r forms a smooth manifold in the space R^{n_1 × n_2}, and the dimension of the manifold is given by

dim(M_r) = r(n_1 + n_2 − r).    (1)

A matrix A ∈ M_r can be represented in the form A = VW^T, where V and W are matrices of the respective order n_1 × r and n_2 × r, both of full column rank r. Thus, we can view (V, W) as a parametrization of M_r. Note that the number of involved parameters is r(n_1 + n_2), which is larger than the dimension of M_r; this is because V and W in the above representation are not unique.
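The dimension formula (1) can be checked numerically: the columns of the Jacobian of the over-parametrization (V, W) ↦ VW^T span the tangent space to M_r at a generic point, so the rank of that Jacobian should equal r(n_1 + n_2 − r) rather than the parameter count r(n_1 + n_2). Below is a minimal sketch of such a check (ours, not from the original note; the sizes n_1 = 5, n_2 = 4, r = 2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 5, 4, 2

V = rng.standard_normal((n1, r))
W = rng.standard_normal((n2, r))

# Jacobian of (V, W) |-> V W^T: rows index the n1*n2 matrix entries,
# columns index the r*(n1 + n2) parameters.
J = np.zeros((n1 * n2, r * (n1 + n2)))
for i in range(n1):
    for j in range(n2):
        row = i * n2 + j
        for k in range(r):
            J[row, i * r + k] = W[j, k]           # d/dV[i,k] of (V W^T)[i,j]
            J[row, r * n1 + j * r + k] = V[i, k]  # d/dW[j,k] of (V W^T)[i,j]

print(np.linalg.matrix_rank(J))  # 14, not the 18 parameters
print(r * (n1 + n2 - r))         # 14 = dim(M_r) from formula (1)
```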
Three-way tensor. Another example we will consider is the (three-way) tensor X ∈ R^{n_1 × n_2 × n_3}. It is said that X has rank one if X = a ∘ b ∘ c, where a, b, c are vectors of the respective dimensions n_1, n_2, n_3, and "∘" denotes the vector outer product. That is, every element of the tensor X can be written as the product X_{ijk} = a_i b_j c_k. The smallest number r such that the tensor X can be represented as a sum of r rank-one tensors is called the rank of X. The corresponding decomposition is often referred to as the (tensor) rank decomposition or Canonical Polyadic Decomposition (CPD) [8, 10]. We would like to remark that our method can apply to higher-order tensors as well.
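To make the CPD concrete, here is a small sketch (ours, not from the note) that assembles a rank-r tensor ⟦A, B, C⟧ from its factors and confirms that each term is a rank-one tensor; the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, r = 4, 3, 5, 2

A = rng.standard_normal((n1, r))
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))

# X = sum_{l=1}^r a_l o b_l o c_l, i.e. X[i,j,k] = sum_l A[i,l]*B[j,l]*C[k,l].
X = np.einsum('il,jl,kl->ijk', A, B, C)

# A single term a_1 o b_1 o c_1 is rank one: unfolded along the first
# mode it equals a_1 (b_1 kron c_1)^T, a rank-one matrix.
X1 = np.einsum('i,j,k->ijk', A[:, 0], B[:, 0], C[:, 0])
print(X.shape, np.linalg.matrix_rank(X1.reshape(n1, -1)))  # (4, 3, 5) 1
```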
Let us start by considering the problem of reconstructing an n_1 × n_2 matrix of a given rank r while observing its entries M_{ij}, (i, j) ∈ Ω, for an index set Ω ⊂ {1, ..., n_1} × {1, ..., n_2} of cardinality m = |Ω|. This is known as the exact matrix completion problem [1], which is now well studied. The conditions for recovery have been derived assuming entries are missing at random, and the performance guarantees are given in a probabilistic sense. Here, we aim to approach the problem from a geometric perspective, which can possibly lead to a deterministic and more intuitive answer. There are two basic problems associated with this problem, namely the existence and uniqueness of the solution. That is, whether such a matrix does exist and, if it exists, whether it is unique. Fundamentally, these questions are related to the identifiability of low-rank matrices, which we define as follows.

Definition 1 (Local identifiability of low-rank matrix completion problem)
Let Y ∈ M_r be such that [Y]_{ij} = M_{ij}, (i, j) ∈ Ω. (Thus, rank(Y) = r.) It is said that the matrix completion problem is locally identifiable at Y if there exists a neighborhood N ⊂ R^{n_1 × n_2} of Y such that for any Y′ ∈ N, Y′ ≠ Y, with [Y′]_{ij} = M_{ij}, (i, j) ∈ Ω, the rank of Y′ is different from r.

Uniqueness is the key question of the tensor rank decomposition. Here, we consider the following tensor decomposition problem: given a three-way tensor X, we would like to find the associated matrix factors A, B, C of the respective order n_1 × r, n_2 × r, and n_3 × r, such that X = ⟦A, B, C⟧, meaning that X = Σ_{i=1}^r a_i ∘ b_i ∘ c_i, with a_i, b_i, c_i being the i-th columns of the respective matrices A, B, C. Clearly, the decomposition X = ⟦A, B, C⟧ is invariant with respect to permutations of the rank-one components and rescaling of the columns of the matrices A, B, C by factors λ_{1i}, λ_{2i}, λ_{3i} such that λ_{1i} λ_{2i} λ_{3i} = 1, i = 1, ..., r. We first introduce the global and local identifiability of tensor decompositions.

Definition 2 (Global identifiability of tensor decomposition)
The decomposition X = ⟦A, B, C⟧ is (globally) identifiable of rank r if it is unique, i.e., if X = ⟦A′, B′, C′⟧ is another decomposition of the tensor X, with matrices A′, B′, C′ being of the respective order n_1 × r′, n_2 × r′, n_3 × r′ and r′ = r, then both decompositions are the same up to the corresponding permutation and rescaling. It is said that the rank-r decomposition is generically identifiable if for almost every (A, B, C) ∈ R^{n_1 × r} × R^{n_2 × r} × R^{n_3 × r} the corresponding tensor X = ⟦A, B, C⟧ is identifiable of rank r.

Definition 3 (Local identifiability of tensor decomposition)

We say that (A, B, C) ∈ R^{n_1 × r} × R^{n_2 × r} × R^{n_3 × r} is locally identifiable if there is a neighborhood N of (A, B, C) such that (A′, B′, C′) ∈ N and ⟦A′, B′, C′⟧ = ⟦A, B, C⟧ imply that (A′, B′, C′) can be obtained from (A, B, C) by the corresponding rescaling. We say that the model (n_1, n_2, n_3, r) is generically locally identifiable if a.e. (A, B, C) ∈ R^{n_1 × r} × R^{n_2 × r} × R^{n_3 × r} is locally identifiable.

Like the matrix completion problem, it is also possible to consider a tensor completion problem: reconstructing a tensor of a given rank when only a subset of the entries is observed. The respective local and global identifiability concepts can be defined similarly.
Re-parameterization of the matrix completion problem.
Let us start with the matrix completion problem using the following parametrization. Consider the set X of n_1 × n_2 matrices X such that [X]_{ij} = 0, (i, j) ∈ Ω (adding such a matrix to a solution keeps it consistent with the observations). We can view X as a linear space of dimension dim(X) = n_1 n_2 − m. Then the matrix completion problem has a solution if and only if there exist respective matrices V and W of full column rank r and X ∈ X such that [VW^T + X]_{ij} = M_{ij}, (i, j) ∈ Ω. Let Θ be the set of vectors θ formed from the components of (V, W, X). Note that Θ is a subset of a vector space of dimension r(n_1 + n_2) + n_1 n_2 − m.
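As a sketch of how this parametrization might look in code (our illustration; the index set Ω and the sizes are made up), the parameter vector θ packs V, W, and the free entries of X:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 3, 3, 1
Omega = [(0, 0), (1, 1), (2, 2)]                 # observed entries (0-based)
m = len(Omega)

# The linear space X: entries free off Omega, forced to 0 on Omega.
free = [(i, j) for i in range(n1) for j in range(n2) if (i, j) not in Omega]
assert len(free) == n1 * n2 - m                  # dim(X) = n1*n2 - m

def unpack(theta):
    """Split a parameter vector theta into (V, W, X), with [X]_ij = 0 on Omega."""
    V = theta[: n1 * r].reshape(n1, r)
    W = theta[n1 * r : r * (n1 + n2)].reshape(n2, r)
    X = np.zeros((n1, n2))
    for t, (i, j) in enumerate(free):
        X[i, j] = theta[r * (n1 + n2) + t]
    return V, W, X

dim_theta = r * (n1 + n2) + n1 * n2 - m          # dimension of the parameter space
V, W, X = unpack(rng.standard_normal(dim_theta))
Y = V @ W.T + X                                  # a point in the image of the map
print(dim_theta, Y.shape)                        # 12 (3, 3)
```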
Characteristic rank. The matrix completion parametrization can be considered as a mapping assigning the matrix VW^T + X to the vector of parameters θ = (V, W, X) ∈ Θ. With this mapping, we can define the so-called Jacobian matrix Δ(θ), formed by the partial derivatives of VW^T + X with respect to the components of the vector θ. Then we associate with this mapping its characteristic rank, denoted r̄ to distinguish it from the matrix rank r, defined as

r̄ = max_{θ ∈ Θ} {rank(Δ(θ))}.    (2)

Note that the characteristic rank r̄ does not depend on the order in which the parameters are arranged. The characteristic rank has the following properties: the rank of Δ(θ) is equal to r̄ for almost every (a.e.) θ ∈ Θ. By "almost every" we mean that the set of those θ ∈ Θ for which rank(Δ(θ)) ≠ r̄ is of Lebesgue measure zero. Moreover, the set {θ ∈ Θ : rank(Δ(θ)) = r̄} is open, so that rank(Δ(θ)) is constant, and equal to r̄, in a neighborhood of a.e. θ ∈ Θ. This result implies that the characteristic rank is an intrinsic quantity associated with the "degrees of freedom" of the problem, regardless of the value of the parameters.
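In practice, the maximum in (2) can be estimated by computing the rank of a finite-difference Jacobian at several random points, since the rank equals r̄ almost everywhere. A self-contained sketch under these assumptions (ours; illustrative sizes, with Ω taken as the diagonal of a 3 × 3 matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 3, 3, 1
Omega = {(0, 0), (1, 1), (2, 2)}                 # observed entries (0-based)
free = [(i, j) for i in range(n1) for j in range(n2) if (i, j) not in Omega]

def f(theta):
    """Map theta = (V, W, free entries of X) to vec(V W^T + X)."""
    V = theta[: n1 * r].reshape(n1, r)
    W = theta[n1 * r : r * (n1 + n2)].reshape(n2, r)
    Y = V @ W.T
    for t, (i, j) in enumerate(free):
        Y[i, j] += theta[r * (n1 + n2) + t]
    return Y.ravel()

def jacobian_rank(theta, eps=1e-6):
    """Rank of the central-difference Jacobian Delta(theta)."""
    d = len(theta)
    J = np.empty((n1 * n2, d))
    for k in range(d):
        e = np.zeros(d)
        e[k] = eps
        J[:, k] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return np.linalg.matrix_rank(J, tol=1e-4)

dim_theta = r * (n1 + n2) + n1 * n2 - len(Omega)
ranks = {jacobian_rank(rng.standard_normal(dim_theta)) for _ in range(20)}
print(ranks)  # the rank is constant over random draws: the generic rank is r-bar
```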
Implications of the characteristic rank for matrix completion.
We can also look at the characteristic rank from the following point of view. Consider the tangent space T_{M_r}(Y) to the manifold M_r at the point Y = VW^T ∈ M_r. We have that

rank(Δ(θ)) = dim(T_{M_r}(Y)) + dim(X) − dim(T_{M_r}(Y) ∩ X).    (3)

The above relation (3) can be explained as follows. Generically, the image of the considered mapping VW^T + X forms a smooth manifold in the image space, at least locally. The tangent space to this manifold, at the considered point, is the sum of the tangent space to M_r (from the parametrization VW^T) and the linear space X in the image space. On the other hand, this tangent space is generated by the columns of the Jacobian matrix Δ(θ) (in other words, by the differential of the mapping), and its dimension is equal to the rank of Δ(θ). Then the right-hand side of (3) is the usual formula for the dimension of the sum of the two linear spaces T_{M_r}(Y) and X. Hence, from (3) and the definition of the characteristic rank (2), we have that

r̄ = dim(T_{M_r}(Y)) + dim(X) − inf_{Y ∈ M_r} {dim(T_{M_r}(Y) ∩ X)}.    (4)

By the classical Sard's theorem [5], we have that the image of the set Θ under the mapping θ ↦ VW^T + X has Lebesgue measure zero if and only if r̄ < n_1 n_2. That is, if r̄ < n_1 n_2, then generically the problem of reconstructing a matrix of rank r by observing its entries M_{ij}, (i, j) ∈ Ω, is unsolvable. By "generically" we mean that the set of rank-r solutions with components matching M_{ij}, (i, j) ∈ Ω, has Lebesgue measure zero in the corresponding vector space of dimension m.

In other words, if the characteristic rank is smaller than the dimension n_1 n_2 of the image space, then any solution of rank r is unstable: arbitrarily small changes of the data values M_{ij} make a rank-r solution unattainable. Note that the characteristic rank is a function of the index set Ω and does not depend on the observed values M_{ij}. In particular, because of (4) we have that r̄ < n_1 n_2 if m > r(n_1 + n_2 − r). For example, if n_1 = n_2 = 10 and r = 3, then r̄ < 100 if m > 3 × (10 + 10 − 3) = 51. Since the characteristic rank is the dimension of the image of the mapping, if it is smaller than the dimension n_1 n_2 of the image space, then the image is "thin", i.e., of measure zero in the image space.
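The 10 × 10, r = 3 example can be checked numerically by assembling the Jacobian columns explicitly (derivatives with respect to V and W span the tangent space; derivatives with respect to the free entries of X span X) and computing its rank. A sketch under our assumptions (Ω drawn uniformly at random; a single random base point, whose Jacobian rank equals the generic rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(0)
n1 = n2 = 10
r = 3

def jacobian_rank(m):
    """Rank of the Jacobian of theta |-> V W^T + X at one random point,
    for a random index set Omega of size m."""
    omega = rng.choice(n1 * n2, size=m, replace=False)   # Omega, flattened
    free = np.setdiff1d(np.arange(n1 * n2), omega)       # unobserved entries
    V = rng.standard_normal((n1, r))
    W = rng.standard_normal((n2, r))
    cols = []
    for i in range(n1):                                  # d/dV[i,k]
        for k in range(r):
            D = np.zeros((n1, n2)); D[i, :] = W[:, k]
            cols.append(D.ravel())
    for j in range(n2):                                  # d/dW[j,k]
        for k in range(r):
            D = np.zeros((n1, n2)); D[:, j] = V[:, k]
            cols.append(D.ravel())
    for t in free:                                       # d/dX over free entries
        e = np.zeros(n1 * n2); e[t] = 1.0
        cols.append(e)
    return np.linalg.matrix_rank(np.column_stack(cols))

print(jacobian_rank(52))  # expected 99 < 100: a rank-3 solution is unstable
print(jacobian_rank(50))  # typically 100: the image generically fills the space
```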
Well-posedness condition. By the above discussion we have that if

T_{M_r}(Y) ∩ X = {0}    (5)

at least for one point Y ∈ M_r, then

r̄ = dim(T_{M_r}(Y)) + dim(X).    (6)

Conversely, if (6) holds, then condition (5) is satisfied for all Y ∈ M_r except for a set of measure zero in M_r. Condition (5) implies local identifiability at Y.

• Generically, the matrix completion problem is locally identifiable if and only if condition (6) holds, which is referred to as the well-posedness condition in [6].

Figure 1 illustrates the above point. Generically, the intersection of T_{M_r}(Y) and X gives the tangent space to the intersection of M_r and X. When the intersection of T_{M_r}(Y) and X is {0}, we have well-posedness and local uniqueness.

Figure 1: Illustration of the well-posedness condition for the matrix completion problem.
Simple example.
Here we illustrate the characteristic rank using a simple example of a 2 × 2 rank-one matrix M = vw^T, with partial observations at Ω = {(1, 1), (2, 2)}. Then

X = ( 0       x_{12}
      x_{21}  0      ),

and θ = (v_1, v_2, w_1, w_2, x_{12}, x_{21}). We have

Δ(θ) = ∂(vw^T + X)/∂θ = ( w_1  0    v_1  0    0  0
                          w_2  0    0    v_1  1  0
                          0    w_1  v_2  0    0  1
                          0    w_2  0    v_2  0  0 ),

where the rows correspond to the entries (1, 1), (1, 2), (2, 1), (2, 2). It can be verified that rank(Δ(θ)) = 4 for a.e. θ ∈ Θ; thus, r̄ = 4. Consider a possible rank-one solution to this problem. The dimension of the tangent space of the rank-one manifold is dim(T_{M_r}(Y)) = 2 + 2 − 1 = 3, and dim(X) = 2; hence r̄ < dim(T_{M_r}(Y)) + dim(X) and the well-posedness condition (6) is not satisfied. Indeed, the rank-one solution to this problem is not unique: any completion whose off-diagonal entries have product [Y]_{12}[Y]_{21} = c is a rank-one solution, where c is the product of the observed diagonal elements.

On the other hand, if Ω = {(1, 1), (1, 2), (2, 1)}, then

X = ( 0  0
      0  x_{22} ),

and θ = (v_1, v_2, w_1, w_2, x_{22}). We have

Δ(θ) = ∂(vw^T + X)/∂θ = ( w_1  0    v_1  0    0
                          w_2  0    0    v_1  0
                          0    w_1  v_2  0    0
                          0    w_2  0    v_2  1 ).

It can be verified that rank(Δ(θ)) = 4 for a.e. θ ∈ Θ, and thus r̄ = 4. The dimension of the tangent space is 2 + 2 − 1 = 3, and the dimension of X is 1. Thus, r̄ = dim(T_{M_r}(Y)) + dim(X) and the well-posedness condition (6) is satisfied. Indeed, the solution to this matrix completion problem is unique.
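The two Jacobians above can be reproduced mechanically; the following sketch (ours) builds Δ(θ) for both index sets at a random point and compares rank(Δ(θ)) with dim(T_{M_r}(Y)) + dim(X):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
v, w = rng.standard_normal(2), rng.standard_normal(2)

def delta(observed):
    """Jacobian of (v, w, x_free) |-> v w^T + X for the 2x2 case,
    where X is zero on the observed entries."""
    free = [(i, j) for i, j in product(range(2), range(2)) if (i, j) not in observed]
    J = np.zeros((4, 4 + len(free)))
    for row, (i, j) in enumerate(product(range(2), range(2))):
        J[row, i] = w[j]           # d/dv_i of v_i w_j
        J[row, 2 + j] = v[i]       # d/dw_j of v_i w_j
    for t, (i, j) in enumerate(free):
        J[2 * i + j, 4 + t] = 1.0  # d/dx_ij
    return J

# (Omega, dim T + dim X): diagonal gives 3 + 2 = 5; the second set gives 3 + 1 = 4.
for Omega, dims in [([(0, 0), (1, 1)], 5), ([(0, 0), (0, 1), (1, 0)], 4)]:
    rbar = np.linalg.matrix_rank(delta(Omega))
    print(rbar, dims, rbar == dims)  # well-posedness (6): rbar == dim T + dim X
```

Running this prints "4 5 False" for the diagonal pattern and "4 4 True" for the second pattern, matching the discussion above.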
Checking conditions. Although the above simple example is easy to check, evaluating the characteristic rank in closed form is not always easy for larger instances. Nevertheless, the rank of the Jacobian matrix can be computed numerically, and hence condition (6) can be verified for a considered index set Ω and rank r. Clearly, local identifiability is a necessary condition for global identifiability (i.e., for global uniqueness of the solution). Assuming that all observed entries are different from zero, necessary and sufficient conditions for global identifiability are known when r = 1; those conditions are the same as for local identifiability (see [6] for more details). Giving necessary and sufficient conditions for global identifiability for general r and Ω could be too difficult and out of reach. On the other hand, the simple dimensionality condition (6) gives a verifiable condition, at least for local identifiability.
Here we briefly discuss local identifiability for the tensor decomposition. For three-way tensor recovery, we can consider the mapping

G_r : (A, B, C) ↦ ⟦A, B, C⟧.    (7)

Similar to (2), the characteristic rank r̄ of the above mapping is given by the maximal rank of its Jacobian matrix, and it has generic properties similar to the ones discussed for the matrix completion problem. Note that r̄ is always less than or equal to r(n_1 + n_2 + n_3 − 2), which is obtained by counting the number of parameters in (A, B, C) and making a correction for the scaling factors.

• The model (n_1, n_2, n_3, r) is generically locally identifiable if and only if the following condition for the characteristic rank holds:

r̄ = r(n_1 + n_2 + n_3 − 2).    (8)

The above condition (8) is necessary for generic global identifiability, and it can be verified numerically by computing the rank of the Jacobian matrix of the mapping G_r. Let us note further that, in a similar spirit, it is also possible to give conditions for local identifiability of the tensor completion problem, when only a set of observed values of the tensor components is available. To do so, we need to set up an appropriate mapping and study the associated characteristic rank. We can refer to [2, Section 3.2], and references therein, for a discussion of the uniqueness (identifiability) of tensor rank decompositions. For the tensor completion problem, local identifiability does not imply the respective global identifiability, even in the rank-one case (e.g., [9]).

Here we present a numerical example to illustrate how to use the characteristic rank to study a three-way tensor completion problem. Consider the case where the tensor entries are randomly sampled. Assume the size of each dimension of the tensor is n, so the tensor lies in R^{n × n × n}. The proportion of observed entries is p, and the total number of observed entries is m = ⌈pn³⌉, where ⌈x⌉ is the ceiling function rounding up to the nearest integer. For each p, we randomly choose m observations from the tensor. For the reported experiments, we used n = 2, ..., 10 and p = 0.1, 0.2, ..., 0.6.
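Condition (8) for the full decomposition can be verified in the same spirit: form the Jacobian of G_r at a random (A, B, C) and compare its rank with r(n_1 + n_2 + n_3 − 2). A sketch with illustrative sizes (n_1 = n_2 = n_3 = 4, r = 2); for the completion variant one would keep only the rows indexed by Ω:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, r = 4, 4, 4, 2

A = rng.standard_normal((n1, r))
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))

# Jacobian of G_r(A, B, C) = [[A, B, C]]: rows index tensor entries (i, j, k),
# columns index the entries of A, B, and C.
cols = []
for l in range(r):
    for i in range(n1):   # d/dA[i,l] = e_i o b_l o c_l
        T = np.zeros((n1, n2, n3)); T[i] = np.outer(B[:, l], C[:, l])
        cols.append(T.ravel())
    for j in range(n2):   # d/dB[j,l] = a_l o e_j o c_l
        T = np.zeros((n1, n2, n3)); T[:, j, :] = np.outer(A[:, l], C[:, l])
        cols.append(T.ravel())
    for k in range(n3):   # d/dC[k,l] = a_l o b_l o e_k
        T = np.zeros((n1, n2, n3)); T[:, :, k] = np.outer(A[:, l], B[:, l])
        cols.append(T.ravel())

rbar = np.linalg.matrix_rank(np.column_stack(cols))
print(rbar, r * (n1 + n2 + n3 - 2))  # equal => generically locally identifiable
```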
As previously mentioned, the necessary condition for well-posedness is that m ≥ 3n − 2, or equivalently p ≥ (3n − 2)/n³. Figure 2 shows the probability that well-posedness is satisfied for rank-one tensors under different tensor sizes and sampling proportions. Note that the empirical results match the theoretical prediction well. Moreover, it can be observed that as the tensor size becomes large, the well-posedness condition is satisfied with a small sampling proportion.

Figure 2: Example of recovering a three-way tensor with missing data: the probability of well-posedness being satisfied versus the theoretical prediction. The blue line corresponds to p = (3n − 2)/n³; the yellow and orange lines correspond to the sampling proportions at which the well-posedness condition is satisfied empirically with probability 90% and 99.9%, respectively.

Conclusion

In this note, we explained how to use a fundamental concept, namely the characteristic rank, to answer essential questions such as identifiability, given observations of a low-rank structure (e.g., low-rank matrices and low-rank three-way tensors). The framework involves a few steps. We first find the map that associates the truth with the observations, then study the Jacobian matrix of the map to find the characteristic rank, and finally compare the characteristic rank with the respective problem-specific conditions (such as the well-posedness condition). Once the concepts are understood, the analysis usually involves only basic multivariate calculus. The benefit is that the tool is generally applicable to studying other problems with low-rank structures. In this note, we have considered cases without observation noise to illustrate the principle. When there is additive Gaussian noise, statistical goodness-of-fit tests can be developed based on the framework [7].
Author Biography
Alexander Shapiro is A. Russell Chandler III Chair and Professor at the Georgia Institute of Technology in the H. Milton Stewart School of Industrial and Systems Engineering. He has published more than 140 research articles in peer-reviewed journals and is a coauthor of several books. He has served on the editorial boards of a number of professional journals and was the Editor-in-Chief of the journal Mathematical Programming, Series A. In 2013 he was a recipient of the Khachiyan Prize for Life-time Accomplishments in Optimization, awarded by the INFORMS Optimization Society, and in 2018 he was a recipient of the Dantzig Prize, awarded by the Mathematical Optimization Society and the Society for Industrial and Applied Mathematics. In 2020 he was elected to the National Academy of Engineering.

Yao Xie is an Associate Professor and Harold R. and Mary Anne Nash Early Career Professor at the Georgia Institute of Technology in the H. Milton Stewart School of Industrial and Systems Engineering, and an Associate Director of the Machine Learning Center. She received her Ph.D. in Electrical Engineering (minor in Mathematics) from Stanford University, an M.Sc. in Electrical and Computer Engineering from the University of Florida, and a B.Sc. in Electrical Engineering and Computer Science from the University of Science and Technology of China (USTC). She was a Research Scientist at Duke University. Her research areas are statistics (in particular sequential analysis and sequential change-point detection), machine learning, and signal processing, providing theoretical foundations and developing computationally efficient and statistically powerful algorithms. She has worked on such problems in sensor networks, social networks, power systems, crime data analysis, and wireless communications. She received the National Science Foundation (NSF) CAREER Award in 2017. She is currently an Associate Editor for IEEE Transactions on Signal Processing, Sequential Analysis: Design Methods and Applications, and the INFORMS Journal on Data Science, and serves on the Editorial Board of the Journal of Machine Learning Research.

Rui Zhang is a Ph.D. student in the H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology. He received a B.S. (2015) in statistics from Sun Yat-sen University and an M.S. (2017) in statistics from the University of Michigan. He is interested in change-point detection, machine learning, and statistics.
Acknowledgement
This work is partially funded by NSF CAREER Award CCF-1650913 and NSF grants CMMI-2015787, DMS-1938106, and DMS-1830210.
References

[1] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Info. Theory, 56(5):2053–2080, 2010.

[2] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[3] Carl D. Meyer. Matrix Analysis and Applied Linear Algebra, volume 71. SIAM, 2000.

[4] Walter Rudin. Principles of Mathematical Analysis. McGraw-Hill, New York, 1964.

[5] Arthur Sard. The measure of the critical values of differentiable maps. Bull. Amer. Math. Soc., 48:883–890, 1942.

[6] A. Shapiro, Y. Xie, and R. Zhang. Matrix completion with deterministic pattern - a geometric perspective. IEEE Transactions on Signal Processing, 67:1088–1103, 2019.

[7] Alexander Shapiro, Yao Xie, and Rui Zhang. Goodness-of-fit tests on manifolds. ArXiv, abs/1909.05229, 2019.

[8] Nicholas D. Sidiropoulos, Lieven De Lathauwer, Xiao Fu, Kejun Huang, Evangelos E. Papalexakis, and Christos Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13):3551–3582, 2017.

[9] Mohit Singh, Alexander Shapiro, and Rui Zhang. Rank one tensor completion problem. ArXiv, abs/2009.10533, 2020.

[10] Laurent Sorber, Marc Van Barel, and Lieven De Lathauwer. Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization. SIAM Journal on Optimization, 23(2):695–720, 2013.