On characteristic rank for matrix and tensor completion
Alexander Shapiro, Yao Xie, and Rui Zhang

November 12, 2020
In this lecture note, we discuss a fundamental concept, referred to as the characteristic rank, which suggests a general framework for characterizing the basic properties of various low-dimensional models used in signal processing. Below, we illustrate this framework using two examples, the matrix and three-way tensor completion problems, and consider basic properties such as the identifiability of a matrix or tensor given partial observations. In this note, we consider cases without observation noise in order to illustrate the principle.
Characteristic rank provides a fundamental tool for determining the "order" of low-rank structures, such as the rank of low-rank matrices and the rank of three-way tensors. The concept of characteristic rank was introduced in [6], where it was used to establish necessary and sufficient conditions for the "recoverability" of low-rank matrices. The characteristic rank can also be applied more generally to determine the "intrinsic" degrees of freedom in other low-rank manifold structures. Such instances include determining the number of hidden nodes in one-layer neural networks and determining the number of sources in blind demixing problems, as shown in [7].

Prerequisite
To better comprehend the concepts discussed in this lecture-notes article, readers are expected to have a good background in linear algebra, multivariate calculus, and basic concepts of measure theory (which we will explain whenever we run into them). Suggested references are [4] and [3]. Below, we review some basic concepts necessary for the notes.
Manifold of low-rank matrices.
Consider the set of n_1 × n_2 matrices of rank r, denoted M_r. Note that the rank is no larger than the dimensions of the matrix: r ≤ min{n_1, n_2}. It is known that such a set of rank-r matrices M_r forms a smooth manifold in the space R^{n_1 × n_2}, and the dimension of the manifold is given by

dim(M_r) = r(n_1 + n_2 − r).    (1)

A matrix A ∈ M_r can be represented in the form A = VW^T, where V and W are matrices of the respective order n_1 × r and n_2 × r, both of full column rank r. Thus, we can view (V, W) as a parametrization of M_r. Note that the number of involved parameters is r(n_1 + n_2), which is larger than the dimension of M_r; this is because V and W in the above representation are not unique.
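The dimension formula (1) can be checked numerically: the columns of the Jacobian of the over-parametrization (V, W) ↦ VW^T span the tangent space to M_r at a generic point, so the rank of that Jacobian should equal r(n_1 + n_2 − r) rather than the parameter count r(n_1 + n_2). Below is a minimal sketch of such a check (ours, not from the original note; the sizes n_1 = 5, n_2 = 4, r = 2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 5, 4, 2

V = rng.standard_normal((n1, r))
W = rng.standard_normal((n2, r))

# Jacobian of (V, W) |-> V W^T: rows index the n1*n2 matrix entries,
# columns index the r*(n1 + n2) parameters.
J = np.zeros((n1 * n2, r * (n1 + n2)))
for i in range(n1):
    for j in range(n2):
        row = i * n2 + j
        for k in range(r):
            J[row, i * r + k] = W[j, k]           # d/dV[i,k] of (V W^T)[i,j]
            J[row, r * n1 + j * r + k] = V[i, k]  # d/dW[j,k] of (V W^T)[i,j]

print(np.linalg.matrix_rank(J))  # 14, not the 18 parameters
print(r * (n1 + n2 - r))         # 14 = dim(M_r) from formula (1)
```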
Three-way tensor. Another example we will consider is the (three-way) tensor X ∈ R^{n_1 × n_2 × n_3}. It is said that X has rank one if X = a ∘ b ∘ c, where a, b, c are vectors of the respective dimensions n_1, n_2, n_3, and "∘" denotes the vector outer product. That is, every element of the tensor X can be written as the product X_{ijk} = a_i b_j c_k. The smallest number r such that the tensor X can be represented as a sum of r rank-one tensors is called the rank of X. The corresponding decomposition is often referred to as the (tensor) rank decomposition or Canonical Polyadic Decomposition (CPD) [8, 10]. We would like to remark that our method can apply to higher-order tensors as well.
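To make the CPD concrete, here is a small sketch (ours, not from the note) that assembles a rank-r tensor ⟦A, B, C⟧ from its factors and confirms that each term is a rank-one tensor; the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, r = 4, 3, 5, 2

A = rng.standard_normal((n1, r))
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))

# X = sum_{l=1}^r a_l o b_l o c_l, i.e. X[i,j,k] = sum_l A[i,l]*B[j,l]*C[k,l].
X = np.einsum('il,jl,kl->ijk', A, B, C)

# A single term a_1 o b_1 o c_1 is rank one: unfolded along the first
# mode it equals a_1 (b_1 kron c_1)^T, a rank-one matrix.
X1 = np.einsum('i,j,k->ijk', A[:, 0], B[:, 0], C[:, 0])
print(X.shape, np.linalg.matrix_rank(X1.reshape(n1, -1)))  # (4, 3, 5) 1
```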
Let us start by considering the problem of reconstructing an n_1 × n_2 matrix of a given rank r while observing its entries M_{ij}, (i, j) ∈ Ω, for an index set Ω ⊂ {1, ..., n_1} × {1, ..., n_2} of cardinality m = |Ω|. This is known as the exact matrix completion problem [1], which is now well studied. The conditions for recovery have been derived assuming entries are missing at random, and the performance guarantees are given in a probabilistic sense. Here, we aim to approach the problem from a geometric perspective, which can possibly lead to a deterministic and more intuitive answer. There are two basic problems associated with this problem, namely the existence and uniqueness of the solution. That is, whether such a matrix does exist and, if it exists, whether it is unique. Fundamentally, these questions are related to the identifiability of low-rank matrices, which we define as follows.

Definition 1 (Local identifiability of low-rank matrix completion problem)
Let Y ∈ M_r be such that [Y]_{ij} = M_{ij}, (i, j) ∈ Ω. (Thus, rank(Y) = r.) It is said that the matrix completion problem is locally identifiable at Y if there exists a neighborhood N ⊂ R^{n_1 × n_2} of Y such that for any Y′ ∈ N, Y′ ≠ Y, with [Y′]_{ij} = M_{ij}, (i, j) ∈ Ω, the rank of Y′ is different from r.

Uniqueness is the key question of the tensor rank decomposition. Here, we consider the following tensor decomposition problem: given a three-way tensor X, we would like to find the associated matrix factors A, B, C of the respective order n_1 × r, n_2 × r, and n_3 × r, such that X = ⟦A, B, C⟧, meaning that X = Σ_{i=1}^r a_i ∘ b_i ∘ c_i, with a_i, b_i, c_i being the i-th columns of the respective matrices A, B, C. Clearly, the decomposition X = ⟦A, B, C⟧ is invariant with respect to permutations of the rank-one components and rescaling of the columns of the matrices A, B, C by factors λ_{1i}, λ_{2i}, λ_{3i} such that λ_{1i} λ_{2i} λ_{3i} = 1, i = 1, ..., r. We first introduce the global and local identifiability of tensor decompositions.

Definition 2 (Global identifiability of tensor decomposition)
The decomposition X = ⟦A, B, C⟧ is (globally) identifiable of rank r if it is unique, i.e., if X = ⟦A′, B′, C′⟧ is another decomposition of the tensor X, with matrices A′, B′, C′ being of the respective order n_1 × r′, n_2 × r′, n_3 × r′ and r′ = r, then both decompositions are the same up to the corresponding permutation and rescaling. It is said that the rank-r decomposition is generically identifiable if for almost every (A, B, C) ∈ R^{n_1 × r} × R^{n_2 × r} × R^{n_3 × r} the corresponding tensor X = ⟦A, B, C⟧ is identifiable of rank r.

Definition 3 (Local identifiability of tensor decomposition)

We say that (A, B, C) ∈ R^{n_1 × r} × R^{n_2 × r} × R^{n_3 × r} is locally identifiable if there is a neighborhood N of (A, B, C) such that (A′, B′, C′) ∈ N and ⟦A′, B′, C′⟧ = ⟦A, B, C⟧ imply that (A′, B′, C′) can be obtained from (A, B, C) by the corresponding rescaling. We say that the model (n_1, n_2, n_3, r) is generically locally identifiable if a.e. (A, B, C) ∈ R^{n_1 × r} × R^{n_2 × r} × R^{n_3 × r} is locally identifiable.

Like the matrix completion problem, it is also possible to consider a tensor completion problem: reconstructing a tensor of a given rank when only a subset of the entries is observed. The respective local and global identifiability concepts can be defined similarly.
Re-parameterization of the matrix completion problem.
Let us start with the matrix completion problem using the following parametrization. Consider the set X of n_1 × n_2 matrices X such that [X]_{ij} = 0, (i, j) ∈ Ω (adding such a matrix to a solution keeps it consistent with the observations). We can view X as a linear space of dimension dim(X) = n_1 n_2 − m. Then the matrix completion problem has a solution if and only if there exist respective matrices V and W of full column rank r and X ∈ X such that [VW^T + X]_{ij} = M_{ij}, (i, j) ∈ Ω. Let Θ be the set of vectors θ formed from the components of (V, W, X). Note that Θ is a subset of a vector space of dimension r(n_1 + n_2) + n_1 n_2 − m.
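As a sketch of how this parametrization might look in code (our illustration; the index set Ω and the sizes are made up), the parameter vector θ packs V, W, and the free entries of X:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 3, 3, 1
Omega = [(0, 0), (1, 1), (2, 2)]                 # observed entries (0-based)
m = len(Omega)

# The linear space X: entries free off Omega, forced to 0 on Omega.
free = [(i, j) for i in range(n1) for j in range(n2) if (i, j) not in Omega]
assert len(free) == n1 * n2 - m                  # dim(X) = n1*n2 - m

def unpack(theta):
    """Split a parameter vector theta into (V, W, X), with [X]_ij = 0 on Omega."""
    V = theta[: n1 * r].reshape(n1, r)
    W = theta[n1 * r : r * (n1 + n2)].reshape(n2, r)
    X = np.zeros((n1, n2))
    for t, (i, j) in enumerate(free):
        X[i, j] = theta[r * (n1 + n2) + t]
    return V, W, X

dim_theta = r * (n1 + n2) + n1 * n2 - m          # dimension of the parameter space
V, W, X = unpack(rng.standard_normal(dim_theta))
Y = V @ W.T + X                                  # a point in the image of the map
print(dim_theta, Y.shape)                        # 12 (3, 3)
```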
Characteristic rank. The matrix completion parametrization can be considered as a mapping assigning the matrix VW^T + X to the vector of parameters θ = (V, W, X) ∈ Θ. With this mapping, we can define the so-called Jacobian matrix Δ(θ), formed by the partial derivatives of VW^T + X with respect to the components of the vector θ. Then we associate with this mapping its characteristic rank, denoted r̄ to distinguish it from the matrix rank r, defined as

r̄ = max_{θ ∈ Θ} {rank(Δ(θ))}.    (2)

Note that the characteristic rank r̄ does not depend on the order in which the parameters are arranged. The characteristic rank has the following properties: the rank of Δ(θ) is equal to r̄ for almost every (a.e.) θ ∈ Θ. By "almost every" we mean that the set of those θ ∈ Θ for which rank(Δ(θ)) ≠ r̄ is of Lebesgue measure zero. Moreover, the set {θ ∈ Θ : rank(Δ(θ)) = r̄} is open, so that rank(Δ(θ)) is constant, and equal to r̄, in a neighborhood of a.e. θ ∈ Θ. This result implies that the characteristic rank is an intrinsic quantity associated with the "degrees of freedom" of the problem, regardless of the value of the parameters.
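In practice, the maximum in (2) can be estimated by computing the rank of a finite-difference Jacobian at several random points, since the rank equals r̄ almost everywhere. A self-contained sketch under these assumptions (ours; illustrative sizes, with Ω taken as the diagonal of a 3 × 3 matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 3, 3, 1
Omega = {(0, 0), (1, 1), (2, 2)}                 # observed entries (0-based)
free = [(i, j) for i in range(n1) for j in range(n2) if (i, j) not in Omega]

def f(theta):
    """Map theta = (V, W, free entries of X) to vec(V W^T + X)."""
    V = theta[: n1 * r].reshape(n1, r)
    W = theta[n1 * r : r * (n1 + n2)].reshape(n2, r)
    Y = V @ W.T
    for t, (i, j) in enumerate(free):
        Y[i, j] += theta[r * (n1 + n2) + t]
    return Y.ravel()

def jacobian_rank(theta, eps=1e-6):
    """Rank of the central-difference Jacobian Delta(theta)."""
    d = len(theta)
    J = np.empty((n1 * n2, d))
    for k in range(d):
        e = np.zeros(d)
        e[k] = eps
        J[:, k] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return np.linalg.matrix_rank(J, tol=1e-4)

dim_theta = r * (n1 + n2) + n1 * n2 - len(Omega)
ranks = {jacobian_rank(rng.standard_normal(dim_theta)) for _ in range(20)}
print(ranks)  # the rank is constant over random draws: the generic rank is r-bar
```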
Implications of the characteristic rank for matrix completion.
We can also look at the characteristic rank from the following point of view. Consider the tangent space T_{M_r}(Y) to the manifold M_r at the point Y = VW^T ∈ M_r. We have that

rank(Δ(θ)) = dim(T_{M_r}(Y)) + dim(X) − dim(T_{M_r}(Y) ∩ X).    (3)

The above relation (3) can be explained as follows. Generically, the image of the considered mapping VW^T + X forms a smooth manifold in the image space, at least locally. The tangent space to this manifold, at the considered point, is the sum of the tangent space to M_r (from the parametrization VW^T) and the linear space X in the image space. On the other hand, this tangent space is generated by the columns of the Jacobian matrix Δ(θ) (in other words, by the differential of the mapping), and its dimension is equal to the rank of Δ(θ). Then the right-hand side of (3) is the usual formula for the dimension of the sum of the two linear spaces T_{M_r}(Y) and X. Hence, from (3) and the definition of the characteristic rank (2), we have that

r̄ = dim(T_{M_r}(Y)) + dim(X) − inf_{Y ∈ M_r} {dim(T_{M_r}(Y) ∩ X)}.    (4)

By the classical Sard's theorem [5], we have that the image of the set Θ under the mapping θ ↦ VW^T + X has Lebesgue measure zero if and only if r̄ < n_1 n_2. That is, if r̄ < n_1 n_2, then generically the problem of reconstructing a matrix of rank r by observing its entries M_{ij}, (i, j) ∈ Ω, is unsolvable. By "generically" we mean that the set of rank-r solutions with components matching M_{ij}, (i, j) ∈ Ω, has Lebesgue measure zero in the corresponding vector space of dimension m.

In other words, if the characteristic rank is smaller than the dimension n_1 n_2 of the image space, then any solution of rank r is unstable: arbitrarily small changes of the data values M_{ij} make a rank-r solution unattainable. Note that the characteristic rank is a function of the index set Ω and does not depend on the observed values M_{ij}. In particular, because of (4) we have that r̄ < n_1 n_2 if m > r(n_1 + n_2 − r). For example, if n_1 = n_2 = 10 and r = 3, then r̄ < 100 if m > 3 × (10 + 10 − 3) = 51. Since the characteristic rank is the dimension of the image of the mapping, if it is smaller than the dimension n_1 n_2 of the image space, then the image is "thin", i.e., of measure zero in the image space.
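The 10 × 10, r = 3 example can be checked numerically by assembling the Jacobian columns explicitly (derivatives with respect to V and W span the tangent space; derivatives with respect to the free entries of X span X) and computing its rank. A sketch under our assumptions (Ω drawn uniformly at random; a single random base point, whose Jacobian rank equals the generic rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(0)
n1 = n2 = 10
r = 3

def jacobian_rank(m):
    """Rank of the Jacobian of theta |-> V W^T + X at one random point,
    for a random index set Omega of size m."""
    omega = rng.choice(n1 * n2, size=m, replace=False)   # Omega, flattened
    free = np.setdiff1d(np.arange(n1 * n2), omega)       # unobserved entries
    V = rng.standard_normal((n1, r))
    W = rng.standard_normal((n2, r))
    cols = []
    for i in range(n1):                                  # d/dV[i,k]
        for k in range(r):
            D = np.zeros((n1, n2)); D[i, :] = W[:, k]
            cols.append(D.ravel())
    for j in range(n2):                                  # d/dW[j,k]
        for k in range(r):
            D = np.zeros((n1, n2)); D[:, j] = V[:, k]
            cols.append(D.ravel())
    for t in free:                                       # d/dX over free entries
        e = np.zeros(n1 * n2); e[t] = 1.0
        cols.append(e)
    return np.linalg.matrix_rank(np.column_stack(cols))

print(jacobian_rank(52))  # expected 99 < 100: a rank-3 solution is unstable
print(jacobian_rank(50))  # typically 100: the image generically fills the space
```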
Well-posedness condition. By the above discussion we have that if

T_{M_r}(Y) ∩ X = {0}    (5)

at least for one point Y ∈ M_r, then

r̄ = dim(T_{M_r}(Y)) + dim(X).    (6)

Conversely, if (6) holds, then condition (5) is satisfied for all Y ∈ M_r except for a set of measure zero in M_r. Condition (5) implies local identifiability at Y.

• Generically, the matrix completion problem is locally identifiable if and only if condition (6) holds, which is referred to as the well-posedness condition in [6].

Figure 1 illustrates the above point. Generically, the intersection of T_{M_r}(Y) and X gives the tangent space to the intersection of M_r and X. When the intersection of T_{M_r}(Y) and X is {0}, we have well-posedness and local uniqueness.

Figure 1: Illustration of the well-posedness condition for the matrix completion problem.
Simple example.
Here we illustrate the characteristic rank using a simple example of a 2 × 2 rank-one matrix M = vw^T, with partial observations at Ω = {(1, 1), (2, 2)}. Then

X = ( 0       x_{12}
      x_{21}  0      ),

and θ = (v_1, v_2, w_1, w_2, x_{12}, x_{21}). We have

Δ(θ) = ∂(vw^T + X)/∂θ = ( w_1  0    v_1  0    0  0
                          w_2  0    0    v_1  1  0
                          0    w_1  v_2  0    0  1
                          0    w_2  0    v_2  0  0 ),

where the rows correspond to the entries (1, 1), (1, 2), (2, 1), (2, 2). It can be verified that rank(Δ(θ)) = 4 for a.e. θ ∈ Θ; thus, r̄ = 4. Consider a possible rank-one solution to this problem. The dimension of the tangent space of the rank-one manifold is dim(T_{M_r}(Y)) = 2 + 2 − 1 = 3, and dim(X) = 2; hence r̄ < dim(T_{M_r}(Y)) + dim(X) and the well-posedness condition (6) is not satisfied. Indeed, the rank-one solution to this problem is not unique: any completion whose off-diagonal entries have product [Y]_{12}[Y]_{21} = c is a rank-one solution, where c is the product of the observed diagonal elements.

On the other hand, if Ω = {(1, 1), (1, 2), (2, 1)}, then

X = ( 0  0
      0  x_{22} ),

and θ = (v_1, v_2, w_1, w_2, x_{22}). We have

Δ(θ) = ∂(vw^T + X)/∂θ = ( w_1  0    v_1  0    0
                          w_2  0    0    v_1  0
                          0    w_1  v_2  0    0
                          0    w_2  0    v_2  1 ).

It can be verified that rank(Δ(θ)) = 4 for a.e. θ ∈ Θ, and thus r̄ = 4. The dimension of the tangent space is 2 + 2 − 1 = 3, and the dimension of X is 1. Thus, r̄ = dim(T_{M_r}(Y)) + dim(X) and the well-posedness condition (6) is satisfied. Indeed, the solution to this matrix completion problem is unique.
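The two Jacobians above can be reproduced mechanically; the following sketch (ours) builds Δ(θ) for both index sets at a random point and compares rank(Δ(θ)) with dim(T_{M_r}(Y)) + dim(X):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
v, w = rng.standard_normal(2), rng.standard_normal(2)

def delta(observed):
    """Jacobian of (v, w, x_free) |-> v w^T + X for the 2x2 case,
    where X is zero on the observed entries."""
    free = [(i, j) for i, j in product(range(2), range(2)) if (i, j) not in observed]
    J = np.zeros((4, 4 + len(free)))
    for row, (i, j) in enumerate(product(range(2), range(2))):
        J[row, i] = w[j]           # d/dv_i of v_i w_j
        J[row, 2 + j] = v[i]       # d/dw_j of v_i w_j
    for t, (i, j) in enumerate(free):
        J[2 * i + j, 4 + t] = 1.0  # d/dx_ij
    return J

# (Omega, dim T + dim X): diagonal gives 3 + 2 = 5; the second set gives 3 + 1 = 4.
for Omega, dims in [([(0, 0), (1, 1)], 5), ([(0, 0), (0, 1), (1, 0)], 4)]:
    rbar = np.linalg.matrix_rank(delta(Omega))
    print(rbar, dims, rbar == dims)  # well-posedness (6): rbar == dim T + dim X
```

Running this prints "4 5 False" for the diagonal pattern and "4 4 True" for the second pattern, matching the discussion above.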
Checking conditions. Although the above simple example is easy to check, evaluating the characteristic rank in closed form is not always easy for larger instances. Nevertheless, the rank of the Jacobian matrix can be computed numerically, and hence condition (6) can be verified for a considered index set Ω and rank r. Clearly, local identifiability is a necessary condition for global identifiability (i.e., for global uniqueness of the solution). Assuming that all observed entries are different from zero, necessary and sufficient conditions for global identifiability are known when r = 1; those conditions are the same as for local identifiability (see [6] for more details). Giving necessary and sufficient conditions for global identifiability for general r and Ω could be too difficult and out of reach. On the other hand, the simple dimensionality condition (6) gives a verifiable condition, at least for local identifiability.
Here we briefly discuss local identifiability for the tensor decomposition. For three-way tensor recovery, we can consider the mapping

G_r : (A, B, C) ↦ ⟦A, B, C⟧.    (7)

Similar to (2), the characteristic rank r̄ of the above mapping is given by the maximal rank of its Jacobian matrix, and it has generic properties similar to the ones discussed for the matrix completion problem. Note that r̄ is always less than or equal to r(n_1 + n_2 + n_3 − 2), which is obtained by counting the number of parameters in (A, B, C) and making a correction for the scaling factors.

• The model (n_1, n_2, n_3, r) is generically locally identifiable if and only if the following condition for the characteristic rank holds:

r̄ = r(n_1 + n_2 + n_3 − 2).    (8)

The above condition (8) is necessary for generic global identifiability, and it can be verified numerically by computing the rank of the Jacobian matrix of the mapping G_r. Let us note further that, in a similar spirit, it is also possible to give conditions for local identifiability of the tensor completion problem, when only a set of observed values of the tensor components is available. To do so, we need to set up an appropriate mapping and study the associated characteristic rank. We can refer to [2, Section 3.2], and references therein, for a discussion of the uniqueness (identifiability) of tensor rank decompositions. For the tensor completion problem, local identifiability does not imply the respective global identifiability, even in the rank-one case (e.g., [9]).

Here we present a numerical example to illustrate how to use the characteristic rank to study a three-way tensor completion problem. Consider the case where the tensor entries are randomly sampled. Assume the size of each dimension of the tensor is n, so the tensor lies in R^{n × n × n}. The proportion of observed entries is p, and the total number of observed entries is m = ⌈pn³⌉, where ⌈x⌉ is the ceiling function rounding up to the nearest integer. For each p, we randomly choose m observations from the tensor. For the reported experiments, we used n = 2, ..., 10 and p = 0.1, 0.2, ..., 0.6.
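Condition (8) for the full decomposition can be verified in the same spirit: form the Jacobian of G_r at a random (A, B, C) and compare its rank with r(n_1 + n_2 + n_3 − 2). A sketch with illustrative sizes (n_1 = n_2 = n_3 = 4, r = 2); for the completion variant one would keep only the rows indexed by Ω:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, r = 4, 4, 4, 2

A = rng.standard_normal((n1, r))
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))

# Jacobian of G_r(A, B, C) = [[A, B, C]]: rows index tensor entries (i, j, k),
# columns index the entries of A, B, and C.
cols = []
for l in range(r):
    for i in range(n1):   # d/dA[i,l] = e_i o b_l o c_l
        T = np.zeros((n1, n2, n3)); T[i] = np.outer(B[:, l], C[:, l])
        cols.append(T.ravel())
    for j in range(n2):   # d/dB[j,l] = a_l o e_j o c_l
        T = np.zeros((n1, n2, n3)); T[:, j, :] = np.outer(A[:, l], C[:, l])
        cols.append(T.ravel())
    for k in range(n3):   # d/dC[k,l] = a_l o b_l o e_k
        T = np.zeros((n1, n2, n3)); T[:, :, k] = np.outer(A[:, l], B[:, l])
        cols.append(T.ravel())

rbar = np.linalg.matrix_rank(np.column_stack(cols))
print(rbar, r * (n1 + n2 + n3 - 2))  # equal => generically locally identifiable
```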
As previously mentioned, the necessary condition for well-posedness is that m ≥ 3n − 2, or equivalently p ≥ (3n − 2)/n³. Figure 2 shows the probability that well-posedness is satisfied for rank-one tensors under different tensor sizes and sampling proportions. Note that the empirical results match the theoretical prediction well. Moreover, it can be observed that as the tensor size becomes large, the well-posedness condition is satisfied with a small sampling proportion.

Figure 2: Example of recovering a three-way tensor with missing data: the probability of well-posedness being satisfied versus the theoretical prediction. The blue line corresponds to p = (3n − 2)/n³; the yellow and orange lines correspond to the sampling proportions at which the well-posedness condition is satisfied empirically with probability 90% and 99.9%, respectively.

Conclusion

In this note, we explained how to use a fundamental concept, namely the characteristic rank, to answer essential questions such as identifiability, given observations of a low-rank structure (e.g., low-rank matrices and low-rank three-way tensors). The framework involves a few steps. We first find the map that associates the truth with the observations, then study the Jacobian matrix of the map to find the characteristic rank, and finally compare the characteristic rank with the respective problem-specific conditions (such as the well-posedness condition). Once the concepts are understood, the analysis usually involves only basic multivariate calculus. The benefit is that the tool is generally applicable to studying other problems with low-rank structures. In this note, we have considered cases without observation noise to illustrate the principle. When there is additive Gaussian noise, statistical goodness-of-fit tests can be developed based on the framework [7].
Author Biography
Alexander Shapiro is A. Russell Chandler III Chair and Professor at the Georgia Institute of Technology in the H. Milton Stewart School of Industrial and Systems Engineering. He has published more than 140 research articles in peer-reviewed journals and is a coauthor of several books. He has served on the editorial boards of a number of professional journals and was the Editor-in-Chief of the journal Mathematical Programming, Series A. In 2013 he was a recipient of the Khachiyan Prize for Life-time Accomplishments in Optimization, awarded by the INFORMS Optimization Society, and in 2018 he was a recipient of the Dantzig Prize, awarded by the Mathematical Optimization Society and the Society for Industrial and Applied Mathematics. In 2020 he was elected to the National Academy of Engineering.

Yao Xie is an Associate Professor and Harold R. and Mary Anne Nash Early Career Professor at the Georgia Institute of Technology in the H. Milton Stewart School of Industrial and Systems Engineering, and an Associate Director of the Machine Learning Center. She received her Ph.D. in Electrical Engineering (minor in Mathematics) from Stanford University, an M.Sc. in Electrical and Computer Engineering from the University of Florida, and a B.Sc. in Electrical Engineering and Computer Science from the University of Science and Technology of China (USTC). She was a Research Scientist at Duke University. Her research areas are statistics (in particular sequential analysis and sequential change-point detection), machine learning, and signal processing, providing theoretical foundations and developing computationally efficient and statistically powerful algorithms. She has worked on such problems in sensor networks, social networks, power systems, crime data analysis, and wireless communications. She received the National Science Foundation (NSF) CAREER Award in 2017. She is currently an Associate Editor for IEEE Transactions on Signal Processing, Sequential Analysis: Design Methods and Applications, and the INFORMS Journal on Data Science, and serves on the Editorial Board of the Journal of Machine Learning Research.

Rui Zhang is a Ph.D. student in the H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology. He received a B.S. (2015) in statistics from Sun Yat-sen University and an M.S. (2017) in statistics from the University of Michigan. He is interested in change-point detection, machine learning, and statistics.
Acknowledgement
This work is partially funded by NSF CAREER Award CCF-1650913 and NSF grants CMMI-2015787, DMS-1938106, and DMS-1830210.
References

[1] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Info. Theory, 56(5):2053–2080, 2010.

[2] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[3] Carl D. Meyer. Matrix Analysis and Applied Linear Algebra, volume 71. SIAM, 2000.

[4] Walter Rudin. Principles of Mathematical Analysis. McGraw-Hill, New York, 1964.

[5] Arthur Sard. The measure of the critical values of differentiable maps. Bull. Amer. Math. Soc., 48:883–890, 1942.

[6] A. Shapiro, Y. Xie, and R. Zhang. Matrix completion with deterministic pattern - a geometric perspective. IEEE Transactions on Signal Processing, 67:1088–1103, 2019.

[7] Alexander Shapiro, Yao Xie, and Rui Zhang. Goodness-of-fit tests on manifolds. ArXiv, abs/1909.05229, 2019.

[8] Nicholas D. Sidiropoulos, Lieven De Lathauwer, Xiao Fu, Kejun Huang, Evangelos E. Papalexakis, and Christos Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13):3551–3582, 2017.

[9] Mohit Singh, Alexander Shapiro, and Rui Zhang. Rank one tensor completion problem. ArXiv, abs/2009.10533, 2020.

[10] Laurent Sorber, Marc Van Barel, and Lieven De Lathauwer. Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization. SIAM Journal on Optimization, 23(2):695–720, 2013.