A Higher-Order Generalized Singular Value Decomposition for Rank Deficient Matrices
IDRIS KEMPF†, PAUL J. GOULART†, AND STEPHEN R. DUNCAN†

Abstract.
The higher-order generalized singular value decomposition (HO-GSVD) is a matrix factorization technique that extends the GSVD to N ≥ 2 matrices A_i ∈ R^{m_i×n} as A_i = U_i Σ_i V^T, but requires that each of the matrices A_i has full column rank. We propose a reformulation of the HO-GSVD that extends its applicability to rank-deficient data matrices A_i. If the matrix of stacked A_i has full rank, we show that the properties of the original HO-GSVD extend to our reformulation. The HO-GSVD captures shared right singular vectors of the matrices A_i, and we show that our method also identifies directions that are unique to the image of a single matrix. We also extend our results to the higher-order cosine-sine decomposition (HO-CSD), which is closely related to the HO-GSVD. Our extension of the standard HO-GSVD allows its application to datasets with m_i < n, such as are encountered in bioinformatics, neuroscience, control theory or classification problems.

Key words.
Higher-Order Generalized Singular Value Decomposition, Higher-Order Cosine-Sine Decomposition, Diagonalization, Multimodal Data Fusion.
AMS subject classifications.
1. Introduction.
The generalized singular value decomposition (GSVD) [12] is an extension of the well-known singular value decomposition (SVD) to N = 2 matrices. The GSVD is concerned with the decomposition of two matrices, A_1 ∈ R^{m_1×n} and A_2 ∈ R^{m_2×n} with m_i ≥ n, and factorizes each of the matrices as A_i = U_i Σ_i V^T. The matrix of right basis vectors V ∈ R^{n×n}, with det V ≠ 0, is shared among the decompositions but is not an orthogonal matrix, in contrast to the standard SVD. The columns of the matrices U_i ∈ R^{m_i×m_i} are commonly referred to as left generalized singular vectors, each satisfying U_i^T U_i = I. The matrices Σ_i = diag(σ_{i,1}, ..., σ_{i,r}), with σ_{i,k} ≥ 0 and r = rank [A_1^T, A_2^T]^T, contain the generalized singular values. In contrast to an SVD, where the singular values represent an amplification in a particular direction, the generalized singular values measure the relative significance of the right basis vectors in the image of each A_i. If σ_{1,k} = σ_{2,k}, then v_k contributes equally to the images of A_1 and A_2.

The higher-order GSVD (HO-GSVD) [11] is an extension of the GSVD to N ≥ 2 matrices A_1, ..., A_N. The HO-GSVD decomposes each A_i as

    A_i = U_i Σ_i V^T,    (1.1)

where U_i ∈ R^{m_i×n}, Σ_i ∈ R^{n×n}, and V ∈ R^{n×n} with det V ≠ 0 is shared among all factorizations. The matrix V is obtained from the eigensystem S_π V = V Σ, where Σ := diag(ς_1, ..., ς_n) and S_π is the arithmetic mean of all pairwise quotients D_{i,π} D_{j,π}^{-1},

    S_π := (1/(N(N−1))) Σ_{i=1}^N Σ_{j=i+1}^N ( D_{i,π} D_{j,π}^{-1} + D_{j,π} D_{i,π}^{-1} ),    (1.2)

with D_{i,π} defined as

    D_{i,π} := A_i^T A_i + π A^T A,    (1.3)

where A := [A_1^T, ..., A_N^T]^T. The standard HO-GSVD framework [11] corresponds to the case π = 0. We introduce the term π A^T A with π > 0 to accommodate rank-deficient A_i. As in the case of the GSVD, the columns of the matrices U_i are the left generalized singular vectors and the diagonal matrices Σ_i contain the generalized singular values.
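As a concrete illustration of (1.2)–(1.3), consider the following hypothetical numerical sketch (assuming NumPy; the data and names are ours, not from the paper): for a rank-deficient A_i, the quotient term with π = 0 is singular, while any π > 0 renders every D_{i,π} positive definite and S_π well defined.

```python
import numpy as np

# Hypothetical data: A_1 has m_1 = 2 < n = 4, so A_1^T A_1 is singular.
rng = np.random.default_rng(0)
n = 4
A1 = rng.standard_normal((2, n))
A2 = rng.standard_normal((6, n))
A = np.vstack([A1, A2])                       # stacked matrix, rank A = n

def D(Ai, pi):
    """Quotient term D_{i,pi} = A_i^T A_i + pi * A^T A from (1.3)."""
    return Ai.T @ Ai + pi * (A.T @ A)

assert np.linalg.matrix_rank(D(A1, 0.0)) < n  # pi = 0: D_1 not invertible
pi = 0.1
for Ai in (A1, A2):                           # pi > 0: positive definite
    assert np.all(np.linalg.eigvalsh(D(Ai, pi)) > 0)

# arithmetic mean of pairwise quotients (1.2) for N = 2
D1, D2 = D(A1, pi), D(A2, pi)
S_pi = 0.5 * (D1 @ np.linalg.inv(D2) + D2 @ np.linalg.inv(D1))
```

For a full-rank stacked A, the eigenvalues of S_pi are real and at least 1 (cf. Section 2).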
It can be shown that the GSVD is a special case of the HO-GSVD with N = 2, and that the standard SVD of A_j can be obtained by setting A_i = I for i ≠ j and N ≥ 2. For π = 0, it was shown in [11] that the subspace associated with the unit eigenvalues of S_π forms a common HO-GSVD subspace, and we show that this property is preserved for π > 0. This subspace

∗Submitted February 22, 2021.
Funding:
The research leading to these results is supported by Diamond Light Source and the Engineering and Physical Sciences Research Council (EPSRC) with a CASE studentship.
†Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK ([email protected], [email protected], [email protected]).
is spanned by the right basis vectors v_k that are of equal significance in all A_i and A_j and are therefore associated with the generalized singular values for which σ_{i,k} = σ_{j,k}.

The HO-GSVD is a technique that can be attributed to the field of multimodal data fusion [10], which aims to identify common features across multiple data sets that describe related phenomena. The HO-GSVD can be compared to other matrix or tensor decompositions, such as parallel factor analysis (PARAFAC [6] or PARAFAC2 [7]), the multilinear SVD [2] or multilinear principal component analysis [4]. On the one hand, some of these decompositions allow constraints to be imposed on the factor matrices U_i, Σ_i and V in (1.1), such as orthogonality or rank conditions, but the matrices A_i are then only approximated as A_i ≈ U_i Σ_i V^T by minimizing Σ_{i=1}^N ‖A_i − U_i Σ_i V^T‖_F. On the other hand, tensor decompositions require that the matrices A_i share the same dimensions, e.g. stacking A_1, ..., A_N into a third-order tensor requires that all matrices A_i have dimensions m × n, which imposes constraints on the data acquisition. In contrast, the HO-GSVD is an exact matrix factorization, so that A_i = U_i Σ_i V^T for i = 1, ..., N, and it can accommodate A_i ∈ R^{m_i×n} with different row dimensions m_i, although no constraints are imposed on the factor matrices.

One shortcoming of the original HO-GSVD framework [11] is that the arithmetic mean (1.2) is only well defined for matrices A_i that have full column rank. If rank A_i < n or m_i < n for some i, then the inverse (A_i^T A_i)^{-1} does not exist, and so S in (1.2) is not well defined. In addition, computing (1.2) may be inaccurate when one or more of the A_i have small singular values.
In Section 7, we describe several applications in which the problem data generally has m_i < n.

To extend the HO-GSVD framework to rank-deficient A_i, we introduce the term π A^T A with parameter π > 0 in (1.3). If the matrix A of stacked A_i has full rank, this has the effect of shifting the eigenvalues of each D_{i,π}, so that the terms D_{i,π} are guaranteed to be invertible and the HO-GSVD can be computed for A_i with arbitrary rank. When all A_i have full column rank, we show that S_π with π > 0 and S (with π = 0) both capture the common subspaces of A_1, ..., A_N, so that our reformulation is equivalent to the original HO-GSVD [11]. We introduce the notion of an isolated HO-GSVD subspace that accounts for the fact that a rank-deficient A_i can have a non-empty (right) nullspace. An isolated HO-GSVD subspace captures directions that are in the kernel of all but one A_i.

The GSVD is closely related to the (thin) cosine-sine decomposition (CSD) [3, Ch. 2.5.4]. In essence, the CSD states that the SVDs of Q_1 ∈ R^{m_1×n} and Q_2 ∈ R^{m_2×n} satisfying Q_1^T Q_1 + Q_2^T Q_2 = I share the same matrix of right singular vectors. The term CSD originates from the fact that the singular values µ_{i,k} of Q_1 and Q_2 satisfy µ_{1,k}² + µ_{2,k}² = 1. The GSVD can be obtained by applying a CSD to the matrices Q_1 and Q_2 that are obtained from the thin QR factorization of the stacked matrices [A_1^T, A_2^T]^T = QR, where Q is conformably partitioned such that A_i = Q_i R. Analogous to the GSVD and the CSD, the HO-GSVD is closely related to the higher-order CSD (HO-CSD) [13]. The HO-GSVD of N matrices A_i can be obtained from the HO-CSD of Q_1, ..., Q_N that are obtained from the thin QR factorization of the stacked matrices [A_1^T, ..., A_N^T]^T. As in the case of the HO-GSVD, the computation of the HO-CSD proposed in [13] is limited to the case that all Q_i have full rank.
In this paper, we also propose to compute the HO-CSD in a different way, which allows for the factorization of rank-deficient Q_i satisfying Q_1^T Q_1 + · · · + Q_N^T Q_N = I.

The paper is organized as follows. Section 2 presents the HO-CSD and the HO-GSVD that are applicable to rank-deficient matrices. In Section 3, we extend the notion of common HO-CSD and HO-GSVD subspaces to rank-deficient matrices. In Section 4, the canonical decompositions are derived by applying the results from Section 3 to the decompositions from Section 2. Finally, we relate our findings to existing methods in Section 5 and provide a numerical example in Section 6.
2. Main results.
Given N matrices A_i ∈ R^{m_i×n}, let A denote the matrix of stacked A_i and QR = A its thin QR factorization,

    A = [A_1^T, ..., A_N^T]^T = QR = [Q_1^T, ..., Q_N^T]^T R,    Q_i ∈ R^{m_i×n},  R ∈ R^{n×n}.    (2.1)

The matrices A_i = Q_i R can individually have arbitrary rank, but we assume that rank A = n, so that det R ≠ 0. Because Q^T Q = I, it holds that Σ_{i=1}^N Q_i^T Q_i = I and ‖Q_i‖ ≤ 1. The quotient terms D_{i,π} of the arithmetic mean S_π (1.2) can be rewritten as

    D_{i,π} = A_i^T A_i + π A^T A = R^T ( Q_i^T Q_i + π I ) R,    (2.2)

with parameter π > 0. Since A_i^T A_i ⪰ 0 and π A^T A ≻
0, the terms D i,π are guaranteed to be invertible. HIGHER-ORDER GSVD FOR RANK DEFICIENT MATRICES T π , as T π := 1 N N X i =1 (cid:0) Q T i Q i + πI (cid:1) . (2.3)The eigensystem of T π will lead us to the HO-CSD of the matrices Q i . It can be shown (Appendix A) that S π and T π are related by: R T S π R T = 1 N − πN ) T π − I ) . (2.4) Theorem
The matrix T_π is non-defective. There exists Z ∈ R^{n×n} such that Z^T T_π Z = diag(τ_1, ..., τ_n) with Z^T Z = I, where the columns of Z are eigenvectors of T_π and the eigenvalues τ_i of T_π satisfy

    τ_i ∈ [ N/(1 + πN),  (πN + N − 1)/(πN(1 + π)) ].

Proof.
From (2.3), the matrix T_π is real and symmetric, hence non-defective with an orthonormal eigenbasis [3, Ch. 8.1]. For the bounds on τ_i, see Appendix B.

Theorem 2.2.
The matrix S_π is non-defective. There exists an invertible V ∈ R^{n×n} such that V^{-1} S_π V = diag(ς_1, ..., ς_n), where the columns of V are eigenvectors of S_π and the eigenvalues ς_i satisfy

    ς_i ∈ [ 1,  1 + 1/(πN(1 + π)) ].

Proof.
Pre- and post-multiplying (2.4) with Z^T and Z from Thm. 2.1 yields

    Z^T R^{-T} S_π R^{T} Z = (1/(N − 1)) ( (1 + πN) Z^T T_π Z − I ).

Since Z^T T_π Z = diag(τ_1, ..., τ_n), the matrix Z^T R^{-T} S_π R^{T} Z is diagonal. Set V := R^T Z, which is invertible because det R ≠ 0 and Z^T Z = I; then the columns of V are eigenvectors of S_π associated with the eigenvalues ς_i = ((1 + πN) τ_i − 1)/(N − 1), and the bounds on ς_i are obtained from the bounds on τ_i.

The significance of Thm. 2.1 and 2.2 is that the diagonalizable matrices T_π and S_π have eigenvalues that are bounded away from zero and contained in finite intervals, in contrast to the formulation [11] that requires the full-rank condition, which corresponds to π = 0. More precisely, the range of eigenvalues of S_π is contracted from [1, ∞) for the full-rank case to [1, 1 + 1/(πN(1 + π))] in our case, which bounds the condition number as κ(S_π) ≤ 1 + 1/(πN(1 + π)).

Before examining the eigenvalues of S_π and T_π further, we state our versions of the HO-CSD and HO-GSVD. The HO-CSD and HO-GSVD have already been described in [13] and [11], respectively, but our modified D_{i,π} from (2.2) allows us to omit the requirement that the A_i and Q_i have full rank.

Definition 2.3. If Σ_{i=1}^N Q_i^T Q_i = I, then the HO-CSD of Q_i ∈ R^{m_i×n} is given by

    Q_i = U_i Σ_i Z^T,  i = 1, ..., N,

where Z ∈ R^{n×n} is such that Z^T Z = I and Z^T T_π Z = diag(τ_1, ..., τ_n), with T_π defined as in (2.3). The matrices Σ_i ∈ R^{n×n} with Σ_i = diag(σ_{i,1}, ..., σ_{i,n}) ⪰ 0 are obtained from

    B_i := Q_i Z,  B_i = [b_{i,1}, ..., b_{i,n}],  σ_{i,k} = ‖b_{i,k}‖,

and U_i ∈ R^{m_i×n} with U_i = [u_{i,1}, ..., u_{i,n}] from

    u_{i,k} = b_{i,k}/σ_{i,k}  if σ_{i,k} > 0,    u_{i,k} = u for any u ∈ R^{m_i} with ‖u‖ = 1  if σ_{i,k} = 0.

Because we allow for rank Q_i < n, it is possible that Q_i z_k = 0 for some eigenvector z_k of T_π, making the corresponding generalized singular value σ_{i,k} = 0.
In these cases, the column u_{i,k} can be chosen freely, or the corresponding row of Σ_i can be dropped. The columns of U_i have unit 2-norm and are, under certain circumstances, orthogonal (see Section 3).
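The construction in Def. 2.3 can be sketched in a few lines of NumPy (a hypothetical illustration on random data; the variable names are ours):

```python
import numpy as np

# Sketch of Def. 2.3: Q_i from the thin QR of stacked (hypothetical) A_i,
# T_pi from (2.3), and the exact factorization Q_i = U_i Sigma_i Z^T.
rng = np.random.default_rng(2)
n, pi = 3, 0.25
rows = [2, 3, 4]                               # m_1 = 2 < n is allowed
A = rng.standard_normal((sum(rows), n))
Q, _ = np.linalg.qr(A)                         # sum_i Q_i^T Q_i = I
Qs = np.split(Q, np.cumsum(rows)[:-1])
N = len(Qs)

T = sum(np.linalg.inv(Qi.T @ Qi + pi * np.eye(n)) for Qi in Qs) / N  # (2.3)
tau, Z = np.linalg.eigh(T)                     # Z^T Z = I

for Qi in Qs:
    B = Qi @ Z                                 # B_i = Q_i Z
    sigma = np.linalg.norm(B, axis=0)          # sigma_{i,k} = ||b_{i,k}||
    U = B / sigma                              # generic data: all sigma > 0;
                                               # sigma = 0 would instead allow
                                               # a free unit column u_{i,k}
    assert np.allclose(Qi, U @ np.diag(sigma) @ Z.T)
```

Note that the factorization is exact for every Q_i, including the block with m_1 < n.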
Definition 2.4.
If the matrix A in (2.1) has full column rank, then the HO-GSVD of the N matrices A_1, ..., A_N is given by

    A_i = U_i Σ_i V^T,  i = 1, ..., N,

where V ∈ R^{n×n} is such that V^{-1} S_π V = diag(ς_1, ..., ς_n), with S_π defined in (1.2) using (2.2). The matrices Σ_i ∈ R^{n×n} with Σ_i = diag(σ_{i,1}, ..., σ_{i,n}) ⪰ 0 are obtained from

    A_i^T =: V B_i^T,  B_i = [b_{i,1}, ..., b_{i,n}],  σ_{i,k} = ‖b_{i,k}‖,

and U_i ∈ R^{m_i×n} with U_i = [u_{i,1}, ..., u_{i,n}] from

    u_{i,k} = b_{i,k}/σ_{i,k}  if σ_{i,k} > 0,    u_{i,k} = u for any u ∈ R^{m_i} with ‖u‖ = 1  if σ_{i,k} = 0.
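Def. 2.4 can likewise be sketched numerically (hypothetical data, assuming NumPy); the relation (2.4) between S_π and T_π is verified along the way, and the factorization is exact even though A_1 is rank deficient:

```python
import numpy as np

# Sketch of Def. 2.4 with a rank-deficient A_1 (m_1 < n): V = R^T Z from the
# eigensystem of T_pi, then A_i = U_i Sigma_i V^T exactly.
rng = np.random.default_rng(3)
n, pi = 4, 0.1
rows = [2, 5, 6]                                  # m_1 = 2 < n
As = [rng.standard_normal((m, n)) for m in rows]
A = np.vstack(As)
assert np.linalg.matrix_rank(A) == n              # stacked A has full rank

Q, R = np.linalg.qr(A)
Qs = np.split(Q, np.cumsum(rows)[:-1])
N = len(As)
T = sum(np.linalg.inv(Qi.T @ Qi + pi * np.eye(n)) for Qi in Qs) / N
_, Z = np.linalg.eigh(T)
V = R.T @ Z                                       # shared, invertible factor

# check relation (2.4): R^{-T} S_pi R^T = ((1 + pi N) T_pi - I)/(N - 1)
Ds = [Ai.T @ Ai + pi * (A.T @ A) for Ai in As]
S = sum(Ds[i] @ np.linalg.inv(Ds[j]) + Ds[j] @ np.linalg.inv(Ds[i])
        for i in range(N) for j in range(i + 1, N)) / (N * (N - 1))
lhs = np.linalg.solve(R.T, S) @ R.T
assert np.allclose(lhs, ((1 + pi * N) * T - np.eye(n)) / (N - 1))

for Ai in As:
    B = np.linalg.solve(V, Ai.T).T                # B_i^T = V^{-1} A_i^T
    sigma = np.linalg.norm(B, axis=0)
    U = B / sigma                                 # generic data: sigma > 0
    assert np.allclose(Ai, U @ np.diag(sigma) @ V.T)
```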
3. Common and isolated subspaces.
The HO-CSD and HO-GSVD identify directions, corresponding to columns of Z and V, that contribute equally to the images of each Q_i and A_i, respectively. These directions are the right singular vectors that are shared across each Q_i and A_i and associated with identical singular values. The shared right singular vectors form subspaces [11, 13], which are referred to as the common HO-CSD and HO-GSVD subspaces. However, these have been defined for rank Q_i = n and rank A_i = n. The following two theorems extend the notion of common subspaces to rank-deficient matrices.

Theorem 3.1.
Suppose the assumptions on Q_i from Def. 2.3 hold. The common HO-CSD subspace is defined as

    T_N{Q_1, ..., Q_N} := { z ∈ R^n | T_π z = τ_min z },

and the isolated HO-CSD subspace as

    T_1{Q_1, ..., Q_N} := { z ∈ R^n | T_π z = τ_max z },

where τ_min and τ_max correspond to the left and right boundaries on the eigenvalues of T_π defined in Thm. 2.1. A vector z ∈ T_N is a right singular vector for each Q_i and associated with a singular value equal to 1/√N. A vector z ∈ T_1 is in the nullspace of all but one Q_i.

Proof. See Appendix B.

Note that for an arbitrary set of matrices Q_1, ..., Q_N, the subspaces defined in Thm. 3.1 are likely to be empty. However, as conjectured in [11], the eigenvectors associated with eigenvalues of T_π that lie close to the boundaries τ_min or τ_max are likely to be directions that are equally amplified across all Q_i or singular to one particular Q_i.

Theorem 3.2.
Suppose the assumptions on A_i from Def. 2.4 hold. The common HO-GSVD subspace is defined as

    S_N{A_1, ..., A_N} := { v ∈ R^n | S_π v = ς_min v },

and the isolated HO-GSVD subspace as

    S_1{A_1, ..., A_N} := { v ∈ R^n | S_π v = ς_max v },

where ς_min and ς_max correspond to the left and right boundaries on the eigenvalues of S_π defined in Thm. 2.2. A vector v ∈ S_N is a right singular vector for each A_i and associated with an identical singular value. A vector v ∈ S_1 is in the nullspace of all but one A_i.

Proof. Follows from Thm. 2.2 and 3.1, since if z is an eigenvector of T_π, then v = R^T z is an eigenvector of S_π. Therefore, if z is a right singular vector for each Q_i associated with an identical singular value, then v is a right singular vector for each A_i associated with an identical singular value.
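A minimal numerical illustration of the isolated subspace (hand-built hypothetical data, assuming NumPy): with e_1 in the kernel of all but one block, the upper bound τ_max of Thm. 2.1 is attained exactly, as predicted by Thm. 3.1.

```python
import numpy as np

pi, N, n = 0.5, 3, 2
A = np.array([[1.0, 0.0],    # A_1: e_1 is not in ker A_1 ...
              [0.0, 1.0],    # A_2: ... but lies in ker A_2
              [0.0, 2.0]])   # A_3: ... and in ker A_3
Q, R = np.linalg.qr(A)
Qs = np.split(Q, [1, 2])
T = sum(np.linalg.inv(Qi.T @ Qi + pi * np.eye(n)) for Qi in Qs) / N
tau, Z = np.linalg.eigh(T)

tau_max = (pi*N + N - 1) / (pi*N*(1 + pi))    # upper bound from Thm. 2.1
assert np.isclose(tau.max(), tau_max)         # bound attained ...
assert np.allclose(np.abs(Z[:, np.argmax(tau)]), [1.0, 0.0])  # ... by e_1
```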
4. Canonical Forms.
The common subspaces capture the right singular vectors that are shared among matrices and have equal amplifications. For the HO-CSD and HO-GSVD, these directions are given by the eigenvectors of T_π and S_π, respectively, associated with a particular eigenvalue. It turns out that these directions produce left basis vectors, given by the columns of U_i in Def. 2.3 and 2.4, that are orthogonal to all other columns of U_i. The following corollaries consider this additional property.

Corollary 4.1.
Suppose that the assumptions on Q_i from Def. 2.3 hold, let Z^T T_π Z = diag(τ_1, ..., τ_n) be the Schur decomposition of T_π with Z = [z_1, ..., z_n], and let T_N{Q_1, ..., Q_N} = span{z_1, ..., z_p} be the common HO-CSD subspace. Then, using Def. 2.3, Q_i = U_i Σ_i Z^T with

    U_i = [ U_i^c | U_i^u ],  Z = [ Z^c | Z^u ],  Σ_i = [ (1/√N) I_p  0 ;  0  Σ_i^u ],

where Σ_i^u is diagonal, U_i^c ∈ R^{m_i×p}, U_i^u ∈ R^{m_i×(n−p)}, Z^c ∈ R^{n×p} and Z^u ∈ R^{n×(n−p)}. The columns of each U_i^c are orthonormal and (U_i^c)^T U_i^u = 0. If, in addition, one of the n − p column vectors z_k of Z^u is such that z_k ∈ T_1{Q_1, ..., Q_N}, then the corresponding column u_{i,k} of each U_i^u can be chosen to be orthogonal to all other columns of U_i, and the corresponding generalized singular value σ_{i,k} is either zero or equal to 1.

Proof. From Thm. 3.1, a vector z_k ∈ T_N{Q_1, ..., Q_N} is a right singular vector for each Q_i associated with a (generalized) singular value σ_{i,k} = 1/√N, whereas a vector z_k ∈ T_1{Q_1, ..., Q_N} is a right singular vector for each Q_i associated with a (generalized) singular value σ_{i,k} that is either zero or equal to 1. For z_k ∈ T_N{Q_1, ..., Q_N}, let b_{i,k} = Q_i z_k denote the columns of the matrix B_i from Def. 2.3, and note that Q_i^T Q_i z_k = σ_{i,k}² z_k. Then for k ≠ j and each Q_i,

    b_{i,k}^T b_{i,j} = (Q_i z_k)^T Q_i z_j = (Q_i^T Q_i z_k)^T z_j = σ_{i,k}² z_k^T z_j = 0.

The property (U_i^c)^T U_i^u = 0 follows from Def. 2.3. For z_k ∈ T_1{Q_1, ..., Q_N}, Def. 2.3 allows a free choice of the corresponding column u_{i,k}. Choose this particular column to be the (standard) left singular vector associated with a singular value that is either zero or 1. The orthogonality property follows.

Corollary 4.2.
Suppose that the assumptions on A_i from Def. 2.4 hold, let V^{-1} S_π V = diag(ς_1, ..., ς_n) be the eigendecomposition of S_π with V = [v_1, ..., v_n], and let S_N{A_1, ..., A_N} = span{v_1, ..., v_p} be the common HO-GSVD subspace. Then, using Def. 2.4, A_i = U_i Σ_i V^T with

    U_i = [ U_i^c | U_i^u ],  V = [ V^c | V^u ],  Σ_i = [ (1/√N) I_p  0 ;  0  Σ_i^u ],

where Σ_i^u is diagonal. The columns of each U_i^c are orthonormal and (U_i^c)^T U_i^u = 0. If, in addition, one of the n − p column vectors v_k of V^u is such that v_k ∈ S_1{A_1, ..., A_N}, then the corresponding column u_{i,k} of each U_i^u is orthogonal to all other columns of U_i, and the corresponding generalized singular value σ_{i,k} is either zero or equal to 1.

Proof. Follows directly from Corollary 4.1 and v_k = R^T z_k.
5. Comparison with standard HO-GSVD, GSVD and SVD.
When one out of two matrices is the identity matrix, the GSVD reduces to the standard SVD [12]. The same has been shown for the full-rank HO-GSVD [11]: when N − 1 of the A_i are identity matrices, the full-rank HO-GSVD reverts to the standard SVD of A_j, j ≠ i. Here, this fact is demonstrated for our HO-GSVD as given in Def. 2.4.

Lemma 5.1.
Suppose that N = 2, A_1 ∈ R^{m_1×n}, A_2 = I_n and rank [A_1^T, A_2^T]^T = n. Then the HO-GSVD of A_1 and A_2 yields the standard SVD of A_1.

Proof. See Appendix C.
Theorem 5.2. Suppose that N > 2 and that N − 1 of the matrices satisfy A_i = I. Then the HO-GSVD yields the SVD of A_j, j ≠ i.

Proof. See Appendix D.

The HO-GSVD from Def. 2.4 can also be related to the GSVD. For the special case that N = 2, A_1 ∈ R^{m_1×n} with m_1 ≥ n and rank A_1 = n, and an arbitrary A_2 ∈ R^{m_2×n}, it can be shown that the HO-GSVD yields Σ_i with Σ_1^T Σ_1 + Σ_2^T Σ_2 = I and orthonormal U_1 and U_2.

Lemma 5.3.
Let N = 2. Then T_π from Def. 2.3, the terms (Q_i^T Q_i + πI)^{-1}, and the matrices Q_i^T Q_i, i = 1, 2, satisfying Q_1^T Q_1 + Q_2^T Q_2 = I, share the same eigenspace for any π ∈ R_+.

Proof. See Appendix E.
Theorem 5.4. For N = 2, the HO-CSD from Def. 2.3 yields the CSD.

Proof. Using Lemma 5.3, the eigenvectors z_k of T_π can be chosen such that they are right singular vectors for Q_1 and Q_2. Let b_{i,k} denote the columns of B_i = Q_i Z; then for j ≠ k,

    b_{i,k}^T b_{i,j} = (Q_i z_k)^T Q_i z_j = (σ̄_{i,k} ū_{i,k})^T σ̄_{i,j} ū_{i,j} = 0,

where σ̄ and ū denote standard singular values and left singular vectors, respectively. Hence, from U_i Σ_i = B_i, the columns of U_i are either zero or orthonormal. Substituting Q_i = U_i Σ_i Z^T in Q_1^T Q_1 + Q_2^T Q_2 = I yields

    Z Σ_1^T Σ_1 Z^T + Z Σ_2^T Σ_2 Z^T = I,

and from Z^T Z = I it follows that Σ_1^T Σ_1 + Σ_2^T Σ_2 = I.

Corollary 5.5.
For N = 2, A_1 ∈ R^{m_1×n} with m_1 ≥ n and rank A_1 = n, and an arbitrary A_2 ∈ R^{m_2×n}, the HO-GSVD from Def. 2.4 yields the GSVD [12].

Proof. Follows from Thm. 5.4 with V = R^T Z.
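A numerical sketch of Thm. 5.4 (hypothetical data, assuming NumPy). The matrices are chosen so that Q_1^T Q_1 has well-separated eigenvalues; as discussed in the remark below, repeated eigenvalues of T_π would make the computed eigenvectors, and hence the orthonormality of the U_i, ambiguous.

```python
import numpy as np

# Thm. 5.4 for N = 2: the HO-CSD yields the cosine-sine property
# Sigma_1^T Sigma_1 + Sigma_2^T Sigma_2 = I and orthonormal U_1, U_2.
n, pi = 3, 0.3
c, s = np.cos(0.7), np.sin(0.7)
W = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])    # fixed rotation
A1 = np.diag([1.0, 2.0, 3.0]) @ W                      # hypothetical data
A2 = np.diag([4.0, 5.0, 6.0]) @ W
Q, R = np.linalg.qr(np.vstack([A1, A2]))
Q1, Q2 = np.split(Q, [3])
T = 0.5 * (np.linalg.inv(Q1.T @ Q1 + pi * np.eye(n))
           + np.linalg.inv(Q2.T @ Q2 + pi * np.eye(n)))
_, Z = np.linalg.eigh(T)                               # Z^T Z = I
s1 = np.linalg.norm(Q1 @ Z, axis=0)                    # gen. singular values
s2 = np.linalg.norm(Q2 @ Z, axis=0)
U1, U2 = (Q1 @ Z) / s1, (Q2 @ Z) / s2

assert np.allclose(s1**2 + s2**2, 1.0)                 # CS property
assert np.allclose(U1.T @ U1, np.eye(n))               # orthonormal columns
assert np.allclose(U2.T @ U2, np.eye(n))
```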
Remark 5.6. The matrices T_π, Q_1^T Q_1 and Q_2^T Q_2 share the same eigenspace, but not every eigendecomposition of T_π yields eigenvectors that are parallel to those of Q_1^T Q_1 and Q_2^T Q_2. For example, suppose that dim(ker(Q_i)) = 1 and that q_i ∈ ker(Q_i), i = 1, 2, are linearly independent. Note that from Q_1^T Q_1 + Q_2^T Q_2 = I, it holds that q_1^T q_2 = 0. It follows that dim(T_1) = 2, so that T_π has an eigenvalue with algebraic and geometric multiplicity of two. When the associated eigenvectors are computed using numerical software, they will not necessarily be parallel to q_1 and q_2, and the HO-CSD will not necessarily yield orthonormal matrices U_i.

The HO-GSVD from Def. 2.4 can also be compared with the full-rank HO-GSVD [11]. For N = 2 and full-rank matrices A_i, both HO-GSVDs have been shown to be equivalent to the GSVD. Both HO-GSVDs have also been shown to yield the SVD of A_j when A_i = I for i ≠ j. For N >
2, however, the HO-GSVD from Def. 2.4 and that of [11] will in general not yield identical factorizations A_i = U_i Σ_i V^T, even when rank A_i = n. This can be seen by comparing the eigenspaces of T_π from (2.3) for varying π, where π = 0 corresponds to the standard HO-CSD [13]. For N = 2, Lemma 5.3 shows that the eigenvectors of T_π are independent of the value of π, because they are fixed by the orthogonality property Q_1^T Q_1 + Q_2^T Q_2 = I, while for N > 2 the eigenvectors of T_π generally vary with π. From V = R^T Z, it follows that the same holds for the HO-GSVD. However, it can be shown that in case the matrices A_i and Q_i have full rank, the common HO-CSD and HO-GSVD subspaces will be the same for any value of π.

Lemma 5.7.
The common and isolated subspaces from Thm. 3.1 and 3.2 are independent of the value of π.

Proof. Follows from the definitions of the common and isolated subspaces (see Appendix B), which require that for z ∈ T_N and v ∈ S_N, z and v must be right singular vectors for each Q_i and A_i, respectively. This requirement is independent of the value of π.
6. Numerical Example.
To illustrate the common and isolated subspaces, we consider the following example with N = 3 matrices,

    A = [A_1^T, A_2^T, A_3^T]^T = [ 0 1 ; a_1 a_2 ; 1 0 ],    (a_1, a_2) = (1, 0) for case 1,  (a_1, a_2) = (1, 1) for case 2,    (6.1)

where rank A = 2. The matrix A_1 (A_3) has right singular vectors [1, 0]^T and [0, 1]^T associated with singular values 0 and 1 (1 and 0), respectively. For each of the cases, the matrix S_π from (1.2) is obtained as

    S_π |case 1 = diag( (12π² + 6π + 1)/(6π(2π + 1)),  (3π² + 3π + 1)/(3π(π + 1)) ),    S_π |case 2 = ((6π² + 4π + 1)/(2π(3π + 2))) I,    (6.2)

with eigenvalues that can be read from the diagonal.

For case 1, A_2 = A_3, and comparison with Thm. 3.2 shows that the eigenvector [0, 1]^T spans S_1 and is a common right singular vector for each A_i, associated with a zero singular value for A_2 and A_3 and with a singular value equal to 1 for A_1; the corresponding eigenvalue of S_π equals ς_max = 1 + 1/(3π(1 + π)).

For case 2, none of the A_i have common right singular vectors associated with identical singular values. Comparison with Thm. 3.2 shows that none of the eigenvalues of S_π correspond to a value that would yield a non-empty S_3 or S_1.
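The example (6.1) can be checked numerically (a sketch assuming NumPy; the helper `S_pi` below is ours):

```python
import numpy as np

# Numerical check of the example (6.1): N = 3 row matrices with n = 2.
def S_pi(As, pi):
    """Arithmetic mean of pairwise quotients (1.2) with terms (1.3)."""
    A = np.vstack(As)
    Ds = [Ai.T @ Ai + pi * (A.T @ A) for Ai in As]
    N = len(Ds)
    return sum(Ds[i] @ np.linalg.inv(Ds[j]) + Ds[j] @ np.linalg.inv(Ds[i])
               for i in range(N) for j in range(i + 1, N)) / (N * (N - 1))

pi = 0.5
A1, A3 = np.array([[0.0, 1.0]]), np.array([[1.0, 0.0]])
S1 = S_pi([A1, np.array([[1.0, 0.0]]), A3], pi)   # case 1: A_2 = A_3
S2 = S_pi([A1, np.array([[1.0, 1.0]]), A3], pi)   # case 2

# case 1: [0,1]^T spans the isolated subspace, attaining the upper bound
assert np.isclose(S1[1, 1], 1 + 1/(3*pi*(1 + pi)))
# case 2: S_pi is a multiple of the identity, strictly inside the bounds
assert np.allclose(S2, S2[0, 0] * np.eye(2))
assert 1 < S2[0, 0] < 1 + 1/(3*pi*(1 + pi))
```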
7. Applications.
The HO-GSVD has already been applied in various fields, and the following paragraphs outline applications for which the rank-deficient case is relevant.
Bioinformatics:
The (HO-)GSVD has been used for genomic signal processing [11, 15, 9]. In [11], the N > 2 matrices A_i represent measurements taken from different organisms, with the rows of A_i representing organism-specific genes and the columns time instants. Because the study includes several thousand genes but only a few time instants, the matrices have full column rank. In other studies, such as [15] or [17], the n columns of A_i are genes that are shared across different organisms. In these cases, the standard HO-GSVD cannot be applied because the matrices typically satisfy m_i < n. In [9], N = 2 vaccines are analyzed by comparing datasets A_i that represent n genes and measurements across m_i patients. These matrices also have dimensions m_i < n, and our HO-GSVD extension could be used to extend the study to N > 2.
Neuroscience:
In [16], a tensor decomposition has been used to factorize matrices obtained from N > 2 experimental trials. The n columns of A_i model a large number of neurons, while the m_i rows represent the time instants at which the measurements are taken. These matrices typically have more columns than rows, and the tensor decomposition additionally requires m_i = m_j for i, j = 1, ..., N. Our HO-GSVD extension could be used for such a factorization and, in contrast to the tensor decomposition, would allow for varying m_i across the trials.

Artificial Intelligence:
Tensor decompositions are widely used in artificial intelligence applications [4]. The decompositions serve to classify data or to extract useful information from vast amounts of data. In a classification problem, for example, the data is composed of m_i classified examples with n features that are shared among N data classes. For the case that m_i < n, our HO-GSVD extension could be used to identify shared and unshared features of the different data classes.

Control Theory:
In [8], the GSVD has been used to diagonalize the orbital dynamics of electrons in a synchrotron with N = 2 actuator (magnet) arrays. The actuator matrices typically have full row rank and fewer rows than columns. Our HO-GSVD extension could be used to diagonalize a system with N > 2 actuator arrays.
8. Conclusion.
In this paper, we have extended the standard HO-GSVD [11] to accommodate column-rank-deficient matrices. By adding the term π A^T A to each of the quotient terms D_{i,π} = A_i^T A_i + π A^T A, we shifted their eigenvalues and bounded them away from zero. This allowed us to omit the full-rank requirement on each A_i and to extend the notion of common subspaces to consider the right nullspace of each A_i.

The choice of adding a multiple of A^T A was motivated by the relationship between the arithmetic mean of pairwise quotients S_π and the mean of inverted terms T_π, which yields the same relationship as in [11] for π = 0. The parameter π was assumed to be positive, but otherwise left unspecified. The common HO-CSD and HO-GSVD subspaces are identified irrespective of the value of π. We could have chosen to define D_{i,π} as D_{i,π_i} = A_i^T A_i + π_i A^T A with a parameter π_i that is different for each term. The resulting relationship between S_π and T_π would have been similar to (2.4). An indexed parameter π_i, for example, could account for the case that the matrices A_i share a right singular vector v associated with different singular values, and an appropriate choice of π_i could make v an eigenvector of S_π. Future research could investigate the role of the weight π or the application of indexed parameters π_i.

The majority of our developments were based on the HO-CSD. Using the QR factorization of A = [A_1^T, ..., A_N^T]^T, each A_i was represented as A_i = Q_i R, and our findings were developed for Q_1, ..., Q_N, from which we inferred the corresponding properties for A_1, ..., A_N. This procedure allowed us to exploit the orthogonality property Σ_{i=1}^N Q_i^T Q_i = I and to ensure the invertibility of the factor matrix V in A_i = U_i Σ_i V^T, but required a full column-rank A, so that det R ≠ 0. For a column-rank-deficient A, the HO-GSVD remains applicable iff the D_{i,π} are invertible, but it is unclear whether the matrix V remains invertible.
The invertibility of V is a key requirement for the application of matrix factorizations, as it allows one to switch from the original space to the modal space. A further investigation could focus on elaborating properties of the arithmetic mean of pairwise quotients S_π that are not based on the HO-CSD.

We bounded the eigenvalues of T_π (S_π) and showed that the extremal eigenvalues are attained iff the corresponding eigenvectors are right singular vectors for each Q_i (A_i) associated with a particular singular value. This led to the definition of the common and isolated HO-CSD (HO-GSVD) subspaces. In Appendix B, we also showed that if the Q_i share a right singular vector v associated with a zero singular value for P matrices Q_i and with an identical singular value for the other N − P matrices Q_j, then T_π will have a particular eigenvalue τ(P) associated with the eigenvector v. Future research could investigate whether a biconditional ("iff") connection holds.

REFERENCES

[1]
G. P. Dada and A. Armaou, Ind. Eng. Chem. Process Des. Dev., 58 (2019), pp. 23201–23210.
[2] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[3] G. H. Golub and C. F. Van Loan, Matrix Computations, JHU Press, Baltimore, MD, USA, 4th ed., 2013.
[4] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, A survey of multilinear subspace learning for tensor data, Pattern Recognit., 44 (2011), pp. 1540–1551.
[5] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, CUP, Cambridge, UK, 1st ed., 1934.
[6] R. A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis, UCLA Work. Pap. Phon., 16 (1970), pp. 1–84.
[7] R. A. Harshman, PARAFAC2: Mathematical and technical notes, UCLA Work. Pap. Phon., 22 (1972), pp. 30–44.
[8] I. Kempf, S. R. Duncan, P. J. Goulart, and G. Rehm, Multi-array electron beam stabilization using block-circulant transformation and generalized singular value decomposition, in Proc. 59th CDC, Jeju Island, Republic of Korea, Dec. 2020.
[9] K. Van Deun, A. K. Smilde, L. Thorrez, H. A. L. Kiers, and I. Van Mechelen, Identifying common and distinctive processes underlying multiset data, Chemom. Intell. Lab. Syst., 129 (2013), pp. 40–51.
[10] D. Lahat, T. Adali, and C. Jutten, Multimodal data fusion: An overview of methods, challenges, and prospects, Proc. IEEE, 103 (2015), pp. 1449–1477.
[11] S. P. Ponnapalli, M. A. Saunders, C. F. Van Loan, and O. Alter, A higher-order generalized singular value decomposition for comparison of global mRNA expression from multiple organisms, PLOS ONE, 6 (2011), pp. 1–11.
[12] C. F. Van Loan, Generalizing the singular value decomposition, SIAM J. Numer. Anal., 13 (1976), pp. 76–83.
[13] C. F. Van Loan, Lecture 6. The higher-order generalized singular value decomposition.
[14] J. G. VanAntwerp, A. P. Featherstone, R. D. Braatz, and B. A. Ogunnaike, Cross-directional control of sheet and film processes, Automatica, 43 (2007), pp. 191–211.
[15] L. J. Van't Veer, D. Hongyue, M. J. Van de Vijver, Y. D. He, A. A. M. Hart, M. Mao, H. L. Peterse, K. Van der Kooy, M. J. Marton, A. T. Witteveen, G. J. Schreiber, R. M. Kerkhoven, C. Roberts, P. S. Linsley, R. Bernards, and S. H. Friend, Gene expression profiling predicts clinical outcome of breast cancer, Nature, 415 (2002), pp. 530–536.
[16] A. H. Williams, T. H. Kim, F. Wang, S. Vyas, S. I. Ryu, K. V. Shenoy, M. Schnitzer, T. G. Kolda, and S. Ganguli, Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis, Neuron, 98 (2018), pp. 1099–1115.
[17] B. Zhang and S. Horvath, A general framework for weighted gene co-expression network analysis, Stat. Appl. Genet. Mol. Biol., 4 (2005).
Appendix A. Relation between S_π and T_π. Let K_i := Q_i^T Q_i + πI, so that D_{i,π} = R^T K_i R by (2.2). The matrix S_π is written as

    S_π = (1/(N(N−1))) Σ_{i=1}^N Σ_{j=i+1}^N ( D_{i,π} D_{j,π}^{-1} + D_{j,π} D_{i,π}^{-1} ) = (1/(N(N−1))) R^T [ Σ_{i=1}^N Σ_{j=i+1}^N ( K_i K_j^{-1} + K_j K_i^{-1} ) ] R^{-T},

so that by considering Σ_{i=1}^N K_i = Σ_{i=1}^N (Q_i^T Q_i + πI) = (1 + πN) I and Σ_{j=1}^N K_j^{-1} = N T_π,

    R^{-T} S_π R^{T} = (1/(N(N−1))) Σ_{i=1}^N Σ_{j=i+1}^N ( K_i K_j^{-1} + K_j K_i^{-1} ) = (1/(N(N−1))) ( Σ_{i=1}^N K_i Σ_{j=1}^N K_j^{-1} − N I ) = (1/(N − 1)) ( (1 + πN) T_π − I ).

Appendix B. Characterizing T_π. The eigenspace of T_π defined in (2.3), with Σ_{i=1}^N Q_i^T Q_i = I and ‖Q_i‖ ≤ 1, captures directions that are shared across Q_1, ..., Q_N, i.e. common features of the matrices Q_i. By setting u = (Q_i^T Q_i + πI)^{1/2} t and v = (Q_i^T Q_i + πI)^{−1/2} t with ‖t‖ = 1, the Cauchy–Schwarz inequality (u^T v)² ≤ ‖u‖² ‖v‖² becomes

    t^T ( Q_i^T Q_i + πI )^{-1} t ≥ 1 / ( t^T ( Q_i^T Q_i + πI ) t ),    (B.1)

with equality iff t is an eigenvector of Q_i^T Q_i [5, Thm. 7]. Using (B.1) and the harmonic-mean arithmetic-mean (HM-AM) inequality [5, Thm. 16], a lower bound on t^T T_π t, ‖t‖ = 1, can be established as

    t^T T_π t = (1/N) Σ_{i=1}^N t^T ( Q_i^T Q_i + πI )^{-1} t ≥ (1/N) Σ_{i=1}^N 1 / ( t^T ( Q_i^T Q_i + πI ) t )    (B.2a)
              ≥ N / ( πN + Σ_{i=1}^N t^T ( Q_i^T Q_i ) t ) = N / ( πN + 1 ).    (B.2b)

From the Cauchy–Schwarz inequality, (B.2a) becomes an equality iff t is a common eigenvector for each Q_i^T Q_i + πI, or equivalently for each Q_i^T Q_i. From the HM-AM inequality, (B.2b) becomes an equality iff t^T (Q_i^T Q_i + πI) t = t^T (Q_j^T Q_j + πI) t for i, j = 1, ..., N. In this case, it follows that

    1 = t^T Q^T Q t = Σ_{i=1}^N t^T ( Q_i^T Q_i ) t = Σ_{i=1}^N µ_Q = N µ_Q,

so that the common eigenvalue of Q_i^T Q_i associated with the common eigenvector t is µ_Q = 1/N. The common vector t is therefore a common right singular vector for each Q_i with singular value σ_{i,k} = 1/√N. According to (B.2), t is an eigenvector of T_π associated with the smallest eigenvalue τ_i = N/(1 + πN).
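The attainment of the lower bound in (B.2) can be illustrated numerically (a hypothetical sketch assuming NumPy): when all blocks are equal, every direction is a common right singular vector with singular value 1/√N, and the spectrum of T_π collapses to N/(1 + πN).

```python
import numpy as np

# Sketch of the lower bound (B.2): with A_2 = A_1, each Q_i^T Q_i = I/2,
# so every eigenvalue of T_pi equals N/(1 + pi*N).
rng = np.random.default_rng(5)
n, pi, N = 3, 0.2, 2
A1 = rng.standard_normal((5, n))
A = np.vstack([A1, A1])                        # A_2 = A_1 (hypothetical data)
Q, R = np.linalg.qr(A)
Qs = np.split(Q, [5])
T = sum(np.linalg.inv(Qi.T @ Qi + pi * np.eye(n)) for Qi in Qs) / N
assert np.allclose(np.linalg.eigvalsh(T), N / (1 + pi * N))
```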
An upper bound on the eigenvalues of $T_\pi$ can be found by noticing that for each term, $t^T\left(Q_i^TQ_i + \pi I\right)^{-1}t \le 1/\pi$ for $\|t\| = 1$, which results in
\[
t^TT_\pi t < 1/\pi, \tag{B.3}
\]
so that the maximum eigenvalue of $T_\pi$ is bounded above by $1/\pi$. Note that the strict inequality in (B.3) holds because $\sum_{i=1}^{N}Q_i^TQ_i = I$ and $t$ cannot simultaneously be in the nullspace of each $Q_i$. The orthogonality condition $\sum_{i=1}^{N}Q_i^TQ_i = I$ leads to a tighter bound that is established in the following theorem.

Theorem B.1.
It holds that $\rho(T_\pi) \le (\pi N + N - 1)/(\pi N(1+\pi))$, where $T_\pi$ is defined in (2.3) and $\rho(T_\pi)$ denotes the maximum eigenvalue of $T_\pi$. Equality holds iff there exists a vector that lies in the kernel of all but one of the $Q_i^TQ_i$.

Proof. Let $\bar V_i\bar\Sigma_i\bar V_i^T = Q_i^TQ_i$ be the eigendecomposition of $Q_i^TQ_i$ and set $\bar\Sigma_i = \operatorname{diag}(\bar\sigma_{i,j})$, $j = 1, \ldots, n$, so that $\left(\bar\Sigma_i + \pi I\right)^{-1} = \operatorname{diag}\left((\bar\sigma_{i,j} + \pi)^{-1}\right)$. Let $t$ with $\|t\| = 1$ be an eigenvector of $T_\pi$ associated with $\rho(T_\pi)$; then
\[
\rho(T_\pi) = t^TT_\pi t = \frac{1}{N}\sum_{i=1}^{N} t^T\bar V_i\operatorname{diag}\left(\frac{1}{\bar\sigma_{i,1}+\pi}, \ldots, \frac{1}{\bar\sigma_{i,n}+\pi}\right)\bar V_i^Tt.
\]
For each $i$, express $t$ as $t = \sum_{j=1}^{n} a_{i,j}\bar v_{i,j}$, where $\bar v_{i,j}$ are the columns of the orthonormal matrices $\bar V_i$ and $a_{i,j} \in \mathbb{R}$ with $\sum_{j=1}^{n} a_{i,j}^2 = 1$ because $\|t\| = 1$. In addition, define $y_{i,j} := \bar\sigma_{i,j} + \pi \in [\pi, 1+\pi]$ and rewrite $\rho(T_\pi)$ as
\[
\rho(T_\pi) = t^TT_\pi t = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n}\frac{a_{i,j}^2}{y_{i,j}}, \qquad \sum_{j=1}^{n} a_{i,j}^2 = 1, \quad y_{i,j} \in [\pi, 1+\pi].
\]
Next, bound each of the terms $1/y_{i,j}$ by a function that is linear in $y_{i,j}$, i.e. $f(y_{i,j}) = my_{i,j} + c$ with $1/y_{i,j} \le f(y_{i,j})$ for $y_{i,j} \in [\pi, 1+\pi]$. Note that the function $1/y_{i,j}$ is convex on $y_{i,j} \in [\pi, 1+\pi]$, and $m$ and $c$ can be chosen such that equality holds at $y_{i,j} = \pi$ and $y_{i,j} = 1+\pi$, i.e. by selecting $m$ and $c$ as
\[
m = -\frac{1}{\pi(1+\pi)}, \qquad c = \frac{1+2\pi}{\pi(1+\pi)}.
\]
We obtain a bound on $\rho(T_\pi)$ as
\[
\rho(T_\pi) \le \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n} a_{i,j}^2\,\frac{-y_{i,j} + 1 + 2\pi}{\pi(1+\pi)}
           = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n}\frac{-a_{i,j}^2 y_{i,j}}{\pi(1+\pi)} + \frac{1+2\pi}{\pi(1+\pi)},
\]
where we used that $\sum_{j=1}^{n} a_{i,j}^2 = 1$. Clearly, equality holds iff the $y_{i,j}$ associated with non-zero $a_{i,j}$ are at the boundaries of $[\pi, 1+\pi]$. Finally, exploit the orthogonality condition $\sum_{i=1}^{N}Q_i^TQ_i = \sum_{i=1}^{N}\bar V_i\bar\Sigma_i\bar V_i^T = I$. By adding $N\pi I$, we can express the same condition using $y_{i,j}$ as $\sum_{i=1}^{N}\bar V_i\operatorname{diag}(y_{i,j})\bar V_i^T = (1+N\pi)I$.
Left- and right-multiplying the modified orthogonality condition by $t^T$ and $t$, respectively, and expressing $t$ using the columns $\bar v_{i,j}$ as above yields
\[
\sum_{i=1}^{N} t^T\bar V_i\operatorname{diag}(y_{i,j})\bar V_i^Tt = \sum_{i=1}^{N}\sum_{j=1}^{n} a_{i,j}^2 y_{i,j} = 1 + N\pi.
\]
Substituting in the upper bound on $\rho(T_\pi)$ yields
\[
\rho(T_\pi) \le -\frac{1+N\pi}{N\pi(1+\pi)} + \frac{1+2\pi}{\pi(1+\pi)} = \frac{1}{N}\left(\frac{N-1}{\pi} + \frac{1}{1+\pi}\right) = \frac{\pi N + N - 1}{\pi N(1+\pi)}.
\]
Equality holds iff each $y_{i,j} = \bar\sigma_{i,j} + \pi$ is at the boundaries of $[\pi, 1+\pi]$, i.e. either $y_{i,j} = \pi$, in which case $\bar\sigma_{i,j} = 0$, or $y_{i,j} = 1+\pi$, in which case $\bar\sigma_{i,j} = 1$. Note that the orthogonality condition requires at least one non-zero $\bar\sigma_{i,j}$. In addition, if $\bar\sigma_{i,j} = 1$ is associated with $\bar v_{i,j}$ for some $i$ and $j$, then the vector $\bar v_{i,j}$ must be shared among all $\bar V_i$ and associated with $\bar\sigma_{k,j} = 0$ for $k \neq i$, because otherwise $\bar v_{i,j}^T\left(\sum_{i=1}^{N}\bar V_i\bar\Sigma_i\bar V_i^T\right)\bar v_{i,j} > 1$. Hence, $\rho(T_\pi) = (\pi N + N - 1)/(\pi N(1+\pi))$ iff the $Q_i^TQ_i$ have a shared eigenvector associated with a zero eigenvalue for $N-1$ of the matrices.
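Theorem B.1 and its equality condition can be illustrated numerically. In the NumPy sketch below (matrix sizes, seed and $\pi$ are our own arbitrary choices), a generic partition of an orthonormal matrix stays strictly below the bound, while Gram matrices built with a direction in the kernel of all but one $Q_i^TQ_i$ attain it; since $T_\pi$ depends on the $Q_i$ only through their Gram matrices, we work with those directly:

```python
import numpy as np
from numpy.linalg import inv, qr, eigvalsh

n, N, pi_ = 4, 3, 0.25
bound = (pi_ * N + N - 1) / (pi_ * N * (1 + pi_))

def rho_T(grams):
    # Largest eigenvalue of T_pi = (1/N) sum_i (Q_i^T Q_i + pi I)^{-1}
    T = sum(inv(G + pi_ * np.eye(n)) for G in grams) / N
    return eigvalsh(T).max()

# Generic case: Gram matrices from the thin QR of a random stacked matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((10, n))
Q, _ = qr(A)
Qs = [Q[0:3], Q[3:6], Q[6:10]]           # sum_i Q_i^T Q_i = I
assert rho_T([Qi.T @ Qi for Qi in Qs]) <= bound + 1e-12

# Equality case: e_1 lies in the kernel of all but one Q_i^T Q_i
G1 = np.diag([1.0, 1/3, 1/3, 1/3])
G2 = np.diag([0.0, 1/3, 1/3, 1/3])
G3 = np.diag([0.0, 1/3, 1/3, 1/3])       # G1 + G2 + G3 = I
assert np.isclose(rho_T([G1, G2, G3]), bound)
```

The diagonal Gram matrices realize exactly the equality structure of the theorem: a shared eigenvector with eigenvalue 1 for one matrix and 0 for the other $N-1$.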
Finally, note that if there exists a vector $t$ with $\|t\| = 1$ in the nullspace of $P$ matrices $Q_j$, but in the range of all other $Q_i$ with index $i \in \mathcal{R}$, then the inequalities (B.2a)-(B.2b) can be reformulated as
\begin{align*}
t^TT_\pi t = \frac{1}{N}\sum_{i=1}^{N} t^T\left(Q_i^TQ_i + \pi I\right)^{-1}t
&= \frac{P}{\pi N} + \frac{1}{N}\sum_{i\in\mathcal{R}} t^T\left(Q_i^TQ_i + \pi I\right)^{-1}t \\
&\ge \frac{P}{\pi N} + \frac{1}{N}\sum_{i\in\mathcal{R}}\left(t^T(Q_i^TQ_i + \pi I)t\right)^{-1} \tag{B.4a}\\
&\ge \frac{P}{\pi N} + \frac{N-P}{N}\cdot\frac{N-P}{\pi(N-P) + \underbrace{\textstyle\sum_{i\in\mathcal{R}} t^TQ_i^TQ_it}_{=1}} = \frac{P(1-\pi N) + \pi N^2}{\pi N(1+\pi(N-P))}. \tag{B.4b}
\end{align*}
The term on the right-hand side of (B.4b) corresponds to the minimum and maximum eigenvalues of $T_\pi$ for $P = 0$ and $P = N-1$, respectively. If there exists a shared vector $t$ in the nullspace of $P$ matrices $Q_j$, but in the range of all other $Q_i$, then an eigenvalue of $T_\pi$ will be equal to the corresponding value on the right-hand side of (B.4b). Note that (B.4) does not prove the converse.

Appendix C. Proof of Lemma 5.1.
Let $\bar U_1\bar\Sigma_1\bar V_1^T = A_1$ be the standard SVD of $A_1$ with $\bar\Sigma_1 \in \mathbb{R}^{m_1\times n}$. Substitute $A_2 = I_n$ in (1.2) using (2.2):
\[
S_\pi = \frac{1}{2}\bar V_1\left(\left(\tilde\pi\bar\Sigma_1^T\bar\Sigma_1 + I\right)\left(\tilde\pi I + \bar\Sigma_1^T\bar\Sigma_1\right)^{-1} + \left(\tilde\pi I + \bar\Sigma_1^T\bar\Sigma_1\right)\left(\tilde\pi\bar\Sigma_1^T\bar\Sigma_1 + I\right)^{-1}\right)\bar V_1^T, \tag{C.1}
\]
where $\tilde\pi = (1+\pi)/\pi$. Therefore, $S_\pi = S_\pi^T \succ 0$, hence non-defective with an orthonormal eigenbasis $V = \bar V_1$. According to Def. 2.4, the HO-GSVD $A_1 = U_1\Sigma_1V^T$ is obtained from $VB_1^T = A_1^T$ and $U_1\Sigma_1 = B_1$. We show that $B_1$ has orthogonal columns. From the QR factorization (2.1), $A_2 = I = Q_2R$, so that $Q_2 = R^{-1}$, $I = Q_1^TQ_1 + Q_2^TQ_2 = Q_1^TQ_1 + R^{-T}R^{-1}$, and
\[
R^TR - I = R^TQ_1^TQ_1R = A_1^TA_1. \tag{C.2}
\]
Therefore, with $Z := RV$,
\[
B_1^TB_1 = V^TA_1^TA_1V = V^T\left(R^TR - I\right)V = Z^TZ - V^TV, \tag{C.3}
\]
where the columns of $Z$ are the orthogonal eigenvectors of $T_\pi = T_\pi^T$, so that the matrix on the right-hand side is diagonal. It follows that $B_1$ has orthogonal columns. If $A_1$ has $r = \operatorname{rank} A_1 < n$, then $n - r$ columns of $B_1$ will be zero and, from $U_1\Sigma_1 = B_1$, the associated columns of $U_1$ can be chosen freely. These can always be chosen to have unit length and such that $U_1$ has at least $m_1$ orthogonal columns. We obtain a factorization $A_1 = U_1\Sigma_1V^T$, where $\Sigma_1 \succeq 0$ and $V$ is orthonormal. If $U_1 \in \mathbb{R}^{m_1\times n}$ has more columns than rows, i.e. $m_1 < n$, we remove $n - m_1$ of the linearly dependent columns of $U_1$ and the corresponding rows of $\Sigma_1$, which are necessarily zero rows. The remaining part of $U_1$ is orthonormal; thus the HO-GSVD of $A_1$ must be the standard SVD of $A_1$ [11, Thm. S5].

Appendix D. Proof of Lemma 5.2.
Follows from Lemma 5.1 and from the fact that the matrices $M$ and $\tilde M := aM + bI$, where $a, b \in \mathbb{R}$, share the same eigenspace, so that all right singular vectors of $A_j$, with $A_i = I$ for $i \neq j$, are eigenvectors of $S_\pi$ as in (C.1). For $N-1$ matrices $A_i = I$, $i \neq j$, equation (C.2) becomes $A_j^TA_j = R^TR - (N-1)I$. From substituting $A_j^TA_j$ in equation (C.3), it can be seen that $B_j^TB_j$ is diagonal, i.e. $B_j$ has orthogonal columns.

Appendix E. Proof of Lemma 5.3.
Let $t$ be an eigenvector of $Q_1^TQ_1$ associated with eigenvalue $\lambda$. Then, using $Q_1^TQ_1 + Q_2^TQ_2 = I$,
\[
Q_2^TQ_2t = (I - Q_1^TQ_1)t = (1-\lambda)t,
\]
so that $t$ is an eigenvector of $Q_2^TQ_2$ associated with eigenvalue $1-\lambda$. Note that $t$ is also an eigenvector of $\left(Q_1^TQ_1 + \pi I\right)^{-1}$ and $\left(Q_2^TQ_2 + \pi I\right)^{-1}$ associated with eigenvalues $1/(\pi+\lambda)$ and $1/(1+\pi-\lambda)$, respectively. From (2.3),
\[
T_\pi t = \frac{1}{2}\left(\left(Q_1^TQ_1 + \pi I\right)^{-1} + \left(Q_2^TQ_2 + \pi I\right)^{-1}\right)t = \frac{1}{2}\left(\frac{1}{\pi+\lambda} + \frac{1}{1+\pi-\lambda}\right)t,
\]
i.e. $t$ is an eigenvector of $T_\pi$ associated with eigenvalue $(1+2\pi)/(2(\pi+\lambda)(1+\pi-\lambda))$.
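The eigenvalue formula of Lemma 5.3 can be verified numerically for $N = 2$: partition the orthonormal factor $Q$ of a random stacked matrix so that $Q_1^TQ_1 + Q_2^TQ_2 = I$, and compare the action of $T_\pi$ on the eigenvectors of $Q_1^TQ_1$ against $(1+2\pi)/(2(\pi+\lambda)(1+\pi-\lambda))$. A short NumPy sketch (dimensions, seed and $\pi$ are our arbitrary choices):

```python
import numpy as np
from numpy.linalg import inv, qr, eigh

rng = np.random.default_rng(2)
n, pi_ = 3, 0.5
A = rng.standard_normal((8, n))
Q, _ = qr(A)                       # Q has orthonormal columns
Q1, Q2 = Q[:4], Q[4:]              # Q1^T Q1 + Q2^T Q2 = I

lam, Vbar = eigh(Q1.T @ Q1)        # eigenpairs of Q1^T Q1
T = (inv(Q1.T @ Q1 + pi_ * np.eye(n)) + inv(Q2.T @ Q2 + pi_ * np.eye(n))) / 2

# Each eigenvector t of Q1^T Q1 (eigenvalue lam) is an eigenvector of
# T_pi with eigenvalue (1 + 2 pi)/(2 (pi + lam)(1 + pi - lam))
for lk, t in zip(lam, Vbar.T):
    tau = (1 + 2 * pi_) / (2 * (pi_ + lk) * (1 + pi_ - lk))
    assert np.allclose(T @ t, tau * t)
```

Because $T_\pi$ and $Q_1^TQ_1$ commute in this two-matrix setting, the check passes for every eigenpair, regardless of the partition sizes.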