Algebraic Methods for Tensor Data
Neriman Tokcan, Jonathan Gryak, Kayvan Najarian, and Harm Derksen

Abstract.
We develop algebraic methods for computations with tensor data. We give 3 applications: extracting features that are invariant under the orthogonal symmetries in each of the modes, approximation of the tensor spectral norm, and amplification of low rank tensor structure. We introduce colored Brauer diagrams, which are used for algebraic computations and in analyzing their computational complexity. We present numerical experiments whose results show that the performance of the alternating least squares algorithm for the low rank approximation of tensors can be improved using tensor amplification.
Key words. tensors, Brauer diagrams, representation theory, invariant theory.
AMS subject classifications.
1. Introduction.
Data in applications is often structured in higher dimensional arrays. Arrays of dimension d are also called d-way tensors, or tensors of order d. It is challenging to generalize methods for matrices, which are 2-dimensional arrays, to tensors of order 3 or higher. The notion of rank can be generalized from matrices to higher order tensors (see [13, 14]). Also, the spectral and nuclear norms are not only defined for matrices, but also for tensors of order ≥ 3. Our approach is based on classical invariant theory and on Brauer diagrams ([2, 8, 31]). Brauer diagrams are perfect matching graphs. We will discuss the background on Classical Invariant Theory and Brauer diagrams in Section 2. We will restrict ourselves to 3-way tensors. The techniques generalize to tensors of order ≥ 4, but some of the formulas become more complicated. To perform computations with 3-way tensors, we generalize the notion of Brauer diagrams to colored trivalent graphs called colored Brauer diagrams in Section 3.

In this paper we consider 3 applications of our algebraic approach, namely invariant tensor features from data, approximations of the spectral and nuclear norm, and tensor amplification. In Subsection 4.1, we introduce the norm ||T||_{σ,m} for even m ∈ N, whose m-th power is a polynomial in the entries of T, to approximate the spectral norm of 3-way tensors. In Subsection 4.2, we show that ||T||_{σ,2} is equal to the Euclidean norm (alternatively called the Frobenius or Hilbert–Schmidt norm). In Subsection 4.4, we introduce a further norm that approximates the spectral norm. The main results are explicit formulas for these norms in terms of colored Brauer diagrams (see Theorem 4.2 and Proposition 4.5), and a comparison between the spectral norm and these approximations (see Proposition 4.8). In Section 5, we study low rank amplification methods based on the approximations of the spectral norm. We employ these amplification methods to obtain better initial guesses for the CP-ALS method; an algorithm for the low rank approximation of 3-way tensors is given (Section 5.3, Algorithm 5.1). In Section 6, we compare the ALS tensor approximation based on tensor amplification initialization with random initialization. In our experiments, we see that the methods introduced in Section 5.3 give low rank r approximations (r = 1 in Subsection 6.1 and r = 2 in Subsection 6.2) with better fits and improved time efficiency compared to the CP-ALS method.

(Affiliations: Department of Mathematics, University of Michigan, Ann Arbor; The Eli and Edythe L. Broad Institute of MIT and Harvard, Cambridge, Massachusetts; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor; Department of Emergency Medicine, University of Michigan, Ann Arbor; Michigan Center for Integrative Research in Critical Care, University of Michigan, Ann Arbor.)
1.1. Notation. We will introduce the basic concepts and notation which lay the foundation for the rest of the paper. We borrow most of our notation from [7] and [15]. As we have stated before, tensors are multi-dimensional arrays. The order of a tensor is the number of its dimensions (ways, modes). Vectors are tensors of order 1 and matrices are tensors of order 2. We will refer to tensors of order 3 or higher as higher-order tensors. Vectors are denoted by lower case letters x ∈ R^p, matrices are denoted by capital letters X ∈ R^{p×q}, and higher-order tensors are denoted by capital calligraphic letters X ∈ R^{p_1×p_2×...×p_d}. The (i_1, i_2, ..., i_d)-th entry of the d-th order (d-way) tensor X is denoted by x_{i_1 i_2 ... i_d}.

The vector outer product of u ∈ R^p and v ∈ R^q is denoted by u ⊗ v, and it can be given as the matrix multiplication u v^T ∈ R^{p×q}. The inner product of two same-size tensors X, Y ∈ R^{p_1×p_2×...×p_d} is defined as follows:

(1.1)  X · Y = Σ_{i_1=1}^{p_1} Σ_{i_2=1}^{p_2} ... Σ_{i_d=1}^{p_d} x_{i_1 i_2 ... i_d} y_{i_1 i_2 ... i_d} ∈ R.

It follows that the norm of a tensor is the square root of the sum of the squares of all its elements:

(1.2)  ||X|| = ( Σ_{i_1=1}^{p_1} Σ_{i_2=1}^{p_2} ... Σ_{i_d=1}^{p_d} x_{i_1 i_2 ... i_d}^2 )^{1/2}.

This is analogous to the matrix Frobenius norm; see Section 1.3 for more details on the Frobenius norm.

A tensor S ∈ R^{p_1×p_2×...×p_d} is rank one if it can be written as an outer product of d vectors, i.e.,

S = u_1 ⊗ u_2 ⊗ ... ⊗ u_d,  u_i ≠ 0 ∈ R^{p_i},  1 ≤ i ≤ d.

Such rank one tensors are also called simple or pure. The best rank 1 approximation problem for a tensor T ∈ R^{p_1×p_2×...×p_d} can be stated as follows:

(1.3)  min_S ||T − S||,  where S is a rank one tensor in R^{p_1×p_2×...×p_d}.

The best rank 1 approximation problem is well-posed and NP-hard ([17, 28]). Different algebraic tools and algorithms have been proposed to find the global minimum of Problem (1.3) (see [28, 34]).

A tensor S ∈ R^{p_1×p_2×...×p_d} can be represented as a linear combination of rank 1 tensors:

(1.4)  S = Σ_{i=1}^{r} λ_i u_{1,i} ⊗ u_{2,i} ⊗ ... ⊗ u_{d,i},

where r is sufficiently large, λ_i ∈ R, and u_{j,i} ∈ R^{p_j} for 1 ≤ i ≤ r, 1 ≤ j ≤ d. The smallest such integer r is called the rank (real rank) of the tensor. The decomposition given in (1.4) is often referred to as the rank r decomposition, CP (Candecomp/Parafac), or Canonical Polyadic decomposition. Let U^{(j)} = [u_{j,1} u_{j,2} ... u_{j,r}] ∈ R^{p_j×r}, 1 ≤ j ≤ d. We call these matrices factor matrices. Then the CP decomposition factorizes a d-way tensor into d factor matrices and a vector Λ = [λ_1, λ_2, ..., λ_r] ∈ R^r. The decomposition in (1.4) can be concisely expressed as S = [[Λ; U^{(1)}, U^{(2)}, ..., U^{(d)}]]. As in (1.3), the best rank r approximation problem for a tensor T ∈ R^{p_1×p_2×...×p_d} can be given as:

(1.5)  min_{Λ, U^{(1)}, ..., U^{(d)}} ||T − S||,  where S = [[Λ; U^{(1)}, U^{(2)}, ..., U^{(d)}]].

The solution to Problem (1.5) does not always exist ([15, 29]). Alternating Least Squares (ALS) is the most common method used for the low rank approximation, since it is simple and easy to implement.
However, it has some limitations: convergence is slow in some cases, it is heavily dependent on the initial guess of the factor matrices, and it may not converge to a global minimum (see [15, 17] for more details on the CP decomposition and the ALS method). More details on the ALS algorithm for low rank approximation are given in Section 5.3.
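For concreteness, the CP format (1.4) and one ALS sweep for problem (1.5) can be sketched in numpy as follows. This is a minimal illustration for d = 3, not the paper's Algorithm 5.1; the helper names are ours:

```python
import numpy as np

def cp_reconstruct(lam, factors):
    # S = sum_i lam[i] * u_{1,i} ⊗ u_{2,i} ⊗ u_{3,i}, equation (1.4) for d = 3.
    U1, U2, U3 = factors
    return np.einsum('i,ai,bi,ci->abc', lam, U1, U2, U3)

def als_sweep(T, factors):
    # One alternating least squares pass for problem (1.5): update each factor
    # matrix in turn while the other two are fixed; each update is an ordinary
    # linear least squares problem against a flattening of T.
    U1, U2, U3 = factors
    p, q, r = T.shape
    K = np.einsum('bi,ci->bci', U2, U3).reshape(q * r, -1)   # Khatri-Rao of U2, U3
    U1 = np.linalg.lstsq(K, T.reshape(p, q * r).T, rcond=None)[0].T
    K = np.einsum('ai,ci->aci', U1, U3).reshape(p * r, -1)
    U2 = np.linalg.lstsq(K, T.transpose(1, 0, 2).reshape(q, p * r).T, rcond=None)[0].T
    K = np.einsum('ai,bi->abi', U1, U2).reshape(p * q, -1)
    U3 = np.linalg.lstsq(K, T.transpose(2, 0, 1).reshape(r, p * q).T, rcond=None)[0].T
    return [U1, U2, U3]
```

Iterating `als_sweep` from some initial guess is exactly the dependence on initialization discussed above; Section 5 is about producing better initial guesses.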
1.2. Invariant features. Let O_p(R) be the group of orthogonal p × p matrices. The group O_p(R) × O_q(R) acts on the space R^{p×q} of p × q matrices by left and right multiplication. A group element (B, C) ∈ O_p(R) × O_q(R) acts on a matrix A ∈ R^{p×q} by (B, C) · A = B A C^{-1}. The singular values λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_r(A) ≥ 0 of a p × q matrix A are features that are invariant under the actions of O_p(R) and O_q(R) on the rows and columns respectively. In other words, if B and C are orthogonal matrices, then λ_i(B A C^{-1}) = λ_i(A) for 1 ≤ i ≤ r. The function t_k given by

t_k(A) = trace((A A^T)^k) = λ_1(A)^{2k} + λ_2(A)^{2k} + ... + λ_r(A)^{2k},  k ≥ 1,

is invariant under the actions of O_p(R) and O_q(R). The invariant functions t_1(A), t_2(A), ... are polynomials in the entries of the matrix A. It is known that the set {t_k : k ≥ 1} generates the ring of polynomial invariants under the action of O_p(R) × O_q(R) on p × q matrices (see for example [8]; note that here we do not need any Pfaffians because we consider orthogonal groups and not special orthogonal groups). We will consider invariant features for 3-way tensors. Using classical invariant theory for the orthogonal group, we will describe polynomial tensor invariants in terms of colored Brauer diagrams. A similar approach to describing tensor invariants of orthogonal group actions can be found in the thesis [32].

1.3. Approximations of the spectral and nuclear norm. Important norms on the space of p × q matrices are the Frobenius norm (or the Euclidean ℓ_2-norm), the spectral norm (or operator norm), and the nuclear norm. These norms can be expressed in terms of the singular values of a matrix. If λ_1 ≥ λ_2 ≥ ... ≥ λ_r ≥ 0 are the singular values of A, then the Frobenius norm is ||A|| = ||A||_F = (λ_1^2 + λ_2^2 + ... + λ_r^2)^{1/2}, the spectral norm is ||A||_σ = λ_1, and the nuclear norm is ||A||_⋆ = λ_1 + λ_2 + ... + λ_r. The nuclear norm can be seen as a convex relaxation of the rank of a matrix.
It is used for example in some algorithms for the matrix completion problem, which asks to complete a partially filled matrix such that the resulting matrix has minimal rank ([4, 5]). This problem has applications to collaborative filtering (see [26]). The spectral and nuclear norms generalize to higher order tensors. Let T be a tensor of order d in R^{p_1×p_2×...×p_d}. We define its spectral norm by

(1.6)  ||T||_σ = sup { |T · u_1 ⊗ u_2 ⊗ ... ⊗ u_d| : u_j ∈ R^{p_j}, ||u_j|| = 1, 1 ≤ j ≤ d }.

The dual of the spectral norm is the nuclear norm, which can be defined as

(1.7)  ||T||_⋆ = inf { Σ_{i=1}^r |λ_i| : T = Σ_{i=1}^r λ_i u_{1,i} ⊗ u_{2,i} ⊗ ... ⊗ u_{d,i}, where λ_i ∈ R, u_{j,i} ∈ R^{p_j}, ||u_{j,i}|| = 1, 1 ≤ j ≤ d, 1 ≤ i ≤ r }.

These generalizations are more difficult to compute, as the corresponding decision problems are NP-hard ([7, 12]). As in the matrix case, the nuclear norm of a tensor is considered as a convex relaxation of the tensor rank [6]. The nuclear and spectral norms of tensors play an important role in tensor completion problems [33]. Different methods to estimate and to evaluate the spectral norm and the nuclear norm and their upper and lower bounds have been studied by several authors (see [16, 20, 21, 25]).

The spectral norm is related to the rank 1 approximation of a given tensor. If S is a best rank 1 approximation of a given tensor T, then ||T − S|| = (||T||^2 − ||T||_σ^2)^{1/2} (Proposition 1.1, [21]). We will give approximations of the spectral norm that can be computed in polynomial time using colored Brauer diagrams in Section 4. For every even d we define a norm ||·||_{σ,d} that approximates the spectral norm ||·||_σ such that ||·||_{σ,d}^d is a polynomial function of degree d and lim_{d→∞} ||T||_{σ,d} = ||T||_σ for any tensor T. One of our main results is an explicit formula for the norm ||·||_{σ,4} for tensors of order 3 in terms of colored Brauer diagrams (see Theorem 4.2), which allows us to compute this norm efficiently.
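The relation between the spectral norm and the best rank 1 approximation can be explored numerically. The sketch below uses the well-known higher-order power method (ALS with r = 1); the implementation is our own illustration, not taken from the paper:

```python
import numpy as np

def rank1_power_iteration(T, iters=200, seed=0):
    # Higher-order power method: alternately replace each unit vector by the
    # tensor contracted with the other two, then normalize.  The value
    # lam = T · (x ⊗ y ⊗ z) at a limit point is a lower bound for ||T||_sigma.
    rng = np.random.default_rng(seed)
    p, q, r = T.shape
    x, y, z = rng.normal(size=p), rng.normal(size=q), rng.normal(size=r)
    x, y, z = x / np.linalg.norm(x), y / np.linalg.norm(y), z / np.linalg.norm(z)
    for _ in range(iters):
        x = np.einsum('abc,b,c->a', T, y, z); x /= np.linalg.norm(x)
        y = np.einsum('abc,a,c->b', T, x, z); y /= np.linalg.norm(y)
        z = np.einsum('abc,a,b->c', T, x, y); z /= np.linalg.norm(z)
    lam = np.einsum('abc,a,b,c->', T, x, y, z)
    return lam, x, y, z
```

For any unit vectors x, y, z with lam = T · (x ⊗ y ⊗ z), one has ||T − lam · x ⊗ y ⊗ z||^2 = ||T||^2 − lam^2, which specializes to the identity ||T − S||^2 = ||T||^2 − ||T||_σ^2 when the iteration reaches a best rank 1 approximation.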
We also introduce another norm (see Definition 4.3 and Proposition 4.5) and show that it is, in some sense, a better approximation to the spectral norm than ||·||_{σ,4} (see Proposition 4.8). Throughout the paper, ||·|| stands for the Frobenius norm.

1.4. Tensor amplification. If A is a real matrix with singular values λ_1, ..., λ_r, then the matrix A A^T A has singular values λ_1^3, λ_2^3, ..., λ_r^3. The map A ↦ A A^T A has the effect of amplifying the low rank structure corresponding to larger singular values, while suppressing the smaller singular values that typically correspond to noise. Using colored Brauer diagrams, we will construct similar amplification maps for tensors of order 3 in Section 5. We will also present numerical experiments whose results show that tensor amplification can reduce the running time of the alternating least squares algorithm for low rank tensor approximation, while producing a better approximation.

2. Brauer diagrams. In this section, we will give an overview of the classical invariant theory of the orthogonal group. We recall the relation between Brauer diagrams and invariant tensors for the orthogonal group.
Let V ≅ R^n be a Euclidean vector space with basis e_1, e_2, ..., e_n. The orthogonal group O(V) = O_n(R) = {A ∈ R^{n×n} | A A^T = I} acts on V. On V we have an inner product that allows us to identify V with its dual space V^⋆. We consider the d-fold tensor product of V:

(2.1)  V^{⊗d} = V ⊗ V ⊗ ··· ⊗ V  (d factors)  ≅ R^{n×n×···×n} ≅ R^{n^d}.

There are various ways to think of elements of V^{⊗d}. The following statement is well known (Chapter 2, [17]). Lemma 2.1.
There are bijections between the following sets:
1. the set of tensors V^{⊗d};
2. (V^{⊗d})^⋆, the set of linear maps V^{⊗d} → R;
3. the set of R-multilinear maps V^d → R.

Proof. We have a multilinear map ι : V^d → V^{⊗d} given by ι(v_1, v_2, ..., v_d) = v_1 ⊗ v_2 ⊗ ··· ⊗ v_d, where v_i ∈ V for i = 1, ..., d. Any linear map L : V^{⊗d} → R induces a multilinear map ℓ = L ∘ ι : V^d → R. Conversely, every multilinear map ℓ : V^d → R factors as ℓ = L ∘ ι for a unique linear map L : V^{⊗d} → R by the universal property of the tensor product (see [18, Chapter XVI]). This proves the bijection between (2) and (3). Since we have identified V with its dual V^⋆, we can also identify V^{⊗d} with (V^⋆)^{⊗d} ≅ (V^{⊗d})^⋆, which gives the bijection between (1) and (2).

We will frequently switch between these different viewpoints in the lemma. The group O(V) and the symmetric group Σ_d act on the d-fold tensor product space as follows. Let S = Σ_{i=1}^r v_{1,i} ⊗ v_{2,i} ⊗ ··· ⊗ v_{d,i} ∈ V^{⊗d} be a tensor, where v_{j,i} ∈ V for all i = 1, ..., r and j = 1, ..., d. If A ∈ O(V), then we have

(2.2)  A · S = Σ_{i=1}^r A v_{1,i} ⊗ A v_{2,i} ⊗ ··· ⊗ A v_{d,i}.

If π ∈ Σ_d is a permutation, then

(2.3)  π · S = Σ_{i=1}^r v_{π^{-1}(1),i} ⊗ v_{π^{-1}(2),i} ⊗ ··· ⊗ v_{π^{-1}(d),i}.

The actions of O_n(R) and Σ_d on V^{⊗d} commute. The subspace of O(V)-invariant tensors in V^{⊗d} is

(2.4)  (V^{⊗d})^{O(V)} = {T ∈ V^{⊗d} : A · T = T for all A ∈ O(V)}.

A linear map L : V^{⊗d} → R is O(V)-invariant if L(A · T) = L(T) for all tensors T and all A ∈ O(V). A multilinear map M : V^d → R is O(V)-invariant if M(A v_1, ..., A v_d) = M(v_1, ..., v_d) for all v_1, ..., v_d ∈ V and all A ∈ O(V). Corollary 2.2.
There are bijections between the following sets:
1. (V^{⊗d})^{O(V)}, the set of O(V)-invariant tensors in V^{⊗d};
2. the set of O(V)-invariant linear maps V^{⊗d} → R;
3. the set of O(V)-invariant multilinear maps V^d → R.

Proof. Following the proof of Lemma 2.1, we see that the bijections in Lemma 2.1 preserve the action of the orthogonal group O(V) and induce the desired bijections in Corollary 2.2.

The First Fundamental Theorem of Invariant Theory for the orthogonal group (Theorem 2.5 below) gives us a description of (V^{⊗d})^{O(V)}. If d is odd then (V^{⊗d})^{O(V)} = 0. We will now describe (V^{⊗d})^{O(V)} for d = 2e, where e is a positive integer. A labeled Brauer diagram of size d = 2e is a perfect matching of a complete graph whose vertices are labeled 1, 2, ..., d (see [8, Chapter 10] for more details).

Example 2.3. Let D be the labeled Brauer diagram of size 6 (2.5) in which 1 is matched to 3, 2 to 6, and 4 to 5. We denote this diagram by (1 3)(2 6)(4 5).

To a labeled Brauer diagram D of size d = 2e we can associate an O(V)-invariant multilinear map M_D : V^d → R as follows. If i_k is connected to j_k for k = 1, 2, ..., e in the diagram D, then we define

(2.6)  M_D(v_1, v_2, ..., v_d) = (v_{i_1} · v_{j_1})(v_{i_2} · v_{j_2}) ··· (v_{i_e} · v_{j_e})

for all v_1, ..., v_d ∈ V. By Corollary 2.2 the O(V)-invariant multilinear map M_D corresponds to some O(V)-invariant linear map L_D : V^{⊗d} → R and an O(V)-invariant tensor T_D ∈ (V^{⊗d})^{O(V)}, which we make more explicit now. As in the proof of Lemma 2.1, the universal property of the tensor product gives us a unique linear map L_D : V^{⊗d} → R such that

(2.7)  L_D(v_1 ⊗ v_2 ⊗ ··· ⊗ v_d) = M_D(v_1, v_2, ..., v_d) = (v_{i_1} · v_{j_1})(v_{i_2} · v_{j_2}) ··· (v_{i_e} · v_{j_e}).

By Corollary 2.2, there is also a unique tensor T_D ∈ V^{⊗d} such that L_D(A) = T_D · A for all tensors A ∈ V^{⊗d}.

Example 2.4. If D is the diagram (1 3)(2 6)(4 5) from Example 2.3, and e_1, ..., e_n is a basis of V, then we have

(2.8)  T_D = Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n e_i ⊗ e_j ⊗ e_i ⊗ e_k ⊗ e_k ⊗ e_j.
The indices i, j, k correspond to the edges (1 3), (2 6) and (4 5) respectively. The proof of the following theorem can be found in Theorem 4.3.3 and Proposition 10.1.3 of [8].
Theorem 2.5 (FFT of Invariant Theory for O_n, [8, 24]). The space (V^{⊗d})^{O(V)} of invariant tensors is spanned by all T_D, where D is a Brauer diagram on d vertices.

The following result is well known (see for example [3, 22]), but the idea of the proof will be useful later. Proposition 2.6.
The number of Brauer diagrams (equivalently, of perfect matchings in a complete graph) on d = 2e vertices is 1 · 3 · 5 ··· (2e − 1).

Proof. Let N_e be the number of Brauer diagrams on 2e nodes. Clearly N_1 = 1. We can construct 2e + 1 Brauer diagrams on 2e + 2 nodes from a Brauer diagram D on 2e nodes as follows. We take D and add two nodes, 2e + 1 and 2e + 2. First, we can choose an integer k with 1 ≤ k ≤ 2e and let l be the vertex that k is connected to. Then we disconnect k from l, connect 2e + 1 to k and connect 2e + 2 to l. This gives us a Brauer diagram D_k on 2e + 2 nodes. Alternatively, we can also connect 2e + 1 to 2e + 2 and get a Brauer diagram on 2e + 2 nodes that we call D_{2e+1}. Thus, we have constructed Brauer diagrams D_1, ..., D_{2e+1} from D. One can verify that we generate all Brauer diagrams on 2e + 2 nodes exactly once if we vary D over all Brauer diagrams on 2e nodes. So N_{e+1} = (2e + 1) N_e for all e.

A partial Brauer diagram of size d is a graph with d vertices labeled 1, 2, ..., d whose edges form a partial matching. Our convention is to draw loose edges at the vertices that are not matched. To a partial Brauer diagram D with e edges, we can associate an O(V)-invariant multilinear map M_D : V^d → V^{⊗(d−2e)} and a linear map L_D : V^{⊗d} → V^{⊗(d−2e)}.

Example 2.7. For the partial Brauer diagram D of size 6 with edges (1 3) and (4 5), and loose edges at 2 and 6, we have

(2.10)  M_D(v_1, v_2, ..., v_6) = (v_1 · v_3)(v_4 · v_5) v_2 ⊗ v_6 ∈ V^{⊗2}

for v_1, v_2, ..., v_6 ∈ V.

Before giving the general rule for computing inner products of tensors associated to Brauer diagrams, we give an illustrative example.

Example 2.8. We compute T_{D_1} · T_{D_2}, where D_1 = (1 3)(2 6)(4 5) and D_2 = (1 2)(3 6)(4 5) are the diagrams in (2.11). We get

(2.12)  T_{D_1} · T_{D_2} = (Σ_{i,j,k} e_i ⊗ e_j ⊗ e_i ⊗ e_k ⊗ e_k ⊗ e_j) · (Σ_{p,q,r} e_p ⊗ e_p ⊗ e_q ⊗ e_r ⊗ e_r ⊗ e_q) = Σ_{i,j,k,p,q,r} (e_i · e_p)(e_j · e_p)(e_i · e_q)(e_k · e_r)(e_k · e_r)(e_j · e_q).

To get a nonzero summand we must have i = j = p = q and k = r. The result of the summation is Σ_{i=1}^n Σ_{k=1}^n 1 = n^2.
We can visualize this computation by overlaying the two diagrams so that the labeled nodes coincide (2.13). The edges of D_1 correspond to the indices i, j, k and the edges of D_2 correspond to the indices p, q, r. The indices of the edges in a cycle of the overlay must all be equal. Since there are two cycles, namely {i, p, j, q} and {k, r}, we essentially sum over two free indices and get n^2. The general rule is clear now. Corollary 2.9.
The dot product of two tensors T_{D_1}, T_{D_2} ∈ V^{⊗d} can be computed as follows. We overlay the two diagrams D_1 and D_2 so that the (labeled) nodes coincide. Then T_{D_1} · T_{D_2} = n^k, where k is the number of cycles (including 2-cycles).

Proof. The tensor T_{D_1} is equal to the sum of all e_{i_1} ⊗ e_{i_2} ⊗ ··· ⊗ e_{i_d} with 1 ≤ i_1, i_2, ..., i_d ≤ n such that i_j = i_k whenever (j k) is an edge in D_1. Similarly, T_{D_2} is the sum of all e_{p_1} ⊗ e_{p_2} ⊗ ··· ⊗ e_{p_d} with 1 ≤ p_1, p_2, ..., p_d ≤ n and p_j = p_k whenever (j k) is an edge in D_2. We compute

(2.14)  T_{D_1} · T_{D_2} = Σ_{i_1,...,i_d,p_1,...,p_d} (e_{i_1} · e_{p_1})(e_{i_2} · e_{p_2}) ··· (e_{i_d} · e_{p_d}),

where the sum is over all (i_1, ..., i_d, p_1, ..., p_d) for which i_j = i_k when (j k) is an edge in D_1 and p_j = p_k when (j k) is an edge in D_2. The summand (e_{i_1} · e_{p_1})(e_{i_2} · e_{p_2}) ··· (e_{i_d} · e_{p_d}) is equal to 1 if i_j = p_j for all j, and 0 otherwise. Setting i_j equal to p_j corresponds to overlaying the diagrams D_1 and D_2. So (2.14) is equal to the number of tuples (i_1, i_2, ..., i_d) with 1 ≤ i_1, i_2, ..., i_d ≤ n, and i_j = i_k whenever (j k) is an edge in D_1 or in D_2. If k is the number of cycles, then we can choose exactly k of the indices i_1, i_2, ..., i_d freely in the set {1, 2, ..., n}, and the other indices are uniquely determined by these choices. This proves that (2.14) is equal to n^k.

The proof of the following proposition is similar to the proof of Proposition 2.6. Proposition 2.10.
Suppose that E is a Brauer diagram on d = 2e vertices, and S_d = Σ_D T_D, where the sum is over all Brauer diagrams D on d vertices. Then we have T_E · S_d = n(n + 2) ··· (n + d − 2).

Proof. Let E be a Brauer diagram on d = 2e vertices and let E′ be the diagram obtained from E by adding the edge (d+1, d+2). Recall that in the proof of Proposition 2.6, we constructed diagrams D_1, D_2, ..., D_{d+1} from a Brauer diagram D on d vertices. Note that if 1 ≤ i ≤ d, we get T_{E′} · T_{D_i} = T_E · T_D, because we obtain the overlay diagram of T_{E′} · T_{D_i} from that of T_E · T_D by changing one k-cycle into a (k + 2)-cycle. We also have T_{E′} · T_{D_{d+1}} = n (T_E · T_D), as we are adding one 2-cycle. This shows that T_{E′} · S_{d+2} = (n + d) (T_E · S_d). The proposition then follows by induction and symmetry.
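Both Corollary 2.9 and Proposition 2.10 are easy to verify by brute force for small d and n. The following sketch (our own helper code, with 0-based vertex labels) enumerates perfect matchings, builds the tensors T_D explicitly, and counts the cycles of an overlay:

```python
import numpy as np

def matchings(verts):
    # Enumerate all perfect matchings of the vertex list; Proposition 2.6 says
    # there are 1*3*5*...*(2e-1) of them on 2e vertices.
    if not verts:
        yield []
        return
    first, rest = verts[0], verts[1:]
    for i, other in enumerate(rest):
        for m in matchings(rest[:i] + rest[i + 1:]):
            yield [(first, other)] + m

def brauer_tensor(match, n, d):
    # T_D: the sum of all e_{i_1} ⊗ ... ⊗ e_{i_d} with i_j = i_k for each edge (j k).
    T = np.zeros((n,) * d)
    for idx in np.ndindex(*T.shape):
        if all(idx[j] == idx[k] for j, k in match):
            T[idx] = 1.0
    return T

def cycle_count(m1, m2, d):
    # Number of cycles (2-cycles included) in the overlay of two matchings.
    nbrs = {v: [] for v in range(d)}
    for j, k in list(m1) + list(m2):
        nbrs[j].append(k)
        nbrs[k].append(j)
    seen, cycles = set(), 0
    for v in range(d):
        if v not in seen:
            cycles += 1
            stack = [v]
            while stack:
                u = stack.pop()
                if u not in seen:
                    seen.add(u)
                    stack.extend(nbrs[u])
    return cycles
```

With these helpers, T_{D_1} · T_{D_2} = n^k can be checked against every pair of diagrams on 6 vertices, and T_E · S_6 = n(n+2)(n+4).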
Example 2.11. For e = 2, overlaying the diagram E = (1 2)(3 4) with each of the three Brauer diagrams on 4 vertices gives

(2.15)  T_E · S_4 = n^2 + n + n = n(n + 2),

since overlaying E with itself gives two 2-cycles (contributing n^2), while overlaying E with each of the other two diagrams gives a single 4-cycle (contributing n each).

Let S^{n−1} = {v ∈ V | ||v|| = 1} be the unit sphere equipped with the O(V)-invariant volume form dμ that is normalized such that ∫_{S^{n−1}} dμ = 1. Proposition 2.12.
If we integrate

(2.16)  v^{⊗2e} = v ⊗ v ⊗ ··· ⊗ v  (2e factors)

over S^{n−1}, then we get ∫_{S^{n−1}} v^{⊗2e} dμ = S_{2e} / (n(n + 2) ··· (n + 2e − 2)).

Proof. Let U = ∫_{S^{n−1}} v^{⊗2e} dμ. Since U is O(V)-invariant, it is a linear combination of Brauer diagrams. The tensor U is also invariant under the action of the symmetric group Σ_{2e}, where the action of the symmetric group is given in (2.3). This shows that each Brauer diagram appears with the same coefficient in U. So we have U = C S_{2e}, where C is some constant. Let D be some Brauer diagram on 2e vertices. The value of C is obtained from

(2.17)  1 = ∫_{S^{n−1}} dμ = ∫_{S^{n−1}} (T_D · v^{⊗2e}) dμ = T_D · ∫_{S^{n−1}} v^{⊗2e} dμ = C (T_D · S_{2e}) = C n(n + 2) ··· (n + 2e − 2).
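A quick Monte Carlo sanity check of Proposition 2.12 (our own sketch, for n = 3 and the cases e = 1 and e = 2):

```python
import numpy as np

# Draw uniform unit vectors in R^n by normalizing Gaussian samples.
rng = np.random.default_rng(1)
n, N = 3, 100_000
v = rng.normal(size=(N, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)

# e = 1: the average of v ⊗ v should be S_2 / n, i.e. I / n.
M2 = np.einsum('si,sj->ij', v, v) / N

# e = 2: the average of v^{⊗4} should be S_4 / (n(n+2)); the (i,j,k,l) entry
# of S_4 is delta_ij delta_kl + delta_ik delta_jl + delta_il delta_jk.
M4 = np.einsum('si,sj,sk,sl->ijkl', v, v, v, v) / N
I = np.eye(n)
S4 = (np.einsum('ij,kl->ijkl', I, I) + np.einsum('ik,jl->ijkl', I, I)
      + np.einsum('il,jk->ijkl', I, I))
```

With 100,000 samples, both empirical moment tensors agree with the proposition to within about 1% per entry.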
3. Computations with 3-way tensors.

3.1. Colored Brauer diagrams.
We now consider 3 Euclidean R-vector spaces R, G and B of dimensions p, q and r respectively. The tensor product space V = R ⊗ G ⊗ B is a representation of H := O(R) × O(G) × O(B). We keep the notation introduced in Sections 1 and 2, based on the fact that a tensor product of vector spaces is a vector space itself. We are interested in H-invariant tensors in V^{⊗d}. We have an explicit linear isomorphism ψ : R^{⊗d} ⊗ G^{⊗d} ⊗ B^{⊗d} → V^{⊗d} defined by

(3.1)  ψ((a_1 ⊗ ··· ⊗ a_d) ⊗ (b_1 ⊗ ··· ⊗ b_d) ⊗ (c_1 ⊗ ··· ⊗ c_d)) = (a_1 ⊗ b_1 ⊗ c_1) ⊗ ··· ⊗ (a_d ⊗ b_d ⊗ c_d),

where a_i ∈ R, b_i ∈ G and c_i ∈ B for i = 1, ..., d. Restriction to H-invariant tensors gives an isomorphism

(3.2)  ψ : (R^{⊗d})^{O(R)} ⊗ (G^{⊗d})^{O(G)} ⊗ (B^{⊗d})^{O(B)} → (V^{⊗d})^H.

It follows from Theorem 2.5 that the space (R^{⊗d})^{O(R)} of invariant tensors is spanned by tensors corresponding to Brauer diagrams on the set 1, 2, ..., d. We will use red edges for these diagrams. The space (G^{⊗d})^{O(G)} is spanned by tensors corresponding to green Brauer diagrams on the set 1, 2, ..., d, and the space (B^{⊗d})^{O(B)} is spanned by tensors corresponding to blue Brauer diagrams on 1, 2, ..., d. Using the isomorphism (3.2), we see that (V^{⊗d})^H is spanned by diagrams with vertices 1, 2, 3, ..., d and red, green and blue edges such that for each of the colors we have a perfect matching. This means that each vertex has exactly one red, one green and one blue edge.

Definition 3.1. A colored Brauer diagram of size d = 2e is a graph with d vertices labeled 1, 2, ..., d and e red, e green and e blue edges such that for each color, the edges of that color form a perfect matching.

A colored Brauer diagram D on d vertices is an overlay of a red diagram D_R, a green diagram D_G and a blue diagram D_B. To a colored Brauer diagram D we can associate an invariant tensor T_D ∈ (V^{⊗d})^H by

(3.3)  T_D = ψ(T_{D_R} ⊗ T_{D_G} ⊗ T_{D_B}). Proposition 3.2.
The space (V^{⊗d})^H is spanned by all T_D, where D is a colored Brauer diagram on d vertices.

Proof. It follows from Theorem 2.5 that the space (R^{⊗d})^{O(R)} is spanned by tensors T_{D_R}, where D_R is a red Brauer diagram on d vertices. Similarly, (G^{⊗d})^{O(G)} is spanned by tensors T_{D_G} and (B^{⊗d})^{O(B)} is spanned by tensors T_{D_B}, where D_G and D_B are green and blue Brauer diagrams on d vertices respectively. The space (R^{⊗d})^{O(R)} ⊗ (G^{⊗d})^{O(G)} ⊗ (B^{⊗d})^{O(B)} is spanned by tensors of the form T_{D_R} ⊗ T_{D_G} ⊗ T_{D_B}. Via the isomorphism ψ in (3.2), (V^{⊗d})^H is spanned by all T_D = ψ(T_{D_R} ⊗ T_{D_G} ⊗ T_{D_B}), where D is the colored Brauer diagram that is the overlay of D_R, D_G and D_B.

Using the bijections from Lemma 2.1, every colored Brauer diagram D on d vertices corresponds to a linear map L_D : V^{⊗d} → R given by L_D(A) = T_D · A for all A ∈ V^{⊗d}. This linear map corresponds to a multilinear map M_D : V^d → R defined by M_D(A_1, A_2, ..., A_d) = L_D(A_1 ⊗ A_2 ⊗ ··· ⊗ A_d) for all tensors A_1, A_2, ..., A_d ∈ V = R ⊗ G ⊗ B. Corollary 3.3.
There are bijections between the following sets:
1. (V^{⊗d})^H, the set of H-invariant tensors in V^{⊗d};
2. the set of H-invariant linear maps V^{⊗d} → R;
3. the set of H-invariant multilinear maps V^d → R.

Proof. The proof is the same as for Corollary 2.2, but with O(V) replaced by H.

For example, the colored Brauer diagram D of size 4 shown in (3.4), with red matching (1 2)(3 4), green matching (1 4)(2 3) and blue matching (1 3)(2 4), corresponds to the H-invariant linear map L_D : V^{⊗4} = (R ⊗ G ⊗ B)^{⊗4} → R defined by

(3.5)  (a_1 ⊗ b_1 ⊗ c_1) ⊗ (a_2 ⊗ b_2 ⊗ c_2) ⊗ (a_3 ⊗ b_3 ⊗ c_3) ⊗ (a_4 ⊗ b_4 ⊗ c_4) ↦ (a_1 · a_2)(a_3 · a_4)(b_1 · b_4)(b_2 · b_3)(c_1 · c_3)(c_2 · c_4).

In a similar way, we define an H-invariant multilinear map M_D : V^d → R for every colored Brauer diagram D of size d (i.e., with d vertices). We can view M_D as a tensor in (V^⋆)^{⊗d} ≅ V^{⊗d}. Viewed as a tensor in V^{⊗d} we will denote it by T_D. If D is a colored Brauer diagram of size d, then the polynomial function defined on V by

(3.6)  T ↦ M_D(T, T, ..., T)  (d arguments)

will be denoted by P_D(T). The function P_D is an H-invariant polynomial function on V of degree d. Note that P_D does not depend on the labeling of the vertices of D. For example, if we remove the labeling from the diagram D in (3.4), we get an unlabeled diagram D̄ (3.7), and we define P_{D̄} = P_D. In coordinates, if we write T = Σ_{i=1}^p Σ_{j=1}^q Σ_{k=1}^r t_{ijk} e_i ⊗ e_j ⊗ e_k ∈ R ⊗ G ⊗ B, then we have

(3.8)  P_D(T) = Σ_{a=1}^p Σ_{b=1}^p Σ_{c=1}^q Σ_{d=1}^q Σ_{e=1}^r Σ_{f=1}^r t_{ace} t_{adf} t_{bde} t_{bcf}. Proposition 3.4.
The space of H-invariant polynomial functions on V = R ⊗ G ⊗ B is spanned by all P_D, where D is a colored Brauer diagram.

Proof. Let (V^{⊗d})^⋆ be the space of multilinear maps, and R[V]_d the space of homogeneous polynomial functions on V of degree d. We have a linear map γ : (V^{⊗d})^⋆ → R[V]_d defined as follows: if M : V^d → R is multilinear, then P = γ(M) is given by

(3.9)  P(T) = M(T, T, ..., T)  (d arguments)

for all T ∈ V. For a colored Brauer diagram D we have by definition γ(M_D) = P_D (see (3.6)). The map γ restricts to a linear map of H-invariant subspaces ((V^{⊗d})^⋆)^H → R[V]_d^H, which is surjective by [8, Lemma 4.2.7]. Since ((V^{⊗d})^⋆)^H is spanned by the tensors M_D, where D is a colored Brauer diagram, R[V]_d^H is spanned by all P_D = γ(M_D).

If D ⊔ E is the disjoint union of colored Brauer diagrams D and E, then we have P_{D ⊔ E} = P_D P_E. Corollary 3.5.
The ring of H-invariant polynomial functions on V is generated by invariants of the form P_D, where D is a connected colored Brauer diagram.

Proof. By Proposition 3.4, the space of H-invariant polynomials is spanned by invariants of the form P_D, where D is a colored Brauer diagram. We can write D = D_1 ⊔ D_2 ⊔ ··· ⊔ D_k, where D_i is a connected colored Brauer diagram for every i. We have

(3.10)  P_D = P_{D_1} P_{D_2} ··· P_{D_k}. Definition 3.6.
We can define L_D, T_D, M_D and P_D when D is a linear combination of diagrams by requiring that these depend linearly on D. For example, if D = λ_1 D_1 + λ_2 D_2 + ··· + λ_k D_k, then P_D = λ_1 P_{D_1} + λ_2 P_{D_2} + ··· + λ_k P_{D_k}, where λ_i ∈ R for i = 1, ..., k.

3.2. Generators of polynomial tensor invariants. There is only one connected colored Brauer diagram on 2 vertices (3.11): the diagram in which the two vertices are joined by a red, a green and a blue edge. There are 4 connected colored (unlabeled) Brauer diagrams on 4 nodes (3.12): three diagrams in which two of the three colors form a double edge (one for each choice of the pair of colors), and the tetrahedron diagram, in which the red, green and blue matchings are pairwise distinct. There are 11 connected colored Brauer diagrams on 6 nodes (3.13); a multiplicity [3] in the figure means that by permuting the colors we get 3 pairwise nonisomorphic colored graphs. For d = 2e vertices, the number of connected trivalent colored graphs grows rapidly with d.

Example 3.7. Consider the tensors T_1, T_2 ∈ R^{n^2×n^2×n^2} given by

(3.14)  T_1 = (1/n) Σ_{i=1}^{n^2} e_i ⊗ e_i ⊗ e_i

and

(3.15)  T_2 = (1/(n√n)) Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} e_{ni+j+1} ⊗ e_{nj+k+1} ⊗ e_{nk+i+1}.

Any flattening of T_1 and of T_2 is a matrix whose nonzero singular values are equal to n^{−1} with multiplicity n^2. If every vertex in a diagram is adjacent to a double or triple edge, then the corresponding tensor invariant cannot distinguish T_1 from T_2. Indeed, of the four connected diagrams on 4 nodes, only the tetrahedron diagram can distinguish T_1 and T_2: on the three diagrams with a double edge both tensors give the value n^{−2}, while the tetrahedron invariant takes the value n^{−2} on T_1 and n^{−3} on T_2. This invariant captures information from the tensor that cannot be seen in any flattening.

3.3. Complexity of Tensor Invariants. A polynomial tensor invariant corresponding to a colored Brauer diagram can be computed from subdiagrams.
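Before turning to complexity, the example above can be checked numerically for n = 2 (our own sketch; the einsum index patterns encode one double-edge diagram and the tetrahedron invariant (3.8)):

```python
import numpy as np

n = 2
m = n * n
T1 = np.zeros((m, m, m))
for i in range(m):
    T1[i, i, i] = 1.0 / n                        # T1 = (1/n) sum_i e_i ⊗ e_i ⊗ e_i
T2 = np.zeros((m, m, m))
for i in range(n):
    for j in range(n):
        for k in range(n):                       # T2 as in (3.15), 0-based indices
            T2[n * i + j, n * j + k, n * k + i] = 1.0 / (n * np.sqrt(n))

def double_edge(T):
    # One of the diagrams with a double edge: red and green both match
    # (1 2)(3 4), while blue matches (1 3)(2 4).
    return np.einsum('abc,abd,efc,efd->', T, T, T, T)

def tetrahedron(T):
    # The tetrahedron invariant, equation (3.8).
    return np.einsum('ace,adf,bde,bcf->', T, T, T, T)
```

For n = 2, both tensors give 1/4 on the double-edge diagram, while the tetrahedron invariant gives 1/4 on T1 and 1/8 on T2.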
Example 3.8. To compute the invariant (3.16) corresponding to the 4-node diagram with a green-blue double edge, we could first compute the partial colored Brauer diagram (3.17) consisting of one green and one blue edge between two vertices. This partial diagram corresponds to a (symmetric) tensor in R ⊗ R. If T = (t_{ijk}), then this diagram corresponds to the p × p matrix A = (a_{ij}) where a_{ij} = Σ_{k=1}^q Σ_{ℓ=1}^r t_{ikℓ} t_{jkℓ}. In practice, one can compute A by first flattening T to a p × (qr) matrix B and using A = B B^t, where B^t is the transpose of B. The space complexity of this operation is O(pqr + p^2) (we just have to store the tensor T and the matrix A), and the time complexity is O(p^2 qr), because for each pair (i, j) with 1 ≤ i, j ≤ p we have to do O(qr) multiplications and additions. Finally, we compute the invariant as the inner product (3.18) of the partial diagram with itself, that is, Σ_{i,j} a_{ij}^2. The space complexity of this step is O(p^2) and the time complexity is O(p^2). We conclude that the space complexity of computing (3.16) is O(pqr + p^2) and the time complexity is O(p^2 qr). The theoretical time complexity bounds could be improved if we use fast matrix multiplication (such as Strassen's algorithm).

Example 3.9. The tetrahedron invariant (3.19) is more difficult to compute. We first compute the p × p × q × q tensor U corresponding to the partial diagram (3.20) with a single blue edge. The space complexity of this computation is O(p^2 q^2 + pqr) and the time complexity is O(p^2 q^2 r). Finally we compute the invariant as the inner product (3.21) of U with itself, with the two green edge endpoints crossed. An explicit formula for this tensor invariant is given by

(3.22)  Σ_{i=1}^p Σ_{j=1}^p Σ_{k=1}^q Σ_{ℓ=1}^q U_{ijkℓ} U_{ijℓk}.

The space complexity of this step is O(p^2 q^2) and the time complexity is O(p^2 q^2) as well. Combining the two steps, we see that the time complexity of computing the tetrahedron invariant (3.19) is O(p^2 q^2 r) and the space complexity is O(p^2 q^2 + pqr). This is the approach we would use if p ≤ q ≤ r. If p ≥ q ≥ r, then a more efficient algorithm is obtained by switching red and blue. In that case we get time O(p q^2 r^2).
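The two-step computation of the tetrahedron invariant, its direct evaluation via (3.8), and its H-invariance can all be checked numerically (our own sketch):

```python
import numpy as np

def tetrahedron_direct(T):
    # Tetrahedron invariant via formula (3.8); clear but not memory-optimal.
    return np.einsum('ace,adf,bde,bcf->', T, T, T, T)

def tetrahedron_two_step(T):
    # Two-step evaluation as in the text: first the p x p x q x q tensor U of
    # (3.20) by contracting the blue (third) index, then the sum (3.22).
    U = np.einsum('ike,jle->ijkl', T, T)   # U[i,j,k,l] = sum_e T[i,k,e] T[j,l,e]
    return np.einsum('ijkl,ijlk->', U, U)

def random_orthogonal(k, rng):
    # Random orthogonal matrix from the QR factorization of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.normal(size=(k, k)))
    return Q
```

Applying random orthogonal matrices in each mode leaves the invariant unchanged, which is the defining property of an H-invariant polynomial.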
Example 3.10. To compute the invariant (3.23) on 6 nodes, we first compute the partial diagram (3.24) by contracting the p × p × q × q tensor U with T. This step requires O(p^2 q^2 + pqr) in memory and O(p^2 q^2 r) in time. From this we compute the invariant (3.25) as the inner product of the partial diagram (3.24) with itself.

Example 3.11. To compute the invariant (3.26), we can first compute the partial diagram (3.27), which costs O(p^2 qr) in memory and O(p^2 q^2 r) in time, and then obtain the invariant (3.28) as an inner product of two partial diagrams.

It becomes clear that the invariants that correspond to large diagrams can be hard to compute because of memory and time limitations. Some tensor invariants require large tensors in intermediate steps of the computation. There is a method to improve the memory and time requirements, at the cost of some accuracy in the result. One can use the Higher Order Singular Value Decomposition (HOSVD) to reduce a p × q × r tensor T to a core tensor T′ of size p′ × q′ × r′, where p′ ≤ p, q′ ≤ q and r′ ≤ r. The HOSVD is a generalization of the singular value decomposition of a matrix; see [19]. If r > pq, then the p × q × r tensor can be reduced to a p × q × pq tensor using HOSVD without any loss at all. HOSVD is a special case of the Tucker decomposition [30]. Details of these decomposition methods are beyond the scope of this paper.
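A minimal truncated HOSVD along the lines described above can be sketched as follows (our own sketch; the function name is ours):

```python
import numpy as np

def hosvd_core(T, ranks):
    # Truncated HOSVD sketch: take the leading left singular vectors of each
    # mode-k flattening, then contract T with them to obtain the smaller core.
    factors = []
    for k in range(3):
        Tk = np.moveaxis(T, k, 0).reshape(T.shape[k], -1)   # mode-k flattening
        Uk, _, _ = np.linalg.svd(Tk, full_matrices=False)
        factors.append(Uk[:, :ranks[k]])
    U1, U2, U3 = factors
    core = np.einsum('ijk,ia,jb,kc->abc', T, U1, U2, U3)
    return core, factors
```

If the target ranks are at least the multilinear ranks of T, the reduction is lossless: T is recovered exactly by contracting the core with the factor matrices.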
4. Approximations of the Spectral Norm.
4.1. The spectral norm.
The spectral norm ∥T∥_σ of a tensor T ∈ V = R ⊗ G ⊗ B (with R = R^p, G = R^q and B = R^r) is defined by
∥T∥_σ := max { |T · (x ⊗ y ⊗ z)| : ∥x∥ = ∥y∥ = ∥z∥ = 1 }.
We can view ∥T∥_σ as the ℓ^∞ norm of the function (x, y, z) ↦ T · (x ⊗ y ⊗ z) on the product of unit spheres S^{p−1} × S^{q−1} × S^{r−1}. The ℓ^∞ norm is a limit of ℓ^d norms as d → ∞. We have ∥T∥_σ = lim_{d→∞} ∥T∥_{σ,d}, where
(4.1) ∥T∥_{σ,d} := ( ∫_{S^{p−1} × S^{q−1} × S^{r−1}} |T · (x ⊗ y ⊗ z)|^d dμ )^{1/d}.
Suppose that d = 2e is even. We have |T · (x ⊗ y ⊗ z)|^d = T^{⊗d} · (x ⊗ y ⊗ z)^{⊗d} and
(4.2) ∫_{S^{p−1} × S^{q−1} × S^{r−1}} |T · (x ⊗ y ⊗ z)|^d dμ = T^{⊗d} · ∫_{S^{p−1} × S^{q−1} × S^{r−1}} (x ⊗ y ⊗ z)^{⊗d} dμ.
Up to a permutation of the tensor factors, we have the equality
(4.3) ∫ (x ⊗ y ⊗ z)^{⊗d} dμ = ∫ (x^{⊗d} ⊗ y^{⊗d} ⊗ z^{⊗d}) dμ = ( ∫_{S^{p−1}} x^{⊗d} dμ ) ⊗ ( ∫_{S^{q−1}} y^{⊗d} dμ ) ⊗ ( ∫_{S^{r−1}} z^{⊗d} dμ ).
We normalize the norm ∥ · ∥_{σ,d} so that its value on simple tensors of unit length is equal to 1: we divide ∥T∥_{σ,d} by ∥x ⊗ y ⊗ z∥_{σ,d}, where x, y, z are unit vectors, and from now on ∥ · ∥_{σ,d} denotes this normalized norm. We still have lim_{d→∞} ∥T∥_{σ,d} = ∥T∥_σ. We will compute ∥T∥_{σ,d} for d = 2 and d = 4. For any even d, we let S_{R,d} ∈ (R^p)^{⊗d} be the sum of all red Brauer diagrams on d vertices. For example,
(4.4) S_{R,4} = • •• • + • ❅❅❅❅ • ⑦⑦⑦⑦ • • + • •• • .
Similarly, S_{G,d} and S_{B,d} are the respective sums of all green and blue Brauer diagrams on d nodes.

4.2. The approximation for d = 2.
Proposition 4.1.
The norm ∥ · ∥_{σ,2} is equal to the Euclidean (or Frobenius) norm ∥ · ∥.
Proof. If we let e = 1, then it follows from Proposition 2.12 that
(4.5) ∫_{S^{p−1}} x ⊗ x dμ = (1/p) S_{R,2}, ∫_{S^{q−1}} y ⊗ y dμ = (1/q) S_{G,2}, and ∫_{S^{r−1}} z ⊗ z dμ = (1/r) S_{B,2}.
Therefore, we get
(4.6) ∫_{S^{p−1} × S^{q−1} × S^{r−1}} (x ⊗ y ⊗ z)^{⊗2} dμ = (1/pqr) S_{R,2} ⊗ S_{G,2} ⊗ S_{B,2}.
In diagrams, we get
∫_{S^{p−1} × S^{q−1} × S^{r−1}} (x ⊗ y ⊗ z)^{⊗2} dμ = (1/pqr) •• .
So, before normalization, we have
∥T∥²_{σ,2} = (T ⊗ T) · ( (1/pqr) •• ) = (1/pqr) T · T = (1/pqr) ∥T∥²,
that is, ∥T∥_{σ,2} = (1/√(pqr)) ∥T∥ before normalization. Since a simple tensor x ⊗ y ⊗ z of unit vectors has Euclidean norm 1, it follows that the normalized norm ∥ · ∥_{σ,2} is equal to the Euclidean norm ∥ · ∥.

4.3. The approximation for d = 4.
Theorem 4.2.
We have ∥T∥⁴_{σ,4} = P_D(T), where
(4.7) D = (1/27) ( 3 • •• • + 6 • •• • + 6 • •• • + 6 • •• • + 6 • ❅❅❅❅ • ⑦⑦⑦⑦ • • )
and P_D is defined as in (3.8).
Proof. If we apply Proposition 2.12 with e = 2, then we get
(4.8) ∫_{S^{p−1} × S^{q−1} × S^{r−1}} (x ⊗ y ⊗ z)^{⊗4} dμ = ( ∫_{S^{p−1}} x^{⊗4} dμ ) ⊗ ( ∫_{S^{q−1}} y^{⊗4} dμ ) ⊗ ( ∫_{S^{r−1}} z^{⊗4} dμ ) = (1/(p(p+2)q(q+2)r(r+2))) S_{R,4} ⊗ S_{G,4} ⊗ S_{B,4}.
We calculate
(4.9) S_{R,4} ⊗ S_{G,4} = ( • •• • + • ❅❅❅❅ • ⑦⑦⑦⑦ • • + • •• • ) ⊗ ( • •• • + • ❅❅❅❅ • ⑦⑦⑦⑦ • • + • •• • ) = 3 • •• • + 6 • •• • .
In this calculation, we have omitted the labeling of the vertices; the last equality is only true if we symmetrize the right-hand side over all 24 permutations of the 4 vertices. Next,
(4.10) S_{R,4} ⊗ S_{G,4} ⊗ S_{B,4} = ( 3 • •• • + 6 • •• • ) ⊗ ( • •• • + • ❅❅❅❅ • ⑦⑦⑦⑦ • • + • •• • ) = 3 • •• • + 6 • •• • + 6 • •• • + 6 • •• • + 6 • ❅❅❅❅ • ⑦⑦⑦⑦ • • .
For this calculation, one should symmetrize the red-green diagrams over all 24 permutations. However, if we do not do this, the result does not change, because the blue diagrams are already symmetrized over all permutations. On rank 1 tensors of unit length, the integral (4.8) evaluates to 27/(p(p+2)q(q+2)r(r+2)); dividing by this value gives the normalized norm, and we conclude that ∥T∥⁴_{σ,4} = P_D(T), where
(4.11) D = (1/27) ( 3 • •• • + 6 • •• • + 6 • •• • + 6 • •• • + 6 • ❅❅❅❅ • ⑦⑦⑦⑦ • • ).

4.4. The norm ∥ · ∥_#. We say that a norm ∥ · ∥ is a degree d norm if ∥T∥^d is a polynomial function of T of degree d. The norm ∥ · ∥_{σ,d} is a norm of degree d; in particular, ∥ · ∥_{σ,4} is a norm of degree 4. In this section we study other norms of degree 4 that approximate the spectral norm. Consider the degree 2 covariant U : V → G^{⊗2} ⊗ B^{⊗2} defined by
U = U(T) = • ❂❂❂❂❂✁✁✁✁✁ • + •• .
We have
(4.12) 0 ≤ • ❂❂❂❂❂✁✁✁✁✁ • · • ❂❂❂❂❂✁✁✁✁✁ • = •• · •• = • •• •
and
(4.13) • ❂❂❂❂❂✁✁✁✁✁ • · •• = • ❅❅❅❅ • ⑦⑦⑦⑦ • •
(in these calculations, the diagrams represent their evaluations on T ⊗ T ⊗ T ⊗ T). So we get
(4.14) • •• • + • ❅❅❅❅ • ⑦⑦⑦⑦ • • = (1/2) (U · U) ≥ 0.
Permuting the colors also gives
(4.15) • •• • + • ❅❅❅❅ • ⑦⑦⑦⑦ • • ≥ 0.
It follows from (4.12) that
(4.16) • •• • ≥ 0.
Adding (4.14), (4.15) and (4.16) gives
(4.17) • •• • + • •• • + • •• • + 2 • ❅❅❅❅ • ⑦⑦⑦⑦ • • ≥ 0.

Definition 4.3.
We define
(4.18) ∥T∥_# := ( (1/5) ( • •• • + • •• • + • •• • + 2 • ❅❅❅❅ • ⑦⑦⑦⑦ • • ) )^{1/4}.
We will show that ∥ · ∥_# is a norm.
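Under the diagram bookkeeping above, the three diagrams of the first type in (4.18) evaluate to trace((B_c B_c^t)²) for the three flattenings B_c of T, and the tetrahedron evaluates as in (3.22). The following NumPy sketch (our own illustration; the function name norm_sharp is ours) computes the norm this way and checks the normalization on a rank 1 unit tensor:

```python
import numpy as np

def norm_sharp(T):
    """Degree 4 norm of Definition 4.3: ((D1 + D2 + D3 + 2*Dtet) / 5)^(1/4)."""
    total = 0.0
    # The three diagrams of the first type: trace((B B^t)^2) per flattening.
    for mode in range(3):
        B = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        A = B @ B.T
        total += np.trace(A @ A)
    # The tetrahedron diagram, contracted as in (3.22).
    total += 2 * np.einsum('ikm,jlm,iln,jkn->', T, T, T, T)
    return (total / 5.0) ** 0.25

rng = np.random.default_rng(4)
a, b, c = (v / np.linalg.norm(v) for v in rng.standard_normal((3, 5)))
T1 = np.einsum('i,j,k->ijk', a, b, c)      # rank 1, unit Euclidean norm
Tr = rng.standard_normal((5, 5, 5))
Tr /= np.linalg.norm(Tr)                   # random unit tensor
print(norm_sharp(T1), norm_sharp(Tr))
```

On the rank 1 unit tensor every diagram evaluates to 1, so the norm is exactly 1; on a random unit tensor (rank > 1 almost surely) the value is strictly smaller, as required of a spectral-like norm.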
Lemma 4.4.
Suppose that f(x) = f(x_1, x_2, …, x_m) ∈ R[x_1, …, x_m] is a homogeneous polynomial of degree d > 0 with f(x) > 0 for all nonzero x ∈ R^m, and that the Hessian matrix (∂²f/∂x_i∂x_j) is positive semi-definite. Then ∥x∥ := f(x)^{1/d} is a norm on R^m.
Proof. It is clear that ∥x∥ = 0 if and only if x = 0. We have f(λx) = λ^d f(x), which implies that d must be even, and we get f(λx)^{1/d} = (λ^d f(x))^{1/d} = |λ| f(x)^{1/d}. Because the Hessian is positive semi-definite, the function f(x) is convex, and the set B = { x : f(x) ≤ 1 } is convex; B is also the unit ball for ∥ · ∥. If x, y ∈ R^m are nonzero, then x/∥x∥, y/∥y∥ ∈ B, and therefore
(x + y)/(∥x∥ + ∥y∥) = (∥x∥/(∥x∥ + ∥y∥)) · (x/∥x∥) + (∥y∥/(∥x∥ + ∥y∥)) · (y/∥y∥) ∈ B.
So ∥ (x + y)/(∥x∥ + ∥y∥) ∥ = ∥x + y∥/(∥x∥ + ∥y∥) ≤ 1. This proves the triangle inequality.
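The lemma can be illustrated with the homogeneous quartic f(x) = Σ_i x_i⁴, which is positive away from 0 and has positive semi-definite Hessian diag(12 x_i²); the resulting norm f(x)^{1/4} is the ℓ⁴ norm. A quick numerical check (our own illustration, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Homogeneous of degree 4, positive for x != 0, convex (PSD Hessian).
    return np.sum(x ** 4)

def norm4(x):
    # The lemma's norm f(x)^(1/d) with d = 4.
    return f(x) ** 0.25

# Check absolute homogeneity and the triangle inequality numerically.
for _ in range(1000):
    x, y = rng.standard_normal((2, 6))
    lam = rng.standard_normal()
    assert np.isclose(norm4(lam * x), abs(lam) * norm4(x))
    assert norm4(x + y) <= norm4(x) + norm4(y) + 1e-12
print("ok")
```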
Proposition 4.5.
The function ∥ · ∥_# is a norm.
Proof. From (4.17) it follows that ∥ · ∥_# is nonnegative. If ∥T∥_# = 0 for some tensor T, then we have equality in (4.17), (4.16) and (4.12). This implies that
•• = 0 .
If A is a p × qr flattening of T, then we have A^t A = 0, where A^t is the transpose of A. It follows that A = 0 and T = 0. To show that ∥ · ∥_# satisfies the triangle inequality, we have to show, by Lemma 4.4, that the Hessian of h = ∥ · ∥_#⁴ is positive semi-definite. Up to a constant, h is equal to the left-hand side of (4.17). We can write
h(T + E) = h₀(T, E) + h₁(T, E) + h₂(T, E) + h₃(T, E) + h₄(T, E),
where h_i(T, E) is a polynomial function of degree 4 − i in T and degree i in E. Here h₀(T, E) = ∥T∥_#⁴ and h₄(T, E) = ∥E∥_#⁴. The function h₁(T, E) is linear in E, and this linear function is the gradient of h at T. The function h₂(T, E) is a quadratic function of E and is, up to a constant, the Hessian of h at T. So we have to show that h₂(T, E) ≥ 0 for all T and E.
Let us write a black vertex for the tensor T and a white vertex for the tensor E. We get the Hessian of a function of T by summing all the possible ways of replacing two black vertices by two white vertices. The Hessian of the left-hand side of (4.17) is
(4.19) h₂(T, E) = ◦ ◦• • + ◦ ◦• • + ◦ ◦• • + 2 ◦ ❅❅❅❅ ◦ ⑦⑦⑦⑦ • • + ◦ •◦ • + ◦ •◦ • + ◦ •◦ • + 2 ◦ ❅❅❅❅ • ⑦⑦⑦⑦ ◦ • + ◦ •• ◦ + ◦ •• ◦ + ◦ •• ◦ + 2 ◦ ❅❅❅❅ • ⑦⑦⑦⑦ • ◦ .
Let W : V ⊗ V → G^{⊗2} ⊗ B^{⊗2} be defined by
W(T, E) = • ❂❂❂❂❂✁✁✁✁✁ ◦ + ◦ ❂❂❂❂❂✁✁✁✁✁ • + •◦ + ◦• .
We compute
(4.20) 0 ≤ W · W = ◦ ❅❅❅❅ ◦ ⑦⑦⑦⑦ • • + ◦ ❅❅❅❅ • ⑦⑦⑦⑦ • ◦ + ◦ ◦• • + ◦ •• ◦ ,
and we have
(4.21) 0 ≤ ◦• · ◦• = ◦ •◦ • .
Adding (4.20) and (4.21) and all expressions obtained by cyclically permuting the colors red, green and blue yields (4.19). This proves that h₂(T, E) ≥ 0, so ∥ · ∥_# is a norm.

Definition 4.6.
A spectral-like norm is a norm ∥ · ∥_X on R^{p×q×r} with the following properties: ∥T∥_X = 1 if T is a rank 1 tensor with ∥T∥ = 1, and ∥T∥_X < 1 if T is a tensor of rank > 1 with ∥T∥ = 1. Examples of spectral-like norms are the spectral norm ∥ · ∥_σ, the norms ∥ · ∥_{σ,d} for d = 2, 4, …, and ∥ · ∥_#.

Definition 4.7.
A nuclear-like norm is a norm ∥ · ∥_Y on R^{p×q×r} with the following properties: ∥T∥_Y = 1 if T is a rank 1 tensor with ∥T∥ = 1, and ∥T∥_Y > 1 if T is a tensor of rank > 1 with ∥T∥ = 1.
A norm ∥ · ∥_Y is the dual of another norm ∥ · ∥_X if ∥S∥_Y = max { S · T : ∥T∥_X ≤ 1 }. A norm ∥ · ∥_Y is the dual of ∥ · ∥_X if and only if ∥ · ∥_X is the dual of ∥ · ∥_Y. The dual of a spectral-like norm is a nuclear-like norm.
We are particularly interested in norms that are powerful in distinguishing low rank tensors from high rank tensors. Spectral-like norms are normalized such that rank 1 tensors of unit Euclidean length have norm 1. A possible measure for the rank discriminating power of a spectral-like norm ∥ · ∥_X is the expected value E(∥T∥_X), where T ∈ S^{pqr−1} is a random unit tensor in R ⊗ G ⊗ B (with the uniform distribution over the sphere). A smaller value of E(∥T∥_X) means more discriminating power, which is better. In this sense, the spectral norm is the best norm, because for spectral-like norms ∥ · ∥_X we have ∥T∥_X ≥ ∥T∥_σ, so E(∥T∥_X) ≥ E(∥T∥_σ). We may not be able to compute the value E(∥T∥_X) exactly for many norms ∥ · ∥_X. If we fix the size of the tensor, we can estimate E(∥T∥_X) by taking the average over random unit tensors T.
We will compare the norms ∥ · ∥_{σ,4} and ∥ · ∥_#, which both have degree 4. Although we are not able to give closed formulas for E(∥T∥_{σ,4}) and E(∥T∥_#), we can compute E(∥T∥⁴_{σ,4}) and E(∥T∥⁴_#). First we note that
(4.22) E(T ⊗ T ⊗ T ⊗ T) = (1/(pqr(pqr + 2))) ( • •• • + • •• • + • ❅❅❅❅ ❅❅❅❅ ❅❅❅❅ • ⑦⑦⑦⑦⑦⑦⑦⑦⑦⑦⑦⑦ • • ),
because we can view T as a random unit vector and apply Proposition 2.12 to V ⊗ V ⊗ V ⊗ V. To compute E(∥T∥⁴_{σ,4}), we compute the inner product between (4.22) and (4.7). To perform this computation, we overlay the two diagrams and count the number of cycles for each color; every red, green or blue cycle contributes a factor p, q or r, respectively. For example,
• •• • · • •• • = • •• • • •• • • •• • = p q² r².
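The cycle-counting rule for inner products of colored diagrams can be verified directly: build the Kronecker delta tensors of two perfect matchings, contract them, and compare with n^{#cycles} color by color. A small NumPy sketch (our own illustration):

```python
import numpy as np
from itertools import product

# A Brauer diagram on 4 vertices is a perfect matching; a colored Brauer
# diagram assigns one matching per color. The three matchings on {0,1,2,3}:
M1, M2, M3 = ((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))

def matching_tensor(m, n):
    """Order 4 tensor whose entries are products of deltas along matching m."""
    t = np.zeros((n,) * 4)
    for idx in product(range(n), repeat=4):
        if all(idx[a] == idx[b] for a, b in m):
            t[idx] = 1.0
    return t

def cycles(m1, m2):
    """Number of cycles in the union of two perfect matchings on {0,..,3}."""
    nbr1, nbr2 = {}, {}
    for a, b in m1:
        nbr1[a], nbr1[b] = b, a
    for a, b in m2:
        nbr2[a], nbr2[b] = b, a
    seen, count = set(), 0
    for start in range(4):
        if start in seen:
            continue
        count += 1
        v, use1 = start, True
        while v not in seen:
            seen.add(v)
            v = nbr1[v] if use1 else nbr2[v]
            use1 = not use1
    return count

p, q, r = 2, 3, 4
# Inner product of two colored diagrams: brute force via the delta tensors
# versus the cycle-counting rule p^c1 * q^c2 * r^c3.
for cols1, cols2 in [((M1, M1, M1), (M2, M2, M2)), ((M1, M2, M3), (M1, M1, M1))]:
    brute, rule = 1.0, 1
    for (ma, mb), n in zip(zip(cols1, cols2), (p, q, r)):
        brute *= np.sum(matching_tensor(ma, n) * matching_tensor(mb, n))
        rule *= n ** cycles(ma, mb)
    assert brute == rule
print("cycle rule verified")
```

Overlaying a matching with itself gives 2 cycles (a factor n²), while two distinct matchings on 4 points always merge into a single cycle (a factor n); this is exactly the bookkeeping used in the computation of E(∥T∥⁴_{σ,4}).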
The result is
(4.23) E(∥T∥⁴_{σ,4}) = (1/(9 pqr (pqr + 2))) ( (p²q²r² + 2pqr) + 2(p²q²r + pqr² + pqr) + 2(p²qr² + pq²r + pqr) + 2(pq²r² + p²qr + pqr) + 2(p²qr + pq²r + pqr²) ) = (pqr + 2(pq + pr + qr) + 4(p + q + r) + 8) / (9(pqr + 2)).
A similar computation shows that
(4.24) E(∥T∥⁴_#) = ((pq + pr + qr) + 3(p + q + r) + 3) / (5(pqr + 2)).
The following proposition shows that, in some sense, ∥ · ∥_# is better than ∥ · ∥_{σ,4} as an approximation to the spectral norm ∥ · ∥_σ.
Proposition 4.8. If p, q, r ≥ 1, then we have E(∥T∥⁴_σ) ≤ E(∥T∥⁴_#) ≤ E(∥T∥⁴_{σ,4}) for a random tensor T sampled from the uniform distribution on the unit sphere. The second inequality is strict when two of the numbers p, q, r are at least 2.
Proof. We calculate
(4.25) E(∥T∥⁴_{σ,4}) − E(∥T∥⁴_#) = (pqr + 2(pq + pr + qr) + 4(p + q + r) + 8)/(9(pqr + 2)) − ((pq + pr + qr) + 3(p + q + r) + 3)/(5(pqr + 2)) = (5pqr + (pq + qr + rp) − 7(p + q + r) + 13)/(45(pqr + 2)) = (5(p − 1)(q − 1)(r − 1) + 6((p − 1)(q − 1) + (p − 1)(r − 1) + (q − 1)(r − 1)))/(45(pqr + 2)) ≥ 0.
Remark. If p = q = r = n, then asymptotically we have E(∥T∥⁴_{σ,4}) = O(1) and E(∥T∥_#) = O(n^{−1/4}).
5. Low rank amplification.
As motivation, we will first consider a map from matrices to matrices that enhances the low rank structure.
5.1. Matrix amplification. Suppose A is a matrix with singular values λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_r ≥ 0. Then we can write A = UΣV*, where U, V are orthogonal, Σ is a diagonal matrix with diagonal entries λ₁, …, λ_r, and V* is the conjugate transpose of V. We have
(5.1) AA*A = (UΣV*)(VΣ*U*)(UΣV*) = UΣ³V*.
The matrix AA*A has singular values λ₁³ ≥ λ₂³ ≥ ⋯ ≥ λ_r³ ≥ 0. If λ₁ > λ₂, then the ratio of the two largest singular values increases from λ₁/λ₂ to λ₁³/λ₂³. If we define a map θ by
(5.2) θ(A) = AA*A / ∥AA*A∥,
where ∥ · ∥ is the Euclidean (Frobenius) norm, then lim_{n→∞} θⁿ(A) converges to a rank 1 matrix B = UDV*, where
(5.3) D = diag(1, 0, …, 0).
Note that the convergence is very fast: after n iterations of θ, the ratio of the two largest singular values is (λ₁/λ₂)^{3ⁿ}. We have that A · B = Σ · D = λ₁ is the spectral norm of A, and λ₁B is the best rank 1 approximation of A in the following sense: if C is a rank 1 matrix such that ∥A − C∥ is minimal, then C = λ₁B.
The map θ increases the largest singular value relative to the other singular values. In this sense, θ amplifies the sparse structure of the matrix (meaning low rank in this context). The map θ is related to the 4-Schatten norm, defined by ∥A∥_{s,4} = (trace((AA*)²))^{1/4}. Namely, the gradient of the function ∥A∥⁴_{s,4} is 4AA*A, so the gradient of the function ∥A∥_{s,4} is equal to AA*A up to a scalar function.

5.2. Tensor amplification. We will now consider amplification of the low rank structure of tensors. For this we take the gradient of a spectral-like norm. Let h = ∥ · ∥_#⁴. As before, we can write h(T + E) = h₀(T, E) + h₁(T, E) + h₂(T, E) + h₃(T, E) + h₄(T, E), where h_i has degree i in E and degree 4 − i in T. Now h₀(T, E) = ∥T∥_#⁴, the function E ↦ h₁(T, E) is the gradient of h at T, and h₂(T, E) is the Hessian that we have already computed. To find a formula for the gradient h₁(T, E), we express h(T) in diagrams and replace each diagram by the sum of all diagrams obtained by replacing one of the closed vertices by an open vertex. Using (4.18), we get
(5.4) h(T) = ∥T∥_#⁴ = (1/5) ( • •• • + • •• • + • •• • + 2 • ❅❅❅❅ • ⑦⑦⑦⑦ • • ).
The gradient is now equal to
(5.5) (∇h)(T) = (4/5) ( • •• ◦ + • •• ◦ + • •• ◦ + 2 • ❅❅❅❅ • ⑦⑦⑦⑦ • ◦ ).
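The matrix map θ and the tensor amplification map obtained from (5.5) can be sketched in a few lines of NumPy (our own illustration; the paper's experiments use MATLAB). Here the three open-vertex diagrams of the first type are written as B B^t B in each flattening, and the tetrahedron gradient is a single einsum contraction; the names theta and phi_sharp are ours:

```python
import numpy as np

def theta(A):
    """One matrix amplification step (5.2): cubes the singular values."""
    M = A @ A.T @ A
    return M / np.linalg.norm(M)

def phi_sharp(T):
    """Tensor amplification: the gradient (5.5) of ||T||_#^4, written as
    (4/5) * (mode-1 + mode-2 + mode-3 term + 2 * tetrahedron term)."""
    out = np.zeros_like(T)
    # The three "matrix-like" diagram gradients: B B^t B in each flattening.
    for mode in range(3):
        B = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        G = (B @ B.T @ B).reshape(np.moveaxis(T, mode, 0).shape)
        out += np.moveaxis(G, 0, mode)
    # Gradient of the tetrahedron diagram (one open vertex).
    out += 2 * np.einsum('alm,ibm,ilc->abc', T, T, T)
    return (4.0 / 5.0) * out

rng = np.random.default_rng(3)
a, b, c = (v / np.linalg.norm(v) for v in rng.standard_normal((3, 4)))
T1 = np.einsum('i,j,k->ijk', a, b, c)   # rank 1, unit Euclidean norm
# On a rank 1 unit tensor every diagram gradient returns the tensor itself,
# so phi_sharp(T1) = (4/5) * (3 + 2) * T1 = 4 * T1.
print(np.linalg.norm(phi_sharp(T1) - 4 * T1))
```

For a noisy tensor, phi_sharp plays the role that A ↦ AA*A plays for matrices: it pushes the tensor toward its dominant low rank structure.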
We can also view these diagrams with an open vertex as partial colored Brauer diagrams by removing the open vertex. For example,
(5.6) • ❅❅❅❅ • ⑦⑦⑦⑦ • ◦ = • ❂❂❂❂ •• ✁✁✁✁ .
Let Φ_#(T) = (∇h)(T). We view Φ_# as a polynomial map from V = R ⊗ G ⊗ B to itself. This map enhances the low rank structure of a tensor T. In a similar fashion, we can associate an amplification map Φ_{σ,4} to the norm ∥ · ∥_{σ,4}. Using (4.7) and similar calculations as before, we get
(5.7) Φ_{σ,4}(T) = (4/9) ( • •• ◦ + 2 • •• ◦ + 2 • •• ◦ + 2 • •• ◦ + 2 • ❅❅❅❅ • ⑦⑦⑦⑦ • ◦ ).

5.3. Low rank approximation. As we discussed in Section 1.1, Alternating Least Squares (ALS) is a standard approach to find low rank approximations of tensors. For rank 1, this algorithm is particularly simple. For a tensor
T ∈ R^{p×q×r}, we try to find a rank one tensor a ⊗ b ⊗ c such that ∥T − a ⊗ b ⊗ c∥ is minimal. Here a ∈ R^p, b ∈ R^q and c ∈ R^r. Unlike for higher rank, a best rank 1 approximation always exists. The Alternating Least Squares algorithm works as follows. We start with an initial guess a ⊗ b ⊗ c. Then we fix b and c and update the vector a such that ∥T − a ⊗ b ⊗ c∥ is minimal. This is a least squares regression problem that is easy to solve. Next, we fix a and c and update b, and then we fix a and b and update c. We repeat the process of updating a, b, c until the desired level of convergence is achieved. Numerical experiments were performed using the software MATLAB, along with the cp_als implementation of the ALS algorithm from the package
Tensor Toolbox ([1]). ALS is sensitive to the choice of the initial guess. The default initialization for cp_als is to use a random initial guess. We will also consider a method that we call the
Quick Rank 1 method. For a matrix, it is easy to find the best rank 1 approximation from the singular value decomposition. If a real matrix M has a singular value decomposition M = Σ_{i=1}^s λ_i a_i b_i^T, where a_1, …, a_s, b_1, …, b_s are unit orthogonal vectors and λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_s ≥ 0, then the best rank 1 approximation of M is λ_1 a_1 b_1^T, and M · a_1 b_1^T = λ_1 is the spectral norm of M. (The best rank 1 approximation is unique when λ_1 > λ_2.)
For the Quick Rank 1 method, we use the matrix case to find an initial rank 1 approximation of a given tensor T of size p × q × r. We flatten (unfold) this tensor to a p × qr matrix T_(1). Then we find a rank 1 approximation of this matrix as described above, say T_(1) ≈ β a d^T, where a, d are unit vectors and β ∈ R. We convert the vector d of dimension qr to a q × r matrix D. Now we find the best rank 1 approximation D ≈ γ b c^T, where b, c are unit vectors and γ ∈ R. We will use (βγ) a ⊗ b ⊗ c as a rank 1 approximation to the tensor T.
Tensor amplification can be used to obtain better initial guesses for ALS, so that better rank 1 approximations can be found using fewer iterations in the ALS algorithm. We will consider 4 different ways of choosing an initial guess for ALS:
1. Random. We choose a random initial guess for the rank 1 approximation.
2. Quick Rank 1. We use the Quick Rank 1 method described above.
3. Φ_{σ,4} and Quick Rank 1. We apply the Quick Rank 1 method to Φ_{σ,4}(T).
4. Φ_# and Quick Rank 1. We apply the Quick Rank 1 method to Φ_#(T).
The rank 1 approximation methods given above can be generalized to higher ranks. The low rank tensor approximation problem is given in (1.4) and (1.5). Let T ∈ R^{p₁×p₂×p₃} be a tensor of order 3. We look for a rank r ≥ 1 tensor S such that
(5.8) ∥T − S∥ is minimal, with S = ⟦Λ; U^(1), U^(2), U^(3)⟧,
where the factor matrices U^(i) ∈ R^{p_i×r} for 1 ≤ i ≤ 3 and Λ ∈ R^r. The ALS method starts with a random initial guess for the factor matrices. We first fix U^(2) and U^(3) to solve for U^(1), then fix U^(1) and U^(3) to solve for U^(2), and then fix U^(1) and U^(2) to solve for U^(3). This iterative process continues until some convergence criterion is satisfied.
For the iterative Quick Rank 1 method, we first employ the Quick Rank 1 method to approximate T with a rank 1 tensor λ₁ a₁ ⊗ b₁ ⊗ c₁. The process continues iteratively: at each step s with 2 ≤ s ≤ r, the Quick Rank 1 method is used to find a rank 1 approximation λ_s a_s ⊗ b_s ⊗ c_s of T − Σ_{i=1}^{s−1} λ_i a_i ⊗ b_i ⊗ c_i.
As in the rank 1 case, we use 4 different methods to choose an initial guess for the ALS method for the low rank r decomposition of T:
1. Random. We choose a random initial guess for the factor matrices.
2. Quick Rank 1. We use the iterative approach based on the Quick Rank 1 method described above (Algorithm 5.1, k = 0).
3. Φ_{σ,4} and Quick Rank 1. We iteratively apply the Quick Rank 1 method to Φ_{σ,4}(T) (Algorithm 5.1, k = 1).
4. Φ_# and Quick Rank 1. We iteratively apply the Quick Rank 1 method to Φ_#(T) (Algorithm 5.1, k = 2).
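The iterative Quick Rank 1 method with k = 0 (no amplification, with least squares refitting of the coefficients at every step) can be sketched as follows in NumPy (our own illustration; the function names are ours). For a rank 2 tensor with orthogonal factors, each flattening has an exact SVD along the true components, so the decomposition is recovered exactly:

```python
import numpy as np

def quick_rank1(T):
    """Quick Rank 1: two successive best rank 1 matrix approximations."""
    p, q, r = T.shape
    U, S, Vt = np.linalg.svd(T.reshape(p, q * r), full_matrices=False)
    beta, a, d = S[0], U[:, 0], Vt[0]
    U2, S2, Vt2 = np.linalg.svd(d.reshape(q, r), full_matrices=False)
    gamma, b, c = S2[0], U2[:, 0], Vt2[0]
    return beta * gamma, a, b, c

def rank_r_quick(T, r):
    """Iterated Quick Rank 1 (Algorithm 5.1 with k = 0): deflate the
    residual and refit all coefficients lambda by least squares."""
    factors = []
    D = T.copy()
    for _ in range(r):
        _, a, b, c = quick_rank1(D)
        factors.append(np.einsum('i,j,k->ijk', a, b, c))
        # Refit the coefficients: minimize ||T - sum_i lam_i v_i||.
        V = np.stack([v.ravel() for v in factors], axis=1)
        lams, *_ = np.linalg.lstsq(V, T.ravel(), rcond=None)
        D = T - (V @ lams).reshape(T.shape)
    return lams, factors, D

# A rank 2 tensor with orthogonal factors and distinct coefficients.
e = np.eye(4)
T = 2.0 * np.einsum('i,j,k->ijk', e[0], e[0], e[0]) \
    + 1.0 * np.einsum('i,j,k->ijk', e[1], e[1], e[1])
lams, factors, residual = rank_r_quick(T, 2)
print(sorted(np.abs(lams)), np.linalg.norm(residual))
```

With amplification, one would replace the argument of quick_rank1 inside the loop by an amplified residual (k = 1 or k = 2 in Algorithm 5.1) to obtain a better-conditioned dominant rank 1 component before deflating.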
6. Experiments.
6.1. Rank 1 approximation.
In our experiments, we started with a random 30 × 30 × 30 rank 1 unit tensor T = a ⊗ b ⊗ c, where a, b, c ∈ R^30 are random unit vectors, independently drawn from the uniform distribution on the unit sphere. We then added a random tensor E of size 30 × 30 × 30 with ∥E∥ = 10 to obtain a noisy tensor T_n = T + E. The noise tensor E is chosen from the sphere of radius 10 in R^{30×30×30} ≅ R^{27000} with the uniform distribution. Note that there is more noise than original signal: the signal to noise ratio is 20 log₁₀(1/10) = −20 dB.

Algorithm 5.1 Low rank approximation to a tensor T of order 3
  function RANK_R_METHODS(T, r, k)
    D ← T; s ← 0
    while s < r do
      s ← s + 1
      if k = 0 then U ← D
      else if k = 1 then U ← Φ_{σ,4}(D)
      else U ← Φ_#(D)
      approximate U with a unit rank 1 tensor via the Quick Rank 1 method: U ≈ λ_s v_s = λ_s a_s ⊗ b_s ⊗ c_s
      update the coefficients λ_1, …, λ_s such that ∥D∥ is minimal, where D = T − Σ_{i=1}^s λ_i v_i
    return the decomposition T = S + D, where S = Σ_{i=1}^r λ_i v_i

We used four methods for rank 1 approximation. Each method gives a rank 1 tensor λ a′ ⊗ b′ ⊗ c′. To measure how good the rank 1 approximation is to the original tensor T, we compute the inner product
T · (a′ ⊗ b′ ⊗ c′) = (a ⊗ b ⊗ c) · (a′ ⊗ b′ ⊗ c′) = (a · a′)(b · b′)(c · c′),
which we will call the fit. The fit is a number between 0 and 1, where 1 means a perfect fit. We created 1000 noisy tensors of size 30 × 30 × 30 as described above and ran each of the 4 methods to find the best rank 1 approximation for each of the 1000 tensors. For the random initial guess method, we repeated the calculation 10 times with different random initial guesses and recorded the best fit, the total number of ALS iterations, and the total running time. All other methods were run only once, and the fit, total number of ALS iterations, and running time were recorded. For all records, we took the average and standard deviation.
There is a tolerance parameter ε in the ALS implementation in the Tensor Toolbox: the algorithm terminates if the fit after an iteration increases by a factor smaller than 1 + ε. For the default value ε = 10⁻⁴, we obtained the results in Table 6.1. It can be observed from Table 6.1 that a better fit is obtained by using tensor amplification rather than a random initial guess. Even if we take the best case of repeating ALS for 10 different random initial guesses, quick rank 1 with amplification still yields a better fit. The total number of ALS iterations with a random initial guess is much larger than for the quick
rank 1 initialization, or quick rank 1 with tensor amplification. On average, the number of iterations for the best run with random initialization is 10.44, which is much larger than the number of iterations after tensor amplification, which is 2. The running time is also favorable for the quick rank 1 initialization. Amplification gives a better fit for the rank 1 approximation, while the running time has only marginally increased.

Table 6.1
A comparison of rank 1 approximation methods with tolerance parameter ε = 10⁻⁴

If we change the tolerance to a smaller value of ε, then the number of iterations increases, and the results are given in Table 6.2. As shown in the table, the amplification Φ_# performs better than the amplification Φ_{σ,4}. This is expected, as the norm ∥ · ∥_# is a better approximation to the spectral norm than ∥ · ∥_{σ,4}. We see that the amplification Φ_# combined with the quick rank 1 method still yields a better fit than the best-out-of-10 runs with random initialization. The number of iterations for the random initialization approximation with the best fit is 25.78 on average, while the average number of ALS iterations for Φ_# combined with quick rank 1 is only 3.

6.2. Rank 2 approximation. We started with a random 40 × 40 ×
40 unit tensor of rank 2, T = a₁ ⊗ b₁ ⊗ c₁ + a₂ ⊗ b₂ ⊗ c₂, where a₁, b₁, c₁, a₂, b₂, c₂ ∈ R^40 are random unit vectors, independently drawn from the uniform distribution on the unit sphere. We then added a random tensor E of size 40 × 40 × 40 with ∥E∥ = 10 to obtain a noisy tensor T_n = T + E. The noise tensor E is chosen from the sphere of radius 10 in R^{40×40×40} ≅ R^{64000} with the uniform distribution. Each method gives a rank 2 tensor S of size 40 × 40 × 40, and the fit of the approximation is given by (T · S)/∥S∥. As in Section 6.1, we created 1000 noisy tensors of size 40 × 40 × 40 and ran each of the 4 methods to find a best rank 2 approximation for each tensor. The random initial guess method was repeated 10 times for each tensor, and the best fits, total numbers of iterations, and total running times were recorded. The other three methods were run only once, and the fit, total number of ALS iterations, and running time were recorded for each tensor. For the tolerance parameter ε = 10⁻⁴, the averages and standard deviations of all the records are given in Table 6.3.

Table 6.2
A comparison of rank 1 approximation methods with a smaller tolerance parameter ε

Table 6.3
A comparison of rank 2 approximation methods with tolerance parameter ε = 10⁻⁴
7. Conclusion.
Colored Brauer diagrams are a graphical way to represent invariant features in tensor data; they can be used to visualize calculations with higher order tensors and to analyze the computational complexity of related algorithms. We have used such graphical calculations to find approximations of the spectral norm and to define polynomial maps that amplify the low rank structure of tensors. Such amplification maps are useful for finding better low rank approximations of tensors and are worthy of further study. We are also interested in studying n-edge-colored large Brauer diagrams for n > 3.

Acknowledgements. This work was partially supported by the National Science Foundation under Grant No. 1837985 and by the Department of Defense under Grant No. BA150235. Neriman Tokcan was partially supported by University of Michigan Precision Health Scholars Grant No. U063159.
REFERENCES
[1] B. W. Bader, T. G. Kolda and others,
MATLAB Tensor Toolbox, Version 2.6, available online at , 20XX.
[2] R. Brauer, On algebras which are connected with the semisimple continuous groups, Annals of Mathematics (1937), no. 4, 857–872.
[3] D. Callan, A combinatorial survey of identities for the double factorial (2009), arXiv:0906.1317.
[4] E. J. Candès, B. Recht,
Exact Matrix Completion via Convex Optimization, Foundations of Computational Mathematics (2009), no. 6, https://doi.org/10.1007/s10208-009-9045-5.
[5] E. J. Candès and T. Tao, The Power of Convex Relaxation: Near-optimal Matrix Completion, IEEE Trans. Inf. Theor. (2010), no. 5, https://doi.org/10.1109/TIT.2010.2044061.
[6] H. Derksen, On the Nuclear Norm and the Singular Value Decomposition of Tensors, Foundations of Computational Mathematics (2016), no. 3, 779–811.
[7] S. Friedland, L.-H. Lim, Nuclear norm of higher-order tensors, Mathematics of Computation (2018), no. 311, 1255–1281.
[8] R. Goodman and N. R. Wallach, Representations and Invariants of Classical Groups, Cambridge University Press (1998).
[9] A. Grothendieck,
Produits tensoriels topologiques et espaces nucléaires, Mem. Amer. Math. Soc. (1955), no. 16.
[10] J. Håstad, Tensor rank is NP-complete, Journal of Algorithms (1990), no. 4, 644–654.
[11] J. Håstad, Tensor rank is NP-complete, Automata, languages and programming (Stresa, 1989), Lecture Notes in Comput. Sci. (1989), Springer, Berlin, 451–460.
[12] C. J. Hillar and L.-H. Lim,
Most tensor problems are NP-hard, Journal of the ACM (2013), no. 6, Art. 45.
[13] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys. (1927), no. 1, 164–189.
[14] F. L. Hitchcock, Multiple invariants and generalized rank of a p-way matrix or tensor, J. Math. Phys. (1928), no. 1, 39–79.
[15] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review (2009), no. 3, 455–500.
[16] X. Kong, A concise proof to the spectral and nuclear norm bounds through tensor partitions, Open Mathematics (2019), 365–373.
[17] J. M. Landsberg, Tensors: Geometry and Applications, American Mathematical Society (2012).
[18] S. Lang, Algebra, Graduate Texts in Mathematics (2002), 3rd ed., Springer-Verlag, New York.
[19] L. De Lathauwer, B. De Moor, A Multilinear Singular Value Decomposition, SIAM Journal on Matrix Analysis and Applications (2000), no. 4, 1253–1278, https://doi.org/10.1137/S0895479896305696.
[20] Z. Li, The spectral norm and the nuclear norm of a tensor based on tensor partitions, Matrix Anal. Appl. (2016), no. 4, 1440–1452.
[21] Z. Li, Y. Nakatsukasa, and others, On Orthogonal Tensors and Best Rank-One Approximation Ratio, SIAM Journal on Matrix Analysis and Applications (2017), https://doi.org/10.1137/17M1144349.
[22] OEIS Foundation Inc. (2020), The On-Line Encyclopedia of Integer Sequences, http://oeis.org/A001147.
[23] OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences, http://oeis.org/A002831.
[24] V. L. Popov and E. B. Vinberg, "Invariant theory," in: Encyclopaedia of Mathematical Sciences (1994), A. Parshin and I. R. Shafarevich, eds., Springer-Verlag, Berlin.
[25] L. Qi and S. Hu, Spectral Norm and Nuclear Norm of a Third Order Tensor (2019), arXiv:1909.01529.
[26] A. Ramlatchan, M. Yang, and others,
A survey of matrix completion methods for recommendation systems, Big Data Mining and Analytics (2018), no. 4, 308–323.
[27] R. Schatten, A Theory of Cross-Spaces, Princeton University Press (1950), Princeton, NJ.
[28] A. P. Da Silva, P. Comon, and others, A Finite Algorithm to Compute Rank-1 Tensor Approximations, IEEE Signal Processing Letters (2016), no. 7, 959–963.
[29] V. De Silva and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM Journal on Matrix Analysis and Applications (2008), 1084–1127.
[30] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika (1966), 279–311, https://doi.org/10.1007/BF02289464.
[31] H. Weyl, The Classical Groups. Their Invariants and Representations (1939), Princeton University Press.
[32] L. K. Williams,
Invariant Polynomials on Tensors under the Action of a Product of Orthogonal Groups ,Ph.D. Thesis, University of Wisconsin–Milwaukee, 2013.[33] M. Yuan and C. Zhang,
On Tensor Completion via Nuclear Norm Minimization, Foundations of Computational Mathematics (2016), no. 4, 1031–1068, https://doi.org/10.1007/s10208-015-9269-5.
[34] T. Zhang and G. H. Golub, Rank-One Approximation to High Order Tensors, SIAM Journal on Matrix Analysis and Applications (2001), no. 2, 534–550, https://doi.org/10.1137/S0895479899352045.