Chebyshev polynomials and best rank-one approximation ratio
Andrei Agrachev, Khazhgali Kozhasov, André Uschmajew
Abstract.
We establish a new extremal property of the classical Chebyshev polynomials in the context of best rank-one approximation of tensors. We also give some necessary conditions for a tensor to be a minimizer of the ratio of spectral and Frobenius norms.
Introduction and Outline
The classical Chebyshev polynomials are known to have many extremal properties. The first result was established by Chebyshev himself: he proved [3] that a univariate monic polynomial with real coefficients that least deviates from zero on the interval $[-1,1]$ must be proportional to a Chebyshev polynomial of the first kind. Later there were further developments highlighting extremal properties of this class of univariate polynomials and its relevance for approximation theory; see [11, 17] and references therein. In this article we discover a new extremal property of Chebyshev polynomials of the first kind in the context of the theory of rank-one approximations of real tensors.

Let us define the binary Chebyshev form of degree $d$ as
$$Ч_{d,2}(x_1, x_2) = \frac{(x_1 + i x_2)^d + (x_1 - i x_2)^d}{2} = \sum_{k=0}^{\lfloor d/2 \rfloor} \binom{d}{2k} (-1)^k x_1^{d-2k} x_2^{2k}. \tag{0.1}$$
Note that its restriction to the unit circle $x_1^2 + x_2^2 = 1$ can be identified with the univariate Chebyshev polynomial of the first kind:
$$Ч_{d,2}\bigl(x, \sqrt{1-x^2}\bigr) = \cos(d \arccos x), \quad x \in [-1, 1].$$
In [18], the more general problem of minimizing the ratio of the uniform norm on the unit sphere and the Bombieri norm among all nonzero forms of a given degree $d$ and number of variables $n$ was considered. Equivalently, identifying a homogeneous polynomial with the symmetric tensor of its coefficients, one can formulate this problem as follows: minimize the ratio of the spectral norm and the Frobenius norm among all nonzero real symmetric $n^{\times d}$-tensors. In an attempt to attack this problem we define the family of homogeneous $n$-ary forms (1.2) that we call Chebyshev forms $Ч_{d,n}$.

Besides solving the above problem for the case of binary forms in Theorem 1.1, we solve it in the case of cubic ternary forms ($d = 3$, $n = 3$) in Theorem 1.2. This latter result in fact follows from a more general result that we obtain in Theorem 1.5 on the maximal orthogonal rank of a real $(3,3,3)$-tensor; as a consequence, the suitably normalized Chebyshev form $Ч_{3,3}/\sqrt{7}$ achieves the maximum possible relative distance to the set of all rank-one $(3,3,3)$-tensors.

Main Results

In this section we state our main results. They are all closely related but can be grouped into somewhat different directions.
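As a quick aside before the results: the defining identity of the binary Chebyshev form from the introduction is easy to check numerically. The following minimal Python sketch (our own, not part of the paper) compares the expansion (0.1) with $\cos(d\theta)$ on the unit circle:

```python
import math

def cheb_binary(d, x1, x2):
    """Binary Chebyshev form (0.1): Re((x1 + i*x2)^d), written via its expansion."""
    return sum(
        math.comb(d, 2 * k) * (-1) ** k * x1 ** (d - 2 * k) * x2 ** (2 * k)
        for k in range(d // 2 + 1)
    )

for d in range(1, 8):
    # On the unit circle the form is the Chebyshev polynomial: cos(d * theta).
    for j in range(50):
        t = 2 * math.pi * j / 50
        assert abs(cheb_binary(d, math.cos(t), math.sin(t)) - math.cos(d * t)) < 1e-9
    # The complex definition in (0.1) agrees with the real expansion.
    x1, x2 = 0.3, -1.2
    z = ((x1 + 1j * x2) ** d + (x1 - 1j * x2) ** d) / 2
    assert abs(z.real - cheb_binary(d, x1, x2)) < 1e-9
```

Both assertions pass for all sampled degrees, reflecting that the two expressions in (0.1) are the same polynomial.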
In the following $P_{d,n}$ denotes the space of real $n$-ary forms of degree $d$ (real homogeneous polynomials of degree $d$ in $n$ variables), and $\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}$ is the Euclidean norm on $\mathbb{R}^n$. For a form $p$ we denote by $\|p\|_\infty = \max_{\|x\|=1} |p(x)|$ the uniform norm of its restriction to the unit sphere.

Every form $p \in P_{d,n}$ has a standard representation in the basis of monomials: $p(x) = \sum_{|\alpha|=d} c_\alpha x^\alpha$, where $\alpha = (\alpha_1, \ldots, \alpha_n) \in \{0, 1, \ldots, d\}^n$ is a multi-index of length $|\alpha| = \alpha_1 + \cdots + \alpha_n = d$ and $x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$. The Bombieri norm [2] of $p$ is defined as
$$\|p\|_B = \sqrt{\sum_{|\alpha|=d} \binom{d}{\alpha}^{-1} |c_\alpha|^2},$$
where $\binom{d}{\alpha} = \frac{d!}{\alpha_1! \cdots \alpha_n!}$ is the multinomial coefficient.

The conformal orthogonal group $CO(n) = \mathbb{R}_+ \times O(n)$ acts on the space $P_{d,n}$ of real forms as follows:
$$g = (s, \rho) \in CO(n),\ p \in P_{d,n} \ \mapsto\ g * p \in P_{d,n}, \quad (g * p)(x) = s\, p(\rho^{-1} x).$$
Note that both the uniform norm and the Bombieri norm are invariant under the subgroup $O(n)$ of orthogonal transformations, and their ratio is invariant under the full group $CO(n)$; see section 2.2.

In [18], Qi asked about the smallest possible ratio $\|p\|_\infty / \|p\|_B$ that the two norms can attain in a space $P_{d,n}$. In our first result we solve this problem for binary forms of any given degree and we also characterize minimizers in this case.

Theorem 1.1.
For any nonzero $p \in P_{d,2}$ it holds that
$$\frac{\|p\|_\infty}{\|p\|_B} \ge \frac{\|Ч_{d,2}\|_\infty}{\|Ч_{d,2}\|_B} = \frac{1}{\sqrt{2^{d-1}}}. \tag{1.1}$$
When $d = 1$, one has equality in (1.1) for any $p \in P_{1,2}$. When $d = 2$, equality holds if and only if $p = \pm g * (x_1^2 + x_2^2)$ or $p = g * Ч_{2,2} = g * (x_1^2 - x_2^2)$, where $g \in CO(2)$. When $d \ge 3$, equality holds if and only if $p = g * Ч_{d,2}$, $g \in CO(2)$.

For any $d \ge 1$ and $n \ge 2$ we define the $n$-ary Chebyshev form of degree $d$ as
$$Ч_{d,n}(x_1, \ldots, x_n) = \sum_{k=0}^{\lfloor d/2 \rfloor} \binom{d}{2k} (-1)^k x_1^{d-2k} (x_2^2 + \cdots + x_n^2)^k. \tag{1.2}$$
Note that the forms $Ч_{d,n}$ are invariant under orthogonal transformations of $\mathbb{R}^n$ that preserve the point $(1, 0, \ldots, 0)$, and for any vector $v = (v_2, \ldots, v_n) \in \mathbb{R}^{n-1}$ of unit length one has that $Ч_{d,n}(x_1, v_2 x_2, \ldots, v_n x_2) = Ч_{d,2}(x_1, x_2)$ is the binary Chebyshev form (0.1).

In this work we are particularly concerned with cubic Chebyshev forms
$$Ч_{3,n}(x_1, \ldots, x_n) = x_1^3 - 3 x_1 (x_2^2 + \cdots + x_n^2). \tag{1.3}$$
It is an easy calculation that
$$\|Ч_{d,n}\|_\infty = 1, \qquad \|Ч_{d,n}\|_B^2 = \sum_{k=0}^{\lfloor d/2 \rfloor} \binom{d}{2k}^2 \sum_{\beta = (\beta_2, \ldots, \beta_n),\ |\beta| = k} \binom{k}{\beta}^2 \binom{d}{d-2k,\, 2\beta}^{-1}, \tag{1.4}$$
where $2\beta = (2\beta_2, \ldots, 2\beta_n)$, and, in particular,
$$\|Ч_{3,n}\|_B = \sqrt{3n-2}. \tag{1.5}$$
In the case $d = 2$ of quadratic forms one can easily determine the minimal ratio $\|p\|_\infty / \|p\|_B$ by passing to the ratio of spectral and Frobenius norms of symmetric matrices. Specifically, $\frac{\|p\|_\infty}{\|p\|_B} \ge \frac{1}{\sqrt{n}}$ for all $p \in P_{2,n}$, with equality if and only if $p = g * (\pm x_1^2 \pm \cdots \pm x_n^2)$, where $g \in CO(n)$. These forms correspond to multiples of symmetric orthogonal matrices. Note that among these extremal quadratic forms there is the Chebyshev quadric $Ч_{2,n}(x) = x_1^2 - x_2^2 - \cdots - x_n^2$, which is classically known as the Lorentz quadric. This fact for $d = 2$ together with Theorem 1.1 might suggest thinking that Chebyshev forms $Ч_{d,n}$ also minimize the ratio of uniform and Bombieri norm in $P_{d,n}$ for $d \ge 3$ and $n \ge 3$. We show that it is indeed the case for the first “nontrivial” situation $d = 3$, $n = 3$ of ternary cubics.

Theorem 1.2.
Let $p \in P_{3,3}$ be a nonzero ternary cubic form. Then
$$\frac{\|p\|_\infty}{\|p\|_B} \ge \frac{1}{\sqrt{7}}$$
and equality holds if $p = g * Ч_{3,3}$, where $g \in CO(3)$.

Theorem 1.2 is part of Corollary 1.6 further below. However, at least for all sufficiently large $n$, the Chebyshev form $Ч_{3,n}$ is not a global minimizer for the norm ratio. Indeed, [16, Thm. 5.3] provides examples of symmetric $n \times n \times n$ tensors with $n = 2^m$ that yield forms $p \in P_{3,2^m}$ satisfying
$$\frac{\|p\|_\infty}{\|p\|_B} = \left(\frac{2}{3}\right)^m = n^{\ln(2/3)/\ln 2} \le n^{-0.58}, \tag{1.6}$$
whereas, by (1.4) and (1.5),
$$\frac{\|Ч_{3,n}\|_\infty}{\|Ч_{3,n}\|_B} = \frac{1}{\sqrt{3n-2}}.$$
For instance, for $n = 2^{10} = 1024$, it holds that $\|Ч_{3,n}\|_\infty / \|Ч_{3,n}\|_B \ge 0.018 > (2/3)^{10} \approx 0.017$. Nevertheless, we are able to show that $Ч_{3,n}$ is a local minimum of the ratio of the two norms on the set of nonzero cubic $n$-ary forms.

Theorem 1.3.
Let $n \ge 2$. For all $p \in P_{3,n}$ in a small neighborhood of $Ч_{3,n}$ we have
$$\frac{\|p\|_\infty}{\|p\|_B} \ge \frac{\|Ч_{3,n}\|_\infty}{\|Ч_{3,n}\|_B}.$$

Let $\bigotimes_{j=1}^d \mathbb{R}^{n_j}$ denote the space of real $(n_1, \ldots, n_d)$-tensors, considered as $n_1 \times \cdots \times n_d$ tables $A = (a_{i_1 \ldots i_d})$ of real numbers. For two $(n_1, \ldots, n_d)$-tensors their Frobenius inner product is given by
$$\langle A, A' \rangle_F = \sum_{i_1, \ldots, i_d = 1}^{n_1, \ldots, n_d} a_{i_1 \ldots i_d}\, a'_{i_1 \ldots i_d},$$
and $\|A\|_F = \sqrt{\langle A, A \rangle_F}$ denotes the induced Frobenius norm.

The outer product $x^{(1)} \otimes \cdots \otimes x^{(d)}$ of vectors $x^{(j)} \in \mathbb{R}^{n_j}$ is the $(n_1, \ldots, n_d)$-tensor $X$ with entries $x^{(1)}_{i_1} \cdots x^{(d)}_{i_d}$. Nonzero tensors of this form are said to be of rank one, denoted $\mathrm{rank}(X) = 1$. The spectral norm on $\bigotimes_{j=1}^d \mathbb{R}^{n_j}$ is defined as
$$\|A\| = \max_{\|x^{(1)}\| = \cdots = \|x^{(d)}\| = 1} \langle A, x^{(1)} \otimes \cdots \otimes x^{(d)} \rangle_F = \max_{\|X\|_F = 1,\ \mathrm{rank}(X) = 1} \langle A, X \rangle_F, \tag{1.7}$$
where $\| \cdot \|$ denotes the standard Euclidean norm.

Given an $(n_1, \ldots, n_d)$-tensor $A$, a rank-one tensor $Y \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ is called a best rank-one approximation to $A$ if it minimizes the Frobenius distance to $A$ from the set of rank-one tensors, that is,
$$\|A - Y\|_F = \min_{X \in \bigotimes_{j=1}^d \mathbb{R}^{n_j},\ \mathrm{rank}(X) = 1} \|A - X\|_F. \tag{1.8}$$
The notion of best rank-one approximation ratio of a tensor space was introduced by Qi in [18]. For the space of $(n_1, \ldots, n_d)$-tensors it is defined as
$$\mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n_j}\bigr) = \min_{0 \ne A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}} \frac{\|A\|}{\|A\|_F}. \tag{1.9}$$
It is the largest constant $c$ satisfying $\|A\| \ge c \|A\|_F$ for all $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$. Another interpretation is that $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$ is the inverse of the operator norm of the identity map from $(\bigotimes_{j=1}^d \mathbb{R}^{n_j}, \| \cdot \|)$ to $(\bigotimes_{j=1}^d \mathbb{R}^{n_j}, \| \cdot \|_F)$.

Definition 1.4.
A nonzero tensor $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ is called extremal if it is a minimizer in (1.9), that is, if it satisfies
$$\frac{\|A\|}{\|A\|_F} = \mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n_j}\bigr).$$

Seen as a function of a tensor $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$, $\|A\|_F = 1$, of unit Frobenius norm, the rank-one approximation error (1.8) attains its maximum exactly at extremal tensors of unit Frobenius norm. The precise relation between (1.8) and (1.9) together with a possible application is given in (2.13) in subsection 2.2.

The space $\mathrm{Sym}^d(\mathbb{R}^n)$ of symmetric $n^{\times d}$-tensors consists of tensors $A = (a_{i_1 \ldots i_d})$ in $\bigotimes_{j=1}^d \mathbb{R}^n$ that satisfy $a_{i_{\sigma(1)} \ldots i_{\sigma(d)}} = a_{i_1 \ldots i_d}$ for any permutation $\sigma$ on $d$ elements. This space is isomorphic to the space $P_{d,n}$ of homogeneous forms as explained in subsection 2.1. Under this isomorphism Frobenius and spectral norms of a symmetric tensor correspond to Bombieri norm and uniform norm, respectively. The best rank-one approximation ratio $\mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^n))$ of the space of symmetric tensors is defined by replacing $\bigotimes_{j=1}^d \mathbb{R}^{n_j}$ with $\mathrm{Sym}^d(\mathbb{R}^n)$ in (1.9) and is equal to the minimum ratio between the uniform and the Bombieri norms of a nonzero form in $P_{d,n}$. In this context it is important to note that the definition of the spectral norm of a symmetric tensor does not change if the maximum in (1.7) is taken over symmetric rank-one tensors only; see subsection 2.1.

A general formula for $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$ or $\mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^n))$ is not known except for special cases; see [15]. Determining or estimating these constants is an interesting problem on its own and may have some useful applications for rank-truncated tensor optimization methods (see section 2.2). The present work contains some new contributions with the main focus on symmetric tensors.

One always has
$$0 < \mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n_j}\bigr) \le 1 \quad \text{and} \quad 0 < \mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^n\bigr) \le \mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^n)) \le 1.$$
The asymptotic behavior of $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^n)$ is $O(1/\sqrt{n^{d-1}})$; see [7]. For $d = 3$ the currently best known upper bound valid for all $n$ seems to be $1.5\, n^{\ln(2/3)/\ln 2} \le 1.5\, n^{-0.58}$ and follows directly from (1.6); see [16].

Lower bounds on the best rank-one approximation ratio can be obtained from decompositions of tensors into pairwise orthogonal rank-one tensors. For $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ let
$$A = Y_1 + \cdots + Y_r, \tag{1.10}$$
where $Y_1, \ldots, Y_r$ are rank-one $(n_1, \ldots, n_d)$-tensors such that $\langle Y_\ell, Y_{\ell'} \rangle_F = 0$ for $\ell \ne \ell'$. The smallest possible number $r$ that allows such a decomposition (1.10) is called the orthogonal rank of the tensor $A$ [9] and will be denoted by $\mathrm{rank}_\perp(A)$. Since at least one of the terms in (1.10) has to satisfy $\langle A, Y_i \rangle_F \ge \|A\|_F^2 / r$, it follows that
$$\frac{\|A\|}{\|A\|_F} \ge \frac{1}{\sqrt{\mathrm{rank}_\perp(A)}}$$
for all $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$. Thus an upper bound on the maximal orthogonal rank in a given tensor space leads to a lower bound on the best rank-one approximation ratio of that tensor space:
$$\mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n_j}\bigr) \ge \frac{1}{\sqrt{\max_{A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}} \mathrm{rank}_\perp(A)}}. \tag{1.11}$$
It appears that for all known values of $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$ this is actually an equality [13, 15]. The values of $\mathcal{A}(\mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2} \otimes \mathbb{R}^{n_3})$ have been determined in [14] for all combinations $n_1, n_2, n_3 \le 4$, except for $(3,3,3)$ and $(3,3,4)$.

Theorem 1.5.
The maximal orthogonal rank of a $(3,3,3)$-tensor is seven.

In [14] it has been shown that $\mathcal{A}(\mathbb{R}^3 \otimes \mathbb{R}^3 \otimes \mathbb{R}^3) \le 1/\sqrt{7}$ and conjectured that this is actually the exact value. Due to (1.11), Theorem 1.5 shows that $\mathcal{A}(\mathbb{R}^3 \otimes \mathbb{R}^3 \otimes \mathbb{R}^3) \ge 1/\sqrt{7}$, so the value is indeed $1/\sqrt{7}$, and it is attained at the symmetric tensor associated to the Chebyshev form $Ч_{3,3}$. Since the spaces $\mathrm{Sym}^3(\mathbb{R}^3)$ and $P_{3,3}$ are isometric (with respect to both norms), Theorem 1.2 is therefore part of the following corollary of Theorem 1.5.

Corollary 1.6.
We have
$$\mathcal{A}(\mathbb{R}^3 \otimes \mathbb{R}^3 \otimes \mathbb{R}^3) = \mathcal{A}(\mathrm{Sym}^3(\mathbb{R}^3)) = \frac{1}{\sqrt{\max_{A \in \mathbb{R}^3 \otimes \mathbb{R}^3 \otimes \mathbb{R}^3} \mathrm{rank}_\perp(A)}} = \frac{1}{\sqrt{7}}$$
and the symmetric tensor corresponding to the Chebyshev cubic $Ч_{3,3}$ is extremal.

Assume now $n_1 \le \cdots \le n_d$. Then it is not difficult to show that the orthogonal rank of an $(n_1, \ldots, n_d)$-tensor is not larger than $n_1 \cdots n_{d-1}$. It follows from (1.11) that
$$\mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n_j}\bigr) \ge \frac{1}{\sqrt{n_1 \cdots n_{d-1}}}, \qquad n_1 \le \cdots \le n_d. \tag{1.12}$$
In [15] the concept of an orthogonal tensor is defined by the property that its contraction along the first $d-1$ modes (assuming $n_d$ is the largest dimension) with any $d-1$ unit vectors yields a vector of unit norm; such tensors attain equality in (1.12). For $n^{\times d}$-tensors this is the case if and only if $n = 1, 2, 4, 8$. By Theorem 1.1,
$$\mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^2)) = \frac{1}{\sqrt{2^{d-1}}} = \mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^2\bigr),$$
and since the symmetric tensors associated to Chebyshev forms attain these constants, they are orthogonal in the sense of [15]. In light of Corollary 1.6 one hence may wonder whether $\mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^n))$ equals $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^n)$ in general, or at least in the case $d = 3$. Note that this is true for matrices. In general, the answer to this question is, however, negative. In the cases $n = 4$ and $n = 8$ equality would imply the existence of symmetric orthogonal tensors, which we show is not possible.

Proposition 1.7. If $A \in \mathrm{Sym}^d(\mathbb{R}^n)$ is an orthogonal symmetric tensor of order $d \ge 3$, then $n = 1$ or $n = 2$. For $n = 2$ the only such tensors are the ones associated to rotated Chebyshev forms $p = \rho * Ч_{d,2}$, $\rho \in O(2)$, that is, are of the form $(\rho, \ldots, \rho) \cdot A$ (see (2.4)) with $A$ given by (2.7).

Corollary 1.8.
For $d \ge 3$ we have
$$\mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^4\bigr) = \frac{1}{\sqrt{4^{d-1}}} < \mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^4)) \quad \text{and} \quad \mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^8\bigr) = \frac{1}{\sqrt{8^{d-1}}} < \mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^8)).$$
Equality of the two ratios does, however, hold in the cases of $2^{\times d}$- and $(3,3,3)$-tensors, as seen above.

The problem of determining the best rank-one approximation ratio of a tensor space and finding associated extremal tensors can be seen as a constrained optimization problem for a Lipschitz function. The spectral norm $A \mapsto \|A\|$ is a Lipschitz function on the normed space $(\bigotimes_{j=1}^d \mathbb{R}^{n_j}, \| \cdot \|_F)$ (with Lipschitz constant one). The best rank-one approximation ratio $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$ equals the minimal value of this function on the unit sphere $\{A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j} : \|A\|_F = 1\}$ defined by the Frobenius norm, and extremal tensors (of unit Frobenius norm) are its global minima. Global as well as local minima of a Lipschitz function are among its critical points. The notion of a critical point of a Lipschitz function constrained to a submanifold is explained in section 2.3. It motivates the following terminology.

Definition 1.9. A nonzero tensor $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ is critical if $A / \|A\|_F$ is a critical point of the restriction of the spectral norm to the Frobenius sphere, meaning that $\lambda A$ belongs to the generalized gradient of the spectral norm at $A / \|A\|_F$ for some $\lambda \in \mathbb{R}$.

We can then give a characterization of critical $(n_1, \ldots, n_d)$-tensors in terms of decompositions of them into their best rank-one approximations.

Theorem 1.10.
A nonzero tensor $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ is critical if and only if the rescaled tensor $(\|A\| / \|A\|_F)^2 A$ can be written as a convex linear combination of some best rank-one approximations of $A$.

Specifically, the theorem states that a tensor $A$ is critical if and only if there exists a decomposition
$$\left(\frac{\|A\|}{\|A\|_F}\right)^2 A = \sum_{\ell=1}^r \alpha_\ell Y_\ell, \qquad \sum_{\ell=1}^r \alpha_\ell = 1, \quad \alpha_1, \ldots, \alpha_r > 0, \tag{1.13}$$
where $Y_1, \ldots, Y_r$ are best rank-one approximations of $A$. In particular, if $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ is an extremal tensor, then
$$\mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n_j}\bigr)^2 \cdot A = \sum_{\ell=1}^r \alpha_\ell Y_\ell, \qquad \sum_{\ell=1}^r \alpha_\ell = 1, \quad \alpha_1, \ldots, \alpha_r > 0, \tag{1.14}$$
for some best rank-one approximations $Y_1, \ldots, Y_r$ of $A$.

An analogue of Theorem 1.10 holds for symmetric tensors or, equivalently, homogeneous forms. Considering the spectral norm as a function on the space $\mathrm{Sym}^d(\mathbb{R}^n)$ only, it is again a Lipschitz function, and the best rank-one approximation ratio of $\mathrm{Sym}^d(\mathbb{R}^n)$ equals its minimum value on the Frobenius unit sphere in the space $\mathrm{Sym}^d(\mathbb{R}^n)$ of symmetric tensors. A nonzero symmetric tensor $A \in \mathrm{Sym}^d(\mathbb{R}^n)$ is called critical in $\mathrm{Sym}^d(\mathbb{R}^n)$ if the normalized symmetric tensor $A / \|A\|_F$ is a critical point (see section 2.3) of the restriction of the spectral norm to the Frobenius sphere in the space $\mathrm{Sym}^d(\mathbb{R}^n)$. We also say that a form $p \in P_{d,n}$ is critical if the associated symmetric tensor is critical in $\mathrm{Sym}^d(\mathbb{R}^n)$.

Theorem 1.11.
A nonzero tensor $A \in \mathrm{Sym}^d(\mathbb{R}^n)$ is critical in $\mathrm{Sym}^d(\mathbb{R}^n)$ if and only if the rescaled tensor $(\|A\| / \|A\|_F)^2 A$ can be written as a convex linear combination of some symmetric best rank-one approximations of $A$. In this case $A$ is also critical in the space $\bigotimes_{j=1}^d \mathbb{R}^n$.

Here the second statement follows immediately from Theorem 1.10 and the fact that a best rank-one approximation of a symmetric tensor can always be chosen to be symmetric due to Banach’s result [1]; see section 2.1. However, if $A \in \mathrm{Sym}^d(\mathbb{R}^n)$ is an extremal symmetric tensor, then, by Theorem 1.11,
$$\mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^n))^2 \cdot A = \sum_{\ell=1}^r \alpha_\ell Y_\ell, \qquad \sum_{\ell=1}^r \alpha_\ell = 1, \quad \alpha_1, \ldots, \alpha_r > 0, \tag{1.15}$$
for some symmetric best rank-one approximations $Y_1, \ldots, Y_r$ of $A$, and $A$ is critical in $\bigotimes_{j=1}^d \mathbb{R}^n$. But in general $A$ is not extremal in $\bigotimes_{j=1}^d \mathbb{R}^n$, as discussed at the end of the previous subsection.

Theorems 1.10 and 1.11 combined with Proposition 2.2 from section 2.2 imply that extremal tensors must have several best rank-one approximations.

Corollary 1.12.
Let $d \ge 3$. Then any extremal tensor in $\bigotimes_{j=1}^d \mathbb{R}^n$ has at least $n$ distinct best rank-one approximations. Similarly, any extremal symmetric tensor in $\mathrm{Sym}^d(\mathbb{R}^n)$ has at least $n$ distinct symmetric best rank-one approximations.

Below we give an alternative characterization of critical tensors in terms of their nuclear norm. The nuclear norm of an $(n_1, \ldots, n_d)$-tensor $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ is defined by
$$\|A\|_* = \inf \left\{ \sum_{\ell=1}^r \|Y_\ell\|_F : A = \sum_{\ell=1}^r Y_\ell,\ r \in \mathbb{N},\ \mathrm{rank}(Y_\ell) = 1,\ \ell = 1, \ldots, r \right\}. \tag{1.16}$$
It is a result of Friedland and Lim [10] that for a symmetric tensor $A \in \mathrm{Sym}^d(\mathbb{R}^n)$ it is enough to take the infimum in (1.16) over symmetric rank-one tensors only. Hence the nuclear norm of a symmetric tensor can be defined intrinsically in the space $\mathrm{Sym}^d(\mathbb{R}^n)$. In either case, the infimum in (1.16) is attained.

Nuclear and spectral norms are dual to each other (see subsection 2.1) and for any tensor $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ it holds that
$$\|A\|_F^2 \le \|A\| \, \|A\|_*. \tag{1.17}$$
Our next result characterizes tensors achieving equality in (1.17).

Theorem 1.13.
The following two properties are equivalent for a nonzero tensor $A$ in $\bigotimes_{j=1}^d \mathbb{R}^{n_j}$ or $\mathrm{Sym}^d(\mathbb{R}^n)$:

(i) $A$ is critical,

(ii) $\|A\| \, \|A\|_* = \|A\|_F^2$.

We remark that the fact that extremal tensors achieve equality in (1.17) has already been proven in [8, Theorems 2.2 and 3.1].
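In the matrix case ($d = 2$) all three norms are computable from the singular values, so the inequality (1.17) and its equality case are easy to probe directly. A small illustrative sketch (our own code, not from the paper; for $d \ge 3$ the spectral and nuclear norms are no longer given by an SVD):

```python
import numpy as np

def norms(A):
    """Spectral, Frobenius, and nuclear norm of a matrix from its singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(s[0]), float(np.sqrt(np.sum(s ** 2))), float(np.sum(s))

rng = np.random.default_rng(0)

# ||A||_F^2 <= ||A|| * ||A||_* holds for every matrix, cf. (1.17) ...
for _ in range(100):
    spec, fro, nuc = norms(rng.standard_normal((4, 4)))
    assert fro ** 2 <= spec * nuc + 1e-9

# ... with equality exactly when all nonzero singular values coincide,
# e.g. for orthogonal matrices: here 4 = 1 * 4.
spec, fro, nuc = norms(np.eye(4))
assert abs(fro ** 2 - spec * nuc) < 1e-12
```

For matrices, equality in (1.17) means that all nonzero singular values are equal, which is consistent with the extremality of (multiples of) orthogonal matrices mentioned after Theorem 1.1.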
For symmetric tensors, the statement of Theorem 1.11 can be reinterpreted in terms of homogeneous forms. Note that a symmetric rank-one tensor $Y = \lambda\, y \otimes \cdots \otimes y$, $\lambda \in \mathbb{R}$, $\|y\| = 1$, is a symmetric best rank-one approximation to the symmetric tensor associated to a homogeneous form $p$ if and only if
$$\lambda = p(y) = \pm \|p\|_\infty. \tag{1.18}$$
Also, by (2.10), the homogeneous form associated to such a rank-one tensor is proportional to the $d$th power of a linear form,
$$p_Y(x) = \lambda \langle y, x \rangle^d = \lambda (y_1 x_1 + \cdots + y_n x_n)^d.$$
Therefore, in analogy to (1.13), Theorem 1.11 states that a form $p \in P_{d,n}$ is critical for the ratio $\|p\|_\infty / \|p\|_B$ if and only if it can be written as
$$\left(\frac{\|p\|_\infty}{\|p\|_B}\right)^2 p(x) = \sum_{\ell=1}^r \alpha_\ell \lambda_\ell \langle y_\ell, x \rangle^d, \qquad \sum_{\ell=1}^r \alpha_\ell = 1, \quad \alpha_1, \ldots, \alpha_r > 0, \tag{1.19}$$
where $\lambda_i \in \mathbb{R}$ and $y_i \in \mathbb{R}^n$, $\|y_i\| = 1$, satisfy (1.18) for $i = 1, \ldots, r$.

From Theorem 1.1 we know that the binary Chebyshev forms $Ч_{d,2}$ are extremal in $P_{d,2}$ and therefore they must admit a decomposition like (1.19). In Theorem 1.14 we provide such a decomposition. For $k = 0, \ldots, d-1$ let $\theta_k = \pi k / d$ and $a_k = \cos(\theta_k)$, $b_k = \sin(\theta_k)$. Then $a_k + i b_k = e^{i \theta_k}$, $k = 0, \ldots, d-1$, are $2d$th roots of unity.

Theorem 1.14.
For any $d \ge 1$ we have
$$\frac{1}{2^{d-1}}\, Ч_{d,2}(x_1, x_2) = \frac{1}{d} \sum_{k=0}^{d-1} (-1)^k (x_1 a_k + x_2 b_k)^d \tag{1.20}$$
or, in polar coordinates,
$$\frac{1}{2^{d-1}}\, Ч_{d,2}(\cos\theta, \sin\theta) = \frac{1}{2^{d-1}} \cos(d\theta) = \frac{1}{d} \sum_{k=0}^{d-1} (-1)^k \cos(\theta - \theta_k)^d. \tag{1.21}$$
The second equality in (1.21) constitutes an interesting trigonometric identity, which we were not able to find in the literature.

In the following corollary of Theorem 1.14 we provide a decomposition (1.19) for cubic Chebyshev forms $Ч_{3,n}$, which shows that they are critical in $P_{3,n}$.

Corollary 1.15.
For $n \ge 2$ we have
$$\frac{1}{3n-2}\, Ч_{3,n}(x) = \left(\frac{n+2}{9n-6}\right) x_1^3 + \frac{4}{9n-6} \sum_{i=2}^n \left[ \left(-\frac{x_1}{2} + \frac{\sqrt{3}}{2} x_i\right)^3 + \left(-\frac{x_1}{2} - \frac{\sqrt{3}}{2} x_i\right)^3 \right]. \tag{1.22}$$
In particular, $Ч_{3,n}$, $n \ge 2$, is critical for the ratio $\|p\|_\infty / \|p\|_B$, $p \in P_{3,n}$.

In section 3.6 we use Corollary 1.15 to prove Theorem 1.3, that is, that $Ч_{3,n}$ is a local minimum for the norm ratio.

It is interesting to note that a decomposition of $Ч_{3,n}$ or, more precisely, of its representing symmetric tensor, into nonsymmetric best rank-one approximations is trivially obtained. By (1.3), the associated symmetric tensor is
$$A_n = e_1 \otimes e_1 \otimes e_1 - \sum_{k=2}^n \left( e_1 \otimes e_k \otimes e_k + e_k \otimes e_1 \otimes e_k + e_k \otimes e_k \otimes e_1 \right), \tag{1.23}$$
where $e_1, \ldots, e_n$ denote the basic unit vectors in $\mathbb{R}^n$. Since $\|A_n\| = 1$ by (1.4), this “decomposition into entries” is a decomposition into best rank-one approximations with equal weights. Scaling by $(\|A_n\| / \|A_n\|_F)^2 = 1/(3n-2)$ provides a desired convex decomposition (1.13). While this proves that $A_n$ is critical in $\mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$ (see Theorem 1.10), it does not imply by itself that $A_n$ is critical in $\mathrm{Sym}^3(\mathbb{R}^n)$. Thus Corollary 1.15 is a stronger statement. Observe also that (1.23) is a decomposition into pairwise orthogonal rank-one tensors. This together with (1.5) and (1.11) shows that the tensor $A_n$ associated with the cubic Chebyshev form $Ч_{3,n}$ has orthogonal rank $3n-2$.

Preliminaries

In this section we gather some basic definitions and preliminary results upon which we base our arguments for proving the main results in section 3.
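As an aside, the identity (1.22) and the “decomposition into entries” (1.23) from the previous section can be verified mechanically. The following short Python check is our own sketch, not part of the paper:

```python
import math
import random

def cheb3(n, x):
    """Cubic Chebyshev form (1.3): x1^3 - 3*x1*(x2^2 + ... + xn^2)."""
    return x[0] ** 3 - 3 * x[0] * sum(t ** 2 for t in x[1:])

# Check the convex decomposition (1.22) into cubes of linear forms at random points.
s = math.sqrt(3) / 2
random.seed(1)
for n in range(2, 6):
    for _ in range(20):
        x = [random.uniform(-1, 1) for _ in range(n)]
        rhs = (n + 2) / (9 * n - 6) * x[0] ** 3 + 4 / (9 * n - 6) * sum(
            (-x[0] / 2 + s * t) ** 3 + (-x[0] / 2 - s * t) ** 3 for t in x[1:]
        )
        assert abs(cheb3(n, x) / (3 * n - 2) - rhs) < 1e-9

# The tensor A_n of (1.23) has 3n - 2 nonzero entries, each of modulus one,
# so ||A_n||_F^2 = 3n - 2 and rank_perp(A_n) <= 3n - 2 (indices are 0-based).
n = 4
entries = {(0, 0, 0): 1.0}
for k in range(1, n):
    for idx in ((0, k, k), (k, 0, k), (k, k, 0)):
        entries[idx] = -1.0
assert len(entries) == 3 * n - 2
assert math.isclose(sum(v * v for v in entries.values()), 3 * n - 2)
```

Both checks pass; the first confirms that the weights in (1.22) sum to one and reproduce $Ч_{3,n}/(3n-2)$ exactly.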
The space of $(n_1, \ldots, n_d)$-tensors is isomorphic to the space of multilinear maps on $\mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_d}$. The map associated to a tensor $A$ is given by
$$(x^{(1)}, \ldots, x^{(d)}) \ \mapsto\ \langle A, x^{(1)} \otimes \cdots \otimes x^{(d)} \rangle_F = \sum_{i_1, \ldots, i_d = 1}^{n_1, \ldots, n_d} a_{i_1 \ldots i_d}\, x^{(1)}_{i_1} \cdots x^{(d)}_{i_d}. \tag{2.1}$$
The spectral norm (1.7) of $A$ equals the uniform norm of the restriction of the associated multilinear map to the product of unit spheres in $\mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_d}$.

As for the nuclear norm defined in (1.16), it can be shown that the infimum is always attained (see [10, Prop. 3.1]). A decomposition $A = \sum_{\ell=1}^r X_\ell$ of $A$ into rank-one tensors such that $\|A\|_* = \sum_{\ell=1}^r \|X_\ell\|_F$ is called a nuclear decomposition. We have already stated that the spectral and the nuclear norms are dual to each other, that is,
$$\|A\| = \max_{\|A'\|_* \le 1} \bigl| \langle A, A' \rangle_F \bigr|, \qquad \|A\|_* = \max_{\|A'\| \le 1} \bigl| \langle A, A' \rangle_F \bigr|, \tag{2.2}$$
and the three above introduced norms satisfy
$$\|A\| \le \|A\|_F, \qquad \|A\|_F \le \|A\|_*, \qquad \text{and} \qquad \|A\|_F^2 \le \|A\| \, \|A\|_*. \tag{2.3}$$
Moreover, in the first two inequalities in (2.3) equality holds if and only if $A$ is a rank-one tensor. We refer to [6, 10] for these statements.

The product of orthogonal groups $O(n_1, \ldots, n_d) = O(n_1) \times \cdots \times O(n_d)$, whose elements are denoted $(\rho^{(1)}, \ldots, \rho^{(d)})$, acts on the space $\bigotimes_{j=1}^d \mathbb{R}^{n_j}$ as
$$\bigl( (\rho^{(1)}, \ldots, \rho^{(d)}) \cdot A \bigr)_{i_1 \ldots i_d} = \sum_{j_1, \ldots, j_d = 1}^{n_1, \ldots, n_d} \rho^{(1)}_{i_1 j_1} \cdots \rho^{(d)}_{i_d j_d}\, a_{j_1 \ldots j_d}, \tag{2.4}$$
preserving the Frobenius inner product and the spectral and the nuclear norms.

The $\binom{n+d-1}{d}$-dimensional space $\mathrm{Sym}^d(\mathbb{R}^n) \subset \bigotimes_{j=1}^d \mathbb{R}^n$ of symmetric $n^{\times d}$-tensors is isomorphic to the space $P_{d,n}$ of $n$-ary $d$-homogeneous real forms. The symmetric tensor $A$ is identified with the form $p_A$ defined as
$$p_A(x) = \langle A, x \otimes \cdots \otimes x \rangle_F = \sum_{i_1, \ldots, i_d = 1}^n a_{i_1 \ldots i_d}\, x_{i_1} \cdots x_{i_d}, \quad x \in \mathbb{R}^n, \tag{2.5}$$
which equals the restriction of the multilinear map (2.1) to the “diagonal” in $\mathbb{R}^n \times \cdots \times \mathbb{R}^n$. It is convenient to represent $p_A$ in the basis of monomials
$$p_A(x) = \sum_{|\alpha|=d} a_\alpha x^\alpha, \qquad \text{where } a_\alpha = \binom{d}{\alpha} a_{i_1 \ldots i_d} \tag{2.6}$$
and $\{i_1, \ldots, i_d\}$ is any collection of indices such that for $i = 1, \ldots, n$ the value $i$ occurs $\alpha_i$ times among $i_1, \ldots, i_d$.

As an example, the binary Chebyshev form $Ч_{d,2}$ in (0.1) corresponds to the symmetric tensor with entries
$$a_{i_1 \ldots i_d} = \begin{cases} (-1)^k & \text{if } \#\{j : i_j = 2\} = 2k, \\ 0 & \text{otherwise,} \end{cases} \tag{2.7}$$
and the associated multilinear map (2.1) is given by
$$\langle A, x^{(1)} \otimes \cdots \otimes x^{(d)} \rangle_F = \sum_{k=0}^{\lfloor d/2 \rfloor} (-1)^k \sum_{\#\{j : i_j = 2\} = 2k} x^{(1)}_{i_1} \cdots x^{(d)}_{i_d}.$$

Banach proved [1] that for a symmetric coefficient tensor $A$, the maximum absolute value of the multilinear form (2.1) on a product of spheres can be attained at diagonal inputs; in other words,
$$\|A\| = \max_{\|x\|=1} |p_A(x)| = \|p_A\|_\infty. \tag{2.8}$$
This is a generalization of the fact that for a symmetric matrix $A$ the maximum absolute value of the bilinear form $x^T A y$ is, modulo scaling, attained when $x = y$ is an eigenvector for the eigenvalue with the largest absolute value. Therefore, the spectral norm of symmetric tensors may be intrinsically defined in the space $\mathrm{Sym}^d(\mathbb{R}^n)$.

Next, one can easily check that the Frobenius inner product between two symmetric tensors $A = (a_{i_1 \ldots i_d})$, $A' = (a'_{i_1 \ldots i_d}) \in \mathrm{Sym}^d(\mathbb{R}^n)$ equals the Bombieri product between the corresponding homogeneous forms $p_A(x) = \sum_{|\alpha|=d} a_\alpha x^\alpha$ and $p_{A'}(x) = \sum_{|\alpha|=d} a'_\alpha x^\alpha$ with coefficients defined through (2.6):
$$\langle A, A' \rangle_F = \sum_{i_1, \ldots, i_d = 1}^n a_{i_1 \ldots i_d}\, a'_{i_1 \ldots i_d} = \sum_{|\alpha|=d} \binom{d}{\alpha}^{-1} a_\alpha a'_\alpha =: \langle p_A, p_{A'} \rangle_B. \tag{2.9}$$
By (2.8) and (2.9), the isomorphism $A \mapsto p_A$ establishes an isometry between $(\mathrm{Sym}^d(\mathbb{R}^n), \| \cdot \|)$ and $(P_{d,n}, \| \cdot \|_\infty)$, as well as between $(\mathrm{Sym}^d(\mathbb{R}^n), \| \cdot \|_F)$ and $(P_{d,n}, \| \cdot \|_B)$.

When $n_1 = \cdots = n_d = n$ the diagonal subaction of the action (2.4) preserves the subspace $\mathrm{Sym}^d(\mathbb{R}^n)$ of symmetric tensors and it corresponds to the action of the orthogonal group on the space $P_{d,n}$ of homogeneous forms by orthogonal change of variables:
$$\rho \in O(n),\ p \in P_{d,n} \ \mapsto\ \rho * p \in P_{d,n}, \quad (\rho * p)(x) = p(\rho^{-1} x).$$
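The correspondence (2.9) between the Frobenius and Bombieri inner products is easy to confirm numerically for small $d$ and $n$. The sketch below (our own code, not from the paper) symmetrizes random tensors, extracts the monomial coefficients via (2.6), and compares the two products:

```python
import itertools
import math
import numpy as np

n, d = 3, 3
rng = np.random.default_rng(0)

def symmetrize(T):
    """Average a tensor over all permutations of its axes."""
    perms = list(itertools.permutations(range(T.ndim)))
    return sum(T.transpose(p) for p in perms) / len(perms)

def monomial_coeffs(T):
    """Monomial coefficients a_alpha of p_T: summing T over all index orderings
    of each type alpha realizes the multinomial factor in (2.6)."""
    coeffs = {}
    for idx in itertools.product(range(n), repeat=d):
        alpha = tuple(idx.count(i) for i in range(n))
        coeffs[alpha] = coeffs.get(alpha, 0.0) + float(T[idx])
    return coeffs

def multinom(alpha):
    return math.factorial(d) // math.prod(math.factorial(a) for a in alpha)

A = symmetrize(rng.standard_normal((n,) * d))
B = symmetrize(rng.standard_normal((n,) * d))

frob = float(np.tensordot(A, B, axes=d))             # <A, B>_F
ca, cb = monomial_coeffs(A), monomial_coeffs(B)
bomb = sum(ca[a] * cb[a] / multinom(a) for a in ca)  # <p_A, p_B>_B via (2.9)
assert abs(frob - bomb) < 1e-9
```

The assertion passes for any choice of random seed, reflecting that (2.9) is an algebraic identity rather than an approximation.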
Due to (2.9), this shows that the Bombieri inner product is invariant under such a change of variables.

Finally, we have already noted that according to (2.5) a symmetric rank-one tensor $Y = \pm\, y \otimes \cdots \otimes y$ corresponds to the $d$th power of a linear form $\langle y, \cdot \rangle$ as follows:
$$p_Y(x) = \langle \pm\, y \otimes \cdots \otimes y,\ x \otimes \cdots \otimes x \rangle_F = \pm \langle y, x \rangle^d. \tag{2.10}$$
Hence a decomposition of a symmetric tensor into symmetric rank-one tensors corresponds to a decomposition of the associated homogeneous form into powers of linear forms. Note that by (2.5) the Bombieri inner product of any homogeneous form $p \in P_{d,n}$ with a $d$th power of a linear form $\langle y, \cdot \rangle$ equals
$$\langle p, \langle y, \cdot \rangle^d \rangle_B = p(y).$$

Given a nonzero tensor $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$, a rank-one $(n_1, \ldots, n_d)$-tensor $Y = \lambda\, y^{(1)} \otimes \cdots \otimes y^{(d)}$, where $\lambda \in \mathbb{R}$ and $\|y^{(i)}\| = 1$, $i = 1, \ldots, d$, is a best rank-one approximation to $A$ if and only if
$$\lambda = \langle A, y^{(1)} \otimes \cdots \otimes y^{(d)} \rangle_F = \pm \|A\|. \tag{2.11}$$
Banach’s result [1] implies that one can take $y^{(1)} = \cdots = y^{(d)} \in \mathbb{R}^n$ in (2.11) if the tensor $A \in \mathrm{Sym}^d(\mathbb{R}^n)$ is symmetric.

Also, if $Y = \lambda\, y^{(1)} \otimes \cdots \otimes y^{(d)}$ is a best rank-one approximation of $A$ as above, then for every $j = 1, \ldots, d$ the linear form $x^{(j)} \mapsto \langle A, y^{(1)} \otimes \cdots \otimes x^{(j)} \otimes \cdots \otimes y^{(d)} \rangle_F$ constrained to $\|x^{(j)}\| = 1$ achieves its maximum at $y^{(j)}$, and hence it vanishes on the orthogonal complement of $y^{(j)}$, that is,
$$\langle A, y^{(1)} \otimes \cdots \otimes y^{(j-1)} \otimes x^{(j)} \otimes y^{(j+1)} \otimes \cdots \otimes y^{(d)} \rangle_F = 0 \tag{2.12}$$
for all $x^{(j)} \in \mathbb{R}^{n_j}$ that are orthogonal to $y^{(j)}$.

We continue with some remarks on extremal tensors and the best rank-one approximation ratio. From the definition (1.8) of a best rank-one approximation and (2.11) we have
$$\min_{\mathrm{rank}(X)=1} \|A - X\|_F^2 = \|A - Y\|_F^2 = \|A\|_F^2 - \|A\|^2$$
for any best rank-one approximation $Y$ to $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$. Recalling the definition (1.9) of the best rank-one approximation ratio $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$, the maximum relative distance of a tensor to the set of rank-one tensors is given as
$$\max_{0 \ne A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}}\ \min_{\mathrm{rank}(X)=1} \frac{\|A - X\|_F}{\|A\|_F} = \sqrt{1 - \mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n_j}\bigr)^2} \tag{2.13}$$
and is achieved for extremal tensors. This relation explains the name “best rank-one approximation ratio” for the constant $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$. When restricting to symmetric tensors, (2.13) holds with $\mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^n))$ instead.

For context we note that the relation (2.13) shows that lower bounds on $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$ can be used for convergence analysis of greedy methods for low-rank approximation using rank-one tensors as a dictionary. For example, the pure greedy method to approximate $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ produces a recursive sequence $A_{\ell+1} = A_\ell + Y_\ell$, where $A_0 = 0$ and $Y_\ell$ is a best rank-one approximation of $A - A_\ell$. Then (2.13) implies
$$\|A - A_{\ell+1}\|_F \le \sqrt{1 - \mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n_j}\bigr)^2}\, \|A - A_\ell\|_F;$$
see [19] for a general introduction to greedy methods. For a more general problem of finding an approximate low-rank minimizer for a smooth cost function $f : \bigotimes_{j=1}^d \mathbb{R}^{n_j} \to \mathbb{R}$, one could replace $Y_\ell$ with a (scaled) best rank-one approximation of a suitable residual, for example, the negative gradient $-\nabla f(A_\ell)$.
Then $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$ is a lower bound for the (cosine of the) angle between the search direction $Y_\ell$ and $-\nabla f(A_\ell)$ and hence can be used to estimate the convergence of such an iteration; see, e.g., [20] and references therein. Again, for symmetric tensors one can replace $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^{n_j})$ with $\mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^n))$ in these considerations.

In the following lemma we show that the best rank-one approximation ratio strictly decreases with the dimension.

Lemma 2.1.
Let $\mathcal{A}_{d,n}$ denote either $\mathcal{A}(\bigotimes_{j=1}^d \mathbb{R}^n)$ or $\mathcal{A}(\mathrm{Sym}^d(\mathbb{R}^n))$. Then for any $d \ge 2$ and $n \ge 1$ we have
$$\mathcal{A}_{d,n+1} \le \frac{\mathcal{A}_{d,n}}{\sqrt{1 + \mathcal{A}_{d,n}^2}}.$$

Proof.
Let $A \in \bigotimes_{j=1}^d \mathbb{R}^n$ be an $n^{\times d}$-tensor of unit Frobenius norm, $\|A\|_F = 1$. For $\varepsilon \in [0, 1]$ let $A^\varepsilon \in \bigotimes_{j=1}^d \mathbb{R}^{n+1}$ be the $(n+1)^{\times d}$-tensor with entries
$$a^\varepsilon_{i_1 \ldots i_d} = \begin{cases} \sqrt{1 - \varepsilon^2}\, a_{i_1 \ldots i_d} & \text{if } i_1, \ldots, i_d \le n, \\ \varepsilon & \text{if } i_1 = \cdots = i_d = n+1, \\ 0 & \text{otherwise.} \end{cases}$$
Observe that $\|A^\varepsilon\|_F = 1$, and $A^\varepsilon$ is symmetric if $A$ is. Let $\xi^{(1)}, \ldots, \xi^{(d)}$ be unit norm vectors in $\mathbb{R}^{n+1}$ partitioned as $\xi^{(j)} = (x^{(j)}, z^{(j)})$ with $x^{(j)} \in \mathbb{R}^n$ and $z^{(j)} \in \mathbb{R}$. Then from the “block diagonal” structure of $A^\varepsilon$ it follows that
$$\langle A^\varepsilon, \xi^{(1)} \otimes \cdots \otimes \xi^{(d)} \rangle_F = \sqrt{1 - \varepsilon^2}\, \langle A, x^{(1)} \otimes \cdots \otimes x^{(d)} \rangle_F + \varepsilon\, z^{(1)} \cdots z^{(d)} \le \max\Bigl( \sqrt{1 - \varepsilon^2}\, \|A\|,\ \varepsilon \Bigr) \Bigl( \|x^{(1)}\| \cdots \|x^{(d)}\| + z^{(1)} \cdots z^{(d)} \Bigr).$$
By a generalized Hölder inequality [12, § 11], the term in the right brackets is bounded by one. The maximum on the left, on the other hand, takes its minimal value for $\varepsilon = \|A\| / \sqrt{1 + \|A\|^2}$. Since $\xi^{(1)}, \ldots, \xi^{(d)}$ were arbitrary, this shows
$$\|A^\varepsilon\| \le \frac{\|A\|}{\sqrt{1 + \|A\|^2}}.$$
The assertions follow by choosing $A$ to be an extremal tensor in the space $\bigotimes_{j=1}^d \mathbb{R}^n$ or $\mathrm{Sym}^d(\mathbb{R}^n)$, respectively.

The previous lemma provides a lower bound on the rank of extremal tensors. Recall that the (real) rank of a tensor $A \in \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ is the smallest number $r$ that is needed to represent $A$ as a linear combination
$$A = X_1 + \cdots + X_r \tag{2.14}$$
of rank-one tensors $X_1, \ldots, X_r$. The (real) symmetric rank of a symmetric tensor $A$ is the smallest number of symmetric rank-one tensors needed for (2.14) to hold.

Proposition 2.2. If $A \in \bigotimes_{j=1}^d \mathbb{R}^n$ is an extremal tensor, its rank must be at least $n$. If $A \in \mathrm{Sym}^d(\mathbb{R}^n)$ is an extremal symmetric tensor, its symmetric rank must be at least $n$.

Proof. Let $A \in \bigotimes_{j=1}^d \mathbb{R}^n$ be a tensor of rank at most $n - 1$, that is,
$$A = v^{(1)}_1 \otimes \cdots \otimes v^{(d)}_1 + \cdots + v^{(1)}_{n-1} \otimes \cdots \otimes v^{(d)}_{n-1}.$$
For $j = 1, \ldots, d$ let $V^{(j)} \simeq \mathbb{R}^{n-1}$ be any $(n-1)$-dimensional subspace of $\mathbb{R}^n$ that contains the vectors $v^{(j)}_1, \ldots, v^{(j)}_{n-1}$. Since $A \in V^{(1)} \otimes \cdots \otimes V^{(d)} \simeq \bigotimes_{j=1}^d \mathbb{R}^{n-1}$, we have
$$\frac{\|A\|}{\|A\|_F} \ge \mathcal{A}\bigl(\textstyle\bigotimes_{j=1}^d \mathbb{R}^{n-1}\bigr).$$
Thus, by Lemma 2.1, $A$ cannot be extremal in $\bigotimes_{j=1}^d \mathbb{R}^n$. When $A$ is symmetric and of symmetric rank at most $n - 1$, one can choose $V^{(1)} = \cdots = V^{(d)} = V$ so that $A \in \mathrm{Sym}^d(V) \simeq \mathrm{Sym}^d(\mathbb{R}^{n-1})$, leading to the analogous conclusion.

The problem of determining the best rank-one approximation ratio of a tensor space and finding extremal tensors is a constrained optimization problem for a Lipschitz function. The theory of generalized gradients developed by Clarke [4] provides necessary optimality conditions. We provide here only the most necessary facts of this theory needed for our results. A comprehensive introduction is given, e.g., in [5].

A function $f : \mathbb{R}^m \to \mathbb{R}$ is called Lipschitz if there exists a constant $L$ such that $|f(p) - f(q)| \le L \|p - q\|$ for all pairs $p, q \in \mathbb{R}^m$. By the classical Rademacher theorem, a Lipschitz function $f$ is differentiable at almost all (in the sense of Lebesgue measure) points $p \in \mathbb{R}^m$. Denote by $\nabla f(p)$ the gradient of $f$ at such a point. The generalized gradient of $f$ at any $p \in \mathbb{R}^m$, denoted $\partial f(p)$, is then defined as the convex hull of the set of all limits $\lim \nabla f(p_i)$, where $p_i$ is a sequence of differentiable points that converges to $p$. It turns out that $\partial f(p)$ is a nonempty convex compact subset of $\mathbb{R}^m$. Moreover, $\partial f(p)$ is a singleton if and only if $f$ is differentiable at $p$, in which case $\partial f(p) = \{\nabla f(p)\}$.

Let $S$ be a differentiable submanifold in $\mathbb{R}^m$. Then a necessary condition for the Lipschitz function $f$ to attain a local minimum relative to $S$ at $p \in S$ is that
$$\partial f(p) \cap N_S(p) \ne \emptyset, \tag{2.15}$$
where $N_S(p)$ denotes the normal space, that is, the orthogonal complement of the tangent space of $S$ at $p$.
Condition (2.15) is a "Lipschitz" analogue of the classical Lagrange multiplier rule for continuously differentiable functions; we refer to [5, Sec. 2.4]. Every point p ∈ S that satisfies (2.15) is called a critical point of f on S. Hence local minima of f on S are among the critical points.

The proofs of Theorems 1.10 and 1.11 in section 3.4 consist in applying the necessary optimality condition (2.15) to the spectral norm function on the sphere defined by the Frobenius norm. Here two things are of relevance. First, for a Euclidean sphere S we have N_S(p) = {μp : μ ∈ R}. Hence condition (2.15) becomes

μp ∈ ∂f(p)   (2.16)

for some μ ∈ R. Second, by (1.7), the spectral norm is an example of a so-called max function, that is, a function of the type

f(p) = max_{u ∈ C} g(p, u),   (2.17)

where C is compact. Under certain smoothness conditions on the function g, which are satisfied for the spectral norm (1.7), Clarke [4, Thm. 2.1] has determined the following characterization of the generalized gradient:

∂f(p) = conv {∇_p g(p, u) : u ∈ M(p)},   (2.18)

where conv denotes the convex hull and M(p) is the set of all maximizers u in (2.17) for a fixed p. For the spectral norm (1.7), this set consists of all normalized best rank-one approximations of a given tensor; see (3.4).

Our main results are proved in this section. We are going to repeatedly use the equivalence (2.5) between symmetric tensors and homogeneous forms and the corresponding relations (2.8), (2.9) for the different norms.
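Before turning to the proofs, the padding construction from the proof of Lemma 2.1 can be checked numerically in the matrix case d = 2, where the spectral norm is the largest singular value (a sketch under our reconstruction of the construction; all variable names are ours). For a block-diagonal A^ε the two blocks balance exactly at the optimal ε, so the bound ‖A^ε‖₂ ≤ ‖A‖₂/√(1+‖A‖₂²) is attained:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A /= np.linalg.norm(A)                 # unit Frobenius norm
t = np.linalg.norm(A, 2)               # spectral norm ||A||_2

eps = t**2 / np.sqrt(1 + t**2)         # balancing choice of epsilon
Aeps = np.zeros((4, 4))
Aeps[:3, :3] = np.sqrt(1 - eps**2 / t**2) * A   # scaled copy of A
Aeps[3, 3] = eps / t                            # corner entry

assert abs(np.linalg.norm(Aeps) - 1.0) < 1e-12          # still unit Frobenius
bound = t / np.sqrt(1 + t**2)
# for block-diagonal matrices the two blocks balance, so the bound is attained
assert abs(np.linalg.norm(Aeps, 2) - bound) < 1e-12
```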
This subsection is devoted to the proof of Theorem 1.1. While the given proof is self-contained, some arguments could be omitted with reference to results in [15].

Proof of Theorem 1.1.
By (1.4),

‖Ч_{d,2}‖_∞ / ‖Ч_{d,2}‖_B = 1 / sqrt(2^{d−1}).

It then follows from (1.12) that this value equals A(⊗_{j=1}^d R²), so the symmetric tensor associated to the Chebyshev form must be extremal both in ⊗_{j=1}^d R² and in Sym^d(R²).

We now consider the uniqueness statements. When d = 1, the space P_{1,n} consists of linear forms p(x) = ⟨a, x⟩, for any of which it holds that ‖p‖_∞/‖p‖_B = 1. In the case d = 2 of quadratic forms, the minimal ratio between spectral and Frobenius norm of a symmetric n × n matrix is attained for multiples of symmetric orthogonal matrices only and takes the value 1/√n. When n = 2, all such matrices can be obtained by orthogonal transformation and scaling from the two diagonal matrices with diagonal entries (1, 1) and (1, −1), and correspondingly for the quadratic forms p ∈ P_{2,2}.

In the case d ≥ 3 we have to show that all symmetric 2^d-tensors A satisfying

‖A‖₂ = 1,   ‖A‖_F = sqrt(2^{d−1})   (3.1)

are obtained from orthogonal transformations of the Chebyshev form Ч_{d,2}. To this end, we show that under the additional condition

p_A(e₁) = ⟨A, e₁ ⊗ ··· ⊗ e₁⟩_F = 1 = ‖A‖₂,   (3.2)

the form p_A equals Ч_{d,2}. The proof is given by induction over d ≥ 3. Before giving this proof we note that for a 2^d-tensor A satisfying (3.1), its two slices A₁ = (a_{1 i_2...i_d}) and A₂ = (a_{2 i_2...i_d}) necessarily have the same Frobenius norm

‖A₁‖_F = ‖A₂‖_F = sqrt(2^{d−2}).

In fact, ‖A‖₂ = 1 implies ‖A₁‖₂ ≤ 1, and hence, by (1.12), ‖A₁‖_F ≤ sqrt(2^{d−2}). Since the same holds for A₂ and since ‖A₁‖_F² + ‖A₂‖_F² = ‖A‖_F² = 2^{d−1}, the claim follows. Moreover, ‖A₁‖₂ = ‖A₂‖₂ = 1, again by (1.12), so that both slices are necessarily extremal. Note that by the same argument, every 2^{d'}-subtensor of A with d' < d must be extremal.

We begin the induction with d = 3. Assume A ∈ Sym³(R²) satisfies (3.1) and (3.2). Then we have seen that both, say, frontal slices of A are themselves extremal symmetric 2 × 2 matrices. Moreover, a₁₁₁ = 1 and the tensor e₁ ⊗ e₁ ⊗ e₁ is a best rank-one approximation. From (2.12) we then deduce that a₁₁₂ = a₁₂₁ = a₂₁₁ = 0. The only two remaining options for the slices of A are

A₁ = [ 1 0 ; 0 ±1 ],   A₂ = [ 0 ±1 ; ±1 0 ],

with the same choice of sign in both slices. But the case a₁₂₂ = a₂₁₂ = a₂₂₁ = +1 is not possible, since it corresponds to the form p_A(x) = x₁³ + 3x₁x₂², whose maximum on the sphere is ‖p_A‖_∞ = √2 > 1. Therefore, a₁₂₂ = a₂₁₂ = a₂₂₁ = −1, and p_A = x₁³ − 3x₁x₂² is the cubic Chebyshev form.

We proceed with the induction step. If A ∈ Sym^{d+1}(R²) satisfies (3.1) and (3.2), then its two slices A₁ = (a_{1 i_1...i_d}) and A₂ = (a_{2 i_1...i_d}) are extremal 2^d-tensors. Since p_{A₁}(e₁) = p_A(e₁) = 1, it follows from the induction hypothesis that A₁ = Ч_{d,2}, so its entries are given by (2.7). Let a_{2 i_1...i_d} denote an entry of the second slice. Due to the symmetry of A, every entry in the second slice, except for the entry a_{2...2}, equals an entry in the first slice after a permutation of the indices. Since this permutation does not affect the number of occurrences of the value 2 among the indices, the definition (2.7) applies to all these entries as well. It remains to show that the entry a_{2...2} satisfies (2.7), that is, equals zero in case d + 1 is odd, and equals (−1)^m in case d + 1 = 2m is even. This entry is part of the symmetric subtensor A'' = (a_{i₁ i₂ i₃ 2...2}), which as we have noted above must be extremal as well. Since the entries of the first slice A₁ are given by (2.7), we find that

p_{A''}(x) = (−1)^{m−1}(x₁³ − 3x₁x₂²) + a_{2···2} x₂³

if d + 1 = 2m + 1 is odd. Since A'' is extremal, it then follows from the base case d = 3 that a_{2···2} = 0. In case d + 1 = 2m is even, we get

p_{A''}(x₁, x₂) = 3(−1)^{m−1} x₁² x₂ + a_{2···2} x₂³,

which by a small consideration implies a_{2···2} = (−1)^m. This concludes the proof.

In this section we prove Theorem 1.5. It has been mentioned in section 1.2 how Corollary 1.6 follows from it, and that the statement of Theorem 1.2 is included in the latter.

The proof of Theorem 1.5 requires a fact from [13]. Since it is not explicitly formulated there, we state it here as a lemma.
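The norm computation that excludes the "+1" case in the base step of the proof of Theorem 1.1 above is easy to cross-check numerically (our own sketch, not part of the paper): on the unit circle the excluded form x₁³ + 3x₁x₂² reaches the value √2, while the cubic Chebyshev form stays within [−1, 1], since it restricts to cos(3θ).

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 200001)
x1, x2 = np.cos(theta), np.sin(theta)

bad = x1**3 + 3 * x1 * x2**2      # excluded candidate (a_122 = +1)
cheb = x1**3 - 3 * x1 * x2**2     # cubic Chebyshev form, equals cos(3*theta)

assert abs(np.max(np.abs(bad)) - np.sqrt(2)) < 1e-6   # sup norm sqrt(2) > 1
assert np.max(np.abs(cheb)) <= 1 + 1e-12              # sup norm 1
assert np.allclose(cheb, np.cos(3 * theta), atol=1e-12)
```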
Lemma 3.1.
For odd n let A₁, A₂ ∈ R^n ⊗ R^n be two n × n matrices. If at least one of them is invertible, then there exist orthogonal matrices ρ₁, ρ₂ ∈ O(n) such that

ρ₁ A₁ ρ₂ = [ B₁ c₁ ; 0 d₁ ],   ρ₁ A₂ ρ₂ = [ B₂ c₂ ; 0 d₂ ],

where B₁, B₂ are matrices of size (n−1) × (n−1), c₁, c₂ are (n−1)-dimensional vectors, and d₁, d₂ are real numbers.

Proof. We can assume A₁ is invertible. Since n is odd, the matrix A₁⁻¹A₂ has at least one real eigenvalue d. Then there exists an invertible matrix P such that

P⁻¹ A₁⁻¹ A₂ P = [ B c ; 0 d ],

where B is a matrix of size (n−1) × (n−1) and c is an (n−1)-dimensional vector. Consider QR decompositions of A₁P and P, that is, A₁P = Q₁R₁ and P = Q₂R₂, where Q₁, Q₂ are orthogonal, and R₁, R₂ are upper triangular and invertible. We set ρ₁ = Q₁⁻¹ and ρ₂ = Q₂. Then

ρ₁ A₁ ρ₂ = R₁ (A₁P)⁻¹ A₁ P R₂⁻¹ = R₁ R₂⁻¹

is the product of two upper triangular matrices, hence upper triangular. Similarly,

ρ₁ A₂ ρ₂ = R₁ (P⁻¹ A₁⁻¹ A₂ P) R₂⁻¹ = R₁ [ B c ; 0 d ] R₂⁻¹

has the asserted upper block triangular structure.

In [13] the previous lemma is used to bound the maximum possible orthogonal rank of an (n, n, n−1)-tensor for odd n. We will only need that the orthogonal rank of a (3, 3, 2)-tensor is at most five.

Proof of Theorem 1.5.
Note that the aforementioned result of [14], that A(R³ ⊗ R³ ⊗ R³) = 1/√7, in combination with (1.11) implies that the maximal orthogonal rank cannot be less than seven. We will show that it is at most seven.

For A ∈ R³ ⊗ R³ ⊗ R³ it is convenient to write A = (A₁ | A₂ | A₃), where A₁, A₂, A₃ are the 3 × 3 slices of A. If none of A₁, A₂, A₃ is invertible, each of them can be decomposed into a sum of two rank-one matrices that are orthogonal in the Frobenius inner product:

A_i = u_i^(1) ⊗ u_i^(2) + v_i^(1) ⊗ v_i^(2),   i = 1, 2, 3.

This leads to a decomposition of A into at most six pairwise orthogonal rank-one tensors:

A = Σ_{i=1}³ ( u_i^(1) ⊗ u_i^(2) ⊗ e_i + v_i^(1) ⊗ v_i^(2) ⊗ e_i ).

Assume now without loss of generality that the first slice A₁ is invertible. Lemma 3.1 together with the invariance of orthogonal rank under orthogonal transformations (2.4) allows us to assume that A has the form

A = ( ∗ ∗ ∗ | ∗ ∗ ∗ | ∗ ∗ ∗ )     ( ∗ ∗ ∗ | ∗ ∗ ∗ | ∗ ∗ ∗ )     ( 0 0 0 | 0 0 0 | 0 0 0 )
    ( ∗ ∗ ∗ | ∗ ∗ ∗ | ∗ ∗ ∗ )  =  ( ∗ ∗ ∗ | ∗ ∗ ∗ | ∗ ∗ ∗ )  +  ( 0 0 0 | 0 0 0 | 0 0 0 )
    ( 0 0 ∗ | 0 0 ∗ | ∗ ∗ ∗ )     ( 0 0 0 | 0 0 0 | 0 0 0 )     ( 0 0 ∗ | 0 0 ∗ | ∗ ∗ ∗ ).

The first term is essentially a (2, 3, 3)-tensor, so by the above fact its orthogonal rank is at most five, while the second term is of the form e₃ ⊗ M for a matrix M of rank at most two and hence splits into at most two further orthogonal rank-one tensors. In total, the orthogonal rank of A is at most seven.

3.3 On symmetric orthogonal tensors

We prove Proposition 1.7 below. For the general definition of orthogonal tensors of arbitrary size we refer to [15]. For n^d-tensors we can use the recursive definition that A ∈ ⊗_{j=1}^d R^n is orthogonal if A ×_j u is orthogonal for every j = 1, ..., d and every unit norm vector u ∈ R^n, where for d = 2 we agree on the standard definition of an orthogonal matrix. Here and in the proof below we use the standard notation A ×_j u = ( Σ_{i_j=1}^n a_{i_1...i_j...i_d} u_{i_j} ) for the partial contraction of a tensor A with a vector u along mode j, resulting in a tensor of order d − 1. Note that the above definition implies that every n^{d'}-subtensor, d' < d, of A is itself orthogonal.

Proof of Proposition 1.7 and Corollary 1.8.
It has been shown in [15] that an n^d-tensor A is orthogonal if and only if it satisfies ‖A‖₂ = 1 and ‖A‖_F = sqrt(n^{d−1}), and that such tensors only exist when n = 1, 2, 4, 8. Therefore, the statement that for n = 2 the only symmetric orthogonal tensors are the ones obtained from the Chebyshev form Ч_{d,2} is equivalent to Theorem 1.1. Also, Corollary 1.8 is immediate from Proposition 1.7.

We thus only have to show that for n = 4, 8 an orthogonal n^d-tensor cannot be symmetric. We only consider the case n = 4; the arguments for n = 8 are analogous. Since n^{d'}-subtensors of an orthogonal tensor are necessarily orthogonal, it is enough to show that orthogonal 4 × 4 × 4 tensors cannot be symmetric. So assume that such a symmetric orthogonal tensor A exists. Then ‖A‖₂ = 1 and A admits a symmetric best rank-one approximation of Frobenius norm one. Since orthogonality and symmetry are preserved under the action of O(4), we can assume that e₁ ⊗ e₁ ⊗ e₁ is the best rank-one approximation of A, that is, a₁₁₁ = ⟨A, e₁ ⊗ e₁ ⊗ e₁⟩_F = ‖A‖₂ = 1. On the other hand, the first frontal slice A ×₁ e₁ must be a symmetric orthogonal matrix, so it is of the form

A ×₁ e₁ = [ 1 0 ; 0 B ],

where B is a symmetric orthogonal 3 × 3 matrix. Applying a further orthogonal transformation that fixes e₁, we can assume that B is a diagonal matrix with diagonal entries ε₂, ε₃, ε₄ ∈ {+1, −1}. Since A is symmetric and in fact every slice has to be an orthogonal matrix, we find that A = (A ×₁ e₁ | A ×₁ e₂ | A ×₁ e₃ | A ×₁ e₄) is of the form

A = ( 1 0  0  0  | 0  ε₂ 0 0 | 0  0 ε₃ 0 | 0  0 0 ε₄ )
    ( 0 ε₂ 0  0  | ε₂ 0  0 0 | 0  0 0  ε | 0  0 ε 0  )
    ( 0 0  ε₃ 0  | 0  0  0 ε | ε₃ 0 0  0 | 0  ε 0 0  )
    ( 0 0  0  ε₄ | 0  0  ε 0 | 0  ε 0  0 | ε₄ 0 0 0  ),

where also ε ∈ {+1, −1}. For i = 2, 3, 4, considering the matrices A ×₁ (e₁ + e_i)/√2, which must all be orthogonal, one finds ε = 1 and ε₂ = ε₃ = ε₄ = −1. But then the matrix

A ×₁ ( (e₁ − e₂)/√2 ) = (1/√2) ( 1  1  0  0 )
                                ( 1 −1  0  0 )
                                ( 0  0 −1 −1 )
                                ( 0  0 −1 −1 )

is not orthogonal, which contradicts the assumption that A is an orthogonal tensor.

3.4 Variational characterization

In Theorems 1.10 and 1.11 we characterize critical tensors in ⊗_{j=1}^d R^{n_j} and Sym^d(R^n) in terms of decompositions into best rank-one approximations. We now prove these results and then derive Corollary 1.12. Afterwards, we prove Theorem 1.13.

Proof of Theorems 1.10 and 1.11.
From section 2.3, specifically (2.16), it follows that a nonzero tensor A ∈ ⊗_{j=1}^d R^{n_j} is critical in the sense of Definition 1.9 if the tensor Ā = A/‖A‖_F of Frobenius norm one satisfies

μĀ ∈ ∂‖Ā‖₂   (3.3)

for some μ ∈ R. By (1.7), the spectral norm is a max function of the type (2.17), which is easily shown to satisfy the conditions of [4, Thm. 2.1], so that by (2.18)

∂‖Ā‖₂ = conv { X : ‖X‖_F = 1, rank(X) = 1, ⟨Ā, X⟩_F = ‖Ā‖₂ },   (3.4)

where conv denotes the convex hull. This lets us write (3.3) as

μĀ = Σ_{ℓ=1}^r α_ℓ X_ℓ,   (3.5)

where r > 0, α₁, ..., α_r > 0 with α₁ + ··· + α_r = 1, and the X_ℓ are rank-one tensors of unit Frobenius norm satisfying ⟨Ā, X_ℓ⟩_F = ‖Ā‖₂. By taking the Frobenius inner product with Ā itself in (3.5), we find that

μ = ‖Ā‖₂ = ‖A‖₂ / ‖A‖_F.

Therefore, after multiplying the resulting equation (3.5) by ‖A‖_F²/‖A‖₂, we obtain the asserted statement of Theorem 1.10, since, by (2.11), the rank-one tensors Y_ℓ = ‖A‖₂ X_ℓ are best rank-one approximations of A.

Considering symmetric tensors instead of general ones in the previous arguments yields a proof of Theorem 1.11. Here it is crucial that in the definition (1.7) of the spectral norm for symmetric tensors one can restrict the maximum to symmetric rank-one tensors of unit Frobenius norm thanks to Banach's theorem; cf. (2.8).

Proof of Corollary 1.12.
By Proposition 2.2 any extremal tensor in ⊗_{j=1}^d R^n or Sym^d(R^n) must be of rank (respectively, symmetric rank) at least n. In particular, there cannot be fewer than n best rank-one approximations in the expansions (1.14) and (1.15).

Proof of Theorem 1.13.
Let a tensor A (either in ⊗_{j=1}^d R^{n_j} or in Sym^d(R^n)) be critical, that is, by Theorem 1.10, respectively Theorem 1.11,

A = ( ‖A‖_F / ‖A‖₂ )² Σ_{ℓ=1}^r α_ℓ Y_ℓ   (3.6)

for some (symmetric, in case A is symmetric) best rank-one approximations Y₁, ..., Y_r to A and coefficients α₁, ..., α_r > 0 summing up to one; by the classical Carathéodory theorem one can take r ≤ dim(⊗_{j=1}^d R^{n_j}) + 1 = n₁ ··· n_d + 1. By the definition of the nuclear norm there exists a tensor A* satisfying ‖A*‖₂ ≤ 1 and ‖A‖∗ = ⟨A, A*⟩_F. Note that we then have ⟨X, A*⟩_F ≤ ‖X‖_F ‖A*‖₂ ≤ ‖X‖_F for every rank-one tensor X. Since ‖Y_ℓ‖_F = ‖A‖₂, it hence follows from (3.6) that

‖A‖∗ = ⟨A, A*⟩_F = ( ‖A‖_F / ‖A‖₂ )² Σ_{ℓ=1}^r α_ℓ ⟨Y_ℓ, A*⟩_F ≤ ‖A‖_F² / ‖A‖₂,

which is the converse inequality to (1.17). This shows that (i) implies (ii).

Assume now that (ii) holds for a nonzero tensor A, that is, ‖A‖₂ ‖A‖∗ = ‖A‖_F². By the definition of the nuclear norm there exist r ∈ N, positive numbers β₁, ..., β_r > 0 and rank-one tensors X₁, ..., X_r of unit Frobenius norm such that

A = Σ_{ℓ=1}^r β_ℓ X_ℓ   and   ‖A‖∗ = Σ_{ℓ=1}^r β_ℓ.   (3.7)

If A is symmetric, X₁, ..., X_r can be taken symmetric [10]. Taking the Frobenius inner product with A in the first of these equations gives

‖A‖₂ ‖A‖∗ = ⟨A, A⟩_F = Σ_{ℓ=1}^r β_ℓ ⟨A, X_ℓ⟩_F.

Since ⟨A, X_ℓ⟩_F ≤ ‖A‖₂ for ℓ = 1, ..., r and since β₁, ..., β_r sum up to ‖A‖∗, this equality can hold only if ⟨A, X_ℓ⟩_F = ‖A‖₂ for ℓ = 1, ..., r. Since, by (2.11), the rank-one tensors Y_ℓ = ‖A‖₂ X_ℓ, ℓ = 1, ..., r, are then best rank-one approximations of A, we see that (3.7) is equivalent to (3.6), which by Theorems 1.10 and 1.11 means that A is critical.

Remark 3.2.
Observe from the proof that the decomposition (3.6) of a critical tensor into its best rank-one approximations is also its nuclear decomposition. Vice versa, any nuclear decomposition of a tensor A satisfying ‖A‖₂ ‖A‖∗ = ‖A‖_F² can be turned into a convex linear combination of best rank-one approximations of the rescaled tensor (‖A‖₂/‖A‖_F)² A.

In this section we give the proof of Proposition 1.14, which realizes the decomposition of critical tensors into symmetric best rank-one approximations, that is, into corresponding powers of linear forms, for the Chebyshev forms Ч_{d,2}.

Proof of Proposition 1.14.
Recall that for any k = 0, ..., d − 1 we set θ_k = πk/d and a_k = cos(θ_k), b_k = sin(θ_k). Let us observe that for any such k we can write

cos(dθ) = Re( (−1)^k e^{id(θ − θ_k)} ) = (−1)^k Σ_{ℓ=0}^{[d/2]} (d choose 2ℓ) (−1)^ℓ cos(θ − θ_k)^{d−2ℓ} sin(θ − θ_k)^{2ℓ},

and hence, averaging over k,

cos(dθ) = (1/d) Σ_{ℓ=0}^{[d/2]} (d choose 2ℓ) (−1)^ℓ Σ_{k=0}^{d−1} (−1)^k cos(θ − θ_k)^{d−2ℓ} sin(θ − θ_k)^{2ℓ}.

Below we show that for any ℓ = 0, ..., [d/2] it holds that

(−1)^ℓ Σ_{k=0}^{d−1} (−1)^k cos(θ − θ_k)^{d−2ℓ} sin(θ − θ_k)^{2ℓ} = Σ_{k=0}^{d−1} (−1)^k cos(θ − θ_k)^d.   (3.8)

This together with the identity Σ_{ℓ=0}^{[d/2]} (d choose 2ℓ) = 2^{d−1} implies (1.21) (and hence also (1.20)). To derive (3.8) we write

(−1)^ℓ Σ_{k=0}^{d−1} (−1)^k cos(θ − θ_k)^{d−2ℓ} sin(θ − θ_k)^{2ℓ}
  = Σ_{k=0}^{d−1} (−1)^k cos(θ − θ_k)^{d−2ℓ} ( cos(θ − θ_k)² − 1 )^ℓ
  = Σ_{j=0}^{ℓ} (ℓ choose j) (−1)^{ℓ−j} Σ_{k=0}^{d−1} (−1)^k cos(θ − θ_k)^{d−2(ℓ−j)}

and claim that for j = 0, ..., ℓ − 1, that is, for s = ℓ − j = 1, ..., [d/2],

Σ_{k=0}^{d−1} (−1)^k cos(θ − θ_k)^{d−2s} = 0.   (3.9)

For this let us observe first that the Chebyshev polynomials of the first kind T_{d−2j}(cos θ) = cos((d − 2j)θ), j = 1, ..., [d/2], have degrees d − 2, d − 4, ..., d − 2[d/2], so that one can express cos(θ − θ_k)^{d−2s} as a linear combination of the functions T_{d−2j}(cos(θ − θ_k)), j = s, ..., [d/2], and possibly a constant (for even d the constant contributes nothing, since Σ_{k=0}^{d−1} (−1)^k = 0 when d is even). Hence it suffices to show that for s = 1, ..., [d/2] we have

Σ_{k=0}^{d−1} (−1)^k cos( (d − 2s)(θ − θ_k) ) = 0.

But this follows from the identity

Σ_{k=0}^{d−1} (−1)^k e^{i(d−2s)(θ − θ_k)} = e^{i(d−2s)θ} Σ_{k=0}^{d−1} ( e^{2πis/d} )^k = 0,

hence the proof is complete.

We now derive Corollary 1.15, which, in particular, implies that the cubic Chebyshev forms Ч_{3,n} are critical for the ratio ‖p‖_∞/‖p‖_B, p ∈ P_{3,n}.

Proof of Corollary 1.15. From Proposition 1.14 we get

Ч_{3,2}(x₁, x₂) = x₁³ − 3x₁x₂² = (4/3) [ x₁³ + ( −x₁/2 − (√3/2) x₂ )³ + ( −x₁/2 + (√3/2) x₂ )³ ].   (3.10)

We then write

Ч_{3,n}(x) = x₁³ − 3x₁ (x₂² + ··· + x_n²)
  = −(n − 2) x₁³ + Σ_{i=2}^n ( x₁³ − 3x₁x_i² )
  = −(n − 2) x₁³ + (4/3) Σ_{i=2}^n [ x₁³ + ( −x₁/2 − (√3/2) x_i )³ + ( −x₁/2 + (√3/2) x_i )³ ],

where we applied (3.10) to each binary Chebyshev form Ч_{3,2}(x₁, x_i) = x₁³ − 3x₁x_i². The obtained formula is equivalent to the asserted one (1.22).

This subsection is devoted to the proof of Theorem 1.3, which states that the cubic Chebyshev form Ч_{3,n}(x) = x₁³ − 3x₁(x₂² + ··· + x_n²) is a local minimum for the ratio of uniform and Bombieri norms. We denote by G ≅ O(n−1) ⊂ O(n) the subgroup consisting of orthogonal transformations that preserve the point (1, 0, ..., 0) ∈ R^n. Note that G ⊂ O(n) is of codimension n − 1 and Ч_{3,n} is invariant under G. In particular, the O(n)-orbit of Ч_{3,n} is at most (n − 1)-dimensional.

Lemma 3.3.
The O(n)-orbit of Ч_{3,n} has dimension n − 1, and its tangent space at Ч_{3,n} consists of all reducible cubics of the form ℓ · q, where ℓ is a linear form that vanishes at (1, 0, ..., 0) ∈ R^n and q(x) = 3x₁² − x₂² − ··· − x_n².

Proof. For i = 2, ..., n let us consider the elementary rotation R_i(φ) ∈ O(n) in the (x₁, x_i)-plane, that is, R_i(φ) is given by the n × n matrix whose only non-zero entries are (R_i(φ))₁₁ = (R_i(φ))_{ii} = cos φ, (R_i(φ))_{1i} = −(R_i(φ))_{i1} = sin φ, and (R_i(φ))_{jj} = 1 for j ≠ 1, i. It is a straightforward calculation to check that the tangent vector to the curve φ ↦ R_i(φ) ∗ Ч_{3,n} at φ = 0 is a nonzero cubic proportional to x_i q. For i = 2, ..., n these n − 1 cubics are linearly independent. Since, as noted above, the O(n)-orbit of Ч_{3,n} is at most (n − 1)-dimensional, the claim follows.

Proof of Theorem 1.3.
Let S = { p ∈ P_{3,n} : ‖p‖_B = ‖Ч_{3,n}‖_B } denote the sphere of radius ‖Ч_{3,n}‖_B in (P_{3,n}, ‖·‖_B). Denote by H any (n − 1)-dimensional submanifold of O(n) that passes through the identity id ∈ O(n) and intersects G transversally at id ∈ H ∩ G. Denote also by M any submanifold of S that has codimension n − 1, passes through Ч_{3,n} ∈ S, and intersects the O(n)-orbit of Ч_{3,n} transversally at Ч_{3,n}. Consider now the smooth map

f : H × M → S,   (h, m) ↦ h ∗ m,

and note that by construction the differential of f at (id, Ч_{3,n}) is surjective. In particular, f maps some open neighborhood of (id, Ч_{3,n}) ∈ H × M onto an open neighborhood of Ч_{3,n} ∈ S. Therefore, since the uniform norm is O(n)-invariant, in order to prove the claim of the theorem it is enough to show that Ч_{3,n} ∈ M is a local minimum of the uniform norm restricted to M. To prove the latter, let us denote by T the sphere of radius ‖Ч_{3,n}‖_B in the tangent space to M at Ч_{3,n}. We claim that there exists a constant δ > 0 such that for any p ∈ T there exists a point x ∈ R^n, ‖x‖ = 1, such that |Ч_{3,n}(x)| = 1 and

Ч_{3,n}(x) p(x) ≥ δ.   (3.11)

It then follows for the geodesic γ_p(t) = cos t · Ч_{3,n} + sin t · p that

‖γ_p(t)‖_∞ ≥ | cos t · Ч_{3,n}(x) + sin t · p(x) | ≥ cos t + δ sin t ≥ ‖Ч_{3,n}‖_∞

for all 0 ≤ t ≤ t_δ, where t_δ > 0 depends only on δ. This proves that Ч_{3,n} ∈ M is a local minimum of the uniform norm restricted to M.

In order to show (3.11) let us define

C_n = {±e₁} ∪ { ±( (1/2) e₁ + (√3/2) ρ e₂ ) : ρ ∈ G } = {±e₁} ∪ { x ∈ R^n : ‖x‖ = 1, 3x₁² − x₂² − ··· − x_n² = 0 },

where e₁ = (1, 0, ..., 0) and e₂ = (0, 1, 0, ..., 0). Using the G-invariance of Ч_{3,n} one can see that C_n is the set of unit vectors x ∈ R^n, ‖x‖ = 1, satisfying |Ч_{3,n}(x)| = 1, and that

Ч_{3,n}(±e₁) = ±1,   Ч_{3,n}( ±( (1/2) e₁ + (√3/2) ρ e₂ ) ) = ∓1,   ρ ∈ G.   (3.12)

From Lemma 3.3 it follows that a nonzero form p ∈ P_{3,n} vanishes on C_n if and only if it belongs to the tangent space of the O(n)-orbit of Ч_{3,n} at Ч_{3,n}.¹ In particular, no p ∈ T vanishes on the whole of C_n. From compactness of both C_n and T we hence conclude that

max_{x ∈ C_n} |p(x)| ≥ δ₀

for some δ₀ > 0 independent of p ∈ T. Now put δ = δ₀/(10n). Given p ∈ T, let x̄ ∈ C_n be such that |p(x̄)| ≥ δ₀. If p(x̄) and Ч_{3,n}(x̄) have the same sign, (3.11) obviously holds as δ₀ > δ. We now treat the case when Ч_{3,n}(x̄) p(x̄) < 0. Note first that, as a consequence of the G-invariance of Ч_{3,n}, together with the decomposition (1.22), we have the whole family of decompositions

Ч_{3,n}(x) = ((n + 2)/3) x₁³ + (4/3) Σ_{i=2}^n ( −⟨v⁺_{ρ,i}, x⟩³ + ⟨v⁻_{ρ,i}, x⟩³ ),   ρ ∈ G,   (3.13)

where v^±_{ρ,i} = ±(1/2) e₁ + (√3/2) ρ e_i and e_i is the i-th unit vector, i = 1, ..., n. The set of possible v^±_{ρ,i} for different ρ ∈ G coincides with C_n \ {±e₁}. Therefore, x̄ either is ±e₁ (in which case we may assume x̄ = e₁) or is among the v^±_{ρ,i}, i = 2, ..., n, for some ρ ∈ G. Since p is tangent to S at Ч_{3,n}, that is, ⟨p, Ч_{3,n}⟩_B = 0, and since ⟨p, ⟨v, ·⟩³⟩_B = p(v), we get from (3.12) and (3.13) that

0 = ⟨p, Ч_{3,n}⟩_B = ((n + 2)/3) Ч_{3,n}(e₁) p(e₁) + (4/3) Σ_{i=2}^n ( Ч_{3,n}(v⁺_{ρ,i}) p(v⁺_{ρ,i}) + Ч_{3,n}(v⁻_{ρ,i}) p(v⁻_{ρ,i}) ).

One of these terms features Ч_{3,n}(x̄) p(x̄) ≤ −δ₀. Elementary estimates then show that for some x among e₁ and the v^±_{ρ,i}, i = 2, ..., n, we must have Ч_{3,n}(x) p(x) ≥ δ₀/(10n) = δ. We thus have verified (3.11) for all p ∈ T and some δ > 0, which concludes the proof.

¹ It is interesting to state this in the language of symmetric tensors: the tangent space of the O(n)-orbit of the symmetric tensor associated with Ч_{3,n} is the orthogonal complement of the span of its symmetric best rank-one approximations.
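The values (3.12) and the family of decompositions (3.13) used in the proof can be verified numerically; the sketch below (our own code, not part of the paper) takes ρ = id and checks both at random points.

```python
import numpy as np

n = 5
e = np.eye(n)

def cheb3(x):
    # the cubic Chebyshev form Ч_{3,n}(x) = x1^3 - 3 x1 (x2^2 + ... + xn^2)
    return x[0]**3 - 3 * x[0] * np.sum(x[1:]**2)

# the vectors v^{±}_{rho,i} for rho = id
v = {(s, i): s * 0.5 * e[0] + np.sqrt(3) / 2 * e[i]
     for s in (+1, -1) for i in range(1, n)}

# values (3.12): Ч(±e1) = ±1 and Ч(v^{±}) = ∓1
assert cheb3(e[0]) == 1 and cheb3(-e[0]) == -1
assert all(abs(cheb3(v[(s, i)]) + s) < 1e-12 for (s, i) in v)

# decomposition (3.13) at random points
rng = np.random.default_rng(2)
for _ in range(10):
    x = rng.standard_normal(n)
    rhs = (n + 2) / 3 * x[0]**3 + 4 / 3 * sum(
        -np.dot(v[(+1, i)], x)**3 + np.dot(v[(-1, i)], x)**3
        for i in range(1, n))
    assert abs(cheb3(x) - rhs) < 1e-9
```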
Acknowledgment
We thank Zhening Li for pointing out a counterexample to the global optimality of the Chebyshev forms Ч_{d,n} as presented in section 1.1.

References

[1] S. Banach. Über homogene Polynome in (L²). Stud. Math., 7:36–44, 1938.
[2] B. Beauzamy, E. Bombieri, P. Enflo, and H. L. Montgomery. Products of polynomials in many variables.
J. Number Theory , 36(2):219–245, 1990.[3] P. L. Chebyshev. Théorie des mécanismes connus sous le nom de parallélogrammes.
Mém. Acad. Sci. Pétersb. , 7:539–568, 1854.[4] F. H. Clarke. Generalized gradients and applications.
Trans. Amer. Math. Soc. ,205:247–262, 1975.[5] F. H. Clarke.
Optimization and Nonsmooth Analysis . Classics in Appl. Math. 5.SIAM, Philadelphia, PA, 1990.[6] F. Cobos, T. Kühn, and J. Peetre. Schatten-von Neumann classes of multilinearforms.
Duke Math. J. , 65(1):121–156, 1992.[7] F. Cobos, T. Kühn, and J. Peetre. On G p -classes of trilinear forms. J. London Math.Soc. (2) , 59(3):1003–1022, 1999.[8] H. Derksen, S. Friedland, L.-H. Lim, and L. Wang. Theoretical and computationalaspects of entanglement. arXiv:1705.07160 , 2017.[9] A. Franc.
Etude Algèbrique des Multitableaux: Apports de l’Algèbre Tensorielle . PhDthesis, Université de Montpellier II, Montpellier, France, 1992.[10] S. Friedland and L.-H. Lim. Nuclear norm of higher-order tensors.
Math. Comp., 87(311):1255–1281, 2018.
[11] V. L. Goncharov. The theory of best approximation of functions.
J. Approx. Theory ,106(1):2–57, 2000.[12] G. H. Hardy, J. E. Littlewood, and G. Pólya.
Inequalities . Cambridge UniversityPress, Cambridge, UK, 2nd edition, 1952.[13] X. Kong and D. Meng. The bounds for the best rank-1 approximation ratio of afinite dimensional tensor space.
Pac. J. Optim. , 11(2):323–337, 2015.[14] T. Kühn and J. Peetre. Embedding constants of trilinear Schatten-von Neumannclasses.
Proc. Est. Acad. Sci. Phys. Math. , 55(3):174–181, 2006.[15] Z. Li, Y. Nakatsukasa, T. Soma, and A. Uschmajew. On orthogonal tensors and bestrank-one approximation ratio.
SIAM J. Matrix Anal. Appl. , 39(1):400–425, 2018.[16] Z. Li and Y.-B. Zhao. On norm compression inequalities for partitioned block tensors.
Calcolo , 57(1), 2020.[17] N. N. Osipov and N. S. Sazhin. An extremal property of Chebyshev polynomials.
Russian J. Numer. Anal. Math. Model. , 23(1):89–95, 2008.[18] L. Qi. The best rank-one approximation ratio of a tensor space.
SIAM J. MatrixAnal. Appl. , 32(2):430–442, 2011.[19] V. Temlyakov.
Greedy Approximation . Cambridge University Press, Cambridge, UK,2011.[20] A. Uschmajew. Some results concerning rank-one truncated steepest descent direc-tions in tensor spaces. In
Proceedings of the International Conference on SamplingTheory and Applications , pages 415–419, 2015.
International School for Advanced Studies, 34136 Trieste, Italy [email protected]
Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany [email protected]
Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany [email protected]