Barriers for recent methods in geodesic optimization
Cole Franks∗ and Philipp Reichenbach†

Abstract
We study a class of optimization problems including matrix scaling, matrix balancing, multidimensional array scaling, operator scaling, and tensor scaling that arise frequently in theory and in practice. Some of these problems, such as matrix and array scaling, are convex in the Euclidean sense, but others such as operator scaling and tensor scaling are geodesically convex on a different Riemannian manifold. Trust region methods, which include box-constrained Newton's method, are known to produce high precision solutions very quickly for matrix scaling and matrix balancing (Cohen et al., FOCS 2017; Allen-Zhu et al., FOCS 2017), and result in polynomial time algorithms for some geodesically convex problems like operator scaling (Garg et al., STOC 2018; Bürgisser et al., FOCS 2019). One is led to ask whether these guarantees also hold for multidimensional array scaling and tensor scaling.

We show that this is not the case by exhibiting instances with exponential diameter bound: we construct polynomial-size instances of 3-dimensional array scaling and 3-tensor scaling whose approximate solutions all have doubly exponential condition number. Moreover, we study convex-geometric notions of complexity known as margin and gap, which are used to bound the running times of all existing optimization algorithms for such problems. We show that margin and gap are exponentially small for several problems including array scaling, tensor scaling and polynomial scaling. Our results suggest that it is impossible to prove polynomial running time bounds for tensor scaling based on diameter bounds alone. Therefore, our work motivates the search for analogues of more sophisticated algorithms, such as interior point methods, for geodesically convex optimization that do not rely on polynomial diameter bounds.

∗ Department of Mathematics, Massachusetts Institute of Technology, [email protected]
† Institut für Mathematik, Technische Universität Berlin, [email protected]
† Supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 787840).
Introduction
We study a class of optimization problems ubiquitous in theoretical computer science, machine learning, quantum information theory and statistics. The programs we consider are continuous optimization problems over matrix groups. More precisely, they can be posed as Euclidean norm minimization over the closure of a group orbit. The programs span two historically distinct contexts: in one context the optimization problems are convex, and in the other they are not convex but rather geodesically convex on a suitable manifold.

The commutative setting, in which the underlying group is Abelian, captures matrix scaling, matrix balancing and array scaling, which arise in scientific computing and optimal transport [Cut13, PR71]. Such problems fall into the framework of unconstrained geometric programming. Though these problems are convex, there are at least two reasons to study them further. Firstly, they are of such practical importance that speed matters. Naïvely applying powerful algorithms like ellipsoid and interior point methods can be impractically slow. Hence, it is important to understand when faster methods can succeed. Matrix scaling and balancing, in particular, have enjoyed some success stories: there are fast algorithms to obtain high precision solutions [CMTV17, AZLOW17], and there are more general upper bounds [BLNW20]. Secondly, the algorithms developed for the commutative setting are candidates for generalization to our second setting, which takes place in the less well-understood arena of geodesically convex optimization.

The second context, which we call the noncommutative setting, arises when the underlying group is non-Abelian. The noncommutative setting captures problems like operator and tensor scaling [GGOW16, BGO+18] and statistical estimators such as Tyler's M estimator [FM20] and maximum likelihood estimates for matrix and tensor normal models [AKRS20]. Deciding whether the value of the optimization problem is zero or not is equivalent to deciding a central polynomial identity testing (P.I.T.) problem in invariant theory known as the null-cone problem. It is hoped that efficient optimization algorithms will result in efficient algorithms for the null-cone problem. One approach to complexity lower bounds, geometric complexity theory, suggests that these P.I.T. problems should be in P [Mul17, GIM+20].

The noncommutative optimization problems are geodesically convex, a notion of convexity on a Riemannian manifold. Currently, the only implementable algorithms for geodesically convex optimization are analogues of gradient descent and trust region methods [AMS08, ZS16, AZGL+18]. The guarantees of these algorithms are governed by two quantities. One is the diameter, or how far approximate minimizers can be from the origin. The other is a geometric measure of well-conditionedness known as margin (or gap in the noncommutative case), which has several variants in the literature and appears in two primary ways. Firstly, the smaller the margin, the higher the degree of precision required to decide if the value of the optimization problem is zero or not [BFG+19, Gur04b]. Secondly, the larger the margin, the smaller the diameter [SV14, SV19, BFG+19, BLNW20]. In this paper we show the following:

i) In the commutative setting, and in particular for array scaling, approximate minimizers for the functions we study can have doubly exponential condition number. That is, the problems have exponential diameter. As a consequence, popular classes of algorithms such as gradient descent and trust region methods cannot produce high-precision solutions in polynomial time in general. This result applies in the noncommutative setting as well, which provides evidence that even cutting plane methods are unlikely to produce high-precision solutions in polynomial time. This shows it is necessary to develop powerful methods like the interior point method in the geodesically convex setting.

ii) In the commutative and noncommutative settings, we study the margin and gap, respectively, which appear in running time bounds for all existing algorithms. We prove that these measures can be exponentially small in the input size for several problems including array scaling and tensor scaling. In the commutative case, this gives evidence that existing algorithms for array scaling do not run in near-linear time. In the noncommutative case, our results show that margin-based analyses like [BFG+19] cannot prove polynomial time guarantees for deciding the null cone problem for tensor scaling using trust region methods.

We use the remainder of the introduction to describe both settings in more detail, state our main results precisely, and discuss previous work. For both the commutative and noncommutative settings, we proceed in the following order. We start with an introduction and motivation of the setting, continue with diameter bounds and afterwards treat bounds on the margin and gap, respectively. We end each setting with a short discussion of the main proof techniques.
Matrix scaling and array scaling.
Consider the matrix scaling problem: given a nonnegative matrix $A$, find nonnegative diagonal matrices $X, Y$ such that $XAY$ is doubly stochastic (i.e. has row and column sums equal to one). The matrices, if they exist, can be found by the exceedingly simple and fast alternating minimization method known as Sinkhorn's algorithm. It is frequently used in practice, e.g. for quickly approximating the solution to optimal transport problems [Cut13]. Like all other algorithms for matrix scaling, Sinkhorn's algorithm is typically analyzed through optimization. One finds that $X$ and $Y$ are $e^{\operatorname{diag}(x)}, e^{\operatorname{diag}(y)}$, where $x, y \in \mathbb{R}^n$ are solutions to the following optimization problem:

$$\inf_{x, y \in \mathbb{R}^n} \sum_{i,j} A_{ij} e^{x_i + y_j - \bar{x} - \bar{y}} \qquad (1.1)$$

for $\bar{z} := \frac{1}{n} \sum_i z_i$ (c.f. [KK96]). Moreover, the infimum is greater than zero if and only if $A$ is approximately scalable, i.e. the row and column sums of $XAY$ can be made arbitrarily close to one for $X, Y$ nonnegative, diagonal.

More generally, given a finite set $\Omega \subseteq \mathbb{R}^m$ and a nonnegative function $p : \Omega \to \mathbb{R}_{\geq 0}$, define the capacity [Gur04b] as the value of the unconstrained geometric program

$$\operatorname{cap}(p) := \inf_{x \in \mathbb{R}^m} f_p(x) := \inf_{x \in \mathbb{R}^m} \sum_{\omega \in \Omega} p_\omega e^{\omega \cdot x}. \qquad (1.2)$$

The capacity is positive if and only if zero is in the Newton polytope $\operatorname{conv}(\operatorname{supp} p)$. Matrix scaling arises when $m = 2n$ and $\Omega = \{(\varepsilon_i, \varepsilon_j) : i, j \in [n]\}$ for $\varepsilon_k := e_k - \frac{1}{n}\mathbf{1}_n$, where $e_k \in \mathbb{R}^n$ is the $k$-th canonical unit vector and $\mathbf{1}_n \in \mathbb{R}^n$ denotes the all-ones vector. In this case Eq. (1.2) reduces to precisely Eq. (1.1), and $\|\nabla \log f_p(x)\|$ measures the deviation of $p$ from doubly stochastic.

Matrix balancing, in which we instead wish to find a scaling for which the $i$-th row and column sums match, arises when $m = n$ and $\Omega = \{e_i - e_j : i \neq j \in [n]\}$. When $m = 3n$ and $\Omega = \{(\varepsilon_i, \varepsilon_j, \varepsilon_k) : i, j, k \in [n]\}$ we obtain the 3-dimensional array scaling problem. In analogy to matrix scaling, in array scaling one has an array $p$ of numbers in $(\mathbb{R}^n_{\geq 0})^{\otimes 3}$ and seeks positive vectors $X, Y, Z \in \mathbb{R}^n_{>0}$ so that the array $q$ with entries $q_{ijk} = p_{ijk} X_i Y_j Z_k$ is tristochastic. That is, the sum over every slice is equal to one, i.e. $\sum_{j,k} q_{i_0 jk} = \sum_{i,k} q_{i j_0 k} = \sum_{i,j} q_{ij k_0} = 1$ for all $i_0, j_0, k_0 \in [n]$. If it is possible to satisfy these equations to arbitrary precision we say $p$ is approximately scalable. As for matrix scaling, $p$ is approximately scalable if and only if $\operatorname{cap}(p) > 0$. In the same manner, we obtain $d$-dimensional array scaling for $m = dn$ and

$$\Omega = \Omega_{n,d} := \{\varepsilon_i : i \in [n]\}^d \subseteq (\mathbb{R}^n)^d. \qquad (1.3)$$

We can think of subsets of $\Omega_{n,d}$ as $d$-uniform, $d$-partite hypergraphs. Up to an additive shift by $-\frac{1}{n}\mathbf{1}_{nd}$, the elements of $\Omega_{n,d}$ are indicator vectors of the edges in such hypergraphs.
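For $d = 2$ the alternating minimization mentioned above is the classical Sinkhorn iteration: repeatedly renormalize row sums, then column sums. A minimal sketch for dense matrices (the iteration cap and tolerance are illustrative assumptions, not from the paper):

```python
import numpy as np

def sinkhorn(A, iters=500, tol=1e-9):
    """Alternately normalize rows and columns of a nonnegative matrix A.

    Returns positive vectors x, y so that diag(x) @ A @ diag(y) is
    approximately doubly stochastic (all row and column sums equal 1).
    """
    n, m = A.shape
    y = np.ones(m)
    for _ in range(iters):
        x = 1.0 / (A * y).sum(axis=1)       # make row sums equal 1
        y = 1.0 / (A.T * x).sum(axis=1)     # make column sums equal 1
        B = A * np.outer(x, y)
        if max(np.abs(B.sum(0) - 1).max(), np.abs(B.sum(1) - 1).max()) < tol:
            break
    return x, y
```

For a strictly positive matrix the iteration converges geometrically, which is one reason the method is so popular in practice.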
For $d = 2$, the matrix $p$ is approximately scalable if and only if the bipartite graph corresponding to $\operatorname{supp} p$ contains a perfect matching, but this is not the case for $d \geq 3$ (indeed, $d$-partite hypergraph matching is NP-hard).

Algorithms for array scaling.
Array scaling serves the same role for speeding up multimarginal optimal transport as matrix scaling does for optimal transport, and yet again there is a simple and fast alternating minimization algorithm that produces $\varepsilon$-tristochastic scalings in time $O(1/\varepsilon)$ [ABA20, LHCJ19]. Moreover, algorithms to approximate the capacity arise in varied settings including radial isotropic position [HM13], entropy maximization [SV19], and approximate counting [AGV18].

It is natural to ask if there are high-precision algorithms for array scaling with $\log(1/\varepsilon)$ dependence on the error and linear or mild dependence on the number of nonzero entries. For matrix scaling and matrix balancing, several works have shown that trust region and interior point methods can obtain such guarantees [CMTV17, AZLOW17]. Our work is concerned with whether the performance of such algorithms carries over to array scaling and the computation of the capacity in general.

Guarantees for many iterative algorithms in convex optimization require diameter bounds, or bounds on the distance $R$ from the starting point to an $\varepsilon$-approximate solution. Trust region methods, also called box-constrained Newton's methods, are iterative algorithms that, at each step, move to the best solution within a typically small distance $D$ of the previous solution. By their nature, trust region methods take at least $R/D$ steps to produce an $\varepsilon$-approximate solution. Gradient descent for Lipschitz functions also depends quadratically on a diameter bound, and cutting plane methods typically use diameter bounds to control the volume of a starting region.

Known diameter upper and lower bounds.
For matrix scaling and matrix balancing, it has been shown in [CMTV17] that one may take $R = O(n \log(w_A/\varepsilon))$, where $w_A$ is the ratio between the sum of the entries of the matrix and the least nonzero entry. For 3-dimensional array scaling, the best upper bound of which we are aware is exponential in $n$ and polynomial in $\log(1/\varepsilon)$, which follows from the general upper bound of [SV19] on diameter bounds for unconstrained geometric programming. There is also a diameter bound for array scaling in the multimarginal transport context that is polynomial in the input size assuming the tensor has no zero entries [LHCJ19].

Regarding diameter lower bounds, in the context of computing maximum entropy distributions it was shown that there is some bounded set $\Omega \subset \mathbb{Z}^m$ in a $\operatorname{poly}(m)$ size ball such that there are no $\varepsilon$-approximate minimizers of norm $\operatorname{poly}(m, \log 1/\varepsilon)$ for $f_p$ as in Eq. (1.2) [SV19].

Main theorem. Where do the polynomial diameter bounds for matrix scaling (i.e. 2-dimensional array scaling) transition to the superpolynomial diameter bounds for general $\Omega$? We show that this transition takes place in the next simplest problem, the 3-dimensional array scaling problem.

Theorem 1.1.
There is an absolute constant $C > 0$ and an array $p_{ijk} \in (\mathbb{R}^n_{\geq 0})^{\otimes 3}$ with $O(n)$ nonzero entries, each of bit-complexity $O(n)$, that satisfies the following property. For all $0 < \varepsilon \leq \exp(-Cn \log n)$ and $(x, y, z) \in \mathbb{R}^{3n}$, if $f_p(x, y, z) \leq \operatorname{cap}(p) + \varepsilon$ then $\|(x, y, z)\| = 2^{\Omega(n)} \log(1/\varepsilon)$.

To emphasize that the difficulties do not lie in an additive vs multiplicative approximation, we remark that our array $p$ has unit sum and $\operatorname{cap}(p)$ is an absolute constant. By a simple duplication trick, the same bound holds for $d$-dimensional array scaling with $d \geq 3$; see Corollary 3.7.

Implications of Theorem 1.1 and relation to the literature.
Theorem 1.1 shows that trust region methods for array scaling with polynomial step size cannot provide high-precision solutions in $\operatorname{poly}(n, \log(1/\varepsilon))$ time for $d \geq 3$. Moreover, gradient descent on the Lipschitz convex function $\log f_p$ has a bounded step size, and so also cannot provide high precision solutions in polynomial time.

In [SV19, Section 2.1] the authors ask whether there is $\Omega$ whose elements are Boolean (up to an additive shift) with a superpolynomial diameter lower bound. As subsets of $\Omega_{n,d}$ are automatically of this form, we answer their open problem in the affirmative. Our lower bound on $\log R$ is tight up to constant factors by the diameter upper bound from [SV19] mentioned above. Determining the correct constant in the exponent is an interesting open direction.

Lastly, we remark that [BLNW20] bounds the diameter for $f_p$ by a polynomial in the facet gap, i.e. the minimum distance between an element of $\operatorname{supp} p$ and the affine hull of a facet of the Newton polytope. The construction in Theorem 1.1 has exponentially small facet gap; see Corollary 3.6.

Many computational aspects of the capacity rely on the convex geometry of the finite set $\Omega \subseteq \mathbb{R}^m$. Consider the following quantity, which we call the margin of $\Omega$: the minimum positive distance from the convex hull of a subset of $\Omega$ to the origin. Formally,

Definition 1.2 (Margin). For a finite set $\Omega \subseteq \mathbb{R}^m$, define the margin $\gamma(\Omega)$ by

$$\gamma(\Omega) := \min \left\{ \operatorname{dist}(0, \operatorname{conv}(S)) \mid S \subseteq \Omega,\ 0 \notin \operatorname{conv}(S) \right\}.$$

We point out that for all considered capacity problems in this paper, the margin is actually the weight margin (c.f. [BFG+19] and our Definition 4.3) of a certain group representation. For example, the margin for array scaling is the weight margin for tensor scaling. We now discuss how the margin enters in decision problems and diameter bounds.
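For very small $\Omega$, Definition 1.2 can be evaluated by brute force: enumerate subsets $S$, compute $\operatorname{dist}(0, \operatorname{conv}(S))$ exactly by checking every face of the hull, and take the minimum positive value. A sketch (doubly exponential time and purely illustrative; the helper names are ours, not the paper's):

```python
import numpy as np
from itertools import combinations

def dist_to_hull(pts):
    """Exact distance from the origin to conv(pts) for a small point set.

    The nearest point of the hull lies in the affine hull of some subset T
    with nonnegative barycentric coordinates, so we try every subset,
    solving the KKT system [[Q Q^T, 1], [1^T, 0]] (lam, mu) = (0, 1).
    """
    pts = np.asarray(pts, dtype=float)
    best = np.inf
    for r in range(1, len(pts) + 1):
        for idx in combinations(range(len(pts)), r):
            Q = pts[list(idx)]
            A = np.block([[Q @ Q.T, np.ones((r, 1))],
                          [np.ones((1, r)), np.zeros((1, 1))]])
            b = np.zeros(r + 1)
            b[-1] = 1.0
            try:
                lam = np.linalg.solve(A, b)[:r]
            except np.linalg.LinAlgError:
                continue  # degenerate subset; a nondegenerate face attains
            if (lam >= -1e-12).all():
                best = min(best, np.linalg.norm(Q.T @ lam))
    return best

def margin(Omega, tol=1e-9):
    """Minimum positive dist(0, conv(S)) over S subseteq Omega (Def. 1.2)."""
    dists = [dist_to_hull(S)
             for r in range(1, len(Omega) + 1)
             for S in combinations(Omega, r)]
    return min(d for d in dists if d > tol)
```

For instance, $\Omega = \{+1, -1\} \subset \mathbb{R}$ has margin $1$: the only subsets whose hulls avoid the origin are the singletons. The point of our results is that for $\Omega_{n,d}$ with $d \geq 3$ this quantity is exponentially small, far beyond the reach of such enumeration.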
Margin as a precision parameter for the decision problem.
To illustrate how the margin enters the decision problem of whether $\operatorname{cap}(p) > 0$, consider matrix scaling. To certify that the capacity of a matrix is nonzero, we compute $\varepsilon$-doubly stochastic scalings for some $\varepsilon$ smaller than the distance to doubly stochastic attained by any matrix that is not approximately scalable. This turns out to be precisely $\gamma(\Omega_{n,2})$. More generally, it is a classical fact that for $p$ with support contained in $\Omega$, the gradient $\nabla \log f_p(x)$ can take any value in the Newton polytope of $p$. Thus, $\operatorname{cap}(p) > 0$ if and only if there is some $x$ with $\|\nabla \log f_p(x)\| \leq \gamma(\Omega)$.

For matrix scaling and matrix balancing, it is known that $\gamma(\Omega)$ is on the order of $n^{-3/2}$, despite the exponential number of subsets $S \subseteq \Omega$! This luck can be attributed to the extraordinary geometry of $\Omega$ in these cases, whose elements form the rows of a totally unimodular matrix (up to a shift). On the other hand, for $d$-dimensional array scaling with $n = 2$, the margin $\gamma(\Omega_{2,d})$ is on the order of the margin of the $d$-dimensional hypercube $\{\pm 1\}^d$, which satisfies $\gamma(\{\pm 1\}^d) = d^{-d(1+o(1))}$ by [AV97]. However, between the extreme cases $\Omega_{n,2}$ (matrix scaling) and $\Omega_{2,d}$ (the hypercube), very little is known.

Margin and related quantities for diameter bounds.
In addition to their role in the decision problem, margins and related quantities can be used to prove diameter bounds for Eq. (1.2). The work [BFG+19] proves the diameter bound $\operatorname{poly}(\gamma(\Omega)^{-1}, \log(1/\varepsilon))$. In [SV19] it is shown that the diameter is polynomial in the logarithm of the minimum nonzero $p_\omega$ and a quantity called the unary facet complexity. The latter is defined as the maximal length of an integer normal vector of a facet of the Newton polytope $\operatorname{conv}(\operatorname{supp} p)$. In the case when $0$ is in the relative interior of the Newton polytope, [SV14] has shown that there is a minimizer with Euclidean norm $O(\log(|\operatorname{supp} p| / \eta))$, where $\eta$ is the distance from $0$ to the boundary of the Newton polytope. The diameter bounds in [SV14, SV19] were used to design ellipsoid methods that are tractable even for $|\operatorname{supp} p|$ very large, and in [BLNW20] they were used to bound the running time of interior point methods.

Main theorem.
One is led to ask if the margin remains large for array scaling when $d \geq 3$. We show that this is not the case. In fact, the margin becomes exponentially small in $nd$ for $d \geq 3$. What follows is stated in more detail later in Theorem 2.1.

Theorem 1.3.
Let $d \geq 3$ and $n \geq 2$. Let $\Omega_{n,d} = \{\varepsilon_i : i \in [n]\}^d \subseteq (\mathbb{R}^n)^d$, where $\varepsilon_j := e_j - \frac{1}{n}\mathbf{1}_n$. There exists a constant $C > 0$, independent of $n$ and $d$, such that $\gamma(\Omega_{n,d}) \leq 2^{-Cnd}$.

That is, there are $d$-dimensional arrays $p \in (\mathbb{R}^n_{\geq 0})^{\otimes d}$ such that the $d$-tuple of marginals of $p$ is at distance at most $2^{-Cnd}$ from $\frac{1}{n}(\mathbf{1}_n, \ldots, \mathbf{1}_n)$, yet the support of $p$ does not admit an array with uniform marginals, i.e. $\operatorname{cap}(p) = 0$. We note that the support of the array $p$ we construct has $O(nd)$ elements.

Implications of Theorem 1.3 and relation to the literature.
We remark that the construction yields a tensor whose Newton polytope has a facet exponentially close to the origin. Therefore, the bound proved in [BLNW20] on the number of iterations for interior point methods on 3-tensors becomes exponential in $k$ for tensors with $O(k)$ nonzero entries.

Theorem 1.3 aligns with existing results showing that the $d \geq 3$ array case is more complex than the matrix case. Indeed, it is known that the polytope of arrays with uniform marginals, known as the $d$-index axial assignment polytope, has many more vertices when $d \geq 3$ and that the vertices can have exponential entries [LL14]. In contrast, for $d = 2$ this polytope (known as the Birkhoff-von Neumann polytope) has integral vertices by the Birkhoff-von Neumann theorem.

The exponential rate of decay in Theorem 1.3 is tight up to log factors: calculations analogous to the proof of [BFG+19, Theorem 6.9] show that the margin for $d$-dimensional array scaling is at least $(n\sqrt{d})^{-dn-1}$. The proof will appear in a forthcoming version of that paper. It is interesting to ask whether the true bound is $2^{-\Theta(nd)}$ as in our upper bound or $2^{-\Theta(nd(\log n + \log d))}$ as in the lower bound. [AV97] shows that the latter is correct in the case $n = 2$.

Proof techniques for the commutative setting.

We first discuss the techniques for proving our margin bounds. Theorem 1.3 is proven by explicit construction of witness sets $\Gamma_{n,d} \subseteq \Omega_{n,d} := \{\varepsilon_i : i \in [n]\}^d$, i.e. $0 \notin \operatorname{conv}(\Gamma_{n,d})$ but zero is exponentially close to $\operatorname{conv}(\Gamma_{n,d})$. This is done by using that $\sum_i \frac{1}{n}\varepsilon_i$ is the unique way to express zero as a convex combination of the $\varepsilon_i$, compare Lemma 2.2, and by heavily exploiting the combinatorics of $\Omega_{n,d}$. For example, in the case $d = 3$ and $n \geq 3$ the key combinatorial idea builds on a construction by Kravtsov in [Kra07]. Kravtsov's motivation is to characterize the non-integer vertices of the 3-index axial assignment polytope. He explicitly constructs a certain non-integer vertex with maximal support [Kra07, Theorem 1] which has an exponentially small entry.

By definition of the 3-index axial assignment polytope, the support of this vertex corresponds to a subset $S \subseteq \Omega_{n,3}$ with $0 \in \operatorname{conv}(S)$. Removing the element of $S$ corresponding to the small entry in Kravtsov's vertex yields our witness set $\Gamma_{n,3}$ with a convex hull very close to zero. In fact, the whole idea generalizes (in a technical way) to higher odd dimensions $d$; see Section 2.3. For $n = 2$ and $d \geq 3$, the bound follows from the existing work [AV97], as mentioned before. While the construction in that work yields a stronger bound, we provide a different construction of $\{-1, 1\}$ matrices, which has the additional property of freeness. The latter will prove useful when we adapt Theorem 1.3 to the noncommutative case.

We now discuss the proof of the diameter lower bound, Theorem 1.1.
The high level idea is as follows. We first construct a subset $\Omega' \subseteq \Omega_{n,3}$ with $0 \in \operatorname{conv}(\Omega')$ such that there is another element $\omega \in \Omega_{n,3}$ exponentially close to $\operatorname{conv}(\Omega')$, much like our construction of the witness set for small margin discussed above. We then choose an appropriate array $p$ supported on $\Omega' \cup \{\omega\}$. This suggests that the only approximate minimizers of $f_p$ have a very large component in the direction $x$ from $\omega$ to $\operatorname{conv}(\Omega')$, because as $y \in \mathbb{R}^m$ tends to a minimizer of $f_p$ the term $e^{y \cdot \omega}$ should vanish compared to the others. This reasoning requires that $y$ is approximately a multiple of $x$; to enforce this we also ensure that zero is far into the relative interior of $\operatorname{conv}(\Omega')$.

The structure of this argument bears some similarity to that in [SV19], which uses the construction of [AV97]. The main difference is that the set $\Omega_{n,3}$ in the 3-dimensional array scaling problem consists of vectors of very specific structure: up to an additive shift of $-\frac{1}{n}\mathbf{1}_{3n}$, they are Boolean vectors in $\mathbb{R}^{3n}$ with exactly one nonzero entry among indices in each of the intervals $[1, n]$, $[n+1, 2n]$, $[2n+1, 3n]$. Thus, our construction of $\Omega'$ must consist of vectors of this special form and not simply bounded integral vectors as in [SV19]. This is the main additional technical contribution of our construction.

(The $\{-1, 1\}$ matrices from our construction are obtained by replacing all two's in the entries of $A_r$ (2.2) with $-1$.)

In the noncommutative setting, we consider a group $G$ acting on $\mathbb{C}^m$. (Technically we require that $G$ is a reductive group over $\mathbb{C}$ which acts rationally on $\mathbb{C}^m$; all the group actions in this paper satisfy this assumption.) The optimization problem we investigate is given by the capacity of a vector $v \in \mathbb{C}^m$ (c.f. [BFG+19]):

$$\operatorname{cap}(v) := \inf_{g \in G} f_v(g) := \inf_{g \in G} \|g \cdot v\|^2. \qquad (1.4)$$

For the majority of this paper we work with the tensor scaling action, in which $G = \mathrm{SL}(n, \mathbb{C})^d$, the group of $d$-tuples of complex $n \times n$ matrices with determinant one, acts on $v \in (\mathbb{C}^n)^{\otimes d}$ by $(g_1, \ldots, g_d) \cdot v = (g_1 \otimes \cdots \otimes g_d)v$. The corresponding representation is always denoted by $\pi_{n,d}$. Sometimes we also consider the operator scaling action, in which $\mathrm{SL}(n)^2$ acts on $v \in (\mathbb{C}^n)^{\otimes 2} \otimes \mathbb{C}^k$ by $(g_1, g_2) \cdot v = (g_1 \otimes g_2 \otimes I_k)v$.

Restricting Eq. (1.4) to the diagonal subgroup of $G$ (a torus) and making a change of variables yields an instance of Eq. (1.2) (c.f. [BFG+19]). For the tensor scaling action, restricting to the diagonal subgroup of $G$ amounts precisely to the array scaling problem from the previous subsection. Likewise, restricting to diagonal matrices in the operator scaling action yields an instance of matrix scaling.
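Concretely, the tensor scaling action $(g_1, \ldots, g_d) \cdot v = (g_1 \otimes \cdots \otimes g_d)v$ can be applied one mode at a time, without ever forming the $n^d \times n^d$ Kronecker product. A sketch for dense tensors (our own illustrative code, not the paper's):

```python
import numpy as np

def act(gs, v):
    """Apply (g_1 x ... x g_d) to a d-tensor v, one mode at a time."""
    w = np.asarray(v)
    for k, g in enumerate(gs):
        # contract g's second index with mode k of w, then restore axis order
        w = np.moveaxis(np.tensordot(g, w, axes=(1, k)), 0, k)
    return w

def f(gs, v):
    """The objective f_v(g) = ||g . v||^2 from Eq. (1.4)."""
    return np.linalg.norm(act(gs, v)) ** 2
```

Restricting each $g_i$ to a diagonal matrix of determinant one recovers the commutative array scaling objective, as noted above.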
We study Eq. (1.4) because it is deeply connected to invariant theory through a well-known connection between group orbits and invariant polynomials: zero is in the closure of an orbit of a vector $v$ if and only if every non-constant homogeneous $G$-invariant polynomial vanishes on $v$, i.e. if $v$ is in the null-cone. Null-cone membership is a well-studied polynomial identity testing (P.I.T.) problem. One approach to complexity lower bounds, geometric complexity theory, suggests that null-cone membership should be in P [Mul17, GIM+20]. By the above, $\operatorname{cap}(v) = 0$ if and only if $v$ is in the null cone. In fact, Eq. (1.4) is a geodesically convex optimization problem over a certain Riemannian manifold. Algebraic and optimization-based algorithms have, independently and nearly concurrently, resulted in polynomial time algorithms for nearly the same set of P.I.T. problems arising in invariant theory [FS13, Mul17, GGOW16, IQS18, DM20a, AZGL+18]. A notable exception is the 3-tensor action. Recent degree lower bounds for invariant polynomials for the 3-tensor action pose significant challenges for the algebraic approach [DM20b]. It is natural to ask whether the optimization approach can overcome these challenges.

Algorithms for computing the capacity.
A nonzero tensor $w = g \cdot v$ attains the capacity when $w$ has all quantum marginals equal to $I_n/n$. The quantum marginals of a tensor $w$, analogous to the sums along slices of an array, are the three $n \times n$ matrices $M_1 M_1^\dagger$, $M_2 M_2^\dagger$, $M_3 M_3^\dagger$ for the $n \times n^2$ matrices $M_1, M_2, M_3$ known as flattenings of $w/\|w\|$. For operator scaling, the capacity is attained when the first two quantum marginals are $I_n/n$. To compute the capacity, existing algorithms attempt to find $g$ such that the quantum marginals of $g \cdot v$ are all close to $I_n/n$. There are alternating minimization algorithms that can attain distance $\varepsilon$ in time $\operatorname{poly}(n, 1/\varepsilon)$ [GGOW16, BGO+18], and for operator scaling there are algorithms running in $\operatorname{poly}(n, \log(1/\varepsilon))$ time [AZGL+18]. For tensor scaling, precision $\operatorname{poly}(1/\varepsilon)$ is not sufficient to efficiently decide null-cone membership, and the only algorithms with $\log(1/\varepsilon)$ dependence on $\varepsilon$ have an exponential dependence on $n$ [BFG+19].

The achievable marginal spectra are captured by the moment polytope, denoted $\Delta_G(v)$. In particular, $0 \notin \Delta_G(v)$ if and only if $v$ is in the null-cone (i.e. $\operatorname{cap}(v) = 0$). For tensor scaling, the moment polytope is the set of tuples of spectra of the quantum marginals as $w$ ranges over $G \cdot v$, shifted by $-\frac{1}{n}(\mathbf{1}_n, \mathbf{1}_n, \mathbf{1}_n)$. The gap of the action of $G$, i.e. the minimum positive distance from $0$ to a moment polytope $\Delta_G(v)$, is a noncommutative generalization of the margin. Whereas the operator scaling and simultaneous conjugation actions have polynomially large gaps, we show that the gap for the tensor scaling action is exponentially small. Scaling algorithms amount to outer $\varepsilon$-approximation algorithms for $\Delta_G(v)$, which is why $\operatorname{poly}(1/\varepsilon)$-time algorithms do not suffice to decide null-cone membership. Like for matrix and array scaling, one would hope for algorithms running in time $\operatorname{poly}(n, \log(1/\varepsilon))$.

(Moment polytope membership is an interesting problem in and of itself; for $d = 3$, for generic $v \in (\mathbb{C}^n)^{\otimes 3}$, $\Delta_G(v)$ is the Kronecker polytope arising in representation theory and quantum information theory. Deciding membership in this polytope is known to be in NP $\cap$ coNP but not known to be in P [BCMW17].)
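The flattenings and quantum marginals above are straightforward to compute for a dense tensor; a `numpy` sketch (our own helper, not the paper's code):

```python
import numpy as np

def quantum_marginals(w):
    """Quantum marginals rho_k = M_k M_k^dagger of the unit tensor w/||w||,
    where M_k is the mode-k flattening (an n x n^(d-1) matrix)."""
    w = np.asarray(w, dtype=complex)
    w = w / np.linalg.norm(w)
    rhos = []
    for k in range(w.ndim):
        M = np.moveaxis(w, k, 0).reshape(w.shape[k], -1)  # mode-k flattening
        rhos.append(M @ M.conj().T)                       # rho_k = M M^dagger
    return rhos
```

Each marginal is positive semidefinite with unit trace; a tensor $w$ attains the capacity in its orbit exactly when every marginal equals $I_n/n$.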
Here we describe how diameter bounds cause the state-of-the-art algorithms to be slow for the tensor scaling action. We begin by discussing geodesically convex optimization. In general Eq. (1.4) is not convex, but rather geodesically convex. That is, $G$ can be viewed as a manifold in such a way that the function $g \mapsto \|g \cdot v\|^2$ is convex along "geodesics" of the form $\gamma(t) = e^{tH}g$ for $H$ Hermitian. The manifold we consider is not exactly $G$ but rather a quotient $P$ of it; we will make this more precise later in Section 4.5. For $G = \mathrm{SL}(n)^d$, the manifold $P$ is the set of $d$-tuples of positive-definite matrices with determinant one. $P$ is equipped with the geometry on positive-definite matrices known in statistics as the Fisher-Rao metric, and studied in depth in e.g. [Bha07]. Though we do not need many details of this geometry here, one can think of the distance between $g, h \in G$ as a bound on the logarithms of the singular values of $g^{-1}h$. In particular, the geodesic "ball" of radius $R$ about the identity in $G$ is the intersection of $G$ with the set $\{U \exp(A) : A \text{ Hermitian}, \|A\|_F \leq R, U \text{ unitary}\}$. Note that the ball of radius $\sqrt{n}R$ includes all elements of $G$ whose singular values are in $[e^{-R}, e^R]$.

The existing algorithms to compute Eq. (1.4) adapt simple first order methods, such as gradient descent, and second order methods, such as trust regions, to the geodesically convex setting [ZS16, AZGL+18, BFG+19]. The guarantees of these algorithms require that some $\varepsilon$-approximate solution is contained in a geodesic ball of radius $\operatorname{poly}(n^d, \log(1/\varepsilon))$. However, for 3-tensors we have the following diameter lower bound.

Theorem 1.4 (Noncommutative diameter lower bound). There is a constant $C > 0$ such that the following holds. For all $\varepsilon \leq \exp(-Cn \log n)$, there is a tensor $v = v(\varepsilon) \in (\mathbb{C}^n)^{\otimes 3}$ with $O(n)$ nonzero entries of bit complexity $O(\log n + \log(1/\varepsilon))$, and a geodesic ball $B = B(\varepsilon)$ of radius $2^{\Omega(n)} \log(1/\varepsilon)$ about the identity in $\mathrm{SL}(n)^3$, such that

$$\inf_{g \in B} \|g \cdot v\|^2 \geq \operatorname{cap}(v) + \varepsilon.$$

To emphasize that the difficulties are not caused by requiring additive approximation, we remark that the vector $v$ satisfies $1/2 \leq \operatorname{cap}(v) \leq 1$ and $1/2 \leq \|v\| \leq 1$. A duplication trick analogous to Corollary 3.7 yields the same diameter bound for $d \geq 3$, but for the action of $G$ simultaneously on a tuple of tensors rather than on a single one. See Corollary 4.24.

Implications of Theorem 1.4 and relation to the literature.
(In the previous paragraphs, exponentials, Hermitian-ness, and Frobenius norm on tuples are defined by treating the tuples as block diagonal matrices.)

Theorem 1.4 shows that trust region methods with constant step size cannot $\varepsilon$-approximate the capacity in $\operatorname{poly}(n, 1/\varepsilon)$ time for 3-tensors. It also shows that cutting plane methods are unlikely to do so. Cutting plane methods, such as ellipsoid, require an exponential bound on the volume of a known region containing an approximate optimizer. This is the case for Rusciano's non-constructive query upper bound for cutting plane methods on manifolds of non-positive curvature [Rus20], which is essentially tight [HM21] ([HM21] applies to the hyperbolic plane, which is a totally geodesic submanifold of the manifold $P$ we consider). The volume of a ball in the manifold we consider grows exponentially in the radius (see Section 4.5), so this query bound will be exponential. Regarding tightness, the best upper bound known to the authors for the diameter bound in the noncommutative case is exponential in $n \log n$, plus an additive $O(n \log(1/\varepsilon))$ term, which can be deduced from the diameter and margin bounds [BFG+19, Proposition 5.5, Theorem 6.9]. This matches our lower bound up to logarithmic factors in the exponent. It would be interesting to prove a version of Theorem 1.1 that held for $\varepsilon$ larger than the gap. This would imply that trust region methods cannot solve the null-cone problem for the 3-tensor action in polynomial time.

In analogy to the commutative case, one typically attempts to certify $\operatorname{cap}(v) > 0$, i.e. $0 \in \Delta_G(v)$, by finding a tensor $g \cdot v$ such that all the quantum marginals are close to $\frac{1}{n}I_n$. In order to certify $\operatorname{cap}(v) > 0$, their distance to $\frac{1}{n}(I_n, I_n, \ldots, I_n)$ must be at most a certain quantity, which we call the gap.

Definition 1.5 (Gap). The gap for the $d$-tensor scaling problem is

$$\gamma_G(\pi_{n,d}) := \min \left\{ \operatorname{dist}(0, \Delta_G(v)) \mid v \in (\mathbb{C}^n)^{\otimes d},\ v \neq 0,\ 0 \notin \Delta_G(v) \right\}.$$

If the gap is exponentially small, high-precision algorithms will be necessary to decide if $\operatorname{cap}(v) > 0$. In operator scaling, the gap is known to be $\Omega(n^{-3/2})$ [Gur04a], which explains why we do not need high-precision algorithms for the decision problem in that case. In addition to its role in the decision problem, the inverse of the gap is used to control the diameter bound [BFG+19]. We show that the gap is exponentially small for tensor scaling for all $d \geq 3$.

Theorem 1.6.
There is a constant C > 0 such that for all d ≥ 3 and n ≥ 2, there are non-zero tensors v ∈ (C^n)^{⊗d} such that 0 ∉ Δ_G(v) but dist(0, Δ_G(v)) ≤ 2^{−Cdn}. That is, the gap for d-tensor scaling satisfies γ_G(π_{n,d}) ≤ 2^{−Cdn}.

A detailed statement on bounds for the gap can be found in Theorem 4.11, and we show in Appendix C how to fill in the missing values of n, d to obtain Theorem 1.6. Since the gap is larger than the margin (c.f. Proposition 4.6), Theorem 1.6 is at least as strong as Theorem 1.3, i.e. the exponent Cnd is tight up to an O(log n + log d) factor.

Margin and gap results for other group actions
In addition to the tensor scaling action, we also consider some other actions of groups G of interest in computational invariant theory. The first is the action of the special linear group on the space of homogeneous d-forms C[x_1, …, x_n]_d, in which G = SL(n) acts by g·p(x) = p(g^{−1}x) for p ∈ C[x_1, …, x_n]_d. Homogeneous d-forms were among the objects studied earliest in computational invariant theory, and much of the theory was developed to catalogue invariants of the SL(n) action on forms [Wey46]. Still, deciding null-cone membership for d = 3 seems challenging. After extending the definition of the gap to other group actions in Section 4 (this notion can be defined similarly for any rational representation π of a reductive group G, see Definition 4.3; this definition of the gap is already described in [BFG+19]), we explain the difficulty by showing that the gap for this action is also inverse exponential in n as soon as d ≥ 3, see Theorem 4.17. This shows that the diameter bound in [BFG+19] becomes exponentially large in n.

The second is the action of a product of special linear groups on quivers with d vertices; here we bound a smaller quantity known as the weight margin. A quiver is a directed multigraph, and a quiver representation is a labelling of the vertex set Q_0 of the quiver with finite-dimensional vector spaces and of the edge set Q_1 with a linear map from the vector space at the tail of the edge to the vector space at the head of the edge. Given a quiver representation A with vertices labeled by C^{n_x} for x ∈ Q_0 and edges e: x → y labeled with matrices A_e, the group G = ∏_{x∈Q_0} SL(n_x) acts on A by (g·A)_e = g_y A_e g_x^{−1}. Quiver representations include the operator scaling action, and an action used to bound the Brascamp–Lieb constant in analysis. In Section 4.6 we show that the (weight) margin can become exponentially small as the number of vertices grows. For this, we exhibit a quiver with d−1 arrows, d vertices of dimension n and weight margin O(n^{−d}), see Theorem 4.25. This bound shows that the diameter bound computed in [BFG+19] can become exponentially large in d. Furthermore, when allowing n copies of each arrow in the constructed quiver, i.e. n(d−1) arrows in total, we can ensure the same bound for the gap, Theorem 4.25.

Regarding the idea of the proof, we may transfer both the diameter lower bound and the gap upper bound to the commutative case by virtue of the tensors we construct having free support. A tensor has free support if any two distinct (d−1)-dimensional slices of the tensor have disjoint support. This condition ensures that, even after being acted on by any diagonal group elements, the tensor's quantum marginals are all diagonal. This allows us to restrict to the action of the diagonal matrices and thereby reduce to the commutative (array scaling) case. Thus, we may obtain the same bounds on the tensor gap as for the array margin. However, this requires additional care to ensure freeness of our constructions. This is why we cannot naïvely use the construction of [AV97] for d-tensors with n = 2. Regarding the noncommutative diameter bound, we show that for tensors with free support the diameter bound matches that of the commutative problem obtained by restricting to the diagonal. To do this, we project the group elements to the set of diagonal elements, and use the properties of spaces of non-positive curvature to show that this projection moves the point nearer to the origin and decreases the function value.

The idea and the concept of freeness generalize to rational representations of reductive groups [Fra02]. The key statement is given in full generality in Proposition 4.8. This proposition is needed to prove bounds on the gap for the action on homogeneous polynomials and for the action on quivers. Interestingly, in [DM20b] the concept of freeness is used in a similar way to prove exponential lower bounds on the degree of invariants for actions on cubic forms and 3-tensors. There, free is called uncramped and it is used crucially to prove closedness of certain orbits.
We begin with the commutative case, which is split into the study of the margin in Section 2 and diameter bounds in Section 3. Then we move to the noncommutative case in Section 4. The appendix contains some representation-theoretic background and proofs of technical lemmas. (The concept of freeness is also implicitly contained in [Sja98, Lemma 7.1] and can at least be traced back to [DK85] as strong orthogonality. Indeed, [DM20b, Theorem 6.5] is used to show the vanishing of the moment map at a vector: first, freeness is used as in Proposition 4.8 to ensure that one can restrict to the moment map for the maximal torus; second, condition (2) of [DM20b, Theorem 6.5] just states that the moment map for the torus action vanishes at the vector.)

2 The geometry of commutative scaling problems
The purpose of this section is to show the following theorem on the margin of d-dimensional array scaling. Recall that the latter arises for Ω_{n,d} := {ε_i : i ∈ [n]}^d ⊆ (R^n)^d.

Theorem 2.1 (Margin for array scaling). The margin of Ω_{n,d} ⊆ (R^n)^d is bounded as follows.
(a) If n = 2 and d ≥ 3, then γ(Ω_{2,d}) ≤ 2^{−d/2+1}.
(b) If n ≥ 3 and d = 3, then γ(Ω_{n,3}) ≤ 2^{−n+1}.
(c) If n ≥ 3 and d = 6r−3 for some integer r ≥ 2, then

γ(Ω_{n,d}) ≤ (√6 / ((n−1)√r)) · 2^{−r(n−1)+1} ≤ 2^{−r(n−1)+1} = 2^{−(d+3)(n−1)/6+1}.

By "padding" the tensors appropriately, one sees that a bound for γ(Ω_{n,d}) also applies to γ(Ω_{n,d+1}) (see Proposition C.1). Combining this result with Theorem 2.1 above implies Theorem 1.3 from the introduction. The next three subsections each prove one of the parts of Theorem 2.1; the construction for part (a) with n = 2 is slightly different, and the construction for part (c), d > 3, builds on the one for part (b), d = 3.

To prove the results, we will frequently use the following simple lemma.

Lemma 2.2. In R^n we have

Σ_{i=1}^n (1/n) ε_i = 0   (2.1)

and this is the only affine linear combination of ε_1, …, ε_n giving zero.

Proof. One calculates directly that Σ_i (1/n) ε_i = 0. To show uniqueness of this affine combination, we note that the vectors e_1, …, e_{n−1}, 1_n are linearly independent. Thus, ε_1, …, ε_{n−1} are linearly independent. On the other hand, ε_1, …, ε_n are linearly dependent. Therefore {(λ_1, …, λ_n) ∈ R^n | Σ_i λ_i ε_i = 0} is a one-dimensional subspace of R^n, which yields the uniqueness of the affine linear combination.

In this subsection we prove part (a) of Theorem 2.1 by showing that the margin of Ω_{2,d} is exponentially small in d. This follows from [AV97], but we present a new construction which has the additional property of freeness, which we discuss later in Section 4. Recall that

Ω_{2,d} = {(ε_{i_1}, …, ε_{i_d}) | i_1, …, i_d ∈ [2]} ⊆ (R²)^d.
In the following we construct a subset of Ω_{2,d} which witnesses the exponentially small margin. For this, we construct a matrix with entries in {1, 2}, and each row of the matrix will correspond to an element of Ω_{2,d}. For example, the row (1, 2, 2) would correspond to (ε_1, ε_2, ε_2) ∈ Ω_{2,3}. To do so, we begin with 2×2 matrices A_1, B_0, B_1, B_2 with entries in {1, 2}, chosen so that each column of B_2 as well as the first columns of A_1, B_0 and B_1 contain exactly one entry equal to one, the second columns of A_1 and of B_1 both equal (1,1)^T, and the second column of B_0 equals (2,2)^T. We define recursively

A_{r+1} := ( A_r  B_2' ; B_0'  B_1 )   (2.2)

for r ≥ 1, where B_2' is a block column of r copies of B_2 and B_0' a block row of r copies of B_0. Unrolled, A_{r+1} is the 2(r+1) × 2(r+1) block matrix with A_1 in the top-left block, B_1 in the remaining diagonal blocks, B_2 in every block above the diagonal and B_0 in every block below it.

Figure 2.1: The positions of the ones in A_2 and A_3 are marked by ∗, and the cells are colored according to whether they belong to A_1, B_0, B_1 or B_2.

We remark that the entry of A_r at position (i, j) is independent of r and denote it by a(i, j). We set for r ≥ 1:

Γ_{2,2r} := {(ε_{a(i,1)}, ε_{a(i,2)}, …, ε_{a(i,2r)}) | i ∈ [2r]} ⊆ Ω_{2,2r} ⊆ (R²)^{2r},
Γ_{2,2r+1} := {(ε_{a(i,1)}, …, ε_{a(i,2r)}, ε_{χ(i)}) | i ∈ [2r]} ⊆ Ω_{2,2r+1} ⊆ (R²)^{2r+1},

where χ: N → {1, 2} with χ(i) ≡ i (mod 2). That is, Γ_{2,2r} is the subset of Ω_{2,2r} induced by the rows of A_r, and Γ_{2,2r+1} is obtained by alternatingly appending ε_1 or ε_2 to the 2r-many elements of Γ_{2,2r}.

Lemma 2.3.
For r ≥ 1 it holds that 0 ∉ Aff(Γ_{2,2r}) and 0 ∉ Aff(Γ_{2,2r+1}).

Proof. By construction, 0 ∈ Aff(Γ_{2,2r+1}) implies 0 ∈ Aff(Γ_{2,2r}), so it suffices to prove 0 ∉ Aff(Γ_{2,2r}). We proceed by induction on r ≥ 1. For r = 1, this is clear since Aff(Γ_{2,2}) ⊆ R² × {ε_1}. Now assume that 0 ∉ Aff(Γ_{2,2r}). For the sake of contradiction, let

Σ_{i=1}^{2r+2} λ_i (ε_{a(i,1)}, ε_{a(i,2)}, …, ε_{a(i,2r+2)}) = 0 ∈ (R²)^{2r+2}   (2.3)

be an affine linear combination over Γ_{2,2r+2}. Then equation (2.3) gives in each of the (2r+2)-many R²-components the affine linear combination (1/2)(ε_1 + ε_2) = 0, by Lemma 2.2; in particular, in every component the scalar factor of ε_1 must equal 1/2. Considering this scalar factor in the first, the penultimate and the last R²-component respectively, and using the construction of A_{r+1}, we conclude λ_{2r+2} = 0 from the first and the last component. Furthermore, the first and the penultimate column give λ_{2r+1} = λ_{2r+2} = 0. Therefore, the first 2r-many components in Eq. (2.3) show 0 ∈ Aff(Γ_{2,2r}), which contradicts our induction hypothesis.

Lemma 2.4. For r ≥ 1 it holds that dist(0, conv(Γ_{2,2r})) ≤ 2^{−r+1/2} and dist(0, conv(Γ_{2,2r+1})) ≤ 2^{−r+1/2}.

Proof. We first prove the inequality for conv(Γ_{2,2r}). For i ∈ [2r] let ω_i := (ε_{a(i,1)}, …, ε_{a(i,2r)}) ∈ (R²)^{2r} be the weight in Γ_{2,2r} that corresponds to the ith row of A_r. Consider the convex combination

(x_1, …, x_{2r}) := 2^{−r}(ω_{2r−1} + ω_{2r}) + Σ_{l=1}^{r−1} 2^{−l−1}(ω_{2l−1} + ω_{2l}) ∈ (R²)^{2r}.   (2.4)

Note that x_i ∈ R². We will argue that (x_1, …, x_{2r}) = 2^{−r+1}(0, …, 0, ε_1). Since x is a convex combination of the elements in Γ_{2,2r}, the statement then follows from ‖ε_1‖ = 2^{−1/2}.

We consider A_r, as in its construction (2.2), as an r × r block matrix with block entries being 2×2 matrices. For m ∈ [r] the two weights ω_{2m−1} and ω_{2m} correspond to the mth block row of A_r and have the same scalar factor in (2.4). Hence, whenever for i ∈ [2r] the ith column of the mth block row of A_r contains exactly one entry equal to one (and so the other entry equals two), the contribution of ω_{2m−1} and ω_{2m} to x_i cancels due to ε_1 + ε_2 = 0. In particular, in (2.4) all contributions of block entries equal to B_2 cancel. Therefore the last column of A_r gives

x_{2r} = 2^{−r}(ε_1 + ε_1) = 2^{−r+1} ε_1.

Furthermore, x_1 = x_3 = … = x_{2r−1} = 0, using that the first columns of A_1, of B_0, of B_1 and of B_2 each contain exactly one entry equal to one. For r = 1 we are done. If r ≥ 2, then reading off the second column of A_r, we find

x_2 = 2^{−2}(ε_1 + ε_1) [first block row] + 2^{−r}(ε_2 + ε_2) [last block row] + Σ_{l=2}^{r−1} 2^{−l−1}(ε_2 + ε_2) [middle rows] = 2^{−1}(ε_1 + ε_2) = 0.

Analogously, we compute for j = 2, 3, …, r−1 that

x_{2j} = 2^{−j−1}(ε_1 + ε_1) [jth block row] + 2^{−r}(ε_2 + ε_2) [last block row] + Σ_{l=j+1}^{r−1} 2^{−l−1}(ε_2 + ε_2) [in-between rows] = 2^{−j}(ε_1 + ε_2) = 0,

because the second columns of B_1 and B_0 are, respectively, (1,1)^T and (2,2)^T, while the B_2 blocks above the diagonal cancel. This proves the inequality in the case Γ_{2,2r}. By construction, for Γ_{2,2r+1} the same convex combination works, because the last R²-component does not contribute, as the entries of the weights alternate between ε_1 and ε_2.

Finally, Lemma 2.3 and Lemma 2.4 together yield Theorem 2.1(a), noting that for odd d = 2r+1 one has 2^{−r+1/2} = 2^{−d/2+1}.

The main goal of this subsection is to show that the margin of Ω_{n,3} is exponentially small in n, i.e. to show Theorem 2.1(b). To do so, we set

W_n := ⋃_{s=2}^n {(s,1,s), (s,s,1), (s−1,s,s)} ⊆ [n] × [n] × [n]   (2.5)

and consider the corresponding subset

Γ_{n,3} := {(ε_i, ε_j, ε_k) | (i,j,k) ∈ W_n} ⊆ Ω_{n,3}.   (2.6)

The key combinatorial idea, which is presented in the following lemma, is due to [Kra07, Theorem 1 with k = 2]. According to [Kra07] the special case k = 2 is already contained in [KL05, Theorem 9].

Lemma 2.5.
Let n ≥ 3. For (i,j,k) ∈ [n]³ ∖ (W_n ∪ {(1,1,1)}) set λ_{i,j,k} := 0. Moreover, define

λ_{1,1,1} := 3·2^{−n+1},  λ_{1,2,2} := 3 − 3·2^{−n+1},  λ_{n,1,n} = λ_{n,n,1} := 3/2,

and for s = 2, 3, …, n−1

λ_{s,1,s} = λ_{s,s,1} := 3·2^{−n+s−1},  λ_{s,s+1,s+1} := 3 − 3·2^{−n+s}.

Then the following equations hold:

(∀ i ∈ [n]: Σ_{j,k=1}^n λ_{i,j,k} = 3),  (∀ j ∈ [n]: Σ_{i,k=1}^n λ_{i,j,k} = 3),  (∀ k ∈ [n]: Σ_{i,j=1}^n λ_{i,j,k} = 3).   (2.7)

In particular, Σ_{i,j,k} λ_{i,j,k} = 3n.

Proof. This is a part of [Kra07, Theorem 1] for k = 2. Alternatively, the statement can be checked by straightforward computation.

Lemma 2.6.
For n ≥ 3, it holds that dist(0, conv(Γ_{n,3})) ≤ 2^{−n+1}.

Proof. Define λ_{i,j,k} ≥ 0 for all i, j, k ∈ [n] as in Lemma 2.5. Note that Σ_{i=1}^n ε_i = 0; thus Lemma 2.5 implies

Σ_{i,j,k} λ_{i,j,k} (ε_i, ε_j, ε_k) = 0,

equivalently

−3·2^{−n+1}(ε_1, ε_1, ε_1) = Σ_{(i,j,k)∈W_n} λ_{i,j,k} (ε_i, ε_j, ε_k).

Normalizing the latter equation we obtain

x := −(3·2^{−n+1}/c)(ε_1, ε_1, ε_1) ∈ conv(Γ_{n,3}), where c := Σ_{(i,j,k)∈W_n} λ_{i,j,k} = 3n − 3·2^{−n+1} ≥ 8.

Finally, ‖ε_1‖ ≤ 1 implies ‖x‖ ≤ c^{−1} · 3·2^{−n+1} · √3 ≤ 2^{−n+1}.

To finish the proof of Theorem 2.1(b) we are left to show 0 ∉ conv(Γ_{n,3}). We actually prove the stronger statement 0 ∉ Aff(Γ_{n,3}).

Lemma 2.7.
The zero vector is not contained in the affine hull of Γ_{n,3}.

Proof. For a proof by contradiction we assume 0 ∈ Aff(Γ_{n,3}). Then there exist a_s, b_s, c_s ∈ R for s = 2, 3, …, n such that Σ_s (a_s + b_s + c_s) = 1 and

Σ_{s=2}^n ( a_s (ε_s, ε_1, ε_s) + b_s (ε_s, ε_s, ε_1) + c_s (ε_{s−1}, ε_s, ε_s) ) = (0_n, 0_n, 0_n) ∈ (R^n)³.

In each of the three R^n-components we obtain 0_n as an affine linear combination of ε_1, …, ε_n. Applying Lemma 2.2 to the coefficient of ε_{s−1} in the first component, respectively to the coefficient of ε_s in the second and third component, yields

a_{s−1} + b_{s−1} + c_s = n^{−1} for s = 2, 3, …, n   (2.8)

respectively

b_s + c_s = a_s + c_s = n^{−1} for s = 2, 3, …, n,   (2.9)

where we necessarily set a_1 = b_1 := 0. Equation (2.8) for s = 2 is c_2 = n^{−1} and hence a_2 = b_2 = 0 by (2.9) for s = 2. But now (2.8) for s = 3 gives c_3 = n^{−1} and we can proceed inductively to conclude a_s = b_s = 0 and c_s = n^{−1} for all s = 2, 3, …, n. The latter contradicts Σ_s (a_s + b_s + c_s) = 1, so we must have 0 ∉ Aff(Γ_{n,3}).

d-tensors

In this subsection we show that the margin of Ω_{n,d} is inverse exponential in nd for n, d ≥ 3, proving part (c) of Theorem 2.1.

Let us give some intuition for our construction. The main idea is to recycle the construction from the previous subsection for some multiple of n, i.e. considering W_{rn} for r ≥ 2. Thereby, the main challenge is to ensure that the constructed subset of Ω_{n,d} does not contain zero in its convex hull. We can try to extend the elements of Ω_{n,3} to elements of Ω_{n,d}. One natural idea is to duplicate each component d/3 times, i.e. when d = 6 the vector (ε_i, ε_j, ε_k) ∈ Ω_{n,3} becomes (ε_i, ε_i, ε_j, ε_j, ε_k, ε_k) ∈ Ω_{n,6}. However, we need a subset of Ω_{n,d} with roughly 3rn many elements to imitate the construction from the previous subsection. We still extend the elements of Ω_{n,3} in this way, but will additionally "shift" and "twist" by some functions σ_1, …, σ_{2r−1}: [rn] → [n], so that the elements of our set will look like

(ε_{σ_1(i)}, …, ε_{σ_{2r−1}(i)}, ε_{σ_1(j)}, …, ε_{σ_{2r−1}(j)}, ε_{σ_1(k)}, …, ε_{σ_{2r−1}(k)})

for d/3 = 2r−1 and (i, j, k) in W_{rn}. We now set about choosing the functions σ_k. For this, let n ≥ 3 and fix a natural number r ≥ 2. For i ∈ [r] we consider

σ_i: [rn] → [n], j ↦ ⌈(j + i − 1)/r⌉ mod n,
σ_{r+i}: [rn] → [n], j ↦ { 2 if j = r − i + 1;  1 if j = r + 1;  σ_1(j) else }  (for i ∈ [r−1]),

where values are taken mod n with representatives in [n], and combine the first 2r−1 of these functions to obtain

σ: [rn] → [n]^{2r−1}, j ↦ (σ_1(j), σ_2(j), …, σ_{2r−1}(j)).

Example 2.8. For r = 3 the functions σ_1, …, σ_5 are sketched by the following table.

j:    1 2 3 ⋯ 3n−2 3n−1 3n
σ_1:  1 1 1 ⋯ n    n    n
σ_2:  1 1 2 ⋯ n    n    1
σ_3:  1 2 2 ⋯ n    1    1
σ_4:  1 1 2 ⋯ n    n    n
σ_5:  1 2 1 ⋯ n    n    n

Remark 2.9.
By construction, each element of [n] is attained exactly r-times by σ_k, for every k ∈ [2r−1]. Moreover, the definition of σ_1, …, σ_r already yields that σ is injective. For i, j, k
∈ [rn] we introduce the short-hand

ε_{σ(i)} := (ε_{σ_1(i)}, ε_{σ_2(i)}, …, ε_{σ_{2r−1}(i)}) ∈ (R^n)^{2r−1},
ε_{σ(i),σ(j),σ(k)} := (ε_{σ_1(i)}, …, ε_{σ_{2r−1}(i)}, ε_{σ_1(j)}, …, ε_{σ_{2r−1}(j)}, ε_{σ_1(k)}, …, ε_{σ_{2r−1}(k)}) ∈ (R^n)^{6r−3},

and we set J_r := {(s,1,s), (s,s,1) | s = 2, 3, …, r} ⊆ Z³. In the following we show that the convex hull of the set

Γ_{n,6r−3} = {ε_{σ(i),σ(j),σ(k)} | (i,j,k) ∈ W_{rn} ∖ J_r} ⊆ Ω_{n,6r−3} ⊆ ((R^n)^{6r−3})

does not contain the zero vector, but is very close to it.

Lemma 2.10.
For n ≥ 3 and r ≥ 2 it holds that 0 ∉ Aff(Γ_{n,6r−3}).

Below we give the proof in the special case r = 3, in which all main ideas of the general proof become apparent and visible. The proof of the general statement is given in Appendix D and certainly looks technical at a first encounter. Therefore, we strongly suggest that the reader first reads the proof for r = 3 below, which contains all the main ideas.

(One could suggest to consider the set {ε_{σ(i),σ(j),σ(k)} | (i,j,k) ∈ W_{rn}} instead, but this still won't ensure that zero is not in the convex hull. The intuition behind this is that Γ_{n,3} from the last section is "nearly at the limit", i.e. 0 ∉ conv(Γ_{n,3}) but 0 ∈ conv(Γ_{n,3} ∪ {(ε_1, ε_1, ε_1)}). Now the function σ "introduces 2r−2 additional linear relations", as ε_{σ(i)} ∈ (K_n)^{2r−1}: the orthogonal complement K_n ⊆ R^n has codimension one, while (K_n)^{2r−1} ⊆ (R^n)^{2r−1} has codimension 2r−1. Thus, it is reasonable to remove 2r−2 many elements from W_{rn}.)

Proof of Lemma 2.10 for r = 3. For the sake of contradiction assume that 0 ∈ Aff(Γ_{n,15}). Then there are coefficients a_s, b_s, c_s ∈ R, where 2 ≤ s ≤ 3n, such that a_2 = a_3 = b_2 = b_3 = 0, Σ_s (a_s + b_s + c_s) = 1 and

Σ_{s=2}^{3n} ( a_s ε_{σ(s),σ(1),σ(s)} + b_s ε_{σ(s),σ(s),σ(1)} + c_s ε_{σ(s−1),σ(s),σ(s)} ) = 0 ∈ (R^n)^{15}.   (2.10)

The bulk of our work will consist of proving the equations

b_2 + c_2 = b_3 + c_3 = … = b_{3n} + c_{3n}   (2.11)
a_2 + c_2 = a_3 + c_3 = … = a_{3n} + c_{3n}.   (2.12)

From here we will derive a contradiction. We now set about proving Eqs. (2.11) and (2.12). Rewrite the left-hand side of Eq. (2.10) as the collection, for k ∈ [5], of the following affine linear combinations of ε_1, …, ε_n in R^n:

Σ_s ( a_s ε_{σ_k(s)} + b_s ε_{σ_k(s)} + c_s ε_{σ_k(s−1)} ) = 0,   (2.13)
Σ_s ( a_s ε_{σ_k(1)} + b_s ε_{σ_k(s)} + c_s ε_{σ_k(s)} ) = 0,   (2.14)
Σ_s ( a_s ε_{σ_k(s)} + b_s ε_{σ_k(1)} + c_s ε_{σ_k(s)} ) = 0.   (2.15)

If we expand each expression as an affine linear combination of the ε_l, then by Lemma 2.2 the coefficient of ε_l must be n^{−1} for all l ∈ [n]. Translating this for equation (2.13) with k = 1, l = 2, …, n and using Example 2.8, we obtain

(a_{3m−2} + a_{3m−1} + a_{3m}) + (b_{3m−2} + b_{3m−1} + b_{3m}) + (c_{3m−1} + c_{3m} + c_{3m+1}) = n^{−1}   (2.16)

for m = 2, 3, …, n. A similar calculation for k = 2, 3 and l = 2, …, n shows that Eq. (2.16) also holds with all indices shifted by one, respectively by two, where we set c_{3n+1} := 0. Similarly, for Eq. (2.14) with l = 2, …, n and k = 1, 2, 3, we obtain for 2 ≤ m ≤ n that

(b_{3m−2} + c_{3m−2}) + (b_{3m−1} + c_{3m−1}) + (b_{3m} + c_{3m}) = n^{−1},   (2.17)

again together with its shifts by one and by two, and the same equations with "b" replaced by "a" when considering Eq. (2.15).

In the following we prove Eq. (2.11). Subtracting the shifted versions of (2.17) from one another for values of m differing by one, we deduce that b_s + c_s only depends on s mod 3, i.e.

b_2 + c_2 = b_5 + c_5 = … = b_{3n−1} + c_{3n−1},
b_3 + c_3 = b_6 + c_6 = … = b_{3n} + c_{3n},
b_4 + c_4 = b_7 + c_7 = … = b_{3n−2} + c_{3n−2}.

Next we deduce Eq. (2.11) by showing b_2 + c_2 = b_3 + c_3 = b_4 + c_4. To do so, we apply Lemma 2.2 to (2.14) for the coefficient of ε_1 using Example 2.8: with a := Σ_s a_s, this yields for k = 4 and k = 5 the equations

a + (b_2 + c_2) + (b_4 + c_4) = n^{−1}   (2.18)
a + (b_3 + c_3) + (b_4 + c_4) = n^{−1}   (2.19)

respectively, because σ_4 attains the value 1 exactly at {1, 2, 4} and σ_5 exactly at {1, 3, 4}. Subtracting the two shows b_2 + c_2 = b_3 + c_3, and we have b_3 + c_3 = b_4 + c_4 via subtracting (2.18) from the corresponding equation a + (b_2 + c_2) + (b_3 + c_3) = n^{−1} for k = 1. This completes the proof of Eq. (2.11); using Eq. (2.15) we similarly deduce Eq. (2.12).

To get a contradiction we show that a_s = b_s = c_s = 0 for all s. For this, we set b := Σ_s b_s, and recall that a_2 = a_3 = b_2 = b_3 = 0. This time we use Lemma 2.2 applied to the coefficient of ε_1 in (2.13), in (2.14) and in (2.15) respectively, for k = 1, to get

c_2 + c_3 + c_4 = n^{−1},  a + c_2 + c_3 = n^{−1}  and  b + c_2 + c_3 = n^{−1}   (2.20)

respectively. We deduce from these three equations that a = b = c_4. Furthermore, b_2 = b_3 = 0 shows that the shift of (2.17) covering s ∈ {2, 3, 4} reads b_4 + (c_2 + c_3 + c_4) = n^{−1}. Subtracting from the latter the left-hand equation in (2.20) yields b_4 = 0. Similarly, a_4 = 0 follows from a_2 = a_3 = 0 and the analogous equation with the b's replaced by a's.

Now, the shift of (2.16) whose a-indices are {2, 3, 4} simplifies to c_3 + c_4 + c_5 = n^{−1}. Thus, c_5 = c_2 by (2.20), and therefore a_5 = b_5 = 0 by (2.11), (2.12) and a_2 = b_2 = 0. This simplifies the next shift of (2.16) to c_4 + c_5 + c_6 = n^{−1}. Hence, c_6 = c_3, as we also have c_2 + c_3 + c_4 = n^{−1}, and we get via (2.11) and (2.12) that a_6 = b_6 = 0. The latter in turn shows that the following shift of (2.16) becomes c_5 + c_6 + c_7 = n^{−1}, so c_7 = c_4 and a_7 = b_7 = 0 by, again, (2.11) and (2.12).

It should have become apparent that we can proceed inductively in the same manner with the shifts of (2.16), thereby using (2.11) and (2.12) to deduce a_s = b_s = 0 for all s = 2, 3, …, 3n. In particular, a = b = c_4 = 0. Finally, Eq. (2.11) implies c_4 = c_s for all s, so all c_s = 0, which contradicts Σ_s (a_s + b_s + c_s) = 1.

We finish the proof of part (c) of Theorem 2.1 by showing the following lemma.

Lemma 2.11.
Let n ≥ 3 and r ≥ 2. Then

dist(0, conv(Γ_{n,6r−3})) ≤ (√6 / ((n−1)√r)) · 2^{−r(n−1)+1} ≤ 2^{−r(n−1)+1}.

Proof.
We set N := rn and for i, j, k ∈ [N] we set λ_{i,j,k} as in Lemma 2.5, applied for the dimension N. Then Eq. (2.7) of Lemma 2.5 yields

Σ_{i,j,k=1}^N λ_{i,j,k} (ε_{σ(i)}, ε_{σ(j)}, ε_{σ(k)}) = Σ_{i,j,k} λ_{i,j,k} (ε_{σ(i)}, 0, 0) + Σ_{i,j,k} λ_{i,j,k} (0, ε_{σ(j)}, 0) + Σ_{i,j,k} λ_{i,j,k} (0, 0, ε_{σ(k)})
= 3 Σ_{i=1}^N (ε_{σ(i)}, 0, 0) + 3 Σ_{j=1}^N (0, ε_{σ(j)}, 0) + 3 Σ_{k=1}^N (0, 0, ε_{σ(k)}) = 3 Σ_{i=1}^N ε_{σ(i),σ(i),σ(i)} = 0 ∈ (R^n)^{6r−3},

where we used in the last step equation (2.1) and Remark 2.9, i.e. that each element of [n] is attained exactly r-many times by each σ_k: [rn] → [n], k ∈ [2r−1]. Because W_N contains the support of λ apart from the element (1,1,1), we have

Σ_{(i,j,k)∈W_N∖J_r} λ_{i,j,k} ε_{σ(i),σ(j),σ(k)} = −λ_{1,1,1} ε_{σ(1),σ(1),σ(1)} − Σ_{(i,j,k)∈J_r} λ_{i,j,k} ε_{σ(i),σ(j),σ(k)} =: x ∈ (R^n)^{6r−3},   (2.21)

which is an element in the positive cone of Γ_{n,6r−3} = {ε_{σ(i),σ(j),σ(k)} | (i,j,k) ∈ W_N ∖ J_r}. Normalizing the latter equation with

c := Σ_{(i,j,k)∈W_N∖J_r} λ_{i,j,k} = Σ_{i,j,k=1}^N λ_{i,j,k} − ( λ_{1,1,1} + Σ_{(i,j,k)∈J_r} λ_{i,j,k} ) ≥ 3N − 3

shows c^{−1} x ∈ conv(Γ_{n,6r−3}). To bound the norm of c^{−1} x we compute

λ_{1,1,1} + Σ_{(i,j,k)∈J_r} λ_{i,j,k} = 3·2^{−N+1} + Σ_{s=2}^r (λ_{s,1,s} + λ_{s,s,1}) = 3·2^{−N+1} + Σ_{s=2}^r (3·2^{−N+s−1} + 3·2^{−N+s−1}) = 3 Σ_{s=1}^r 2^{−N+s} < 3·2^{−N+r+1}.

Finally, using ‖ε_{i_1,i_2,…,i_{6r−3}}‖ ≤ √(6r−3) for any i_1, …, i_{6r−3} ∈ [n] together with the triangle inequality applied to Eq. (2.21), this implies

‖c^{−1} x‖ ≤ (√(6r−3)/(3N−3)) · 3·2^{−N+r+1} ≤ (√6/((n−1)√r)) · 2^{−N+r+1} ≤ 2^{−N+r+1} = 2^{−r(n−1)+1},

where we used n ≥ 3 and r ≥ 2 for √6 ≤ (n−1)√r.

A simple example of Eq. (1.2) is the minimization of an n-variate homogeneous polynomial of degree d with nonnegative coefficients over the set x_1, …, x_n > 0, ∏ x_i = 1, as studied in [Gur04b]. In this case the sets conv(S) for S ⊆ Ω are Newton polytopes of homogeneous polynomials, and the minimum of a polynomial is bounded below if and only if the Newton polytope contains (d/n) 1_n. If the polynomials are hyperbolic of degree n, as in [Gur04b], their Newton polytope either contains 1_n or is at least 1/√n away from it. However, we show that for general homogeneous polynomials the margin can get exponentially small in n even for d = 3.

Minimizing a degree-d homogeneous polynomial Σ_{α∈Z^n_{≥0}} p_α x^α with nonnegative coefficients over the set x_1, …, x_n > 0, ∏ x_i = 1 is the same as computing Eq. (1.2) for

Ω := { −α + (d/n) 1_n | α ∈ (Z_{≥0})^n with |α| = d }.   (2.22)

If n = dm for some integer m ≥ 3, then we have −Ω_{m,d} ⊆ Ω. Therefore, Theorem 2.1(b) and (c) and the padding from Appendix C directly yield the following.

Corollary 2.12 (Margin for polynomial scaling). Fix some d ≥ 3 and assume n = dm for some m ≥ 3. Let Ω be as in Eq. (2.22). Then

γ(Ω) ≤ γ(Ω_{m,d}) ≤ 2^{−m+1} = 2^{−n/d+1},

and for d ≥ 9 we even have

γ(Ω) ≤ γ(Ω_{m,d}) ≤ 2^{−⌊(d+3)/6⌋(m−1)+1} ≈ 2^{−n/6}.

Thus, for fixed d ≥ 3 and n → ∞ the margin of Ω can be exponentially small in n. In terms of polynomials, this states that the Newton polytope of a degree d ≥ 3 homogeneous polynomial can be exponentially close to the point (d/n) 1_n without containing it.

3 Diameter bounds in the commutative case
In this section we describe an array such that all approximate scalings are very ill-conditioned, proving Theorem 1.1. Let us define the diameter bound.

Definition 3.1. Let ε > 0 and f: R^m → R. The diameter bound D_f(ε) is defined as the infimum over R > 0 such that

inf_{‖x‖≤R} f(x) ≤ ε + inf_{x∈R^m} f(x).

Thus, Theorem 1.1 is equivalent to the statement that D_f(ε) = Ω(2^{n/3} log(1/ε)) for ε ≤ e^{−Cn log n}. We now give a proof outline for Theorem 1.1.

3.1 Proof outline

The high-level intuition applies not only to array scaling but to the capacity in general. Recall that the array scaling capacity is

inf_{x∈(R^n)³} Σ_{ω∈Ω} p_ω e^{ω·x}

for Ω = Ω_{n,3} = {e_i − (1/n) 1_n : i ∈ [n]}³ ⊆ (R^n)³. We build both the support Ω ⊆ Ω_{n,3} and the entries p in the following way. We construct a set Ω_0 ⊆ Ω_{n,3}, another element ω_0 ∈ Ω_{n,3}, and an array q with the following properties.
1. The set Ω_0 should be the support of a tristochastic array q.
2. The affine hull of Ω_0 should have codimension one.
3. The origin should be in the relative interior of conv(Ω_0), in fact rather deep inside it. Note that the origin is already in conv(Ω_0) by the tristochasticity of q.
4. The vector ω_0 ∈ Ω_{n,3} should be at a very small, but positive, distance η from Aff(Ω_0). Note that this already implies that the facet gap of Ω_0 ∪ {ω_0} is small.

Finally, we define the entries of p by p|_{Ω_0} = q/2, p_{ω_0} = 1/2, and p_ω = 0 elsewhere. Assuming we have found p according to this process, we now give intuition for the diameter bound.

Let v be the projection of ω_0 to the orthogonal complement of Aff(Ω_0). Intuitively, the capacity is only approximately attained by vectors very far in the −v direction. Indeed, first note that cap(p) = 1/2, because cap(q) = 1 by tristochasticity, cap(p) ≥ cap(q)/2, and f_p(−tv/‖v‖) = (1/2)(1 + e^{−ηt}), so f_p(−tv/‖v‖) tends to 1/2. However, f_p(−tv/‖v‖) tends to 1/2 slowly if η is small. Indeed, f_p(−tv/‖v‖) ≤ (1/2)(1 + ε) only if t ≥ η^{−1} log(1/ε).

To conclude rigorously that the capacity is only approached by vectors very far in the −v direction, we must rule out directions with nonzero components in Aff(Ω_0). For this, we must use the assumption that 0 is rather deep in the relative interior of conv(Ω_0). If this is the case, then any ε-approximate minimizer must have a bounded component in Aff(Ω_0), for otherwise the contribution to f_p from the elements of Ω_0 alone will be larger than (1/2)(1 + ε).

The remainder of the section will be concerned with the construction of a subset Ω_0, an array q, and an element ω_0 with these properties. (Property 2 will not quite apply in our setting, because Aff(Ω_{n,3}) is not full-dimensional. Instead, Aff(Ω_0) will be codimension one in Aff(Ω_{n,3}).)

3.2 The construction

We construct the subset Ω_0 from a directed graph D on [n], which we will determine later. If ij is an edge in D, then Ω_0 includes the element (ε_i, ε_i, ε_j) as well as its cyclic permutations. That is,

Ω_0 = {(ε_j, ε_i, ε_i), (ε_i, ε_j, ε_i), (ε_i, ε_i, ε_j) : ij ∈ E(D)}.

We now describe the graph, as seen in Fig. 3.1.
Definition 3.2.
The graph D_l = (W, E) is a directed tree with l+1 levels, where the root is on the 0th level and the leaves are on the lth level. The tree is constructed as follows.
• All the edges are directed towards the root and are between adjacent levels.
• The root has three children, and on the l−1 levels below the root every node has one child.
• Additionally, one of the vertices on level l−2 has an additional child, which has its own child.
Explicitly, the vertices W and edges E are given by

W = {u_i, v_i, w_i : i ∈ [l]} ∪ {w_0 := u_0 := v_0, w̄_{l−1}, w̄_l},
E = {u_i u_{i−1}, v_i v_{i−1}, w_i w_{i−1} : i ∈ [l]} ∪ {w̄_{l−1} w_{l−2}, w̄_l w̄_{l−1}}.

Note that D_l has 3(l+1) vertices, so we set n = 3(l+1). Thus D_l has 3l+2 edges and so |Ω_0| = 3(3l+2) = 3n − 3. It is helpful to construct the matrix M whose set of rows is Ω_0. To make the matrix sparser, first replace ε_i by e_i by restricting the minimization to the subspace Σx_i = Σy_i = Σz_i = 0, which is without loss of generality. We define Ω̄_0 ⊆ R^{3n} to be Ω_0 but with each (ε_i, ε_j, ε_k) replaced by (e_i, e_j, e_k); define Ω̄_{n,3} similarly and define p̄_{(e_i,e_j,e_k)} := p_{(ε_i,ε_j,ε_k)}. Then

inf_{x∈(R^n)³} Σ_{ω∈Ω_{n,3}} p_ω e^{ω·x} = inf_{x,y,z∈R^n, Σx_i=Σy_i=Σz_i=0} Σ_{ω∈Ω̄_{n,3}} p̄_ω e^{ω·(x,y,z)}.

Moreover, when we write the matrix M, it is easier to write the vector (x, y, z) in the order (x_1, y_1, z_1, x_2, y_2, z_2, …) instead of the order (x_1, …, x_n, y_1, …, y_n, z_1, …, z_n). With this ordering, the matrix M with rows in Ω̄_0 is a block matrix with blocks of size 3×3, with n−1 block rows, and with n block columns. Each block row corresponds to an edge in the directed graph D_l = (W, E) on n = 3(l+1) vertices. If e ∈ E is an edge from i → j, then the eth block row of M has the matrix

A = ( 0 1 1 ; 1 0 1 ; 1 1 0 )   (3.1)

in the ith block entry and the 3×3 identity matrix I in the jth block entry, and zeroes elsewhere. See Fig. 3.2 for a portrayal of the whole matrix M.

The first three properties for Ω_0 in the proof plan translate to the following three claims about M. The first relates to the tristochasticity of q, the second to the codimension of Aff(Ω̄_0) in the subspace (K_n)³, and the third to the depth of the point (1/n) 1_{3n} in conv(Ω̄_0).

Figure 3.1: The graph D_l from Definition 3.2, with the edge labels proportional to the edge labeling q in Item 1 of Lemma 3.3 (the constant factor 1/(3n) is omitted for readability). We have also omitted the directions, which are all towards the root r.

Figure 3.2: The matrix M written in the reordered basis described before Lemma 3.3. From the left, the groups of columns correspond to the w's, the u's, the v's, and r among the vertices of D_l. A is as in Eq. (3.1) and I is the 3×3 identity matrix.

Figure 3.3: If v is a vertex of D_l with incoming edge weighted q_1 and outgoing edge weighted q_2, then the column (v, i) of M for i ∈ [3] sums to q_1 + 2q_2. That is, the incoming edge contributes its weight and the outgoing edge contributes twice its weight.

Lemma 3.3. Let n = 3(l+1).
1.
The probability distribution q defined byfor j P r l s , q u j u j ´ ,i “ q v j v j ´ ,i “ n ´ ` p´ q ´p l ´ j q ¯ for j P r l ´ s , q w j w j ´ ,i “ n ´ ` p´ q ´p l ´ j ´ q ¯ q w l ´ w l ´ ,i “ q ¯ w l ´ ¯ w l ´ ,i “ q w l w l ´ ,i “ q ¯ w l ¯ w l ´ ,i “ n ˆ ˙ on the rows of M has expectation n n . That is, if the rows of M are scaled by the values of q , eachcolumn sums to { n . Note that the entries of q are Θ p n q . Ignoring the index i in q uv,i allows us toview q as a labeling of the edges of the graph D l ; see Figs. 3.1 and 3.3.2. ker M “ span p Ω q K is spanned by the 2 dimensional space S Ď R W ˆr s given by S “ t s : s p v, q “ α, s p v, q “ β, s p v, q “ γ for all v P W, α ` β ` γ “ u and the function f P R W ˆr s which for all i P r s assigns f p u j , i q “ f p v j , i q “ f p w j , i q “ p´ q ´ j for j P r l s Y t u and f p ¯ w l ´ k , i q “ f p w l ´ k , i q for k P t , u . (3.2) Note that f P p K n q Ď S K .3. Apart from the three zero singular values, all singular values of M are Ω p { n q . Given the lemma, let us prove that the diameter bound holds according to the proof outline atthe beginning of the section.
Proof of Theorem 1.1.
We first show the claim for n of the form n = 3(l + 1); the bound follows for 3(l + 1) < n < 3(l + 2) by applying Proposition 3.5 with t = 3(l + 1), using that the array we construct has capacity 1/2 and t/n ≥ 1/2.

We now show the diameter lower bound for n = 3(l + 1). It is enough to exhibit a constant C > 0 and a probability distribution p on Ω_{n,3} = {(e_i, e_j, e_k) : i, j, k ∈ [n]} such that, for all N ≥ Cn² log n and all x, y, z ∈ K_n,

Σ_{ω ∈ Ω_{n,3}} p_ω e^{ω·(x,y,z)} ≤ e^{−N} + inf_{x,y,z ∈ K_n} Σ_{ω ∈ Ω_{n,3}} p_ω e^{ω·(x,y,z)}

only if ‖(x, y, z)‖ = Ω(2^{n/3} N). Note that the space (K_n)^3 over which we are infimizing is a subspace of S^⊥, where S is as in Lemma 3.3, and that Ω_{n,3} ⊆ S^⊥.

Consider the set Ω_0 ⊆ Ω_{n,3} of rows of M in Lemma 3.3 and the probability distribution q on Ω_0 from Lemma 3.3. Let ω_0 = (e_{u_l}, e_{v_l}, e_{w_l}) for the vertices u_l, v_l, w_l ∈ D_l. Let Ω_1 = Ω_0 ∪ {ω_0}, and define the probability distribution p on Ω_1 by p_{ω_0} = 1/2 and p_ω = q_ω/2 for ω ∈ Ω_0. Recall from Lemma 3.3 the following orthogonal decompositions: R^{3n} = span(Ω_0) ⊕ span(Ω_0)^⊥ and span(Ω_0)^⊥ = S ⊕ span f, and hence S^⊥ = span(Ω_0) ⊕ span f. Observe that ω_0 ∉ span(Ω_0), because by Lemma 3.3 we have span(Ω_0)^⊥ = ker M = S ⊕ span f and clearly f·ω_0 ≠ 0.

By Item 1 of Lemma 3.3 we have Σ_{ω ∈ Ω_0} q_ω ω = (1/n)1_{3n} and thus cap(q) = 1. Therefore, ω_0 ∉ span(Ω_0) implies that the infimum is 1/2 for this choice of Ω_1 and p. We claim that the infimum can only be approximately attained by h ∈ (K_n)^3 with a very large component in the one-dimensional space span f = span(Ω_0)^⊥ ∩ (K_n)^3.

As in the proof outline, we must bound the component in span(Ω_0) of the approximate minimizer h. For h ∈ (K_n)^3 write h = h_0 + af and ω_0 = ω_1 + bf, where h_0, ω_1 ∈ span Ω_0. Note that |b| = |f·ω_0|/‖f‖² = O(2^{−l}) = O(2^{−n/3}), and that h_0 ∈ (K_n)^3 because h and f are. Suppose

Σ_{ω ∈ Ω_1} p_ω e^{ω·h} ≤ e^{−N} + 1/2.

Equivalently,

(1/2) Σ_{ω ∈ Ω_0} q_ω e^{ω·h_0} + (1/2) e^{h·ω_1 + ab‖f‖²} ≤ e^{−N} + 1/2.   (3.3)

Suppose ‖h_0‖ is bounded by L. If e^{h·ω_1 + ab‖f‖²} ≤ e^{−N}, then |ab| = Ω(N − L). In particular, ‖h‖ ≥ ‖af‖ = |ab|‖f‖/|b| = Ω((N − L) 2^{n/3}), because of the previous bounds on |ab| and |b| and the fact that ‖f‖ = Θ(1). It remains to prove a bound L for ‖h_0‖. We will do this by showing that if ‖h_0‖ were too large, then the first term on the left-hand side of Eq. (3.3) would be too large. This amounts to (1/n)1_{3n} being in the relative interior of conv(Ω_0), but will be proved using lower bounds on the singular values of M.

Let α denote the least nonzero singular value of M; by Item 3 of Lemma 3.3, α = Ω(1/n). As h_0 ∈ span(Ω_0) = rowspan(M), we have ‖M h_0‖ ≥ α‖h_0‖. We claim that there is some ω ∈ Ω_0 satisfying ω·h_0 = Ω(α‖h_0‖/n). To prove this, first note that Σ_{ω ∈ Ω_0} q_ω (ω·h_0) = (1/n)1_{3n}·h_0 = 0, because h_0 ∈ (K_n)^3. Moreover, by Lemma 3.3 we have q_ω = Θ(1/n). The claim follows from Lemma 3.4 below applied to the sequence (ω·h_0 : ω ∈ Ω_0).

Because q_ω = Θ(1/n), we must have ω·h_0 = O(log n) for all ω ∈ Ω_0. Else, the contribution from the term q_ω e^{ω·h_0} alone is larger than 1, in which case h cannot be an e^{−N}-approximate minimizer. Finally, ‖h_0‖ = O(n(log n)/α) = O(n² log n), and so we may take L = O(n² log n) and N ≥ 2L.

In the above proof, we used the following simple lemma.

Lemma 3.4.
Let 0 < β < γ. Suppose z ∈ R^m is such that Σ_{i=1}^m q_i z_i = 0 for q_i ∈ (β/m, γ/m). Then there exists i ∈ [m] such that z_i ≥ (β/2γm)‖z‖_1.

Proof. Because Σ_i q_i z_i = 0,

Σ_{i : z_i < 0} q_i |z_i| = Σ_{i : z_i ≥ 0} q_i z_i, and Σ_{i : z_i < 0} q_i |z_i| + Σ_{i : z_i ≥ 0} q_i z_i ≥ (β/m)‖z‖_1.

Thus Σ_{i : z_i ≥ 0} q_i z_i ≥ (β/2m)‖z‖_1, so there is some i such that q_i z_i ≥ (1/m)(β/2m)‖z‖_1. Thus z_i ≥ (β/2γm)‖z‖_1.

To show that our diameter lower bound holds for all values of n, we need the following proposition, which is proved in Appendix E. The idea is to prove diameter bounds for larger arrays from diameter bounds for smaller ones by embedding the smaller array in a "corner" of the larger array.

Proposition 3.5. Suppose 1 ≤ t ≤ n. Let p be a d-dimensional array in (R^t_{≥0})^{⊗d} with unit sum; in particular cap(p) ≤ 1. Let q be the d-dimensional array in (R^n_{≥0})^{⊗d} such that q_{i_1,…,i_d} = (t/n) p_{i_1,…,i_d} for i_1, …, i_d ∈ [t], q_{i,…,i} = 1/n for t + 1 ≤ i ≤ n, and q_{i_1,…,i_d} = 0 otherwise. For ε ≤ 1 − cap(p),

D_{f_q}(ε) ≥ D_{f_p}( (1 − cap(p)) ε / (1 − cap(p) t/n) ).

In particular, the norm of any ε-approximate minimizer of f_q is at least the norm of some ((1 − cap(p))/(1 − cap(p) t/n)) ε-approximate minimizer of f_p.

As a corollary of the proof of Theorem 1.1, we have a bound on the facet gap of [BLNW20]. The facet gap of a finite set Ω is defined to be the least distance of an element of Ω to the affine hull of a facet of conv(Ω). We have shown that the distance between Aff(Ω_0) and ω_0 is O(2^{−l}), or O(2^{−n/3}).

Corollary 3.6 (Facet gap of array scaling). There is a subset Ω ⊆ Ω_{n,3} with facet gap O(2^{−n/3}).

Analogously to what is done for the margin in Proposition C.1, we may also embed this array inside a larger array to obtain a diameter bound for d ≥ 3. For d ≥ 3, take q(i, j, k, l, l, …, l) = (1/n) p_{ijk} for all i, j, k, l ∈ [n]. Then for (x_1, …, x_d) ∈ (K_n)^d we have

f_q(x_1, …, x_d) = f_p(x_1, x_2, x_3) · (1/n) Σ_{l=1}^n e^{Σ_{j=4}^d (x_j)_l}.

For fixed x_1, x_2, x_3, by Jensen's inequality f_q is minimized when x_j = 0 for j ≥ 4 and takes value f_p(x_1, x_2, x_3), and thus f_q has the same diameter bound as f_p.

Corollary 3.7 (Diameter bound for d ≥ 3). There is an absolute constant C > 0 such that the following holds. For all d ≥ 3, there is a family of arrays q ∈ (R^n_{≥0})^{⊗d} with O(n²) nonzero entries, each of bit-complexity O(n), that satisfies the following property. For all 0 < ε ≤ exp(−Cn² log n) and x ∈ R^{dn}, if f_q(x) ≤ cap(q) + ε then ‖x‖ = Ω(2^{n/3} log(1/ε)).

We now prove Lemma 3.3.
Proof of Lemma 3.3.
It is first helpful to change basis on each copy of R³ so that the A blocks are diagonalized. Let U be a 3 × 3 orthogonal matrix such that U† A U = diag(2, −1, −1). This is possible because 2, −1, −1 are the eigenvalues of the symmetric matrix A. In particular, the first column of U is (1, 1, 1)/√3, and the second two columns span the space of vectors with sum zero. Then M' = (U^{⊕(n−1)})† M U^{⊕n} is, after grouping the first, second and third coordinates of each block together, of the form P ⊕ L ⊕ L, where

P_{e,v} = M'_{(e,1),(v,1)} and L_{e,v} = M'_{(e,2),(v,2)} for (e, v) ∈ E × W.

Note that L is the edge-vertex incidence matrix of the directed graph D_l, and that P is the matrix obtained from L by replacing every −1 entry by a 2.

To prove Item 2, observe that ker M is (U^{⊕n})(ker P ⊕ ker L ⊕ ker L). Because D_l is connected, ker L = span 1, and it is immediate that S = (U^{⊕n})(0 ⊕ ker L ⊕ ker L). We next reason about ker P. Because D_l is a connected tree, every choice of g(w_0) ∈ R determines a unique function g : W → R in ker P, and it is not hard to see that (U^{⊕n})(ker P ⊕ 0 ⊕ 0) is spanned by the function f of Eq. (3.2).

To show Item 3, it is enough to argue that the singular values of P, L, L obey the desired bound. For L this follows straightforwardly from the fact that L is an incidence matrix of a connected, directed tree and so is totally unimodular with linearly independent rows. The singular value bound follows by Lemma 3.8. Rather than arguing spectrally for P, we make an ad-hoc argument using the structure of D_l. We first show that ‖xᵗP‖_∞ = Ω(‖x‖_∞) for all x ∈ R^{n−1}, which suffices because ‖xᵗP‖_2 ≥ ‖xᵗP‖_∞ and ‖x‖_∞ ≥ ‖x‖_2/√(n−1).

Let x ∈ R^{n−1} \ {0} and let e be an edge in D_l such that |x(e)| = ‖x‖_∞. If e = u_i u_{i−1} for i ∈ [l], then |xᵗP(u_i)| ≥ ‖x‖_∞, because either i < l, in which case |xᵗP(u_i)| = |2x(u_i u_{i−1}) + x(u_{i+1} u_i)| ≥ 2‖x‖_∞ − |x(u_{i+1} u_i)| ≥ ‖x‖_∞, or i = l, and so |xᵗP(u_i)| = 2|x(e)| = 2‖x‖_∞. The same argument applies to all other edges except e = w_{l−2} w_{l−3}. In the latter case we are done if |xᵗP(w_{l−2})| ≥ (1/2)‖x‖_∞. Otherwise we necessarily have |x(w_{l−1} w_{l−2})| + |x(w̄_{l−1} w_{l−2})| ≥ (3/2)‖x‖_∞, since xᵗP(w_{l−2}) = 2x(e) + x(w_{l−1} w_{l−2}) + x(w̄_{l−1} w_{l−2}). It follows that, without loss of generality (swapping the roles of w and w̄ if necessary), |x(w_{l−1} w_{l−2})| ≥ (3/4)‖x‖_∞. As xᵗP(w_{l−1}) = 2x(w_{l−1} w_{l−2}) + x(w_l w_{l−1}), we have |xᵗP(w_{l−1})| ≥ 2|x(w_{l−1} w_{l−2})| − |x(w_l w_{l−1})| ≥ (3/2)‖x‖_∞ − ‖x‖_∞ ≥ (1/2)‖x‖_∞. In any case, there is some entry of xᵗP with absolute value at least (1/2)‖x‖_∞.

Finally, for Item 1 we note that the probability distribution q on the rows of M has expectation equal to the all-1/n function if and only if the probability distribution q̄ defined by q̄(e) := 3q(e, i) on the rows of P has expectation equal to the all-3/n function. That q̄ satisfies this is straightforward to check from the definition of P and D_l; see Fig. 3.1.

Lemma 3.8. If A is an n × k totally unimodular matrix with linearly independent columns, then the eigenvalues of AᵀA are all at least 1/n².

Proof. The least eigenvalue of AᵀA is min_{x ∈ R^k \ {0}} (xᵀAᵀAx)/‖x‖² = min_{x ∈ R^k \ {0}} ‖Ax‖²/‖x‖², so it suffices to show that for all x ∈ R^k, Ax has norm at least ‖x‖/n. Indeed, if Ax = y, then there is some invertible k × k submatrix A₀ of A and k × 1 subvector y₀ of y such that A₀x = y₀. By Cramer's rule and unimodularity of A we have, for i ∈ [k],

x_i = det(B_i)/det(A₀) = ± det(B_i),

where B_i is simply the matrix that one obtains by replacing the ith column of A₀ with the vector y₀. By performing the Laplace expansion with respect to the ith column, and by unimodularity of the minors, we have |x_i| ≤ ‖y₀‖_1, and so ‖x‖ ≤ √k ‖y₀‖_1 ≤ k‖y‖ ≤ n‖Ax‖.

4 The noncommutative case
In this section we extend the results from the commutative to the noncommutative case. For this, we recall in the first subsection necessary concepts such as moment maps and moment polytopes, and we define the weight margin and the gap of a representation. The second subsection introduces the key concept of a free subset of weights; see [Fra02]. This concept dates at least back to [DK85, Proposition 1.2], where it is called strong orthogonality. Freeness will be used to transfer results from the commutative to the noncommutative case. The latter is done in the following three subsections, where we prove bounds on the tensor gap, on the gap for homogeneous polynomials, and on the diameter for the natural SL(n) action on 3-tensors. Finally, we show a bound for the weight margin of certain quiver representations. This provides an example where the constructed set of weights is not free; compare Remark 4.28. Still, after adding enough arrows to the considered quiver, we are able to ensure the same bound for the gap.

In the following we introduce the null-cone problem and its dual characterization via moment maps and moment polytopes. This allows us to rigorously introduce the weight margin and the gap of a rational representation. Thereby we establish the precise meaning and interpretation of our results regarding these two notions (in view of the null-cone problem). We stick to the notation of [BFG+19]. (All concepts presented in the first two subsections in fact work in the very general setting of reductive groups and their rational representations. For the sake of clarity and concreteness we stick to the special case needed in this paper, i.e. the reductive group SL(n)^d := SL(n) × ··· × SL(n) with d ≥ 1 many copies of SL(n).)

Let G = SL(n)^d, K = SU(n)^d, T = ST(n)^d and T_K = K ∩ T be matrix Lie subgroups of GL(dn) via block-diagonal embedding. Then we can think of their Lie algebras Lie(G) etc. as being block-diagonally embedded into C^{dn×dn}. For a rational representation π : G → GL(V) we write g·v := π(g)v for the induced action, where g ∈ G and v ∈ V. Moreover, we denote the set of weights of π by Ω(π) ⊆ i Lie(T_K) and the induced representation on Lie algebras by Π : Lie(G) → End(V). We remark that we usually identify i Lie(T_K) ≅ (K_n)^d ⊆ (R^n)^d, where K_n denotes the orthogonal complement of the all-ones vector 1_n in R^n.

The orbit of v ∈ V is G·v := {g·v | g ∈ G}, and we consider its closure; the Euclidean and the Zariski closure of G·v coincide. A vector v is called G-unstable if 0 lies in the closure of G·v, and otherwise v is called G-semistable. Equivalently, a vector v ∈ V is G-unstable if and only if its capacity cap_G(v) := inf_{g ∈ G} ‖g·v‖ equals zero. The G-unstable vectors form an affine subvariety of V — the null-cone (with respect to G). Orbit, stability, and capacity can also be defined for T by replacing G by T in the definitions. As discussed in Section 1.2, the null-cone problem has many applications in different fields of computer science, mathematics and physics.

Next, we introduce the moment map. Given a rational representation π : G → GL(V), there exists a Hermitian inner product ⟨·,·⟩ on V, by convention linear in the second argument, such that ⟨k·v, k·w⟩ = ⟨v, w⟩ holds for all k ∈ K and all v, w ∈ V. (In our concrete representations later on this will be the standard inner product.)

Definition 4.1. For v ∈ V \ {0} we define μ_G(v) ∈ i Lie(K) as the unique element of the real vector space i Lie(K) which satisfies, for all A ∈ i Lie(K),

tr(μ_G(v) A) = ⟨v, Π(A)v⟩ / ⟨v, v⟩.

This defines the moment map μ_G : V \ {0} → i Lie(K) of G.
Replacing G by T and K by T_K, we derive the moment map μ_T : V \ {0} → i Lie(T_K) of T.

The maps μ_G and μ_T are indeed moment maps in the sense of symplectic geometry, namely for the induced actions of K and, respectively, T_K on the projective space P(V). Recall that i Lie(K) ⊆ C^{dn×dn}, so we can consider ‖μ_G(v)‖_F and ‖μ_T(v)‖_F. An important application of these moment maps is due to the Kempf–Ness theorem [KN79], which provides a duality for the null-cone membership problem:

cap_G(v) = 0 ⟺ 0 < inf_{g ∈ G} ‖μ_G(g·v)‖_F = min_{0 ≠ w ∈ cl(G·v)} ‖μ_G(w)‖_F   (4.1)

and similarly for T, replacing G by T in the above equation. The two moment maps are related as follows.

Proposition 4.2. Let p : i Lie(K) → i Lie(T_K) be the orthogonal projection. Then μ_T = p ∘ μ_G and ‖μ_T(v)‖_F ≤ ‖μ_G(v)‖_F for all v ∈ V \ {0}.

Proof. Since i Lie(T_K) ⊆ i Lie(K), the definition of the moment maps gives tr[μ_T(v) H] = tr[μ_G(v) H] for all H ∈ i Lie(T_K). But μ_T(v) ∈ i Lie(T_K) is the unique element with this property, hence p(μ_G(v)) = μ_T(v). The inequality ‖μ_T(v)‖_F ≤ ‖μ_G(v)‖_F follows directly from the first part.

Now we explain how the moment maps induce certain polytopes, which can also be used to express the duality in (4.1). Moreover, the combinatorics of these polytopes captures the important complexity measures (weight) margin and gap. Indeed, one of our main contributions is to analyze parts of this combinatorics, thereby deducing complexity barriers for certain computational problems.

Since the action of T via π is completely determined by the weight space decomposition V = ⊕_{ω ∈ Ω(π)} V_ω of V, one can compute μ_T(v) in terms of this decomposition. For this, write v = Σ_ω v_ω with v_ω ∈ V_ω and define the support of v with respect to π as supp(v) := {ω ∈ Ω(π) | v_ω ≠ 0}. Using that distinct weight spaces are orthogonal, one computes

μ_T(v) = Σ_ω (⟨v_ω, v_ω⟩/⟨v, v⟩) ω,

which is a convex combination of the weights in supp(v). Noting that supp(v) = supp(t·v) for t ∈ T, also μ_T(t·v) ∈ Δ_T(v) := conv{ω | ω ∈ supp(v)}. In fact,

Δ_T(v) = cl{μ_T(t·v) | t ∈ T} = {μ_T(w) | w ∈ cl(T·v), w ≠ 0} ⊆ i Lie(T_K),

and Δ_T(v) is called the weight polytope of v.

It is an astonishing result that for fixed v ∈ V \ {0}, the set {μ_G(g·v) : g ∈ G} gives rise to a polytope as follows. Let spec : Herm(n) → R^n be the function sending a Hermitian matrix to its eigenvalues in decreasing order. Recalling that i Lie(K) ⊆ Herm(n)^d is block-diagonally embedded in C^{dn×dn}, we set

s : i Lie(K) → (R^n)^d, diag(A_1, …, A_d) ↦ (spec(A_1), …, spec(A_d)).

Then for v ∈ V \ {0} the set

Δ_G(v) := {s(μ_G(w)) | w ∈ cl(G·v), w ≠ 0}

is a rational convex polytope; see e.g. [GS84] or [Nes84, Appendix] by Mumford. We call Δ_G(v) the moment polytope of v. Noting that ‖A‖_F = ‖spec(A)‖ for any A ∈ Herm(n), we have ‖μ_G(v)‖_F = ‖s(μ_G(v))‖ for all v ∈ V \ {0}. Thus, we can formulate the duality from (4.1) also as follows:

cap_G(v) = 0 ⟺ dist(0, Δ_G(v)) > 0 ⟺ 0 ∉ Δ_G(v),

and similarly for T. This motivates the following two definitions.

Definition 4.3.
Let π : G → GL(V) be a rational representation. We define the gap of π as

γ_G(π) := min{‖μ_G(v)‖_F | v ≠ 0 is G-unstable} = min{dist(0, Δ_G(v)) | v ≠ 0 is G-unstable},

and the weight margin of π as

γ_T(π) := min{‖μ_T(v)‖_F | v ≠ 0 is T-unstable} = min{dist(0, Δ_T(v)) | v ≠ 0 is T-unstable}.

Equivalently, γ_T(π) is the margin of the set of weights Ω(π), i.e. γ_T(π) = γ(Ω(π)).

Thus, the gap γ_G(π) is the largest constant C > 0 with the following property: if ‖μ_G(v)‖_F < C for some vector v ∈ V, then v is G-semistable. The same statement holds for the weight margin γ_T(π), replacing G by T. Therefore, these notions capture how small μ_G(g·v) (respectively μ_T(t·v)) must be to certify null-cone non-membership. The next remark connects the gap to the classical notion of instability due to Mumford [Mum65].

Remark 4.4.
The gap is twice the minimum value of all positive instabilities. Indeed, let M(v) denote the instability of a non-zero vector v, see e.g. [Nes84, eq. (9)]. Then dist(0, Δ_G(v)) ≥ 2M(v), and [Nes84, Theorem 6.1] implies γ_G(π) = 2 inf{M(v) : v ≠ 0, v is G-unstable}.

Example 4.5.
Recall the tensor scaling action, in which the group G = SL(n)^d acts on (C^n)^{⊗d} via the representation

π_{n,d} : SL(n)^d → GL((C^n)^{⊗d}), (g_1, …, g_d) ↦ g_1 ⊗ ··· ⊗ g_d.

(Gap and weight margin are indeed well-defined, i.e. the minimum in Definition 4.3 is attained. The moment maps give rise to continuous maps on P(V), and the non-zero G-unstable, respectively non-zero T-unstable, vectors form a projective subvariety of P(V); in particular they form a compact set.)

Similar computations to those in Example B.2 show that the set of weights of π_{n,d} is

Ω(π_{n,d}) = Ω_{n,d} = {ε_i | i ∈ [n]}^d ⊆ (R^n)^d.

Therefore, the weight margin γ_T(π_{n,d}) is the margin γ(Ω_{n,d}) for the array scaling problem from Theorem 1.3 and Theorem 2.1. Moreover, the moment map μ_G for π_{n,d} can be computed in terms of the quantum marginals as described in the introduction, i.e. γ_G(π_{n,d}) is indeed the tensor gap.

The weight margin and the gap satisfy the following inequality.
Proposition 4.6.
It holds that γ_T(π) ≤ γ_G(π).

Proof. Let v ≠ 0 be G-unstable. Then there exists k ∈ K such that k·v is T-unstable; see [Wal17, Theorem 3.25]. By Proposition 4.2 we obtain

‖μ_G(v)‖_F = ‖μ_G(k·v)‖_F ≥ ‖μ_T(k·v)‖_F ≥ γ_T(π),

where we used in the first equality that μ_G(k·v) = k μ_G(v) k†. Therefore γ_G(π) ≥ γ_T(π).

This inequality motivates the next subsection.

Proposition 4.6 from the preceding subsection shows that an upper bound for the weight margin γ_T(π) need not necessarily apply to the gap γ_G(π). Still, many of our bounds in the commutative case (weight margin and diameter) transfer to the noncommutative case (gap and diameter). We crucially use the notion of a free subset of weights; see [Fra02]. Freeness is also known as strong orthogonality [DK85].

Definition 4.7.
Let π : G → GL(V) be a rational representation with set of weights Ω(π). A subset Γ ⊆ Ω(π) is called free if no two distinct elements of Γ differ by a root of G. In other words, Γ ∩ (Γ + α) = ∅ holds for all roots α of G. Furthermore, a vector v ∈ V \ {0} is called free if its support supp(v) ⊆ Ω(π) is free.

We transfer the results from the commutative to the noncommutative case with the upcoming Proposition 4.8. It is known that for vectors v with free support one has μ_G(v) = μ_T(v). This appears implicitly in [Sja98, Lemma 7.1] and [Fra02, Proposition 2.2], but we prove it below for completeness. We thank Visu Makam for pointing out to us that this equality still holds under a weaker condition on v, when the representation decomposes into orthogonal subrepresentations, and this can be used to turn our weight margin upper bound for quivers into a gap upper bound (Theorem 4.25). This weaker condition also appears in [DM20b, Theorem 6.5].

Proposition 4.8.
Let π : G → GL(V) be a rational representation and suppose V = ⊕_{i=1}^k V_i is an orthogonal decomposition into G-subrepresentations with respect to the K-invariant inner product that is used to define μ_T and μ_G. Let v = (v_1, …, v_k) ∈ V \ {0}, v_i ∈ V_i, be such that all supports Γ_i := supp(v_i) ⊆ Ω(π) are free. Then for all t ∈ T it holds that μ_G(t·v) ∈ i Lie(T_K) and μ_G(t·v) = μ_T(t·v).

If additionally 0 ∉ Δ_T(v) = conv(Γ), where Γ = ∪_i Γ_i, then the upper bound dist(0, conv(Γ)) for the weight margin γ_T(π) also applies to the gap, i.e. γ_G(π) ≤ dist(0, conv(Γ)).

Proof. The action of T preserves the supports Γ_i, and in particular preserves their freeness. Hence it suffices to show μ_G(v) ∈ i Lie(T_K), which immediately yields μ_G(v) = μ_T(v) by Proposition 4.2. Moreover, the orthogonality with respect to the K-invariant inner product shows that μ_G(v) = Σ_i (‖v_i‖²/‖v‖²) H_i, where H_i = μ_G^{(i)}(v_i) is given by the moment map μ_G^{(i)} of the G-module V_i if v_i ≠ 0, and otherwise H_i = 0. The latter holds similarly for μ_T.

Therefore we may assume k = 1, i.e. v ≠ 0 has free support Γ. We write v = Σ_{ω ∈ Γ} v_ω for v_ω ∈ V_ω. Then, for any root α of G and all A ∈ Lie(G)_α, we have Π(A)v_ω ∈ V_{ω+α} by Proposition B.4, and hence ⟨v, Π(A)v⟩ = 0 by Γ ∩ (Γ + α) = ∅ (i.e., freeness). Thus tr(μ_G(v)A) = 0 for every A ∈ i Lie(K) that is a linear combination of root vectors. With the root space decomposition Lie(G) = Lie(T) ⊕ ⊕_α Lie(G)_α (see also Example B.3), we conclude μ_G(v) ∈ i Lie(T_K). The first statement is proven.

For the second claim we note that indeed ∪_i Γ_i = supp(v). If additionally 0 ∉ conv(Γ) = Δ_T(v), then v is T-unstable. In particular, v is G-unstable and thus γ_G(π) ≤ dist(0, Δ_G(v)). On the other hand, we have

dist(0, Δ_G(v)) = inf_{g ∈ G} ‖μ_G(g·v)‖_F ≤ inf_{t ∈ T} ‖μ_G(t·v)‖_F = inf_{t ∈ T} ‖μ_T(t·v)‖_F = dist(0, conv(Γ)),

where we used μ_G(t·v) = μ_T(t·v) in the second equality. We conclude by combining the two inequalities.

Remark 4.9.
It is well-known that any rational representation π : G → GL(V) can be decomposed into G-irreducible subrepresentations that are pairwise orthogonal with respect to the fixed K-invariant inner product. Proposition 4.8 shows that ensuring freeness on the irreducible subrepresentations suffices.

We end the section with an interesting connection between the weight margin and the gap.
Proposition 4.10.
Let π : G → GL(V) be a rational representation and denote its m-fold direct sum by π^m.

1. The weight margin satisfies γ_T(π) = γ_T(π^m) for all m ≥ 1.
2. The gap satisfies γ_G(π^m) ≥ γ_G(π^{m+1}) for all m ≥ 1.
3. There exists some m ≤ dim(V) such that γ_G(π^m) = γ_T(π^m) = γ_T(π).

Proof. We note that π^m is given by the action g·(v_1, …, v_m) = (g·v_1, …, g·v_m) on V^m. Furthermore, the K-invariant inner product ⟨·,·⟩ of V naturally induces a K-invariant inner product on V^m by

⟨(v_1, …, v_m), (w_1, …, w_m)⟩_{V^m} := Σ_{i=1}^m ⟨v_i, w_i⟩.

For the first claim, just note that the weight space decomposition for π^m is V^m = ⊕_{ω ∈ Ω(π)} V_ω^m and hence Ω(π^m) = Ω(π).

For the second claim, let (v_1, …, v_m) ∈ V^m \ {0} be G-unstable such that ‖μ_G(v_1, …, v_m)‖_F = γ_G(π^m). Then (v_1, …, v_m, 0) ∈ V^{m+1} \ {0} is G-unstable as well, so ‖μ_G(v_1, …, v_m, 0)‖_F ≥ γ_G(π^{m+1}). Moreover, under the inner product ⟨·,·⟩_{V^{m+1}} the first m copies of V are orthogonal to the last copy. Thus the last copy contributes nothing to the moment map, so μ_G(v_1, …, v_m, 0) = μ_G(v_1, …, v_m), and hence ‖μ_G(v_1, …, v_m, 0)‖_F = ‖μ_G(v_1, …, v_m)‖_F = γ_G(π^m).

Finally, let Γ = {ω_1, …, ω_m} ⊆ Ω(π) be such that 0 ∉ conv(Γ) and dist(0, conv(Γ)) = γ_T(π). We have m ≤ dim(V), and for each ω_i we fix some weight vector v_i ∈ V_{ω_i} \ {0}. Then v := (v_1, …, v_m) ∈ V^m satisfies the assumptions of Proposition 4.8, because each Γ_i = {ω_i} is free and the distinct copies of V are orthogonal under ⟨·,·⟩_{V^m}. Thus we obtain γ_G(π^m) ≤ dist(0, conv(Γ)) = γ_T(π) = γ_T(π^m), but on the other hand γ_G(π^m) ≥ γ_T(π^m) by Proposition 4.6.
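Throughout, upper bounds on the weight margin and (via freeness) on the gap are witnessed by the single quantity dist(0, conv(Γ)) for a finite weight set Γ. For small examples this distance can be checked numerically. The following is a minimal sketch, not from the paper: the function name, the choice of Frank–Wolfe iteration with exact line search, and the iteration budget are our own illustration.

```python
import numpy as np

def dist_to_hull(points, iters=1000):
    """Approximate dist(0, conv(points)) by minimizing ||x||^2 over the
    convex hull of the given points with Frank-Wolfe steps."""
    pts = np.asarray(points, dtype=float)
    x = pts[0].copy()                 # start at some vertex of the hull
    for _ in range(iters):
        v = pts[np.argmin(pts @ x)]   # vertex minimizing <grad, v> = <2x, v>
        g = x - v
        denom = g @ g
        if denom == 0.0:              # x coincides with the chosen vertex
            break
        # exact line search for the quadratic objective, clipped to [0, 1]
        gamma = min(1.0, max(0.0, (x @ g) / denom))
        x = x - gamma * g             # move towards v by step gamma
    return float(np.linalg.norm(x))

# dist(0, conv{e_1, e_2}) = 1/sqrt(2); a hull containing 0 gives distance 0
print(dist_to_hull([(1.0, 0.0), (0.0, 1.0)]))
print(dist_to_hull([(1, 0), (-1, 0), (0, 1), (0, -1)]))
```

For a weight set Γ with 0 ∉ conv(Γ), a small computed value illustrates numerically that the margin bound dist(0, conv(Γ)) is small; by Proposition 4.8, for a free Γ the same value also bounds the gap.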
We recall from Example 4.5 that π_{n,d} denotes the natural representation of G = SL(n)^d on (C^n)^{⊗d}, and that the weight margin γ_T(π_{n,d}) is the margin γ(Ω_{n,d}) for the array scaling problem from Theorem 1.3 and Theorem 2.1. The purpose of this subsection is to prove the bounds for γ_T(π_{n,d}) from Theorem 2.1 also for the gap γ_G(π_{n,d}).

Theorem 4.11.
Let π_{n,d} be the representation induced by the natural action of G := SL(n)^d on (C^n)^{⊗d}. Then the weight margin γ_T(π_{n,d}) and the gap γ_G(π_{n,d}) can be bounded as follows:

(a) If n = 2 and d ≥ 3, then γ_T(π_{2,d}) ≤ γ_G(π_{2,d}) ≤ 2^{−Ω(d)}.

(b) If n ≥ 3 and d = 3, then γ_T(π_{n,3}) ≤ γ_G(π_{n,3}) ≤ 2^{−Ω(n)}.

(c) If n ≥ 2 and d = 3r − 1 for some integer r ≥ 2, then γ_T(π_{n,d}) ≤ γ_G(π_{n,d}) ≤ 2^{−r(n−1)+O(1)} = 2^{−(d+1)(n−1)/3+O(1)}.

Though the above theorem only applies to certain d, we can "pad" the tensors to obtain similar results for all d ≥ 3. This is because bounds for γ_G(π_{n,d}) via free subsets of weights also hold for γ_G(π_{n,d+1}) and γ_G(π_{n,d+2}); see Proposition C.1. The remaining missing case is treated in Proposition C.2. Therefore, we can conclude Theorem 1.6 from the above Theorem 4.11.

Our main method for transferring the bounds from the commutative case (Theorem 2.1) to the noncommutative case is to use the concept of freeness in conjunction with Proposition 4.8. The following definition will be convenient for proving freeness of tensors.

Definition 4.12 (Free sets). A set M ⊆ [n]^d is called free if, for i = (i_1, …, i_d), j = (j_1, …, j_d) ∈ M, i ≠ j always implies |{l ∈ [d] : i_l ≠ j_l}| ≥ 2.

Proposition 4.13.
Let M ⊆ [n]^d and denote the induced subset of weights by

Γ_M := {(ε_{i_1}, …, ε_{i_d}) | (i_1, …, i_d) ∈ M} ⊆ (R^n)^d.

Then M is a free set if and only if the set of weights Γ_M ⊆ Ω(π_{n,d}) is free as in Definition 4.7.

Proof. We recall that Γ_M is free if and only if no two distinct elements of Γ_M differ by a root of G = SL(n)^d; see Definition 4.7. Furthermore, remember that the roots of G are

(e_i − e_j, 0_n, …, 0_n), (0_n, e_i − e_j, 0_n, …, 0_n), …, (0_n, …, 0_n, e_i − e_j) ∈ (R^n)^d

for i, j ∈ [n] with i ≠ j; see also Example B.3. Now, if M ⊆ [n]^d is not free, then there exist i = (i_1, …, i_d), j = (j_1, …, j_d) ∈ M with i ≠ j such that they differ in exactly one component. Without loss of generality we assume i_1 ≠ j_1 and i_l = j_l for l = 2, …, d. But then

(ε_{i_1}, …, ε_{i_d}) = (ε_{j_1}, …, ε_{j_d}) + (e_{i_1} − e_{j_1}, 0_n, …, 0_n),

and hence Γ_M is not free. Clearly, the argument can be inverted to show that if Γ_M is not free, then M is not free.

The above proposition shows how the equality μ_G(t·v) = μ_T(t·v) of Proposition 4.8 can be verified directly for tensors. For tensors, the moment map components are the quantum marginals, and the equality μ_G(t·v) = μ_T(t·v) simply says that the quantum marginals are diagonal. Each off-diagonal entry of a quantum marginal is the inner product between distinct (d−1)-dimensional slices of a tensor, and if the support of the tensor is free, then the supports of such slices are entirely disjoint — thus the quantum marginals are diagonal.

In the following two propositions we show that the subsets of weights which witness the upper bounds for the (weight) margin in Theorem 2.1 are all free. Thereby, we will implicitly use Proposition 4.13.

Proposition 4.14.
For r ≥ 1 the rows of A_r form a free subset of [2]^{3r}, i.e. Γ_{2,3r} is free. Moreover, for r ≥ 1 the set of weights Γ_{2,3r+1} is free.

Proof. Clearly Γ_{2,3} is free, as its two elements differ in more than one coordinate. Recall the constructions of Γ_{2,3r} and Γ_{2,3r+1} from Section 2.1. If Γ_{2,3r} is free, then Γ_{2,3r+1} is clearly also free. Thus we are left to prove the former.

Consider A_r as defined in Eq. (2.2). We must show that distinct rows of A_r differ in at least two entries for all r ≥ 2. The claim is proven by induction on r ≥ 2. For r = 2, we verify the claim by inspection of A_2: let a_i be the ith row of A_2, and check for each pair a_i, a_j with i < j two distinct entries in which a_i and a_j differ. (The two tables recalling the rows of A_2 and listing, for each pair, two entries in which they differ are not reproduced here.) In fact, the rows a_i already pairwise differ in at least two of the first four entries.

Now assume that the claim holds for some fixed r ≥ 2. Let a_i, a_j be distinct rows of A_{r+1}; we will show that they differ in at least two entries. If both rows come from the copy of A_r inside A_{r+1}, then by our inductive hypothesis there is nothing to prove, because A_{r+1} contains A_r as a submatrix on these rows. To complete the proof, it is enough to show that the submatrix formed by restricting to the mth block row, m ∈ [r], and the last block row of A_{r+1} satisfies the hypothesis, i.e. any two distinct rows of this submatrix differ in at least two entries. This is the case, as restricting to its 1st, mth and last block columns yields a submatrix of A_2 if m ≠ 1, and a submatrix equal to A_2 if m = 1.

Proposition 4.15.
For n ≥ 2 the set W_n is free, i.e. Γ_{n,3} ⊆ Ω(π_{n,3}) is free. Furthermore, for n ≥ 2 and r ≥ 2 the set of weights Γ_{n,3r−1} ⊆ Ω(π_{n,3r−1}) is free.

Proof. We remind the reader that

W_n = {(s, 1, s), (s, s, 1), (s − 1, s, s) | s = 1, 2, …, n}.

Let x = (x_1, x_2, x_3), y = (y_1, y_2, y_3) ∈ W_n be such that x ≠ y. We prove by a distinction of cases that x and y differ in at least two entries. First, we assume x_1 = y_1. Then a := x_1 = y_1 ≥ 1, since otherwise x = (0, 1, 1) = y contradicts x ≠ y. Thus x, y ∈ {(a, 1, a), (a, a, 1), (a, a+1, a+1)}, and we conclude that x and y differ in at least two entries as x ≠ y. Second, we assume x_1 ≠ y_1. There is nothing to show if x_2 ≠ y_2, so we additionally assume b := x_2 = y_2. If b = 1, then we are done, since x = (x_1, 1, x_1) and y = (y_1, 1, y_1) differ in the first and third entry. On the other hand, b ≥ 2 yields x, y ∈ {(b, b, 1), (b−1, b, b)}, and as x_1 ≠ y_1 they differ in the first and third entry. This proves the first statement.

For the second claim, recall that Γ_{n,3r−1} = {ε_{σ(i),σ(j),σ(k)} | (i, j, k) ∈ W_{rn} \ J_r}, where σ is a suitable injective map; compare Remark 2.9. By the first part W_{rn} is free, and so is its subset W_{rn} \ J_r. Hence Γ_{n,3r−1} is free, as σ is injective.

We are now ready to deduce Theorem 4.11.

Proof of Theorem 4.11.
Recall that all the bounds in Theorem 4.11 hold for the weight margin γ T p π q by Theorem 2.1. This was proven by exhibiting witness sets Γ n,d Ď Ω p π n,d q such that R conv p Γ n,d q , which gives the bound γ T p π n,d q ď dist p , conv p Γ n,d qq . But if Γ n,d is free, then we even have γ G p π n,d q ď dist ` , conv p Γ n,d q ˘ by Proposition 4.8. By Proposition 4.14 the witness sets Γ , and Γ , r , Γ , r ` , r ě , for Theorem 2.1(a) are free, which proves Theorem 4.11(a). Similarly, we conclude parts (b) and (c) with Proposition 4.15, which shows that for n ě and r ě the witness sets Γ n, and Γ n, r ´ are free.

In the following we transfer the result from d -tensors to the natural SL p n q action on homogeneous d -forms in n variables. This representation is given by ϱ n,d : SL p n q Ñ GL ` C r x , . . . , x n s d ˘ , g ÞÑ ` p p x q ÞÑ p p g ´ x q ˘ . Each monomial x α “ x α ¨ ¨ ¨ x α n n , given by a multi-index α “ p α , . . . , α n q P p Z ě q n with | α | : “ ř i α i “ d , is a weight vector for ϱ n,d with weight ´ α ` dn n . Therefore Ω p ϱ n,d q “ t ´ α ` dn n | α P p Z ě q n with | α | “ d u , i.e. Ω p ϱ n,d q “ Ω from Eq. (2.22) and the bounds from Corollary 2.12 apply to γ ST p n q p ϱ n,d q “ γ p Ω q . If n “ dm for some integer m ě , then we have ´ Ω p π m,d q Ď Ω p ϱ n,d q .

Proposition 4.16. Let n “ dm for some integer m ě . If Γ Ď Ω p π m,d q is free, then ´ Γ Ď Ω p ϱ n,d q is free.

Proof. We prove the statement by contraposition. Assume that ´ Γ Ď Ω p ϱ n,d q is not free. Then there exists a root α “ e i ´ e j P R n of SL p n q , where i, j P r n s with i ‰ j , and two distinct weights ω, ω P ´ Γ such that ω “ ω ` e i ´ e j , equivalently ´ ω “ ´ ω ´ e i ` e j . The latter equation forces ´ α to be of the form p m , . . . , m , e k ´ e l , m , . . . , m q P p R m q d – R n for some k, l P r m s with k ‰ l , because ´ ω, ´ ω P Ω p π m,d q .
Thus, ´ α is a root of SL p m q d and hence Γ Ď Ω p π m,d q is not free.

As a consequence of the preceding Proposition we obtain bounds for the gap γ SL p n q p ϱ n,d q .

Theorem 4.17 (Gap for Polynomial scaling) . Let d ě and let n “ dm for some integer m ě . Then there exists a constant C ą , independent of n and d , such that γ SL p n q p ϱ n,d q ď ´ Cdm “ ´ Cn . More concretely, for d “ and m ě it holds that γ SL p n q p ϱ n,d q ď dist ` , Γ m, ˘ ď ´ m ` “ ´ n ` , and if m ě and d “ r ´ for some r ě , we have γ SL p n q p ϱ n,d q ď dist ` , Γ m, r ´ ˘ ď ´ r p m ´ q` “ ´ p d ` qp m ´ q ` « ´ n .

Proof.
We recall that Theorem 1.6 was proven by padding the results from Theorem 4.11. Thus, for each m ě and d ě the bound γ SL p m q d p π m,d q ď ´ Cmd from Theorem 1.6 is witnessed by a free set of weights Γ m,d Ď Ω p π m,d q , i.e. ă dist p , conv p Γ m,d qq ď ´ Cdm . But then R conv p´ Γ m,d q and ´ Γ m,d Ď Ω p ϱ n,d q is free by Proposition 4.16. Therefore, Proposition 4.8 yields γ SL p n q p ϱ n,d q ď dist ` , conv p´ Γ m,d q ˘ “ dist ` , conv p Γ m,d q ˘ ď ´ Cdm . Similarly, we get the other bounds by using freeness of Γ m, and Γ m, r ´ , respectively (see Proposition 4.15), combined with the distance bounds Lemma 2.6 and Lemma 2.11, respectively.

In this section we show that the diameter lower bound of Theorem 1.1 generalizes to diameter bounds for the capacity Eq. (1.4) over the noncommutative group G “ SL p n q d . Many algorithms for computing the capacity have resorted to geodesically convex optimization: G can be viewed as a manifold on which g ÞÑ } g ¨ v } is geodesically convex. The distance between an element g and the identity in this geometry is closely related to the condition number of the matrix g . The diameter bound question is the following: given an input v and ε ą , how large a ball in G about the identity must we optimize over to find an approximate minimizer g P G such that } g ¨ v } ´ cap p v q ď ε ? In other words, how well-conditioned can we expect approximate minimizers to Eq. (1.4) to be? This matters because all the algorithms we know start at the origin and take small steps in the manifold, and if all the high-precision solutions are far from the origin then such algorithms cannot reach any of them quickly.

Before tackling this question we must make our notions of distance more precise. The manifold we use is actually not G but rather the manifold P of Hermitian, positive-definite matrices in G . Indeed, we can write inf g P G } g ¨ v } “ inf g P G x v, g : g ¨ v y “ inf x P P x v, x ¨ v y .
Thus we may instead optimize the function f v : g ÞÑ x v, g ¨ v y over P . The manifold P is a prototypical example of a Hadamard manifold , a complete, simply connected Riemannian manifold of non-positive sectional curvature [Bac14]. For us, G “ SL p n q d for some d , and so P is just the set of d -tuples of positive-definite matrices of determinant . Even for d “ , P contains a totally geodesic submanifold isometric to the hyperbolic plane; as such the volumes of balls grow exponentially in their radius. The function f v : g ÞÑ } g ¨ v } is convex along geodesics in this manifold [BFG ` s . The geodesics through a point X P P are given by γ p t q “ ? X e Ht ? X for Hermitian H . The Riemannian gradient ∇ log f v p g q of log f v at g P P is given by the moment map µ G p g ¨ v q . The geodesic ball of radius R in P about the identity is given by B R : “ t e A : A traceless, Hermitian , } A } F ď R u Ď P . In a slight abuse of notation, we define the geodesic ball in G (rather than P ) to be KB R , as in the introduction. The values taken by f v over B R are the same as the values taken by g ÞÑ } g ¨ v } on KB R . We now define diameter bounds.

Definition 4.18.
The diameter bound D f p ε q for a function f on P and a real number ε ą is defined as the infimum over R ą such that inf g P B R f p g q ď ε ` inf g P P f p g q .

We will show that the diameter bound for the norm-squared function can grow faster than poly p n, { ε q for d “ . Firstly, we need to review how diameter bounds for tensors in p R n ě q d like that in Theorem 1.1 relate to diameter bounds for tensors in p C b n q d over SL p n q d and ST p n q d . Infimizing f v p g q over the subset P X ST p n q d Ď P , or the tuples of positive-definite diagonal matrices within SL p n q d , results in a program of the form Eq. (1.2). For d “ , for example, inf g P P X ST p n q x v, g ¨ v y “ cap p p q “ inf x Pp R n q ÿ ω P Ω n, p ω e ω ¨ x “ inf x Pp K n q ÿ ω P Ω n, p ω e ω ¨ x (4.2) where Ω n, “ tp ε i , ε j , ε k q : i, j, k P r n su and p p ε i ,ε j ,ε k q “ | v ijk | . The correspondence is exactly g “ e diag p x q for x P p K n q , which implies the following.

Lemma 4.19.
For all ε ą , the diameter bound D f p ε q for the function f v : g ÞÑ x v, g ¨ v y on ST p n q is equal to the diameter bound of the function f p where p ijk “ | v ijk | , that is, of f p : p R n q Ñ R , x ÞÑ ř i,j,k Pr n s | v ijk | e p ε i ,ε j ,ε k q¨ x .

The volume of a ball can be computed exactly [GAN99], but the very crude bound of volume Ω p e Θ p r q´ O p n log n q q for the geodesic ball of radius r can be proved elementarily. The manifold PD p n q X SL p n q contains the hyperbolic plane as a totally geodesic submanifold, in which the ball of radius r has area e Θ p r q [CFK ` s ; hence the ball of radius r in PD p n q X SL p n q contains Ω p e Θ p r q q balls of radius , which themselves have volume at least e ´ O p n log n q by comparison with the Euclidean ball. This was implicitly shown much earlier in [KN79].
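Lemma 4.19 reduces the geodesic question to a Euclidean one: the commutative objective of Eq. (4.2) can be evaluated directly on the sum-zero subspace (K^n)^3. The following sketch is our own illustration (the sparse toy array p below is hypothetical, not the construction from Theorem 1.1):

```python
import numpy as np

def f_p(p, x):
    """Evaluate the commutative objective of Eq. (4.2),
    f_p(x) = sum_{(i,j,k)} p_ijk * exp(eps_i.x1 + eps_j.x2 + eps_k.x3),
    where eps_i = e_i - (1/n)1_n and x = (x1, x2, x3) in (R^n)^3."""
    # eps_i . v = v[i] - mean(v), so centering each component gives eps . x
    x1, x2, x3 = [xl - xl.mean() for xl in x]
    total = 0.0
    for (i, j, k), w in np.ndenumerate(p):
        if w > 0:
            total += w * np.exp(x1[i] + x2[j] + x3[k])
    return total

def project_sum_zero(x):
    """Project each component onto K^n = {v in R^n : sum_i v_i = 0}."""
    return [xl - xl.mean() for xl in x]

# Hypothetical toy instance: a sparse probability array p on [3]^3.
n = 3
p = np.zeros((n, n, n))
for idx in [(0, 0, 0), (1, 1, 0), (2, 0, 2)]:
    p[idx] = 1.0 / 3.0

x = project_sum_zero([np.zeros(n), np.zeros(n), np.zeros(n)])
assert abs(f_p(p, x) - 1.0) < 1e-12  # at the origin, f_p equals sum of p
# cap(p) is the infimum of f_p over (K^n)^3; the diameter bound D_{f_p}(eps)
# asks how large a ball ||x|| <= R is needed to come within eps of it.
```

The centering step makes the weight inner product (ε_i, ε_j, ε_k) · x explicit, so the same routine works whether or not x is already in the sum-zero subspace.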
Of course, there's nothing special about d “ here, and the lemma generalizes straightforwardly to other d . For instance, applying Lemma 4.19 for d “ shows that restricting operator scaling to diagonal matrices yields an instance of matrix scaling. We have shown how diameter bounds over ST p n q d relate to those over p R n q d . Now we complete the chain by showing how to relate diameter bounds over SL p n q d to those over ST p n q d . We will show that tensors with free support (defined in Definition 4.12) have the same diameter bound over SL p n q d as they do over ST p n q d , which by Theorem 1.1 and Lemma 4.19 we have shown can be superpolynomial. We then show that the construction from Section 3.2 is free.

Theorem 4.20.
Let G denote SL p n q d , and let T denote ST p n q d . Suppose µ T p t ¨ v q “ µ G p t ¨ v q for all t P T (which holds if v has free support). Then for any R ą we have inf g P B R f v p g q “ inf g P T X B R f v p g q , where B R denotes the geodesic ball of radius R about the identity in G .

Proof. Define B : “ B R and recall that P denotes the positive-definite matrices in G . Let f : P Ñ R be given by f : g ÞÑ x v, g ¨ v y . Clearly inf g P B f p g q ď inf g P T X B f p g q . We must show the converse inequality. Let g ˚ : “ arg min g P B f p g q . Recall that P is a Hadamard manifold. Define T ` to be T X P . Let πg ˚ denote the projection of g ˚ to T ` , that is, the closest point in T ` to g ˚ . As T ` is a geodesically convex set, projections to T ` are unique and distances decrease under the projection [Bac14, Theorem 2.1.12]. Thus, πg ˚ P B . If we can show that f p πg ˚ q ď f p g ˚ q then the proof is complete.

Let g ˚ “ exp πg ˚ p x q for some x in the tangent space T πg ˚ P to P at πg ˚ . That is, γ : r , s Ñ P , t ÞÑ exp πg ˚ p tx q is the geodesic between πg ˚ and g ˚ . Then, in the local inner product x¨ , ¨y πg ˚ at πg ˚ , x is orthogonal to the tangent space T πg ˚ T ` Ď T πg ˚ P of T ` at πg ˚ , because πg ˚ is a local minimum of the geodesically convex function d p g ˚ , ¨q on T ` and x is proportional to the gradient of d p g ˚ , ¨q at πg ˚ . The function f is geodesically convex, and its gradient ∇ f p πg ˚ q is proportional to the moment map µ G p πg ˚ ¨ v q . By the assumption that µ T p t ¨ v q “ µ G p t ¨ v q for all t P T , µ G p πg ˚ ¨ v q is in i Lie p T K q , which is precisely the tangent space of T ` at πg ˚ . Thus f p g ˚ q “ f p exp πg ˚ p x qq ě f p πg ˚ q ` x x, ∇ f p πg ˚ qy πg ˚ “ f p πg ˚ q , which completes the proof.

Lemma 4.21.
The support of the tensor p from Theorem 1.1 is free.

Proof. Recall that a tensor in p C n q b is free if and only if the supports of distinct rows of its weight matrix intersect in at most one element. The construction in Proposition 3.5 preserves freeness, so we can consider the case n “ p l ` q treated in the proof of Theorem 1.1. Recall that, in this case, the support of p is Ω Y ω where Ω is the rows of a matrix M defined from the directed graph D l . Each row in the matrix M corresponds to some edge of D l . Let us first verify that Ω is free. Assuming the rows correspond to the same edge, they can be verified to have intersection in at most one element, because the nonzero entries of the three rows corresponding to an edge are contained in a ˆ submatrix of block form r A I s (the explicit matrix does not survive extraction). Now consider the case that the rows belong to two different edges. If the two edges share no vertices, then clearly the supports of the corresponding rows do not intersect. Because the graph is a directed tree, edges may only share a vertex which is the sink of at least one of the edges. If the vertex is a sink for both edges, then the nonzero entries in the 6 rows belonging to either edge (after permutation) take a block form built from two copies of r A I s . If the shared vertex is a sink for only one edge, then the rows take an analogous block form with the copies overlapping differently. In all these cases it can be verified that supports of distinct rows intersect in at most one element. Lastly, we need to make sure that the intersection of the support of ω with the support of any element of Ω is at most one. Recall that ω is defined to have entry one in each block corresponding to the leaves u l , v l , w l in D l . However, there are no edges between the leaves, so the support of no row can intersect that of ω in more than one element.

We are now nearly ready to prove Theorem 1.4.
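The two freeness criteria used above are easy to sanity-check by brute force. The sketch below is our own illustration: it tests the row-support criterion of Lemma 4.21 and the "differ in at least two entries" criterion on W_n as recalled from Proposition 4.15 (the exact index range of s is garbled in the source, so the range used here is an assumption):

```python
from itertools import combinations

def supports_free(rows):
    """Freeness test of Lemma 4.21: the supports of distinct rows of the
    weight matrix (given as sets of column indices) intersect in at most
    one element."""
    return all(len(r & s) <= 1 for r, s in combinations(rows, 2))

def differs_in_two(u, v):
    """Combinatorial freeness criterion for sets of weights: distinct
    elements must differ in at least two entries."""
    return sum(a != b for a, b in zip(u, v)) >= 2

def W(n):
    """W_n = {(s,0,s), (s,s,0), (s-1,s,s)} from Proposition 4.15; the range
    s = 1..n is our assumption, since the digits are lost in the source."""
    return {t for s in range(1, n + 1)
              for t in [(s, 0, s), (s, s, 0), (s - 1, s, s)]}

# Every pair of distinct elements of W_n differs in at least two entries.
assert all(differs_in_two(u, v) for u, v in combinations(W(20), 2))

# Row supports pairwise sharing at most one column pass; larger overlaps fail.
assert supports_free([{0, 1}, {1, 2}, {2, 3}])
assert not supports_free([{0, 1, 2}, {1, 2}])
```

Such brute-force checks are only feasible for small instances, but they make the case distinctions in the proofs above easy to replay mechanically.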
We would simply use the array p from the proof of Theorem 1.1, but setting | v ijk | “ p ijk would not be solvable over the rationals. Therefore we must round ? p ijk , which requires some additional technical lemmas proven in Appendix E.

Lemma 4.22 (Rounding and diameter bounds) . Let p, q : Ω Ñ R ě be positive functions on a finite set Ω Ď R m . Suppose there is a set B such that inf x P B f p p x q ě p ` ε q cap p , and let M “ max t { q ω , { p ω : ω P Ω u . Then inf x P B f q p x q ě pp ` ε qp ´ M } p ´ q } q ´ M } p ´ q } q cap q .

Lemma 4.23 (Rounding and capacity) . Let Ω Ď R m be finite and let p, q : Ω Ñ R ě be positive functions on Ω . Let M “ max ω P Ω { q ω . Then log cap q ě log cap p ´ M } p ´ q } .

Proof of Theorem 1.4.
First recall that the values taken by g ÞÑ } g ¨ v } on the geodesic ball KB R in G are the same as the values taken by f v : g ÞÑ x v, g ¨ v y on B R in P . Thus it is enough to show that f : “ f v has diameter bound D f p ε q “ Ω p n { log p { ε qq for ε ď e ´ Cn log n .

We will apply Lemma 4.22 with p as in the proof of Theorem 1.1 and q ijk “ | v ijk | , with v ijk chosen so that v has the same support as p and p ijk ´ δ ă | v ijk | ď p ijk for δ small. Because v is free, by Theorem 4.20 the diameter bound for f v is the same as the diameter bound for f v over ST p n q . By Lemma 4.19, this is the same as the diameter bound for f q . It remains to show that D f q p ε q “ Ω p n { log p { ε qq . We will do this by relating D f q p ε q to D f p p ε q ; in particular we will show D f q p Ω p ε qq ě D f p p ε q . Let R “ D f p p ε q . We have inf x Pp R n q , } x }ď R f p p x q ě cap p p q ` ε “ p ` ε q cap p p q , recalling that cap p p q “ { . By Lemma 4.22, inf x Pp R n q , } x }ď R f q p x q ě pp ` ε qp ´ M } p ´ q } q ´ M } p ´ q } q cap p q q . As cap q ď { , if M } p ´ q } ď M } p ´ q } ď cε for c a small enough constant, then we have pp ` ε qp ´ M } p ´ q } q ´ M } p ´ q } q cap q “ cap q ` Ω p ε q , so inf x Pp R n q , } x }ď R f q p x q ě cap q ` Ω p ε q . Thus D f q p Ω p ε qq ě D f p p ε q assuming M } p ´ q } ď cε . To ensure that this constraint is satisfied, choose v of bit complexity O p log n ` log p { ε qq such that } p ´ q } “ cn ε . Because p ijk “ Ω p { n q for i, j, k in the support of p by construction, we have q ijk “ Ω p { n q for i, j, k in the support of q and hence M “ O p n q . Thus M } p ´ q } ď cε . Applying Lemma 4.23 together with our assumptions about the size of p ´ q and the fact that cap p q q “ cap p v q implies the final claim that cap p v q ě { and that ě } v } ě { .

Finally, we remark that the same diameter bound holds for d ě for tuples of tensors. We note that if v P p C n q b has free support, then so does the tensor v b e l b .
. . b e l Ă p C n q b d for d ě . By Proposition 4.8, the tuple w P pp C n q b d q n given by w l “ n v b e l b . . . b e l for l P r n s has µ T p t ¨ v q “ µ G p t ¨ v q for all t P ST p n q d . The commutative problem obtained by restricting to ST p n q d as in Lemma 4.19 is precisely f q as in Corollary 3.7. As in the proof of Theorem 1.4, by Theorem 4.20, Lemma 4.19 and Corollary 3.7, we have the following.

Corollary 4.24.
There is a constant C ą such that the following holds for all d ě . For all ε ď exp p´ Cn log n q , there is a tuple of tensors w “ w p ε q P pp C n q b d q n with O p n q nonzero entries of bit complexity O p log n ` log p { ε qq , and a geodesic ball B “ B p ε q of radius Ω ` n { log p { ε q ˘ about the identity in SL p n q d , such that inf g P B } g ¨ w } ě cap p v q ` ε . Moreover, it holds that { ď cap p w q ď and { ď } w } ď .

For d ě let Q d be the quiver on d vertices whose arrows alternate orientation according to the parity of d (the diagram does not survive extraction). Let Q p k q d be the quiver one obtains from Q d by adding k ´ additional copies of each arrow in Q d . As before, let G “ SL p n q d and T “ ST p n q d . Then G acts on the quiver Q d with dimension vector p n, . . . , n q as described in the introduction. We denote the corresponding representation by π d . Note that the action of G on Q p k q d with dimension vector p n, . . . , n q is given by π kd . In this subsection we prove a bound on the weight margin of π d and on the gap of π nd . The bound on γ G p π nd q is thanks to the refinement of freeness in Proposition 4.8 pointed out by Visu Makam.

Theorem 4.25.
Let n, d ě and denote the natural action of G “ SL p n q d on the quiver Q d with dimension vector p n, . . . , n q by π d : SL p n q d Ñ GL p V d q , where V d “ p C n ˆ n q d ´ . The representation π nd corresponds to the G -action on the quiver Q p n q d with dimension vector p n, . . . , n q . It holds that γ T p π d q ď p n ´ q ´ d ` and γ G p π nd q ď p n ´ q ´ d ` .

Remark 4.26.
Before proving the theorem, we point out a few consequences.

1. Theorem 4.25 shows that γ T p π d q ´ and γ G p π nd q ´ are not polynomially bounded with respect to dim V d “ p d ´ q n and dim SL p n q d “ d p n ´ q . Instead we see for fixed n and d Ñ 8 an exponential behaviour in the number of vertices d .

2. The proof of Theorem 4.25 below shows that for the bound on the gap it is enough to consider the quiver Q p n ´ q d with an additional n th arrow from d to d ´ .

3. The ideas presented below can be adjusted to prove similar bounds for other dimension vectors. For example, one can show that the gap for the SL -action on Q p q d with dimension vector p , , , . . . , , q is inverse exponential in d . This aligns with an algebraic barrier for this action; the invariants that cut out the null cone for this action have exponential degree [DM18, Proposition 1.5].

4. The quiver Q d is of finite representation type and has no oriented cycles. Therefore, the null-cone membership problem for π d can be solved in polynomial time by algebraic algorithms. This means Q d is an example where the weight margin is very small but there still exist efficient algorithms. Can the existence of efficient algorithms still be explained by a large gap in this case? This leads to the following interesting open question.

Problem 4.27.
Is the gap γ G p π d q inverse polynomial in n and d ? A positive answer would provide an interesting example, since in this case the weight margin of π d would be significantly smaller than the gap of π d .

We now introduce several lemmas needed to prove Theorem 4.25. Note that the set of weights of π d viewed as a subset of p R n q d is t ` p´ q d ε i , p´ q d ´ ε j , , . . . , ˘ , ` , p´ q d ´ ε i , p´ q d ´ ε j , , . . . , ˘ , . . . , ` , . . . , , ε i , ´ ε j ˘ | i, j P r n s u . We define recursively the subsets of weights Γ : “ tp ε i , ´ ε j q | i P r n ´ s , j P r n su Ď Ω p π q Ď R n and, for d ě , Γ d : “ t ` p´ q d ε i , p´ q d ´ ε n , n , . . . , n ˘ | i P r n ´ s u Y ` t 0 n u ˆ Γ d ´ ˘ Ď Ω p π d q Ď R dn . (Personal communication with Visu Makam; there does not seem to be an explicit reference in the literature.)

Remark 4.28. We note that for d ě , Γ d is not free. For instance, we can always write p n , . . . , n , ε , ´ ε q “ p n , . . . , n , ε , ´ ε q ` p n , . . . , n , n , e ´ e q , i.e. the weights p n , . . . , n , ε , ´ ε q , p n , . . . , n , ε , ´ ε q P Γ d differ by the root p n , . . . , n , n , e ´ e q of SL p n q d . Therefore, we cannot deduce a bound on the gap γ G p π d q via Proposition 4.8. However, the latter allows us to deduce at least a bound on the gap of π nd . In the next two lemmas we show that Γ d witnesses the bound on γ T p π d q and afterwards we use Proposition 4.8 to transfer this bound to γ G p π nd q .

Lemma 4.29.
For all d ě it holds that 0 R conv p Γ d q .

Proof. We prove the statement by induction on d ě . For d “ , just note that any element in conv p Γ q Ď R n has value ´ { n in the n -th entry. In particular, 0 R conv p Γ q . For d ě let x “ ř ω P Γ d λ ω ω , λ ω ě , be a convex combination of the elements in Γ d . Assume there is an i P r n ´ s such that for ω i : “ ` p´ q d ε i , p´ q d ´ ε n , n , . . . , n ˘ one has λ ω i ą . Then the n -th entry of x is non-zero, since ω i has n -th entry p´ q d ` { n and all (other) ω P Γ d have p´ q d ` { n or zero as n -th entry. On the other hand, if λ ω i “ for all i P r n ´ s , then x P t 0 n u ˆ conv p Γ d ´ q . By induction hypothesis on d ´ we necessarily have x ‰ 0 .

Lemma 4.30.
For d ě it holds that x d : “ λ d ` p´ q d ´ ε n , n , . . . , n ˘ P conv p Γ d q , where λ d : “ ` ř d ´ i “ p n ´ q i ˘ ´ . In particular, } x d } ă | λ d | ď p n ´ q ´ d ` .

Proof. We proceed by induction on d ě . In the case d “ , consider the convex combination ř n ´ i “ ř n j “ p n ´ q n p ε i , ´ ε j q “ n ´ p´ ε n , n q “ x , where we used (2.1). Now assume the claim is proven for some d ě , hence λ d ` n , p´ q d ´ ε n , n , . . . , n ˘ P t 0 n u ˆ conv p Γ d q Ď conv p Γ d ` q . (4.3) Setting µ : “ p n ´ q λ d ` λ ´ d we have µλ d “ p n ´ q λ d ` and µ ` p n ´ q λ d ` “ . Together with (2.1) and (4.3) we deduce x d ` P conv p Γ d ` q via µ λ d ` n , p´ q d ´ ε n , n , . . . , n ˘ ` λ d ` ř n ´ i “ ` p´ q d ` ε i , p´ q d ε n , n , . . . , n ˘ “ x d ` . This ends the induction. Finally, } x d } ă | λ d | follows from } ε n } ă .

Proof of Theorem 4.25. By Lemma 4.29 and Lemma 4.30 we have γ T p π d q ď p n ´ q ´ d ` . With the fact Ω p π d q “ Ω p π nd q and with Proposition 4.8 we transfer this bound to the gap of π nd . To do so, we note that the natural inner product on V nd “ p C n ˆ n q n p d ´ q , given by the trace inner product on each C n ˆ n copy, is invariant under the action of K “ SU p n q d . Clearly, distinct C n ˆ n copies are orthogonal under this inner product. Thus, to be able to apply Proposition 4.8 it is enough to assign to each C n ˆ n copy, i.e. to each arrow of Q p n q d , a matrix M i such that supp p M i q is free and Γ d “ Ť i supp p M i q . For this, we consider the n ˆ n matrices M : “ ˆ I n ´
00 0 ˙ and P : “ ˆ I n ´ ˙ , and E i,j is the matrix with p i, j q -entry one and all other entries zero. Then E i,i P “ E i,σ p i q , where σ : r n s Ñ r n s is the cycle p . . . n q . Therefore, for k P r n s we have supp p M P k ´ q “ t ` n p d ´ q , ε i , ´ ε σ k ´ p i q ˘ | i P r n ´ s u and t n p d ´ q u ˆ Γ “ Ť k Pr n s supp p M P k ´ q . For fixed k , i ‰ i implies σ k ´ p i q ‰ σ k ´ p i q , so any distinct elements of supp p M P k ´ q differ in the last two R n -components. Hence, each supp p M P k ´ q is free and we assign M, M P, . . . , M P n ´ to the n arrows that go from vertex d to vertex d ´ . For l P r d ´ s , we assign to the n arrows between the vertices l and l ` each of the matrices E ,n , E ,n , . . . , E n ´ ,n at least once. (Exactly one of the latter matrices is assigned to two of these arrows.) Clearly, the support of E i,n , i P r n ´ s , is free as it contains just one weight. By construction, this assignment does the job. Moreover, the argument shows that n ´ arrows between the vertices l and l ` , l P r d ´ s , suffice.

A Notation

f_p : the function R^m → R_{≥0}, x ↦ Σ_{ω∈Ω} p_ω e^{ω·x}, see Eq. (1.2)
cap(p) : the capacity of a non-negative function p on a finite set Ω ⊆ R^m, see Eq. (1.2)
cap(v) : the capacity of a vector v under a group action, see Eq. (1.4)
[n] : the set {1, 2, . . . , n}
0_n : the zero vector in R^n
e_i : the i-th canonical unit vector in R^n
1_n : the all-ones vector in R^n
K^n : the orthogonal complement of 1_n in R^n, i.e. {(v_1, . . . , v_n) ∈ R^n : Σ_i v_i = 0}
ε_i : the vector e_i − (1/n) 1_n
I_n : the n × n identity matrix
dist(0, S) : the distance from the origin to the set S
conv(S) : the convex hull of S in R^n
Aff(S) : the affine hull of S in R^n
π_{n,d} : the representation for d-dimensional tensor scaling
Ω(π) : the set of weights of a representation π
Ω_{n,d} = Ω(π_{n,d}) : the set {ε_i : i ∈ [n]}^d corresponding to d-dimensional array scaling; equal to the set of weights of the tensor scaling representation π_{n,d}, see Example 4.5
γ(Ω) : the margin of the finite set Ω ⊆ R^m, see Definition 1.2
γ_T(π) : the weight margin of a representation π, i.e. γ(Ω(π)), see Definition 4.3
γ_G(π) : the gap of a representation π, see Definition 4.3
tr(A) : the trace of a square matrix A
D_f(ε) : the diameter bound of a function f for ε > 0, see Definition 3.1 and Definition 4.18, respectively
‖A‖_F : the Frobenius norm of a square matrix A
e^A : the exponential of a square matrix A
Lie(G) : the Lie algebra of a matrix Lie group G
GL(n) : the group of invertible complex n × n matrices
SL(n) : the group of invertible complex n × n matrices with determinant one
ST(n) : the group of diagonal invertible complex n × n matrices with determinant one
SU(n) : the group of unitary matrices of size n × n and determinant one
Herm(n) : the set of complex Hermitian n × n matrices
GL(V) : the group of C-linear, bijective maps V → V, where V is a C-vector space

B Representation theory background
In this section we briefly recall some representation theory. All the concepts we present here actually work in the very general setting of reductive groups and their rational representations, see e.g. [BFG `
19, section 2]. For the sake of clarity and concreteness we stick to the special case needed in this paper, i.e. the reductive group SL p n q d : “ SL p n q ˆ ¨ ¨ ¨ ˆ SL p n q with d ě many copies of SL p n q . We call a Euclidean-closed subgroup H Ď GL p n q a matrix Lie group . Indeed, such an H is naturally a Lie group (cf. [Hal03, Theorem 1.19]) with real Lie algebra
Lie p H q : “ t A P C n ˆ n | @ t P R : e tA P H u . The Lie bracket for
Lie p H q is the commutator r A, B s : “ AB ´ BA . Moreover, for d ě the product H d : “ H ˆ ¨ ¨ ¨ ˆ H becomes a matrix Lie group via block-diagonal embedding into GL p dn q , i.e. H d ã Ñ GL p dn q , p h , . . . , h d q ÞÑ diag p h , . . . , h d q . Then the Lie algebra of H d is Lie p H q d “ Lie p H q ˆ ¨ ¨ ¨ ˆ Lie p H q block-diagonally embedded into C dn ˆ dn . If G Ď GL p n q is another matrix Lie group, then G X H is again a matrix Lie group with Lie algebra Lie p G X H q “ Lie p G q X Lie p H q .

Example B.1.
The groups GL(n), SL(n), U(n) and GT(n) are matrix Lie groups with Lie algebras

Lie(GL(n)) = C^{n×n},
Lie(U(n)) = { A ∈ C^{n×n} | A† = −A } = i Herm(n),
Lie(SL(n)) = { A ∈ C^{n×n} | tr(A) = 0 },
Lie(GT(n)) = { A ∈ C^{n×n} | A a diagonal matrix }.

Therefore, also SU(n), ST(n) and U(n) ∩ ST(n) are matrix Lie groups and their Lie algebras are obtained by corresponding intersections of the above Lie algebras. In particular, we have Lie(U(n) ∩ ST(n)) = { i diag(x_1, . . . , x_n) | x_j ∈ R, x_1 + . . . + x_n = 0 }. Thus, we can identify i Lie(U(n) ∩ ST(n)) with the orthogonal complement (1_n)^⊥ ⊆ R^n of the all-ones vector 1_n.

In the following, let G := SL(n)^d for some d. Then K := SU(n)^d is a maximal compact subgroup of G, and T := ST(n)^d and T_K := K ∩ T are maximal tori of G and K, respectively. As explained above, we think of all these groups as matrix Lie subgroups of GL(dn), and hence of their Lie algebras as subsets of C^{dn×dn}.

A rational representation of G = SL(n)^d is a group morphism π : G → GL(V) such that, in some basis of V, the matrix entries of π(g) ∈ GL(V) are polynomials in the matrix entries of g. Such a rational representation of G induces a representation of the Lie algebras, Π : Lie(G) → End(V), A ↦ (d/dt)|_{t=0} π(e^{tA}), with the property π(e^A) = e^{Π(A)} for all A ∈ Lie(G). Restricting π to the commutative subgroup T induces a so-called weight space decomposition of V. That is, there is some finite set Ω(π) ⊆ i Lie(T_K) and a decomposition V = ⊕_{ω∈Ω(π)} V_ω into non-zero subspaces such that each ω ∈ Ω(π) and any v_ω ∈ V_ω satisfy, for all A ∈ Lie(T), π(e^A) v_ω = e^{tr(Aω)} v_ω or, equivalently, Π(A) v_ω = tr(Aω) v_ω. The elements ω ∈ Ω(π) are called weights of π and the v_ω ∈ V_ω are called weight vectors.
Considering Example B.1, we frequently use the identification i Lie p T K q – p K n q d , where K n is the orthogonal complement of n in R n . We note that for ω P i Lie p T K q Ď C dn ˆ dn the Frobenius norm } ω } F becomes under this identification the 2-norm } ω } in p R n q d .

Example B.2.
Let d = 1. The group G = SL(n) acts on C^n by left-multiplication, which induces the rational representation π : SL(n) → GL(n), g ↦ g, with corresponding Lie algebra representation Π : Lie(SL(n)) → C^{n×n}, A ↦ A. For i ∈ [n] we set ε_i := e_i − (1/n) 1_n ∈ K^n ⊆ R^n. For all A = diag(a_1, . . . , a_n) ∈ Lie(T) and all i ∈ [n], π(e^A) e_i = diag(e^{a_1}, . . . , e^{a_n}) e_i = e^{a_i} e_i = e^{tr(A diag(ε_i))} e_i, where in the last step we used a_1 + . . . + a_n = 0. Thus, ε_i ∈ K^n ≅ i Lie(T_K) is a weight of π with weight vector e_i. Since C^n = ⊕_i C e_i, we deduce Ω(π) = { ε_i | i ∈ [n] }. (In other words, a rational representation π is a morphism of affine algebraic groups.)

Example B.3. Of particular importance in representation theory is the adjoint representation: G = SL(n)^d acts on its Lie algebra by conjugation, Ad : G → GL(Lie(G)), g ↦ (A ↦ gAg^{−1}), which induces the representation of Lie algebras ad : Lie(G) → End(Lie(G)), A ↦ (B ↦ [A, B]). The non-zero weights α ∈ Ω(Ad) are called roots of G and the weight spaces Lie(G)_α are called root spaces.

Let d = 1 and for i, j ∈ [n] denote by E_{i,j} the matrix with entry one at position (i, j) and all other entries zero. Then for i, j ∈ [n] with i ≠ j and for all A = diag(a_1, . . . , a_n), B ∈ Lie(T) we compute ad(A) E_{i,j} = [A, E_{i,j}] = (a_i − a_j) E_{i,j} = tr(A diag(e_i − e_j)) E_{i,j} and ad(A)(B) = [A, B] = 0. Since 0_n, e_i − e_j ∈ K^n ≅ i Lie(T_K), we deduce that e_i − e_j ∈ Ω(Ad) with weight vector E_{i,j} and that 0_n ∈ Ω(Ad) with weight vector B ∈ Lie(T). Therefore, the set of roots of SL(n) is { e_i − e_j | i, j ∈ [n], i ≠ j }, because Lie(G) = Lie(T) ⊕ ⊕_{i≠j} C E_{i,j}. More generally, one can deduce that the roots of G = SL(n)^d are the tuples (e_i − e_j, 0_n, . . . , 0_n), (0_n, e_i − e_j, 0_n, . . . , 0_n), . . . , (0_n, . . . , 0_n, e_i − e_j) ∈ (R^n)^d for i, j ∈ [n] with i ≠ j, and that Lie(G) = Lie(T) ⊕ ⊕_α Lie(G)_α. We need the following property of roots, see e.g. [Hal03, Lemma 7.11].
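The computation in Example B.3 is easy to replay numerically. The following sketch is our own illustration: it checks that each E_ij is a weight vector of the adjoint representation with weight e_i − e_j, and that ad(A) vanishes on the diagonal subalgebra Lie(T):

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)

# A traceless diagonal matrix, i.e. an element of Lie(ST(n)).
a = rng.standard_normal(n)
a -= a.mean()
A = np.diag(a)

def E(i, j):
    """Matrix unit E_ij: entry one at position (i, j), zero elsewhere."""
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M

# ad(A) E_ij = [A, E_ij] = (a_i - a_j) E_ij, so E_ij is a root vector
# for the root e_i - e_j of SL(n).
for i in range(n):
    for j in range(n):
        if i != j:
            bracket = A @ E(i, j) - E(i, j) @ A
            assert np.allclose(bracket, (a[i] - a[j]) * E(i, j))

# Diagonal matrices commute, so ad(A)(B) = [A, B] = 0 for B in Lie(T).
b = rng.standard_normal(n)
b -= b.mean()
B = np.diag(b)
assert np.allclose(A @ B - B @ A, 0.0)
```

Since A and E_ij are exact weight-space data, the assertions hold for any choice of the traceless diagonal A, not just the random one drawn here.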
Proposition B.4.
Let α be a root of G “ SL p n q d and let π : G Ñ GL p V q be a rational representation of G . If V ω is the weight space of some weight ω P Ω p π q , then Π ` Lie p G q α ˘ p V ω q Ď V ω ` α , where V ω ` α : “ t 0 u if ω ` α R Ω p π q .

C Padding for tensor margin and tensor gap
The Theorems 2.1 and 4.11 only give, for all n ě , bounds for certain sub-families of tp n, d q | d ě u . Still, we can deduce Theorems 1.3 and 1.6 via some padding on the number of tensor factors d ; that padding is provided in Proposition C.1 below. Recall the representation for tensor scaling π n,d : SL p n q d Ñ GL ´ p C n q b d ¯ , p g , . . . , g d q ÞÑ g b ¨ ¨ ¨ b g d , whose set of weights is Ω p π n,d q “ Ω n,d “ t ε i | i P r n su d Ď p R n q d .

Proposition C.1.
Let G : “ SL p n q d and n, d ě . Consider a set of weights Γ n,d Ď Ω n,d such that 0 R conv p Γ n,d q , i.e. Γ n,d witnesses the inequality γ p Ω n,d q “ γ T p π n,d q ď dist p 0 , conv p Γ n,d qq .

1. Then γ p Ω n,d ` q ď dist ` 0 , conv p Γ n,d q ˘ . Consequently, γ p Ω n,d ` q ď γ p Ω n,d q .

2. If additionally Γ n,d is free, then γ G p π n,d ` r q ď dist ` 0 , conv p Γ n,d q ˘ for all r ě .

Proof. To prove the statement we set, for r ě , ∆ r : “ tp ε i , . . . , ε i q | i P r n su Ď p R n q r and Γ n,d ` r : “ Γ n,d ˆ ∆ r Ď Ω p π n,d ` r q .
By Eq. (2.1) we have 0 P conv p ∆ r q and therefore conv p Γ n,d ` r q “ conv p Γ n,d q ˆ conv p ∆ r q Ě conv p Γ n,d q ˆ t 0 u . The latter implies dist ` 0 , conv p Γ n,d ` r q ˘ ď dist ` 0 , conv p Γ n,d q ˘ . (C.1) Clearly, 0 P conv p Γ n,d ` r q implies 0 P conv p Γ n,d q or, by contraposition, the assumption 0 R conv p Γ n,d q yields 0 R conv p Γ n,d ` r q . The latter for r “ shows γ T p π n,d ` q ď dist ` 0 , conv p Γ n,d ` q ˘ and we conclude the first assertion with Eq. (C.1).

Assume in addition that Γ n,d is free and let r ě . Considering Definition 4.12 and Proposition 4.13 we prove that also Γ n,d ` r is free. For this, let M Ď r n s d be such that Γ M “ Γ n,d and consider p x, i, . . . , i q , p y, j, . . . , j q P M ˆ r n s r with p x, i, . . . , i q ‰ p y, j, . . . , j q . If x ‰ y , then x and y differ in at least two components by freeness of M . If x “ y , then we have i ‰ j and so p x, i, . . . , i q and p y, j, . . . , j q differ in at least two components, using r ě . This shows that Γ n,d ` r is free for r ě . Since also 0 R conv p Γ n,d ` r q we obtain with Proposition 4.8 that γ G p π n,d ` r q ď dist ` 0 , conv p Γ n,d ` r q ˘ holds for all r ě . Finally, we deduce the second statement using Eq. (C.1).

Proposition C.2.
For n ≥ 3 it holds that γ_T(π_{n,4}) ≤ γ_G(π_{n,4}) ≤ 2^{−n+3}.

Proof. This result can be obtained by imitating the proof of Theorem 2.1(b) in Subsection 2.2, using

Γ_{n,4} := {(ε_i, ε_j, ε_k, ε_i) | (i, j, k) ∈ W_n} ⊆ Ω(π_{n,4}).

Clearly, 0 ∉ conv(Γ_{n,4}) as 0 ∉ conv(Γ_{n,3}) by Lemma 2.7. Moreover, one can show with Lemma 2.5 (similar to the proof of Lemma 2.6) that x := −(c/(n−1))·(ε_1, ε_1, ε_1, ε_1) ∈ conv(Γ_{n,4}), where c = (n−1)·2^{−n+2} ≥ 0. Thus, ‖(ε_1, ε_1, ε_1, ε_1)‖ ≤ 2 implies ‖x‖ ≤ (c/(n−1))·2 ≤ 2^{−n+3}. This proves γ_T(π_{n,4}) ≤ 2^{−n+3}.
Since W_n is free by Proposition 4.15, the set {(i, j, k, i) | (i, j, k) ∈ W_n} is free. Hence, we conclude γ_G(π_{n,4}) ≤ 2^{−n+3} with Proposition 4.13 and Proposition 4.8.

D Proof of Lemma 2.10
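Before the proof, a brief numerical aside (not part of the formal argument): the padding inequality dist(0, conv(Γ × Δ_r)) ≤ dist(0, conv(Γ)) of Eq. (C.1) is easy to check on small examples. The sketch below uses toy sets in place of the paper's Γ_{n,d}, and a plain Frank–Wolfe loop (our choice of method, not from the paper) to estimate the distance from the origin to a convex hull.

```python
import numpy as np

def dist_to_hull(points, iters=4000):
    """Approximate dist(0, conv(points)) by Frank-Wolfe on
    f(lam) = ||P^T lam||^2 over the probability simplex."""
    P = np.asarray(points, dtype=float)   # rows are the points
    m = len(P)
    lam = np.full(m, 1.0 / m)
    for k in range(iters):
        x = P.T @ lam                     # current point of the hull
        grad = 2.0 * (P @ x)              # gradient of f w.r.t. lam
        s = np.argmin(grad)               # best simplex vertex
        gamma = 2.0 / (k + 2)
        lam *= 1.0 - gamma
        lam[s] += gamma
    return np.linalg.norm(P.T @ lam)

n = 3
eps = np.eye(n) - np.full((n, n), 1.0 / n)   # eps_i = e_i - (1/n) * all-ones
Gamma = [eps[0], eps[1]]                     # toy Gamma with 0 not in its hull
# Pad with Delta_1 = {eps_i} in a fresh factor: Gamma x Delta_1 inside R^{2n}
Padded = [np.concatenate([g, eps[i]]) for g in Gamma for i in range(n)]

d1 = dist_to_hull(Gamma)
d2 = dist_to_hull(Padded)
print(d1, d2)   # d2 <= d1, matching Eq. (C.1)
```

Padding with one extra factor leaves the distance unchanged here, since the ε_i average to 0 and hence 0 ∈ conv(Δ_1).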
Proof.
For the sake of contradiction assume that 0 ∈ Aff(Γ_{n,2r−1}). Then there are coefficients a_s, b_s, c_s ∈ R, where 1 ≤ s ≤ rn, such that a_1 = … = a_r = b_1 = … = b_r = 0, Σ_s (a_s + b_s + c_s) = 1 and

Σ_{s=1}^{rn} ( a_s ε_{σ(s),σ(1),σ(s)} + b_s ε_{σ(s),σ(s),σ(1)} + c_s ε_{σ(s−1),σ(s),σ(s)} ) = 0 ∈ (R^n)^{6r−3}.  (D.1)

The bulk of our work will consist of proving the equations

b_1 + c_1 = b_2 + c_2 = … = b_{rn} + c_{rn}  (D.2)
a_1 + c_1 = a_2 + c_2 = … = a_{rn} + c_{rn}.  (D.3)

From here we will derive a contradiction. We now set about proving Eqs. (D.2) and (D.3). Rewrite the left-hand side of Eq. (D.1) as the collection, for k ∈ [2r−1], of the following affine linear combinations of ε_1, …, ε_n in R^n:

Σ_{s=1}^{rn} ( a_s ε_{σ_k(s)} + b_s ε_{σ_k(s)} + c_s ε_{σ_k(s−1)} ) = 0  (D.4)
Σ_{s=1}^{rn} ( a_s ε_{σ_k(1)} + b_s ε_{σ_k(s)} + c_s ε_{σ_k(s)} ) = 0  (D.5)
Σ_{s=1}^{rn} ( a_s ε_{σ_k(s)} + b_s ε_{σ_k(1)} + c_s ε_{σ_k(s)} ) = 0.  (D.6)

If we expand these expressions as affine linear combinations of the ε_l, then by Lemma 2.2 the coefficient of ε_l must be 1/n for all l ∈ [n]. Translating this for equations (D.4), (D.5) and (D.6) respectively with 2 ≤ l ≤ n and k ∈ [r], and using for j ∈ [r] that

σ_k( r(l−1) + j − k + 1 ) = ⌈ ( (r(l−1) + j − k + 1) + (k−1) ) / r ⌉ = l,  (D.7)

we get

∀ k ∈ [r], l ∈ {2, 3, …, n}:  Σ_{j=1}^{r} ( a_{r(l−1)+j−k+1} + b_{r(l−1)+j−k+1} + c_{r(l−1)+j−k+2} ) = 1/n  (D.8)
∀ k ∈ [r], l ∈ {2, 3, …, n}:  Σ_{j=1}^{r} ( b_{r(l−1)+j−k+1} + c_{r(l−1)+j−k+1} ) = 1/n  (D.9)
∀ k ∈ [r], l ∈ {2, 3, …, n}:  Σ_{j=1}^{r} ( a_{r(l−1)+j−k+1} + c_{r(l−1)+j−k+1} ) = 1/n  (D.10)

respectively, where we set c_{rn+1} := c_1. Fixing some l ≥ 2 and subtracting Eq. (D.9) with k = 1 from Eq. (D.9) for k = 2, we find a telescoping sum that reduces to b_{r(l−1)} + c_{r(l−1)} = b_{rl} + c_{rl}.
Indeed, subtracting the two yields

0 = Σ_{j=1}^{r} ( b_{r(l−1)+j−1} + c_{r(l−1)+j−1} ) − Σ_{j=1}^{r} ( b_{r(l−1)+j} + c_{r(l−1)+j} )
  = Σ_{j=0}^{r−1} ( b_{r(l−1)+j} + c_{r(l−1)+j} ) − Σ_{j=1}^{r} ( b_{r(l−1)+j} + c_{r(l−1)+j} )
  = ( b_{r(l−1)} + c_{r(l−1)} ) − ( b_{rl} + c_{rl} ).

More generally, for k ∈ [r−1], combining (D.9) for k and k ← k+1 implies b_{rl−k+1} + c_{rl−k+1} = b_{r(l−1)−k+1} + c_{r(l−1)−k+1} for all l = 2, …, n, i.e. for every k ∈ [r−1] we have

c_{r−k+1} = b_{2r−k+1} + c_{2r−k+1} = b_{3r−k+1} + c_{3r−k+1} = … = b_{rn−k+1} + c_{rn−k+1}.  (D.11)

We are still missing the value k = r, or the equations

b_{r+1} + c_{r+1} = b_{2r+1} + c_{2r+1} = … = b_{r(n−1)+1} + c_{r(n−1)+1}.  (D.12)

We obtain this by subtracting, for l = 2, …, n−1, (D.9) for k = 1 and l from (D.9) with k = r and l ← l+1. Indeed,

0 = Σ_{j=1}^{r} ( b_{rl+j−r+1} + c_{rl+j−r+1} ) − Σ_{j=1}^{r} ( b_{r(l−1)+j} + c_{r(l−1)+j} )
  = Σ_{j=2}^{r+1} ( b_{r(l−1)+j} + c_{r(l−1)+j} ) − Σ_{j=1}^{r} ( b_{r(l−1)+j} + c_{r(l−1)+j} )
  = ( b_{rl+1} + c_{rl+1} ) − ( b_{r(l−1)+1} + c_{r(l−1)+1} ).

Lastly, we are missing the equations b_2 + c_2 = b_3 + c_3 = … = b_{r+1} + c_{r+1} for Eq. (D.2). We have not yet used in Eq. (D.5) the values k = r + m with m ∈ [r−1]. For this we note that σ_{r+m}(j) = 1 for j ∈ {r−m+1} ∪ {r+2, r+3, …, 2r}. We use this to apply Lemma 2.2 to (D.5) for ε_1 and k = r + m with m ∈ [r−1] to obtain

b_{r−m+1} + c_{r−m+1} + Σ_{j=2}^{r} ( b_{r+j} + c_{r+j} ) = 1/n.

We need one more equation to eliminate the right-hand term, so we use the following. Lemma 2.2 applied to equation (D.9) for k = 1 and l = 2 yields

Σ_{j=1}^{r} ( b_{r+j} + c_{r+j} ) = 1/n.

Subtracting this equation from the previous one yields b_{r−m+1} + c_{r−m+1} = b_{r+1} + c_{r+1} for all m = 1, …, r−1. Together with the equations (D.11) and (D.12) we conclude Eq. (D.2). Analogously, (D.6) and (D.10) can be used to obtain Eq. (D.3).
To get a contradiction we show that a_s = b_s = c_s = 0 for all s = 1, 2, …, rn.
For this, we set a := Σ_s a_s and b := Σ_s b_s. Eq. (D.7) still applies for l = 1, k = 1, so Lemma 2.2 applied to the coefficient of ε_1 in (D.4), in (D.5) and in (D.6) respectively for k = 1 gives

Σ_{j=1}^{r} c_{j+1} = 1/n,  a + Σ_{j=1}^{r−1} c_{j+1} = 1/n  and  b + Σ_{j=1}^{r−1} c_{j+1} = 1/n

respectively. Subtracting the second equation from the first gives a = c_{r+1}, and reasoning analogously for the third yields a = b = c_{r+1}. Moreover, (D.9) with k = r and l = 2 is Σ_{j=1}^{r} (b_{j+1} + c_{j+1}) = 1/n. Using the latter together with b_1 = … = b_r = 0 and Σ_{j=1}^{r} c_{j+1} = 1/n yields b_{r+1} = 0, and similarly a_{r+1} = 0 via (D.10) with k = r and l = 2.
Since now also a_{r+1} = b_{r+1} = 0, the equation (D.8) with k = r and l = 2 simplifies to Σ_{j=1}^{r} c_{j+2} = 1/n. In conjunction with Σ_{j=1}^{r} c_{j+1} = 1/n we deduce c_2 = c_{r+2} and hence b_{r+2} = 0 = a_{r+2} by (D.2) and (D.3). But now (D.8) with k = r−1 and l = 2 is Σ_{j=1}^{r} c_{j+3} = 1/n and together with Σ_{j=1}^{r} c_{j+2} = 1/n we get c_3 = c_{r+3}. Continuing inductively we obtain

∀ j ∈ [r]:  c_{j+1} = c_{r+j+1}  and  a_{r+j+1} = b_{r+j+1} = 0,

using (D.8) for l = 2, k ∈ [r] and via (D.2), (D.3). Then (D.8) with k = r and l = 3 simplifies to Σ_{j=1}^{r} c_{r+j+2} = 1/n and together with 1/n = Σ_{j=1}^{r} c_{j+1} = Σ_{j=1}^{r} c_{r+j+1} we have c_{r+2} = c_{2r+2}. Hence, b_{2r+2} = 0 = a_{2r+2} via (D.2) respectively (D.3). Continuing inductively in the outlined manner with equation (D.8) for k ∈ [r], l = 2, …, n and with the equations (D.2) and (D.3), we conclude a_s = b_s = 0 for all s = 1, …, rn, so a = b = 0. Finally, (D.2) implies c_{r+1} = c_s for all s = 1, …, rn, but c_{r+1} = b = 0, so that Σ_s (a_s + b_s + c_s) = 0 ≠ 1, giving the desired contradiction.

E Padding and rounding for diameter bounds
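Before the proofs of this appendix, an informal numerical aside: the additive perturbation bound established in the proof of Lemma 4.23 below, log cap(q) ≥ log cap(p) − M‖p − q‖₁, can be illustrated on a toy one-dimensional weight set where the capacity has a closed form. In this sketch Ω = {−1, 0, +1} and M is taken to be max_ω 1/q_ω; both the weight set and this choice of M are assumptions of the sketch (consistent with the bound r_ω/q_ω ≤ M used in that proof), not the paper's definitions.

```python
import numpy as np

# Toy setup: weights Omega = {-1, 0, +1} in R, positive array p on Omega.
# Then f_p(x) = sum_w p_w * exp(x * w), and cap(p) = inf_x f_p(x) has the
# closed form p_0 + 2 * sqrt(p_{-1} * p_{+1})  (set df_p/dx = 0).
def cap(p):
    # p = (p_{-1}, p_0, p_{+1})
    return p[1] + 2.0 * np.sqrt(p[0] * p[2])

rng = np.random.default_rng(0)
for _ in range(200):
    p = rng.uniform(0.05, 1.0, size=3); p /= p.sum()
    q = rng.uniform(0.05, 1.0, size=3); q /= q.sum()
    M = 1.0 / q.min()                  # stand-in for the constant M
    lhs = np.log(cap(q))
    rhs = np.log(cap(p)) - M * np.abs(p - q).sum()
    assert lhs >= rhs - 1e-12          # the bound of Lemma 4.23
print("bound held on all 200 random trials")
```

Via e^x ≥ 1 + x, the same inequality exponentiates to the multiplicative form used in the proof of Lemma 4.22.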
We begin with the proof of Proposition 3.5. We prove it only for d = 3, but the proof goes through mutatis mutandis for all d ≥ 3.

Proof of Proposition 3.5.
Recall that q is the n × n × n array such that q_{ijk} = (t/n)·p_{ijk} for i, j, k ∈ [t], q_{iii} = 1/n for t+1 ≤ i ≤ n, and q_{ijk} = 0 otherwise. We may split the inputs x, y, z ∈ K_n into

x = ( x_1 + α_1 1_t , x_2 − (t/(n−t)) α_1 1_{n−t} ),
y = ( y_1 + α_2 1_t , y_2 − (t/(n−t)) α_2 1_{n−t} ),
z = ( z_1 + α_3 1_t , z_2 − (t/(n−t)) α_3 1_{n−t} ),

where x_1, y_1, z_1 ∈ R^t and x_2, y_2, z_2 ∈ R^{n−t} each sum to zero; write w = (x_1, y_1, z_1). As ‖(x, y, z)‖ ≥ ‖w‖, it is enough to prove that ‖w‖ is large for any approximate minimizer. By optimizing over the α_i and x_2, y_2, z_2 for fixed w, one computes that the optimal value of f_q for any fixed w is f_p(w)^{t/n}. To see this, write

f_q(x, y, z) = (t/n) e^{α_1+α_2+α_3} f_p(w) + e^{−(t/(n−t))(α_1+α_2+α_3)} (1/n) Σ_{i=t+1}^{n} e^{x_i + y_i + z_i}.

First note that for fixed α_i's, the second term is minimized at x_2 = y_2 = z_2 = 0 by Jensen's inequality. Furthermore, the value only depends on α := α_1 + α_2 + α_3. With x_2, y_2, z_2 = 0, we have

f_q(x, y, z) = g(w, α) := (t/n) e^{α} f_p(w) + ((n−t)/n) e^{−(t/(n−t)) α}.

Taking the derivative in α, we see that this is minimized when f_p(w) e^{α} = e^{−(t/(n−t)) α}, or e^{α} = f_p(w)^{−1/(1 + t/(n−t))} = f_p(w)^{−(n−t)/n}. Plugging this value in proves that the optimum is f_p(w)^{t/n}. By concavity of x^{t/n}, provided f_p(w) ≤ 1 we have

f_p(w)^{t/n} − cap(p)^{t/n} ≥ ( (1 − cap(p)^{t/n}) / (1 − cap(p)) ) · ( f_p(w) − cap(p) ).

The first factor in the second term is the slope of the line from (cap(p), cap(p)^{t/n}) to (1, 1). Thus for any ε ≤ 1 − cap(p), any ε-approximate minimizer for f_q has norm at least that of some ( (1 − cap(p)) / (1 − cap(p)^{t/n}) )·ε-approximate minimizer for f_p.

Proof of Lemma 4.23. We use the dual expression: log cap(q) = −inf_{E_r ω = 0} D_KL(r ‖ q), where r ranges over probability distributions on Ω. In particular, log cap(q) ≥ −D_KL(r ‖ q) for any distribution r on Ω with E_r ω = 0.
Let r be a probability distribution on Ω with E_r ω = 0; calculate

log cap(q) ≥ −D_KL(r ‖ q) = −D_KL(r ‖ p) + D_KL(r ‖ p) − D_KL(r ‖ q)
= −D_KL(r ‖ p) + Σ_{ω∈Ω} r_ω log(r_ω / p_ω) − Σ_{ω∈Ω} r_ω log(r_ω / q_ω)
= −D_KL(r ‖ p) + Σ_{ω∈Ω} r_ω ( log q_ω − log p_ω ).

We lower bound log q_ω − log p_ω ≥ (1/q_ω)(q_ω − p_ω) by applying the inequality log x ≤ x − 1 to x = p_ω / q_ω. Hence

log cap(q) ≥ −D_KL(r ‖ p) + Σ_{ω∈Ω} (r_ω / q_ω)(q_ω − p_ω) ≥ −D_KL(r ‖ p) − M ‖p − q‖_1.

Allowing −D_KL(r ‖ p) to tend to log cap(p) completes the proof.

Proof of Lemma 4.22.
Applying Lemma 4.23 with the roles of p and q switched yields log cap(p) ≥ log cap(q) − M‖p − q‖_1. Exponentiating both sides and applying the inequality e^x ≥ 1 + x yields

cap(p) ≥ (1 − M‖p − q‖_1) cap(q).

Note that the minimizer for f_q over B lies in the set S := B ∩ {x : ∀ω, q_ω e^{x·ω} ≤ f_q(0) = ‖q‖_1}. Thus

inf_{x∈B} f_q(x) = inf_{x∈S} f_q(x) ≥ inf_{x∈S} f_p(x) − sup_{x∈S} |f_q(x) − f_p(x)|.

For all x ∈ S, we have e^{x·ω} ≤ ‖q‖_1 / q_ω for all ω ∈ Ω, so

f_q(x) − f_p(x) ≤ Σ_{ω∈Ω} |p_ω − q_ω| e^{x·ω} ≤ Σ_{ω∈Ω} |p_ω − q_ω| ‖q‖_1 / q_ω ≤ ‖p − q‖_1 M ‖q‖_1.

Combining the above inequality with the lower bound for cap(p),

inf_{x∈B} f_q(x) ≥ −M‖q‖_1 ‖p − q‖_1 + (1 + ε) cap(p) ≥ (1 + ε)(1 − M‖p − q‖_1) cap(q) − M‖p − q‖_1 ‖q‖_1.

Acknowledgements
The authors thank Michael Walter, Peter Bürgisser, Visu Makam, and Jason Altschuler for helpful discussions. The second author acknowledges funding by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 787840).

References

[ABA20] Jason M. Altschuler and Enric Boix-Adsera. Polynomial-time algorithms for Multimarginal Optimal Transport problems with structure. 2020. arXiv:2008.03006.
[AGV18] Nima Anari, Shayan Oveis Gharan, and Cynthia Vinzant. Log-concave polynomials, entropy, and a deterministic approximation algorithm for counting bases of matroids. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 35–46. IEEE, 2018.
[AKRS20] Carlos Améndola, Kathlén Kohn, Philipp Reichenbach, and Anna Seigal. Invariant theory and scaling algorithms for maximum likelihood estimation. 2020. arXiv:2003.13662.
[AMS08] P.-A. Absil, R. Mahony, and R. Sepulchre.
Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008. With a foreword by Paul Van Dooren. doi:10.1515/9781400830244.[AV97] Noga Alon and Văn H. Vũ. Anti-Hadamard matrices, coin weighing, threshold gates, and indecomposable hypergraphs.
Journal of Combinatorial Theory, Series A, 79(1):133–160, 1997.[AZGL+
18] Zeyuan Allen-Zhu, Ankit Garg, Yuanzhi Li, Rafael Oliveira, and Avi Wigderson. Operator scaling via geodesically convex optimization, invariant theory and polynomial identity testing. In
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 172–181, 2018.[AZLOW17] Zeyuan Allen-Zhu, Yuanzhi Li, Rafael Oliveira, and Avi Wigderson. Much faster algorithms for matrix scaling. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 890–901. IEEE, 2017.[Bac14] Miroslav Bacák.
Convex analysis and optimization in Hadamard spaces, volume 22. Walter de Gruyter GmbH & Co KG, 2014.[BCMW17] Peter Bürgisser, Matthias Christandl, Ketan D. Mulmuley, and Michael Walter. Membership in moment polytopes is in NP and coNP.
SIAM J. Comput., 46(3):972–991, 2017. doi:10.1137/15M1048859.[BFG+
18] Peter Bürgisser, Cole Franks, Ankit Garg, Rafael Oliveira, Michael Walter, and Avi Wigderson. Efficient algorithms for tensor scaling, quantum marginals, and moment polytopes. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 883–897. IEEE, 2018.[BFG+
19] Peter Bürgisser, Cole Franks, Ankit Garg, Rafael Oliveira, Michael Walter, and Avi Wigderson. Towards a theory of non-commutative optimization: geodesic first and second order methods for moment maps and polytopes. 2019. arXiv:1910.12375.[BGO+
18] Peter Bürgisser, Ankit Garg, Rafael Oliveira, Michael Walter, and Avi Wigderson. Alternating Minimization, Scaling Algorithms, and the Null-Cone Problem from Invariant Theory. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of
Leibniz International Proceedings in Informatics (LIPIcs), pages 24:1–24:20, 2018. doi:10.4230/LIPIcs.ITCS.2018.24.[Bha07] Rajendra Bhatia.
Positive definite matrices. Princeton Series in Applied Mathematics. Princeton University Press, Princeton, NJ, 2007.[BLNW20] Peter Bürgisser, Yinan Li, Harold Nieuwboer, and Michael Walter. Interior-point methods for unconstrained geometric programming and scaling problems. 2020. arXiv:2008.12110.[CFK+
97] James W. Cannon, William J. Floyd, Richard Kenyon, Walter R. Parry, et al. Hyperbolic geometry.
Flavors of geometry, 31:59–115, 1997.[CMTV17] Michael B. Cohen, Aleksander Madry, Dimitris Tsipras, and Adrian Vladu. Matrix scaling and balancing via box constrained Newton's method and interior point methods. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 902–913. IEEE, 2017.[Cut13] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In
Advances in neural information processing systems , pages 2292–2300, 2013.[DK85] Jiri Dadok and Victor Kac. Polar representations.
J. Algebra , 92(2):504–524, 1985. doi:10.1016/0021-8693(85)90136-X .[DM18] Harm Derksen and Visu Makam. Degree bounds for semi-invariant rings of quivers.
J. Pure Appl. Algebra, 222(10):3282–3292, 2018. doi:10.1016/j.jpaa.2017.12.007.[DM20a] Harm Derksen and Visu Makam. Algorithms for orbit closure separation for invariants and semi-invariants of matrices.
Algebra Number Theory, 14(10):2791–2813, 2020. doi:10.2140/ant.2020.14.2791.[DM20b] Harm Derksen and Visu Makam. An exponential lower bound for the degrees of invariants of cubic forms and tensor actions.
Adv. Math., 368:107136, 25, 2020. doi:10.1016/j.aim.2020.107136.[FM20] Cole Franks and Ankur Moitra. Rigorous Guarantees for Tyler's M-estimator via quantum expansion. 2020. arXiv:2002.00071.[Fra02] Matthias Franz. Moment polytopes of projective G-varieties and tensor products of symmetric group representations. J. Lie Theory, 12(2):539–549, 2002.[FS13] Michael A. Forbes and Amir Shpilka. Explicit Noether normalization for simultaneous conjugation via polynomial identity testing. In
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 527–542. Springer, 2013.[GAN99] X. Gual-Arnau and A. M. Naveira. Volume of tubes in noncompact symmetric spaces.
Publ. Math. Debrecen, 54(3-4):313–320, 1999.[GGOW16] Ankit Garg, Leonid Gurvits, Rafael Oliveira, and Avi Wigderson. A deterministic polynomial time algorithm for non-commutative rational identity testing. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 109–117. IEEE, 2016.[GIM+
20] Ankit Garg, Christian Ikenmeyer, Visu Makam, Rafael Oliveira, Michael Walter, and Avi Wigderson. Search Problems in Algebraic Complexity, GCT, and Hardness of Generators for Invariant Rings. In 35th Computational Complexity Conference (CCC 2020), volume 169 of
Leibniz International Proceedings in Informatics (LIPIcs), pages 12:1–12:17, 2020. doi:10.4230/LIPIcs.CCC.2020.12.[GS84] V. Guillemin and S. Sternberg. Convexity properties of the moment mapping. II.
Invent. Math. , 77(3):533–546, 1984. doi:10.1007/BF01388837 .[Gur04a] Leonid Gurvits. Classical complexity and quantum entanglement.
Journal of Computer and System Sciences, 69(3):448–484, 2004.[Gur04b] Leonid Gurvits. Combinatorial and algorithmic aspects of hyperbolic polynomials. 2004. arXiv:math/0404474.[Hal03] Brian C. Hall.
Lie groups, Lie algebras, and representations , volume 222 of
Graduate Texts in Mathematics. Springer-Verlag, New York, 2003. An elementary introduction. doi:10.1007/978-0-387-21554-9.[HM13] Moritz Hardt and Ankur Moitra. Algorithms and hardness for robust subspace recovery. In
Conference on Learning Theory, pages 354–375, 2013.[HM21] Linus Hamilton and Ankur Moitra. No-go Theorem for Acceleration in the Hyperbolic Plane. 2021. arXiv:2101.05657.[IQS18] Gábor Ivanyos, Youming Qiao, and K. V. Subrahmanyam. Constructive non-commutative rank computation is in deterministic polynomial time.
Comput. Complexity, 27(4):561–593, 2018. doi:10.1007/s00037-018-0165-7.[KK96] Bahman Kalantari and Leonid Khachiyan. On the complexity of nonnegative-matrix scaling.
Linear Algebra and its applications , 240:87–103, 1996.[KL05] M. K. Kravtsov and V. E. Lukshin. On some properties of noninteger vertices of a three-index axial transportation polytope.
Tr. Inst. Matematiki NAN Belarusi , 13(2):31–36,2005.[KN79] George Kempf and Linda Ness. The length of vectors in representation spaces. In
Algebraic geometry (Proc. Summer Meeting, Univ. Copenhagen, Copenhagen, 1978) , volume732 of
Lecture Notes in Math. , pages 233–243. Springer, Berlin, 1979.[Kra07] V. M. Kravtsov. Combinatorial properties of noninteger vertices of a polytope in athree-index axial assignment problem.
Kibernet. Sistem. Anal., 43(1):33–44, 189, 2007. doi:10.1007/s10559-007-0023-0.[LHCJ19] Tianyi Lin, Nhat Ho, Marco Cuturi, and Michael I. Jordan. On the complexity of approximating multimarginal optimal transport. 2019. arXiv:1910.00152.[LL14] Nathan Linial and Zur Luria. On the vertices of the d-dimensional Birkhoff polytope.
Discrete & Computational Geometry , 51(1):161–170, 2014.[Mul17] Ketan Mulmuley. Geometric complexity theory V: Efficient algorithms for Noethernormalization.
Journal of the American Mathematical Society , 30(1):225–309, 2017.[Mum65] David Mumford.
Geometric Invariant Theory. Ergebnisse der Mathematik und ihrer Grenzgebiete, Neue Folge, Band 34. Springer-Verlag, Berlin-New York, 1965.[Nes84] Linda Ness. A stratification of the null cone via the moment map.
Amer. J. Math., 106(6):1281–1329, 1984. With an appendix by David Mumford. doi:10.2307/2374395.[PR71] Beresford N. Parlett and Christian Reinsch. Balancing a matrix for calculation of eigenvalues and eigenvectors. In
Handbook for Automatic Computation , pages 315–326.Springer, 1971.[Rus20] Alexander Rusciano. A Riemannian Corollary of Helly’s theorem.
J. Convex Anal. ,27(4):1261–1275, 2020.[Sja98] Reyer Sjamaar. Convexity properties of the moment mapping re-examined.
Adv.Math. , 138(1):46–91, 1998. doi:10.1006/aima.1998.1739 .[SV14] Mohit Singh and Nisheeth K. Vishnoi. Entropy, optimization and counting. In
Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 50–59, 2014.[SV19] Damian Straszak and Nisheeth K. Vishnoi. Maximum entropy distributions: Bit complexity and stability. In
Proceedings of the Thirty-Second Conference on LearningTheory , volume 99 of
Proceedings of Machine Learning Research , pages 2861–2891. PMLR,25–28 Jun 2019. URL: https://proceedings.mlr.press/v99/straszak19a.html , arXiv:1711.02036 .[Wal17] Nolan R. Wallach. Geometric Invariant Theory: Over the real and complex numbers .Universitext. Springer, Cham, 2017. doi:10.1007/978-3-319-65907-7 .[Wey46] Hermann Weyl.
The classical groups: their invariants and representations, volume 45. Princeton University Press, 1946.[ZS16] Hongyi Zhang and Suvrit Sra. First-order methods for geodesically convex optimization. In