Semidefinite Relaxations of Products of Nonnegative Forms on the Sphere
CHENYANG YUAN AND PABLO A. PARRILO
Abstract.
We study the problem of maximizing the geometric mean of d low-degree non-negative forms on the real or complex sphere in n variables. We show that this highly non-convex problem is NP-hard even when the forms are quadratic, and that it is equivalent to optimizing a homogeneous polynomial of degree O(d) on the sphere. The standard Sum-of-Squares based convex relaxation for this polynomial optimization problem requires solving a semidefinite program (SDP) of size n^{O(d)}, with multiplicative approximation guarantees of Ω(1/n). We exploit the compact representation of this polynomial to introduce an SDP relaxation of size polynomial in n and d, and prove that it achieves a constant-factor multiplicative approximation when maximizing the geometric mean of non-negative quadratic forms. We also show that this analysis is asymptotically tight, with a sequence of instances where the gap between the relaxation and the true optimum approaches this constant factor as d → ∞. Next we propose a series of intermediate relaxations of increasing complexity that interpolate to the full Sum-of-Squares relaxation, as well as a rounding algorithm that finds an approximate solution from the solution of any intermediate relaxation. Finally we show that this approach can be generalized to relaxations of products of non-negative forms of any degree.

1. Introduction
Sum-of-squares optimization is a powerful method for constructing hierarchies of relaxations of polynomial optimization problems that converge to the optimal solution at a cost of increasing computational complexity ([Las01], [Par00]). However, computing these relaxations in general requires solving large instances of semidefinite programs (SDPs), which quickly becomes computationally intractable. In particular, to find the Sum-of-Squares decomposition of a dense degree-d polynomial in n variables, the input size alone is of order n^{O(d)}, which is exponential in the degree.

In this paper, we introduce a series of Sum-of-Squares based algorithms to efficiently approximate a class of dense polynomial optimization problems where the polynomials have high degree (where the degree is comparable to the number of variables) but are compactly represented (meaning that they can be efficiently evaluated). One example of such a polynomial is the determinant of an n × n matrix: a degree-n polynomial in its n² entries (thus having exponentially many coefficients) that can nevertheless be computed in polynomial time. The class of polynomials we study in this paper is constructed by taking the product of low-degree non-negative polynomials. For most of the paper, we will focus on the product of positive semidefinite (PSD) forms, corresponding to the product of degree-2 non-negative polynomials.

Definition 1.1.
Let A = (A_1, ..., A_d), where the A_i ∈ K^{n×n} are symmetric/Hermitian PSD matrices and K = R or C. Then

p(x) = ∏_{i=1}^d ⟨x, A_i x⟩,

a degree-2d polynomial in n variables, is a product of PSD forms.

Maximizing the product of PSD forms over the sphere generalizes many different problems in optimization, such as Kantorovich's inequality, optimizing monomials over the sphere, linear polarization constants for Hilbert spaces, approximating permanents of PSD matrices, and portfolio optimization, and can also be interpreted as computing the Nash social welfare for agents with polynomial utility functions. These applications will be further elaborated in Section 2. We also prove in Section 7 that this problem is NP-hard when d = Ω(n), using a reduction from hardness of approximation of MaxCut. Since d can be much greater than n, in order to normalize for d we define our objective to be the geometric mean of the quadratic forms:

Opt(A) := max_{x ∈ K^n, ∥x∥=1} ( ∏_{i=1}^d ⟨x, A_i x⟩ )^{1/d}.    (1)

Sum-of-Squares optimization allows us to create a hierarchy of algorithms of increasing complexity that give better bounds for (1). In general, if the objective is a degree-2d polynomial, the lowest level of the hierarchy is a degree-2d Sum-of-Squares relaxation. This relaxation for (1) is written as follows:
OptSOS_{2d}(A) := min γ^{1/d} s.t. γ∥x∥^{2d} − ∏_{i=1}^d ⟨x, A_i x⟩ is a sum of squares,    (2)

where a polynomial f(x) is a sum of squares if there exist polynomials s_i(x) so that f(x) = Σ_i s_i(x)². The constraint that a degree-2d polynomial in n variables is a sum of squares can be represented by an SDP of size n^{O(d)}. Although techniques exist for reducing the size of this representation for sparse polynomials [KKW05] and polynomials with symmetry [GP04], the polynomial p(x) may not have these properties. Thus OptSOS_{2d}(A) requires solving an SDP of size n^{O(d)}. However, because of the compact representation of this polynomial, one can perhaps hope to do better. In this paper we first present an SDP-based relaxation of Opt(A) as well as a rounding algorithm for this relaxation.

Definition 1.2 (Semidefinite relaxation of (1)). We define
OptSDP(A) to be the optimum of the following SDP-based relaxation of (1):

OptSDP(A) := max_X ( ∏_{i=1}^d ⟨A_i, X⟩ )^{1/d} s.t. X ⪰ 0, Tr(X) = 1,    (3)

where X is symmetric when K = R and Hermitian when K = C.

This relaxation comes from writing ⟨x, A_i x⟩ = ⟨A_i, xx†⟩ in (1) and relaxing the rank-1 matrix xx† to the semidefinite variable X. Finding the value of this relaxation involves solving an SDP with O(n² + d) variables and O(n²d) constraints, compared to the Sum-of-Squares relaxation (2), which involves solving an SDP of size n^{O(d)}. The trade-off is that this relaxation is weaker than Sum-of-Squares (Proposition 6.6):

Opt(A) ≤ OptSOS_{2d}(A) ≤ OptSDP(A).

Nevertheless, we show that its approximation factor is bounded by a constant, compared to the worst-case Ω(1/n) approximation factor of general polynomial optimization algorithms ([BGG+17]). We also give a simple rounding algorithm for an optimal solution X* to (3): sample y ∼ N_K(0, X*) and return x = y/∥y∥, where N_K is a real/complex multivariate Gaussian distribution (see Definition 4.1). The following theorem bounds the multiplicative approximation factor of the relaxation OptSDP(A).

Theorem 1.3.
Suppose there is an optimal solution X* to (3) with rank(X*) = r. Let

L_r(K) = γ + log 2 + ψ(r/2) − log(r/2) < 1.28 if K = R,  and  L_r(K) = γ + ψ(r) − log(r) < 0.58 if K = C,

where ψ(x) = (d/dx) log Γ(x) is the digamma function. Then

e^{−L_r(K)} OptSDP(A) ≤ Opt(A) ≤ OptSDP(A),

which gives us a multiplicative approximation factor of e^{−L_r(K)}.

Since lim_{r→∞} ψ(r) − log(r) = 0, the approximation factor is at least 0.28 when K = R and 0.56 when K = C, and can be improved if we can further bound rank(X*). In particular, since L_1(K) = 0, the rounding algorithm recovers the exact solution when rank(X*) = 1. In Section 3 we explore a few cases where this relaxation is exact, showing that the relaxation (3) is able to exactly recover Kantorovich's inequality (Example 3.2), as well as find the exact optimal solution for optimizing any monomial over the sphere (Section 3.3).

Using a connection to linear polarization constants (Section 5), we show that there exists an asymptotically tight integrality gap instance where the gap between Opt(A) and OptSDP(A) approaches the approximation factor e^{−L_r(K)} as n and d approach infinity. The intuition is to choose A_i = v_i v_i† to be rank-1, where the v_i are symmetrically distributed on the sphere. Because of this symmetry, the rounding algorithm on such an instance samples a uniformly random point on the sphere, completely ignoring the structure of the problem. We plot an example of such a symmetric polynomial in Figure 1.

This also motivates the need for higher-degree relaxations that perform better than (3). In Section 6, we define a series of Sum-of-Squares based relaxations computing OptSOS_k(A), which interpolate between OptSOS_2(A) = OptSDP(A) and OptSOS_{2d}(A), the full Sum-of-Squares relaxation. We also propose a randomized rounding algorithm which allows us to sample a feasible solution from the relaxation.
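The quantities in Theorem 1.3 are concrete enough to compute. The sketch below is our own pure-Python illustration, not code from the paper (the digamma routine is a standard recurrence-plus-asymptotic-series approximation, and all helper names are hypothetical). It tabulates the approximation factor e^{−L_r(K)}, and then runs the rounding procedure on a random real instance with a feasible rank-2 matrix X = UUᵀ of trace 1, checking that the sampled objective beats e^{−L_2(R)} = 1/2 times ∏_i ⟨A_i, X⟩^{1/d}. (The proof in Section 4 only uses feasibility of X, so the bound applies even though this X is not an SDP optimum.)

```python
import math
import random

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def digamma(x):
    """psi(x) via the recurrence psi(x) = psi(x+1) - 1/x plus an asymptotic series."""
    acc = 0.0
    while x < 6:
        acc -= 1 / x
        x += 1
    inv2 = 1 / (x * x)
    return acc + math.log(x) - 1 / (2 * x) - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def L(r, field):
    """L_r(K) from Theorem 1.3."""
    if field == 'R':
        return GAMMA + math.log(2) + digamma(r / 2) - math.log(r / 2)
    return GAMMA + digamma(r) - math.log(r)

# L_1(K) = 0 (rank-1 solutions round exactly); the limits gamma + log 2 and
# gamma give approximation factors of about 0.28 (real) and 0.56 (complex).
print([round(math.exp(-L(r, 'R')), 3) for r in (1, 2, 10, 1000)])

# Rounding demo: random real instance, feasible X = U U^T with Tr X = 1, rank 2.
rng = random.Random(0)
n, d, r = 3, 5, 2

def random_psd(n):
    B = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    return [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

As = [random_psd(n) for _ in range(d)]
U = [[0.5, 0.3], [0.5, -0.3], [0.5, 0.1]]
t = math.sqrt(sum(u * u for row in U for u in row))
U = [[u / t for u in row] for row in U]  # normalize so Tr(U U^T) = 1
X = [[sum(U[i][k] * U[j][k] for k in range(r)) for j in range(n)] for i in range(n)]
relax = math.exp(sum(math.log(sum(A[i][j] * X[i][j] for i in range(n) for j in range(n)))
                     for A in As) / d)  # prod_i <A_i, X>^{1/d}

total, N = 0.0, 20000
for _ in range(N):
    z = [rng.gauss(0, 1) for _ in range(r)]
    y = [sum(U[i][k] * z[k] for k in range(r)) for i in range(n)]
    s = math.sqrt(sum(v * v for v in y))
    y = [v / s for v in y]  # y = Uz / ||Uz||
    total += math.exp(sum(math.log(sum(y[i] * A[i][j] * y[j] for i in range(n)
                                       for j in range(n))) for A in As) / d)
avg = total / N
print(avg / relax, math.exp(-L(r, 'R')))  # ratio should beat e^{-L_2(R)} = 1/2
```

On generic random instances the sampled ratio sits well above the worst-case factor 1/2; the bound is tight only for highly symmetric instances like the ones constructed in Section 5.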
Figure 1 shows the distribution sampled from this rounding algorithm for different values of k, for a "worst case" example with multiple global optima symmetrically distributed on the sphere. We can see that the sampled distribution concentrates towards the true optimum values as k increases. We then analyze the approximation ratio of the rounding algorithm and provide lower bounds on the integrality gap similar to the results in Section 5. Next we extend this relaxation to products of general non-negative forms. Finally, in Section 7, we prove a hardness of approximation result for computing Opt(A) by a reduction to MaxCut.

1.1. Related Work.
There has been recent attention on problems similar to (1). The authors of this paper analyzed a special case of (1) where the A_i are rank-1 matrices, used in an approximation algorithm for the permanent of PSD matrices [YP21] (see Section 2.4 for more details). Barvinok [Bar93] reduced the problem of certifying feasibility for systems of quadratic equations to finding the optimum of (1), and provided a polynomial-time algorithm for solving (1) when d is fixed. In a more recent work [Bar20], he studied the closely related problem of approximating the integral of a product of quadratic forms on the sphere, giving a quasi-polynomial time approximation algorithm. For general polynomial optimization on the sphere, [DW12], [BGG+17] and [FF20] gave bounds on the convergence of the Sum-of-Squares hierarchy. These papers analyzed the convergence of higher levels of the hierarchy (of which
OptSOS_{2d}(A) is the lowest level), proposed rounding algorithms, and bounded their approximation ratios. As noted in the introduction, these methods, when applied to (1), take n^{O(d)} time and only guarantee an Ω(1/n) approximation ratio, as p(x) is a high-degree polynomial.

Finally, we review some strategies for speeding up Sum-of-Squares for different polynomial optimization problems with special structure:
(1) Solving the problem using a weakened but more computationally efficient version of sum of squares, for example using diagonally-dominant or scaled-diagonally-dominant cones instead of the positive semidefinite cone [AH17]. These methods typically sacrifice solution quality for computational tractability, but bounds on their approximation quality are not known.
(2) Reducing the size of the SDPs needed by exploiting special structure in the problem, such as sparsity in [KKW05] and [FSP16] or symmetry in [GP04].
(3) Using spectral methods inspired by sum of squares algorithms to solve average-case problems [HSSS15], [HKP+].

Figure 1. p_ico(x, y, z) is a degree-6 polynomial in 3 variables with icosahedral symmetry (see Example 6.4 for its definition). The 3D plot shows the value of p_ico on the sphere, superimposed on an icosahedron. We compute OptSOS_k for k = 2, ..., 6, relaxations of maximizing p_ico(x, y, z) over the 2-sphere. The 2D plots show samples from the distribution obtained by the rounding algorithm for OptSOS_k, on an equal-area projection of the sphere. The contour plot is of p_ico and shows its 12 maxima on the sphere, and is overlaid on a scatter plot of 10000 points sampled by the rounding algorithm.

1.2. Contributions.
In summary, the main contributions of this paper are:
(1) An SDP-based relaxation (3) and a simple randomized rounding procedure that finds a feasible solution to (1). We then prove that this is a constant-factor approximation algorithm for (1) (Theorem 1.3).
(2) Using a connection to the linear polarization constant problem (Section 2.3), we show an integrality gap (Theorem 5.1) in the relaxation (3) that asymptotically matches the approximation factor shown in Theorem 1.3 as d → ∞.
(3) A strategy (Section 6) to turn degree-2 Sum-of-Squares relaxations of (1) into degree-k relaxations for any 2 ≤ k ≤ 2d, as a way of interpolating between the relaxation (3) and the full degree-2d Sum-of-Squares relaxation. We also propose and implement a rounding algorithm to produce feasible solutions from these relaxations.
(4) We also prove a hardness result by a reduction from MaxCut (Section 7), showing that in the regime d = Ω(n), the problem (1) is NP-hard.

1.3. Notations.
In subsequent sections, we use K to denote either R or C. For any x ∈ C, let x* be its complex conjugate, and |x|² = xx*. For any matrix A ∈ K^{n×m}, let A† = (A*)^T be its conjugate transpose if K = C, or its transpose if K = R. Given a, b ∈ K^n, let ⟨a, b⟩ = a†b be their inner product in K^n, and ∥a∥² = ⟨a, a⟩. A matrix A is Hermitian if A = A†, and is positive semidefinite (PSD) if in addition x†Ax ≥ 0 for all x ∈ K^n. We also denote this by A ⪰ 0. The ⪰ operator induces a partial order called the Löwner order, where A ⪰ B if A − B ⪰ 0.

2. Motivation and Applications
In this section we introduce a variety of problems that can be cast into (1), maximizing the geometric mean of PSD forms over the sphere. In particular, for a few special cases the relaxation OptSDP is exact, corresponding to when d = 2 (Kantorovich's inequality in Section 2.1) or when the A_i are diagonal (optimizing monomials over the sphere in Section 2.2 and portfolio optimization in Section 2.5). This shows that our approach generalizes many other optimization methods and has applications to problems such as finding the linear polarization constant of Hilbert spaces (Section 2.3) and approximating the permanent of PSD matrices (Section 2.4).

2.1. Kantorovich's Inequality.

Proposition 2.1 ([Kan48]). Given a symmetric n × n positive definite matrix A, let λ_1 ≥ ··· ≥ λ_n > 0 be its eigenvalues. Then for all x ∈ R^n:

(x†Ax)(x†A⁻¹x) / (x†x)² ≤ (1/4) ( √(λ_1/λ_n) + √(λ_n/λ_1) )².    (4)

This inequality is used in the analysis of the convergence rate of gradient descent (with exact line search) on quadratic objectives x†Ax + b†x (see, for example, [LY08]); it is used to prove that the error decreases by a factor of ((λ_1 − λ_n)/(λ_1 + λ_n))² with each step taken. It can also be used to bound the efficiency of estimators in noisy linear regression where A is the covariance matrix of the noise [Rag86]. The optimization problem (1) is a generalization of this inequality to higher-degree products. However, unlike in (4), the A_i may not be simultaneously diagonalizable.

2.2. Optimizing Monomials over the Sphere.
Maximizing monomials on the sphere is a special case of (1) where the A_i are diagonal. We can compute the exact value of the maximum of any monomial over the sphere, and we have the following result for K = R (a similar result holds for K = C).

Proposition 2.2. Let x^β = ∏_{i=1}^n x_i^{β_i} be any monomial of degree d = Σ_i β_i. Then

max_{∥x∥=1, x ∈ R^n} (x^β)^{2/d} = (1/d) ∏_{i=1}^n β_i^{β_i/d}.

This result is proven in Appendix B. Since we know the exact value for this special case, it is useful to use this problem to compare different methods of speeding up Sum-of-Squares; in particular, it can serve as a benchmark for the algorithms derived from Sum-of-Squares in [HSSS15] and [BGG+17].

2.3. Linear Polarization Constants for Hilbert Spaces.
When all the A_i in (1) are rank-1, the optimization problem has connections to the linear polarization constant problem:

Definition 2.3 (Linear polarization constant of a normed space). Given a normed space X, let X* be its dual and S_X = {x ∈ X : ∥x∥ = 1} be the sphere with respect to the norm. Then the d-th linear polarization constant of X is given by:

c_d(X) := ( inf_{f_1,...,f_d ∈ S_{X*}} sup_{x ∈ S_X} |f_1(x) ··· f_d(x)| )^{−1}.

This problem has been studied in the papers [PR04], [Mar97], and [MM06]. In particular, it is proved in [Ari98] that c_d(C^d) = d^{d/2}, but the analogous result for R^d is still a conjecture:

Conjecture 2.4 ([PR04]). Let v_1, ..., v_d and x be vectors in R^d. Then

min_{∥v_1∥=1, ..., ∥v_d∥=1} max_{∥x∥=1} | ∏_{i=1}^d ⟨v_i, x⟩ | = d^{−d/2},    (5)

and the minimum is achieved when the v_i are (up to rotation) the basis vectors e_i.

We see that (5) is a minimax problem whose inner maximization is equivalent to solving the following optimization problem:

max_{∥x∥=1} ( ∏_{i=1}^d ⟨v_i, x⟩² )^{1/d},    (6)

which is exactly (1) with A_i = v_i v_i†. Exact values of c_d(K^n) for K = R or C and d > n are not known, but [PR04] computed the asymptotic value lim_{d→∞} c_d(K^n)^{1/d}. We will use these results later to construct integrality gap instances in Sections 5 and 6.5.

2.4. Permanents of PSD Matrices.
Given a matrix M ∈ C^{n×n}, its permanent is defined to be

per(M) = Σ_{σ ∈ S_n} ∏_{i=1}^n M_{i,σ(i)},

where the sum is over all permutations of n elements. If M is Hermitian positive semidefinite (PSD), [AGGS17] and [YP21] analyzed an SDP-based approximation algorithm that produces a simply exponential approximation factor for per(M). Let M = V†V, and let the v_i be the columns of V. In [YP21], the problem of approximating per(M) is related to the problem of maximizing a product of linear forms over the complex sphere,

r(M) := max_{∥x∥² = n} ∏_{i=1}^n |⟨x, v_i⟩|²,

and its convex relaxation rel(M) (obtained in a similar manner as (3)), by showing that

(n!/n^n) r(M) ≤ per(M) ≤ rel(M).

Thus we can approximate per(M) by analyzing the approximation quality of rel(M) as a relaxation of r(M). It is easy to see that r(M) is equivalent to a special case of (1) where the A_i are all rank-1, and the result of Theorem 1.3 applied to this problem gives the same approximation factor for the permanent as [YP21].

2.5. Portfolio Optimization.
Suppose there is a collection of n stocks with returns denoted by r, where r_i > 1 indicates that stock i gained value and r_i < 1 that it lost value. We want to allocate a fraction y_i of our capital to stock i so as to maximize our expected return, and we have the historical returns r^{(1)}, ..., r^{(d)} over d time periods to base our decision on. The strategy employed by [WPM77] is to maximize the geometric mean of the total returns,

max_{y ≥ 0, Σ_i y_i = 1} ( ∏_{i=1}^d ⟨y, r^{(i)}⟩ )^{1/d},

which can be interpreted as rebalancing the portfolio after each time period. This is a special case of (1) in which the A_i are diagonal matrices with r^{(i)} on the diagonal and y_i = x_i². In Section 3.5 we show that in this case the relaxation (3) is exact.

2.6. Nash Social Welfare.
Suppose x is an allocation of a set of divisible resources to d agents, each with a non-negative utility function A_i(x). We can encourage fairness through the choice of objective function; different choices result in different notions of fairness, ranging from the utilitarian max_x (1/d) Σ_i A_i(x) to the egalitarian max_x min_i A_i(x). Interpolating between these is the Nash social welfare objective max_x ( ∏_i A_i(x) )^{1/d}, which is the geometric mean of the utilities. This objective is well-studied for the allocation of indivisible items [CKM+]. In our setting, the utility of agent i is x†A_i x, a non-negative quadratic form in x.

3. Semidefinite Relaxation
Before proving Theorem 1.3, we derive our semidefinite relaxation of the problem and give interpretations of both its primal and dual forms. The insights gained from deriving both the primal and dual relaxations will be helpful in Section 6 when generalizing to higher-degree relaxations. Recall that the polynomial we wish to optimize is

p_A(x) = ∏_{i=1}^d ⟨x, A_i x⟩,

and we want to find an upper bound on p_A(x)^{1/d} over the sphere. One can compute an upper bound using the degree-2d Sum-of-Squares relaxation (2) over the sphere, but this involves solving an SDP of size n^{O(d)}, which is computationally inefficient and does not exploit the compact representation of p_A(x). One computationally efficient upper bound is given by ( ∏_{i=1}^d ∥A_i∥ )^{1/d}, the geometric mean of the spectral norms of the A_i, but it can differ from the true optimum multiplicatively by a factor polynomial in n (see Proposition 2.2). In the next few sections, we will introduce a series of weaker but computationally more efficient bounds, which still have good approximation guarantees.

3.1. Quadratic Upper Bounds.
The first approach uses the arithmetic mean/geometric mean (AM/GM) inequality:

( ∏_{i=1}^d ⟨x, A_i x⟩ )^{1/d} ≤ (1/d) Σ_{i=1}^d ⟨x, A_i x⟩ = (1/d) x† ( Σ_{i=1}^d A_i ) x.

This then becomes an eigenvalue problem. Maximizing this quadratic form over the unit sphere, we obtain the following:
Proposition 3.1.
Let G = Σ_{i=1}^d A_i. Then if ∥x∥ = 1,

p_A(x) ≤ ( λ_max(G) / d )^d.

This technique is powerful enough to prove Kantorovich's inequality (Proposition 2.1), as we will see in the following example. It is an adaptation of Newman's proof in [New60].
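Before the example, Proposition 3.1 is easy to check numerically. The following sketch is our own pure-Python illustration (all helper names are hypothetical): it estimates λ_max(G) by power iteration and confirms that p_A(x) stays below (λ_max(G)/d)^d at random points of the sphere.

```python
import math
import random

rng = random.Random(1)
n, d = 3, 4

def random_psd(n):
    """A random PSD matrix B^T B, stored as a list of rows."""
    B = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    return [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def quad(A, x):
    """<x, A x>."""
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

As = [random_psd(n) for _ in range(d)]
G = [[sum(A[i][j] for A in As) for j in range(n)] for i in range(n)]  # G = sum_i A_i

# Power iteration: G is PSD, so the iterates converge to its top eigenvector.
v = [rng.gauss(0, 1) for _ in range(n)]
for _ in range(1000):
    w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
    nw = math.sqrt(sum(t * t for t in w))
    v = [t / nw for t in w]
lam_max = quad(G, v)  # Rayleigh quotient at the converged vector

bound = (lam_max / d) ** d  # Proposition 3.1
worst = 0.0
for _ in range(200):
    y = [rng.gauss(0, 1) for _ in range(n)]
    s = math.sqrt(sum(t * t for t in y))
    worst = max(worst, math.prod(quad(A, [t / s for t in y]) for A in As))
print(worst, bound)
```

On generic instances the sampled maximum of p_A sits well below the eigenvalue bound; the bound can only be tight when the individual forms balance at the top eigenvector of G.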
Example 3.2 (Proof of Kantorovich's inequality). Since both A and A⁻¹ are positive definite, we can apply the AM/GM inequality to (α x†Ax)(α⁻¹ x†A⁻¹x) for any α > 0; for ∥x∥ = 1,

(x†Ax)(x†A⁻¹x) ≤ (1/4) ( x† (αA + α⁻¹A⁻¹) x )² ≤ (1/4) λ_max(αA + α⁻¹A⁻¹)².

Without loss of generality we assume A and A⁻¹ are diagonal, as they are simultaneously diagonalizable. Choosing α = 1/√(λ_1 λ_n),

λ_max(αA + α⁻¹A⁻¹) = max_i ( λ_i/√(λ_1λ_n) + √(λ_1λ_n)/λ_i ) ≤ √(λ_1/λ_n) + √(λ_n/λ_1).

This is because f(x) = αx + (αx)⁻¹ is convex in x on any positive interval, and a convex function on an interval is maximized at its endpoints.

3.2. Rescaling and Semidefinite Relaxation.
In Example 3.2, in addition to using the AM/GM inequality, we also introduced a scaling factor α to strengthen the inequality. Since the cost function is multilinear in the A_i, we can optimize over all possible rescalings A_i ↦ α_i A_i, with α_i > 0 and ∏_i α_i = 1, to improve the upper bound. Furthermore, the problem of optimizing over such scalings is also convex, since a lower bound on the concave geometric mean ( ∏_{i=1}^d α_i )^{1/d} defines a convex set.

Theorem 3.3.
Given A = (A_1, ..., A_d), the following upper bound holds:

Opt(A) = max_{∥x∥=1} p_A(x)^{1/d} ≤ λ*,

where λ* is the optimum of the following convex program:

min λ s.t. (1/d) Σ_{i=1}^d α_i A_i ⪯ λ I_n, ∏_{i=1}^d α_i ≥ 1, α_i > 0.    (7)

Theorem 3.4.
The following upper bound holds:
Opt(A) ≤ OptSDP(A),

where OptSDP(A) is the optimum of the following convex program:

OptSDP(A) := max_X ( ∏_{i=1}^d ⟨A_i, X⟩ )^{1/d} s.t. Tr(X) = 1, X ⪰ 0.    (8)

Furthermore, (8) is dual to (7), and OptSDP(A) = λ*.

Proof. It is clear that OptSDP(A) is a rank relaxation of Opt(A), obtained by using the variable X instead of xx†. To find the dual of (7), we write the Lagrangian

L(X, γ, α, λ) = λ − ⟨ λI − (1/d) Σ_{i=1}^d α_i A_i, X ⟩ − γ ( ∏_i α_i^{1/d} − 1 ).

Solving for λ, we get the constraint Tr(X) = 1. Solving for α_i and γ, we get

α_i = γ / ⟨A_i, X⟩  and  γ = ( ∏_{i=1}^d ⟨A_i, X⟩ )^{1/d},

and we obtain (8) after substituting these values into the Lagrangian. □
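The primal–dual pair (7)/(8) can be checked numerically on a tiny instance without an SDP solver. The sketch below is ours (names hypothetical): for d = 2 diagonal matrices, the primal (7) reduces to a one-dimensional search over the scaling α (taking α_1 = α, α_2 = 1/α), the dual (8) reduces to a search over diagonal X = diag(t, 1 − t), and the two optimal values coincide, as Theorem 3.4 predicts.

```python
import math

a = [4.0, 0.25]   # diag(A_1)
b = [0.5, 2.0]    # diag(A_2)

# Primal (7): lambda* = min over alpha of lambda_max((alpha*A_1 + A_2/alpha)/2);
# for diagonal matrices the eigenvalues are explicit.
def lam(alpha):
    return max((alpha * ai + bi / alpha) / 2 for ai, bi in zip(a, b))

lam_star = min(lam(math.exp(t / 1000)) for t in range(-3000, 3001))

# Dual (8): for diagonal A_i an optimal X can be taken diagonal, X = diag(t, 1-t).
def obj(t):
    return math.sqrt((a[0] * t + a[1] * (1 - t)) * (b[0] * t + b[1] * (1 - t)))

opt_sdp = max(obj(t / 10000) for t in range(10001))

print(lam_star, opt_sdp)  # the two grid optima agree to grid precision
```

For this instance both searches settle near 1.660, illustrating strong duality; since the matrices commute, Proposition 3.7 below implies this common value is also the true optimum Opt(A).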
Note that the dual objective is log-concave, and maximizing it is a special case of maximizing the determinant of a PSD matrix, which can be solved efficiently using (for example) interior point methods [VBW98].

3.3. Maximizing Monomials over the Sphere.
To get more insight into the role the multipliers α_i play, we consider the special case where p_A(x) = (x^β)² for a monomial x^β. Maximizing a monomial over the sphere is a special case of (1): for each copy of x_i in x^β (there are d of these in total, corresponding to A_1, ..., A_d), set the corresponding A_j to be 1 on the i-th diagonal entry and 0 elsewhere. Next we show that the convex relaxation in Theorem 3.3 achieves the true maximum value. In the relaxation there are d multipliers α_1, ..., α_d, one associated with each copy of some x_i. For each copy of x_i, set its multiplier to be β_i^{−1} ∏_{k=1}^n β_k^{β_k/d}. Thus

λ_max( (1/d) Σ_{j=1}^d α_j A_j ) = λ_max( (1/d) Σ_{j=1}^n Σ_{k=1}^{β_j} β_j^{−1} ∏_{i=1}^n β_i^{β_i/d} e_j e_j† ) = (1/d) ∏_{i=1}^n β_i^{β_i/d}.

Thus the relaxation value is the same as the optimum given by Proposition 2.2. The multipliers α_i play the role of balancing out the terms in the sum.

3.4. Rank of Solutions.
We can bound the rank of the solution X* of the relaxation (8) using a result by Barvinok [Bar02] and Pataki [Pat98]:

Proposition 3.5 (Proposition 13.4 of [Bar02]). For some r > 0, fix k = (r+2)(r+1)/2 symmetric matrices A_1, ..., A_k ∈ R^{n×n}, where n ≥ r + 2, and k real numbers α_1, ..., α_k. If there is a solution X ⪰ 0 to the system

⟨A_i, X⟩ = α_i for i = 1, ..., k,

and the set of all such solutions is bounded, then there is a matrix X ⪰ 0 satisfying the same system with rank X ≤ r.

Indeed, suppose X* is an optimal solution to the relaxation (3); then any solution X to the d + 1 linear equations ⟨X, A_i⟩ = ⟨X*, A_i⟩ and Tr(X) = 1 is also optimal. Proposition 3.5, along with an analogous result in the complex setting [AHZ08], thus implies that the rank of an optimal solution is bounded by O(√d), which helps us bound the approximation factor of this relaxation in the next section.

3.5. Exact Relaxations.
In this section we study a few special cases where the relaxation
OptSDP(A) is exact. The first case is when d = 2, which is a direct result of Proposition 3.5 substituting in k = 3 (so that r = 1).

Proposition 3.6.
When d = 2 and K = R, Opt(A) = OptSDP(A).

This also implies that the bound on Kantorovich's inequality produced by our relaxation is tight. Next we show that the relaxation is tight when the A_i are simultaneously diagonalizable. This also implies that our relaxation finds the optimal solutions to the portfolio optimization (Section 2.5) and monomial optimization (Section 2.2) problems.

Proposition 3.7.
Let A = (A_1, ..., A_d). If all the A_i commute with each other, then Opt(A) = OptSDP(A).

Proof. Since the matrices A_i commute with each other, they are simultaneously diagonalizable: they can be written as A_i = U†D_iU, where D_i is diagonal and U is unitary. Then after a change of variables x ↦ Ux, the relaxation (3) is equivalent to the original problem (1) with the substitution X_ii = x_i². □

4. Rounding Algorithm and Analysis
In this section we present our randomized rounding algorithm for the relaxation
OptSDP(A) (3), and an analysis of its approximation factor (Theorem 1.3). First we state some standard results about generalized chi-squared distributions, after which we will use these results to prove Theorem 1.3.

4.1. Background on Real and Complex Multivariate Gaussians.
In this section we will use a few results involving the expectation of functions of real or complex multivariate Gaussian variables.
Definition 4.1 (Multivariate Gaussian Random Variable). Let x ∼ N_K(0, I_n). If K = R, then its coordinates x_j are i.i.d. standard normal random variables. If K = C, then x_j = (y_j + iz_j)/√2, where y_j and z_j are i.i.d. standard normal random variables.

The random variable z ∼ N_C(0, I_n) is circularly symmetric, meaning that its distribution is invariant under the transformation z ↦ e^{iθ}z for all θ ∈ R. All complex multivariate Gaussians in this paper are circularly symmetric. As with real multivariate Gaussians, a linear transform of the random vector induces a congruence transform of the covariance matrix.

Proposition 4.2 (Invariance under orthogonal/unitary transformations). Given x ∼ N_K(0, Σ) and any matrix A ∈ K^{n×n}, Ax has the distribution N_K(0, AΣA†).

Thus given A = UU† ⪰
0, to sample w ∈ K^n from N_K(0, A), we can first sample x ∼ N_K(0, I) and then let w = Ux. The proof of this proposition, and more about complex multivariate Gaussians, can be found in [Gal]. In particular, this tells us that the distribution N_K(0, I) is invariant under unitary transformations. In the analysis of our rounding procedure, we use some results about the gamma distribution.

Fact 4.3 (Expectation of log of gamma random variable). Let X ∼ Gamma(α, β) be drawn from the gamma distribution, with density p(x; α, β) = Γ(α)^{−1} β^α x^{α−1} e^{−βx}. Then

E[log X] = ψ(α) − log(β),

where ψ(x) = (d/dx) log Γ(x) is the digamma function.

This follows from the fact that the gamma distribution is an exponential family, of which log x is a sufficient statistic (see Section 2.2 of [Kee10] for more details). Next we prove a useful identity.

Fact 4.4. Let (z_1, ..., z_r) ∼ N_K(0, I_r), let γ = lim_{n→∞}(H_n − log n) ≈ 0.577 be the Euler–Mascheroni constant, and let

L_r(K) = γ + log 2 + ψ(r/2) − log(r/2) if K = R,  and  L_r(K) = γ + ψ(r) − log(r) if K = C.

Then

E[ log( (1/r) Σ_{i=1}^r |z_i|² ) ] = E[ log|z_1|² ] + L_r(K).

Proof.
For K = R, Σ_{i=1}^r |z_i|² is a chi-squared random variable with r degrees of freedom, which is equivalent to Gamma(r/2, 1/2). Using Fact 4.3, E[log( (1/r) Σ_{i=1}^r |z_i|² )] = ψ(r/2) − log(r/2). Since ψ(1/2) = −γ − log(4), we get E[log|z_1|²] = −γ − log(2), and subtracting gives the value of L_r(R). We can find L_r(C) with a similar calculation, using the fact that when K = C, 2 Σ_{i=1}^r |z_i|² is a chi-squared random variable with 2r degrees of freedom. □

We need the following result in our proof of Theorem 4.6.
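As an aside, Fact 4.4 is easy to sanity-check by Monte Carlo. The sketch below is our own illustration (K = C, r = 4, hypothetical names): for integer r the digamma function satisfies ψ(r) = −γ + Σ_{k=1}^{r−1} 1/k, so L_r(C) = H_{r−1} − log r has a closed form, and a shared sample estimates the expectation gap on the left-hand side minus the right-hand side.

```python
import math
import random

rng = random.Random(4)
r = 4
# For integer r, psi(r) = -gamma + H_{r-1}, so L_r(C) = gamma + psi(r) - log r
# simplifies to H_{r-1} - log r.
L_exact = sum(1.0 / k for k in range(1, r)) - math.log(r)

def abs2():
    """|z|^2 for z a standard circularly symmetric complex Gaussian."""
    return (rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2) / 2

N = 200_000
gap = 0.0
for _ in range(N):
    zs = [abs2() for _ in range(r)]
    gap += math.log(sum(zs) / r) - math.log(zs[0])
gap_est = gap / N
print(gap_est, L_exact)  # the Monte Carlo estimate tracks L_4(C) = 11/6 - log 4
```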
Proposition 4.5.
Given z ∼ N_K(0, I_r) and an r × r PSD matrix M ⪰ 0 with Tr(M) = 1,

E_z[ log|z_1|² ] ≤ E_z[ log⟨z, Mz⟩ ] ≤ E_z[ log( (1/r) Σ_{i=1}^r |z_i|² ) ].

Proof.
Because of the rotational invariance of z, it suffices to bound

f(λ(M)) = E_z[ log Σ_{i=1}^r λ_i(M) |z_i|² ].

Since f, as a function of λ, is concave and symmetric on the simplex, it is minimized at one of the vertices, so it is lower bounded by setting λ = (1, 0, ..., 0). By symmetry and concavity, f(λ) achieves its maximum when λ = (1/r, ..., 1/r), at the center of the simplex. □

4.2. Proof of Theorem 1.3.
Now we will show that the value of the SDP relaxation (3) is an e^{−L_r(K)} approximation of the optimum, where L_r(K) ≥ 0. Let X* be the dual solution of the SDP, where X* ⪰ 0 and Tr(X*) = 1. Informally, for our rounding algorithm we want to pick a vector from a distribution over the sphere with covariance matrix X*. The following theorem states the rounding algorithm and its approximation factor.

Theorem 4.6.
Given a solution X* to the optimization problem (3) with rank(X*) = r that achieves value OptSDP(A), we produce a feasible solution y with the following rounding procedure:
(1) Sample x ∈ K^n from the multivariate Gaussian distribution N_K(0, X*).
(2) Return the normalized vector y = x/∥x∥.
If y is sampled using this procedure, then

E_y[ ∏_{i=1}^d ⟨y, A_i y⟩^{1/d} ] ≥ e^{−L_r(K)} OptSDP(A).

Since y is always a feasible solution to (1), we have Opt(A) ≥ E_y[ ∏_{i=1}^d ⟨y, A_i y⟩^{1/d} ], and thus Theorem 4.6 implies the lower bound in Theorem 1.3.

Proof of Theorem 4.6. Since X* is a PSD matrix, it can be factored as X* = UU†, where U ∈ K^{n×r} is a rank-r matrix. Another way to sample from a Gaussian distribution with covariance X* is to first sample z ∼ N_K(0, I_r) and set y = Uz/∥Uz∥. Next we compute the expected value of the objective with y sampled from the rounding procedure:

E_z[ ( ∏_{i=1}^d ⟨A_i Uz, Uz⟩/∥Uz∥² )^{1/d} ] = E_z[ exp( (1/d) Σ_{i=1}^d ( log⟨A_i Uz, Uz⟩ − log∥Uz∥² ) ) ]
  ≥ exp( (1/d) Σ_{i=1}^d ( E_z log⟨A_i Uz, Uz⟩ − E_z log∥Uz∥² ) ),

where we have used Jensen's inequality. Next we compute the inner expectations separately. Let M = U†A_iU / Tr(U†A_iU), so that

E_z[ log⟨A_i Uz, Uz⟩ ] = E_z[ log⟨z, Mz⟩ ] + log Tr(U†A_iU) ≥ E_z[ log|z_1|² ] + log Tr(U†A_iU) = E_z[ log|z_1|² ] + log⟨A_i, X*⟩.

Since Tr(M) = 1, the inequality follows from the lower bound in Proposition 4.5. Next note that Tr(U†U) = Tr(UU†) = Tr(X*) = 1. Suppose λ_1, ..., λ_r are the eigenvalues of U†U. Then applying the upper bound in Proposition 4.5 and using Fact 4.4:

E_z[ log∥Uz∥² ] = E_z[ log( Σ_{i=1}^r λ_i |z_i|² ) ] ≤ E_z[ log( (1/r) Σ_{i=1}^r |z_i|² ) ] = E_z[ log|z_1|² ] + L_r(K).

Putting these together, we get

E_y[ ∏_{i=1}^d ⟨y, A_i y⟩^{1/d} ] ≥ exp( (1/d) Σ_{i=1}^d ( log⟨A_i, X*⟩ − L_r(K) ) ) = e^{−L_r(K)} OptSDP(A). □

5. Asymptotically Tight Instances
An integrality gap instance of a relaxation is a problem instance where there is a gap between the true optimum and the value of the relaxation. In this section we provide an asymptotic integrality gap instance for the SDP relaxation (3), showing that the integrality gap approaches the approximation factor for large $n$ and $d$. We do so by drawing a connection to the problem of linear polarization constants on Hilbert spaces.

Theorem 5.1.
For any $\epsilon > 0$, there exist $n, d$ and unit vectors $v_1, \dots, v_d \in \mathbb{K}^n$ (where $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$) so that there is a gap between the true optimum of the optimization problem
\[
\mathrm{Opt}(A) = \max_{\|x\|=1} \left(\prod_{i=1}^d |\langle x, v_i\rangle|^2\right)^{1/d}
\]
and the SDP relaxation $\mathrm{OptSDP}(A)$ given by (3). This gap increases with the dimensions, so that for sufficiently large $n$ and $d$:
\[
e^{-L_n(\mathbb{K})} \;\ge\; \frac{\mathrm{Opt}(A)}{\mathrm{OptSDP}(A)} \;\ge\; e^{-L_n(\mathbb{K})} - \epsilon.
\]
Intuitively, we want to choose the $v_i$ respecting some symmetry, so that the distribution of solutions returned by the SDP relaxation is as symmetric as possible. During the rounding procedure (choosing a single solution out of the distribution) we are then forced to break this symmetry, and this is where the integrality gap arises. One natural choice of an instance with this kind of symmetry is to sample each $v_i$ uniformly at random on the sphere. To find the value of the true optimum, we use a result about the linear polarization constants of Hilbert spaces (recall Definition 2.3):

Theorem 5.2 (Theorems F and 1 of [PR04]). Let $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. Then
\[
\lim_{d \to \infty} c_d(\mathbb{K}^n)^{-2/d} = \frac{1}{n\,e^{L_n(\mathbb{K})}},
\]
and there exists a family of instances $A_d = (v_1 v_1^\dagger, \dots, v_d v_d^\dagger)$ so that $\mathrm{Opt}(A_d)$ converges to this value as $d \to \infty$.

Proof of Theorem 5.1. Applying Theorem 5.2, we can find a family of instances $A_d$ so that $\mathrm{Opt}(A_d)$ converges to $\frac{1}{n}e^{-L_n(\mathbb{K})}$. Next, we bound $\mathrm{OptSDP}(A_d)$. Any solution $\lambda^*$ of the dual form (7) satisfies
\[
\lambda^* I_n \succeq \frac{1}{d}\sum_{i=1}^d \alpha_i v_i v_i^\dagger,
\qquad
n\lambda^* \ge \frac{1}{d}\sum_{i=1}^d \alpha_i \ge \left(\prod_{i=1}^d \alpha_i\right)^{1/d} \ge 1,
\]
where the first inequality is obtained by taking the trace and the second by AM/GM. Thus $\mathrm{OptSDP}(A_d) = \lambda^* \ge 1/n$. Putting this together with Theorem 5.2, we have shown that there is a sequence of instances $A_d$ such that
\[
\lim_{d \to \infty} \frac{\mathrm{Opt}(A_d)}{\mathrm{OptSDP}(A_d)} \le e^{-L_n(\mathbb{K})}. \quad \square
\]

6. A Hierarchy of Relaxations
The relaxation $\mathrm{OptSDP}(A)$ introduced in Definition 1.2 gives us a computationally efficient algorithm to bound the maximum of the geometric mean of PSD forms on the sphere. In this section we discuss a few methods to strengthen $\mathrm{OptSDP}(A)$ using Sum-of-Squares optimization.

In Section 3.2, the SDP formulation of $\mathrm{OptSDP}(A)$ was interpreted as first using the AM/GM inequality to upper bound the original high-degree polynomial by a low-degree polynomial, then optimizing over the degrees of freedom introduced by the relaxation. We can extend this idea to higher degrees using Maclaurin's inequality, a generalization of the AM/GM inequality. Let $\mathcal{S}_k$ be the set of all $k$-tuples chosen from $d$ indices, of size $\binom{d}{k}$. Given $x \in \mathbb{R}^d$, we define the (normalized) elementary symmetric polynomial
\[
E_k(x) = \binom{d}{k}^{-1}\sum_{I \in \mathcal{S}_k}\prod_{i \in I} x_i,
\]
with $E_0 = 1$. For example, $E_1(x) = \frac{1}{d}\sum_{i=1}^d x_i$, $E_2(x) = \binom{d}{2}^{-1}\sum_{i > j} x_i x_j$ and $E_d(x) = x_1 \cdots x_d$. Maclaurin's inequality states that for all $1 \le j \le k \le d$ and $x \ge 0$,
\[
E_k(x)^{1/k} \le E_j(x)^{1/j}. \qquad (9)
\]
In particular, when $j = 1$ and $k = d$ we recover the AM/GM inequality; the intermediate cases generate a series of inequalities interpolating between the arithmetic and geometric means. Since the objective of the optimization problem (1) can be written as $E_d(\langle x, A_1 x\rangle, \dots, \langle x, A_d x\rangle)^{1/d}$, we can get progressively better upper bounds by optimizing $E_k(\langle x, A_1 x\rangle, \dots, \langle x, A_d x\rangle)^{1/k}$ for increasing values of $k$. Since $E_k$ is a degree-$2k$ homogeneous polynomial in $x$, we can use Sum-of-Squares optimization to obtain bounds on its maximum.

6.1. Background on Sum-of-Squares.
Sum-of-Squares optimization is a method of obtaining convex relaxations for polynomial optimization problems ([Las01], [Par00]). Let $p(x)$ be a degree-$2k$ polynomial. We use the notation $p(x) \succeq 0$ to denote that $p(x)$ can be written as a sum of squares, and $p(x) \succeq q(x)$ if $p(x) - q(x) \succeq 0$. Whether a polynomial is a sum of squares can be determined by solving an SDP of size $n^{O(k)}$. The Sum-of-Squares relaxation for maximizing a degree-$2k$ homogeneous polynomial $f(x)$ on the sphere can be written as the following optimization problem with a Sum-of-Squares constraint:
\[
\min \gamma \quad \text{s.t.} \quad \gamma\|x\|^{2k} - f(x) \succeq 0. \qquad (10)
\]
The dual of this problem defines a pseudoexpectation operator $\tilde{\mathbb{E}}$ for each sum of squares constraint.

Definition 6.1 (Homogeneous pseudoexpectation operator). A linear operator $\tilde{\mathbb{E}} : \mathbb{R}[x] \to \mathbb{R}$ on the space of degree-$2k$ homogeneous polynomials is a valid degree-$2k$ homogeneous pseudoexpectation if $\tilde{\mathbb{E}}\left[\|x\|^{2k}\right] = 1$ and $\tilde{\mathbb{E}}\left[f(x)^2\right] \ge 0$ for all degree-$k$ homogeneous polynomials $f(x)$.

The pseudoexpectation $\tilde{\mathbb{E}}$ encodes moments up to degree $2k$, and the dual of a Sum-of-Squares problem can be viewed as optimizing over this truncated moment sequence. Similar to a sum of squares constraint, the constraint that $\tilde{\mathbb{E}}$ is a valid pseudoexpectation can be written as an SDP of size $n^{O(k)}$. Thus the dual of (10) can be written as:
\[
\max \tilde{\mathbb{E}}[f(x)] \quad \text{s.t.} \quad \tilde{\mathbb{E}} \text{ is a valid degree-}2k\text{ homogeneous pseudoexpectation.} \qquad (11)
\]
Next we provide a series of relaxations that interpolates between the relaxations (7) and (2).

6.2. Higher Degree Relaxations.
Given an instance $A = (A_1, \dots, A_d)$ of the problem (1), we can write the following relaxation of $\mathrm{Opt}(A)$ using Maclaurin's inequality:
\[
\mathrm{Opt}(A) \le S_k(A) := \min \lambda^{1/k} \quad \text{s.t.} \quad \lambda\|x\|^{2k} - E_k(\langle x, A_1 x\rangle, \dots, \langle x, A_d x\rangle) \succeq 0. \qquad (12)
\]
This is a relaxation because $\max_{\|x\|=1} E_d^{1/d} \le \max_{\|x\|=1} E_k^{1/k}$ follows from (9), and $S_k(A)$ is a Sum-of-Squares relaxation of the latter problem. As $k$ increases, the approximation improves, until at $k = d$ we recover the standard degree-$d$ Sum-of-Squares relaxation (2). Thus by varying $k$ we have a series of relaxations of increasing degree.

Similar to the relaxation $\mathrm{OptSDP}(A)$ presented in Theorem 3.3, we can also use multipliers to improve $S_k$, arriving at our definition of $\mathrm{OptSOS}_k(A)$:

Definition 6.2.
Let $\mathcal{S}_k$ be the set of all $k$-tuples from $d$ indices, of size $\binom{d}{k}$. Then
\[
\mathrm{OptSOS}_k(A) := \min \lambda^{1/k} \quad \text{s.t.} \quad \lambda\|x\|^{2k} - \binom{d}{k}^{-1}\sum_{I \in \mathcal{S}_k}\alpha_I\prod_{i \in I}\langle x, A_i x\rangle \succeq 0, \quad \prod_{I \in \mathcal{S}_k}\alpha_I \ge 1, \quad \alpha_I > 0. \qquad (13)
\]
With this definition, $\mathrm{OptSOS}_1(A) = \mathrm{OptSDP}(A)$, and $\mathrm{OptSOS}_d(A)$ is equivalent to the degree-$d$ Sum-of-Squares relaxation of the optimization problem $\max_{\|x\|=1}\prod_{i=1}^d\langle x, A_i x\rangle$. Similar to taking the dual of $\mathrm{OptSOS}_d(A)$ in Theorem 3.4, the dual of (13) is equivalent to:
\[
\mathrm{OptSOS}_k(A) = \max \left(\prod_{I \in \mathcal{S}_k}\tilde{\mathbb{E}}_x\left[\prod_{i \in I}\langle x, A_i x\rangle\right]\right)^{1/\left(d\binom{d-1}{k-1}\right)} \quad \text{s.t.} \quad \tilde{\mathbb{E}}_x \text{ is a degree-}2k\text{ homogeneous pseudoexpectation.} \qquad (14)
\]
From the above discussion, we have produced a series of relaxations of increasingly higher Sum-of-Squares degree.

Proposition 6.3.
For all $1 \le k \le d$, $\mathrm{Opt}(A) \le \mathrm{OptSOS}_k(A) \le S_k(A)$.

Because we are taking powers of the polynomials, it is not immediately clear that the values of this series of relaxations improve monotonically as we increase the degree. Next we prove a monotonicity result on the value of the intermediate relaxations $S_k(A)$.

Theorem 6.4.
Given an instance $A = (A_1, \dots, A_d)$, for any $1 \le k < nk \le d$:
\[
S_{nk}(A) \le S_k(A),
\]
where $S_k$ is defined in (12).

From this we can show a partial order on the relaxation values $S_k$, based on the divisibility of their degrees. For example, if $d = 2^m$, Theorem 6.4 implies that $S_1(A) \ge S_2(A) \ge S_4(A) \ge \cdots \ge S_{2^m}(A)$. We use the following lemma, a Sum-of-Squares proof of Maclaurin's inequality, to prove Theorem 6.4.

Proposition 6.5 (Lemma 3 of [FH12]). Given $x \in \mathbb{R}^n$, let $s_1(x), \dots, s_d(x) \succeq 0$ be sum of squares polynomials, and let $E_k(x) = E_k(s_1(x), \dots, s_d(x))$ be the $k$-th elementary symmetric polynomial in the variables $s_1(x), \dots, s_d(x)$. For all $1 \le i \le j \le d-1$ the following sum of squares (in the variable $x$) inequality holds:
\[
E_i(x)\,E_j(x) \succeq E_{i-1}(x)\,E_{j+1}(x). \qquad (15)
\]
We can use (15) to prove Maclaurin's inequality,
\[
E_i(x)^j \succeq E_j(x)^i \quad \text{for } i \le j, \qquad (16)
\]
as well as the following inequality:
\[
E_m(x)^n \succeq E_{mn}(x). \qquad (17)
\]

Proof of Theorem 6.4.
Since $S_k(A)$ is the optimal value of (12), let $\lambda_k^* = S_k(A)^k$, so that
\[
\lambda_k^*\|x\|^{2k} - E_k(A) \succeq 0.
\]
Since $\|x\|^{2k}$ and $E_k(A)$ are both Sum-of-Squares polynomials in $x$, this implies that
\[
(\lambda_k^*)^n\|x\|^{2nk} - E_k(A)^n \succeq 0.
\]
From (17) it follows that
\[
(\lambda_k^*)^n\|x\|^{2nk} - E_{kn}(A) \succeq 0.
\]
Since the above is a feasible solution to the optimization problem (12) defining $S_{kn}(A)$, we have $S_{kn}(A) \le \left((\lambda_k^*)^n\right)^{1/kn} = (\lambda_k^*)^{1/k} = S_k(A)$. $\square$

If we let $k = 1$ and $n = d$, we can introduce the multipliers $\alpha_i$ into this proof to get:

Proposition 6.6. $\mathrm{OptSOS}_d(A) \le \mathrm{OptSOS}_1(A) = \mathrm{OptSDP}(A)$.

It is natural to ask how good an approximation $\mathrm{OptSOS}_k(A)$ is as a function of $k$, and how we can recover a feasible solution from the solution of (13). We will first propose a rounding algorithm for all levels of this relaxation that generalizes the rounding algorithm presented in Section 4, then analyze its approximation ratio in the case where the relaxation is exact. Finally we show a lower bound on the integrality gap of $\mathrm{OptSOS}_k(A)$, and show that this bound decreases as $k$ increases.

6.3. Rounding Algorithm.
Even though the higher-degree relaxations $\mathrm{OptSOS}_k(A)$ provide upper bounds on the true optimum $\mathrm{Opt}(A)$, it is not immediately clear how to produce a feasible solution to the optimization problem (1). Here we describe a general rounding procedure for obtaining a feasible solution from each of the higher-degree relaxations. As a generalization of the rounding algorithm for the quadratic case in Section 4, we first construct a PSD moment matrix $M(v) = U U^\dagger$ (unlike the quadratic case, $M(v)$ is chosen randomly) and generate a feasible point $y$ on the sphere as follows:
(1) Sample $v$ uniformly at random on the unit sphere in $\mathbb{K}^n$.
(2) Sample $x \sim N_{\mathbb{K}}(0, M(v))$, where $M(v) = \tilde{\mathbb{E}}\left[|\langle v, x\rangle|^{2(k-1)}\,x x^\dagger\right] / \tilde{\mathbb{E}}\left[|\langle v, x\rangle|^{2(k-1)}\right]$.
(3) Return $y = x/\|x\|$.
In the proof of Theorem 1.3 in Section 4, we showed that when $k = 1$, the above rounding algorithm produces a solution that achieves a value of at least $e^{-L_r(\mathbb{K})}\,\mathrm{OptSOS}_1(A)$ in expectation, which implies that $\mathrm{OptSOS}_1(A)$ achieves an approximation factor of at least $e^{-L_r(\mathbb{K})}$. One natural question is whether the higher-degree $\mathrm{OptSOS}_k(A)$ improves on this approximation factor.

We answer this question partially by providing a lower bound on the performance of the rounding algorithm for instances $A$ where the relaxation $\mathrm{OptSOS}_k(A)$ is exact. We then state a conjecture involving an identity of pseudoexpectations which, if true, implies that the same bound applies to all instances. Even when the relaxation is exact, this is a non-trivial result: since there can be exponentially many solutions to (1) (see for instance the example in Section 6.4), recovering one solution from the pseudoexpectation in $\mathrm{OptSOS}_k(A)$ is a tensor decomposition problem. For clarity of exposition, we present our result for the case $\mathbb{K} = \mathbb{C}$; an analogous result can also be proved for $\mathbb{K} = \mathbb{R}$.

Theorem 6.7.
Suppose $\mathbb{K} = \mathbb{C}$ and $\mathrm{OptSOS}_k(A) = \mathrm{Opt}(A)$. Let
\[
C(n,k) := \gamma + \frac{(1-\epsilon)^{n-1}\left(-\gamma+\log(1-\epsilon)\right)}{\left(1-\epsilon-\epsilon/(n-1)\right)^{n-1}} - \sum_{\ell=1}^{n-1}\frac{\epsilon\,(1-\epsilon)^{\ell-1}\left(\log(\epsilon/(n-1)) - \varphi(n-\ell)\right)}{(n-1)\left(1-\epsilon-\epsilon/(n-1)\right)^{\ell}},
\]
where $1-\epsilon = k/(k+n-1)$. There exists a vector $v$ so that, given $y$ generated from the above rounding procedure,
\[
\mathbb{E}_y\left[\left(\prod_{i=1}^d\langle y, A_i y\rangle\right)^{1/d}\right] \ge e^{-C(n,k)}\,\mathrm{OptSOS}_k(A), \qquad (18)
\]
where $C(n,k) \ge 0$ is bounded from above by $L_n(\mathbb{C})$ and decreases with increasing $k$.

Proof. First we write the expectation in exponential form and use Jensen's inequality:
\[
\mathbb{E}_y\left[\left(\prod_{i=1}^d\langle y, A_i y\rangle\right)^{1/d}\right]
= \mathbb{E}_y\left[\exp\left(\frac{1}{d}\sum_{i=1}^d \log\langle y, A_i y\rangle\right)\right]
\ge \exp\left(\frac{1}{d}\sum_{i=1}^d \mathbb{E}_y\log\langle y, A_i y\rangle\right)
= \exp\left(\left(d\binom{d-1}{k-1}\right)^{-1}\sum_{I \in \mathcal{S}_k}\sum_{i \in I}\mathbb{E}_y\log\langle y, A_i y\rangle\right).
\]
Next we analyze each term in the sum in the exponential. Let $M(v) = U U^\dagger$; the rounding procedure is equivalent to setting $y = Uw/\|Uw\|$, where $w$ is drawn from a standard multivariate complex Gaussian distribution. Then
\[
\sum_{i \in I}\mathbb{E}_y\log\langle y, A_i y\rangle
= \sum_{i \in I}\left(\log\operatorname{Tr}(U^\dagger A_i U) + \mathbb{E}_w\left[\log\frac{w^\dagger U^\dagger A_i U\,w}{\operatorname{Tr}(U^\dagger A_i U)}\right] - \mathbb{E}_w\left[\log\|Uw\|^2\right]\right)
\ge \sum_{i \in I}\left(\log\langle A_i, M(v)\rangle - \gamma - \mathbb{E}_w\left[\log\|Uw\|^2\right]\right),
\]
where the inequality is implied by Proposition 4.5. Next we bound $\log\langle A_i, M(v)\rangle$ in terms of expectations $\mathbb{E}_{x\sim\mu}$ over the distribution $\mu$ of solutions to (1). We first express $M(v)$ as an expectation over a reweighed distribution $\mu'$. Let
\[
f(x) = \frac{|\langle v, x\rangle|^{2(k-1)}}{\mathbb{E}_{x\sim\mu}\left[|\langle v, x\rangle|^{2(k-1)}\right]},
\qquad
\mathbb{E}_{x\sim\mu'}[g(x)] = \mathbb{E}_{x\sim\mu}\left[f(x)\,g(x)\right],
\]
which is well defined since $f(x) \ge 0$ and $\mathbb{E}_{x\sim\mu}[f(x)] = 1$. Then $M(v) = \mathbb{E}_{x\sim\mu'}\left[x x^\dagger\right]$, and we use Jensen's inequality to show that
\[
\sum_{I \in \mathcal{S}_k}\sum_{i \in I}\log\langle A_i, M(v)\rangle
= \sum_{I \in \mathcal{S}_k}\sum_{i \in I}\log\mathbb{E}_{x\sim\mu'}\left[\langle x, A_i x\rangle\right]
\ge \mathbb{E}_{x\sim\mu'}\left[\sum_{I \in \mathcal{S}_k}\sum_{i \in I}\log\langle x, A_i x\rangle\right].
\]
Now since we are assuming that $\mu$ (and hence $\mu'$) is a distribution over actual solutions,
\[
\sum_{I \in \mathcal{S}_k}\sum_{i \in I}\log\langle x, A_i x\rangle = \log\prod_{I \in \mathcal{S}_k}\mathbb{E}_{x\sim\mu}\left[\prod_{i \in I}\langle x, A_i x\rangle\right],
\]
since this quantity is constant for all $x$ in the support of $\mu$ and $\mu'$, and therefore
\[
\sum_{I \in \mathcal{S}_k}\sum_{i \in I}\log\langle A_i, M(v)\rangle \ge \log\prod_{I \in \mathcal{S}_k}\mathbb{E}_{x\sim\mu}\left[\prod_{i \in I}\langle x, A_i x\rangle\right].
\]
This completes the first part of the proof. Next we need to upper bound $\mathbb{E}_w\left[\log\|Uw\|^2\right]$, which by the results in Appendix A depends on the eigenvalues of $U^\dagger U$, which are the same as the eigenvalues of $M(v)$.
Informally, with high probability one eigenvalue of $M(v)$ will be large while the others will be small, since taking high powers amplifies the gap between the top eigenvalue and the rest. Thus we can use the results in Appendix A to bound the last term. First we compute a lower bound on $\lambda_{\max}(M(v))$ in the case $\mathbb{K} = \mathbb{C}$:
\[
\lambda_{\max}(M(v)) = \lambda_{\max}\left(\frac{\mathbb{E}_x\left[|\langle v, x\rangle|^{2(k-1)}\,x x^\dagger\right]}{\mathbb{E}_x\left[|\langle v, x\rangle|^{2(k-1)}\right]}\right)
\ge \frac{\mathbb{E}_x\left[|\langle v, x\rangle|^{2k}\right]}{\mathbb{E}_x\left[|\langle v, x\rangle|^{2(k-1)}\right]}.
\]
Using the fact that for random variables $X$ and $Y$ with $\Pr(Y > 0) = 1$ we have $\Pr\left(X/Y \ge \mathbb{E}[X]/\mathbb{E}[Y]\right) > 0$, we know that there exists a $v$ such that
\[
\lambda_{\max}(M(v)) \ge \frac{\mathbb{E}_v\mathbb{E}_x\left[|\langle v, x\rangle|^{2k}\right]}{\mathbb{E}_v\mathbb{E}_x\left[|\langle v, x\rangle|^{2(k-1)}\right]}
= \frac{\mathbb{E}_v\left[|v_1|^{2k}\right]}{\mathbb{E}_v\left[|v_1|^{2(k-1)}\right]}
= \binom{n+k-1}{n-1}^{-1}\Bigg/\binom{n+k-2}{n-1}^{-1}
= \frac{k}{k+n-1}.
\]
This also holds if $\mathbb{E}_x$ is a pseudoexpectation instead, since we can interchange expectations and pseudoexpectations and $\mathbb{E}_v\left[|\langle v, x\rangle|^{2k}\right] = \mathbb{E}_v\left[|v_1|^{2k}\right]\|x\|^{2k}$. If we let $k/(k+n-1) = 1-\epsilon$ and suppose that $1-\epsilon \ge 1/n$ (which always holds when $k \ge 1$), then with $f(\lambda) = \mathbb{E}\left[\log\left(\lambda_1|w_1|^2 + \cdots + \lambda_n|w_n|^2\right)\right]$:
\[
\mathbb{E}_w\left[\log\|Uw\|^2\right]
= \mathbb{E}\left[\log\left(\lambda_1|w_1|^2 + \cdots + \lambda_n|w_n|^2\right)\right]
\le \mathbb{E}\left[\log\left(\lambda_1|w_1|^2 + \frac{1-\lambda_1}{n-1}|w_2|^2 + \cdots + \frac{1-\lambda_1}{n-1}|w_n|^2\right)\right]
\le \mathbb{E}\left[\log\left((1-\epsilon)|w_1|^2 + \frac{\epsilon}{n-1}|w_2|^2 + \cdots + \frac{\epsilon}{n-1}|w_n|^2\right)\right].
\]
The last inequality holds because the expectation, as a function of $\epsilon$, is monotonically increasing on the interval $[0, 1-1/n]$. Using the result in Appendix A, we get that if $k \ge 1$, there exists a $v$ so that
\[
\mathbb{E}_z\left[\log\sum_i |z_i|^2\,\lambda_i(M(v))\right]
\le \frac{(1-\epsilon)^{n-1}\left(-\gamma+\log(1-\epsilon)\right)}{\left(1-\epsilon-\epsilon/(n-1)\right)^{n-1}} - \sum_{\ell=1}^{n-1}\frac{\epsilon\,(1-\epsilon)^{\ell-1}\left(\log(\epsilon/(n-1)) - \varphi(n-\ell)\right)}{(n-1)\left(1-\epsilon-\epsilon/(n-1)\right)^{\ell}}
= -\gamma + C(n,k). \quad \square
\]

Figure 2.
Plot of $e^{-C(n,k)}$ for different values of $n$ and $k$. The horizontal line shows the lower bound $e^{-L_r(\mathbb{C})}$.

The analysis of Theorem 6.7 assumes that the solution to the relaxation is an actual distribution over solutions. To analyze the approximation ratio in general, we need to translate the results to pseudodistributions. We first define
\[
\mu(x) := \frac{|\langle v, x\rangle|^{2(k-1)}}{\tilde{\mathbb{E}}_x\left[|\langle v, x\rangle|^{2(k-1)}\right]},
\]
so that $\tilde{\mathbb{E}}[\mu(x)] = 1$. If the following conjecture is true, then $\mathrm{OptSOS}_k(A)$ has an approximation factor of $e^{-C(n,k)}$.

Conjecture 6.8.
\[
\left(\prod_{i=1}^d \tilde{\mathbb{E}}_x\left[\mu(x)\langle x, A_i x\rangle\right]\right)^{\binom{d-1}{k-1}} \ge \prod_{I \in \mathcal{S}_k}\tilde{\mathbb{E}}_x\left[\prod_{i \in I}\langle x, A_i x\rangle\right].
\]
For example, in the case $k = d$, the above inequality reduces to
\[
\prod_{i=1}^d \tilde{\mathbb{E}}\left[\mu(x)\langle x, A_i x\rangle\right]
= \prod_{i=1}^d \frac{\tilde{\mathbb{E}}\left[|\langle v, x\rangle|^{2(d-1)}\langle x, A_i x\rangle\right]}{\tilde{\mathbb{E}}\left[|\langle v, x\rangle|^{2(d-1)}\right]}
\ge \tilde{\mathbb{E}}\left[\prod_{i=1}^d\langle x, A_i x\rangle\right].
\]

6.4. Example: Icosahedral Form.
Let $\varphi = (1+\sqrt{5})/2$ be the golden ratio, and let
\[
p_{\mathrm{ico}}(x,y,z) = \left[5(2\varphi-3)\,(x+\varphi y)(x-\varphi y)(y+\varphi z)(y-\varphi z)(z+\varphi x)(z-\varphi x)\right]^2.
\]
On the sphere $x^2+y^2+z^2 = 1$, $p_{\mathrm{ico}}$ has 62 critical points: 12 maxima on the faces, 20 minima on the vertices and 30 saddle points on the edges of the icosahedron. The normalizing constant is chosen so that $p_{\mathrm{ico}}$ has a maximum of 1 on the sphere. Because of its icosahedral symmetry, $p_{\mathrm{ico}}$ is an example of a polynomial where the gap between the SDP-based relaxation and the true optimum is large. When we solve the relaxation $\mathrm{OptSDP} = \mathrm{OptSOS}_1$ for maximizing $p_{\mathrm{ico}}$ on the sphere, we get $X^* = \frac{1}{3}I$ because of symmetry. Thus the rounding algorithm in Section 4 reduces to sampling a uniformly random point on the sphere, completely ignoring the structure of $p_{\mathrm{ico}}$. However, we can do better by solving the relaxations $\mathrm{OptSOS}_k$ for $k = 2, \dots, 6$. The following table shows the upper bounds obtained for different values of the relaxation parameter $k$, together with lower bounds obtained by applying the rounding algorithm of the previous section and taking the mean of the function values over the returned samples. The quality of the bounds increases with $k$, and when $k = 6$ the relaxation is exact.

k | Rounding lower bound | SoS upper bound
1 | 0.66019 | 1.27454
2 | 0.65575 | 1.16814
3 | 0.80480 | 1.10292
4 | 0.86907 | 1.05821
5 | 0.90546 | 1.02534
6 | 0.92616 | 1.00000

Figure 1 contains a 3D plot of $p_{\mathrm{ico}}$ showing its icosahedral symmetry, as well as 2D scatter plots of points sampled from the rounding algorithm for $k = 2, \dots, 6$. These show the distribution induced by the rounding procedure becoming increasingly concentrated around the optimal points as the degree $k$ increases.

6.5. Quality of Sum-of-Squares Relaxations.
Similar to Section 5, we can show a more general result: even for the Sum-of-Squares relaxations (12), there is an integrality gap depending on the degree of the relaxation.
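Before quantifying this gap, here is a quick numerical sanity check of the icosahedral example in Section 6.4. The sketch below evaluates $p_{\mathrm{ico}}$ at an icosahedron vertex and at a generic point of the sphere; the normalizing constant $5(2\varphi-3)$ is an assumption of this sketch (a reconstruction consistent with a maximum of 1 at the 12 icosahedron vertices), not taken verbatim from the text.

```python
import math

phi = (1 + math.sqrt(5)) / 2  # golden ratio

def p_ico(x, y, z):
    # Product of the six linear forms with icosahedral symmetry, squared.
    # The constant 5*(2*phi - 3) is an assumed normalization making the
    # maximum on the unit sphere equal to 1.
    forms = [x + phi*y, x - phi*y,
             y + phi*z, y - phi*z,
             z + phi*x, z - phi*x]
    prod = 1.0
    for f in forms:
        prod *= f
    return (5 * (2*phi - 3) * prod) ** 2

# An icosahedron vertex, normalized onto the unit sphere.
s = math.sqrt(1 + phi**2)
val_vertex = p_ico(0.0, 1.0/s, phi/s)

# A generic unit vector for comparison.
g = 1 / math.sqrt(3)
val_generic = p_ico(g, g, g)
```

At the vertex $(0,1,\varphi)/\sqrt{1+\varphi^2}$ the product of the six forms equals $\varphi^3/5$, and $5(2\varphi-3)\cdot\varphi^3/5 = (\sqrt{5}-2)(2+\sqrt{5}) = 1$, so `val_vertex` is exactly 1 up to floating-point error, while the generic point gives a much smaller value.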
Theorem 6.9.
For any $k \ge 1$ and $\epsilon > 0$, there exist $n, d$ and unit vectors $v_1, \dots, v_d \in \mathbb{K}^n$ (where $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$) so that there is a gap between the true optimum of the optimization problem
\[
\mathrm{Opt}(A) = \max_{\|x\|=1}\left(\prod_{i=1}^d|\langle x, v_i\rangle|^2\right)^{1/d}
\]
and the value of the degree-$k$ Sum-of-Squares relaxation $\mathrm{OptSoS}_k(A)$ given by (2):
\[
\frac{\mathrm{OptSoS}_k(A)}{\mathrm{Opt}(A)} \ge \frac{n\,e^{L_n(\mathbb{K})}}{n+2k-2} - \epsilon.
\]
To prove this result, we need the following bound on the Sum-of-Squares relaxation:
Proposition 6.10.
Given any instance $A = (v_1 v_1^\dagger, \dots, v_d v_d^\dagger)$, where $v_1, \dots, v_d \in \mathbb{K}^n$ are unit vectors, we have $\mathrm{OptSoS}_k(A) \ge \frac{1}{n+2k-2}$.

Proof.
The Sum-of-Squares algorithm produces a certificate in the form of a pseudoexpectation linear operator satisfying
\[
\tilde{\mathbb{E}}\left[\lambda^*\|x\|^{2k} - \sum_{I \in \mathcal{S}_k}\prod_{i \in I}|\langle x, v_i\rangle|^2\right] \ge 0.
\]
Since the certified polynomial is a sum of squares and hence nonnegative, we can lower bound $\lambda^*$ by taking an expectation over the uniform distribution on the sphere instead. For the complex case, we can convert each term in the expectation into an integral over the complex Gaussian measure $d\mu_n(x)$:
\[
\int_{S^{n-1}_{\mathbb{C}}}\prod_{i=1}^k|\langle x, v_i\rangle|^2\,dx = \frac{(n-1)!}{(n+k-1)!}\int_{\mathbb{C}^n}\prod_{i=1}^k|\langle x, v_i\rangle|^2\,d\mu_n(x).
\]
Then, using the integral representation of the permanent, we can rewrite the integral as
\[
\int_{\mathbb{C}^n}\prod_{i=1}^k|\langle x, v_i\rangle|^2\,d\mu_n(x) = \operatorname{per}(V^\dagger V),
\]
where the $v_i$ are the columns of $V$. Since $V^\dagger V$ is positive semidefinite and has $1$s on its diagonal, by Lieb's theorem [Lie66] its permanent is at least 1. Therefore
\[
\lambda^* \ge \int_{S^{n-1}_{\mathbb{C}}}\sum_{I \in \mathcal{S}_k}\prod_{i \in I}|\langle x, v_i\rangle|^2\,dx
\ge \binom{d}{k}\frac{(n-1)!}{(n+k-1)!}
\ge \binom{d}{k}(n+k-1)^{-k},
\]
where the last inequality comes from bounding each factor via AM/GM. Since $\mathrm{OptSoS}_k(A) = \left[\lambda^*/\binom{d}{k}\right]^{1/k}$, we get the desired bound.

For the real case, we can bound the integral over the sphere with the following result (Theorem 2.2 of [Fre08]): for any $v_1, \dots, v_k \in \mathbb{R}^n$ with $\|v_i\| = 1$, the average of $\prod_{i=1}^k\langle v_i, x\rangle^2$ on the unit sphere $\{x \in \mathbb{R}^n \mid \|x\| = 1\}$ is at least
\[
\frac{\Gamma(n/2)}{2^k\,\Gamma(n/2+k)} = \frac{1}{n(n+2)(n+4)\cdots(n+2k-2)} \ge (n+2k-2)^{-k}.
\]
This, combined with the rest of the argument in the complex case, gives the desired bound. $\square$

Using Proposition 6.10 and the same upper bound on the value of $\mathrm{Opt}(A)$ as in the proof of Theorem 5.1, we prove Theorem 6.9.

6.6. Product of Nonnegative Forms. We can also apply the same technique to produce low-degree relaxations for products of nonnegative forms. Given a product of homogeneous polynomials $p_1(x), \dots, p_d(x)$, each of degree $2\ell$, we can apply Maclaurin's inequality as long as the polynomials are non-negative. Hence we can obtain relaxations of the form $\mathrm{OptSoS}_k$ similar to the optimization problem in Definition 6.2, replacing $\langle x, A_i x\rangle$ with $p_i(x)$. This problem involves solving a degree-$k\ell$ Sum-of-Squares relaxation.
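The permanent bound used in the complex case of Proposition 6.10 above, $\operatorname{per}(V^\dagger V) \ge 1$ whenever the columns $v_i$ of $V$ are unit vectors, is easy to check numerically for small $k$. The three unit vectors below are arbitrary illustrative choices (an assumption of this sketch), and the brute-force $O(k!)$ permanent is only for demonstration:

```python
from itertools import permutations

def permanent(M):
    # Naive permanent via the definition sum_sigma prod_i M[i][sigma(i)];
    # fine for the tiny k used here.
    k = len(M)
    total = 0j
    for sigma in permutations(range(k)):
        prod = 1 + 0j
        for i in range(k):
            prod *= M[i][sigma[i]]
        total += prod
    return total

def inner(u, v):
    # Hermitian inner product <u, v> = sum_i conj(u_i) v_i.
    return sum(a.conjugate() * b for a, b in zip(u, v))

# Three unit vectors in C^3 (illustrative choices).
vs = [
    (1.0, 0.0, 0.0),
    (1 / 2**0.5, 1j / 2**0.5, 0.0),
    (0.5, 0.5, 1 / 2**0.5),
]

# Gram matrix V†V: PSD with 1s on the diagonal.
G = [[inner(u, v) for v in vs] for u in vs]
per = permanent(G)
```

For these vectors the permanent works out to exactly $2.25 \ge 1$, consistent with the lower bound of 1 for PSD matrices with unit diagonal.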
7. Hardness
In this section we investigate the hardness of computing $\mathrm{Opt}(A)$. When $d$ is fixed, a result of Barvinok (Theorem 3.4 in [Bar93]) provides a polynomial-time algorithm for computing (1). However, we shall prove that this problem is hard when $d = \Omega(n)$.

Theorem 7.1.
There exists a constant $\epsilon > 0$ so that for all $d = \Omega(n)$, it is NP-hard to approximate $\mathrm{Opt}(A)$ defined in (1) better than a factor of $(1-\epsilon)^{1/d}$.

This is obtained by a reduction from MaxCut. In our proof we will use a result of [BK98], showing that MaxCut for 3-regular graphs is NP-hard to approximate better than some constant factor bounded away from 1 (for general graphs this factor can be improved to $16/17$ [Hås01]).

Let $G$ be a 3-regular graph with unit edge weights and adjacency matrix $A$. The matrix $Q_G = \frac{1}{4}(3I - A) \succeq 0$ is positive semidefinite, and
\[
\mathrm{MaxCut}(G) = \max_{x \in \{\pm 1/\sqrt{n}\}^n} x^\dagger Q_G\,x.
\]
Next let $\lambda_{\max}(Q_G)$ be the largest eigenvalue of $Q_G$. A result in spectral graph theory (see [Tre12] for example) shows that
\[
\tfrac{1}{2}\lambda_{\max}(Q_G) \le \mathrm{MaxCut}(G) \le \lambda_{\max}(Q_G) \le \tfrac{3}{2}. \qquad (19)
\]
Let
\[
p_G(x) = x^\dagger Q_G\,x\prod_{i=1}^n (n x_i^2)^k
\]
be a product of $d = nk+1$ PSD forms. The following optimization problem is equivalent to an instance of (1), after taking the $d$-th power:
\[
\mathrm{Opt}(G) := \max_{\|x\|=1} p_G(x).
\]
It is easy to show that
$\mathrm{Opt}(G)$ is a relaxation of $\mathrm{MaxCut}(G)$: the feasible set $\|x\| = 1$ includes the boolean cube $\{\pm 1/\sqrt{n}\}^n$, and $\prod_{i=1}^n (n x_i^2)^k = 1$ on this cube.

Proposition 7.2. For any graph $G$, $\mathrm{MaxCut}(G) \le \mathrm{Opt}(G)$.

Next we claim that for all $x$ on the sphere sufficiently far away from the vertices of the boolean hypercube, the value of $p_G(x)$ is upper bounded by $\mathrm{MaxCut}(G)$, thus allowing us to restrict the feasible region to vectors $x$ that are close to a vertex of the hypercube.

Proposition 7.3.
For any $2/k \le \delta < n$, let $\eta = \frac{n}{\sqrt{n+\delta}}$. If $\|x\|_2 = 1$ and $\|x\|_1 \le \eta$, then $p_G(x) \le \mathrm{MaxCut}(G) \le \mathrm{Opt}(G)$. Letting $T_\delta = \{x \in \mathbb{R}^n \mid \|x\|_2 = 1,\ \|x\|_1 \ge \eta\}$, we then have $\mathrm{Opt}(G) = \max_{x \in T_\delta} p_G(x)$.

Proof. We can write any $x$ on the sphere $\|x\|_2 = 1$ as $x = (y+\Delta)/\|y+\Delta\|_2$, where $y \in \{\pm 1/\sqrt{n}\}^n$ and $\Delta$ is orthogonal to $y$ (see Figure 3). Let $y = \mathbf{1}/\sqrt{n}$ without loss of generality and $\|\Delta\|^2 = \delta/n$.

Figure 3. Illustration of the parameterization of the sphere used in the proof of Proposition 7.3.

Then any $x$ in the intersection of the sphere and the non-negative orthant with $\|x\|_1 = \eta$ can be written as
\[
x = \sqrt{\frac{n}{n+\delta}}\left(\mathbf{1}/\sqrt{n} + \Delta\right)
\]
for some $\Delta$ with $\|\mathbf{1}/\sqrt{n}+\Delta\|_1 = \sqrt{n}$ and $\delta \le n$; by construction, $\|x\|_1 = \eta$. Next we bound the product
\[
\prod_{i=1}^n (n x_i^2)^k
= (1+\delta/n)^{-nk}\,n^{nk}\prod_{i=1}^n\left|1/\sqrt{n}+\Delta_i\right|^{2k}
\le (1+\delta/n)^{-nk}\,n^{nk}\left(\frac{1}{n}\sum_{i=1}^n\left|1/\sqrt{n}+\Delta_i\right|\right)^{2nk}
= (1+\delta/n)^{-nk}
\le e^{-k\delta/2},
\]
where we have used the AM/GM inequality, the fact that $\|\mathbf{1}/\sqrt{n}+\Delta\|_1 = \sqrt{n}$, and $(1+x/n)^{-n} \le e^{-x/2}$ for $0 \le x \le n$. Since $x^\dagger Q_G\,x \le \lambda_{\max}(Q_G) \le 2\,\mathrm{MaxCut}(G)$ by (19), if $\delta \ge 2/k$, then $p_G(x) \le \mathrm{MaxCut}(G)$ for all $x$ in the non-negative orthant with $\|x\|_2 = 1$ and $\|x\|_1 \le \eta = \frac{n}{\sqrt{n+\delta}}$. We can then repeat this argument for all other vertices of the hypercube. Geometrically, $T_\delta$ is the union of spherical caps centered around the vertices of the hypercube $\{\pm 1/\sqrt{n}\}^n$; thus for any $x \notin T_\delta$, $p_G(x) \le \mathrm{MaxCut}(G)$, and we can restrict the optimization problem to $T_\delta$. $\square$

This restriction of the feasible set allows us to find an upper bound on
$\mathrm{Opt}(G)$.

Proposition 7.4. There exist universal constants $C$ and $c_0 > 0$ such that for all $k \ge C$, $\mathrm{Opt}(G) < (1+c_0)\,\mathrm{MaxCut}(G)$.

Proof. Any $\hat{x} \in T_\delta$ can be written as $\hat{x} = (y+\Delta)/\sqrt{1+\delta/n}$, where $y \in \{\pm 1/\sqrt{n}\}^n$, $\|\Delta\|^2 \le \delta/n$ and $\langle\Delta, y\rangle = 0$. Then
\[
\hat{x}^\dagger Q_G\,\hat{x}
\le (y+\Delta)^\dagger Q_G\,(y+\Delta)
\le \left(\sqrt{\mathrm{MaxCut}(G)} + \sqrt{\delta\,\lambda_{\max}(Q_G)/n}\right)^2
\le \left(\sqrt{\mathrm{MaxCut}(G)} + \sqrt{2\,\mathrm{MaxCut}(G)\,\delta/n}\right)^2
= \mathrm{MaxCut}(G)\left(1+\sqrt{2\delta/n}\right)^2,
\]
where we used the bound in (19). We get the desired bound by choosing a large enough constant $k$ so that, with $\delta = 2/k$, we have $(1+\sqrt{2\delta/n})^2 < 1+c_0$ for all $n$. $\square$

This shows that for a constant $k$, an algorithm solving $\mathrm{Opt}(G)$ would approximate $\mathrm{MaxCut}(G)$ to within a constant factor arbitrarily close to 1. However, [BK98] showed that this is not possible unless $P = NP$, thus completing the proof of Theorem 7.1.
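The reduction above is easy to exercise on a small example: $K_4$ is 3-regular, the factor $\prod_i (n x_i^2)^k$ equals 1 exactly on the cube $\{\pm 1/\sqrt{n}\}^n$ (so $p_G$ agrees with the normalized cut value there), and it heavily penalizes unit vectors far from the cube. The sketch below is only an illustration of these two facts; the choice $k = 2$ is an arbitrary assumption, not part of the hardness proof.

```python
import math

# K4 is 3-regular: adjacency matrix A = J - I, with n = 4 vertices.
n, k = 4, 2
A = [[0 if i == j else 1 for j in range(n)] for i in range(n)]

# Q_G = (3I - A)/4 is PSD for a 3-regular graph.
Q = [[(3 * (i == j) - A[i][j]) / 4 for j in range(n)] for i in range(n)]

def quad(M, x):
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def p_G(x):
    # x'Q_G x times prod_i (n x_i^2)^k; the product equals 1 exactly
    # on the cube {+-1/sqrt(n)}^n and is < 1 elsewhere on the sphere.
    prod = 1.0
    for xi in x:
        prod *= (n * xi * xi) ** k
    return quad(Q, x) * prod

# A cube vertex encoding the cut ({1,2} vs {3,4}) of K4.
vertex = (0.5, 0.5, -0.5, -0.5)      # = (1, 1, -1, -1)/sqrt(4)
assert all(abs(v) == 1 / math.sqrt(n) for v in vertex)
val_vertex = p_G(vertex)             # equals the normalized cut value

# A unit vector far from every cube vertex is heavily penalized.
val_far = p_G((1.0, 0.0, 0.0, 0.0))
```

Here `val_vertex` equals 1 (the normalized maximum cut of $K_4$ in this scaling), while the coordinate vector is annihilated by the cube factor.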
8. Conclusion
In this paper we studied the problem of maximizing the product of non-negative forms over the sphere. Even though the objective is a dense polynomial of high degree on the sphere, we leveraged its compact representation as a product of low-degree polynomials to formulate a series of computationally efficient relaxations. We then provided bounds on the quality of these relaxations and showed that they are much better than known bounds for approximating general polynomial optimization problems.

A few intriguing questions remain. Although we showed a partial order on the values of the relaxations in Section 6.2, it remains to prove that the values of $\mathrm{OptSOS}_k(A)$ are monotone for increasing values of $k$; numerical experiments suggest that this is the case. Another open problem is to extend the analysis of the performance ratio of the Sum-of-Squares relaxation in Section 6.3 to obtain a bound on its approximation ratio. Answering these questions may require proving identities involving products of pseudoexpectations.

The main tools for formulating the low-degree relaxations in this paper are algebraic inequalities, such as the AM/GM and Maclaurin inequalities, that bound the objective while at the same time reducing the polynomial's degree. This idea may also be applied to other optimization problems with compact representations.

References

[AGGS17] Nima Anari, Leonid Gurvits, Shayan Oveis Gharan, and Amin Saberi,
Simply Exponential Approximation of the Permanent of Positive Semidefinite Matrices, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), October 2017, pp. 914–925.
[AGSS17] Nima Anari, Shayan Oveis Gharan, Amin Saberi, and Mohit Singh, Nash Social Welfare, Matrix Permanent, and Stable Polynomials, 8th Innovations in Theoretical Computer Science Conference (ITCS 2017) (Dagstuhl, Germany) (Christos H. Papadimitriou, ed.), Leibniz International Proceedings in Informatics (LIPIcs), vol. 67, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017, pp. 36:1–36:12.
[AH17] Amir Ali Ahmadi and Georgina Hall, On the construction of converging hierarchies for polynomial optimization based on certificates of global positivity, arXiv:1709.09307 [cs, math] (2017).
[AHZ08] Wenbao Ai, Yongwei Huang, and Shuzhong Zhang, On the Low Rank Solutions for Linear Matrix Inequalities, Mathematics of Operations Research (2008), no. 4, 965–975.
[Ari98] J. Arias-de-Reyna, Gaussian variables, polynomials and permanents, Linear Algebra and its Applications (1998), no. 1-3, 107–114.
[Bar93] Alexander I. Barvinok, Feasibility testing for systems of real quadratic equations, Discrete & Computational Geometry (1993), no. 1, 1–13.
[Bar02] Alexander Barvinok, A Course in Convexity, Graduate Studies in Mathematics, vol. 54, American Mathematical Society, Providence, Rhode Island, November 2002.
[Bar20] Alexander Barvinok, Integrating products of quadratic forms, arXiv:2002.07249 [cs, math] (2020).
[BGG+17] V. Bhattiprolu, M. Ghosh, V. Guruswami, E. Lee, and M. Tulsiani, Weak Decoupling, Polynomial Folds and Approximate Optimization over the Sphere, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), October 2017, pp. 1008–1019.
[BHO09] E. Bjornson, D. Hammarwall, and B. Ottersten, Exploiting Quantized Channel Norm Feedback Through Conditional Statistics in Arbitrarily Correlated MIMO Systems, IEEE Transactions on Signal Processing (2009), no. 10, 4027–4041.
[BK98] Piotr Berman and Marek Karpinski, On Some Tighter Inapproximability Results, Further Improvements, Tech. Report 065, 1998.
[CKM+16] Ioannis Caragiannis, David Kurokawa, Hervé Moulin, Ariel D. Procaccia, Nisarg Shah, and Junxing Wang, The Unreasonable Fairness of Maximum Nash Welfare, Proceedings of the 2016 ACM Conference on Economics and Computation (EC '16), Maastricht, The Netherlands, ACM Press, 2016, pp. 305–322.
[DW12] Andrew C. Doherty and Stephanie Wehner, Convergence of SDP hierarchies for polynomial optimization on the hypersphere, arXiv:1210.5048 [math-ph, physics:quant-ph] (2012).
[FF20] Kun Fang and Hamza Fawzi, The sum-of-squares hierarchy on the sphere and applications in quantum information theory, Mathematical Programming (2020).
[FH12] Péter E. Frenkel and Péter Horváth, Minkowski's inequality and sums of squares, arXiv:1206.5783 [math] (2012).
[Fol01] Gerald B. Folland,
How to Integrate a Polynomial over a Sphere , The American Mathematical Monthly (2001), no. 5, 446–448.[Fre08] P´eter E. Frenkel,
Pfaffians, hafnians and products of real linear functionals , Mathematical ResearchLetters (2008), no. 2, 351–358 (en).[FSP16] Hamza Fawzi, James Saunderson, and Pablo A. Parrilo, Sparse sums of squares on finite abelian groupsand improved semidefinite lifts , Mathematical Programming (2016), no. 1, 149–191 (en).[Gal] Robert G Gallager,
Circularly-Symmetric Gaussian random vectors
Symmetry groups, semidefinite programs, and sums of squares ,Journal of Pure and Applied Algebra (2004), no. 1-3, 95–128.[GS00] Hongsheng Gao and Peter J Smith,
A Determinant Representation for the Distribution of QuadraticForms in Complex Normal Vectors , Journal of Multivariate Analysis (2000), no. 2, 155–165 (en).[H˚as01] Johan H˚astad, Some Optimal Inapproximability Results , J. ACM (2001), no. 4, 798–859.[HKP +
17] Samuel B. Hopkins, Pravesh K. Kothari, Aaron Potechin, Prasad Raghavendra, Tselil Schramm, andDavid Steurer,
The Power of Sum-of-Squares for Detecting Hidden Structures , 2017 IEEE 58th AnnualSymposium on Foundations of Computer Science (FOCS) (Berkeley, CA), IEEE, October 2017, pp. 720–731 (en).[HSSS15] Samuel B. Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer,
Fast spectral algorithms from sum-of-squares proofs: Tensor decomposition and planted sparse vectors , arXiv:1512.02337 [cs, stat] (2015).[Kan48] L. V. Kantorovich,
Functional analysis and applied mathematics , Uspekhi Mat. Nauk (1948), no. 6(28),89–185 (ru).[Kee10] Robert W. Keener, Theoretical statistics: Topics for a core course , Springer Texts in Statistics, Springer,New York, 2010 (en).[KKW05] Masakazu Kojima, Sunyoung Kim, and Hayato Waki,
Sparsity in sums of squares of polynomials , Math-ematical Programming (2005), no. 1, 45–62 (en).[Las01] Jean B. Lasserre,
Global Optimization with Polynomials and the Problem of Moments , SIAM Journal onOptimization (2001), no. 3, 796–817 (en).[Lee17] Euiwoong Lee, APX-hardness of maximizing Nash social welfare with indivisible items , Information Pro-cessing Letters (2017), 17–20 (en).[Lie66] Elliott H. Lieb,
Proofs of some Conjectures on Permanents , Journal of Mathematics and Mechanics (1966), no. 2, 127–134.[LY08] David G. Luenberger and Yinyu Ye, Linear and nonlinear programming , 3rd ed ed., International Seriesin Operations Research and Management Science, Springer, New York, NY, 2008 (en).[Mar97] Marvin Marcus,
A lower bound for the product of linear forms , Linear and Multilinear Algebra (1997),no. 1-3, 115–120.[MM06] M´at´e Matolcsi and Gustavo A. Mu˜noz, On the real linear polarization constant problem , MathematicalInequalities & Applications (2006), no. 3, 485–494 (en).[New60] Morris Newman,
Kantorovich’s inequality , Journal of Research of the National Bureau of StandardsSection B Mathematics and Mathematical Physics (1960), no. 1, 33 (en). Par00] Pablo A. Parrilo,
Structured semidefinite programs and semialgebraic geometry methods in robustnessandoptimization , PhD thesis, California Institute of Technology, 2000.[Pat98] G´abor Pataki,
On the Rank of Extreme Matrices in Semidefinite Programs and the Multiplicity of OptimalEigenvalues , Mathematics of Operations Research (1998) (en).[PR04] Alexandros Pappas and Szil´ard Gy. R´ev´esz,
Linear polarization constants of Hilbert spaces , Journal ofMathematical Analysis and Applications (2004), no. 1, 129–146 (en).[Rag86] M. Raghavachari,
A Linear Programming Proof of Kantorovich’s Inequality , The American Statistician (1986), no. 2, 136–137 (en).[Tre12] Luca Trevisan, Max Cut and the Smallest Eigenvalue , SIAM Journal on Computing (2012), no. 6,1769–1786 (en).[VBW98] Lieven Vandenberghe, Stephen Boyd, and Shao-Po Wu, Determinant Maximization with Linear MatrixInequality Constraints , SIAM Journal on Matrix Analysis and Applications (1998), no. 2, 499–533(en).[WPM77] James H. Vander Weide, David W. Peterson, and Steven F. Maier, A Strategy Which Maximizes theGeometric Mean Return on Portfolio Investments , Management Science (1977), no. 10, 1117–1123.[YP21] Chenyang Yuan and Pablo A. Parrilo, Maximizing products of linear forms, and the permanent of positivesemidefinite matrices , Mathematical Programming (2021) (en).
Appendix A. Expected Log of Generalized Chi-squared Distribution
Given $\lambda_1, \ldots, \lambda_n > 0$, let $z_i \sim \mathcal{N}_{\mathbb{C}}(0, 1)$ be i.i.d. complex Gaussians. We wish to find
\[
\mathbb{E}\left[\log\left(\sum_i \lambda_i |z_i|^2\right)\right].
\]
Using equation (11) from [GS00], the density of the random variable $Z = \sum_i \lambda_i |z_i|^2$ is
\[
f(z) = (-1)^{n-1} \sum_{i=1}^{n} \frac{\lambda_i^{n-2} \exp(-z/\lambda_i)}{\prod_{j \neq i} (\lambda_j - \lambda_i)}.
\]
Suppose the $\lambda_i$ are distinct. Using the integral $\int_0^\infty \log(z) \exp(-z/\lambda_i)\, dz = \lambda_i(-\gamma + \log \lambda_i)$, we get
\[
\mathbb{E}[\log(Z)] = (-1)^{n-1} \sum_{i=1}^{n} \frac{\lambda_i^{n-1} (-\gamma + \log \lambda_i)}{\prod_{j \neq i} (\lambda_j - \lambda_i)}
= -\gamma + (-1)^{n-1} \sum_{i=1}^{n} \frac{\lambda_i^{n-1} \log \lambda_i}{\prod_{j \neq i} (\lambda_j - \lambda_i)}.
\]
The identity in the last step can be proved using different representations of the determinant of a Vandermonde matrix. The sum can be represented as a ratio of determinants. Let
\[
V = \begin{pmatrix}
1 & \lambda_1 & \cdots & \lambda_1^{n-1} \\
1 & \lambda_2 & \cdots & \lambda_2^{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \lambda_n & \cdots & \lambda_n^{n-1}
\end{pmatrix}
\quad \text{and} \quad
\bar{V} = \begin{pmatrix}
1 & \lambda_1 & \cdots & \lambda_1^{n-1} \log \lambda_1 \\
1 & \lambda_2 & \cdots & \lambda_2^{n-1} \log \lambda_2 \\
\vdots & \vdots & \ddots & \vdots \\
1 & \lambda_n & \cdots & \lambda_n^{n-1} \log \lambda_n
\end{pmatrix}.
\]
Then
\[
\mathbb{E}[\log(Z)] = -\gamma + \frac{\det(\bar{V})}{\det(V)}.
\]
Now suppose some of the $\lambda_i$ are repeated. We can then determine the pdf of $Z$ using results from Section II of [BHO09]. In particular, if $\lambda_1 = \lambda$ and $\lambda_2 = \cdots = \lambda_n = \epsilon$, then
\[
f(z) = \frac{1}{\lambda \epsilon^{n-1}} \left( \frac{e^{-z/\lambda}}{(1/\epsilon - 1/\lambda)^{n-1}} + \sum_{\ell=1}^{n-1} \frac{(-1)^{\ell+1} z^{n-1-\ell} e^{-z/\epsilon}}{(n-1-\ell)!\, (1/\lambda - 1/\epsilon)^{\ell}} \right).
\]
Using the integral (where $b \geq 1$ is an integer, $a > 0$, and $\varphi$ denotes the digamma function)
\[
\int_0^\infty x^{b-1} e^{-x/a} \log x \, dx = a^b \Gamma(b) (\log a + \varphi(b)),
\]
we can derive a closed-form expression for $\mathbb{E}[\log(Z)]$:
\[
\mathbb{E}[\log(Z)] = \frac{1}{\lambda \epsilon^{n-1}} \left( \frac{\lambda(-\gamma + \log \lambda)}{(1/\epsilon - 1/\lambda)^{n-1}} + \sum_{\ell=1}^{n-1} \frac{(-1)^{\ell+1} \epsilon^{n-\ell} (\log \epsilon + \varphi(n-\ell))}{(1/\lambda - 1/\epsilon)^{\ell}} \right)
= \frac{\lambda^{n-1}(-\gamma + \log \lambda)}{(\lambda - \epsilon)^{n-1}} - \sum_{\ell=1}^{n-1} \frac{\epsilon \lambda^{\ell-1} (\log \epsilon + \varphi(n-\ell))}{(\lambda - \epsilon)^{\ell}}.
\]
Appendix B. Proof of Proposition 2.2
From [Fol01] we know that, given the monomial $x^\beta = \prod_{i=1}^n x_i^{\beta_i}$, its integral over the real sphere $S^{n-1}$ can be computed as follows:
\[
\int_{S^{n-1}} x^\beta \, dx = \frac{2\, \Gamma(\gamma_1) \cdots \Gamma(\gamma_n)}{\Gamma(\gamma_1 + \cdots + \gamma_n)},
\]
where $\gamma_i = (\beta_i + 1)/2$ (the integral vanishes unless every $\beta_i$ is even). Next, let $d = \sum_i \beta_i$ and take $k \geq 1$. Since the surface measure of the sphere is finite, the normalized $2k$-th moments converge to the maximum:
\[
\max_{\|x\|=1} |x^\beta| = \lim_{k \to \infty} \left( \int_{S^{n-1}} x^{2k\beta} \, dx \right)^{1/(2k)}
= \lim_{k \to \infty} \left( \frac{2 \prod_{i=1}^n \Gamma(k\beta_i + 1/2)}{\Gamma(kd + n/2)} \right)^{1/(2k)}
= \lim_{k \to \infty} \left( \frac{\prod_{i=1}^n (k\beta_i)^{k\beta_i}}{(kd)^{kd}} \right)^{1/(2k)}
= \frac{\prod_{i=1}^n \beta_i^{\beta_i/2}}{d^{d/2}},
\]
where the third step uses Stirling's approximation (the exponential factors cancel since $\sum_i k\beta_i = kd$, and the polynomial prefactors disappear under the $1/(2k)$ power), and the last step uses $\sum_i \beta_i = d$ to cancel the powers of $k$.
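The two ingredients of the proof above, Folland's closed form for the sphere integral and the resulting formula $\max_{\|x\|=1} x^\beta = \prod_i (\beta_i/d)^{\beta_i/2}$, are easy to verify numerically. The following is a small sanity check (not part of the paper) using Monte Carlo sampling on the sphere; the exponent vector and sample size are arbitrary choices for illustration.

```python
# Numerical check of Folland's formula for integrating a monomial x^beta
# over the real sphere S^{n-1} (unnormalized surface measure):
#   int_{S^{n-1}} x^beta dx = 2 * prod_i Gamma(g_i) / Gamma(sum_i g_i),
# with g_i = (beta_i + 1)/2 and all beta_i even.
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(0)
beta = np.array([2, 4, 2])          # even exponents, n = 3 variables
n = len(beta)

# Closed form from [Fol01].
g = (beta + 1) / 2
exact = 2 * np.prod([gamma(gi) for gi in g]) / gamma(g.sum())

# Monte Carlo: uniform points on S^{n-1} via normalized Gaussians;
# the sample mean of x^beta times the surface area estimates the integral.
x = rng.standard_normal((500_000, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)
area = 2 * pi ** (n / 2) / gamma(n / 2)
estimate = area * np.mean(np.prod(x ** beta, axis=1))

print(exact, estimate)              # the two values should agree closely
assert abs(estimate - exact) / exact < 0.05

# The limit formula: max of x^beta on the sphere is prod_i (beta_i/d)^(beta_i/2),
# so no sampled value of x^beta should exceed it.
d = beta.sum()
maxval = np.prod((beta / d) ** (beta / 2.0))
assert np.prod(x ** beta, axis=1).max() <= maxval + 1e-12
```

With these exponents the closed form evaluates to about 0.0399, and the empirical maximum of $x^\beta$ stays below $\prod_i (\beta_i/d)^{\beta_i/2} = 1/64$, consistent with the proposition.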