Lower bounds on matrix factorization ranks via noncommutative polynomial optimization
Sander Gribling · David de Laat · Monique Laurent
Abstract
We use techniques from (tracial noncommutative) polynomial optimization to formulate hierarchies of semidefinite programming lower bounds on matrix factorization ranks. In particular, we consider the nonnegative rank, the positive semidefinite rank, and their symmetric analogues: the completely positive rank and the completely positive semidefinite rank. We study convergence properties of our hierarchies, compare them extensively to known lower bounds, and provide some (numerical) examples.
Keywords
Matrix factorization ranks · Nonnegative rank · Positive semidefinite rank · Completely positive rank · Completely positive semidefinite rank · Noncommutative polynomial optimization
Mathematics Subject Classification (2010)

The first and second authors are supported by the Netherlands Organization for Scientific Research, grant number 617.001.351, and the second author by the ERC Consolidator Grant QPROGRESS 615307. S. Gribling: CWI, Amsterdam. D. de Laat: CWI, Amsterdam. M. Laurent: CWI, Amsterdam, and Tilburg University.

1 Introduction

1.1 Matrix factorization ranks

A factorization of a matrix A ∈ R^{m×n} over a sequence {K_d}_{d∈N} of cones that are each equipped with an inner product ⟨·,·⟩ is a decomposition of the form A = (⟨X_i, Y_j⟩) with X_i, Y_j ∈ K_d for all (i, j) ∈ [m] × [n], for some integer d ∈ N. Following [35], the smallest integer d for which such a factorization exists is called the cone factorization rank of A over {K_d}.

The cones K_d we use in this paper are the nonnegative orthant R^d_+ with the usual inner product and the cone S^d_+ (resp., H^d_+) of d × d real symmetric (resp., Hermitian) positive semidefinite matrices with the trace inner product ⟨X, Y⟩ = Tr(X^T Y) (resp., ⟨X, Y⟩ = Tr(X* Y)). We obtain the nonnegative rank, denoted rank_+(A), which uses the cones K_d = R^d_+, and the positive semidefinite rank, denoted psd-rank_K(A), which uses the cones K_d = S^d_+ for K = R and K_d = H^d_+ for K = C. Both the nonnegative rank and the positive semidefinite rank are defined whenever A is entrywise nonnegative.

The study of the nonnegative rank is largely motivated by the groundbreaking work of Yannakakis [78], who showed that the linear extension complexity of a polytope P is given by the nonnegative rank of its slack matrix. The linear extension complexity of P is the smallest integer d for which P can be obtained as the linear image of an affine section of the nonnegative orthant R^d_+.
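To illustrate the definition (with hypothetical factors chosen for this sketch, not taken from the paper): any choice of nonnegative vectors X_i, Y_j ∈ R^d_+ produces an entrywise nonnegative matrix A = (⟨X_i, Y_j⟩), witnessing rank_+(A) ≤ d by construction.

```python
from fractions import Fraction as F

def inner(u, v):
    # the usual inner product on R^d
    return sum(a * b for a, b in zip(u, v))

# hypothetical nonnegative factors in R^2_+ (so d = 2)
X = [(F(1), F(0)), (F(1), F(1)), (F(0), F(2))]  # one vector per row index, m = 3
Y = [(F(1), F(1)), (F(0), F(3))]                # one vector per column index, n = 2

# A = (<X_i, Y_j>) is entrywise nonnegative, and rank_+(A) <= d = 2 by construction
A = [[inner(Xi, Yj) for Yj in Y] for Xi in X]
assert all(a >= 0 for row in A for a in row)
```

Here A works out to the 3 × 2 matrix with rows (1, 0), (2, 3), (2, 6); the factorization itself certifies the upper bound d on rank_+(A), while the paper's concern is the reverse direction of bounding such ranks from below.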
The slack matrix of P is given by the matrix (b_i − a_i^T v)_{v ∈ V, i ∈ I}, where P = conv(V) and P = {x : a_i^T x ≤ b_i (i ∈ I)} are the point and hyperplane representations of P. Analogously, the semidefinite extension complexity of P is the smallest d such that P is the linear image of an affine section of the cone S^d_+, and it is given by the (real) positive semidefinite rank of its slack matrix [35].

The motivation to study the linear and semidefinite extension complexities is that polytopes with small extension complexity admit efficient algorithms for linear optimization. Well-known examples include spanning tree polytopes [55] and permutahedra [33], which have polynomial linear extension complexity, and the stable set polytope of perfect graphs, which has polynomial semidefinite extension complexity [53] (see, e.g., the surveys [19,26]). The above connection to the nonnegative rank and to the positive semidefinite rank of the slack matrix can be used to show that a polytope does not admit a small extended formulation. Recently this connection was used to show that the linear extension complexities of the traveling salesman, cut, and stable set polytopes are exponential in the number of nodes [30], and this result was extended to their semidefinite extension complexities in [51]. Surprisingly, the linear extension complexity of the matching polytope is also exponential [66], even though linear optimization over this set is polynomial-time solvable [24].
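As a concrete toy instance of a slack matrix (a standard illustration, not an example from the paper), consider the unit square [0,1]² with its four vertices and four facet inequalities; the sketch below computes (b_i − a_i^T v)_{v,i} and checks that it is entrywise nonnegative, as required for the nonnegative rank to apply.

```python
from fractions import Fraction as F

# unit square P = conv(V) = {x : a_i^T x <= b_i, i in I}
V = [(0, 0), (1, 0), (1, 1), (0, 1)]                            # vertices
ineqs = [((-1, 0), 0), ((1, 0), 1), ((0, -1), 0), ((0, 1), 1)]  # pairs (a_i, b_i)

# slack matrix: entry (v, i) equals b_i - a_i^T v
S = [[F(b) - sum(F(c) * x for c, x in zip(a, v)) for (a, b) in ineqs] for v in V]
assert all(s >= 0 for row in S for s in row)
```

By Yannakakis' theorem, rank_+ of this 4 × 4 slack matrix equals the linear extension complexity of the square (here trivially small; the interest of the theorem lies in large polytopes such as those mentioned above).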
It is an open question whether the semidefinite extension complexity of the matching polytope is exponential.

Besides this link to extension complexity, the nonnegative rank also finds applications in probability theory and in communication complexity, and the positive semidefinite rank has applications in quantum information theory and in quantum communication complexity (see, e.g., [56,25,42,30]).

For square symmetric matrices (m = n) we are also interested in symmetric analogues of the above matrix factorization ranks, where we require the same factors for the rows and columns (i.e., X_i = Y_i for all i ∈ [n]). The symmetric analog of the nonnegative rank is the completely positive rank, denoted cp-rank(A), which uses the cones K_d = R^d_+, and the symmetric analog of the positive semidefinite rank is the completely positive semidefinite rank, denoted cpsd-rank_K(A), which uses the cones K_d = S^d_+ if K = R and K_d = H^d_+ if K = C. These symmetric factorization ranks are not always well defined, since not every symmetric nonnegative matrix admits a symmetric factorization by nonnegative vectors or positive semidefinite matrices. The symmetric matrices for which these parameters are well defined form convex cones known as the completely positive cone, denoted CP^n, and the completely positive semidefinite cone, denoted CS^n_+. We have the inclusions CP^n ⊆ CS^n_+ ⊆ S^n_+, which are known to be strict for n ≥
5. For details on these cones see [7,18,50] and references therein.

Motivation for the cones CP^n and CS^n_+ comes in particular from their use to model classical and quantum information optimization problems. For instance, graph parameters such as the stability number and the chromatic number can be written as linear optimization problems over the completely positive cone [45], and the same holds, more generally, for quadratic problems with mixed binary variables [14]. The cp-rank is widely studied in the linear algebra community; see, e.g., [7,69,68,11].

The completely positive semidefinite cone was first studied in [50] to describe quantum analogues of the stability number and of the chromatic number of a graph. This was later extended to general graph homomorphisms in [72] and to graph isomorphism in [3]. In addition, as shown in [54,72], there is a close connection between the completely positive semidefinite cone and the set of quantum correlations. This also gives a relation between the completely positive semidefinite rank and the minimal entanglement dimension necessary to realize a quantum correlation. This connection has been used in [62,39,63] to construct matrices whose completely positive semidefinite rank is exponentially large in the matrix size. For the special case of synchronous quantum correlations the minimum entanglement dimension is directly given by the completely positive semidefinite rank of a certain matrix (see [38]).

The following inequalities hold for the nonnegative rank and the positive semidefinite rank: we have

psd-rank_C(A) ≤ psd-rank_R(A) ≤ rank_+(A) ≤ min{m, n}

for any m × n nonnegative matrix A, and cp-rank(A) ≤ binom(n+1, 2) = n(n+1)/2 for any n × n completely positive matrix A. However, the situation for the cpsd-rank is very different. Exploiting the connection between the completely positive semidefinite cone and quantum correlations, it follows from results in [73] that the cone CS^n_+ is not closed for n ≥
10. As a consequence there does not exist an upper bound on the cpsd-rank as a function of the matrix size. For small matrix sizes very little is known. It is an open problem whether CS^5_+ is closed, and we do not even know how to construct a 5 × 5 matrix in CS^5_+ \ CP^5. The parameters rank_+, cp-rank, and psd-rank are known to be computable; this follows using results from [65], since upper bounds exist on these factorization ranks that depend only on the matrix size; see [6] for a proof for the case of the cp-rank. But computing the nonnegative rank is NP-hard [76]. In fact, determining the rank_+ and the psd-rank of a matrix are both equivalent to the existential theory of the reals [70,71]. For the cp-rank and the cpsd-rank no such results are known, but there is no reason to assume they are any easier. In fact it is not even clear whether the cpsd-rank is computable in general.
To obtain upper bounds on the factorization rank of a given matrix one can employ heuristics that try to construct small factorizations. Many such heuristics exist for the nonnegative rank (see the overview [31] and references therein), factorization algorithms exist for completely positive matrices (see the recent paper [40], also [21] for structured completely positive matrices), and algorithms to compute positive semidefinite factorizations are presented in the recent work [75]. In this paper we want to compute lower bounds on matrix factorization ranks, which we achieve by employing a relaxation approach based on (noncommutative) polynomial optimization.

1.2 Contributions and connections to existing bounds

In this work we provide a unified approach to obtain lower bounds on the four matrix factorization ranks mentioned above, based on tools from (noncommutative) polynomial optimization. We sketch the main ideas of our approach in Section 1.4 below, after having introduced some necessary notation and preliminaries about (noncommutative) polynomials in Section 1.3. We then indicate in Section 1.5 how our approach relates to the more classical use of polynomial optimization dealing with the minimization of polynomials over basic closed semialgebraic sets. The main body of the paper consists of four sections, each dealing with one of the four matrix factorization ranks. We start by presenting our approach for the completely positive semidefinite rank and then explain how to adapt it to the other ranks.

For our results we need several technical tools about linear forms on spaces of polynomials, both in the commutative and noncommutative setting. To ease the readability of the paper we group these technical tools in Appendix A. Moreover, we provide full proofs, so that our paper is self-contained.
In addition, some of the proofs might differ from the customary ones in the literature, since our treatment in this paper is consistently on the ‘moment’ side rather than using real algebraic results about sums of squares.

In Section 2 we introduce our approach for the completely positive semidefinite rank. We start by defining a hierarchy of lower bounds

ξ^cpsd_1(A) ≤ ξ^cpsd_2(A) ≤ . . . ≤ ξ^cpsd_t(A) ≤ . . . ≤ cpsd-rank_C(A),

where ξ^cpsd_t(A), for t ∈ N, is given as the optimal value of a semidefinite program whose size increases with t. Not much is known about lower bounds for the cpsd-rank in the literature. The inequality √rank(A) ≤ cpsd-rank_C(A) is known, which follows by viewing a Hermitian d × d matrix as a d²-dimensional real vector, and an analytic lower bound is given in [62]. We show that the new parameter ξ^cpsd_1(A) is at least as good as this analytic lower bound, and we give a small example where a strengthening of ξ^cpsd_2(A) is strictly better than both above-mentioned generic lower bounds. Currently we lack evidence that the lower bounds ξ^cpsd_t(A) can be larger than, for example, the matrix size, but this could be because small matrices with large cpsd-rank are hard to construct or might even not exist. We also introduce several ideas leading to strengthenings of the basic bounds ξ^cpsd_t(A).

We then adapt these ideas to the other three matrix factorization ranks discussed above, where for each of them we obtain analogous hierarchies of bounds. For the nonnegative rank and the completely positive rank much more is known about lower bounds. The best known generic lower bounds are due to Fawzi and Parrilo [27,28].
In [28] the parameters τ_+(A) and τ_cp(A) are defined, which, respectively, lower bound the nonnegative rank and the cp-rank, along with their computable semidefinite programming relaxations τ^sos_+(A) and τ^sos_cp(A). In [28] it is also shown that τ_+(A) is at least as good as certain norm-based lower bounds. In particular, τ_+(·) is at least as good as the ℓ_∞ norm-based lower bound, which was used by Rothvoß [66] to show that the matching polytope has exponential linear extension complexity. In [27] it is shown that for the Frobenius norm, the square of the norm-based bound is still a lower bound on the nonnegative rank, but it is not known how this lower bound compares to τ_+(·).

Fawzi and Parrilo [28] use the atomicity of the nonnegative and completely positive ranks to derive the parameters τ_+(A) and τ_cp(A); i.e., they use the fact that the nonnegative rank (cp-rank) of A is equal to the smallest d for which A can be written as the sum of d nonnegative (positive semidefinite) rank-one matrices. As the psd-rank and cpsd-rank are not known to admit atomic formulations, the techniques from [28] do not extend directly to these factorization ranks. However, our approach via polynomial optimization captures these factorization ranks as well.

In Sections 3 and 4 we construct semidefinite programming hierarchies of lower bounds ξ^cp_t(A) and ξ^+_t(A) on cp-rank(A) and rank_+(A). We show that the bounds ξ^+_t(A) converge to τ_+(A) as t → ∞. The basic hierarchy {ξ^cp_t(A)} for the cp-rank does not converge to τ_cp(A) in general, but we provide two types of additional constraints that can be added to the program defining ξ^cp_t(A) to ensure convergence to τ_cp(A). First, we show how a generalization of the tensor constraints that are used in the definition of the parameter τ^sos_cp(A) can be used for this, and we also give a more efficient (using smaller matrix blocks) description of these constraints.
This strengthening of ξ^cp_2(A) is then at least as strong as τ^sos_cp(A), but requires matrix variables of roughly half the size. Alternatively, we show that for every ε > 0 one can modify the hierarchy {ξ^cp_t(A)} so that the limit of the sequence of these new lower bounds is at least τ_cp(A) − ε. We give numerical results on small matrices studied in the literature, which show that our bounds can improve over τ^sos_+(A). Finally, in Section 5 we derive a hierarchy {ξ^psd_t(A)} of lower bounds on the psd-rank. We compare the new bounds ξ^psd_t(A) to a bound from [52] and we provide some numerical examples illustrating their performance.

We provide two implementations of all the lower bounds introduced in this paper, available with the arXiv submission of this paper. One implementation uses Matlab and the CVX package [37], and the other one uses Julia [9]. The implementations support various semidefinite programming solvers; for our numerical examples we used Mosek [2].

1.3 Preliminaries on (noncommutative) polynomials

We denote the set of words in the symbols x_1, . . . , x_n by ⟨x⟩ = ⟨x_1, . . . , x_n⟩, where the empty word is denoted by 1. This is a semigroup with involution, where the binary operation is concatenation, and the involution of a word w ∈ ⟨x⟩ is the word w* obtained by reversing the order of the symbols in w. The *-algebra of all real linear combinations of these words is denoted by R⟨x⟩, and its elements are called noncommutative polynomials. The involution extends to R⟨x⟩ by linearity. A polynomial p ∈ R⟨x⟩ is called symmetric if p* = p, and Sym R⟨x⟩ denotes the set of symmetric polynomials. The degree of a word w ∈ ⟨x⟩ is the number of symbols composing it, denoted as |w| or deg(w), and the degree of a polynomial p = ∑_w p_w w ∈ R⟨x⟩ is the maximum degree of a word w with p_w ≠
0. Given t ∈ N ∪ {∞}, we let ⟨x⟩_t be the set of words w of degree |w| ≤ t, so that ⟨x⟩_∞ = ⟨x⟩, and R⟨x⟩_t is the real vector space of noncommutative polynomials p of degree deg(p) ≤ t. Given t ∈ N, we let ⟨x⟩_{=t} be the set of words of degree exactly equal to t.

For a set S ⊆ Sym R⟨x⟩ and t ∈ N ∪ {∞}, the truncated quadratic module at degree 2t associated to S is defined as the cone generated by all polynomials p*gp ∈ R⟨x⟩_{2t} with g ∈ S ∪ {1}:

M_{2t}(S) = cone{p*gp : p ∈ R⟨x⟩, g ∈ S ∪ {1}, deg(p*gp) ≤ 2t}.   (1)

Likewise, for a set T ⊆ R⟨x⟩, we can define the truncated ideal at degree 2t, denoted by I_{2t}(T), as the vector space spanned by all polynomials ph ∈ R⟨x⟩_{2t} with h ∈ T:

I_{2t}(T) = span{ph : p ∈ R⟨x⟩, h ∈ T, deg(ph) ≤ 2t}.   (2)

We say that M(S) + I(T) is Archimedean when there exists a scalar R > 0 for which

R − ∑_{i=1}^n x_i² ∈ M(S) + I(T).   (3)

Throughout we are interested in the space R⟨x⟩*_{2t} of real-valued linear functionals on R⟨x⟩_{2t}. We list some basic definitions: a linear functional L ∈ R⟨x⟩*_{2t} is symmetric if L(w) = L(w*) for all w ∈ ⟨x⟩_{2t} and tracial if L(ww') = L(w'w) for all w, w' ∈ ⟨x⟩_t. A linear functional L ∈ R⟨x⟩*_{2t} is said to be positive if L(p*p) ≥ 0 for all p ∈ R⟨x⟩_t. Many properties of a linear functional L ∈ R⟨x⟩*_{2t} can be expressed as properties of its associated moment matrix (also known as its Hankel matrix).
For L ∈ R⟨x⟩*_{2t} we define its associated moment matrix, which has rows and columns indexed by the words in ⟨x⟩_t, by

M_t(L)_{w,w'} = L(w*w') for w, w' ∈ ⟨x⟩_t,

and as usual we set M(L) = M_∞(L). It then follows that L is symmetric if and only if M_t(L) is symmetric, and L is positive if and only if M_t(L) is positive semidefinite. In fact, one can even express nonnegativity of a linear form L ∈ R⟨x⟩*_{2t} on M_{2t}(S) in terms of certain associated positive semidefinite moment matrices. For this, given a polynomial g ∈ R⟨x⟩, define the linear form gL ∈ R⟨x⟩*_{2t−deg(g)} by (gL)(p) = L(gp). Then we have

L(p*gp) ≥ 0 for all p ∈ R⟨x⟩_{t−d_g} ⟺ M_{t−d_g}(gL) ⪰ 0,  (d_g = ⌈deg(g)/2⌉),

and thus L ≥ 0 on M_{2t}(S) if and only if M_{t−d_g}(gL) ⪰ 0 for all g ∈ S ∪ {1}. Also, the condition L = 0 on I_{2t}(T) corresponds to linear equalities on the entries of M_t(L).

The moment matrix also allows us to define a property called flatness. For t ∈ N, a linear functional L ∈ R⟨x⟩*_{2t} is called δ-flat if the rank of M_t(L) is equal to that of its principal submatrix indexed by the words in ⟨x⟩_{t−δ}, that is,

rank(M_t(L)) = rank(M_{t−δ}(L)).   (4)

We call L flat if it is δ-flat for some δ ≥
1. When t = ∞, L is said to be flat when rank(M(L)) < ∞, which is equivalent to rank(M(L)) = rank(M_s(L)) for some s ∈ N.

A key example of a flat symmetric tracial positive linear functional on R⟨x⟩ is given by the trace evaluation at a given matrix tuple X = (X_1, . . . , X_n) ∈ (H^d)^n:

p ↦ Tr(p(X)).

Here p(X) denotes the matrix obtained by substituting x_i by X_i in p, and throughout Tr(·) denotes the usual matrix trace, which satisfies Tr(I) = d, where I is the identity matrix in H^d. We mention in passing that we use tr(·) to denote the normalized matrix trace, which satisfies tr(I) = 1 for I ∈ H^d. Throughout, we use L_X to denote the real part of the above functional; that is, L_X denotes the linear form on R⟨x⟩ defined by

L_X(p) = Re(Tr(p(X_1, . . . , X_n))) for p ∈ R⟨x⟩.   (5)

Observe that L_X too is a symmetric tracial positive linear functional on R⟨x⟩. Moreover, L_X is nonnegative on M(S) if the matrix tuple X is taken from the matrix positivity domain D(S) associated to the finite set S ⊆ Sym R⟨x⟩, defined as

D(S) = ⋃_{d≥1} {X = (X_1, . . . , X_n) ∈ (H^d)^n : g(X) ⪰ 0 for all g ∈ S}.   (6)

Similarly, the linear functional L_X is zero on I(T) if the matrix tuple X is taken from the matrix variety V(T) associated to the finite set T ⊆ Sym R⟨x⟩, defined as

V(T) = ⋃_{d≥1} {X ∈ (H^d)^n : h(X) = 0 for all h ∈ T}.

To discuss convergence properties of our lower bounds for matrix factorization ranks we will need to consider infinite-dimensional analogs of matrix algebras, namely C*-algebras admitting a tracial state. Let us introduce some basic notions we need about C*-algebras; see, e.g., [10] for details.
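Before turning to C*-algebras, the finite-dimensional bookkeeping above can be made concrete. The sketch below (illustrative only; the names are ours) enumerates the words ⟨x⟩_t, builds the moment matrix M_t(L_X) for the trace evaluation at a tuple of diagonal (hence Hermitian positive semidefinite) matrices, and checks symmetry of M_t(L_X) and the tracial property of L_X; with diagonal matrices these checks hold already for commutativity reasons, so they serve only to illustrate the indexing.

```python
import math
from itertools import product

def words(n, t):
    # the words <x>_t of degree at most t in symbols 0, ..., n-1; () is the empty word 1
    return [w for k in range(t + 1) for w in product(range(n), repeat=k)]

def star(w):
    return tuple(reversed(w))  # involution: reverse the word

def L(w, X):
    # trace evaluation Tr(w(X)) at diagonal matrices X_i = diag(X[i])
    return sum(math.prod(X[s][k] for s in w) for k in range(len(X[0])))

X = [(2, 0), (1, 3)]        # diagonals of two 2 x 2 diagonal PSD matrices
W = words(2, 2)             # <x>_2 for n = 2
assert len(W) == 1 + 2 + 4  # |<x>_t| = sum_{k <= t} n^k

# moment matrix M_t(L)_{w,w'} = L(w* w'), rows and columns indexed by <x>_t
M = [[L(star(w) + v, X) for v in W] for w in W]
assert all(M[i][j] == M[j][i] for i in range(7) for j in range(7))  # L symmetric
assert all(L(w + v, X) == L(v + w, X) for w in W for v in W)        # L tracial
```

Note that the entry M[0][0] = L(1) = Tr(I) = d, the quantity minimized later in the hierarchies.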
For our purposes we define a C*-algebra to be a norm-closed *-subalgebra of the complex algebra B(H) of bounded operators on a complex Hilbert space H. In particular, we have ‖a*a‖ = ‖a‖² for all elements a in the algebra. Such an algebra A is said to be unital if it contains the identity operator (denoted 1). For instance, any full complex matrix algebra C^{d×d} is a unital C*-algebra. Moreover, by a fundamental result of Artin-Wedderburn, any finite-dimensional C*-algebra (as a vector space) is *-isomorphic to a direct sum ⊕_{m=1}^M C^{d_m×d_m} of full complex matrix algebras [4,77]. In particular, any finite-dimensional C*-algebra is unital.

An element b in a C*-algebra A is called positive, denoted b ⪰
0, if it is of the form b = a*a for some a ∈ A. For finite sets S ⊆ Sym R⟨x⟩ and T ⊆ R⟨x⟩, the C*-algebraic analogs of the matrix positivity domain and matrix variety are the sets

D_A(S) = {X = (X_1, . . . , X_n) ∈ A^n : X*_i = X_i for i ∈ [n], g(X) ⪰ 0 for all g ∈ S},
V_A(T) = {X = (X_1, . . . , X_n) ∈ A^n : X*_i = X_i for i ∈ [n], h(X) = 0 for all h ∈ T}.

A state τ on a unital C*-algebra A is a linear form on A that is positive, i.e., τ(a*a) ≥ 0 for all a ∈ A, and satisfies τ(1) =
1. Since A is a complex algebra, every state τ is Hermitian: τ(a*) is the complex conjugate of τ(a) for all a ∈ A. We say that a state is tracial if τ(ab) = τ(ba) for all a, b ∈ A, and faithful if τ(a*a) = 0 implies a =
0. A useful fact is that on a full matrix algebra C^{d×d} the normalized matrix trace is the unique tracial state (see, e.g., [16]). Now, given a tuple X = (X_1, . . . , X_n) ∈ A^n in a C*-algebra A with tracial state τ, the second key example of a symmetric tracial positive linear functional on R⟨x⟩ is given by the trace evaluation map, which we again denote by L_X and which is defined by

L_X(p) = τ(p(X_1, . . . , X_n)) for all p ∈ R⟨x⟩.

1.4 Sketch of our approach

Given a factorization A = (Tr(X_i X_j)), with d = cpsd-rank_C(A) and X = (X_1, . . . , X_n) in (H^d_+)^n, consider the linear form L_X on R⟨x⟩ as defined in (5): L_X(p) = Re(Tr(p(X_1, . . . , X_n))) for p ∈ R⟨x⟩. Then we have A = (L_X(x_i x_j)) and cpsd-rank_C(A) = d = L_X(1). To obtain lower bounds on cpsd-rank_C(A) we minimize L(1) over a set of linear functionals L that satisfy certain computationally tractable properties of L_X. Note that this idea of minimizing L(1) has recently been used in the works [74,59] in the commutative setting to derive a hierarchy of lower bounds converging to the nuclear norm of a symmetric tensor.

The above linear functional L_X is symmetric and tracial. Moreover, it satisfies some positivity conditions: we have L_X(q) ≥ 0 whenever q(X) is positive semidefinite. It follows that L_X(p*p) ≥ 0 for all p ∈ R⟨x⟩ and, as we explain later, L_X satisfies the localizing conditions L_X(p*(√A_ii x_i − x_i²)p) ≥ 0 for all p and i. Truncating the linear form yields the following hierarchy of lower bounds:

ξ^cpsd_t(A) = min{L(1) : L ∈ R⟨x_1, . . . , x_n⟩*_{2t} tracial and symmetric,
                  L(x_i x_j) = A_ij for i, j ∈ [n],
                  L ≥ 0 on M_{2t}({√A_11 x_1 − x_1², . . . , √A_nn x_n − x_n²})}.

The bound ξ^cpsd_t(A) is computationally tractable (for small t).
Indeed, as was explained in Section 1.3, the localizing constraint “L ≥ 0 on M_{2t}(S)” can be enforced by requiring certain matrices, whose entries are determined by L, to be positive semidefinite. This makes the problem defining ξ^cpsd_t(A) into a semidefinite program. The localizing conditions ensure the Archimedean property of the quadratic module, which permits to show certain convergence properties of the bounds ξ^cpsd_t(A).

The above approach extends naturally to the other matrix factorization ranks, using the following two basic ideas. First, since the cp-rank and the nonnegative rank deal with factorizations by diagonal matrices, we use linear functionals acting on classical commutative polynomials. Second, the asymmetric factorization ranks (psd-rank and nonnegative rank) can be seen as analogs of the symmetric ranks in the partial matrix setting, where we know only the values of L on the quadratic monomials corresponding to entries in the off-diagonal blocks (this will require scaling of the factors in order to be able to define localizing constraints ensuring the Archimedean property). A main advantage of our approach is that it applies to all four matrix factorization ranks, after easy suitable adaptations.

1.5 Connection to polynomial optimization

In classical polynomial optimization the problem is to find the global minimum of a commutative polynomial f over a semialgebraic set of the form

D(S) = {x ∈ R^n : g(x) ≥ 0 for all g ∈ S},

where S ⊆ R[x] = R[x_1, . . . , x_n] is a finite set of polynomials. Tracial polynomial optimization is a noncommutative analog, where the problem is to minimize the normalized trace tr(f(X)) of a symmetric polynomial f over a matrix positivity domain D(S), where S ⊆ Sym R⟨x⟩ is a finite set of symmetric polynomials. Notice that the distinguishing feature here is the dimension independence: the optimization is over all possible matrix sizes.
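To see the dimension independence concretely (a toy evaluation with matrices chosen ad hoc, not an example from the paper): the objective X ↦ tr(f(X)) is defined for tuples of every size d, and the tracial problem takes the infimum over all of them. Here f = x₁x₂x₁ is evaluated at a size-1 and a size-2 tuple.

```python
from fractions import Fraction as F

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ntr(A):
    # normalized matrix trace tr, with tr(I) = 1
    return sum(A[i][i] for i in range(len(A))) / F(len(A))

def f(X1, X2):
    # the symmetric noncommutative polynomial f = x1 x2 x1
    return matmul(matmul(X1, X2), X1)

# d = 1: the "matrices" are scalars
print(ntr(f([[F(2)]], [[F(3)]])))  # 12

# d = 2: the same objective, evaluated at a larger (hypothetical) symmetric tuple
X1 = [[F(1), F(1)], [F(1), F(0)]]
X2 = [[F(2), F(0)], [F(0), F(1)]]
print(ntr(f(X1, X2)))  # 5/2
```

A solver for the tracial problem cannot simply fix d; this is why the moment relaxations below, which are dimension-free by design, are the natural computational handle.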
Perhaps counterintuitively, in this paper we use techniques similar to those used for the tracial polynomial optimization problem to compute lower bounds on factorization dimensions.

For classical polynomial optimization, Lasserre [46] and Parrilo [60] have proposed hierarchies of semidefinite programming relaxations based on the theory of moments and the dual theory of sums of squares of polynomials. These can be used to compute successively better lower bounds converging to the global minimum (under the Archimedean condition). This approach has been used in a wide range of applications and there is an extensive literature (see, e.g., [1,47,49]). Most relevant to this work, it is used in [48] to design conic approximations of the completely positive cone and in [58] to check membership in the completely positive cone. This approach has also been extended to the noncommutative setting, first to the eigenvalue optimization problem [61,57] (which will not play a role in this paper), and later to tracial optimization [15,43]. Here, and throughout the paper, we use [x] as the commutative analogue of ⟨x⟩. (In fact, one could consider optimization over D(S) ∩ V(T) for some finite set T ⊆ R⟨x⟩; the results below still hold in that setting, see Appendix A.)

For our paper the moment formulation of the lower bounds is most relevant: for all t ∈ N ∪ {∞} we can define the bounds

f_t = inf{L(f) : L ∈ R[x]*_{2t}, L(1) = 1, L ≥ 0 on M_{2t}(S)},
f^tr_t = inf{L(f) : L ∈ R⟨x⟩*_{2t} tracial and symmetric, L(1) = 1, L ≥ 0 on M_{2t}(S)},

where f_t (resp., f^tr_t) lower bounds the (tracial) polynomial optimization problem. The connection between the parameters ξ^cpsd_t(A) and f^tr_t is now clear: in the former we do not have the normalization property “L(1) =
1” but we do have the additional affine constraints “L(x_i x_j) = A_ij”. This close relation to (tracial) polynomial optimization allows us to use that theory to understand the convergence properties of our bounds. Since throughout the paper we use (proof) techniques from (tracial) polynomial optimization, we will state the main convergence results we need, with full proofs, in Appendix A. Moreover, we give all proofs from the “moment side”, which is most relevant to our treatment. Below we give a short summary of the convergence results for the hierarchies {f_t} and {f^tr_t} that are relevant to our paper. We refer to Appendix A.3 for details.

Under the condition that M(S) is Archimedean we have asymptotic convergence: f_t → f_∞ and f^tr_t → f^tr_∞ as t → ∞. In the commutative setting one can moreover show that f_∞ is equal to the global minimum of f over the set D(S). However, in the noncommutative setting, the parameter f^tr_∞ is in general not equal to the minimum of tr(f(X)) over X ∈ D(S). Instead we need to consider the C*-algebraic version of the tracial polynomial optimization problem: one can show that

f^tr_∞ = inf{τ(f(X)) : X ∈ D_A(S), A a unital C*-algebra with tracial state τ}.

An important additional convergence result holds under flatness. If the program defining the bound f_t (resp., f^tr_t) admits a sufficiently flat optimal solution, then equality holds: f_t = f_∞ (resp., f^tr_t = f^tr_∞). Moreover, in this case, the parameter f^tr_t is equal to the minimum value of tr(f(X)) over the matrix positivity domain D(S).

2 The completely positive semidefinite rank

Let A be a completely positive semidefinite n × n matrix. For t ∈ N ∪ {∞} we consider the following semidefinite program, which, as we see below, lower bounds the complex completely positive semidefinite rank of A:

ξ^cpsd_t(A) = min{L(1) : L ∈ R⟨x_1, . . . , x_n⟩*_{2t} tracial and symmetric,
                  L(x_i x_j) = A_ij for i, j ∈ [n],
                  L ≥ 0 on M_{2t}(S^cpsd_A)},

where we set

S^cpsd_A = {√A_11 x_1 − x_1², . . . , √A_nn x_n − x_n²}.   (7)

Additionally, define the parameter ξ^cpsd_*(A), obtained by adding the rank constraint rank(M(L)) < ∞ to the program defining ξ^cpsd_∞(A), where we consider the infimum instead of the minimum since we do not know whether the infimum is always attained. (In Proposition 1 we show the infimum is attained in ξ^cpsd_t(A) for t ∈ N ∪ {∞}.) This gives a hierarchy of monotone nondecreasing lower bounds on the completely positive semidefinite rank:

ξ^cpsd_1(A) ≤ . . . ≤ ξ^cpsd_t(A) ≤ . . . ≤ ξ^cpsd_∞(A) ≤ ξ^cpsd_*(A) ≤ cpsd-rank_C(A).

The inequality ξ^cpsd_∞(A) ≤ ξ^cpsd_*(A) is clear, and monotonicity as well: if L is feasible for ξ^cpsd_k(A) with t ≤ k ≤ ∞, then its restriction to R⟨x⟩_{2t} is feasible for ξ^cpsd_t(A).

The following notion of localizing polynomials will be useful. A set S ⊆ R⟨x⟩ is said to be localizing at a matrix tuple X if X ∈ D(S) (i.e., g(X) ⪰ 0 for all g ∈ S), and we say that S is localizing for A if S is localizing at some factorization X ∈ (H^d_+)^n of A with d = cpsd-rank_C(A). The set S^cpsd_A as defined in (7) is localizing for A and, in fact, it is localizing at any factorization X of A by Hermitian positive semidefinite matrices. Indeed, since A_ii = Tr(X_i²) ≥ λ_max(X_i²) = λ_max(X_i)², we have √A_ii X_i − X_i² ⪰ 0 for all i ∈ [n].

We can now use this to show the inequality ξ^cpsd_*(A) ≤ cpsd-rank_C(A). For this, set d = cpsd-rank_C(A), let X ∈ (H^d_+)^n be a Gram factorization of A, and consider the linear form L_X ∈ R⟨x⟩* defined by L_X(p) = Re(Tr(p(X))) for all p ∈ R⟨x⟩.
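Before verifying these properties in general, they can be checked on a toy factorization (diagonal positive semidefinite factors chosen for illustration; any Gram factorization of a cpsd matrix would do): the sketch confirms A = (L_X(x_i x_j)), L_X(1) = d, and that the localizing polynomials √A_ii x_i − x_i² evaluate to positive semidefinite matrices at X.

```python
import math
from fractions import Fraction as F

# hypothetical Gram factors: X_i are d x d diagonal PSD matrices (d = 2),
# stored via their diagonals
X = [(F(1), F(0)), (F(1), F(4))]
d, n = 2, 2

def L(factors):
    # trace of a product of diagonal matrices; the empty product is I_d
    return sum(math.prod(col) for col in zip(*factors)) if factors else d

# A = (L_X(x_i x_j)) = (Tr(X_i X_j)) is completely positive semidefinite
A = [[L([X[i], X[j]]) for j in range(n)] for i in range(n)]
assert A == [[1, 1], [1, 17]]

# L_X(1) = Tr(I_d) = d, the quantity the hierarchy minimizes
assert L([]) == d

# localizing property: sqrt(A_ii) X_i - X_i^2 is PSD (diagonal with
# nonnegative entries), since sqrt(A_ii) >= lambda_max(X_i)
for i in range(n):
    assert all(math.sqrt(A[i][i]) * x - x * x >= -1e-12 for x in X[i])
```

The feasibility of L_X in the program for ξ^cpsd_*(A), argued next in the text, is exactly the combination of these checks together with the rank condition on M(L_X).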
By construction L_X is symmetric and tracial, and we have A = (L_X(x_i x_j)). Moreover, since the set of polynomials S^cpsd_A is localizing for A, the linear form L_X is nonnegative on M(S^cpsd_A). Finally, we have rank(M(L_X)) < ∞, since the algebra generated by X_1, …, X_n is finite dimensional. Hence, L_X is feasible for ξ^cpsd_*(A) with L_X(1) = d, which shows ξ^cpsd_*(A) ≤ cpsd-rank_ℂ(A).

The inclusions in (8) below show that the quadratic module M(S^cpsd_A) is Archimedean (recall the definition in (3)). Moreover, although there are other possible choices for the localizing polynomials to use in S^cpsd_A, these inclusions also show that the choice made in (7) leads to the largest truncated quadratic module and thus to the best bound. For any scalar c > 0, we have the inclusions

  M_t(x, c − x) ⊆ M_t(x, c² − x²) ⊆ M_t(cx − x²) ⊆ M_{t+2}(x, c − x),   (8)

which hold in light of the following identities:

  c − x = ((c − x)² + c² − x²) / (2c),   (9)
  c² − x² = (c − x)² + 2(cx − x²),   (10)
  cx − x² = ((c − x)x(c − x) + x(c − x)x) / c,   (11)
  x = ((cx − x²) + x²) / c.   (12)

In the rest of this section we investigate properties of the hierarchy {ξ^cpsd_t(A)} as well as some variations on it. We discuss convergence properties, asymptotically and under flatness, and we give another formulation for the parameter ξ^cpsd_*(A). Moreover, as the inequality ξ^cpsd_*(A) ≤ cpsd-rank_ℂ(A) is typically strict, we present an approach to strengthen the bounds in order to go beyond ξ^cpsd_*(A). Then we propose some techniques to simplify the computation of the bounds, and we illustrate the behaviour of the bounds on some examples.

2.1 The parameters ξ^cpsd_∞(A) and ξ^cpsd_*(A)

In this section we consider convergence properties of the hierarchy ξ^cpsd_t(·), both asymptotically and under flatness. We also give equivalent reformulations of the limiting parameters ξ^cpsd_∞(A) and ξ^cpsd_*(A) in terms of C*-algebras with a tracial state, which we will use in Sections 2.3–2.4 to show properties of these parameters.

Proposition 1
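Since each of the identities (9)–(12) involves a single symmetric variable x, all monomials occurring in them commute, so these noncommutative polynomial identities (of degree at most 3) can be verified by checking the corresponding scalar identities at sufficiently many points. A minimal sketch in plain Python (the function name `check_identities` is ours, not from the paper):

```python
import random

def check_identities(trials=100):
    """Numerically check the scalar versions of identities (9)-(12).

    Each identity is a polynomial identity in one variable x (with a
    scalar c > 0), so all terms commute and scalar checks suffice.
    """
    for _ in range(trials):
        c = random.uniform(0.1, 10.0)    # any scalar c > 0
        x = random.uniform(-10.0, 10.0)
        # (9): c - x = ((c - x)^2 + c^2 - x^2) / (2c)
        assert abs((c - x) - ((c - x)**2 + c**2 - x**2) / (2*c)) < 1e-7
        # (10): c^2 - x^2 = (c - x)^2 + 2(cx - x^2)
        assert abs((c**2 - x**2) - ((c - x)**2 + 2*(c*x - x**2))) < 1e-7
        # (11): cx - x^2 = ((c - x) x (c - x) + x (c - x) x) / c
        assert abs((c*x - x**2)
                   - ((c - x)*x*(c - x) + x*(c - x)*x) / c) < 1e-6
        # (12): x = ((cx - x^2) + x^2) / c
        assert abs(x - ((c*x - x**2) + x**2) / c) < 1e-7
    return True
```

Each right-hand side is built only from squares, the generators x and c − x, or the generator cx − x², which is what makes the module inclusions in (8) work.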
Let A ∈ CS_+^n. For t ∈ ℕ ∪ {∞} the optimum in ξ^cpsd_t(A) is attained, and lim_{t→∞} ξ^cpsd_t(A) = ξ^cpsd_∞(A).

Moreover, ξ^cpsd_∞(A) is equal to the smallest scalar α ≥ 0 for which there exist a unital C*-algebra 𝒜 with tracial state τ and (X_1, …, X_n) ∈ D_𝒜(S^cpsd_A) such that A = α · (τ(X_i X_j)).

Proof The sequence (ξ^cpsd_t(A))_t is monotonically nondecreasing and upper bounded by ξ^cpsd_∞(A) < ∞, which implies its limit exists and is at most ξ^cpsd_∞(A).

As ξ^cpsd_t(A) ≤ ξ^cpsd_∞(A), we may add the redundant constraint L(1) ≤ ξ^cpsd_∞(A) to the problem ξ^cpsd_t(A) for every t ∈ ℕ. By (10) we have Tr(A) − ∑_i x_i² ∈ M_2(S^cpsd_A). Hence, using the result of Lemma 13, the feasible region of ξ^cpsd_t(A) is compact, and thus it has an optimal solution L_t. Again by Lemma 13, the sequence (L_t) has a pointwise converging subsequence with limit L ∈ ℝ⟨x⟩*. This pointwise limit L is symmetric, tracial, satisfies (L(x_i x_j)) = A, and is nonnegative on M(S^cpsd_A). Hence L is feasible for ξ^cpsd_∞(A), with L(1) ≤ lim_{t→∞} ξ^cpsd_t(A). This implies that L is optimal for ξ^cpsd_∞(A) and that lim_{t→∞} ξ^cpsd_t(A) = ξ^cpsd_∞(A).

The reformulation of ξ^cpsd_∞(A) in terms of C*-algebras with a tracial state follows directly using Theorem 1. □

Next we give some equivalent reformulations for the parameter ξ^cpsd_*(A), which follow as a direct application of Theorem 2. In general we do not know whether the infimum in ξ^cpsd_*(A) is attained. However, as a direct application of Corollary 1, we see that this infimum is attained if there is an integer t ∈ ℕ for which ξ^cpsd_t(A) admits a flat optimal solution.

Proposition 2
Let A ∈ CS_+^n. The parameter ξ^cpsd_*(A) is given by the infimum of L(1) taken over all conic combinations L of trace evaluations at elements in D(S^cpsd_A) for which A = (L(x_i x_j)). The parameter ξ^cpsd_*(A) is also equal to the infimum over all α ≥ 0 for which there exist a finite dimensional C*-algebra 𝒜 with tracial state τ and (X_1, …, X_n) ∈ D_𝒜(S^cpsd_A) such that A = α · (τ(X_i X_j)).

In addition, if ξ^cpsd_t(A) admits a flat optimal solution, then ξ^cpsd_t(A) = ξ^cpsd_*(A).

Next we show a formulation for ξ^cpsd_*(A) in terms of factorizations by block-diagonal matrices, which helps explain why the inequality ξ^cpsd_*(A) ≤ cpsd-rank_ℂ(A) is typically strict. Here ‖·‖ is the operator norm, so that ‖X‖ = λ_max(X) for X ⪰ 0.

Proposition 3
For A ∈ CS_+^n we have

  ξ^cpsd_*(A) = inf { ∑_{m=1}^M d_m · max_{i∈[n]} ‖X_i^m‖² / A_ii : M ∈ ℕ, d_1, …, d_M ∈ ℕ,   (13)
                      X_i^m ∈ H_+^{d_m} for i ∈ [n], m ∈ [M], A = Gram(⊕_{m=1}^M X_1^m, …, ⊕_{m=1}^M X_n^m) }.

Note that using matrices from S_+^{d_m} instead of H_+^{d_m} does not change the optimal value.

Proof The proof uses the formulation of ξ^cpsd_*(A) in terms of conic combinations of trace evaluations at matrix tuples in D(S^cpsd_A) as given in Proposition 2. We first show the inequality β ≤ ξ^cpsd_*(A), where β denotes the optimal value of the program in (13).

For this, assume L ∈ ℝ⟨x⟩* is a conic combination of trace evaluations at elements of D(S^cpsd_A) such that A = (L(x_i x_j)). We will construct a feasible solution for (13) with objective value L(1). The linear functional L can be written as L = ∑_{m=1}^M λ_m L_{Y^m}, where λ_m > 0 and Y^m = (Y_1^m, …, Y_n^m) ∈ D(S^cpsd_A) for m ∈ [M]. Let d_m denote the size of the matrices Y_1^m, …, Y_n^m, so that L(1) = ∑_m λ_m d_m. Since Y^m ∈ D(S^cpsd_A), we have Y_i^m ⪰ 0 and A_ii I − (Y_i^m)² ⪰ 0, so that ‖Y_i^m‖² ≤ A_ii for all i ∈ [n] and m ∈ [M]. Define X^m = √λ_m Y^m. Then L(x_i x_j) = ∑_m Tr(X_i^m X_j^m), so that the matrices ⊕_m X_1^m, …, ⊕_m X_n^m form a Gram decomposition of A. This gives a feasible solution to (13) with value

  ∑_{m=1}^M d_m · max_{i∈[n]} ‖X_i^m‖² / A_ii = ∑_{m=1}^M d_m λ_m max_{i∈[n]} ‖Y_i^m‖² / A_ii ≤ ∑_{m=1}^M d_m λ_m = L(1),

which shows β ≤ L(1), and hence β ≤ ξ^cpsd_*(A).

For the other direction we assume A = Gram(⊕_{m=1}^M X_1^m, …, ⊕_{m=1}^M X_n^m) with X_1^m, …, X_n^m ∈ S_+^{d_m} for m ∈ [M]. Set λ_m = max_{i∈[n]} ‖X_i^m‖² / A_ii, and define the linear form L by

  L = ∑_{m=1}^M λ_m L_{Y^m}, where Y^m = X^m / √λ_m for all m ∈ [M].
We have L(1) = ∑_m λ_m d_m and A = (L(x_i x_j)), and thus it suffices to show that each matrix tuple Y^m belongs to D(S^cpsd_A). For this we observe that λ_m A_ii ≥ ‖X_i^m‖². Therefore λ_m A_ii I ⪰ (X_i^m)², and thus A_ii I ⪰ (Y_i^m)², which implies √A_ii Y_i^m − (Y_i^m)² ⪰ 0. This shows ξ^cpsd_*(A) ≤ L(1) = ∑_m λ_m d_m, and thus ξ^cpsd_*(A) ≤ β. □

We can say a bit more when the matrix A lies on an extreme ray of the cone CS_+^n: in the formulation from Proposition 3 it then suffices to restrict the minimization to factorizations of A involving only one block. However, we know very little about the extreme rays of CS_+^n, also in view of the recent result that the cone is not closed for large n [73,23].

Proposition 4
If A lies on an extreme ray of the cone CS_+^n, then

  ξ^cpsd_*(A) = inf { d · max_{i∈[n]} ‖X_i‖² / A_ii : d ∈ ℕ, X_1, …, X_n ∈ H_+^d, A = Gram(X_1, …, X_n) }.

Moreover, if ⊕_{m=1}^M X_1^m, …, ⊕_{m=1}^M X_n^m is a Gram decomposition of A providing an optimal solution to (13) and some block X_i^m has rank 1, then ξ^cpsd_*(A) = cpsd-rank_ℂ(A).

Proof Let β be the infimum in Proposition 4. The inequality ξ^cpsd_*(A) ≤ β follows from the reformulation of ξ^cpsd_*(A) in Proposition 3. To show the reverse inequality we consider a solution ⊕_{m=1}^M X_1^m, …, ⊕_{m=1}^M X_n^m to (13), and set λ_m = max_i ‖X_i^m‖² / A_ii. We will show β ≤ ∑_m d_m λ_m. For this define the matrices A^m = Gram(X_1^m, …, X_n^m), so that A = ∑_m A^m. As A lies on an extreme ray of CS_+^n, we must have A^m = α_m A for some α_m > 0 with ∑_m α_m = 1. Hence, since A = A^m / α_m = Gram(X_1^m / √α_m, …, X_n^m / √α_m), we have β ≤ d_m λ_m / α_m for all m ∈ [M]. It suffices now to use ∑_m α_m = 1: since a weighted average is at least the minimum, min_m d_m λ_m / α_m ≤ ∑_m α_m (d_m λ_m / α_m) = ∑_m d_m λ_m. So we have shown β ≤ min_m d_m λ_m / α_m ≤ ∑_m d_m λ_m. This implies β ≤ ξ^cpsd_*(A), and thus equality holds.

Assume now that ⊕_{m=1}^M X_1^m, …, ⊕_{m=1}^M X_n^m is optimal for (13) and that there is a block X_i^m of rank 1. By Proposition 3 we have ∑_m d_m λ_m = ξ^cpsd_*(A). From the argument just made above it follows that ξ^cpsd_*(A) = min_m d_m λ_m / α_m = ∑_m d_m λ_m. As ∑_m α_m = 1, this forces d_m λ_m / α_m = min_m d_m λ_m / α_m for all m; that is, all terms d_m λ_m / α_m take the same value ξ^cpsd_*(A). By assumption there exist m ∈ [M] and i ∈ [n] for which X_i^m has rank 1. Then ‖X_i^m‖² = ⟨X_i^m, X_i^m⟩ = α_m A_ii, while ‖X_j^m‖² ≤ ⟨X_j^m, X_j^m⟩ = α_m A_jj for every j ∈ [n], which gives λ_m = α_m, and thus ξ^cpsd_*(A) = d_m. On the other hand, cpsd-rank_ℂ(A) ≤ d_m since (X_i^m / √α_m)_{i∈[n]} forms a Gram factorization of A, so equality ξ^cpsd_*(A) = d_m = cpsd-rank_ℂ(A) holds. □

2.2 Strengthening the bounds beyond ξ^cpsd_*(A)

In order to strengthen the bounds we may require nonnegativity over a (truncated) quadratic module generated by a larger set of localizing polynomials for A. The following lemma gives one such approach.

Lemma 1
Let A ∈ CS_+^n. For v ∈ ℝ^n and g_v = v^T A v − (∑_{i=1}^n v_i x_i)², the set {g_v} is localizing at every Gram factorization of A by Hermitian positive semidefinite matrices (in particular, {g_v} is localizing for A).

Proof If X_1, …, X_n is a Gram decomposition of A by Hermitian positive semidefinite matrices, then

  v^T A v = Tr((∑_{i=1}^n v_i X_i)²) ≥ λ_max((∑_{i=1}^n v_i X_i)²),

hence v^T A v · I − (∑_{i=1}^n v_i X_i)² ⪰ 0. □

Given a set V ⊆ ℝ^n, we consider the larger set S^cpsd_{A,V} = S^cpsd_A ∪ {g_v : v ∈ V} of localizing polynomials for A. For t ∈ ℕ ∪ {∞, *}, denote by ξ^cpsd_{t,V}(A) the parameter obtained by replacing in ξ^cpsd_t(A) the nonnegativity constraint on M_{2t}(S^cpsd_A) by nonnegativity on the larger set M_{2t}(S^cpsd_{A,V}). We have ξ^cpsd_{t,∅}(A) = ξ^cpsd_t(A) and

  ξ^cpsd_t(A) ≤ ξ^cpsd_{t,V}(A) ≤ cpsd-rank_ℂ(A) for all V ⊆ ℝ^n.

By scaling invariance, we can add the above constraints for all v ∈ ℝ^n by setting V to be the unit sphere S^{n−1}. Since S^{n−1} is a compact metric space, there exists a sequence V_1 ⊆ V_2 ⊆ … ⊆ S^{n−1} of finite subsets such that ∪_{k≥1} V_k is dense in S^{n−1}. Each of the parameters ξ^cpsd_{t,V_k}(A) involves finitely many localizing constraints and, as we now show, they converge to the parameter ξ^cpsd_{t,S^{n−1}}(A).

Proposition 5
Consider a matrix A ∈ CS_+^n. For t ∈ {∞, *}, we have lim_{k→∞} ξ^cpsd_{t,V_k}(A) = ξ^cpsd_{t,S^{n−1}}(A).

Proof
Let 0 < ε < 1. Since ∪_k V_k is dense in S^{n−1}, there is an integer k ≥ 1 such that for every u ∈ S^{n−1} there exists a vector v ∈ V_k satisfying

  ‖u − v‖ ≤ ελ_min(A) / (4n max_i A_ii) and ‖u − v‖ ≤ ελ_min(A) / (4√(Tr(A²))).   (14)

The above Propositions 1 and 2 have natural analogues for the programs ξ^cpsd_{t,V}(A). These show that for t = ∞ (resp., t = *) the parameter ξ^cpsd_{t,V_k}(A) is the infimum over all α ≥ 0 for which there exist a (finite dimensional) unital C*-algebra 𝒜 with tracial state τ and X ∈ D_𝒜(S^cpsd_{A,V_k}) such that A = α · (τ(X_i X_j)).

Below we will show that X′ = √(1−ε) X ∈ D_𝒜(S^cpsd_{A,S^{n−1}}). This implies that the linear form L ∈ ℝ⟨x⟩* defined by L(p) = α/(1−ε) · τ(p(X′)) is feasible for ξ^cpsd_{t,S^{n−1}}(A) with objective value L(1) = α/(1−ε). This shows

  ξ^cpsd_{t,S^{n−1}}(A) ≤ (1/(1−ε)) ξ^cpsd_{t,V_k}(A) ≤ (1/(1−ε)) lim_{k→∞} ξ^cpsd_{t,V_k}(A).

Letting ε tend to 0 completes the proof.

We now show X′ = √(1−ε) X ∈ D_𝒜(S^cpsd_{A,S^{n−1}}). For this consider the map

  f_X : S^{n−1} → ℝ, v ↦ ‖∑_{i=1}^n v_i X_i‖²,

where ‖·‖ denotes the C*-algebra norm of 𝒜. For α ∈ ℝ and a positive element a ∈ 𝒜, we have α ≥ ‖a‖ if and only if α·1 − a ⪰ 0 in 𝒜. Since X ∈ D_𝒜(S^cpsd_{A,V_k}) we have v^T A v − f_X(v) ≥ 0 for all v ∈ V_k, and hence

  v^T A v − f_{X′}(v) = v^T A v (1 − (1−ε) f_X(v)/(v^T A v)) ≥ v^T A v (1 − (1−ε)) = ε v^T A v ≥ ελ_min(A).

Let u ∈ S^{n−1} and let v ∈ V_k be such that (14) holds. Using the Cauchy–Schwarz inequality we have

  |u^T A u − v^T A v| = |(u − v)^T A (u + v)| = |⟨A, (u − v)(u + v)^T⟩|
    ≤ √(Tr(A²)) ‖u − v‖ ‖u + v‖ ≤ 2√(Tr(A²)) ‖u − v‖ ≤ ελ_min(A)/2.

Since √A_ii X_i − X_i² is positive in 𝒜, identity (10) (with c = √A_ii) shows that A_ii·1 − X_i² is positive too, which implies ‖X_i‖ ≤ √A_ii. Hence ‖∑_i u_i X′_i‖ ≤ ∑_i |u_i| ‖X_i‖ ≤ ‖u‖ (∑_i A_ii)^{1/2} ≤ √n max_i √A_ii, and similarly for v. By the reverse triangle inequality we then have

  |f_{X′}(u) − f_{X′}(v)| = |‖∑_i u_i X′_i‖ − ‖∑_i v_i X′_i‖| · (‖∑_i u_i X′_i‖ + ‖∑_i v_i X′_i‖)
    ≤ ‖∑_i (u_i − v_i) X′_i‖ · 2√n max_i √A_ii
    ≤ ‖u − v‖ · √n max_i √A_ii · 2√n max_i √A_ii
    = 2n ‖u − v‖ max_i A_ii ≤ ελ_min(A)/2.

Combining the above inequalities we obtain

  u^T A u − f_{X′}(u) ≥ (v^T A v − f_{X′}(v)) − |u^T A u − v^T A v| − |f_{X′}(u) − f_{X′}(v)| ≥ ελ_min(A) − ελ_min(A)/2 − ελ_min(A)/2 = 0,

and hence u^T A u · 1 − (∑_{i=1}^n u_i X′_i)² is positive in 𝒜 for all u ∈ S^{n−1}. Thus X′ ∈ D_𝒜(S^cpsd_{A,S^{n−1}}). □

We now discuss two examples where the bounds ξ^cpsd_{*,V}(A) go beyond ξ^cpsd_*(A).

Example 1
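The Cauchy–Schwarz step above, |uᵀAu − vᵀAv| ≤ 2√(Tr(A²))·‖u − v‖ for unit vectors u and v, is elementary and can be sanity-checked numerically. A sketch in plain Python (all helper names are ours):

```python
import math
import random

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k]*B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def quad(A, u, v):
    """u^T A v."""
    n = len(A)
    return sum(u[i]*A[i][j]*v[j] for i in range(n) for j in range(n))

def check_cs(trials=200, n=4):
    """Check |u^T A u - v^T A v| <= 2 sqrt(Tr(A^2)) ||u - v|| for unit u, v."""
    for _ in range(trials):
        B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
        A = [[(B[i][j] + B[j][i]) / 2 for j in range(n)] for i in range(n)]
        u = [random.gauss(0, 1) for _ in range(n)]
        v = [random.gauss(0, 1) for _ in range(n)]
        nu = math.sqrt(sum(x*x for x in u)); u = [x/nu for x in u]
        nv = math.sqrt(sum(x*x for x in v)); v = [x/nv for x in v]
        lhs = abs(quad(A, u, u) - quad(A, v, v))
        rhs = 2*math.sqrt(trace(matmul(A, A))) * \
              math.sqrt(sum((u[i]-v[i])**2 for i in range(n)))
        assert lhs <= rhs + 1e-9
    return True
```

The estimate holds because uᵀAu − vᵀAv = ⟨A, (u−v)(u+v)ᵀ⟩, the Frobenius norm of the rank-one matrix (u−v)(u+v)ᵀ equals ‖u−v‖·‖u+v‖, and ‖u+v‖ ≤ 2 for unit vectors.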
Consider the matrix

  A = (1    1/2
       1/2  1  ) = Gram( (1 0
                          0 0), (1/2 1/2
                                 1/2 1/2) ),   (15)

with cpsd-rank_ℂ(A) = 2. We can also write A = Gram(Y_1, Y_2), where

  Y_1 = (1/√2) Diag(1, 1, 0), Y_2 = (1/√2) Diag(1, 0, 1).

With X_i = √2 Y_i we have I − X_i² ⪰ 0 for i = 1, 2. Hence the linear form L = L_X / 2 is feasible for ξ^cpsd_*(A), which shows that ξ^cpsd_*(A) ≤ L(1) = 3/2. In fact, this form L gives an optimal flat solution to ξ^cpsd_2(A), as we can check using a semidefinite programming solver, so ξ^cpsd_*(A) = 3/2. In passing, we observe that ξ^cpsd_1(A) = 4/3.

For e = (1, 1) ∈ ℝ² and V = {e}, this form L is not feasible for ξ^cpsd_{*,V}(A), because for the polynomial p = 1 − x_1 − x_2 we have L(p* g_e p) = −1/2 < 0. This means that the localizing constraint L(p* g_e p) ≥ 0 can strengthen the bounds, so that ξ^cpsd_{t,V}(A) ≥ ξ^cpsd_t(A) can be strict. Indeed, using a semidefinite programming solver we find an optimal flat solution of ξ^cpsd_{3,V}(A) whose objective value is strictly larger than 3/2, which shows ξ^cpsd_{*,V}(A) > 3/2 = ξ^cpsd_*(A).
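The computations in Example 1 can be reproduced in a few lines of plain Python, assuming the diagonal factorization Y_1 = (1/√2)Diag(1,1,0), Y_2 = (1/√2)Diag(1,0,1) given above (the helper name `gram` is ours). Since everything is diagonal, the trace evaluations reduce to sums over diagonal entries:

```python
import math

def gram(mats):
    """Gram matrix with the trace inner product, for symmetric matrices."""
    def inner(X, Y):
        n = len(X)
        return sum(X[i][j]*Y[i][j] for i in range(n) for j in range(n))
    return [[inner(X, Y) for Y in mats] for X in mats]

s = 1/math.sqrt(2)
Y1 = [[s, 0, 0], [0, s, 0], [0, 0, 0]]   # (1/sqrt 2) Diag(1,1,0)
Y2 = [[s, 0, 0], [0, 0, 0], [0, 0, s]]   # (1/sqrt 2) Diag(1,0,1)
A = gram([Y1, Y2])                       # A = [[1, 1/2], [1/2, 1]]

# L = L_X / 2 with X_i = sqrt(2) Y_i.  For e = (1,1) and p = 1 - x1 - x2,
# evaluate L(p* g_e p) with g_e = 3 - (x1 + x2)^2; all matrices are diagonal,
# so it suffices to work with the diagonal vectors.
X1 = [math.sqrt(2)*Y1[i][i] for i in range(3)]   # Diag(1, 1, 0)
X2 = [math.sqrt(2)*Y2[i][i] for i in range(3)]   # Diag(1, 0, 1)
ge = [3 - (X1[i] + X2[i])**2 for i in range(3)]  # g_e(X) = Diag(-1, 2, 2)
p  = [1 - X1[i] - X2[i] for i in range(3)]       # p(X)  = Diag(-1, 0, 0)
val = 0.5*sum(p[i]*ge[i]*p[i] for i in range(3)) # L(p* g_e p) = -1/2
```

Here v = e gives vᵀAv = 1 + 1 + 2·(1/2) = 3, which is where the constant 3 in g_e comes from; the computed value −1/2 is the infeasibility witness used in the example.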
Consider the symmetric circulant matrices

  M(α) = (1 α 0 0 α
          α 1 α 0 0
          0 α 1 α 0
          0 0 α 1 α
          α 0 0 α 1)  for α ∈ ℝ.

For 0 ≤ α ≤ 1/2 we have M(α) ∈ CS_+^5 with cpsd-rank_ℂ(M(α)) ≤ 5. To see this we set β = (1 + √(1 − 4α²))/2, so that the matrices

  X_i = Diag(√β e_i + √(1−β) e_{i+1}) ∈ S_+^5, i ∈ [5] (with e_6 := e_1),

form a factorization of M(α); indeed, β(1−β) = α². As M(α) is supported by a cycle, we have M(α) ∈ CS_+^5 if and only if M(α) ∈ CP^5 [50]. Thus M(α) ∈ CS_+^5 if and only if 0 ≤ α ≤ 1/2.

The above factorization shows ξ^cpsd_*(M(1/2)) ≤ 5/2. However, using a semidefinite programming solver we see that

  ξ^cpsd_{2,V}(M(1/2)) = 5,

where V is the set containing the vector (1, −1, 1, −1, 0) and its cyclic shifts. Hence the bound ξ^cpsd_{2,V}(M(1/2)) is tight: it certifies cpsd-rank_ℂ(M(1/2)) = 5, while the other known bounds, the rank bound √(rank(A)) and the analytic bound (18), only give cpsd-rank_ℂ(A) ≥ 3. Moreover, there exist scalars 0 < ε, δ < 1/2 such that cpsd-rank_ℂ(M(α)) = 5 for all α ∈ [0, ε] ∪ [δ, 1/2]. Indeed, this follows from the facts that ξ^cpsd_1(M(0)) = 5 and ξ^cpsd_{2,V}(M(1/2)) = 5, together with the lower semicontinuity of α ↦ ξ^cpsd_{2,V}(M(α)), which is shown in Lemma 7 below.

As the matrices M(α) are nonsingular, the above factorization shows that their cp-rank is equal to 5 for all α ∈ [0, 1/2]; whether they all have cpsd-rank equal to 5 is not known.

2.3 Additional constraints

In this section we discuss additional constraints that can be added to the programs ξ^cpsd_{t,V}(A) for finite t. These constraints may shrink the feasibility region of ξ^cpsd_{t,V}(A) for t ∈ ℕ, but they are redundant for t ∈ {∞, *}. The latter is shown using the reformulation of the parameters ξ^cpsd_{∞,V}(A) and ξ^cpsd_{*,V}(A) in terms of C*-algebras.

We first mention how to construct localizing constraints of “bilinear type”, inspired by the work of Berta, Fawzi and Scholz [8]. Note that, as for localizing constraints, these bilinear constraints can be modeled as semidefinite constraints.

Lemma 2
Let A ∈ CS_+^n, t ∈ ℕ ∪ {∞, *}, and let {g, g′} be localizing for A. If we add the constraints

  L(p* g p g′) ≥ 0 for p ∈ ℝ⟨x⟩ with deg(p* g p g′) ≤ 2t   (16)

to ξ^cpsd_{t,V}(A), then we still get a lower bound on cpsd-rank_ℂ(A). However, the constraints (16) are redundant for ξ^cpsd_{∞,V}(A) and ξ^cpsd_{*,V}(A) when g, g′ ∈ M(S^cpsd_{A,V}).

Proof Let X ∈ (H_+^d)^n be a Gram decomposition of A, and let L = L_X be the real part of the trace evaluation at X. Then p(X)* g(X) p(X) ⪰ 0 and g′(X) ⪰ 0, and thus

  L(p* g p g′) = Re(Tr(p(X)* g(X) p(X) g′(X))) ≥ 0,

since the trace of a product of two positive semidefinite matrices is nonnegative. So by adding the constraints (16) we still get a lower bound on cpsd-rank_ℂ(A).

To show that the constraints (16) are redundant for ξ^cpsd_{∞,V}(A) and ξ^cpsd_{*,V}(A) when g, g′ ∈ M(S^cpsd_{A,V}), we let t ∈ {∞, *} and assume L is feasible for ξ^cpsd_{t,V}(A). By Theorem 1 there exist a unital C*-algebra 𝒜 with tracial state τ and X ∈ D_𝒜(S^cpsd_{A,V}) such that L(p) = L(1) τ(p(X)) for all p ∈ ℝ⟨x⟩. Since g, g′ ∈ M(S^cpsd_{A,V}) we know that g(X), g′(X) are positive elements in 𝒜, so g(X) = a*a and g′(X) = b*b for some a, b ∈ 𝒜. Then we have

  L(p* g p g′) = L(1) τ(p*(X) g(X) p(X) g′(X)) = L(1) τ(p*(X) a*a p(X) b*b) = L(1) τ((a p(X) b*)* a p(X) b*) ≥ 0,

where we use that τ is a positive tracial state on 𝒜. □

Second, we show how to use zero entries in A and vectors in the kernel of A to enforce new constraints on ξ^cpsd_{t,V}(A).

Lemma 3
Let A ∈ CS_+^n and t ∈ ℕ ∪ {∞, *}. If we add the constraint

  L = 0 on 𝓘_{2t}({∑_{i=1}^n v_i x_i : v ∈ ker A} ∪ {x_i x_j : A_ij = 0})   (17)

to ξ^cpsd_{t,V}(A), then we still get a lower bound on cpsd-rank_ℂ(A). Moreover, these constraints are redundant for ξ^cpsd_{∞,V}(A) and ξ^cpsd_{*,V}(A).

Proof Let X ∈ (H_+^d)^n be a Gram factorization of A and let L_X be as in (5). If Av = 0, then 0 = v^T A v = Tr((∑_{i=1}^n v_i X_i)²), and thus ∑_{i=1}^n v_i X_i = 0. Hence L_X((∑_{i=1}^n v_i x_i) p) = Re(Tr((∑_{i=1}^n v_i X_i) p(X))) = 0 for all p. If A_ij = 0, then Tr(X_i X_j) = 0, which implies X_i X_j = 0 since X_i and X_j are positive semidefinite. Hence L_X(x_i x_j p) = Re(Tr(X_i X_j p(X))) = 0 for all p. So with the constraints (17) we still get a lower bound on cpsd-rank_ℂ(A).

As in the proof of the previous lemma, if t ∈ {∞, *} and L is feasible for ξ^cpsd_{t,V}(A), then, by Theorem 1, there exist a unital C*-algebra 𝒜 with tracial state τ and X in D_𝒜(S^cpsd_{A,V}) such that L(p) = L(1) τ(p(X)) for all p ∈ ℝ⟨x⟩. Moreover, by Lemma 12 we may assume τ to be faithful. For a vector v in the kernel of A we have 0 = v^T A v = L((∑_i v_i x_i)²) = L(1) τ((∑_i v_i X_i)²), and hence, since τ is faithful, ∑_i v_i X_i = 0 in 𝒜. It follows that L(p (∑_i v_i x_i)) = L(1) τ(p(X) (∑_i v_i X_i)) = 0 for all p ∈ ℝ⟨x⟩. Analogously, if A_ij = 0, then L(x_i x_j) = L(1) τ(X_i X_j) = 0 gives X_i X_j = 0, since X_i, X_j are positive in 𝒜 and τ is faithful. This implies L(p x_i x_j) = 0 for all p ∈ ℝ⟨x⟩. This shows that the constraints (17) are redundant. □

Note that the constraints L(p (∑_{i=1}^n v_i x_i)) = 0 for p ∈ ℝ⟨x⟩_{2t−1}, which are implied by (17), are in fact redundant: if v ∈ ker(A), then the vector obtained by extending v with zeros belongs to ker(M_t(L)), since M_t(L) ⪰ 0. Also, for an implementation of ξ^cpsd_t(A) with the additional constraints (17), it is more efficient to index the moment matrices by a basis of ℝ⟨x⟩_t modulo the ideal 𝓘_t({∑_i v_i x_i : v ∈ ker(A)} ∪ {x_i x_j : A_ij = 0}).

2.4 Additional properties of the bounds

Here we list some additional properties of the parameters ξ^cpsd_t(A) for t ∈ ℕ ∪ {∞, *}. First we state some properties whose proofs are immediate and thus omitted.

Lemma 4
Suppose A ∈ CS_+^n and t ∈ ℕ ∪ {∞, *}.
(1) If P is a permutation matrix, then ξ^cpsd_t(A) = ξ^cpsd_t(P^T A P).
(2) If B is a principal submatrix of A, then ξ^cpsd_t(B) ≤ ξ^cpsd_t(A).
(3) If D is a positive definite diagonal matrix, then ξ^cpsd_t(A) = ξ^cpsd_t(DAD).

We also have the following direct sum property, where the equality follows using the C*-algebra reformulations given in Proposition 1 and Proposition 2.

Lemma 5
If A ∈ CS_+^n and B ∈ CS_+^m, then ξ^cpsd_t(A ⊕ B) ≤ ξ^cpsd_t(A) + ξ^cpsd_t(B), where equality holds for t ∈ {∞, *}.

Proof To prove the inequality we take L_A and L_B feasible for ξ^cpsd_t(A) and ξ^cpsd_t(B), and construct a feasible L for ξ^cpsd_t(A ⊕ B) by L(p(x, y)) = L_A(p(x, 0)) + L_B(p(0, y)).

Now we show equality for t = ∞ (resp., t = *). By Proposition 1 (resp., Proposition 2), ξ^cpsd_t(A ⊕ B) is equal to the infimum over all α ≥ 0 for which there exist a (finite dimensional) unital C*-algebra 𝒜 with tracial state τ and (X, Y) ∈ D_𝒜(S^cpsd_{A⊕B}) such that A = α · (τ(X_i X_j)), B = α · (τ(Y_i Y_j)) and (τ(X_i Y_j)) = 0. This implies X ∈ D_𝒜(S^cpsd_A) and Y ∈ D_𝒜(S^cpsd_B). Let P_A be the projection onto the space ∑_i Im(X_i) and define the linear form L_A ∈ ℝ⟨x⟩* by L_A(p) = α · τ(p(X) P_A). It follows that L_A is nonnegative on M(S^cpsd_A), and

  L_A(x_i x_j) = α τ(X_i X_j P_A) = α τ(X_i X_j) = A_ij,

so L_A is feasible for ξ^cpsd_∞(A) with L_A(1) = α τ(P_A). In the same way we consider the projection P_B onto the space ∑_j Im(Y_j) and define a feasible solution L_B for ξ^cpsd_t(B) with L_B(1) = α τ(P_B). By Lemma 12 we may assume τ to be faithful, so that positivity of X_i and Y_j together with τ(X_i Y_j) = 0 implies X_i Y_j = 0 for all i and j, and thus ∑_i Im(X_i) ⊥ ∑_j Im(Y_j). This implies 1 ⪰ P_A + P_B and thus τ(P_A + P_B) ≤ τ(1) = 1. We have

  L_A(1) + L_B(1) = α τ(P_A) + α τ(P_B) ≤ α τ(1) = α,

so ξ^cpsd_t(A) + ξ^cpsd_t(B) ≤ L_A(1) + L_B(1) ≤ α; taking the infimum over feasible α completes the proof. □

Note that the cpsd-rank of a matrix satisfies the same properties as those mentioned in the above two lemmas, where the inequality in Lemma 5 is always an equality: cpsd-rank_ℂ(A ⊕ B) = cpsd-rank_ℂ(A) + cpsd-rank_ℂ(B) [62,39].

The following lemma shows that the first level of our hierarchy is at least as good as the analytic lower bound (18) on the cpsd-rank derived in [62, Theorem 10].

Lemma 6
For any non-zero matrix A ∈ CS_+^n we have

  ξ^cpsd_1(A) ≥ (∑_{i=1}^n √A_ii)² / ∑_{i,j=1}^n A_ij.   (18)

Proof Let L be feasible for ξ^cpsd_1(A). Since L is nonnegative on M_2(S^cpsd_A), it follows that L(√A_ii x_i − x_i²) ≥ 0, implying √A_ii L(x_i) ≥ L(x_i²) = A_ii and thus L(x_i) ≥ √A_ii. Moreover, the matrix M_1(L) is positive semidefinite. By taking the Schur complement with respect to its upper left corner (indexed by the monomial 1) it follows that the matrix L(1) · A − (L(x_i) L(x_j))_{i,j} is positive semidefinite. Hence the sum of its entries is nonnegative, which gives L(1)(∑_{i,j} A_ij) ≥ (∑_i L(x_i))² ≥ (∑_i √A_ii)² and shows the desired inequality. □

As an application of Lemma 6, the first bound ξ^cpsd_1 is exact for the k × k identity matrix: ξ^cpsd_1(I_k) = cpsd-rank_ℂ(I_k) = k. Moreover, by combining this with Lemma 4, it follows that ξ^cpsd_1(A) ≥ k if A contains a k × k positive definite diagonal principal submatrix. A slightly more involved example is given by the 5 × 5 matrix A with entries A_ij = cos²((i − j) 4π/5) for i, j ∈ [5]; this matrix was used in [26] to show a separation between the completely positive semidefinite cone and the completely positive cone, and it was shown that cpsd-rank_ℂ(A) = 2. The analytic lower bound of [62] also evaluates to 2, hence Lemma 6 shows that our bound is tight on this example.

We now examine further analytic properties of the parameters ξ^cpsd_t(·). For each r ∈ ℕ, the set of matrices A ∈ CS_+^n with cpsd-rank_ℂ(A) ≤ r is closed, which shows that the function A ↦ cpsd-rank_ℂ(A) is lower semicontinuous. We now show that the functions A ↦ ξ^cpsd_t(A) have the same property. The other bounds defined in this paper are also lower semicontinuous, with a similar proof.

Lemma 7
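The analytic bound (18) is straightforward to evaluate. The sketch below (helper name ours) checks the two instances just discussed: the bound returns k on the identity matrix I_k, and 2 on a pentagon-type matrix of the form A_ij = cos²((i − j) 4π/5), our reconstruction of the matrix from [26]:

```python
import math

def analytic_bound(A):
    """The analytic lower bound (18) on cpsd-rank_C(A) from [62]."""
    n = len(A)
    num = sum(math.sqrt(A[i][i]) for i in range(n))**2
    den = sum(A[i][j] for i in range(n) for j in range(n))
    return num / den

# Identity matrix I_5: the bound equals 5, so xi_1 is exact there.
I5 = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

# Pentagon-type matrix A_ij = cos^2((i-j) 4 pi / 5): the bound evaluates to 2,
# since each row of A sums to 5/2 while the diagonal entries are all 1.
P = [[math.cos((i - j)*4*math.pi/5)**2 for j in range(5)] for i in range(5)]
```

Note that the value 2 does not depend on the particular multiple of 2π/5 used in the angle, since ∑_d cos²(2πcd/5) = 5/2 for any c not divisible by 5.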
For every t ∈ ℕ ∪ {∞} and V ⊆ ℝ^n, the function S^n → ℝ ∪ {∞}, A ↦ ξ^cpsd_{t,V}(A) is lower semicontinuous.

Proof It suffices to show the result for t ∈ ℕ, because ξ^cpsd_{∞,V}(A) = sup_t ξ^cpsd_{t,V}(A), and the pointwise supremum of lower semicontinuous functions is lower semicontinuous. We show that the level sets {A ∈ S^n : ξ^cpsd_{t,V}(A) ≤ r} are closed. For this, consider a sequence (A_k)_{k∈ℕ} in S^n converging to A ∈ S^n with ξ^cpsd_{t,V}(A_k) ≤ r for all k; we show that ξ^cpsd_{t,V}(A) ≤ r. Let L_k ∈ ℝ⟨x⟩*_{2t} be an optimal solution to ξ^cpsd_{t,V}(A_k). As L_k(1) ≤ r for all k, it follows from Lemma 13 that (L_k)_k has a pointwise converging subsequence, still denoted (L_k)_k for simplicity, with limit L ∈ ℝ⟨x⟩*_{2t} satisfying L(1) ≤ r. To complete the proof we show that L is feasible for ξ^cpsd_{t,V}(A). By the pointwise convergence of L_k to L, for every ε > 0, p ∈ ℝ⟨x⟩, and i ∈ [n], there exists a K ∈ ℕ such that for all k ≥ K we have

  |L(p* x_i p) − L_k(p* x_i p)| < min{1, ε/(√A_ii + 1)},
  |L(p* x_i² p) − L_k(p* x_i² p)| < ε,
  |√A_ii − √(A_k)_ii| < ε/(L(p* x_i p) + 1).

Hence we have

  L(p*(√A_ii x_i − x_i²)p) = √A_ii L(p* x_i p) − L(p* x_i² p)
    ≥ √A_ii L_k(p* x_i p) − L_k(p* x_i² p) − 2ε
    ≥ √(A_k)_ii L_k(p* x_i p) − L_k(p* x_i² p) − 3ε
    = L_k(p*(√(A_k)_ii x_i − x_i²)p) − 3ε ≥ −3ε,

where in the second inequality we use that 0 ≤ L_k(p* x_i p) ≤ L(p* x_i p) + 1. Letting ε → 0 gives L(p*(√A_ii x_i − x_i²)p) ≥ 0, and in the same way one shows L(p*(v^T A v − (∑_i v_i x_i)²)p) ≥ 0 for all v ∈ V and p ∈ ℝ⟨x⟩. □

If we restrict to completely positive semidefinite matrices with an all-ones diagonal, that is, to CS_+^n ∩ E_n, we can show an even stronger property. Here E_n is the elliptope, the set of n × n positive semidefinite matrices with an all-ones diagonal.

Lemma 8
For every t ∈ ℕ ∪ {∞}, the function CS_+^n ∩ E_n → ℝ, A ↦ ξ^cpsd_t(A) is convex, and hence continuous on the interior of its domain.

Proof Let A, B ∈ CS_+^n ∩ E_n and 0 < λ < 1, and let L_A and L_B be optimal solutions for ξ^cpsd_t(A) and ξ^cpsd_t(B). Since A and B have the same diagonal, we have S^cpsd_A = S^cpsd_B. So L = λ L_A + (1−λ) L_B is feasible for ξ^cpsd_t(λA + (1−λ)B), hence

  ξ^cpsd_t(λA + (1−λ)B) ≤ λ L_A(1) + (1−λ) L_B(1) = λ ξ^cpsd_t(A) + (1−λ) ξ^cpsd_t(B). □

Example 3 In this example we show that for t ≥ 1, the function CS_+^n → ℝ, A ↦ ξ^cpsd_t(A) is not continuous. For this we consider the matrices

  A_k = (1/k 0
         0   1) ∈ CS_+^2,

with cpsd-rank_ℂ(A_k) = 2 for all k ≥ 1. As A_k is diagonal and positive definite, we have ξ^cpsd_t(A_k) = ξ^cpsd_t(I_2) = 2 for all t and k ≥ 1 by Lemma 4(3), while ξ^cpsd_t(lim_{k→∞} A_k) = ξ^cpsd_t(Diag(0, 1)) = 1. This argument extends to CS_+^n with n > 2. This example also shows that the first level of the hierarchy ξ^cpsd_1(·) can be strictly better than the analytic lower bound (18) of [62].

Example 4
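The discontinuity in Example 3 also quantifies the gap with the analytic bound (18): by the scaling invariance of Lemma 4(3) we have ξ^cpsd_1(A_k) = ξ^cpsd_1(I_2) = 2 for every k, while (18) evaluated at A_k = Diag(1/k, 1) equals 2 only at k = 1 and decreases towards 1. A small check in plain Python (helper name ours):

```python
import math

def analytic_bound(A):
    """The analytic lower bound (18) on cpsd-rank_C(A)."""
    n = len(A)
    return sum(math.sqrt(A[i][i]) for i in range(n))**2 / sum(
        A[i][j] for i in range(n) for j in range(n))

# A_k = Diag(1/k, 1): xi_1(A_k) = 2 for all k by scaling invariance,
# but the analytic bound (sqrt(1/k) + 1)^2 / (1/k + 1) tends to 1.
bounds = [analytic_bound([[1.0/k, 0.0], [0.0, 1.0]]) for k in (1, 2, 10, 1000)]
```

So for every k ≥ 2 the first level of the hierarchy strictly improves on the analytic bound on these matrices.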
In this example we determine ξ^cpsd_t(A) for all t ≥ 1 and all A ∈ CS_+^2. In view of Lemma 4(3) we only need to find ξ^cpsd_t(A(α)) for 0 ≤ α ≤ 1, where

  A(α) = (1 α
          α 1).

The first bound ξ^cpsd_1(A(α)) is equal to the analytic bound 2/(α + 1) from (18), where the equality follows from the fact that L given by L(x_i x_j) = A(α)_ij, L(x_1) = L(x_2) = 1, and L(1) = 2/(α + 1) is feasible for ξ^cpsd_1(A(α)).

For t ≥ 2 we claim ξ^cpsd_t(A(α)) = 2 − α. By the above this is true for α = 0 and α = 1, and in Example 1 we show ξ^cpsd_t(A(1/2)) = 3/2 for all t ≥ 2. The claim then follows since the function α ↦ ξ^cpsd_t(A(α)) is convex by Lemma 8.

3 The completely positive rank

The best current approach for lower bounding the completely positive rank of a matrix is due to Fawzi and Parrilo [28]. Their approach relies on the atomicity of the completely positive rank, that is, the fact that cp-rank(A) is the smallest r for which A admits an atomic decomposition A = ∑_{k=1}^r v_k v_k^T with nonnegative vectors v_k. In other words, if cp-rank(A) = r, then A/r can be written as a convex combination of r rank-one positive semidefinite matrices v_k v_k^T that satisfy 0 ≤ v_k v_k^T ≤ A (entrywise) and v_k v_k^T ⪯ A. Based on this observation Fawzi and Parrilo define the parameter

  τ_cp(A) = min { α : α ≥ 0, A ∈ α · conv{R ∈ S^n : 0 ≤ R ≤ A, R ⪯ A, rank(R) ≤ 1} }

as a lower bound for cp-rank(A). They also define the semidefinite programming parameter

  τ^sos_cp(A) = min { α : α ∈ ℝ, X ∈ S^{n²},
      (α       vec(A)^T
       vec(A)  X      ) ⪰ 0,
      X_{(i,j),(i,j)} ≤ A_ij² for 1 ≤ i, j ≤ n,
      X_{(i,j),(k,l)} = X_{(i,l),(k,j)} for 1 ≤ i < k ≤ n, 1 ≤ j < l ≤ n,
      X ⪯ A ⊗ A }

as an efficiently computable relaxation of τ_cp(A), and they show rank(A) ≤ τ^sos_cp(A). Therefore we have

  rank(A) ≤ τ^sos_cp(A) ≤ τ_cp(A) ≤ cp-rank(A).

Instead of the atomic point of view, here we take the matrix factorization perspective, which allows us to obtain bounds by adapting the techniques from Section 2 to the commutative setting. Indeed, we may view a factorization A = (a_i^T a_j) by nonnegative vectors as a factorization by diagonal (and thus pairwise commuting) positive semidefinite matrices.

Before presenting the details of our hierarchy of lower bounds, we mention some of our results in order to make the link to the parameters τ^sos_cp(A) and τ_cp(A).
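The membership conditions behind τ_cp(A) are easy to check on a toy instance. The sketch below (all helper names ours) builds A = ∑_k v_k v_kᵀ from nonnegative vectors and verifies that each atom R_k = v_k v_kᵀ satisfies 0 ≤ R_k ≤ A entrywise and R_k ⪯ A in the positive semidefinite order:

```python
def outer(v):
    return [[vi*vj for vj in v] for vi in v]

def madd(A, B, s=1.0):
    n = len(A)
    return [[A[i][j] + s*B[i][j] for j in range(n)] for i in range(n)]

def psd2(M, tol=1e-9):
    """PSD test for a 2x2 symmetric matrix: nonnegative diagonal and determinant."""
    return M[0][0] >= -tol and M[1][1] >= -tol and \
           M[0][0]*M[1][1] - M[0][1]*M[1][0] >= -tol

# atomic cp decomposition A = sum_k v_k v_k^T with nonnegative v_k (so cp-rank(A) <= 3)
vs = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
A = [[0.0, 0.0], [0.0, 0.0]]
for v in vs:
    A = madd(A, outer(v))              # A = [[2, 1], [1, 2]]

for v in vs:
    R = outer(v)
    # each atom satisfies 0 <= R <= A entrywise and R <= A in the psd order
    assert all(0.0 <= R[i][j] <= A[i][j] for i in range(2) for j in range(2))
    assert psd2(madd(A, R, -1.0))
```

These are exactly the constraints defining the convex body in τ_cp(A); the parameter itself requires optimizing over that body, which is what the semidefinite relaxation τ^sos_cp(A) makes tractable.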
The direct analogue of {ξ^cpsd_t(A)} in the commutative setting leads to a hierarchy that does not converge to τ_cp(A), but we provide two approaches to strengthen it that do converge to τ_cp(A). The first approach is based on a generalization of the tensor constraints in τ^sos_cp(A). We also provide a computationally more efficient version of these tensor constraints, leading to a hierarchy whose second level is at least as good as τ^sos_cp(A) while being defined by a smaller semidefinite program. The second approach relies on adding localizing constraints for vectors in the unit sphere, as in Section 2.2.

The following hierarchy is a commutative analogue of the hierarchy from Section 2, where we may now add the localizing polynomials A_ij − x_i x_j for the pairs 1 ≤ i < j ≤ n, which was not possible in the noncommutative setting of the completely positive semidefinite rank. For each t ∈ ℕ ∪ {∞} we consider the semidefinite program

  ξ^cp_t(A) = min { L(1) : L ∈ ℝ[x_1, …, x_n]*_{2t}, L(x_i x_j) = A_ij for i, j ∈ [n], L ≥ 0 on M_{2t}(S^cp_A) },

where we set

  S^cp_A = {√A_ii x_i − x_i² : i ∈ [n]} ∪ {A_ij − x_i x_j : 1 ≤ i < j ≤ n}.

We additionally define ξ^cp_*(A) by adding the constraint rank(M(L)) < ∞ to ξ^cp_∞(A). We also consider the strengthening ξ^cp_{t,†}(A), where we add to ξ^cp_t(A) the positivity constraints

  L(g u) ≥ 0 for g ∈ {1} ∪ S^cp_A and u ∈ [x]_{2t−deg(g)}   (19)

and the tensor constraints

  (L((w w′)^c))_{w,w′ ∈ ⟨x⟩_{=l}} ⪯ A^{⊗l} for all integers 2 ≤ l ≤ t,   (20)

which generalize the case l = 2 used in τ^sos_cp(A). Here, for a word w ∈ ⟨x⟩, we denote by w^c the corresponding (commutative) monomial in [x]. The tensor constraints (20) involve matrices indexed by the noncommutative words of length exactly l.
In Section 3.4 we show a more economical way to rewrite these constraints as (L(mm′))_{m,m′ ∈ [x]_{=l}} ⪯ Q_l A^{⊗l} Q_l^T, thus involving smaller matrices indexed by commutative monomials of degree l.

Note that, as before, we can strengthen the bounds by adding other localizing polynomials to the set S_A^cp. In particular, we can follow the approach of Section 2.2. Another possibility is to add localizing constraints specific to the commutative setting: we can add each monomial u ∈ [x] to S_A^cp (see Section 3.5.2 for an example).

The bounds ξ_t^cp(A) and ξ_{t,†}^cp(A) are monotonically nondecreasing in t, and they are invariant under simultaneously permuting the rows and columns of A and under scaling a row and column of A by a positive number. In Propositions 6 and 7 we show

  τ_cp^sos(A) ≤ ξ_{t,†}^cp(A) ≤ τ_cp(A) for t ≥ 2,

and in Proposition 10 we show the equality ξ_{*,†}^cp(A) = τ_cp(A).

3.1 Comparison to τ_cp^sos(A)

We first show that the semidefinite programs defining ξ_{t,†}^cp(A) are valid relaxations for the completely positive rank. More precisely, we show that they lower bound τ_cp(A).

Proposition 6
For A ∈ CP_n and t ∈ N ∪ {∞, *} we have ξ_{t,†}^cp(A) ≤ τ_cp(A).

Proof It suffices to show the inequality for t = *. For this consider a decomposition A = α Σ_{k=1}^r λ_k R_k, where α ≥ 0, λ_k > 0 with Σ_{k=1}^r λ_k = 1, 0 ≤ R_k ≤ A, R_k ⪯ A, and rank R_k = 1. There are nonnegative vectors v_k such that R_k = v_k v_k^T. Define the linear map L ∈ R[x]* by L = α Σ_{k=1}^r λ_k L_{v_k}, where L_{v_k} is the evaluation at v_k, mapping any polynomial p ∈ R[x] to p(v_k).

The equality (L(x_i x_j)) = A follows from the identity A = α Σ_{k=1}^r λ_k R_k. The constraints L((√A_ii x_i − x_i²) p²) ≥ 0 follow from

  L_{v_k}((√A_ii x_i − x_i²) p²) = (√A_ii (v_k)_i − (v_k)_i²) p(v_k)² ≥ 0,

where we use that (v_k)_i ≥ 0 and (v_k)_i² = (R_k)_ii ≤ A_ii, which together imply (v_k)_i² ≤ (v_k)_i √A_ii. The constraints L((A_ij − x_i x_j) p²) ≥ 0 and L(gu) ≥ 0 for g ∈ {1} ∪ S_A^cp and monomials u ∈ [x] follow in a similar way.

It remains to be shown that X_l ⪯ A^{⊗l} for all l, where we set X_l = (L(uv))_{u,v ∈ ⟨x⟩_{=l}}. Note that X_1 = A. We adapt the argument used in [28] to show X_l ⪯ A^{⊗l} using induction on l ≥ 2. Suppose A^{⊗(l−1)} ⪰ X_{l−1}. Combining A − R_k ⪰ 0 and R_k ⪰ 0 gives

  (A − R_k) ⊗ R_k^{⊗(l−1)} ⪰ 0, that is, A ⊗ R_k^{⊗(l−1)} ⪰ R_k^{⊗l},

for each k. Scale by the factor αλ_k and sum over k to get

  A ⊗ X_{l−1} = Σ_k αλ_k A ⊗ R_k^{⊗(l−1)} ⪰ Σ_k αλ_k R_k^{⊗l} = X_l.

Finally, combining with A^{⊗(l−1)} − X_{l−1} ⪰ 0 and A ⪰ 0, we obtain

  A^{⊗l} = A ⊗ (A^{⊗(l−1)} − X_{l−1}) + A ⊗ X_{l−1} ⪰ A ⊗ X_{l−1} ⪰ X_l. □

Now we show that the new parameter ξ_{2,†}^cp(A) is at least as good as τ_cp^sos(A). Later, in Section 3.5.1, we will give an example where the inequality is strict.

Proposition 7
For A ∈ CP_n we have τ_cp^sos(A) ≤ ξ_{2,†}^cp(A).

Proof Let L be feasible for ξ_{2,†}^cp(A). We will construct a feasible solution to the program defining τ_cp^sos(A) with objective value L(1), which implies τ_cp^sos(A) ≤ L(1) and thus the desired inequality. For this set α = L(1) and define the symmetric n² × n² matrix X by X_{(i,j),(k,l)} = L(x_i x_j x_k x_l) for i, j, k, l ∈ [n]. Then the matrix

  M := ( α        vec(A)^T
         vec(A)   X        )

is positive semidefinite. This follows because M is obtained from the principal submatrix of M_2(L) indexed by the monomials 1 and x_i x_j (1 ≤ i ≤ j ≤ n), where the rows/columns indexed by x_j x_i with 1 ≤ i < j ≤ n are duplicates of the rows/columns indexed by x_i x_j.

We have L((A_ij − x_i x_j) x_i x_j) ≥ 0 for all i, j: For i ≠ j this follows using the constraint L((A_ij − x_i x_j) u) ≥ 0 with u = x_i x_j (from (19)), and for i = j this follows from

  L((A_ii − x_i²) x_i²) = L(x_i² (√A_ii − x_i)²) + 2 L(x_i² (√A_ii x_i − x_i²)) ≥ 0,

which holds because of (10), the constraint L(p²) ≥ 0 for deg(p) ≤ 2, and the constraint L((√A_ii x_i − x_i²) x_i²) ≥ 0. Using L(x_i x_j) = A_ij, we get X_{(i,j),(i,j)} = L(x_i² x_j²) ≤ A_ij L(x_i x_j) = A_ij². We also have X_{(i,j),(k,l)} = L(x_i x_j x_k x_l) = L(x_i x_l x_k x_j) = X_{(i,l),(k,j)}, and the constraint (L(uv))_{u,v ∈ ⟨x⟩_{=2}} ⪯ A^{⊗2} implies X ⪯ A ⊗ A. □

3.2 Convergence of the basic hierarchy ξ_t^cp(A)

We now study the convergence of the bounds ξ_t^cp(A). Note that unlike in Section 2, where we can only claim the inequality ξ_∞^cpsd(A) ≤ ξ_*^cpsd(A), here we can show the equality ξ_∞^cp(A) = ξ_*^cp(A). This is because we can use Theorem 7, which permits to represent certain truncated linear functionals by finite atomic measures.

Proposition 8
Let A ∈ CP_n. For every t ∈ N ∪ {∞, *} the optimum in ξ_t^cp(A) is attained, and ξ_t^cp(A) → ξ_∞^cp(A) = ξ_*^cp(A) as t → ∞. If ξ_t^cp(A) admits a flat optimal solution, then ξ_t^cp(A) = ξ_∞^cp(A). Moreover, ξ_∞^cp(A) = ξ_*^cp(A) is the minimum value of L(1) taken over all conic combinations L of evaluations at elements of D(S_A^cp) satisfying A = (L(x_i x_j)).

Proof We may assume A ≠ 0. Since √A_ii x_i − x_i² ∈ S_A^cp for all i, using (10) we obtain that Tr(A) − Σ_i x_i² ∈ M_2(S_A^cp). By adapting the proof of Proposition 1 to the commutative setting, we see that the optimum in ξ_t^cp(A) is attained for t ∈ N ∪ {∞}, and ξ_t^cp(A) → ξ_∞^cp(A) as t → ∞.

We now show the inequality ξ_*^cp(A) ≤ ξ_∞^cp(A), which implies that equality holds. For this, let L be optimal for ξ_∞^cp(A). By Theorem 7, the restriction of L to a finite truncation of R[x] extends to a conic combination of evaluations at points in D(S_A^cp). It follows that this extension is feasible for ξ_*^cp(A) with the same objective value. This shows that ξ_*^cp(A) ≤ ξ_∞^cp(A), that the optimum in ξ_*^cp(A) is attained, and that ξ_*^cp(A) is the minimum of L(1) over all conic combinations L of evaluations at elements of D(S_A^cp) such that A = (L(x_i x_j)). Finally, by Theorem 6 we have ξ_t^cp(A) = ξ_∞^cp(A) if ξ_t^cp(A) admits a flat optimal solution. □

Next, we give a reformulation for the parameter ξ_*^cp(A), which is similar to the formulation of τ_cp(A), although it lacks the constraint R ⪯ A which is present in τ_cp(A).

Proposition 9
We have

  ξ_*^cp(A) = min{α : α ≥ 0, A ∈ α · conv{R ∈ S^n : 0 ≤ R ≤ A, rank(R) ≤ 1}}.

Proof This follows directly from the reformulation of ξ_*^cp(A) in Proposition 8 in terms of conic combinations of evaluations at points in D(S_A^cp), after observing that, for v ∈ R^n, we have v ∈ D(S_A^cp) if and only if the matrix R = vv^T satisfies 0 ≤ R ≤ A. □

3.3 Two hierarchies converging to τ_cp(A)

The reformulation of the parameter ξ_*^cp(A) in Proposition 9 differs from τ_cp(A) in that the constraint R ⪯ A is missing. In order to have a hierarchy converging to τ_cp(A) we need to add constraints to enforce that L can be decomposed as a conic combination of evaluation maps at nonnegative vectors v satisfying vv^T ⪯ A. Here we present two ways to achieve this goal. First we show that the tensor constraints (20) suffice, in the sense that ξ_{*,†}^cp(A) = τ_cp(A) (note that the constraints (19) are not needed for this result). However, because of the special form of the tensor constraints we do not know whether ξ_{t,†}^cp(A) admitting a flat optimal solution implies ξ_{t,†}^cp(A) = ξ_{*,†}^cp(A), and we do not know whether ξ_{∞,†}^cp(A) = ξ_{*,†}^cp(A). Second, we adapt the approach of adding additional localizing constraints from Section 2.2 to the commutative setting, where we do show ξ_{∞,S^{n−1}}^cp(A) = ξ_{*,S^{n−1}}^cp(A) = τ_cp(A). This yields a doubly indexed sequence of semidefinite programs whose optimal values converge to τ_cp(A).

Proposition 10
Let A ∈ CP_n. For every t ∈ N ∪ {∞} the optimum in ξ_{t,†}^cp(A) is attained. We have ξ_{t,†}^cp(A) → ξ_{∞,†}^cp(A) as t → ∞, and ξ_{*,†}^cp(A) = τ_cp(A).

Proof The attainment of the optima in ξ_{t,†}^cp(A) for t ∈ N ∪ {∞} and the convergence of ξ_{t,†}^cp(A) to ξ_{∞,†}^cp(A) can be shown in the same way as the analogous statements for ξ_t^cp(A) in Proposition 8.

We have seen the inequality ξ_{*,†}^cp(A) ≤ τ_cp(A) in Proposition 6. Now we show the reverse inequality. Let L be feasible for ξ_{*,†}^cp(A). We will show that L is feasible for τ_cp(A), which implies τ_cp(A) ≤ L(1) and thus τ_cp(A) ≤ ξ_{*,†}^cp(A). By Proposition 7 and the fact that rank(A) ≤ τ_cp^sos(A), we have L(1) > 0 whenever A ≠ 0 (the case A = 0 being trivial), and we may write

  L = L(1) Σ_{k=1}^K λ_k L_{v_k},

where λ_k > 0 with Σ_k λ_k = 1, and L_{v_k} is an evaluation map at a point v_k ∈ D(S_A^cp). We define the matrices R_k = v_k v_k^T, so that A = L(1) Σ_{k=1}^K λ_k R_k. The matrices R_k satisfy 0 ≤ R_k ≤ A since v_k ∈ D(S_A^cp). Clearly also R_k ⪰ 0. It remains to show that R_k ⪯ A. For this we use the tensor constraints (20). Using that L is a conic combination of evaluation maps, we may rewrite these constraints as

  L(1) Σ_{k=1}^K λ_k R_k^{⊗l} ⪯ A^{⊗l},

from which it follows that L(1) λ_k R_k^{⊗l} ⪯ A^{⊗l} for all k ∈ [K]. Therefore, for all k ∈ [K] and all vectors v with v^T R_k v > 0 we have

  L(1) λ_k ≤ ( v^T A v / v^T R_k v )^l for all l ∈ N.   (21)

Suppose there is a k such that R_k ⋠ A. Then there exists a v such that v^T R_k v > v^T A v. As (v^T A v)/(v^T R_k v) < 1, letting l tend to ∞ in (21) we obtain L(1) λ_k = 0, reaching a contradiction. It follows that R_k ⪯ A for all k ∈ [K]. □

The second approach for reaching τ_cp(A) is based on using the extra localizing constraints from Section 2.2. For a subset V ⊆ S^{n−1}, define ξ_{t,V}^cp(A) by replacing the truncated quadratic module M_{2t}(S_A^cp) in ξ_t^cp(A) by M_{2t}(S_{A,V}^cp), where

  S_{A,V}^cp = S_A^cp ∪ {v^T A v − (Σ_{i=1}^n v_i x_i)² : v ∈ V}.

Proposition 5 can be adapted to the completely positive setting, so that we have a sequence of finite subsets V_1 ⊆ V_2 ⊆ … ⊆ S^{n−1} with ξ_{*,V_k}^cp(A) → ξ_{*,S^{n−1}}^cp(A) as k → ∞. Proposition 8 still holds when adding extra localizing constraints, so that for any k ≥ 1 we have lim_{t→∞} ξ_{t,V_k}^cp(A) = ξ_{*,V_k}^cp(A). Combined with Proposition 11 this shows that we have a doubly indexed sequence ξ_{t,V_k}^cp(A) of semidefinite programs that converges to τ_cp(A) as t → ∞ and k → ∞.

Proposition 11
For A ∈ CP_n we have ξ_{*,S^{n−1}}^cp(A) = τ_cp(A).

Proof The proof is the same as the proof of Proposition 9, with the following additional observation: Given a vector u ∈ R^n, we have u ∈ D(S_{A,S^{n−1}}^cp) only if uu^T ⪯ A. The latter follows from the additional localizing constraints: for each v ∈ R^n we have

  0 ≤ v^T A v − (Σ_i v_i u_i)² = v^T (A − uu^T) v. □

3.4 An economical reformulation of the tensor constraints

The tensor constraints (20) for l ≥ 2, that is, A^{⊗l} − (L((ww′)^c))_{w,w′ ∈ ⟨x⟩_{=l}} ⪰ 0 in ξ_{t,†}^cp(A), can be reformulated in a more economical way using matrices indexed by commutative monomials in [x]_{=l} instead of noncommutative words in ⟨x⟩_{=l}. For this we exploit the symmetry in the matrices A^{⊗l} and (L((ww′)^c))_{w,w′ ∈ ⟨x⟩_{=l}} for L ∈ R[x]*_{2l}. Recall that for a word w ∈ ⟨x⟩, we let w^c denote the corresponding (commutative) monomial in [x].

Define the matrix Q_l ∈ R^{[x]_{=l} × ⟨x⟩_{=l}} by

  (Q_l)_{m,w} = 1/d_m if w^c = m, and 0 otherwise,   (22)

where, for m = x_1^{α_1} ⋯ x_n^{α_n} ∈ [x]_{=l}, we define the multinomial coefficient

  d_m = |{w ∈ ⟨x⟩_{=l} : w^c = m}| = l!/(α_1! ⋯ α_n!).   (23)

Lemma 9
For L ∈ R[x]*_{2l} we have

  Q_l (L((ww′)^c))_{w,w′ ∈ ⟨x⟩_{=l}} Q_l^T = (L(mm′))_{m,m′ ∈ [x]_{=l}}.

Proof For m, m′ ∈ [x]_{=l}, the (m, m′)-entry of the left hand side is equal to

  Σ_{w,w′ ∈ ⟨x⟩_{=l}} (Q_l)_{m,w} (Q_l)_{m′,w′} L((ww′)^c) = Σ_{w ∈ ⟨x⟩_{=l}: w^c = m} Σ_{w′ ∈ ⟨x⟩_{=l}: (w′)^c = m′} L((ww′)^c)/(d_m d_{m′}) = L(mm′). □

The symmetric group S_l acts on ⟨x⟩_{=l} by (x_{i_1} ⋯ x_{i_l})^σ = x_{i_{σ(1)}} ⋯ x_{i_{σ(l)}} for σ ∈ S_l. Let

  P = (1/l!) Σ_{σ ∈ S_l} P_σ,   (24)

where, for any σ ∈ S_l, P_σ ∈ R^{⟨x⟩_{=l} × ⟨x⟩_{=l}} is the permutation matrix defined by

  (P_σ)_{w,w′} = 1 if w^σ = w′, and 0 otherwise.

A matrix M ∈ R^{⟨x⟩_{=l} × ⟨x⟩_{=l}} is said to be S_l-invariant if P_σ M = M P_σ for all σ ∈ S_l.

Lemma 10
If M ∈ R^{⟨x⟩_{=l} × ⟨x⟩_{=l}} is symmetric and S_l-invariant, then

  M ⪰ 0 ⟺ Q_l M Q_l^T ⪰ 0.

Proof The implication M ⪰ 0 ⟹ Q_l M Q_l^T ⪰ 0 is immediate. For the reverse implication, consider the diagonal matrix D ∈ R^{[x]_{=l} × [x]_{=l}} with D_{mm} = d_m for m ∈ [x]_{=l}. We claim that Q_l^T D Q_l = P, the matrix in (24). Indeed, for any w, w′ ∈ ⟨x⟩_{=l}, we have

  (Q_l^T D Q_l)_{w,w′} = Σ_{m ∈ [x]_{=l}} (Q_l)_{m,w} (Q_l)_{m,w′} D_{mm} = 1/d_m if w^c = (w′)^c = m (and 0 otherwise) = |{σ ∈ S_l : w^σ = w′}|/l! = P_{w,w′}.

Suppose Q_l M Q_l^T ⪰ 0, and let λ be an eigenvalue of M with eigenvector z. Since MP = PM, we may assume Pz = z, for otherwise we can replace z by Pz, which is still an eigenvector of M with eigenvalue λ. We may also assume z to be a unit vector. Then λ ≥ 0 follows using the identity Q_l^T D Q_l = P:

  λ = z^T M z = z^T P M P z = z^T (Q_l^T D Q_l) M (Q_l^T D Q_l) z = (D Q_l z)^T (Q_l M Q_l^T) (D Q_l z) ≥ 0. □

We can now derive our symmetry reduction result:
Proposition 12
For L ∈ R[x]*_{2l} we have

  A^{⊗l} − (L((ww′)^c))_{w,w′ ∈ ⟨x⟩_{=l}} ⪰ 0 ⟺ Q_l A^{⊗l} Q_l^T − (L(mm′))_{m,m′ ∈ [x]_{=l}} ⪰ 0.

Proof For any w, w′ ∈ ⟨x⟩_{=l} we have

  (P_σ A^{⊗l} P_σ^T)_{w,w′} = A^{⊗l}_{w^σ,(w′)^σ} = A^{⊗l}_{w,w′}

and

  (P_σ (L((uu′)^c))_{u,u′ ∈ ⟨x⟩_{=l}} P_σ^T)_{w,w′} = L((w^σ (w′)^σ)^c) = L((ww′)^c).

This shows that the matrix A^{⊗l} − (L((ww′)^c))_{w,w′ ∈ ⟨x⟩_{=l}} is S_l-invariant. Hence the claimed result follows by using Lemma 9 and Lemma 10. □
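The identity Q_l^T D Q_l = P from Lemma 10's proof, and the equivalence behind this symmetry reduction, can be spot-checked numerically for small n and l. A minimal numpy sketch (the helper names are ours, not from the paper); the invariant test matrices have the form B ⊗ B, one positive semidefinite and one indefinite:

```python
import itertools
import math
import numpy as np

n, l = 2, 2                                        # variables x_1, x_2, degree l = 2
words = list(itertools.product(range(n), repeat=l))       # <x>_{=l}
monos = sorted(set(tuple(sorted(w)) for w in words))      # [x]_{=l}
d = {m: sum(tuple(sorted(w)) == m for w in words) for m in monos}

# Q_l from (22) and the diagonal matrix D of multinomial coefficients (23)
Q = np.array([[1.0 / d[m] if tuple(sorted(w)) == m else 0.0 for w in words]
              for m in monos])
D = np.diag([float(d[m]) for m in monos])

# P from (24): average of the position-permutation matrices P_sigma
P = np.zeros((len(words), len(words)))
for sigma in itertools.permutations(range(l)):
    for a, w in enumerate(words):
        P[a, words.index(tuple(w[s] for s in sigma))] += 1 / math.factorial(l)

assert np.allclose(Q.T @ D @ Q, P)                 # identity used in Lemma 10

def psd(M):
    return np.linalg.eigvalsh(M).min() >= -1e-9

for B in (np.array([[2.0, 1.0], [1.0, 2.0]]),      # PSD instance
          np.array([[1.0, 2.0], [2.0, 1.0]])):     # indefinite instance
    M = np.kron(B, B)                              # S_l-invariant symmetric matrix
    assert np.allclose(P @ M, M @ P)
    assert psd(M) == psd(Q @ M @ Q.T)              # M >= 0  <=>  Q M Q^T >= 0
```
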
3.5 Examples

3.5.1 The matrices P(a, b)

Consider the (p+q) × (p+q) matrices

  P(a, b) = ( (a+q) I_p   J_{p,q}
              J_{q,p}     (b+p) I_q ),  a, b ∈ R_+,

where J_{p,q} denotes the all-ones matrix of size p × q. We have P(a, b) = P(0, 0) + D for some nonnegative diagonal matrix D. As can be easily verified, P(0, 0) is completely positive with cp-rank(P(0, 0)) = pq, so P(a, b) is completely positive with pq ≤ cp-rank(P(a, b)) ≤ pq + p + q.

For p = 2 and q = 3 (so that the matrices P(a, b) have size 5 × 5 and pq = 6), Fawzi and Parrilo [28] show that τ_cp^sos(P(0, 0)) = 6, and they determine a subregion of [0, 1]² where 5 < τ_cp^sos(P(a, b)) < 6. The next lemma shows that the bound ξ_{2,†}^cp(P(a, b)) ≥ pq holds for all a, b ≥ 0, so this bound is tight and in particular improves on τ_cp^sos in this region.

Lemma 11
For a, b ≥ 0 we have ξ_{2,†}^cp(P(a, b)) ≥ pq.

Proof Let L be feasible for ξ_{2,†}^cp(P(a, b)) and let

  B = ( α   c^T
        c   X  )

be the principal submatrix of M_2(L) whose rows and columns are indexed by {1} ∪ {x_i x_j : 1 ≤ i ≤ p, p+1 ≤ j ≤ p+q}. Since L(x_i x_j) = P(a, b)_ij = 1 for these index pairs, c is the all-ones vector. Moreover, if P(a, b)_ij = 0 with i ≠ j, then the constraints L(x_i x_j u) ≥ 0 and L((P(a, b)_ij − x_i x_j) u) ≥ 0 imply L(x_i x_j u) = 0 for all monomials u ∈ [x]_2. Hence X_{x_i x_j, x_k x_l} = L(x_i x_j x_k x_l) = 0 whenever x_i x_j ≠ x_k x_l. It follows that X is a diagonal matrix, and we write

  B = ( α   e^T
        e   Diag(z_1, …, z_pq) ),

where e is the all-ones vector. Since

  ( 1    −e^T
    −e    J   ) ⪰ 0,

we obtain

  0 ≤ Tr( B ( 1 −e^T ; −e J ) ) = α − 2pq + Σ_{k=1}^{pq} z_k.

Finally, by the constraints L((P(a, b)_ij − x_i x_j) u) ≥ 0 (for i ∈ [p], j ∈ p + [q] and u = x_i x_j) and L(x_i x_j) = P(a, b)_ij = 1, we obtain z_k ≤ 1 for all k ∈ [pq]. Combined with the above inequality, it follows that

  L(1) = α ≥ 2pq − Σ_{k=1}^{pq} z_k ≥ pq,

and hence ξ_{2,†}^cp(P(a, b)) ≥ pq. □

3.5.2 Counterexamples to the Drew–Johnson–Loewy conjecture

The Drew–Johnson–Loewy conjecture [22] states that the maximal cp-rank of an n × n completely positive matrix is equal to ⌊n²/4⌋. Recently this conjecture has been disproven for n = 7, 8, 9, 10, 11 in [11] and for all n ≥ 12 in [12] (interestingly, it remains open for n = 6). In Table 1 we show our bounds for some of the counterexamples: M̃_7, M̃_8 and M̃_9 correspond to the matrices M̃ in Examples 1, 2, 3 of [11], and M_7, M_11 correspond to the matrices M in Examples 1 and 4. The column ξ_{2,†}^cp(·) + x_i x_j corresponds to the bound ξ_{2,†}^cp(·) where we replace S_A^cp by S_A^cp ∪ {x_i x_j : 1 ≤ i < j ≤ n}.
Table 1 Examples from [11] with various bounds on their cp-rank

  Example | cp-rank(·) | ⌊n²/4⌋ | rank(·) | ξ_1^cp(·) | ξ_2^cp(·) | ξ_{2,†}^cp(·) | ξ_{2,†}^cp(·)+x_ix_j | ξ_{3,†}^cp(·)
  M_7     |     14     |   12   |    7    |   2.64    |   4.21    |     7.21      |        9.75          |     10.…
  M̃_7     |     14     |   12   |    7    |   2.58    |   4.66    |     8.43      |        9.53          |     10.…
  M̃_8     |     18     |   16   |    8    |   3.23    |   5.45    |     8.74      |       10.41          |     13.…
  M̃_9     |     26     |   20   |    9    |   3.39    |   5.71    |    11.60      |       13.74          |     17.…
  M_11    |     32     |   30   |   11    |   4.32    |   7.46    |    20.76      |       21.84          |      –
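The two structural facts about P(a, b) used in Section 3.5.1, namely the explicit size-pq completely positive decomposition of P(0, 0) and the relation P(a, b) = P(0, 0) + D with D a nonnegative diagonal matrix, can be checked directly. A numpy sketch for p = 2, q = 3 (our own helper names, illustrative only):

```python
import numpy as np

p, q = 2, 3

def P_mat(a, b):
    """The (p+q) x (p+q) matrix P(a,b) with blocks (a+q)I_p, J and (b+p)I_q."""
    top = np.hstack([(a + q) * np.eye(p), np.ones((p, q))])
    bot = np.hstack([np.ones((q, p)), (b + p) * np.eye(q)])
    return np.vstack([top, bot])

# cp-rank(P(0,0)) <= pq via the 0/1 atoms e_i + f_j
atoms = []
for i in range(p):
    for j in range(q):
        v = np.zeros(p + q)
        v[i], v[p + j] = 1.0, 1.0
        atoms.append(v)
assert np.allclose(P_mat(0, 0), sum(np.outer(v, v) for v in atoms))

# P(a,b) - P(0,0) is a nonnegative diagonal matrix for a, b >= 0
Dg = P_mat(1.5, 0.7) - P_mat(0, 0)
assert np.allclose(Dg, np.diag(np.diag(Dg))) and np.diag(Dg).min() >= 0
```
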
4 Lower bounds on the nonnegative rank

In this section we adapt the techniques for the cp-rank from Section 3 to the asymmetric setting of the nonnegative rank. We now view a factorization A = (a_i^T b_j)_{i∈[m], j∈[n]} by nonnegative vectors as a factorization by positive semidefinite diagonal matrices, by writing A_ij = Tr(X_i X_{m+j}), with X_i = Diag(a_i) and X_{m+j} = Diag(b_j). Note that we can view this as a "partial matrix" setting, where for the symmetric matrix (Tr(X_i X_k))_{i,k ∈ [m+n]} of size m+n, only the off-diagonal entries at the positions (i, m+j) for i ∈ [m], j ∈ [n] are specified.

This asymmetry requires rescaling the factors in order to get upper bounds on their maximal eigenvalues, which is needed to ensure the Archimedean property for the selected localizing polynomials. For this we use the well-known fact that for any A ∈ R_+^{m×n} there exists a factorization A = (Tr(X_i X_{m+j})) by diagonal nonnegative matrices of size rank_+(A), such that λ_max(X_i), λ_max(X_{m+j}) ≤ √A_max for all i ∈ [m], j ∈ [n], where A_max := max_{i,j} A_ij. To see this, observe that for any rank-one matrix R = uv^T with 0 ≤ R ≤ A, one may assume 0 ≤ u_i, v_j ≤ √A_max for all i, j. Hence, the set

  S_A^+ = {√A_max x_i − x_i² : i ∈ [m+n]} ∪ {A_ij − x_i x_{m+j} : i ∈ [m], j ∈ [n]}

is localizing for A; that is, there exists a minimal factorization X of A with X ∈ D(S_A^+). Given A ∈ R_{≥0}^{m×n}, for each t ∈ N ∪ {∞} we consider the semidefinite program

  ξ_t^+(A) = min{L(1) : L ∈ R[x_1, …, x_{m+n}]*_{2t}, L(x_i x_{m+j}) = A_ij for i ∈ [m], j ∈ [n], L ≥ 0 on M_{2t}(S_A^+)}.

Moreover, define ξ_*^+(A) by adding the constraint rank(M(L)) < ∞ to the program defining ξ_∞^+(A).
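The rescaling fact just used can be made concrete: if R = uv^T satisfies 0 ≤ R ≤ A, then max_i u_i · max_j v_j equals the largest entry of R, which is at most A_max, so after balancing u and v both maxima are at most √A_max. A small numpy sketch (the function name balance is ours):

```python
import math
import numpy as np

def balance(u, v):
    """Rescale (u, v) -> (t u, v / t) without changing u v^T, so that both
    factors have the same maximum entry sqrt(max(u) * max(v))."""
    t = math.sqrt(v.max() / u.max())
    return t * u, v / t

rng = np.random.default_rng(1)
A = 3.0 * rng.random((4, 5))
u, v = rng.random(4), rng.random(5)
u = u * (A / np.outer(u, v)).min()         # shrink so that 0 <= u v^T <= A

u2, v2 = balance(u, v)
bound = math.sqrt(A.max())
assert np.allclose(np.outer(u2, v2), np.outer(u, v))
assert u2.max() <= bound + 1e-12 and v2.max() <= bound + 1e-12
```
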
It is easy to check that ξ_t^+(A) ≤ ξ_∞^+(A) ≤ ξ_*^+(A) ≤ rank_+(A) for t ∈ N. Denote by ξ_{t,†}^+(A) the strengthening of ξ_t^+(A) where we add the positivity constraints

  L(gu) ≥ 0 for g ∈ {1} ∪ S_A^+ and monomials u ∈ [x]_{2t−deg(g)}.   (25)

Note that these extra constraints can help for finite t, but are redundant for t ∈ {∞, *}. We define

  τ_+(A) = min{α : α ≥ 0, A ∈ α · conv{R ∈ R^{m×n} : 0 ≤ R ≤ A, rank(R) ≤ 1}}

as the analogue of the bound τ_cp(A) for the nonnegative rank, and the analogue τ_+^sos(A) of the bound τ_cp^sos(A) for the nonnegative rank:

  τ_+^sos(A) = inf{α : X ∈ R^{mn×mn}, α ∈ R,
      ( α        vec(A)^T
        vec(A)   X        ) ⪰ 0,
      X_{(i,j),(i,j)} ≤ A_ij² for 1 ≤ i ≤ m, 1 ≤ j ≤ n,
      X_{(i,j),(k,l)} = X_{(i,l),(k,j)} for 1 ≤ i < k ≤ m, 1 ≤ j < l ≤ n}.

First we give the analogue of Proposition 8, whose proof we omit since it is very similar.
Proposition 13
Let A ∈ R_+^{m×n}. For every t ∈ N ∪ {∞, *} the optimum in ξ_t^+(A) is attained, and ξ_t^+(A) → ξ_∞^+(A) = ξ_*^+(A) as t → ∞. If ξ_t^+(A) admits a flat optimal solution, then ξ_t^+(A) = ξ_*^+(A). Moreover, ξ_∞^+(A) = ξ_*^+(A) is the minimum of L(1) over all conic combinations L of evaluations at elements of D(S_A^+) satisfying A = (L(x_i x_{m+j})).

Now we observe that the parameters ξ_∞^+(A) and ξ_*^+(A) coincide with τ_+(A), so that we have a sequence of semidefinite programs converging to τ_+(A).

Proposition 14
For any A ∈ R_{≥0}^{m×n}, we have ξ_∞^+(A) = ξ_*^+(A) = τ_+(A).

Proof The discussion at the beginning of Section 4 shows that for any rank-one matrix R satisfying 0 ≤ R ≤ A we may assume that R = uv^T with (u, v) ∈ R_+^m × R_+^n and u_i, v_j ≤ √A_max for i ∈ [m], j ∈ [n]. Hence, τ_+(A) can be written as

  min{α : α ≥ 0, A ∈ α · conv{uv^T : u ∈ [0, √A_max]^m, v ∈ [0, √A_max]^n, uv^T ≤ A}}
  = min{α : α ≥ 0, A ∈ α · conv{uv^T : (u, v) ∈ D(S_A^+)}}.

The equality ξ_∞^+(A) = ξ_*^+(A) = τ_+(A) now follows from the reformulation of ξ_*^+(A) in Proposition 13 in terms of conic combinations of evaluations, after noting that for (u, v) in R^m × R^n we have (u, v) ∈ D(S_A^+) if and only if the matrix R = uv^T satisfies 0 ≤ R ≤ A. □

Analogously to the case of the completely positive rank we have the following proposition. The proof is similar to that of Proposition 7, considering now for M the principal submatrix of M_2(L) indexed by the monomials 1 and x_i x_{m+j} for i ∈ [m] and j ∈ [n].

Proposition 15
If A is a nonnegative matrix, then ξ_{2,†}^+(A) ≥ τ_+^sos(A).

In the remainder of this section we recall how τ_+(A) and τ_+^sos(A) compare to other bounds in the literature. These bounds can be divided into two categories: combinatorial lower bounds and norm-based lower bounds. The following diagram from [28] summarizes how τ_+^sos(A) and τ_+(A) relate to the combinatorial lower bounds:

  τ_+^sos(A)       ≤    τ_+(A)     ≤   rank_+(A)
      ≥                    ≥               ≥
  fool(A) = ω(RG(A)) ≤ ϑ(RG(A)) ≤ χ_frac(RG(A)) ≤ χ(RG(A)) = rank_B(A).

Here RG(A) is the rectangle graph of A, with vertex set V = {(i, j) ∈ [m] × [n] : A_ij > 0} and edge set E = {((i, j), (k, l)) : A_il A_kj = 0}. The coloring number of RG(A) coincides with the well-known rectangle covering number (also denoted rank_B(A)), which was used, e.g., in [30] to show that the extension complexity of the correlation polytope is exponential. The clique number of RG(A) is also known as the fooling set number (see, e.g., [29]). Observe that the above combinatorial lower bounds only depend on the sparsity pattern of the matrix A, and that they are all equal to one for a strictly positive matrix.

Fawzi and Parrilo [28] have furthermore shown that the bound τ_+(A) is at least as good as norm-based lower bounds:

  τ_+(A) = sup_{N monotone and positively homogeneous} N^*(A)/N(A).

Here, a function N : R_+^{m×n} → R_+ is positively homogeneous if N(λA) = λN(A) for all λ ≥ 0, monotone if N(A) ≤ N(B) for A ≤ B, and N^*(A) is defined as

  N^*(A) = max{L(A) : L : R^{m×n} → R linear, L(X) ≤ 1 for all X ∈ R_+^{m×n} with rank(X) ≤ 1 and N(X) ≤ 1}.
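As a concrete illustration of the combinatorial side of this picture, the fooling set number ω(RG(A)) can be computed by brute force for tiny matrices directly from the definition of RG(A). An exponential-time toy sketch (the helper name is ours):

```python
import itertools
import numpy as np

def fooling_number(A):
    """omega(RG(A)) by brute force: a largest set of positive entries of A
    that are pairwise adjacent in RG(A), i.e. A_il * A_kj == 0 per pair."""
    V = [(i, j) for i in range(A.shape[0]) for j in range(A.shape[1]) if A[i, j] > 0]
    for r in range(len(V), 0, -1):
        for S in itertools.combinations(V, r):
            if all(A[i, l] * A[k, j] == 0
                   for (i, j), (k, l) in itertools.combinations(S, 2)):
                return r
    return 0

# the diagonal entries of I_n form a fooling set, so rank_+(I_4) >= 4
assert fooling_number(np.eye(4)) == 4
# a strictly positive matrix has fooling set number 1
assert fooling_number(np.ones((3, 3))) == 1
```
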
These bounds are called norm-based since norms often provide valid functions N. For example, when N is the ℓ_∞-norm, Rothvoß [66] used the corresponding lower bound to show that the matching polytope has exponential extension complexity. When N is the Frobenius norm N(A) = (Σ_{i,j} A_ij²)^{1/2}, the parameter N^*(A) is known as the nonnegative nuclear norm. In [27] it is denoted by ν_+(A), shown to satisfy rank_+(A) ≥ (ν_+(A)/‖A‖_F)², and reformulated as

  ν_+(A) = min{Σ_i λ_i : A = Σ_i λ_i u_i v_i^T, (λ_i, u_i, v_i) ∈ R_+ × R_+^m × R_+^n, ‖u_i‖ = ‖v_i‖ = 1}   (26)
         = max{⟨A, W⟩ : W ∈ R^{m×n}, ( I −W ; −W^T I ) is copositive},   (27)

where the cone of copositive matrices is the dual of the cone of completely positive matrices. Fawzi and Parrilo [27] use the copositive formulation (27) to provide bounds ν_+^{[k]}(A) (k ∈ N) that approximate ν_+(A) from below. We now observe that by Theorem 7 the atomic formulation of ν_+(A) from (26) can be seen as a moment optimization problem:

  ν_+(A) = min ∫_{V(S)} dµ(x)  s.t.  A_ij = ∫_{V(S)} x_i x_{m+j} dµ(x) for i ∈ [m], j ∈ [n].

Here, the optimization variable µ is required to be a Borel measure on the variety V(S), where S = {Σ_{i=1}^m x_i² − 1, Σ_{j=1}^n x_{m+j}² − 1}. (The same observation is made in [74] for the real nuclear norm of a symmetric 3-tensor and in [59] for symmetric odd-dimensional tensors.) For t ∈ N ∪ {∞}, let µ_t(A) denote the parameter defined analogously to ξ_t^+(A), where we replace the condition L ≥ 0 on M_{2t}(S_A^+) by L ≥ 0 on M_{2t}({x_1, …, x_{m+n}}) and L = 0 on I_{2t}(S), and let µ_*(A) be obtained by adding the constraint rank(M(L)) < ∞ to µ_∞(A). We have µ_t(A) → µ_∞(A) = µ_*(A) = ν_+(A) by Theorem 7 and (a non-normalized analogue of) Theorem 8. One can show that µ_1(A), with the additional constraints L(u) ≥ 0 for monomials u ∈ [x]_2, is at least as good as the first bound in the hierarchy ν_+^{[k]}(A).
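Computing ν_+(A) exactly requires copositive optimization, but (27) yields cheap certified lower bounds: every W with spectral norm at most 1 makes the block matrix in (27) positive semidefinite, hence copositive, so ⟨A, W⟩ ≤ ν_+(A); optimizing over such W gives ν_+(A) ≥ ‖A‖_* (the ordinary nuclear norm), and thus rank_+(A) ≥ (‖A‖_*/‖A‖_F)². A numpy sketch of this weaker surrogate (our own construction, not a method from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((4, 6))

U, s, Vt = np.linalg.svd(A)
W = U @ Vt[: len(s), :]            # polar factor: ||W||_op = 1, <A, W> = sum(s)
block = np.block([[np.eye(4), -W], [-W.T, np.eye(6)]])
# PSD, hence copositive: W is feasible in (27)
assert np.linalg.eigvalsh(block).min() >= -1e-9
# so nu_+(A) >= <A, W> = ||A||_* (the nuclear norm)
assert abs((A * W).sum() - s.sum()) <= 1e-9

# induced lower bound on rank_+(A): (nu_+ / ||A||_F)^2 >= (||A||_* / ||A||_F)^2
lower_bound = (s.sum() / np.linalg.norm(A)) ** 2
assert 1.0 - 1e-9 <= lower_bound <= min(A.shape) + 1e-9
```
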
It is not clear how the hierarchies µ_t(A) and ν_+^{[k]}(A) compare in general.

4.2 Computational examples

We illustrate the performance of our approach by comparing our lower bounds ξ_{2,†}^+ and ξ_{3,†}^+ to the lower bounds τ_+ and τ_+^sos on the two examples considered in [28].

4.2.1 Nonnegative 2 × 2 matrices

For the 2 × 2 matrices A(α), with all entries equal to 1 except one entry equal to α, Fawzi and Parrilo [28] give closed-form expressions for τ_+(A(α)) and τ_+^sos(A(α)) for all 0 ≤ α ≤ 1. Since the parameters τ_+(A) and τ_+^sos(A) are invariant under scaling and permuting rows and columns of A, one can use a diagonal scaling identity of the form

  A = D_1 A(α) D_2

to see that this describes the parameters for all nonnegative 2 × 2 matrices. On a grid of values of α we see that our bound ξ_t^+(A(α)) coincides with τ_+(A(α)).

4.2.2 The nested rectangles problem

In this section we consider the nested rectangles problem as described in [28, Section 2.7.2] (see also [56]), which asks for which a, b there exists a triangle T such that R(a, b) ⊆ T ⊆ P, where R(a, b) = [−a, a] × [−b, b] and P = [−1, 1]².

The nonnegative rank relates not only to the extension complexity of a polytope [78], but also to extended formulations of nested pairs [13, 32]. An extended formulation of a pair of polytopes P_1 ⊆ P_2 ⊆ R^d is a (possibly) higher dimensional polytope K whose projection π(K) is nested between P_1 and P_2. Let us suppose π(K) = {x ∈ R^d : ∃ y ∈ R_+^k, (x, y) ∈ K} and K = {(x, y) : Ex + Fy = g, y ∈ R_+^k}; then k
is the size of the extended formulation, and the smallest such k is called the extension complexity of the pair (P_1, P_2). It is known (cf. [13, Theorem 1]) that the extension complexity of the pair (P_1, P_2), where P_1 = conv({v_1, …, v_n}) and P_2 = {x : a_i^T x ≤ b_i for i ∈ [m]}, is equal to the nonnegative rank of the generalized slack matrix S_{P_1,P_2} ∈ R^{m×n}, defined by (S_{P_1,P_2})_ij = b_i − a_i^T v_j for i ∈ [m], j ∈ [n]. Any nonnegative matrix is the slack matrix of some nested pair of polytopes [35, Lemma 4.1] (see also [32]).

Applying this to the pair (R(a, b), P), one immediately sees that there exists a polytope K with at most three facets whose projection T = π(K) ⊆ R² satisfies R(a, b) ⊆ T ⊆ P if and only if the pair (R(a, b), P) admits an extended formulation of size 3. For a, b > 0, the polytope T has to be 2-dimensional, therefore K has to be at least 2-dimensional as well; it follows that K and T have to be triangles. Hence there exists a triangle T such that R(a, b) ⊆ T ⊆ P if and only if the nonnegative rank of the slack matrix S(a, b) := S_{R(a,b),P} is equal to 3. One can verify that

  S(a, b) = ( 1−a  1+a  1−b  1+b
              1+a  1−a  1−b  1+b
              1+a  1−a  1+b  1−b
              1−a  1+a  1+b  1−b ).

Such a triangle exists if and only if (1+a)(1+b) ≤ 2.

Fawzi and Parrilo [28] compute τ_+^sos(S(a, b)) for different values of a and b. In doing so they determine the region where τ_+^sos(S(a, b)) > 3. We do the same for the bounds ξ_{1,†}^+(S(a, b)), ξ_{2,†}^+(S(a, b)) and ξ_{3,†}^+(S(a, b)); see Figure 1. The results show that ξ_{2,†}^+(S(a, b)) strictly improves upon the bound τ_+^sos(S(a, b)), and that ξ_{3,†}^+(S(a, b)) is again a strict improvement over ξ_{2,†}^+(S(a, b)).

5 Lower bounds on the positive semidefinite rank

The positive semidefinite rank can be seen as an asymmetric version of the completely positive semidefinite rank. Hence, as was the case in the previous section for the nonnegative rank, we need to select suitable factors in a minimal factorization in order to be able to bound their maximum eigenvalues and obtain a localizing set of polynomials leading to an Archimedean quadratic module.

For this we can follow, e.g., the approach in [52, Lemma 5] to rescale a factorization and claim that, for any A ∈ R_+^{m×n} with psd-rank_C(A) = d, there exists a factorization A = (⟨X_i, X_{m+j}⟩) by matrices X_1, …, X_{m+n} ∈ H_+^d such that Σ_{i=1}^m X_i = I and Tr(X_{m+j}) = Σ_i A_ij for all j ∈ [n]. Indeed, starting from any factorization X_i, X_{m+j} in H_+^d of A, we may replace X_i by X^{−1/2} X_i X^{−1/2} and X_{m+j} by X^{1/2} X_{m+j} X^{1/2}, where

Fig. 1
The colored region corresponds to rank_+(S(a, b)) = 4. The top right region (black) corresponds to ξ_{1,†}^+(S(a, b)) > 3, the two top right regions (black and red) together correspond to τ_+^sos(S(a, b)) > 3, the three top right regions (black, red, and yellow) to ξ_{2,†}^+(S(a, b)) > 3, and the four top right regions (black, red, yellow, and green) to ξ_{3,†}^+(S(a, b)) > 3.

X := Σ_{i=1}^m X_i is positive definite (by minimality of d). This argument shows that the set of polynomials

  S_A^psd = {x_i − x_i² : i ∈ [m]} ∪ {(Σ_{i=1}^m A_ij) x_{m+j} − x_{m+j}² : j ∈ [n]}

is localizing for A; that is, there is at least one minimal factorization X of A such that g(X) ⪰ 0 for all g ∈ S_A^psd. Moreover, for the same minimal factorization X of A we have p(X)(I − Σ_{i=1}^m X_i) = 0 for all p ∈ R⟨x⟩.

Given A ∈ R_+^{m×n}, for each t ∈ N ∪ {∞} we consider the semidefinite program

  ξ_t^psd(A) = min{L(1) : L ∈ R⟨x_1, …, x_{m+n}⟩*_{2t}, L(x_i x_{m+j}) = A_ij for i ∈ [m], j ∈ [n],
                   L ≥ 0 on M_{2t}(S_A^psd), L = 0 on I_{2t}(1 − Σ_{i=1}^m x_i)}.

We additionally define ξ_*^psd(A) by adding the constraint rank(M(L)) < ∞ to the program defining ξ_∞^psd(A) (and considering the infimum instead of the minimum, since we do not know if the infimum is attained in ξ_*^psd(A)). By the above discussion it follows that the parameter ξ_*^psd(A) is a lower bound on psd-rank_C(A) and we have

  ξ_1^psd(A) ≤ … ≤ ξ_t^psd(A) ≤ … ≤ ξ_∞^psd(A) ≤ ξ_*^psd(A) ≤ psd-rank_C(A).

Note that, in contrast to the previous bounds, the parameter ξ_t^psd(A) is not invariant under rescaling the rows of A or under taking the transpose of A (see Section 5.2.2). It follows from the construction of S_A^psd and Equation (10) that the quadratic module M(S_A^psd) is Archimedean, and hence the following analogue of Proposition 1 can be shown.

Proposition 16
Let $A \in \mathbb{R}^{m \times n}_+$. For each $t \in \mathbb{N} \cup \{\infty\}$, the optimum in $\xi^{\mathrm{psd}}_t(A)$ is attained, and we have $\lim_{t\to\infty} \xi^{\mathrm{psd}}_t(A) = \xi^{\mathrm{psd}}_\infty(A)$. Moreover, $\xi^{\mathrm{psd}}_\infty(A)$ is equal to the infimum over all $\alpha \geq 0$ for which there exists a unital $C^*$-algebra $\mathcal{A}$ with tracial state $\tau$ and $X \in \mathcal{D}_{\mathcal{A}}(S^{\mathrm{psd}}_A) \cap \mathcal{V}_{\mathcal{A}}(1 - \sum_{i=1}^m x_i)$ such that $A = \alpha \cdot (\tau(X_i X_{m+j}))_{i \in [m],\, j \in [n]}$.

The following lower bound on the complex positive semidefinite rank is known:
$$\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{C}}(A) \geq \sum_{i=1}^m \max_{j \in [n]} \frac{A_{ij}}{\sum_{i'} A_{i'j}}. \qquad (28)$$
If a feasible linear form $L$ for $\xi^{\mathrm{psd}}_t(A)$ satisfies the inequalities $L\big(x_i(\sum_{i'} A_{i'j} - x_{m+j})\big) \geq 0$ for $i \in [m]$, $j \in [n]$, then $L(1)$ is at least the above lower bound. Indeed, the inequalities give
$$L(x_i) \geq \max_{j \in [n]} \frac{L(x_i x_{m+j})}{\sum_{i'} A_{i'j}} = \max_{j \in [n]} \frac{A_{ij}}{\sum_{i'} A_{i'j}},$$
and hence, using the ideal constraint $\sum_{i=1}^m x_i = 1$,
$$L(1) = \sum_{i=1}^m L(x_i) \geq \sum_{i=1}^m \max_{j \in [n]} \frac{A_{ij}}{\sum_{i'} A_{i'j}}.$$
The inequalities $L\big(x_i(\sum_{i'} A_{i'j} - x_{m+j})\big) \geq 0$ need not hold for every feasible $L$, even though the corresponding polynomials are nonnegative on $\mathcal{D}(S^{\mathrm{psd}}_A)$. More importantly, as in Lemma 2, these inequalities are satisfied by feasible linear forms for the programs $\xi^{\mathrm{psd}}_\infty(A)$ and $\xi^{\mathrm{psd}}_*(A)$. Hence, $\xi^{\mathrm{psd}}_\infty(A)$ and $\xi^{\mathrm{psd}}_*(A)$ are at least as strong as the lower bound (28). In [52] two other fidelity-based lower bounds on the psd-rank were defined; we do not know how they compare to $\xi^{\mathrm{psd}}_t(A)$.

5.2 Computational examples

In this section we apply our bounds to some (small) examples taken from the literature, namely $3 \times 3$ circulant matrices and slack matrices of polygons.

Fig. 2
The colored region corresponds to the values $(b,c)$ for which $\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{R}}(M(b,c)) = 3$; the outer region (yellow) shows the values of $(b,c)$ for which $\xi^{\mathrm{psd}}_2(M(b,c)) > 2$.

We first consider the $3 \times 3$ circulant matrix
$$M(b,c) = \begin{pmatrix} 1 & b & c \\ c & 1 & b \\ b & c & 1 \end{pmatrix} \quad \text{with } b, c \geq 0.$$
If $b = 1 = c$, then $\mathrm{rank}(M(b,c)) = \mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{R}}(M(b,c)) = \mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{C}}(M(b,c)) = 1$. Otherwise we have $\mathrm{rank}(M(b,c)) \geq 2$, which implies $\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{K}}(M(b,c)) \geq 2$ for $\mathbb{K} \in \{\mathbb{R},\mathbb{C}\}$. In [26, Example 2.7] it is shown that
$$\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{R}}(M(b,c)) \leq 2 \iff 1 + b^2 + c^2 \leq 2(b + c + bc).$$
Hence, if $b$ and $c$ do not satisfy the above relation, then $\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{R}}(M(b,c)) = 3$. We computed $\xi^{\mathrm{psd}}_2(M(b,c))$ for $(b,c)$ ranging over a fine grid; as Figure 2 shows, the bound $\xi^{\mathrm{psd}}_2(M(b,c))$ certifies that $\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{R}}(M(b,c)) = \mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{C}}(M(b,c)) = 3$ for most of the pairs $(b,c)$ for which $\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{R}}(M(b,c)) = 3$.

Here we consider the slack matrices of two polygons in the plane, where the bounds are sharp (after rounding) and illustrate the dependence on scaling the rows or taking the transpose. We consider a quadrilateral $Q$ and the regular hexagon $H$, with slack matrices denoted $S_Q$ and $S_H$.

Our lower bounds on the $\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{C}}$ are not invariant under taking the transpose; indeed, numerically we have $\xi^{\mathrm{psd}}_2(S_Q) \approx 2.266$ and $\xi^{\mathrm{psd}}_2(S_Q^{\mathsf{T}}) \approx 2.5$. The slack matrix $S_Q$ has $\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{R}}(S_Q) = \mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{C}}(S_Q) = 3$, so that $\lceil \xi^{\mathrm{psd}}_2(S_Q^{\mathsf{T}}) \rceil = 3 = \mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{R}}(S_Q)$. Secondly, our bounds are not invariant under rescaling the rows of a nonnegative matrix. Numerically we have $\xi^{\mathrm{psd}}_2(S_H) \approx 2.99$ while $\xi^{\mathrm{psd}}_2(D S_H) \approx 3.12$ for a suitable positive diagonal matrix $D$. The bound $\xi^{\mathrm{psd}}_2(D S_H)$ is in fact tight (after rounding) for the complex positive semidefinite rank of $D S_H$, and hence of $S_H$: in [34] it is shown that $\mathrm{psd}\text{-}\mathrm{rank}_{\mathbb{C}}(S_H) = 4$.

In this work we provide a unified approach for the four matrix factorization ranks obtained by considering (a)symmetric factorizations by nonnegative vectors and positive semidefinite matrices. Our methods can be extended to the nonnegative tensor rank, which is defined as the smallest integer $d$ for which a $k$-tensor $A \in \mathbb{R}^{n_1 \times \cdots \times n_k}_+$ can be written as $A = \sum_{l=1}^d u_{1,l} \otimes \cdots \otimes u_{k,l}$ for nonnegative vectors $u_{j,l} \in \mathbb{R}^{n_j}_+$. The approach from Section 4 for $\mathrm{rank}_+$ can be extended to obtain a hierarchy of lower bounds on the nonnegative tensor rank. For instance, if $A$ is a 3-tensor, the analogous bound $\xi^+_t(A)$ is obtained by minimizing $L(1)$ over $L \in \mathbb{R}[x_1,\ldots,x_{n_1+n_2+n_3}]^*_{2t}$ such that $L(x_{i_1} x_{n_1+i_2} x_{n_1+n_2+i_3}) = A_{i_1 i_2 i_3}$ (for $i_1 \in [n_1]$, $i_2 \in [n_2]$, $i_3 \in [n_3]$), using as localizing polynomials in $S^+_A$ the polynomials $\sqrt{A_{\max}}\, x_i - x_i^2$ and $A_{i_1 i_2 i_3} - x_{i_1} x_{n_1+i_2} x_{n_1+n_2+i_3}$. As in the matrix case one can compare to the bounds $\tau_+(A)$ and $\tau^{\mathrm{sos}}_+(A)$ from [28]. One can show $\xi^+_*(A) = \tau_+(A)$, and one can show $\xi^{+,\dagger}_t(A) \geq \tau^{\mathrm{sos}}_+(A)$ after adding the conditions $L\big(x_{i_1} x_{n_1+i_2} x_{n_1+n_2+i_3}(A_{i_1 i_2 i_3} - x_{i_1} x_{n_1+i_2} x_{n_1+n_2+i_3})\big) \geq 0$ to the program defining $\xi^+_t(A)$.

Testing membership in the completely positive cone and the completely positive semidefinite cone is another important problem, to which our hierarchies can also be applied. It follows from the proof of Proposition 8 that if $A$ is not completely positive then, for some order $t$, the program $\xi^{\mathrm{cp}}_t(A)$ is infeasible or its optimum value is larger than the Carathéodory bound on the cp-rank (which is similar to an earlier result in [58]).
In the noncommutative setting the situation is more complicated: if $\xi^{\mathrm{cpsd}}_*(A)$ is feasible, then $A \in \mathcal{CS}^n_+$, and if $A \notin \mathcal{CS}^n_{+,\mathrm{vN}}$, then $\xi^{\mathrm{cpsd}}_\infty(A)$ is infeasible (Propositions 1 and 2). Here $\mathcal{CS}^n_{+,\mathrm{vN}}$ is the cone defined in [18] consisting of the matrices admitting a factorization in a von Neumann algebra with a trace. By Lemma 12, $\mathcal{CS}^n_{+,\mathrm{vN}}$ can equivalently be characterized as the set of matrices of the form $\alpha (\tau(a_i a_j))_{i,j}$ for some $C^*$-algebra $\mathcal{A}$ with tracial state $\tau$, positive elements $a_1,\ldots,a_n \in \mathcal{A}$, and $\alpha \in \mathbb{R}_+$.

Our lower bounds are on the complex version of the (completely) positive semidefinite rank. As far as we are aware, the existing lower bounds (except for the dimension-counting rank lower bound) are also on the complex (completely) positive semidefinite rank. It would be interesting to find a lower bound on the real (completely) positive semidefinite rank that can go beyond the complex (completely) positive semidefinite rank.

We conclude with some open questions regarding applications of lower bounds on matrix factorization ranks. First, as was shown in [62,39,63], completely positive semidefinite matrices whose $\mathrm{cpsd}\text{-}\mathrm{rank}_{\mathbb{C}}$ is larger than their size do exist, but currently we do not know how to construct small examples for which this holds. Hence, a concrete question: does there exist a $5 \times 5$ completely positive semidefinite matrix whose $\mathrm{cpsd}\text{-}\mathrm{rank}_{\mathbb{C}}$ is at least 6? Second, as we mentioned before, the asymmetric setting corresponds to (semidefinite) extension complexity of polytopes. Rothvoß's result [66] (indirectly) shows that the parameter $\xi^+_\infty$ is exponential (in the number of nodes of the graph) for the slack matrix of the matching polytope. Can this result also be shown directly using the dual formulation of $\xi^+_\infty$, that is, by a sum-of-squares certificate? If so, could one extend the argument to the noncommutative setting (which would show a lower bound on the semidefinite extension complexity)?

Acknowledgements
The authors would like to thank Sabine Burgdorf for helpful discussions and ananonymous referee for suggestions that helped improve the presentation.
A Commutative and tracial polynomial optimization
In this appendix we discuss known convergence and flatness results for commutative and tracial polynomialoptimization. We present these results in such a way that they can be directly used for our hierarchies oflower bounds on matrix factorization ranks. Although the commutative case was developed first, here wetreat the commutative and tracial cases together. For the reader’s convenience we provide all proofs byworking on the “moment side”; that is, relying on properties of linear functionals rather than using realalgebraic results on sums of squares. Tracial optimization is an adaptation of eigenvalue optimization asdeveloped in [61], but here we only discuss the commutative and tracial cases, as these are most relevantto our work.
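The tracial property used throughout this appendix is easy to check concretely. A small pure-Python sketch (with arbitrary illustrative matrices): a trace evaluation map satisfies $L(x_1 x_2) = L(x_2 x_1)$ even when the underlying matrices do not commute.

```python
# A trace evaluation L(p) = tr(p(X1, X2)) is tracial:
# tr(X1 X2) = tr(X2 X1) holds even though X1 X2 != X2 X1.
X1 = [[2.0, 1.0], [1.0, 3.0]]
X2 = [[0.0, 1.0], [1.0, 5.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tr(A):
    return A[0][0] + A[1][1]

P, Q = mat_mul(X1, X2), mat_mul(X2, X1)
print(tr(P), tr(Q), P == Q)  # prints: 17.0 17.0 False
```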
A.1 Flat extensions and representations of linear forms
The optimization variables in the optimization problems considered in this paper are linear forms on spaces of (noncommutative) polynomials. To study the properties of the bounds obtained through these optimization problems we need to study properties and representations of (flat) linear forms on polynomial spaces. In Section 1.3 the key examples of symmetric tracial linear functionals on $\mathbb{R}\langle x\rangle_{2t}$ are trace evaluations on a (finite dimensional) $C^*$-algebra. In this section we present some results that provide conditions under which, conversely, a symmetric tracial linear map on $\mathbb{R}\langle x\rangle_{2t}$ ($t \in \mathbb{N} \cup \{\infty\}$) that is nonnegative on $M_{2t}(S)$ and zero on $I_{2t}(T)$ arises from trace evaluations at elements in the intersection of the $C^*$-algebraic analogs of the matrix positivity domain of $S$ and the matrix ideal of $T$. In Theorems 1 and 2 we consider the case $t = \infty$ and in Theorem 3 we consider the case $t \in \mathbb{N}$. Results like these can for instance be used to link the linear forms arising in the limiting optimization problems of our hierarchies to matrix factorization ranks.

The proofs of Theorems 1 and 2 use a classical Gelfand–Naimark–Segal (GNS) construction. In these proofs it will also be convenient to work with the concept of the null space of a linear functional $L \in \mathbb{R}\langle x\rangle^*_t$, which is defined as the vector space
$$N_t(L) = \big\{ p \in \mathbb{R}\langle x\rangle_t : L(qp) = 0 \text{ for all } q \in \mathbb{R}\langle x\rangle_t \big\}.$$
We use the notation $N(L) = N_\infty(L)$ for the nontruncated null space. Recall that $M_t(L)$ is the moment matrix associated to $L$; its rows and columns are indexed by words in $\langle x\rangle_t$, and its entries are given by $M_t(L)_{w,w'} = L(w^* w')$ for $w, w' \in \langle x\rangle_t$.
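To make the moment matrix concrete, here is a small pure-Python sketch (illustrative data only): the degree-1 tracial moment matrix $M_1(L)$ for $L$ the normalized trace evaluation at two symmetric $2 \times 2$ matrices, indexed by the words $\{1, x_1, x_2\}$.

```python
# Normalized trace evaluation at two symmetric 2x2 matrices X1, X2,
# and its degree-1 moment matrix M_1(L)_{w,w'} = L(w* w').
X = {
    "x1": [[1.0, 0.0], [0.0, 2.0]],
    "x2": [[0.0, 1.0], [1.0, 0.0]],
}

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def ntrace(A):  # normalized trace
    return (A[0][0] + A[1][1]) / 2.0

I2 = [[1.0, 0.0], [0.0, 1.0]]
words = {"1": I2, "x1": X["x1"], "x2": X["x2"]}
labels = ["1", "x1", "x2"]

# All word evaluations here are symmetric matrices, so w* = w and
# M_1(L)_{w,w'} = ntrace(w(X) w'(X)).
M1 = [[ntrace(mat_mul(words[w], words[w2])) for w2 in labels] for w in labels]
print(M1)  # prints: [[1.0, 1.5, 0.0], [1.5, 2.5, 0.0], [0.0, 0.0, 1.0]]
```

The resulting matrix is symmetric and positive semidefinite, as any tracial moment matrix must be.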
The null space of $L$ can therefore be identified with the kernel of $M_t(L)$: a polynomial $p = \sum_w c_w w$ belongs to $N_t(L)$ if and only if its coefficient vector $(c_w)$ belongs to the kernel of $M_t(L)$.

In Section 1.3 we defined a linear functional $L \in \mathbb{R}\langle x\rangle^*_{2t}$ to be $\delta$-flat based on the rank stabilization property (4) of its moment matrix: $\mathrm{rank}(M_t(L)) = \mathrm{rank}(M_{t-\delta}(L))$. This definition can be reformulated in terms of a decomposition of the corresponding polynomial space using the null space: the form $L$ is $\delta$-flat if and only if
$$\mathbb{R}\langle x\rangle_t = \mathbb{R}\langle x\rangle_{t-\delta} + N_t(L).$$
Recall that $L$ is said to be flat if it is $\delta$-flat for some $\delta \geq 1$. Finally, in the nontruncated case ($t = \infty$) $L$ was called flat if $\mathrm{rank}(M(L)) < \infty$. We can now see that $\mathrm{rank}(M(L)) < \infty$ if and only if there exists an integer $s \in \mathbb{N}$ such that $\mathbb{R}\langle x\rangle = \mathbb{R}\langle x\rangle_s + N(L)$.

Theorem 1 below is implicit in several works (see, e.g., [57,17]). Here we assume that $M(S) + I(T)$ is Archimedean, which we recall means that there exists a scalar $R > 0$ such that
$$R^2 - \sum_{i=1}^n x_i^2 \in M(S) + I(T).$$

Theorem 1
Let $S \subseteq \mathrm{Sym}\,\mathbb{R}\langle x\rangle$ and $T \subseteq \mathbb{R}\langle x\rangle$ with $M(S) + I(T)$ Archimedean. Given a linear form $L \in \mathbb{R}\langle x\rangle^*$, the following are equivalent:
(1) $L$ is symmetric, tracial, nonnegative on $M(S)$, zero on $I(T)$, and $L(1) = 1$;
(2) there is a unital $C^*$-algebra $\mathcal{A}$ with tracial state $\tau$ and $X \in \mathcal{D}_{\mathcal{A}}(S) \cap \mathcal{V}_{\mathcal{A}}(T)$ with
$$L(p) = \tau(p(X)) \quad \text{for all } p \in \mathbb{R}\langle x\rangle. \qquad (29)$$

Proof
We first prove the easy direction $(2) \Rightarrow (1)$: We have
$$L(p^*) = \tau(p^*(X)) = \tau(p(X)^*) = \overline{\tau(p(X))} = \overline{L(p)} = L(p),$$
where we use that $\tau$ is Hermitian, $X_i^* = X_i$ for $i \in [n]$, and $L$ is real valued. Moreover, $L$ is tracial since $\tau$ is tracial. In addition, for $g \in S \cup \{1\}$ and $p \in \mathbb{R}\langle x\rangle$ we have
$$L(p^* g p) = \tau(p^*(X)\, g(X)\, p(X)) = \tau(p(X)^* g(X) p(X)) \geq 0,$$
since $g(X)$ is positive in $\mathcal{A}$ as $X \in \mathcal{D}_{\mathcal{A}}(S)$ and $\tau$ is positive. Similarly $L(hq) = \tau(h(X) q(X)) = 0$ for all $h \in T$ and $q \in \mathbb{R}\langle x\rangle$, since $X \in \mathcal{V}_{\mathcal{A}}(T)$.

We show $(1) \Rightarrow (2)$ by applying a GNS construction. Consider the quotient vector space $\mathbb{R}\langle x\rangle / N(L)$, and denote the class of $p$ in $\mathbb{R}\langle x\rangle / N(L)$ by $\overline{p}$. We can equip this quotient with the inner product $\langle \overline{p}, \overline{q}\rangle = L(p^* q)$ for $p, q \in \mathbb{R}\langle x\rangle$, so that the completion $H$ of $\mathbb{R}\langle x\rangle / N(L)$ is a separable Hilbert space. As $N(L)$ is a left ideal in $\mathbb{R}\langle x\rangle$, the operator
$$X_i : \mathbb{R}\langle x\rangle / N(L) \to \mathbb{R}\langle x\rangle / N(L), \quad \overline{p} \mapsto \overline{x_i p} \qquad (30)$$
is well defined. We have $\langle X_i \overline{p}, \overline{q}\rangle = L((x_i p)^* q) = L(p^* x_i q) = \langle \overline{p}, X_i \overline{q}\rangle$ for all $p, q \in \mathbb{R}\langle x\rangle$, so the $X_i$ are self-adjoint. Since $g \in S \cup \{1\}$ is symmetric and $\langle \overline{p}, g(X)\overline{p}\rangle = \langle \overline{p}, \overline{gp}\rangle = L(p^* g p) \geq 0$ for all $p$, we have $g(X) \succeq 0$. By the Archimedean condition, there exists an $R > 0$ with $R^2 - \sum_{i=1}^n x_i^2 \in M(S) + I(T)$. Using
$$R^2 - x_i^2 = \Big(R^2 - \sum_{j=1}^n x_j^2\Big) + \sum_{j \neq i} x_j^2 \in M(S) + I(T)$$
we get
$$\langle X_i \overline{p}, X_i \overline{p}\rangle = L(p^* x_i^2 p) \leq R^2 \cdot L(p^* p) = R^2 \langle \overline{p}, \overline{p}\rangle \quad \text{for all } p \in \mathbb{R}\langle x\rangle.$$
Hence each $X_i$ extends to a bounded self-adjoint operator, also denoted $X_i$, on the Hilbert space $H$ such that $g(X)$ is positive for all $g \in S \cup \{1\}$. Moreover, we have $\langle \overline{f}, h(X)\overline{1}\rangle = L(f^* h) = 0$ for all $f \in \mathbb{R}\langle x\rangle$ and $h \in T$.

The operators $X_i \in B(H)$ extend to self-adjoint operators in $B(\mathbb{C} \otimes_{\mathbb{R}} H)$, where $\mathbb{C} \otimes_{\mathbb{R}} H$ is the complexification of $H$. Let $\mathcal{A}$ be the unital $C^*$-algebra obtained by taking the operator norm closure of $\mathbb{R}\langle X\rangle \subseteq B(\mathbb{C} \otimes_{\mathbb{R}} H)$. It follows that $X \in \mathcal{D}_{\mathcal{A}}(S) \cap \mathcal{V}_{\mathcal{A}}(T)$.

Define the state $\tau$ on $\mathcal{A}$ by $\tau(a) = \langle \overline{1}, a\overline{1}\rangle$ for $a \in \mathcal{A}$. For all $p, q \in \mathbb{R}\langle x\rangle$ we have
$$\tau(p(X) q(X)) = \langle \overline{1}, p(X) q(X)\overline{1}\rangle = \langle \overline{1}, \overline{pq}\rangle = L(pq), \qquad (31)$$
so that the restriction of $\tau$ to $\mathbb{R}\langle X\rangle$ is tracial. Since $\mathbb{R}\langle X\rangle$ is dense in $\mathcal{A}$ in the operator norm, this implies $\tau$ is tracial.

To conclude the proof, observe that (29) follows from (31) by taking $q = 1$. $\square$

The next result can be seen as a finite dimensional analogue of the above result, where we do not need $M(S) + I(T)$ to be Archimedean, but instead we assume the rank of $M(L)$ to be finite (i.e., $L$ to be flat). In addition to the Gelfand–Naimark–Segal construction, the proof uses Artin–Wedderburn theory. For the unconstrained case the proof of this result can be found in [16], and in [17,43] this result is extended to the constrained case.

Theorem 2
For $S \subseteq \mathrm{Sym}\,\mathbb{R}\langle x\rangle$, $T \subseteq \mathbb{R}\langle x\rangle$, and $L \in \mathbb{R}\langle x\rangle^*$, the following are equivalent:
(1) $L$ is a symmetric, tracial linear form with $L(1) = 1$ that is nonnegative on $M(S)$, zero on $I(T)$, and has $\mathrm{rank}(M(L)) < \infty$;
(2) there is a finite dimensional $C^*$-algebra $\mathcal{A}$ with a tracial state $\tau$, and $X \in \mathcal{D}_{\mathcal{A}}(S) \cap \mathcal{V}_{\mathcal{A}}(T)$ satisfying equation (29);
(3) $L$ is a convex combination of normalized trace evaluations at points in $\mathcal{D}(S) \cap \mathcal{V}(T)$.

Proof ((1) $\Rightarrow$ (2)) Here we can follow the proof of Theorem 1, with the extra observation that the condition $\mathrm{rank}(M(L)) < \infty$ implies that the quotient space $\mathbb{R}\langle x\rangle / N(L)$ is finite dimensional. Since $\mathbb{R}\langle x\rangle / N(L)$ is finite dimensional the multiplication operators are bounded, and the constructed $C^*$-algebra $\mathcal{A}$ is finite dimensional.

((2) $\Rightarrow$ (3)) By Artin–Wedderburn theory there exists a $*$-isomorphism
$$\varphi : \mathcal{A} \to \bigoplus_{m=1}^M \mathbb{C}^{d_m \times d_m}$$
for some $M \in \mathbb{N}$ and $d_1,\ldots,d_M \in \mathbb{N}$. Define the $*$-homomorphisms $\varphi_m : \mathcal{A} \to \mathbb{C}^{d_m \times d_m}$ for $m \in [M]$ by $\varphi = \oplus_{m=1}^M \varphi_m$. Then, for each $m \in [M]$, the map $\mathbb{C}^{d_m \times d_m} \to \mathbb{C}$ defined by $X \mapsto \tau(\varphi_m^{-1}(X))$ is a positive tracial linear form, and hence it is a nonnegative multiple $\lambda_m \mathrm{tr}(\cdot)$ of the normalized matrix trace (since, for a full matrix algebra, the normalized trace is the unique tracial state). Then we have $\tau(a) = \sum_m \lambda_m \mathrm{tr}(\varphi_m(a))$ for all $a \in \mathcal{A}$, for nonnegative scalars $\lambda_m$ with $\sum_m \lambda_m = L(1) = 1$. By defining the matrices $X_i^m = \varphi_m(X_i)$ for $m \in [M]$, we get
$$L(p) = \tau(p(X_1,\ldots,X_n)) = \sum_{m=1}^M \lambda_m\, \mathrm{tr}(p(X_1^m,\ldots,X_n^m)) \quad \text{for all } p \in \mathbb{R}\langle x\rangle.$$
Since $\varphi_m$ is a $*$-homomorphism we have $g(X_1^m,\ldots,X_n^m) \succeq 0$ for $g \in S \cup \{1\}$ and also $h(X_1^m,\ldots,X_n^m) = 0$ for $h \in T$, which shows $(X_1^m,\ldots,X_n^m) \in \mathcal{D}(S) \cap \mathcal{V}(T)$.

((3) $\Rightarrow$ (1)) If $L$ is a conic combination of trace evaluations at elements from $\mathcal{D}(S) \cap \mathcal{V}(T)$, then $L$ is symmetric, tracial, nonnegative on $M(S)$, zero on $I(T)$, and satisfies $\mathrm{rank}(M(L)) < \infty$ because the moment matrix of any trace evaluation has finite rank. $\square$

The previous two theorems were about linear functionals defined on the full space of noncommutative polynomials. The following result shows that a flat linear functional on a truncated polynomial space can be extended to a flat linear functional on the full space of polynomials while preserving the same positivity properties. It is due to Curto and Fialkow [20] in the commutative case, and extensions to the noncommutative case can be found in [61] (for eigenvalue optimization) and [16] (for trace optimization).
Theorem 3
Let $1 \leq \delta \leq t < \infty$, $S \subseteq \mathrm{Sym}\,\mathbb{R}\langle x\rangle_{2\delta}$, and $T \subseteq \mathbb{R}\langle x\rangle_{2\delta}$. Suppose $L \in \mathbb{R}\langle x\rangle^*_{2t}$ is symmetric, tracial, $\delta$-flat, nonnegative on $M_{2t}(S)$, and zero on $I_{2t}(T)$. Then $L$ extends to a symmetric, tracial linear form on $\mathbb{R}\langle x\rangle$ that is nonnegative on $M(S)$, zero on $I(T)$, and whose moment matrix has finite rank.

Proof Let $W \subseteq \langle x\rangle_{t-\delta}$ index a maximum nonsingular submatrix of $M_{t-\delta}(L)$, and let $\mathrm{span}(W)$ be the linear space spanned by $W$. We have the vector space direct sum
$$\mathbb{R}\langle x\rangle_t = \mathrm{span}(W) \oplus N_t(L). \qquad (32)$$
That is, for each $u \in \langle x\rangle_t$ there exists a unique $r_u \in \mathrm{span}(W)$ such that $u - r_u \in N_t(L)$.

We first construct the (unique) symmetric flat extension $\hat{L} \in \mathbb{R}\langle x\rangle^*_{2t+2}$ of $L$. For this we set $\hat{L}(p) = L(p)$ for $\deg(p) \leq 2t$, and we set
$$\hat{L}(u^* x_i v) = L(u^* x_i r_v) \quad \text{and} \quad \hat{L}((x_i u)^* x_j v) = L((x_i r_u)^* x_j r_v)$$
for all $i, j \in [n]$ and $u, v \in \langle x\rangle$ with $|u| = |v| = t$. One can verify that $\hat{L}$ is symmetric and satisfies $x_i(u - r_u) \in N_{t+1}(\hat{L})$ for all $i \in [n]$ and $u \in \langle x\rangle_t$, from which it follows that $\hat{L}$ is 2-flat.

We also have $(u - r_u) x_i \in N_{t+1}(\hat{L})$ for all $i \in [n]$ and $u \in \langle x\rangle_t$: Since $\hat{L}$ is 2-flat, we have $(u - r_u) x_i \in N_{t+1}(\hat{L})$ if and only if $\hat{L}(p(u - r_u) x_i) = 0$ for all $p \in \mathbb{R}\langle x\rangle_{t-1}$. By using $\deg(x_i p) \leq t$, that $L$ is tracial, and $u - r_u \in N_t(L)$, we get $\hat{L}(p(u - r_u) x_i) = L(p(u - r_u) x_i) = L(x_i p(u - r_u)) = 0$. Using $(v - r_v) x_j \in N_{t+1}(\hat{L})$, symmetry of $\hat{L}$, $x_i(u - r_u) \in N_{t+1}(\hat{L})$, and again symmetry of $\hat{L}$, we see that
$$\hat{L}((x_i u)^* v x_j) = \hat{L}((x_i u)^* r_v x_j) = \hat{L}((r_v x_j)^* x_i u) = \hat{L}((r_v x_j)^* x_i r_u) = \hat{L}((x_i r_u)^* r_v x_j), \qquad (33)$$
and in an analogous way one can show
$$\hat{L}((u x_i)^* x_j v) = \hat{L}((r_u x_i)^* x_j r_v). \qquad (34)$$
We can now show that $\hat{L}$ is tracial. We do this by showing that $\hat{L}(w x_j) = \hat{L}(x_j w)$ for all $w$ with $\deg(w) \leq 2t + 1$. Notice that when $\deg(w) \leq 2t - 1$ this holds because $L$ is tracial and $\hat{L}$ is an extension of $L$. Suppose $w = u^* v$ with $\deg(u) = t + 1$ and $\deg(v) \leq t$. We write $u = x_i u'$, and we let $r_{u'}, r_v \in \mathbb{R}\langle x\rangle_{t-\delta}$ be such that $u' - r_{u'}, v - r_v \in N_t(L)$. We then have
$$\hat{L}(w x_j) = \hat{L}(u^* v x_j) = \hat{L}((x_i u')^* v x_j) = \hat{L}((x_i r_{u'})^* r_v x_j) \quad \text{by (33)},$$
$$= L((x_i r_{u'})^* r_v x_j) \quad \text{since } \deg((x_i r_{u'})^* r_v x_j) \leq 2t,$$
$$= L((r_{u'} x_j)^* x_i r_v) \quad \text{since } L \text{ is tracial},$$
$$= \hat{L}((r_{u'} x_j)^* x_i r_v) \quad \text{since } \deg((r_{u'} x_j)^* x_i r_v) \leq 2t,$$
$$= \hat{L}((u' x_j)^* x_i v) \quad \text{by (34)},$$
$$= \hat{L}(x_j w).$$
It follows that $\hat{L}$ is a symmetric tracial flat extension of $L$, and $\mathrm{rank}(M(\hat{L})) = \mathrm{rank}(M(L))$.

Next, we iterate the above procedure to extend $L$ to a symmetric tracial linear functional $\hat{L} \in \mathbb{R}\langle x\rangle^*$. It remains to show that $\hat{L}$ is nonnegative on $M(S)$ and zero on $I(T)$. For this we make two observations:
(i) $I(N_t(L)) \subseteq N(\hat{L})$;
(ii) $\mathbb{R}\langle x\rangle = \mathrm{span}(W) \oplus I(N_t(L))$.
For (i) we use the (easy to check) fact that $N_t(L) = \mathrm{span}(\{u - r_u : u \in \langle x\rangle_t\})$. Then it suffices to show that $w(u - r_u) \in N(\hat{L})$ for all $w \in \langle x\rangle$, which can be done using induction on $|w|$. From (i) one easily deduces that $\mathrm{span}(W) \cap N(\hat{L}) = \{0\}$, so we have the direct sum $\mathrm{span}(W) \oplus I(N_t(L))$. The claim (ii) follows using induction on the length of $w \in \langle x\rangle$: The base case $w \in \langle x\rangle_t$ follows from (32). Let $w = x_i v \in \langle x\rangle$ and assume $v \in \mathrm{span}(W) \oplus I(N_t(L))$, that is, $v = r_v + q_v$ where $r_v \in \mathrm{span}(W)$ and $q_v \in I(N_t(L))$. We have $x_i v = x_i r_v + x_i q_v$, so it suffices to show $x_i r_v, x_i q_v \in \mathrm{span}(W) \oplus I(N_t(L))$. Clearly $x_i q_v \in I(N_t(L))$, since $q_v \in I(N_t(L))$. Also, observe that $x_i r_v \in \mathbb{R}\langle x\rangle_t$ and therefore $x_i r_v \in \mathrm{span}(W) \oplus I(N_t(L))$ by (32).

We conclude the proof by showing that $\hat{L}$ is nonnegative on $M(S)$ and zero on $I(T)$. Let $g \in S \cup \{1\}$, $h \in T$, and $p \in \mathbb{R}\langle x\rangle$. For $p \in \mathbb{R}\langle x\rangle$ we extend the definition of $r_p$ so that $r_p \in \mathrm{span}(W)$ and $p - r_p \in I(N_t(L))$, which is possible by (ii). Then,
$$\hat{L}(p^* g p) \overset{(i)}{=} \hat{L}(p^* g r_p) = \hat{L}(r_p^* g p) \overset{(i)}{=} \hat{L}(r_p^* g r_p) = L(r_p^* g r_p) \geq 0,$$
$$\hat{L}(p^* h) = \hat{L}(h^* p) \overset{(i)}{=} \hat{L}(h^* r_p) = \hat{L}(r_p h) = L(r_p h) = 0,$$
where we use $\deg(r_p^* g r_p) \leq 2(t - \delta) + 2\delta = 2t$ and $\deg(r_p h) \leq (t - \delta) + 2\delta \leq 2t$. $\square$

Corollary 1
Let $1 \leq \delta \leq t < \infty$, $S \subseteq \mathrm{Sym}\,\mathbb{R}\langle x\rangle_{2\delta}$, and $T \subseteq \mathbb{R}\langle x\rangle_{2\delta}$. If $L \in \mathbb{R}\langle x\rangle^*_{2t}$ is symmetric, tracial, $\delta$-flat, nonnegative on $M_{2t}(S)$, and zero on $I_{2t}(T)$, then it extends to a conic combination of trace evaluations at elements of $\mathcal{D}(S) \cap \mathcal{V}(T)$.

A.2 Specialization to the commutative setting
The material from Appendix A.1 can be adapted to the commutative setting. Throughout, $[x]$ denotes the set of monomials in $x_1,\ldots,x_n$, i.e., the commutative analog of $\langle x\rangle$. The moment matrix $M_t(L)$ of a linear form $L \in \mathbb{R}[x]^*_{2t}$ is now indexed by the monomials in $[x]_t$, where we set $M_t(L)_{w,w'} = L(ww')$ for $w, w' \in [x]_t$. Due to the commutativity of the variables, this matrix is smaller and more entries are now required to be equal. For instance, the $(x_1 x_2, x_1 x_2)$-entry of $M_2(L)$ is equal to its $(x_1^2, x_2^2)$-entry, which does not hold in general in the noncommutative case.

Given $a \in \mathbb{R}^n$, the evaluation map at $a$ is the linear map $L_a \in \mathbb{R}[x]^*$ defined by $L_a(p) = p(a_1,\ldots,a_n)$ for all $p \in \mathbb{R}[x]$. We can view $L_a$ as a trace evaluation at scalar matrices. Moreover, we can view a trace evaluation map at a tuple of pairwise commuting matrices as a conic combination of evaluation maps at scalars by simultaneously diagonalizing the matrices.

The quadratic module $M(S)$ and the ideal $I(T)$ have immediate specializations to the commutative setting. We recall that in the commutative setting the (scalar) positivity domain and scalar variety of sets $S, T \subseteq \mathbb{R}[x]$ are given by
$$\mathcal{D}(S) = \big\{a \in \mathbb{R}^n : g(a) \geq 0 \text{ for all } g \in S\big\}, \quad \mathcal{V}(T) = \big\{a \in \mathbb{R}^n : h(a) = 0 \text{ for all } h \in T\big\}. \qquad (35)$$
We first give the commutative analogue of Theorem 1, where we give an additional integral representation in point (3). The equivalence of points (1) and (3) is proved in [64] based on Putinar's Positivstellensatz. Here we give a direct proof on the "moment side" using the Gelfand representation.

Theorem 4
Let $S, T \subseteq \mathbb{R}[x]$ with $M(S) + I(T)$ Archimedean. For $L \in \mathbb{R}[x]^*$, the following are equivalent:
(1) $L$ is nonnegative on $M(S)$, zero on $I(T)$, and $L(1) = 1$;
(2) there exists a unital commutative $C^*$-algebra $\mathcal{A}$ with a state $\tau$ and $X \in \mathcal{D}_{\mathcal{A}}(S) \cap \mathcal{V}_{\mathcal{A}}(T)$ such that $L(p) = \tau(p(X))$ for all $p \in \mathbb{R}[x]$;
(3) there exists a probability measure $\mu$ on $\mathcal{D}(S) \cap \mathcal{V}(T)$ such that
$$L(p) = \int_{\mathcal{D}(S) \cap \mathcal{V}(T)} p(x)\, d\mu(x) \quad \text{for all } p \in \mathbb{R}[x].$$

(Note that in the commutative setting we could avoid using the variety, since $\mathcal{V}(T) = \mathcal{D}(\pm T)$. However, in the noncommutative setting the polynomials in $T$ need not be symmetric, in which case the quadratic module of $\pm T$ would not be well defined.)

Proof ((1) $\Rightarrow$ (2)) This is the commutative analogue of the implication (1) $\Rightarrow$ (2) in Theorem 1 (observing in addition that the operators $X_i$ in (30) pairwise commute, so that the constructed $C^*$-algebra $\mathcal{A}$ is commutative).

((2) $\Rightarrow$ (3)) Let $\widehat{\mathcal{A}}$ denote the set of unital $*$-homomorphisms $\mathcal{A} \to \mathbb{C}$, known as the spectrum of $\mathcal{A}$. We equip $\widehat{\mathcal{A}}$ with the weak-$*$ topology, so that it is compact as a result of $\mathcal{A}$ being unital (see, e.g., [10, II.2.1.4]). The Gelfand representation is the $*$-isomorphism
$$\Gamma : \mathcal{A} \to C(\widehat{\mathcal{A}}), \quad \Gamma(a)(\phi) = \phi(a) \quad \text{for } a \in \mathcal{A},\ \phi \in \widehat{\mathcal{A}},$$
where $C(\widehat{\mathcal{A}})$ is the set of complex-valued continuous functions on $\widehat{\mathcal{A}}$. Since $\Gamma$ is an isomorphism, the state $\tau$ on $\mathcal{A}$ induces a state $\tau'$ on $C(\widehat{\mathcal{A}})$ defined by $\tau'(\Gamma(a)) = \tau(a)$ for $a \in \mathcal{A}$. By the Riesz representation theorem (see, e.g., [67, Theorem 2.14]) there is a Radon measure $\nu$ on $\widehat{\mathcal{A}}$ such that
$$\tau'(\Gamma(a)) = \int_{\widehat{\mathcal{A}}} \Gamma(a)(\phi)\, d\nu(\phi) \quad \text{for all } a \in \mathcal{A}.$$
We then have
$$L(p) = \tau(p(X)) = \tau'(\Gamma(p(X))) = \int_{\widehat{\mathcal{A}}} \Gamma(p(X))(\phi)\, d\nu(\phi) = \int_{\widehat{\mathcal{A}}} \phi(p(X))\, d\nu(\phi) = \int_{\widehat{\mathcal{A}}} p(\phi(X_1),\ldots,\phi(X_n))\, d\nu(\phi) = \int_{\widehat{\mathcal{A}}} p(f(\phi))\, d\nu(\phi) = \int_{\mathbb{R}^n} p(x)\, d\mu(x),$$
where $f : \widehat{\mathcal{A}} \to \mathbb{R}^n$ is defined by $\phi \mapsto (\phi(X_1),\ldots,\phi(X_n))$, and where $\mu = f_* \nu$ is the pushforward measure of $\nu$ by $f$; that is, $\mu(B) = \nu(f^{-1}(B))$ for measurable $B \subseteq \mathbb{R}^n$.

Since $X \in \mathcal{D}_{\mathcal{A}}(S)$, we have $g(X) \succeq 0$ for all $g \in S$, hence $\Gamma(g(X))$ is a positive element of $C(\widehat{\mathcal{A}})$, implying
$$g(\phi(X_1),\ldots,\phi(X_n)) = \phi(g(X)) = \Gamma(g(X))(\phi) \geq 0.$$
Similarly we see $h(\phi(X_1),\ldots,\phi(X_n)) = 0$ for all $h \in T$. So the range of $f$ is contained in $\mathcal{D}(S) \cap \mathcal{V}(T)$, $\mu$ is a probability measure on $\mathcal{D}(S) \cap \mathcal{V}(T)$ since $L(1) = 1$, and we have $L(p) = \int_{\mathcal{D}(S) \cap \mathcal{V}(T)} p(x)\, d\mu(x)$ for all $p \in \mathbb{R}[x]$.

((3) $\Rightarrow$ (1)) This is immediate. $\square$

Note that the more common proof for the implication (1) $\Rightarrow$ (3) in Theorem 4 relies on Putinar's Positivstellensatz [64]: if $L$ satisfies (1), then $L(p) \geq 0$ for every $p$ nonnegative on $\mathcal{D}(S) \cap \mathcal{V}(T)$ (since $p + \varepsilon \in M(S) + I(T)$ for any $\varepsilon > 0$), and hence $L$ has a representing measure $\mu$ as in (3) by the Riesz–Haviland theorem [41].

The following is the commutative analogue of Theorem 2.

Theorem 5
For $S \subseteq \mathbb{R}[x]$, $T \subseteq \mathbb{R}[x]$, and $L \in \mathbb{R}[x]^*$, the following are equivalent:
(1) $L$ is nonnegative on $M(S)$, zero on $I(T)$, has $\mathrm{rank}(M(L)) < \infty$, and $L(1) = 1$;
(2) there is a finite dimensional commutative $C^*$-algebra $\mathcal{A}$ with a state $\tau$, and $X \in \mathcal{D}_{\mathcal{A}}(S) \cap \mathcal{V}_{\mathcal{A}}(T)$ such that $L(p) = \tau(p(X))$ for all $p \in \mathbb{R}[x]$;
(3) $L$ is a convex combination of evaluations at points in $\mathcal{D}(S) \cap \mathcal{V}(T)$.

Proof ((1) $\Rightarrow$ (2)) We indicate how to derive this claim from its noncommutative analogue. For this, denote the commutative version of $p \in \mathbb{R}\langle x\rangle$ by $p_c \in \mathbb{R}[x]$. For any $g \in S$ and $h \in T$, select symmetric polynomials $g', h' \in \mathbb{R}\langle x\rangle$ with $(g')_c = g$ and $(h')_c = h$, and set
$$S' = \{g' : g \in S\} \subseteq \mathbb{R}\langle x\rangle \quad \text{and} \quad T' = \{h' : h \in T\} \cup \{x_i x_j - x_j x_i \in \mathbb{R}\langle x\rangle : i, j \in [n],\ i \neq j\} \subseteq \mathbb{R}\langle x\rangle.$$
Define the linear form $L' \in \mathbb{R}\langle x\rangle^*$ by $L'(p) = L(p_c)$ for $p \in \mathbb{R}\langle x\rangle$. Then $L'$ is symmetric, tracial, nonnegative on $M(S')$, zero on $I(T')$, and satisfies $\mathrm{rank}\, M(L') = \mathrm{rank}\, M(L) < \infty$. Following the proof of the implication (1) $\Rightarrow$ (2) in Theorem 1, we see that the operators $X_1,\ldots,X_n$ pairwise commute (since $X \in \mathcal{V}_{\mathcal{A}}(T')$ and $T'$ contains all $x_i x_j - x_j x_i$) and thus the constructed $C^*$-algebra $\mathcal{A}$ is finite dimensional and commutative.

((2) $\Rightarrow$ (3)) Here we follow the proof of this implication in Theorem 2 and observe that since $\mathcal{A}$ is finite dimensional and commutative, it is $*$-isomorphic to an algebra of diagonal matrices ($d_m = 1$ for all $m \in [M]$), which gives directly the desired result.

((3) $\Rightarrow$ (1)) This is easy. $\square$

The next result, due to Curto and Fialkow [20], is the commutative analogue of Corollary 1.
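The diagonalization step in (2) $\Rightarrow$ (3) can be made concrete. A pure-Python sketch (illustrative data): a normalized trace evaluation at commuting diagonal matrices equals the uniform convex combination of scalar evaluations at the points formed by the diagonal entries.

```python
# Normalized trace evaluation at commuting (diagonal) matrices equals a
# convex combination of scalar evaluation maps at r_j = ((X1)_jj, (X2)_jj).
n = 3
d1 = [1.0, 2.0, 3.0]
d2 = [0.0, 1.0, 4.0]
X1 = [[d1[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
X2 = [[d2[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Evaluate p(x1, x2) = x1^2 x2 + 3 x1 - x2 at the matrices.
P = mul(mul(X1, X1), X2)
P = [[P[i][j] + 3.0 * X1[i][j] - X2[i][j] for j in range(n)] for i in range(n)]

lhs = sum(P[i][i] for i in range(n)) / n                          # L(p) = ntr(p(X))
rhs = sum(a * a * b + 3.0 * a - b for a, b in zip(d1, d2)) / n    # average of L_{r_j}(p)
print(lhs, rhs)
```

The two values agree, illustrating that $L$ decomposes as $\frac{1}{n}\sum_j L_{r_j}$ in this commuting case.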
Theorem 6
Let $1 \leq \delta \leq t < \infty$ and $S, T \subseteq \mathbb{R}[x]_{2\delta}$. If $L \in \mathbb{R}[x]^*_{2t}$ is $\delta$-flat, nonnegative on $M_{2t}(S)$, and zero on $I_{2t}(T)$, then $L$ extends to a conic combination of evaluation maps at points in $\mathcal{D}(S) \cap \mathcal{V}(T)$.

Proof Here too we derive the result from its noncommutative analogue in Corollary 1. As in the above proof of the implication (1) $\Rightarrow$ (2) in Theorem 5, define the sets $S', T' \subseteq \mathbb{R}\langle x\rangle$ and the linear form $L' \in \mathbb{R}\langle x\rangle^*_{2t}$ by $L'(p) = L(p_c)$ for $p \in \mathbb{R}\langle x\rangle_{2t}$. Then $L'$ is symmetric, tracial, nonnegative on $M_{2t}(S')$, zero on $I_{2t}(T')$, and $\delta$-flat. By Corollary 1, $L'$ extends to a conic combination of trace evaluation maps at elements of $\mathcal{D}(S') \cap \mathcal{V}(T')$. It suffices now to observe that such a trace evaluation $L_X$ is a conic combination of (scalar) evaluations at elements of $\mathcal{D}(S) \cap \mathcal{V}(T)$. Indeed, as $X \in \mathcal{V}(T')$, the matrices $X_1,\ldots,X_n$ pairwise commute and thus can be assumed to be diagonal. Since $X \in \mathcal{D}(S') \cap \mathcal{V}(T')$, we have $g'(X) \succeq 0$ for all $g' \in S'$ and $h'(X) = 0$ for all $h' \in T'$. This implies $g((X_1)_{jj},\ldots,(X_n)_{jj}) \geq 0$ and $h((X_1)_{jj},\ldots,(X_n)_{jj}) = 0$ for all $g \in S$, $h \in T$, and $j \in [d]$. Thus $L_X = \sum_j L_{r_j}$, where $r_j = ((X_1)_{jj},\ldots,(X_n)_{jj}) \in \mathcal{D}(S) \cap \mathcal{V}(T)$. $\square$

The following result allows us to represent a linear form $L$ nonnegative on an Archimedean quadratic module as a conic combination of evaluations at points, when restricting $L$ to polynomials of bounded degree.

Theorem 7
Let $S, T \subseteq \mathbb{R}[x]$ be such that $M(S) + I(T)$ is Archimedean. If $L \in \mathbb{R}[x]^*$ is nonnegative on $M(S)$ and zero on $I(T)$, then for any integer $k \in \mathbb{N}$ the restriction of $L$ to $\mathbb{R}[x]_k$ extends to a conic combination of evaluations at points in $\mathcal{D}(S) \cap \mathcal{V}(T)$.

Proof By Theorem 4 there exists a probability measure $\mu$ on $\mathcal{D}(S) \cap \mathcal{V}(T)$ such that
$$L(p) = L(1) \int_{\mathcal{D}(S) \cap \mathcal{V}(T)} p(x)\, d\mu(x) \quad \text{for all } p \in \mathbb{R}[x].$$
A general version of Tchakaloff's theorem, as explained in [5], shows that there exist $r \in \mathbb{N}$, scalars $\lambda_1,\ldots,\lambda_r > 0$, and points $x^1,\ldots,x^r \in \mathcal{D}(S) \cap \mathcal{V}(T)$ such that
$$\int_{\mathcal{D}(S) \cap \mathcal{V}(T)} p(x)\, d\mu(x) = \sum_{i=1}^r \lambda_i\, p(x^i) \quad \text{for all } p \in \mathbb{R}[x]_k.$$
Hence the restriction of $L$ to $\mathbb{R}[x]_k$ extends to a conic combination of evaluations at points in $\mathcal{D}(S) \cap \mathcal{V}(T)$. $\square$

A.3 Commutative and tracial polynomial optimization
We briefly discuss here the basic polynomial optimization problems in the commutative and tracial settings. We recall how to design hierarchies of semidefinite programming based bounds and we give their main convergence properties. The classical commutative polynomial optimization problem asks to minimize a polynomial $f \in \mathbb{R}[x]$ over a feasible region of the form $\mathcal{D}(S)$ as defined in (35):
$$f^* = \inf_{a \in \mathcal{D}(S)} f(a) = \inf\big\{f(a) : a \in \mathbb{R}^n,\ g(a) \geq 0 \text{ for } g \in S\big\}.$$
In tracial polynomial optimization, given $f \in \mathrm{Sym}\,\mathbb{R}\langle x\rangle$, this is modified to minimizing $\mathrm{tr}(f(X))$ over a feasible region of the form $\mathcal{D}(S)$ as in (6):
$$f^{\mathrm{tr}}_* = \inf_{X \in \mathcal{D}(S)} \mathrm{tr}(f(X)) = \inf\big\{\mathrm{tr}(f(X)) : d \in \mathbb{N},\ X \in (\mathbb{H}^d)^n,\ g(X) \succeq 0 \text{ for } g \in S\big\},$$
where the infimum does not change if we replace $\mathbb{H}^d$ by $\mathbb{S}^d$. Commutative polynomial optimization is recovered by restricting to $1 \times 1$ matrices. The basic idea behind the hierarchies is to minimize $L(f)$ over all normalized trace evaluation maps $L$ at points in the (scalar or matrix) positivity domain $\mathcal{D}(S)$, and then to relax this by imposing only computationally tractable properties satisfied by such maps $L$.

For $S \cup \{f\} \subseteq \mathbb{R}[x]$ and $\lceil \deg(f)/2 \rceil \leq t \leq \infty$, recall the (truncated) quadratic module $M_{2t}(S)$,
$$M_{2t}(S) = \mathrm{cone}\big\{g p^2 : p \in \mathbb{R}[x],\ g \in S \cup \{1\},\ \deg(g p^2) \leq 2t\big\},$$
which we use to formulate the following semidefinite programming lower bound on $f^*$:
$$f_t = \inf\big\{L(f) : L \in \mathbb{R}[x]^*_{2t},\ L(1) = 1,\ L \geq 0 \text{ on } M_{2t}(S)\big\}.$$
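As a toy illustration of the bound $f_t$ (not an example from the paper): for the unconstrained univariate problem $f(x) = (x-1)^2$, the order-1 relaxation already equals $f^* = 0$. Indeed, $M_1(L) = \begin{pmatrix} 1 & y_1 \\ y_1 & y_2 \end{pmatrix} \succeq 0$ is equivalent to $y_2 \geq y_1^2$, and since the objective $y_2 - 2y_1 + 1$ increases with $y_2$, the optimum has $y_2 = y_1^2$. A sketch:

```python
# Order-1 moment bound for f(x) = x^2 - 2x + 1 with S empty:
# minimize y2 - 2*y1 + 1 subject to [[1, y1], [y1, y2]] psd, i.e. y2 >= y1^2.
# Substituting the optimal y2 = y1^2 leaves min_{y1} (y1 - 1)^2,
# which we solve by a coarse scan over y1 in [-2, 2].
best = min((y1 / 100.0 - 1.0) ** 2 for y1 in range(-200, 201))
print(best)  # the scan hits y1 = 1 exactly, so this prints 0.0
```

Here the relaxation is exact at the first level; in general $f_t$ only lower bounds $f^*$.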
For t ∈ N we have f_t ≤ f_∞ ≤ f_∗. In the same way, for S ∪ {f} ⊆ Sym R⟨x⟩ and t such that ⌈deg(f)/2⌉ ≤ t ≤ ∞, we have the following semidefinite programming lower bound on f^tr_∗:

f^tr_t = inf{ L(f) : L ∈ R⟨x⟩*_{2t} tracial and symmetric, L(1) = 1, L ≥ 0 on M_{2t}(S) },

where we now use definition (1) for M_{2t}(S). The next theorem from [46] gives fundamental convergence properties for the commutative case; see also, e.g., [47,49] for a detailed exposition.

Theorem 8
Let 1 ≤ δ ≤ t < ∞ and S ∪ {f} ⊆ R[x]_{2δ} with D(S) ≠ ∅.
(i) If M(S) is Archimedean, then f_t → f_∞ as t → ∞, the optimal values in f_∞ and f_∗ are attained, and f_∞ = f_∗.
(ii) If f_t admits an optimal solution L that is δ-flat, then L is a convex combination of evaluation maps at global minimizers of f in D(S), and f_t = f_∞ = f_∗.

Proof (i) By repeating the first part of the proof of Theorem 9 in the commutative setting we see that f_t → f_∞ and that the optimum is attained in f_∞. Let L be optimal for f_∞ and let k be greater than deg(f) and deg(g) for g ∈ S. By Theorem 7, the restriction of L to R[x]_k extends to a conic combination of evaluations at points in D(S). It follows that this extension is feasible for f_∗ with the same objective value, which shows f_∞ = f_∗.
(ii) This follows in the same way as the proof of Theorem 9(ii) below, where, instead of using Corollary 1, we now use its commutative analogue, Theorem 6. □

To discuss convergence for the tracial case we need one more optimization problem:

f^tr_{II₁} = inf{ τ(f(X)) : X ∈ D_A(S), A is a unital C*-algebra with tracial state τ }.

This problem can be seen as an infinite dimensional analogue of f^tr_∗: if we restrict to finite dimensional C*-algebras in the definition of f^tr_{II₁}, then we recover the parameter f^tr_∗ (use Theorem 2 to see this). Moreover, as we see in Theorem 9(ii) below, the equality f^tr_∗ = f^tr_{II₁} holds if some flatness condition is satisfied. Whether f^tr_{II₁} = f^tr_∗ is true in general is related to Connes' embedding conjecture (see [44,43,17]).

Above we defined the parameter f^tr_{II₁} using C*-algebras. However, the following lemma shows that we get the same optimal value if we restrict to A being a von Neumann algebra of type II₁ with separable predual, which is the more common way of defining the parameter f^tr_{II₁}, as is done in [43] (and justifies the notation).
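Restricting to finite dimensions amounts to taking A = C^{d×d} with the normalized trace, and the resulting normalized trace evaluations at matrix tuples in D(S) satisfy exactly the properties imposed on the functionals L: they are unital, symmetric, tracial, and nonnegative on hermitian squares. A minimal numerical check (assuming NumPy; the toy constraint set S = {1 − x_1², 1 − x_2²}, i.e. symmetric contractions, is our own example):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # matrix size

def sym_contraction():
    """A random d x d symmetric matrix with spectral norm < 1."""
    A = rng.standard_normal((d, d))
    A = (A + A.T) / 2
    return A / (np.linalg.norm(A, 2) + 1e-9)

# X = (X1, X2) lies in D(S) for S = {1 - x_1^2, 1 - x_2^2}:
X1, X2 = sym_contraction(), sym_contraction()
for X in (X1, X2):
    assert np.min(np.linalg.eigvalsh(np.eye(d) - X @ X)) >= -1e-9

# Normalized trace evaluation L(w) = tr(w(X1, X2)) on monomials:
L = lambda W: np.trace(W) / d
assert abs(L(np.eye(d)) - 1.0) < 1e-15               # L(1) = 1
assert abs(L(X1 @ X2) - L(X2 @ X1)) < 1e-12          # tracial: L(uv) = L(vu)
assert abs(L(X1 @ X1 @ X2) - L(X2 @ X1 @ X1)) < 1e-12
P = X1 @ X2 - X2 @ X1                                # p = x1 x2 - x2 x1
assert L(P.T @ P) >= -1e-12                          # L(p* p) >= 0
```

This is why each f^tr_t, and f^tr_{II₁}, is a lower bound on f^tr_∗: every matrix tuple feasible for f^tr_∗ yields a feasible functional with the same objective value.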
We omit the proof of this lemma, which relies on a GNS construction and standard algebraic manipulations.

Lemma 12
Let A be a C*-algebra with tracial state τ and a_1, ..., a_n ∈ A. There exists a von Neumann algebra F of type II₁ with separable predual, a faithful normal tracial state φ on F, and elements b_1, ..., b_n ∈ F, so that for every p ∈ R⟨x⟩ we have

τ(p(a_1, ..., a_n)) = φ(p(b_1, ..., b_n))   and   p(a_1, ..., a_n) is positive ⟺ p(b_1, ..., b_n) is positive.

For all t ∈ N we have f^tr_t ≤ f^tr_∞ ≤ f^tr_{II₁} ≤ f^tr_∗, where the last inequality follows by considering for A the full matrix algebra C^{d×d}. The next theorem from [43] summarizes convergence properties for these parameters; its proof uses Lemma 13 below.

Theorem 9
Let 1 ≤ δ ≤ t < ∞ and S ∪ {f} ⊆ Sym R⟨x⟩_{2δ} with D(S) ≠ ∅.
(i) If M(S) is Archimedean, then f^tr_t → f^tr_∞ as t → ∞, and the optimal values in f^tr_∞ and f^tr_{II₁} are attained and equal.
(ii) If f^tr_t has an optimal solution L that is δ-flat, then L is a convex combination of normalized trace evaluations at matrix tuples in D(S), and f^tr_t = f^tr_∞ = f^tr_{II₁} = f^tr_∗.

Proof We first show (i). As M(S) is Archimedean, we have R − Σ_{i=1}^n x_i² ∈ M_{2d}(S) for some R > 0 and d ∈ N. Since the bounds f^tr_t are monotone nondecreasing in t and upper bounded by f^tr_∞, the limit lim_{t→∞} f^tr_t exists and is at most f^tr_∞.

Fix ε > 0. For t ∈ N let L_t be a feasible solution to the program defining f^tr_t with value L_t(f) ≤ f^tr_t + ε. As L_t(1) = 1, we can apply Lemma 13 below and conclude that the sequence (L_t)_t has a convergent subsequence. Let L ∈ R⟨x⟩* be the pointwise limit. One can easily check that L is feasible for f^tr_∞. Hence we have f^tr_∞ ≤ L(f) ≤ lim_{t→∞} f^tr_t + ε ≤ f^tr_∞ + ε. Letting ε → 0 shows that f^tr_∞ = lim_{t→∞} f^tr_t and that L is optimal for f^tr_∞.

Next, since L is symmetric, tracial, and nonnegative on M(S), we can apply Theorem 1 to obtain a feasible solution (A, τ, X) to f^tr_{II₁} satisfying (29) with objective value L(f). This shows f^tr_∞ = f^tr_{II₁} and that the optima are attained in f^tr_∞ and f^tr_{II₁}.

Finally, part (ii) is derived as follows. If L is an optimal solution of f^tr_t that is δ-flat, then, by Corollary 1, it has an extension L̂ ∈ R⟨x⟩* that is a conic combination of trace evaluations at elements of D(S). This shows f^tr_∗ ≤ L̂(f) = L(f), and thus the chain of equalities f^tr_t = f^tr_∞ = f^tr_{II₁} = f^tr_∗ holds. □

We conclude with the following technical lemma, based on the Banach–Alaoglu theorem. It is a well-known crucial tool for proving the asymptotic convergence result from Theorem 9(i) and it is used at other places in the paper.

Lemma 13
Let S ⊆ Sym R⟨x⟩, T ⊆ R⟨x⟩, and assume R − (x_1² + ··· + x_n²) ∈ M_{2d}(S) + I_{2d}(T) for some d ∈ N and R > 0. For t ∈ N assume L_t ∈ R⟨x⟩*_{2t} is tracial, nonnegative on M_{2t}(S) and zero on I_{2t}(T). Then we have |L_t(w)| ≤ R^{|w|/2} L_t(1) for all w ∈ ⟨x⟩_{2(t−d+1)}. In addition, if sup_t L_t(1) < ∞, then {L_t}_t has a pointwise converging subsequence in R⟨x⟩*.

Proof We first use induction on |w| to show that L_t(w*w) ≤ R^{|w|} L_t(1) for all w ∈ ⟨x⟩_{t−d+1}. For this, assume L_t(w*w) ≤ R^{|w|} L_t(1) and |w| ≤ t − d. Then we have

L_t((x_i w)* x_i w) = L_t(w*(x_i² − R)w) + R · L_t(w*w) ≤ R · R^{|w|} L_t(1) = R^{|x_i w|} L_t(1).

For the inequality we use the fact that L_t(w*(x_i² − R)w) ≤ 0, since w*(R − x_i²)w can be written as the sum of a polynomial in M_{2t}(S) + I_{2t}(T) and a sum of commutators of degree at most 2t, which follows using the identity w*qhw = ww*qh + [w*qh, w].

Next we write any w ∈ ⟨x⟩_{2(t−d+1)} as w = w_1* w_2 with w_1, w_2 ∈ ⟨x⟩_{t−d+1} and use the positive semidefiniteness of the principal submatrix of M_t(L_t) indexed by {w_1, w_2} to get

L_t(w)² = L_t(w_1* w_2)² ≤ L_t(w_1* w_1) L_t(w_2* w_2) ≤ R^{|w_1| + |w_2|} L_t(1)² = R^{|w|} L_t(1)².

This shows the first claim.

Suppose c := sup_t L_t(1) < ∞. For each t ∈ N, consider the linear functional L̂_t ∈ R⟨x⟩* defined by L̂_t(w) = L_t(w) if |w| ≤ 2(t − d + 1) and L̂_t(w) = 0 otherwise. Then the sequence (L̂_t(w)/(cR^{|w|/2}))_{w ∈ ⟨x⟩} lies in the supremum norm unit ball of R^⟨x⟩, which is compact in the weak* topology by the Banach–Alaoglu theorem. It follows that the sequence (L̂_t)_t has a pointwise converging subsequence and thus the same holds for the sequence (L_t)_t. □

References
1. M.F. Anjos and J.B. Lasserre. Handbook on Semidefinite, Conic and Polynomial Optimization. International Series in Operations Research & Management Science, Springer, 2012.
2. MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 8.0.0.81, 2017. URL http://docs.mosek.com/8.0/toolbox.pdf
3. A. Atserias, L. Mančinska, D. Roberson, R. Šámal, S. Severini, and A. Varvitsiotis. Quantum and non-signalling graph isomorphisms. arXiv:1611.09837 (2016).
4. G.P. Barker, L.Q. Eifler, and T.P. Kezlan. A non-commutative spectral theorem,
Linear Algebra and its Applications
5. C. Bayer, J. Teichmann. The proof of Tchakaloff's theorem. Proceedings of the American Mathematical Society 134 (2006), 3035–3040.
6. A. Berman, U.G. Rothblum. A note on the computation of the cp-rank.
Linear Algebra and its Applications 419 (2006), 1–7.
7. A. Berman, N. Shaked-Monderer. Completely Positive Matrices. World Scientific, 2003.
8. M. Berta, O. Fawzi, V.B. Scholz. Quantum bilinear optimization.
SIAM Journal on Optimization
SIAM Review
Linear Algebra and its Applications
459 (2014), 208–221.
12. I.M. Bomze, W. Schachinger, R. Ullrich. New lower bounds and asymptotics for the cp-rank.
SIAM Journal on Matrix Analysis and Applications 36 (2015), 20–37.
13. G. Braun, S. Fiorini, S. Pokutta, D. Steurer. Approximation limits of linear programs (beyond hierarchies).
Mathematics of Operations Research
Mathematical Programming
Mathematical Programming
Journal of Operator Theory
Electronic Journal of Linear Algebra
32 (2017), 15–40.
19. M. Conforti, G. Cornuéjols, G. Zambelli. Extended formulations in combinatorial optimization.
SIAM Journal on Matrix Analysis and Applications
Linear and Multilinear Algebra, Journal of Research of the National Bureau of Standards 69 B (1965), 125–130.
25. Y. Faenza, S. Fiorini, R. Grappe, H. Tiwari. Extended formulations, non-negative factorizations and randomized communication protocols.
Mathematical Programming
Mathematical Programming
Mathematical Programming
Mathematical Programming
Discrete Mathematics
Journal of the ACM
SIAG/OPT Views and News
Linear Algebra and its Applications
Mathematical Programming
SIAM Journal on Discrete Mathematics
Mathematics of Operations Research
Discrete & Computational Geometry
http://cvxr.com/cvx
38. S. Gribling, D. de Laat, M. Laurent. Bounds on entanglement dimensions and quantum graph parameters via noncommutative polynomial optimization.
Mathematical Programming Series B
Linear Algebra and its Applications 513 (2017), 122–148.
40. P. Groetzner, M. Dür. A factorization method for completely positive matrices. Preprint (2018).
41. E.K. Haviland. On the Momentum Problem for Distribution Functions in More Than One Dimension. II.
American Journal of Mathematics
IEEE Transactions on Information Theory
Journal of Global Optimization
Advances in Mathematics
SIAM Journal on Optimization
SIAM Journal on Optimization
Mathematical Programming
SIAM Journal on Optimization
Mathematical Programming
Combinatorica
quantuminfo.quantumlah.org/memberpages/laura/corr.pdf (2014).
55. R.K. Martin. Using separation algorithms to generate mixed integer model reformulations.
Operations Research Letters
Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences (2039) (2003), 2821–2845.
57. M. Navascués, S. Pironio, A. Acín. SDP relaxations for non-commutative polynomial optimization. In Handbook on Semidefinite, Conic and Polynomial Optimization (M.F. Anjos, J.B. Lasserre eds.). Springer, 2012, pp. 601–634.
58. J. Nie. The A-truncated K-moment problem. Foundations of Computational Mathematics
SIAM Journal on Applied Algebra and Geometry
SIAM Journal on Optimization
Mathematical Programming, 969–984 (1993).
65. J. Renegar. On the computational complexity and geometry of the first-order theory of the reals. Part I: Introduction. Preliminaries. The geometry of semi-algebraic sets. The decision problem for the existential theory of the reals. Journal of Symbolic Computation
Linear and Multilinear Algebra
SIAM Journal on Matrix Analysis and Applications
SIAM Journal on Optimization
Mathematical Programming
SIAM Journal on Optimization