Inner approximating the completely positive cone via the cone of scaled diagonally dominant matrices
João Gouveia · Ting Kei Pong · Mina Saee

Abstract
Motivated by the expressive power of completely positive programming to encode hard optimization problems, many approximation schemes for the completely positive cone have been proposed and successfully used. Most schemes are based on outer approximations, with the only inner approximations available being a linear programming based method proposed by Bundfuss and Dür [9] and also Yıldırım [25], and a semidefinite programming based method proposed by Lasserre [17]. In this paper, we propose the use of the cone of nonnegative scaled diagonally dominant matrices as a natural inner approximation to the completely positive cone. Using projections of this cone we derive new graph-based second-order cone approximation schemes for completely positive programming, leading to both uniform and problem-dependent hierarchies. This offers a compromise between the expressive power of semidefinite programming and the speed of linear programming based approaches. Numerical results on random problems, standard quadratic programs and the stable set problem are presented to illustrate the effectiveness of our approach.
1 Introduction

Copositive programming and its dual counterpart of completely positive programming are classes of convex optimization problems that have in the past decades developed into a particularly expressive tool to encode optimization problems, especially for many problems arising from combinatorial or quadratic optimization. A classical example of that can be found in [10], which shows that general quadratic programs with a mix of binary and continuous variables can be expressed as copositive programs. A large body of work has been developed in the area and there is a series of survey papers that can be consulted for further information. We refer the readers to [8, 11, 13] and references therein for more details.
The first author's research was partially supported by FCT under grants UID/MAT/00324/2019 through CMUC, and P2020 SAICTPAC/0011/2015. The second author's research was supported partly by a research grant (G-UADF) from The Hong Kong Polytechnic University and the third author's research was supported by a PhD scholarship from FCT, grant PD/BD/128060/2016.

J. Gouveia
CMUC, Department of Mathematics, University of Coimbra, Portugal
E-mail: [email protected]

T. K. Pong
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, People's Republic of China
E-mail: [email protected]

M. Saee
Department of Mathematics, University of Coimbra, Portugal
E-mail: [email protected]
In this paper we will focus on general completely positive programs, which are linear optimization problems of the form (see Section 1.1 below for notation)

    v_p := min tr(CX)
           s.t. tr(A_i X) = b_i,  i = 1, ..., m,
                X ∈ CP^n,                                        (1.1)

where C and A_i, i = 1, ..., m, are symmetric matrices, and CP^n is the closed cone of n × n completely positive matrices defined as

    CP^n := { X ∈ S^n : ∃ B ≥ 0, X = B^T B }.                    (1.2)

We also consider the dual problem of (1.1), which is the following copositive programming problem

    v_d := max b^T y
           s.t. C − Σ_{i=1}^m y_i A_i ∈ COP^n,                   (1.3)

where COP^n is the closed cone of n × n copositive matrices and is defined as

    COP^n := { X ∈ S^n : v^T X v ≥ 0, ∀ v ≥ 0 }.                 (1.4)

It is well known that completely positive programming problems (1.1) are NP-hard in general. Several approximation schemes have been proposed and successfully used in the literature, based on approximations to CP^n. The simplest one is to replace CP^n by the cone of nonnegative positive semidefinite matrices, which is strictly larger than CP^n when n ≥ 5, hence leading to a lower bound to v_p. Other popular lower bounds are those relying on semidefinite programming sums of squares techniques as introduced in [19]. For upper bounds based on inner approximations to CP^n, the literature is somewhat sparser.

One way of constructing inner approximations to CP^n is to make use of the fact that the extreme rays of CP^n are matrices of the form vv^T with v ∈ IR^n_+ \ {0}; see [1]. Thus, one can pick uniformly spaced v ∈ ∆_n = { x ∈ IR^n_+ : Σ_i x_i = 1 }, and approximate CP^n by the cone the matrices vv^T generate (see [9, 25]). This leads to linear programming (LP) approximations to (1.1). Another inner approximation to CP^n is that proposed in [17], based on the theory of moments, leading to semidefinite programming (SDP) approximations to (1.1). In both cases we have hierarchies that give upper bounds to (1.1), and dually lower bounds to (1.3), and converge to the optimal value/solutions of (1.1). These inner approximations are uniform (i.e., problem-independent) approximations, giving rise to either LP or SDP problems. See also [26] for a more thorough treatment of inner approximations. An extra step, taken as an adaptive linear approximation algorithm, was proposed in [9]. This uses information obtained from an upper bound approximation to selectively refine the hierarchy, leading to problem-dependent LP approximations.

In this paper, we propose a new inner approximation scheme to CP^n that is based on second-order cone programming (SOCP) problems and can be either uniform or problem-dependent. Our approach is motivated by the recent work in [3, 4] that uses the cone of scaled diagonally dominant matrices for inner approximating the cone of positive semidefinite matrices. Specifically, we use the cones of nonnegative scaled diagonally dominant matrices and their projections as a natural inner approximation to CP^n, and derive a new SOCP-based approximation scheme for completely positive and copositive programming.
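The containment CP^n ⊆ S^n_+ ∩ N^n underlying the doubly nonnegative relaxation above can be checked numerically. The following sketch (an illustration of definition (1.2), not code from the paper) builds a completely positive matrix from a nonnegative factor and verifies that it is doubly nonnegative:

```python
import numpy as np

# By (1.2), X = B^T B with B >= 0 entrywise is completely positive.
rng = np.random.default_rng(0)
B = rng.random((6, 4))        # entrywise nonnegative factor
X = B.T @ B

# Every completely positive matrix is doubly nonnegative:
assert np.all(X >= 0)                           # entrywise nonnegative
assert np.all(np.linalg.eigvalsh(X) >= -1e-9)   # positive semidefinite
```

For n ≥ 5 the converse fails: there exist doubly nonnegative matrices that are not completely positive, which is why the relaxation only yields a lower bound to v_p.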
Our approximation scheme has a natural graphical interpretation. By exploiting this interpretation, we can flexibly expand or trim the SOCP problems in our hierarchy, leading to both uniform and problem-dependent approximation schemes. The use of SOCP offers a compromise between the expressive power of SDP, which comes at a significant computational cost, and the speed of LP approaches, which have inherently lower expressive power. Numerical experiments on solving random instances, standard quadratic programs and the stable set problem demonstrate the effectiveness of our approximation schemes.

The rest of the paper is organized as follows. We present notation and state our blanket assumptions concerning (1.1) and (1.3) in Section 1.1. Properties of the scaled diagonally dominant matrices are reviewed in Section 2, and a graphical refinement scheme is discussed. We derive our uniform inner approximation schemes in Section 3 with a convergence analysis, and discuss several problem-dependent inner approximation schemes in Section 4. Numerical experiments are reported in Section 5.

1.1 Notation and blanket assumptions

In this paper, we use S^n to denote the space of n × n symmetric matrices. Matrices are denoted by upper case letters, and their entries are represented in the corresponding lower case letters, e.g., d_ij as the (i, j)th entry of the matrix D; we also use lower case letters to denote vectors. For vectors u, v ∈ IR^n, we write u ≥ 0 if u is elementwise nonnegative, and use [u, v] to denote the line segment between u and v, i.e., [u, v] := { tu + (1 − t)v : t ∈ [0, 1] }. For an X ∈ S^n, we write X ⪰ 0 if X is positive semidefinite, and write X ≥ 0 if X is elementwise nonnegative. We also write the trace of X as tr(X). We use E and I to denote the square matrix of all ones and the identity matrix, respectively, whose dimensions should be clear from the context.
Finally, for a linear map A : S^n → S^m, we use A^* to denote its adjoint.

The cone of positive semidefinite n × n matrices is denoted by S^n_+. We also use N^n to denote the cone of n × n symmetric nonnegative matrices, i.e., N^n := { X ∈ S^n : X ≥ 0 }, and an n × n real symmetric matrix is doubly nonnegative if it is positive semidefinite and entrywise nonnegative. It is known that the cones in (1.2) and (1.4) are dual to each other, i.e., CP^n = (COP^n)^* and COP^n = (CP^n)^*; here,

    C^* := { Y ∈ S^n : tr(XY) ≥ 0, ∀ X ∈ C }

for a closed convex cone C ⊆ S^n. Moreover, it is also known that the cone of positive semidefinite matrices and the cone of symmetric nonnegative matrices are self-dual, i.e., S^n_+ = (S^n_+)^* and N^n = (N^n)^*.

Throughout this paper, we make the following blanket assumptions concerning (1.1) and (1.3):

A1. Problem (1.1) is feasible.
A2. The mapping X ↦ (tr(A_1 X), ..., tr(A_m X)) is surjective.
A3. Problem (1.3) is strictly feasible, i.e., there exists ȳ satisfying C − Σ_{i=1}^m ȳ_i A_i ∈ int COP^n.

Under these assumptions, the dual Slater condition holds. Therefore we have v_p = v_d, with both values being finite and the primal optimal value v_p being attained.

2 Scaled diagonally dominant matrices

In this section, we present the basis for our construction of inner approximations in Sections 3 and 4. Our construction is motivated by the work in [3, 4], which studied inner approximations of the cone of positive semidefinite matrices based on the cones of diagonally dominant and scaled diagonally dominant matrices. While their work can be directly applied to the existing SOS hierarchies to yield outer approximations of CP^n (see [4, Section 4.2]), we show an alternative approach, based on the same cones but using them in a fundamentally different way, in order to obtain an inner approximation to CP^n. We first recall the following definition of diagonally dominant and scaled diagonally dominant matrices from [4, Definition 3.3].
Definition 1
A symmetric matrix A is diagonally dominant if a_ii ≥ Σ_{j ≠ i} |a_ij| for all i, and is said to be scaled diagonally dominant (sdd) if there exists a diagonal matrix D with positive diagonal entries such that DAD is diagonally dominant.

In [5, Theorem 9] a convenient characterization of sdd matrices was presented. They proved that a matrix is sdd if and only if it can be written as the sum of positive semidefinite matrices whose supports are contained in some 2 × 2 principal submatrix. More precisely, let ι_ij : S^2 → S^n denote the linear map that embeds a 2 × 2 symmetric matrix into the rows and columns i and j of an n × n matrix that is zero elsewhere. Then the cone of n × n sdd matrices is given by

    SDD^n := Σ_{1 ≤ i < j ≤ n} ι_ij(S^2_+),

and we write SDD^n_+ for the cone of nonnegative sdd matrices, i.e., SDD^n_+ := SDD^n ∩ N^n.

Proposition 1 The following statements hold.
(i) (SDD^n)^* = { Q ∈ S^n : ι^*_ij(Q) ⪰ 0, ∀ 1 ≤ i < j ≤ n }, where ι^*_ij(Q) is the 2 × 2 principal submatrix of Q indexed by i and j.
(ii) (SDD^n_+)^* = (SDD^n)^* + N^n.
(iii) SDD^n_+ = Σ_{1 ≤ i < j ≤ n} ι_ij(S^2_+ ∩ N^2).

Since 2 × 2 doubly nonnegative matrices are completely positive, it follows from (iii) that SDD^n_+ is an inner approximation to CP^n. In Figure 1 we show a random 2-dimensional slice of the cone of doubly nonnegative 5 × 5 matrices (S^5_+ ∩ N^5) with the slice of SDD^5_+ highlighted in red. The cone CP^5 is sandwiched between them.

Fig. 1 Comparison of a slice of S^5_+ ∩ N^5 with the corresponding slice of SDD^5_+.

This simple inner approximation can be used as a basis to construct more general inner second-order cone approximations for CP^n. To do that we consider a useful variant of SDD^n_+ that will help us construct inner approximations of CP^n.

Definition 2 Let U ∈ IR^{t×n}_+ have row sums 1. Define

    SDD^n_+(U) := { U^T Y U : Y ∈ SDD^t_+ } = U^T (SDD^t_+) U.        (2.2)

The above definition is similar to the development in [3, Section 3.1], which makes use of the so-called DD(U). Here we assume that U has nonnegative entries so that SDD^n_+(U) will be a subcone of CP^n; see Proposition 2 below. In addition, we assume that the rows of U have sum one: we can then always think of the rows of U as points in the simplex ∆_n. This is no less general than just considering U ∈ IR^{t×n}_+ with nonzero rows, because scaling rows of U by positive scalars does not change SDD^n_+(U).
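Before moving on, Definition 1 can be explored computationally. The sketch below (our own illustration, not code from the paper) checks plain diagonal dominance directly, and tests scaled diagonal dominance via a Perron-type criterion: for A with positive diagonal (and, for the Perron argument, irreducible |A|), a positive scaling d with a_ii d_i ≥ Σ_{j≠i} |a_ij| d_j exists exactly when the spectral radius of the off-diagonal ratio matrix is at most 1.

```python
import numpy as np

def is_dd(A, tol=1e-9):
    """Diagonal dominance: a_ii >= sum_{j != i} |a_ij| for every row i."""
    A = np.asarray(A, dtype=float)
    off = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    return bool(np.all(np.diag(A) >= off - tol))

def is_sdd(A, tol=1e-9):
    """Scaled diagonal dominance test (sketch; assumes a_ii > 0 and
    irreducible |A|): A is sdd iff rho(B) <= 1, where B has zero diagonal
    and B_ij = |a_ij| / a_ii off the diagonal."""
    A = np.asarray(A, dtype=float)
    B = np.abs(A) / np.diag(A)[:, None]
    np.fill_diagonal(B, 0.0)
    return bool(np.max(np.abs(np.linalg.eigvals(B))) <= 1 + tol)

E = np.ones((3, 3))
assert not is_sdd(E)                      # the all-ones 3x3 matrix is not sdd
A = np.eye(3) + 0.4 * (E - np.eye(3))     # diagonally dominant, hence sdd
assert is_dd(A) and is_sdd(A)
```

The first assertion mirrors the fact used below that PSD matrices need not be sdd: scaling cannot repair the all-ones matrix, since d_i ≥ d_j + d_k cannot hold for all i with positive d.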
Note that SDD^n_+(U) is simply a linear image of SDD^t_+ into S^n. Some basic properties of this set are that SDD^n_+(I_n) = SDD^n_+, and that if U ∈ IR^{t×n}_+ is a submatrix of Ũ ∈ IR^{s×n}_+ then SDD^n_+(U) ⊆ SDD^n_+(Ũ). We show in the next example that SDD^n_+(U) can be strictly larger than SDD^n_+ in general.

Example 1 One can see that the all-ones matrix M = E ∈ S^3 is in S^3_+ ∩ N^3. However, M ∉ SDD^3; indeed, if we define

    W := 2I − E = [ 1 −1 −1 ; −1 1 −1 ; −1 −1 1 ],

then tr(WM) = −3 < 0 while W ∈ (SDD^3_+)^* thanks to Proposition 1(ii) (every 2 × 2 principal submatrix of W is positive semidefinite), showing that M ∉ SDD^3_+.

Now, suppose we set U to be the 4 × 3 matrix obtained by augmenting I_3 with an all-1/3 row vector, i.e.,

    U = [ 1 0 0 ; 0 1 0 ; 0 0 1 ; 1/3 1/3 1/3 ],

and consider the set SDD^3_+(U). Then we know SDD^3_+ ⊆ SDD^3_+(U) because I_3 is a submatrix of U. Furthermore, we have

    M = U^T (9 e_4 e_4^T) U ∈ SDD^3_+(U),

where the inclusion holds because 9 e_4 e_4^T decomposes as a sum of three matrices in ι_14(S^2_+ ∩ N^2) + ι_24(S^2_+ ∩ N^2) + ι_34(S^2_+ ∩ N^2) ⊆ SDD^4_+. Consequently, SDD^3_+(U) is a strictly larger set than SDD^3_+.

We next give an important characterization of SDD^n_+(U) that is crucial in our development of inner approximation schemes in Sections 3 and 4. Recall from (1.2) that CP^n can be seen as the convex hull of all vv^T with v ∈ IR^n_+. The next theorem shows that one can think of SDD^n_+(U) similarly.

Theorem 1 Let U ∈ IR^{t×n}_+ have row sums 1. Then SDD^n_+(U) is the conic hull of all vv^T with v belonging to some line segment [u_i, u_j], where u_i is the i-th row of U.

Proof Note from Proposition 1(iii) and (2.2) that any matrix in SDD^n_+(U) can be written as U^T ( Σ_{1 ≤ i < j ≤ t} ι_ij(M_ij) ) U with each M_ij ∈ S^2_+ ∩ N^2. Each M_ij is a sum of matrices ww^T with w ∈ IR^2_+, and U^T ι_ij(ww^T) U = (w_1 u_i + w_2 u_j)(w_1 u_i + w_2 u_j)^T, where w_1 u_i + w_2 u_j is a nonnegative multiple of a point in [u_i, u_j]. Conversely, every v ∈ [u_i, u_j] arises this way. The claim follows.

Proposition 2 Let U ∈ IR^{t×n}_+ have row sums 1. Then the following statements hold.
(i) The cone SDD^n_+(U) is a closed sub-cone of CP^n.
(ii) (SDD^n_+(U))^* = { Y : U Y U^T ∈ (SDD^t_+)^* } = { Y : U Y U^T ∈ (SDD^t)^* + N^t }.

Proof From Theorem 1, it follows that SDD^n_+(U) is a sub-cone of CP^n. It remains to prove closedness. Since U is nonnegative and has no zero rows, the origin is not in the convex hull of vv^T, where v belongs to some [u_i, u_j], and u_i is the i-th row of U.
Hence SDD^n_+(U) is the conic hull of a compact convex set not containing the origin. Thus, it is closed.

To prove (ii), recall that SDD^n_+(U) = U^T (SDD^t_+) U. From this we see that Y ∈ (SDD^n_+(U))^* if and only if

    tr(Y (U^T W U)) ≥ 0 for all W ∈ SDD^t_+,

which is the same as U Y U^T ∈ (SDD^t_+)^*. This proves the first equality. The second equality in (ii) follows from Proposition 1(ii). This completes the proof.

Note that the construction of SDD^n_+(U) is fairly general. Any time we have a cone C ⊆ CP^t and a matrix U ∈ IR^{t×n}_+ whose rows have sum one, one can define the cone

    C(U) := { U^T Y U : Y ∈ C } = U^T C U.        (2.3)

This is easily seen to always verify C(U) ⊆ CP^n, since C(U) ⊆ U^T CP^t U ⊆ CP^n. It is helpful to state in this language the usual LP inner approximations to CP^n. Let Diag^n_+ be the set of nonnegative n × n diagonal matrices. Clearly Diag^n_+ ⊆ CP^n, so we can define

    Diag^n_+(U) := { U^T Y U : Y ∈ Diag^t_+ }.        (2.4)

This is nothing more than the conic hull of the matrices u_i u_i^T, i = 1, ..., t, where u_i is the i-th row of U. The use of (2.4) for inner approximation corresponds to the standard LP approximation strategy used, for example, in [9], where strategies for efficient choices of U were explored.

Another possibility for obtaining an LP relaxation would be to use the cone of n × n symmetric nonnegative diagonally dominant matrices, denoted by DD^n_+. We have Diag^n_+ ⊆ DD^n_+ ⊆ SDD^n_+. So, if we define

    DD^n_+(U) := { U^T Y U : Y ∈ DD^t_+ },        (2.5)

we would get Diag^n_+(U) ⊆ DD^n_+(U) ⊆ SDD^n_+(U).
However, since one can easily see that DD^n_+ is the conic hull of (e_i + e_j)(e_i + e_j)^T for 1 ≤ i ≤ j ≤ n, it is not hard to see that DD^n_+(U) is simply the conic hull of (u_i + u_j)(u_i + u_j)^T for 1 ≤ i ≤ j ≤ t, and hence can be expressed in terms of Diag^n_+(U′) for some U′ that contains U as a submatrix.

Other choices would be to use not submatrices in S^2, as we did for SDD^n_+, but matrices in S^3 or S^4. Note that it is still true in these two cases that S^i_+ ∩ N^i ⊆ CP^i. These cones would give better approximations, but we would get a much higher number of constraints that would not be second-order cone constraints but fully semidefinite. While the semidefinite constraints would still be small, the process would become more cumbersome and significantly less tractable.

2.1 A graphical refinement

We saw above that SDD^n_+(U) is a natural inner approximation to CP^n. Furthermore, Theorem 1 suggests that the fundamental property of U that guides the approximation is the collection of segments [u_i, u_j]. We might associate to the points u_i vertices of a graph, and to the segments its edges, and think of the collection of points and segments as a concrete realization of the graph in IR^n. This insight can be used to refine the approximation, making it more flexible. We start by generalizing the notion of SDD.

Given a graph G with vertex set {1, ..., n} and edge set E, we define

    SDD^G := Σ_{{i,j} ∈ E} ι_ij(S^2_+),

and we set SDD^G := {0} if E = ∅ by convention. The graph G simply encodes which principal 2 × 2 submatrices are used in the construction; if we take G to be the complete graph K_n, this is simply SDD^n. We can define SDD^G_+ as the nonnegative matrices in SDD^G, similarly as before. Then we can naturally define a generalization of SDD^n_+(U):

Definition 3 For a graph G with t vertices and a matrix U ∈ IR^{t×n}_+ whose rows have sum one, we define the cone SDD^G_+(U) as

    SDD^G_+(U) := { U^T Y U : Y ∈ SDD^G_+ } = U^T (SDD^G_+) U.
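The embedding maps ι_ij and the cone SDD^G_+ can be made concrete as follows (a small illustration with 0-based indices; the helper name `iota` is ours, not the paper's):

```python
import numpy as np

def iota(n, i, j, B):
    """Embed a 2x2 symmetric block B into rows/columns i, j of an
    n x n matrix that is zero elsewhere (the map iota_ij of the text)."""
    A = np.zeros((n, n))
    A[np.ix_([i, j], [i, j])] = np.asarray(B, dtype=float)
    return A

# A member of SDD^G_+ for the path graph G with edges {0,1} and {1,2}:
# a sum of 2x2 PSD, entrywise nonnegative blocks supported on the edges.
X = iota(3, 0, 1, [[1.0, 1.0], [1.0, 1.0]]) \
    + iota(3, 1, 2, [[2.0, 1.0], [1.0, 1.0]])

assert np.allclose(X, X.T)
assert np.all(np.linalg.eigvalsh(X) >= -1e-9)   # X is PSD (sum of PSD blocks)
assert np.all(X >= 0)                           # and entrywise nonnegative
```

Note that X has a zero in position (0, 2): the missing edge {0, 2} of G forces the corresponding entry of every member of SDD^G_+ to come only from the diagonal blocks, here zero.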
It will be helpful to think of the rows of U as points in the standard simplex ∆_n (i.e., with nonnegative coordinates summing to one). These points correspond to vertices of the graph G, and the edge set of G simply encodes which pairs of rows of U (vertices) are "connected". In other words, the pair (G, U) is a realization of the graph G inside ∆_n with segments for edges. We will denote by seg(G, U) the set of points in some of the segments, i.e.,

    seg(G, U) = ∪_{{i,j} ∈ E} [u_i, u_j],

where u_i is the i-th row of U. This set completely controls the geometry of the cone. Based on this notion and the proof of Theorem 1, we can immediately obtain the following refinement of Theorem 1 for the representation of SDD^G_+(U).

Theorem 2 Let G be a graph with t vertices and U ∈ IR^{t×n}_+ be a matrix whose rows have sum one. Then SDD^G_+(U) is the conic hull of all vv^T with v ∈ seg(G, U).

Theorem 2 gives a simple way of translating results from the graph language to results about cones. In particular, if we have seg(G, U) ⊆ seg(G′, U′), then SDD^G_+(U) ⊆ SDD^{G′}_+(U′), and furthermore SDD^G_+(U) ⊆ SDD^{K_t}_+(U) = SDD^n_+(U) ⊆ CP^n for all graphs G with t vertices and matrices U ∈ IR^{t×n}_+ whose rows have sum one. On the other hand, if every node of the graph G is covered by some edge, then SDD^G_+(U) ⊇ Diag^n_+(U), the usual LP inner approximation. Thus, the graphical notation allows us to construct intermediate approximations somewhere in between the simple LP inner approximation and the full SDD^n_+(U) version.

We end the section by noting that most of our other previous results concerning SDD^n_+ and SDD^n_+(U) can be adapted with no effort to this new cone.

Theorem 3 Given a graph G with t vertices and edge set E, and a matrix U ∈ IR^{t×n}_+ whose rows have sum one, we have the following properties.
(i) (SDD^G)^* = { Q ∈ S^t : ι^*_ij(Q) ⪰ 0, ∀ {i, j} ∈ E };
(ii) SDD^G_+ = Σ_{{i,j} ∈ E} ι_ij(S^2_+ ∩ N^2);
(iii) (SDD^G_+(U))^* = { Y : U Y U^T ∈ (SDD^G_+)^* };
(iv) SDD^G_+(U) is a closed sub-cone of CP^n.

Proof Immediate from the proofs of Proposition 1 and Proposition 2.

3 Uniform inner approximation schemes

The main idea of this section is to approximate the solution to (1.1) by using the cones SDD^G_+(U) to replace CP^n. More concretely, our scheme is based on the following family of optimization problems, which depends on a graph G on t vertices and a U ∈ IR^{t×n}_+ whose rows have sum one:

    v_p(G, U) := min tr(CX)
                 s.t. tr(A_i X) = b_i,  i = 1, ..., m,
                      X ∈ SDD^G_+(U),                             (3.1)

and its dual problem given by

    v_d(G, U) := max b^T y
                 s.t. C − Σ_{i=1}^m y_i A_i ∈ (SDD^G_+(U))^*.     (3.2)

Note that the semidefinite constraints in (3.1) are imposed only on 2 × 2 matrices, and are hence second-order cone representable. Recall also that SDD^G_+(U) and (SDD^G_+(U))^* are both closed convex cones. Also, notice that (3.2) has a strictly feasible point due to Assumption A3 and the fact that int COP^n ⊆ int (SDD^G_+(U))^* (which follows from SDD^G_+(U) ⊆ CP^n). Consequently, if problem (3.1) is feasible, then v_p(G, U) = v_d(G, U), both values are finite and v_p(G, U) is attained. Moreover, we conclude from SDD^G_+(U) ⊆ CP^n that v_p(G, U) ≥ v_p. Furthermore, we have already pointed out that augmenting the embedded graph (G, U) leads to an enlargement of SDD^G_+(U). In view of these observations, we will discuss strategies for constructing an "enlarging" sequence of graphs {(G^k, U^k)} to possibly tighten the gap v_p(G^k, U^k) − v_p as k increases. To simplify our terminology, we make the following definition.
Definition 4 A sequence of embedded graphs {(G^k, U^k)} is called a positively enlarging sequence if seg(G^k, U^k) ⊆ seg(G^{k+1}, U^{k+1}), each U^k is a nonnegative matrix having at least n rows, each row of U^k (the realizations of the vertices of G^k) sums to one, and each node of G^k is covered by at least one edge.

Positively enlarging sequences verify v_p(G^k, U^k) ≥ v_p(G^{k+1}, U^{k+1}) ≥ v_p by construction. Furthermore, once (3.1) is feasible for some k = k_0, it will remain feasible whenever k ≥ k_0, since the sequence of sets {SDD^{G^k}_+(U^k)} is monotonically increasing. Moreover, we have noted above that we might think of the rows of U^k as lying in the simplex ∆_n, so that we can think of this as an enlarging family of graphs embedded in ∆_n.

We next study convergence of our inner approximation schemes for (1.1) based on (3.1) when {(G^k, U^k)} is a positively enlarging sequence. We first prove a convergence result concerning a similar approximation scheme, which uses Diag^n_+(U) (as defined in (2.4)) in place of SDD^G_+(U) in (3.1). This strategy was used in [9], which studied the pairs (3.1) and (3.2) with Diag^n_+(U) in place of SDD^G_+(U), and constructed an "enlarging" sequence {U^k} by adding new rows to U^k from ∆_n at each step. To determine what rows to add, they solve another LP approximation scheme based on U, which they see as the set of vertices of a simplicial partition of ∆_n, and use its results to construct a sequence {U^k} with an increasing number of rows. In studying the convergence of that method they proved a version of the following result for copositive programming problems in [9, Theorem 4.2]. The version presented below will be useful for studying convergence of our inner approximation schemes for (1.1).

Theorem 4 Assume that (1.1) is strictly feasible. Let {U^k} be a sequence of matrices whose rows have sum one, where for each k, U^k ∈ IR^{t_k × n}_+ for some t_k ≥ n.
Suppose that

    lim_{k→∞} max_{x ∈ ∆_n} min_{i=1,...,t_k} ‖x − u^k_i‖ = 0,        (3.3)

where u^k_i is the i-th row of U^k. Consider for each k the following problem:

    ṽ_p(U^k) := min tr(CX)
                s.t. tr(A_i X) = b_i,  i = 1, ..., m,
                     X ∈ Diag^n_+(U^k).                              (3.4)

Then the following statements hold.
(i) ṽ_p(U^k) is finite for all sufficiently large k and lim_{k→∞} ṽ_p(U^k) = v_p.
(ii) The solution set of (3.4) is nonempty and uniformly bounded for all sufficiently large k.
(iii) Let X^k be a solution of (3.4) whenever the solution set is nonempty. Then any accumulation point of {X^k} is a solution of (1.1).

Proof Note that the Diag^n_+(U^k) defined in (2.4) is the conic hull of u^k_i (u^k_i)^T, where the u^k_i are rows of U^k. Note also that any element X in CP^n can be written as the conic combination of n(n+1)/2 matrices vv^T, with v ∈ ∆_n. Thus, in view of (3.3), X can then be written as the limit of a sequence {X^k}, where X^k ∈ Diag^n_+(U^k) for each k. This together with Diag^n_+(U^k) ⊆ CP^n shows that the sequence of sets {Diag^n_+(U^k)} converges to CP^n in the sense of Painlevé-Kuratowski [23, Chapter 4B].

Since the mapping X ↦ A(X) := (tr(A_1 X), ..., tr(A_m X)) is surjective by Assumption A2, the vector b and the set A(CP^n) cannot be separated in the sense of [23, Theorem 2.39]. Thus, [23, Theorem 4.32] shows that the sequence of feasible sets of (3.4) converges to the feasible set of (1.1) in the sense of Painlevé-Kuratowski.

It now follows from [23, Theorem 4.10(a)] and the nonemptiness of the feasible set of (1.1) that the feasible sets of (3.4) are nonempty for all sufficiently large k. Hence ṽ_p(U^k) < ∞ for all sufficiently large k. Note that for each k, the dual problem to (3.4) is strictly feasible because of Assumption A3 and the fact that int COP^n ⊆ int (Diag^n_+(U^k))^*. Thus, ṽ_p(U^k) is indeed finite for all sufficiently large k.
Moreover, thanks to the dual strict feasibility, the solution sets of (3.4) are nonempty whenever ṽ_p(U^k) is finite; in particular, they are nonempty for all sufficiently large k.

Next, note that by Assumption A3, the dual problems of (3.4) for all k actually have a common Slater point, i.e., there exists a matrix

    Ȳ := C − Σ_{i=1}^m ȳ_i A_i ∈ int COP^n ⊆ int (Diag^n_+(U^k))^*.

Therefore, there exists ε > 0 so that Ȳ + εB ⊆ int COP^n, where B is the unit closed ball centered at the origin (in Frobenius norm). Consequently, for any X ∈ CP^n, it holds that tr(ȲX) ≥ ε‖X‖_F. We now argue that the solution sets of (3.4) are uniformly bounded for all k. Indeed, fix any k so that the solution set of (3.4) is nonempty, and let X^k be a solution. Then X^k is a Lagrange multiplier for the dual problem. In particular,

    ṽ_p(U^k) = max_y { b^T y + tr( X^k [ C − Σ_{i=1}^m y_i A_i ] ) } ≥ b^T ȳ + tr(X^k Ȳ) ≥ b^T ȳ + ε‖X^k‖_F,

where the last inequality holds because X^k ∈ Diag^n_+(U^k) ⊆ CP^n. Since {ṽ_p(U^k)} is nonincreasing, we conclude from the above inequality that {‖X^k‖_F} is bounded above by a constant independent of k. Thus, the solution sets of (3.4) are uniformly bounded for all k.

Finally, since the sequence of sets {Diag^n_+(U^k)} is monotonically increasing, we see from [23, Proposition 7.4(c)] that the objective function (with the constraint considered as the indicator function) of (3.4) epi-converges to that of (1.1) in the sense of [23, Definition 7.1]. The desired conclusion concerning limits of {ṽ_p(U^k)} and {X^k} now follows from [23, Theorem 7.31(b)].

Since Diag^n_+(U) ⊆ SDD^G_+(U) if the edges of G cover all nodes, we get the convergence of the sequence of problems (3.1) for a positively enlarging sequence {(G^k, U^k)} under the same assumptions on the U^k.
But we can actually obtain the desired convergence result under a weaker condition.

Theorem 5 Assume that (1.1) is strictly feasible. Let {(G^k, U^k)} be a positively enlarging sequence such that

    lim_{k→∞} max_{x ∈ ∆_n} min_{y ∈ seg(G^k, U^k)} ‖x − y‖ = 0.        (3.5)

Then it holds that:
(i) v_p(G^k, U^k) is finite for all sufficiently large k and lim_{k→∞} v_p(G^k, U^k) = v_p.
(ii) The solution set of (3.1) with (G, U) = (G^k, U^k) is nonempty and uniformly bounded for all sufficiently large k.
(iii) Let X^k be a solution of (3.1) with (G, U) = (G^k, U^k) whenever the solution set is nonempty. Then any accumulation point of {X^k} is a solution of (1.1).

Proof Note that from Theorem 2 and the description of Diag^n_+(U) as the conic hull of all matrices u_i u_i^T where u_i is a row of U, if every node of G is covered by some edge and if we construct U′ by adding rows such that each new row lies in [u_i, u_j] for some {i, j} ∈ E, we have Diag^n_+(U′) ⊆ SDD^G_+(U).

For each U^k, subdivide each segment [u^k_i, u^k_j] into segments no longer than 1/k, and add these new points to U^k to form Ũ^k ∈ IR^{t̃_k × n}_+. Then for each x ∈ ∆_n, we have

    min_{i=1,...,t̃_k} ‖x − ũ^k_i‖ ≤ min_{y ∈ seg(G^k, U^k)} ‖x − y‖ + 1/k,

where ũ^k_i is the i-th row of Ũ^k. Thus, the sequence {Ũ^k} satisfies the conditions of Theorem 4. Consequently, from the proof of Theorem 4, the sequence of sets {Diag^n_+(Ũ^k)} converges to CP^n in the sense of Painlevé-Kuratowski. In view of this and [23, Exercise 4.3(c)], {SDD^{G^k}_+(U^k)} converges to CP^n.
The rest of the proof follows exactly the same arguments as in the proof of Theorem 4.

An obvious way of guaranteeing the satisfaction of condition (3.5) in Theorem 5 is to take the rows of U^k to be the set of points x ∈ ∆_n such that kx ∈ Z^n, i.e., an equally spaced distribution of points in the simplex, with a growing number of points. This is in fact the strategy explored in [25] with the linear programming approach. As guaranteed by Theorem 5, this is sufficient to get convergence in our case, independently of the edges considered, but we can get away with much less. Indeed, it is easy to see, for example, that we do not need to map vertices to the interior of the simplex to get convergence and, in fact, it is enough to uniformly sample the boundary of the simplex, and form a graph with all possible edges between the chosen vertices. Finding embedded graphs that optimally cover ∆_n, in the sense of minimizing the maximum distance to a point of the simplex, seems to be a hard problem with no obvious answer, but many different strategies can be attempted. For practical purposes, it might be helpful to use the problem structure to design strategies for constructing {(G^k, U^k)}; these may not satisfy condition (3.5) and hence the convergence behavior can be compromised, but their corresponding problem (3.1) may be easier to solve. Indeed, as discussed in [18, Section 1.4], the amount of work per iteration for solving (3.2) is O((m + t_k)(4|E| + t_k)) when (G, U) = (G^k, U^k). Hence, we will explore some problem-dependent inner approximation schemes in the next section.

Before ending this section, we would like to point out that the approach in [25] using Diag^n_+(U) for (rows of) U equally distributed in the simplex is one of the few problem-independent inner approximations to CP^n presented in the literature.
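For concreteness, the equally spaced grid and the resulting Diag^n_+(U^k) bound can be sketched numerically (a toy illustration under our own assumptions, not the paper's implementation). For the standard quadratic program min_{x ∈ ∆_n} x^T Q x, written as min tr(QX) subject to tr(EX) = 1 and X ∈ CP^n, restricting X to Diag^n_+(U) reduces the problem to picking the best grid row, since the rows of U sum to one:

```python
import itertools
import numpy as np

def simplex_grid(n, k):
    """Rows of U^k: all x in the simplex Delta_n with k*x integral."""
    return np.array([np.array(c, dtype=float) / k
                     for c in itertools.product(range(k + 1), repeat=n)
                     if sum(c) == k])

assert simplex_grid(3, 4).shape == (15, 3)   # C(k+n-1, n-1) = C(6,2) points

# Upper bounds for min_{x in Delta_2} x^T Q x via Diag^2_+(U^k):
# with tr(EX) = 1 and X a conic combination of u_i u_i^T, the bound is
# simply min_i u_i^T Q u_i over the grid rows.
Q = np.array([[2.0, 1.0], [1.0, 3.0]])
bounds = [min(u @ Q @ u for u in simplex_grid(2, k)) for k in (1, 2, 3)]

assert bounds[0] >= bounds[1] >= bounds[2]   # upper bounds improve with k
assert np.isclose(bounds[2], 5.0 / 3.0)      # minimizer x = (2/3, 1/3) hit at k = 3
```

This also illustrates why the LP hierarchy alone converges slowly: the bound improves only when the grid happens to approach a minimizer, which is the gap the segment-based cones SDD^G_+(U) are designed to close.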
The only other approach is that of [17], which leads to SDP problems. Although conceptually very interesting and with guaranteed convergence, this latter approach performs poorly in practice, because the size of the constraints grows very fast and the small instances that can be reasonably computed give weak approximations. In some sense, our SOCP based approximation schemes may lend some of the power of semidefinite programming to the LP approximation without completely sacrificing computability.

4 Problem-dependent inner approximation schemes

In this section, we propose some problem-dependent heuristic schemes for constructing {(G^k, U^k)}. They typically lead to computationally more tractable problems than a positively enlarging sequence satisfying (3.5). As we shall see later in our numerical experiments, these problem-dependent schemes in general return solutions with reasonable quality, though their convergence behaviors are still unknown. A related problem-dependent approach was developed in [2] for semidefinite programming. In there, they proposed the use of the cone SDD^n(U) and progressively enlarge the U to obtain efficient inner approximations to S^n_+. We propose in this section a related approach. The main difference is that in the semidefinite case considered in [2], enlarging the U is relatively simple, as we can always separate the dual solution to the inner approximation from S^n_+, if it is not there. In the case of the completely positive cone, however, there is no realistic way of even checking if the dual solution is copositive. Thus, a direct separation procedure, like the one proposed in [2], is not viable. Instead, we design heuristics for constructing {(G^k, U^k)} that can potentially perform better on specific problem instances.

After solving (3.1) with a choice of (G^k, U^k), if the problem is feasible, one will obtain a solution X ∈ SDD^G_+(U). By Theorem 2, this X can be written as a conic combination of vv^T for v ∈ seg(G, U).
Our plan here is to add these v as vertices to G and add some new edges from them, in order to increment the graph. The decomposition is not unique, so one has to carefully define what is meant by it.

First, note that for a 2 × 2 matrix M ∈ S^2_+ ∩ N^2, there exist a ≥ 0, b ≥ 0 and v ∈ R^2_+ such that

M = vv^T + diag(a, b).   (4.1)

This is trivially true if any element in the diagonal of M is zero. For other matrices, the above decomposition can be realized by taking, for example, v = (√m_{11}, m_{12}/√m_{11}), implying a = 0 and b = m_{22} − m_{12}^2/m_{11}, which is greater than or equal to zero since M ⪰ 0. For U ∈ R^{t×n}_+, one can then see that

U^T ι_{ij}(M) U = a u_i u_i^T + b u_j u_j^T + (v_1 u_i + v_2 u_j)(v_1 u_i + v_2 u_j)^T,

where u_i is the i-th row of U and 1 ≤ i < j ≤ t. So, besides the vertices u_i and u_j, we need at most one point coming from each edge [u_i, u_j] to describe U^T ι_{ij}(M) U. Since elements of SDD^G_+(U) are sums of matrices of this type for {i, j} ∈ E by Theorem 2, we have the following lemma refining Theorem 2.

Lemma 1 Any element X ∈ SDD^G_+(U) can be written as

X = Σ_{i=1}^t λ_i u_i u_i^T + Σ_{{i,j}∈E} γ_{ij} w_{ij} w_{ij}^T,

where u_i is the i-th row of U ∈ R^{t×n}_+, w_{ij} ∈ [u_i, u_j] and λ_i, γ_{ij} ≥ 0. Indeed, for the first sum, it suffices to sum over the i's that are covered by some edge.

A natural question to ask is which points we can pick in each segment. To answer this question, we assume without loss of generality that m_{11} ≥ m_{22} > 0 and m_{12} > 0 in (4.1), and demonstrate how the v there can be chosen. Note that U^T vv^T U is supposed to correspond to a γ_{ij} w_{ij} w_{ij}^T in the decomposition in Lemma 1. Since m_{12} > 0, we must have v_1 > 0 and v_2 > 0, so we just need to see what the ratio r = v_1/v_2 can be. What we saw above right after (4.1) was the largest case, where we get r = m_{11}/m_{12}. The smallest it can get is attained by setting v = (m_{12}/√m_{22}, √m_{22}), which gives us r = m_{12}/m_{22}.
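The two extreme choices of v in (4.1) can be checked numerically; a small sketch (function name and test values are ours):

```python
import math

def extreme_factors(m11, m12, m22):
    """For a 2x2 matrix M = [[m11, m12], [m12, m22]] that is PSD with
    nonnegative entries and m12 > 0, return the two extreme choices of v
    in the decomposition M = v v^T + diag(a, b) with a, b >= 0."""
    v_max = (math.sqrt(m11), m12 / math.sqrt(m11))  # a = 0, ratio r = m11/m12
    v_min = (m12 / math.sqrt(m22), math.sqrt(m22))  # b = 0, ratio r = m12/m22
    return v_max, v_min

m11, m12, m22 = 4.0, 1.5, 2.0          # PSD since m12**2 <= m11 * m22
for v in extreme_factors(m11, m12, m22):
    a = m11 - v[0] ** 2                # leftover diagonal entries of M - vv^T
    b = m22 - v[1] ** 2
    assert abs(m12 - v[0] * v[1]) < 1e-12   # off-diagonal matched exactly
    assert a >= -1e-12 and b >= -1e-12      # diag(a, b) is nonnegative
```

For this instance the two ratios are m11/m12 = 8/3 and m12/m22 = 3/4, and any valid v must have its ratio between them.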
These two values for r can be seen by noting that any extremal ratio v_1/v_2 for the v in (4.1) must correspond to a = 0 or b = 0. A balanced option, defined so that the ratio between the diagonal entries of vv^T preserves the ratio between the diagonal entries of M, is to take

v = √m_{12} ( (m_{11}/m_{22})^{1/4}, (m_{22}/m_{11})^{1/4} ),   (4.2)

which corresponds to r = √(m_{11}/m_{22}), the geometric mean of the largest and smallest possible ratios.

Based on these observations, we can now describe a general strategy for an iterative procedure to obtain upper bounds for (1.1).

Scheme 1: Successive upper bound scheme for (1.1)
Step 0. Start with a complete graph G^0 and its embedding (G^0, I) in ∆_n. Set k = 0 and U^0 = I.
Step 1. For an optimal solution X^k of (3.1) with (G, U) = (G^k, U^k), apply Lemma 1 to obtain points w_{ij} for some {i, j} ∈ E' ⊆ E such that X = X^k is a conic combination of w_{ij} w_{ij}^T for {i, j} ∈ E' and u_i u_i^T for the vertices i of G.
Step 2. Define a new graph embedding (G^{k+1}, U^{k+1}) by adding new vertices at the points w_{ij} (or at least some subset of them) and some new edges connecting those vertices to some of the previously defined ones, possibly removing redundant edges, and go to Step 1.

The general idea is therefore to augment the graph at each step by adding some vertices on the edges that were active in the optimal solution, together with some edges incident to them. All the steps have, however, some subtleties that need to be addressed.

The initial embedding (G^0, U^0) is currently taken to be simply the embedding of K_n into the vertices of ∆_n, so that SDD^{G^0}_+(U^0) = SDD^n_+. If that is infeasible, however, the strategy does not work.
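The balanced choice (4.2) can be verified in the same spirit: the off-diagonal of vv^T equals m_{12}, the ratio v_1/v_2 is the geometric mean √(m_{11}/m_{22}), and the leftover M − vv^T is a nonnegative diagonal. A sketch (function name and instance are ours):

```python
import math

def balanced_factor(m11, m12, m22):
    """Balanced choice (4.2): v = sqrt(m12) * ((m11/m22)**(1/4), (m22/m11)**(1/4)),
    so that v1/v2 = sqrt(m11/m22)."""
    s = math.sqrt(m12)
    return (s * (m11 / m22) ** 0.25, s * (m22 / m11) ** 0.25)

m11, m12, m22 = 4.0, 1.5, 2.0
v = balanced_factor(m11, m12, m22)
assert abs(v[0] * v[1] - m12) < 1e-12                      # off-diagonal of vv^T is m12
assert abs(v[0] / v[1] - math.sqrt(m11 / m22)) < 1e-12     # ratio is the geometric mean
assert m11 - v[0] ** 2 >= -1e-12 and m22 - v[1] ** 2 >= -1e-12  # M - vv^T = diag(a,b) >= 0
```

The last assertion holds precisely because √(m_{11} m_{22}) ≥ m_{12} for any PSD M, which is why the balanced point is always admissible.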
Nevertheless, assuming strict feasibility of (1.1), we know from Theorem 4 that there is some small enough uniform simplicial partition of ∆_n that will make the problem feasible.

The decomposition obtained in Step 1 is not unique. There are two sources of variation. First, as discussed above, given a 2 × 2 matrix M such that ι_{ij}(M) appears in the decomposition of X, we have some leeway on which point to pick in the edge [u_i, u_j]. Second, notice that even these matrices M are not uniquely defined. Since the matrices M will be a side result of the solution to (3.1), the choice of algorithm and the way the problem is encoded will have some impact on the decomposition. As for defining the v given the matrix M, we will use the balanced approach described above in (4.2), as it seems to perform well in practice.

The augmenting step (Step 2) is the most delicate of all. Different augmenting techniques will give rise to very different procedures. Here and in our numerical experiments, we consider two different approaches. We will present more implementation details in Section 5.

The maximalist approach: In this approach, we add some new vertices and then connect all vertices to form a complete graph. This is memory consuming and induces some redundancies: every node we add is in the middle of an already existing edge. Adding edges to those does not enlarge the cone SDD^G_+(U) and might lead to numerical inaccuracies, as we create multiple ways of writing points in a segment. Some pruning techniques could be applied.

The adaptive simplicial partition approach: This mimics the technique introduced in [9], which maintains the set of edges as that of a simplicial partition. At every step we pick edges to subdivide and subdivide all the simplices containing those edges. The choice of nodes and edges to add to G^k in our approach is based on the solution we obtain from solving (3.1) for (G, U) = (G^{k−1}, U^{k−1}).
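One step of edge subdivision in a simplicial partition, as used by the adaptive approach, might look like the following simplified sketch (vertices are stored as coordinate tuples; splitting at the midpoint is our illustrative choice, whereas the scheme splits at the point produced by the decomposition):

```python
def subdivide_edge(partition, a, b):
    """Split edge {a, b} at its midpoint m in every simplex of the partition
    that contains both endpoints; each such simplex is replaced by two."""
    m = tuple((x + y) / 2 for x, y in zip(a, b))
    new_partition = []
    for simplex in partition:
        if a in simplex and b in simplex:
            # replace a (resp. b) by the new point m in the two copies
            new_partition.append([m if v == a else v for v in simplex])
            new_partition.append([m if v == b else v for v in simplex])
        else:
            new_partition.append(simplex)
    return new_partition

# the standard 2-simplex as a single cell, then bisect one edge
e1, e2, e3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
parts = subdivide_edge([[e1, e2, e3]], e1, e2)
assert len(parts) == 2
```

Because every simplex containing the chosen edge is split, the collection remains a simplicial partition after the update.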
This is different from [9], which relies solely on an outer approximation to guide the subdivision process.

Note that we do not have any guarantee of convergence for Scheme 1. However, geometrically one can see what must happen for the method to get stuck, i.e., for SDD^{G^k}_+(U^k) = SDD^{G^{k+1}}_+(U^{k+1}). As an immediate consequence of Theorem 2, this happens if and only if all the newly added edges in the embedding are contained in previously existing edges. This is because rank one nonnegative matrices are on the extreme rays of CP^n (see [1]). Thus, we see from Theorem 2 that SDD^{G^k}_+(U^k) = SDD^{G^{k+1}}_+(U^{k+1}) if and only if seg(G^k, U^k) = seg(G^{k+1}, U^{k+1}). This is an extremely strong condition, which essentially implies (depending on the scheme chosen to enlarge the graph) that the scheme gets stuck only if, for some iteration, the optimal solution can be attained as a combination of the nodes alone, with no elements from the edges; in other words, if problem (3.1) has the same solution when we replace SDD^{G^k}_+(U^k) by Diag^n_+(U^k). In passing, we would like to point out that, in situations where convergence is a serious concern, one can modify Step 2 of Scheme 1 by adding a random vertex in ∆_n in addition to those w_{ij}: the resulting scheme is guaranteed to converge in view of Theorem 5 if (1.1) is also strictly feasible.

4.2 A forgetfulness scheme

The use of a positively enlarging sequence {(G^k, U^k)} can lead to large-scale SOCP problems when k is large. As a heuristic to alleviate the computational burden, we propose a simple forgetfulness scheme. In this approach, we maintain the complete graph throughout. However, we always form U^k by appending only the newly generated vertices to U^0, which we choose to be the identity matrix. The details are described below.

Scheme 2: A forgetfulness upper bound scheme for (1.1)
Step 0. Start with a complete graph G^0 and its embedding (G^0, I) in ∆_n. Set k = 0 and U^0 = I.
Step 1.
For an optimal solution X^k of (3.1) with (G, U) = (G^k, U^k), apply Lemma 1 to obtain points w_{ij} for some {i, j} ∈ E' ⊆ E such that X = X^k is a conic combination of w_{ij} w_{ij}^T for {i, j} ∈ E' and u_i u_i^T for the vertices i of G.
Step 2. Define a new graph embedding (G^{k+1}, U^{k+1}): starting with (G^0, I), add new vertices at the points w_{ij} and then add edges between each new vertex and all vertices in G^0. Go to Step 1.

Note that, in general, one cannot guarantee that the forgetfulness scheme is even monotone, as we are dropping the factors u_i u_i^T that were part of the representation of the optimal solution X in Step 1. However, in most of the random instances studied in our numerical experiments, the forgetfulness scheme appears to be monotone. The main reason could be that the algorithm tends to write X as a conic combination of just the matrices w_{ij} w_{ij}^T for {i, j} ∈ E'. When this happens, we are guaranteed that the next iterate will be non-increasing, but this need not always be the case.

5 Numerical experiments

In this section, we report on numerical experiments to test our proposed approaches. All experiments were performed in Matlab (R2017a) on a 64-bit PC with an Intel(R) Core(TM) i7-6700 CPU (3.40GHz) and 16GB RAM. We used the convex optimization software CVX [14] (version 2.1), running the solver MOSEK (version 8.0.0.60) to solve the conic optimization problems that arise. In our tests, we specifically consider the following strategies:

∆-partition: In this approach, controlled by a parameter k ≥ 2, we generate the vertices of the graph G^k as the C(n+k−1, k) vertices of the uniform subdivision of the simplex ∆_n into simplices of size 1/k. We then add edges between two vertices whenever their supports differ by 2. Note that by Theorem 5, if (1.1) is in addition strictly feasible, then v_p(G^k, U^k) will be close to v_p for all sufficiently large k, so this strategy is guaranteed to converge as k increases.
Max: This is a variant of Scheme 1. Specifically, in Step 1, we decompose X^k as described in Lemma 1 using the balanced option given in (4.2). Then, in Step 2, we add to G^k as new vertices all w_{ij} whose corresponding entry X^k_{ij} is sufficiently large, and add edges between all vertices so that the new graph G^{k+1} is complete.

Max1: This is another variant of Scheme 1. Step 1 is the same as in Max. However, in Step 2, we only add the w_{ij} corresponding to the largest X^k_{ij} (if X^k_{ij} exceeds a certain threshold) as a new vertex. We then add edges between all vertices so that the new graph G^{k+1} is complete.

Adaptive ∆-partition: This is also a variant of Scheme 1. Step 1 is the same as in Max. For Step 2, the way of adding vertices is the same as in Max1. However, the way we add edges mimics the approach introduced in [9], which maintains the set of edges as that of a simplicial partition. Specifically, we subdivide the edge corresponding to the w_{ij} we added, and subdivide all the simplices containing that edge.

Forgetfulness: This is a variant of Scheme 2. We perform Step 1 as in Max. As for Step 2, we add all w_{ij} whose X^k_{ij} is sufficiently large as new vertices to the original graph G^0. We then add edges to join each newly added vertex to all vertices in G^0.

In Section 5.1, we compare the strategies Max, Adaptive ∆-partition and Forgetfulness on random instances of (1.1). We also present results obtained via ∆-partition (with k = 2) as a benchmark. In Section 5.2, we look at how Forgetfulness performs on standard quadratic programs.
In Section 5.3, we first review the standard completely positive programming formulation of the stable set problem, and then examine how Max1 performs on some standard test graphs.

5.1 Random instances

In order to test the performance of our method in a generic setting, we test it on randomly generated instances of problem (1.1). We generate our objective function by setting C = M^T M, where M is an n × n matrix with i.i.d. standard Gaussian entries, guaranteeing strict feasibility of (1.3). Furthermore, we generate the constraints by setting A_i = (M_i + M_i^T)/2, where the M_i are also n × n matrices with i.i.d. standard Gaussian entries, and choosing b_i such that b_i = tr(A_i(E + nI)). This guarantees strict feasibility of (1.1).

For the first of our tests we varied the number of variables, n, and the number of constraints, m, so that n is either 10 or 25 and m is either 5, 10 or 15. Given the complexity of copositive programming, there is actually no reliable way to find the true solution for these problems, and there is no available implemented method that can generate lower bounds with which to compare our results. As a work-around, throughout this section we compare the results we obtain with the classical (and somewhat coarse) lower bound provided by replacing CP^n by S^n_+ ∩ N^n in problem (1.1). We use the difference of our approximations to this lower bound, normalized by dividing it by the bound, as a proxy for the quality of the methods, and simply call it the relative gap. Precisely, this quantity is defined by gap(x) = (x − x^∗)/|x^∗|, where x is the objective value attained by the method being studied and x^∗ is the doubly nonnegative lower bound. This makes it somewhat easier to compare different methods across different instances of the problem. The drawback is that the gap we compute is actually the sum of the gaps of the proposed method and the doubly nonnegative approximation, which we do not know how to estimate independently.
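The instance generation just described can be sketched as follows (assuming numpy; the seed and sizes are illustrative). The point X0 = E + nI serves as an explicit strictly feasible witness:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 5

# objective: C = M^T M for Gaussian M, so C is positive semidefinite
M = rng.standard_normal((n, n))
C = M.T @ M

E = np.ones((n, n))
X0 = E + n * np.eye(n)           # interior point of CP^n used as witness

A, b = [], []
for _ in range(m):
    Mi = rng.standard_normal((n, n))
    Ai = (Mi + Mi.T) / 2         # symmetrize the Gaussian matrix
    A.append(Ai)
    b.append(np.trace(Ai @ X0))  # b_i = tr(A_i (E + nI)) makes X0 feasible

for Ai, bi in zip(A, b):
    assert abs(np.trace(Ai @ X0) - bi) < 1e-8   # X0 satisfies every constraint
```

By construction every constraint is tight at X0, so problem (1.1) is strictly feasible for these instances.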
                 Max                  Adaptive ∆-Partition    Forgetfulness           ∆-Partition
 n   m    time(sec) Relative Gap    time(sec) Relative Gap   time(sec) Relative Gap  time(sec) Relative Gap
10   5        8.2   5.035e-02          25.3   6.362e-02         13.0   2.006e-02        4.6   4.620e-01
10  10       19.5   2.281e-02          25.3   7.920e-02         23.0   1.849e-02        4.5   4.095e-01
10  15       41.7   1.212e-02          27.6   8.207e-02         27.0   1.179e-02        5.0   2.995e-01
25   5       23.0   6.748e-01          55.1   5.828e-01         38.4   2.975e-01         —    —
25  10       45.8   4.660e-01          62.9   7.841e-01         52.7   2.020e-01         —    —
25  15       71.8   3.715e-01          56.1   8.565e-01         61.5   1.545e-01         —    —

Table 1 Comparison of different iterative approaches

The results obtained can be seen in Table 1, where we present both the average gaps and the average running time for the studied methods. A few technical details are needed to replicate the experiment. The results presented are averages over 30 instances per parameter pair. Moreover, we fix the maximum number of iterations for the Max, Adaptive ∆-partition and Forgetfulness schemes at, respectively, 5, 20 and 15 for n = 10, and 5, 15 and 12 for n = 25. This was done (in an ad hoc way) to try to keep the average execution time as similar as possible across the iterative methods, so that a fair comparison can be made. Also, since the maximalist approach can occasionally explode in size, we stop this approach early when t_{k+1} > 200 (recall that U^k ∈ R^{t_k × n}_+ for all k). For the forgetfulness approach, we prune the U^k in each step by removing redundant rows: we compute δ^k_{ij} := ||u^k_i − u^k_j||, where u^k_i and u^k_j are the i-th and j-th rows of U^k respectively, j > i, and discard u^k_j if δ^k_{ij} falls below a small tolerance. We also stop this approach early when t_{k+1} > 200 for the U^{k+1} after pruning. The static ∆-partition is not computed for n = 25 as it takes too long.

Fig. 2 Evolution of the gap for the forgetfulness scheme as iterations increase

These results show that the Forgetfulness scheme dominates the others in all categories as far as the quality/time trade-off is concerned.
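The row-pruning step used for the forgetfulness approach can be sketched as follows (the tolerance value is our illustrative choice, as the exact threshold is not stated here):

```python
def prune_rows(rows, tol):
    """Drop row u_j when some earlier kept row u_i has ||u_i - u_j|| < tol,
    scanning rows in order (j > i), as in the forgetfulness pruning step."""
    kept = []
    for u in rows:
        if all(sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5 >= tol
               for v in kept):
            kept.append(u)
    return kept

rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1e-9), (0.5, 0.5)]
assert prune_rows(rows, 1e-6) == [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
```

Pruning near-duplicate rows keeps t_k small, which matters because the per-iteration cost of the SOCP grows with t_k.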
The relative gaps of the attained solutions jump from between 1% and 2% for n = 10 to between 15% and 30% for n = 25. Once again, we stress that these are upper bounds for the quality of the Forgetfulness scheme as well as of the doubly nonnegative approximation, and we cannot separate the contributions of each method.

We also plot in Figure 2 the evolution of the gaps for the Forgetfulness scheme for 10 random instances of problem (1.1) with n = 25 and m = 10. The logarithmic scale plot shows the gap as iterations increase, and the diminishing returns in percentage improvement. Again, note that the true gap might actually be decreasing faster, as what we are seeing is the gap to the doubly nonnegative lower bound.

5.2 Standard Quadratic Program

We now focus on a class of more structured completely positive programs, those coming from standard quadratic programs. A standard quadratic optimization problem (SQP) consists of finding global minimizers of a quadratic form over the standard simplex. In other words, given p(x) = x^T Q x for some Q ∈ S^n, we want to find its minimum over the simplex ∆ = {x ∈ R^n_+ : Σ_i x_i = 1}. It is shown in [7] that this can be written as the completely positive program

p^∗ := min tr(QX)  s.t.  tr(EX) = 1,  X ∈ CP^n,   (5.1)

where E is the all ones matrix. It is not difficult to see that these problems always satisfy the blanket assumptions presented in Section 1.1. Furthermore, since (1/n) I_n is feasible, our hierarchy can always start from the base SDD relaxation.

To illustrate the behaviour of our method, we start by taking the four concrete examples collected in [6] from several domains of application and applying the Forgetfulness scheme.
We get the encouraging results shown in Table 2, where we can see the source of the examples, their size n, their true solution p^∗, and the approximate solution obtained by the Forgetfulness scheme in 5 iterations (reported under the column approx.). We can see that the third example is the only one where there is a significant deviation from the optimal value, and all the results were attained in a few seconds.

Table 2 Applying the Forgetfulness scheme in four small SQP examples

To further explore the behaviour of our approach, we followed the idea of [6] to generate random instances of SQP. In that paper, matrices Q are generated to be 10 × 10, symmetric and with uniformly distributed entries; the minimum is then typically attained at a vertex (48.5% of observed instances) or on an edge (40.1% of observed instances) of the simplex. In those two cases, by Theorem 1 our relaxation finds the optimal value at the first step. In other words, simply replacing CP by SDD^∗ gives us the exact solution 88.6% of the time, with our iterative procedure only kicking in for the remaining instances.

To get a more meaningful test, we generated symmetric matrices Q with diagonal 1 and uniformly distributed off-diagonal entries, for n = 10 and n = 15, comparing the results against the true value obtained using CPLEX. The parameters were chosen in the same way as in the previous section, with the number of iterations being 15 for n = 10 and 12 for n = 15. We ran 1000 instances for n = 10 and 100 for n = 15. The results are presented in Figure 3. On the top row we show the histograms of the ratios between the true value, computed using CPLEX, and our computed approximation, which provides an upper bound. We can see, among other things, that in both cases around half the instances were within 1% of the true value and four fifths were within 10%.
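The observation that the base relaxation is exact when the SQP minimum lies on a vertex or edge can be explored on toy instances; a hedged sketch that brute-forces the simplex minimum on a fine grid (the instance is ours; for this Q the minimum is attained at a vertex):

```python
import itertools

def sqp_value(Q, x):
    """Evaluate p(x) = x^T Q x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def grid_min(Q, k):
    """Approximate min of x^T Q x over the simplex via the grid
    {x in Delta_n : k*x integral} (vertices are always grid points)."""
    n = len(Q)
    best = float("inf")
    for c in itertools.product(range(k + 1), repeat=n):
        if sum(c) == k:
            best = min(best, sqp_value(Q, [ci / k for ci in c]))
    return best

# diagonal 1, off-diagonals above 1: x^T Q x >= (sum x_i)^2 = 1, so the
# minimum value 1 is attained exactly at the vertices of the simplex
Q = [[1.0, 1.2, 1.1], [1.2, 1.0, 1.3], [1.1, 1.3, 1.0]]
assert abs(grid_min(Q, 20) - 1.0) < 1e-9
```

When the minimizer is a vertex or lies on an edge, it is supported on at most two coordinates, which is exactly the structure captured by the SDD-type cones.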
If we want to compare more directly with the results obtained for the random instances in Table 1, one can compute the mean value of the true relative gap (p̂ − p^∗)/p^∗, where p^∗ is the true optimal value returned by CPLEX and p̂ is the approximate value obtained by our approach. The resulting mean values for n = 10 and n = 15 are very much in line with what we have seen before.

On the bottom row of Figure 3, as a rough reference, we have the boxplots of the CPU times (in seconds) taken by our method and by CPLEX, presented in logarithmic scale for readability. In both cases we can see that the Forgetfulness scheme is quite stable, as it simply stops after a set number of iterations, while CPLEX has a huge number of outliers. While for n = 10 the exact CPLEX computation is faster, for n = 15 it becomes much slower, with several outliers taking many hours. For larger values of n it quickly becomes prohibitively slow compared to our approach.

Fig. 3 Results of the random SQP tests for n = 10 (left) and n = 15 (right). The top graphs are histograms of the ratio between the true value and the attained upper bound, the bottom graphs box plots of the CPU times (in seconds) used by our method (1) and CPLEX (2).

5.3 Stable set problems

While in the previous section we focused on random problems, the main focus of the completely positive/copositive programming literature has been on highly structured combinatorial optimization problems. One of the most common applications is to the stable set problem, i.e., the problem of finding in a graph G a set of vertices of maximal cardinality such that no two are connected by an edge. The cardinality of such a set is known as the independence number of G, denoted by α(G). In [12, Equation (8)], the following completely positive formulation was introduced for that problem:

α(G) = max tr(EX)  s.t.
tr((A_G + I)X) = 1,  X ∈ CP^n,   (5.2)

where A_G is the adjacency matrix of G. In this setting we have a single constraint, so m = 1. Our inner approximations of CP^n will in this case yield lower bounds, from which one might be able to extract an actual feasible stable set of the given cardinality. There are a number of good heuristic approaches to the stable set problem, as well as implementations of exact algorithms that can handle small to medium sized graphs, all necessarily performing much better than our all-purpose conic programming approach. However, we can still see how our approach performs on its own, to get some indication of its behavior on structured problems of low codimension.

In this class of problems, symmetry and structure likely imply that the growth of the matrix U in the greedier maximalist approach, but also in the Forgetfulness approach, is too fast and adds too much redundancy. To avoid this phenomenon we take the Max1 approach: at every iteration we only add to U the vertex that has the largest weight in the solution found. This yields a sort of greedy algorithm that in practice tends to grow the stable set one vertex at a time. We stopped as soon as the greedy process got stuck and there was no improvement in two consecutive iterations.

We computed both stability numbers, α(G), and clique numbers, ω(G), which are simply the stability numbers of the complementary graph. Following [9], we started by computing the clique numbers of the graphs where their method was tested. Our method yields the correct answers in a relatively short time, as can be seen in Table 3, where our results are presented under the column "result", and the column "ω(G)" corresponds to the known clique numbers.
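For tiny graphs, the relation ω(G) = α(Ḡ) between clique and stability numbers can be sanity-checked by brute force (this is of course not the conic method of the paper, just an illustration of the quantities being computed):

```python
from itertools import combinations

def alpha(n, edges):
    """Independence number by brute force; fine only for tiny graphs."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for S in combinations(range(n), size):
            if all(frozenset(p) not in edge_set for p in combinations(S, 2)):
                return size
    return 0

def omega(n, edges):
    """Clique number via the complement graph: omega(G) = alpha(G complement)."""
    edge_set = {frozenset(e) for e in edges}
    comp = [e for e in combinations(range(n), 2) if frozenset(e) not in edge_set]
    return alpha(n, comp)

c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # the pentagon C5
assert alpha(5, c5) == 2
assert omega(5, c5) == 2                        # C5 is self-complementary
```

The conic lower bounds produced by Max1 target exactly these integers, which is why a correct answer can be certified by exhibiting a stable set of the computed size.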
Note that this is not too surprising, as finding a large stable set, or clique, is in a general sense computationally easier than proving that a larger one does not exist. In other words, lower bounding the stable set and clique numbers of particular graphs tends to be easier than upper bounding them, so our problem has a smaller scope than what was attempted in [9], leading to much faster times. The graphs in the table come from two sources: the first is a 17 vertex graph from [20] that is notoriously hard for upper bounding by convex approximations; the other five come from the 2nd DIMACS implementation challenge test instances [16], and only hamming6-4 and johnson8-2-4 could be solved by Bundfuss and Dür's method in less than two hours, as reported in their paper [9].

graph          vertices  iterations  time(sec)   result   ω(G)
pena17               17           5       13.8   6.0000      6
hamming6-2           64          31      836.7  32.0000     32
hamming6-4           64           3       64.0   4.0000      4
johnson8-2-4         28           3       11.7   4.0000      4
johnson8-4-4         70          13      322.5  14.0000     14
johnson16-2-4       120           7      637.0   8.0000      8

Table 3 Clique number for different graphs

To explore the limits of our approach we tried a few more instances of the stable set problem. We tried Paley graphs, known to mimic some properties of random graphs, with some degree of success, and a few small-sized instances of graphs derived from error correcting codes, available at [24]. The results are much worse on this family, with our algorithm failing on small instances, as can be seen in Table 4, where our results are reported under the column "result", and the true stability numbers are presented under the column "α(G)". One word of caution is that the entire procedure is highly unstable: simply changing the solver from MOSEK to SDPT3 can change the results, e.g. Paley 137 becomes exact with SDPT3.
graph       vertices  iterations  time(sec)   result   α(G)
Paley 137        137           4      977.4   5.0000      7
Paley 149        149           6     1841.6   7.0000      7
Paley 157        157           6     2254.1   7.0000      7
1tc.16            16           6       15.7   7.0000      8
1tc.32            32          10       85.5  11.0000     12
1dc.64            64           7      235.8   8.0000     10
1dc.128          128          13     2491.0  14.0000     16
2dc.128          128           4      823.6   5.0000      5

Table 4 Stability number for different graphs

References

1. Abraham Berman and Naomi Shaked-Monderer. Completely Positive Matrices. World Scientific, 2003.
2. Amir Ali Ahmadi, Sanjeeb Dash, and Georgina Hall. Optimization over structured subsets of positive semidefinite matrices via column generation. Discrete Optimization, 24:129–151, 2017.
3. Amir Ali Ahmadi and Georgina Hall. Sum of squares basis pursuit with linear and second order cone programming. Contemporary Mathematics, 2017.
4. Amir Ali Ahmadi and Anirudha Majumdar. DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. SIAM Journal on Applied Algebra and Geometry, 3(2):193–230, 2019.
5. Erik G. Boman, Doron Chen, Ojas Parekh, and Sivan Toledo. On factor width and symmetric H-matrices. Linear Algebra and its Applications, 405:239–248, 2005.
6. Immanuel M. Bomze and Etienne de Klerk. Solving standard quadratic optimization problems via linear, semidefinite and copositive programming. Journal of Global Optimization, 24(2):163–185, 2002.
7. Immanuel M. Bomze, Mirjam Dür, Etienne de Klerk, Cornelis Roos, Arie J. Quist, and Tamás Terlaky. On copositive programming and standard quadratic optimization problems. Journal of Global Optimization, 18(4):301–320, 2000.
8. Immanuel M. Bomze, Werner Schachinger, and Gabriele Uchida. Think co(mpletely) positive! Matrix properties, examples and a clustered bibliography on copositive optimization. Journal of Global Optimization, 52(3):423–445, 2012.
9. Stefan Bundfuss and Mirjam Dür. An adaptive linear approximation algorithm for copositive programs. SIAM Journal on Optimization, 20(1):30–53, 2009.
10. Samuel Burer.
On the copositive representation of binary and continuous nonconvex quadratic programs. Mathematical Programming, 120(2):479–495, 2009.
11. Samuel Burer. Copositive programming. In Handbook on Semidefinite, Conic and Polynomial Optimization, pages 201–218. Springer, 2012.
12. Etienne de Klerk and Dmitrii V. Pasechnik. Approximation of the stability number of a graph via copositive programming. SIAM Journal on Optimization, 12(4):875–892, 2002.
13. Mirjam Dür. Copositive programming – a survey. In Recent Advances in Optimization and its Applications in Engineering. Springer, 2010.
14. Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, 2014.
15. Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
16. David J. Johnson and Michael A. Trick. Cliques, Coloring, and Satisfiability: 2nd DIMACS Implementation Challenge, 1993. American Mathematical Society, 1996.
17. Jean B. Lasserre. New approximations for the cone of copositive matrices and its dual. Mathematical Programming, 144(1):265–276, 2014.
18. Miguel Sousa Lobo, Lieven Vandenberghe, Stephen Boyd, and Hervé Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284:193–228, 1998.
19. Pablo A. Parrilo. Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, 2000.
20. Javier Peña, Juan Vera, and Luis F. Zuluaga. Computing the stability number of a graph via linear and semidefinite programming. SIAM Journal on Optimization, 18(1):87–105, 2007.
21. Frank Permenter and Pablo Parrilo. Partial facial reduction: simplified, equivalent SDPs via approximations of the PSD cone. Mathematical Programming, 171:1–54, 2018.
22. Ralph Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.
23. Ralph Tyrrell Rockafellar and Roger J-B. Wets. Variational Analysis. Springer, 1998.
24.
Neil Sloane. Challenge problems: Independent sets in graphs. Information Sciences Research Center, https://oeis.org/A265032/a265032.html, 2005.
25. E. Alper Yıldırım. On the accuracy of uniform polyhedral approximations of the copositive cone. Optimization Methods and Software, 27(1):155–173, 2012.
26. E. Alper Yıldırım. Inner approximations of completely positive reformulations of mixed binary quadratic programs: a unified analysis.