Optimality Conditions for Nonlinear Semidefinite Programming via Squared Slack Variables
Bruno F. Lourenço†   Ellen H. Fukuda‡   Masao Fukushima§

April 9, 2018

∗ This work was supported by Grant-in-Aid for Young Scientists (B) (26730012) and for Scientific Research (C) (26330029) from the Japan Society for the Promotion of Science.
† Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, 2-12-1-W8-41 Ookayama, Meguro-ku, Tokyo 152-8552, Japan ([email protected]).
‡ Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan ([email protected]).
§ Department of Systems and Mathematical Science, Faculty of Science and Engineering, Nanzan University, Nagoya, Aichi 486-8673, Japan ([email protected]).
Abstract
In this work, we derive second-order optimality conditions for nonlinear semidefinite programming (NSDP) problems, by reformulating them as ordinary nonlinear programming problems using squared slack variables. We first consider the correspondence between Karush-Kuhn-Tucker points and regularity conditions for the general NSDP and its reformulation via slack variables. Then, we obtain a pair of "no-gap" second-order optimality conditions that are essentially equivalent to the ones already considered in the literature. We conclude with the analysis of some computational prospects of the squared slack variables approach for NSDP.
Keywords:
Nonlinear semidefinite programming, squared slack variables, optimality conditions, second-order conditions.
1 Introduction

We consider the following nonlinear semidefinite programming (NSDP) problem:
\[
\begin{array}{ll}
\underset{x}{\text{minimize}} & f(x)\\
\text{subject to} & G(x) \in S^m_+,
\end{array} \tag{P1}
\]
where $f \colon \mathbb{R}^n \to \mathbb{R}$ and $G \colon \mathbb{R}^n \to S^m$ are twice continuously differentiable functions, $S^m$ is the linear space of all real symmetric matrices of dimension $m \times m$, and $S^m_+$ is the cone of all positive semidefinite matrices in $S^m$. Second-order optimality conditions for such problems were originally derived by Shapiro in [20]. It might be fair to say that the second-order analysis of NSDP problems is more intricate than its counterpart for classical nonlinear programming problems. That is one of the reasons why it is interesting to have alternative ways of obtaining optimality conditions for (P1); see the works by Forsgren [9] and Jarre [12]. In this work, we propose to use the squared slack variables approach for deriving these optimality conditions.

It is well-known that squared slack variables can be used to transform a nonlinear programming (NLP) problem with inequality constraints into a problem with only equality constraints. For NLP problems, this technique was hardly considered in the literature because it increases the dimension of the problem and may lead to numerical instabilities [18]. However, recently, Fukuda and Fukushima [10] showed that the situation may change in the nonlinear second-order cone programming context. Here, we observe that the slack variables approach can also be used for NSDP problems, because, like the nonnegative orthant and the second-order cone, the cone of positive semidefinite matrices is a cone of squares. More precisely, $S^m_+$ can be represented as
\[
S^m_+ = \{ Z \circ Z \mid Z \in S^m \}, \tag{1.1}
\]
where $\circ$ is the Jordan product associated with the space $S^m$, which is defined as
\[
W \circ Z := \frac{WZ + ZW}{2}, \qquad W, Z \in S^m.
\]
Note that actually $Z \circ Z = ZZ = Z^2$ for any $Z \in S^m$.

The fact above allows us to develop the squared slack variables approach. In fact, by introducing a slack variable $Y \in S^m$ in (P1), we obtain the following problem:
\[
\begin{array}{ll}
\underset{x,\,Y}{\text{minimize}} & f(x)\\
\text{subject to} & G(x) - Y \circ Y = 0,
\end{array} \tag{P2}
\]
which is an NLP problem with only equality constraints. Note that if $(x, Y) \in \mathbb{R}^n \times S^m$ is a global (local) minimizer of (P2), then $x$ is a global (local) minimizer of (P1). Moreover, if $x \in \mathbb{R}^n$ is a global (local) minimizer of (P1), then there exists $Y \in S^m$ such that $(x, Y)$ is a global (local) minimizer of (P2). However, the relation between stationary points, or Karush-Kuhn-Tucker (KKT) points, of (P1) and (P2) is not so trivial. As in [10], we will take a closer look at this issue, and also investigate the relation between constraint qualifications for (P1) and (P2), using second-order conditions.
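To make the correspondence concrete at the level of feasibility, here is a minimal numerical sketch (the map $G$ below is a toy example of our own, not one from the paper): whenever $G(x)$ is positive semidefinite, its positive semidefinite square root provides a slack matrix $Y$ with $G(x) - Y \circ Y = 0$, so $(x, Y)$ is feasible for (P2).

```python
import numpy as np
from scipy.linalg import sqrtm

def jordan(W, Z):
    """Jordan product on S^m: W o Z = (WZ + ZW) / 2."""
    return (W @ Z + Z @ W) / 2

# Toy constraint map G: R^2 -> S^2 (illustrative choice, not from the paper).
def G(x):
    return np.array([[x[0], x[1]],
                     [x[1], x[0]]])

x = np.array([2.0, 0.5])                 # G(x) is positive semidefinite here
assert np.all(np.linalg.eigvalsh(G(x)) >= 0)

# The slack variable Y = sqrt(G(x)) satisfies Y o Y = Y^2 = G(x),
# so (x, Y) is feasible for the equality-constrained problem (P2).
Y = np.real(sqrtm(G(x)))
print(np.allclose(jordan(Y, Y), G(x)))   # True
```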
We remark that second-order conditions for these two problems are vastly different. While (P2) is a run-of-the-mill nonlinear programming problem, (P1) has nonlinear conic constraints, which are more difficult to deal with. Moreover, it is known that second-order conditions for NSDPs include an extra term, which takes into account the curvature of the cone. For more details, we refer to the papers of Kawasaki [13], Cominetti [7] and Shapiro [20]. The main objective of this work is to show that, under appropriate regularity conditions, the second-order conditions for (P1) and (P2) are essentially the same. This suggests that the addition of the slack term already encapsulates most of the nonlinear structure of the cone. In the analysis, we also propose and use a sharp characterization of positive semidefiniteness that takes into account the rank information. We believe that such a characterization can be useful in other contexts as well.

Finally, we present results of some numerical experiments where NSDPs are reformulated as NLPs using slack variables. Note that we are not necessarily advocating the use of slack variables; we are, in fact, driven by curiosity about its computational prospects. Nevertheless, there are a couple of reasons why this could be interesting. First of all, conventional wisdom would say that using squared slack variables is not a good idea, but, in reality, even for linear SDPs there are good reasons to use such variables. In [5, 6], Burer and Monteiro transform a linear SDP $\inf \{ \operatorname{tr}(CX) \mid \mathcal{A}X = b,\ X \in S^m_+ \}$ into $\inf \{ \operatorname{tr}(C V V^\top) \mid \mathcal{A}(V V^\top) = b \}$, where $V$ is a square matrix and $\operatorname{tr}$ denotes the trace map. The idea is to use a theorem, proven independently by Barvinok [1] and Pataki [16], which bounds the rank of possible optimal solutions. By doing so, it is possible to restrict $V$ to be a rectangular matrix instead of a square one, thereby reducing the number of variables. Another reason to use squared slack variables is that the reformulated NLP problem can be solved by efficient NLP solvers that are widely available. In fact, while there are a number of solvers for linear SDPs, as we move to the general nonlinear case, the situation changes drastically [22].

Throughout the paper, the following notation will be used. For $x \in \mathbb{R}^s$ and $Y \in \mathbb{R}^{\ell \times s}$, $x_i \in \mathbb{R}$ and $Y_{ij} \in \mathbb{R}$ denote the $i$th entry of $x$ and the $(i,j)$ entry ($i$th row and $j$th column) of $Y$, respectively. The identity matrix of dimension $\ell$ is denoted by $I_\ell$. The transpose, the Moore-Penrose pseudo-inverse, and the rank of $Y \in \mathbb{R}^{\ell \times s}$ are denoted by $Y^\top \in \mathbb{R}^{s \times \ell}$, $Y^\dagger \in \mathbb{R}^{s \times \ell}$, and $\operatorname{rank} Y$, respectively. If $Y$ is a square matrix, its trace is denoted by $\operatorname{tr}(Y) := \sum_i Y_{ii}$. For square matrices $W$ and $Y$ of the same dimension, their inner product is denoted by $\langle W, Y \rangle := \operatorname{tr}(W^\top Y)$; we will also use $\langle \cdot, \cdot \rangle$ to denote the inner product in $\mathbb{R}^n$, and no confusion should arise. We will write $Z \succeq 0$ to indicate that $Z \in S^m_+$. In that case, we will denote by $\sqrt{Z}$ the positive semidefinite square root of $Z$, that is, $\sqrt{Z}$ satisfies $\sqrt{Z} \succeq 0$ and $\sqrt{Z}\sqrt{Z} = Z$. For any function $P \colon \mathbb{R}^{s+\ell} \to \mathbb{R}$, the gradient and the Hessian of $P$ at $(x, y) \in \mathbb{R}^{s+\ell}$ with respect to $x$ are denoted by $\nabla_x P(x, y)$ and $\nabla^2_{xx} P(x, y)$, respectively. Moreover, for any linear operator $\mathcal{G} \colon \mathbb{R}^s \to S^\ell$ defined by $\mathcal{G}v = \sum_{i=1}^{s} v_i G_i$ with $G_i \in S^\ell$, $i = 1, \ldots, s$, and $v \in \mathbb{R}^s$, the adjoint operator $\mathcal{G}^* \colon S^\ell \to \mathbb{R}^s$ is defined by $\mathcal{G}^* Z = (\langle G_1, Z \rangle, \ldots, \langle G_s, Z \rangle)^\top$, $Z \in S^\ell$.
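As a quick illustration of the adjoint just defined, the following sketch (with arbitrary random data of our own) builds $\mathcal{G}v = \sum_i v_i G_i$ and checks the defining identity $\langle \mathcal{G}v, Z \rangle = \langle v, \mathcal{G}^* Z \rangle$.

```python
import numpy as np

rng = np.random.default_rng(0)
sym = lambda M: (M + M.T) / 2            # symmetrize: projection onto S^ell
s, ell = 3, 4
Gs = [sym(rng.standard_normal((ell, ell))) for _ in range(s)]

def G_op(v):
    """G v = sum_i v_i G_i, a linear map from R^s to S^ell."""
    return sum(v[i] * Gs[i] for i in range(s))

def G_adj(Z):
    """G* Z = (<G_1, Z>, ..., <G_s, Z>)^T."""
    return np.array([np.trace(Gi.T @ Z) for Gi in Gs])

v = rng.standard_normal(s)
Z = sym(rng.standard_normal((ell, ell)))
# Defining identity of the adjoint: <G v, Z> = <v, G* Z>.
print(np.isclose(np.trace(G_op(v).T @ Z), v @ G_adj(Z)))   # True
```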
Given a mapping $H \colon \mathbb{R}^s \to S^\ell$, its derivative at a point $x \in \mathbb{R}^s$ is denoted by $\nabla H(x) \colon \mathbb{R}^s \to S^\ell$ and defined by
\[
\nabla H(x) v = \sum_{i=1}^{s} v_i \frac{\partial H(x)}{\partial x_i}, \qquad v \in \mathbb{R}^s,
\]
where $\partial H(x)/\partial x_i \in S^\ell$ are the partial derivative matrices. Finally, for a closed convex cone $K$, we will denote by $\operatorname{lin} K$ the largest subspace contained in $K$. Note that $\operatorname{lin} K = K \cap -K$.

The paper is organized as follows. In Section 2, we recall a few basic definitions concerning KKT points and second-order conditions for (P1) and (P2). We also give a sharp characterization of positive semidefiniteness. In Section 3, we prove that the original and the reformulated problems are equivalent in terms of KKT points, under some conditions. In Section 4, we establish the relation between constraint qualifications of those two problems. The analyses of second-order sufficient conditions and second-order necessary conditions are presented in Sections 5 and 6, respectively. In Section 7, we show some computational results. We conclude in Section 8, with final remarks and future works.

2 Preliminaries

It is a well-known fact that a matrix $\Lambda \in S^m$ is positive semidefinite if and only if $\langle \Lambda, W \rangle \ge 0$ for all $W \in S^m_+$. This statement is equivalent to the self-duality of the cone $S^m_+$. However, it gives no information about the rank of $\Lambda$. In the next lemma, we give a new characterization of positive semidefinite matrices, which takes into account the rank information.

Lemma 2.1.
Let $\Lambda \in S^m$. The following statements are equivalent:

(i) $\Lambda \in S^m_+$;

(ii) there exists $Y \in S^m$ such that $Y \circ \Lambda = 0$ and $Y \in \Phi(\Lambda)$, where
\[
\Phi(\Lambda) := \{ Y \in S^m \mid \langle W \circ W, \Lambda \rangle > 0 \text{ for all } 0 \ne W \in S^m \text{ with } Y \circ W = 0 \}. \tag{2.1}
\]

For any $Y$ satisfying the conditions in (ii), we have $\operatorname{rank} \Lambda = m - \operatorname{rank} Y$. Moreover, if $\sigma$ and $\sigma'$ are nonzero eigenvalues of $Y$, then $\sigma + \sigma' \ne 0$.

Proof. Let us prove first that (ii) implies (i). Since the inner product is invariant under orthogonal transformations, we may assume without loss of generality that $Y$ is diagonal, i.e.,
\[
Y = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix},
\]
where $D$ is a $k \times k$ nonsingular diagonal matrix, and $\operatorname{rank} Y = k$. We partition $\Lambda$ in blocks in a similar way:
\[
\Lambda = \begin{pmatrix} A & B \\ B^\top & C \end{pmatrix},
\]
where $A \in S^k$, $B \in \mathbb{R}^{k \times (m-k)}$, and $C \in S^{m-k}$. We will proceed by proving that $A = 0$, $B = 0$ and $C$ is positive definite.

First, observe that, by assumption,
\[
0 = Y \circ \Lambda = \begin{pmatrix} D \circ A & DB/2 \\ B^\top D/2 & 0 \end{pmatrix} \tag{2.2}
\]
holds. Since $D$ is nonsingular, this implies $B = 0$. Now, let us prove that $A = 0$. From (2.2) and the fact that $D$ is diagonal, we obtain
\[
0 = 2 (D \circ A)_{ij} = A_{ij} (D_{ii} + D_{jj}). \tag{2.3}
\]
Again, because $D$ is nonsingular, all diagonal elements of $A$ must be zero. Now, suppose that $A_{ij}$ is nonzero for some $i$ and $j$, with $i \ne j$. In face of (2.3), this can only happen if $D_{ii} + D_{jj} = 0$. Let us now consider the following matrix:
\[
W = \begin{pmatrix} \widetilde{W} & 0 \\ 0 & 0 \end{pmatrix} \in S^m,
\]
where $\widetilde{W} \in S^k$ is a submatrix containing only two nonzero elements, $\widetilde{W}_{ij} = 1$ and $\widetilde{W}_{ji} = 1$. Then, easy calculations show that $\widetilde{W} \circ D = 0$, which also implies $W \circ Y = 0$. Moreover, $\langle W \circ W, \Lambda \rangle = 0$, because $\widetilde{W}^2 = \widetilde{W} \circ \widetilde{W}$ is the diagonal matrix having 1 in the $(i,i)$ entry and 1 in the $(j,j)$ entry, and $A_{ii}$ and $A_{jj}$ are zero. We conclude that $Y \notin \Phi(\Lambda)$, contradicting the assumptions. So, it follows that $A$ must be zero. Similarly, we have that $D_{ii} + D_{jj}$ is never zero, which corresponds to the statement about the eigenvalues $\sigma$ and $\sigma'$ in the lemma. In fact, if $D_{ii} + D_{jj}$ is zero, then, by taking $W$ exactly as before, we have $W \circ Y = 0$ and $\langle W \circ W, \Lambda \rangle = 0$. Once again, this shows that $Y \notin \Phi(\Lambda)$, which is a contradiction.

It remains to show that $C$ is positive definite. Taking an arbitrary nonzero $\widetilde{H} \in S^{m-k}$, and defining
\[
H = \begin{pmatrix} 0 & 0 \\ 0 & \widetilde{H} \end{pmatrix} \in S^m,
\]
we easily obtain $H \circ Y = 0$. Since $Y \in \Phi(\Lambda)$, we have $\langle H \circ H, \Lambda \rangle > 0$. But this shows that $\langle \widetilde{H} \circ \widetilde{H}, C \rangle > 0$, so $C$ is positive definite. In particular, the rank of $\Lambda$ is equal to the rank of $C$, which is $m - \operatorname{rank} Y$.

Now, let us prove that (i) implies (ii). Similarly, we may assume $\Lambda = \begin{pmatrix} 0 & 0 \\ 0 & C \end{pmatrix}$, with $C$ positive definite. Then, we can take $Y = \begin{pmatrix} E & 0 \\ 0 & 0 \end{pmatrix}$, where $E$ is any positive definite matrix. It follows that any matrix $W \in S^m$ satisfying $Y \circ W = 0$ must have the shape $\begin{pmatrix} 0 & 0 \\ 0 & F \end{pmatrix}$, for some matrix $F$. Since $C$ is positive definite, it is clear that $\langle W \circ W, \Lambda \rangle > 0$ whenever $W$ is nonzero.

The statement about the sum of nonzero eigenvalues might seem innocuous at first, but it will be very useful in Section 5. In fact, the idea for this new characterization of positive semidefiniteness comes from the second-order conditions of (P2). For now, let us present another result that will be necessary. Given $A \in S^m$, denote by $L_A \colon S^m \to S^m$ the linear operator defined by $L_A(E) := A \circ E$ for all $E \in S^m$. There are many examples of invertible matrices $A$ for which the operator $L_A$ is not invertible (take $A = \operatorname{diag}(1, -1)$, for example). This is essentially due to the failure of the condition on the eigenvalues. The following proposition is well-known in the context of Euclidean Jordan algebras (see [21, Proposition 1]), but we include here a short proof for completeness.
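This failure is easy to observe numerically. A small sketch using the example $A = \operatorname{diag}(1, -1)$ mentioned above: the off-diagonal matrix $W$ below lies in the kernel of $L_A$ precisely because the eigenvalue sum $1 + (-1)$ vanishes.

```python
import numpy as np

A = np.diag([1.0, -1.0])           # invertible, but eigenvalues sum to zero

def L_A(E):
    """The Jordan multiplication operator L_A(E) = A o E."""
    return (A @ E + E @ A) / 2

# W has zeros on the diagonal and ones off the diagonal; since
# A_11 + A_22 = 0, L_A annihilates it.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.allclose(L_A(W), 0))      # True: ker L_A is nontrivial

# Representing L_A as a 4x4 matrix on vec(E) confirms the rank deficiency.
M = np.column_stack([L_A(E).ravel() for E in
                     (np.eye(4)[k].reshape(2, 2) for k in range(4))])
print(np.linalg.matrix_rank(M))    # 2 < 4
```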
Proposition 2.2. Let $A \in S^m$. Then, $L_A$ is invertible if and only if $\sigma + \sigma' \ne 0$ for every pair of eigenvalues $\sigma, \sigma'$ of $A$; in this case, $A$ must be invertible.

Proof. The statements in the proposition are all invariant under orthogonal transformations. Thus, we may assume without loss of generality that $A$ is already diagonalized, so that $A_{kk}$ is an eigenvalue of $A$ for every $k = 1, \ldots, m$.

Let us show that the invertibility of $L_A$ implies the statement about the eigenvalues of $A$. We will do so by proving the contrapositive. Take $i$ and $j$ such that $A_{ii} + A_{jj} = 0$. Let $W$ be such that all the entries are zero except for $W_{ij} = W_{ji} = 1$. Then, we have $A \circ W = 0$. This shows that the kernel of $L_A$ is non-trivial and, consequently, $L_A$ is not invertible.

Reciprocally, since we assume that $A$ is diagonal, for every $W \in S^m$ we have $2(L_A(W))_{ij} = W_{ij}(A_{ii} + A_{jj})$ for all $i$ and $j$. Due to the fact that $A_{ii} + A_{jj}$ is never zero, the kernel of $L_A$ must only contain the zero matrix. Hence $L_A$ is invertible, and the result follows.

In view of Proposition 2.2, the matrix $D$ which appears in the proof of Lemma 2.1 is such that $L_D$ is invertible. This will play an important role when we discuss the relation between the second-order sufficient conditions of problems (P1) and (P2). Now, let us consider the following lemma, which will allow us to present the KKT conditions of problems (P1) and (P2) in an appropriate form.
Lemma 2.3.
The following statements hold.

(a) For any matrices $A, B \in \mathbb{R}^{m \times m}$, let $\varphi \colon \mathbb{R}^{m \times m} \to \mathbb{R}$ be defined by $\varphi(Z) := \operatorname{tr}(Z^\top A Z B)$. Then, we have $\nabla \varphi(Z) = AZB + A^\top Z B^\top$.

(b) For any matrix $A \in S^m$, let $\varphi \colon S^m \to \mathbb{R}$ be defined by $\varphi(Z) := \langle Z \circ Z, A \rangle$. Then, we have $\nabla \varphi(Z) = 2\, Z \circ A$.

(c) For any matrix $A \in \mathbb{R}^{m \times m}$ and function $\theta \colon \mathbb{R}^n \to S^m$, let $\psi \colon \mathbb{R}^n \to \mathbb{R}$ be defined by $\psi(x) := \langle \theta(x), A \rangle$. Then, we have $\nabla \psi(x) = \nabla \theta(x)^* A$.

(d) Let $A, B \in S^m$. Then, they commute, i.e., $AB = BA$, if and only if $A$ and $B$ are simultaneously diagonalizable by an orthogonal matrix, i.e., there exists an orthogonal matrix $Q$ such that $QAQ^\top$ and $QBQ^\top$ are diagonal.

(e) Let
$A, B \in S^m_+$. Then, $AB = 0$ if and only if $\langle A, B \rangle = 0$.

Proof. (a) See [2, Section 10.7].

(b) Note that $\varphi(Z) = \langle Z \circ Z, A \rangle = \frac{1}{2}(\langle ZZ^\top, A \rangle + \langle Z^\top Z, A \rangle) = \frac{1}{2}(\operatorname{tr}(ZZ^\top A) + \operatorname{tr}(Z^\top Z A)) = \frac{1}{2}(\operatorname{tr}(Z^\top A Z) + \operatorname{tr}(Z^\top Z A))$. Let $\varphi_1(Z) = \operatorname{tr}(Z^\top A Z)$ and $\varphi_2(Z) = \operatorname{tr}(Z^\top Z A)$. Then, from item (a), we have $\nabla \varphi_1(Z) = AZ + A^\top Z$ and $\nabla \varphi_2(Z) = ZA + ZA^\top$. Taking into account the symmetry of $A$, we have $\nabla \varphi_1(Z) = 2AZ$ and $\nabla \varphi_2(Z) = 2ZA$. Hence $\nabla \varphi(Z) = \frac{1}{2}(\nabla \varphi_1(Z) + \nabla \varphi_2(Z)) = AZ + ZA = 2\, A \circ Z$.

(c) Observe that $\psi(x) = \langle \theta(x), A \rangle = \operatorname{tr}(\theta(x) A) = \sum_{i,j} \theta(x)_{ij} A_{ij}$ for any $x \in \mathbb{R}^n$. Then, we have
\[
\nabla \psi(x) = \begin{pmatrix} \sum_{i,j} (\partial \theta(x)_{ij}/\partial x_1)\, A_{ij} \\ \vdots \\ \sum_{i,j} (\partial \theta(x)_{ij}/\partial x_n)\, A_{ij} \end{pmatrix} = \begin{pmatrix} \langle \partial \theta(x)/\partial x_1, A \rangle \\ \vdots \\ \langle \partial \theta(x)/\partial x_n, A \rangle \end{pmatrix} = \nabla \theta(x)^* A,
\]
where the last equality follows from the definition of the adjoint operator.

(d) See [2, Section 8.17].

(e) See [2, Section 8.12].

We can now recall the KKT conditions of problems (P1) and (P2). First, define the Lagrangian function $L \colon \mathbb{R}^n \times S^m \to \mathbb{R}$ associated with problem (P1) as $L(x, \Lambda) := f(x) - \langle G(x), \Lambda \rangle$. We say that $(x, \Lambda) \in \mathbb{R}^n \times S^m$ is a KKT pair of problem (P1) if the following conditions are satisfied:
\[
\begin{array}{rl}
\nabla f(x) - \nabla G(x)^* \Lambda = 0, & \text{(P1.1)}\\
\Lambda \succeq 0, & \text{(P1.2)}\\
G(x) \succeq 0, & \text{(P1.3)}\\
\Lambda \circ G(x) = 0, & \text{(P1.4)}
\end{array}
\]
where, from Lemma 2.3(c), we have $\nabla f(x) - \nabla G(x)^* \Lambda = \nabla_x L(x, \Lambda)$. Applying the trace map to both sides of (P1.4), we see that condition (P1.4) is equivalent to $\langle \Lambda, G(x) \rangle = 0$. This result, together with the fact that $\Lambda \succeq 0$ and $G(x) \succeq 0$, shows that (P1.4) is also equivalent to $\Lambda G(x) = 0$, by Lemma 2.3(e). Moreover, the equality (P1.4) implies that $\Lambda$ and $G(x)$ commute, which means, by Lemma 2.3(d), that they are simultaneously diagonalizable by an orthogonal matrix. The following definition is also well-known.

Definition 2.4. If $(x, \Lambda) \in \mathbb{R}^n \times S^m$ is a KKT pair of (P1) such that $\operatorname{rank} G(x) + \operatorname{rank} \Lambda = m$, then $(x, \Lambda)$ is said to satisfy the strict complementarity condition.

As for the equality constrained NLP problem (P2), we observe that $(x, Y, \Lambda) \in \mathbb{R}^n \times S^m \times S^m$ is a KKT triple if the conditions below are satisfied:
\[
\nabla_{(x,Y)} L(x, Y, \Lambda) = 0, \qquad G(x) - Y \circ Y = 0,
\]
where $L \colon \mathbb{R}^n \times S^m \times S^m \to \mathbb{R}$ is the Lagrangian function associated with (P2), which is given by $L(x, Y, \Lambda) := f(x) - \langle G(x) - Y \circ Y, \Lambda \rangle$. From Lemma 2.3(b),(c), these conditions can be written as follows:
\[
\begin{array}{rl}
\nabla f(x) - \nabla G(x)^* \Lambda = 0, & \text{(P2.1)}\\
\Lambda \circ Y = 0, & \text{(P2.2)}\\
G(x) - Y \circ Y = 0. & \text{(P2.3)}
\end{array}
\]
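For concreteness, here is a hedged sketch of how the two KKT systems might be checked numerically (the helper functions are our own, not from the paper): the residuals of (P2.1)–(P2.3) are assembled directly from the data, while (P1.1)–(P1.4) additionally require the positive semidefiniteness of $\Lambda$ and $G(x)$.

```python
import numpy as np

def jordan(W, Z):
    return (W @ Z + Z @ W) / 2

def kkt_residuals_P2(grad_f, grad_G_adj, G, x, Y, Lam):
    """Residuals of (P2.1)-(P2.3) for a candidate triple (x, Y, Lam).

    grad_f(x)        -> gradient of f, shape (n,)
    grad_G_adj(x, Z) -> the adjoint nabla G(x)* applied to Z, shape (n,)
    G(x)             -> constraint matrix, shape (m, m)
    """
    r1 = grad_f(x) - grad_G_adj(x, Lam)      # (P2.1), stationarity in x
    r2 = jordan(Lam, Y)                      # (P2.2), stationarity in Y
    r3 = G(x) - jordan(Y, Y)                 # (P2.3), feasibility
    return r1, r2, r3

def is_psd(M, tol=1e-10):
    """Extra checks (P1.2)-(P1.3) needed to promote the triple to (P1)."""
    return np.min(np.linalg.eigvalsh((M + M.T) / 2)) >= -tol
```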
For problem (P1), we say that the Mangasarian-Fromovitz constraint qualification (MFCQ) holds at a point $x$ if there exists some $d \in \mathbb{R}^n$ such that
\[
G(x) + \nabla G(x) d \in \operatorname{int} S^m_+,
\]
where $\operatorname{int} S^m_+$ denotes the interior of $S^m_+$, that is, the set of symmetric positive definite matrices. If $x$ is a local minimum for (P1), MFCQ ensures the existence of a Lagrange multiplier $\Lambda$ and that the set of multipliers is bounded. A more restrictive assumption is the nondegeneracy condition discussed in [20], where it is presented in terms of a transversality condition on the map $G$. However, at the end, it boils down to the following condition.
Definition 2.5. Suppose that $(x, \Lambda) \in \mathbb{R}^n \times S^m$ is a KKT pair of (P1) such that
\[
S^m = \operatorname{lin} T_{S^m_+}(G(x)) + \operatorname{Im} \nabla G(x),
\]
where $\operatorname{Im} \nabla G(x)$ denotes the image of the linear map $\nabla G(x)$, $T_{S^m_+}(G(x))$ denotes the tangent cone of $S^m_+$ at $G(x)$, and $\operatorname{lin} T_{S^m_+}(G(x))$ is the lineality space of the tangent cone $T_{S^m_+}(G(x))$, i.e., $\operatorname{lin} T_{S^m_+}(G(x)) = T_{S^m_+}(G(x)) \cap -T_{S^m_+}(G(x))$ (see, for instance, the observations on page 310 in [20]). Then, $(x, \Lambda)$ is said to satisfy the nondegeneracy condition.

A good thing about the nondegeneracy condition is that it ensures that $\Lambda$ is unique. For problem (P2), a common constraint qualification is the linear independence constraint qualification (LICQ), which simply requires that the gradients of the constraints be linearly independent. In Section 4, we will show that LICQ and nondegeneracy are essentially equivalent.
Since (P2) is just an ordinary equality constrained nonlinear program, second-order sufficient conditions are well-known and can be written as follows.
Proposition 2.6.
Let $(x, Y, \Lambda) \in \mathbb{R}^n \times S^m \times S^m$ be a KKT triple of problem (P2). The second-order sufficient condition (SOSC-NLP) holds if
\[
\langle \nabla^2_{xx} L(x, \Lambda) v, v \rangle + 2 \langle W \circ W, \Lambda \rangle > 0 \tag{2.4}
\]
for every nonzero $(v, W) \in \mathbb{R}^n \times S^m$ such that $\nabla G(x) v - 2\, Y \circ W = 0$. (We refer to this condition as SOSC-NLP in order to distinguish it from SOSC for SDP.)

Proof. The second-order sufficient condition for (P2) holds if $\langle \nabla^2_{(x,Y)} L(x, Y, \Lambda)(v, W), (v, W) \rangle > 0$ for every nonzero $(v, W) \in \mathbb{R}^n \times S^m$ such that $\nabla G(x) v - 2\, Y \circ W = 0$; see [3, Section 3.3] or [15, Theorem 12.6]. Since
\[
\langle \nabla^2_{(x,Y)} L(x, Y, \Lambda)(v, W), (v, W) \rangle = \langle \nabla^2_{xx} L(x, \Lambda) v, v \rangle + 2 \langle W \circ W, \Lambda \rangle,
\]
we have the desired result.

Similarly, we have the following second-order necessary condition. Note that we require LICQ to hold.

Proposition 2.7.
Let $(x, Y)$ be a local minimum for (P2) and $(x, Y, \Lambda) \in \mathbb{R}^n \times S^m \times S^m$ be a KKT triple such that LICQ holds. Then, the following second-order necessary condition (SONC-NLP) holds:
\[
\langle \nabla^2_{xx} L(x, \Lambda) v, v \rangle + 2 \langle W \circ W, \Lambda \rangle \ge 0 \tag{2.5}
\]
for every $(v, W) \in \mathbb{R}^n \times S^m$ such that $\nabla G(x) v - 2\, Y \circ W = 0$.

Proof. See [15, Theorem 12.5].

Second-order conditions for (P1) are a more delicate matter. Let $(x, \Lambda)$ be a KKT pair of (P1). It is true that a sufficient condition for optimality is that the Hessian of the Lagrangian be positive definite over the set of critical directions. However, replacing "positive definite" by "positive semidefinite" does not yield a necessary condition. Therefore, it seems that there is a gap between necessary and sufficient conditions. In order to close the gap, it is essential to add an additional term to the Hessian of the Lagrangian. For the theory behind this see, for instance, the papers by Kawasaki [13], Cominetti [7], and Bonnans, Cominetti and Shapiro [4]. The condition below was obtained by Shapiro in [20] and it is sufficient for $x$ to be a local minimum; see Theorem 9 therein.
Proposition 2.8.
Let $(x, \Lambda) \in \mathbb{R}^n \times S^m$ be a KKT pair of problem (P1) satisfying strict complementarity and the nondegeneracy condition. The second-order sufficient condition (SOSC-SDP) holds if
\[
\langle (\nabla^2_{xx} L(x, \Lambda) + H(x, \Lambda)) d, d \rangle > 0 \tag{2.6}
\]
for all nonzero $d \in C(x)$, where
\[
C(x) := \left\{ d \in \mathbb{R}^n \,\middle|\, \nabla G(x) d \in T_{S^m_+}(G(x)),\ \langle \nabla f(x), d \rangle = 0 \right\}
\]
is the critical cone at $x$, and $H(x, \Lambda) \in S^n$ is a matrix with elements
\[
H(x, \Lambda)_{ij} := 2 \operatorname{tr}\left( \frac{\partial G(x)}{\partial x_i}\, G(x)^\dagger\, \frac{\partial G(x)}{\partial x_j}\, \Lambda \right) \tag{2.7}
\]
for $i, j = 1, \ldots, n$. In this case, $x$ is a local minimum for (P1). Conversely, if $x$ is a local minimum for (P1) and $(x, \Lambda)$ is a KKT pair satisfying strict complementarity and nondegeneracy, then the following second-order necessary condition (SONC-SDP) holds:
\[
\langle (\nabla^2_{xx} L(x, \Lambda) + H(x, \Lambda)) d, d \rangle \ge 0 \tag{2.8}
\]
for all $d \in C(x)$.
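The extra term (2.7) is straightforward to assemble numerically. The sketch below is our own illustrative helper (not the authors' code); the caller supplies the partial derivative matrices $\partial G(x)/\partial x_i$ and the Moore-Penrose pseudo-inverse of $G(x)$.

```python
import numpy as np

def curvature_term(dG, G_pinv, Lam):
    """H(x, Lam)_{ij} = 2 tr( dG[i] @ G_pinv @ dG[j] @ Lam ), cf. (2.7).

    dG     : list of the n partial derivative matrices dG/dx_i, each (m, m)
    G_pinv : Moore-Penrose pseudo-inverse of G(x), e.g. np.linalg.pinv(G(x))
    Lam    : Lagrange multiplier matrix
    """
    n = len(dG)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = 2.0 * np.trace(dG[i] @ G_pinv @ dG[j] @ Lam)
    return H
```

With $H(x, \Lambda)$ in hand, checking SOSC-SDP amounts to testing positive definiteness of the quadratic form $\nabla^2_{xx} L(x, \Lambda) + H(x, \Lambda)$ over the critical cone $C(x)$.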
3 Equivalence between KKT points

Let us now establish the relation between KKT points of the original problem (P1) and its reformulation (P2). We start with the following simple implication.
Proposition 3.1.
Let $(x, \Lambda) \in \mathbb{R}^n \times S^m$ be a KKT pair of problem (P1). Then, there exists $Y \in S^m$ such that $(x, Y, \Lambda)$ is a KKT triple of (P2).

Proof. Let $Y \in S^m_+$ be the positive semidefinite matrix satisfying $G(x) = Y \circ Y$. Let us show that $(x, Y, \Lambda)$ is a KKT triple of (P2). The conditions (P2.1) and (P2.3) are immediate. We need to show that (P2.2) holds.

Recall that (P1.4), along with (P1.2) and (P1.3), implies $G(x) \Lambda = 0$, due to Lemma 2.3(e). This means that every column of $\Lambda$ lies in the kernel of $G(x)$. However, $G(x)$ and $Y$ share exactly the same kernel, since $G(x) = Y^2$. It follows that $Y \Lambda = 0$, so that $Y \circ \Lambda = 0$.

The converse is not always true. That is, even if $(x, Y, \Lambda)$ is a KKT triple of (P2), $(x, \Lambda)$ may fail to be a KKT pair of (P1), since $\Lambda$ need not be positive semidefinite. This, however, is the only obstacle to establishing equivalence.
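The failure can already be seen when $m = 1$, where $S^1_+$ is just the nonnegative ray. Consider minimizing $f(x) = -x$ subject to $x \ge 0$ (an illustrative example of ours, not from the paper): the slack formulation minimizes $-x$ subject to $x - y^2 = 0$, and $(x, y, \lambda) = (0, 0, -1)$ is a KKT triple of the reformulated problem even though $\lambda < 0$; indeed, $x = 0$ maximizes $f$ over the feasible set. A quick numerical check:

```python
# m = 1: minimize f(x) = -x subject to x >= 0, i.e. G(x) = x in S^1_+.
# Slack form (P2): minimize -x subject to x - y^2 = 0.
def kkt_P2(x, y, lam):
    r_x = -1.0 - lam               # stationarity in x: f'(x) - lam
    r_y = 2.0 * lam * y            # stationarity in y
    r_c = x - y**2                 # feasibility
    return r_x, r_y, r_c

x, y, lam = 0.0, 0.0, -1.0
print(kkt_P2(x, y, lam))           # (0.0, 0.0, 0.0): a KKT triple of (P2)
print(lam >= 0)                    # False: (x, lam) is not KKT for (P1)
```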
Proposition 3.2. If $(x, Y, \Lambda) \in \mathbb{R}^n \times S^m \times S^m$ is a KKT triple of (P2) such that $\Lambda$ is positive semidefinite, then $(x, \Lambda)$ is a KKT pair of (P1).

Proof. The only condition that remains to be verified is (P1.4). Due to (P2.2), we have
\[
0 = \langle Y, Y \circ \Lambda \rangle = \langle Y \circ Y, \Lambda \rangle = \langle G(x), \Lambda \rangle.
\]
Since $G(x)$ and $\Lambda$ are both positive semidefinite, we must have $G(x) \circ \Lambda = 0$, by Lemma 2.3(e).

The previous proposition leads us to consider conditions which ensure that $\Lambda$ is positive semidefinite. It turns out that if the second-order sufficient condition for (P2) is satisfied at $(x, Y, \Lambda)$, then $\Lambda$ is positive semidefinite. In fact, a weaker condition is enough to ensure positive semidefiniteness.
Proposition 3.3.
Suppose that $(x, Y, \Lambda) \in \mathbb{R}^n \times S^m \times S^m$ is a KKT triple of (P2) such that $Y \in \Phi(\Lambda)$, where
$\Phi(\Lambda)$ is defined by (2.1), that is,
\[
\langle W \circ W, \Lambda \rangle > 0
\]
for every nonzero $W \in S^m$ such that $Y \circ W = 0$. Then $(x, \Lambda)$ is a KKT pair of (P1) satisfying strict complementarity.

Proof. Due to Lemma 2.1, $\Lambda$ is positive semidefinite and $\operatorname{rank} Y = m - \operatorname{rank} \Lambda$. Now, since $G(x) = Y^2$, we have $\operatorname{rank} G(x) = \operatorname{rank} Y$. Therefore $(x, \Lambda)$ must satisfy the strict complementarity condition.
Corollary 3.4.
Suppose that SOSC-NLP is satisfied at a KKT triple $(x, Y, \Lambda) \in \mathbb{R}^n \times S^m \times S^m$. Then $(x, \Lambda)$ is a KKT pair of (P1) which satisfies the strict complementarity condition.

Proof. If we take $v = 0$ in the definition of SOSC-NLP, we obtain $Y \in \Phi(\Lambda)$. So, the result follows from Proposition 3.3.

The next result is a refinement of Proposition 3.1.
Proposition 3.5.
Suppose that $(x, \Lambda) \in \mathbb{R}^n \times S^m$ is a KKT pair of (P1) which satisfies the strict complementarity condition. Then there exists some $Y \in \Phi(\Lambda)$ such that $(x, Y, \Lambda)$ is a KKT triple of (P2).

Proof. Without loss of generality, we may assume that $G(x)$ has the shape
\[
G(x) = \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix},
\]
where $A \in S^k_+$ and $k = \operatorname{rank} G(x)$. Since $G(x)$ and $\Lambda$ are both positive semidefinite, the condition $G(x) \circ \Lambda = 0$ is equivalent to $G(x) \Lambda = 0$. It follows that $\Lambda$ has the shape $\begin{pmatrix} 0 & 0 \\ 0 & C \end{pmatrix}$ for some matrix $C \in S^{m-k}_+$. However, strict complementarity holds only if $C$ is positive definite. Therefore, it is enough to pick $Y$ to be the positive semidefinite matrix satisfying $Y^2 = G(x)$.

Finally, note that if $W = \begin{pmatrix} W_1 & W_2 \\ W_2^\top & W_3 \end{pmatrix}$, with $W \in S^m$, $W_1 \in S^k$, $W_2 \in \mathbb{R}^{k \times (m-k)}$, $W_3 \in S^{m-k}$, then the condition $Y \circ W = 0$ together with Proposition 2.2 implies $W_1 = 0$ and $W_2 = 0$. Since $C$ is positive definite, we must have $\langle \Lambda, W \circ W \rangle > 0$ if $W \ne 0$. From (2.1), this shows that $Y \in \Phi(\Lambda)$.
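Numerically, the construction used in the proof is easy to reproduce. In the sketch below (toy data of our choosing), $\Lambda$ has rank 1, $Y$ is supported on the block where $\Lambda$ vanishes, and the conclusions of Lemma 2.1 can be verified directly.

```python
import numpy as np

def jordan(W, Z):
    return (W @ Z + Z @ W) / 2

m = 3
Lam = np.diag([0.0, 0.0, 2.0])     # positive semidefinite, rank 1
Y = np.diag([1.0, 3.0, 0.0])       # supported where Lam vanishes, rank 2

print(np.allclose(jordan(Y, Lam), 0))                                # True
print(np.linalg.matrix_rank(Lam) == m - np.linalg.matrix_rank(Y))    # True

# Any nonzero W with Y o W = 0 is supported on the lower-right block,
# where Lam is positive definite, so <W o W, Lam> > 0 and Y is in Phi(Lam).
W = np.zeros((m, m)); W[2, 2] = 5.0
print(np.allclose(jordan(Y, W), 0), np.trace(jordan(W, W) @ Lam) > 0)  # True True
```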
4 Constraint qualifications

In this section, we shall show that the nondegeneracy condition in Definition 2.5 is essentially equivalent to LICQ for (P2). In [20], Shapiro mentions that the nondegeneracy condition for (P1) is an analogue of LICQ, but he also states that the analogy is imperfect. For instance, when $G(x)$ is diagonal, (P1) naturally becomes an NLP, since the semidefiniteness constraint is reduced to the nonnegativity of the diagonal elements. However, even in that case, LICQ and the nondegeneracy condition in Definition 2.5 might not be equivalent (see page 309 of [20]). In this sense, it is interesting to see whether a correspondence between the conditions can be established when (P1) is reformulated as (P2). Before that, we recall some facts about the geometry of the cone of positive semidefinite matrices.

Let $A \in S^m_+$ and let $U$ be an $m \times k$ matrix whose columns form a basis for the kernel of $A$. Then, the tangent cone of $S^m_+$ at $A$ is written as
\[
T_{S^m_+}(A) = \{ E \in S^m \mid U^\top E U \in S^k_+ \}
\]
(see [17] or [20, Equation 26]). For example, if $A$ can be written as $\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$, where $D$ is positive definite, then the matrices in $T_{S^m_+}(A)$ have the shape $\begin{pmatrix} C & F \\ F^\top & H \end{pmatrix}$, where the only restriction is that $H$ should be positive semidefinite.

Our first step is to notice that nondegeneracy means that the only matrix which is orthogonal to both $\operatorname{lin} T_{S^m_+}(G(x))$ and $\operatorname{Im} \nabla G(x)$ is the trivial one, i.e.,
\[
W \in (\operatorname{lin} T_{S^m_+}(G(x)))^\perp \text{ and } \nabla G(x)^* W = 0 \ \Rightarrow\ W = 0, \tag{Nondegeneracy}
\]
where $\perp$ denotes the orthogonal complement.

On the other hand, LICQ for (P2) holds at a feasible point $(x, Y)$ if the linear function which maps $(v, W)$ to $\nabla G(x) v - 2\, W \circ Y$ is surjective. This happens if and only if the adjoint map has trivial kernel. The adjoint map takes $W \in S^m$ and maps it to $(\nabla G(x)^* W, -2\, W \circ Y)$. So the surjectivity assumption amounts to requiring that every $W$ which satisfies both $\nabla G(x)^* W = 0$ and $W \circ Y = 0$ must actually be 0, that is,
\[
W \circ Y = 0 \text{ and } \nabla G(x)^* W = 0 \ \Rightarrow\ W = 0. \tag{LICQ}
\]
The subspaces $\ker L_Y = \{ W \mid Y \circ W = 0 \}$ and $(\operatorname{lin} T_{S^m_+}(G(x)))^\perp$ are closely related. The next proposition clarifies this connection.

Proposition 4.1.
Let $V = Y^2$. Then $(\operatorname{lin} T_{S^m_+}(V))^\perp \subseteq \ker L_Y$. If $Y$ is positive semidefinite, then $\ker L_Y \subseteq (\operatorname{lin} T_{S^m_+}(V))^\perp$ as well.

Proof. Note that if $Q$ is an orthogonal matrix, then $T_{S^m_+}(Q^\top V Q) = Q^\top T_{S^m_+}(V) Q$. The same is true for $\ker L_Y$, i.e., $\ker L_{Q^\top Y Q} = Q^\top (\ker L_Y) Q$. So, without loss of generality, we may assume that $Y$ is diagonal and that
\[
Y = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix},
\]
where $D$ is an $r \times r$ nonsingular diagonal matrix. Then, we have
\[
T_{S^m_+}(V) = \left\{ \begin{pmatrix} A & B \\ B^\top & C \end{pmatrix} \,\middle|\, A \in S^r,\ B \in \mathbb{R}^{r \times (m-r)},\ C \in S^{m-r}_+ \right\},
\]
\[
\operatorname{lin} T_{S^m_+}(V) = \left\{ \begin{pmatrix} A & B \\ B^\top & 0 \end{pmatrix} \,\middle|\, A \in S^r,\ B \in \mathbb{R}^{r \times (m-r)} \right\},
\]
\[
(\operatorname{lin} T_{S^m_+}(V))^\perp = \left\{ \begin{pmatrix} 0 & 0 \\ 0 & C \end{pmatrix} \,\middle|\, C \in S^{m-r} \right\}.
\]
This shows that every matrix $Z \in (\operatorname{lin} T_{S^m_+}(V))^\perp$ satisfies $YZ = 0$ and therefore lies in $\ker L_Y$. Now, the kernel of $L_Y$ can be described as follows:
\[
\ker L_Y = \left\{ \begin{pmatrix} A & 0 \\ 0 & C \end{pmatrix} \,\middle|\, A \circ D = 0,\ C \in S^{m-r} \right\}.
\]
If $Y$ is positive semidefinite, then $D$ is positive definite and the operator $L_D$ is nonsingular. Hence $A \circ D = 0$ implies $A = 0$. In this case, $\ker L_Y$ coincides with $(\operatorname{lin} T_{S^m_+}(V))^\perp$.

Corollary 4.2. If $(x, Y) \in \mathbb{R}^n \times S^m$ satisfies LICQ for the problem (P2), then nondegeneracy is satisfied at $x$ for (P1). On the other hand, if $x$ satisfies nondegeneracy and if $Y = \sqrt{G(x)}$, then $(x, Y)$ satisfies LICQ for (P2).

Proof. It follows easily from Proposition 4.1.
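The surjectivity characterization is also convenient computationally: representing $(v, W) \mapsto \nabla G(x) v - 2\, Y \circ W$ as a matrix acting on $(v, \text{coordinates of } W)$, LICQ holds exactly when that matrix has rank $m(m+1)/2$. A hedged sketch (our own construction, for small dimensions):

```python
import numpy as np

def jordan(W, Z):
    return (W @ Z + Z @ W) / 2

def sym_basis(m):
    """An (unnormalized) basis E_1, ..., E_{m(m+1)/2} of S^m."""
    basis = []
    for i in range(m):
        for j in range(i, m):
            E = np.zeros((m, m)); E[i, j] = E[j, i] = 1.0
            basis.append(E)
    return basis

def licq_holds(dG, Y):
    """LICQ for (P2): (v, W) -> dG(x)v - 2 Y o W is onto S^m.

    dG : list of the partial derivative matrices of G at x
    """
    m = Y.shape[0]
    cols = [Gi.ravel() for Gi in dG]                           # the v-block
    cols += [(-2.0 * jordan(Y, E)).ravel() for E in sym_basis(m)]  # W-block
    J = np.column_stack(cols)
    return np.linalg.matrix_rank(J) == m * (m + 1) // 2

# Example: a positive definite Y (e.g. Y = sqrt(G(x)) with G(x) nonsingular)
# makes the W-block alone surjective, so LICQ holds trivially.
Y = np.diag([1.0, 2.0])
print(licq_holds([np.zeros((2, 2))], Y))   # True
```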
5 Second-order sufficient conditions

In this section, we examine the relations between KKT points of (P1) and (P2) that satisfy second-order sufficient conditions.
Proposition 5.1.
Suppose that $(x, Y, \Lambda) \in \mathbb{R}^n \times S^m \times S^m$ is a KKT triple of (P2) satisfying SOSC-NLP. Then $(x, \Lambda)$ is a KKT pair of (P1) that satisfies strict complementarity and (2.6). If, additionally, $(x, Y, \Lambda)$ satisfies LICQ for (P2) or $(x, \Lambda)$ satisfies nondegeneracy for (P1), then $(x, \Lambda)$ satisfies SOSC-SDP as well.

Proof. In Corollary 3.4, we have already shown that $(x, \Lambda)$ is a KKT pair of (P1) and strict complementarity is satisfied. In addition, if $(x, Y, \Lambda)$ satisfies LICQ for (P2), then Corollary 4.2 ensures that $(x, \Lambda)$ satisfies nondegeneracy for (P1). It only remains to show that (2.6) is also satisfied. To this end, consider an arbitrary nonzero $d \in \mathbb{R}^n$ such that $\nabla G(x) d \in T_{S^m_+}(G(x))$ and $\langle \nabla f(x), d \rangle = 0$. We are thus required to show that
\[
\langle (\nabla^2_{xx} L(x, \Lambda) + H(x, \Lambda)) d, d \rangle > 0, \tag{5.1}
\]
where $H(x, \Lambda)$ is defined in (2.7). A first observation is that, due to (P1.1), we have $\langle \nabla G(x) d, \Lambda \rangle = \langle d, \nabla G(x)^* \Lambda \rangle = \langle d, \nabla f(x) \rangle = 0$, that is, $\nabla G(x) d \in \{\Lambda\}^\perp$. We recall that $H(x, \Lambda)$ satisfies
\[
\langle H(x, \Lambda) d, d \rangle = 2 \langle \Lambda, (\nabla G(x) d)^\top G(x)^\dagger (\nabla G(x) d) \rangle. \tag{5.2}
\]
The strategy here is to first identify the shape and properties of several matrices involved, before showing that (5.1) holds. Without loss of generality, we may assume that $G(x)$ is diagonal, i.e.,
\[
G(x) = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix},
\]
where $D$ is a $k \times k$ diagonal positive definite matrix. We also have
\[
Y = \begin{pmatrix} E & 0 \\ 0 & 0 \end{pmatrix},
\]
with $E$ an invertible diagonal matrix such that $E^2 = D$. Since SOSC-NLP holds, considering $v = 0$ in (2.4), we obtain $\langle W \circ W, \Lambda \rangle > 0$ for every nonzero $W \in S^m$ such that $W \circ Y = 0$, which, by (2.1), shows that $Y \in \Phi(\Lambda)$. From (P2.2), we also have $Y \circ \Lambda = 0$. Thus, Lemma 2.1 ensures that every pair $\sigma, \sigma'$ of nonzero eigenvalues of $Y$ satisfies $\sigma + \sigma' \ne 0$. Since the eigenvalues of $E$ are precisely the nonzero eigenvalues of $Y$, it follows that $L_E$ is an invertible operator, by virtue of Proposition 2.2. Moreover, due to the strict complementarity condition, we obtain
\[
\Lambda = \begin{pmatrix} 0 & 0 \\ 0 & \Gamma \end{pmatrix},
\]
where $\Gamma \in S^{m-k}_+$ is positive definite. The pseudo-inverse of $G(x)$ is given by
\[
G(x)^\dagger = \begin{pmatrix} D^{-1} & 0 \\ 0 & 0 \end{pmatrix}.
\]
We partition $\nabla G(x) d$ in blocks in the following fashion:
\[
\nabla G(x) d = \begin{pmatrix} A & B \\ B^\top & C \end{pmatrix},
\]
where $A \in S^k$, $B \in \mathbb{R}^{k \times (m-k)}$ and $C \in S^{m-k}$. Inasmuch as $\nabla G(x) d$ lies in the tangent cone $T_{S^m_+}(G(x))$, $C$ must be positive semidefinite. However, as observed earlier, we have $\langle \nabla G(x) d, \Lambda \rangle = 0$, which yields $\langle C, \Gamma \rangle = 0$. Since $\Gamma$ is positive definite, this implies $C = 0$, and hence
\[
\nabla G(x) d = \begin{pmatrix} A & B \\ B^\top & 0 \end{pmatrix}.
\]
We are now ready to show that (5.1) holds. We shall do that by considering $v = d$ in (2.4) and exhibiting some $W$ such that $\nabla G(x) d - 2\, Y \circ W = 0$ and $2 \langle W \circ W, \Lambda \rangle = \langle H(x, \Lambda) d, d \rangle$. Then SOSC-NLP will ensure that (5.1) holds. Note that, for any $Z \in S^{m-k}$,
\[
W_Z = \begin{pmatrix} L_E^{-1}(A)/2 & E^{-1} B \\ B^\top E^{-1} & Z \end{pmatrix}
\]
is a solution to the equation $\nabla G(x) d - 2\, Y \circ W = 0$. Moreover, any solution to that equation must have this particular shape. Therefore, the proof will be complete if we can choose $Z$ such that $2 \langle W_Z \circ W_Z, \Lambda \rangle = \langle H(x, \Lambda) d, d \rangle$ holds. Observe that, by (5.2),
\[
\begin{aligned}
2 \langle W_Z \circ W_Z, \Lambda \rangle - \langle H(x, \Lambda) d, d \rangle
&= 2 \langle W_Z^2, \Lambda \rangle - 2 \langle (\nabla G(x) d)^\top G(x)^\dagger (\nabla G(x) d), \Lambda \rangle \\
&= 2 \langle Z^2 + B^\top E^{-2} B, \Gamma \rangle - 2 \langle B^\top D^{-1} B, \Gamma \rangle \\
&= 2 \langle Z^2 + B^\top D^{-1} B - B^\top D^{-1} B, \Gamma \rangle = 2 \langle Z^2, \Gamma \rangle. 
\end{aligned} \tag{5.3}
\]
Thus, taking $Z = 0$ yields $2 \langle W_Z \circ W_Z, \Lambda \rangle = \langle H(x, \Lambda) d, d \rangle$. This completes the proof.

Proposition 5.2. Suppose that $(x, \Lambda) \in \mathbb{R}^n \times S^m$ is a KKT pair of (P1) satisfying (2.6) and the strict complementarity condition. Then, there exists $Y \in S^m$ such that $(x, Y, \Lambda)$ is a KKT triple of (P2) satisfying SOSC-NLP.

Proof. Again, we assume without loss of generality that $G(x)$ is diagonal, so that
\[
G(x) = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix},
\]
where $D$ is a $k \times k$ diagonal positive definite matrix. Take $Y$ such that
\[
Y = \begin{pmatrix} E & 0 \\ 0 & 0 \end{pmatrix},
\]
where $E^2 = D$ and $E$ is positive definite; in particular, $L_E$ is invertible. Then $(x, Y, \Lambda)$ is a KKT triple of (P2). Due to strict complementarity, we have
\[
\Lambda = \begin{pmatrix} 0 & 0 \\ 0 & \Gamma \end{pmatrix},
\]
where $\Gamma \in S^{m-k}_+$ is positive definite. We are required to show that
\[
\langle \nabla^2_{xx} L(x, \Lambda) v, v \rangle + 2 \langle W \circ W, \Lambda \rangle > 0 \tag{5.4}
\]
for every nonzero $(v, W)$ such that $\nabla G(x) v - 2\, Y \circ W = 0$. So, let $(v, W)$ satisfy $\nabla G(x) v - 2\, Y \circ W = 0$. Let us first consider what happens when $v = 0$. Partitioning $W$ in blocks, we have
\[
Y \circ \begin{pmatrix} W_1 & W_2 \\ W_2^\top & W_3 \end{pmatrix} = \begin{pmatrix} E \circ W_1 & E W_2 / 2 \\ W_2^\top E / 2 & 0 \end{pmatrix}.
\]
Recall that $L_E$ as well as $E$ is invertible. So, $Y \circ W = 0$ implies $W_1 = 0$ and $W_2 = 0$. If $W \ne 0$, then $W_3$ must be nonzero, which in turn implies that $W_3 \circ W_3$ must also be nonzero. We then have $\langle W \circ W, \Lambda \rangle = \langle W_3 \circ W_3, \Gamma \rangle$. But $\langle W_3 \circ W_3, \Gamma \rangle$ must be greater than zero, since $\Gamma$ is positive definite. Thus, in this case, (5.4) is satisfied.

Now, we suppose that $v$ is nonzero. First, we will show that $\nabla G(x) v$ lies in the tangent cone $T_{S^m_+}(G(x))$ and that $\nabla G(x) v$ is orthogonal to $\Lambda$, which implies $0 = \langle \nabla G(x) v, \Lambda \rangle = v^\top \nabla G(x)^* \Lambda = v^\top \nabla f(x)$. This shows that $v$ lies in the critical cone $C(x)$.

Note that the image of the operator $L_Y$ only contains matrices having the lower right $(m-k) \times (m-k)$ block equal to zero. Therefore, $\nabla G(x) v = 2\, Y \circ W$ implies that $\nabla G(x) v$ has the shape
\[
\nabla G(x) v = \begin{pmatrix} A & B \\ B^\top & 0 \end{pmatrix}.
\]
Hence, $\nabla G(x) v \in T_{S^m_+}(G(x))$ and $\nabla G(x) v$ is orthogonal to $\Lambda$. Due to SOSC-SDP, we must have
\[
\langle (\nabla^2_{xx} L(x, \Lambda) + H(x, \Lambda)) v, v \rangle > 0.
\]
Thus, if $\langle H(x, \Lambda) v, v \rangle \le 2 \langle W \circ W, \Lambda \rangle$ holds, then we have (5.4). In fact, since $W$ satisfies $\nabla G(x) v - 2\, Y \circ W = 0$, the chain of equalities finishing at (5.3) readily yields $\langle H(x, \Lambda) v, v \rangle \le 2 \langle W \circ W, \Lambda \rangle$.

Here, we remark one interesting consequence of the previous analysis. The second-order sufficient condition for NSDPs in [20] is stated under the assumption that the pair $(x, \Lambda)$ satisfies both strict complementarity and nondegeneracy. However, since (P1) and (P2) share the same local minima, Proposition 5.2 implies that we may remove the nondegeneracy assumption from SOSC-SDP. We now state a sufficient condition for (P1) based on the analysis above.

Proposition 5.3 (A Sufficient Condition via Slack Variables). Let $(x, \Lambda) \in \mathbb{R}^n \times S^m$ be a KKT pair of (P1) satisfying strict complementarity. Assume also that the following condition holds:
\[
\langle \nabla^2_{xx} L(x, \Lambda) v, v \rangle + 2 \langle W \circ W, \Lambda \rangle > 0
\]
for every nonzero $(v, W) \in \mathbb{R}^n \times S^m$ such that $\nabla G(x) v - 2 \sqrt{G(x)} \circ W = 0$. Then, $x$ is a local minimum for (P1).

Apart from the detail of requiring nondegeneracy, the condition above is equivalent to SOSC-SDP, due to Propositions 5.1 and 5.2.
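In small dimensions, SOSC-NLP can be verified directly by restricting the quadratic form $(v, W) \mapsto \langle \nabla^2_{xx} L(x, \Lambda) v, v \rangle + 2 \langle W \circ W, \Lambda \rangle$ to the null space of the linearized constraint. The sketch below (our own helper, dense linear algebra only, not the authors' code) does exactly that.

```python
import numpy as np
from scipy.linalg import null_space

def jordan(W, Z):
    return (W @ Z + Z @ W) / 2

def sym_basis(m):
    basis = []
    for i in range(m):
        for j in range(i, m):
            E = np.zeros((m, m)); E[i, j] = E[j, i] = 1.0
            basis.append(E)
    return basis

def sosc_nlp_holds(hess_xx, dG, Y, Lam, tol=1e-10):
    """Check: <hess_xx v, v> + 2 <W o W, Lam> > 0 for all nonzero (v, W)
    with dG(x) v - 2 Y o W = 0 (the linearized constraint of (P2))."""
    n, m = len(dG), Y.shape[0]
    basis = sym_basis(m)
    # Jacobian of the constraint with respect to (v, coordinates of W).
    J = np.column_stack([Gi.ravel() for Gi in dG] +
                        [(-2.0 * jordan(Y, E)).ravel() for E in basis])
    N = null_space(J)                      # basis of the critical subspace
    if N.size == 0:
        return True                        # only the trivial direction
    def Q(z):                              # the quadratic form at (v, W)
        v = z[:n]
        W = sum(c * E for c, E in zip(z[n:], basis))
        return v @ hess_xx @ v + 2.0 * np.trace(jordan(W, W) @ Lam)
    # Restrict Q to the null space via polarization and test definiteness.
    k = N.shape[1]
    Qred = np.array([[(Q(N[:, a] + N[:, b]) - Q(N[:, a]) - Q(N[:, b])) / 2.0
                      for b in range(k)] for a in range(k)])
    return np.min(np.linalg.eigvalsh(Qred)) > tol
```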
6 Second-order necessary conditions

We now take a look at the difference between the second-order necessary conditions that can be derived from (P1) and (P2). Since the inequalities (2.5) and (2.8) are not strict, we need a slightly stronger assumption to prove the next proposition.
Proposition 6.1.
Suppose that $(x, Y, \Lambda) \in \mathbb{R}^n \times S^m \times S^m$ is a KKT triple of (P2) satisfying LICQ and SONC-NLP. Moreover, assume that $Y$ and $\Lambda$ are positive semidefinite. If $(x, \Lambda)$ is a KKT pair of (P1) satisfying strict complementarity, then it also satisfies SONC-SDP.

Proof. Since $(x, Y, \Lambda)$ satisfies LICQ for (P2) and $Y$ is positive semidefinite, Corollary 4.2 implies that $(x, \Lambda)$ satisfies nondegeneracy. Under the assumption that $(x, \Lambda)$ is strictly complementary, the only thing missing is to show that (2.8) holds. To do so, we proceed as in Proposition 5.1. We partition $G(x)$, $Y$, $\Lambda$, $G(x)^\dagger$ and $\nabla G(x) d$ in blocks in exactly the same way. The only difference is that, since (2.5) does not hold strictly, we cannot make use of Lemma 2.1 in order to conclude that $L_E$ is invertible. Nevertheless, since we assume that $Y$ is positive semidefinite, all the eigenvalues of $E$ are strictly positive anyway. So, as before, we can conclude that $L_E$ is an invertible operator, by Proposition 2.2. Due to strict complementarity, we can also conclude that $\Gamma \in S^{m-k}_+$ is positive definite and that $C = 0$.

All our ingredients are now in place and we can proceed exactly as in the proof of Proposition 5.1. Namely, we have to prove that, given $d \in C(x)$, the inequality $\langle (\nabla^2_{xx} L(x, \Lambda) + H(x, \Lambda)) d, d \rangle \ge 0$ holds, and for that it suffices to exhibit some $W$ satisfying both $\nabla G(x) d - 2\, Y \circ W = 0$ and $\langle H(x, \Lambda) d, d \rangle = 2 \langle W \circ W, \Lambda \rangle$. Then SONC-NLP will ensure that (2.8) holds. This can be done by taking
\[
W = \begin{pmatrix} L_E^{-1}(A)/2 & E^{-1} B \\ B^\top E^{-1} & 0 \end{pmatrix}
\]
and following the same line of arguments that leads to (5.3).
Proposition 6.2. Suppose that $(x, \Lambda) \in \mathbb{R}^n \times S^m$ is a KKT pair of (P1) satisfying SONC-SDP. Then, there exists $Y \in S^m$ such that $(x, Y, \Lambda)$ is a KKT triple of (P2) satisfying SONC-NLP.

Proof. It is enough to choose $Y$ to be $\sqrt{G(x)}$. If we do so, Corollary 4.2 ensures that $(x, Y, \Lambda)$ satisfies LICQ. We now have to check that (2.5) holds. For this, we can follow the proof of Proposition 5.2 by considering (2.8) instead of (2.6). No special considerations are needed for this case.

Assume that $(x, \Lambda)$ is a KKT pair of (P1) satisfying nondegeneracy and strict complementarity. Then, Proposition 6.1 gives an elementary route to prove that SONC-SDP holds. This is because if we select $Y$ to be the positive semidefinite square root of $G(x)$, all the conditions of Proposition 6.1 are satisfied, which means that (2.8) must hold. Moreover, if we were to derive second-order necessary conditions for (P1) from scratch, we could consider the following.

Proposition 6.3 (A Necessary Condition via Slack Variables). Let $x \in \mathbb{R}^n$ be a local minimum of (P1). Assume that $(x, \Lambda) \in \mathbb{R}^n \times S^m$ is a KKT pair of (P1) satisfying strict complementarity and nondegeneracy. Then the following condition holds:
\[
\langle \nabla^2_{xx} L(x, \Lambda) v, v \rangle + 2 \langle W \circ W, \Lambda \rangle \ge 0
\]
for every $(v, W) \in \mathbb{R}^n \times S^m$ such that $\nabla G(x) v - 2 \sqrt{G(x)} \circ W = 0$.

Propositions 6.1 and 6.2 ensure that the condition above is equivalent to SONC-SDP. Comparing Propositions 5.3 and 6.3, we see that the second-order conditions derived through the aid of slack variables have "no gap" in the sense that, apart from regularity conditions, the only difference between them is the change from "$>$" to "$\ge$".

7 Numerical experiments

Let us now examine the validity of the squared slack variables method for NSDP problems. We tested the slack variables approach on a few simple problems. Our solver of choice was PENLAB [8], which is based on PENNON [14] and uses an algorithm based on the augmented Lagrangian technique. As far as we know, PENLAB is the only open-source general nonlinear programming solver capable of handling nonlinear SDP constraints. Because of that, we have the chance of comparing the "native" approach against the slack variables approach using the same code. We ran PENLAB with the default parameters. All the tests were done on a notebook with the following specs: Ubuntu 14.04, CPU Intel i7-4510U with 4 cores operating at 2.0GHz, and 4GB of RAM.

In order to use an NLP solver to tackle (P1), we have to select a vectorization strategy. We decided to vectorize an $n \times n$ symmetric matrix by transforming it into a vector of size $n(n+1)/2$, such that the columns of the lower triangular part are stacked one on top of the other. For instance, a $2 \times 2$ symmetric matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ is transformed into the column vector $(a, b, c)^\top$. A sketch of this vectorization is given below.
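A minimal sketch of this vectorization and its inverse (the helper names svec and smat are ours):

```python
import numpy as np

def svec(X):
    """Stack the columns of the lower triangular part of a symmetric X."""
    i, j = np.tril_indices(X.shape[0])
    order = np.lexsort((i, j))     # sort by column, then row: column stacking
    return X[i[order], j[order]]

def smat(v, n):
    """Inverse of svec: rebuild the symmetric n x n matrix."""
    X = np.zeros((n, n))
    i, j = np.tril_indices(n)
    order = np.lexsort((i, j))
    X[i[order], j[order]] = v
    X[j[order], i[order]] = v
    return X

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])
print(svec(A))                            # [1. 2. 3.]
print(np.allclose(smat(svec(A), 2), A))   # True
```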
7.1 A modified Hock-Schittkowski problem

There is a known suite of problems for testing nonlinear optimization codes, collected by Hock and Schittkowski [11, 19]. The problem below is a modification of problem 71 of [19] and it comes together with PENLAB. Both the constraints and the objective function are nonconvex. The problem has the following formulation:
\[
\begin{array}{ll}
\underset{x \in \mathbb{R}^6}{\text{minimize}} & x_1 x_4 (x_1 + x_2 + x_3) + x_3\\
\text{subject to} & x_1 x_2 x_3 x_4 - x_5 - 25 = 0,\\
& x_1^2 + x_2^2 + x_3^2 + x_4^2 - x_6 - 40 = 0,\\
& A(x) \in S^m_+,\\
& 1 \le x_i \le 5, \quad i = 1, \ldots, 4,\\
& x_i \ge 0, \quad i = 5, 6,
\end{array} \tag{HS}
\]
where $A(x)$ is the symmetric matrix, built from the entries of $x$, that is specified in the corresponding PENLAB example.
We reformulate the problem (HS) by removing the positive semidefiniteness constraint and adding a squared slack variable $Y$. We then test both formulations using PENLAB.
The initial point is set to be $x = (5, \ldots, 0)$ and the slack variable to be the identity matrix $Y = I$. This produces infeasible points for both formulations. Nevertheless, PENLAB was able to solve the problem via both approaches. The results can be seen in Table 1. The first three columns count the numbers of evaluations of the augmented Lagrangian function, its gradients and its Hessians, respectively. The fourth column is the number of outer iterations. The "time" column indicates the time in seconds as measured by PENLAB. The last column indicates the optimal value obtained. It seems that there were no significant differences in performance between the two approaches.

Table 1: Slack vs "native" for (HS)

           functions   gradients   Hessians   iterations   time (s)   opt. value
  slack    110         57          44         13           0.54       87.7105
  native   123         71          58         13           0.57       87.7105

7.2 Nearest correlation matrix

Given an $m \times m$ symmetric matrix $H$ with diagonal entries equal to one, we want to find the element in $S^m_+$ which is closest to $H$ and has all diagonal entries also equal to one. The problem can be formulated as follows:
\[
\begin{array}{ll}
\underset{X}{\text{minimize}} & \langle X - H, X - H \rangle\\
\text{subject to} & X_{ii} = 1 \quad \forall i,\\
& X \in S^m_+.
\end{array} \tag{Cor}
\]
This problem is convex and, due to its structure, we can use slack variables without increasing the number of variables. We have the following formulation:
\[
\begin{array}{ll}
\underset{X}{\text{minimize}} & \langle (X \circ X) - H, (X \circ X) - H \rangle\\
\text{subject to} & (X \circ X)_{ii} = 1 \quad \forall i,\\
& X \in S^m.
\end{array} \tag{Cor-Slack}
\]
In our experiments, we generated 100 symmetric matrices $H$ such that the diagonal elements are all 1 and the other elements are uniform random numbers between $-1$ and $1$. We used $X = I_m$ as an initial solution in all instances. We solved problems with $m = 5, 10, 15,$
$20$, and the results can be found in Table 2. The columns "mean", "min" and "max" indicate, respectively, the mean, minimum and maximum of the running times in seconds over all instances. For this problem, both formulations were able to solve all instances. We included the mean time column just to give an idea about the magnitude of the running time. In reality, for fixed $m$, the running time oscillated highly among different instances, as can be seen from the difference between the maximum and the minimum running times. We noted no significant difference between the optimal values obtained from both formulations.

We tried, as much as possible, to implement the gradients and Hessians of both problems in a similar way. As Cor is an example that comes with PENLAB, we also performed some minor tweaks to conform to that goal. Performance-wise, the formulation Cor-Slack seems to be competitive for this example. In most instances, Cor-Slack had a faster running time. In Figure 1, we show the comparison between running times, instance-by-instance, for the case $m = 20$.

Table 2: Comparison between Cor and Cor-Slack

            Cor-Slack                          Cor
   m    mean (s)   min (s)   max (s)    mean (s)   min (s)   max (s)
   5    0.090      0.060     0.140      0.201      0.130     0.250
  10    0.153      0.120     0.230      0.423      0.330     0.630
  15    0.287      0.210     0.430      1.306      1.020     1.950
  20    0.556      0.450     1.180      3.491      2.820     4.990
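To give the flavor of the slack approach on (Cor-Slack) outside of PENLAB, here is a hedged, self-contained sketch that hands the reformulation to a generic NLP solver; we use scipy's SLSQP purely for illustration, so this is not the setup used in the experiments above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m = 4
H = rng.uniform(-1.0, 1.0, (m, m)); H = (H + H.T) / 2
np.fill_diagonal(H, 1.0)

i_low, j_low = np.tril_indices(m)

def unpack(v):                       # lower-triangular vector -> symmetric X
    X = np.zeros((m, m))
    X[i_low, j_low] = v
    X[j_low, i_low] = v
    return X

def objective(v):                    # <X o X - H, X o X - H>, with X o X = X^2
    R = unpack(v) @ unpack(v) - H
    return np.sum(R * R)

def residual(v):                     # constraint (X o X)_ii = 1 for all i
    X = unpack(v)
    return np.diag(X @ X) - 1.0

v0 = np.eye(m)[i_low, j_low]         # start from X = I_m, as in the text
res = minimize(objective, v0, method="SLSQP",
               constraints=[{"type": "eq", "fun": residual}])
X_star = unpack(res.x) @ unpack(res.x)    # the recovered PSD matrix X o X
print(res.success, np.min(np.linalg.eigvalsh(X_star)) >= -1e-8)
```

Since the decision variable is an unconstrained symmetric $X$ and the matrix delivered at the end is $X \circ X = X^2$, positive semidefiniteness of the final iterate is automatic, which is precisely the appeal of the squared slack reformulation.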
Figure 1: Cor vs. Cor-Slack. Instance-by-instance running times for $m = 20$.

7.3 An extended formulation

We consider an extended formulation for Cor, as suggested in one of PENLAB's examples, with extra constraints to bound the eigenvalues of the matrices:
\[
\begin{array}{ll}
\underset{X,\,z}{\text{minimize}} & \langle zX - H, zX - H \rangle\\
\text{subject to} & z X_{ii} = 1 \quad \forall i,\\
& I_m \preceq X \preceq \kappa I_m,
\end{array} \tag{Cor-Ext}
\]
where $\kappa$ is some positive number greater than 1 and the notation $X \succeq \kappa I_m$ means $X - \kappa I_m \in S^m_+$. This is a nonconvex problem, and using slack variables, we obtain the following formulation:
\[
\begin{array}{ll}
\underset{X,\,Y_1,\,Y_2,\,z}{\text{minimize}} & \langle zX - H, zX - H \rangle\\
\text{subject to} & z X_{ii} = 1 \quad \forall i,\\
& \kappa I_m - X = Y_1 \circ Y_1,\\
& X - I_m = Y_2 \circ Y_2.
\end{array} \tag{Cor-Ext-Slack}
\]
In our experiments, we set $\kappa = 10$. As before, we generated 100 symmetric matrices $H$ whose diagonal elements are all 1 and the other elements are uniform random numbers between $-1$ and $1$.

Table 3: Comparison between Cor-Ext and Cor-Ext-Slack

            Cor-Ext-Slack                              Cor-Ext
   m    mean (s)   min (s)   max (s)   fail     mean (s)   min (s)   max (s)   fail
   5    0.236      0.130     0.830     15       0.445      0.250     2.130     1
  10    0.741      0.420     2.580     3        1.206      0.580     7.300     0
  15    4.651      2.090     26.96     15       3.809      1.960     14.12     0
  20    24.32      15.20     69.34     8        9.288      5.150     36.81     0

For Cor-Ext, we used $z = 1$ and $X = I_m$ as initial points. For Cor-Ext-Slack, we used an infeasible starting point: $z = 1$, $X = Y_2 = I_m$ and $Y_1 = 3 I_m$. We solved problems with $m = 5, 10, 15,$
$20$, and the results can be found in Table 3. The columns have the same meaning as in Section 7.2. This time, we saw a higher failure rate for the formulation Cor-Ext-Slack. We tried a few different initial points, but the results stayed mostly the same. The best results were obtained for the cases $m = 5$ and $m = 10$, where Cor-Ext-Slack had a performance comparable to Cor-Ext, although the latter seldom failed. For $m = 15$ and $m = 20$, Cor-Ext-Slack was slower than Cor-Ext, which is expected, because the number of variables increased significantly. However, it was still able to solve the majority of instances. In Figure 2, we show the comparison of running times, instance-by-instance, for the cases $m = 10$ and $m = 20$.

Figure 2: Cor-Ext vs. Cor-Ext-Slack. Instance-by-instance running times for $m = 10$ and $m = 20$. Failures are represented by omitting the corresponding running time.

8 Final remarks

In this article, we have shown that the optimality conditions for (P1) and (P2) are essentially the same. One intriguing part of this connection is the fact that the addition of squared slack variables seems to be enough to capture a great deal of the structure of $S^m_+$. The natural progression from here is to expand the results to symmetric cones. In this article, we already saw some results that have a distinct Jordan-algebraic flavor, such as Lemma 2.1. It would be interesting to see how these results can be further extended and whether clean proofs can be obtained without recourse to the classification of simple Euclidean Jordan algebras.

As for the computational results, we found it mildly surprising that the slack variables approach was able to outperform the "native" approach in many instances. This warrants a deeper investigation of whether this could be a reliable tool for attacking NSDPs that are not linear. These are precisely the ones that are not covered by the earlier work of Burer and Monteiro [5, 6].

References

[1] Barvinok, A.: Problems of distance geometry and convex properties of quadratic maps. Discrete Comput. Geom. 13(1), 189–202 (1995)

[2] Bernstein, D.S.: Matrix Mathematics: Theory, Facts, and Formulas, 2nd edn. Princeton University Press (2009)

[3] Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific (1999)

[4] Bonnans, J.F., Cominetti, R., Shapiro, A.: Second order optimality conditions based on parabolic second order tangent sets. SIAM J. Optim. 9(2), 466–492 (1999)
Instance (m = 20) s e c SlackNative
Figure 2: Cor-Ext vs Cor-Ext-Slack. Instance-by-instance running times for m = 10 and m = 20.Failures are represented by omitting the corresponding running time.195] Burer, S., Monteiro, R.D.: A nonlinear programming algorithm for solving semidefinite pro-grams via low-rank factorization. Math. Program. (2), 329–357 (2003)[6] Burer, S., Monteiro, R.D.: Local minima and convergence in low-rank semidefinite program-ming. Math. Program. (3), 427–444 (2005)[7] Cominetti, R.: Metric regularity, tangent sets, and second-order optimality conditions. Appl.Math. Optim. (1), 265–287 (1990)[8] Fiala, J., Koˇcvara, M., Stingl, M.: PENLAB: A matlab solver for nonlinear semidefinite opti-mization. ArXiv e-prints (2013)[9] Forsgren, A.: Optimality conditions for nonconvex semidefinite programming. Math. Program. (1), 105–128 (2000)[10] Fukuda, E.H., Fukushima, M.: The use of squared slack variables in nonlinear second-ordercone programming. Submitted (2015)[11] Hock, W., Schittkowski, K.: Test examples for nonlinear programming codes. J. Optim. TheoryAppl. (1), 127–129 (1980)[12] Jarre, F.: Elementary optimality conditions for nonlinear SDPs. In: Handbook on Semidefinite,Conic and Polynomial Optimization, International Series in Operations Research & Manage-ment Science , vol. 166, pp. 455–470. Springer (2012)[13] Kawasaki, H.: An envelope-like effect of infinitely many inequality constraints on second-ordernecessary conditions for minimization problems. Math. Program. (1-3), 73–96 (1988)[14] Koˇcvara, M., Stingl, M.: PENNON: A code for convex nonlinear and semidefinite programming.Optim. Methods Softw. (3), 317–333 (2003)[15] Nocedal, J., Wright, S.J.: Numerical Optimization, 1st edn. Springer Verlag, New York (1999)[16] Pataki, G.: On the rank of extreme matrices in semidefinite programs and the multiplicity ofoptimal eigenvalues. Math. Oper. Res. (2), 339–358 (1998)[17] Pataki, G.: The geometry of semidefinite programming. In: H. Wolkowicz, R. Saigal, L. Vanden-berghe (eds.) Handbook of Semidefinite Programming: Theory, Algorithms, and Applications.Kluwer Academic Publishers (2000)[18] Robinson, S.M.: Stability theory for systems of inequalities, part II: Differentiable nonlinearsystems. SIAM J. Numer. Anal. (4), 497–513 (1976)[19] Schittkowski, K.: Test examples for nonlinear programming codes – all problems from the Hock-Schittkowski-collection. Tech. rep., Department of Computer Science, University of Bayreuth(2009)[20] Shapiro, A.: First and second order analysis of nonlinear semidefinite programs. Math. Program. (1), 301–320 (1997)[21] Sturm, J.F.: Similarity and other spectral relations for symmetric cones. Linear Algebra Appl. (1-3), 135–154 (2000)[22] Yamashita, H., Yabe, H.: A survey of numerical methods for nonlinear semidefinite program-ming. J. Oper. Res. Soc. Jpn.58