DC Semidefinite Programming and Cone Constrained DC Optimization
M.V. Dolgopolik∗†

February 3, 2021

∗ Institute for Problems in Mechanical Engineering, Russian Academy of Sciences, Saint Petersburg, Russia
† This work was performed in IPME RAS and supported by the Russian Science Foundation (Grant No. 20-71-10032).
Abstract
In the first part of this paper we discuss possible extensions of the main ideas and results of constrained DC optimization to the case of nonlinear semidefinite programming problems (i.e. problems with matrix constraints). To this end, we analyse two different approaches to the definition of DC matrix-valued functions (namely, order-theoretic and componentwise), study some properties of convex and DC matrix-valued functions and demonstrate how to compute DC decompositions of some nonlinear semidefinite constraints appearing in applications. We also compute a DC decomposition of the maximal eigenvalue of a DC matrix-valued function, which can be used to reformulate DC semidefinite constraints as DC inequality constraints.

In the second part of the paper, we develop a general theory of cone constrained DC optimization problems. Namely, we obtain local optimality conditions for such problems and study an extension of the DC algorithm (the convex-concave procedure) to the case of general cone constrained DC optimization problems. We analyse a global convergence of this method and present a detailed study of a version of the DCA utilising exact penalty functions. In particular, we provide two types of sufficient conditions for the convergence of this method to a feasible and critical point of a cone constrained DC optimization problem from an infeasible starting point.
1 Introduction

Starting with the pioneering works of Hiriart-Urruty [23, 24], Pham Dinh and Souad [50], Strekalovsky [55], Tuy [63], and many others in the 1980s, DC (Difference of Convex functions) programming has been one of the most active areas of research in nonlinear nonconvex optimization. One of the main features of DC optimization problems is the fact that one can derive constructive global optimality conditions [13, 25, 56, 66, 72] and develop deterministic global optimization methods [15, 26, 39, 57, 64, 65] for this class of problems. Local search methods for minimizing DC functions have also attracted considerable attention of researchers (see [17, 28, 58, 61] and the references therein).
One of the most well-known such methods is the DC Algorithm (DCA), originally presented by Pham Dinh and Souad in [50] and later on thoroughly investigated in the works of Le Thi and Pham Dinh et al. [35, 36, 38, 46, 47] (a particular version of the DCA is sometimes called the concave-convex/convex-concave procedure [34, 71]). Some closely related local search methods were studied in the works of de Oliveira et al. [8, 9, 67, 68]. For a detailed survey on DC programming, the DC Algorithm, and their applications see [37]. A comprehensive literature review of the DC algorithm, the convex-concave procedure, and other related optimization methods can be found in [40].

Cone constrained optimization is one of the central areas of constrained optimization, since it provides a unified setting for many different problems appearing in applications. Standard equality and inequality constrained problems, semidefinite programming problems [32, 54, 60], second order cone programming problems [2], semi-infinite programming problems [18, 49], and many other particular problems (see, e.g. [3, 5, 43]) can be formulated as general cone constrained optimization problems.

A detailed theoretical analysis of smooth and nonsmooth cone constrained optimization problems was presented in [4, 16, 30, 42, 62, 73]. Optimization methods for solving various convex cone constrained optimization problems can be found in [3, 5, 43], while algorithms for solving various classes of nonconvex cone constrained optimization problems were developed, e.g. in [6, 31, 32, 53, 54, 69, 70] (see also the references therein).

Despite the abundance of publications on cone constrained optimization and (usually inequality) constrained DC optimization problems, very little attention has been paid to extensions of the main results and methods of DC optimization to the case of problems with cone constraints. Even in the comprehensive survey paper [37], only unconstrained and inequality constrained DC optimization problems are discussed.

The convex-concave procedure and the penalty convex-concave procedure for solving cone constrained DC optimization problems were proposed by Lipp and Boyd in [40], where an application of these methods to multi-matrix principal component analysis was presented. However, to the best of the author's knowledge, a convergence analysis of these methods remains an open problem. An application of the DCA to bilinear and quadratic matrix inequality feasibility problems was considered by Niu and Dinh [44].

The main goal of this paper is to fill in the gap and extend some of the main results and algorithms of inequality constrained DC optimization to the case of DC optimization problems with DC cone constraints, particularly, DC semidefinite programming problems. To this end, in the first part of the paper we study two different approaches to the definition of DC matrix-valued functions: order-theoretic and componentwise. We obtain several useful properties of convex and DC matrix-valued functions, prove that any DC (in the order-theoretic sense) matrix-valued function is necessarily componentwise DC, and demonstrate how one can compute DC decompositions of several nonlinear matrix-valued functions appearing in applications. We also construct a DC decomposition of the maximal eigenvalue of a componentwise DC matrix-valued function. This result allows one to easily extend all results and methods of inequality constrained DC optimization to the case of DC optimization problems with componentwise DC semidefinite constraints.

The second part of the paper is devoted to abstract cone constrained DC optimization problems.
We obtain local optimality conditions for such problems in several different forms and present a detailed convergence analysis of the algorithms for solving cone constrained DC optimization problems proposed in [40], thus providing a theoretical foundation for applications of the methods from [40]. We prove a global convergence of the convex-concave procedure (CCP)/DC Algorithm from [40] to a critical point of the problem under consideration and present a comprehensive analysis of a penalized version of this method. We obtain sufficient conditions for the exactness of the penalty subproblem, establish a global convergence of the penalty CCP to generalized critical points, and provide two types of sufficient conditions for a convergence of the penalty CCP to a feasible and critical point of a cone constrained DC optimization problem from an infeasible starting point. We also discuss why the penalty CCP might be superior to the non-penalized version of this method in the case when a feasible starting point is known.

The paper is organized as follows. Order-theoretic and componentwise approaches to DC matrix-valued functions are studied in Section 2, while a DC structure of the maximal eigenvalue of a nonlinear matrix-valued function is discussed in Section 3. Section 4 is devoted to general cone constrained DC optimization problems. Local optimality conditions for such problems are obtained in Subsection 4.2. A convergence analysis of the DC algorithm (the convex-concave procedure) for cone constrained DC optimization problems proposed by Lipp and Boyd [40] is presented in Subsection 4.3, while a detailed convergence analysis of the penalty convex-concave procedure from [40] is given in Subsections 4.4–4.6. Finally, some auxiliary results on vector-valued convex mappings and convex multifunctions are collected in Subsection 4.1.
2 DC Matrix-Valued Functions

Denote by S^ℓ the space of all real symmetric matrices of order ℓ ∈ N, and let ⪯ be the Löwner partial order on S^ℓ, i.e. A ⪯ B for some matrices A, B ∈ S^ℓ iff the matrix B − A is positive semidefinite. Nonlinear semidefinite optimization is concerned with problems of optimizing functions subject to constraints of the form F(x) ⪯ 0, where F: R^d → S^ℓ is a given nonlinear mapping. To extend the main ideas and results of DC optimization to the case of nonlinear semidefinite programming problems, first one must introduce a suitable definition of a DC matrix-valued function F. There are two possible approaches to this definition: order-theoretic and componentwise. Let us discuss and compare these approaches.

Recall that the matrix-valued function F is called convex (see, e.g. [4, Sect. 5.3.2] and [5, Sect. 3.6.2]), if

    F(αx_1 + (1 − α)x_2) ⪯ αF(x_1) + (1 − α)F(x_2)  ∀x_1, x_2 ∈ R^d, α ∈ [0, 1].

Therefore it is natural to call the function F DC (Difference-of-Convex), if there exist convex functions G, H: R^d → S^ℓ such that F = G − H. Any such representation of the function F (or, equivalently, any such pair of functions (G, H)) is called a DC decomposition of F.

The definition of matrix-valued DC function given above has several disadvantages. Firstly, the convexity of matrix-valued functions is much harder to verify than the convexity of real-valued functions. Many matrix-valued functions that might seem to be convex judging by the experience with the real-valued case are, in actuality, nonconvex. In particular, the convexity of each component F_ij(·) of F is not sufficient to ensure the matrix convexity of F.

Example 1.
Let d = 1, ℓ = 2, and

    F(x) = [ x    x² ;  x²   x ].

Then for x_1 = 1 and x_2 = −1 one has

    αF(x_1) + (1 − α)F(x_2) − F(αx_1 + (1 − α)x_2) = [ 0   1 − (2α − 1)² ;  1 − (2α − 1)²   0 ].

This matrix is not positive semidefinite for any α ∈ (0, 1), i.e. F is nonconvex.
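As a quick sanity check of Example 1, the following minimal numpy sketch (assuming the reconstruction of F used above, with entries x on the diagonal and x² off the diagonal) evaluates the convexity gap along the segment between x_1 = 1 and x_2 = −1 and confirms that it has a negative eigenvalue:

    import numpy as np

    # Example 1: F(x) = [[x, x^2], [x^2, x]] is componentwise convex but not
    # matrix convex. We check whether the "convexity gap"
    #   D(a) = a*F(x1) + (1 - a)*F(x2) - F(a*x1 + (1 - a)*x2)
    # is positive semidefinite at x1 = 1, x2 = -1.
    def F(x):
        return np.array([[x, x**2], [x**2, x]])

    x1, x2 = 1.0, -1.0
    for a in (0.25, 0.5, 0.75):
        D = a * F(x1) + (1 - a) * F(x2) - F(a * x1 + (1 - a) * x2)
        print(a, np.linalg.eigvalsh(D))  # the smallest eigenvalue is negative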
Secondly, recall that the set S^ℓ equipped with the Löwner partial order is not a vector lattice, since by Kadison's theorem [29] the least upper bound (the supremum) of two matrices in the Löwner order exists iff these matrices are comparable. Therefore, many standard results and techniques from convex analysis do not admit a natural extension to the case of matrix convexity (cf. the general theory of convex vector-valued functions [33, 45, 59], in which the assumption on the completeness of the partial order is often indispensable). For example, in most cases the supremum of two convex matrix-valued functions is not correctly defined.

Nevertheless, there are some similarities between matrix-valued DC functions and their real-valued counterparts. In particular, one can construct a DC decomposition of a twice continuously differentiable matrix-valued function with bounded Hessians in the same way one can construct a DC decomposition of a twice continuously differentiable real-valued function.

Let I_ℓ be the identity matrix of order ℓ. Denote by |·| the Euclidean norm, by ⟨·,·⟩ the inner product in R^k, and by ‖A‖_F = (Tr(A²))^{1/2} the Frobenius norm of a matrix A ∈ S^ℓ.

Theorem 1. Let a function F: R^d → S^ℓ be twice continuously differentiable and suppose that there exists M > 0 such that ‖∇²F_ij(x)‖_F ≤ M for all x ∈ R^d and i, j ∈ {1, …, ℓ}. Then the function F is DC and for any µ ≥ ℓM the pair (G, H) with G(x) = F(x) + µ|x|²I_ℓ and H(x) = µ|x|²I_ℓ, x ∈ R^d, is a DC decomposition of the function F.

Proof. Observe that by the definitions of matrix convexity and the Löwner partial order a function G: R^d → S^ℓ is convex iff for any z ∈ R^ℓ one has

    ⟨z, (αG(x_1) + (1 − α)G(x_2) − G(αx_1 + (1 − α)x_2))z⟩ ≥ 0  ∀x_1, x_2 ∈ R^d, α ∈ [0, 1]

or, equivalently,

    ⟨z, G(αx_1 + (1 − α)x_2)z⟩ ≤ α⟨z, G(x_1)z⟩ + (1 − α)⟨z, G(x_2)z⟩.

Therefore, a function G: R^d → S^ℓ is convex iff for any z ∈ R^ℓ the real-valued function G_z(·) = ⟨z, G(·)z⟩ is convex. Consequently, in the case when G is twice continuously differentiable, this function is convex iff for any z the Hessian of the function G_z is positive semidefinite, i.e. for all x ∈ R^d and z ∈ R^ℓ the matrix

    ∇²G_z(x) = Σ_{i,j=1}^ℓ z_i z_j ∇²G_ij(x)

is positive semidefinite.

Let us now turn to the proof of the theorem. Define G(x) = F(x) + µ|x|²I_ℓ and H(x) = µ|x|²I_ℓ, x ∈ R^d, for some µ ≥ 0. Let us check that the functions G and H are convex, provided µ ≥ ℓM. Then one can conclude that F is a DC function and the pair (G, H) is a DC decomposition of F.

Indeed, for any x ∈ R^d, v ∈ R^d, and z ∈ R^ℓ one has

    ⟨v, ∇²G_z(x)v⟩ = Σ_{i,j=1}^ℓ z_i z_j ⟨v, ∇²F_ij(x)v⟩ + 2µ Σ_{i=1}^ℓ z_i² |v|².

Applying the Cauchy–Schwarz inequality, the inequality |z_i z_j| ≤ 0.5(z_i² + z_j²), and the fact that the Frobenius norm is compatible with the Euclidean norm one gets that

    ⟨v, ∇²G_z(x)v⟩ ≥ −|v|² Σ_{i,j=1}^ℓ ‖∇²F_ij(x)‖_F · 0.5(z_i² + z_j²) + 2µ Σ_{i=1}^ℓ z_i² |v|² = |v|² Σ_{i=1}^ℓ ( 2µ − Σ_{j=1}^ℓ ‖∇²F_ij(x)‖_F ) z_i².

Hence for any x ∈ R^d and µ ≥ ℓM one has

    ⟨v, ∇²G_z(x)v⟩ ≥ 0  ∀v ∈ R^d, z ∈ R^ℓ,

that is, the Hessian ∇²G_z(x) is positive semidefinite, which implies that the matrix-valued function G(x) = F(x) + µ|x|²I_ℓ is convex. The convexity of the function H can be readily verified directly.
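To illustrate Theorem 1 numerically, here is a small sketch on a toy mapping of our own choosing (not taken from the paper): the componentwise Hessians of F below have Frobenius norms bounded by M = 2, so with ℓ = 2 and µ = ℓM = 4 the function G(x) = F(x) + µ|x|²I_ℓ should be matrix convex, which random sampling confirms.

    import numpy as np

    # Toy map: F(x) = [[x1*x2, sin(x1)], [sin(x1), -x2^2]], x in R^2, l = 2.
    # The Hessians of the components have Frobenius norms at most M = 2,
    # so G(x) = F(x) + mu*|x|^2*I with mu = l*M = 4 is matrix convex.
    rng = np.random.default_rng(0)
    mu = 4.0

    def G(x):
        F = np.array([[x[0] * x[1], np.sin(x[0])],
                      [np.sin(x[0]), -x[1] ** 2]])
        return F + mu * (x @ x) * np.eye(2)

    worst = np.inf
    for _ in range(10_000):
        a, b = rng.normal(size=2), rng.normal(size=2)
        t = rng.uniform()
        gap = t * G(a) + (1 - t) * G(b) - G(t * a + (1 - t) * b)
        worst = min(worst, np.linalg.eigvalsh(gap).min())
    print(worst)  # stays >= 0 (up to roundoff) on all sampled segments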
The difficulties connected with the use of matrix convexity motivate us to consider a different approach to the definition of DC matrix-valued functions.

Definition 1. A function F: R^d → S^ℓ is called componentwise convex, if each component F_ij(·), i, j ∈ {1, …, ℓ}, is convex. The function F is called componentwise DC, if there exist componentwise convex functions G, H: R^d → S^ℓ such that F = G − H. Any such representation of the function F (or, equivalently, any such pair of functions (G, H)) is called a componentwise DC decomposition of F.

Many properties of real-valued DC functions can be easily extended to the case of componentwise DC matrix-valued functions. For example, a linear combination of componentwise DC functions is obviously componentwise DC. With the use of the well-known results of Hartman [21] one can easily see that the Hadamard and the Kronecker products of componentwise DC matrix-valued functions are componentwise DC. Furthermore, applying the representation of the inverse matrix via the adjugate matrix one can verify that if a function F: R^d → S^ℓ is componentwise DC and for all x ∈ R^d the matrix F(x) is invertible, then the inverse-matrix function F^{−1} mapping x to (F(x))^{−1} is also componentwise DC.

Let us point out some connections between convex/DC and componentwise convex/DC matrix-valued functions. As Example 1 demonstrates, componentwise convex matrix-valued functions need not be convex. On the other hand, from the fact that for any convex matrix-valued function F the real-valued function ⟨z, F(·)z⟩ is convex for all z ∈ R^ℓ it follows that all diagonal components F_ii(·) of a convex matrix-valued function F must be convex (put z = e_i for every vector e_i from the canonical basis of R^ℓ). However, non-diagonal components of F need not be convex.

Example 2.
Let d = 1, ℓ = 2, and

    F(x) = [ 0.5x²   sin x ;  sin x   0.5x² ].

Then for all z ∈ R² and x ∈ R one has

    Σ_{i,j=1}^2 z_i z_j F_ij″(x) = z_1² − 2(sin x) z_1 z_2 + z_2² ≥ z_1² − 2|z_1||z_2| + z_2² = (|z_1| − |z_2|)² ≥ 0.

Consequently, the function F is convex by [4, Proposition 5.72, part (ii)], despite the fact that the non-diagonal elements of F are nonconvex.
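A direct numerical check of Example 2 (under the reconstruction F(x) = (0.5x², sin x; sin x, 0.5x²) used above): sampling random segments never produces a convexity gap with a negative eigenvalue, even though the off-diagonal entry sin x is nonconvex.

    import numpy as np

    # F(x) = [[x^2/2, sin x], [sin x, x^2/2]] is matrix convex although its
    # off-diagonal component sin(x) is not a convex function of x.
    def F(x):
        return np.array([[0.5 * x**2, np.sin(x)], [np.sin(x), 0.5 * x**2]])

    rng = np.random.default_rng(1)
    worst = np.inf
    for _ in range(10_000):
        a, b = rng.normal(scale=3.0, size=2)
        t = rng.uniform()
        gap = t * F(a) + (1 - t) * F(b) - F(t * a + (1 - t) * b)
        worst = min(worst, np.linalg.eigvalsh(gap).min())
    print(worst)  # nonnegative up to roundoff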
Although the non-diagonal elements of a convex matrix-valued function F might be nonconvex, they cannot be too 'wild', e.g. discontinuous. Namely, the following result holds true.

Theorem 2. Let a function F: R^d → S^ℓ be convex. Then for all i, j ∈ {1, …, ℓ}, i ≠ j, the function F_ij is DC and, therefore, Lipschitz continuous on any bounded set and twice differentiable almost everywhere.

Proof. We prove the theorem by induction in ℓ. The case ℓ = 1 is trivial. Let us prove the case ℓ = 2 in order to highlight the main idea of the proof.

As was noted above, the function ⟨z, F(·)z⟩ is convex for all z ∈ R^ℓ, which, in particular, implies that the functions F_11(·) and F_22(·) are convex. For the vector z = (1, 1)^T one obtains that the function

    F_z(x) = ⟨z, F(x)z⟩ = F_11(x) + 2F_12(x) + F_22(x),  x ∈ R^d,

is convex as well. Therefore the function

    F_12(x) = F_21(x) = (1/2)F_z(x) − (1/2)(F_11(x) + F_22(x))

is DC, which completes the proof of the case ℓ = 2.

Inductive step. Suppose that the theorem is valid for some ℓ ∈ N. Let us prove it for ℓ + 1. The function F_z(·) = ⟨z, F(·)z⟩ is convex for all z ∈ R^{ℓ+1}. Putting z = (z_1, …, z_ℓ, 0)^T and z = (0, z_2, …, z_{ℓ+1})^T for any z_i ∈ R, i ∈ {1, …, ℓ+1}, one obtains that the matrix-valued functions

    G(x) = [ F_11(x) … F_1ℓ(x) ; ⋮ ⋱ ⋮ ; F_ℓ1(x) … F_ℓℓ(x) ],  H(x) = [ F_22(x) … F_2(ℓ+1)(x) ; ⋮ ⋱ ⋮ ; F_(ℓ+1)2(x) … F_(ℓ+1)(ℓ+1)(x) ]

are convex. Therefore by the induction hypothesis all functions F_ij, i, j ∈ {1, …, ℓ+1}, are DC, except for F_1(ℓ+1) (or, equivalently, F_(ℓ+1)1, since F(x) is by definition a symmetric matrix). For z = (1, …, 1)^T one gets that the function

    F_z(x) = Σ_{i,j=1}^{ℓ+1} F_ij(x),  x ∈ R^d,

is convex, which obviously implies that the function F_1(ℓ+1) is DC.

As simple corollaries to the previous theorem we obtain straightforward extensions of some well-known results for real-valued convex functions to the matrix-valued case.
Corollary 1. Let a function F: R^d → S^ℓ be convex. Then F is Lipschitz continuous on bounded sets, i.e. for any bounded set K ⊂ R^d there exists L > 0 such that ‖F(x_1) − F(x_2)‖_F ≤ L|x_1 − x_2| for all x_1, x_2 ∈ K.

Corollary 2 (Aleksandrov–Busemann–Feller theorem for matrix-valued functions). Let a function F: R^d → S^ℓ be convex. Then F is twice differentiable almost everywhere.

Remark 1. Note that the statement of Theorem 2 is obviously true for locally convex (i.e. convex in a neighbourhood of every point) matrix-valued functions defined on not necessarily convex sets. Therefore, the previous corollary remains true in this case as well. Namely, every locally convex function F: U → S^ℓ defined on an open set U ⊆ R^d is twice differentiable almost everywhere on U.

Since the difference of two real-valued DC functions is a DC function, Theorem 2 also allows one to point out a direct connection between DC and componentwise DC functions.
Corollary 3. Any DC function F: R^d → S^ℓ is componentwise DC.

Since the definition of a DC function provides a lot of flexibility (namely, there are infinitely many DC decompositions of a given function), it seems reasonable to assume that despite some drawbacks of matrix convexity the class of matrix-valued DC functions is sufficiently rich. In particular, one might ask whether the class of matrix-valued DC functions coincides with the class of componentwise DC functions or there are some componentwise DC functions that are not DC (a characterization of such functions would provide a deep insight into the structure of DC matrix-valued functions). Another interesting question is whether the matrix DC property is preserved under standard operations, such as the Hadamard/Kronecker product and inversion. Arguing in the same way as in the proof of Theorem 1 one can easily check that for twice continuously differentiable matrix-valued functions the answer to this question is positive, provided one considers locally DC functions.
However, it is unclear whether the classes of locally and globally DC functions coincide in the matrix-valued case (for componentwise DC functions this statement is obviously true due to the celebrated result of Hartman [21]).

At the end of this section, let us present several simple examples of DC semidefinite constraints appearing in applications and their DC decompositions. These examples, in particular, demonstrate some benefits of using matrix-valued DC functions in comparison with componentwise DC functions.

Example 3 (Quadratic/Bilinear Constraints). Suppose that

    F(x) = C + Σ_{i=1}^d x_i B_i + Σ_{i,j=1}^d x_i x_j A_ij   (1)

for some matrices C, B_i, A_ij ∈ S^ℓ. In particular, one can suppose that the function F(x) is bilinear/biaffine, that is,

    F(x, y) = A_00 + Σ_{i=1}^d x_i A_i0 + Σ_{j=1}^m y_j A_0j + Σ_{i=1}^d Σ_{j=1}^m x_i y_j A_ij  ∀x ∈ R^d, y ∈ R^m

for some matrices A_ij ∈ S^ℓ. Such nonlinear matrix constraints appear in problems of simultaneous stabilisation of single-input single-output linear systems by one fixed controller of a given order [22, 54], robust gain-scheduling and some decentralized control problems [19, 20], problems of maximizing the minimal eigenfrequency of a given structure [54], etc.

By Theorem 1 the function F of the form (1) is DC and for any µ ≥ ℓM, where

    M = 2 max_{s,k∈{1,…,ℓ}} ( Σ_{i,j=1}^d [A_ij]_{sk}² )^{1/2},

the pair

    G(x) = C + Σ_{i=1}^d x_i B_i + Σ_{i,j=1}^d x_i x_j A_ij + µ|x|² I_ℓ,  H(x) = µ|x|² I_ℓ

is a DC decomposition of F. Note that to compute a componentwise DC decomposition of the function F one would have to compute DC decompositions of ℓ² quadratic functions of the form

    Σ_{i,j=1}^d [A_ij]_{sk} x_i x_j,  s, k ∈ {1, …, ℓ}.

Moreover, in the general case the function H (the concave part) from a componentwise DC decomposition of F would not be diagonal.

It should be noted that a different DC decomposition of the function F can be constructed. Namely, as was shown in [4, Example 5.74], a matrix-valued function F of the form (1) is convex, if the ℓd × ℓd block matrix A = (A_ij)_{i,j=1}^d is positive semidefinite. Therefore, if a decomposition A = A_+ + A_− of the matrix A into positive semidefinite and negative semidefinite parts is known, one can define

    G(x) = C + Σ_{i=1}^d x_i B_i + Σ_{i,j=1}^d x_i x_j (A_+)_ij,  H(x) = −Σ_{i,j=1}^d x_i x_j (A_−)_ij.

Such a DC decomposition can be used if the block matrix A has a relatively simple structure, e.g. when only the diagonal blocks A_ii are nonzero.

Example 4 (Bilinear/Biaffine Matrix Constraints). Let

    R(X_1, X_2, X_3) = [ X_1   (A + B X_2 C) X_3 ;  X_3 (A + B X_2 C)^T   X_3 ] ⪯ 0,

where X_1, X_3 ∈ S^ℓ, X_2 ∈ R^{m×m}, and A ∈ R^{ℓ×ℓ}, B ∈ R^{ℓ×m}, and C ∈ R^{m×ℓ} are some given matrices. Nonlinear semidefinite constraints involving such functions R (or similar ones) appear, e.g. in optimal H_2/H_∞ static output feedback problems [54].

To apply the results presented in this section to the function R, define d = 0.5ℓ(ℓ+1) + m² + 0.5ℓ(ℓ+1) (here we used the fact that a matrix X ∈ S^ℓ is defined by ℓ(ℓ+1)/2 elements), for any x ∈ R^d let (X_1, X_2, X_3) be the corresponding triplet of matrices from S^ℓ × R^{m×m} × S^ℓ, and let F(x) = R(X_1, X_2, X_3). By Theorem 1 the function F is DC and for any µ ≥ 2ℓM, where

    M = max_{i∈{1,…,ℓ}} ( Σ_{k_1=1}^m Σ_{k_2=1}^m Σ_{k_3=1}^ℓ (B_{i k_1} C_{k_2 k_3})² )^{1/2},

the pair

    G(x) = F(x) + µ(‖X_2‖_F² + ‖X_3‖_F²) I_{2ℓ},  H(x) = µ(‖X_2‖_F² + ‖X_3‖_F²) I_{2ℓ}

is a DC decomposition of F.
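The following sketch makes the decomposition from Example 3 concrete for randomly generated data (an illustration of our own, not code from the paper): it assembles the quadratic map (1), bounds the Frobenius norms of the componentwise Hessians directly from the blocks A_ij, and forms the pair (G, H) from Theorem 1.

    import numpy as np

    # F(x) = C + sum_i x_i B_i + sum_{i,j} x_i x_j A_ij with random data.
    # The Hessian of component (s, k) is the d x d matrix with entries
    # [A_ij]_sk + [A_ji]_sk, so M below bounds its Frobenius norm.
    rng = np.random.default_rng(2)
    d, l = 3, 2
    sym = lambda S: 0.5 * (S + S.T)
    C = sym(rng.normal(size=(l, l)))
    B = [sym(rng.normal(size=(l, l))) for _ in range(d)]
    A = np.array([[sym(rng.normal(size=(l, l))) for _ in range(d)]
                  for _ in range(d)])          # shape (d, d, l, l)

    M = max(np.linalg.norm(A[:, :, s, k] + A[:, :, s, k].T, 'fro')
            for s in range(l) for k in range(l))
    mu = l * M

    def F(x):
        return C + sum(x[i] * B[i] for i in range(d)) + sum(
            x[i] * x[j] * A[i, j] for i in range(d) for j in range(d))

    def G(x):  # convex part; H(x) = mu*|x|^2*I is the concave part
        return F(x) + mu * (x @ x) * np.eye(l)

    # spot check of the matrix convexity of G along a random segment
    a, b, t = rng.normal(size=d), rng.normal(size=d), 0.5
    print(np.linalg.eigvalsh(t * G(a) + (1 - t) * G(b) - G(t * a + (1 - t) * b)).min())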
Example 5 (The Stiefel Manifold/Orthogonality Constraint). Let d = mℓ for some m ∈ N, i.e. x is a real matrix of order m × ℓ, which we denote by X. Consider the equality constraint

    X^T X = I_ℓ,   (2)

which is known as the Stiefel manifold or orthogonality constraint appearing in many applications [1, 14, 40, 41]. Following Lipp and Boyd [40], we rewrite equality constraint (2) as two matrix inequality constraints:

    G(X) = X^T X − I_ℓ ⪯ 0,  H(X) = I_ℓ − X^T X ⪯ 0.

Let, as above, G_z(X) = ⟨z, G(X)z⟩. Observe that for any X_1, X_2 ∈ R^{m×ℓ} and α ∈ [0, 1] one has

    αG_z(X_1) + (1 − α)G_z(X_2) − G_z(αX_1 + (1 − α)X_2)
    = (α − α²)⟨z, X_1^T X_1 z⟩ + ((1 − α) − (1 − α)²)⟨z, X_2^T X_2 z⟩ − α(1 − α)⟨z, (X_1^T X_2 + X_2^T X_1)z⟩
    = α(1 − α)( |X_1 z|² + |X_2 z|² − 2⟨X_1 z, X_2 z⟩ ) = α(1 − α)|X_1 z − X_2 z|² ≥ 0.

Consequently, the function G_z is convex for any z ∈ R^ℓ, which implies that the functions G and −H are matrix convex. Thus, equality constraint (2) can be rewritten as two DC semidefinite constraints. It should be noted that although this transformation is degenerate (we rewrite an equality constraint as two inequality constraints), numerical experiments reported in [40] demonstrate the effectiveness of an optimization method based on such a transformation.
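A two-line numerical confirmation of the identity derived in Example 5 (toy dimensions chosen here purely for illustration):

    import numpy as np

    # For G(X) = X^T X - I and G_z = <z, G(.)z>:
    #   a*G_z(X1) + (1-a)*G_z(X2) - G_z(a*X1 + (1-a)*X2) = a*(1-a)*|X1@z - X2@z|^2.
    rng = np.random.default_rng(3)
    m, l = 4, 3
    X1, X2 = rng.normal(size=(m, l)), rng.normal(size=(m, l))
    z, a = rng.normal(size=l), 0.3

    Gz = lambda X: z @ (X.T @ X - np.eye(l)) @ z
    lhs = a * Gz(X1) + (1 - a) * Gz(X2) - Gz(a * X1 + (1 - a) * X2)
    rhs = a * (1 - a) * np.linalg.norm(X1 @ z - X2 @ z) ** 2
    print(np.isclose(lhs, rhs))  # True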
3 DC Structure of the Maximal Eigenvalue Function

Since there is no obvious connection between componentwise convexity and the Löwner partial order/matrix convexity, componentwise DC matrix-valued functions cannot be utilised directly in the abstract setting of nonlinear semidefinite programming problems. Instead, it is natural to apply the componentwise DC property to a reformulation of such problems in which the semidefinite constraint F(x) ⪯ 0 is replaced by the equivalent inequality constraint λ_max(F(x)) ≤ 0, where λ_max(A) is the maximal eigenvalue of a symmetric matrix A.

Our aim is to show that for componentwise DC functions F the inequality constraint λ_max(F(x)) ≤ 0 is a DC constraint by explicitly computing a DC decomposition of the function λ_max(F(·)), if a componentwise DC decomposition of the function F is known. With the use of this result one can easily extend standard results and algorithms from the theory of DC constrained DC optimization problems to the case of DC semidefinite optimization problems.
Theorem 3. Let F: R^d → S^ℓ be a componentwise DC function and F_ij = G_ij − H_ij be a DC decomposition of each component of F, i, j ∈ {1, …, ℓ}. Then the function λ_max(F(·)) is DC and the pair (g, h) with

    g(x) = max_{|v|≤1} Σ_{i,j=1}^ℓ ( (v_i v_j + 1)G_ij(x) + (1 − v_i v_j)H_ij(x) ),
    h(x) = Σ_{i,j=1}^ℓ ( G_ij(x) + H_ij(x) )   (3)

for all x ∈ R^d is a DC decomposition of the function λ_max(F(·)).

Proof. Fix any x ∈ R^d. As is well-known and easy to check, the following equality holds true:

    λ_max(F(x)) = max_{|v|≤1} ⟨v, F(x)v⟩ = max_{|v|≤1} Σ_{i,j=1}^ℓ v_i v_j F_ij(x).

Adding and subtracting G_ij(x) + H_ij(x) for all i, j ∈ {1, …, ℓ} and taking into account the equality F_ij(x) = G_ij(x) − H_ij(x) one obtains that

    λ_max(F(x)) = max_{|v|≤1} Σ_{i,j=1}^ℓ ( (v_i v_j + 1)G_ij(x) + (1 − v_i v_j)H_ij(x) ) − Σ_{i,j=1}^ℓ ( G_ij(x) + H_ij(x) ) =: g(x) − h(x).

The function h is obviously convex as the sum of convex functions. Moreover, note that v_i v_j + 1 ≥ 0 and 1 − v_i v_j ≥ 0, provided |v| ≤ 1. Therefore, the function g is also convex as the maximum of the family of convex functions

    Σ_{i,j=1}^ℓ ( (v_i v_j + 1)G_ij(x) + (1 − v_i v_j)H_ij(x) ),  |v| ≤ 1.

Thus, the function λ_max(F(·)) is DC and the pair (g, h) defined in (3) is a DC decomposition of this function.

Remark 2. Let us make an almost trivial, yet useful observation. By definition g(x) = λ_max(F(x)) + h(x). Therefore, there is no need to directly compute the maximum in the definition of g in order to compute g(x). One simply has to find the maximal eigenvalue of the matrix F(x) and then add h(x).
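The next sketch evaluates the DC decomposition (3) on toy componentwise DC data (our own choice of G_ij and H_ij), using the shortcut from Remark 2: g(x) is computed as λ_max(F(x)) + h(x) rather than by the inner maximization.

    import numpy as np

    # Componentwise DC data: F_ij = G_ij - H_ij with
    #   G_ij(x) = c_ij * |x|^2,  H_ij(x) = e_ij * |x|^2,  c, e symmetric >= 0.
    rng = np.random.default_rng(4)
    l, d = 3, 2
    c = np.abs(rng.normal(size=(l, l))); c = 0.5 * (c + c.T)
    e = np.abs(rng.normal(size=(l, l))); e = 0.5 * (e + e.T)

    def F(x):
        return (c - e) * (x @ x)

    def h(x):  # h = sum of G_ij + H_ij, convex
        return float((c + e).sum() * (x @ x))

    def g(x):  # g = lambda_max(F) + h, convex by Theorem 3 (Remark 2)
        return float(np.linalg.eigvalsh(F(x)).max() + h(x))

    x = rng.normal(size=d)
    print(g(x) - h(x), np.linalg.eigvalsh(F(x)).max())  # equal by construction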
For the sake of completeness, let us point out explicit formulae for the subdifferentials of the convex functions g and h from the theorem above. To this end, for any matrix A ∈ S^ℓ denote by E_max(A) the eigenspace of λ_max(A), i.e. the union of zero and all eigenvectors of A corresponding to its maximal eigenvalue.

Proposition 1. Let F: R^d → S^ℓ be a componentwise DC function, F_ij = G_ij − H_ij be a DC decomposition of each component of F, i, j ∈ {1, …, ℓ}, and the functions g and h be defined as in (3). Then for any x ∈ R^d one has

    ∂g(x) = co{ Σ_{i,j=1}^ℓ ( (v_i v_j + 1)∂G_ij(x) + (1 − v_i v_j)∂H_ij(x) ) | v ∈ E_max(F(x)), |v| = 1 }

and ∂h(x) = Σ_{i,j=1}^ℓ ( ∂G_ij(x) + ∂H_ij(x) ), where 'co' stands for the convex hull.

Proof. The expression for ∂h(x) follows directly from the standard rules of subdifferential calculus. Let us prove the equality for ∂g(x).

Indeed, fix any x ∈ R^d and denote by V(x) the set of all those v ∈ R^ℓ with |v| ≤ 1 at which the maximum in the definition of g(x) is attained. Clearly, V(x) is a compact set. With the use of the theorem on the subdifferential of the supremum of an infinite family of convex functions (see, e.g. [27, Theorem 4.2.3]) one obtains that

    ∂g(x) = co{ Σ_{i,j=1}^ℓ ( (v_i v_j + 1)∂G_ij(x) + (1 − v_i v_j)∂H_ij(x) ) | v ∈ V(x) }.

Note that this convex hull is closed as the convex hull of a compact set. Therefore, it remains to show that v ∈ V(x) iff v ∈ E_max(F(x)) and |v| = 1.

Observe that for any v ∈ R^ℓ one has

    Σ_{i,j=1}^ℓ ( (v_i v_j + 1)G_ij(x) + (1 − v_i v_j)H_ij(x) ) = Σ_{i,j=1}^ℓ v_i v_j ( G_ij(x) − H_ij(x) ) + h(x) = ⟨v, F(x)v⟩ + h(x).

Therefore, the maximum over all v ∈ R^ℓ with |v| ≤ 1 in the definition of g(x) is attained at exactly the same v as the maximum of ⟨v, F(x)v⟩ over all v ∈ R^ℓ with |v| ≤ 1 (which is equal to λ_max(F(x))). Consequently, one has

    V(x) = { v ∈ R^ℓ | |v| ≤ 1, ⟨v, F(x)v⟩ = λ_max(F(x)) }.

With the use of the spectral decomposition of the matrix F(x) one can easily verify that λ_max(F(x)) = ⟨v, F(x)v⟩ for some |v| ≤ 1 iff |v| = 1 and v is an eigenvector of the matrix F(x) corresponding to its maximal eigenvalue (i.e. v ∈ E_max(F(x))), which implies the required result.

Thus, if an eigenvector v with |v| = 1 of the matrix F(x) corresponding to the maximal eigenvalue λ_max(F(x)) is computed, one can easily compute subgradients of the DC components of the function λ_max(F(·)) at the point x with the use of subgradients of the functions G_ij and H_ij.

Remark 3. Let us note once again that one can rewrite the nonlinear semidefinite programming problem

    minimize f(x) subject to F(x) ⪯ 0, x ∈ A,

where A is a convex set, as the following equivalent inequality constrained problem:

    minimize f(x) subject to λ_max(F(x)) ≤ 0, x ∈ A.   (4)

In the case when the function f is DC and the function F is componentwise DC, one can easily extend all existing results and algorithms for inequality constrained DC optimization problems to the case of problem (4) with the use of Theorem 3 and Proposition 1. For the sake of shortness, we leave the tedious task of explicitly reformulating existing results and algorithms in terms of problem (4) to the interested reader.

4 Cone Constrained DC Optimization

In the previous section we pointed out how methods and results of DC optimization can be applied to nonlinear semidefinite optimization problems with componentwise DC constraints. Let us now show how one can extend standard results from DC optimization to the case when the semidefinite constraint is DC in the order-theoretic sense. Since such an extension does not rely on any particular properties of semidefinite problems (i.e. any properties of matrix-valued functions, the Löwner partial order, etc.) or the finite dimensional nature of the problem, following Lipp and Boyd [40] we study optimality conditions and minimization methods for DC semidefinite programming problems in the more general setting of DC cone constrained problems of the form

    minimize f(x) = g(x) − h(x) subject to F(x) = G(x) − H(x) ⪯_K 0, x ∈ A.   (P)

Here g, h are real-valued closed convex functions defined on R^d, K is a proper cone in a real Banach space Y (i.e. K is a closed convex cone such that K ∩ (−K) = {0}), ⪯_K is the partial order induced by the cone K, i.e. x ⪯_K y iff y − x ∈ K, the functions G, H: R^d → Y are convex with respect to the cone K (or K-convex), that is,

    G(αx_1 + (1 − α)x_2) ⪯_K αG(x_1) + (1 − α)G(x_2)  ∀α ∈ [0, 1], x_1, x_2 ∈ R^d,

and the same inequality holds for H, and, finally, A ⊆ R^d is a closed convex set. Note that the constraint F(x) ⪯_K 0 can be rewritten as F(x) ∈ −K.

Thus, the problem (P) is a cone constrained DC optimization problem that consists in minimizing the DC objective function f subject to the generalized inequality (or cone) constraint that is DC with respect to the cone K. In the case when Y = S^ℓ and K is the cone of positive semidefinite matrices, the problem (P) becomes a standard nonlinear semidefinite programming problem.

4.1 Some Properties of Convex Mappings

Before we proceed to the study of cone constrained DC optimization problems, let us first present two well-known auxiliary results on convex mappings and convex multifunctions, whose formulations are tailored to our specific setting. For the sake of completeness, we provide detailed proofs of these results. We start with the following well-known characterisation of K-convex functions in terms of their derivatives.
Lemma 1. A Gâteaux differentiable function Φ: R^d → Y is K-convex iff

    Φ(x_1) − Φ(x_2) ⪰_K DΦ(x_2)(x_1 − x_2)  ∀x_1, x_2 ∈ R^d,   (5)

where DΦ(x) is the Gâteaux derivative of Φ at x.

Proof. Let Φ be convex. Then by definition

    αΦ(x_1) + (1 − α)Φ(x_2) − Φ(αx_1 + (1 − α)x_2) ∈ K  ∀α ∈ [0, 1], x_1, x_2 ∈ R^d.

Since K is a cone, for any α ∈ (0, 1] one has

    Φ(x_1) − Φ(x_2) − (1/α)( Φ(x_2 + α(x_1 − x_2)) − Φ(x_2) ) ∈ K.

Passing to the limit as α → +0 and taking into account the fact that the cone K is closed one obtains that

    Φ(x_1) − Φ(x_2) − DΦ(x_2)(x_1 − x_2) ∈ K  ∀x_1, x_2 ∈ R^d

or, equivalently, condition (5) holds true.

Conversely, if condition (5) holds true, then for all x_1, x_2 ∈ R^d and for any α ∈ [0, 1] one has

    Φ(x_1) − Φ(x(α)) − (1 − α)DΦ(x(α))(x_1 − x_2) ∈ K,  Φ(x_2) − Φ(x(α)) + αDΦ(x(α))(x_1 − x_2) ∈ K,

where x(α) = αx_1 + (1 − α)x_2. Multiplying the first expression by α and the second expression by 1 − α, and bearing in mind the fact that a convex cone is closed under addition one obtains that

    αΦ(x_1) + (1 − α)Φ(x_2) − Φ(x(α)) ∈ K  ∀α ∈ [0, 1], x_1, x_2 ∈ R^d,

that is, Φ is K-convex.
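As a concrete instance of Lemma 1, take Y = S^ℓ, K the positive semidefinite cone, and the K-convex map Φ(X) = XᵀX from Example 5; its derivative is DΦ(X)[V] = XᵀV + VᵀX, and inequality (5) reduces to the statement that (X_1 − X_2)ᵀ(X_1 − X_2) is positive semidefinite, as the sketch below confirms.

    import numpy as np

    # Lemma 1 for the PSD cone and Phi(X) = X^T X:
    # Phi(X1) - Phi(X2) - D Phi(X2)[X1 - X2] = (X1 - X2)^T (X1 - X2) >= 0.
    rng = np.random.default_rng(5)
    m, l = 4, 3
    X1, X2 = rng.normal(size=(m, l)), rng.normal(size=(m, l))

    Phi = lambda X: X.T @ X
    V = X1 - X2
    gap = Phi(X1) - Phi(X2) - (X2.T @ V + V.T @ X2)
    print(np.linalg.eigvalsh(gap).min() >= -1e-12)  # True: gap is PSD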
Let us also present a lemma on solutions of perturbed convex generalized equations, based on some well-known results on metric regularity of convex multifunctions (see, e.g. [51]). For any metric space (X, d) and all x ∈ X denote B(x, r) = {x′ ∈ X | d(x′, x) ≤ r}. If X is a normed space, then B_X = B(0, 1).

Lemma 2. Let X be a real Banach space, Z be a metric space, and M_z: X ⇉ Y, z ∈ Z, be a family of closed convex multifunctions. Suppose that for some z* ∈ Z and x_0 ∈ X one has 0 ∈ int M_{z*}(X) and 0 ∈ M_{z*}(x_0). Suppose also that the function z ↦ dist(0, M_z(x_0)) is continuous at z* and for any ε > 0 there exists δ > 0 such that

    M_{z*}(x_0 + B_X) ⊆ M_z(x_0 + B_X) + εB_Y  ∀z ∈ B(z*, δ).

Then there exist a neighbourhood U of z* and a mapping ξ: U → X such that 0 ∈ M_z(ξ(z)) for all z ∈ U, ξ(z*) = x_0, and ξ(z) → x_0 as z → z*.

Proof. Since 0 ∈ int M_{z*}(X) and 0 ∈ M_{z*}(x_0), by [51, Thrm. 1] there exists η > 0 such that ηB_Y ⊆ M_{z*}(x_0 + B_X). By our assumption there exists δ > 0 such that

    ηB_Y ⊆ M_{z*}(x_0 + B_X) ⊆ M_z(x_0 + B_X) + (η/2)B_Y  ∀z ∈ B(z*, δ),

which with the use of [51, Lemma 2] implies that

    (η/2)B_Y ⊆ M_z(x_0 + B_X)  ∀z ∈ B(z*, δ).

Therefore by [51, Thrm. 2] for all x ∈ X and z ∈ B(z*, δ) one has

    dist(x, M_z^{−1}(0)) ≤ (2/η)(1 + ‖x − x_0‖) dist(0, M_z(x)).

Putting x = x_0 one obtains that for any z ∈ B(z*, δ) there exists ξ(z) ∈ M_z^{−1}(0) such that ‖x_0 − ξ(z)‖ ≤ (4/η) dist(0, M_z(x_0)). Note that one can put ξ(z*) = x_0, since 0 ∈ M_{z*}(x_0). Moreover, from the fact that the function z ↦ dist(0, M_z(x_0)) is continuous at z* it follows that ξ(z) → x_0 as z → z*, which completes the proof.

Remark 4. Roughly speaking, the previous lemma states that if 0 ∈ int M_{z*}(X) and 0 ∈ M_{z*}(x_0), then under certain semicontinuity assumptions for any z in a neighbourhood of z* there exists a solution ξ(z) of the generalized equation 0 ∈ M_z(x), continuously depending on z and such that ξ(z*) = x_0.
Corollary 4. Let X be a Banach space, A ⊆ X be a closed convex set, K ⊆ Y be a proper cone, and Φ, Ψ: X → Y be K-convex functions. Suppose that Φ is continuous on A, Ψ is continuously Fréchet differentiable on A, and the following constraint qualification holds true:

    0 ∈ int{ Φ(x) − Ψ(x*) − DΨ(x*)(x − x*) + K | x ∈ A }   (6)

for some x* ∈ A such that Φ(x*) − Ψ(x*) ⪯_K 0. Then for any x_0 ∈ A such that Φ(x_0) − Ψ(x*) − DΨ(x*)(x_0 − x*) ⪯_K 0 there exist a neighbourhood U of x* and a mapping ξ: U ∩ A → A such that

    Φ(ξ(z)) − Ψ(z) − DΨ(z)(ξ(z) − z) ⪯_K 0  ∀z ∈ U ∩ A,

ξ(x*) = x_0, and ξ(z) → x_0 as z → x*.

Proof. For any z ∈ X introduce the convex function Φ_z: X → Y defined as Φ_z(x) = Φ(x) − Ψ(z) − DΨ(z)(x − z) and the set-valued mapping

    M_z(x) = Φ_z(x) + K, if x ∈ A;  M_z(x) = ∅, if x ∉ A.   (7)

The multifunction M_z is closed due to the facts that the function Φ_z(·) is continuous and the sets A and K are closed. Moreover, this multifunction is convex. Indeed, by the convexity of Φ for any x_1, x_2 ∈ A and all α ∈ [0, 1] one has

    αΦ_z(x_1) + (1 − α)Φ_z(x_2) ∈ Φ_z(αx_1 + (1 − α)x_2) + K,

which by the convexity of the cone K implies that

    αM_z(x_1) + (1 − α)M_z(x_2) ⊆ Φ_z(αx_1 + (1 − α)x_2) + K + αK + (1 − α)K ⊆ M_z(αx_1 + (1 − α)x_2)

for all x_1, x_2 ∈ A and α ∈ [0, 1], i.e. the multifunction M_z is convex.

Our aim is to apply Lemma 2 with Z = A and z* = x*. Indeed, by definition 0 ∈ M_{z*}(x_0), while condition (6) implies that 0 ∈ int M_{z*}(X).

From the fact that Ψ is continuously Fréchet differentiable on A it follows that for any ε > 0 there exists δ ∈ (0, min{1, ε/(3(‖DΨ(z*)‖ + 1))}) such that

    ‖Ψ(z) − Ψ(z*)‖ < ε/3,  ‖DΨ(z) − DΨ(z*)‖ < ε/(3(‖x_0‖ + ‖z*‖ + 2))

for all z ∈ B(z*, δ) ∩ A. Choose any y ∈ M_{z*}(x_0 + B_X). By definition there exist x ∈ (x_0 + B_X) ∩ A and w ∈ K such that y = Φ_{z*}(x) + w. Observe that

    ‖Φ_z(x) + w − y‖ = ‖Φ_z(x) − Φ_{z*}(x)‖ ≤ ‖Ψ(z) − Ψ(z*)‖ + ‖DΨ(z) − DΨ(z*)‖‖x − z‖ + ‖DΨ(z*)‖‖z − z*‖ < ε

for all z ∈ B(z*, δ) ∩ A, which implies that

    M_{z*}(x_0 + B_X) ⊆ M_z(x_0 + B_X) + εB_Y  ∀z ∈ B(z*, δ) ∩ A.

Thus, it remains to show that the restriction of the function z ↦ dist(0, M_z(x_0)) to A is continuous. By definition dist(0, M_z(x_0)) = dist(Φ_z(x_0), −K) (see (7)). With the use of the fact that Ψ is continuously Fréchet differentiable one obtains that for any ε > 0 there exists r ∈ (0, min{1, ε/(3(‖DΨ(z*)‖ + 1))}) such that

    ‖Ψ(z) − Ψ(z*)‖ < ε/3,  ‖DΨ(z) − DΨ(z*)‖ < ε/(3(‖x_0‖ + ‖z*‖ + 1))

for all z ∈ B(z*, r) ∩ A. Therefore for any such z one has

    ‖Φ_z(x_0) − Φ_{z*}(x_0)‖ ≤ ‖Ψ(z) − Ψ(z*)‖ + ‖DΨ(z) − DΨ(z*)‖‖x_0 − z‖ + ‖DΨ(z*)‖‖z − z*‖ < ε,

which implies that for any z ∈ B(x*, r) ∩ A the following inequality holds true:

    dist(0, M_z(x_0)) = dist(Φ_z(x_0), −K) ≤ ‖Φ_z(x_0) − Φ_{x*}(x_0)‖ < ε

(here we used the fact that Φ_{z*}(x_0) ∈ −K). Thus, all assumptions of Lemma 2 with Z = A and z* = x* are valid, and by this lemma there exists a required mapping ξ(z).

4.2 Local Optimality Conditions

Let us extend well-known local optimality conditions for constrained DC optimization problems to the case of the problem (P). To the best of the author's knowledge, standard subdifferential calculus cannot be extended to the case of convex matrix-valued functions and many other K-convex vector-valued functions, which makes it very difficult to deal with subdifferentials of such functions. Therefore, below we suppose that the function H (the K-concave part of F) is continuously differentiable, but do not impose any smoothness assumptions on the objective function f.

Theorem 4. Let x* be a locally optimal solution of the problem (P) and the function H be Fréchet differentiable at x*. Then for any v ∈ ∂h(x*) the point x* is a globally optimal solution of the following convex programming problem:

    minimize g(x) − h(x*) − ⟨v, x − x*⟩ subject to G(x) − H(x*) − DH(x*)(x − x*) ⪯_K 0, x ∈ A,   (8)

where DH(x*) is the Fréchet derivative of H at x*.

Proof. Denote by ω_v(x) = g(x) − h(x*) − ⟨v, x − x*⟩, x ∈ R^d, the objective function of problem (8). This function is convex. Moreover, taking into account the fact that by the definition of subgradient h(x) ≥ h(x*) + ⟨v, x − x*⟩, one obtains that ω_v(x) ≥ f(x) for all x ∈ R^d and ω_v(x*) = f(x*).

Arguing by reductio ad absurdum, suppose that there exists v ∈ ∂h(x*) such that the point x* is not a globally optimal solution of problem (8), i.e. there exists a feasible point x_0 of this problem such that ω_v(x_0) < ω_v(x*). Define x(α) = αx_0 + (1 − α)x*.
Then

    f(x(α)) ≤ ω_v(x(α)) ≤ αω_v(x_0) + (1 − α)ω_v(x*) < ω_v(x*) = f(x*)   (9)

for all α ∈ (0, 1] by the convexity of ω_v.

Let us check that x(α) is a feasible point of the problem (P) for all α ∈ [0, 1]. Then, by inequality (9), x* is not a locally optimal solution of the problem (P), which contradicts the assumption of the theorem.

Indeed, by Lemma 1 one has

    H(x(α)) − H(x*) − DH(x*)(x(α) − x*) ∈ K

for all α ∈ [0, 1]. Adding and subtracting G(x(α)) one obtains that

    −F(x(α)) + G(x(α)) − H(x*) − DH(x*)(x(α) − x*) ∈ K  ∀α ∈ [0, 1]

or, equivalently,

    F(x(α)) ⪯_K G(x(α)) − H(x*) − DH(x*)(x(α) − x*)  ∀α ∈ [0, 1].

Hence taking into account the fact that the point x(α) is feasible for problem (8) due to the convexity of this problem one can conclude that F(x(α)) ⪯_K 0 for all α ∈ [0, 1], i.e. x(α) is a feasible point of the problem (P), and the proof is complete.

Let us reformulate the optimality conditions from the previous theorem. Denote by Ω(x*) the feasible region of problem (8), and for any convex set V ⊆ R^d and x ∈ V denote by N_V(x) = {v ∈ R^d | ⟨v, z − x⟩ ≤ 0 ∀z ∈ V} the normal cone to V at x.
Corollary 5. Let x* be a locally optimal solution of the problem (P) and the function H be differentiable at x*. Then

    ∂h(x*) ⊆ ∂g(x*) + N_{Ω(x*)}(x*).

Proof. Fix any v ∈ ∂h(x*). By Theorem 4 the point x* is a globally optimal solution of the convex problem (8). Applying standard necessary and sufficient optimality conditions for a convex function on a convex set (see, e.g. [27, Theorem 1.1.2′]) one obtains that 0 ∈ ∂ω_v(x*) + N_{Ω(x*)}(x*), where, as above, ω_v(x) = g(x) − h(x*) − ⟨v, x − x*⟩ is the objective function of problem (8). Since ∂ω_v(x*) = ∂g(x*) − v, one gets that v ∈ ∂g(x*) + N_{Ω(x*)}(x*), which implies the desired result.

In the case when a natural constraint qualification (namely, Slater's condition for problem (8)) holds at x*, one can show that the optimality conditions from Theorem 4 coincide with standard optimality conditions for cone constrained optimization problems (see, e.g. [4]). To this end, denote by Y* the topological dual space of Y and by ⟨·,·⟩ the canonical duality pairing between Y and Y*, that is, ⟨y*, y⟩ = y*(y) for any y* ∈ Y* and y ∈ Y. Let K* = {y* ∈ Y* | ⟨y*, y⟩ ≥ 0 ∀y ∈ K} be the dual cone of K and for any λ ∈ Y* define L(x, λ) = f(x) + ⟨λ, F(x)⟩.
Corollary 6. Let x* be a locally optimal solution of the problem (P) and the functions G and H be Fréchet differentiable at x*. Suppose also that the following constraint qualification holds true:

    0 ∈ int{ G(x) − H(x*) − DH(x*)(x − x*) + K | x ∈ A }

(if K has nonempty interior, it is sufficient to suppose that there exists x ∈ A such that G(x) − H(x*) − DH(x*)(x − x*) ∈ −int K). Then for any v ∈ ∂h(x*) there exists a multiplier λ* ∈ K* such that ⟨λ*, F(x*)⟩ = 0 and

    v ∈ ∂g(x*) + D(⟨λ*, F(·)⟩)(x*) + N_A(x*).

In particular, if both g and h are differentiable at x*, then there exists λ* ∈ K* such that ⟨λ*, F(x*)⟩ = 0 and ⟨D_x L(x*, λ*), x − x*⟩ ≥ 0 for all x ∈ A.

Proof. Rewriting problem (8) as the convex cone constrained problem

    minimize g(x) − h(x*) − ⟨v, x − x*⟩ subject to G(x) − H(x*) − DH(x*)(x − x*) ∈ −K, x ∈ A,

and applying standard necessary and sufficient optimality conditions for convex cone constrained optimization problems (see, e.g. [4, Theorem 3.6 and Proposition 2.106]) we arrive at the required result.

Remark 5. In the case of semidefinite programs, i.e. when Y = S^ℓ and K is the cone of positive semidefinite matrices, the dual cone K* coincides with K (if we identify the dual of S^ℓ with the space S^ℓ itself), and thus the multiplier λ* from the previous corollary is a positive semidefinite matrix. In addition, the constraint qualification from the corollary takes the form: there exists x ∈ A such that the matrix G(x) − H(x*) − DH(x*)(x − x*) is negative definite.

4.3 The Convex-Concave Procedure

The optimality conditions from the previous section can be applied to a convergence analysis of the so-called convex-concave procedure (CCP) for cone constrained DC optimization problems proposed in [40], which can be viewed as an extension of the renowned DC Algorithms [35, 37, 47] to the case of cone constrained problems. A general scheme (algorithmic pattern) of this method for the problem (P) is given in Algorithm 1. Let us note that the convex subproblem on Step 3 of Algorithm 1 can be solved with the use of interior point methods (see, e.g. [5, Sect. 11.6]), augmented Lagrangian methods [32, 54], etc.

Algorithm 1: DC Algorithm/The Convex-Concave Procedure (CCP).
Step 1. Choose a feasible initial point x_0 and set n := 0.

Step 2. Compute v_n ∈ ∂h(x_n) and DH(x_n).

Step 3.
Set the value of x_{n+1} to a solution of the convex problem

    minimize g(x) − ⟨v_n, x − x_n⟩ subject to G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K 0, x ∈ A.

If x_{n+1} = x_n, Stop. Otherwise, put n := n + 1 and go to Step 2.

Our aim is to prove a convergence theorem for Algorithm 1. Clearly, in the nonsmooth case (more precisely, when h is nonsmooth) one cannot expect a sequence {x_n} generated by this algorithm to converge to a point satisfying the optimality conditions from Corollary 5. Furthermore, these optimality conditions are often too restrictive for applications, since they require the knowledge of the entire subdifferential ∂h(x*), which might make verification of these conditions too computationally expensive or even impossible. That is why one usually establishes a convergence of DC optimization methods to so-called critical points [37, 67]. Recall that a point x* is said to be critical for the problem (P), if the following condition holds true:

    ∂h(x*) ∩ ( ∂g(x*) + N_{Ω(x*)}(x*) ) ≠ ∅.

Note that this condition is satisfied iff there exists v ∈ ∂h(x*) such that x* is a globally optimal solution of convex problem (8) (cf. Theorem 4 and Corollary 5). Hence, in particular, if a point x_n on Step 3 of Algorithm 1 is not critical for the problem (P), then x_n is not a solution of the corresponding convex subproblem. In other words, if Algorithm 1 terminates on step n ∈ N, then x_n is a critical point for the problem (P).
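To make Algorithm 1 concrete, here is a minimal cvxpy sketch on a toy DC semidefinite program of our own construction (it assumes cvxpy with the bundled SCS solver and is an illustration of the scheme, not code from [40]): the constraint G(x) − H(x) ⪯ 0 with affine G(x) = A_0 + Σ x_i A_i and H(x) = |x|²I is handled by linearizing H at the current iterate, so each Step 3 subproblem is a linear matrix inequality.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(6)
    l, d = 3, 2
    sym = lambda S: 0.5 * (S + S.T)
    A0 = -np.eye(l)                  # F(0) = A0 <= 0, so x0 = 0 is feasible
    A = [sym(rng.normal(size=(l, l))) for _ in range(d)]
    p = np.array([2.0, -1.0])        # objective f(x) = |x - p|^2 (g convex, h = 0)

    xn = np.zeros(d)
    for it in range(30):
        x = cp.Variable(d)
        # linearization of H: H(xn) + DH(xn)(x - xn) = (|xn|^2 + 2 xn.(x - xn)) * I
        lin = xn @ xn + 2 * xn @ (x - xn)
        lmi = A0 + sum(x[i] * A[i] for i in range(d)) - lin * np.eye(l) << 0
        prob = cp.Problem(cp.Minimize(cp.sum_squares(x - p)), [lmi])
        prob.solve(solver=cp.SCS)
        if np.linalg.norm(x.value - xn) < 1e-6:
            break
        xn = x.value
    print(xn, prob.value)            # f(x_n) decreases monotonically (Theorem 5)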
The proof of the following theorem was largely inspired by the convergence analysis of an algorithmic pattern for inequality constrained DC optimization problems from [67, Section 3.1]. However, let us note that we prove the global convergence of Algorithm 1 to a critical point under assumptions that are different from the ones used in [67].

Theorem 5. Let the function f be bounded below on the feasible region of the problem (P), G be continuous on A, H be continuously Fréchet differentiable on A, and a sequence {x_n} be generated by Algorithm 1. Then the following statements hold true:

1. the feasible region Ω(x_n) of the convex subproblem on Step 3 of the algorithm is nonempty for all n ∈ N ∪ {0}, and the sequence {x_n} is feasible for the problem (P);

2. for any n ∈ N ∪ {0} either x_n is a critical point of the problem (P) and the process terminates at step n, or f(x_{n+1}) < f(x_n); moreover, if the algorithm does not terminate, then the sequence {f(x_n)} converges;

3. if the function h is strongly convex with constant µ > 0, then

    f(x_{n+1}) ≤ f(x_n) − (µ/2)|x_{n+1} − x_n|²   (10)

for all n ∈ N ∪ {0};

4. if x* is a limit point of the sequence {x_n} such that

    0 ∈ int{ G(x) − H(x*) − DH(x*)(x − x*) + K | x ∈ A }

(that is, Slater's condition holds for problem (8)), then x* is a critical point for the problem (P).

Proof.
1. Let us prove this statement by induction in n. By our assumption x_0 is feasible for the problem (P), which implies that x_0 ∈ Ω(x_0), that is, the feasible region Ω(x_0) of the convex subproblem is nonempty.

Inductive step. Suppose that for some n ∈ N the point x_n is feasible for the problem (P) and Ω(x_n) is nonempty. Let us prove that x_{n+1} is feasible for the problem (P). Then x_{n+1} ∈ Ω(x_{n+1}), i.e. Ω(x_{n+1}) ≠ ∅, and the proof of the first statement is complete.

Indeed, by definition the point x_{n+1} is a globally optimal solution of the convex subproblem on Step 3 of the algorithm, which implies that

    G(x_{n+1}) − H(x_n) − DH(x_n)(x_{n+1} − x_n) ⪯_K 0,  x_{n+1} ∈ A.

By Lemma 1 one has

    −H(x_{n+1}) ⪯_K −H(x_n) − DH(x_n)(x_{n+1} − x_n).

Therefore F(x_{n+1}) = G(x_{n+1}) − H(x_{n+1}) ⪯_K 0, i.e. the point x_{n+1} is feasible for the problem (P).

2. If a point x_n is not critical, then, as was noted above, x_n is not a solution of the convex subproblem on Step 3 of Algorithm 1, which implies that

    g(x_{n+1}) − ⟨v_n, x_{n+1} − x_n⟩ < g(x_n).

Subtracting h(x_n) from both sides of this inequality and applying the definition of subgradient one obtains that f(x_{n+1}) < f(x_n). Hence bearing in mind the facts that the sequence {x_n} is feasible and f is bounded below on the feasible region one gets that the sequence {f(x_n)} converges.

3. Fix any n ∈ N. Due to the strong convexity of h one has

    h(x_{n+1}) − h(x_n) ≥ ⟨v_n, x_{n+1} − x_n⟩ + (µ/2)|x_{n+1} − x_n|².

Furthermore, by the definition of x_{n+1} one has

    g(x_n) ≥ g(x_{n+1}) − ⟨v_n, x_{n+1} − x_n⟩.

Summing up these two inequalities one obtains that (10) holds true.

4. By our assumption there exists a subsequence {x_{n_k}} converging to x*. The corresponding sequence {v_{n_k}} of subgradients of the function h is bounded, since the subdifferential mapping of a finite convex function is locally bounded (see, e.g. [52, Crlr. 24.5.1]). Therefore, replacing, if necessary, the sequence {x_{n_k}} with its subsequence, one can suppose that the sequence of subgradients {v_{n_k}} converges to some vector v* belonging to ∂h(x*) due to the fact that the graph of the subdifferential is closed (see, e.g. [52, Thrm. 24.4]).

Arguing by reductio ad absurdum, suppose that x* is not a critical point of the problem (P). Then, in particular,

    v* ∉ ∂g(x*) + N_{Ω(x*)}(x*),

which implies that x* is not a globally optimal solution of the convex problem

    minimize g(x) − ⟨v*, x − x*⟩ subject to G(x) − H(x*) − DH(x*)(x − x*) ⪯_K 0, x ∈ A   (11)

(see the proof of Corollary 5). Consequently, there exist a feasible point x_0 of this problem and θ > 0 such that g(x_0) − ⟨v*, x_0 − x*⟩ < g(x*) − θ.

Applying Corollary 4 with Φ = G, Ψ = H, A = A, and K = K one obtains that for any z ∈ A lying in a neighbourhood of x* one can find a point ξ(z) ∈ A such that

    G(ξ(z)) − H(z) − DH(z)(ξ(z) − z) ⪯_K 0

and ξ(z) → x_0 as z → x*. Hence taking into account the facts that the subsequence {x_{n_k}} converges to x*, while {v_{n_k}} converges to v*, one obtains that there exists k_0 ∈ N such that for all k ≥ k_0 one has

    g(ξ(x_{n_k})) − ⟨v_{n_k}, ξ(x_{n_k}) − x_{n_k}⟩ ≤ g(x_{n_k}) − θ/2,
    G(ξ(x_{n_k})) − H(x_{n_k}) − DH(x_{n_k})(ξ(x_{n_k}) − x_{n_k}) ⪯_K 0.

Note that ξ(x_{n_k}) is a feasible point of the convex subproblem on Step 3 of Algorithm 1 for any k ≥ k_0. Consequently, by the definition of x_{n_k+1} one has

    g(x_{n_k+1}) − ⟨v_{n_k}, x_{n_k+1} − x_{n_k}⟩ ≤ g(ξ(x_{n_k})) − ⟨v_{n_k}, ξ(x_{n_k}) − x_{n_k}⟩ ≤ g(x_{n_k}) − θ/2

for all k ≥ k_0. Subtracting h(x_{n_k}) from both sides of this inequality and applying the definition of subgradient one gets that f(x_{n_k+1}) ≤ f(x_{n_k}) − θ/2 for all k ≥ k_0. Hence with the use of the second part of this theorem one can conclude that f(x_n) → −∞, which contradicts the facts that f is bounded below on the feasible set by our assumption and the sequence {x_n} is feasible by the first part of the theorem.

Remark 6. (i) Note that the assumption on the strong convexity of the function h is not restrictive, since if this assumption is not satisfied, for any µ > 0 one can simply replace the DC decomposition f = g − h of the objective function f with the following one:

    f(x) = ( g(x) + µ|x|² ) − ( h(x) + µ|x|² ),  x ∈ R^d.
(ii) Since by the previous theorem the sequence {f(x_n)} converges, one can use the inequality |f(x_{n+1}) − f(x_n)| < ε (or |x_{n+1} − x_n| < ε, when h is strongly convex) as a stopping criterion for Algorithm 1.

4.4 The Penalty Convex-Concave Procedure

Observe that in order to apply Algorithm 1, one needs to find a feasible point of the problem under consideration. In the case when such a point is unknown in advance and is hard to compute, one can use a combination of the DC algorithm and exact penalty techniques that allows one to start iterations at infeasible points. Such modifications of Algorithm 1 were discussed in [40] (and in [35, 47] in the case of inequality constrained problems). Here we present and analyse one such method, called the Penalty Convex-Concave Procedure (Penalty CCP), which is a slight modification of [40, Algorithm 4.2]. This method can be viewed as an extension of the DCA2 algorithm from [35, 47] to the case of cone constrained DC optimization problems.

A general scheme of DCA2/Penalty CCP for the problem (P) is given in Algorithm 2. The only difference between our method and [40, Algorithm 4.2] is the penalty updates. Namely, in contrast to [40], we increase the penalty parameter only if the infeasibility measure at the current iteration exceeds a prespecified threshold. Let us also note that the inequality t ≻_{K*} 0 below means that t ∈ K* and ⟨t, y⟩ > 0 for all y ∈ K, y ≠ 0.

Algorithm 2: DCA2/Penalty CCP.
Step 1. Choose an initial point x_0 ∈ A, an initial penalty parameter t_0 ≻_{K*} 0, parameters τ_max > 0, µ > 1, κ > 0, and set n := 0.

Step 2. Compute v_n ∈ ∂h(x_n) and DH(x_n).

Step 3. Set the value of x_{n+1} to a solution of the convex problem

    minimize_{(x,s)} g(x) − ⟨v_n, x − x_n⟩ + ⟨t_n, s⟩ subject to G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K s, s ⪰_K 0, x ∈ A.

If x_{n+1} = x_n, Stop.
Step 4. Define

    t_{n+1} = µt_n, if ‖s_{n+1}‖ ≥ κ and µ‖t_n‖ ≤ τ_max;  t_{n+1} = t_n, otherwise,

where (x_{n+1}, s_{n+1}) is a solution of the subproblem on Step 3. Put n := n + 1 and go to Step 2.

Let us analyse a convergence of Algorithm 2. Firstly, we show that under some standard assumptions the penalized convex subproblem on Step 3 of this algorithm is exact, in the sense that if the norm of the penalty parameter t_n is sufficiently large, then a solution of the subproblem on Step 3 of Algorithm 2 coincides with the solution of the corresponding non-penalized problem

    minimize g(x) − ⟨v_n, x − x_n⟩ subject to G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K 0, x ∈ A,   (12)

provided the feasible region of this problem is nonempty. This result implies, in particular, that if for some n ∈ N the norm of the penalty parameter t_n exceeds a certain threshold and the feasible region of problem (12) is nonempty, then the next point x_{n+1} is feasible for the problem (P) and the rest of the iterations of Algorithm 2 coincide with the iterations of Algorithm 1. Thus, in this case one can ensure the convergence of a sequence generated by Algorithm 2 to a critical point for the problem (P).

Before we proceed to the proof of the exactness of the subproblem from Step 3 of Algorithm 2, let us first provide simple sufficient conditions for the existence of globally optimal solutions of this problem and the corresponding non-penalized problem (12). To this end, recall that a function φ: R^d → R is called coercive on the set A, if φ(x_n) → +∞ as n → ∞ for any sequence {x_n} ⊂ A such that ‖x_n‖ → +∞ as n → ∞.
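Continuing the toy problem from the CCP sketch above, the following cvxpy fragment (again an illustration under the same assumptions, not code from [40]) implements Steps 3 and 4 of Algorithm 2 for the semidefinite case, where one may take t_n = τ_n I ≻_{K*} 0, so that ⟨t_n, s⟩ = τ_n Tr(s) and ‖t_n‖_F = τ_n √ℓ.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(7)
    l, d = 3, 2
    sym = lambda Z: 0.5 * (Z + Z.T)
    A0 = -np.eye(l)
    A = [sym(rng.normal(size=(l, l))) for _ in range(d)]
    p = np.array([2.0, -1.0])

    xn = np.array([5.0, 5.0])            # need not be feasible for (P)
    tau, tau_max, mu, kappa = 1.0, 1e4, 2.0, 1e-6
    for it in range(50):
        x, S = cp.Variable(d), cp.Variable((l, l), PSD=True)  # S is the slack s
        lin = xn @ xn + 2 * xn @ (x - xn)
        lmi = A0 + sum(x[i] * A[i] for i in range(d)) - lin * np.eye(l) << S
        prob = cp.Problem(
            cp.Minimize(cp.sum_squares(x - p) + tau * cp.trace(S)), [lmi])
        prob.solve(solver=cp.SCS)
        if np.linalg.norm(x.value - xn) < 1e-6:
            break
        xn = x.value
        # Step 4: increase the penalty only if the infeasibility exceeds kappa
        if np.linalg.norm(S.value, 'fro') >= kappa and mu * tau * np.sqrt(l) <= tau_max:
            tau *= mu
    print(xn, np.linalg.norm(S.value, 'fro'))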
Let the space Y be finite dimensional, the cone K be generating (i.e. K − K = Y ), G be continuous on A , H be continuously Fr´echet differentiableon R d , and the penalty function Φ c ( · ) = f ( · ) + c dist( F ( · ) , − K ) be coercive on A for some c > . Then there exists µ ∗ ≥ such that for any µ ≥ µ ∗ and forall x ∈ R d and v ∈ ∂h ( x ) there exists a globally optimal solution of the penalizedproblem minimize ( x,s ) g ( x ) − h v, x − x i + µ h t , s i subject to G ( x ) − H ( x ) − DH ( x )( x − x ) (cid:22) K s, s (cid:23) K , x ∈ A. (13) Moreover, if the feasible region of the corresponding non-penalized problem minimize g ( x ) − h v, x − x i subject to G ( x ) − H ( x ) − DH ( x )( x − x ) (cid:22) K , x ∈ A (14) is nonempty, then this problem has a globally optimal solution as well.Proof. Indeed, fix any x ∈ R d . Suppose at first that the feasible region ofproblem (14) is nonempty. Arguing in the same way as in the proof of Theorem 4one can check that F ( x ) (cid:22) K G ( x ) − H ( x ) − DH ( x )( x − x ) ∀ x, x ∈ R d , (15)which implies that the feasible region of problem (14) is contained in the feasibleregion of the problem ( P ). From the the coercivity of the penalty functionΦ c ( · ) = f ( · ) + c dist( F ( · ) , − K ) on the set A it follows that the function f iscoercive on the feasible region of the problem ( P ) and, therefore, on the feasibleregion of problem (14) as well (recall that F ( x ) (cid:22) K F ( x ) ∈ − K ). Hencetaking into account the fact that by the definition of subgradient g ( x ) − h v, x − x i ≥ f ( x ) + h ( x ) ∀ x, x ∈ R d . one obtains that the objective function of problem (14) is coercive on the feasibleregion of this problem, which is closed by virtue of our assumptions on G and H . Consequently, there exists a globally optimal solution of problem (14).Let us now consider problem (13). The assumptions of the lemma on G and H guarantee that the feasible region of this problem is closed. Note that a pair( x, s ) ∈ A × K is feasible for this problem iff G ( x ) − H ( x ) − DH ( x )( x − x ) ∈ s − K. Hence bearing in mind the fact that the cone K is generating one gets that thefeasible region of problem (13) is nonempty.Let us check that the objective function ω ( x, s ) = g ( x ) − h v, x − x i + µ h t , s i
of problem (13) is coercive on the feasible region of this problem, provided µ is sufficiently large. Arguing by reductio ad absurdum, suppose that ω is not coercive on the feasible region of problem (13). Then there exist M > 0 and a sequence {(x_n, s_n)} of feasible points of problem (13) such that ‖x_n‖ + ‖s_n‖ → +∞ as n → ∞, but ω(x_n, s_n) ≤ M for all n ∈ ℕ. Observe that F(x_n) ⪯_K s_n for all n ∈ ℕ due to (15), which implies that

    M ≥ ω(x_n, s_n) ≥ inf{ ω(x_n, s) | s ⪰_K F(x_n), s ⪰_K 0 }  ∀ n ∈ ℕ.

Let us estimate the infimum on the right-hand side of this inequality. Bearing in mind the facts that t_0 is a continuous linear functional, t_0 ≻_{K*} 0, and K is a closed subset of a finite dimensional normed space, one obtains that

    τ := min{ ⟨t_0, s⟩ | s ∈ K, ‖s‖ = 1 } > 0,    ⟨t_0, s⟩ ≥ τ‖s‖  ∀ s ∈ K.
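For illustration (this computation is an aside, not part of the proof), in the semidefinite case Y = S^m equipped with the Frobenius inner product, K = S^m_+, and t_0 = I_m, the constant τ can be computed explicitly:

    \[
      \langle I_m, S \rangle = \operatorname{tr} S = \sum_{i=1}^m \lambda_i(S)
      \ge \Big( \sum_{i=1}^m \lambda_i(S)^2 \Big)^{1/2} = \| S \|_F
      \quad \forall S \in \mathbb{S}^m_+,
    \]

since all eigenvalues λ_i(S) of a positive semidefinite matrix are nonnegative. Hence τ = 1, and the minimum in the definition of τ is attained at rank-one matrices.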
Therefore for any n ∈ ℕ one has

    M ≥ ω(x_n, s_n) ≥ g(x_n) − ⟨v, x_n − x_0⟩ + µτ inf{ ‖s‖ | s ∈ F(x_n) + K, s ∈ K }
      ≥ f(x_n) + h(x_0) + µτ inf_{y ∈ K} ‖F(x_n) + y‖ = f(x_n) + h(x_0) + µτ dist(F(x_n), −K).

Hence taking into account the fact that by our assumption the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A, one obtains that the sequence {x_n} is bounded, provided µ ≥ c/τ. Consequently, ‖s_n‖ → +∞ as n → ∞, which contradicts the fact that

    M ≥ ω(x_n, s_n) ≥ min_{‖x‖ ≤ r} ( g(x) − ⟨v, x − x_0⟩ ) + µτ‖s_n‖

for all n ∈ ℕ, where r = sup_{n∈ℕ} ‖x_n‖. Thus, the function ω is coercive on the feasible region of problem (13) for any µ ≥ c/τ, and for any such µ there exists a globally optimal solution of this problem.

Remark 7. Note that the assumptions on the space Y and the cone K are not used in the proof of the existence of globally optimal solutions of the non-penalized problem (14).

Now we can turn to the proof of the exactness of the penalized problem (13). Introduce the set

    D = { x_0 ∈ A | ∃ x ∈ A: G(x) − H(x_0) − DH(x_0)(x − x_0) ⪯_K 0 },

i.e. D is the set of all those x_0 ∈ A for which the feasible region of the non-penalized problem (14) is nonempty. Observe that the feasible region of the problem (P) is contained in D, but in the general case D ≠ ℝ^d. Denote

    D_s = { x_0 ∈ A | 0 ∈ int{ G(x) − H(x_0) − DH(x_0)(x − x_0) + K | x ∈ A } },

i.e. D_s is the set of all those x_0 ∈ A for which the constraint qualification from Corollary 6 holds true. It should be noted that in the case when the cone K has nonempty interior, this constraint qualification is satisfied iff there exists x ∈ A such that

    G(x) − H(x_0) − DH(x_0)(x − x_0) ∈ −int K,

that is, iff Slater's condition for the non-penalized problem (14) holds true (see, e.g. [4, Prp. 2.106]). Note that by definition D_s ⊆ D. Thus, D_s is the subset of D consisting of all those x_0 for which Slater's condition holds true for the non-penalized problem.

Under some natural assumptions one can verify that the set D is closed (in particular, it is sufficient to suppose that the feasible region of the problem (P) is bounded, G is continuous, and H is continuously differentiable), while the set D_s is open in A. Therefore, there are some degenerate points x_0 ∈ D \ D_s (e.g. the ones that lie on the boundary of D in A) for which one must impose some additional assumptions. Our aim is to first provide somewhat cumbersome sufficient conditions for the exactness of the penalized problem (13) for the entire set D or its arbitrary subset, and then show that these conditions are satisfied for any compact subset of D_s. The sufficient conditions that we present here are based on a uniform local error bound for the non-penalized problem (14).

To simplify the formulations and proofs of the statements below, for any z ∈ ℝ^d introduce the convex function F_z(x) = G(x) − H(z) − DH(z)(x − z), x ∈ ℝ^d, and the set-valued mapping

    M_z(x) = G(x) − H(z) − DH(z)(x − z) + K, if x ∈ A;    M_z(x) = ∅, if x ∉ A.

The multifunction M_z is convex and closed, provided the function G is continuous on A.
Proposition 2. Let K be finite dimensional, G be continuous on A, H be continuously Fréchet differentiable on A, and there exist c ≥ 0 such that the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A. Let also D_0 ⊆ D be a nonempty set for which one can find a > 0, L_g > 0, and L_h > 0 such that for any z ∈ D_0 and v ∈ ∂h(z) one has ‖v‖ ≤ L_h and there exist r > 0 and a globally optimal solution x* of the problem

    minimize  g(x) − ⟨v, x − z⟩
    subject to  G(x) − H(z) − DH(z)(x − z) ⪯_K 0,  x ∈ A        (16)

such that g is Lipschitz continuous near x* with Lipschitz constant L_g and

    dist(F_z(x), −K) ≥ a dist(x, M_z^{−1}(0))  ∀ x ∈ B(x*, r) ∩ A.        (17)

Then there exists µ* ≥ 0 such that for all µ ≥ µ* and for any z ∈ D_0 and v ∈ ∂h(z) there exists a globally optimal solution of the penalized problem

    min_{(x,s)}  g(x) − ⟨v, x − z⟩ + µ⟨t_0, s⟩
    subject to  G(x) − H(z) − DH(z)(x − z) ⪯_K s,  s ⪰_K 0,  x ∈ A,        (18)

and a pair (x*, s*) is a solution of this problem if and only if s* = 0 and x* is a solution of the corresponding non-penalized problem (16).

Proof. Fix any z ∈ D_0 and v ∈ ∂h(z), and denote by

    ω_µ(x, s) = g(x) − ⟨v, x − z⟩ + µ⟨t_0, s⟩

the objective function of problem (18). Arguing in the same way as in the proof of Lemma 3, one can check that there exists τ > 0 such that ⟨t_0, s⟩ ≥ τ‖s‖ for all s ∈ K. Therefore, for any feasible point (x, s) of problem (18) one has

    ω_µ(x, s) ≥ g(x) − ⟨v, x − z⟩ + µτ inf{ ‖s‖ | s ⪰_K F_z(x), s ⪰_K 0 }
             ≥ g(x) − ⟨v, x − z⟩ + µτ inf{ ‖s‖ | s ∈ F_z(x) + K }
             = g(x) − ⟨v, x − z⟩ + µτ dist(F_z(x), −K).

Let x* be a globally optimal solution of the non-penalized problem (16) from the formulation of the proposition (optimal solutions of this problem exist by Lemma 3). Observe that by definition the set M_z^{−1}(0) coincides with the feasible region of problem (16). Therefore, by [10, Prp. 2.7] there exists δ > 0 (decreasing δ, if necessary, one can suppose that δ ≤ r) such that

    g(x) − ⟨v, x − z⟩ ≥ g(x*) − ⟨v, x* − z⟩ − (L_g + L_h) dist(x, M_z^{−1}(0))

for all x ∈ B(x*, δ) ∩ A. Consequently, applying inequality (17) one obtains that

    ω_µ(x, s) ≥ g(x*) − ⟨v, x* − z⟩ + (µτa − L_g − L_h) dist(x, M_z^{−1}(0))

for any feasible point (x, s) of problem (18) such that x ∈ B(x*, δ). Hence for any such (x, s) one has

    ω_µ(x, s) ≥ g(x*) − ⟨v, x* − z⟩ = ω_µ(x*, 0)  ∀ µ ≥ µ* := (L_g + L_h)/(τa),

that is, (x*, 0) is a locally optimal solution of problem (18) for any µ ≥ µ*. Taking into account the fact that this problem is convex, one gets that for any such µ the pair (x*, 0) is a globally optimal solution of problem (18). Furthermore, since for any other globally optimal solution x̂ of the non-penalized problem (16) one has ω_µ(x*, 0) = ω_µ(x̂, 0), one obtains that for any globally optimal solution x̂ of the non-penalized problem (16) and for all µ ≥ µ* the pair (x̂, 0) is a globally optimal solution of the penalized problem (18). Conversely, if for some µ ≥ µ* a pair (x̂, 0) is a globally optimal solution of the penalized problem (18), then x̂ is necessarily a globally optimal solution of the non-penalized problem (16). In addition, for any x ∈ A and s ∈ K \ {0} one has

    ω_µ(x, s) = g(x) − ⟨v, x − z⟩ + µ⟨t_0, s⟩ > g(x) − ⟨v, x − z⟩ + µ*⟨t_0, s⟩ = ω_{µ*}(x, s) ≥ ω_{µ*}(x*, 0) = ω_µ(x*, 0)

for any µ > µ*, that is, for any µ > µ* globally optimal solutions of problem (18) necessarily have the form (x̂, 0). Thus, for any µ > µ* a pair (x̂, ŝ) is a globally optimal solution of the penalized problem (18) iff ŝ = 0 and x̂ is a solution of the corresponding non-penalized problem. Since z ∈ D_0 and v ∈ ∂h(z) were chosen arbitrarily and µ* does not depend on z and v, one can conclude that the statement of the proposition holds true.
Corollary 7. Let K be finite dimensional, G be continuous on A, H be continuously Fréchet differentiable on A, and there exist c ≥ 0 such that the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A. Then for any compact subset D_0 ⊆ D_s there exists µ* ≥ 0 such that for all µ ≥ µ* and for any z ∈ D_0 and v ∈ ∂h(z) there exists a globally optimal solution of the penalized problem (18) and this problem is exact, in the sense that a pair (x*, s*) is a solution of this problem iff s* = 0 and x* is a solution of the corresponding non-penalized problem (16).

Proof. Let us verify that for any z ∈ D_s there exists r > 0 such that the assumptions of Proposition 2 are satisfied for D_0 = B(z, r) ∩ A. Then one can easily verify that these assumptions are satisfied for any compact subset D_0 ⊆ D_s, since any such subset can be covered by a finite number of the corresponding balls.

Fix any z ∈ D_s and choose some x̄ ∈ A such that 0 ∈ M_z(x̄). By the definition of the set D_s one has 0 ∈ int M_z(ℝ^d). Hence by [51, Thrm. 1] there exists η > 0 such that ηB_Y ⊆ M_z(x̄ + B_{ℝ^d}). From the fact that H is continuously Fréchet differentiable it follows that there exists r < min{1, η/(6(1 + ‖DH(z)‖))} such that

    ‖H(u) − H(z)‖ ≤ η/6,    ‖DH(u) − DH(z)‖ ≤ η/(6(2 + ‖x̄‖ + ‖z‖))

for any u ∈ B(z, r) ∩ A. Choose any y ∈ M_z(x̄ + B_{ℝ^d}). By definition there exist x ∈ (x̄ + B_{ℝ^d}) ∩ A and w ∈ K such that y = F_z(x) + w. Observe that

    ‖F_u(x) + w − y‖ = ‖F_u(x) − F_z(x)‖ ≤ ‖H(u) − H(z)‖ + ‖DH(u) − DH(z)‖‖x − u‖ + ‖DH(z)‖‖u − z‖ ≤ η/2

for any u ∈ B(z, r) ∩ A, which implies that

    ηB_Y ⊆ M_z(x̄ + B_{ℝ^d}) ⊆ M_u(x̄ + B_{ℝ^d}) + (η/2)B_Y  ∀ u ∈ B(z, r) ∩ A.

Consequently, by [51, Lemma 2] one has

    (η/2)B_Y ⊆ M_u(x̄ + B_{ℝ^d})  ∀ u ∈ B(z, r) ∩ A,        (19)

which with the use of [51, Thrm. 2] yields that for all x ∈ ℝ^d and u ∈ B(z, r) ∩ A one has

    dist(x, M_u^{−1}(0)) ≤ (2/η)(1 + ‖x − x̄‖) dist(0, M_u(x)).        (20)

Let us show that one can find R > 0 such that for all u ∈ B(z, r) ∩ A and v ∈ ∂h(u) globally optimal solutions of the problem

    min  g(x) − ⟨v, x − u⟩  s.t.  G(x) − H(u) − DH(u)(x − u) ⪯_K 0,  x ∈ A        (21)

(which exist by Lemma 3) lie in the ball B(0, R). Then taking into account the fact that by definition dist(0, M_u(x)) = dist(F_u(x), −K) for x ∈ A, one obtains that for all u ∈ B(z, r) ∩ A and v ∈ ∂h(u), and for any globally optimal solution x* of problem (21), the following inequality holds true:

    dist(F_u(x), −K) ≥ η/(2(2 + R + ‖x̄‖)) dist(x, M_u^{−1}(0))  ∀ x ∈ B(x*, 1) ∩ A

(cf. (17)). Moreover, one can take as L_g the Lipschitz constant of g on the set B(0, R + 1) (recall that a convex function finite on ℝ^d is Lipschitz continuous on bounded sets; see, e.g. [52, Thrm. 10.4]), while the existence of L_h such that ‖v‖ ≤ L_h for all v ∈ ∂h(u) and u ∈ B(z, r) follows from the local boundedness of the subdifferential mapping [52, Crlr. 24.5.1]. Therefore, all assumptions of Proposition 2 are satisfied for D_0 = B(z, r) ∩ A, and one can conclude that the corollary holds true.

Thus, it remains to prove that globally optimal solutions of problem (21) lie within some ball B(0, R). Indeed, by the definition of subgradient

    g(x) − ⟨v, x − u⟩ ≥ g(x) − h(x) + h(u) ≥ f(x) + C_1,    C_1 := min_{u ∈ B(z,r) ∩ A} h(u).

Furthermore, from inclusion (19) it follows that for any u ∈ B(z, r) ∩ A there exists x(u) ∈ x̄ + B_{ℝ^d} such that 0 ∈ M_u(x(u)), i.e. x(u) is a feasible point of problem (21). Finally, as was noted in the proof of Lemma 3, the feasible region of problem (21) is contained in the feasible region of the problem (P), which we denote by Ω.
Therefore globally optimal solutions of problem (21) are contained in the set S := { x ∈ Ω | f(x) ≤ |C_1| + C_2 }, where

    C_2 := sup_{u ∈ B(z,r) ∩ A} ( g(x(u)) − ⟨v, x(u) − u⟩ ) ≤ sup_{x ∈ x̄ + B_{ℝ^d}} g(x) + L_h( ‖x̄‖ + 1 + ‖z‖ + r ) < +∞.

It remains to note that the set S does not depend on u ∈ B(z, r) ∩ A and v ∈ ∂h(u), and is contained in some ball B(0, R), since Ω = { x ∈ A | F(x) ∈ −K } and by our assumption the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A.

Remark 8. Let a sequence {x_n} be generated by Algorithm 2 and suppose that there exists m ∈ ℕ such that either the assumptions of Proposition 2 are satisfied for some set D_0 ⊆ D containing the sequence {x_n}_{n≥m}, or this sequence is contained in a compact subset of the set D_s (note that since D_s is an open set, it is sufficient to suppose that the sequence {x_n} converges to a point x* ∈ D_s). Then there exists a threshold τ* > 0 such that if for some k ≥ m one has ‖t_k‖ ≥ τ*, then the sequence {x_n}_{n≥k+1} is feasible for the problem (P) and coincides with a sequence generated by Algorithm 1 with starting point x_{k+1}. In this case one can apply Theorem 5 to analyse the behaviour of the sequence {x_n}_{n≥k+1} and its convergence to a critical point for the problem (P). Note that to prove this result one must suppose that τ_max > τ*, i.e. the maximal admissible norm of the penalty parameters t_n must be sufficiently large.

Let us give a simple example illustrating Proposition 2 and Corollary 7, as well as the behaviour of sequences generated by Algorithms 1 and 2.

Example 6.
Let d = 1, Y = ℝ, and K = ℝ₊, i.e. y_1 ⪯_K y_2 means that y_1 ≤ y_2 for all y_1, y_2 ∈ ℝ. Consider the following inequality constrained DC optimization problem:

    min  (x − 0.5)²  subject to  x² − x⁴ ≤ 0.        (22)

We define g(x) = (x − 0.5)², h(x) = 0, G(x) = x², and H(x) = x⁴ for all x ∈ ℝ. The feasible region has the form Ω = (−∞, −1] ∪ {0} ∪ [1, +∞). The points x* = 1 and x* = 0 are globally optimal solutions of problem (22), while the point x* = −1 is only a locally optimal solution. For any z ∈ ℝ the linearized convex problem for problem (22) has the form

    min_x  (x − 0.5)²  subject to  x² − z⁴ − 4z³(x − z) ≤ 0.        (23)

The inequality constraint can be rewritten as follows:

    (x − 2z³)² − z⁴(4z² − 3) ≤ 0.

Therefore

    D = (−∞, −√3/2] ∪ {0} ∪ [√3/2, +∞),    D_s = int D,

that is, the feasible region of problem (23) is nonempty iff z ∈ D, and Slater's condition holds true for this problem iff z ∈ int D = D_s. Furthermore, for z = 0 the feasible region of problem (23) consists of the single point x = 0, while for any z ∈ D, z ≠ 0, the feasible region has the form

    [ 2z³ − z²√(4z² − 3), 2z³ + z²√(4z² − 3) ].

As was noted multiple times above, this set is contained in the feasible region of problem (22), which implies that for any z ≥ √3/2 it is contained in [1, +∞), while for any z ≤ −√3/2 it is contained in (−∞, −1]. Consequently, any sequence {x_n} generated by Algorithm 1, i.e. such that x_{n+1} is defined as a solution of the problem

    min_x  (x − 0.5)²  subject to  x² − x_n⁴ − 4x_n³(x − x_n) ≤ 0,

is contained in the set (−∞, −1], if x_0 ≤ −1, and in the set [1, +∞), if x_0 ≥ 1, while for x_0 = 0 one has x_n ≡ 0. Moreover, one can easily check that all assumptions of Theorem 5 are satisfied, and x_{n+1} > x_n for all n ∈ ℕ, if x_0 < −1, and x_{n+1} < x_n for all n ∈ ℕ, if x_0 > 1. Therefore, a sequence {x_n} generated by Algorithm 1 converges to the locally optimal solution x* = −1, if x_0 ≤ −1, and to the globally optimal solution x* = 1, if x_0 ≥ 1. This example shows that if the feasible region of a problem under consideration consists of several disjoint convex components, then a sequence generated by Algorithm 1 lies within the component containing the initial guess x_0 and converges to a critical point from this component, i.e. a sequence generated by Algorithm 1 cannot jump from one convex component of the feasible region to another. Let us note that one can easily prove this result in the general case.

Let us now consider Algorithm 2. To this end, we first analyse the exactness of the penalized subproblem, which has the form

    min_{(x,s)}  (x − 0.5)² + µt_0 s  subject to  x² − z⁴ − 4z³(x − z) ≤ s,  s ≥ 0,        (24)

where t_0 > 0. One can easily verify that (x*, s*) is a globally optimal solution of this problem iff s* = max{ (x*)² − z⁴ − 4z³(x* − z), 0 } and x* is a globally optimal solution of the unconstrained problem

    min  (x − 0.5)² + µt_0 max{ x² − z⁴ − 4z³(x − z), 0 }.

For any z ∈ D \ D_s this problem takes the form

    min  (x − 0.5)² + µt_0 (x − 2z³)²,

while the feasible region of the non-penalized problem (23) reduces to the single point x = 2z³. Clearly, for any µ > 0 a globally optimal solution of the problem above does not coincide with this point. Thus, the penalized problem (24) is not exact for all z ∈ D \ D_s (one can verify that this result is connected to the fact that error bound (17) from Proposition 2 is not valid for such z).

By Corollary 7 for any compact subset D_0 ⊂ D_s the penalized problem (24) is exact for all z ∈ D_0, in the sense that there exists µ* ≥ 0 such that for all µ ≥ µ* a pair (x*, s*) is a globally optimal solution of problem (24) iff s* = 0 and x* is a globally optimal solution of the non-penalized problem (23). Denote the greatest lower bound of all such µ* by µ*(D_0).

One can verify that problem (24) is not exact for all z ∈ D_s simultaneously, due to the fact that µ*({z}) → +∞ as z tends to the boundary of D_s. For the sake of shortness, we do not present a detailed proof of this result and leave it to the interested reader. Here we only mention that this result can be proved by noting that µ*({z}) is equal to the norm of an optimal solution of the dual problem of (23) divided by t_0.

Let us now consider the performance of Algorithm 2 (see also the numerical sketch after this example). To this end, put x_0 = −1, t_0 = 1, µ = 2, a small κ > 0, and τ_max = 1024 in Algorithm 2. Note that the initial point x_0 is critical for problem (22), but is not a globally optimal solution of this problem. Solving the penalized problem (24) with z = x_0 one obtains that x_1 = −0.75. Thus, Algorithm 2, unlike the DC algorithm, managed to escape the convex component of the feasible region containing the initial guess and, furthermore, to "jump off" from a point of local minimum. Numerical simulation showed that the sequence {x_n} generated by Algorithm 2 converges to a point lying near the globally optimal solution x* = 0, while if one chooses τ_max = +∞ and κ = 0, then the sequence converges to the globally optimal solution x* = 0 itself. However, note that if one chooses t_0 ≥ µ*({−1}) = 1.5, then the method terminates after the first iteration with x_1 = x_0.

Thus, it seems advisable to choose t_0 with sufficiently small norm (and maybe even perform several iterations before increasing the penalty parameter), to enable Algorithm 2 to find a better solution. Moreover, even if a feasible point x_0 is known, it is reasonable to use Algorithm 2 instead of Algorithm 1 due to the ability of the penalized method to escape convex components of the feasible region and find better locally optimal solutions than the original method.
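The following self-contained Python sketch reproduces the computations of Example 6. It is an illustration only: the one-dimensional convex subproblem is minimized by a simple ternary search on an interval assumed to contain all iterates, and the values κ = 10⁻⁴ and the iteration budgets are assumptions made for this sketch rather than values fixed in the text.

    def ternary_min(fun, lo=-10.0, hi=10.0, iters=100):
        # Minimize a convex univariate function by ternary search on [lo, hi].
        for _ in range(iters):
            m1 = lo + (hi - lo) / 3.0
            m2 = hi - (hi - lo) / 3.0
            if fun(m1) <= fun(m2):
                hi = m2
            else:
                lo = m1
        return 0.5 * (lo + hi)

    def q(x, z):
        # Linearized constraint function of problems (23) and (24).
        return x**2 - z**4 - 4.0 * z**3 * (x - z)

    def algorithm2_example6(x0=-1.0, t0=1.0, mu=2.0, kappa=1e-4,
                            tau_max=1024.0, n_iter=60):
        # Algorithm 2 for problem (22); here h = 0, so no subgradient is needed.
        x, t = x0, t0
        for _ in range(n_iter):
            # Penalized subproblem (24) in its unconstrained form:
            # min (x - 0.5)^2 + t * max{q(x, x_n), 0}.
            x_next = ternary_min(lambda y: (y - 0.5)**2 + t * max(q(y, x), 0.0))
            s_next = max(q(x_next, x), 0.0)
            if s_next >= kappa and mu * t <= tau_max:  # Step 4 of Algorithm 2
                t = mu * t
            x = x_next
        return x

    print(algorithm2_example6())         # escapes x0 = -1, ends up near 0
    print(algorithm2_example6(t0=1.5))   # t0 >= mu*({-1}) = 1.5: stays at -1

With t_0 = 1 the first iterate is x_1 = −0.75 and the method escapes the component (−∞, −1], in agreement with the discussion above, while with t_0 = 1.5 the method stays at the point x_0 = −1.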
In the general case, the feasible region of the non-penalized problem

    minimize  g(x) − ⟨v, x − x_n⟩
    subject to  G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K 0,  x ∈ A

(see (12)) might be empty for all n ∈ ℕ. Then a sequence {x_n} generated by Algorithm 2 is infeasible for the problem (P), and Proposition 2 along with Corollary 7 do not allow one to say anything about the convergence of the method. Moreover, even if τ_max = +∞, i.e. the norm of t_n can increase unboundedly, there is no guarantee that limit points of the sequence {x_n} are feasible for the original problem. To avoid such pathological cases, one usually either adopts an 'a priori approach' and supposes that a suitable constraint qualification holds true at all infeasible points (this approach was widely used, e.g. for convergence analysis of exact penalty methods in [48]), or adopts an 'a posteriori approach' and supposes that a sequence generated by the method converges to a point at which an appropriate constraint qualification holds true (such an approach was used, e.g. for an analysis of trust region methods in [7]). For the sake of completeness, we present two convergence theorems for Algorithm 2, one of which is based on the a priori approach, while the other one is based on the a posteriori one and was hinted at in Remark 8. Both these theorems ensure the convergence of Algorithm 2 to a feasible and critical point, provided τ_max is sufficiently large.

We start with the a priori approach. To this end we need to introduce the following extension of the definition of critical point to the case of infeasible points.
Definition 2. A point x* ∈ A is said to be a generalized critical point for a vector t ≻_{K*} 0, if there exist v* ∈ ∂h(x*) and s* ⪰_K 0 such that the pair (x*, s*) is a globally optimal solution of the problem

    min_{(x,s)}  g(x) − ⟨v*, x − x*⟩ + ⟨t, s⟩
    s.t.  G(x) − H(x*) − DH(x*)(x − x*) ⪯_K s,  s ⪰_K 0,  x ∈ A.        (25)

Let us give two useful characterizations of generalized criticality.
Proposition 3. Let x* ∈ A and t ≻_{K*} 0 be given. The following statements hold true:

1. x* is a generalized critical point for t iff there exist v* ∈ ∂h(x*), s* ⪰_K 0, and λ*, µ* ∈ K* such that F(x*) − s* ⪯_K 0, t = λ* + µ*, and

    0 ∈ ∂_x L(x*, λ*) + N_A(x*),    ⟨λ*, F(x*) − s*⟩ = 0,    ⟨µ*, s*⟩ = 0,

where L(x, λ) = g(x) − ⟨v*, x − x*⟩ + ⟨λ, G(x) − H(x*) − DH(x*)(x − x*)⟩;

2. if x* is feasible for the problem (P) and is a generalized critical point for t, then x* is a critical point for the problem (P); conversely, if x* is a critical point for the problem (P) satisfying optimality conditions from Corollary 6 for some λ* ∈ K* such that t ⪰_{K*} λ*, then x* is a generalized critical point for t.

Proof.
1. Problem (25) can be rewritten as a convex cone constrained optimization problem of the form

    minimize_{(x,s)}  g(x) − ⟨v*, x − x*⟩ + ⟨t, s⟩
    subject to  x ∈ A,    F̂(x, s) = ( G(x) − H(x*) − DH(x*)(x − x*) − s, −s ) ∈ (−K) × (−K).        (26)

Note that the following constraint qualification holds true for this problem:

    0 ∈ int{ F̂(x, s) + K × K | x ∈ A, s ∈ Y }.

Therefore x* is a generalized critical point for t iff there exist v* ∈ ∂h(x*) and s* ⪰_K 0 such that the pair (x*, s*) satisfies the KKT optimality conditions for problem (26) (see, e.g. [4, Thrm. 3.6]). Rewriting the KKT optimality conditions in terms of problem (25) we arrive at the required result.

2. Let x* be a generalized critical point for t. Then by definition there exist v* ∈ ∂h(x*) and s* ⪰_K 0 such that the pair (x*, s*) is a globally optimal solution of problem (25). Since the point x* is feasible for the problem (P), the pair (x*, 0) is feasible for problem (25). Moreover, one has g(x*) ≤ g(x*) + ⟨t, s*⟩, since s* ⪰_K 0 and t ≻_{K*} 0. Therefore the pair (x*, 0) is a globally optimal solution of problem (25), which obviously implies that x* is a globally optimal solution of the problem

    min  g(x) − ⟨v*, x − x*⟩  s.t.  G(x) − H(x*) − DH(x*)(x − x*) ⪯_K 0,  x ∈ A,

or, equivalently, that x* is a critical point for the problem (P).

Suppose now that x* is a critical point for the problem (P) satisfying optimality conditions from Corollary 6 for some λ* ∈ K* such that t ⪰_{K*} λ*. Then one can easily verify that the pair (x*, 0) satisfies optimality conditions from the first part of this proposition with µ* = t − λ*, which implies that x* is a generalized critical point for t.

Remark 9. (i) From the proposition above it follows that if x* is a critical point, but the inequality t ⪰_{K*} λ* is not satisfied for any corresponding Lagrange multiplier λ* (roughly speaking, the penalty parameter is smaller than the Lagrange multipliers), then x* cannot be a generalized critical point for t. Indeed, if x* is a generalized critical point for t, then from the proof of the second part of the proposition it follows that (x*, 0) is a globally optimal solution of problem (26). Applying the KKT optimality conditions to this problem one gets that t = λ* + µ* for some µ* ∈ K* and some Lagrange multiplier λ*. Consequently, t ⪰_{K*} λ*, which is impossible.

(ii) With the use of the first part of the previous proposition one can readily verify the following. A (not necessarily feasible) point x* is a generalized critical point for some t ∈ ℝ^m with t^(i) > 0, i ∈ I := {1, ..., m}, of the smooth inequality constrained DC optimization problem

    min  f(x) = g(x) − h(x)  s.t.  f_i(x) = g_i(x) − h_i(x) ≤ 0,  i ∈ I,

iff for the penalty function Φ_t(x) = f(x) + Σ_{i=1}^m t^(i) max{0, f_i(x)} one has

    0 ∈ ∂Φ_t(x*) = ∇f(x*) + Σ_{i ∈ I: f_i(x*) > 0} t^(i) ∇f_i(x*) + Σ_{i ∈ I: f_i(x*) = 0} t^(i) co{0, ∇f_i(x*)}

or, equivalently, iff there exists λ* ∈ ℝ^m such that

    ∇f(x*) + Σ_{i=1}^m λ*^(i) ∇f_i(x*) = 0,    t^(i) ≥ λ*^(i) ≥ 0  ∀ i ∈ I,

and for all i ∈ I one has λ*^(i) = 0 whenever f_i(x*) < 0, while λ*^(i) = t^(i) whenever f_i(x*) > 0 (a schematic verification of this characterization in the setting of Example 6 is given after this remark). With the use of this result one can show that the point x* is not a generalized critical point for µt with µ > 1, provided a suitable constraint qualification holds true at x*. Thus, generalized criticality depends on the choice of the penalty parameter t, and in many cases its increase or decrease might help to escape a generalized critical point.

(iii) As was noted above, a generalized critical point x* is, in essence, a critical point of the penalty function Φ_t, i.e. a point such that 0 ∈ ∂Φ_t(x*), where ∂Φ_t(x) is the Dini subdifferential of Φ_t at x. Various conditions ensuring that there are no infeasible critical points of a penalty function were studied in detail in [10–12].
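The following Python snippet illustrates the characterization from part (ii) for the single-constraint problem of Example 6, where f(x) = (x − 0.5)² and f_1(x) = x² − x⁴; the function name and the tolerance handling are our own choices, not part of the paper.

    def is_generalized_critical(x, t, tol=1e-9):
        # Characterization from Remark 9(ii) for problem (22) with m = 1:
        # f(x) = (x - 0.5)^2, f_1(x) = x^2 - x^4.
        df  = 2.0 * (x - 0.5)         # gradient of the objective
        df1 = 2.0 * x - 4.0 * x**3    # gradient of the constraint
        f1  = x**2 - x**4
        if f1 < -tol:                 # inactive constraint: lambda = 0
            return abs(df) <= tol
        if f1 > tol:                  # violated constraint: lambda = t
            return abs(df + t * df1) <= tol
        # active constraint: lambda may be any value in [0, t]
        if abs(df1) <= tol:
            return abs(df) <= tol
        lam = -df / df1
        return -tol <= lam <= t + tol

    print(is_generalized_critical(-1.0, 1.0))   # False: t < 1.5
    print(is_generalized_critical(-1.0, 1.5))   # True:  t >= 1.5

In particular, the point x* = −1 of Example 6 is a generalized critical point for t exactly when t ≥ 1.5 = µ*({−1}), which is consistent with the observation that Algorithm 2 with t_0 ≥ 1.5 terminates at x_0 = −1.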
Before we proceed to convergence analysis, let us also establish an important property of a sequence generated by Algorithm 2, which, in particular, leads to a natural stopping criterion for this method.

Lemma 4. Let a sequence {(x_n, s_n)} be generated by Algorithm 2. Then

    f(x_{n+1}) + ⟨t_n, s_{n+1}⟩ ≤ f(x_n) + ⟨t_n, s_n⟩  ∀ n ∈ ℕ.        (27)

Moreover, this inequality is strict, if x_n is not a generalized critical point for t_n.

Proof. By definition (x_{n+1}, s_{n+1}) is a globally optimal solution of the problem

    min_{(x,s)}  g(x) − ⟨v_n, x − x_n⟩ + ⟨t_n, s⟩
    s.t.  G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K s,  s ⪰_K 0,  x ∈ A,        (28)

while the pair (x_n, s_n) satisfies the following conditions:

    G(x_n) − H(x_{n−1}) − DH(x_{n−1})(x_n − x_{n−1}) ⪯_K s_n,    s_n ⪰_K 0,    x_n ∈ A.

With the use of Lemma 1 one obtains that G(x_n) − H(x_n) ⪯_K s_n, which implies that (x_n, s_n) is a feasible point of problem (28). Therefore

    g(x_{n+1}) − ⟨v_n, x_{n+1} − x_n⟩ + ⟨t_n, s_{n+1}⟩ ≤ g(x_n) + ⟨t_n, s_n⟩  ∀ n ∈ ℕ.        (29)

Subtracting h(x_n) from both sides of this inequality and applying the definition of subgradient one obtains that inequality (27) holds true. It remains to note that if x_n is not a generalized critical point for t_n, then by definition the inequality in (29) is strict, which implies that inequality (27) is strict as well.

Remark 10. From the lemma above it follows that one can use the inequality

    | f(x_{n+1}) + ⟨t_n, s_{n+1}⟩ − f(x_n) − ⟨t_n, s_n⟩ | < ε

as a stopping criterion for Algorithm 2 (a code fragment implementing this check is given below).
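In code, this criterion is a one-line check; the following helper (our own illustration, assuming the finite dimensional case where ⟨t, s⟩ is the usual dot product) could be plugged into the sketch of Algorithm 2 given earlier in place of the fixed iteration budget.

    import numpy as np

    def merit(f, x, s, t):
        # Merit value f(x) + <t, s> from Lemma 4; it is non-increasing along
        # the iterates of Algorithm 2 while the penalty parameter is fixed.
        return f(x) + np.dot(np.atleast_1d(t), np.atleast_1d(s))

    def should_stop(f, x_prev, s_prev, x_next, s_next, t, eps=1e-8):
        # Stopping criterion from Remark 10.
        return abs(merit(f, x_next, s_next, t) - merit(f, x_prev, s_prev, t)) < eps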
Now we can provide sufficient conditions for the convergence of a sequence generated by Algorithm 2 to a feasible and critical point for the problem (P), based on the a priori approach to convergence analysis.

Theorem 6. Let the space Y be finite dimensional, the cone K be generating, G be continuous on A, H be continuously Fréchet differentiable on A, and the penalty function Φ_c(x) = f(x) + c dist(F(x), −K) be bounded below on A for

    c = min{ ⟨t_0, s⟩ | s ∈ K, ‖s‖ = 1 } > 0.

Then all limit points of a sequence {x_n} generated by Algorithm 2 are generalized critical points for t* = lim t_n.

Suppose, in addition, that all points from the set

    { x ∈ A | dist(F(x), −K) > κ }

are not generalized critical points for t̂ = µ^p t_0, where p ∈ ℕ is the largest natural number satisfying the inequality ‖µ^p t_0‖ ≤ τ_max. Then all limit points x* of the sequence {x_n} satisfy the inequality dist(F(x*), −K) ≤ κ. In particular, if κ = 0, then all limit points of the sequence {x_n} are feasible and critical for the problem (P).

Proof. Suppose that the first part of the theorem holds true, i.e. all limit points of the sequence {x_n} are generalized critical points for t* = lim t_n (note that this limit exists, since according to Step 4 of Algorithm 2 the penalty parameter can be updated only a finite number of times). Let us show that then the second part of the theorem holds true as well.

Indeed, let x* be a limit point of the sequence {x_n}, that is, there exists a subsequence {x_{n_k}} converging to x*. Let us consider two cases. Suppose at first that the norm of the penalty parameter t_n does not reach the upper bound τ_max (see Step 4 of Algorithm 2), that is, the penalty parameter is updated less than p times. Then according to the penalty updating rule on Step 4 of Algorithm 2 there exists n_0 ∈ ℕ such that ‖s_n‖ < κ for all n ≥ n_0. By definition

    G(x_n) − H(x_{n−1}) − DH(x_{n−1})(x_n − x_{n−1}) ⪯_K s_n  ∀ n ∈ ℕ,

which thanks to Lemma 1 implies that F(x_n) ⪯_K s_n or, equivalently, F(x_n) − s_n ∈ −K. Therefore dist(F(x_n), −K) ≤ ‖s_n‖ < κ for all n ≥ n_0. Consequently, passing to the limit in the inequality dist(F(x_{n_k}), −K) < κ with the use of the fact that both G and H are continuous, one obtains that dist(F(x*), −K) ≤ κ.

Suppose now that the norm of t_n reaches the upper bound τ_max after a finite number of iterations. Then according to Step 4 of Algorithm 2 there exists n_0 ∈ ℕ such that t_n = µ^p t_0 for all n ≥ n_0. By our assumption x* is a generalized critical point for t* = t̂ = µ^p t_0, which by the assumption of the theorem implies that it cannot belong to the set { x ∈ A : dist(F(x), −K) > κ }. Therefore, dist(F(x*), −K) ≤ κ.

Finally, if κ = 0, then dist(F(x*), −K) = 0, that is, F(x*) ∈ −K, since the cone K is closed. Consequently, the point x* is feasible for the problem (P). Hence by the second part of Proposition 3 the point x* is also critical for the problem (P).

Thus, it remains to prove that all limit points of the sequence {x_n} are generalized critical points for t* = lim t_n. Let a subsequence {x_{n_k}} converge to some point x*. Then the corresponding sequence {v_{n_k}} of subgradients of the function h is bounded due to the local boundedness of the subdifferential mapping [52, Crlr. 24.5.1]. Therefore, replacing, if necessary, the sequence {x_{n_k}} with its subsequence, one can suppose that the sequence {v_{n_k}} converges to some vector v* belonging to ∂h(x*) by virtue of the fact that the graph of the subdifferential is closed [52, Thrm. 24.4].

Let us show that the sequence {s_{n_k}} ⊂ K is bounded.
Then, taking into account the facts that the space Y is finite dimensional and the cone K is closed, and replacing, if necessary, the sequence {x_{n_k}} with its subsequence, one can suppose that {s_{n_k}} converges to some s* ∈ K.

Indeed, since the penalty parameter t_n can be updated only a finite number of times, there exists n_0 ∈ ℕ such that t_n = t_{n_0} for all n ≥ n_0. Consequently, by Lemma 4 the sequence { f(x_n) + ⟨t_n, s_n⟩ }_{n≥n_0} is non-increasing and, in particular, bounded above. Therefore the sequence { f(x_{n_k}) + ⟨t_{n_0}, s_{n_k}⟩ } is bounded above as well. Arguing by reductio ad absurdum, suppose that the sequence {s_{n_k}} is unbounded. Then applying the inequality f(x_{n_k}) + ⟨t_{n_0}, s_{n_k}⟩ ≥ f(x_{n_k}) + c‖s_{n_k}‖ and the fact that f(x_{n_k}) → f(x*) as k → ∞, one gets that limsup_{k→∞} ( f(x_{n_k}) + ⟨t_{n_0}, s_{n_k}⟩ ) = +∞, which is impossible. Thus, without loss of generality one can suppose that the sequence {s_{n_k}} converges to some s*. Note that from the definition of (x_n, s_n) and Lemma 1 it follows that F(x_n) ⪯_K s_n. Therefore F(x*) ⪯_K s*, thanks to the fact that the cone K is closed.

Now we can turn to the proof of the fact that the point x* is a generalized critical point for t*. Arguing by reductio ad absurdum, suppose that this statement is false. Then, in particular, the pair (x*, s*) is not a globally optimal solution of the problem

    minimize_{(x,s)}  g(x) − ⟨v*, x − x*⟩ + ⟨t*, s⟩
    subject to  G(x) − H(x*) − DH(x*)(x − x*) ⪯_K s,  s ⪰_K 0,  x ∈ A.

Therefore there exist a feasible point (x̄, s̄) of this problem and θ > 0 such that

    g(x̄) − ⟨v*, x̄ − x*⟩ + ⟨t*, s̄⟩ < g(x*) + ⟨t*, s*⟩ − 2θ.

Applying Corollary 4 with X = ℝ^d × Y and

    Φ(x, s) = ( G(x) − s, −s ),    Ψ(x, s) = ( H(x), 0 ),    A = A × Y,    K = K × K,

one obtains that for any z = (x, s) ∈ A × Y lying in a neighbourhood of (x*, s*) one can find (ξ(z), ζ(z)) ∈ A × K such that

    G(ξ(z)) − H(x) − DH(x)(ξ(z) − x) ⪯_K ζ(z)

and (ξ(z), ζ(z)) → (x̄, s̄) as z → (x*, s*). Consequently, putting z_k = (x_{n_k}, s_{n_k}), one obtains that there exists k_0 ∈ ℕ such that for any k ≥ k_0 the point (ξ(z_k), ζ(z_k)) is feasible for the problem

    min_{(x,s)}  g(x) − ⟨v_{n_k}, x − x_{n_k}⟩ + ⟨t_{n_k}, s⟩
    s.t.  G(x) − H(x_{n_k}) − DH(x_{n_k})(x − x_{n_k}) ⪯_K s,  s ⪰_K 0,  x ∈ A

(note that one can suppose that t_{n_k} = t*, since the penalty parameter is updated only a finite number of times) and

    g(ξ(z_k)) − ⟨v_{n_k}, ξ(z_k) − x_{n_k}⟩ + ⟨t*, ζ(z_k)⟩ < g(x_{n_k}) + ⟨t*, s_{n_k}⟩ − θ.

Therefore by the definition of (x_{n_k+1}, s_{n_k+1}) for any k ≥ k_0 one has

    g(x_{n_k+1}) − ⟨v_{n_k}, x_{n_k+1} − x_{n_k}⟩ + ⟨t*, s_{n_k+1}⟩ < g(x_{n_k}) + ⟨t*, s_{n_k}⟩ − θ.

Subtracting h(x_{n_k}) from both sides of this inequality and applying the definition of subgradient one obtains that

    f(x_{n_k+1}) + ⟨t*, s_{n_k+1}⟩ < f(x_{n_k}) + ⟨t*, s_{n_k}⟩ − θ  ∀ k ≥ k_0,

which together with Lemma 4 implies that f(x_n) + ⟨t*, s_n⟩ → −∞ as n → ∞ (recall that t* = t_n for any sufficiently large n, since the penalty parameter can be updated only a finite number of times). On the other hand, as was shown above (see the proof of Lemma 3), one has

    f(x_n) + ⟨t*, s_n⟩ ≥ f(x_n) + ⟨t_0, s_n⟩ ≥ f(x_n) + c dist(F(x_n), −K) =: Φ_c(x_n).

Consequently, Φ_c(x_n) → −∞, which contradicts the fact that by our assumption this function is bounded below on A. Therefore one can conclude that x* is a generalized critical point for t*.
Remark 11. Note that in the previous theorem it is sufficient to suppose that the penalty function Φ_c is bounded below on A for c = inf{ ⟨t*, s⟩ | s ∈ K, ‖s‖ = 1 }, which is, in the general case, greater than the constant c from the formulation of the theorem. However, such an assumption is inconsistent with the a priori approach, since it is based on information about the behaviour of the sequence {t_n}, which is not known in advance.

Finally, let us consider the a posteriori approach to convergence analysis, which allows one to obtain sufficient conditions for the convergence of Algorithm 2 to a critical point for the problem (P).
Theorem 7. Let K be finite dimensional, G be continuous on A, H be continuously Fréchet differentiable on A, and there exist c ≥ 0 such that the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A. Suppose also that a sequence {x_n} generated by Algorithm 2 with κ = 0 and τ_max = +∞ converges to a point x* satisfying the following constraint qualification:

    0 ∈ int{ G(x) − H(x*) − DH(x*)(x − x*) + K | x ∈ A }        (30)

(i.e. x* ∈ D_s). Then the sequence {t_n} is bounded, there exists m ∈ ℕ such that for all n ≥ m the point x_n is feasible for the problem (P), and the point x* is feasible and critical for the problem (P).

Proof. By our assumption x* ∈ D_s. Therefore, as was shown in the proof of Corollary 7, there exist r > 0 and µ* ≥ 0 such that for any µ ≥ µ*, any z ∈ B(x*, r) ∩ A, and any v ∈ ∂h(z) the penalized problem (18) is exact. Define τ* = µ*‖t_0‖.

If the penalty parameter t_n is updated only a finite number of times, then the sequence {t_n} is obviously bounded. Moreover, according to Step 4 of Algorithm 2 in this case there exists m ∈ ℕ such that s_n = 0 for all n ≥ m, which implies that the sequence {x_n}_{n≥m} is feasible for the problem (P). Therefore the point x* is also feasible for this problem, due to the fact that under our assumptions the feasible region of the problem (P) is closed.

On the other hand, if the penalty parameter t_n is updated an infinite number of times, then according to Step 4 of Algorithm 2 there exists m ∈ ℕ such that ‖t_n‖ ≥ τ* for all n ≥ m. Moreover, increasing m, if necessary, one can suppose that x_n ∈ B(x*, r) for all n ≥ m. Consequently, the penalized subproblem on Step 3 of Algorithm 2 is exact for all n ≥ m. Hence by the definition of exactness s_n = 0 for all n ≥ m + 1, which contradicts our assumption that t_n is updated an infinite number of times.

Thus, the sequence {t_n} is bounded and the point x* is feasible for the problem (P). It remains to verify that the point x* is critical. Suppose at first that there exists m ∈ ℕ such that ‖t_m‖ ≥ τ*. Then ‖t_n‖ ≥ τ* for all n ≥ m. Increasing m, if necessary, one can suppose that x_n ∈ B(x*, r) for all n ≥ m. Therefore by the definition of exactness of the penalized problem and the definitions of Algorithms 1 and 2, the sequence {x_n}_{n≥m+1} is feasible for the problem (P) and coincides with the sequence generated by Algorithm 1 with starting point x_{m+1}. Therefore by Theorem 5 the point x* is critical for the problem (P).

Suppose now that ‖t_n‖ < τ* for all n ∈ ℕ. Then there exists n_0 ∈ ℕ such that t_n = t_{n_0} for all n ≥ n_0. Since the sequence {x_n} generated by Algorithm 2 converges to x*, the corresponding sequence {v_n} of subgradients of the function h is bounded, thanks to the local boundedness of the subdifferential mapping [52, Crlr. 24.5.1]. Consequently, there exists a subsequence {v_{n_k}} converging to some vector v*, which belongs to ∂h(x*) due to the closedness of the graph of the subdifferential [52, Thrm. 24.4].

Arguing by reductio ad absurdum, suppose that x* is not a critical point for the problem (P). As was noted several times above, this implies that x* is not a globally optimal solution of the problem

    minimize  g(x) − ⟨v*, x − x*⟩
    subject to  G(x) − H(x*) − DH(x*)(x − x*) ⪯_K 0,  x ∈ A.
Thus, there exist θ > 0 and a feasible point x̄ of this problem satisfying the inequality

    g(x̄) − ⟨v*, x̄ − x*⟩ < g(x*) − 2θ.

Applying Corollary 4 with Φ = G, Ψ = H, A = A, and K = K one obtains that for any z ∈ A lying in a neighbourhood of x* one can find ξ(z) ∈ A such that G(ξ(z)) − H(z) − DH(z)(ξ(z) − z) ⪯_K 0 and ξ(z) → x̄ as z → x*. Hence bearing in mind the facts that x_{n_k} → x* and v_{n_k} → v* as k → ∞, one obtains that there exists k_0 ∈ ℕ such that

    g(ξ(x_{n_k})) − ⟨v_{n_k}, ξ(x_{n_k}) − x_{n_k}⟩ < g(x_{n_k}) − θ  ∀ k ≥ k_0.

Clearly, one can suppose that n_{k_0} ≥ n_0. Recall that by definition (x_{n_k+1}, s_{n_k+1}) is a globally optimal solution of the penalized problem

    min_{(x,s)}  g(x) − ⟨v_{n_k}, x − x_{n_k}⟩ + ⟨t_{n_k}, s⟩
    s.t.  G(x) − H(x_{n_k}) − DH(x_{n_k})(x − x_{n_k}) ⪯_K s,  s ⪰_K 0,  x ∈ A.
By definition the point (ξ(x_{n_k}), 0) is feasible for this problem, which implies that

    g(x_{n_k+1}) − ⟨v_{n_k}, x_{n_k+1} − x_{n_k}⟩ + ⟨t_{n_k}, s_{n_k+1}⟩ ≤ g(ξ(x_{n_k})) − ⟨v_{n_k}, ξ(x_{n_k}) − x_{n_k}⟩ < g(x_{n_k}) − θ

for all k ≥ k_0. Subtracting h(x_{n_k}) from both sides of this inequality and applying the definition of subgradient and the fact that t_n = t_{n_0} for all n ≥ n_0, one obtains that

    f(x_{n_k+1}) + ⟨t_{n_0}, s_{n_k+1}⟩ < f(x_{n_k}) − θ ≤ f(x_{n_k}) + ⟨t_{n_0}, s_{n_k}⟩ − θ  ∀ k ≥ k_0

(here we used the facts that by definition s_{n_k} ∈ K and ⟨t_{n_0}, s⟩ ≥ 0 for all s ∈ K). By Lemma 4 one has

    f(x_{n+1}) + ⟨t_n, s_{n+1}⟩ ≤ f(x_n) + ⟨t_n, s_n⟩  ∀ n ≥ n_0.

Consequently, f(x_n) + ⟨t_n, s_n⟩ → −∞ as n → ∞, which contradicts the facts that x_n → x* as n → ∞ and f(x_n) + ⟨t_n, s_n⟩ ≥ f(x_n) for all n ∈ ℕ. Therefore, x* is a critical point, and the proof is complete.

Thus, one can conclude that if a sequence {x_n} generated by either Algorithm 1 or Algorithm 2 converges to a point x* such that Slater's condition holds true for the corresponding linearized convex problem, then under some natural assumptions the point x* is critical for the problem (P).

In this paper we developed a general theory of cone constrained DC optimization problems, particularly, DC semidefinite programming problems. To this end, we studied two definitions of DC matrix-valued functions (abstract and componentwise) and their interconnections. We proved that any DC matrix-valued function is componentwise DC and demonstrated how one can compute a DC decomposition of several nonlinear semidefinite constraints appearing in applications. We also constructed a DC decomposition of the maximal eigenvalue function, which allows one to apply standard results and methods of inequality constrained DC optimization to problems with smooth and nonsmooth componentwise DC semidefinite constraints.

In the case of general cone constrained DC optimization problems, we obtained local optimality conditions and presented a detailed convergence analysis of the DC algorithm (the convex-concave procedure) and its penalized version proposed in [40] (see also [35, 47]) under the assumption that the concave part of the constraints is smooth. In particular, we obtained sufficient conditions for the exactness of the penalty subproblem of the penalized version of the method and analysed two types of sufficient conditions for the convergence of this method to a feasible and critical point of a cone constrained DC optimization problem from an infeasible starting point. The first type consists of the so-called a priori conditions, which are based on general assumptions on the problem under consideration, while the second type consists of the a posteriori conditions, which rely on some assumptions on a limit point of a sequence generated by an optimization method. Finally, we presented a simple example demonstrating that even if a feasible starting point is known, it might be reasonable to use the penalized version of the method, since it is sometimes capable of finding a deeper local minimum than the standard method.

The main results of this paper pave the way for applications of DC optimization methods to various nonlinear semidefinite programming problems and other nonlinear cone constrained optimization problems, such as nonlinear second order cone programming problems.

References
[1] P. A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, 2009.
[2] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95:3–51, 2003.
[3] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization. Analysis, Algorithms, and Engineering Applications. SIAM, Philadelphia, 2001.
[4] J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, New York, 2000.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, 2004.
[6] A. Canelas, M. Carrasco, and J. López. A feasible direction algorithm for nonlinear second-order cone programs. Optim. Methods Softw., 34:1322–1341, 2019.
[7] A. R. Conn, N. I. M. Gould, and P. L. Toint. Trust-Region Methods. SIAM, Philadelphia, 2000.
[8] W. de Oliveira. Proximal bundle methods for nonsmooth DC programming. J. Glob. Optim., 75:523–563, 2019.
[9] W. de Oliveira and M. P. Tcheou. An inertial algorithm for DC programming. Set-Valued Var. Anal., 27:895–919, 2019.
[10] M. V. Dolgopolik. A unifying theory of exactness of linear penalty functions. Optim., 65:1167–1202, 2016.
[11] M. V. Dolgopolik. A unifying theory of exactness of linear penalty functions II: parametric penalty functions. Optim., 66:1577–1622, 2017.
[12] M. V. Dolgopolik and A. V. Fominyh. Exact penalty functions for optimal control problems I: Main theorem and free-endpoint problems. Optim. Control Appl. Methods, 40:1018–1044, 2019.
[13] M. Dür, R. Horst, and M. Locatelli. Necessary and sufficient global optimality conditions for convex maximization revisited. J. Math. Anal. Appl., 217:637–649, 1998.
[14] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20:303–353, 1998.
[15] A. Ferrer and J. E. Martínez-Legaz. Improving the efficiency of DC global optimization methods by improving the DC representation of the objective function. J. Glob. Optim., 43:513–531, 2009.
[16] N. A. Gadhi. Necessary optimality conditions for a nonsmooth semi-infinite programming problem. J. Glob. Optim., 74:161–168, 2019.
[17] M. Gaudioso, G. Giallombardo, G. Miglionico, and A. M. Bagirov. Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations. J. Glob. Optim., 71:37–55, 2018.
[18] M. A. Goberna and M. A. López, editors. Semi-Infinite Programming: Recent Advances. Kluwer Academic Publishers, Dordrecht, 2001.
[19] K.-C. Goh, M. G. Safonov, and J. H. Ly. Robust synthesis via bilinear matrix inequalities. Int. J. Robust Nonlinear Control, 6:1079–1095, 1996.
[20] K.-C. Goh, M. G. Safonov, and G. P. Papavassilopoulos. Global optimization for the Biaffine Matrix Inequality problem. J. Glob. Optim., 7:365–380, 1995.
[21] P. Hartman. On functions representable as a difference of convex functions. Pac. J. Math., 9:707–713, 1959.
[22] D. Henrion, S. Tarbouriech, and M. Šebek. Rank-one LMI approach to simultaneous stabilization of linear systems. Syst. Control Lett., 38:79–89, 1999.
[23] J.-B. Hiriart-Urruty. Generalized differentiability/duality and optimization for problems dealing with differences of convex functions. In J. Ponstein, editor, Convexity and Duality in Optimization, pages 37–70. Springer, Berlin, Heidelberg, 1985.
[24] J.-B. Hiriart-Urruty. From convex optimization to nonconvex optimization. Necessary and sufficient conditions for global optimality. In F. H. Clarke, V. F. Dem'yanov, and F. Giannessi, editors, Nonsmooth Optimization and Related Topics, pages 219–239. Springer, Boston, MA, 1989.
[25] J.-B. Hiriart-Urruty. Conditions for global optimality 2. J. Glob. Optim., 13:349–367, 1998.
[26] R. Horst and N. V. Thoai. DC programming: Overview. J. Optim. Theory Appl., 103:1–43, 1999.
[27] A. D. Ioffe and V. M. Tihomirov. Theory of Extremal Problems. North-Holland, Amsterdam, 1979.
[28] K. Joki, A. M. Bagirov, N. Karmitsa, M. Mäkelä, and S. Taheri. Double bundle method for finding Clarke stationary points in nonsmooth DC programming. SIAM J. Optim., 28:1892–1919, 2018.
[29] R. V. Kadison. Order properties of bounded self-adjoint operators. Proc. Amer. Math. Soc., 2:505–510, 1951.
[30] N. Kanzi. Necessary optimality conditions for nonsmooth semi-infinite programming problems. J. Glob. Optim., 49:713–725, 2011.
[31] H. Kato and M. Fukushima. An SQP-type algorithm for nonlinear second-order cone programs. Optim. Lett., 1:129–144, 2007.
[32] M. Kočvara and M. Stingl. PENNON: A code for convex nonlinear and semidefinite programming. Optim. Methods Softw., 18:317–333, 2003.
[33] A. G. Kusraev and S. S. Kutateladze. Subdifferentials: Theory and Applications. Kluwer Academic Publishers, Dordrecht, 1995.
[34] G. R. Lanckriet and B. K. Sriperumbudur. On the convergence of the concave-convex procedure. Adv. Neural Inf. Process. Syst., 22:1759–1767, 2009.
[35] H. A. Le Thi, V. N. Huynh, and T. Pham Dinh. DC programming and DCA for general DC programs. In T. van Do, H. A. L. Thi, and N. T. Nguyen, editors, Advanced Computational Methods for Knowledge Engineering, pages 15–35. Springer, Berlin, Heidelberg, 2014.
[36] H. A. Le Thi, V. N. Huynh, and T. Pham Dinh. Convergence analysis of difference-of-convex algorithm with subanalytic data. J. Optim. Theory Appl., 179:103–126, 2018.
[37] H. A. Le Thi and T. Pham Dinh. DC programming and DCA: thirty years of developments. Math. Program., 169:5–68, 2018.
[38] H. A. Le Thi, T. Pham Dinh, and L. D. Muu. Numerical solution for optimization over the efficient set by D.C. optimization algorithm. Oper. Res. Lett., 19:117–128, 1996.
[39] H. A. Le Thi, T. Pham Dinh, and N. V. Thoai. Combination between global and local methods for solving an optimization problem over the efficient set. Eur. J. Oper. Res., 142:258–270, 2002.
[40] T. Lipp and S. Boyd. Variations and extension of the convex-concave procedure. Optim. Eng., 17:263–287, 2016.
[41] J. H. Manton. Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal Process., 50:635–650, 2002.
[42] B. S. Mordukhovich and T. Nghia. Nonsmooth cone-constrained optimization with applications to semi-infinite programming. Math. Oper. Res., 39:301–324, 2014.
[43] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, 1994.
[44] Y.-S. Niu and T. P. Dinh. DC programming approaches for BMI and QMI feasibility problems. In T. van Do, H. Thi, and N. Nguyen, editors, Advanced Computational Methods for Knowledge Engineering, pages 37–63. Springer, Cham, 2014.
[45] N. S. Papageorgiou. Nonsmooth analysis on partially ordered vector spaces: part 1 — convex case. Pac. J. Math., 107:403–458, 1983.
[46] T. Pham Dinh and H. A. Le Thi. D.C. optimization algorithms for solving the trust region subproblem. SIAM J. Optim., 8:476–505, 1998.
[47] T. Pham Dinh and H. A. Le Thi. Recent advances in DC programming and DCA. In N. T. Nguyen and H. A. L. Thi, editors, Transactions on Computational Intelligence XIII, pages 1–37. Springer, Berlin, Heidelberg, 2014.
[48] E. Polak. Optimization: Algorithms and Consistent Approximations. Springer-Verlag, New York, 1997.
[49] R. Reemtsen and J.-J. Rückmann, editors. Semi-Infinite Programming. Kluwer Academic Publishers, Dordrecht, 1998.
[50] T. Pham Dinh and E. B. Souad. Algorithms for solving a class of nonconvex optimization problems. Methods of subgradients. In J.-B. Hiriart-Urruty, editor, Fermat Days 85: Mathematics for Optimization. North-Holland Mathematics Studies. Vol. 129, pages 249–271. North-Holland, Amsterdam, 1986.
[51] S. M. Robinson. Regularity and stability for convex multivalued functions. Math. Oper. Res., 1:130–143, 1976.
[52] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.
[53] O. Stein. How to solve a semi-infinite optimization problem. Eur. J. Oper. Res., 223:312–320, 2012.
[54] M. Stingl. On the solution of nonlinear semidefinite programs by augmented Lagrangian methods. PhD thesis, Institute of Applied Mathematics II, Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, 2006.
[55] A. S. Strekalovsky. On the problem of the global extremum. Sov. Math. Dokl., 35:194–198, 1987.
[56] A. S. Strekalovsky. Global optimality conditions for nonconvex optimization. J. Glob. Optim., 12:415–434, 1998.
[57] A. S. Strekalovsky. On the minimization of the difference of convex functions on a feasible set. Comput. Math. Math. Phys., 43:380–390, 2003.
[58] A. S. Strekalovsky. On local search in d.c. optimization problems. Appl. Math. Comput., 255:73–83, 2015.
[59] M. Thera. Subdifferential calculus for convex operators. J. Math. Anal. Appl., 80:78–91, 1981.
[60] M. Todd. Semidefinite optimization. Acta Numerica, 10:515–560, 2001.
[61] A. H. Tor, A. Bagirov, and B. Karasözen. Aggregate codifferential method for nonsmooth DC optimization. J. Comput. Appl. Math., 259:851–867, 2014.
[62] L. T. Tung. Karush-Kuhn-Tucker optimality conditions for nonsmooth multiobjective semidefinite and semi-infinite programming. J. Appl. Numer. Optim., 1:63–75, 2019.
[63] H. Tuy. A general deterministic approach to global optimization via D.C. programming. In J.-B. Hiriart-Urruty, editor, Fermat Days 85: Mathematics for Optimization. North-Holland Mathematics Studies. Vol. 129, pages 273–303. North-Holland, Amsterdam, 1986.
[64] H. Tuy. Convex Analysis and Global Optimization. Kluwer Academic Publishers, Dordrecht, 1998.
[65] H. Tuy. On some recent advances and applications of D.C. optimization. In V. H. Nguyen, J. J. Strodiot, and P. Tossings, editors, Optimization. Lecture Notes in Economics and Mathematical Systems, vol. 481, pages 473–497. Springer, Berlin, Heidelberg, 2000.
[66] H. Tuy. On global optimality conditions and cutting plane algorithms. J. Optim. Theory Appl., 118:201–216, 2003.
[67] W. van Ackooij and W. de Oliveira. Non-smooth DC-constrained optimization: constraint qualification and minimizing methodologies. Optim. Methods Softw., 34:890–920, 2019.
[68] W. van Ackooij and W. de Oliveira. Nonsmooth and nonconvex optimization via approximate difference-of-convex decompositions. J. Optim. Theory Appl., 182:49–80, 2019.
[69] H. Yamashita and H. Yabe. A primal-dual interior point method for nonlinear optimization over second-order cones. Optim. Methods Softw., 24:407–426, 2009.
[70] H. Yamashita and H. Yabe. A survey of numerical methods for nonlinear semidefinite programming. J. Oper. Res. Soc. Japan, 58:24–60, 2015.
[71] A. L. Yuille and A. Rangarajan. The concave-convex procedure. Neural Comput., 15:915–936, 2003.
[72] Q. Zhang. A new necessary and sufficient global optimality condition for canonical DC problems. J. Glob. Optim., 55:559–577, 2013.
[73] X. Y. Zheng and X. Yang. Lagrange multipliers in nonsmooth semi-infinite optimization problems.