DC Semidefinite Programming and Cone Constrained DC Optimization
M.V. Dolgopolik∗†

February 3, 2021

∗ Institute for Problems in Mechanical Engineering, Russian Academy of Sciences, Saint Petersburg, Russia
† This work was performed in IPME RAS and supported by the Russian Science Foundation (Grant No. 20-71-10032).
Abstract
In the first part of this paper we discuss possible extensions of the main ideas and results of constrained DC optimization to the case of nonlinear semidefinite programming problems (i.e. problems with matrix constraints). To this end, we analyse two different approaches to the definition of DC matrix-valued functions (namely, order-theoretic and componentwise), study some properties of convex and DC matrix-valued functions and demonstrate how to compute DC decompositions of some nonlinear semidefinite constraints appearing in applications. We also compute a DC decomposition of the maximal eigenvalue of a DC matrix-valued function, which can be used to reformulate DC semidefinite constraints as DC inequality constraints.

In the second part of the paper, we develop a general theory of cone constrained DC optimization problems. Namely, we obtain local optimality conditions for such problems and study an extension of the DC algorithm (the convex-concave procedure) to the case of general cone constrained DC optimization problems. We analyse a global convergence of this method and present a detailed study of a version of the DCA utilising exact penalty functions. In particular, we provide two types of sufficient conditions for the convergence of this method to a feasible and critical point of a cone constrained DC optimization problem from an infeasible starting point.
1 Introduction

Starting with the pioneering works of Hiriart-Urruty [23, 24], Pham Dinh and Souad [50], Strekalovsky [55], Tuy [63], and many others in the 1980s, DC (Difference of Convex functions) programming has been one of the most active areas of research in nonlinear nonconvex optimization. One of the main features of DC optimization problems is the fact that one can derive constructive global optimality conditions [13, 25, 56, 66, 72] and develop deterministic global optimization methods [15, 26, 39, 57, 64, 65] for this class of problems. Local search methods for minimizing DC functions have also attracted considerable attention of researchers (see [17, 28, 58, 61] and the references therein).
One of the most well-known such methods is the DC Algorithm (DCA), originally presented by Pham Dinh and Souad in [50] and later on thoroughly investigated in the works of Le Thi and Pham Dinh et al. [35, 36, 38, 46, 47] (a particular version of the DCA is sometimes called the concave-convex/convex-concave procedure [34, 71]). Some closely related local search methods were studied in the works of de Oliveira et al. [8, 9, 67, 68]. For a detailed survey on DC programming, the DC Algorithm, and their applications see [37]. A comprehensive literature review of the DC algorithm, the convex-concave procedure, and other related optimization methods can be found in [40].

Cone constrained optimization is one of the central areas of constrained optimization, since it provides a unified setting for many different problems appearing in applications. Standard equality and inequality constrained problems, semidefinite programming problems [32, 54, 60], second order cone programming problems [2], semi-infinite programming problems [18, 49], and many other particular problems (see, e.g. [3, 5, 43]) can be formulated as general cone constrained optimization problems.

A detailed theoretical analysis of smooth and nonsmooth cone constrained optimization problems was presented in [4, 16, 30, 42, 62, 73]. Optimization methods for solving various convex cone constrained optimization problems can be found in [3, 5, 43], while algorithms for solving various classes of nonconvex cone constrained optimization problems were developed, e.g. in [6, 31, 32, 53, 54, 69, 70] (see also the references therein).

Despite the abundance of publications on cone constrained optimization and (usually inequality) constrained DC optimization problems, very little attention has been paid to extensions of the main results and methods of DC optimization to the case of problems with cone constraints. Even in the comprehensive survey paper [37], only unconstrained and inequality constrained DC optimization problems are discussed.

The convex-concave procedure and the penalty convex-concave procedure for solving cone constrained DC optimization problems were proposed by Lipp and Boyd in [40], where an application of these methods to multi-matrix principal component analysis was presented. However, to the best of the author's knowledge, a convergence analysis of these methods remains an open problem. An application of the DCA to bilinear and quadratic matrix inequality feasibility problems was considered by Niu and Dinh [44].

The main goal of this paper is to fill in the gap and extend some of the main results and algorithms of inequality constrained DC optimization to the case of DC optimization problems with DC cone constraints, particularly, DC semidefinite programming problems. To this end, in the first part of the paper we study two different approaches to the definition of DC matrix-valued functions: order-theoretic and componentwise. We obtain several useful properties of convex and DC matrix-valued functions, prove that any DC (in the order-theoretic sense) matrix-valued function is necessarily componentwise DC, and demonstrate how one can compute DC decompositions of several nonlinear matrix-valued functions appearing in applications. We also construct a DC decomposition of the maximal eigenvalue of a componentwise DC matrix-valued function. This result allows one to easily extend all results and methods of inequality constrained DC optimization to the case of DC optimization problems with componentwise DC semidefinite constraints.

The second part of the paper is devoted to abstract cone constrained DC optimization problems.
We obtain local optimality conditions for such problems in several different forms and present a detailed convergence analysis of the algorithms for solving cone constrained DC optimization problems proposed in [40], thus providing a theoretical foundation for applications of the methods from [40]. We prove a global convergence of the convex-concave procedure (CCP)/DC Algorithm from [40] to a critical point of the problem under consideration and present a comprehensive analysis of a penalized version of this method. We obtain sufficient conditions for the exactness of the penalty subproblem, establish a global convergence of the penalty CCP to generalized critical points, and provide two types of sufficient conditions for a convergence of the penalty CCP to a feasible and critical point of a cone constrained DC optimization problem from an infeasible starting point. We also discuss why the penalty CCP might be superior to the non-penalized version of this method in the case when a feasible starting point is known.

The paper is organized as follows. Order-theoretic and componentwise approaches to DC matrix-valued functions are studied in Section 2, while a DC structure of the maximal eigenvalue of a nonlinear matrix-valued function is discussed in Section 3. Section 4 is devoted to general cone constrained DC optimization problems. Local optimality conditions for such problems are obtained in Subsection 4.2. A convergence analysis of the DC algorithm (the convex-concave procedure) for cone constrained DC optimization problems proposed by Lipp and Boyd [40] is presented in Subsection 4.3, while a detailed convergence analysis of the penalty convex-concave procedure from [40] is given in Subsections 4.4–4.6. Finally, some auxiliary results on vector-valued convex mappings and convex multifunctions are collected in Subsection 4.1.
2 DC Matrix-Valued Functions

Denote by S^ℓ the space of all real symmetric matrices of order ℓ ∈ N, and let ⪯ be the Löwner partial order on S^ℓ, i.e. A ⪯ B for some matrices A, B ∈ S^ℓ iff the matrix B − A is positive semidefinite. Nonlinear semidefinite optimization is concerned with problems of optimizing functions subject to constraints of the form F(x) ⪯ 0, where F: R^d → S^ℓ is a given nonlinear mapping. To extend the main ideas and results of DC optimization to the case of nonlinear semidefinite programming problems, first one must introduce a suitable definition of a DC matrix-valued function F. There are two possible approaches to this definition: order-theoretic and componentwise. Let us discuss and compare these approaches.

Recall that the matrix-valued function F is called convex (see, e.g. [4, Sect. 5.3.2] and [5, Sect. 3.6.2]), if

    F(αx_1 + (1 − α)x_2) ⪯ αF(x_1) + (1 − α)F(x_2)  ∀x_1, x_2 ∈ R^d, α ∈ [0, 1].

Therefore it is natural to call the function F DC (Difference-of-Convex), if there exist convex functions G, H: R^d → S^ℓ such that F = G − H. Any such representation of the function F (or, equivalently, any such pair of functions (G, H)) is called a DC decomposition of F.

The definition of matrix-valued DC function given above has several disadvantages. Firstly, the convexity of matrix-valued functions is much harder to verify than the convexity of real-valued functions. Many matrix-valued functions that might seem to be convex judging by the experience with the real-valued case are, in actuality, nonconvex. In particular, the convexity of each component F_ij(·) of F is not sufficient to ensure the matrix convexity of F.

Example 1.
Let d = 1, ℓ = 2, and

    F(x) = [ x    x² ;  x²   x ].

Then for x_1 = 1 and x_2 = −1 one has

    αF(x_1) + (1 − α)F(x_2) − F(αx_1 + (1 − α)x_2) = [ 0   1 − (2α − 1)² ;  1 − (2α − 1)²   0 ].

This matrix is not positive semidefinite for any α ∈ (0, 1), i.e. F is nonconvex.
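As a quick sanity check of Example 1, the following minimal numpy sketch (assuming the reconstruction of F used above, with entries x on the diagonal and x² off the diagonal) evaluates the convexity gap along the segment between x_1 = 1 and x_2 = −1 and confirms that it has a negative eigenvalue:

    import numpy as np

    # Example 1: F(x) = [[x, x^2], [x^2, x]] is componentwise convex but not
    # matrix convex. We check whether the "convexity gap"
    #   D(a) = a*F(x1) + (1 - a)*F(x2) - F(a*x1 + (1 - a)*x2)
    # is positive semidefinite at x1 = 1, x2 = -1.
    def F(x):
        return np.array([[x, x**2], [x**2, x]])

    x1, x2 = 1.0, -1.0
    for a in (0.25, 0.5, 0.75):
        D = a * F(x1) + (1 - a) * F(x2) - F(a * x1 + (1 - a) * x2)
        print(a, np.linalg.eigvalsh(D))  # the smallest eigenvalue is negative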
Secondly, recall that the set S^ℓ equipped with the Löwner partial order is not a vector lattice, since by Kadison's theorem [29] the least upper bound (the supremum) of two matrices in the Löwner order exists iff these matrices are comparable. Therefore, many standard results and techniques from convex analysis do not admit a natural extension to the case of matrix convexity (cf. the general theory of convex vector-valued functions [33, 45, 59], in which the assumption on the completeness of the partial order is often indispensable). For example, in most cases the supremum of two convex matrix-valued functions is not correctly defined.

Nevertheless, there are some similarities between matrix-valued DC functions and their real-valued counterparts. In particular, one can construct a DC decomposition of a twice continuously differentiable matrix-valued function with bounded Hessians in the same way one can construct a DC decomposition of a twice continuously differentiable real-valued function.

Let I_ℓ be the identity matrix of order ℓ. Denote by |·| the Euclidean norm, by ⟨·,·⟩ the inner product in R^k, and by ‖A‖_F = (Tr(A²))^{1/2} the Frobenius norm of a matrix A ∈ S^ℓ.

Theorem 1. Let a function F: R^d → S^ℓ be twice continuously differentiable and suppose that there exists M > 0 such that ‖∇²F_ij(x)‖_F ≤ M for all x ∈ R^d and i, j ∈ {1, …, ℓ}. Then the function F is DC and for any µ ≥ ℓM the pair (G, H) with G(x) = F(x) + µ|x|²I_ℓ and H(x) = µ|x|²I_ℓ, x ∈ R^d, is a DC decomposition of the function F.

Proof. Observe that by the definitions of matrix convexity and the Löwner partial order a function G: R^d → S^ℓ is convex iff for any z ∈ R^ℓ one has

    ⟨z, (αG(x_1) + (1 − α)G(x_2) − G(αx_1 + (1 − α)x_2))z⟩ ≥ 0  ∀x_1, x_2 ∈ R^d, α ∈ [0, 1]

or, equivalently,

    ⟨z, G(αx_1 + (1 − α)x_2)z⟩ ≤ α⟨z, G(x_1)z⟩ + (1 − α)⟨z, G(x_2)z⟩.

Therefore, a function G: R^d → S^ℓ is convex iff for any z ∈ R^ℓ the real-valued function G_z(·) = ⟨z, G(·)z⟩ is convex. Consequently, in the case when G is twice continuously differentiable, this function is convex iff for any z the Hessian of the function G_z is positive semidefinite, i.e. for all x ∈ R^d and z ∈ R^ℓ the matrix

    ∇²G_z(x) = Σ_{i,j=1}^ℓ z_i z_j ∇²G_ij(x)

is positive semidefinite.

Let us now turn to the proof of the theorem. Define G(x) = F(x) + µ|x|²I_ℓ and H(x) = µ|x|²I_ℓ, x ∈ R^d, for some µ ≥ 0. Let us check that the functions G and H are convex, provided µ ≥ ℓM. Then one can conclude that F is a DC function and the pair (G, H) is a DC decomposition of F.

Indeed, for any x ∈ R^d, v ∈ R^d, and z ∈ R^ℓ one has

    ⟨v, ∇²G_z(x)v⟩ = Σ_{i,j=1}^ℓ z_i z_j ⟨v, ∇²F_ij(x)v⟩ + 2µ Σ_{i=1}^ℓ z_i² |v|².

Applying the Cauchy–Schwarz inequality, the inequality |z_i z_j| ≤ 0.5(z_i² + z_j²), and the fact that the Frobenius norm is compatible with the Euclidean norm one gets that

    ⟨v, ∇²G_z(x)v⟩ ≥ −|v|² Σ_{i,j=1}^ℓ ‖∇²F_ij(x)‖_F · 0.5(z_i² + z_j²) + 2µ Σ_{i=1}^ℓ z_i² |v|² = |v|² Σ_{i=1}^ℓ ( 2µ − Σ_{j=1}^ℓ ‖∇²F_ij(x)‖_F ) z_i².

Hence for any x ∈ R^d and µ ≥ ℓM one has

    ⟨v, ∇²G_z(x)v⟩ ≥ 0  ∀v ∈ R^d, z ∈ R^ℓ,

that is, the Hessian ∇²G_z(x) is positive semidefinite, which implies that the matrix-valued function G(x) = F(x) + µ|x|²I_ℓ is convex. The convexity of the function H can be readily verified directly.
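To illustrate Theorem 1 numerically, here is a small sketch on a toy mapping of our own choosing (not taken from the paper): the componentwise Hessians of F below have Frobenius norms bounded by M = 2, so with ℓ = 2 and µ = ℓM = 4 the function G(x) = F(x) + µ|x|²I_ℓ should be matrix convex, which random sampling confirms.

    import numpy as np

    # Toy map: F(x) = [[x1*x2, sin(x1)], [sin(x1), -x2^2]], x in R^2, l = 2.
    # The Hessians of the components have Frobenius norms at most M = 2,
    # so G(x) = F(x) + mu*|x|^2*I with mu = l*M = 4 is matrix convex.
    rng = np.random.default_rng(0)
    mu = 4.0

    def G(x):
        F = np.array([[x[0] * x[1], np.sin(x[0])],
                      [np.sin(x[0]), -x[1] ** 2]])
        return F + mu * (x @ x) * np.eye(2)

    worst = np.inf
    for _ in range(10_000):
        a, b = rng.normal(size=2), rng.normal(size=2)
        t = rng.uniform()
        gap = t * G(a) + (1 - t) * G(b) - G(t * a + (1 - t) * b)
        worst = min(worst, np.linalg.eigvalsh(gap).min())
    print(worst)  # stays >= 0 (up to roundoff) on all sampled segments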
The difficulties connected with the use of matrix convexity motivate us to consider a different approach to the definition of DC matrix-valued functions.

Definition 1. A function F: R^d → S^ℓ is called componentwise convex, if each component F_ij(·), i, j ∈ {1, …, ℓ}, is convex. The function F is called componentwise DC, if there exist componentwise convex functions G, H: R^d → S^ℓ such that F = G − H. Any such representation of the function F (or, equivalently, any such pair of functions (G, H)) is called a componentwise DC decomposition of F.

Many properties of real-valued DC functions can be easily extended to the case of componentwise DC matrix-valued functions. For example, a linear combination of componentwise DC functions is obviously componentwise DC. With the use of the well-known results of Hartman [21] one can easily see that the Hadamard and the Kronecker products of componentwise DC matrix-valued functions are componentwise DC. Furthermore, applying the representation of the inverse matrix via the adjugate matrix one can verify that if a function F: R^d → S^ℓ is componentwise DC and for all x ∈ R^d the matrix F(x) is invertible, then the inverse-matrix function F^{−1} mapping x to (F(x))^{−1} is also componentwise DC.

Let us point out some connections between convex/DC and componentwise convex/DC matrix-valued functions. As Example 1 demonstrates, componentwise convex matrix-valued functions need not be convex. On the other hand, from the fact that for any convex matrix-valued function F the real-valued function ⟨z, F(·)z⟩ is convex for all z ∈ R^ℓ it follows that all diagonal components F_ii(·) of a convex matrix-valued function F must be convex (put z = e_i for every vector e_i from the canonical basis of R^ℓ). However, non-diagonal components of F need not be convex.

Example 2.
Let d = 1, ℓ = 2, and

    F(x) = [ 0.5x²   sin x ;  sin x   0.5x² ].

Then for all z ∈ R² and x ∈ R one has

    Σ_{i,j=1}^2 z_i z_j F_ij″(x) = z_1² − 2(sin x) z_1 z_2 + z_2² ≥ z_1² − 2|z_1||z_2| + z_2² = (|z_1| − |z_2|)² ≥ 0.

Consequently, the function F is convex by [4, Proposition 5.72, part (ii)], despite the fact that the non-diagonal elements of F are nonconvex.
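A direct numerical check of Example 2 (under the reconstruction F(x) = (0.5x², sin x; sin x, 0.5x²) used above): sampling random segments never produces a convexity gap with a negative eigenvalue, even though the off-diagonal entry sin x is nonconvex.

    import numpy as np

    # F(x) = [[x^2/2, sin x], [sin x, x^2/2]] is matrix convex although its
    # off-diagonal component sin(x) is not a convex function of x.
    def F(x):
        return np.array([[0.5 * x**2, np.sin(x)], [np.sin(x), 0.5 * x**2]])

    rng = np.random.default_rng(1)
    worst = np.inf
    for _ in range(10_000):
        a, b = rng.normal(scale=3.0, size=2)
        t = rng.uniform()
        gap = t * F(a) + (1 - t) * F(b) - F(t * a + (1 - t) * b)
        worst = min(worst, np.linalg.eigvalsh(gap).min())
    print(worst)  # nonnegative up to roundoff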
Although the non-diagonal elements of a convex matrix-valued function F might be nonconvex, they cannot be too 'wild', e.g. discontinuous. Namely, the following result holds true.

Theorem 2. Let a function F: R^d → S^ℓ be convex. Then for all i, j ∈ {1, …, ℓ}, i ≠ j, the function F_ij is DC and, therefore, Lipschitz continuous on any bounded set and twice differentiable almost everywhere.

Proof. We prove the theorem by induction in ℓ. The case ℓ = 1 is trivial. Let us prove the case ℓ = 2 in order to highlight the main idea of the proof.

As was noted above, the function ⟨z, F(·)z⟩ is convex for all z ∈ R^ℓ, which, in particular, implies that the functions F_11(·) and F_22(·) are convex. For the vector z = (1, 1)^T one obtains that the function

    F_z(x) = ⟨z, F(x)z⟩ = F_11(x) + 2F_12(x) + F_22(x),  x ∈ R^d,

is convex as well. Therefore the function

    F_12(x) = F_21(x) = (1/2)F_z(x) − (1/2)(F_11(x) + F_22(x))

is DC, which completes the proof of the case ℓ = 2.

Inductive step. Suppose that the theorem is valid for some ℓ ∈ N. Let us prove it for ℓ + 1. The function F_z(·) = ⟨z, F(·)z⟩ is convex for all z ∈ R^{ℓ+1}. Putting z = (z_1, …, z_ℓ, 0)^T and z = (0, z_2, …, z_{ℓ+1})^T for any z_i ∈ R, i ∈ {1, …, ℓ+1}, one obtains that the matrix-valued functions

    G(x) = [ F_11(x) … F_1ℓ(x) ; ⋮ ⋱ ⋮ ; F_ℓ1(x) … F_ℓℓ(x) ],  H(x) = [ F_22(x) … F_2(ℓ+1)(x) ; ⋮ ⋱ ⋮ ; F_(ℓ+1)2(x) … F_(ℓ+1)(ℓ+1)(x) ]

are convex. Therefore by the induction hypothesis all functions F_ij, i, j ∈ {1, …, ℓ+1}, are DC, except for F_1(ℓ+1) (or, equivalently, F_(ℓ+1)1, since F(x) is by definition a symmetric matrix). For z = (1, …, 1)^T one gets that the function

    F_z(x) = Σ_{i,j=1}^{ℓ+1} F_ij(x),  x ∈ R^d,

is convex, which obviously implies that the function F_1(ℓ+1) is DC.

As simple corollaries to the previous theorem we obtain straightforward extensions of some well-known results for real-valued convex functions to the matrix-valued case.
Corollary 1. Let a function F: R^d → S^ℓ be convex. Then F is Lipschitz continuous on bounded sets, i.e. for any bounded set K ⊂ R^d there exists L > 0 such that ‖F(x_1) − F(x_2)‖_F ≤ L|x_1 − x_2| for all x_1, x_2 ∈ K.

Corollary 2 (Aleksandrov–Busemann–Feller theorem for matrix-valued functions). Let a function F: R^d → S^ℓ be convex. Then F is twice differentiable almost everywhere.

Remark 1. Note that the statement of Theorem 2 is obviously true for locally convex (i.e. convex in a neighbourhood of every point) matrix-valued functions defined on not necessarily convex sets. Therefore, the previous corollary remains true in this case as well. Namely, every locally convex function F: U → S^ℓ defined on an open set U ⊆ R^d is twice differentiable almost everywhere on U.

Since the difference of two real-valued DC functions is a DC function, Theorem 2 also allows one to point out a direct connection between DC and componentwise DC functions.
Corollary 3. Any DC function F: R^d → S^ℓ is componentwise DC.

Since the definition of a DC function provides a lot of flexibility (namely, there are infinitely many DC decompositions of a given function), it seems reasonable to assume that despite some drawbacks of matrix convexity the class of matrix-valued DC functions is sufficiently rich. In particular, one might ask whether the class of matrix-valued DC functions coincides with the class of componentwise DC functions or there are some componentwise DC functions that are not DC (a characterization of such functions would provide a deep insight into the structure of DC matrix-valued functions). Another interesting question is whether the matrix DC property is preserved under standard operations, such as the Hadamard/Kronecker product and inversion. Arguing in the same way as in the proof of Theorem 1 one can easily check that for twice continuously differentiable matrix-valued functions the answer to this question is positive, provided one considers locally DC functions.
However, it is unclear whether the classes of locally and globally DC functions coincide in the matrix-valued case (for componentwise DC functions this statement is obviously true due to the celebrated result of Hartman [21]).

At the end of this section, let us present several simple examples of DC semidefinite constraints appearing in applications and their DC decompositions. These examples, in particular, demonstrate some benefits of using matrix-valued DC functions in comparison with componentwise DC functions.

Example 3 (Quadratic/Bilinear Constraints). Suppose that

    F(x) = C + Σ_{i=1}^d x_i B_i + Σ_{i,j=1}^d x_i x_j A_ij   (1)

for some matrices C, B_i, A_ij ∈ S^ℓ. In particular, one can suppose that the function F(x) is bilinear/biaffine, that is,

    F(x, y) = A_00 + Σ_{i=1}^d x_i A_i0 + Σ_{j=1}^m y_j A_0j + Σ_{i=1}^d Σ_{j=1}^m x_i y_j A_ij  ∀x ∈ R^d, y ∈ R^m

for some matrices A_ij ∈ S^ℓ. Such nonlinear matrix constraints appear in problems of simultaneous stabilisation of single-input single-output linear systems by one fixed controller of a given order [22, 54], robust gain-scheduling and some decentralized control problems [19, 20], problems of maximizing the minimal eigenfrequency of a given structure [54], etc.

By Theorem 1 the function F of the form (1) is DC and for any µ ≥ ℓM, where

    M = 2 max_{s,k∈{1,…,ℓ}} ( Σ_{i,j=1}^d [A_ij]_{sk}² )^{1/2},

the pair

    G(x) = C + Σ_{i=1}^d x_i B_i + Σ_{i,j=1}^d x_i x_j A_ij + µ|x|² I_ℓ,  H(x) = µ|x|² I_ℓ

is a DC decomposition of F. Note that to compute a componentwise DC decomposition of the function F one would have to compute DC decompositions of ℓ² quadratic functions of the form

    Σ_{i,j=1}^d [A_ij]_{sk} x_i x_j,  s, k ∈ {1, …, ℓ}.

Moreover, in the general case the function H (the concave part) from a componentwise DC decomposition of F would not be diagonal.

It should be noted that a different DC decomposition of the function F can be constructed. Namely, as was shown in [4, Example 5.74], a matrix-valued function F of the form (1) is convex, if the ℓd × ℓd block matrix A = (A_ij)_{i,j=1}^d is positive semidefinite. Therefore, if a decomposition A = A_+ + A_− of the matrix A into positive semidefinite and negative semidefinite parts is known, one can define

    G(x) = C + Σ_{i=1}^d x_i B_i + Σ_{i,j=1}^d x_i x_j (A_+)_ij,  H(x) = −Σ_{i,j=1}^d x_i x_j (A_−)_ij.

Such a DC decomposition can be used if the block matrix A has a relatively simple structure, e.g. when only the diagonal blocks A_ii are nonzero.

Example 4 (Bilinear/Biaffine Matrix Constraints). Let

    R(X_1, X_2, X_3) = [ X_1   (A + B X_2 C) X_3 ;  X_3 (A + B X_2 C)^T   X_3 ] ⪯ 0,

where X_1, X_3 ∈ S^ℓ, X_2 ∈ R^{m×m}, and A ∈ R^{ℓ×ℓ}, B ∈ R^{ℓ×m}, and C ∈ R^{m×ℓ} are some given matrices. Nonlinear semidefinite constraints involving such functions R (or similar ones) appear, e.g. in optimal H_2/H_∞ static output feedback problems [54].

To apply the results presented in this section to the function R, define d = 0.5ℓ(ℓ+1) + m² + 0.5ℓ(ℓ+1) (here we used the fact that a matrix X ∈ S^ℓ is defined by ℓ(ℓ+1)/2 elements), for any x ∈ R^d let (X_1, X_2, X_3) be the corresponding triplet of matrices from S^ℓ × R^{m×m} × S^ℓ, and let F(x) = R(X_1, X_2, X_3). By Theorem 1 the function F is DC and for any µ ≥ 2ℓM, where

    M = max_{i∈{1,…,ℓ}} ( Σ_{k_1=1}^m Σ_{k_2=1}^m Σ_{k_3=1}^ℓ (B_{i k_1} C_{k_2 k_3})² )^{1/2},

the pair

    G(x) = F(x) + µ(‖X_2‖_F² + ‖X_3‖_F²) I_{2ℓ},  H(x) = µ(‖X_2‖_F² + ‖X_3‖_F²) I_{2ℓ}

is a DC decomposition of F.
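The following sketch makes the decomposition from Example 3 concrete for randomly generated data (an illustration of our own, not code from the paper): it assembles the quadratic map (1), bounds the Frobenius norms of the componentwise Hessians directly from the blocks A_ij, and forms the pair (G, H) from Theorem 1.

    import numpy as np

    # F(x) = C + sum_i x_i B_i + sum_{i,j} x_i x_j A_ij with random data.
    # The Hessian of component (s, k) is the d x d matrix with entries
    # [A_ij]_sk + [A_ji]_sk, so M below bounds its Frobenius norm.
    rng = np.random.default_rng(2)
    d, l = 3, 2
    sym = lambda S: 0.5 * (S + S.T)
    C = sym(rng.normal(size=(l, l)))
    B = [sym(rng.normal(size=(l, l))) for _ in range(d)]
    A = np.array([[sym(rng.normal(size=(l, l))) for _ in range(d)]
                  for _ in range(d)])          # shape (d, d, l, l)

    M = max(np.linalg.norm(A[:, :, s, k] + A[:, :, s, k].T, 'fro')
            for s in range(l) for k in range(l))
    mu = l * M

    def F(x):
        return C + sum(x[i] * B[i] for i in range(d)) + sum(
            x[i] * x[j] * A[i, j] for i in range(d) for j in range(d))

    def G(x):  # convex part; H(x) = mu*|x|^2*I is the concave part
        return F(x) + mu * (x @ x) * np.eye(l)

    # spot check of the matrix convexity of G along a random segment
    a, b, t = rng.normal(size=d), rng.normal(size=d), 0.5
    print(np.linalg.eigvalsh(t * G(a) + (1 - t) * G(b) - G(t * a + (1 - t) * b)).min())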
Example 5 (The Stiefel Manifold/Orthogonality Constraint). Let d = mℓ for some m ∈ N, i.e. x is a real matrix of order m × ℓ, which we denote by X. Consider the equality constraint

    X^T X = I_ℓ,   (2)

which is known as the Stiefel manifold or orthogonality constraint appearing in many applications [1, 14, 40, 41]. Following Lipp and Boyd [40], we rewrite equality constraint (2) as two matrix inequality constraints:

    G(X) = X^T X − I_ℓ ⪯ 0,  H(X) = I_ℓ − X^T X ⪯ 0.

Let, as above, G_z(X) = ⟨z, G(X)z⟩. Observe that for any X_1, X_2 ∈ R^{m×ℓ} and α ∈ [0, 1] one has

    αG_z(X_1) + (1 − α)G_z(X_2) − G_z(αX_1 + (1 − α)X_2)
    = (α − α²)⟨z, X_1^T X_1 z⟩ + ((1 − α) − (1 − α)²)⟨z, X_2^T X_2 z⟩ − α(1 − α)⟨z, (X_1^T X_2 + X_2^T X_1)z⟩
    = α(1 − α)( |X_1 z|² + |X_2 z|² − 2⟨X_1 z, X_2 z⟩ ) = α(1 − α)|X_1 z − X_2 z|² ≥ 0.

Consequently, the function G_z is convex for any z ∈ R^ℓ, which implies that the functions G and −H are matrix convex. Thus, equality constraint (2) can be rewritten as two DC semidefinite constraints. It should be noted that although this transformation is degenerate (we rewrite an equality constraint as two inequality constraints), numerical experiments reported in [40] demonstrate the effectiveness of an optimization method based on such a transformation.
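A two-line numerical confirmation of the identity derived in Example 5 (toy dimensions chosen here purely for illustration):

    import numpy as np

    # For G(X) = X^T X - I and G_z = <z, G(.)z>:
    #   a*G_z(X1) + (1-a)*G_z(X2) - G_z(a*X1 + (1-a)*X2) = a*(1-a)*|X1@z - X2@z|^2.
    rng = np.random.default_rng(3)
    m, l = 4, 3
    X1, X2 = rng.normal(size=(m, l)), rng.normal(size=(m, l))
    z, a = rng.normal(size=l), 0.3

    Gz = lambda X: z @ (X.T @ X - np.eye(l)) @ z
    lhs = a * Gz(X1) + (1 - a) * Gz(X2) - Gz(a * X1 + (1 - a) * X2)
    rhs = a * (1 - a) * np.linalg.norm(X1 @ z - X2 @ z) ** 2
    print(np.isclose(lhs, rhs))  # True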
3 DC Structure of the Maximal Eigenvalue Function

Since there is no obvious connection between componentwise convexity and the Löwner partial order/matrix convexity, componentwise DC matrix-valued functions cannot be utilised directly in the abstract setting of nonlinear semidefinite programming problems. Instead, it is natural to apply the componentwise DC property to a reformulation of such problems in which the semidefinite constraint F(x) ⪯ 0 is replaced by the equivalent inequality constraint λ_max(F(x)) ≤ 0, where λ_max(A) is the maximal eigenvalue of a symmetric matrix A.

Our aim is to show that for componentwise DC functions F the inequality constraint λ_max(F(x)) ≤ 0 is a DC constraint by explicitly computing a DC decomposition of the function λ_max(F(·)), if a componentwise DC decomposition of the function F is known. With the use of this result one can easily extend standard results and algorithms from the theory of DC constrained DC optimization problems to the case of DC semidefinite optimization problems.
Theorem 3. Let F: R^d → S^ℓ be a componentwise DC function and F_ij = G_ij − H_ij be a DC decomposition of each component of F, i, j ∈ {1, …, ℓ}. Then the function λ_max(F(·)) is DC and the pair (g, h) with

    g(x) = max_{|v|≤1} Σ_{i,j=1}^ℓ ( (v_i v_j + 1)G_ij(x) + (1 − v_i v_j)H_ij(x) ),
    h(x) = Σ_{i,j=1}^ℓ ( G_ij(x) + H_ij(x) )   (3)

for all x ∈ R^d is a DC decomposition of the function λ_max(F(·)).

Proof. Fix any x ∈ R^d. As is well-known and easy to check, the following equality holds true:

    λ_max(F(x)) = max_{|v|≤1} ⟨v, F(x)v⟩ = max_{|v|≤1} Σ_{i,j=1}^ℓ v_i v_j F_ij(x).

Adding and subtracting G_ij(x) + H_ij(x) for all i, j ∈ {1, …, ℓ} and taking into account the equality F_ij(x) = G_ij(x) − H_ij(x) one obtains that

    λ_max(F(x)) = max_{|v|≤1} Σ_{i,j=1}^ℓ ( (v_i v_j + 1)G_ij(x) + (1 − v_i v_j)H_ij(x) ) − Σ_{i,j=1}^ℓ ( G_ij(x) + H_ij(x) ) =: g(x) − h(x).

The function h is obviously convex as the sum of convex functions. Moreover, note that v_i v_j + 1 ≥ 0 and 1 − v_i v_j ≥ 0, provided |v| ≤ 1. Therefore, the function g is also convex as the maximum of the family of convex functions

    Σ_{i,j=1}^ℓ ( (v_i v_j + 1)G_ij(x) + (1 − v_i v_j)H_ij(x) ),  |v| ≤ 1.

Thus, the function λ_max(F(·)) is DC and the pair (g, h) defined in (3) is a DC decomposition of this function.

Remark 2. Let us make an almost trivial, yet useful observation. By definition g(x) = λ_max(F(x)) + h(x). Therefore, there is no need to directly compute the maximum in the definition of g in order to compute g(x). One simply has to find the maximal eigenvalue of the matrix F(x) and then add h(x).
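The next sketch evaluates the DC decomposition (3) on toy componentwise DC data (our own choice of G_ij and H_ij), using the shortcut from Remark 2: g(x) is computed as λ_max(F(x)) + h(x) rather than by the inner maximization.

    import numpy as np

    # Componentwise DC data: F_ij = G_ij - H_ij with
    #   G_ij(x) = c_ij * |x|^2,  H_ij(x) = e_ij * |x|^2,  c, e symmetric >= 0.
    rng = np.random.default_rng(4)
    l, d = 3, 2
    c = np.abs(rng.normal(size=(l, l))); c = 0.5 * (c + c.T)
    e = np.abs(rng.normal(size=(l, l))); e = 0.5 * (e + e.T)

    def F(x):
        return (c - e) * (x @ x)

    def h(x):  # h = sum of G_ij + H_ij, convex
        return float((c + e).sum() * (x @ x))

    def g(x):  # g = lambda_max(F) + h, convex by Theorem 3 (Remark 2)
        return float(np.linalg.eigvalsh(F(x)).max() + h(x))

    x = rng.normal(size=d)
    print(g(x) - h(x), np.linalg.eigvalsh(F(x)).max())  # equal by construction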
For the sake of completeness, let us point out explicit formulae for the subdifferentials of the convex functions g and h from the theorem above. To this end, for any matrix A ∈ S^ℓ denote by E_max(A) the eigenspace of λ_max(A), i.e. the union of zero and all eigenvectors of A corresponding to its maximal eigenvalue.

Proposition 1. Let F: R^d → S^ℓ be a componentwise DC function, F_ij = G_ij − H_ij be a DC decomposition of each component of F, i, j ∈ {1, …, ℓ}, and the functions g and h be defined as in (3). Then for any x ∈ R^d one has

    ∂g(x) = co{ Σ_{i,j=1}^ℓ ( (v_i v_j + 1)∂G_ij(x) + (1 − v_i v_j)∂H_ij(x) ) | v ∈ E_max(F(x)), |v| = 1 }

and ∂h(x) = Σ_{i,j=1}^ℓ ( ∂G_ij(x) + ∂H_ij(x) ), where 'co' stands for the convex hull.

Proof. The expression for ∂h(x) follows directly from the standard rules of subdifferential calculus. Let us prove the equality for ∂g(x).

Indeed, fix any x ∈ R^d and denote by V(x) the set of all those v ∈ R^ℓ with |v| ≤ 1 at which the maximum in the definition of g(x) is attained. Clearly, V(x) is a compact set. With the use of the theorem on the subdifferential of the supremum of an infinite family of convex functions (see, e.g. [27, Theorem 4.2.3]) one obtains that

    ∂g(x) = co{ Σ_{i,j=1}^ℓ ( (v_i v_j + 1)∂G_ij(x) + (1 − v_i v_j)∂H_ij(x) ) | v ∈ V(x) }.

Note that this convex hull is closed as the convex hull of a compact set. Therefore, it remains to show that v ∈ V(x) iff v ∈ E_max(F(x)) and |v| = 1.

Observe that for any v ∈ R^ℓ one has

    Σ_{i,j=1}^ℓ ( (v_i v_j + 1)G_ij(x) + (1 − v_i v_j)H_ij(x) ) = Σ_{i,j=1}^ℓ v_i v_j ( G_ij(x) − H_ij(x) ) + h(x) = ⟨v, F(x)v⟩ + h(x).

Therefore, the maximum over all v ∈ R^ℓ with |v| ≤ 1 in the definition of g(x) is attained at exactly the same v as the maximum of ⟨v, F(x)v⟩ over all v ∈ R^ℓ with |v| ≤ 1 (which is equal to λ_max(F(x))). Consequently, one has

    V(x) = { v ∈ R^ℓ | |v| ≤ 1, ⟨v, F(x)v⟩ = λ_max(F(x)) }.

With the use of the spectral decomposition of the matrix F(x) one can easily verify that λ_max(F(x)) = ⟨v, F(x)v⟩ for some |v| ≤ 1 iff |v| = 1 and v is an eigenvector of the matrix F(x) corresponding to its maximal eigenvalue (i.e. v ∈ E_max(F(x))), which implies the required result.

Thus, if an eigenvector v with |v| = 1 of the matrix F(x) corresponding to the maximal eigenvalue λ_max(F(x)) is computed, one can easily compute subgradients of the DC components of the function λ_max(F(·)) at the point x with the use of subgradients of the functions G_ij and H_ij.

Remark 3. Let us note once again that one can rewrite the nonlinear semidefinite programming problem

    minimize f(x) subject to F(x) ⪯ 0, x ∈ A,

where A is a convex set, as the following equivalent inequality constrained problem:

    minimize f(x) subject to λ_max(F(x)) ≤ 0, x ∈ A.   (4)

In the case when the function f is DC and the function F is componentwise DC, one can easily extend all existing results and algorithms for inequality constrained DC optimization problems to the case of problem (4) with the use of Theorem 3 and Proposition 1. For the sake of shortness, we leave the tedious task of explicitly reformulating existing results and algorithms in terms of problem (4) to the interested reader.

4 Cone Constrained DC Optimization

In the previous section we pointed out how methods and results of DC optimization can be applied to nonlinear semidefinite optimization problems with componentwise DC constraints. Let us now show how one can extend standard results from DC optimization to the case when the semidefinite constraint is DC in the order-theoretic sense. Since such an extension does not rely on any particular properties of semidefinite problems (i.e. any properties of matrix-valued functions, the Löwner partial order, etc.) or the finite dimensional nature of the problem, following Lipp and Boyd [40] we study optimality conditions and minimization methods for DC semidefinite programming problems in the more general setting of DC cone constrained problems of the form

    minimize f(x) = g(x) − h(x) subject to F(x) = G(x) − H(x) ⪯_K 0, x ∈ A.   (P)

Here g, h are real-valued closed convex functions defined on R^d, K is a proper cone in a real Banach space Y (i.e. K is a closed convex cone such that K ∩ (−K) = {0}), ⪯_K is the partial order induced by the cone K, i.e. x ⪯_K y iff y − x ∈ K, the functions G, H: R^d → Y are convex with respect to the cone K (or K-convex), that is,

    G(αx_1 + (1 − α)x_2) ⪯_K αG(x_1) + (1 − α)G(x_2)  ∀α ∈ [0, 1], x_1, x_2 ∈ R^d,

and the same inequality holds for H, and, finally, A ⊆ R^d is a closed convex set. Note that the constraint F(x) ⪯_K 0 can be rewritten as F(x) ∈ −K.

Thus, the problem (P) is a cone constrained DC optimization problem that consists in minimizing the DC objective function f subject to the generalized inequality (or cone) constraint that is DC with respect to the cone K. In the case when Y = S^ℓ and K is the cone of positive semidefinite matrices, the problem (P) becomes a standard nonlinear semidefinite programming problem.

4.1 Some Properties of Convex Mappings

Before we proceed to the study of cone constrained DC optimization problems, let us first present two well-known auxiliary results on convex mappings and convex multifunctions, whose formulations are tailored to our specific setting. For the sake of completeness, we provide detailed proofs of these results. We start with the following well-known characterisation of K-convex functions in terms of their derivatives.
Lemma 1. A Gâteaux differentiable function Φ: R^d → Y is K-convex iff

    Φ(x_1) − Φ(x_2) ⪰_K DΦ(x_2)(x_1 − x_2)  ∀x_1, x_2 ∈ R^d,   (5)

where DΦ(x) is the Gâteaux derivative of Φ at x.

Proof. Let Φ be convex. Then by definition

    αΦ(x_1) + (1 − α)Φ(x_2) − Φ(αx_1 + (1 − α)x_2) ∈ K  ∀α ∈ [0, 1], x_1, x_2 ∈ R^d.

Since K is a cone, for any α ∈ (0, 1] one has

    Φ(x_1) − Φ(x_2) − (1/α)( Φ(x_2 + α(x_1 − x_2)) − Φ(x_2) ) ∈ K.

Passing to the limit as α → +0 and taking into account the fact that the cone K is closed one obtains that

    Φ(x_1) − Φ(x_2) − DΦ(x_2)(x_1 − x_2) ∈ K  ∀x_1, x_2 ∈ R^d

or, equivalently, condition (5) holds true.

Conversely, if condition (5) holds true, then for all x_1, x_2 ∈ R^d and for any α ∈ [0, 1] one has

    Φ(x_1) − Φ(x(α)) − (1 − α)DΦ(x(α))(x_1 − x_2) ∈ K,  Φ(x_2) − Φ(x(α)) + αDΦ(x(α))(x_1 − x_2) ∈ K,

where x(α) = αx_1 + (1 − α)x_2. Multiplying the first expression by α and the second expression by 1 − α, and bearing in mind the fact that a convex cone is closed under addition one obtains that

    αΦ(x_1) + (1 − α)Φ(x_2) − Φ(x(α)) ∈ K  ∀α ∈ [0, 1], x_1, x_2 ∈ R^d,

that is, Φ is K-convex.
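As a concrete instance of Lemma 1, take Y = S^ℓ, K the positive semidefinite cone, and the K-convex map Φ(X) = XᵀX from Example 5; its derivative is DΦ(X)[V] = XᵀV + VᵀX, and inequality (5) reduces to the statement that (X_1 − X_2)ᵀ(X_1 − X_2) is positive semidefinite, as the sketch below confirms.

    import numpy as np

    # Lemma 1 for the PSD cone and Phi(X) = X^T X:
    # Phi(X1) - Phi(X2) - D Phi(X2)[X1 - X2] = (X1 - X2)^T (X1 - X2) >= 0.
    rng = np.random.default_rng(5)
    m, l = 4, 3
    X1, X2 = rng.normal(size=(m, l)), rng.normal(size=(m, l))

    Phi = lambda X: X.T @ X
    V = X1 - X2
    gap = Phi(X1) - Phi(X2) - (X2.T @ V + V.T @ X2)
    print(np.linalg.eigvalsh(gap).min() >= -1e-12)  # True: gap is PSD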
Let us also present a lemma on solutions of perturbed convex generalized equations, based on some well-known results on metric regularity of convex multifunctions (see, e.g. [51]). For any metric space (X, d) and all x ∈ X denote B(x, r) = {x′ ∈ X | d(x′, x) ≤ r}. If X is a normed space, then B_X = B(0, 1).

Lemma 2. Let X be a real Banach space, Z be a metric space, and M_z: X ⇉ Y, z ∈ Z, be a family of closed convex multifunctions. Suppose that for some z* ∈ Z and x_0 ∈ X one has 0 ∈ int M_{z*}(X) and 0 ∈ M_{z*}(x_0). Suppose also that the function z ↦ dist(0, M_z(x_0)) is continuous at z* and for any ε > 0 there exists δ > 0 such that

    M_{z*}(x_0 + B_X) ⊆ M_z(x_0 + B_X) + εB_Y  ∀z ∈ B(z*, δ).

Then there exist a neighbourhood U of z* and a mapping ξ: U → X such that 0 ∈ M_z(ξ(z)) for all z ∈ U, ξ(z*) = x_0, and ξ(z) → x_0 as z → z*.

Proof. Since 0 ∈ int M_{z*}(X) and 0 ∈ M_{z*}(x_0), by [51, Thrm. 1] there exists η > 0 such that ηB_Y ⊆ M_{z*}(x_0 + B_X). By our assumption there exists δ > 0 such that

    ηB_Y ⊆ M_{z*}(x_0 + B_X) ⊆ M_z(x_0 + B_X) + (η/2)B_Y  ∀z ∈ B(z*, δ),

which with the use of [51, Lemma 2] implies that

    (η/2)B_Y ⊆ M_z(x_0 + B_X)  ∀z ∈ B(z*, δ).

Therefore by [51, Thrm. 2] for all x ∈ X and z ∈ B(z*, δ) one has

    dist(x, M_z^{−1}(0)) ≤ (2/η)(1 + ‖x − x_0‖) dist(0, M_z(x)).

Putting x = x_0 one obtains that for any z ∈ B(z*, δ) there exists ξ(z) ∈ M_z^{−1}(0) such that ‖x_0 − ξ(z)‖ ≤ (4/η) dist(0, M_z(x_0)). Note that one can put ξ(z*) = x_0, since 0 ∈ M_{z*}(x_0). Moreover, from the fact that the function z ↦ dist(0, M_z(x_0)) is continuous at z* it follows that ξ(z) → x_0 as z → z*, which completes the proof.

Remark 4. Roughly speaking, the previous lemma states that if 0 ∈ int M_{z*}(X) and 0 ∈ M_{z*}(x_0), then under certain semicontinuity assumptions for any z in a neighbourhood of z* there exists a solution ξ(z) of the generalized equation 0 ∈ M_z(x), continuously depending on z and such that ξ(z*) = x_0.
Corollary 4. Let X be a Banach space, A ⊆ X be a closed convex set, K ⊆ Y be a proper cone, and Φ, Ψ: X → Y be K-convex functions. Suppose that Φ is continuous on A, Ψ is continuously Fréchet differentiable on A, and the following constraint qualification holds true:

    0 ∈ int{ Φ(x) − Ψ(x*) − DΨ(x*)(x − x*) + K | x ∈ A }   (6)

for some x* ∈ A such that Φ(x*) − Ψ(x*) ⪯_K 0. Then for any x_0 ∈ A such that Φ(x_0) − Ψ(x*) − DΨ(x*)(x_0 − x*) ⪯_K 0 there exist a neighbourhood U of x* and a mapping ξ: U ∩ A → A such that

    Φ(ξ(z)) − Ψ(z) − DΨ(z)(ξ(z) − z) ⪯_K 0  ∀z ∈ U ∩ A,

ξ(x*) = x_0, and ξ(z) → x_0 as z → x*.

Proof. For any z ∈ X introduce the convex function Φ_z: X → Y defined as Φ_z(x) = Φ(x) − Ψ(z) − DΨ(z)(x − z) and the set-valued mapping

    M_z(x) = Φ_z(x) + K, if x ∈ A;  M_z(x) = ∅, if x ∉ A.   (7)

The multifunction M_z is closed due to the facts that the function Φ_z(·) is continuous and the sets A and K are closed. Moreover, this multifunction is convex. Indeed, by the convexity of Φ for any x_1, x_2 ∈ A and all α ∈ [0, 1] one has

    αΦ_z(x_1) + (1 − α)Φ_z(x_2) ∈ Φ_z(αx_1 + (1 − α)x_2) + K,

which by the convexity of the cone K implies that

    αM_z(x_1) + (1 − α)M_z(x_2) ⊆ Φ_z(αx_1 + (1 − α)x_2) + K + αK + (1 − α)K ⊆ M_z(αx_1 + (1 − α)x_2)

for all x_1, x_2 ∈ A and α ∈ [0, 1], i.e. the multifunction M_z is convex.

Our aim is to apply Lemma 2 with Z = A and z* = x*. Indeed, by definition 0 ∈ M_{z*}(x_0), while condition (6) implies that 0 ∈ int M_{z*}(X).

From the fact that Ψ is continuously Fréchet differentiable on A it follows that for any ε > 0 there exists δ ∈ (0, min{1, ε/(3(‖DΨ(z*)‖ + 1))}) such that

    ‖Ψ(z) − Ψ(z*)‖ < ε/3,  ‖DΨ(z) − DΨ(z*)‖ < ε/(3(‖x_0‖ + ‖z*‖ + 2))

for all z ∈ B(z*, δ) ∩ A. Choose any y ∈ M_{z*}(x_0 + B_X). By definition there exist x ∈ (x_0 + B_X) ∩ A and w ∈ K such that y = Φ_{z*}(x) + w. Observe that

    ‖Φ_z(x) + w − y‖ = ‖Φ_z(x) − Φ_{z*}(x)‖ ≤ ‖Ψ(z) − Ψ(z*)‖ + ‖DΨ(z) − DΨ(z*)‖‖x − z‖ + ‖DΨ(z*)‖‖z − z*‖ < ε

for all z ∈ B(z*, δ) ∩ A, which implies that

    M_{z*}(x_0 + B_X) ⊆ M_z(x_0 + B_X) + εB_Y  ∀z ∈ B(z*, δ) ∩ A.

Thus, it remains to show that the restriction of the function z ↦ dist(0, M_z(x_0)) to A is continuous. By definition dist(0, M_z(x_0)) = dist(Φ_z(x_0), −K) (see (7)). With the use of the fact that Ψ is continuously Fréchet differentiable one obtains that for any ε > 0 there exists r ∈ (0, min{1, ε/(3(‖DΨ(z*)‖ + 1))}) such that

    ‖Ψ(z) − Ψ(z*)‖ < ε/3,  ‖DΨ(z) − DΨ(z*)‖ < ε/(3(‖x_0‖ + ‖z*‖ + 1))

for all z ∈ B(z*, r) ∩ A. Therefore for any such z one has

    ‖Φ_z(x_0) − Φ_{z*}(x_0)‖ ≤ ‖Ψ(z) − Ψ(z*)‖ + ‖DΨ(z) − DΨ(z*)‖‖x_0 − z‖ + ‖DΨ(z*)‖‖z − z*‖ < ε,

which implies that for any z ∈ B(x*, r) ∩ A the following inequality holds true:

    dist(0, M_z(x_0)) = dist(Φ_z(x_0), −K) ≤ ‖Φ_z(x_0) − Φ_{x*}(x_0)‖ < ε

(here we used the fact that Φ_{z*}(x_0) ∈ −K). Thus, all assumptions of Lemma 2 with Z = A and z* = x* are valid, and by this lemma there exists a required mapping ξ(z).

4.2 Local Optimality Conditions

Let us extend well-known local optimality conditions for constrained DC optimization problems to the case of the problem (P). To the best of the author's knowledge, standard subdifferential calculus cannot be extended to the case of convex matrix-valued functions and many other K-convex vector-valued functions, which makes it very difficult to deal with subdifferentials of such functions. Therefore, below we suppose that the function H (the K-concave part of F) is continuously differentiable, but do not impose any smoothness assumptions on the objective function f.

Theorem 4. Let x* be a locally optimal solution of the problem (P) and the function H be Fréchet differentiable at x*. Then for any v ∈ ∂h(x*) the point x* is a globally optimal solution of the following convex programming problem:

    minimize g(x) − h(x*) − ⟨v, x − x*⟩ subject to G(x) − H(x*) − DH(x*)(x − x*) ⪯_K 0, x ∈ A,   (8)

where DH(x*) is the Fréchet derivative of H at x*.

Proof. Denote by ω_v(x) = g(x) − h(x*) − ⟨v, x − x*⟩, x ∈ R^d, the objective function of problem (8). This function is convex. Moreover, taking into account the fact that by the definition of subgradient h(x) ≥ h(x*) + ⟨v, x − x*⟩, one obtains that ω_v(x) ≥ f(x) for all x ∈ R^d and ω_v(x*) = f(x*).

Arguing by reductio ad absurdum, suppose that there exists v ∈ ∂h(x*) such that the point x* is not a globally optimal solution of problem (8), i.e. there exists a feasible point x_0 of this problem such that ω_v(x_0) < ω_v(x*). Define x(α) = αx_0 + (1 − α)x*.
Then

    f(x(α)) ≤ ω_v(x(α)) ≤ αω_v(x_0) + (1 − α)ω_v(x*) < ω_v(x*) = f(x*)   (9)

for all α ∈ (0, 1] by the convexity of ω_v.

Let us check that x(α) is a feasible point of the problem (P) for all α ∈ [0, 1]. Then, by inequality (9), x* is not a locally optimal solution of the problem (P), which contradicts the assumption of the theorem.

Indeed, by Lemma 1 one has

    H(x(α)) − H(x*) − DH(x*)(x(α) − x*) ∈ K

for all α ∈ [0, 1]. Adding and subtracting G(x(α)) one obtains that

    −F(x(α)) + G(x(α)) − H(x*) − DH(x*)(x(α) − x*) ∈ K  ∀α ∈ [0, 1]

or, equivalently,

    F(x(α)) ⪯_K G(x(α)) − H(x*) − DH(x*)(x(α) − x*)  ∀α ∈ [0, 1].

Hence taking into account the fact that the point x(α) is feasible for problem (8) due to the convexity of this problem one can conclude that F(x(α)) ⪯_K 0 for all α ∈ [0, 1], i.e. x(α) is a feasible point of the problem (P), and the proof is complete.

Let us reformulate the optimality conditions from the previous theorem. Denote by Ω(x*) the feasible region of problem (8), and for any convex set V ⊆ R^d and x ∈ V denote by N_V(x) = {v ∈ R^d | ⟨v, z − x⟩ ≤ 0 ∀z ∈ V} the normal cone to V at x.
Corollary 5. Let x* be a locally optimal solution of the problem (P) and the function H be differentiable at x*. Then

    ∂h(x*) ⊆ ∂g(x*) + N_{Ω(x*)}(x*).

Proof. Fix any v ∈ ∂h(x*). By Theorem 4 the point x* is a globally optimal solution of the convex problem (8). Applying standard necessary and sufficient optimality conditions for a convex function on a convex set (see, e.g. [27, Theorem 1.1.2′]) one obtains that 0 ∈ ∂ω_v(x*) + N_{Ω(x*)}(x*), where, as above, ω_v(x) = g(x) − h(x*) − ⟨v, x − x*⟩ is the objective function of problem (8). Since ∂ω_v(x*) = ∂g(x*) − v, one gets that v ∈ ∂g(x*) + N_{Ω(x*)}(x*), which implies the desired result.

In the case when a natural constraint qualification (namely, Slater's condition for problem (8)) holds at x*, one can show that the optimality conditions from Theorem 4 coincide with standard optimality conditions for cone constrained optimization problems (see, e.g. [4]). To this end, denote by Y* the topological dual space of Y and by ⟨·,·⟩ the canonical duality pairing between Y and Y*, that is, ⟨y*, y⟩ = y*(y) for any y* ∈ Y* and y ∈ Y. Let K* = {y* ∈ Y* | ⟨y*, y⟩ ≥ 0 ∀y ∈ K} be the dual cone of K and for any λ ∈ Y* define L(x, λ) = f(x) + ⟨λ, F(x)⟩.
Corollary 6. Let x* be a locally optimal solution of the problem (P) and the functions G and H be Fréchet differentiable at x*. Suppose also that the following constraint qualification holds true:

    0 ∈ int{ G(x) − H(x*) − DH(x*)(x − x*) + K | x ∈ A }

(if K has nonempty interior, it is sufficient to suppose that there exists x ∈ A such that G(x) − H(x*) − DH(x*)(x − x*) ∈ −int K). Then for any v ∈ ∂h(x*) there exists a multiplier λ* ∈ K* such that ⟨λ*, F(x*)⟩ = 0 and

    v ∈ ∂g(x*) + D(⟨λ*, F(·)⟩)(x*) + N_A(x*).

In particular, if both g and h are differentiable at x*, then there exists λ* ∈ K* such that ⟨λ*, F(x*)⟩ = 0 and ⟨D_x L(x*, λ*), x − x*⟩ ≥ 0 for all x ∈ A.

Proof. Rewriting problem (8) as the convex cone constrained problem

    minimize g(x) − h(x*) − ⟨v, x − x*⟩ subject to G(x) − H(x*) − DH(x*)(x − x*) ∈ −K, x ∈ A,

and applying standard necessary and sufficient optimality conditions for convex cone constrained optimization problems (see, e.g. [4, Theorem 3.6 and Proposition 2.106]) we arrive at the required result.

Remark 5. In the case of semidefinite programs, i.e. when Y = S^ℓ and K is the cone of positive semidefinite matrices, the dual cone K* coincides with K (if we identify the dual of S^ℓ with the space S^ℓ itself), and thus the multiplier λ* from the previous corollary is a positive semidefinite matrix. In addition, the constraint qualification from the corollary takes the form: there exists x ∈ A such that the matrix G(x) − H(x*) − DH(x*)(x − x*) is negative definite.

4.3 The Convex-Concave Procedure

The optimality conditions from the previous section can be applied to a convergence analysis of the so-called convex-concave procedure (CCP) for cone constrained DC optimization problems proposed in [40], which can be viewed as an extension of the renowned DC Algorithms [35, 37, 47] to the case of cone constrained problems. A general scheme (algorithmic pattern) of this method for the problem (P) is given in Algorithm 1. Let us note that the convex subproblem on Step 3 of Algorithm 1 can be solved with the use of interior point methods (see, e.g. [5, Sect. 11.6]), augmented Lagrangian methods [32, 54], etc.

Algorithm 1: DC Algorithm/The Convex-Concave Procedure (CCP).
Step 1. Choose a feasible initial point x_0 and set n := 0.

Step 2. Compute v_n ∈ ∂h(x_n) and DH(x_n).

Step 3.
Set the value of x_{n+1} to a solution of the convex problem

    minimize g(x) − ⟨v_n, x − x_n⟩ subject to G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K 0, x ∈ A.

If x_{n+1} = x_n, Stop. Otherwise, put n := n + 1 and go to Step 2.

Our aim is to prove a convergence theorem for Algorithm 1. Clearly, in the nonsmooth case (more precisely, when h is nonsmooth) one cannot expect a sequence {x_n} generated by this algorithm to converge to a point satisfying the optimality conditions from Corollary 5. Furthermore, these optimality conditions are often too restrictive for applications, since they require the knowledge of the entire subdifferential ∂h(x*), which might make verification of these conditions too computationally expensive or even impossible. That is why one usually establishes a convergence of DC optimization methods to so-called critical points [37, 67]. Recall that a point x* is said to be critical for the problem (P), if the following condition holds true:

    ∂h(x*) ∩ ( ∂g(x*) + N_{Ω(x*)}(x*) ) ≠ ∅.

Note that this condition is satisfied iff there exists v ∈ ∂h(x*) such that x* is a globally optimal solution of convex problem (8) (cf. Theorem 4 and Corollary 5). Hence, in particular, if a point x_n on Step 3 of Algorithm 1 is not critical for the problem (P), then x_n is not a solution of the corresponding convex subproblem. In other words, if Algorithm 1 terminates on step n ∈ N, then x_n is a critical point for the problem (P).
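To make Algorithm 1 concrete, here is a minimal cvxpy sketch on a toy DC semidefinite program of our own construction (it assumes cvxpy with the bundled SCS solver and is an illustration of the scheme, not code from [40]): the constraint G(x) − H(x) ⪯ 0 with affine G(x) = A_0 + Σ x_i A_i and H(x) = |x|²I is handled by linearizing H at the current iterate, so each Step 3 subproblem is a linear matrix inequality.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(6)
    l, d = 3, 2
    sym = lambda S: 0.5 * (S + S.T)
    A0 = -np.eye(l)                  # F(0) = A0 <= 0, so x0 = 0 is feasible
    A = [sym(rng.normal(size=(l, l))) for _ in range(d)]
    p = np.array([2.0, -1.0])        # objective f(x) = |x - p|^2 (g convex, h = 0)

    xn = np.zeros(d)
    for it in range(30):
        x = cp.Variable(d)
        # linearization of H: H(xn) + DH(xn)(x - xn) = (|xn|^2 + 2 xn.(x - xn)) * I
        lin = xn @ xn + 2 * xn @ (x - xn)
        lmi = A0 + sum(x[i] * A[i] for i in range(d)) - lin * np.eye(l) << 0
        prob = cp.Problem(cp.Minimize(cp.sum_squares(x - p)), [lmi])
        prob.solve(solver=cp.SCS)
        if np.linalg.norm(x.value - xn) < 1e-6:
            break
        xn = x.value
    print(xn, prob.value)            # f(x_n) decreases monotonically (Theorem 5)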
The proof of the following theorem was largely inspired by the convergence analysis of an algorithmic pattern for inequality constrained DC optimization problems from [67, Section 3.1]. However, let us note that we prove the global convergence of Algorithm 1 to a critical point under assumptions that are different from the ones used in [67].

Theorem 5. Let the function f be bounded below on the feasible region of the problem (P), G be continuous on A, H be continuously Fréchet differentiable on A, and a sequence {x_n} be generated by Algorithm 1. Then the following statements hold true:

1. the feasible region Ω(x_n) of the convex subproblem on Step 3 of the algorithm is nonempty for all n ∈ N ∪ {0}, and the sequence {x_n} is feasible for the problem (P);

2. for any n ∈ N ∪ {0} either x_n is a critical point of the problem (P) and the process terminates at step n, or f(x_{n+1}) < f(x_n); moreover, if the algorithm does not terminate, then the sequence {f(x_n)} converges;

3. if the function h is strongly convex with constant µ > 0, then

    f(x_{n+1}) ≤ f(x_n) − (µ/2)|x_{n+1} − x_n|²   (10)

for all n ∈ N ∪ {0};

4. if x* is a limit point of the sequence {x_n} such that

    0 ∈ int{ G(x) − H(x*) − DH(x*)(x − x*) + K | x ∈ A }

(that is, Slater's condition holds for problem (8)), then x* is a critical point for the problem (P).

Proof.
1. Let us prove this statement by induction in n. By our assumption x_0 is feasible for the problem (P), which implies that x_0 ∈ Ω(x_0), that is, the feasible region Ω(x_0) of the convex subproblem is nonempty.

Inductive step. Suppose that for some n ∈ N the point x_n is feasible for the problem (P) and Ω(x_n) is nonempty. Let us prove that x_{n+1} is feasible for the problem (P). Then x_{n+1} ∈ Ω(x_{n+1}), i.e. Ω(x_{n+1}) ≠ ∅, and the proof of the first statement is complete.

Indeed, by definition the point x_{n+1} is a globally optimal solution of the convex subproblem on Step 3 of the algorithm, which implies that

    G(x_{n+1}) − H(x_n) − DH(x_n)(x_{n+1} − x_n) ⪯_K 0,  x_{n+1} ∈ A.

By Lemma 1 one has

    −H(x_{n+1}) ⪯_K −H(x_n) − DH(x_n)(x_{n+1} − x_n).

Therefore F(x_{n+1}) = G(x_{n+1}) − H(x_{n+1}) ⪯_K 0, i.e. the point x_{n+1} is feasible for the problem (P).

2. If a point x_n is not critical, then, as was noted above, x_n is not a solution of the convex subproblem on Step 3 of Algorithm 1, which implies that

    g(x_{n+1}) − ⟨v_n, x_{n+1} − x_n⟩ < g(x_n).

Subtracting h(x_n) from both sides of this inequality and applying the definition of subgradient one obtains that f(x_{n+1}) < f(x_n). Hence bearing in mind the facts that the sequence {x_n} is feasible and f is bounded below on the feasible region one gets that the sequence {f(x_n)} converges.

3. Fix any n ∈ N. Due to the strong convexity of h one has

    h(x_{n+1}) − h(x_n) ≥ ⟨v_n, x_{n+1} − x_n⟩ + (µ/2)|x_{n+1} − x_n|².

Furthermore, by the definition of x_{n+1} one has

    g(x_n) ≥ g(x_{n+1}) − ⟨v_n, x_{n+1} − x_n⟩.

Summing up these two inequalities one obtains that (10) holds true.

4. By our assumption there exists a subsequence {x_{n_k}} converging to x*. The corresponding sequence {v_{n_k}} of subgradients of the function h is bounded, since the subdifferential mapping of a finite convex function is locally bounded (see, e.g. [52, Crlr. 24.5.1]). Therefore, replacing, if necessary, the sequence {x_{n_k}} with its subsequence, one can suppose that the sequence of subgradients {v_{n_k}} converges to some vector v* belonging to ∂h(x*) due to the fact that the graph of the subdifferential is closed (see, e.g. [52, Thrm. 24.4]).

Arguing by reductio ad absurdum, suppose that x* is not a critical point of the problem (P). Then, in particular,

    v* ∉ ∂g(x*) + N_{Ω(x*)}(x*),

which implies that x* is not a globally optimal solution of the convex problem

    minimize g(x) − ⟨v*, x − x*⟩ subject to G(x) − H(x*) − DH(x*)(x − x*) ⪯_K 0, x ∈ A   (11)

(see the proof of Corollary 5). Consequently, there exist a feasible point x_0 of this problem and θ > 0 such that g(x_0) − ⟨v*, x_0 − x*⟩ < g(x*) − θ.

Applying Corollary 4 with Φ = G, Ψ = H, A = A, and K = K one obtains that for any z ∈ A lying in a neighbourhood of x* one can find a point ξ(z) ∈ A such that

    G(ξ(z)) − H(z) − DH(z)(ξ(z) − z) ⪯_K 0

and ξ(z) → x_0 as z → x*. Hence taking into account the facts that the subsequence {x_{n_k}} converges to x*, while {v_{n_k}} converges to v*, one obtains that there exists k_0 ∈ N such that for all k ≥ k_0 one has

    g(ξ(x_{n_k})) − ⟨v_{n_k}, ξ(x_{n_k}) − x_{n_k}⟩ ≤ g(x_{n_k}) − θ/2,
    G(ξ(x_{n_k})) − H(x_{n_k}) − DH(x_{n_k})(ξ(x_{n_k}) − x_{n_k}) ⪯_K 0.

Note that ξ(x_{n_k}) is a feasible point of the convex subproblem on Step 3 of Algorithm 1 for any k ≥ k_0. Consequently, by the definition of x_{n_k+1} one has

    g(x_{n_k+1}) − ⟨v_{n_k}, x_{n_k+1} − x_{n_k}⟩ ≤ g(ξ(x_{n_k})) − ⟨v_{n_k}, ξ(x_{n_k}) − x_{n_k}⟩ ≤ g(x_{n_k}) − θ/2

for all k ≥ k_0. Subtracting h(x_{n_k}) from both sides of this inequality and applying the definition of subgradient one gets that f(x_{n_k+1}) ≤ f(x_{n_k}) − θ/2 for all k ≥ k_0. Hence with the use of the second part of this theorem one can conclude that f(x_n) → −∞, which contradicts the facts that f is bounded below on the feasible set by our assumption and the sequence {x_n} is feasible by the first part of the theorem.

Remark 6. (i) Note that the assumption on the strong convexity of the function h is not restrictive, since if this assumption is not satisfied, for any µ > 0 one can simply replace the DC decomposition f = g − h of the objective function f with the following one:

    f(x) = ( g(x) + µ|x|² ) − ( h(x) + µ|x|² ),  x ∈ R^d.
(ii) Since by the previous theorem the sequence {f(x_n)} converges, one can use the inequality |f(x_{n+1}) − f(x_n)| < ε (or |x_{n+1} − x_n| < ε, when h is strongly convex) as a stopping criterion for Algorithm 1.

4.4 The Penalty Convex-Concave Procedure

Observe that in order to apply Algorithm 1, one needs to find a feasible point of the problem under consideration. In the case when such a point is unknown in advance and is hard to compute, one can use a combination of the DC algorithm and exact penalty techniques that allows one to start iterations at infeasible points. Such modifications of Algorithm 1 were discussed in [40] (and in [35, 47] in the case of inequality constrained problems). Here we present and analyse one such method, called the Penalty Convex-Concave Procedure (Penalty CCP), which is a slight modification of [40, Algorithm 4.2]. This method can be viewed as an extension of the DCA2 algorithm from [35, 47] to the case of cone constrained DC optimization problems.

A general scheme of DCA2/Penalty CCP for the problem (P) is given in Algorithm 2. The only difference between our method and [40, Algorithm 4.2] is the penalty updates. Namely, in contrast to [40], we increase the penalty parameter only if the infeasibility measure at the current iteration exceeds a prespecified threshold. Let us also note that the inequality t ≻_{K*} 0 below means that t ∈ K* and ⟨t, y⟩ > 0 for all y ∈ K, y ≠ 0.

Algorithm 2: DCA2/Penalty CCP.
Step 1. Choose an initial point x_0 ∈ A, an initial penalty parameter t_0 ≻_{K*} 0, parameters τ_max > 0, µ > 1, κ > 0, and set n := 0.

Step 2. Compute v_n ∈ ∂h(x_n) and DH(x_n).

Step 3. Set the value of x_{n+1} to a solution of the convex problem

    minimize_{(x,s)} g(x) − ⟨v_n, x − x_n⟩ + ⟨t_n, s⟩ subject to G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K s, s ⪰_K 0, x ∈ A.

If x_{n+1} = x_n, Stop.
Step 4. Define

    t_{n+1} = µt_n, if ‖s_{n+1}‖ ≥ κ and µ‖t_n‖ ≤ τ_max;  t_{n+1} = t_n, otherwise,

where (x_{n+1}, s_{n+1}) is a solution of the subproblem on Step 3. Put n := n + 1 and go to Step 2.

Let us analyse a convergence of Algorithm 2. Firstly, we show that under some standard assumptions the penalized convex subproblem on Step 3 of this algorithm is exact, in the sense that if the norm of the penalty parameter t_n is sufficiently large, then a solution of the subproblem on Step 3 of Algorithm 2 coincides with the solution of the corresponding non-penalized problem

    minimize g(x) − ⟨v_n, x − x_n⟩ subject to G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K 0, x ∈ A,   (12)

provided the feasible region of this problem is nonempty. This result implies, in particular, that if for some n ∈ N the norm of the penalty parameter t_n exceeds a certain threshold and the feasible region of problem (12) is nonempty, then the next point x_{n+1} is feasible for the problem (P) and the rest of the iterations of Algorithm 2 coincide with the iterations of Algorithm 1. Thus, in this case one can ensure the convergence of a sequence generated by Algorithm 2 to a critical point for the problem (P).

Before we proceed to the proof of the exactness of the subproblem from Step 3 of Algorithm 2, let us first provide simple sufficient conditions for the existence of globally optimal solutions of this problem and the corresponding non-penalized problem (12). To this end, recall that a function φ: R^d → R is called coercive on the set A, if φ(x_n) → +∞ as n → ∞ for any sequence {x_n} ⊂ A such that ‖x_n‖ → +∞ as n → ∞.
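Continuing the toy problem from the CCP sketch above, the following cvxpy fragment (again an illustration under the same assumptions, not code from [40]) implements Steps 3 and 4 of Algorithm 2 for the semidefinite case, where one may take t_n = τ_n I ≻_{K*} 0, so that ⟨t_n, s⟩ = τ_n Tr(s) and ‖t_n‖_F = τ_n √ℓ.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(7)
    l, d = 3, 2
    sym = lambda Z: 0.5 * (Z + Z.T)
    A0 = -np.eye(l)
    A = [sym(rng.normal(size=(l, l))) for _ in range(d)]
    p = np.array([2.0, -1.0])

    xn = np.array([5.0, 5.0])            # need not be feasible for (P)
    tau, tau_max, mu, kappa = 1.0, 1e4, 2.0, 1e-6
    for it in range(50):
        x, S = cp.Variable(d), cp.Variable((l, l), PSD=True)  # S is the slack s
        lin = xn @ xn + 2 * xn @ (x - xn)
        lmi = A0 + sum(x[i] * A[i] for i in range(d)) - lin * np.eye(l) << S
        prob = cp.Problem(
            cp.Minimize(cp.sum_squares(x - p) + tau * cp.trace(S)), [lmi])
        prob.solve(solver=cp.SCS)
        if np.linalg.norm(x.value - xn) < 1e-6:
            break
        xn = x.value
        # Step 4: increase the penalty only if the infeasibility exceeds kappa
        if np.linalg.norm(S.value, 'fro') >= kappa and mu * tau * np.sqrt(l) <= tau_max:
            tau *= mu
    print(xn, np.linalg.norm(S.value, 'fro'))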
Let the space Y be finite dimensional, the cone K be generating (i.e. K − K = Y ), G be continuous on A , H be continuously Fr´echet differentiableon R d , and the penalty function Φ c ( · ) = f ( · ) + c dist( F ( · ) , − K ) be coercive on A for some c > . Then there exists µ ∗ ≥ such that for any µ ≥ µ ∗ and forall x ∈ R d and v ∈ ∂h ( x ) there exists a globally optimal solution of the penalizedproblem minimize ( x,s ) g ( x ) − h v, x − x i + µ h t , s i subject to G ( x ) − H ( x ) − DH ( x )( x − x ) (cid:22) K s, s (cid:23) K , x ∈ A. (13) Moreover, if the feasible region of the corresponding non-penalized problem minimize g ( x ) − h v, x − x i subject to G ( x ) − H ( x ) − DH ( x )( x − x ) (cid:22) K , x ∈ A (14) is nonempty, then this problem has a globally optimal solution as well.Proof. Indeed, fix any x ∈ R d . Suppose at first that the feasible region ofproblem (14) is nonempty. Arguing in the same way as in the proof of Theorem 4one can check that F ( x ) (cid:22) K G ( x ) − H ( x ) − DH ( x )( x − x ) ∀ x, x ∈ R d , (15)which implies that the feasible region of problem (14) is contained in the feasibleregion of the problem ( P ). From the the coercivity of the penalty functionΦ c ( · ) = f ( · ) + c dist( F ( · ) , − K ) on the set A it follows that the function f iscoercive on the feasible region of the problem ( P ) and, therefore, on the feasibleregion of problem (14) as well (recall that F ( x ) (cid:22) K F ( x ) ∈ − K ). Hencetaking into account the fact that by the definition of subgradient g ( x ) − h v, x − x i ≥ f ( x ) + h ( x ) ∀ x, x ∈ R d . one obtains that the objective function of problem (14) is coercive on the feasibleregion of this problem, which is closed by virtue of our assumptions on G and H . Consequently, there exists a globally optimal solution of problem (14).Let us now consider problem (13). The assumptions of the lemma on G and H guarantee that the feasible region of this problem is closed. Note that a pair( x, s ) ∈ A × K is feasible for this problem iff G ( x ) − H ( x ) − DH ( x )( x − x ) ∈ s − K. Hence bearing in mind the fact that the cone K is generating one gets that thefeasible region of problem (13) is nonempty.Let us check that the objective function ω ( x, s ) = g ( x ) − h v, x − x i + µ h t , s i
of problem (13) is coercive on the feasible region of this problem, provided µ is sufficiently large. Arguing by reductio ad absurdum, suppose that ω is not coercive on the feasible region of problem (13). Then there exist M > 0 and a sequence {(x_n, s_n)} of feasible points of problem (13) such that ‖x_n‖ + ‖s_n‖ → +∞ as n → ∞, but ω(x_n, s_n) ≤ M for all n ∈ ℕ. Observe that F(x_n) ⪯_K s_n for all n ∈ ℕ due to (15), which implies that

    M ≥ ω(x_n, s_n) ≥ inf{ ω(x_n, s) | s ⪰_K F(x_n), s ⪰_K 0 }  ∀ n ∈ ℕ.

Let us estimate the infimum on the right-hand side of this inequality. Bearing in mind the facts that t_0 is a continuous linear functional, t_0 ≻_{K*} 0, and K is a closed subset of a finite dimensional normed space, one obtains that

    τ := min{ ⟨t_0, s⟩ | s ∈ K, ‖s‖ = 1 } > 0,    ⟨t_0, s⟩ ≥ τ‖s‖  ∀ s ∈ K.
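For illustration (this computation is an aside, not part of the proof), in the semidefinite case Y = S^m equipped with the Frobenius inner product, K = S^m_+, and t_0 = I_m, the constant τ can be computed explicitly:

    \[
      \langle I_m, S \rangle = \operatorname{tr} S = \sum_{i=1}^m \lambda_i(S)
      \ge \Big( \sum_{i=1}^m \lambda_i(S)^2 \Big)^{1/2} = \| S \|_F
      \quad \forall S \in \mathbb{S}^m_+,
    \]

since all eigenvalues λ_i(S) of a positive semidefinite matrix are nonnegative. Hence τ = 1, and the minimum in the definition of τ is attained at rank-one matrices.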
Therefore for any n ∈ ℕ one has

    M ≥ ω(x_n, s_n) ≥ g(x_n) − ⟨v, x_n − x_0⟩ + µτ inf{ ‖s‖ | s ∈ F(x_n) + K, s ∈ K }
      ≥ f(x_n) + h(x_0) + µτ inf_{y ∈ K} ‖F(x_n) + y‖ = f(x_n) + h(x_0) + µτ dist(F(x_n), −K).

Hence taking into account the fact that by our assumption the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A, one obtains that the sequence {x_n} is bounded, provided µ ≥ c/τ. Consequently, ‖s_n‖ → +∞ as n → ∞, which contradicts the fact that

    M ≥ ω(x_n, s_n) ≥ min_{‖x‖ ≤ r} ( g(x) − ⟨v, x − x_0⟩ ) + µτ‖s_n‖

for all n ∈ ℕ, where r = sup_{n∈ℕ} ‖x_n‖. Thus, the function ω is coercive on the feasible region of problem (13) for any µ ≥ c/τ, and for any such µ there exists a globally optimal solution of this problem.

Remark 7. Note that the assumptions on the space Y and the cone K are not used in the proof of the existence of globally optimal solutions of the non-penalized problem (14).

Now we can turn to the proof of the exactness of the penalized problem (13). Introduce the set

    D = { x_0 ∈ A | ∃ x ∈ A: G(x) − H(x_0) − DH(x_0)(x − x_0) ⪯_K 0 },

i.e. D is the set of all those x_0 ∈ A for which the feasible region of the non-penalized problem (14) is nonempty. Observe that the feasible region of the problem (P) is contained in D, but in the general case D ≠ ℝ^d. Denote

    D_s = { x_0 ∈ A | 0 ∈ int{ G(x) − H(x_0) − DH(x_0)(x − x_0) + K | x ∈ A } },

i.e. D_s is the set of all those x_0 ∈ A for which the constraint qualification from Corollary 6 holds true. It should be noted that in the case when the cone K has nonempty interior, this constraint qualification is satisfied iff there exists x ∈ A such that

    G(x) − H(x_0) − DH(x_0)(x − x_0) ∈ −int K,

that is, iff Slater's condition for the non-penalized problem (14) holds true (see, e.g. [4, Prp. 2.106]). Note that by definition D_s ⊆ D. Thus, D_s is the subset of D consisting of all those x_0 for which Slater's condition holds true for the non-penalized problem.

Under some natural assumptions one can verify that the set D is closed (in particular, it is sufficient to suppose that the feasible region of the problem (P) is bounded, G is continuous, and H is continuously differentiable), while the set D_s is open in A. Therefore, there are some degenerate points x_0 ∈ D \ D_s (e.g. the ones that lie on the boundary of D in A) for which one must impose some additional assumptions. Our aim is to first provide somewhat cumbersome sufficient conditions for the exactness of the penalized problem (13) for the entire set D or its arbitrary subset, and then show that these conditions are satisfied for any compact subset of D_s. The sufficient conditions that we present here are based on a uniform local error bound for the non-penalized problem (14).

To simplify the formulations and proofs of the statements below, for any z ∈ ℝ^d introduce the convex function F_z(x) = G(x) − H(z) − DH(z)(x − z), x ∈ ℝ^d, and the set-valued mapping

    M_z(x) = G(x) − H(z) − DH(z)(x − z) + K, if x ∈ A;    M_z(x) = ∅, if x ∉ A.

The multifunction M_z is convex and closed, provided the function G is continuous on A.
Proposition 2. Let K be finite dimensional, G be continuous on A, H be continuously Fréchet differentiable on A, and there exist c ≥ 0 such that the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A. Let also D_0 ⊆ D be a nonempty set for which one can find a > 0, L_g > 0, and L_h > 0 such that for any z ∈ D_0 and v ∈ ∂h(z) one has ‖v‖ ≤ L_h and there exist r > 0 and a globally optimal solution x* of the problem

    minimize  g(x) − ⟨v, x − z⟩
    subject to  G(x) − H(z) − DH(z)(x − z) ⪯_K 0,  x ∈ A        (16)

such that g is Lipschitz continuous near x* with Lipschitz constant L_g and

    dist(F_z(x), −K) ≥ a dist(x, M_z^{−1}(0))  ∀ x ∈ B(x*, r) ∩ A.        (17)

Then there exists µ* ≥ 0 such that for all µ ≥ µ* and for any z ∈ D_0 and v ∈ ∂h(z) there exists a globally optimal solution of the penalized problem

    min_{(x,s)}  g(x) − ⟨v, x − z⟩ + µ⟨t_0, s⟩
    subject to  G(x) − H(z) − DH(z)(x − z) ⪯_K s,  s ⪰_K 0,  x ∈ A,        (18)

and a pair (x*, s*) is a solution of this problem if and only if s* = 0 and x* is a solution of the corresponding non-penalized problem (16).

Proof. Fix any z ∈ D_0 and v ∈ ∂h(z), and denote by

    ω_µ(x, s) = g(x) − ⟨v, x − z⟩ + µ⟨t_0, s⟩

the objective function of problem (18). Arguing in the same way as in the proof of Lemma 3, one can check that there exists τ > 0 such that ⟨t_0, s⟩ ≥ τ‖s‖ for all s ∈ K. Therefore, for any feasible point (x, s) of problem (18) one has

    ω_µ(x, s) ≥ g(x) − ⟨v, x − z⟩ + µτ inf{ ‖s‖ | s ⪰_K F_z(x), s ⪰_K 0 }
             ≥ g(x) − ⟨v, x − z⟩ + µτ inf{ ‖s‖ | s ∈ F_z(x) + K }
             = g(x) − ⟨v, x − z⟩ + µτ dist(F_z(x), −K).

Let x* be a globally optimal solution of the non-penalized problem (16) from the formulation of the proposition (optimal solutions of this problem exist by Lemma 3). Observe that by definition the set M_z^{−1}(0) coincides with the feasible region of problem (16). Therefore, by [10, Prp. 2.7] there exists δ > 0 (decreasing δ, if necessary, one can suppose that δ ≤ r) such that

    g(x) − ⟨v, x − z⟩ ≥ g(x*) − ⟨v, x* − z⟩ − (L_g + L_h) dist(x, M_z^{−1}(0))

for all x ∈ B(x*, δ) ∩ A. Consequently, applying inequality (17) one obtains that

    ω_µ(x, s) ≥ g(x*) − ⟨v, x* − z⟩ + (µτa − L_g − L_h) dist(x, M_z^{−1}(0))

for any feasible point (x, s) of problem (18) such that x ∈ B(x*, δ). Hence for any such (x, s) one has

    ω_µ(x, s) ≥ g(x*) − ⟨v, x* − z⟩ = ω_µ(x*, 0)  ∀ µ ≥ µ* := (L_g + L_h)/(τa),

that is, (x*, 0) is a locally optimal solution of problem (18) for any µ ≥ µ*. Taking into account the fact that this problem is convex, one gets that for any such µ the pair (x*, 0) is a globally optimal solution of problem (18). Furthermore, since for any other globally optimal solution x̂ of the non-penalized problem (16) one has ω_µ(x*, 0) = ω_µ(x̂, 0), one obtains that for any globally optimal solution x̂ of the non-penalized problem (16) and for all µ ≥ µ* the pair (x̂, 0) is a globally optimal solution of the penalized problem (18). Conversely, if for some µ ≥ µ* a pair (x̂, 0) is a globally optimal solution of the penalized problem (18), then x̂ is necessarily a globally optimal solution of the non-penalized problem (16). In addition, for any x ∈ A and s ∈ K \ {0} one has

    ω_µ(x, s) = g(x) − ⟨v, x − z⟩ + µ⟨t_0, s⟩ > g(x) − ⟨v, x − z⟩ + µ*⟨t_0, s⟩ = ω_{µ*}(x, s) ≥ ω_{µ*}(x*, 0) = ω_µ(x*, 0)

for any µ > µ*, that is, for any µ > µ* globally optimal solutions of problem (18) necessarily have the form (x̂, 0). Thus, for any µ > µ* a pair (x̂, ŝ) is a globally optimal solution of the penalized problem (18) iff ŝ = 0 and x̂ is a solution of the corresponding non-penalized problem. Since z ∈ D_0 and v ∈ ∂h(z) were chosen arbitrarily and µ* does not depend on z and v, one can conclude that the statement of the proposition holds true.
Corollary 7. Let K be finite dimensional, G be continuous on A, H be continuously Fréchet differentiable on A, and there exist c ≥ 0 such that the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A. Then for any compact subset D_0 ⊆ D_s there exists µ* ≥ 0 such that for all µ ≥ µ* and for any z ∈ D_0 and v ∈ ∂h(z) there exists a globally optimal solution of the penalized problem (18) and this problem is exact, in the sense that a pair (x*, s*) is a solution of this problem iff s* = 0 and x* is a solution of the corresponding non-penalized problem (16).

Proof. Let us verify that for any z ∈ D_s there exists r > 0 such that the assumptions of Proposition 2 are satisfied for D_0 = B(z, r) ∩ A. Then one can easily verify that these assumptions are satisfied for any compact subset D_0 ⊆ D_s, since any such subset can be covered by a finite number of the corresponding balls.

Fix any z ∈ D_s and choose some x̄ ∈ A such that 0 ∈ M_z(x̄). By the definition of the set D_s one has 0 ∈ int M_z(ℝ^d). Hence by [51, Thrm. 1] there exists η > 0 such that ηB_Y ⊆ M_z(x̄ + B_{ℝ^d}). From the fact that H is continuously Fréchet differentiable it follows that there exists r < min{1, η/(6(1 + ‖DH(z)‖))} such that

    ‖H(u) − H(z)‖ ≤ η/6,    ‖DH(u) − DH(z)‖ ≤ η/(6(2 + ‖x̄‖ + ‖z‖))

for any u ∈ B(z, r) ∩ A. Choose any y ∈ M_z(x̄ + B_{ℝ^d}). By definition there exist x ∈ (x̄ + B_{ℝ^d}) ∩ A and w ∈ K such that y = F_z(x) + w. Observe that

    ‖F_u(x) + w − y‖ = ‖F_u(x) − F_z(x)‖ ≤ ‖H(u) − H(z)‖ + ‖DH(u) − DH(z)‖‖x − u‖ + ‖DH(z)‖‖u − z‖ ≤ η/2

for any u ∈ B(z, r) ∩ A, which implies that

    ηB_Y ⊆ M_z(x̄ + B_{ℝ^d}) ⊆ M_u(x̄ + B_{ℝ^d}) + (η/2)B_Y  ∀ u ∈ B(z, r) ∩ A.

Consequently, by [51, Lemma 2] one has

    (η/2)B_Y ⊆ M_u(x̄ + B_{ℝ^d})  ∀ u ∈ B(z, r) ∩ A,        (19)

which with the use of [51, Thrm. 2] yields that for all x ∈ ℝ^d and u ∈ B(z, r) ∩ A one has

    dist(x, M_u^{−1}(0)) ≤ (2/η)(1 + ‖x − x̄‖) dist(0, M_u(x)).        (20)

Let us show that one can find R > 0 such that for all u ∈ B(z, r) ∩ A and v ∈ ∂h(u) globally optimal solutions of the problem

    min  g(x) − ⟨v, x − u⟩  s.t.  G(x) − H(u) − DH(u)(x − u) ⪯_K 0,  x ∈ A        (21)

(which exist by Lemma 3) lie in the ball B(0, R). Then taking into account the fact that by definition dist(0, M_u(x)) = dist(F_u(x), −K) for x ∈ A, one obtains that for all u ∈ B(z, r) ∩ A and v ∈ ∂h(u), and for any globally optimal solution x* of problem (21), the following inequality holds true:

    dist(F_u(x), −K) ≥ η/(2(2 + R + ‖x̄‖)) dist(x, M_u^{−1}(0))  ∀ x ∈ B(x*, 1) ∩ A

(cf. (17)). Moreover, one can take as L_g the Lipschitz constant of g on the set B(0, R + 1) (recall that a convex function finite on ℝ^d is Lipschitz continuous on bounded sets; see, e.g. [52, Thrm. 10.4]), while the existence of L_h such that ‖v‖ ≤ L_h for all v ∈ ∂h(u) and u ∈ B(z, r) follows from the local boundedness of the subdifferential mapping [52, Crlr. 24.5.1]. Therefore, all assumptions of Proposition 2 are satisfied for D_0 = B(z, r) ∩ A, and one can conclude that the corollary holds true.

Thus, it remains to prove that globally optimal solutions of problem (21) lie within some ball B(0, R). Indeed, by the definition of subgradient

    g(x) − ⟨v, x − u⟩ ≥ g(x) − h(x) + h(u) ≥ f(x) + C_1,    C_1 := min_{u ∈ B(z,r) ∩ A} h(u).

Furthermore, from inclusion (19) it follows that for any u ∈ B(z, r) ∩ A there exists x(u) ∈ x̄ + B_{ℝ^d} such that 0 ∈ M_u(x(u)), i.e. x(u) is a feasible point of problem (21). Finally, as was noted in the proof of Lemma 3, the feasible region of problem (21) is contained in the feasible region of the problem (P), which we denote by Ω.
Therefore globally optimal solutions of problem (21) are contained in the set S := { x ∈ Ω | f(x) ≤ |C_1| + C_2 }, where

    C_2 := sup_{u ∈ B(z,r) ∩ A} ( g(x(u)) − ⟨v, x(u) − u⟩ ) ≤ sup_{x ∈ x̄ + B_{ℝ^d}} g(x) + L_h( ‖x̄‖ + 1 + ‖z‖ + r ) < +∞.

It remains to note that the set S does not depend on u ∈ B(z, r) ∩ A and v ∈ ∂h(u), and is contained in some ball B(0, R), since Ω = { x ∈ A | F(x) ∈ −K } and by our assumption the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A.

Remark 8. Let a sequence {x_n} be generated by Algorithm 2 and suppose that there exists m ∈ ℕ such that either the assumptions of Proposition 2 are satisfied for some set D_0 ⊆ D containing the sequence {x_n}_{n≥m}, or this sequence is contained in a compact subset of the set D_s (note that since D_s is an open set, it is sufficient to suppose that the sequence {x_n} converges to a point x* ∈ D_s). Then there exists a threshold τ* > 0 such that if for some k ≥ m one has ‖t_k‖ ≥ τ*, then the sequence {x_n}_{n≥k+1} is feasible for the problem (P) and coincides with a sequence generated by Algorithm 1 with starting point x_{k+1}. In this case one can apply Theorem 5 to analyse the behaviour of the sequence {x_n}_{n≥k+1} and its convergence to a critical point for the problem (P). Note that to prove this result one must suppose that τ_max > τ*, i.e. the maximal admissible norm of the penalty parameters t_n must be sufficiently large.

Let us give a simple example illustrating Proposition 2 and Corollary 7, as well as the behaviour of sequences generated by Algorithms 1 and 2.

Example 6.
Let d = 1, Y = ℝ, and K = ℝ₊, i.e. y_1 ⪯_K y_2 means that y_1 ≤ y_2 for all y_1, y_2 ∈ ℝ. Consider the following inequality constrained DC optimization problem:

    min  (x − 0.5)²  subject to  x² − x⁴ ≤ 0.        (22)

We define g(x) = (x − 0.5)², h(x) = 0, G(x) = x², and H(x) = x⁴ for all x ∈ ℝ. The feasible region has the form Ω = (−∞, −1] ∪ {0} ∪ [1, +∞). The points x* = 1 and x* = 0 are globally optimal solutions of problem (22), while the point x* = −1 is only a locally optimal solution. For any z ∈ ℝ the linearized convex problem for problem (22) has the form

    min_x  (x − 0.5)²  subject to  x² − z⁴ − 4z³(x − z) ≤ 0.        (23)

The inequality constraint can be rewritten as follows:

    (x − 2z³)² − z⁴(4z² − 3) ≤ 0.

Therefore

    D = (−∞, −√3/2] ∪ {0} ∪ [√3/2, +∞),    D_s = int D,

that is, the feasible region of problem (23) is nonempty iff z ∈ D, and Slater's condition holds true for this problem iff z ∈ int D = D_s. Furthermore, for z = 0 the feasible region of problem (23) consists of the single point x = 0, while for any z ∈ D, z ≠ 0, the feasible region has the form

    [ 2z³ − z²√(4z² − 3), 2z³ + z²√(4z² − 3) ].

As was noted multiple times above, this set is contained in the feasible region of problem (22), which implies that for any z ≥ √3/2 it is contained in [1, +∞), while for any z ≤ −√3/2 it is contained in (−∞, −1]. Consequently, any sequence {x_n} generated by Algorithm 1, i.e. such that x_{n+1} is defined as a solution of the problem

    min_x  (x − 0.5)²  subject to  x² − x_n⁴ − 4x_n³(x − x_n) ≤ 0,

is contained in the set (−∞, −1], if x_0 ≤ −1, and in the set [1, +∞), if x_0 ≥ 1, while for x_0 = 0 one has x_n ≡ 0. Moreover, one can easily check that all assumptions of Theorem 5 are satisfied, and x_{n+1} > x_n for all n ∈ ℕ, if x_0 < −1, and x_{n+1} < x_n for all n ∈ ℕ, if x_0 > 1. Therefore, a sequence {x_n} generated by Algorithm 1 converges to the locally optimal solution x* = −1, if x_0 ≤ −1, and to the globally optimal solution x* = 1, if x_0 ≥ 1. This example shows that if the feasible region of a problem under consideration consists of several disjoint convex components, then a sequence generated by Algorithm 1 lies within the component containing the initial guess x_0 and converges to a critical point from this component, i.e. a sequence generated by Algorithm 1 cannot jump from one convex component of the feasible region to another. Let us note that one can easily prove this result in the general case.

Let us now consider Algorithm 2. To this end, we first analyse the exactness of the penalized subproblem, which has the form

    min_{(x,s)}  (x − 0.5)² + µt_0 s  subject to  x² − z⁴ − 4z³(x − z) ≤ s,  s ≥ 0,        (24)

where t_0 > 0. One can easily verify that (x*, s*) is a globally optimal solution of this problem iff s* = max{ (x*)² − z⁴ − 4z³(x* − z), 0 } and x* is a globally optimal solution of the unconstrained problem

    min  (x − 0.5)² + µt_0 max{ x² − z⁴ − 4z³(x − z), 0 }.

For any z ∈ D \ D_s this problem takes the form

    min  (x − 0.5)² + µt_0 (x − 2z³)²,

while the feasible region of the non-penalized problem (23) reduces to the single point x = 2z³. Clearly, for any µ > 0 a globally optimal solution of the problem above does not coincide with this point. Thus, the penalized problem (24) is not exact for all z ∈ D \ D_s (one can verify that this result is connected to the fact that error bound (17) from Proposition 2 is not valid for such z).

By Corollary 7 for any compact subset D_0 ⊂ D_s the penalized problem (24) is exact for all z ∈ D_0, in the sense that there exists µ* ≥ 0 such that for all µ ≥ µ* a pair (x*, s*) is a globally optimal solution of problem (24) iff s* = 0 and x* is a globally optimal solution of the non-penalized problem (23). Denote the greatest lower bound of all such µ* by µ*(D_0).

One can verify that problem (24) is not exact for all z ∈ D_s simultaneously, due to the fact that µ*({z}) → +∞ as z tends to the boundary of D_s. For the sake of shortness, we do not present a detailed proof of this result and leave it to the interested reader. Here we only mention that this result can be proved by noting that µ*({z}) is equal to the norm of an optimal solution of the dual problem of (23) divided by t_0.

Let us now consider the performance of Algorithm 2 (see also the numerical sketch after this example). To this end, put x_0 = −1, t_0 = 1, µ = 2, a small κ > 0, and τ_max = 1024 in Algorithm 2. Note that the initial point x_0 is critical for problem (22), but is not a globally optimal solution of this problem. Solving the penalized problem (24) with z = x_0 one obtains that x_1 = −0.75. Thus, Algorithm 2, unlike the DC algorithm, managed to escape the convex component of the feasible region containing the initial guess and, furthermore, to "jump off" from a point of local minimum. Numerical simulation showed that the sequence {x_n} generated by Algorithm 2 converges to a point lying near the globally optimal solution x* = 0, while if one chooses τ_max = +∞ and κ = 0, then the sequence converges to the globally optimal solution x* = 0 itself. However, note that if one chooses t_0 ≥ µ*({−1}) = 1.5, then the method terminates after the first iteration with x_1 = x_0.

Thus, it seems advisable to choose t_0 with sufficiently small norm (and maybe even perform several iterations before increasing the penalty parameter), to enable Algorithm 2 to find a better solution. Moreover, even if a feasible point x_0 is known, it is reasonable to use Algorithm 2 instead of Algorithm 1 due to the ability of the penalized method to escape convex components of the feasible region and find better locally optimal solutions than the original method.
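The following self-contained Python sketch reproduces the computations of Example 6. It is an illustration only: the one-dimensional convex subproblem is minimized by a simple ternary search on an interval assumed to contain all iterates, and the values κ = 10⁻⁴ and the iteration budgets are assumptions made for this sketch rather than values fixed in the text.

    def ternary_min(fun, lo=-10.0, hi=10.0, iters=100):
        # Minimize a convex univariate function by ternary search on [lo, hi].
        for _ in range(iters):
            m1 = lo + (hi - lo) / 3.0
            m2 = hi - (hi - lo) / 3.0
            if fun(m1) <= fun(m2):
                hi = m2
            else:
                lo = m1
        return 0.5 * (lo + hi)

    def q(x, z):
        # Linearized constraint function of problems (23) and (24).
        return x**2 - z**4 - 4.0 * z**3 * (x - z)

    def algorithm2_example6(x0=-1.0, t0=1.0, mu=2.0, kappa=1e-4,
                            tau_max=1024.0, n_iter=60):
        # Algorithm 2 for problem (22); here h = 0, so no subgradient is needed.
        x, t = x0, t0
        for _ in range(n_iter):
            # Penalized subproblem (24) in its unconstrained form:
            # min (x - 0.5)^2 + t * max{q(x, x_n), 0}.
            x_next = ternary_min(lambda y: (y - 0.5)**2 + t * max(q(y, x), 0.0))
            s_next = max(q(x_next, x), 0.0)
            if s_next >= kappa and mu * t <= tau_max:  # Step 4 of Algorithm 2
                t = mu * t
            x = x_next
        return x

    print(algorithm2_example6())         # escapes x0 = -1, ends up near 0
    print(algorithm2_example6(t0=1.5))   # t0 >= mu*({-1}) = 1.5: stays at -1

With t_0 = 1 the first iterate is x_1 = −0.75 and the method escapes the component (−∞, −1], in agreement with the discussion above, while with t_0 = 1.5 the method stays at the point x_0 = −1.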
In the general case, the feasible region of the non-penalized problem

    minimize  g(x) − ⟨v, x − x_n⟩
    subject to  G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K 0,  x ∈ A

(see (12)) might be empty for all n ∈ ℕ. Then a sequence {x_n} generated by Algorithm 2 is infeasible for the problem (P), and Proposition 2 along with Corollary 7 do not allow one to say anything about the convergence of the method. Moreover, even if τ_max = +∞, i.e. the norm of t_n can increase unboundedly, there is no guarantee that limit points of the sequence {x_n} are feasible for the original problem. To avoid such pathological cases, one usually either adopts an 'a priori approach' and supposes that a suitable constraint qualification holds true at all infeasible points (this approach was widely used, e.g. for convergence analysis of exact penalty methods in [48]), or adopts an 'a posteriori approach' and supposes that a sequence generated by the method converges to a point at which an appropriate constraint qualification holds true (such an approach was used, e.g. for an analysis of trust region methods in [7]). For the sake of completeness, we present two convergence theorems for Algorithm 2, one of which is based on the a priori approach, while the other one is based on the a posteriori one and was hinted at in Remark 8. Both these theorems ensure the convergence of Algorithm 2 to a feasible and critical point, provided τ_max is sufficiently large.

We start with the a priori approach. To this end we need to introduce the following extension of the definition of critical point to the case of infeasible points.
Definition 2. A point x* ∈ A is said to be a generalized critical point for a vector t ≻_{K*} 0, if there exist v* ∈ ∂h(x*) and s* ⪰_K 0 such that the pair (x*, s*) is a globally optimal solution of the problem

    min_{(x,s)}  g(x) − ⟨v*, x − x*⟩ + ⟨t, s⟩
    s.t.  G(x) − H(x*) − DH(x*)(x − x*) ⪯_K s,  s ⪰_K 0,  x ∈ A.        (25)

Let us give two useful characterizations of generalized criticality.
Proposition 3. Let x* ∈ A and t ≻_{K*} 0 be given. The following statements hold true:

1. x* is a generalized critical point for t iff there exist v* ∈ ∂h(x*), s* ⪰_K 0, and λ*, µ* ∈ K* such that F(x*) − s* ⪯_K 0, t = λ* + µ*, and

    0 ∈ ∂_x L(x*, λ*) + N_A(x*),    ⟨λ*, F(x*) − s*⟩ = 0,    ⟨µ*, s*⟩ = 0,

where L(x, λ) = g(x) − ⟨v*, x − x*⟩ + ⟨λ, G(x) − H(x*) − DH(x*)(x − x*)⟩;

2. if x* is feasible for the problem (P) and is a generalized critical point for t, then x* is a critical point for the problem (P); conversely, if x* is a critical point for the problem (P) satisfying optimality conditions from Corollary 6 for some λ* ∈ K* such that t ⪰_{K*} λ*, then x* is a generalized critical point for t.

Proof.
1. Problem (25) can be rewritten as a convex cone constrained optimization problem of the form

    minimize_{(x,s)}  g(x) − ⟨v*, x − x*⟩ + ⟨t, s⟩
    subject to  x ∈ A,    F̂(x, s) = ( G(x) − H(x*) − DH(x*)(x − x*) − s, −s ) ∈ (−K) × (−K).        (26)

Note that the following constraint qualification holds true for this problem:

    0 ∈ int{ F̂(x, s) + K × K | x ∈ A, s ∈ Y }.

Therefore x* is a generalized critical point for t iff there exist v* ∈ ∂h(x*) and s* ⪰_K 0 such that the pair (x*, s*) satisfies the KKT optimality conditions for problem (26) (see, e.g. [4, Thrm. 3.6]). Rewriting the KKT optimality conditions in terms of problem (25) we arrive at the required result.

2. Let x* be a generalized critical point for t. Then by definition there exist v* ∈ ∂h(x*) and s* ⪰_K 0 such that the pair (x*, s*) is a globally optimal solution of problem (25). Since the point x* is feasible for the problem (P), the pair (x*, 0) is feasible for problem (25). Moreover, one has g(x*) ≤ g(x*) + ⟨t, s*⟩, since s* ⪰_K 0 and t ≻_{K*} 0. Therefore the pair (x*, 0) is a globally optimal solution of problem (25), which obviously implies that x* is a globally optimal solution of the problem

    min  g(x) − ⟨v*, x − x*⟩  s.t.  G(x) − H(x*) − DH(x*)(x − x*) ⪯_K 0,  x ∈ A,

or, equivalently, that x* is a critical point for the problem (P).

Suppose now that x* is a critical point for the problem (P) satisfying optimality conditions from Corollary 6 for some λ* ∈ K* such that t ⪰_{K*} λ*. Then one can easily verify that the pair (x*, 0) satisfies optimality conditions from the first part of this proposition with µ* = t − λ*, which implies that x* is a generalized critical point for t.

Remark 9. (i) From the proposition above it follows that if x* is a critical point, but the inequality t ⪰_{K*} λ* is not satisfied for any corresponding Lagrange multiplier λ* (roughly speaking, the penalty parameter is smaller than the Lagrange multipliers), then x* cannot be a generalized critical point for t. Indeed, if x* is a generalized critical point for t, then from the proof of the second part of the proposition it follows that (x*, 0) is a globally optimal solution of problem (26). Applying the KKT optimality conditions to this problem one gets that t = λ* + µ* for some µ* ∈ K* and some Lagrange multiplier λ*. Consequently, t ⪰_{K*} λ*, which is impossible.

(ii) With the use of the first part of the previous proposition one can readily verify the following. A (not necessarily feasible) point x* is a generalized critical point for some t ∈ ℝ^m with t^(i) > 0, i ∈ I := {1, ..., m}, of the smooth inequality constrained DC optimization problem

    min  f(x) = g(x) − h(x)  s.t.  f_i(x) = g_i(x) − h_i(x) ≤ 0,  i ∈ I,

iff for the penalty function Φ_t(x) = f(x) + Σ_{i=1}^m t^(i) max{0, f_i(x)} one has

    0 ∈ ∂Φ_t(x*) = ∇f(x*) + Σ_{i ∈ I: f_i(x*) > 0} t^(i) ∇f_i(x*) + Σ_{i ∈ I: f_i(x*) = 0} t^(i) co{0, ∇f_i(x*)}

or, equivalently, iff there exists λ* ∈ ℝ^m such that

    ∇f(x*) + Σ_{i=1}^m λ*^(i) ∇f_i(x*) = 0,    t^(i) ≥ λ*^(i) ≥ 0  ∀ i ∈ I,

and for all i ∈ I one has λ*^(i) = 0 whenever f_i(x*) < 0, while λ*^(i) = t^(i) whenever f_i(x*) > 0 (a schematic verification of this characterization in the setting of Example 6 is given after this remark). With the use of this result one can show that the point x* is not a generalized critical point for µt with µ > 1, provided a suitable constraint qualification holds true at x*. Thus, generalized criticality depends on the choice of the penalty parameter t, and in many cases its increase or decrease might help to escape a generalized critical point.

(iii) As was noted above, a generalized critical point x* is, in essence, a critical point of the penalty function Φ_t, i.e. a point such that 0 ∈ ∂Φ_t(x*), where ∂Φ_t(x) is the Dini subdifferential of Φ_t at x. Various conditions ensuring that there are no infeasible critical points of a penalty function were studied in detail in [10–12].
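The following Python snippet illustrates the characterization from part (ii) for the single-constraint problem of Example 6, where f(x) = (x − 0.5)² and f_1(x) = x² − x⁴; the function name and the tolerance handling are our own choices, not part of the paper.

    def is_generalized_critical(x, t, tol=1e-9):
        # Characterization from Remark 9(ii) for problem (22) with m = 1:
        # f(x) = (x - 0.5)^2, f_1(x) = x^2 - x^4.
        df  = 2.0 * (x - 0.5)         # gradient of the objective
        df1 = 2.0 * x - 4.0 * x**3    # gradient of the constraint
        f1  = x**2 - x**4
        if f1 < -tol:                 # inactive constraint: lambda = 0
            return abs(df) <= tol
        if f1 > tol:                  # violated constraint: lambda = t
            return abs(df + t * df1) <= tol
        # active constraint: lambda may be any value in [0, t]
        if abs(df1) <= tol:
            return abs(df) <= tol
        lam = -df / df1
        return -tol <= lam <= t + tol

    print(is_generalized_critical(-1.0, 1.0))   # False: t < 1.5
    print(is_generalized_critical(-1.0, 1.5))   # True:  t >= 1.5

In particular, the point x* = −1 of Example 6 is a generalized critical point for t exactly when t ≥ 1.5 = µ*({−1}), which is consistent with the observation that Algorithm 2 with t_0 ≥ 1.5 terminates at x_0 = −1.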
Before we proceed to convergence analysis, let us also establish an important property of a sequence generated by Algorithm 2, which, in particular, leads to a natural stopping criterion for this method.

Lemma 4. Let a sequence {(x_n, s_n)} be generated by Algorithm 2. Then

    f(x_{n+1}) + ⟨t_n, s_{n+1}⟩ ≤ f(x_n) + ⟨t_n, s_n⟩  ∀ n ∈ ℕ.        (27)

Moreover, this inequality is strict, if x_n is not a generalized critical point for t_n.

Proof. By definition (x_{n+1}, s_{n+1}) is a globally optimal solution of the problem

    min_{(x,s)}  g(x) − ⟨v_n, x − x_n⟩ + ⟨t_n, s⟩
    s.t.  G(x) − H(x_n) − DH(x_n)(x − x_n) ⪯_K s,  s ⪰_K 0,  x ∈ A,        (28)

while the pair (x_n, s_n) satisfies the following conditions:

    G(x_n) − H(x_{n−1}) − DH(x_{n−1})(x_n − x_{n−1}) ⪯_K s_n,    s_n ⪰_K 0,    x_n ∈ A.

With the use of Lemma 1 one obtains that G(x_n) − H(x_n) ⪯_K s_n, which implies that (x_n, s_n) is a feasible point of problem (28). Therefore

    g(x_{n+1}) − ⟨v_n, x_{n+1} − x_n⟩ + ⟨t_n, s_{n+1}⟩ ≤ g(x_n) + ⟨t_n, s_n⟩  ∀ n ∈ ℕ.        (29)

Subtracting h(x_n) from both sides of this inequality and applying the definition of subgradient one obtains that inequality (27) holds true. It remains to note that if x_n is not a generalized critical point for t_n, then by definition the inequality in (29) is strict, which implies that inequality (27) is strict as well.

Remark 10. From the lemma above it follows that one can use the inequality

    | f(x_{n+1}) + ⟨t_n, s_{n+1}⟩ − f(x_n) − ⟨t_n, s_n⟩ | < ε

as a stopping criterion for Algorithm 2 (a code fragment implementing this check is given below).
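In code, this criterion is a one-line check; the following helper (our own illustration, assuming the finite dimensional case where ⟨t, s⟩ is the usual dot product) could be plugged into the sketch of Algorithm 2 given earlier in place of the fixed iteration budget.

    import numpy as np

    def merit(f, x, s, t):
        # Merit value f(x) + <t, s> from Lemma 4; it is non-increasing along
        # the iterates of Algorithm 2 while the penalty parameter is fixed.
        return f(x) + np.dot(np.atleast_1d(t), np.atleast_1d(s))

    def should_stop(f, x_prev, s_prev, x_next, s_next, t, eps=1e-8):
        # Stopping criterion from Remark 10.
        return abs(merit(f, x_next, s_next, t) - merit(f, x_prev, s_prev, t)) < eps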
Now we can provide sufficient conditions for the convergence of a sequence generated by Algorithm 2 to a feasible and critical point for the problem (P), based on the a priori approach to convergence analysis.

Theorem 6. Let the space Y be finite dimensional, the cone K be generating, G be continuous on A, H be continuously Fréchet differentiable on A, and the penalty function Φ_c(x) = f(x) + c dist(F(x), −K) be bounded below on A for

    c = min{ ⟨t_0, s⟩ | s ∈ K, ‖s‖ = 1 } > 0.

Then all limit points of a sequence {x_n} generated by Algorithm 2 are generalized critical points for t* = lim t_n.

Suppose, in addition, that all points from the set

    { x ∈ A | dist(F(x), −K) > κ }

are not generalized critical points for t̂ = µ^p t_0, where p ∈ ℕ is the largest natural number satisfying the inequality ‖µ^p t_0‖ ≤ τ_max. Then all limit points x* of the sequence {x_n} satisfy the inequality dist(F(x*), −K) ≤ κ. In particular, if κ = 0, then all limit points of the sequence {x_n} are feasible and critical for the problem (P).

Proof. Suppose that the first part of the theorem holds true, i.e. all limit points of the sequence {x_n} are generalized critical points for t* = lim t_n (note that this limit exists, since according to Step 4 of Algorithm 2 the penalty parameter can be updated only a finite number of times). Let us show that then the second part of the theorem holds true as well.

Indeed, let x* be a limit point of the sequence {x_n}, that is, there exists a subsequence {x_{n_k}} converging to x*. Let us consider two cases. Suppose at first that the norm of the penalty parameter t_n does not reach the upper bound τ_max (see Step 4 of Algorithm 2), that is, the penalty parameter is updated less than p times. Then according to the penalty updating rule on Step 4 of Algorithm 2 there exists n_0 ∈ ℕ such that ‖s_n‖ < κ for all n ≥ n_0. By definition

    G(x_n) − H(x_{n−1}) − DH(x_{n−1})(x_n − x_{n−1}) ⪯_K s_n  ∀ n ∈ ℕ,

which thanks to Lemma 1 implies that F(x_n) ⪯_K s_n or, equivalently, F(x_n) − s_n ∈ −K. Therefore dist(F(x_n), −K) ≤ ‖s_n‖ < κ for all n ≥ n_0. Consequently, passing to the limit in the inequality dist(F(x_{n_k}), −K) < κ with the use of the fact that both G and H are continuous, one obtains that dist(F(x*), −K) ≤ κ.

Suppose now that the norm of t_n reaches the upper bound τ_max after a finite number of iterations. Then according to Step 4 of Algorithm 2 there exists n_0 ∈ ℕ such that t_n = µ^p t_0 for all n ≥ n_0. By our assumption x* is a generalized critical point for t* = t̂ = µ^p t_0, which by the assumption of the theorem implies that it cannot belong to the set { x ∈ A : dist(F(x), −K) > κ }. Therefore, dist(F(x*), −K) ≤ κ.

Finally, if κ = 0, then dist(F(x*), −K) = 0, that is, F(x*) ∈ −K, since the cone K is closed. Consequently, the point x* is feasible for the problem (P). Hence by the second part of Proposition 3 the point x* is also critical for the problem (P).

Thus, it remains to prove that all limit points of the sequence {x_n} are generalized critical points for t* = lim t_n. Let a subsequence {x_{n_k}} converge to some point x*. Then the corresponding sequence {v_{n_k}} of subgradients of the function h is bounded due to the local boundedness of the subdifferential mapping [52, Crlr. 24.5.1]. Therefore, replacing, if necessary, the sequence {x_{n_k}} with its subsequence, one can suppose that the sequence {v_{n_k}} converges to some vector v* belonging to ∂h(x*) by virtue of the fact that the graph of the subdifferential is closed [52, Thrm. 24.4].

Let us show that the sequence {s_{n_k}} ⊂ K is bounded.
Then, taking into account the facts that the space Y is finite dimensional and the cone K is closed, and replacing, if necessary, the sequence {x_{n_k}} with its subsequence, one can suppose that {s_{n_k}} converges to some s* ∈ K.

Indeed, since the penalty parameter t_n can be updated only a finite number of times, there exists n_0 ∈ ℕ such that t_n = t_{n_0} for all n ≥ n_0. Consequently, by Lemma 4 the sequence { f(x_n) + ⟨t_n, s_n⟩ }_{n≥n_0} is non-increasing and, in particular, bounded above. Therefore the sequence { f(x_{n_k}) + ⟨t_{n_0}, s_{n_k}⟩ } is bounded above as well. Arguing by reductio ad absurdum, suppose that the sequence {s_{n_k}} is unbounded. Then applying the inequality f(x_{n_k}) + ⟨t_{n_0}, s_{n_k}⟩ ≥ f(x_{n_k}) + c‖s_{n_k}‖ and the fact that f(x_{n_k}) → f(x*) as k → ∞, one gets that limsup_{k→∞} ( f(x_{n_k}) + ⟨t_{n_0}, s_{n_k}⟩ ) = +∞, which is impossible. Thus, without loss of generality one can suppose that the sequence {s_{n_k}} converges to some s*. Note that from the definition of (x_n, s_n) and Lemma 1 it follows that F(x_n) ⪯_K s_n. Therefore F(x*) ⪯_K s*, thanks to the fact that the cone K is closed.

Now we can turn to the proof of the fact that the point x* is a generalized critical point for t*. Arguing by reductio ad absurdum, suppose that this statement is false. Then, in particular, the pair (x*, s*) is not a globally optimal solution of the problem

    minimize_{(x,s)}  g(x) − ⟨v*, x − x*⟩ + ⟨t*, s⟩
    subject to  G(x) − H(x*) − DH(x*)(x − x*) ⪯_K s,  s ⪰_K 0,  x ∈ A.

Therefore there exist a feasible point (x̄, s̄) of this problem and θ > 0 such that

    g(x̄) − ⟨v*, x̄ − x*⟩ + ⟨t*, s̄⟩ < g(x*) + ⟨t*, s*⟩ − 2θ.

Applying Corollary 4 with X = ℝ^d × Y and

    Φ(x, s) = ( G(x) − s, −s ),    Ψ(x, s) = ( H(x), 0 ),    A = A × Y,    K = K × K,

one obtains that for any z = (x, s) ∈ A × Y lying in a neighbourhood of (x*, s*) one can find (ξ(z), ζ(z)) ∈ A × K such that

    G(ξ(z)) − H(x) − DH(x)(ξ(z) − x) ⪯_K ζ(z)

and (ξ(z), ζ(z)) → (x̄, s̄) as z → (x*, s*). Consequently, putting z_k = (x_{n_k}, s_{n_k}), one obtains that there exists k_0 ∈ ℕ such that for any k ≥ k_0 the point (ξ(z_k), ζ(z_k)) is feasible for the problem

    min_{(x,s)}  g(x) − ⟨v_{n_k}, x − x_{n_k}⟩ + ⟨t_{n_k}, s⟩
    s.t.  G(x) − H(x_{n_k}) − DH(x_{n_k})(x − x_{n_k}) ⪯_K s,  s ⪰_K 0,  x ∈ A

(note that one can suppose that t_{n_k} = t*, since the penalty parameter is updated only a finite number of times) and

    g(ξ(z_k)) − ⟨v_{n_k}, ξ(z_k) − x_{n_k}⟩ + ⟨t*, ζ(z_k)⟩ < g(x_{n_k}) + ⟨t*, s_{n_k}⟩ − θ.

Therefore by the definition of (x_{n_k+1}, s_{n_k+1}) for any k ≥ k_0 one has

    g(x_{n_k+1}) − ⟨v_{n_k}, x_{n_k+1} − x_{n_k}⟩ + ⟨t*, s_{n_k+1}⟩ < g(x_{n_k}) + ⟨t*, s_{n_k}⟩ − θ.

Subtracting h(x_{n_k}) from both sides of this inequality and applying the definition of subgradient one obtains that

    f(x_{n_k+1}) + ⟨t*, s_{n_k+1}⟩ < f(x_{n_k}) + ⟨t*, s_{n_k}⟩ − θ  ∀ k ≥ k_0,

which together with Lemma 4 implies that f(x_n) + ⟨t*, s_n⟩ → −∞ as n → ∞ (recall that t* = t_n for any sufficiently large n, since the penalty parameter can be updated only a finite number of times). On the other hand, as was shown above (see the proof of Lemma 3), one has

    f(x_n) + ⟨t*, s_n⟩ ≥ f(x_n) + ⟨t_0, s_n⟩ ≥ f(x_n) + c dist(F(x_n), −K) =: Φ_c(x_n).

Consequently, Φ_c(x_n) → −∞, which contradicts the fact that by our assumption this function is bounded below on A. Therefore one can conclude that x* is a generalized critical point for t*.
Remark 11. Note that in the previous theorem it is sufficient to suppose that the penalty function Φ_c is bounded below on A for c = inf{ ⟨t*, s⟩ | s ∈ K, ‖s‖ = 1 }, which is, in the general case, greater than the constant c from the formulation of the theorem. However, such an assumption is inconsistent with the a priori approach, since it is based on information about the behaviour of the sequence {t_n}, which is not known in advance.

Finally, let us consider the a posteriori approach to convergence analysis, which allows one to obtain sufficient conditions for the convergence of Algorithm 2 to a critical point for the problem (P).
Theorem 7. Let K be finite dimensional, G be continuous on A, H be continuously Fréchet differentiable on A, and there exist c ≥ 0 such that the penalty function Φ_c(·) = f(·) + c dist(F(·), −K) is coercive on A. Suppose also that a sequence {x_n} generated by Algorithm 2 with κ = 0 and τ_max = +∞ converges to a point x* satisfying the following constraint qualification:

    0 ∈ int{ G(x) − H(x*) − DH(x*)(x − x*) + K | x ∈ A }        (30)

(i.e. x* ∈ D_s). Then the sequence {t_n} is bounded, there exists m ∈ ℕ such that for all n ≥ m the point x_n is feasible for the problem (P), and the point x* is feasible and critical for the problem (P).

Proof. By our assumption x* ∈ D_s. Therefore, as was shown in the proof of Corollary 7, there exist r > 0 and µ* ≥ 0 such that for any µ ≥ µ*, any z ∈ B(x*, r) ∩ A, and any v ∈ ∂h(z) the penalized problem (18) is exact. Define τ* = µ*‖t_0‖.

If the penalty parameter t_n is updated only a finite number of times, then the sequence {t_n} is obviously bounded. Moreover, according to Step 4 of Algorithm 2 in this case there exists m ∈ ℕ such that s_n = 0 for all n ≥ m, which implies that the sequence {x_n}_{n≥m} is feasible for the problem (P). Therefore the point x* is also feasible for this problem, due to the fact that under our assumptions the feasible region of the problem (P) is closed.

On the other hand, if the penalty parameter t_n is updated an infinite number of times, then according to Step 4 of Algorithm 2 there exists m ∈ ℕ such that ‖t_n‖ ≥ τ* for all n ≥ m. Moreover, increasing m, if necessary, one can suppose that x_n ∈ B(x*, r) for all n ≥ m. Consequently, the penalized subproblem on Step 3 of Algorithm 2 is exact for all n ≥ m. Hence by the definition of exactness s_n = 0 for all n ≥ m + 1, which contradicts our assumption that t_n is updated an infinite number of times.

Thus, the sequence {t_n} is bounded and the point x* is feasible for the problem (P). It remains to verify that the point x* is critical. Suppose at first that there exists m ∈ ℕ such that ‖t_m‖ ≥ τ*. Then ‖t_n‖ ≥ τ* for all n ≥ m. Increasing m, if necessary, one can suppose that x_n ∈ B(x*, r) for all n ≥ m. Therefore by the definition of exactness of the penalized problem and the definitions of Algorithms 1 and 2, the sequence {x_n}_{n≥m+1} is feasible for the problem (P) and coincides with the sequence generated by Algorithm 1 with starting point x_{m+1}. Therefore by Theorem 5 the point x* is critical for the problem (P).

Suppose now that ‖t_n‖ < τ* for all n ∈ ℕ. Then there exists n_0 ∈ ℕ such that t_n = t_{n_0} for all n ≥ n_0. Since the sequence {x_n} generated by Algorithm 2 converges to x*, the corresponding sequence {v_n} of subgradients of the function h is bounded, thanks to the local boundedness of the subdifferential mapping [52, Crlr. 24.5.1]. Consequently, there exists a subsequence {v_{n_k}} converging to some vector v*, which belongs to ∂h(x*) due to the closedness of the graph of the subdifferential [52, Thrm. 24.4].

Arguing by reductio ad absurdum, suppose that x* is not a critical point for the problem (P). As was noted several times above, this implies that x* is not a globally optimal solution of the problem

    minimize  g(x) − ⟨v*, x − x*⟩
    subject to  G(x) − H(x*) − DH(x*)(x − x*) ⪯_K 0,  x ∈ A.
Thus, there exist θ > 0 and a feasible point x̄ of this problem satisfying the inequality

    g(x̄) − ⟨v*, x̄ − x*⟩ < g(x*) − 2θ.

Applying Corollary 4 with Φ = G, Ψ = H, A = A, and K = K one obtains that for any z ∈ A lying in a neighbourhood of x* one can find ξ(z) ∈ A such that G(ξ(z)) − H(z) − DH(z)(ξ(z) − z) ⪯_K 0 and ξ(z) → x̄ as z → x*. Hence bearing in mind the facts that x_{n_k} → x* and v_{n_k} → v* as k → ∞, one obtains that there exists k_0 ∈ ℕ such that

    g(ξ(x_{n_k})) − ⟨v_{n_k}, ξ(x_{n_k}) − x_{n_k}⟩ < g(x_{n_k}) − θ  ∀ k ≥ k_0.

Clearly, one can suppose that n_{k_0} ≥ n_0. Recall that by definition (x_{n_k+1}, s_{n_k+1}) is a globally optimal solution of the penalized problem

    min_{(x,s)}  g(x) − ⟨v_{n_k}, x − x_{n_k}⟩ + ⟨t_{n_k}, s⟩
    s.t.  G(x) − H(x_{n_k}) − DH(x_{n_k})(x − x_{n_k}) ⪯_K s,  s ⪰_K 0,  x ∈ A.
By definition the point (ξ(x_{n_k}), 0) is feasible for this problem, which implies that

    g(x_{n_k+1}) − ⟨v_{n_k}, x_{n_k+1} − x_{n_k}⟩ + ⟨t_{n_k}, s_{n_k+1}⟩ ≤ g(ξ(x_{n_k})) − ⟨v_{n_k}, ξ(x_{n_k}) − x_{n_k}⟩ < g(x_{n_k}) − θ

for all k ≥ k_0. Subtracting h(x_{n_k}) from both sides of this inequality and applying the definition of subgradient and the fact that t_n = t_{n_0} for all n ≥ n_0, one obtains that

    f(x_{n_k+1}) + ⟨t_{n_0}, s_{n_k+1}⟩ < f(x_{n_k}) − θ ≤ f(x_{n_k}) + ⟨t_{n_0}, s_{n_k}⟩ − θ  ∀ k ≥ k_0

(here we used the facts that by definition s_{n_k} ∈ K and ⟨t_{n_0}, s⟩ ≥ 0 for all s ∈ K). By Lemma 4 one has

    f(x_{n+1}) + ⟨t_n, s_{n+1}⟩ ≤ f(x_n) + ⟨t_n, s_n⟩  ∀ n ≥ n_0.

Consequently, f(x_n) + ⟨t_n, s_n⟩ → −∞ as n → ∞, which contradicts the facts that x_n → x* as n → ∞ and f(x_n) + ⟨t_n, s_n⟩ ≥ f(x_n) for all n ∈ ℕ. Therefore, x* is a critical point, and the proof is complete.

Thus, one can conclude that if a sequence {x_n} generated by either Algorithm 1 or Algorithm 2 converges to a point x* such that Slater's condition holds true for the corresponding linearized convex problem, then under some natural assumptions the point x* is critical for the problem (P).

In this paper we developed a general theory of cone constrained DC optimization problems, particularly, DC semidefinite programming problems. To this end, we studied two definitions of DC matrix-valued functions (abstract and componentwise) and their interconnections. We proved that any DC matrix-valued function is componentwise DC and demonstrated how one can compute a DC decomposition of several nonlinear semidefinite constraints appearing in applications. We also constructed a DC decomposition of the maximal eigenvalue function, which allows one to apply standard results and methods of inequality constrained DC optimization to problems with smooth and nonsmooth componentwise DC semidefinite constraints.

In the case of general cone constrained DC optimization problems, we obtained local optimality conditions and presented a detailed convergence analysis of the DC algorithm (the convex-concave procedure) and its penalized version proposed in [40] (see also [35, 47]) under the assumption that the concave part of the constraints is smooth. In particular, we obtained sufficient conditions for the exactness of the penalty subproblem of the penalized version of the method and analysed two types of sufficient conditions for the convergence of this method to a feasible and critical point of a cone constrained DC optimization problem from an infeasible starting point. The first type consists of the so-called a priori conditions, which are based on general assumptions on the problem under consideration, while the second type consists of the a posteriori conditions, which rely on some assumptions on a limit point of a sequence generated by an optimization method. Finally, we presented a simple example demonstrating that even if a feasible starting point is known, it might be reasonable to use the penalized version of the method, since it is sometimes capable of finding a deeper local minimum than the standard method.

The main results of this paper pave the way for applications of DC optimization methods to various nonlinear semidefinite programming problems and other nonlinear cone constrained optimization problems, such as nonlinear second order cone programming problems.

References
[1] P. A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, 2009.
[2] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95:3–51, 2003.
[3] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization. Analysis, Algorithms, and Engineering Applications. SIAM, Philadelphia, 2001.
[4] J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, New York, 2000.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, 2004.
[6] A. Canelas, M. Carrasco, and J. López. A feasible direction algorithm for nonlinear second-order cone programs. Optim. Methods Softw., 34:1322–1341, 2019.
[7] A. R. Conn, N. I. M. Gould, and P. L. Toint. Trust-Region Methods. SIAM, Philadelphia, 2000.
[8] W. de Oliveira. Proximal bundle methods for nonsmooth DC programming. J. Glob. Optim., 75:523–563, 2019.
[9] W. de Oliveira and M. P. Tcheou. An inertial algorithm for DC programming. Set-Valued Var. Anal., 27:895–919, 2019.
[10] M. V. Dolgopolik. A unifying theory of exactness of linear penalty functions. Optim., 65:1167–1202, 2016.
[11] M. V. Dolgopolik. A unifying theory of exactness of linear penalty functions II: parametric penalty functions. Optim., 66:1577–1622, 2017.
[12] M. V. Dolgopolik and A. V. Fominyh. Exact penalty functions for optimal control problems I: Main theorem and free-endpoint problems. Optim. Control Appl. Methods, 40:1018–1044, 2019.
[13] M. Dür, R. Horst, and M. Locatelli. Necessary and sufficient global optimality conditions for convex maximization revisited. J. Math. Anal. Appl., 217:637–649, 1998.
[14] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20:303–353, 1998.
[15] A. Ferrer and J. E. Martínez-Legaz. Improving the efficiency of DC global optimization methods by improving the DC representation of the objective function. J. Glob. Optim., 43:513–531, 2009.
[16] N. A. Gadhi. Necessary optimality conditions for a nonsmooth semi-infinite programming problem. J. Glob. Optim., 74:161–168, 2019.
[17] M. Gaudioso, G. Giallombardo, G. Miglionico, and A. M. Bagirov. Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations. J. Glob. Optim., 71:37–55, 2018.
[18] M. A. Goberna and M. A. López, editors. Semi-Infinite Programming: Recent Advances. Kluwer Academic Publishers, Dordrecht, 2001.
[19] K.-C. Goh, M. G. Safonov, and J. H. Ly. Robust synthesis via bilinear matrix inequalities. Int. J. Robust Nonlinear Control, 6:1079–1095, 1996.
[20] K.-C. Goh, M. G. Safonov, and G. P. Papavassilopoulos. Global optimization for the Biaffine Matrix Inequality problem. J. Glob. Optim., 7:365–380, 1995.
[21] P. Hartman. On functions representable as a difference of convex functions. Pac. J. Math., 9:707–713, 1959.
[22] D. Henrion, S. Tarbouriech, and M. Šebek. Rank-one LMI approach to simultaneous stabilization of linear systems. Syst. Control Lett., 38:79–89, 1999.
[23] J.-B. Hiriart-Urruty. Generalized differentiability/duality and optimization for problems dealing with differences of convex functions. In J. Ponstein, editor, Convexity and Duality in Optimization, pages 37–70. Springer, Berlin, Heidelberg, 1985.
[24] J.-B. Hiriart-Urruty. From convex optimization to nonconvex optimization. Necessary and sufficient conditions for global optimality. In F. H. Clarke, V. F. Dem'yanov, and F. Giannessi, editors, Nonsmooth Optimization and Related Topics, pages 219–239. Springer, Boston, MA, 1989.
[25] J.-B. Hiriart-Urruty. Conditions for global optimality 2. J. Glob. Optim., 13:349–367, 1998.
[26] R. Horst and N. V. Thoai. DC programming: Overview. J. Optim. Theory Appl., 103:1–43, 1999.
[27] A. D. Ioffe and V. M. Tihomirov. Theory of Extremal Problems. North-Holland, Amsterdam, 1979.
[28] K. Joki, A. M. Bagirov, N. Karmitsa, M. Mäkelä, and S. Taheri. Double bundle method for finding Clarke stationary points in nonsmooth DC programming. SIAM J. Optim., 28:1892–1919, 2018.
[29] R. V. Kadison. Order properties of bounded self-adjoint operators. Proc. Amer. Math. Soc., 2:505–510, 1951.
[30] N. Kanzi. Necessary optimality conditions for nonsmooth semi-infinite programming problems. J. Glob. Optim., 49:713–725, 2011.
[31] H. Kato and M. Fukushima. An SQP-type algorithm for nonlinear second-order cone programs. Optim. Lett., 1:129–144, 2007.
[32] M. Kočvara and M. Stingl. PENNON: A code for convex nonlinear and semidefinite programming. Optim. Methods Softw., 18:317–333, 2003.
[33] A. G. Kusraev and S. S. Kutateladze. Subdifferentials: Theory and Applications. Kluwer Academic Publishers, Dordrecht, 1995.
[34] G. R. Lanckriet and B. K. Sriperumbudur. On the convergence of the concave-convex procedure. Adv. Neural Inf. Process. Syst., 22:1759–1767, 2009.
[35] H. A. Le Thi, V. N. Huynh, and T. Pham Dinh. DC programming and DCA for general DC programs. In T. van Do, H. A. L. Thi, and N. T. Nguyen, editors, Advanced Computational Methods for Knowledge Engineering, pages 15–35. Springer, Berlin, Heidelberg, 2014.
[36] H. A. Le Thi, V. N. Huynh, and T. Pham Dinh. Convergence analysis of difference-of-convex algorithm with subanalytic data. J. Optim. Theory Appl., 179:103–126, 2018.
[37] H. A. Le Thi and T. Pham Dinh. DC programming and DCA: thirty years of developments. Math. Program., 169:5–68, 2018.
[38] H. A. Le Thi, T. Pham Dinh, and L. D. Muu. Numerical solution for optimization over the efficient set by D.C. optimization algorithm. Oper. Res. Lett., 19:117–128, 1996.
[39] H. A. Le Thi, T. Pham Dinh, and N. V. Thoai. Combination between global and local methods for solving an optimization problem over the efficient set. Eur. J. Oper. Res., 142:258–270, 2002.
[40] T. Lipp and S. Boyd. Variations and extension of the convex-concave procedure. Optim. Eng., 17:263–287, 2016.
[41] J. H. Manton. Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal Process., 50:635–650, 2002.
[42] B. S. Mordukhovich and T. Nghia. Nonsmooth cone-constrained optimization with applications to semi-infinite programming. Math. Oper. Res., 39:301–324, 2014.
[43] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, 1994.
[44] Y.-S. Niu and T. P. Dinh. DC programming approaches for BMI and QMI feasibility problems. In T. van Do, H. Thi, and N. Nguyen, editors, Advanced Computational Methods for Knowledge Engineering, pages 37–63. Springer, Cham, 2014.
[45] N. S. Papageorgiou. Nonsmooth analysis on partially ordered vector spaces: part 1 — convex case. Pac. J. Math., 107:403–458, 1983.
[46] T. Pham Dinh and H. A. Le Thi. D.C. optimization algorithms for solving the trust region subproblem. SIAM J. Optim., 8:476–505, 1998.
[47] T. Pham Dinh and H. A. Le Thi. Recent advances in DC programming and DCA. In N. T. Nguyen and H. A. L. Thi, editors, Transactions on Computational Intelligence XIII, pages 1–37. Springer, Berlin, Heidelberg, 2014.
[48] E. Polak. Optimization: Algorithms and Consistent Approximations. Springer-Verlag, New York, 1997.
[49] R. Reemtsen and J.-J. Rückmann, editors. Semi-Infinite Programming. Kluwer Academic Publishers, Dordrecht, 1998.
[50] T. Pham Dinh and E. B. Souad. Algorithms for solving a class of nonconvex optimization problems. Methods of subgradients. In J.-B. Hiriart-Urruty, editor, Fermat Days 85: Mathematics for Optimization. North-Holland Mathematics Studies. Vol. 129, pages 249–271. North-Holland, Amsterdam, 1986.
[51] S. M. Robinson. Regularity and stability for convex multivalued functions. Math. Oper. Res., 1:130–143, 1976.
[52] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.
[53] O. Stein. How to solve a semi-infinite optimization problem. Eur. J. Oper. Res., 223:312–320, 2012.
[54] M. Stingl. On the solution of nonlinear semidefinite programs by augmented Lagrangian methods. PhD thesis, Institute of Applied Mathematics II, Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, 2006.
[55] A. S. Strekalovsky. On the problem of the global extremum. Sov. Math. Dokl., 35:194–198, 1987.
[56] A. S. Strekalovsky. Global optimality conditions for nonconvex optimization. J. Glob. Optim., 12:415–434, 1998.
[57] A. S. Strekalovsky. On the minimization of the difference of convex functions on a feasible set. Comput. Math. Math. Phys., 43:380–390, 2003.
[58] A. S. Strekalovsky. On local search in d.c. optimization problems. Appl. Math. Comput., 255:73–83, 2015.
[59] M. Thera. Subdifferential calculus for convex operators. J. Math. Anal. Appl., 80:78–91, 1981.
[60] M. Todd. Semidefinite optimization. Acta Numerica, 10:515–560, 2001.
[61] A. H. Tor, A. Bagirov, and B. Karasözen. Aggregate codifferential method for nonsmooth DC optimization. J. Comput. Appl. Math., 259:851–867, 2014.
[62] L. T. Tung. Karush-Kuhn-Tucker optimality conditions for nonsmooth multiobjective semidefinite and semi-infinite programming. J. Appl. Numer. Optim., 1:63–75, 2019.
[63] H. Tuy. A general deterministic approach to global optimization via D.C. programming. In J.-B. Hiriart-Urruty, editor, Fermat Days 85: Mathematics for Optimization. North-Holland Mathematics Studies. Vol. 129, pages 273–303. North-Holland, Amsterdam, 1986.
[64] H. Tuy. Convex Analysis and Global Optimization. Kluwer Academic Publishers, Dordrecht, 1998.
[65] H. Tuy. On some recent advances and applications of D.C. optimization. In V. H. Nguyen, J. J. Strodiot, and P. Tossings, editors, Optimization. Lecture Notes in Economics and Mathematical Systems, vol. 481, pages 473–497. Springer, Berlin, Heidelberg, 2000.
[66] H. Tuy. On global optimality conditions and cutting plane algorithms. J. Optim. Theory Appl., 118:201–216, 2003.
[67] W. van Ackooij and W. de Oliveira. Non-smooth DC-constrained optimization: constraint qualification and minimizing methodologies. Optim. Methods Softw., 34:890–920, 2019.
[68] W. van Ackooij and W. de Oliveira. Nonsmooth and nonconvex optimization via approximate difference-of-convex decompositions. J. Optim. Theory Appl., 182:49–80, 2019.
[69] H. Yamashita and H. Yabe. A primal-dual interior point method for nonlinear optimization over second-order cones. Optim. Methods Softw., 24:407–426, 2009.
[70] H. Yamashita and H. Yabe. A survey of numerical methods for nonlinear semidefinite programming. J. Oper. Res. Soc. Japan, 58:24–60, 2015.
[71] A. L. Yuille and A. Rangarajan. The concave-convex procedure. Neural Comput., 15:915–936, 2003.
[72] Q. Zhang. A new necessary and sufficient global optimality condition for canonical DC problems. J. Glob. Optim., 55:559–577, 2013.
[73] X. Y. Zheng and X. Yang. Lagrange multipliers in nonsmooth semi-infinite optimization problems.