The Log-Exponential Smoothing Technique and Nesterov's Accelerated Gradient Method for Generalized Sylvester Problems
Nguyen Thai An, Daniel Giles, Nguyen Mau Nam, R. Blake Rector

Thua Thien Hue College of Education, 123 Nguyen Hue, Hue City, Vietnam (email: [email protected]).
Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, PO Box 751, Portland, OR 97207, United States (email: [email protected]). The research of Daniel Giles was partially supported by the USA National Science Foundation under grant DMS-1411817.
Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, PO Box 751, Portland, OR 97207, United States (email: [email protected]). The research of Nguyen Mau Nam was partially supported by the USA National Science Foundation under grant DMS-1411817 and by the Simons Foundation.
Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, PO Box 751, Portland, OR 97207, United States (email: [email protected]).
Abstract:
The Sylvester smallest enclosing circle problem involves finding the smallest circle that encloses a finite number of points in the plane. We consider generalized versions of the Sylvester problem in which the points are replaced by sets. Based on the log-exponential smoothing technique and Nesterov's accelerated gradient method, we present an effective numerical algorithm for solving these problems.
Key words. log-exponential smoothing; minimization majorization algorithm; Nesterov's accelerated gradient method; generalized Sylvester problem.
AMS subject classifications.
1 Introduction

The smallest enclosing circle problem can be stated as follows: given a finite set of points in the plane, find the circle of smallest radius that encloses all of the points. This problem was introduced in the 19th century by the English mathematician James Joseph Sylvester (1814-1897) [24]. It is both a facility location problem and a major problem in computational geometry. Over a century later, the smallest enclosing circle problem remains very active due to its important applications to clustering, nearest neighbor search, data classification, facility location, collision detection, computer graphics, and military operations. The problem has been widely treated in the literature from both theoretical and numerical standpoints; see [1, 4, 6, 7, 9, 21, 23, 25, 27, 28, 31] and the references therein.

The authors' recent research focuses on generalized Sylvester problems in which the given points are replaced by sets. Besides the intrinsic mathematical motivation, this question appears in more complicated models of facility location in which the sizes of the locations are not negligible, as in bilevel transportation problems. The main goal of this paper is to study and solve numerically a generalized version of the Sylvester problem called the smallest intersecting ball problem. This problem asks for the smallest ball that intersects a finite number of convex target sets in $\mathbb{R}^n$. Note that when the target sets given in the problem are singletons, the smallest intersecting ball problem reduces to the classical Sylvester problem.

The smallest intersecting ball problem can be solved by minimizing a nonsmooth objective function given by the maximum of the distances to the target sets. The nondifferentiability of this objective function makes it difficult to develop effective numerical algorithms for solving the problem. A natural approach is to approximate the nonsmooth objective function by a smooth function that is favorable for applying available smooth optimization schemes. Based on the log-exponential smoothing technique and Nesterov's accelerated gradient method, we present an effective numerical algorithm for solving this problem.

Our paper is organized as follows. Section 2 contains tools of convex optimization used throughout the paper. In Section 3, we focus on the analysis of the log-exponential smoothing technique applied to the smallest intersecting ball problem. Section 4 is devoted to developing an effective algorithm based on the minimization majorization algorithm and Nesterov's accelerated gradient method to solve the problem. We also analyze the convergence of the algorithm. Finally, we present some numerical examples in Section 5.
2 Problem Formulation and Tools of Convex Optimization

In this section, we introduce the mathematical models of the generalized Sylvester problems under consideration. We also present some important tools of convex optimization used throughout the paper.

Consider the linear space $\mathbb{R}^n$ equipped with the Euclidean norm $\|\cdot\|$. The distance function to a nonempty subset $Q$ of $\mathbb{R}^n$ is defined by
\[
d(x; Q) := \inf\{\|x - q\| \mid q \in Q\}, \quad x \in \mathbb{R}^n. \tag{2.1}
\]
Given $x \in \mathbb{R}^n$, the Euclidean projection from $x$ to $Q$ is the set
\[
\Pi(x; Q) := \{q \in Q \mid d(x; Q) = \|x - q\|\}.
\]
If $Q$ is a nonempty closed convex set in $\mathbb{R}^n$, then $\Pi(x; Q)$ is a singleton for every $x \in \mathbb{R}^n$. Furthermore, the projection operator is nonexpansive in the sense that
\[
\|\Pi(x; Q) - \Pi(y; Q)\| \le \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n.
\]
Let $\Omega$ and $\Omega_i$ for $i = 1, \ldots, m$ be nonempty closed convex subsets of $\mathbb{R}^n$. The mathematical model of the smallest intersecting ball problem with target sets $\Omega_i$ for $i = 1, \ldots, m$ and constraint set $\Omega$ is
\[
\text{minimize } D(x) := \max\big\{d(x; \Omega_i) \mid i = 1, \ldots, m\big\} \quad \text{subject to } x \in \Omega. \tag{2.2}
\]
The solution to this problem gives the center of the smallest Euclidean ball (with center in $\Omega$) that intersects all target sets $\Omega_i$ for $i = 1, \ldots, m$.

In order to study new problems in which the intersecting Euclidean ball is replaced by balls generated by different norms, we consider a more general setting. Let $F$ be a closed bounded convex set that contains the origin as an interior point. We hold this as our standing assumption on the set $F$ for the remainder of the paper. The support function associated with $F$ is defined by
\[
\sigma_F(x) := \sup\{\langle x, f\rangle \mid f \in F\}.
\]
Note that if $F = \{x \in \mathbb{R}^n \mid \|x\|_X \le 1\}$, where $\|\cdot\|_X$ is a norm on $\mathbb{R}^n$, then $\sigma_F$ is the dual norm of $\|\cdot\|_X$.

Let $Q$ be a nonempty subset of $\mathbb{R}^n$. The generalized distance from a point $x \in \mathbb{R}^n$ to $Q$ generated by $F$ is given by
\[
d_F(x; Q) := \inf\{\sigma_F(x - q) \mid q \in Q\}. \tag{2.3}
\]
The generalized distance function (2.3) reduces to the distance function (2.1) when $F$ is the closed unit ball of $\mathbb{R}^n$ with respect to the Euclidean norm. The reader is referred to [14] for important properties of the generalized distance function (2.3).

Using (2.3), a more general model of problem (2.2) is given by
\[
\text{minimize } D_F(x) := \max\big\{d_F(x; \Omega_i) \mid i = 1, \ldots, m\big\} \quad \text{subject to } x \in \Omega. \tag{2.4}
\]
The function $D_F$, as well as its specification $D$, is nonsmooth in general. Thus, problem (2.4) and, in particular, problem (2.2) must be studied from both theoretical and numerical viewpoints using tools of generalized differentiation from convex analysis.

Given a function $\varphi\colon \mathbb{R}^n \to \mathbb{R}$, we say that $\varphi$ is convex if it satisfies
\[
\varphi(\lambda x + (1-\lambda)y) \le \lambda \varphi(x) + (1-\lambda)\varphi(y)
\]
for all $x, y \in \mathbb{R}^n$ and $\lambda \in (0, 1)$. The function $\varphi$ is said to be strictly convex if the above inequality becomes strict whenever $x \ne y$.

The class of convex functions plays an important role in many applications of mathematics, especially applications to optimization. It is well known that a convex function $f\colon \mathbb{R}^n \to \mathbb{R}$ has an absolute minimum on a convex set $\Omega$ at $\bar{x}$ if and only if it has a local minimum on $\Omega$ at $\bar{x}$. Moreover, if $f\colon \mathbb{R}^n \to \mathbb{R}$ is a differentiable convex function, then $\bar{x} \in \Omega$ is a minimizer of $f$ on $\Omega$ if and only if
\[
\langle \nabla f(\bar{x}), x - \bar{x}\rangle \ge 0 \quad \text{for all } x \in \Omega. \tag{2.5}
\]
The reader is referred to [2, 3, 10, 15] for a more complete theory of convex analysis and its applications to optimization from both theoretical and numerical aspects.
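To make the model (2.2) concrete, here is a minimal Python sketch that evaluates the objective $D(x) = \max_i d(x;\Omega_i)$ when the target sets are Euclidean balls, using the projection onto a ball; the sets and the query point are illustrative assumptions, not data from the paper.

```python
import numpy as np

def project_ball(x, center, radius):
    """Euclidean projection of x onto the ball B(center, radius)."""
    d = x - center
    norm = np.linalg.norm(d)
    return x if norm <= radius else center + radius * d / norm

def sib_objective(x, balls):
    """D(x) = max_i d(x; Omega_i) for ball-shaped target sets."""
    return max(np.linalg.norm(x - project_ball(x, c, r)) for c, r in balls)

# Illustrative data (not from the paper): three disks in the plane.
balls = [(np.array([0.0, 0.0]), 1.0),
         (np.array([4.0, 0.0]), 1.0),
         (np.array([2.0, 3.0]), 0.5)]
print(sib_objective(np.array([2.0, 1.0]), balls))
```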
3 Smoothing Techniques for Generalized Sylvester Problems

In this section, we employ the approach developed in [31] to approximate the nonsmooth optimization problem (2.4) by a smooth optimization problem that is favorable for applying available smooth numerical algorithms. The difference here is that we use generalized distances to sets instead of distances to points.

Given an element $v \in \mathbb{R}^n$, the cone generated by $v$ is $\operatorname{cone}\{v\} := \{\lambda v \mid \lambda \ge 0\}$. Let us review the following definition from [14]. We recall that $F$ is a closed bounded convex set that contains zero in its interior, as per the standing assumptions of this paper.

Definition 3.1
The set $F$ is said to be normally smooth if for every $x \in \operatorname{bd} F$ there exists $a_x \in \mathbb{R}^n$ such that $N(x; F) = \operatorname{cone}\{a_x\}$.

In the theorem below, we establish a necessary and sufficient condition for the smallest intersecting ball problem (2.4) to have a unique optimal solution.
Theorem 3.2
Suppose that $F$ is normally smooth, all of the target sets are strictly convex, and at least one of the sets $\Omega, \Omega_1, \ldots, \Omega_m$ is bounded. Then the smallest intersecting ball problem (2.4) has a unique optimal solution if and only if $\bigcap_{i=1}^m (\Omega \cap \Omega_i)$ contains at most one point.

Proof It is clear that every point in the set $\bigcap_{i=1}^m (\Omega \cap \Omega_i)$ is a solution of (2.4). Thus, if (2.4) has a unique optimal solution, we must have that $\bigcap_{i=1}^m (\Omega \cap \Omega_i)$ contains at most one point, so the necessary condition has been proven.

For the sufficient condition, assume that $\bigcap_{i=1}^m (\Omega \cap \Omega_i)$ contains at most one point. The existence of an optimal solution is guaranteed by the assumption that at least one of the sets $\Omega, \Omega_1, \ldots, \Omega_m$ is bounded. What remains to be shown is the uniqueness of this solution. We consider two cases.

In the first case, we assume that $\bigcap_{i=1}^m (\Omega \cap \Omega_i)$ contains exactly one point $\bar{x}$. Observe that $D_F(\bar{x}) = 0$ and $D_F(x) \ge 0$ for all $x \in \mathbb{R}^n$, so $\bar{x}$ is a solution of (2.4). If $\hat{x} \in \Omega$ is another solution, then we must have $D_F(\hat{x}) = D_F(\bar{x}) = 0$. Therefore $d_F(\hat{x}; \Omega_i) = 0$ for all $i \in \{1, \ldots, m\}$, and hence $\hat{x} \in \bigcap_{i=1}^m (\Omega \cap \Omega_i) = \{\bar{x}\}$. We conclude that $\hat{x} = \bar{x}$, and the problem has a unique solution in this case.

For the second case, we assume that $\bigcap_{i=1}^m (\Omega \cap \Omega_i) = \emptyset$. We will show that the function
\[
S(x) := \max\big\{\big(d_F(x; \Omega_1)\big)^2, \ldots, \big(d_F(x; \Omega_m)\big)^2\big\}
\]
is strictly convex on $\Omega$. This will prove the uniqueness of the solution. Take any $x, y \in \Omega$ with $x \ne y$ and $t \in (0,1)$, and set $x_t := tx + (1-t)y$. Let $i \in \{1, \ldots, m\}$ be such that $\big(d_F(x_t; \Omega_i)\big)^2 = S(x_t)$, and let $u, v \in \Omega_i$ be such that $\sigma_F(x - u) = d_F(x; \Omega_i)$ and $\sigma_F(y - v) = d_F(y; \Omega_i)$. Then we have
\[
\begin{aligned}
S(x_t) = \big(d_F(x_t; \Omega_i)\big)^2 &= \big[d_F(tx + (1-t)y; \Omega_i)\big]^2\\
&\le \big[t\, d_F(x; \Omega_i) + (1-t)\, d_F(y; \Omega_i)\big]^2\\
&= \big[t\, \sigma_F(x-u) + (1-t)\, \sigma_F(y-v)\big]^2\\
&= t^2\big(\sigma_F(x-u)\big)^2 + 2t(1-t)\,\sigma_F(x-u)\,\sigma_F(y-v) + (1-t)^2\big(\sigma_F(y-v)\big)^2\\
&\le t^2\big(\sigma_F(x-u)\big)^2 + t(1-t)\big[\big(\sigma_F(x-u)\big)^2 + \big(\sigma_F(y-v)\big)^2\big] + (1-t)^2\big(\sigma_F(y-v)\big)^2\\
&= t\big(\sigma_F(x-u)\big)^2 + (1-t)\big(\sigma_F(y-v)\big)^2\\
&= t\big(d_F(x; \Omega_i)\big)^2 + (1-t)\big(d_F(y; \Omega_i)\big)^2\\
&\le t\,S(x) + (1-t)\,S(y).
\end{aligned}
\]
Recall that we need to prove the strict inequality $S(x_t) < t\,S(x) + (1-t)\,S(y)$. Suppose by contradiction that $S(x_t) = t\,S(x) + (1-t)\,S(y)$. Then all of the inequalities in the above estimate turn into equalities, and thus we have
\[
d_F(x_t; \Omega_i) = t\, d_F(x; \Omega_i) + (1-t)\, d_F(y; \Omega_i) \quad\text{and}\quad \sigma_F(x-u) = \sigma_F(y-v). \tag{3.6}
\]
Hence
\[
d_F(x_t; \Omega_i) = \sigma_F(x-u) = \sigma_F(y-v). \tag{3.7}
\]
Observe that $\sigma_F(w) = 0$ if and only if $w = 0$; thus (3.7) implies that $x = u$ if and only if $y = v$. We claim that $x \ne u$ and $y \ne v$. Indeed, if $x = u$ and $y = v$, then $x, y \in \Omega_i$, and hence $x_t \in \Omega_i$ by the convexity of $\Omega_i$. Thus $d_F(x_t; \Omega_i) = 0$, which contradicts the fact that $d_F(x_t; \Omega_i) = D_F(x_t) > 0$, the latter holding because $x_t \in \Omega$ and $\bigcap_{i=1}^m (\Omega \cap \Omega_i) = \emptyset$.

Now we will show that $u \ne v$. Denote $c := tu + (1-t)v \in \Omega_i$. The properties of the support function and (3.6) give
\[
\begin{aligned}
d_F(x_t; \Omega_i) \le \sigma_F(x_t - c) &= \sigma_F\big(t(x-u) + (1-t)(y-v)\big)\\
&\le \sigma_F\big(t(x-u)\big) + \sigma_F\big((1-t)(y-v)\big)\\
&\le t\,\sigma_F(x-u) + (1-t)\,\sigma_F(y-v) = \sigma_F(x-u).
\end{aligned}
\]
By (3.7) we have
\[
\sigma_F\big(t(x-u) + (1-t)(y-v)\big) = \sigma_F\big(t(x-u)\big) + \sigma_F\big((1-t)(y-v)\big).
\]
Since $F$ is normally smooth, it follows from [14, Remark 3.4] that there exists $\lambda > 0$ such that $t(x-u) = \lambda(1-t)(y-v)$. Now, by contradiction, suppose $u = v$. Then $x - u = \beta(y - u)$, where $\beta := \lambda(1-t)/t$. Note that $\beta \ne 1$, since $x \ne y$, and that $\sigma_F(y-u) > 0$, since $y - u \ne 0$. Now we have
\[
\sigma_F(x-u) = \sup\{\langle x-u, f\rangle \mid f \in F\} = \sup\{\langle \beta(y-u), f\rangle \mid f \in F\} = \beta\,\sup\{\langle y-u, f\rangle \mid f \in F\} = \beta\,\sigma_F(y-u),
\]
while (3.7) gives $\sigma_F(x-u) = \sigma_F(y-v) = \sigma_F(y-u)$. Hence $\beta = 1$, a contradiction. Therefore $u \ne v$.

Since $u, v \in \Omega_i$, $u \ne v$, and $\Omega_i$ is strictly convex, we have $c \in \operatorname{int}\Omega_i$. The assumption $\bigcap_{i=1}^m (\Omega \cap \Omega_i) = \emptyset$ gives $d_F(x_t; \Omega_i) = D_F(x_t) > 0$. Therefore $x_t \notin \Omega_i$, and thus $x_t \ne c$. Let $\delta > 0$ be such that $\mathbb{B}(c; \delta) \subset \Omega_i$. Then $c + \gamma(x_t - c) \in \Omega_i$ with $\gamma := \delta/\|x_t - c\| > 0$. We have
\[
\begin{aligned}
d_F(x_t; \Omega_i) \le \sigma_F\big(x_t - c - \gamma(x_t - c)\big) &= (1-\gamma)\,\sigma_F\big(t(x-u) + (1-t)(y-v)\big)\\
&< \sigma_F\big(t(x-u) + (1-t)(y-v)\big) = \sigma_F(x-u).
\end{aligned}
\]
This contradicts (3.7) and completes the proof. $\square$
Recall the following definition.
Definition 3.3
A convex set $F$ is said to be normally round if $N(x; F) \ne N(y; F)$ whenever $x, y \in \operatorname{bd} F$ with $x \ne y$.

Proposition 3.4
Let $\Theta$ be a nonempty closed convex subset of $\mathbb{R}^n$. Suppose that $F$ is normally smooth and normally round. Then the function $g(x) := [d_F(x; \Theta)]^2$, $x \in \mathbb{R}^n$, is continuously differentiable.

Proof It suffices to show that $\partial g(\bar{x})$ is a singleton for every $\bar{x} \in \mathbb{R}^n$. By [15], we have
\[
\partial g(\bar{x}) = 2\, d_F(\bar{x}; \Theta)\, \partial d_F(\bar{x}; \Theta).
\]
It follows from [14, Proposition 4.3(iii)] that $d_F(\cdot\,; \Theta)$ is continuously differentiable on the complement of $\Theta$, and so, for $\bar{x} \notin \Theta$,
\[
\partial g(\bar{x}) = 2\, d_F(\bar{x}; \Theta)\, \nabla d_F(\bar{x}; \Theta) = 2\, d_F(\bar{x}; \Theta)\, \nabla \sigma_F(\bar{x} - w), \quad\text{where } w := \Pi_F(\bar{x}; \Theta).
\]
In the case where $\bar{x} \in \Theta$, one has $d_F(\bar{x}; \Theta) = 0$, and hence
\[
\partial g(\bar{x}) = 2\, d_F(\bar{x}; \Theta)\, \partial d_F(\bar{x}; \Theta) = \{0\}.
\]
The proof is now complete. $\square$
If all of the target sets have a common point that belongs to the constraint set, then such a point is a solution of problem (2.4), so we always assume that $\bigcap_{i=1}^m (\Omega_i \cap \Omega) = \emptyset$. We also assume that at least one of the sets $\Omega, \Omega_1, \ldots, \Omega_m$ is bounded, which guarantees the existence of an optimal solution; see [16]. These are our standing assumptions for the remainder of this section.

Let us start with some useful well-known results. We include the proofs for the convenience of the reader.

Lemma 3.5
Given positive numbers $a_i$ for $i = 1, \ldots, m$ with $m > 1$, and $0 < s < t$, one has
(i) $\big(a_1^s + a_2^s + \cdots + a_m^s\big)^{1/s} > \big(a_1^t + a_2^t + \cdots + a_m^t\big)^{1/t}$;
(ii) $\big(a_1^{1/s} + a_2^{1/s} + \cdots + a_m^{1/s}\big)^{s} < \big(a_1^{1/t} + a_2^{1/t} + \cdots + a_m^{1/t}\big)^{t}$;
(iii) $\lim_{r \to 0^+} \big(a_1^{1/r} + a_2^{1/r} + \cdots + a_m^{1/r}\big)^{r} = \max\{a_1, \ldots, a_m\}$.

Proof (i) Since $t/s > 1$, it is obvious that
\[
\Big(\frac{a_1^s}{\sum_{i=1}^m a_i^s}\Big)^{t/s} + \cdots + \Big(\frac{a_m^s}{\sum_{i=1}^m a_i^s}\Big)^{t/s} < \frac{a_1^s}{\sum_{i=1}^m a_i^s} + \cdots + \frac{a_m^s}{\sum_{i=1}^m a_i^s} = 1,
\]
since $a_i^s/\sum_{j=1}^m a_j^s \in (0, 1)$ for each $i$. It follows that
\[
\frac{a_1^t}{\big(\sum_{i=1}^m a_i^s\big)^{t/s}} + \cdots + \frac{a_m^t}{\big(\sum_{i=1}^m a_i^s\big)^{t/s}} < 1,
\]
and hence $a_1^t + \cdots + a_m^t < \big(\sum_{i=1}^m a_i^s\big)^{t/s}$. This implies (i) by raising both sides to the power $1/t$.
(ii) Inequality (ii) follows directly from (i).
(iii) Defining $a := \max\{a_1, \ldots, a_m\}$ yields
\[
a \le \big(a_1^{1/r} + a_2^{1/r} + \cdots + a_m^{1/r}\big)^{r} \le m^r a \to a \ \text{ as } r \to 0^+,
\]
which implies (iii) and completes the proof. $\square$

For $p > 0$ and $x \in \mathbb{R}^n$, the log-exponential smoothing function of $D_F(x)$ is defined as
\[
D_F(x, p) := p \ln \sum_{i=1}^m \exp\Big(\frac{G_{F,i}(x,p)}{p}\Big), \tag{3.8}
\]
where $G_{F,i}(x, p) := \sqrt{d_F(x; \Omega_i)^2 + p^2}$.
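As a quick numerical illustration of (3.8), the following Python sketch evaluates the smoothed objective for Euclidean ball targets and compares it with the original max-of-distances objective; the exponentials are shifted by the largest $G_i$ (anticipating Remark 3.8) so that small values of $p$ do not overflow. The reported gap should lie between $0$ and $p(1+\ln m)$, which is the content of Theorem 3.6(ii) below. The sets and query point are illustrative assumptions.

```python
import numpy as np

def ball_distance(x, center, radius):
    """Euclidean distance from x to the ball B(center, radius)."""
    return max(np.linalg.norm(x - center) - radius, 0.0)

def smoothed_objective(x, balls, p):
    """Log-exponential smoothing (3.8) of D(x) = max_i d(x; Omega_i),
    evaluated in a shift-stabilized form (see Remark 3.8)."""
    d = np.array([ball_distance(x, c, r) for c, r in balls])
    G = np.sqrt(d**2 + p**2)
    Gmax = G.max()
    return Gmax + p * np.log(np.sum(np.exp((G - Gmax) / p)))

def true_objective(x, balls):
    return max(ball_distance(x, c, r) for c, r in balls)

# Illustrative data (not from the paper): two disks in the plane.
balls = [(np.array([0.0, 0.0]), 1.0), (np.array([4.0, 0.0]), 1.0)]
x = np.array([1.0, 2.0])
for p in (1.0, 0.1, 0.01):
    gap = smoothed_objective(x, balls, p) - true_objective(x, balls)
    print(p, gap, p * (1 + np.log(len(balls))))  # 0 <= gap <= p(1 + ln m)
```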
Theorem 3.6
The function $D_F(x, p)$ defined in (3.8) has the following properties:
(i) If $x \in \mathbb{R}^n$ and $0 < p_1 < p_2$, then $D_F(x, p_1) < D_F(x, p_2)$.
(ii) For any $x \in \mathbb{R}^n$ and $p > 0$,
\[
0 \le D_F(x, p) - D_F(x) \le p\,(1 + \ln m).
\]
(iii) For any $p > 0$, the function $D_F(\cdot, p)$ is convex. If we suppose further that $F$ is normally smooth and the sets $\Omega_i$ for $i = 1, \ldots, m$ are strictly convex and not collinear (i.e., it is impossible to draw a straight line that intersects all of the sets $\Omega_i$), then $D_F(\cdot, p)$ is strictly convex.
(iv) For any $p > 0$, if $F$ is normally smooth and normally round, then $D_F(\cdot, p)$ is continuously differentiable.
(v) If at least one of the target sets $\Omega_i$ for $i = 1, \ldots, m$ is bounded, then $D_F(\cdot, p)$ is coercive in the sense that $\lim_{\|x\|\to\infty} D_F(x, p) = \infty$.

Proof (i) Define
\[
a_i(x,p) := \exp\big(G_{F,i}(x,p)\big), \quad a_\infty(x,p) := \max_{i=1,\ldots,m} a_i(x,p), \quad G_{F,\infty}(x,p) := \max_{i=1,\ldots,m} G_{F,i}(x,p).
\]
Then $a_i(x,p)$ is strictly increasing on $(0,\infty)$ as a function of $p$, and
\[
\frac{a_i(x,p)}{a_\infty(x,p)} = \exp\big(G_{F,i}(x,p) - G_{F,\infty}(x,p)\big) \le 1, \qquad G_{F,\infty}(x,p) \le D_F(x) + p.
\]
For $0 < p_1 < p_2$, it follows from Lemma 3.5(ii) that
\[
D_F(x, p_1) = \ln\Big[\sum_{i=1}^m \big(a_i(x,p_1)\big)^{1/p_1}\Big]^{p_1} < \ln\Big[\sum_{i=1}^m \big(a_i(x,p_1)\big)^{1/p_2}\Big]^{p_2} < \ln\Big[\sum_{i=1}^m \big(a_i(x,p_2)\big)^{1/p_2}\Big]^{p_2} = D_F(x, p_2),
\]
which justifies (i).

(ii) It follows from (3.8) that for any $i \in \{1, \ldots, m\}$ we have
\[
D_F(x,p) \ge p \ln \exp\Big(\frac{G_{F,i}(x,p)}{p}\Big) = G_{F,i}(x,p) \ge d_F(x; \Omega_i).
\]
This implies $D_F(x,p) \ge D_F(x)$ for all $x \in \mathbb{R}^n$ and $p > 0$. Moreover,
\[
\begin{aligned}
D_F(x,p) &= \ln\Big\{a_\infty(x,p)\Big[\sum_{i=1}^m \Big(\frac{a_i(x,p)}{a_\infty(x,p)}\Big)^{1/p}\Big]^{p}\Big\} = \ln a_\infty(x,p) + p \ln \sum_{i=1}^m \Big(\frac{a_i(x,p)}{a_\infty(x,p)}\Big)^{1/p}\\
&\le G_{F,\infty}(x,p) + p\ln m \le D_F(x) + p + p\ln m.
\end{aligned}
\]
Thus (ii) has been proved.

(iii) Given $p > 0$, the function $f_p(t) := \sqrt{t^2 + p^2}$ is increasing and convex on the interval $[0,\infty)$, and $d_F(\cdot\,; \Omega_i)$ is convex, so the function $k_i(x,p) := G_{F,i}(x,p)/p$ is also convex with respect to $x$. For any $x, y \in \mathbb{R}^n$ and $\lambda \in (0,1)$, using the convexity of the function $(u_1, \ldots, u_m) \mapsto \ln\sum_{i=1}^m \exp(u_i)$ and the fact that it is nondecreasing in each variable, we have
\[
\begin{aligned}
D_F(\lambda x + (1-\lambda)y, p) &= p\ln\sum_{i=1}^m \exp\big[k_i(\lambda x + (1-\lambda)y, p)\big]\\
&\le p\ln\sum_{i=1}^m \exp\big[\lambda k_i(x,p) + (1-\lambda)k_i(y,p)\big]\\
&\le \lambda\, p\ln\sum_{i=1}^m \exp\big(k_i(x,p)\big) + (1-\lambda)\, p\ln\sum_{i=1}^m \exp\big(k_i(y,p)\big)\\
&= \lambda\, D_F(x,p) + (1-\lambda)\, D_F(y,p).
\end{aligned} \tag{3.9}
\]
Thus $D_F(\cdot, p)$ is convex. Suppose now that $F$ is normally smooth and the sets $\Omega_i$ for $i = 1, \ldots, m$ are strictly convex and not collinear, but $D_F(\cdot, p)$ is not strictly convex. Then there exist $x, y \in \mathbb{R}^n$ with $x \ne y$ and $0 < \lambda < 1$ such that
\[
D_F(\lambda x + (1-\lambda)y, p) = \lambda\, D_F(x,p) + (1-\lambda)\, D_F(y,p).
\]
Thus all of the inequalities in (3.9) become equalities. Since the functions $\ln$ and $\exp$ are strictly increasing on $(0,\infty)$, this implies
\[
k_i(\lambda x + (1-\lambda)y, p) = \lambda\, k_i(x,p) + (1-\lambda)\, k_i(y,p) \ \text{ for all } i = 1, \ldots, m. \tag{3.10}
\]
Observe that $k_i(\cdot, p) = f_p\big(d_F(\cdot\,; \Omega_i)\big)/p$ and that the function $f_p$ is strictly increasing and convex on $[0,\infty)$; it follows from (3.10) that
\[
d_F\big(\lambda x + (1-\lambda)y; \Omega_i\big) = \lambda\, d_F(x; \Omega_i) + (1-\lambda)\, d_F(y; \Omega_i) \ \text{ for all } i = 1, \ldots, m.
\]
The result now follows directly from the proof of [14, Proposition 4.5].

(iv) Let $\varphi_i(x) := [d_F(x; \Omega_i)]^2$. Then $\varphi_i$ is continuously differentiable by Proposition 3.4. By the chain rule, for any $p > 0$, the function $D_F(x,p)$ is continuously differentiable as a function of $x$.

(v) Without loss of generality, we assume that $\Omega_1$ is bounded. It then follows from (ii) that
\[
\lim_{\|x\|\to\infty} D_F(x,p) \ge \lim_{\|x\|\to\infty} D_F(x) \ge \lim_{\|x\|\to\infty} d_F(x; \Omega_1) = \infty.
\]
Therefore $D_F(\cdot, p)$ is coercive, which justifies (v). The proof is now complete. $\square$

In the next corollary, we obtain an explicit formula for the gradient of the log-exponential approximation of $D$ in the case where $F$ is the closed unit ball of $\mathbb{R}^n$. For $p > 0$ and $x \in \mathbb{R}^n$, define
\[
D(x, p) := p \ln \sum_{i=1}^m \exp\Big(\frac{G_i(x,p)}{p}\Big), \tag{3.11}
\]
where $G_i(x, p) := \sqrt{d(x; \Omega_i)^2 + p^2}$.

Corollary 3.7
For any $p > 0$, $D(\cdot, p)$ is continuously differentiable, with gradient in $x$ given by
\[
\nabla_x D(x,p) = \sum_{i=1}^m \frac{\Lambda_i(x,p)}{G_i(x,p)}\,\big(x - \widetilde{x}_i\big),
\]
where $\widetilde{x}_i := \Pi(x; \Omega_i)$ and $\Lambda_i(x,p) := \dfrac{\exp\big(G_i(x,p)/p\big)}{\sum_{j=1}^m \exp\big(G_j(x,p)/p\big)}$.

Proof
It follows from Theorem 3.6 that $D(\cdot, p)$ is continuously differentiable. Let $\varphi_i(x) := [d(x; \Omega_i)]^2$. Then $\nabla\varphi_i(x) = 2(x - \widetilde{x}_i)$, where $\widetilde{x}_i := \Pi(x; \Omega_i)$, and hence the gradient formula for $D(x,p)$ follows from the chain rule. $\square$

Remark 3.8
(i) To avoid working with large numbers when implementing algorithms for (2.2), we often use the identity
\[
\Lambda_i(x,p) = \frac{\exp\big(G_i(x,p)/p\big)}{\sum_{j=1}^m \exp\big(G_j(x,p)/p\big)} = \frac{\exp\big([G_i(x,p) - G_\infty(x,p)]/p\big)}{\sum_{j=1}^m \exp\big([G_j(x,p) - G_\infty(x,p)]/p\big)},
\]
where $G_\infty(x,p) := \max_{i=1,\ldots,m} G_i(x,p)$.
(ii) In general, $D(\cdot, p)$ is not strictly convex. For example, in $\mathbb{R}^2$, consider the sets $\Omega_1 = \{-1\} \times [-1, 1]$ and $\Omega_2 = \{1\} \times [-1, 1]$. Then $D(\cdot, p)$ takes a constant value on $\{0\} \times [-1, 1]$.
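A minimal Python sketch of the gradient formula of Corollary 3.7 combined with the shifted exponentials of Remark 3.8(i); the ball-shaped target sets and the helper names are illustrative assumptions, not part of the paper.

```python
import numpy as np

def project_ball(x, center, radius):
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

def smoothed_grad(x, balls, p):
    """Gradient of D(x, p) as in Corollary 3.7, with the shifted
    exponentials of Remark 3.8(i) to avoid overflow for small p."""
    proj = [project_ball(x, c, r) for c, r in balls]
    dists = np.array([np.linalg.norm(x - q) for q in proj])
    G = np.sqrt(dists**2 + p**2)
    w = np.exp((G - G.max()) / p)
    lam = w / w.sum()                      # the weights Lambda_i(x, p)
    return sum(l / g * (x - q) for l, g, q in zip(lam, G, proj))

# Illustrative call on two disks in the plane.
balls = [(np.array([0.0, 0.0]), 1.0), (np.array([4.0, 0.0]), 1.0)]
print(smoothed_grad(np.array([1.0, 2.0]), balls, 0.01))
```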
Proposition 3.9
Let $\{p_k\}$ be a sequence of positive real numbers converging to $0$. For each $k$, let $y_k \in \operatorname{argmin}_{x\in\Omega} D_F(x, p_k)$. Then $\{y_k\}$ is a bounded sequence and every subsequential limit of $\{y_k\}$ is an optimal solution of problem (2.4). Suppose further that problem (2.4) has a unique optimal solution. Then $\{y_k\}$ converges to that optimal solution.

Proof First, observe that $y_k$ is well defined because of the assumption that at least one of the sets $\Omega, \Omega_1, \ldots, \Omega_m$ is bounded and the coercivity of $D_F(\cdot, p_k)$. By Theorem 3.6(ii), for all $x \in \Omega$ we have
\[
D_F(x, p_k) \le D_F(x) + p_k(1 + \ln m) \quad\text{and}\quad D_F(y_k) \le D_F(y_k, p_k) \le D_F(x, p_k).
\]
Thus $D_F(y_k) \le D_F(x) + p_k(1 + \ln m)$, which implies the boundedness of $\{y_k\}$, using either the boundedness of $\Omega$ or the coercivity of $D_F(\cdot)$ obtained as in Theorem 3.6(v). Suppose that a subsequence $\{y_{k_l}\}$ converges to $y$. Then $D_F(y) \le D_F(x)$ for all $x \in \Omega$, and hence $y$ is an optimal solution of problem (2.4). If (2.4) has a unique optimal solution $\bar{y}$, then $y = \bar{y}$, and hence $y_k \to \bar{y}$. $\square$

A function $\varphi\colon Q \to \mathbb{R}$ is called strongly convex with modulus $m > 0$ on a convex set $Q$ if $\varphi(x) - \frac{m}{2}\|x\|^2$ is a convex function on $Q$. From the definition, it is obvious that any strongly convex function is also strictly convex. Moreover, when $\varphi$ is twice differentiable, $\varphi$ is strongly convex with modulus $m$ on an open convex set $Q$ if $\nabla^2\varphi(x) - mI$ is positive semidefinite for all $x \in Q$; see [10, Theorem 4.3.1(iii)].

Proposition 3.10
Suppose that all the sets $\Omega_i$ for $i = 1, \ldots, m$ reduce to singletons. Then for any $p > 0$, the function $D(\cdot, p)$ is strongly convex on any bounded convex set, and $\nabla_x D(\cdot, p)$ is globally Lipschitz continuous on $\mathbb{R}^n$ with Lipschitz constant $\frac{1}{p}$.

Proof Suppose that $\Omega_i = \{c_i\}$ for $i = 1, \ldots, m$. Then
\[
D(x, p) = p \ln \sum_{i=1}^m \exp\Big(\frac{g_i(x,p)}{p}\Big),
\]
and the gradient of $D(\cdot, p)$ at $x$ becomes
\[
\nabla_x D(x, p) = \sum_{i=1}^m \frac{\lambda_i(x,p)}{g_i(x,p)}\,(x - c_i),
\]
where $g_i(x,p) := \sqrt{\|x - c_i\|^2 + p^2}$ and $\lambda_i(x,p) := \dfrac{\exp\big(g_i(x,p)/p\big)}{\sum_{j=1}^m \exp\big(g_j(x,p)/p\big)}$. Let us denote $Q_{ij} := \dfrac{(x - c_i)(x - c_j)^T}{g_i(x,p)\, g_j(x,p)}$. Then
\[
\nabla_x^2 D(x, p) = \sum_{i=1}^m \Big[\frac{\lambda_i(x,p)}{g_i(x,p)}\,(I_n - Q_{ii}) + \frac{\lambda_i(x,p)}{p}\, Q_{ii}\Big] - \sum_{i=1}^m\sum_{j=1}^m \frac{\lambda_i(x,p)\,\lambda_j(x,p)}{p}\, Q_{ij}.
\]
Given a positive constant $K$, for any $x \in \mathbb{R}^n$ with $\|x\| \le K$ and any $z \in \mathbb{R}^n$, $z \ne 0$, one has
\[
\begin{aligned}
\frac{1}{g_i(x,p)}\big(\|z\|^2 - z^T Q_{ii} z\big) &\ge \frac{1}{g_i(x,p)}\Big(\|z\|^2 - \|z\|^2\,\big\|(x - c_i)/g_i(x,p)\big\|^2\Big)\\
&= \frac{1}{\sqrt{\|x - c_i\|^2 + p^2}}\,\|z\|^2\,\frac{p^2}{\|x - c_i\|^2 + p^2}\\
&\ge \frac{p^2\,\|z\|^2}{\big[2(\|x\|^2 + \|c_i\|^2) + p^2\big]^{3/2}} \ \ge\ \ell\,\|z\|^2,
\end{aligned}
\]
where $\ell := \dfrac{p^2}{\big[2K^2 + 2\max_{1\le i\le m}\|c_i\|^2 + p^2\big]^{3/2}}$. For $m$ real numbers $a_1, \ldots, a_m$, since $\lambda_i(x,p) \ge 0$ for $i = 1, \ldots, m$ and $\sum_{i=1}^m \lambda_i(x,p) = 1$, the Cauchy-Schwarz inequality gives
\[
\Big(\sum_{i=1}^m \lambda_i(x,p)\, a_i\Big)^2 = \Big(\sum_{i=1}^m \sqrt{\lambda_i(x,p)}\,\sqrt{\lambda_i(x,p)}\, a_i\Big)^2 \le \sum_{i=1}^m \lambda_i(x,p)\, a_i^2.
\]
This implies
\[
z^T \nabla_x^2 D(x,p)\, z = \sum_{i=1}^m \frac{\lambda_i(x,p)}{g_i(x,p)}\big(\|z\|^2 - z^T Q_{ii} z\big) + \frac{1}{p}\Big[\sum_{i=1}^m \lambda_i(x,p)\, a_i^2 - \Big(\sum_{i=1}^m \lambda_i(x,p)\, a_i\Big)^2\Big] \ge \ell\,\|z\|^2,
\]
where $a_i := z^T(x - c_i)/g_i(x,p)$. This shows that $D(\cdot, p)$ is strongly convex on $\mathbb{B}(0; K)$. The fact that for any $p > 0$ the gradient of $D(x,p)$ with respect to $x$ is Lipschitz continuous with constant $L = \frac{1}{p}$ was proved in [29, Proposition 2]. $\square$

4 The Minimization Majorization Algorithm and Nesterov's Accelerated Gradient Method

In this section, we apply the minimization majorization algorithm, well known in computational statistics, along with the log-exponential smoothing technique developed in the previous section, to develop an algorithm for solving the smallest intersecting ball problem. We also provide some examples showing that minimizing functions that involve distances to convex sets not only allows us to study a generalized version of the smallest enclosing circle problem, but also opens up the possibility of applications to other problems of constrained optimization.

Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a convex function. Consider the optimization problem
\[
\text{minimize } f(x) \quad\text{subject to } x \in \Omega. \tag{4.12}
\]
A function $g\colon \mathbb{R}^n \to \mathbb{R}$ is called a surrogate of $f$ at $\bar{z} \in \Omega$ if
\[
f(x) \le g(x) \ \text{ for all } x \in \Omega \quad\text{and}\quad f(\bar{z}) = g(\bar{z}).
\]
The set of all surrogates of $f$ at $\bar{z}$ is denoted by $\mathcal{S}(f, \bar{z})$. The minimization majorization algorithm for solving (4.12) is given as follows; see [13].

Algorithm 1.
INPUT: $x_0 \in \Omega$, $N$
for $k = 1, \ldots, N$ do
    find a surrogate $g_k \in \mathcal{S}(f, x_{k-1})$
    find $x_k \in \operatorname{argmin}_{x \in \Omega} g_k(x)$
end for
OUTPUT: $x_N$

Clearly, the choice of surrogate $g_k \in \mathcal{S}(f, x_{k-1})$ plays a crucial role in the minimization majorization algorithm. In what follows, we consider a particular choice of surrogates; see, e.g., [5, 11, 12]. An objective function $f\colon \Omega \to \mathbb{R}$ is said to be majorized by $M\colon \Omega \times \Omega \to \mathbb{R}$ if $f(x) \le M(x, y)$ and $M(y, y) = f(y)$ for all $x, y \in \Omega$. Given $x_{k-1} \in \Omega$, we can define $g_k(x) := M(x, x_{k-1})$, so that $g_k \in \mathcal{S}(f, x_{k-1})$. Then the update
\[
x_k \in \operatorname*{argmin}_{x \in \Omega} M(x, x_{k-1})
\]
defines a minimization majorization algorithm. As mentioned above, finding an appropriate majorization is an important piece of this algorithm. It has been shown in [5] that the minimization majorization algorithm using distance majorization provides an effective tool for solving many important classes of optimization problems. The key step is to use the following relations:
\[
d(x; Q) \le \|x - \Pi(y; Q)\| \quad\text{and}\quad d(y; Q) = \|y - \Pi(y; Q)\|.
\]
In the examples below, we revisit some algorithms based on distance majorization and provide convergence analysis for these algorithms.
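For orientation, here is a bare-bones Python sketch of the generic minimization majorization loop of Algorithm 1; the surrogate minimizer is supplied by the caller, since its construction (for instance by distance majorization) is problem specific.

```python
def mm_algorithm(surrogate_argmin, x0, num_iters):
    """Generic minimization majorization loop (Algorithm 1).

    surrogate_argmin(x_prev) must return a minimizer over the constraint
    set of a surrogate g_k built at x_prev, i.e. g_k >= f with equality at x_prev.
    """
    x = x0
    for _ in range(num_iters):
        x = surrogate_argmin(x)
    return x
```

In Examples 4.1 and 4.2 below, this surrogate minimizer has a closed form built from projections onto the target sets.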
Example 4.1
Let $\Omega_i$ for $i = 1, \ldots, m$ be nonempty closed convex subsets of $\mathbb{R}^n$ such that $\bigcap_{i=1}^m \Omega_i \ne \emptyset$. The problem of finding a point $x^* \in \bigcap_{i=1}^m \Omega_i$ is called the feasible point problem for these sets. Consider the problem
\[
\text{minimize } f(x) := \sum_{i=1}^m \big[d(x; \Omega_i)\big]^2, \quad x \in \mathbb{R}^n. \tag{4.13}
\]
With the assumption that $\bigcap_{i=1}^m \Omega_i \ne \emptyset$, a point $x^* \in \mathbb{R}^n$ is an optimal solution of (4.13) if and only if $x^* \in \bigcap_{i=1}^m \Omega_i$. Thus, we only need to consider (4.13).

Let us apply the minimization majorization algorithm to (4.13). First, we need to find surrogates for the objective function $f(x) = \sum_{i=1}^m [d(x; \Omega_i)]^2$. Let
\[
g_k(x) := \sum_{i=1}^m \|x - \Pi(x_{k-1}; \Omega_i)\|^2.
\]
Then $g_k \in \mathcal{S}(f, x_{k-1})$ for all $k \in \mathbb{N}$, so the minimization majorization algorithm is given by
\[
x_k = \operatorname*{argmin}_{x \in \mathbb{R}^n} g_k(x) = \frac{1}{m}\sum_{i=1}^m \Pi(x_{k-1}; \Omega_i).
\]
Let
\[
h_k(x) := g_k(x) - f(x) = \sum_{i=1}^m \Big[\|x - \Pi(x_{k-1}; \Omega_i)\|^2 - \big[d(x; \Omega_i)\big]^2\Big].
\]
We can show that $h_k$ is differentiable on $\mathbb{R}^n$ and $\nabla h_k$ is Lipschitz continuous with constant $L = 2m$. Moreover, $h_k(x_{k-1}) = 0$ and $\nabla h_k(x_{k-1}) = 0$. The function $g_k(x) - m\|x\|^2$ is convex, so $g_k$ is strongly convex with modulus $\rho = 2m$. Using the same notation as in [13], one has $g_k \in \mathcal{S}_{L,\rho}(f, x_{k-1})$ with $\rho = L = 2m$. By [13, Proposition 2.8],
\[
f(x_k) - V^* \le \frac{m}{k}\,\|x_0 - x^*\|^2 \quad\text{for all } k \in \mathbb{N},
\]
where $V^*$ denotes the optimal value of (4.13) and $x^*$ is an optimal solution.
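A small Python sketch of the iteration of Example 4.1: each step averages the projections of the current point onto the sets. The two sets in the usage example (a half-space and a ball) are illustrative assumptions.

```python
import numpy as np

def feasibility_mm(projections, x0, num_iters=100):
    """MM iteration of Example 4.1: x_k is the average of the projections
    of x_{k-1} onto the sets Omega_1, ..., Omega_m."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = np.mean([P(x) for P in projections], axis=0)
    return x

# Illustrative usage (assumed data): find a point in the intersection of
# the half-space {x1 <= 1} and the ball of radius 2 centered at the origin.
proj_halfspace = lambda x: x if x[0] <= 1.0 else np.array([1.0, x[1]])
proj_ball = lambda x: x if np.linalg.norm(x) <= 2.0 else 2.0 * x / np.linalg.norm(x)
print(feasibility_mm([proj_halfspace, proj_ball], [5.0, 5.0]))
```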
Example 4.2
Given a data set $S := \{(a_i, y_i)\}_{i=1}^m$, where $a_i \in \mathbb{R}^p$ and $y_i \in \{-1, 1\}$, consider the support vector machine problem
\[
\text{minimize } \frac{1}{2}\|x\|^2 \quad\text{subject to } y_i\langle a_i, x\rangle \ge 1, \ i = 1, \ldots, m.
\]
Let $\Omega_i := \{x \in \mathbb{R}^p \mid y_i\langle a_i, x\rangle \ge 1\}$. Using the quadratic penalty method (see [5]), the support vector machine problem can be solved by the following unconstrained optimization problem:
\[
\text{minimize } f(x) := \frac{1}{2}\|x\|^2 + C\sum_{i=1}^m \big[d(x; \Omega_i)\big]^2, \quad x \in \mathbb{R}^p, \ C > 0. \tag{4.14}
\]
Using the minimization majorization algorithm with the surrogates
\[
g_k(x) = \frac{1}{2}\|x\|^2 + C\sum_{i=1}^m \|x - \Pi(x_{k-1}; \Omega_i)\|^2
\]
for (4.14) yields
\[
x_k = \frac{2C}{1 + 2mC}\sum_{i=1}^m \Pi(x_{k-1}; \Omega_i).
\]
Let
\[
h_k(x) := g_k(x) - f(x) = C\sum_{i=1}^m \Big[\|x - \Pi(x_{k-1}; \Omega_i)\|^2 - \big[d(x; \Omega_i)\big]^2\Big].
\]
We can show that $\nabla h_k$ is Lipschitz continuous with constant $L = 2mC$, and $h_k(x_{k-1}) = 0$, $\nabla h_k(x_{k-1}) = 0$. Moreover, $g_k$ is strongly convex with modulus $\rho = 1 + 2mC$. By [13, Proposition 2.8], the minimization majorization method applied to (4.14) gives
\[
f(x_k) - V^* \le mC\Big(\frac{mC}{mC + 2}\Big)^{k-1}\|x_0 - x^*\|^2 \quad\text{for all } k \in \mathbb{N},
\]
where $x^*$ is the optimal solution of (4.14) and $V^*$ is the optimal value.
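A Python sketch of the distance-majorization iteration of Example 4.2. The closed-form update $x_k = \frac{2C}{1+2mC}\sum_i \Pi(x_{k-1};\Omega_i)$ is obtained by minimizing the quadratic surrogate $g_k$ directly; the tiny data set at the bottom is an illustrative assumption.

```python
import numpy as np

def proj_margin(x, a, y):
    """Projection onto Omega_i = {x : y * <a, x> >= 1}, a half-space."""
    c = y * a
    slack = 1.0 - c @ x
    return x if slack <= 0 else x + slack * c / (c @ c)

def svm_mm(A, Y, C=1.0, num_iters=200):
    """Distance-majorization iteration for problem (4.14); the update is
    the minimizer of g_k(x) = 0.5||x||^2 + C sum_i ||x - Proj_i(x_prev)||^2."""
    m, p = A.shape
    x = np.zeros(p)
    for _ in range(num_iters):
        proj_sum = sum(proj_margin(x, A[i], Y[i]) for i in range(m))
        x = (2 * C / (1 + 2 * m * C)) * proj_sum
    return x

# Illustrative data (assumed): two separable points in the plane.
A = np.array([[1.0, 1.0], [-1.0, -1.0]])
Y = np.array([1, -1])
print(svm_mm(A, Y))
```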
In what follows, we apply the minimization majorization algorithm in combination with the log-exponential smoothing technique to solve the smallest intersecting ball problem (2.2). In the first step, we approximate the cost function $D$ in (2.2) by the log-exponential smoothing function (3.11). Then the new function is majorized in order to apply the minimization majorization algorithm. For $x, y \in \mathbb{R}^n$ and $p > 0$, define
\[
G(x, y, p) := p \ln \sum_{i=1}^m \exp\Big(\frac{\sqrt{\|x - \Pi(y; \Omega_i)\|^2 + p^2}}{p}\Big).
\]
Then $G(x, y, p)$ serves as a majorization of the log-exponential smoothing function (3.11). By Proposition 3.10, for $p > 0$ and $y \in \mathbb{R}^n$, the function $G(\cdot, y, p)$ is strongly convex on any bounded set and continuously differentiable with Lipschitz gradient on $\mathbb{R}^n$.

Our algorithm proceeds as follows. Choose a small number $\bar{p} > 0$. In order to solve the smallest intersecting ball problem (2.2), we minimize its log-exponential smoothing approximation (3.11):
\[
\text{minimize } D(x, \bar{p}) \quad\text{subject to } x \in \Omega. \tag{4.15}
\]
Pick an initial point $x_0 \in \Omega$ and apply the minimization majorization algorithm with
\[
x_k := \operatorname*{argmin}_{x \in \Omega} G(x, x_{k-1}, \bar{p}). \tag{4.16}
\]
The algorithm is summarized as follows.

Algorithm 2.
INPUT: $\Omega$, $\bar{p} > 0$, $x_0 \in \Omega$, $m$ target sets $\Omega_i$, $i = 1, \ldots, m$, $N$
for $k = 1, \ldots, N$ do
    use a fast gradient algorithm to solve approximately $x_k := \operatorname*{argmin}_{x \in \Omega} G(x, x_{k-1}, \bar{p})$
end for
OUTPUT: $x_N$
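The following Python sketch builds the surrogate $G(x, y, p)$ for Euclidean ball targets and checks numerically that it majorizes the smoothed objective $D(x, p)$ with equality at $x = y$; the sets and points are illustrative assumptions.

```python
import numpy as np

def project_ball(x, c, r):
    d = x - c
    n = np.linalg.norm(d)
    return x if n <= r else c + r * d / n

def lse(vals, p):
    """Numerically stable p * log(sum_i exp(vals_i / p))."""
    vmax = vals.max()
    return vmax + p * np.log(np.sum(np.exp((vals - vmax) / p)))

def D_smooth(x, balls, p):
    d = np.array([np.linalg.norm(x - project_ball(x, c, r)) for c, r in balls])
    return lse(np.sqrt(d**2 + p**2), p)

def G_surrogate(x, y, balls, p):
    d = np.array([np.linalg.norm(x - project_ball(y, c, r)) for c, r in balls])
    return lse(np.sqrt(d**2 + p**2), p)

# Majorization check on illustrative data: G(x, y, p) >= D(x, p), equality at x = y.
balls = [(np.array([0.0, 0.0]), 1.0), (np.array([3.0, 1.0]), 0.5)]
rng = np.random.default_rng(0)
y = rng.normal(size=2)
for _ in range(3):
    x = rng.normal(size=2)
    assert G_surrogate(x, y, balls, 0.1) >= D_smooth(x, balls, 0.1) - 1e-12
print(abs(G_surrogate(y, y, balls, 0.1) - D_smooth(y, balls, 0.1)))  # ~0
```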
Proposition 4.3
Given $\bar{p} > 0$ and $x_0 \in \Omega$, the sequence $\{x_k\}$ of exact solutions $x_k := \operatorname{argmin}_{x\in\Omega} G(x, x_{k-1}, \bar{p})$ generated by Algorithm 2 has a convergent subsequence.

Proof Denote $\alpha := D(x_0, \bar{p})$. Using Theorem 3.6(v), the level set
\[
\mathcal{L}_{\le\alpha} := \{x \in \Omega \mid D(x, \bar{p}) \le \alpha\}
\]
is bounded. For any $k \ge 1$, because $G(\cdot, x_{k-1}, \bar{p})$ is a surrogate of $D(\cdot, \bar{p})$ at $x_{k-1}$, one has
\[
D(x_k, \bar{p}) \le G(x_k, x_{k-1}, \bar{p}) \le G(x_{k-1}, x_{k-1}, \bar{p}) = D(x_{k-1}, \bar{p}).
\]
It follows that $D(x_k, \bar{p}) \le D(x_{k-1}, \bar{p}) \le \cdots \le D(x_1, \bar{p}) \le D(x_0, \bar{p})$. Thus $\{x_k\} \subset \mathcal{L}_{\le\alpha}$, which is a bounded set, so $\{x_k\}$ has a convergent subsequence. $\square$

The convergence of the minimization majorization algorithm depends on the algorithm map
\[
\psi(x) := \operatorname*{argmin}_{y\in\Omega} G(y, x, \bar{p}) = \operatorname*{argmin}_{y\in\Omega}\Big\{\bar{p}\ln\sum_{i=1}^m \exp\Big(\frac{\sqrt{\|y - \Pi(x; \Omega_i)\|^2 + \bar{p}^2}}{\bar{p}}\Big)\Big\}. \tag{4.17}
\]
In the theorem below, we show that the conditions in [5, Proposition 1] are satisfied.
Theorem 4.4
Given $\bar{p} > 0$, the function $D(\cdot, \bar{p})$ and the algorithm map $\psi\colon \Omega \to \Omega$ defined by (4.17) satisfy the following conditions:
(i) For any $x_0 \in \Omega$, the level set $\mathcal{L}(x_0) := \{x \in \Omega \mid D(x, \bar{p}) \le D(x_0, \bar{p})\}$ is compact.
(ii) $\psi$ is continuous on $\Omega$.
(iii) $D(\psi(x), \bar{p}) < D(x, \bar{p})$ whenever $x \ne \psi(x)$.
(iv) Any fixed point $\bar{x}$ of $\psi$ is a minimizer of $D(\cdot, \bar{p})$ on $\Omega$.

Proof Observe that the function $D(\cdot, \bar{p})$ is continuous on $\Omega$. Then the level set $\mathcal{L}(x_0)$ is compact for any initial point $x_0$, since $D(\cdot, \bar{p})$ is coercive by Theorem 3.6(v), and hence (i) is satisfied.

From the strict convexity on $\Omega$ of $G(\cdot, x, \bar{p})$ guaranteed by Proposition 3.10, we can show that the algorithm map $\psi\colon \Omega \to \Omega$ is a single-valued mapping. Let us prove that $\psi$ is continuous. Take an arbitrary sequence $\{x_k\} \subset \Omega$ with $x_k \to \bar{x} \in \Omega$ as $k \to \infty$. It suffices to show that the sequence $y_k := \psi(x_k)$ tends to $\psi(\bar{x})$. It follows from the continuity of $D(\cdot, \bar{p})$ that $D(x_k, \bar{p}) \to D(\bar{x}, \bar{p})$, and hence we can assume
\[
D(x_k, \bar{p}) \le D(\bar{x}, \bar{p}) + \delta \quad\text{for all } k \in \mathbb{N},
\]
where $\delta$ is a positive constant. One has the estimates
\[
D(\psi(x_k), \bar{p}) \le G(\psi(x_k), x_k, \bar{p}) \le G(x_k, x_k, \bar{p}) = D(x_k, \bar{p}) \le D(\bar{x}, \bar{p}) + \delta,
\]
which imply that $\{y_k\}$ is bounded by the coercivity of $D(\cdot, \bar{p})$. Consider any convergent subsequence $\{y_{k_\ell}\}$ with limit $z$. Since $y_{k_\ell}$ is a solution of the smooth optimization problem $\min_{y\in\Omega} G(y, x_{k_\ell}, \bar{p})$, by the necessary and sufficient optimality condition (2.5) for smooth convex constrained optimization problems, we have
\[
\langle \nabla_y G(y_{k_\ell}, x_{k_\ell}, \bar{p}), x - y_{k_\ell}\rangle \ge 0 \quad\text{for all } x \in \Omega.
\]
This is equivalent to
\[
\Big\langle \sum_{i=1}^m \frac{\lambda_i(y_{k_\ell}, \bar{p})}{g_i(y_{k_\ell}, \bar{p})}\big(y_{k_\ell} - \Pi(x_{k_\ell}; \Omega_i)\big),\ x - y_{k_\ell}\Big\rangle \ge 0 \quad\text{for all } x \in \Omega,
\]
where $g_i(y_{k_\ell}, \bar{p}) = \sqrt{\|y_{k_\ell} - \Pi(x_{k_\ell}; \Omega_i)\|^2 + \bar{p}^2}$ and $\lambda_i(y_{k_\ell}, \bar{p}) = \dfrac{\exp\big(g_i(y_{k_\ell}, \bar{p})/\bar{p}\big)}{\sum_{j=1}^m \exp\big(g_j(y_{k_\ell}, \bar{p})/\bar{p}\big)}$. Letting $\ell \to \infty$ and using the continuity of the projection mappings, we obtain
\[
\langle \nabla_y G(z, \bar{x}, \bar{p}), x - z\rangle \ge 0 \quad\text{for all } x \in \Omega.
\]
Thus, applying (2.5) again implies that $z$ is an optimal solution of the problem $\min_{y\in\Omega} G(y, \bar{x}, \bar{p})$. By the uniqueness of solutions and $\psi(\bar{x}) = \operatorname{argmin}_{y\in\Omega} G(y, \bar{x}, \bar{p})$, one has $z = \psi(\bar{x})$, and the subsequence $\{y_{k_\ell}\}$ converges to $\psi(\bar{x})$. Since this conclusion holds for every convergent subsequence of the bounded sequence $\{y_k\}$, the sequence $\{y_k\}$ itself converges to $\psi(\bar{x})$, which shows that (ii) is satisfied.

Let us verify that $D(\psi(x), \bar{p}) < D(x, \bar{p})$ whenever $\psi(x) \ne x$. Observe that $\psi(x) = x$ if and only if $G(x, x, \bar{p}) = \min_{y\in\Omega} G(y, x, \bar{p})$. Since $G(\cdot, x, \bar{p})$ has a unique minimizer, we have the strict inequality $G(\psi(x), x, \bar{p}) < G(x, x, \bar{p})$ whenever $x$ is not a fixed point of $\psi$. Combining this with $D(\psi(x), \bar{p}) \le G(\psi(x), x, \bar{p})$ and $D(x, \bar{p}) = G(x, x, \bar{p})$, we arrive at conclusion (iii).

Finally, we show that any fixed point $\bar{x}$ of the algorithm map $\psi$ is a minimizer of $D(\cdot, \bar{p})$ on $\Omega$. Fix any $\bar{x} \in \Omega$ such that $\psi(\bar{x}) = \bar{x}$. Then $G(\bar{x}, \bar{x}, \bar{p}) = \min_{y\in\Omega} G(y, \bar{x}, \bar{p})$, which is equivalent to
\[
\langle \nabla_y G(\bar{x}, \bar{x}, \bar{p}), x - \bar{x}\rangle \ge 0 \quad\text{for all } x \in \Omega.
\]
This means
\[
\Big\langle \sum_{i=1}^m \frac{\lambda_i(\bar{x}, \bar{p})}{g_i(\bar{x}, \bar{p})}\big(\bar{x} - \Pi(\bar{x}; \Omega_i)\big),\ x - \bar{x}\Big\rangle \ge 0 \quad\text{for all } x \in \Omega,
\]
where
\[
g_i(\bar{x}, \bar{p}) = \sqrt{\|\bar{x} - \Pi(\bar{x}; \Omega_i)\|^2 + \bar{p}^2} = \sqrt{d(\bar{x}; \Omega_i)^2 + \bar{p}^2} = G_i(\bar{x}, \bar{p})
\]
and
\[
\lambda_i(\bar{x}, \bar{p}) = \frac{\exp\big(g_i(\bar{x}, \bar{p})/\bar{p}\big)}{\sum_{j=1}^m \exp\big(g_j(\bar{x}, \bar{p})/\bar{p}\big)} = \Lambda_i(\bar{x}, \bar{p}).
\]
This inequality, however, is equivalent to the inequality $\langle \nabla D(\bar{x}, \bar{p}), x - \bar{x}\rangle \ge 0$ for all $x \in \Omega$, which in turn holds if and only if $\bar{x}$ is a minimizer of $D(\cdot, \bar{p})$ on $\Omega$. $\square$
Corollary 4.5
Given $\bar{p} > 0$ and $x_0 \in \Omega$, the sequence $\{x_k\}$ of exact solutions $x_k := \operatorname{argmin}_{x\in\Omega} G(x, x_{k-1}, \bar{p})$ generated by Algorithm 2 has a subsequence that converges to an optimal solution of (4.15). If we suppose further that problem (4.15) has a unique optimal solution, then $\{x_k\}$ converges to this optimal solution.

Proof It follows from Proposition 4.3 that $\{x_k\}$ has a subsequence $\{x_{k_\ell}\}$ that converges to some $\bar{x}$. Applying [5, Proposition 1] implies that $\|x_{k_\ell+1} - x_{k_\ell}\| \to 0$ as $\ell \to \infty$. From the continuity of the algorithm map $\psi$ and the equation $x_{k_\ell+1} = \psi(x_{k_\ell})$, one has $\psi(\bar{x}) = \bar{x}$. By Theorem 4.4(iv), the element $\bar{x}$ is an optimal solution of (4.15). The last conclusion is obvious. $\square$
In what follows, we apply Nesterov's accelerated gradient method introduced in [18, 20] to solve (4.16) approximately. Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a smooth convex function with Lipschitz gradient; that is, there exists $\ell \ge 0$ such that
\[
\|\nabla f(x) - \nabla f(y)\| \le \ell\,\|x - y\| \quad\text{for all } x, y \in \mathbb{R}^n.
\]
Consider the optimization problem
\[
\text{minimize } f(x) \quad\text{subject to } x \in \Omega.
\]
For $x \in \mathbb{R}^n$, define
\[
T_\Omega(x) := \operatorname*{argmin}_{y\in\Omega}\Big\{\langle\nabla f(x), y - x\rangle + \frac{\ell}{2}\|y - x\|^2\Big\}.
\]
Let $d\colon \mathbb{R}^n \to \mathbb{R}$ be a strongly convex function with parameter $\sigma > 0$, and let $x_0 \in \mathbb{R}^n$ be such that $x_0 = \operatorname{argmin}\{d(x) \mid x \in \Omega\}$. Further, assume that $d(x_0) = 0$. For simplicity, we choose $d(x) = \frac{1}{2}\|x - x_0\|^2$, where $x_0 \in \Omega$, so $\sigma = 1$. It is not hard to see that
\[
y_k = T_\Omega(x_k) = \Pi\Big(x_k - \frac{\nabla f(x_k)}{\ell};\ \Omega\Big).
\]
Moreover,
\[
z_k = \Pi\Big(x_0 - \frac{1}{\ell}\sum_{i=0}^{k}\frac{i+1}{2}\,\nabla f(x_i);\ \Omega\Big).
\]
Nesterov's accelerated gradient algorithm is outlined as follows.

Algorithm 3.
INPUT: $f$, $\ell$, $x_0 \in \Omega$
set $k = 0$
repeat
    find $y_k := T_\Omega(x_k)$
    find $z_k := \operatorname*{argmin}_{x\in\Omega}\Big\{\frac{\ell}{\sigma}\, d(x) + \sum_{i=0}^{k}\frac{i+1}{2}\big[f(x_i) + \langle\nabla f(x_i), x - x_i\rangle\big]\Big\}$
    set $x_{k+1} := \frac{2}{k+3}\, z_k + \frac{k+1}{k+3}\, y_k$
    set $k := k + 1$
until a stopping criterion is satisfied.
OUTPUT: $y_k$.
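A Python sketch of Algorithm 3 under the simple choice $d(x) = \frac{1}{2}\|x - x_0\|^2$ discussed above, so that both subproblems reduce to projections; the quadratic test problem in the usage lines is an illustrative assumption.

```python
import numpy as np

def nesterov_accelerated_gradient(grad_f, lipschitz, project, x0, num_iters=100):
    """Sketch of Algorithm 3 with d(x) = 0.5*||x - x0||^2 (so sigma = 1):
    y_k = Proj(x_k - grad f(x_k)/l),
    z_k = Proj(x0 - (1/l) * sum_{i<=k} (i+1)/2 * grad f(x_i)),
    x_{k+1} = 2/(k+3) * z_k + (k+1)/(k+3) * y_k."""
    x_init = np.asarray(x0, dtype=float)
    x = x_init.copy()
    y = x.copy()
    grad_sum = np.zeros_like(x_init)       # accumulates (i+1)/2 * grad f(x_i)
    for k in range(num_iters):
        g = grad_f(x)
        y = project(x - g / lipschitz)
        grad_sum += (k + 1) / 2.0 * g
        z = project(x_init - grad_sum / lipschitz)
        x = 2.0 / (k + 3) * z + (k + 1.0) / (k + 3) * y
    return y

# Illustrative usage (assumed data): minimize ||x - b||^2 over the unit ball.
b = np.array([2.0, 1.0])
proj = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
print(nesterov_accelerated_gradient(lambda x: 2 * (x - b), 2.0, proj, np.zeros(2)))
```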
It has been experimentally observed that the algorithm is more effective if, instead of choosing a small value of $p$ ahead of time, we change its value during the run, starting from an initial value $p_0$ and setting $p_s := \sigma p_{s-1}$, where $\sigma \in (0, 1)$.

Algorithm 4.
INPUT: $\Omega$, $\epsilon > 0$, $p_0 > 0$, $x_0 \in \Omega$, $m$ target sets $\Omega_i$, $i = 1, \ldots, m$, $N$
set $p := p_0$, $y := x_0$
for $k = 1, \ldots, N$ do
    use Nesterov's accelerated gradient method to solve approximately $y := \operatorname*{argmin}_{x\in\Omega} G(x, y, p)$
    set $p := \sigma p$
end for
OUTPUT: $y$

Remark 4.6
(i) When implementing this algorithm, we usually want to maintain $p_k > \epsilon$, where $\epsilon < p_0$. The factor $\sigma$ can therefore be calculated from the desired number of iterations $N$ to be run, i.e., $\sigma = (\epsilon/p_0)^{1/N}$.
(ii) In Nesterov's accelerated gradient method, at iteration $k$ we often use the stopping criterion $\|\nabla_x G(x, y, p)\| < \gamma_k$, where $\gamma_0$ is chosen and $\gamma_k = \tilde\sigma\gamma_{k-1}$ for some $\tilde\sigma \in (0, 1)$. The factor $\tilde\sigma$ can be calculated from the number of iterations $N$ and a lower bound $\tilde\epsilon > 0$ for $\gamma_k$, as above.
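A Python sketch of Algorithm 4 for Euclidean ball targets, assuming the `project_ball` and `nesterov_accelerated_gradient` helpers from the earlier sketches are in scope; the gradient Lipschitz constant $1/p$ passed to the inner solver follows Proposition 3.10, and the default parameter values are illustrative rather than the paper's.

```python
import numpy as np

def algorithm4(balls, project_constraint, x0, p0=5.0, eps=1e-2, N=10, inner_iters=200):
    """Sketch of Algorithm 4: approximately minimize G(., y, p) over the
    constraint set with the accelerated gradient sketch above, then shrink
    p by sigma = (eps/p0)**(1/N) as in Remark 4.6(i)."""
    sigma = (eps / p0) ** (1.0 / N)
    p, y = p0, np.asarray(x0, dtype=float)
    for _ in range(N):
        # Anchors Pi(y; Omega_i) are fixed while minimizing G(., y, p).
        anchors = [project_ball(y, c, r) for c, r in balls]

        def grad_G(x):
            d = np.array([np.linalg.norm(x - a) for a in anchors])
            g = np.sqrt(d**2 + p**2)
            w = np.exp((g - g.max()) / p)
            lam = w / w.sum()
            return sum(l / gi * (x - a) for l, gi, a in zip(lam, g, anchors))

        y = nesterov_accelerated_gradient(grad_G, 1.0 / p, project_constraint, y, inner_iters)
        p *= sigma
    return y

# Illustrative unconstrained call (Omega = R^n, so the projection is the identity):
# algorithm4(balls, lambda x: x, x0=np.zeros(2))
```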
5 Numerical Examples

We implement Algorithm 4 to solve the generalized Sylvester problem in a number of examples. In each of the following examples, we run Algorithm 4 with the parameters described in the algorithm and in Remark 4.6: $\epsilon = 10^{-\ldots}$, $\tilde\epsilon = 10^{-\ldots}$, $p_0 = 5$, $\gamma_0 = \ldots$, and $N = 10$. Observations suggest that when the number of dimensions is large, speed is improved by starting with relatively high $\gamma_0$ and $p_0$ and decreasing each (thereby reducing the error) with each iteration. Choosing $\sigma$, $\tilde\sigma$ as described in Remark 4.6 ensures that the final iterations are of the desired accuracy. The approximate radius of the smallest intersecting ball corresponding to the approximate optimal solution $x_k$ is $r_k := D(x_k)$.

Figure 1: An illustration of the minimization majorization algorithm.

Example 5.1
Let us first apply Algorithm 4 to an unconstrained generalized Sylvester problem (2.2) in $\mathbb{R}^2$ in which the target sets $\Omega_i$, $i = 1, \ldots, 6$, are disks with radii $3, \ldots, \ldots, \ldots, \ldots, 4$, respectively; this setup is depicted in Figure 1. A simple MATLAB program yields an approximate smallest intersecting ball with center $x^* \approx (1.\ldots,\ \ldots 83)$ and approximate radius $r^* \approx \ldots 65$. Figure 1 shows a significant move toward the optimal solution of the problem in one step of the minimization majorization algorithm.
Example 5.2
We consider the smallest intersecting ball problem in which the target sets are square boxes in $\mathbb{R}^n$. In $\mathbb{R}^n$, a square box $S(\omega, r)$ with center $\omega = (\omega_1, \ldots, \omega_n)$ and radius $r$ is the set
\[
S(\omega, r) := \big\{x = (x_1, \ldots, x_n) \mid \max\{|x_1 - \omega_1|, \ldots, |x_n - \omega_n|\} \le r\big\}.
\]
Note that the Euclidean projection from $x$ to $S(\omega, r)$ can be expressed componentwise as follows:
\[
[\Pi(x; S)]_i =
\begin{cases}
\omega_i - r & \text{if } x_i \le \omega_i - r,\\
x_i & \text{if } \omega_i - r \le x_i \le \omega_i + r,\\
\omega_i + r & \text{if } x_i \ge \omega_i + r.
\end{cases}
\]
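A short Python sketch of the componentwise box projection above and of the resulting smallest intersecting ball objective for box targets; the boxes in the usage example are illustrative, not the paper's data.

```python
import numpy as np

def project_box(x, center, radius):
    """Componentwise projection onto the box S(center, radius)."""
    return np.clip(x, center - radius, center + radius)

def sib_objective_boxes(x, boxes):
    """D(x) = max_i d(x; S(w_i, r_i)) for box-shaped target sets."""
    return max(np.linalg.norm(x - project_box(x, w, r)) for w, r in boxes)

# Illustrative data (not the paper's): three unit boxes in R^3.
boxes = [(np.array([-2.0, 0.0, 1.0]), 1.0),
         (np.array([2.0, 2.0, 0.0]), 1.0),
         (np.array([0.0, -3.0, 2.0]), 1.0)]
print(sib_objective_boxes(np.array([0.0, 0.0, 0.0]), boxes))
```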
Consider the case where $n = 3$. The target sets are five square boxes with radii $r_i = 1$ for $i = 1, \ldots, 5$ and centers $(\ldots)$, $(\ldots)$, $(\ldots)$, $(-\ldots, -\ldots, 2)$, and $(0, \ldots, 5)$. Our results show that both Algorithm 4 and the subgradient method give an approximate smallest intersecting ball radius $r^* \approx \ldots 18$; see Figure 2.

Figure 2: A smallest intersecting ball problem for cubes in $\mathbb{R}^3$.

Example 5.3
Now we illustrate the performance of Algorithm 4 in high dimensions with the same setting as in Example 5.2. We choose a modification of the pseudo-random sequence from [31] with $a_1 = 7$, $a_{i+1} = \operatorname{mod}(445\,a_i + 1,\ \ldots)$ for $i = 1, 2, \ldots$, and $b_i$ a rescaling of $a_i$. The radii $r_i$ and the centers $c_i$ of the square boxes are successively set to $b_1, b_2, \ldots$ in the following order:
\[
10\, r_1,\ c_1(1), \ldots, c_1(n);\ \ 10\, r_2,\ c_2(1), \ldots, c_2(n);\ \ \ldots;\ \ 10\, r_m,\ c_m(1), \ldots, c_m(n).
\]
Consider $m = 100$ and $n = 1000$. Figure 3 shows the approximate values of the radii $r_k$ for $k = 0, 1, \ldots$.

Figure 3: A smallest intersecting ball problem for cubes in high dimensions.
Figure 4: Comparison between the minimization majorization algorithm and the subgradient algorithm.

We also implement Algorithm 4 in comparison with the subgradient algorithm. From our numerical results, we see that Algorithm 4 performs much better than the subgradient algorithm in both accuracy and speed. In the case where the number of target sets is large or the dimension is high, the subgradient algorithm seems to stagnate, while Algorithm 4 still performs well. Figure 4 shows the comparison between Algorithm 4 and the subgradient algorithm. Note that in Algorithm 4 we count every iteration of Nesterov's accelerated gradient method in the total iteration count along the horizontal axis. Thus the "sharp corner" that can be seen at 50 iterations represents the transition from one outer iterate to the next, and a subsequent recalculation of $p$ by $p := \sigma p$.

6 Concluding Remarks

Based on the log-exponential smoothing technique and the minimization majorization algorithm, we have developed an effective numerical algorithm for solving the smallest intersecting ball problem. The problem under consideration not only generalizes Sylvester's smallest enclosing circle problem, but also opens up the possibility of applications to other problems of constrained optimization, especially those that appear frequently in machine learning. Our numerical examples show that the algorithm works well for the problem in high dimensions. Although a number of key convergence results are contained in this paper, our future work will further develop an understanding of the convergence rate of this algorithm.
Acknowledgement.
The authors would like to thank Prof. Jie Sun for comments that helped improve the presentation of the paper.
References

[1] J. Alonso, H. Martini, and M. Spirova, Minimal enclosing discs, circumcircles, and circumcenters in normed planes, Comput. Geom. (2012), 258-274.
[2] D. Bertsekas, A. Nedic, and A. Ozdaglar, Convex Analysis and Optimization, Athena Scientific, Boston, 2003.
[3] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, England, 2004.
[4] D. Cheng, X. Hu, and C. Martin, On the smallest enclosing balls, Commun. Inf. Syst. (2006), 137-160.
[5] E. Chi, H. Zhou, and K. Lange, Distance majorization and its applications, Math. Program., in press.
[6] L. Drager, J. Lee, and C. Martin, On the geometry of the smallest circle enclosing a finite set of points, J. Franklin Inst. (2007), 929-940.
[7] K. Fischer and B. Gartner, The smallest enclosing ball of balls: combinatorial structure and algorithms, Comput. Geom. (2004), 341-378.
[8] S. K. Jacobsen, An algorithm for the minimax Weber problem, European J. Oper. Res. (1981), 144-148.
[9] D. W. Hearn and J. Vijay, Efficient algorithms for the (weighted) minimum circle problem, Oper. Res. (1981), 777-795.
[10] J. B. Hiriart-Urruty and C. Lemarechal, Fundamentals of Convex Analysis, Springer-Verlag, 2001.
[11] D. R. Hunter and K. Lange, A tutorial on MM algorithms, Amer. Statistician (2004), 30-37.
[12] K. Lange, D. R. Hunter, and I. Yang, Optimization transfer using surrogate objective functions (with discussion), J. Comput. Graph. Statist. (2000), 1-59.
[13] J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, arXiv preprint arXiv:1402.4419 (2014).
[14] N. M. Nam, N. T. An, R. B. Rector, and J. Sun, Nonsmooth algorithms and Nesterov's smoothing technique for generalized Fermat-Torricelli problems, SIAM J. Optim. 24 (2014), no. 4, 1815-1839.
[15] B. S. Mordukhovich and N. M. Nam, An Easy Path to Convex Analysis and Applications, Morgan & Claypool Publishers, 2014.
[16] N. M. Nam and N. Hoang, A generalized Sylvester problem and a generalized Fermat-Torricelli problem, to appear in Journal of Convex Analysis.
[17] N. M. Nam, N. T. An, and J. Salinas, Applications of convex analysis to the smallest intersecting ball problem, J. Convex Anal. (2012), 497-518.
[18] Yu. Nesterov, Smooth minimization of non-smooth functions, Math. Program. (2005), 127-152.
[19] Yu. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization 87, Kluwer Academic Publishers, Boston, MA, 2004.
[20] Yu. Nesterov, A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), Doklady AN SSSR (translated as Soviet Math. Dokl.) (1983), 543-547.
[21] F. Nielsen and R. Nock, Approximating smallest enclosing balls with applications to machine learning, Internat. J. Comput. Geom. Appl. (2009), 389-414.
[22] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[23] A. Saha, S. Vishwanathan, and X. Zhang, Efficient approximation algorithms for minimum enclosing convex shapes, Proceedings of SODA, 2011.
[24] J. J. Sylvester, A question in the geometry of situation, Quarterly Journal of Pure and Applied Mathematics (1857).
[25] E. Welzl, Smallest enclosing disks (balls and ellipsoids), in: H. Maurer (ed.), Lecture Notes in Comput. Sci. (1991), 359-370.
[26] S. Xu, R. M. Freund, and J. Sun, Solution methodologies for the smallest enclosing circle problem: a tribute to Elijah (Lucien) Polak, Comput. Optim. Appl. (2003), no. 1-3, 283-292.
[27] E. A. Yildirim, On the minimum volume covering ellipsoid of ellipsoids, SIAM J. Optim. (2006), 621-641.
[28] E. A. Yildirim, Two algorithms for the minimum enclosing ball problem, SIAM J. Optim. (2008), 1368-1391.
[29] X. Zhai, Two problems in convex conic optimization, master's thesis, National University of Singapore, 2007.
[30] T. Zhou, D. Tao, and X. Wu, NESVM: a fast gradient method for support vector machines, IEEE International Conference on Data Mining (ICDM), 2010.
[31] G. Zhou, K. C. Toh, and J. Sun, Efficient algorithms for the smallest enclosing ball problem, Comput. Optim. Appl. 30.