Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization
arXiv [math.FA]
Szilárd Csaba László*

November 26, 2018
Abstract.
We investigate an inertial algorithm of gradient type in connection with the minimization of a nonconvex differentiable function. The algorithm is formulated in the spirit of Nesterov's accelerated convex gradient method. We show that the generated sequences converge to a critical point of the objective function if a regularization of the objective function satisfies the Kurdyka-Łojasiewicz property. Further, we provide convergence rates for the generated sequences and the objective function values, formulated in terms of the Łojasiewicz exponent.
Key Words. inertial algorithm, nonconvex optimization, Kurdyka-Łojasiewicz inequality, convergence rate
AMS subject classification.
* Technical University of Cluj-Napoca, Department of Mathematics, Str. Memorandumului nr. 28, 400114 Cluj-Napoca, Romania, e-mail: [email protected]. This work was supported by a grant of Ministry of Research and Innovation, CNCS-UEFISCDI, project number PN-III-P1-1.1-TE-2016-0266, and by a grant of Ministry of Research and Innovation, CNCS-UEFISCDI, project number PN-III-P4-ID-PCE-2016-0190, within PNCDI III.

Let $g:\mathbb{R}^m\to\mathbb{R}$ be a (not necessarily convex) Fréchet differentiable function with $L_g$-Lipschitz continuous gradient. That is, there exists $L_g\ge 0$ such that $\|\nabla g(x)-\nabla g(y)\|\le L_g\|x-y\|$ for all $x,y\in\mathbb{R}^m$. We deal with the optimization problem

$$(P)\qquad \inf_{x\in\mathbb{R}^m} g(x). \tag{1}$$

We associate to (1) the following inertial algorithm of gradient type. Consider the starting points $x_0=y_0\in\mathbb{R}^m$, and for all $n\in\mathbb{N}$ let

$$\begin{cases} x_{n+1}=y_n-s\nabla g(y_n),\\ y_n=x_n+\dfrac{\beta n}{n+\alpha}(x_n-x_{n-1}),\end{cases}\tag{2}$$

where $\alpha>0$, $\beta\in(0,1)$ and $0<s<\frac{2(1-\beta)}{L_g}$.

Note that (2) is a nonconvex descendant of the methods of Polyak [23] and Nesterov [21]. In [23], Polyak introduced a modified gradient method for minimizing a smooth convex function $g$. His two-step iterative method, the so-called heavy ball method, takes the following form:

$$\begin{cases} x_{n+1}=y_n-\lambda_n\nabla g(x_n),\\ y_n=x_n+\alpha_n(x_n-x_{n-1}),\end{cases}\tag{3}$$

where $\alpha_n\in[0,1)$ and $\lambda_n>0$ are step-size parameters. In [21], Nesterov chose $\alpha_n=\frac{t_n-1}{t_{n+1}}$, where $t_n$ satisfies the recursion $t_{n+1}=\frac{\sqrt{4t_n^2+1}+1}{2}$, $t_1=1$, and he also used $y_n$ for evaluating the gradient. Additionally, $\lambda_n$ is chosen so that $\lambda_n\le\frac{1}{L_g}$. His scheme in its simplest form is given by:

$$\begin{cases} x_{n+1}=y_n-s\nabla g(y_n),\\ y_n=x_n+\dfrac{t_n-1}{t_{n+1}}(x_n-x_{n-1}),\end{cases}\tag{4}$$

where $s\le\frac{1}{L_g}$. This scheme leads to the convergence rate $g(x_n)-g(\bar x)=O(1/n^2)$, where $\bar x$ is a minimizer of the convex function $g$. This rate is optimal among all methods that have only information about the gradient of $g$ and consecutive iterates [22]. By taking $t_n=\frac{n+a-1}{a}$, $a\ge 2$, in (4), one obtains the same rate $O(1/n^2)$ (see [15, 25]). This case has been considered by Chambolle and Dossal [15] in order to prove the convergence of the iterates of the modified FISTA algorithm (see [7]). We emphasize that Algorithm (2) has a similar form to the algorithm studied by Chambolle and Dossal (see [15] and also [18]), but we allow the function $g$ to be nonconvex. Unfortunately, our analysis does not cover the case $\beta=1$.

Su, Boyd and Candès [25] showed that, in case $t_n=\frac{n+1}{2}$, the exact limit of algorithm (4) is the second order differential equation

$$\ddot x(t)+\frac{\alpha}{t}\dot x(t)+\nabla g(x(t))=0 \tag{5}$$

with $\alpha=3$. Recently, Attouch and his co-authors (see [4, 6]) proved that, if $\alpha>3$ in (5), then $x(t)$ converges to a minimizer of $g$ as $t\to+\infty$, while the convergence rate of the objective function along the trajectory is $o(1/t^2)$. Further, in [5], some results have been obtained concerning the convergence rate of the objective function $g$ along the trajectory generated by (5) in the subcritical case $\alpha\le 3$. However, the convergence of the trajectories generated by (5) in case $g$ is nonconvex is still an open question. Some important steps in this direction have been made in [14] (see also [12]). There, convergence of the trajectories of a system that can be viewed as a perturbation of (5) has been obtained in a nonconvex setting. More precisely, [14] considers the system

$$\ddot x(t)+\Big(\gamma+\frac{\alpha}{t}\Big)\dot x(t)+\nabla g(x(t))=0, \tag{6}$$

and shows that the generated trajectory converges to a critical point of $g$ if a regularization of $g$ satisfies the Kurdyka-Łojasiewicz property.

In what follows we show that, by choosing appropriate values of $\beta$, the numerical scheme (2) has as exact limit the continuous second order dynamical system (5) studied in [4–6, 25]. It also has as exact limit the continuous dynamical system (6) studied in [14]. To this end, we take small step sizes in (2) and follow the same approach as Su, Boyd and Candès in [25] (see also [14]). For this purpose we rewrite (2) in the form

$$\frac{x_{n+1}-x_n}{\sqrt s}=\frac{\beta n}{n+\alpha}\cdot\frac{x_n-x_{n-1}}{\sqrt s}-\sqrt s\,\nabla g(y_n)\quad\forall n\ge 1. \tag{7}$$

We introduce the Ansatz $x_n\approx x(n\sqrt s)$ for some twice continuously differentiable function $x:[0,+\infty)\to\mathbb{R}^m$. We let $n=\frac{t}{\sqrt s}$ and get $x(t)\approx x_n$, $x(t+\sqrt s)\approx x_{n+1}$, $x(t-\sqrt s)\approx x_{n-1}$. Then, as the step size $s$ goes to zero, the Taylor expansion of $x$ gives

$$\frac{x_{n+1}-x_n}{\sqrt s}=\dot x(t)+\frac12\ddot x(t)\sqrt s+o(\sqrt s)\quad\text{and}\quad\frac{x_n-x_{n-1}}{\sqrt s}=\dot x(t)-\frac12\ddot x(t)\sqrt s+o(\sqrt s).$$

Further, since

$$\sqrt s\,\|\nabla g(y_n)-\nabla g(x_n)\|\le\sqrt s\,L_g\|y_n-x_n\|=\sqrt s\,L_g\Big|\frac{\beta n}{n+\alpha}\Big|\,\|x_n-x_{n-1}\|=o(\sqrt s),$$

it follows that $\sqrt s\,\nabla g(y_n)=\sqrt s\,\nabla g(x_n)+o(\sqrt s)$. Consequently, since $\frac{\beta n}{n+\alpha}=\frac{\beta t}{t+\alpha\sqrt s}$, (7) can be written as

$$\dot x(t)+\frac12\ddot x(t)\sqrt s+o(\sqrt s)=\frac{\beta t}{t+\alpha\sqrt s}\Big(\dot x(t)-\frac12\ddot x(t)\sqrt s+o(\sqrt s)\Big)-\sqrt s\,\nabla g(x(t))+o(\sqrt s)$$

or, equivalently,

$$(t+\alpha\sqrt s)\Big(\dot x(t)+\frac12\ddot x(t)\sqrt s+o(\sqrt s)\Big)=\beta t\Big(\dot x(t)-\frac12\ddot x(t)\sqrt s+o(\sqrt s)\Big)-\sqrt s\,(t+\alpha\sqrt s)\nabla g(x(t))+o(\sqrt s).$$

Hence,

$$\frac12\big(\alpha\sqrt s+(1+\beta)t\big)\ddot x(t)\sqrt s+\big((1-\beta)t+\alpha\sqrt s\big)\dot x(t)+\sqrt s\,(t+\alpha\sqrt s)\nabla g(x(t))=o(\sqrt s). \tag{8}$$

Now, if we take $\beta=1-\gamma s<1$, where $s>0$ and $\gamma>0$, we obtain

$$\frac12\big(\alpha\sqrt s+(2-\gamma s)t\big)\ddot x(t)\sqrt s+\big(\gamma s t+\alpha\sqrt s\big)\dot x(t)+\sqrt s\,(t+\alpha\sqrt s)\nabla g(x(t))=o(\sqrt s).$$

After dividing by $\sqrt s$ and letting $s\to 0$, we obtain

$$t\ddot x(t)+\alpha\dot x(t)+t\nabla g(x(t))=0.$$

After division by $t$, this gives (5), that is $\ddot x(t)+\frac{\alpha}{t}\dot x(t)+\nabla g(x(t))=0$.

Similarly, by taking $\beta=1-\gamma\sqrt s<1$, where $s>0$ and $\gamma>0$, we obtain

$$\frac12\big(\alpha\sqrt s+(2-\gamma\sqrt s)t\big)\ddot x(t)\sqrt s+\big(\gamma\sqrt s\,t+\alpha\sqrt s\big)\dot x(t)+\sqrt s\,(t+\alpha\sqrt s)\nabla g(x(t))=o(\sqrt s).$$

After dividing by $\sqrt s$ and letting $s\to 0$, we get

$$t\ddot x(t)+(\gamma t+\alpha)\dot x(t)+t\nabla g(x(t))=0.$$

After division by $t$, this gives (6), that is $\ddot x(t)+\big(\gamma+\frac{\alpha}{t}\big)\dot x(t)+\nabla g(x(t))=0$.

[…] $\ell^2$. Further, we show that the set of cluster points of the iterates is included in the set of critical points of the objective function. Finally, we use the KL property of an appropriate regularization of the objective function to obtain that the iterates gap belongs to $\ell^1$, which implies the convergence of the iterates (see also [3, 8, 13]). Moreover, in Section 3, we obtain several convergence rates. These concern both the sequences $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$ generated by the numerical scheme (2) and the function values $g(x_n)$, $g(y_n)$. The rates are expressed in terms of the Łojasiewicz exponent of $g$ and of a regularization of $g$, respectively (for some general results see [16]).

In this section we investigate the convergence of the proposed algorithm. We show that the sequences generated by the numerical scheme (2) converge to a critical point of the objective function $g$, provided the regularization of $g$, $H(x,y)=g(x)+\frac12\|y-x\|^2$, is a KL function. The main tool in our forthcoming analysis is the so-called descent lemma, see [22].

Lemma 1
Let $g:\mathbb{R}^m\to\mathbb{R}$ be Fréchet differentiable with $L_g$-Lipschitz continuous gradient. Then

$$g(y)\le g(x)+\langle\nabla g(x),y-x\rangle+\frac{L_g}{2}\|y-x\|^2,\quad\forall x,y\in\mathbb{R}^m.$$

Now we are able to obtain a decrease property for the iterates generated by (2).
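A minimal sketch of scheme (2) may help fix ideas. The test function $g(x)=\frac{x^2}{2}+2\cos x$, the parameter values and the helper names below are illustrative assumptions, not taken from the paper. This $g$ is nonconvex, and since $g''(x)=1-2\cos x\in[-1,3]$ we have $L_g=3$. The helper `delta` evaluates $\delta_n=A_{n-1}-\frac12 C_{n-1}$ as defined in Theorem 2 below. This lets one record the quantity $g(y_n)+\delta_n\|x_n-x_{n-1}\|^2$ along the iterates.

```python
import math

# Hypothetical test problem (not from the paper): g(x) = x^2/2 + 2*cos(x) on R.
# g''(x) = 1 - 2*cos(x) lies in [-1, 3], so g is nonconvex and L_g = 3.
L_G = 3.0


def g(x):
    return 0.5 * x * x + 2.0 * math.cos(x)


def grad_g(x):
    return x - 2.0 * math.sin(x)


def inertial_gradient(x0, alpha, beta, s, iters):
    """Scheme (2) started from x_0 = y_0 = x0; returns the lists (x_n) and (y_n)."""
    if not (alpha > 0 and 0 < beta < 1 and 0 < s < 2 * (1 - beta) / L_G):
        raise ValueError("need alpha > 0, beta in (0,1), 0 < s < 2(1-beta)/L_g")
    xs, ys = [x0], []
    x_prev, x = x0, x0
    for n in range(iters):
        y = x + beta * n / (n + alpha) * (x - x_prev)  # inertial extrapolation y_n
        x_next = y - s * grad_g(y)                      # gradient step evaluated at y_n
        ys.append(y)
        xs.append(x_next)
        x_prev, x = x, x_next
    return xs, ys


def delta(n, alpha, beta, s):
    """delta_n = A_{n-1} - C_{n-1}/2 as in Theorem 2 (valid for n >= 1)."""
    c = 2 - s * L_G
    a = ((1 + beta) * n + alpha) / (n + alpha)  # coefficient of x_n - x_{n-1} in y_n - y_{n-1}
    b = beta * (n - 1) / (n - 1 + alpha)         # inertial coefficient at step n-1
    d = beta * n / (n + alpha)                   # inertial coefficient at step n
    A = c * a * a / (2 * s) - d * a / s
    C = (c * a * b - b * d) / s
    return A - C / 2


if __name__ == "__main__":
    alpha, beta, s = 3.0, 0.5, 0.2  # s < 2(1 - beta)/L_g = 1/3
    xs, ys = inertial_gradient(3.0, alpha, beta, s, 400)
    energy = [g(ys[n]) + delta(n, alpha, beta, s) * (xs[n] - xs[n - 1]) ** 2
              for n in range(1, 400)]
    print(xs[-1], grad_g(xs[-1]), energy[0], energy[-1])
```

Take $\alpha=3$, $\beta=0.5$, $s=0.2$ and start from $x_0=3$. With these values the iterates approach a positive critical point of $g$, and the recorded energies do not increase. This is only an illustration of the statements that follow, not a substitute for their proofs.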
Theorem 2
In the settings of problem (1), for some starting points $x_0=y_0\in\mathbb{R}^m$, let $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$ be the sequences generated by the numerical scheme (2). Consider the sequences

$$A_{n-1}=\frac{2-sL_g}{2s}\Big(\frac{(1+\beta)n+\alpha}{n+\alpha}\Big)^2-\frac{\beta n\big((1+\beta)n+\alpha\big)}{s(n+\alpha)^2},$$

$$C_{n-1}=\frac{2-sL_g}{s}\cdot\frac{\beta(n-1)}{n+\alpha-1}\cdot\frac{(1+\beta)n+\alpha}{n+\alpha}-\frac{1}{s}\cdot\frac{\beta(n-1)}{n+\alpha-1}\cdot\frac{\beta n}{n+\alpha}$$

and $\delta_n=A_{n-1}-\frac12 C_{n-1}$ for all $n\in\mathbb{N}$, $n\ge 1$.

Then, there exists $N\in\mathbb{N}$ such that

(i) The sequence $\big(g(y_n)+\delta_n\|x_n-x_{n-1}\|^2\big)_{n\ge N}$ is decreasing and $\delta_n>0$ for all $n\ge N$.

Assume that $g$ is bounded from below. Then, the following statements hold.

(ii) The sequence $\big(g(y_n)+\delta_n\|x_n-x_{n-1}\|^2\big)_{n\in\mathbb{N}}$ is convergent;

(iii) $\sum_{n\ge 1}\|x_n-x_{n-1}\|^2<+\infty$.

Proof. From (2) we have $\nabla g(y_n)=\frac1s(y_n-x_{n+1})$, hence

$$\langle\nabla g(y_n),y_{n+1}-y_n\rangle=\frac1s\langle y_n-x_{n+1},y_{n+1}-y_n\rangle.$$

From Lemma 1 we obtain $g(y_{n+1})\le g(y_n)+\langle\nabla g(y_n),y_{n+1}-y_n\rangle+\frac{L_g}{2}\|y_{n+1}-y_n\|^2$. Consequently,

$$g(y_{n+1})-\frac{L_g}{2}\|y_{n+1}-y_n\|^2\le g(y_n)+\frac1s\langle y_n-x_{n+1},y_{n+1}-y_n\rangle. \tag{9}$$

Further, $\langle y_n-x_{n+1},y_{n+1}-y_n\rangle=-\|y_{n+1}-y_n\|^2+\langle y_{n+1}-x_{n+1},y_{n+1}-y_n\rangle$ and $y_{n+1}-x_{n+1}=\frac{\beta(n+1)}{n+\alpha+1}(x_{n+1}-x_n)$. Hence

$$g(y_{n+1})+\Big(\frac1s-\frac{L_g}{2}\Big)\|y_{n+1}-y_n\|^2\le g(y_n)+\frac{\beta(n+1)}{s(n+\alpha+1)}\langle x_{n+1}-x_n,y_{n+1}-y_n\rangle. \tag{10}$$

Since

$$y_{n+1}-y_n=\frac{(1+\beta)n+\alpha+\beta+1}{n+\alpha+1}(x_{n+1}-x_n)-\frac{\beta n}{n+\alpha}(x_n-x_{n-1}),$$

we have

$$\|y_{n+1}-y_n\|^2=\Big(\frac{(1+\beta)n+\alpha+\beta+1}{n+\alpha+1}\Big)^2\|x_{n+1}-x_n\|^2+\Big(\frac{\beta n}{n+\alpha}\Big)^2\|x_n-x_{n-1}\|^2-2\,\frac{(1+\beta)n+\alpha+\beta+1}{n+\alpha+1}\cdot\frac{\beta n}{n+\alpha}\langle x_{n+1}-x_n,x_n-x_{n-1}\rangle$$

and

$$\langle x_{n+1}-x_n,y_{n+1}-y_n\rangle=\frac{(1+\beta)n+\alpha+\beta+1}{n+\alpha+1}\|x_{n+1}-x_n\|^2-\frac{\beta n}{n+\alpha}\langle x_{n+1}-x_n,x_n-x_{n-1}\rangle.$$

Replacing the above equalities in (10), we obtain

$$g(y_{n+1})+\frac{(2-sL_g)\Big(\frac{(1+\beta)n+\alpha+\beta+1}{n+\alpha+1}\Big)^2-\frac{2\beta(n+1)\left((1+\beta)n+\alpha+\beta+1\right)}{(n+\alpha+1)^2}}{2s}\|x_{n+1}-x_n\|^2\le g(y_n)-\frac{(2-sL_g)\Big(\frac{\beta n}{n+\alpha}\Big)^2}{2s}\|x_n-x_{n-1}\|^2$$

$$+\frac{(2-sL_g)\frac{\beta n}{n+\alpha}\cdot\frac{(1+\beta)n+\alpha+\beta+1}{n+\alpha+1}-\frac{\beta n}{n+\alpha}\cdot\frac{\beta(n+1)}{n+\alpha+1}}{s}\langle x_{n+1}-x_n,x_n-x_{n-1}\rangle.$$

The coefficient of $\|x_{n+1}-x_n\|^2$ on the left is $A_n$, and the coefficient of the inner product is $C_n$ (the sequences from the hypothesis with the index shifted by one). Let us denote $B_n=\frac{2-sL_g}{2s}\big(\frac{\beta n}{n+\alpha}\big)^2$ for all $n\in\mathbb{N}$. Hence we have

$$g(y_{n+1})+A_n\|x_{n+1}-x_n\|^2-C_n\langle x_{n+1}-x_n,x_n-x_{n-1}\rangle\le g(y_n)-B_n\|x_n-x_{n-1}\|^2.$$

We now use the equality

$$-2\langle x_{n+1}-x_n,x_n-x_{n-1}\rangle=\|x_{n+1}+x_{n-1}-2x_n\|^2-\|x_{n+1}-x_n\|^2-\|x_n-x_{n-1}\|^2. \tag{11}$$

It gives

$$g(y_{n+1})+\Big(A_n-\frac{C_n}{2}\Big)\|x_{n+1}-x_n\|^2+\frac{C_n}{2}\|x_{n+1}+x_{n-1}-2x_n\|^2\le g(y_n)+\Big(\frac{C_n}{2}-B_n\Big)\|x_n-x_{n-1}\|^2.$$

Note that $A_n-\frac{C_n}{2}=\delta_{n+1}$, and let us denote $\Delta_n=B_n+A_{n-1}-\frac{C_{n-1}}{2}-\frac{C_n}{2}$. Consequently, the following inequality holds:

$$\frac{C_n}{2}\|x_{n+1}+x_{n-1}-2x_n\|^2+\Delta_n\|x_n-x_{n-1}\|^2\le\big(g(y_n)+\delta_n\|x_n-x_{n-1}\|^2\big)-\big(g(y_{n+1})+\delta_{n+1}\|x_{n+1}-x_n\|^2\big). \tag{12}$$

Since $0<\beta<1$ and $0<s<\frac{2(1-\beta)}{L_g}$, we have

$$\lim_{n\to+\infty}A_n=\frac{(2-sL_g)(\beta+1)^2-2\beta^2-2\beta}{2s}>0,\qquad\lim_{n\to+\infty}B_n=\frac{(2-sL_g)\beta^2}{2s}>0,\qquad\lim_{n\to+\infty}C_n=\frac{(2-sL_g)(\beta^2+\beta)-\beta^2}{s}>0,$$

$$\lim_{n\to+\infty}\Delta_n=\frac{2-sL_g-2\beta}{2s}>0\quad\text{and}\quad\lim_{n\to+\infty}\delta_n=\frac{2-\beta^2-sL_g(\beta+1)}{2s}>0.$$

Hence, there exist $N\in\mathbb{N}$ and $C>0$, $D>0$ such that for all $n\ge N$ one has $C_n\ge C$, $\Delta_n\ge D$ and $\delta_n>0$. In view of (12), the sequence $g(y_n)+\delta_n\|x_n-x_{n-1}\|^2$ is therefore decreasing for $n\ge N$, which proves (i).

Assume now that $g$ is bounded from below. By using (12) again, we obtain

$$0\le\frac{C}{2}\|x_{n+1}+x_{n-1}-2x_n\|^2+D\|x_n-x_{n-1}\|^2\le\big(g(y_n)+\delta_n\|x_n-x_{n-1}\|^2\big)-\big(g(y_{n+1})+\delta_{n+1}\|x_{n+1}-x_n\|^2\big)$$

for all $n\ge N$, or, more conveniently,

$$0\le D\|x_n-x_{n-1}\|^2\le\big(g(y_n)+\delta_n\|x_n-x_{n-1}\|^2\big)-\big(g(y_{n+1})+\delta_{n+1}\|x_{n+1}-x_n\|^2\big) \tag{13}$$

for all $n\ge N$. Let $r>N$.
By summing up (13) for $n=N,\dots,r$ we have

$$D\sum_{n=N}^{r}\|x_n-x_{n-1}\|^2\le\big(g(y_N)+\delta_N\|x_N-x_{N-1}\|^2\big)-\big(g(y_{r+1})+\delta_{r+1}\|x_{r+1}-x_r\|^2\big).$$

Since $\delta_{r+1}>0$, this leads to

$$g(y_{r+1})+D\sum_{n=N}^{r}\|x_n-x_{n-1}\|^2\le g(y_N)+\delta_N\|x_N-x_{N-1}\|^2. \tag{14}$$

Now, taking into account that $g$ is bounded from below, by letting $r\to+\infty$ we obtain $\sum_{n=N}^{\infty}\|x_n-x_{n-1}\|^2<+\infty$, which proves (iii).

The latter relation also shows that $\lim_{n\to+\infty}\|x_n-x_{n-1}\|=0$, hence $\lim_{n\to+\infty}\delta_n\|x_n-x_{n-1}\|^2=0$. Since $g$ is bounded from below, the sequence $g(y_n)+\delta_n\|x_n-x_{n-1}\|^2$ is bounded from below. On the other hand, by (i) this sequence is decreasing for $n\ge N$. Hence the following limit exists:

$$\lim_{n\to+\infty}\big(g(y_n)+\delta_n\|x_n-x_{n-1}\|^2\big)\in\mathbb{R}.\qquad ∎$$

Remark 3
Observe that conclusion (iii) of Theorem 2 assures that the sequence $(x_n-x_{n-1})_{n\in\mathbb{N}}\in\ell^2$, in particular

$$\lim_{n\to+\infty}(x_n-x_{n-1})=0. \tag{15}$$

Let us denote by $\omega((x_n)_{n\in\mathbb{N}})$ the set of cluster points of the sequence $(x_n)_{n\in\mathbb{N}}$. Denote by $\operatorname{crit}(g)=\{x\in\mathbb{R}^m:\nabla g(x)=0\}$ the set of critical points of $g$.

In the following result we use the distance function to a set, defined for $A\subseteq\mathbb{R}^n$ as $\operatorname{dist}(x,A)=\inf_{y\in A}\|x-y\|$ for all $x\in\mathbb{R}^n$.

Lemma 4
In the settings of problem (1), for some starting points $x_0=y_0\in\mathbb{R}^m$, consider the sequences $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$ generated by Algorithm (2). Assume that $g$ is bounded from below and consider the function

$$H:\mathbb{R}^m\times\mathbb{R}^m\to\mathbb{R},\quad H(x,y)=g(x)+\frac12\|y-x\|^2.$$

Consider further the sequence $u_n=\sqrt{2\delta_n}\,(x_n-x_{n-1})+y_n$ for all $n\in\mathbb{N}$, where $\delta_n$ was defined in Theorem 2. Thus $H(y_n,u_n)=g(y_n)+\delta_n\|x_n-x_{n-1}\|^2$. Then, the following statements hold true.

(i) $\omega((u_n)_{n\in\mathbb{N}})=\omega((y_n)_{n\in\mathbb{N}})=\omega((x_n)_{n\in\mathbb{N}})\subseteq\operatorname{crit}g$;

(ii) The limit $\lim_{n\to+\infty}H(y_n,u_n)$ exists and is finite;

(iii) $\omega((y_n,u_n)_{n\in\mathbb{N}})\subseteq\operatorname{crit}H=\{(x,x)\in\mathbb{R}^m\times\mathbb{R}^m:x\in\operatorname{crit}g\}$;

(iv) $\|\nabla H(y_n,u_n)\|\le\frac1s\|x_{n+1}-x_n\|+\Big(\frac{\beta n}{s(n+\alpha)}+2\sqrt{2\delta_n}\Big)\|x_n-x_{n-1}\|$ for all $n\in\mathbb{N}$;

(v) $\|\nabla H(y_n,u_n)\|^2\le\frac{2}{s^2}\|x_{n+1}-x_n\|^2+2\Big(\Big(\frac{\beta n}{s(n+\alpha)}-\sqrt{2\delta_n}\Big)^2+\delta_n\Big)\|x_n-x_{n-1}\|^2$ for all $n\in\mathbb{N}$;

(vi) $H$ is finite and constant on $\omega((y_n,u_n)_{n\in\mathbb{N}})$.

Assume that $(x_n)_{n\in\mathbb{N}}$ is bounded. Then,

(vii) $\omega((y_n,u_n)_{n\in\mathbb{N}})$ is nonempty and compact;

(viii) $\lim_{n\to+\infty}\operatorname{dist}\big((y_n,u_n),\omega((y_n,u_n)_{n\in\mathbb{N}})\big)=0$.

Proof. (i) Let $\bar x\in\omega((x_n)_{n\in\mathbb{N}})$. Then, there exists a subsequence $(x_{n_k})_{k\in\mathbb{N}}$ of $(x_n)_{n\in\mathbb{N}}$ such that $\lim_{k\to+\infty}x_{n_k}=\bar x$. By (15), $\lim_{n\to+\infty}(x_n-x_{n-1})=0$. Moreover, the sequences $(\sqrt{2\delta_n})_{n\in\mathbb{N}}$ and $\big(\frac{\beta n}{n+\alpha}\big)_{n\in\mathbb{N}}$ converge. We therefore obtain $\lim_{k\to+\infty}y_{n_k}=\lim_{k\to+\infty}u_{n_k}=\lim_{k\to+\infty}x_{n_k}=\bar x$. This shows that $\omega((x_n)_{n\in\mathbb{N}})\subseteq\omega((u_n)_{n\in\mathbb{N}})$ and $\omega((x_n)_{n\in\mathbb{N}})\subseteq\omega((y_n)_{n\in\mathbb{N}})$.

Further, from (2), the continuity of $\nabla g$ and (15), we obtain that

$$\nabla g(\bar x)=\lim_{k\to+\infty}\nabla g(y_{n_k})=\frac1s\lim_{k\to+\infty}(y_{n_k}-x_{n_k+1})=\frac1s\lim_{k\to+\infty}\Big[(x_{n_k}-x_{n_k+1})+\frac{\beta n_k}{n_k+\alpha}(x_{n_k}-x_{n_k-1})\Big]=0.$$

Hence $\omega((x_n)_{n\in\mathbb{N}})\subseteq\operatorname{crit}g$. Conversely, if $u\in\omega((u_n)_{n\in\mathbb{N}})$, then (15) implies that $u\in\omega((y_n)_{n\in\mathbb{N}})$ and $u\in\omega((x_n)_{n\in\mathbb{N}})$.
Hence, $\omega((u_n)_{n\in\mathbb{N}})=\omega((y_n)_{n\in\mathbb{N}})=\omega((x_n)_{n\in\mathbb{N}})\subseteq\operatorname{crit}g$.

(ii) is nothing else than (ii) in Theorem 2.

For (iii), observe that $\nabla H(x,y)=(\nabla g(x)+x-y,\,y-x)$. Hence $\nabla H(x,y)=0$ leads to $x=y$ and $\nabla g(x)=0$. Consequently, $\operatorname{crit}H=\{(x,x)\in\mathbb{R}^m\times\mathbb{R}^m:x\in\operatorname{crit}g\}$. Further, consider $(y,u)\in\omega((y_n,u_n)_{n\in\mathbb{N}})$. Then there exists a subsequence $(y_{n_k},u_{n_k})_{k\in\mathbb{N}}$ of $(y_n,u_n)_{n\in\mathbb{N}}$ such that

$$(y,u)=\lim_{k\to+\infty}(y_{n_k},u_{n_k})=\lim_{k\to+\infty}(x_{n_k},x_{n_k})=(\bar x,\bar x).$$

Hence, $u=y=\bar x\in\omega((x_n)_{n\in\mathbb{N}})\subseteq\operatorname{crit}g$ and $(\bar x,\bar x)\in\operatorname{crit}H$.

(iv) By using the 1-norm of $\mathbb{R}^m\times\mathbb{R}^m$ and (2), for every $n\in\mathbb{N}$ we have

$$\|\nabla H(y_n,u_n)\|\le\|\nabla H(y_n,u_n)\|_1=\|\nabla g(y_n)+y_n-u_n\|+\|u_n-y_n\|\le\|\nabla g(y_n)\|+2\big\|\sqrt{2\delta_n}\,(x_n-x_{n-1})\big\|$$

$$=\frac1s\Big\|\Big(x_n+\frac{\beta n}{n+\alpha}(x_n-x_{n-1})\Big)-x_{n+1}\Big\|+2\sqrt{2\delta_n}\,\|x_n-x_{n-1}\|\le\frac1s\|x_{n+1}-x_n\|+\Big(\frac{\beta n}{s(n+\alpha)}+2\sqrt{2\delta_n}\Big)\|x_n-x_{n-1}\|.$$

(v) We use the Euclidean norm of $\mathbb{R}^m\times\mathbb{R}^m$, that is $\|(x,y)\|=\sqrt{\|x\|^2+\|y\|^2}$ for all $(x,y)\in\mathbb{R}^m\times\mathbb{R}^m$. We have:

$$\|\nabla H(y_n,u_n)\|^2=\|\nabla g(y_n)+y_n-u_n\|^2+\|u_n-y_n\|^2=\Big\|\frac1s(x_n-x_{n+1})+\Big(\frac{\beta n}{s(n+\alpha)}-\sqrt{2\delta_n}\Big)(x_n-x_{n-1})\Big\|^2+2\delta_n\|x_n-x_{n-1}\|^2$$

$$\le\frac{2}{s^2}\|x_{n+1}-x_n\|^2+2\Big(\Big(\frac{\beta n}{s(n+\alpha)}-\sqrt{2\delta_n}\Big)^2+\delta_n\Big)\|x_n-x_{n-1}\|^2$$

for all $n\in\mathbb{N}$.

(vi) follows directly from (ii).

Assume now that $(x_n)_{n\in\mathbb{N}}$ is bounded, and let us prove (vii) (see also [14]). Obviously $(y_n,u_n)_{n\in\mathbb{N}}$ is bounded. Hence, by the Bolzano-Weierstrass theorem, $\omega((y_n,u_n)_{n\in\mathbb{N}})$ (and also $\omega((x_n)_{n\in\mathbb{N}})$) is nonempty. It remains to show that $\omega((y_n,u_n)_{n\in\mathbb{N}})$ is closed.
From (i) and the proof of (iii) we have

$$\omega((y_n,u_n)_{n\in\mathbb{N}})=\{(x,x)\in\mathbb{R}^m\times\mathbb{R}^m:x\in\omega((x_n)_{n\in\mathbb{N}})\}. \tag{16}$$

Hence, it is enough to show that $\omega((x_n)_{n\in\mathbb{N}})$ is closed. Let $(\bar x_p)_{p\in\mathbb{N}}\subseteq\omega((x_n)_{n\in\mathbb{N}})$ and assume that $\lim_{p\to+\infty}\bar x_p=x^*$. We show that $x^*\in\omega((x_n)_{n\in\mathbb{N}})$. For every $p\in\mathbb{N}$ there exists a sequence of natural numbers $n_k^p\to+\infty$ as $k\to+\infty$ such that $\lim_{k\to+\infty}x_{n_k^p}=\bar x_p$. Hence, for every $p\in\mathbb{N}$ we can choose $k_p$ such that $n_{k_p}^p>p$ and $\|x_{n_{k_p}^p}-\bar x_p\|<\frac1p$. Obviously $n_{k_p}^p\to+\infty$ as $p\to+\infty$. Let $\varepsilon>0$. Since $\lim_{p\to+\infty}\bar x_p=x^*$, there exists $P(\varepsilon)\in\mathbb{N}$, $P(\varepsilon)>\frac{2}{\varepsilon}$, such that $\|\bar x_p-x^*\|<\frac{\varepsilon}{2}$ for every $p\ge P(\varepsilon)$. Then, for every $p\ge P(\varepsilon)$,

$$\|x_{n_{k_p}^p}-x^*\|\le\|x_{n_{k_p}^p}-\bar x_p\|+\|\bar x_p-x^*\|<\frac1p+\frac{\varepsilon}{2}<\varepsilon.$$

Hence $\lim_{p\to+\infty}x_{n_{k_p}^p}=x^*$, thus $x^*\in\omega((x_n)_{n\in\mathbb{N}})$.

(viii) Assume the contrary. Then there exist $\varepsilon>0$ and a subsequence $(y_{n_k},u_{n_k})_{k\in\mathbb{N}}$ such that $\operatorname{dist}\big((y_{n_k},u_{n_k}),\omega((y_n,u_n)_{n\in\mathbb{N}})\big)\ge\varepsilon$ for all $k$. Since $(y_n,u_n)_{n\in\mathbb{N}}$ is bounded, this subsequence has a further subsequence converging to some point of $\omega((y_n,u_n)_{n\in\mathbb{N}})$, a contradiction. Hence

$$\lim_{n\to+\infty}\operatorname{dist}\big((y_n,u_n),\omega((y_n,u_n)_{n\in\mathbb{N}})\big)=0.\qquad ∎$$

Remark 5 We emphasize that if $g$ is coercive, that is $\lim_{\|x\|\to+\infty}g(x)=+\infty$, then $g$ is bounded from below. Moreover, the sequences $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$ generated by (2) are bounded.

Indeed, $g$ is bounded from below, being a continuous and coercive function (see [24]). According to Theorem 2, the sequence $D\sum_{n=N}^{r}\|x_n-x_{n-1}\|^2$ is convergent, hence bounded. Consequently, (14) shows that for every $r>N$ ($N$ is defined in the hypothesis of Theorem 2), $y_r$ is contained in a lower level set of $g$, which is bounded. Since $(y_n)_{n\in\mathbb{N}}$ is bounded, taking into account (15), it follows that $(x_n)_{n\in\mathbb{N}}$ is also bounded.

In order to continue our analysis we need the concept of a KL function. For $\eta\in(0,+\infty]$, we denote by $\Theta_\eta$ the class of concave and continuous functions $\varphi:[0,\eta)\to[0,+\infty)$ such that $\varphi(0)=0$, $\varphi$ is continuously differentiable on $(0,\eta)$, continuous at $0$, and $\varphi'(s)>0$ for all $s\in(0,\eta)$.

Definition 1 (Kurdyka-Łojasiewicz property) Let $f:\mathbb{R}^n\to\mathbb{R}$ be a differentiable function.
We say that $f$ satisfies the Kurdyka-Łojasiewicz (KL) property at $\bar x\in\mathbb{R}^n$ if there exist $\eta\in(0,+\infty]$, a neighborhood $U$ of $\bar x$ and a function $\varphi\in\Theta_\eta$ with the following property. For all $x$ in the intersection

$$U\cap\{x\in\mathbb{R}^n:f(\bar x)<f(x)<f(\bar x)+\eta\}$$

the inequality

$$\varphi'(f(x)-f(\bar x))\,\|\nabla f(x)\|\ge 1$$

holds. If $f$ satisfies the KL property at each point in $\mathbb{R}^n$, then $f$ is called a KL function.

The origins of this notion go back to the pioneering work of Łojasiewicz [19]. He proved that for a real-analytic function $f:\mathbb{R}^n\to\mathbb{R}$ and a critical point $\bar x\in\mathbb{R}^n$ (that is, $\nabla f(\bar x)=0$), there exists $\theta\in[1/2,1)$ such that the function $|f-f(\bar x)|^{\theta}\|\nabla f\|^{-1}$ is bounded around $\bar x$. This corresponds to the situation when $\varphi(s)=C(1-\theta)^{-1}s^{1-\theta}$. The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [17] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [2, 9–11].

The class of KL functions includes semi-algebraic, real sub-analytic, semiconvex, and uniformly convex functions. It also includes convex functions satisfying a growth condition. We refer the reader to [1–3, 8–11] and the references therein for more details regarding all the classes mentioned above and for illustrating examples.

An important role in our convergence analysis will be played by the following uniformized KL property given in [8, Lemma 6].

Lemma 6
Let $\Omega\subseteq\mathbb{R}^n$ be a compact set and let $f:\mathbb{R}^n\to\mathbb{R}$ be a differentiable function. Assume that $f$ is constant on $\Omega$ and $f$ satisfies the KL property at each point of $\Omega$. Then there exist $\varepsilon,\eta>0$ and $\varphi\in\Theta_\eta$ such that for all $\bar x\in\Omega$ and for all $x$ in the intersection

$$\{x\in\mathbb{R}^n:\operatorname{dist}(x,\Omega)<\varepsilon\}\cap\{x\in\mathbb{R}^n:f(\bar x)<f(x)<f(\bar x)+\eta\} \tag{17}$$

the following inequality holds:

$$\varphi'(f(x)-f(\bar x))\,\|\nabla f(x)\|\ge 1. \tag{18}$$

The following convergence result is the first main result of the paper.

Theorem 7
In the settings of problem (1), for some starting points $x_0=y_0\in\mathbb{R}^m$, consider the sequences $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$ generated by Algorithm (2). Assume that $g$ is bounded from below and consider the function

$$H:\mathbb{R}^m\times\mathbb{R}^m\to\mathbb{R},\quad H(x,y)=g(x)+\frac12\|y-x\|^2.$$

Assume that $(x_n)_{n\in\mathbb{N}}$ is bounded and $H$ is a KL function. Then the following statements are true:

(a) $\sum_{n\ge1}\|x_n-x_{n-1}\|<+\infty$;

(b) there exists $\bar x\in\operatorname{crit}(g)$ such that $\lim_{n\to+\infty}x_n=\bar x$.

Proof.
Consider the sequence $u_n=\sqrt{2\delta_n}\,(x_n-x_{n-1})+y_n$, $n\in\mathbb{N}$, defined in the hypotheses of Lemma 4. Furthermore, consider $(\bar x,\bar x)\in\omega((y_n,u_n)_{n\in\mathbb{N}})$. We have $H(y_n,u_n)=g(y_n)+\delta_n\|x_n-x_{n-1}\|^2$. Hence, by Theorem 2 and Lemma 4, the sequence $H(y_n,u_n)$ is decreasing for all $n\ge N$, where $N$ was defined in Theorem 2. Further, $\bar x\in\operatorname{crit}g$ and $\lim_{n\to+\infty}H(y_n,u_n)=H(\bar x,\bar x)$. We divide the proof into two cases.
Case I.
There exists $\bar n\ge N$, $\bar n\in\mathbb{N}$, such that $H(y_{\bar n},u_{\bar n})=H(\bar x,\bar x)$. Since $H(y_n,u_n)$ is decreasing for all $n\ge N$ and $\lim_{n\to+\infty}H(y_n,u_n)=H(\bar x,\bar x)$, we obtain that $H(y_n,u_n)=H(\bar x,\bar x)$ for all $n\ge\bar n$. The latter relation combined with (13) leads to

$$0\le D\|x_n-x_{n-1}\|^2\le H(y_n,u_n)-H(y_{n+1},u_{n+1})=H(\bar x,\bar x)-H(\bar x,\bar x)=0$$

for all $n\ge\bar n$. Hence $(x_n)_{n\ge\bar n}$ is constant and the conclusion follows.

Case II.
For every $n\ge N$ one has $H(y_n,u_n)>H(\bar x,\bar x)$. Let $\Omega=\omega((y_n,u_n)_{n\in\mathbb{N}})$. Then, according to Lemma 4, $\Omega$ is nonempty and compact and $H$ is constant on $\Omega$. Since $H$ is KL, Lemma 6 yields $\varepsilon,\eta>0$ and $\varphi\in\Theta_\eta$ with the following property. For all $(z,w)$ in the intersection

$$\{(z,w)\in\mathbb{R}^m\times\mathbb{R}^m:\operatorname{dist}((z,w),\Omega)<\varepsilon\}\cap\{(z,w)\in\mathbb{R}^m\times\mathbb{R}^m:H(\bar x,\bar x)<H(z,w)<H(\bar x,\bar x)+\eta\}$$

one has

$$\varphi'(H(z,w)-H(\bar x,\bar x))\,\|\nabla H(z,w)\|\ge 1.$$

Since $\lim_{n\to+\infty}\operatorname{dist}((y_n,u_n),\Omega)=0$, there exists $n_1\in\mathbb{N}$ such that $\operatorname{dist}((y_n,u_n),\Omega)<\varepsilon$ for all $n\ge n_1$. Moreover, $\lim_{n\to+\infty}H(y_n,u_n)=H(\bar x,\bar x)$ and $H(y_n,u_n)>H(\bar x,\bar x)$ for all $n\ge N$. Hence there exists $n_2\ge N$ such that $H(\bar x,\bar x)<H(y_n,u_n)<H(\bar x,\bar x)+\eta$ for all $n\ge n_2$. Consequently, for all $n\ge\bar n=\max(n_1,n_2)$ we have

$$\varphi'(H(y_n,u_n)-H(\bar x,\bar x))\cdot\|\nabla H(y_n,u_n)\|\ge 1.$$

Since $\varphi$ is concave, for all $n\ge\bar n$ we have

$$\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\ge\varphi'(H(y_n,u_n)-H(\bar x,\bar x))\cdot\big(H(y_n,u_n)-H(y_{n+1},u_{n+1})\big),$$

hence

$$\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\ge\frac{H(y_n,u_n)-H(y_{n+1},u_{n+1})}{\|\nabla H(y_n,u_n)\|}$$

for all $n\ge\bar n$. Now, from (13) and Lemma 4 (iv) we obtain

$$\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\ge\frac{D\|x_n-x_{n-1}\|^2}{\frac1s\|x_{n+1}-x_n\|+\Big(\frac{\beta n}{s(n+\alpha)}+2\sqrt{2\delta_n}\Big)\|x_n-x_{n-1}\|} \tag{19}$$

for all $n\ge\bar n$. Recall that $\lim_{n\to+\infty}\delta_n=\frac{2-\beta^2-sL_g(\beta+1)}{2s}>0$ and $\lim_{n\to+\infty}\frac{\beta n}{s(n+\alpha)}=\frac{\beta}{s}\ge 0$. Hence there exist $\bar N\in\mathbb{N}$, $\bar N\ge\bar n$, and $M>0$ such that

$$\max\Big(\frac1s,\ \frac{\beta n}{s(n+\alpha)}+2\sqrt{2\delta_n}\Big)\le M\quad\text{for all }n\ge\bar N.$$
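The limits used here and in the proof of Theorem 2 can be cross-checked numerically. The plain-Python sketch below evaluates $A_n$, $B_n$, $C_n$ from the proof of Theorem 2 at a large $n$ and compares them with the closed-form limits. It then checks that the limits are positive inside the admissible step range. It also shows that the limit of $\Delta_n$ becomes negative once $s$ exceeds $\frac{2(1-\beta)}{L_g}$. The function names and the sample values of $\alpha$, $\beta$, $L_g$ are illustrative assumptions.

```python
def coefficients(n, alpha, beta, s, L):
    """(A_n, B_n, C_n) as they appear in the proof of Theorem 2."""
    c = 2 - s * L
    a = ((1 + beta) * n + alpha + beta + 1) / (n + alpha + 1)  # factor of x_{n+1}-x_n in y_{n+1}-y_n
    b = beta * n / (n + alpha)                                   # inertial coefficient at step n
    d = beta * (n + 1) / (n + alpha + 1)                         # inertial coefficient at step n+1
    A = (c * a * a - 2 * d * a) / (2 * s)
    B = c * b * b / (2 * s)
    C = (c * a * b - b * d) / s
    return A, B, C


def limits(beta, s, L):
    """Closed-form limits of A_n, B_n, C_n, Delta_n and delta_n as n -> infinity."""
    c = 2 - s * L
    A = (c * (beta + 1) ** 2 - 2 * beta ** 2 - 2 * beta) / (2 * s)
    B = c * beta ** 2 / (2 * s)
    C = (c * (beta ** 2 + beta) - beta ** 2) / s
    Delta = (2 - s * L - 2 * beta) / (2 * s)
    delta = (2 - beta ** 2 - s * L * (beta + 1)) / (2 * s)
    return A, B, C, Delta, delta


if __name__ == "__main__":
    alpha, beta, L = 3.0, 0.5, 3.0
    s = 0.9 * 2 * (1 - beta) / L  # inside the admissible range
    print(coefficients(10 ** 8, alpha, beta, s, L))
    print(limits(beta, s, L))
```

The closed forms satisfy $\lim\Delta_n=\lim B_n+\lim A_n-\lim C_n$ and $\lim\delta_n=\lim A_n-\frac12\lim C_n$. These identities follow from the definitions $\Delta_n=B_n+A_{n-1}-\frac{C_{n-1}}{2}-\frac{C_n}{2}$ and $\delta_n=A_{n-1}-\frac12C_{n-1}$, and the test below checks them as well.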
Hence, (19) becomes

$$\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\ge\frac{D\|x_n-x_{n-1}\|^2}{M\big(\|x_{n+1}-x_n\|+\|x_n-x_{n-1}\|\big)} \tag{20}$$

for all $n\ge\bar N$.
Consequently,

$$\|x_n-x_{n-1}\|\le\sqrt{\frac{M}{D}\Big(\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\Big)\cdot\big(\|x_{n+1}-x_n\|+\|x_n-x_{n-1}\|\big)}$$

for all $n\ge\bar N$.
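The constant $\frac{9M}{4D}$ that appears in (21) can be traced to a particular weighting of the arithmetic-geometric mean inequality in the next step. The weights below are a reconstruction, chosen because they reproduce that constant; they are not stated explicitly in the text. For brevity, write (shorthand introduced only here) $\Phi_n=\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))$, $a=\frac{M}{D}\Phi_n$ and $b=\|x_{n+1}-x_n\|+\|x_n-x_{n-1}\|$. Then

$$\sqrt{ab}=\sqrt{\frac{2b}{3}\cdot\frac{3a}{2}}\le\frac12\Big(\frac{2b}{3}+\frac{3a}{2}\Big)=\frac{b}{3}+\frac{3a}{4}.$$

Hence $\|x_n-x_{n-1}\|\le\frac{b}{3}+\frac{3M}{4D}\Phi_n$. Multiplying by $3$ and subtracting $\|x_{n+1}-x_n\|+\|x_n-x_{n-1}\|$ from both sides gives $2\|x_n-x_{n-1}\|-\|x_{n+1}-x_n\|\le\frac{9M}{4D}\Phi_n$.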
By using the arithmetic-geometric mean inequality we have

$$\sqrt{\frac{M}{D}\Big(\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\Big)\cdot\big(\|x_{n+1}-x_n\|+\|x_n-x_{n-1}\|\big)}$$

$$\le\frac{\|x_{n+1}-x_n\|+\|x_n-x_{n-1}\|}{3}+\frac{3M}{4D}\Big(\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\Big)$$

for all $n\ge\bar N$.
Hence

$$\|x_n-x_{n-1}\|\le\frac{\|x_{n+1}-x_n\|+\|x_n-x_{n-1}\|}{3}+\frac{3M}{4D}\Big(\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\Big)$$

for all $n\ge\bar N$, which leads to

$$2\|x_n-x_{n-1}\|-\|x_{n+1}-x_n\|\le\frac{9M}{4D}\Big(\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{n+1},u_{n+1})-H(\bar x,\bar x))\Big) \tag{21}$$

for all $n\ge\bar N$.
Let $P>\bar N$. By summing up (21) from $\bar N$ to $P$ we obtain

$$\sum_{n=\bar N}^{P}\|x_n-x_{n-1}\|\le-\|x_{\bar N}-x_{\bar N-1}\|+\|x_{P+1}-x_P\|+\frac{9M}{4D}\Big(\varphi(H(y_{\bar N},u_{\bar N})-H(\bar x,\bar x))-\varphi(H(y_{P+1},u_{P+1})-H(\bar x,\bar x))\Big).$$

Now let $P\to+\infty$. Using $\varphi(0)=0$, the continuity of $\varphi$ at $0$, and (15), we obtain

$$\sum_{n=\bar N}^{\infty}\|x_n-x_{n-1}\|\le-\|x_{\bar N}-x_{\bar N-1}\|+\frac{9M}{4D}\varphi(H(y_{\bar N},u_{\bar N})-H(\bar x,\bar x))<+\infty.$$

Hence $\sum_{n\ge1}\|x_n-x_{n-1}\|<+\infty$, which is exactly (a).

Obviously the sequence $S_n=\sum_{k=1}^{n}\|x_k-x_{k-1}\|$ is Cauchy. Hence, for all $\varepsilon>0$ there exists $N_\varepsilon\in\mathbb{N}$ such that $S_{n+p}-S_n\le\varepsilon$ for all $n\ge N_\varepsilon$ and all $p\in\mathbb{N}$. But

$$S_{n+p}-S_n=\sum_{k=n+1}^{n+p}\|x_k-x_{k-1}\|\ge\Big\|\sum_{k=n+1}^{n+p}(x_k-x_{k-1})\Big\|=\|x_{n+p}-x_n\|.$$

Hence the sequence $(x_n)_{n\in\mathbb{N}}$ is Cauchy, and consequently it is convergent. Let $\lim_{n\to+\infty}x_n=\bar x$. According to Lemma 4 (i), one has $\{\bar x\}=\omega((x_n)_{n\in\mathbb{N}})\subseteq\operatorname{crit}g$, which proves (b). ∎

Remark 8
The class of semi-algebraic functions is closed under addition (see for example [8]), and $(x,y)\mapsto\frac12\|x-y\|^2$ is semi-algebraic. Hence $H$ is semi-algebraic, and therefore a KL function, whenever $g$ is semi-algebraic. Consequently, the conclusion of the previous theorem holds if the condition that $H$ is a KL function is replaced by the assumption that $g$ is semi-algebraic.

Remark 9
Note that, according to Remark 5, the conclusion of Theorem 7 remains valid if we replace in its hypotheses the conditions that $g$ is bounded from below and $(x_n)_{n\in\mathbb{N}}$ is bounded by the condition that $g$ is coercive.

Remark 10
Note that under the assumptions of Theorem 7 we have $\lim_{n\to+\infty}y_n=\bar x$ and $\lim_{n\to+\infty}g(x_n)=\lim_{n\to+\infty}g(y_n)=g(\bar x)$.

3 Convergence rates
In this section we assume that the regularized function $H$ satisfies the Łojasiewicz property. As noted in the previous section, this corresponds to a particular choice of the desingularizing function $\varphi$ (see [1, 9, 19]).

Definition 2
Let $f:\mathbb{R}^n\to\mathbb{R}$ be a differentiable function. The function $f$ is said to fulfill the Łojasiewicz property if for every $\bar x\in\operatorname{crit}f$ there exist $K,\varepsilon>0$ and $\theta\in(0,1)$ such that

$$|f(x)-f(\bar x)|^{\theta}\le K\|\nabla f(x)\|\quad\text{for every }x\text{ fulfilling }\|x-\bar x\|<\varepsilon.$$

The number $\theta$ is called the Łojasiewicz exponent of $f$ at the critical point $\bar x$. This corresponds to the case when the desingularizing function $\varphi$ has the form $\varphi(t)=\frac{K}{1-\theta}t^{1-\theta}$.

In the following theorems we provide convergence rates for the sequences generated by (2) and for the function values, in terms of the Łojasiewicz exponent of $H$ (see also [1, 9]). The forthcoming results remain valid if one replaces, in their hypotheses, the conditions that $g$ is bounded from below and $(x_n)_{n\in\mathbb{N}}$ is bounded by the condition that $g$ is coercive.

Theorem 11
In the settings of problem (1), consider the sequences $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$ generated by Algorithm (2). Assume that $g$ is bounded from below and that $(x_n)_{n\in\mathbb{N}}$ is bounded. Let $\bar x\in\operatorname{crit}(g)$ be such that $\lim_{n\to+\infty}x_n=\bar x$, and suppose that

$$H:\mathbb{R}^m\times\mathbb{R}^m\to\mathbb{R},\quad H(x,y)=g(x)+\frac12\|x-y\|^2$$

fulfills the Łojasiewicz property at $(\bar x,\bar x)\in\operatorname{crit}H$ with Łojasiewicz exponent $\theta\in\big(0,\frac12\big]$. Then, for every $p>0$ there exist $a_1,a_2,a_3,a_4>0$ and $\bar k\in\mathbb{N}$ such that the following statements hold true:

(a$_1$) $g(y_n)-g(\bar x)\le\frac{a_1}{n^p}$ for every $n>\bar k$;

(a$_2$) $g(x_n)-g(\bar x)\le\frac{a_2}{n^p}$ for every $n>\bar k$;

(a$_3$) $\|x_n-\bar x\|\le\frac{a_3}{n^{p/2}}$ for every $n>\bar k$;

(a$_4$) $\|y_n-\bar x\|\le\frac{a_4}{n^{p/2}}$ for all $n>\bar k$.

Proof.
As we have seen in the proof of Theorem 7, suppose there exists $\bar n\ge N$, $\bar n\in\mathbb{N}$ (where $N$ was defined in Theorem 2), such that $H(y_{\bar n},u_{\bar n})=H(\bar x,\bar x)$. Then $H(y_n,u_n)=H(\bar x,\bar x)$ for all $n\ge\bar n$, and $(x_n)_{n\ge\bar n}$ is constant. Consequently, $(y_n)_{n\ge\bar n}$ is also constant and the conclusion of the theorem is straightforward.

Hence, in what follows we assume that $H(y_n,u_n)>H(\bar x,\bar x)$ for all $n\ge N$. Let us fix $p>0$. For simplicity, let us denote $r_n=H(y_n,u_n)-H(\bar x,\bar x)>0$ for all $n\ge N$, and note that $r_n=g(y_n)-g(\bar x)+\delta_n\|x_n-x_{n-1}\|^2$. From (12) we have

$$\Delta_n\|x_n-x_{n-1}\|^2\le r_n-r_{n+1}\quad\text{for all }n\ge N.$$

From Lemma 4 (v) we have

$$\|\nabla H(y_n,u_n)\|^2\le\frac{2}{s^2}\|x_{n+1}-x_n\|^2+2\Big(\Big(\frac{\beta n}{s(n+\alpha)}-\sqrt{2\delta_n}\Big)^2+\delta_n\Big)\|x_n-x_{n-1}\|^2$$

for all $n\in\mathbb{N}$. Let $S_n=2\Big(\Big(\frac{\beta n}{s(n+\alpha)}-\sqrt{2\delta_n}\Big)^2+\delta_n\Big)$ for all $n\in\mathbb{N}$. It follows that, for all $n\ge N$, one has

$$\|x_n-x_{n-1}\|^2\ge\frac{1}{S_n}\|\nabla H(y_n,u_n)\|^2-\frac{2}{s^2S_n}\|x_{n+1}-x_n\|^2\ge\frac{1}{S_n}\|\nabla H(y_n,u_n)\|^2-\frac{2}{s^2S_n\Delta_{n+1}}(r_{n+1}-r_{n+2}).$$

We now use the Łojasiewicz property of $H$ at $(\bar x,\bar x)\in\operatorname{crit}H$ and the fact that $\lim_{n\to+\infty}(y_n,u_n)=(\bar x,\bar x)$. They give $K,\varepsilon>0$ and $N_1\in\mathbb{N}$ such that $\|(y_n,u_n)-(\bar x,\bar x)\|<\varepsilon$ for all $n\ge N_1$. Consequently,

$$r_n-r_{n+1}\ge\frac{\Delta_n}{S_n}\|\nabla H(y_n,u_n)\|^2-\frac{2\Delta_n}{s^2S_n\Delta_{n+1}}(r_{n+1}-r_{n+2})\ge\frac{\Delta_n}{K^2S_n}r_n^{2\theta}-\frac{2\Delta_n}{s^2S_n\Delta_{n+1}}(r_{n+1}-r_{n+2})$$

$$\ge\frac{\Delta_n}{K^2S_n}r_{n+1}^{2\theta}-\frac{2\Delta_n}{s^2S_n\Delta_{n+1}}(r_{n+1}-r_{n+2})=\alpha_n r_{n+1}^{2\theta}-\beta_n(r_{n+1}-r_{n+2}), \tag{22}$$

where $\alpha_n=\frac{\Delta_n}{K^2S_n}$ and $\beta_n=\frac{2\Delta_n}{s^2S_n\Delta_{n+1}}$. The sequences $(\alpha_n)_{n\ge N_1}$ and $(\beta_n)_{n\ge N_1}$ are obviously convergent. Further, $\lim_{n\to+\infty}\alpha_n>0$ and $\lim_{n\to+\infty}\beta_n>0$.

Since $r_n\to 0$, there exists $N_2\in\mathbb{N}$, $N_2\ge N_1$, such that $0<r_n\le 1$ for all $n\ge N_2$. Because $0<2\theta\le 1$, this gives

$$r_{n+1}^{2\theta}\ge r_{n+1}\quad\text{for all }n\ge N_2. \tag{$*$}$$

Hence, by (22), $r_n\ge(\alpha_n-\beta_n+1)r_{n+1}+\beta_n r_{n+2}$ for all $n\ge N_2$.

For every $n>N_2$, let us define $\Xi_n=\frac{\beta_n n^p}{(n+1)^p-n^p}$. Since $p>0$, $\lim_{n\to+\infty}\Xi_n=+\infty$. Moreover,

$$\lim_{n\to+\infty}\Big((\alpha_n-\beta_n+1)-\Big(1+\frac{\beta_{n+1}}{\Xi_{n+1}}-\frac{\beta_{n-1}}{1+\frac{\beta_n}{\Xi_n}}\Big)\Big)=\lim_{n\to+\infty}\alpha_n>0.$$

Hence there exists $\bar k\in\mathbb{N}$, $\bar k>N_2$, such that for all $n\ge\bar k$ one has

$$\alpha_n-\beta_n+1\ge 1+\frac{\beta_{n+1}}{\Xi_{n+1}}-\frac{\beta_{n-1}}{1+\frac{\beta_n}{\Xi_n}}.$$

Consequently,

$$r_n\ge\Big(1+\frac{\beta_{n+1}}{\Xi_{n+1}}-\frac{\beta_{n-1}}{1+\frac{\beta_n}{\Xi_n}}\Big)r_{n+1}+\beta_n r_{n+2}\quad\text{for all }n\ge\bar k,$$

or, equivalently,

$$r_n+\frac{\beta_{n-1}}{1+\frac{\beta_n}{\Xi_n}}r_{n+1}\ge\Big(1+\frac{\beta_{n+1}}{\Xi_{n+1}}\Big)\Big(r_{n+1}+\frac{\beta_n}{1+\frac{\beta_{n+1}}{\Xi_{n+1}}}r_{n+2}\Big) \tag{23}$$

for all $n\ge\bar k$. Now (23) leads to

$$\prod_{k=\bar k}^{n}\Big(r_k+\frac{\beta_{k-1}}{1+\frac{\beta_k}{\Xi_k}}r_{k+1}\Big)\ge\prod_{k=\bar k}^{n}\Big(1+\frac{\beta_{k+1}}{\Xi_{k+1}}\Big)\prod_{k=\bar k}^{n}\Big(r_{k+1}+\frac{\beta_k}{1+\frac{\beta_{k+1}}{\Xi_{k+1}}}r_{k+2}\Big).$$

After simplifying, we get

$$\Big(r_{\bar k}+\frac{\beta_{\bar k-1}}{1+\frac{\beta_{\bar k}}{\Xi_{\bar k}}}r_{\bar k+1}\Big)\prod_{k=\bar k}^{n}\frac{1}{1+\frac{\beta_{k+1}}{\Xi_{k+1}}}\ge r_{n+1}+\frac{\beta_n}{1+\frac{\beta_{n+1}}{\Xi_{n+1}}}r_{n+2}. \tag{24}$$

But $\frac{\beta_{n+1}}{\Xi_{n+1}}=\frac{(n+2)^p}{(n+1)^p}-1$, hence

$$\prod_{k=\bar k}^{n}\frac{1}{1+\frac{\beta_{k+1}}{\Xi_{k+1}}}=\prod_{k=\bar k}^{n}\frac{(k+1)^p}{(k+2)^p}=\frac{(\bar k+1)^p}{(n+2)^p}.$$

Denote $a_1=\Big(r_{\bar k}+\frac{\beta_{\bar k-1}}{1+\frac{\beta_{\bar k}}{\Xi_{\bar k}}}r_{\bar k+1}\Big)(\bar k+1)^p$. Then

$$\frac{a_1}{(n+2)^p}\ge r_{n+1}+\frac{\beta_n}{1+\frac{\beta_{n+1}}{\Xi_{n+1}}}r_{n+2}\ge r_{n+1}.$$

Hence, for every $n>\bar k$,

$$\frac{a_1}{n^p}\ge\frac{a_1}{(n+1)^p}\ge r_n=g(y_n)-g(\bar x)+\delta_n\|x_n-x_{n-1}\|^2\ge g(y_n)-g(\bar x), \tag{25}$$

which is (a$_1$).

For (a$_2$) we start from Lemma 1 and (2), and we have

$$g(x_n)-g(y_n)\le\langle\nabla g(y_n),x_n-y_n\rangle+\frac{L_g}{2}\|x_n-y_n\|^2$$

$$=\frac1s\Big\langle(x_n-x_{n+1})+\frac{\beta n}{n+\alpha}(x_n-x_{n-1}),\,-\frac{\beta n}{n+\alpha}(x_n-x_{n-1})\Big\rangle+\frac{L_g}{2}\Big(\frac{\beta n}{n+\alpha}\Big)^2\|x_n-x_{n-1}\|^2$$

$$=-\Big(\frac{\beta n}{n+\alpha}\Big)^2\frac{2-sL_g}{2s}\|x_n-x_{n-1}\|^2+\frac1s\Big\langle x_{n+1}-x_n,\frac{\beta n}{n+\alpha}(x_n-x_{n-1})\Big\rangle.$$

We use the inequality $\langle X,Y\rangle\le\frac12\big(a^2\|X\|^2+\frac{1}{a^2}\|Y\|^2\big)$ for all $X,Y\in\mathbb{R}^m$, $a\in\mathbb{R}\setminus\{0\}$, with $a^2=\frac{1}{2-sL_g}$. It gives

$$\Big\langle x_{n+1}-x_n,\frac{\beta n}{n+\alpha}(x_n-x_{n-1})\Big\rangle\le\frac12\Big(\frac{1}{2-sL_g}\|x_{n+1}-x_n\|^2+(2-sL_g)\Big(\frac{\beta n}{n+\alpha}\Big)^2\|x_n-x_{n-1}\|^2\Big),$$

consequently

$$g(x_n)-g(y_n)\le\frac{1}{2s(2-sL_g)}\|x_{n+1}-x_n\|^2.$$

From (13) we have

$$\|x_n-x_{n-1}\|^2\le\frac1D\Big(\big(g(y_n)+\delta_n\|x_n-x_{n-1}\|^2\big)-\big(g(y_{n+1})+\delta_{n+1}\|x_{n+1}-x_n\|^2\big)\Big).$$

The sequence $\big(g(y_n)+\delta_n\|x_n-x_{n-1}\|^2\big)_{n\ge\bar k}$ is decreasing and has the limit $g(\bar x)$. Hence $g(y_{n+1})+\delta_{n+1}\|x_{n+1}-x_n\|^2\ge g(\bar x)$, and consequently

$$\|x_n-x_{n-1}\|^2\le\frac1D r_n. \tag{26}$$

Hence, for all $n\ge\bar k$ one has

$$g(x_n)-g(y_n)\le\frac{1}{2sD(2-sL_g)}r_{n+1}. \tag{27}$$

Now, the identity $g(x_n)-g(\bar x)=(g(x_n)-g(y_n))+(g(y_n)-g(\bar x))$ and (a$_1$) lead to

$$g(x_n)-g(\bar x)\le\frac{1}{2sD(2-sL_g)}r_{n+1}+\frac{a_1}{n^p}\quad\text{for every }n>\bar k.$$

Combined with the bound $r_{n+1}\le\frac{a_1}{(n+2)^p}$ obtained before (25), this gives

$$g(x_n)-g(\bar x)\le\frac{1}{2sD(2-sL_g)}\cdot\frac{a_1}{(n+2)^p}+\frac{a_1}{n^p}\le a_1\Big(1+\frac{1}{2sD(2-sL_g)}\Big)\frac{1}{n^p}=\frac{a_2}{n^p}\quad\text{for every }n>\bar k.$$

For (a$_3$), sum up (21) from $n\ge\bar k$ to $P>n$ and use the triangle inequality. This gives

$$\|x_P-x_{n-1}\|\le\sum_{k=n}^{P}\|x_k-x_{k-1}\|\le-\|x_n-x_{n-1}\|+\|x_{P+1}-x_P\|+\frac{9M}{4D}\Big(\varphi(H(y_n,u_n)-H(\bar x,\bar x))-\varphi(H(y_{P+1},u_{P+1})-H(\bar x,\bar x))\Big).$$

By letting $P\to+\infty$ we get

$$\|x_{n-1}-\bar x\|\le-\|x_n-x_{n-1}\|+\frac{9M}{4D}\varphi(H(y_n,u_n)-H(\bar x,\bar x))\le\frac{9M}{4D}\varphi(H(y_n,u_n)-H(\bar x,\bar x)).$$

But $\varphi(t)=\frac{K}{1-\theta}t^{1-\theta}$, hence

$$\|x_{n-1}-\bar x\|\le\frac{9MK}{4D(1-\theta)}\big(H(y_n,u_n)-H(\bar x,\bar x)\big)^{1-\theta}=M_1 r_n^{1-\theta}, \tag{28}$$

where $M_1=\frac{9MK}{4D(1-\theta)}$. By the choice of $N_2$ we have $0<r_n\le 1$. Combined with $\theta\in(0,\frac12]$, this gives $r_n^{1-\theta}\le\sqrt{r_n}$, so $\|x_{n-1}-\bar x\|\le M_1\sqrt{r_n}$. The conclusion now follows by (25), since

$$\|x_n-\bar x\|\le M_1\sqrt{r_{n+1}}\le M_1\sqrt{\frac{a_1}{n^p}}=\frac{a_3}{n^{p/2}}\quad\text{for every }n>\bar k,$$

where $a_3=M_1\sqrt{a_1}$.

Finally, for $n>\bar k$ we have

$$\|y_n-\bar x\|=\Big\|x_n+\frac{\beta n}{n+\alpha}(x_n-x_{n-1})-\bar x\Big\|\le\Big(1+\frac{\beta n}{n+\alpha}\Big)\|x_n-\bar x\|+\frac{\beta n}{n+\alpha}\|x_{n-1}-\bar x\|$$

$$\le\Big(1+\frac{\beta n}{n+\alpha}\Big)\frac{a_3}{n^{p/2}}+\frac{\beta n}{n+\alpha}\cdot\frac{a_3}{n^{p/2}}\le\Big(1+\frac{2\beta n}{n+\alpha}\Big)\frac{a_3}{n^{p/2}}.$$

Let $a_4=(1+2\beta)a_3$. Then $\|y_n-\bar x\|\le\frac{a_4}{n^{p/2}}$ for all $n>\bar k$, which proves (a$_4$). ∎

Remark 12
In the previous theorem we obtained convergence rates of order $p$ for every $p > 0$. This happened because in (24) we took $\frac{\beta_{n+1}}{\Xi_{n+1}} = \frac{(n+2)^p}{(n+1)^p} - 1$. But actually we have shown more. If one takes $\frac{\beta_{n+1}}{\Xi_{n+1}} = \rho_{n+1} > 0$ with $\lim_{n \to +\infty}\rho_n = 0$, then one obtains that there exist $k_0 \in \mathbb{N}$ and $A > 0$ such that for all $n \ge k_0$ one has
$$\alpha_n - \beta_n + 1 \ge 1 + \rho_{n+1} - \frac{\beta_{n-1}}{1 + \rho_n},$$
hence (24) becomes
$$g(y_n) - g(\overline{x}) \le A\prod_{k=k_0+1}^{n}\frac{1}{1 + \rho_k}.$$
From here, as in the proof of Theorem 11, one can derive that
$$g(x_n) - g(\overline{x}) \le A_1\prod_{k=k_0+1}^{n}\frac{1}{1 + \rho_k}$$
for some $A_1 > 0$, and
$$\|x_n - \overline{x}\| = O\left(\sqrt{\prod_{k=k_0+1}^{n}\frac{1}{1 + \rho_k}}\right) \quad \text{and} \quad \|y_n - \overline{x}\| = O\left(\sqrt{\prod_{k=k_0+1}^{n}\frac{1}{1 + \rho_k}}\right).$$
With this general result in mind, recall that in [14], for the dynamical system (6), which, as shown in the Introduction, can be viewed as the continuous counterpart of the numerical scheme (2), finite time convergence of the generated trajectories was obtained for $\theta \in (0, \frac{1}{2})$, and an exponential convergence rate for $\theta = \frac{1}{2}$. It is therefore natural to ask whether an exponential convergence rate can be obtained for the sequences generated by (2) by choosing an appropriate sequence $(\rho_n)$. We show in what follows that this is not possible. We have
$$\prod_{k=k_0+1}^{n}\frac{1}{1 + \rho_k} = e^{-\sum_{k=k_0+1}^{n}\ln(1 + \rho_k)}.$$
Obviously, $\ln(1 + \rho_k) > 0$ for all $k > k_0$ and $\lim_{k \to +\infty}\ln(1 + \rho_k) = 0$. Now, by using the Cesàro-Stolz theorem, we obtain that
$$\lim_{n \to +\infty}\frac{\sum_{k=k_0+1}^{n}\ln(1 + \rho_k)}{n} = \lim_{n \to +\infty}\ln(1 + \rho_{n+1}) = 0,$$
that is, $\sum_{k=k_0+1}^{n}\ln(1 + \rho_k) = o(n)$. This shows that $\prod_{k=k_0+1}^{n}\frac{1}{1 + \rho_k} = e^{-o(n)}$, which is not $O(e^{-cn})$ for any $c > 0$.

Remark 13
According to [18], $H$ is KL with Łojasiewicz exponent $\theta \in [\frac{1}{2}, 1)$ whenever $g$ is KL with Łojasiewicz exponent $\theta \in [\frac{1}{2}, 1)$. Therefore, we have the following corollary.
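The mechanism behind Remark 12 can be checked numerically. The Python sketch below evaluates $\prod_{k=k_0+1}^{n} \frac{1}{1+\rho_k}$ in log-space. Python is our choice, since the paper contains no code. Two choices of $\rho_k$ are illustrated, and both are hypothetical.

- $\rho_k = ((k+1)/k)^p - 1$ mirrors, up to an index shift, the choice $\beta_{n+1}/\Xi_{n+1} = (n+2)^p/(n+1)^p - 1$ from the proof of Theorem 11. The product then telescopes to $((k_0+1)/(n+1))^p$.
- $\rho_k = k^{-1/2}$ vanishes slowly. For it, $-\frac{1}{n}\ln\prod$ tends to $0$, which is the $e^{-o(n)}$ behaviour.

All names are illustrative assumptions.

```python
import math


def log_tail_product(rho, k0, n):
    """Return ln( prod_{k=k0+1}^{n} 1/(1 + rho(k)) ).

    Summing log1p terms avoids the floating-point underflow that a direct
    product of many factors smaller than 1 would suffer.
    """
    return -sum(math.log1p(rho(k)) for k in range(k0 + 1, n + 1))


def rho_poly(k, p=3.0):
    # Hypothetical choice mirroring Theorem 11 (up to an index shift):
    # 1 + rho_k = ((k+1)/k)^p, so the product telescopes.
    return ((k + 1) / k) ** p - 1.0


def rho_slow(k):
    # Hypothetical choice with rho_k -> 0 (here like k^{-1/2}).
    return 1.0 / math.sqrt(k)


if __name__ == "__main__":
    k0, n, p = 5, 100, 3.0
    # Telescoping check: the two numbers agree up to rounding.
    print(math.exp(log_tail_product(lambda k: rho_poly(k, p), k0, n)),
          ((k0 + 1) / (n + 1)) ** p)
    # Cesaro-Stolz behaviour: -(1/m) ln(product) decreases towards 0.
    for m in (10**3, 10**4, 10**5):
        print(m, -log_tail_product(rho_slow, 1, m) / m)
```

Any $\rho_k \to 0$ shows the same qualitative picture in the second loop. Only the speed at which the average decays changes.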
Corollary 14
In the setting of problem (1), consider the sequences $(x_n)_{n \in \mathbb{N}}$, $(y_n)_{n \in \mathbb{N}}$ generated by Algorithm (2). Assume that $g$ is bounded from below and that $(x_n)_{n \in \mathbb{N}}$ is bounded, let $\overline{x} \in \operatorname{crit}(g)$ be such that $\lim_{n \to +\infty} x_n = \overline{x}$, and suppose that $g$ fulfills the Łojasiewicz property at $\overline{x}$ with Łojasiewicz exponent $\theta = \frac{1}{2}$. Then, for every $p > 0$ there exist $a_1, a_2, a_3, a_4 > 0$ and $k_0 \in \mathbb{N}$ such that the following statements hold true:
(a$_1$) $g(y_n) - g(\overline{x}) \le \frac{a_1}{n^p}$ for every $n > k_0$;
(a$_2$) $g(x_n) - g(\overline{x}) \le \frac{a_2}{n^p}$ for every $n > k_0$;
(a$_3$) $\|x_n - \overline{x}\| \le \frac{a_3}{n^{p/2}}$ for every $n > k_0$;
(a$_4$) $\|y_n - \overline{x}\| \le \frac{a_4}{n^{p/2}}$ for every $n > k_0$.

In case the Łojasiewicz exponent of the regularization function $H$ is $\theta \in (\frac{1}{2}, 1)$, we have the following result concerning the convergence rates of the sequences generated by (2).

Theorem 15
In the setting of problem (1), consider the sequences $(x_n)_{n \in \mathbb{N}}$, $(y_n)_{n \in \mathbb{N}}$ generated by Algorithm (2). Assume that $g$ is bounded from below and that $(x_n)_{n \in \mathbb{N}}$ is bounded, let $\overline{x} \in \operatorname{crit}(g)$ be such that $\lim_{n \to +\infty} x_n = \overline{x}$, and suppose that $H : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$, $H(x, y) = g(x) + \frac{1}{2}\|x - y\|^2$, fulfills the Łojasiewicz property at $(\overline{x}, \overline{x}) \in \operatorname{crit} H$ with Łojasiewicz exponent $\theta \in (\frac{1}{2}, 1)$. Then there exist $b_1, b_2, b_3, b_4 > 0$ such that the following statements hold true:
(b$_1$) $g(y_n) - g(\overline{x}) \le \frac{b_1}{n^{\frac{1}{2\theta-1}}}$ for all $n \ge N_1 + 2$;
(b$_2$) $g(x_n) - g(\overline{x}) \le \frac{b_2}{n^{\frac{1}{2\theta-1}}}$ for all $n \ge N_1 + 2$;
(b$_3$) $\|x_n - \overline{x}\| \le \frac{b_3}{n^{\frac{1-\theta}{2\theta-1}}}$ for all $n \ge N_1 + 2$;
(b$_4$) $\|y_n - \overline{x}\| \le \frac{b_4}{n^{\frac{1-\theta}{2\theta-1}}}$ for all $n > N_1 + 2$,
where $N_1 \in \mathbb{N}$ was defined in the proof of Theorem 11.

Proof.
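Here, too, to avoid triviality, we assume in what follows that $H(y_n, u_n) > H(\overline{x}, \overline{x})$ for all $n \ge N$. From (22) we have that for every $n \ge N_1$ it holds
$$r_n - r_{n+1} \ge \alpha_n r_{n+1}^{2\theta} - \beta_n(r_{n+1} - r_{n+2}),$$
where $\alpha_n = \frac{\Delta_n}{K^2 S_n}$ and $\beta_n = \frac{2\Delta_n}{s^2 S_n \Delta_{n+1}}$. Dividing by $r_{n+1}^{2\theta} > 0$ we get
$$(r_n - r_{n+1})r_{n+1}^{-2\theta} + \beta_n(r_{n+1} - r_{n+2})r_{n+1}^{-2\theta} \ge \alpha_n \quad \text{for all } n \ge N_1.$$
Consider the function $\phi(t) = \frac{K}{2\theta - 1}t^{1-2\theta}$, where $K$ is the constant from the Łojasiewicz property of $H$. Then $\phi'(t) = -Kt^{-2\theta}$ and we have
$$\phi(r_{n+1}) - \phi(r_n) = \int_{r_n}^{r_{n+1}}\phi'(t)\,dt = K\int_{r_{n+1}}^{r_n}t^{-2\theta}\,dt \ge K(r_n - r_{n+1})r_n^{-2\theta}.$$
Analogously, $\phi(r_{n+2}) - \phi(r_{n+1}) \ge K(r_{n+1} - r_{n+2})r_{n+1}^{-2\theta}$.

Assume that for some $n \ge N_1$ it holds that $2r_n^{-2\theta} \ge r_{n+1}^{-2\theta}$. Then
$$\begin{aligned} \phi(r_{n+1}) - \phi(r_n) + \beta_n\big(\phi(r_{n+2}) - \phi(r_{n+1})\big) &\ge \frac{K}{2}(r_n - r_{n+1})r_{n+1}^{-2\theta} + K\beta_n(r_{n+1} - r_{n+2})r_{n+1}^{-2\theta} \\ &\ge \frac{K}{2}(r_n - r_{n+1})r_{n+1}^{-2\theta} + \frac{K}{2}\beta_n(r_{n+1} - r_{n+2})r_{n+1}^{-2\theta} \ge \frac{K}{2}\alpha_n. \end{aligned} \tag{29}$$
Conversely, if $2r_n^{-2\theta} < r_{n+1}^{-2\theta}$ for some $n \ge N_1$, then $2^{\frac{2\theta-1}{2\theta}}r_n^{1-2\theta} < r_{n+1}^{1-2\theta}$, hence
$$\phi(r_{n+1}) - \phi(r_n) = \frac{K}{2\theta - 1}\big(r_{n+1}^{1-2\theta} - r_n^{1-2\theta}\big) \ge \frac{K}{2\theta - 1}\big(2^{\frac{2\theta-1}{2\theta}} - 1\big)r_n^{1-2\theta} \ge \frac{K}{2\theta - 1}\big(2^{\frac{2\theta-1}{2\theta}} - 1\big)r_{N_1}^{1-2\theta} = C_1.$$
Consequently, $\phi(r_{n+1}) - \phi(r_n) + \beta_n\big(\phi(r_{n+2}) - \phi(r_{n+1})\big) \ge C_1(1 + \beta_n)$. Let $C_2 = \inf_{n \ge N_1}\frac{C_1(1 + \beta_n)}{\alpha_n} > 0$. Then
$$\phi(r_{n+1}) - \phi(r_n) + \beta_n\big(\phi(r_{n+2}) - \phi(r_{n+1})\big) \ge C_2\alpha_n. \tag{30}$$
From (29) and (30) we get that there exists $C > 0$ such that
$$\phi(r_{n+1}) - \phi(r_n) + \beta_n\big(\phi(r_{n+2}) - \phi(r_{n+1})\big) \ge C\alpha_n \quad \text{for all } n \ge N_1.$$
Let $\overline{\beta} = \sup_{n \ge N_1}\beta_n$. Since $\phi(r_{n+2}) - \phi(r_{n+1}) \ge 0$, the latter relation yields
$$\phi(r_{n+1}) - \phi(r_n) + \overline{\beta}\big(\phi(r_{n+2}) - \phi(r_{n+1})\big) \ge C\alpha_n \quad \text{for all } n \ge N_1,$$
which leads to
$$\sum_{k=N_1}^{n}\Big(\phi(r_{k+1}) - \phi(r_k) + \overline{\beta}\big(\phi(r_{k+2}) - \phi(r_{k+1})\big)\Big) \ge C\sum_{k=N_1}^{n}\alpha_k.$$
Consequently,
$$\phi(r_{n+1}) - \phi(r_{N_1}) + \overline{\beta}\big(\phi(r_{n+2}) - \phi(r_{N_1+1})\big) \ge C\sum_{k=N_1}^{n}\alpha_k,$$
and since $(r_n)_{n \ge N_1}$ is decreasing and $\phi$ is positive and also decreasing, we obtain
$$(1 + \overline{\beta})\phi(r_{n+2}) \ge C\sum_{k=N_1}^{n}\alpha_k.$$
In other words,
$$r_n^{1-2\theta} \ge \frac{C(2\theta - 1)}{K(1 + \overline{\beta})}\sum_{k=N_1}^{n-2}\alpha_k \quad \text{for all } n \ge N_1 + 2.$$
Hence,
$$r_n \le \left(\frac{C(2\theta - 1)}{K(1 + \overline{\beta})}\right)^{-\frac{1}{2\theta-1}}\left(\sum_{k=N_1}^{n-2}\alpha_k\right)^{-\frac{1}{2\theta-1}} \quad \text{for all } n \ge N_1 + 2.$$
Since $\sum_{k=N_1}^{n-2}\alpha_k \ge \underline{\alpha}(n - N_1 - 1)$, where $0 < \underline{\alpha} = \inf_{k \ge N_1}\alpha_k$, there exists $M_2 > 0$ such that
$$\left(\sum_{k=N_1}^{n-2}\alpha_k\right)^{-\frac{1}{2\theta-1}} \le \underline{\alpha}^{-\frac{1}{2\theta-1}}(n - N_1 - 1)^{-\frac{1}{2\theta-1}} \le \underline{\alpha}^{-\frac{1}{2\theta-1}}M_2\, n^{-\frac{1}{2\theta-1}} \quad \text{for all } n \ge N_1 + 2.$$
Therefore,
$$r_n \le \left(\frac{C(2\theta - 1)}{K(1 + \overline{\beta})}\right)^{-\frac{1}{2\theta-1}}\underline{\alpha}^{-\frac{1}{2\theta-1}}M_2\, n^{-\frac{1}{2\theta-1}} = b_1 n^{-\frac{1}{2\theta-1}} \quad \text{for all } n \ge N_1 + 2.$$
But $r_n = g(y_n) - g(\overline{x}) + \delta_n\|x_n - x_{n-1}\|^2$; consequently, $g(y_n) - g(\overline{x}) \le b_1 n^{-\frac{1}{2\theta-1}}$ for all $n \ge N_1 + 2$, and (b$_1$) is proved.

For (b$_2$) observe that (27) holds for all $n \ge N_1$. Hence, for all $n \ge N_1 + 1$ one has
$$g(x_n) - g(y_n) \le \frac{1}{2sD(2 - sL_g)}r_{n+1} \le \frac{1}{2sD(2 - sL_g)}b_1(n+1)^{-\frac{1}{2\theta-1}}.$$
Thus, there exists $M_3 > 0$ such that
$$g(x_n) - g(\overline{x}) = \big(g(x_n) - g(y_n)\big) + \big(g(y_n) - g(\overline{x})\big) \le \left(\frac{b_1 M_3}{2sD(2 - sL_g)} + b_1\right)n^{-\frac{1}{2\theta-1}} = b_2 n^{-\frac{1}{2\theta-1}}$$
for all $n \ge N_1 + 2$.

For proving (b$_3$) we use (28). Note that the relation $\|x_n - \overline{x}\| \le M_1 r_n^{1-\theta}$ holds for all $n \ge N_1$. Hence,
$$\|x_n - \overline{x}\| \le M_1\left(b_1 n^{-\frac{1}{2\theta-1}}\right)^{1-\theta} \quad \text{for all } n \ge N_1 + 2.$$
Consequently, $\|x_n - \overline{x}\| \le b_3 n^{\frac{\theta-1}{2\theta-1}}$ for all $n \ge N_1 + 2$, where $b_3 = M_1 b_1^{1-\theta}$, and this proves (b$_3$).

For (b$_4$) observe that for $n \ge N_1 + 3$ we have
$$\begin{aligned} \|y_n - \overline{x}\| &= \left\|x_n + \frac{\beta n}{n+\alpha}(x_n - x_{n-1}) - \overline{x}\right\| \le \left(1 + \frac{\beta n}{n+\alpha}\right)\|x_n - \overline{x}\| + \frac{\beta n}{n+\alpha}\|x_{n-1} - \overline{x}\| \\ &\le \left(1 + \frac{\beta n}{n+\alpha}\right)b_3 n^{\frac{\theta-1}{2\theta-1}} + \frac{\beta n}{n+\alpha}b_3(n-1)^{\frac{\theta-1}{2\theta-1}} \\ &\le \left(\left(1 + \frac{\beta n}{n+\alpha}\right)b_3 + \frac{\beta n}{n+\alpha}b_3\right)(n-1)^{\frac{\theta-1}{2\theta-1}} \le b_4 n^{\frac{\theta-1}{2\theta-1}}, \end{aligned}$$
where one can take $b_4 = \sup_{n \ge N_1 + 3}\left(\left(1 + \frac{\beta n}{n+\alpha}\right)b_3 + \frac{\beta n}{n+\alpha}b_3\right)\left(\frac{n}{n-1}\right)^{\frac{1-\theta}{2\theta-1}}$. ∎

According to Remark 13 we have the following corollary.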
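The polynomial rate in Theorem 15 can be illustrated on a simplified model of inequality (22). The Python sketch below makes two hypothetical simplifications:

- $\alpha_n$ is held constant, $\alpha_n \equiv c$.
- The $\beta_n$ term is dropped, so that (22) holds with equality: $r_n - r_{n+1} = c\, r_{n+1}^{2\theta}$.

The sketch computes $r_{n+1}$ from $r_n$ by bisection and prints $n^{1/(2\theta-1)} r_n$. In line with (b$_1$), this quantity should stay bounded. The check concerns the exponent only; it does not simulate Algorithm (2). The function names and parameters are illustrative assumptions.

```python
def next_r(r, c, theta, iters=100):
    """Return x in [0, r] with x + c * x**(2*theta) = r, found by bisection.

    f(x) = x + c*x**(2*theta) - r is strictly increasing on [0, r], with
    f(0) = -r < 0 and f(r) = c*r**(2*theta) > 0, so the root is unique.
    """
    lo, hi = 0.0, r
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + c * mid ** (2 * theta) > r:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def simulate(r0, c, theta, n_steps):
    """Iterate the equality case r_n - r_{n+1} = c * r_{n+1}**(2*theta)."""
    rs = [r0]
    for _ in range(n_steps):
        rs.append(next_r(rs[-1], c, theta))
    return rs


if __name__ == "__main__":
    theta, c = 0.75, 1.0            # hypothetical exponent in (1/2, 1) and constant alpha_n
    q = 1.0 / (2.0 * theta - 1.0)   # exponent in (b_1): r_n = O(n^{-q})
    rs = simulate(1.0, c, theta, 2000)
    for n in (100, 500, 1000, 2000):
        print(n, rs[n] * n ** q)    # stays bounded as n grows
```

For these parameters the continuous analogue $r' = -c\,r^{2\theta}$ gives $r(t) \approx (c(2\theta-1)t)^{-1/(2\theta-1)} = 4/t^2$. The printed values therefore approach roughly $4$.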
Corollary 16
In the setting of problem (1), consider the sequences $(x_n)_{n \in \mathbb{N}}$, $(y_n)_{n \in \mathbb{N}}$ generated by Algorithm (2). Assume that $g$ is bounded from below and that $(x_n)_{n \in \mathbb{N}}$ is bounded, let $\overline{x} \in \operatorname{crit}(g)$ be such that $\lim_{n \to +\infty} x_n = \overline{x}$, and suppose that $g$ fulfills the Łojasiewicz property at $\overline{x}$ with Łojasiewicz exponent $\theta \in (\frac{1}{2}, 1)$. Then there exist $b_1, b_2, b_3, b_4 > 0$ such that the following statements hold true:
(b$_1$) $g(y_n) - g(\overline{x}) \le \frac{b_1}{n^{\frac{1}{2\theta-1}}}$ for all $n \ge N_1 + 2$;
(b$_2$) $g(x_n) - g(\overline{x}) \le \frac{b_2}{n^{\frac{1}{2\theta-1}}}$ for all $n \ge N_1 + 2$;
(b$_3$) $\|x_n - \overline{x}\| \le \frac{b_3}{n^{\frac{1-\theta}{2\theta-1}}}$ for all $n \ge N_1 + 2$;
(b$_4$) $\|y_n - \overline{x}\| \le \frac{b_4}{n^{\frac{1-\theta}{2\theta-1}}}$ for all $n > N_1 + 2$,
where $N_1 \in \mathbb{N}$ was defined in the proof of Theorem 11.

In this paper we show the convergence of a Nesterov-type algorithm in a fully nonconvex setting, assuming that a regularization of the objective function satisfies the Kurdyka-Łojasiewicz property. For this purpose, as a starting point, we show a sufficient decrease property for the iterates generated by our algorithm. Though our algorithm is asymptotically equivalent to Nesterov's accelerated gradient method, we cannot obtain full equivalence: in order to obtain the above-mentioned decrease property we cannot allow the inertial parameter, more precisely the parameter $\beta$, to attain the value 1. Nevertheless, we obtain convergence rates of order $p$ for every $p > 0$ for the sequences generated by our numerical scheme, and also for the function values along these sequences, provided the objective function, or a regularization of the objective function, satisfies the Łojasiewicz property with Łojasiewicz exponent $\theta \in (0, \frac{1}{2}]$. We also show that, at least with our techniques, exponential convergence rates cannot be obtained. In case the Łojasiewicz exponent of the objective function, or of a regularization of the objective function, is $\theta \in (\frac{1}{2}, 1)$, we obtain polynomial convergence rates.

A related topic for future research is the study of a modified FISTA algorithm in a nonconvex setting. Indeed, let $f : \mathbb{R}^m \to \overline{\mathbb{R}}$ be a proper, convex and lower semicontinuous function and let $g : \mathbb{R}^m \to \mathbb{R}$ be a (possibly nonconvex) smooth function with $L_g$-Lipschitz continuous gradient. Consider the optimization problem
$$\inf_{x \in \mathbb{R}^m} f(x) + g(x).$$
We associate to this optimization problem the following proximal-gradient algorithm. For $x_0, y_0 \in \mathbb{R}^m$ consider
$$\begin{cases} x_{n+1} = \operatorname{prox}_{sf}\big(y_n - s\nabla g(y_n)\big), \\ y_n = x_n + \frac{\beta n}{n+\alpha}(x_n - x_{n-1}), \end{cases} \tag{31}$$
where $\alpha > 0$, $\beta \in (0, 1)$ and $0 < s < \frac{2(1-\beta)}{L_g}$. Obviously, when $f \equiv 0$, (31) reduces to (2). [...] $f + g$, would open the gate for the study of FISTA-type algorithms in a nonconvex setting.

References
[1] H. Attouch, J. Bolte,
On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Mathematical Programming 116(1-2) Series B, 5-16, 2009
[2] H. Attouch, J. Bolte, P. Redont, A. Soubeyran,
Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research 35(2), 438-457, 2010
[3] H. Attouch, J. Bolte, B.F. Svaiter,
Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Mathematical Programming 137(1-2) Series A, 91-129, 2013
[4] H. Attouch, Z. Chbani, J. Peypouquet, P. Redont,
Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Math. Program. 168(1-2) Ser. B, 123-175, 2018
[5] H. Attouch, Z. Chbani, H. Riahi,
Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3, ESAIM: COCV (2017), doi:10.1051/cocv/2017083
[6] H. Attouch, J. Peypouquet, P. Redont,
Fast convex optimization via inertial dynamics with Hessian driven damping, J. Differential Equations 261(10), 5734-5783, 2016
[7] A. Beck, M. Teboulle,
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences 2(1), 183-202, 2009
[8] J. Bolte, S. Sabach, M. Teboulle,
Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming Series A 146(1-2), 459-494, 2014
[9] J. Bolte, A. Daniilidis, A. Lewis,
The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM Journal on Optimization 17(4), 1205-1223, 2006
[10] J. Bolte, A. Daniilidis, A. Lewis, M. Shiota,
Clarke subgradients of stratifiable functions, SIAM Journal on Optimization 18(2), 556-572, 2007
[11] J. Bolte, A. Daniilidis, O. Ley, L. Mazet,
Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity, Transactions of the American Mathematical Society 362(6), 3319-3363, 2010
[12] R.I. Boţ, E.R. Csetnek, S.C. László,
Approaching nonsmooth nonconvex minimization through second-order proximal-gradient dynamical systems, Journal of Evolution Equations (2018), https://doi.org/10.1007/s00028-018-0441-7
[13] R.I. Boţ, E.R. Csetnek, S.C. László,
An inertial forward-backward algorithm for minimizing the sum of two non-convex functions, EURO Journal on Computational Optimization 4(1), 3-25, 2016
[14] R.I. Boţ, E.R. Csetnek, S.C. László,
A second order dynamical approach with variable damping to nonconvex smooth minimization, Applicable Analysis (2018), https://doi.org/10.1080/00036811.2018.1495330
[15] A. Chambolle, Ch. Dossal,
On the convergence of the iterates of the "fast iterative shrinkage/thresholding algorithm", J. Optim. Theory Appl. 166(3), 968-982, 2015
[16] P. Frankel, G. Garrigos, J. Peypouquet,
Splitting Methods with Variable Metric for Kurdyka-Łojasiewicz Functions and General Convergence Rates, Journal of Optimization Theory and Applications 165(3), 874-900, 2015
[17] K. Kurdyka,
On gradients of functions definable in o-minimal structures, Annales de l'institut Fourier (Grenoble) 48(3), 769-783, 1998
[18] G. Li, T. K. Pong,
Calculus of the Exponent of Kurdyka-Łojasiewicz Inequality and Its Applications to Linear Convergence of First-Order Methods, Foundations of Computational Mathematics, 1-34, 2018
[19] S. Łojasiewicz,
Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique Paris, 87-89, 1963
[20] D.A. Lorenz, T. Pock,
An inertial forward-backward algorithm for monotone inclusions, J. Math. Imaging Vis. 51(2), 311-325, 2015
[21] Y.E. Nesterov,
A method for solving the convex programming problem with convergence rate O(1/k^2) (Russian), Dokl. Akad. Nauk SSSR 269(3), 543-547, 1983
[22] Y. Nesterov,
Introductory lectures on convex optimization: a basic course, Kluwer Academic Publishers, Dordrecht, 2004
[23] B. T. Polyak,
Some methods of speeding up the convergence of iteration methods, U.S.S.R. Comput. Math. Math. Phys. 4(5), 1-17, 1964
[24] R.T. Rockafellar, R.J.-B. Wets,
Variational Analysis, Fundamental Principles of Mathematical Sciences 317, Springer-Verlag, Berlin, 1998
[25] W. Su, S. Boyd, E.J. Candes,