Damped Newton's Method on Riemannian Manifolds
Márcio Antônio de A. Bortoloti (a), Teles A. Fernandes (a,*), Orizon Pereira Ferreira (b), Jin Yun Yuan (c)

(a) DCET/UESB, CP-95, CEP 45083-900, Vitória da Conquista, Bahia, Brazil
(b) IME/UFG, CP-131, CEP 74001-970, Goiânia, Goiás, Brazil
(c) DM/CP/UFPR, Jardim das Américas, CEP 81531-980, Curitiba, Paraná, Brazil

* Corresponding author.
Email addresses: [email protected] (Márcio Antônio de A. Bortoloti), [email protected] (Teles A. Fernandes), [email protected] (Orizon Pereira Ferreira), [email protected] (Jin Yun Yuan)
Abstract

A damped Newton's method to find a singularity of a vector field in the Riemannian setting is presented, together with a global convergence study. It is shown that the sequence generated by the proposed method reduces to a sequence generated by the Riemannian version of the classical Newton's method after a finite number of iterations; consequently, its convergence rate is superlinear/quadratic. Moreover, numerical experiments illustrate that the damped Newton's method performs better than Newton's method in number of iterations and computational time.

Keywords: Global Optimization, Damped Newton method, Superlinear/Quadratic Rate, Riemannian Manifold
AMS subject classifications:
1. Introduction
In the 1990s, the area of optimization on manifolds gained considerable popularity, especially with the work of Edelman et al. [1]. Recent years have witnessed a growing interest in the development of numerical algorithms on nonlinear manifolds, as many numerical problems posed on manifolds arise in various natural contexts; see [1, 2, 3]. Thus, algorithms using the differential structure of nonlinear manifolds play an important role in optimization; see [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. In this paper, instead of focusing on finding singularities of gradient vector fields, including local minimizers, on Riemannian manifolds, we consider the more general problem of finding singularities of vector fields.

Among the methods to find a zero of a nonlinear function, Newton's method has, under suitable conditions, a local quadratic/superlinear convergence rate; see [16, 17]. This remarkable property has motivated several studies generalizing Newton's method from the linear setting to the Riemannian one [3, 18, 19, 20, 21, 22, 23, 24]. Although Newton's method shows fast local convergence, it is very sensitive to the initial iterate and may diverge if this iterate is not sufficiently close to the solution. To overcome this drawback, several strategies have been introduced into Newton's method for optimization problems, including BFGS, Levenberg-Marquardt and trust-region strategies; see [16] and [25]. On the other hand, it is well known in the linear context that one way to globalize the convergence of Newton's method is to damp its step size (see [16, 26, 25, 27]). Among the strategies used, a particularly interesting one is a line search combined with a merit function. In this case, the basic idea is to use the line search to damp the Newton step size when the full step does not provide a sufficient decrease of the chosen merit function, which measures the quality of the approximation to a zero of the nonlinear function under consideration. Newton's methods with these globalization strategies are called damped Newton's methods. For a comprehensive study of this subject in the linear setting see, for example, [17, 16, 28]. In this paper, we generalize this globalization strategy of Newton's method to solve the problem of finding singularities of vector fields defined on Riemannian manifolds. Until now, studies on globalization strategies in Riemannian settings have been restricted to optimization problems, for example, Newton's method with the Hessian of the objective function updated by the
BFGS family [5, 9], trust-region methods [29], and Levenberg-Marquardt methods [3, Chapter 8, Section 8.4.2]. To the best of our knowledge, a global analysis of the damped Newton's method for finding singularities of vector fields defined on Riemannian manifolds by using a line search together with a merit function, whose particular case is finding local minimizers of real-valued functions defined on Riemannian manifolds, is novel here. Based on the idea presented in [30, Section 4] for nonlinear complementarity problems, we propose a damped Newton method in the Riemannian setting. Moreover, we show its global convergence to a singularity of the vector field, preserving the convergence rate of the classical Newton's method. We perform numerical experiments for minimizing families of functions defined on the cone of symmetric positive definite matrices, which is a Riemannian manifold. Our experiments illustrate the numerical performance of the proposed damped Newton method with a line search decreasing a merit function. The numerical results show that damping improves the behavior of the method when compared to the full-step Newton method.

The remainder of this paper is organized as follows. In Section 2 we present the notation and basic results used in the rest of the paper. In Section 3 we describe the global superlinear and quadratic convergence analysis of the damped Newton method. In Section 4 we display numerical experiments to verify the main theoretical results. Finally, we conclude the paper in Section 5.
2. Basic Concepts and Auxiliary Results
In this section we recall some notation, definitions and auxiliary results of Riemannian geometry used throughout the paper. The basic concepts used here can be found in many introductory books on Riemannian geometry, for example [31] and [32]. Let M be a finite-dimensional Riemannian manifold, denote the tangent space of M at p by T_pM and the tangent bundle of M by TM = ∪_{p∈M} T_pM. The norm associated with the Riemannian metric ⟨·,·⟩ is denoted by ‖·‖. The Riemannian distance between p and q in M is denoted by d(p, q); it induces the original topology on M and (M, d) is a complete metric space. The open ball of radius r > 0 centered at p is defined as B_r(p) := {q ∈ M : d(p, q) < r}. Let Ω ⊆ M be an open set and denote by 𝒳(Ω) the space of differentiable vector fields on Ω. Let ∇ be the Levi-Civita connection associated with (M, ⟨·,·⟩). The covariant derivative of X ∈ 𝒳(Ω) determined by ∇ defines at each p ∈ Ω a linear map ∇X(p) : T_pM → T_pM given by ∇X(p)v := ∇_Y X(p), where Y is a vector field such that Y(p) = v. For a twice-differentiable function f : M → ℝ, the Riemannian metric induces the mappings f ↦ grad f and f ↦ Hess f, which associate to f its gradient and Hessian by the rules

⟨grad f, X⟩ := df(X), ⟨Hess f X, X⟩ := d²f(X, X), ∀ X ∈ 𝒳(Ω), (1)

respectively; the last equalities imply that Hess f X = ∇_X grad f, for all X ∈ 𝒳(Ω). For each p ∈ Ω, the conjugate of a linear map A_p : T_pM → T_pM is the linear map A_p^* : T_pM → T_pM defined by ⟨A_p v, u⟩ = ⟨v, A_p^* u⟩, for all u, v ∈ T_pM. The norm of the linear map A_p is defined by ‖A_p‖ := sup{‖A_p v‖ : v ∈ T_pM, ‖v‖ = 1}. A vector field V along a differentiable curve γ in M is said to be parallel if and only if ∇_{γ'} V = 0. If γ' itself is parallel, we say that γ is a geodesic. The restriction of a geodesic to a closed bounded interval is called a geodesic segment. A geodesic segment joining p to q in M is said to be minimal if its length is equal to d(p, q). If there exists a unique geodesic segment joining p to q, then we denote it by γ_{pq}. For each t ∈ [a, b], ∇ induces an isometric mapping, relative to ⟨·,·⟩, P_{γ,a,t} : T_{γ(a)}M → T_{γ(t)}M, defined by P_{γ,a,t} v = V(t), where V is the unique vector field along γ such that ∇_{γ'(t)} V(t) = 0 and V(a) = v. This mapping is the so-called parallel transport along the geodesic segment γ joining γ(a) to γ(t). Note also that P_{γ,b₁,b₂} ∘ P_{γ,a,b₁} = P_{γ,a,b₂} and P_{γ,b,a} = P_{γ,a,b}^{-1}. When there is no confusion, we use the notation P_{pq} instead of P_{γ,a,b} in the case when γ is the unique geodesic segment joining p and q. A Riemannian manifold is complete if its geodesics are defined for all values of t ∈ ℝ. Hopf–Rinow's theorem asserts that every pair of points in a complete Riemannian manifold M can be joined by a (not necessarily unique) minimal geodesic segment. Due to the completeness of the Riemannian manifold M, the exponential map exp_p : T_pM → M can be given by exp_p v = γ(1), for each p ∈ M, where γ is the geodesic determined by its initial position p and velocity v at p.
Let p ∈ M. The injectivity radius of M at p is defined by

i_p := sup{ r > 0 : exp_p|_{B_r(0_p)} is a diffeomorphism },

where B_r(0_p) := {v ∈ T_pM : ‖v − 0_p‖ < r} and 0_p denotes the origin of the tangent space T_pM.

Remark 1.
Let p̄ ∈ M. The above definition implies that if 0 < δ < i_{p̄}, then exp_{p̄} B_δ(0_{p̄}) = B_δ(p̄). Moreover, for all p ∈ B_δ(p̄) there exists a unique geodesic segment γ joining p to p̄, which is given by γ_{p p̄}(t) = exp_p(t exp_p^{-1} p̄), for all t ∈ [0, 1].

Consider p ∈ M and let δ_p := min{1, i_p}. The quantity assigned to measure how fast the geodesics spread apart in M has been defined in [33] as

K_p := sup{ d(exp_q u, exp_q v) / ‖u − v‖ : q ∈ B_{δ_p}(p), u, v ∈ T_qM, u ≠ v, ‖v‖ ≤ δ_p, ‖u − v‖ ≤ δ_p }.

Remark 2.
In particular, when u = 0 or, more generally, when u and v are on the same line through 0_q, we have d(exp_q u, exp_q v) = ‖u − v‖. Hence, K_p ≥ 1 for all p ∈ M. Moreover, when M has non-negative sectional curvature, the geodesics spread apart less than the rays [31, Chapter 5], i.e., d(exp_p u, exp_p v) ≤ ‖u − v‖, and in this case K_p = 1 for all p ∈ M.

Let X ∈ 𝒳(Ω) and p̄ ∈ Ω. Assume that 0 < δ < δ_{p̄}. Since exp_{p̄} B_δ(0_{p̄}) = B_δ(p̄), there exists a unique geodesic joining each p ∈ B_δ(p̄) to p̄. Moreover, using [22, equality 2.3] we obtain

X(p) = P_{p̄ p} X(p̄) + P_{p̄ p} ∇X(p̄) exp_{p̄}^{-1} p + d(p, p̄) r(p), lim_{p→p̄} r(p) = 0. (2)

Let p_* ∈ M. The following result establishes that, if ∇X(p_*) is nonsingular, then there exists a neighborhood of p_* on which ∇X is nonsingular; see [24, Lemma 3.2].

Lemma 1.
Let X ∈ 𝒳(Ω) and p̄ ∈ M. Assume that ∇X is continuous at p̄ and ∇X(p̄) is nonsingular. Then, there exists 0 < δ̂ < δ_{p̄} such that B_{δ̂}(p̄) ⊂ Ω and ∇X(p) is nonsingular for each p ∈ B_{δ̂}(p̄).

Given X ∈ 𝒳(Ω) and p ∈ M such that ∇X(p) is nonsingular, the Newton iterate mapping N_X at p is defined by

N_X(p) := exp_p(−∇X(p)^{-1} X(p)). (3)

In the following we present a result about the behavior of the Newton iterate mapping near a singularity of X, whose proof can be found in [24, Lemma 3.3].

Lemma 2.
Let X : Ω → TM be a differentiable vector field and p_* ∈ M. Assume that X(p_*) = 0, ∇X is continuous at p_*, and ∇X(p_*) is nonsingular. Then,

lim_{p→p_*} d(N_X(p), p_*) / d(p, p_*) = 0.

The next result establishes superlinear convergence of Newton's method; see [24, Theorem 3.1].
Theorem 1.
Let Ω ⊂ M be an open set, X : Ω → TM a differentiable vector field, and p_* ∈ Ω. Suppose that p_* is a singularity of X, ∇X is continuous at p_*, and ∇X(p_*) is nonsingular. Then, there exists δ̄ > 0 such that, for every p_0 ∈ B_{δ̄}(p_*), the Newton sequence p_{k+1} = exp_{p_k}(−∇X(p_k)^{-1} X(p_k)), k = 0, 1, ..., is well-defined, contained in B_{δ̄}(p_*), and converges superlinearly to p_*.

Let X ∈ 𝒳(Ω) and p̄ ∈ Ω. The map ∇X is locally Lipschitz continuous at p̄ if and only if there exist a number L > 0 and 0 < δ_L < δ_{p̄} such that ‖P_{p p̄} ∇X(p) − ∇X(p̄) P_{p p̄}‖ ≤ L d(p, p̄), for all p ∈ B_{δ_L}(p̄). We end this section with a result which, up to some minor adjustments, merges into [2, Theorem 4.4]; see also [21, Theorem 7.1].
Theorem 2.
Let X : Ω → TM be a continuously differentiable vector field and p_* ∈ M a singularity of X. Suppose that ∇X is locally Lipschitz continuous at p_* with constant L > 0 and that ∇X(p_*) is nonsingular. Then, there exists r > 0 such that the sequence p_{k+1} = exp_{p_k}(−∇X(p_k)^{-1} X(p_k)), k = 0, 1, ..., starting at p_0 ∈ B_r(p_*) \ {p_*}, is well-defined, contained in B_r(p_*), converges to p_*, and there holds

d(p_*, p_{k+1}) ≤ [ L K_{p_*} ‖∇X(p_*)^{-1}‖ / ( 2 [ K_{p_*}^{-1} − d(p_0, p_*) L ‖∇X(p_*)^{-1}‖ ] ) ] d(p_*, p_k)², k = 0, 1, ....

3. Damped Newton Method

In this section we consider the problem of finding a singularity of a differentiable vector field. The formal statement of the problem is as follows: find a point p ∈ M such that

X(p) = 0, (4)

where X : Ω ⊆ M → TM is a differentiable vector field. Newton's method applied to this problem has local superlinear convergence whenever the covariant derivative of X at the singularity is nonsingular; see [24]. Additionally, if the covariant derivative is Lipschitz continuous at the singularity, then the method has Q-quadratic convergence; see [21]. In order to globalize Newton's method while keeping the aforementioned properties, we use the same idea as in the Euclidean setting and introduce a line search strategy. This modification of Newton's method, called the damped Newton's method, performs a line search decreasing the merit function ϕ : M → ℝ,

ϕ(p) = (1/2) ‖X(p)‖², (5)

in order to reach the superlinear convergence region of Newton's method. In the following we present the formal description of the damped Newton algorithm.

Damped Newton Method

Step 0.
Choose a scalar σ ∈ (0, 1/2) and an initial point p_0 ∈ M, and set k = 0.

Step 1.
Compute the search direction v_k ∈ T_{p_k}M as a solution of the linear equation

X(p_k) + ∇X(p_k) v = 0. (6)

If v_k exists, go to Step 2. Otherwise, set the search direction v_k = −grad ϕ(p_k), where ϕ is defined by (5), i.e.,

v_k = −∇X(p_k)^* X(p_k). (7)

If v_k = 0, stop.

Step 2.
Compute the stepsize by the rule

α_k := max{ 2^{-j} : ϕ(exp_{p_k}(2^{-j} v_k)) ≤ ϕ(p_k) + σ 2^{-j} ⟨grad ϕ(p_k), v_k⟩, j ∈ ℕ }, (8)

and set the next iterate as

p_{k+1} := exp_{p_k}(α_k v_k). (9)

Step 3.
Set k ← k + 1 and go to Step 1.

We remark that in Step 1 we resort directly to the steepest descent step of ϕ as a safeguard, but only in those cases in which the Newton direction in (6) does not exist. This idea has been used, in the Euclidean context, for nonlinear complementarity problems; see, for example, [30] and [26, Chap. 5, Sec. 5.1.1].
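Although the method is stated abstractly, the following Python sketch may help fix ideas. It is our illustration, not the authors' code: tangent vectors are represented as coordinate arrays in which the metric is Euclidean (so the adjoint in (7) becomes a transpose), the callables X, cov_deriv and exp_map supplied by the user stand for the vector field, its covariant derivative and the exponential map, and the stopping tolerance is added only to make the loop finite.

```python
import numpy as np

def damped_newton(X, cov_deriv, exp_map, p0, sigma=0.25,
                  tol=1e-10, max_iter=200, max_backtracks=60):
    """Illustrative sketch of Steps 0-3 of the damped Newton method.

    X(p)          -> vector field at p (coordinates of a tangent vector, 1-D array)
    cov_deriv(p)  -> matrix of the covariant derivative of X at p in the same coordinates
    exp_map(p, v) -> exponential map exp_p(v)
    """
    p = p0
    for _ in range(max_iter):
        Xp = X(p)
        phi = 0.5 * float(Xp @ Xp)              # merit function (5)
        if np.sqrt(2.0 * phi) <= tol:           # practical stop, not part of the paper
            return p
        A = cov_deriv(p)
        grad_phi = A.T @ Xp                     # grad phi(p) = (covariant derivative)^* X(p)
        try:
            v = np.linalg.solve(A, -Xp)         # Newton direction, eq. (6)
        except np.linalg.LinAlgError:
            v = -grad_phi                       # safeguard direction, eq. (7)
            if np.allclose(v, 0.0):
                return p                        # critical point of phi: stop
        slope = float(grad_phi @ v)             # < 0 by Lemma 3
        alpha = 1.0
        for _ in range(max_backtracks):         # Armijo rule (8): largest 2^{-j}
            q = exp_map(p, alpha * v)
            Xq = X(q)
            if 0.5 * float(Xq @ Xq) <= phi + sigma * alpha * slope:
                break
            alpha *= 0.5
        p = exp_map(p, alpha * v)               # update (9)
    return p
```

In the experiments of Section 4, the abstract pieces X, cov_deriv and exp_map are instantiated on the cone of symmetric positive definite matrices.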
3.1. Preliminary Results

In this subsection we present some preliminary results needed to ensure the well-definedness and convergence of the sequence generated by the damped Newton's method. We begin with a result that is useful for establishing that the sequence is well-defined.

Lemma 3.
Let p ∈ Ω be such that X(p) ≠ 0. Assume that v = −∇X(p)^* X(p) or that v is a solution of the equation

X(p) + ∇X(p) v = 0. (10)

If v ≠ 0, then v is a descent direction for ϕ from p, i.e., ⟨grad ϕ(p), v⟩ < 0.

Proof. First we assume that v satisfies (10). Since grad ϕ(p) = ∇X(p)^* X(p), we have ⟨grad ϕ(p), v⟩ = ⟨X(p), ∇X(p) v⟩. It follows from X(p) ≠ 0 and v satisfying (10) that ⟨grad ϕ(p), v⟩ = −‖X(p)‖² < 0, which implies that v is a descent direction for ϕ from p. Now, assume that v = −∇X(p)^* X(p). Then ⟨grad ϕ(p), v⟩ = −‖∇X(p)^* X(p)‖² < 0, and v is also a descent direction for ϕ from p. □

Under suitable assumptions, the following result guarantees that the damped Newton's method, after a finite number of iterations, reduces to the classical Newton iteration.

Lemma 4.
Let p̄ ∈ M satisfy X(p̄) = 0. If ∇X is continuous at p̄ and ∇X(p̄) is nonsingular, then there exists 0 < δ̂ < δ_{p̄} such that B_{δ̂}(p̄) ⊂ Ω, ∇X(p) is nonsingular for each p ∈ B_{δ̂}(p̄), and

lim_{p→p̄} ϕ(N_X(p)) / ‖X(p)‖² = 0. (11)

As a consequence, for each σ ∈ (0, 1/2) there exists 0 < δ_1 < δ̂ such that

ϕ(N_X(p)) ≤ ϕ(p) + σ ⟨grad ϕ(p), −∇X(p)^{-1} X(p)⟩, ∀ p ∈ B_{δ_1}(p̄). (12)

Proof.
By Lemma 1, there exists 0 < δ̂ < δ_{p̄} such that B_{δ̂}(p̄) ⊂ Ω and ∇X(p) is nonsingular for each p ∈ B_{δ̂}(p̄). Define v_p = −∇X(p)^{-1} X(p), for all p ∈ B_{δ̂}(p̄). Since X(p̄) = 0 and ∇X is continuous and nonsingular at p̄, we have lim_{p→p̄} v_p = 0. Moreover, it follows from (2), the isometry of the parallel transport and ‖exp_{p̄}^{-1} exp_p(v_p)‖ = d(exp_p(v_p), p̄) that

ϕ(N_X(p)) = (1/2) ‖X(exp_p(v_p)) − P_{p̄, exp_p(v_p)} X(p̄)‖² ≤ (1/2) [ ‖∇X(p̄)‖ + ‖r(exp_p(v_p))‖ ]² d(exp_p(v_p), p̄)².

Hence, after some simple algebraic manipulations, we obtain from the last inequality, for all p ∈ B_{δ̂}(p̄) \ {p̄},

ϕ(exp_p(v_p)) / ‖X(p)‖² ≤ (1/2) [ ‖∇X(p̄)‖ + ‖r(exp_p(v_p))‖ ]² · [ d(exp_p(v_p), p̄)² / d(p, p̄)² ] · [ d(p, p̄)² / ‖X(p)‖² ]. (13)

On the other hand, since X(p̄) = 0 and ∇X(p̄) is nonsingular, it is easy to see that

exp_{p̄}^{-1} p = −∇X(p̄)^{-1} [ P_{p p̄} X(p) − X(p̄) − ∇X(p̄) exp_{p̄}^{-1} p ] + ∇X(p̄)^{-1} P_{p p̄} X(p). (14)

Using again (2) and the nonsingularity of ∇X(p̄), we conclude that there exists 0 < δ̄ < δ_{p̄} such that

‖∇X(p̄)^{-1} [ P_{p p̄} X(p) − X(p̄) − ∇X(p̄) exp_{p̄}^{-1} p ]‖ ≤ (1/2) d(p, p̄), ∀ p ∈ B_{δ̄}(p̄).

Thus, combining (14) with the last inequality and taking into account that d(p̄, p) = ‖exp_{p̄}^{-1} p‖, we obtain d(p̄, p) ≤ (1/2) d(p, p̄) + ‖∇X(p̄)^{-1} P_{p p̄} X(p)‖ for all p ∈ B_{δ̄}(p̄), which implies that d(p̄, p) ≤ 2 ‖∇X(p̄)^{-1}‖ ‖X(p)‖, for all p ∈ B_{δ̄}(p̄). Hence, letting δ̃ = min{δ̂, δ̄}, we obtain from (13) that, for all p ∈ B_{δ̃}(p̄) \ {p̄},

ϕ(exp_p(v_p)) / ‖X(p)‖² ≤ 2 [ ‖∇X(p̄)^{-1}‖ ( ‖∇X(p̄)‖ + ‖r(exp_p(v_p))‖ ) ]² · d(exp_p(v_p), p̄)² / d(p, p̄)².

Therefore, equality (11) follows from Lemma 2 and lim_{p→p̄} r(exp_p(v_p)) = 0 by taking the limit, as p goes to p̄, in the latter inequality. To prove (12), we first use (11) to conclude that, for each σ ∈ (0, 1/2), there exists 0 < δ_1 < δ̂ such that

ϕ(N_X(p)) ≤ (1/2 − σ) ‖X(p)‖², ∀ p ∈ B_{δ_1}(p̄).

Taking into account that grad ϕ(p) = ∇X(p)^* X(p), we conclude directly that ⟨grad ϕ(p), −∇X(p)^{-1} X(p)⟩ = −‖X(p)‖²; hence the last inequality is equivalent to (12), and the proof is complete. □

In the next result we show that, whenever the vector field is continuous with nonsingular covariant derivative at a singularity, there exists a neighborhood that is invariant under the associated Newton iterate mapping.

Lemma 5.
Let p̄ ∈ M be such that X(p̄) = 0. If ∇X is continuous at p̄ and ∇X(p̄) is nonsingular, then there exists 0 < δ̂ < δ_{p̄} such that B_{δ̂}(p̄) ⊂ Ω and ∇X(p) is nonsingular for each p ∈ B_{δ̂}(p̄). Moreover, N_X(p) ∈ B_{δ̂}(p̄), for all p ∈ B_{δ̂}(p̄).

Proof. It follows from Lemma 1 that there exists 0 < δ̂ < δ_{p̄} such that B_{δ̂}(p̄) ⊂ Ω and ∇X(p) is nonsingular for each p ∈ B_{δ̂}(p̄). Thus, shrinking δ̂ if necessary, we can use Lemma 2 to conclude that d(N_X(p), p̄) < d(p, p̄)/2 for all p ∈ B_{δ̂}(p̄), which implies the last statement of the lemma. □

3.2. Convergence Analysis

In this subsection we establish our main result: the global convergence of the damped Newton's method, preserving the fast convergence rates of Newton's method. We begin by proving that the sequence generated by the damped Newton's method is well-defined.
Lemma 6.
The sequence {p_k} generated by the damped Newton's method is well-defined.

Proof. If v_0 ≠ 0 and X(p_0) ≠ 0, then by using Lemma 3 we conclude that v_0, satisfying (6) or (7), is a descent direction for ϕ from p_0. Hence, by standard arguments, α_0 in (8) is well-defined, and so is p_1 in (9). Therefore, an induction argument shows that the sequence {p_k} is well-defined. □

Note that, if the sequence generated by the damped Newton's method is finite, then the last iterate is a solution of (4) or a critical point of ϕ defined in (5). Thus, we can assume that {p_k} is infinite, v_k ≠ 0 and X(p_k) ≠ 0, for all k = 0, 1, ....

Theorem 3.
Let X : Ω ⊆ M → TM be a differentiable vector field. Assume that the sequence {p_k} generated by the damped Newton's method has an accumulation point p̄ ∈ Ω such that ∇X is continuous at p̄ and ∇X(p̄) is nonsingular. Then, {p_k} converges superlinearly to the singularity p̄ of X. Moreover, the convergence rate is quadratic provided that ∇X is locally Lipschitz continuous at p̄.

Proof. We begin by showing that p̄ is a singularity of X. As mentioned above, we can assume that {p_k} is infinite, grad ϕ(p_k) ≠ 0 and X(p_k) ≠ 0, for all k = 0, 1, .... Hence, (8) and (9) imply that

ϕ(p_k) − ϕ(p_{k+1}) ≥ −σ α_k ⟨grad ϕ(p_k), v_k⟩ = σ α_k ‖X(p_k)‖² > 0, if v_k satisfies (6),
ϕ(p_k) − ϕ(p_{k+1}) ≥ −σ α_k ⟨grad ϕ(p_k), v_k⟩ = σ α_k ‖grad ϕ(p_k)‖² > 0, if v_k is given by (7).

Thus {ϕ(p_k)} is strictly decreasing and bounded below by zero, hence convergent. Therefore, taking the limit in the latter inequalities we obtain lim_{k→∞} [α_k ⟨grad ϕ(p_k), v_k⟩] = 0. Let {p_{k_j}} be a subsequence of {p_k} such that lim_{j→∞} p_{k_j} = p̄. Thus, we have

lim_{j→∞} [ α_{k_j} ⟨grad ϕ(p_{k_j}), v_{k_j}⟩ ] = 0. (15)

Since ∇X is continuous at p̄ and ∇X(p̄) is nonsingular, by using Lemma 1 we can also assume that ∇X(p_{k_j}) is nonsingular for all j = 0, 1, .... Hence, (6) has a solution and, for j = 0, 1, ..., we have

v_{k_j} = −∇X(p_{k_j})^{-1} X(p_{k_j}), ⟨grad ϕ(p_{k_j}), v_{k_j}⟩ = −‖X(p_{k_j})‖². (16)

To analyse the consequences of (15) we consider the following two possible cases:

a) lim inf_{j→∞} α_{k_j} > 0;  b) lim inf_{j→∞} α_{k_j} = 0.

First assume that item a) holds. From (15), passing to a further subsequence if necessary, we can assume that

lim_{j→∞} ⟨grad ϕ(p_{k_j}), v_{k_j}⟩ = 0. (17)

Taking the limit in the latter equality, as j goes to infinity, and considering that lim_{j→∞} p_{k_j} = p̄ and that X is continuous at p̄, we conclude from (17) and (16) that X(p̄) = 0, which means that p̄ is a singularity of X. Now assume that item b) holds. Hence, given s ∈ ℕ, we can take j large enough so that α_{k_j} < 2^{-s}. Thus 2^{-s} does not satisfy the Armijo condition (8), i.e.,

ϕ(exp_{p_{k_j}}(2^{-s} v_{k_j})) > ϕ(p_{k_j}) + σ 2^{-s} ⟨grad ϕ(p_{k_j}), v_{k_j}⟩.

Letting j go to infinity, considering that the exponential mapping is continuous, that lim_{j→∞} p_{k_j} = p̄, and that ∇X and X are continuous at p̄, it follows from (16) and the last inequality that

ϕ(exp_{p̄}(2^{-s} v̄)) ≥ ϕ(p̄) + σ 2^{-s} ⟨grad ϕ(p̄), v̄⟩,

where v̄ = −∇X(p̄)^{-1} X(p̄). Hence, we obtain [ϕ(exp_{p̄}(2^{-s} v̄)) − ϕ(p̄)] / 2^{-s} ≥ σ ⟨grad ϕ(p̄), v̄⟩. Therefore, letting s go to infinity, we conclude that ⟨grad ϕ(p̄), v̄⟩ ≥ σ ⟨grad ϕ(p̄), v̄⟩ or, equivalently, ‖X(p̄)‖² ≤ σ ‖X(p̄)‖². Thus, since σ ∈ (0, 1/2), we have ‖X(p̄)‖ = 0, which implies that p̄ is a singularity of X.

We proceed to prove that there exists k_0 such that α_k = 1 for all k ≥ k_0. Since ∇X(p̄) is nonsingular and X(p̄) = 0, Lemma 1 and Lemma 5 imply that there exists δ̂ > 0 such that ∇X(p) is nonsingular and N_X(p) ∈ B_δ(p̄), for all p ∈ B_δ(p̄) and all δ ≤ δ̂. Since p̄ is a cluster point of {p_k}, there exists k_0 such that p_{k_0} ∈ B_{δ̂}(p̄) (shrinking δ̂ if necessary). It follows from Lemma 4 that ϕ(N_X(p_{k_0})) ≤ ϕ(p_{k_0}) + σ ⟨grad ϕ(p_{k_0}), v_{k_0}⟩, with v_{k_0} = −∇X(p_{k_0})^{-1} X(p_{k_0}). Hence, it follows from the last inequality, (3) and (8) that α_{k_0} = 1. By (9) we conclude that p_{k_0+1} = N_X(p_{k_0}), which implies p_{k_0+1} ∈ B_{δ̂}(p̄) because N_X(p_{k_0}) ∈ B_{δ̂}(p̄). The induction step is completely analogous, yielding

α_k = 1, p_{k+1} = N_X(p_k) ∈ B_{δ̂}(p̄), ∀ k ≥ k_0. (18)

In order to obtain superlinear convergence of the entire sequence {p_k}, let δ̄ > 0 be given by Theorem 1. Taking δ̂ smaller if necessary so that δ̂ < δ̄, we can apply Theorem 1 and conclude from (18) that {p_k} converges superlinearly to p̄.

To prove that {p_k} converges quadratically to p̄, we first take δ̂ < r, where r is given by Theorem 2. Since ∇X is locally Lipschitz continuous at p̄, the result follows from the combination of (18) and Theorem 2. □
4. Numerical Experiments
In this section we present some numerical experiments to illustrate the performance of the damped Newton method for minimizing two families of functions defined on the cone of symmetric positive definite matrices. A particular instance of one of these families has an application in robotics. Before presenting our numerical experiments, we need to introduce some concepts. Let P^n be the set of symmetric matrices of order n × n and P^n_++ be the cone of symmetric positive definite matrices. Following Rothaus [34], let M := (P^n_++, ⟨·,·⟩) be the Riemannian manifold endowed with the Riemannian metric defined by

⟨U, V⟩ = tr(V ψ''(P) U) = tr(V P^{-1} U P^{-1}), P ∈ M, U, V ∈ T_P M ≈ P^n, (19)

where tr P denotes the trace of P. Then, the exponential mapping is given by

exp_P V = P^{1/2} e^{P^{-1/2} V P^{-1/2}} P^{1/2}, P ∈ M, V ∈ T_P M ≈ P^n. (20)

Let X and Y be vector fields on M. Then, by using [32, Theorem 1.2, page 28], we can prove that the Levi-Civita connection of M is given by

∇_Y X(P) = X'(P) Y − (1/2) [ Y P^{-1} X + X P^{-1} Y ], (21)

where P ∈ M and X' denotes the Euclidean derivative of X. Therefore, it follows from (19) and (21) that the Riemannian gradient and Hessian of a twice-differentiable function f : M → ℝ are given, respectively, by

grad f(P) = P f'(P) P, Hess f(P) V = P f''(P) V P + (1/2) [ V f'(P) P + P f'(P) V ], (22)

for all V ∈ T_P M, where f'(P) and f''(P) are the Euclidean gradient and Hessian of f at P, respectively. In this case, by using (20), the Newton iteration for finding P ∈ M such that grad f(P) = 0 is given by

P_{k+1} = P_k^{1/2} e^{P_k^{-1/2} V_k P_k^{-1/2}} P_k^{1/2}, k = 0, 1, ..., (23)

where, by using again the equalities in (22), V_k is the unique solution of the Newton linear equation

P_k f''(P_k) V_k P_k + (1/2) [ V_k f'(P_k) P_k + P_k f'(P_k) V_k ] = −P_k f'(P_k) P_k, (24)

which corresponds to (6) for X = grad f. Now we present concrete examples for (23) and (24). For i = 1, 2, let f_i : P^n_++ → ℝ be defined, respectively, by

f_1(P) = a_1 ln det P + b_1 tr P^{-1}, f_2(P) = a_2 ln det P − b_2 tr P, (25)

where b_1/a_1 > 0, a_2/b_2 > 0, and det P denotes the determinant of P. In particular, the function f_1 can be associated with the optimal dextrous hand grasping problem in robotics; see [35, 36, 37].
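The geometric objects in (19), (20) and (22) are straightforward to code. The sketch below is ours (in Python; the paper's experiments use MATLAB) and only illustrates the formulas: the values a_1 = 2 and b_1 = 3 and the finite-difference check are arbitrary choices, not taken from the paper.

```python
import numpy as np
from scipy.linalg import expm

def inner(P, U, V):
    """Metric (19): <U, V>_P = tr(V P^{-1} U P^{-1})."""
    Pinv = np.linalg.inv(P)
    return np.trace(V @ Pinv @ U @ Pinv)

def exp_map(P, V):
    """Exponential map (20): exp_P V = P^{1/2} e^{P^{-1/2} V P^{-1/2}} P^{1/2}."""
    w, Q = np.linalg.eigh(P)
    P_half = Q @ np.diag(np.sqrt(w)) @ Q.T
    P_ihalf = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T
    return P_half @ expm(P_ihalf @ V @ P_ihalf) @ P_half

def f1(P, a1=2.0, b1=3.0):
    """f_1 from (25): a_1 ln det P + b_1 tr P^{-1} (a_1, b_1 chosen arbitrarily)."""
    return a1 * np.log(np.linalg.det(P)) + b1 * np.trace(np.linalg.inv(P))

def grad_f1(P, a1=2.0, b1=3.0):
    """Riemannian gradient via (22): P f_1'(P) P with f_1'(P) = a_1 P^{-1} - b_1 P^{-2}."""
    Pinv = np.linalg.inv(P)
    return P @ (a1 * Pinv - b1 * Pinv @ Pinv) @ P

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n))
    P = A @ A.T + n * np.eye(n)          # random point in P^n_++
    B = rng.standard_normal((n, n))
    V = (B + B.T) / 2                    # random tangent direction (symmetric matrix)

    # <grad f_1(P), V>_P should equal d/dt f_1(exp_P(t V)) at t = 0.
    t = 1e-6
    fd = (f1(exp_map(P, t * V)) - f1(exp_map(P, -t * V))) / (2 * t)
    print(inner(P, grad_f1(P), V), fd)   # the two numbers should agree closely
```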
Then, for each i = 1, 2, the Euclidean gradient and Hessian of f_i are given, respectively, by

f_1'(P) = a_1 P^{-1} − b_1 P^{-2}, f_1''(P) V = b_1 (P^{-2} V P^{-1} + P^{-1} V P^{-2}) − a_1 P^{-1} V P^{-1}, (26)

f_2'(P) = a_2 P^{-1} − b_2 I, f_2''(P) V = −a_2 P^{-1} V P^{-1}, (27)

where I denotes the n × n identity matrix. It follows from (22), (25), (26) and (27) that

grad f_1(P) = a_1 P − b_1 I, grad f_2(P) = a_2 P − b_2 P². (28)

From the last two equalities we conclude that the global minimizers of f_1 and f_2 are P_1^* = (b_1/a_1) I and P_2^* = (a_2/b_2) I, respectively. Our task is to execute explicitly the damped Newton method to find the global minimizer of f_i, for each i = 1, 2. For this purpose, consider ϕ_i : P^n_++ → ℝ given by ϕ_i(P) = (1/2) ‖grad f_i(P)‖², i = 1, 2.
Thus, by using (28), we conclude that

grad ϕ_1(P) = a_1 b_1 I − b_1² P^{-1}, grad ϕ_2(P) = b_2² P³ − a_2 b_2 P². (29)

By combining (19), (20) and (28), after some calculations we obtain the following equalities:

ϕ_1(exp_P V) = (1/2) tr[ ( a_1 I − b_1 ( P^{1/2} e^{P^{-1/2} V P^{-1/2}} P^{1/2} )^{-1} )² ], (30)

ϕ_2(exp_P V) = (1/2) tr[ ( a_2 I − b_2 P^{1/2} e^{P^{-1/2} V P^{-1/2}} P^{1/2} )² ]. (31)
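To see why Step 1 of the algorithm below takes the Sylvester form (32), one can substitute (26) and (28) into the Newton linear equation (24) for f_1; the short computation is recorded here for convenience. By (26),

P_k f_1''(P_k) V_k P_k = b_1 (P_k^{-1} V_k + V_k P_k^{-1}) − a_1 V_k,

(1/2) [ V_k f_1'(P_k) P_k + P_k f_1'(P_k) V_k ] = a_1 V_k − (b_1/2) (P_k^{-1} V_k + V_k P_k^{-1}),

so the left-hand side of (24) collapses to (b_1/2)(P_k^{-1} V_k + V_k P_k^{-1}), while its right-hand side equals −grad f_1(P_k) = b_1 I − a_1 P_k by (28). Multiplying both sides on the left and on the right by P_k and dividing by b_1/2 yields

P_k V_k + V_k P_k = 2 (P_k² − (a_1/b_1) P_k³),

which is exactly equation (32) below.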
Finally, by using (23), (24), the definition of f_1 in (25), (26), the left-hand sides of (28) and (29), and (30), we obtain the damped Newton method for finding the global minimizer of f_1. The formal algorithm to find the global minimizer of f_2 is not presented here because it can be obtained similarly by using (23), (24), the definition of f_2 in (25), (27), the right-hand sides of (28) and (29), and (31). The damped Newton algorithm for the matrix cone is given as follows.

Damped Newton Method in P^n_++ (DNM-P^n_++)

Step 0. Choose a scalar σ ∈ (0, 1/2) and an initial point P_0 ∈ M, and set k = 0.

Step 1. Compute the search direction V_k as a solution of the linear equation

P_k V_k + V_k P_k = 2 (P_k² − (a_1/b_1) P_k³). (32)

If V_k exists, go to Step 2. Otherwise, put V_k = −grad ϕ_1(P_k), i.e.,

V_k = b_1² P_k^{-1} − a_1 b_1 I. (33)

If V_k = 0, stop.

Step 2.
Compute the stepsize by the rule

α_k := max{ 2^{-j} : (1/2) tr[ ( a_1 I − b_1 ( P_k^{1/2} e^{2^{-j} P_k^{-1/2} V_k P_k^{-1/2}} P_k^{1/2} )^{-1} )² ] ≤ (1/2 − σ 2^{-j}) tr[ ( a_1 I − b_1 P_k^{-1} )² ], j ∈ ℕ }, (34)

and set the next iterate as

P_{k+1} := P_k^{1/2} e^{α_k P_k^{-1/2} V_k P_k^{-1/2}} P_k^{1/2}. (35)

Step 3.
Set k ← k + 1 and go to Step 1.

Although the domain of f_1 is a subset of the set of symmetric matrices, namely P^n_++, equality (9) shows that DNM-P^n_++ generates only feasible points, without using projections or any other procedure to maintain feasibility. Hence, the problem of minimizing f_1 on P^n_++ can be seen as an unconstrained Riemannian optimization problem. Note that, in this case, equation (32) always has a unique solution V_k ∈ T_{P_k} P^n_++; see [38, Th. 8.2.1]. Consequently, the safeguard direction V_k in (33) plays no role here. Thus, in this case, we can compare the Newton and damped Newton methods in number of iterations, function evaluations and CPU time to reach the solution. The Newton method is formally described as follows.

Newton Method in P^n_++ (NM-P^n_++)

Step 0. Take an initial point P_0 ∈ M and set k = 0.

Step 1.
Compute the search direction V_k as a solution of the linear equation

P_k V_k + V_k P_k = 2 (P_k² − (a_1/b_1) P_k³).

Step 2.
Compute the next iterate by

P_{k+1} := P_k^{1/2} e^{P_k^{-1/2} V_k P_k^{-1/2}} P_k^{1/2}.

Step 3.
Set k ← k + 1 and go to Step 1.

We implemented the two algorithms above in MATLAB R2015b. The experiments were performed on an Intel Core 2 Duo processor at 2.26 GHz with 4 GB of RAM, running OS X. We compare the iterative behavior of NM-P^n_++ and DNM-P^n_++ applied to f_1 and f_2 with randomly chosen initial guesses. In all experiments the stop criterion at the iterate P_k ∈ P^n_++ is that ‖grad f(P_k)‖ falls below a fixed tolerance, where ‖·‖ is the norm associated with the metric given by (19). All codes are freely available at . We ran the two algorithms on a large number of problems, varying the dimensions and the parameters b_1/a_1 and b_2/a_2 in the functions f_1 and f_2, respectively. In general, DNM-P^n_++ is superior to NM-P^n_++ in number of iterations and CPU time.
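As an illustration of DNM-P^n_++ for f_1, here is a compact Python/NumPy sketch (the published experiments were run in MATLAB; the parameter values, the tolerance 1e-6, the backtracking cutoff and the use of scipy.linalg.solve_sylvester for (32) are our choices, not the paper's).

```python
import numpy as np
from scipy.linalg import expm, solve_sylvester

def dnm_spd_f1(P0, a1, b1, sigma=0.25, tol=1e-6, max_iter=500):
    """Illustrative sketch of DNM-P^n_++ for f_1(P) = a1 ln det P + b1 tr P^{-1}."""
    P = P0.copy()
    I = np.eye(P.shape[0])
    for _ in range(max_iter):
        Pinv = np.linalg.inv(P)
        G = a1 * P - b1 * I                                  # grad f_1(P), eq. (28)
        if np.sqrt(np.trace(G @ Pinv @ G @ Pinv)) <= tol:    # norm in the metric (19)
            return P
        # Step 1: Sylvester equation (32); always uniquely solvable since P is SPD,
        # so the safeguard direction (33) is never needed here.
        V = solve_sylvester(P, P, 2.0 * (P @ P - (a1 / b1) * P @ P @ P))
        # Step 2: Armijo rule (34) on phi_1(P) = 0.5 * tr((a1 I - b1 P^{-1})^2)
        w, Q = np.linalg.eigh(P)
        P_half = Q @ np.diag(np.sqrt(w)) @ Q.T
        P_ihalf = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T
        R = a1 * I - b1 * Pinv
        phi = 0.5 * np.trace(R @ R)
        alpha = 1.0
        while True:
            P_trial = P_half @ expm(alpha * P_ihalf @ V @ P_ihalf) @ P_half
            R_trial = a1 * I - b1 * np.linalg.inv(P_trial)
            if 0.5 * np.trace(R_trial @ R_trial) <= (1.0 - 2.0 * sigma * alpha) * phi \
                    or alpha < 1e-12:
                break
            alpha *= 0.5
        P = P_trial                                          # update (35)
    return P

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, a1, b1 = 20, 2.0, 1.0
    A = rng.standard_normal((n, n))
    P0 = A @ A.T + n * np.eye(n)                             # random SPD initial guess
    P_star = dnm_spd_f1(P0, a1, b1)
    # the global minimizer of f_1 is (b1/a1) I, cf. the discussion after (28)
    print(np.linalg.norm(P_star - (b1 / a1) * np.eye(n)))
```

The full-step method NM-P^n_++ described above corresponds to fixing α_k = 1 and skipping the backtracking loop.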
[Figure: ‖grad f‖ per iterate for the damped Newton method and Newton method. (a) f_1 with n = 1000. (b) f_2 with b_2/a_2 = 0.002 and n = 1000.]

As we can see in panels (a) and (b), NM-P^n_++ performs a huge number of iterations before reaching the superlinear convergence region, while the line search of DNM-P^n_++ decreases the gradient norm, ensuring a superior performance in number of iterations. Note that in these figures DNM-P^n_++ and NM-P^n_++ have the same behavior when the iterates are close to the solution, which is a consequence of Lemma 4. In part, the efficiency of DNM-P^n_++ can be explained by the line search decreasing the merit function.

Table 1 shows that DNM-P^n_++ is superior to NM-P^n_++ in number of iterations and CPU time, besides showing the number of function evaluations in the line search used to compute the stepsize. In the columns of this table we read: n is the dimension of P^n; b_1/a_1 and b_2/a_2 are the parameters defining the functions f_1 and f_2, respectively; NIT is the number of iterations to reach the minimizer of the function; HE is the number of Hessian evaluations; GE is the number of gradient evaluations; and T is the CPU time in seconds.
[Table 1: Performance of DNM-P^n_++ and NM-P^n_++ for the functions f_1 and f_2; for each parameter value b_i/a_i and dimension n, the columns report NIT, HE, GE and T for NM-P^n_++ and for DNM-P^n_++.]

It is important to observe that GE in DNM-P^n_++ also takes into account the evaluations performed in the Armijo rule. We know that, in general, the line search used to compute the stepsize is expensive, but the computational effort did not grow significantly with the dimension of P^n in DNM-P^n_++. In each method we can observe that each iteration evaluates the Hessian once. Hence, DNM-P^n_++ has a superior performance to NM-P^n_++ because the minimizer is reached by DNM-P^n_++ with fewer iterations than by NM-P^n_++, as can be seen in Table 1.

5. Final Remarks

In this paper, in order to find a singularity of a vector field defined on a Riemannian manifold, we presented a damped Newton's method and established its global convergence with superlinear/quadratic convergence rate. Note that the global convergence analysis of the damped Newton's method without assuming nonsingularity of the Hessian of the objective function at its critical points was established in the Euclidean context; see [25, Chapter 1, Section 1.3.3]. Since many important problems in the context of a Riemannian manifold become the problem of finding a singularity of a vector field [39, 1], it would be interesting to obtain a similar global analysis for a damped Newton's method to find a singularity of a vector field defined on a Riemannian manifold. Based on our previous experience, we feel that more tools need to be developed to achieve this goal. In order to obtain better numerical efficiency for the damped Newton's method, our next task is to design new methods in which equation (6) is solved only approximately, besides using a retraction in (9) and a more efficient line search in (8).
References

[1] A. Edelman, T. A. Arias, S. T. Smith, The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl. 20 (2) (1999) 303–353.
[2] S. T. Smith, Optimization techniques on Riemannian manifolds, in: Hamiltonian and Gradient Flows, Algorithms and Control, Vol. 3 of Fields Inst. Commun., Amer. Math. Soc., Providence, RI, 1994, pp. 113–136.
[3] P.-A. Absil, R. Mahony, R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.
[4] P.-A. Absil, L. Amodei, G. Meyer, Two Newton methods on the manifold of fixed-rank matrices endowed with Riemannian quotient geometries, Comput. Statist. 29 (3-4) (2014) 569–590.
[5] W. Huang, K. A. Gallivan, P.-A. Absil, A Broyden class of quasi-Newton methods for Riemannian optimization, SIAM J. Optim. 25 (3) (2015) 1660–1685.
[6] C. Li, G. López, V. Martín-Márquez, Monotone vector fields and the proximal point algorithm on Hadamard manifolds, J. Lond. Math. Soc. (2) 79 (3) (2009) 663–683.
[7] C. Li, J. Wang, Newton's method for sections on Riemannian manifolds: generalized covariant α-theory, J. Complexity 24 (3) (2008) 423–451.
[8] Y. Hu, C. Li, X. Yang, On convergence rates of linearized proximal algorithms for convex composite optimization with applications, SIAM J. Optim. 26 (2) (2016) 1207–1235.
[9] W. Ring, B. Wirth, Optimization methods on Riemannian manifolds and their application to shape space, SIAM J. Optim. 22 (2) (2012) 596–627.
[10] J. H. Manton, A framework for generalising the Newton method and other iterative methods from Euclidean space to manifolds, Numer. Math. 129 (1) (2015) 91–125.
[11] S. Hosseini, A. Uschmajew, A Riemannian gradient sampling algorithm for nonsmooth optimization on manifolds, SIAM J. Optim. 27 (1) (2017) 173–189.
[12] P. Grohs, S. Hosseini, Nonsmooth trust region algorithms for locally Lipschitz functions on Riemannian manifolds, IMA J. Numer. Anal. 36 (3) (2016) 1167–1192.
[13] Z. Zhao, Z.-J. Bai, X.-Q. Jin, A Riemannian Newton algorithm for nonlinear eigenvalue problems, SIAM J. Matrix Anal. Appl. 36 (2) (2015) 752–774.
[14] L. Cambier, P.-A. Absil, Robust low-rank matrix completion by Riemannian optimization, SIAM J. Sci. Comput. 38 (5) (2016) S440–S460.
[15] H. Sato, T. Iwai, A Riemannian optimization approach to the matrix singular value decomposition, SIAM J. Optim. 23 (1) (2013) 188–212.
[16] J. E. Dennis, Jr., R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Vol. 16 of Classics in Applied Mathematics, SIAM, Philadelphia, PA, 1996. Corrected reprint of the 1983 original.
[17] J. M. Ortega, W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Vol. 30 of Classics in Applied Mathematics, SIAM, Philadelphia, PA, 2000. Reprint of the 1970 original.
[18] C. Li, J.-H. Wang, J.-P. Dedieu, Smale's point estimate theory for Newton's method on Lie groups, J. Complexity 25 (2) (2009) 128–151.
[19] V. H. Schulz, A Riemannian view on shape optimization, Found. Comput. Math. 14 (3) (2014) 483–501.
[20] J.-H. Wang, C. Li, Kantorovich's theorems for Newton's method for mappings and optimization problems on Lie groups, IMA J. Numer. Anal. 31 (1) (2011) 322–347.
[21] O. P. Ferreira, R. C. M. Silva, Local convergence of Newton's method under a majorant condition in Riemannian manifolds, IMA J. Numer. Anal. 32 (4) (2012) 1696–1713.
[22] O. P. Ferreira, B. F. Svaiter, Kantorovich's theorem on Newton's method in Riemannian manifolds, J. Complexity 18 (1) (2002) 304–329.
[23] C. Li, J. Wang, Newton's method on Riemannian manifolds: Smale's point estimate theory under the γ-condition,