Explicit continuation methods with L-BFGS updating formulas for linearly constrained optimization problems
Xin-long Luo ∗ · Jia-hui Lv · Hang Xiao
Abstract
This paper considers an explicit continuation method with the trusty time-stepping scheme and the limited-memory BFGS (L-BFGS) updating formula (Eptctr) for the linearly constrained optimization problem. At every iteration, Eptctr only involves three pairs of inner products of vectors and one matrix-vector product, rather than solving a quadratic programming subproblem, as the traditional and representative sequential quadratic programming (SQP) method does, or a linear system of equations, as the latest continuation method Ptctr [25] does. Thus, Eptctr can save much more computational time than SQP or Ptctr. Numerical results also show that the consumed time of Eptctr is about one tenth of that of Ptctr, or one fifteenth to 0.4 percent of that of SQP. Furthermore, Eptctr can save the storage of an $(n+m) \times (n+m)$ large-scale matrix, in comparison to SQP; the required memory of Eptctr is about one fifth of that of SQP. Finally, we also give the global convergence analysis of the new method under the standard assumptions.

Keywords: continuation method · trust-region method · SQP · structure-preserving algorithm · generalized projected gradient flow · large-scale optimization

Xin-long Luo (corresponding author), Jia-hui Lv, Hang Xiao
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, P. O. Box 101, Xitucheng Road No. 10, Haidian District, 100876, Beijing, China
E-mail: [email protected], [email protected], [email protected]
Mathematics Subject Classification (2010)

1 Introduction

In this article, we consider the following linearly equality-constrained optimization problem:
\[
\min_{x \in \Re^{n}} f(x) \quad \text{subject to} \quad Ax = b, \tag{1}
\]
where the matrix $A \in \Re^{m \times n}$ and the vector $b \in \Re^{m}$ may have random noise. This problem has many applications in engineering fields, such as the visual-inertial navigation of an unmanned aerial vehicle maintaining horizontal flight [8,25], and there are many practical methods to solve it, such as the sequential quadratic programming (SQP) method [3,30] or the penalty function method [11].

For the constrained optimization problem (1), the continuation method [1,9,13,19,33,39] is an alternative to traditional optimization methods such as SQP or the penalty function method. The advantage of the continuation method over the SQP method is that the continuation method is capable of finding many local optimal points of a non-convex optimization problem by tracking its trajectory, and it is even possible to find the global optimal solution [4,35,43]. However, the computational efficiency of the continuation method is generally lower than that of SQP. Recently, Luo, Lv and Sun [25] gave a continuation method (Ptctr) with the trusty time-stepping scheme, whose consumed time is about one fifth of that of SQP for the linearly constrained optimization problem (1). Their method only needs to solve a linear system of equations with an $n \times n$ symmetric positive definite coefficient matrix at every iteration, which involves about $O(n^{3})$ flops. SQP needs to solve a linear system of equations with an $(m+n) \times (m+n)$ coefficient matrix, which involves about $O((m+n)^{3})$ flops. In order to further improve the computational efficiency and reduce the storage requirements of the continuation method [25] for large-scale optimization problems, we consider a special limited-memory BFGS updating formula together with the trusty time-stepping scheme in this article.

The rest of the paper is organized as follows. In section 2, we give a new continuation method with the trusty time-stepping scheme and the L-BFGS updating formula for the linearly equality-constrained optimization problem (1). In section 3, we analyze the global convergence of this new method. In section 4, we report some promising numerical results of the new method, in comparison to the traditional optimization method (SQP) and the latest continuation method (Ptctr), for some large-scale problems. Finally, we give some discussions and conclusions in section 5.

2 The explicit continuation method

In this section, we construct an explicit continuation method with an adaptive time-stepping scheme based on the trust-region updating strategy [44] for the linearly equality-constrained optimization problem (1). Firstly, we construct a generalized projected gradient flow based on the KKT conditions of the linearly constrained optimization problem. Then, in order to follow the generalized gradient flow efficiently, we construct an explicit continuation method with an adaptive time-stepping scheme for this special system of ordinary differential equations (ODEs). Furthermore, we give a pre-processing method for an infeasible initial point.

2.1 The generalized projected gradient flow

For the linearly constrained optimization problem (1), it is well known that its optimal solution $x^{*}$ needs to satisfy the Karush-Kuhn-Tucker conditions (p. 328, [30]) as follows:
\[
\nabla_{x} L(x, \lambda) = \nabla f(x) + A^{T}\lambda = 0, \tag{2}
\]
\[
Ax - b = 0, \tag{3}
\]
where the Lagrangian function $L(x, \lambda)$ is defined by
\[
L(x, \lambda) = f(x) + \lambda^{T}(Ax - b). \tag{4}
\]
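The conditions (2)-(3) also serve as a natural numerical stopping test. As an illustration only (not part of the paper's derivation), the following minimal NumPy sketch evaluates the KKT residuals; `grad_f` is a user-supplied gradient routine, the function name is our own, and the multiplier estimate anticipates equation (8) below and the termination test (54)-(55) used in the experiments.

```python
import numpy as np

def kkt_residuals(x, A, b, grad_f):
    """Return the stationarity and feasibility residuals of (2)-(3).

    The multiplier estimate lam = -(A A^T)^{-1} A grad_f(x) anticipates
    equation (8); names here are illustrative, not the authors' code.
    """
    g = grad_f(x)
    # Solve (A A^T) lam = -A g for the least-squares multiplier.
    lam = np.linalg.solve(A @ A.T, -A @ g)
    stationarity = np.linalg.norm(g + A.T @ lam, np.inf)  # residual of (2)
    feasibility = np.linalg.norm(A @ x - b, np.inf)       # residual of (3)
    return stationarity, feasibility
```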
Similarly to the method of the negative gradient flow for the unconstrained optimization problem [23], from the first-order necessary conditions (2)-(3), we can construct a dynamical system of differential-algebraic equations for problem (1) [21,22,24,36] as follows:
\[
\frac{dx}{dt} = -\nabla_{x} L(x, \lambda) = -\left(\nabla f(x) + A^{T}\lambda\right), \tag{5}
\]
\[
Ax - b = 0. \tag{6}
\]
By differentiating the algebraic constraint (6) with respect to $t$ and substituting the differential equation (5) into it, we obtain
\[
A\frac{dx}{dt} = -A\left(\nabla f(x) + A^{T}\lambda\right) = -A\nabla f(x) - AA^{T}\lambda = 0. \tag{7}
\]
If we further assume that the matrix $A$ has full row rank, from equation (7), we obtain
\[
\lambda = -\left(AA^{T}\right)^{-1}A\nabla f(x). \tag{8}
\]
By replacing $\lambda$ of equation (8) into equation (5), we obtain the dynamical system of ordinary differential equations (ODEs) as follows:
\[
\frac{dx}{dt} = -\left(I - A^{T}\left(AA^{T}\right)^{-1}A\right)\nabla f(x). \tag{9}
\]
Thus, we also obtain the projected gradient flow for the constrained optimization problem [39]. For convenience, we denote the projection matrix $P$ as
\[
P = I - A^{T}\left(AA^{T}\right)^{-1}A. \tag{10}
\]
It is not difficult to verify $P^{2} = P$ and $AP = 0$. That is to say, $P$ is a symmetric projection matrix and its eigenvalues are 0 or 1. From Theorem 2.3.1 in p. 73 of [15], we know that its matrix 2-norm is
\[
\|P\|_{2} = 1. \tag{11}
\]
We denote $P^{+}$ as the Moore-Penrose generalized inverse of $P$ (p. 11, [38]). Since $P$ is symmetric and its eigenvalues are 0 or 1, it is not difficult to verify
\[
P^{+} = P. \tag{12}
\]
Thus, for a full-rank matrix $B \in \Re^{n \times n}$, we obtain the generalized inverse $(PB)^{+}$ of $PB$ as follows:
\[
(PB)^{+} = B^{+}P^{+} = B^{-1}P. \tag{13}
\]
Similarly to the generalized gradient flow for an unconstrained optimization problem (p. 361, [17]), from the projected gradient flow (9), we can construct the generalized projected gradient flow for the constrained optimization problem (1) as follows:
\[
\frac{dx}{dt} = -\left(PH(x)P\right)P\nabla f(x), \quad x(0) = x_{0}, \tag{14}
\]
where $H(x)$ is a symmetric positive definite matrix for any $x \in \Re^{n}$. Here, $H(x)$ may be selected as the inverse of the Hessian matrix $\nabla^{2} f(x)$ of $f(x)$, and $PH(x)P$ can be regarded as a pre-conditioner of $P\nabla f(x)$ to mitigate the stiffness of the ODEs (14). Consequently, we can adopt an explicit numerical method to compute the trajectory of the ODEs (14) efficiently [26,27].

Remark 1 If $x(t)$ is the solution of the ODEs (14), it is not difficult to verify that $x(t)$ satisfies $A(dx/dt) = 0$. That is to say, if the initial point $x_{0}$ satisfies $Ax_{0} = b$, the solution $x(t)$ of the generalized projected gradient flow (14) also satisfies $Ax(t) = b$, $\forall t \geq 0$. This property is very useful when we construct a structure-preserving algorithm [16,40] to follow the trajectory of the ODEs (14) and obtain its equilibrium point $x^{*}$.
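The identities around (10)-(12) are easy to check numerically. The following small NumPy sketch (our own illustration on an arbitrary random instance) verifies $P^{2} = P$, $AP = 0$, $\|P\|_{2} = 1$ and $P^{+} = P$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 7
A = rng.standard_normal((m, n))               # assumed to have full row rank

# Projection matrix of equation (10).
P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)

print(np.allclose(P @ P, P))                  # P^2 = P
print(np.allclose(A @ P, 0))                  # AP = 0
print(np.isclose(np.linalg.norm(P, 2), 1.0))  # equation (11)
print(np.allclose(np.linalg.pinv(P), P))      # equation (12)
```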
Remark 2 If we assume that $x(t)$ is the solution of the ODEs (14), from equations (10)-(11) and the positive definite property of $H(x)$, we obtain
\[
\frac{d f(x)}{dt} = (\nabla f(x))^{T}\frac{dx}{dt} = -(\nabla f(x))^{T}PH(x)P\nabla f(x) = -(P\nabla f(x))^{T}H(x)(P\nabla f(x)) \leq 0.
\]
That is to say, $f(x)$ is monotonically decreasing along the solution curve $x(t)$ of the dynamical system (14). Furthermore, the solution $x(t)$ converges to $x^{*}$ when $f(x)$ is lower bounded and $t$ tends to infinity [17,35,39], where $x^{*}$ satisfies the first-order Karush-Kuhn-Tucker conditions (2)-(3). Thus, we can follow the trajectory $x(t)$ of the ODEs (14) to obtain its equilibrium point $x^{*}$, which is also a saddle point of the original optimization problem (1).

2.2 The explicit continuation method

The solution curve of the degenerate ordinary differential equations (14) cannot be followed efficiently on an infinite interval by a traditional ODE method [2,5,20], so one needs to construct a particular method for this problem. We apply the first-order implicit Euler method [37] to the ODEs (14); then we obtain
\[
x_{k+1} = x_{k} - \Delta t_{k}\,(PH(x_{k+1}))(P\nabla f(x_{k+1})), \tag{15}
\]
where $\Delta t_{k}$ is the time-stepping size.

Since the system of equations (15) is nonlinear and cannot be solved directly, we seek its explicit approximation formula. We denote $s_{k} = x_{k+1} - x_{k}$. By using the first-order Taylor expansion, we have the linear approximation $\nabla f(x_{k}) + \nabla^{2} f(x_{k})s_{k}$ of $\nabla f(x_{k+1})$. By substituting it into equation (15) and using the zero-order approximation $H(x_{k})$ of $H(x_{k+1})$, we have
\[
s_{k} \approx -\Delta t_{k}(PH(x_{k}))\left(P\left(\nabla f(x_{k}) + \nabla^{2} f(x_{k})s_{k}\right)\right) = -\Delta t_{k}(PH(x_{k}))(P\nabla f(x_{k})) - \Delta t_{k}P(H(x_{k})P)(P\nabla^{2} f(x_{k}))s_{k}. \tag{16}
\]
From equation (15) and $P^{2} = P$, we have $Ps_{k} = s_{k}$. Let $H(x_{k}) = (\nabla^{2} f(x_{k}))^{-1}$. Then we have $H(x_{k})P = (P\nabla^{2} f(x_{k}))^{+}$. Thus, we regard
\[
P(H(x_{k})P)(P\nabla^{2} f(x_{k}))Ps_{k} = P(P\nabla^{2} f(x_{k}))^{+}(P\nabla^{2} f(x_{k}))Ps_{k} \approx Ps_{k} = s_{k}. \tag{17}
\]
By substituting it into equation (16), we obtain the explicit continuation method as follows:
\[
s_{k} = -\frac{\Delta t_{k}}{1 + \Delta t_{k}}(PH_{k})(Pg_{k}), \tag{18}
\]
\[
x_{k+1} = x_{k} + s_{k}, \tag{19}
\]
where $g_{k} = \nabla f(x_{k})$ and $H_{k} = (\nabla^{2} f(x_{k}))^{-1}$ or its quasi-Newton approximation in the projective space $S^{k}_{p} = \{x : x = x_{k} + Pd, \; d \in \Re^{n}\}$.

If we let the projection matrix $P = I$, the formula (18) reduces to the explicit continuation method given by Luo, Xiao and Lv [26] for nonlinear equations. The explicit continuation method (18)-(19) is similar to the projected damped Newton method if we let $\alpha_{k} = \Delta t_{k}/(1 + \Delta t_{k})$ in equation (18). However, from the view of the ODE method, they are different. The projected damped Newton method is obtained by the explicit Euler scheme applied to the generalized projected gradient flow (14), and its time-stepping size $\alpha_{k}$ is restricted by numerical stability [37]. That is to say, a large time-stepping size $\alpha_{k}$ cannot be adopted in the steady-state phase. The explicit continuation method (18)-(19) is obtained by the implicit Euler method applied to the generalized projected gradient flow (14), and its time-stepping size $\Delta t_{k}$ is not restricted by numerical stability. Therefore, a large time-stepping size can be adopted in the steady-state phase for the explicit continuation method (18)-(19), and it mimics the Newton method near the equilibrium solution $x^{*}$ such that it has a fast local convergence rate.
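To make the update (18)-(19) concrete, here is a minimal NumPy sketch of a single continuation step; `H` stands for the (approximate) inverse Hessian $H_{k}$, the routine name is our own, and this is an illustration rather than the authors' implementation:

```python
import numpy as np

def continuation_step(x, grad_f, P, H, dt):
    """One step of the explicit continuation method (18)-(19).

    P : n-by-n projection matrix from equation (10);
    H : symmetric positive definite approximation of the inverse Hessian.
    """
    g = grad_f(x)
    s = -dt / (1.0 + dt) * (P @ (H @ (P @ g)))  # equation (18)
    return x + s                                 # equation (19)
```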
Above all, the new stepsize $\alpha_{k} = \Delta t_{k}/(\Delta t_{k} + 1)$ is well suited to the trust-region updating technique for adaptively adjusting the time-stepping size $\Delta t_{k}$, such that the explicit continuation method (18)-(19) accurately tracks the trajectory of the generalized projected gradient flow in the transient-state phase and achieves a fast convergence rate near the equilibrium point $x^{*}$.

Remark 3
From equation (18) and the property $AP = 0$, it is not difficult to verify $As_{k} = 0$. Thus, if the initial point $x_{0}$ satisfies the linear constraint $Ax_{0} = b$, every iterate $x_{k}$ also satisfies the linear constraint $Ax_{k} = b$. That is to say, the explicit continuation method (18)-(19) is a structure-preserving method.

2.3 The L-BFGS quasi-Newton updating formula

For a large-scale problem, the numerical evaluation of the Hessian matrix $\nabla^{2} f(x_{k})$ consumes much time and requires storing an $n \times n$ matrix. In order to overcome these two shortcomings, we use the L-BFGS quasi-Newton formula ([6,14] or pp. 222-230, [30]) to approximate the generalized inverse $H(x_{k})P$ of $P\nabla^{2} f(x_{k})$. Recently, Ullah, Sabi and Shah [41] gave an efficient L-BFGS updating formula for a system of monotone nonlinear equations. Here, in order to suit the generalized projected gradient flow (14), we revise their L-BFGS updating formula as
\[
H_{k+1} =
\begin{cases}
I - \dfrac{y_{k}s_{k}^{T} + s_{k}y_{k}^{T}}{y_{k}^{T}s_{k}} + 2\dfrac{y_{k}^{T}y_{k}}{(y_{k}^{T}s_{k})^{2}}\,s_{k}s_{k}^{T}, & \text{if } |s_{k}^{T}y_{k}| > \theta\|s_{k}\|^{2}, \\[2mm]
I, & \text{otherwise},
\end{cases} \tag{20}
\]
where $s_{k} = x_{k+1} - x_{k}$, $y_{k} = P\nabla f(x_{k+1}) - P\nabla f(x_{k})$, and $\theta$ is a small positive constant. The initial matrix $H_{0}$ can simply be selected as the identity matrix. When $|s_{k}^{T}y_{k}| > \theta\|s_{k}\|^{2}$, from equation (20), it is not difficult to verify
\[
H_{k+1}y_{k} = \frac{y_{k}^{T}y_{k}}{y_{k}^{T}s_{k}}\,s_{k}.
\]
That is to say, $H_{k+1}$ satisfies the scaling quasi-Newton property. By using the Sherman-Morrison-Woodbury formula, from equation (20), when $|s_{k}^{T}y_{k}| > \theta\|s_{k}\|^{2}$, we have
\[
B_{k+1} = H_{k+1}^{-1} = I - \frac{s_{k}s_{k}^{T}}{s_{k}^{T}s_{k}} + \frac{y_{k}y_{k}^{T}}{y_{k}^{T}y_{k}}.
\]
The L-BFGS updating formula (20) has some nice properties, such as symmetric positive definiteness and a positive lower bound on its eigenvalues.
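In practice, formula (20) is never formed as a matrix: applying $H_{k+1}$ to a vector only needs a few inner products, as anticipated by the simplified update (26) below. A minimal sketch in our own notation follows; the default value of `theta` is a placeholder for "a small positive constant":

```python
import numpy as np

def apply_H(v, s, y, theta=1e-6):
    """Apply the L-BFGS matrix H_{k+1} of equation (20) to a vector v
    without forming the n-by-n matrix. theta is a small positive
    safeguard (placeholder value)."""
    sy = s @ y
    if abs(sy) <= theta * (s @ s):
        return v.copy()                  # H_{k+1} = I branch of (20)
    sv, yv = s @ v, y @ v
    return v - (y * sv + s * yv) / sy + 2.0 * (y @ y) * sv / sy**2 * s
```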
Lemma 1
Matrix $H_{k+1}$ defined by equation (20) is symmetric positive definite, and its eigenvalues are greater than $1/2$.

Proof. (i) For any nonzero vector $z \in \Re^{n}$, from equation (20), when $|s_{k}^{T}y_{k}| > \theta\|s_{k}\|^{2}$, we have
\[
z^{T}H_{k+1}z = \|z\|^{2} - 2\frac{(z^{T}y_{k})(z^{T}s_{k})}{y_{k}^{T}s_{k}} + 2\frac{(z^{T}s_{k})^{2}\|y_{k}\|^{2}}{(y_{k}^{T}s_{k})^{2}}
= \left(\|z\| - \left|\frac{z^{T}s_{k}}{y_{k}^{T}s_{k}}\right|\|y_{k}\|\right)^{2} + 2\|z\|\left|\frac{z^{T}s_{k}}{y_{k}^{T}s_{k}}\right|\|y_{k}\| - 2\frac{(z^{T}y_{k})(z^{T}s_{k})}{y_{k}^{T}s_{k}} + \frac{\|y_{k}\|^{2}(z^{T}s_{k})^{2}}{(y_{k}^{T}s_{k})^{2}} \geq 0. \tag{21}
\]
In the last inequality of equation (21), we use the Cauchy-Schwarz inequality $|z^{T}y_{k}| \leq \|z\|\|y_{k}\|$, whose equality holds if and only if $z = ty_{k}$ for some scalar $t$. When $z = ty_{k}$, from equation (21), we have $z^{T}H_{k+1}z = t^{2}\|y_{k}\|^{2} = \|z\|^{2} > 0$. When $z^{T}s_{k} = 0$, from equation (21), we also have $z^{T}H_{k+1}z = \|z\|^{2} > 0$. Therefore, we conclude that $H_{k+1}$ is a symmetric positive definite matrix when $|s_{k}^{T}y_{k}| > \theta\|s_{k}\|^{2}$. From equation (20), we also conclude that $H_{k+1}$ is a symmetric positive definite matrix when $|s_{k}^{T}y_{k}| \leq \theta\|s_{k}\|^{2}$, since then $H_{k+1} = I$.

(ii) It is not difficult to see that there exist at least $n-2$ linearly independent vectors $z_{1}, z_{2}, \ldots, z_{n-2}$ such that $z_{i}^{T}s_{k} = 0$ and $z_{i}^{T}y_{k} = 0$ $(i = 1, 2, \ldots, n-2)$ hold. That is to say, matrix $H_{k+1}$ defined by equation (20) has at least $n-2$ linearly independent eigenvectors whose corresponding eigenvalues are 1. We denote the other two eigenvalues of $H_{k+1}$ as $\mu_{k+1,i}$ $(i = 1, 2)$ and their corresponding eigenvectors as $p_{1}$ and $p_{2}$, respectively. Then, from equation (20), we know that the eigenvectors $p_{i}$ $(i = 1, 2)$ can be represented as $p_{i} = y_{k} + \beta_{i}s_{k}$ when $\mu_{k+1,i} \neq 1$ $(i = 1, 2)$. From equation (20) and $H_{k+1}p_{i} = \mu_{k+1,i}p_{i}$ $(i = 1, 2)$, we have
\[
-\left(\mu_{k+1,i} + \beta_{i}\frac{s_{k}^{T}s_{k}}{s_{k}^{T}y_{k}}\right)y_{k} + \left(\frac{y_{k}^{T}y_{k}}{y_{k}^{T}s_{k}} + 2\beta_{i}\frac{(y_{k}^{T}y_{k})(s_{k}^{T}s_{k})}{(y_{k}^{T}s_{k})^{2}} - \mu_{k+1,i}\beta_{i}\right)s_{k} = 0, \quad i = 1, 2. \tag{22}
\]
When $y_{k} = ts_{k}$, from equation (20), we have $H_{k+1} = I$. In this case, we conclude that the eigenvalues of $H_{k+1}$ are greater than $1/2$. When the vectors $y_{k}$ and $s_{k}$ are linearly independent, from equation (22), we have
\[
\mu_{k+1,i} + \beta_{i}\frac{s_{k}^{T}s_{k}}{s_{k}^{T}y_{k}} = 0, \quad
\frac{y_{k}^{T}y_{k}}{y_{k}^{T}s_{k}} + 2\beta_{i}\frac{(y_{k}^{T}y_{k})(s_{k}^{T}s_{k})}{(y_{k}^{T}s_{k})^{2}} - \mu_{k+1,i}\beta_{i} = 0, \quad i = 1, 2.
\]
That is to say, $\mu_{k+1,i}$ $(i = 1, 2)$ are the two solutions of the following equation:
\[
\mu^{2} - 2\mu\,\frac{(y_{k}^{T}y_{k})(s_{k}^{T}s_{k})}{(s_{k}^{T}y_{k})^{2}} + \frac{(y_{k}^{T}y_{k})(s_{k}^{T}s_{k})}{(s_{k}^{T}y_{k})^{2}} = 0. \tag{23}
\]
Consequently, from equation (23), we obtain
\[
\mu_{k+1,1} + \mu_{k+1,2} = 2\frac{(y_{k}^{T}y_{k})(s_{k}^{T}s_{k})}{(s_{k}^{T}y_{k})^{2}}, \quad
\mu_{k+1,1}\,\mu_{k+1,2} = \frac{(y_{k}^{T}y_{k})(s_{k}^{T}s_{k})}{(s_{k}^{T}y_{k})^{2}}. \tag{24}
\]
From equation (24), it is not difficult to obtain
\[
\frac{1}{\mu_{k+1,1}} + \frac{1}{\mu_{k+1,2}} = 2, \quad \mu_{k+1,i} > 0, \; i = 1, 2. \tag{25}
\]
Therefore, from equation (25), we conclude that $\mu_{k+1,i} > 1/2$ $(i = 1, 2)$. Consequently, the eigenvalues of $H_{k+1}$ are greater than $1/2$. ⊓⊔

If $s_{k-1}$ is obtained from the explicit continuation method (18), we have $Ps_{k-1} = s_{k-1}$ since $P^{2} = P$. By combining it with the L-BFGS updating formula (20), the explicit continuation method (18)-(19) can be simplified as
\[
d_{k} =
\begin{cases}
-pg_{k}, & \text{if } |s_{k-1}^{T}y_{k-1}| \leq \theta\|s_{k-1}\|^{2}, \\[1mm]
-pg_{k} + \dfrac{y_{k-1}(s_{k-1}^{T}pg_{k}) + s_{k-1}(y_{k-1}^{T}pg_{k})}{y_{k-1}^{T}s_{k-1}} - 2\dfrac{\|y_{k-1}\|^{2}(s_{k-1}^{T}pg_{k})}{(y_{k-1}^{T}s_{k-1})^{2}}\,s_{k-1}, & \text{otherwise},
\end{cases} \tag{26}
\]
\[
s_{k} = \frac{\Delta t_{k}}{1 + \Delta t_{k}}\,d_{k}, \quad x_{k+1} = x_{k} + s_{k}, \tag{27}
\]
where $pg_{k} = Pg_{k} = P\nabla f(x_{k})$ and $y_{k-1} = P\nabla f(x_{k}) - P\nabla f(x_{k-1})$. Thus, we do not need to store the matrix $H_{k}$ in practical computation. Furthermore, it only requires three pairs of inner products of vectors and one matrix-vector product ($pg_{k} = Pg_{k}$) to obtain the trial step $s_{k}$, and involves about $O((n-m)n)$ flops when we use the QR decomposition or the singular value decomposition to obtain the projection matrix $P$ in subsection 2.4.

2.4 The treatments of infeasible initial points and projection matrices

We need to compute the projected gradient $Pg_{k}$ at every iteration in the updating formula (26). In order to reduce the computational complexity, we use the QR factorization (pp. 276-278, [15]) to factor $A^{T}$ into the product of an orthogonal matrix $Q \in \Re^{n \times n}$ and an upper triangular matrix $R \in \Re^{n \times m}$:
\[
A^{T} = QR = \left[\,Q_{1} \,|\, Q_{2}\,\right]\begin{bmatrix} R_{1} \\ 0 \end{bmatrix}, \tag{28}
\]
where $Q_{1} = Q(1\!:\!n, 1\!:\!m)$, $Q_{2} = Q(1\!:\!n, m\!+\!1\!:\!n)$, and $R_{1} = R(1\!:\!m, 1\!:\!m)$ is upper triangular and nonsingular. Then, from equations (10) and (28), we simplify the projection matrix $P$ as
\[
P = I - Q_{1}Q_{1}^{T} = Q_{2}Q_{2}^{T}. \tag{29}
\]
In practical computation, we adopt different formulas for the projection $P$ according to $m \leq n/2$ or $m > n/2$.
Thus, we give the computational formula of the projected gradient $Pg_{k}$ as follows:
\[
Pg_{k} =
\begin{cases}
g_{k} - Q_{1}\left(Q_{1}^{T}g_{k}\right), & \text{if } m \leq n/2, \\
Q_{2}\left(Q_{2}^{T}g_{k}\right), & \text{otherwise}.
\end{cases} \tag{30}
\]
For a real-world optimization problem (1), we may encounter an infeasible initial point $x_{0}$; that is to say, the initial point does not satisfy the constraint $Ax_{0} = b$. We handle this problem by solving the following projection problem:
\[
\min_{x \in \Re^{n}} \|x - x_{0}\|^{2} \quad \text{subject to} \quad Q_{1}^{T}x = b_{r}, \tag{31}
\]
where $b_{r} = \left(R_{1}R_{1}^{T}\right)^{-1}(R_{1}b)$. By using the Lagrangian multiplier method and the QR factorization (28) of matrix $A^{T}$ to solve problem (31), we obtain the feasible initial point $x_{F}$ of problem (1) as follows:
\[
x_{F} = x_{0} - Q_{1}\left(Q_{1}^{T}Q_{1}\right)^{-1}\left(Q_{1}^{T}x_{0} - b_{r}\right) = x_{0} - Q_{1}\left(Q_{1}^{T}x_{0} - b_{r}\right). \tag{32}
\]
For convenience, we set $x_{0} = x_{F}$ in line 4 of Algorithm 1.

2.5 The trusty time-stepping scheme

Another issue is how to adaptively adjust the time-stepping size $\Delta t_{k}$ at every iteration. We borrow the adjustment method of the trust-region radius from the trust-region method due to its robust convergence and fast local convergence [10]. After the pre-processing of the initial point $x_{0}$, it is feasible. According to the structure-preserving property of the explicit continuation method (18)-(20), $x_{k+1}$ preserves feasibility; that is to say, $x_{k+1}$ satisfies $Ax_{k+1} = b$. Therefore, we use the objective function $f(x)$ instead of the nonsmooth penalty function $f(x) + \sigma\|Ax - b\|$ as the cost function.

When we use the trust-region updating strategy to adaptively adjust the time-stepping size $\Delta t_{k}$ [18], we need to construct a local approximation model of the objective $f(x)$ around $x_{k}$. Here, we adopt the following quadratic function as its approximation model:
\[
q_{k}(x_{k} + s) = f(x_{k}) + s^{T}g_{k} + \frac{1}{2}s^{T}H_{k}^{-1}s. \tag{33}
\]
In practical computation, we do not store the matrix $H_{k}$. Thus, we use the explicit continuation method (18)-(20) and regard $(H_{k}P)(H_{k}P)^{+} \approx I$ to simplify the quadratic model $q_{k}(x_{k} + s_{k}) - q_{k}(x_{k})$ as follows:
\[
m_{k}(s_{k}) = g_{k}^{T}s_{k} - \frac{0.5\,\Delta t_{k}}{1 + \Delta t_{k}}\,g_{k}^{T}s_{k} = \frac{1 + 0.5\,\Delta t_{k}}{1 + \Delta t_{k}}\,g_{k}^{T}s_{k} \approx q_{k}(x_{k} + s_{k}) - q_{k}(x_{k}), \tag{34}
\]
where $g_{k} = \nabla f(x_{k})$. We enlarge or reduce the time-stepping size $\Delta t_{k}$ at every iteration according to the following ratio:
\[
\rho_{k} = \frac{f(x_{k}) - f(x_{k+1})}{m_{k}(0) - m_{k}(s_{k})}. \tag{35}
\]
A particular adjustment strategy is given as follows:
\[
\Delta t_{k+1} =
\begin{cases}
\gamma_{1}\Delta t_{k}, & \text{if } 0 \leq |1 - \rho_{k}| \leq \eta_{1}, \\
\Delta t_{k}, & \text{if } \eta_{1} < |1 - \rho_{k}| < \eta_{2}, \\
\gamma_{2}\Delta t_{k}, & \text{if } |1 - \rho_{k}| \geq \eta_{2},
\end{cases} \tag{36}
\]
where the constants are selected such that $0 < \eta_{1} < \eta_{2} < 1$, $\gamma_{1} > 1$ and $0 < \gamma_{2} < 1$. When $\rho_{k} \geq \eta_{a}$, we accept the trial step $s_{k}$ and let $x_{k+1} = x_{k} + s_{k}$, where $\eta_{a}$ is a small positive number. Otherwise, we discard the trial step and let $x_{k+1} = x_{k}$.

According to the above discussion, we give the detailed implementation of the explicit continuation method with the trusty time-stepping scheme for the linearly equality-constrained optimization problem (1) in Algorithm 1.
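Before stating the algorithm, the following NumPy sketch illustrates its building blocks: the QR-based projection quantities (28)-(30), the feasible starting point (32), and the adjustment rule (36). The routine names and the particular constants `eta1`, `eta2`, `gamma1`, `gamma2` are our own placeholders; only their qualitative ranges are fixed above.

```python
import numpy as np

def setup_projection(A, b):
    """QR-based quantities of equations (28)-(29)."""
    m, n = A.shape
    Q, R = np.linalg.qr(A.T, mode='complete')   # A^T = Q R, equation (28)
    Q1, Q2, R1 = Q[:, :m], Q[:, m:], R[:m, :]
    br = np.linalg.solve(R1.T, b)               # b_r, so that Q1^T x = b_r
    return Q1, Q2, R1, br

def project_gradient(g, Q1, Q2, m, n):
    """Projected gradient Pg of equation (30)."""
    if m <= n / 2:
        return g - Q1 @ (Q1.T @ g)
    return Q2 @ (Q2.T @ g)

def feasible_start(x0, Q1, br):
    """Feasible initial point x_F of equation (32)."""
    return x0 - Q1 @ (Q1.T @ x0 - br)

def update_dt(dt, rho, eta1=0.25, eta2=0.75, gamma1=2.0, gamma2=0.5):
    """Trust-region style adjustment (36); the numeric values are
    illustrative choices with 0 < eta1 < eta2 < 1, gamma1 > 1, gamma2 < 1."""
    if abs(1.0 - rho) <= eta1:
        return gamma1 * dt
    if abs(1.0 - rho) < eta2:
        return dt
    return gamma2 * dt
```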
Algorithm 1 The explicit continuation method with the trusty time-stepping scheme for linearly constrained optimization (the Eptctr method)
Input: the objective function $f(x)$, the linear constraint $Ax = b$, the initial point $x_{0}$ (optional), the termination tolerance $\varepsilon$ (optional).
Output: the approximate optimal solution $x^{*}$.
1: Set $x_{0} = \mathrm{ones}(n, 1)$ and a small tolerance $\varepsilon$ as the default values.
2: Initialize the parameters $\eta_{a}$, $\eta_{1}$, $\gamma_{1}$, $\eta_{2}$, $\gamma_{2}$ and $\theta$ as in subsection 2.5.
3: Factorize matrix $A^{T}$ with the QR factorization (28).
4: Compute $x_{0} \leftarrow x_{0} - Q_{1}\left(Q_{1}^{T}x_{0} - b_{r}\right)$, such that $x_{0}$ satisfies the linear system of constraints $Ax_{0} = b$.
5: Set $k = 0$. Evaluate $f_{0} = f(x_{0})$ and $g_{0} = \nabla f(x_{0})$.
6: Compute the projected gradient $pg_{0} = Pg_{0}$ according to the formula (30).
7: Set $y_{-1} = s_{-1} = 0$ and choose the initial time-stepping size $\Delta t_{0}$.
8: while $\|pg_{k}\| > \varepsilon$ do
9:  if $|s_{k-1}^{T}y_{k-1}| > \theta\, s_{k-1}^{T}s_{k-1}$ then
10:   $s_{k} = -\frac{\Delta t_{k}}{1+\Delta t_{k}}\left(pg_{k} - \frac{y_{k-1}(s_{k-1}^{T}pg_{k}) + s_{k-1}(y_{k-1}^{T}pg_{k})}{y_{k-1}^{T}s_{k-1}} + 2\frac{\|y_{k-1}\|^{2}(s_{k-1}^{T}pg_{k})}{(y_{k-1}^{T}s_{k-1})^{2}}\,s_{k-1}\right)$.
11:  else
12:   $s_{k} = -\frac{\Delta t_{k}}{1+\Delta t_{k}}\,pg_{k}$.
13:  end if
14:  Compute $x_{k+1} = x_{k} + s_{k}$.
15:  Evaluate $f_{k+1} = f(x_{k+1})$ and compute the ratio $\rho_{k}$ from equations (34)-(35).
16:  if $\rho_{k} \leq \eta_{a}$ then
17:   Set $x_{k+1} = x_{k}$, $f_{k+1} = f_{k}$, $pg_{k+1} = pg_{k}$, $g_{k+1} = g_{k}$, $y_{k} = y_{k-1}$, $s_{k} = s_{k-1}$.
18:  else
19:   Evaluate $g_{k+1} = \nabla f(x_{k+1})$.
20:   Compute $pg_{k+1} = Pg_{k+1}$ according to the formula (30). Set $y_{k} = pg_{k+1} - pg_{k}$ and $s_{k} = x_{k+1} - x_{k}$.
21:  end if
22:  Adjust the time-stepping size $\Delta t_{k+1}$ based on the trust-region updating scheme (36).
23:  Set $k \leftarrow k + 1$.
24: end while
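A compact end-to-end sketch of the main loop of Algorithm 1, assembled from the helper routines sketched above (`setup_projection`, `project_gradient`, `feasible_start`, `apply_H`, `update_dt`); it is a simplified illustration rather than the authors' code, and all numeric defaults are placeholders:

```python
import numpy as np

def eptctr(f, grad_f, A, b, x0, eps=1e-6, dt=1e-2, theta=1e-6, eta_a=1e-6,
           max_iter=1000):
    """Simplified sketch of Algorithm 1 (Eptctr); defaults are placeholders."""
    m, n = A.shape
    Q1, Q2, R1, br = setup_projection(A, b)
    x = feasible_start(x0, Q1, br)                # line 4, equation (32)
    fx, g = f(x), grad_f(x)
    pg = project_gradient(g, Q1, Q2, m, n)        # equation (30)
    s_old = y_old = np.zeros(n)
    for _ in range(max_iter):
        if np.linalg.norm(pg) <= eps:
            break
        s = -dt / (1.0 + dt) * apply_H(pg, s_old, y_old, theta)  # (26)-(27)
        x_new = x + s
        f_new = f(x_new)
        mk = -(1.0 + 0.5 * dt) / (1.0 + dt) * (g @ s)  # m_k(0) - m_k(s_k), (34)
        rho = (fx - f_new) / mk                        # equation (35)
        if rho > eta_a:                                # accept the trial step
            g_new = grad_f(x_new)
            pg_new = project_gradient(g_new, Q1, Q2, m, n)
            s_old, y_old = s, pg_new - pg
            x, fx, g, pg = x_new, f_new, g_new, pg_new
        dt = update_dt(dt, rho)                        # scheme (36)
    return x
```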
3 Global convergence analysis

In this section, we analyze the global convergence of the explicit continuation method (18)-(19) with the trusty time-stepping scheme and the L-BFGS updating formula (20) for the linearly equality-constrained optimization problem, i.e., Algorithm 1. Firstly, we give a lower-bound estimate of $m_{k}(0) - m_{k}(s_{k})$ $(k = 0, 1, \ldots)$. This result is similar to that of the trust-region method for the unconstrained optimization problem [34]. For simplicity, we assume that the matrix $A$ has full row rank.

Lemma 2
Assume that the quadratic model $q_{k}(x)$ is defined by equation (34) and $s_{k}$ is computed by the explicit continuation method (18)-(20). Then, we have
\[
m_{k}(0) - m_{k}(s_{k}) \geq \frac{\Delta t_{k}}{4(1 + \Delta t_{k})}\left\|pg_{k}\right\|^{2}, \tag{37}
\]
where $pg_{k} = Pg_{k} = P\nabla f(x_{k})$ and the projection matrix $P$ is defined by equation (10).

Proof.
From equation (20) and Lemma 1, we know that $H_{k}$ is symmetric positive definite and its eigenvalues are greater than $1/2$. According to the eigenvalue decomposition of $H_{k}$, there exists an orthogonal matrix $Q_{k}$ such that $H_{k} = Q_{k}^{T}S_{k}Q_{k}$ holds, where $S_{k} = \mathrm{diag}(\mu_{1}^{k}, \ldots, \mu_{n}^{k})$ and $\mu_{i}^{k}$ $(i = 1, \ldots, n)$ are the eigenvalues of $H_{k}$. We denote the smallest eigenvalue of $H_{k}$ by $\mu_{\min}^{k}$. From the explicit continuation method (18) and $P^{2} = P$, we know that $s_{k} = Ps_{k}$. By combining it with the explicit continuation method (18) and the quadratic model (34), we have
\[
m_{k}(0) - m_{k}(s_{k}) \geq -\frac{1}{2}g_{k}^{T}s_{k} = -\frac{1}{2}g_{k}^{T}Ps_{k} = -\frac{1}{2}(Pg_{k})^{T}s_{k} = \frac{\Delta t_{k}}{2(1 + \Delta t_{k})}(Pg_{k})^{T}H_{k}(Pg_{k})
\]
\[
= \frac{\Delta t_{k}}{2(1 + \Delta t_{k})}(Pg_{k})^{T}(Q_{k}^{T}S_{k}Q_{k})(Pg_{k}) = \frac{\Delta t_{k}}{2(1 + \Delta t_{k})}(Q_{k}Pg_{k})^{T}S_{k}(Q_{k}Pg_{k})
\]
\[
\geq \mu_{\min}^{k}\,\frac{\Delta t_{k}}{2(1 + \Delta t_{k})}\|Q_{k}Pg_{k}\|^{2} \geq \frac{\Delta t_{k}}{4(1 + \Delta t_{k})}\|Q_{k}Pg_{k}\|^{2} = \frac{\Delta t_{k}}{4(1 + \Delta t_{k})}\|Pg_{k}\|^{2}. \tag{38}
\]
In the first inequality of equation (38), we use the property $(1 + 0.5\Delta t_{k})/(1 + \Delta t_{k}) \geq 0.5$ for all $\Delta t_{k} \geq 0$. Consequently, we prove the result (37). ⊓⊔
In order to prove that $pg_{k}$ converges to zero when $k$ tends to infinity, we need to estimate the lower bound of the time-stepping sizes $\Delta t_{k}$ $(k = 0, 1, \ldots)$. We denote the constrained level set $S_{f}$ as
\[
S_{f} = \{x : f(x) \leq f(x_{0}), \; Ax = b\}. \tag{39}
\]

Lemma 3
Assume that $f : \Re^{n} \to \Re$ is continuously differentiable and its gradient $g(x)$ satisfies the following Lipschitz continuity:
\[
\|g(x) - g(y)\| \leq L_{C}\|x - y\|, \quad \forall x, y \in S_{f}, \tag{40}
\]
where $L_{C}$ is the Lipschitz constant. We suppose that the sequence $\{x_{k}\}$ is generated by Algorithm 1. Then, there exists a positive constant $\delta_{\Delta t}$ such that
\[
\Delta t_{k} \geq \gamma_{2}\,\delta_{\Delta t} \tag{41}
\]
holds for all $k = 0, 1, \ldots$, where $\Delta t_{k}$ is adaptively adjusted by the trust-region updating scheme (34)-(36).

Proof.
From Lemma 1, we know that the eigenvalues of $H_{k}$ are greater than $1/2$ and that $H_{k}$ has at least $n-2$ eigenvalues equal to 1. When $|s_{k-1}^{T}y_{k-1}| > \theta\|s_{k-1}\|^{2}$, we denote the other two eigenvalues of $H_{k}$ as $\mu_{1}^{k}$ and $\mu_{2}^{k}$. By substituting them into equation (24), we obtain
\[
\mu_{1}^{k}\mu_{2}^{k} = \frac{\|y_{k-1}\|^{2}\|s_{k-1}\|^{2}}{(s_{k-1}^{T}y_{k-1})^{2}} \leq \frac{\|y_{k-1}\|^{2}\|s_{k-1}\|^{2}}{\theta^{2}\|s_{k-1}\|^{4}} = \frac{1}{\theta^{2}}\,\frac{\|y_{k-1}\|^{2}}{\|s_{k-1}\|^{2}}. \tag{42}
\]
From Lemma 2 and Algorithm 1, we know $f(x_{k}) \leq f(x_{0})$ $(k = 0, 1, \ldots)$. From the explicit continuation method (18)-(20) and Remark 3, we know that $Ax_{k} = Ax_{0} = b$ $(k = 0, 1, \ldots)$. Thus, from the Lipschitz continuity (40) of $g(x)$, we have
\[
\|y_{k-1}\| \leq \|P\|\|g(x_{k}) - g(x_{k-1})\| \leq L_{C}\|x_{k} - x_{k-1}\| = L_{C}\|s_{k-1}\|. \tag{43}
\]
By substituting it into equation (42) and using $\mu_{i}^{k} > 1/2$ $(i = 1, 2)$, we obtain
\[
\frac{1}{2} < \mu_{i}^{k} < \frac{2L_{C}^{2}}{\theta^{2}}, \quad i = 1, 2. \tag{44}
\]
That is to say, the eigenvalues of $H_{k}$ are less than or equal to $M_{H}$, where $M_{H} = \max\{1, \, 2L_{C}^{2}/\theta^{2}\}$. According to the eigenvalue decomposition theorem, there exists an orthogonal matrix $Q_{k}$ such that $H_{k} = Q_{k}^{T}S_{k}Q_{k}$ holds, where $S_{k} = \mathrm{diag}(\mu_{1}^{k}, \ldots, \mu_{n}^{k})$ and $\mu_{i}^{k}$ $(i = 1, \ldots, n)$ are the eigenvalues of $H_{k}$. Consequently, we have
\[
\|H_{k}(Pg_{k})\| = \|(Q_{k}^{T}S_{k}Q_{k})Pg_{k}\| = \|S_{k}(Q_{k}Pg_{k})\| \leq M_{H}\|Pg_{k}\|. \tag{45}
\]
From the first-order Taylor expansion, we have
\[
f(x_{k} + s_{k}) = f(x_{k}) + \int_{0}^{1} s_{k}^{T}g(x_{k} + ts_{k})\,dt. \tag{46}
\]
Thus, from equations (34)-(37), (46) and the Lipschitz continuity (40) of $g(x)$, we have
\[
|\rho_{k} - 1| = \left|\frac{(f(x_{k}) - f(x_{k} + s_{k})) - (m_{k}(0) - m_{k}(s_{k}))}{m_{k}(0) - m_{k}(s_{k})}\right|
\leq \frac{1 + \Delta t_{k}}{1 + 0.5\Delta t_{k}}\,\frac{\left|\int_{0}^{1} s_{k}^{T}\left(g(x_{k} + ts_{k}) - g(x_{k})\right)dt\right|}{\left|s_{k}^{T}g_{k}\right|} + \frac{0.5\Delta t_{k}}{1 + 0.5\Delta t_{k}}
\leq \frac{L_{C}(1 + \Delta t_{k})^{2}}{\Delta t_{k}(1 + 0.5\Delta t_{k})}\,\frac{\|s_{k}\|^{2}}{\|Pg_{k}\|^{2}} + \frac{0.5\Delta t_{k}}{1 + 0.5\Delta t_{k}}. \tag{47}
\]
By substituting equation (18) and equation (45) into equation (47), we have
\[
|\rho_{k} - 1| \leq \frac{L_{C}\Delta t_{k}}{1 + 0.5\Delta t_{k}}\,\frac{\|PH_{k}(Pg_{k})\|^{2}}{\|Pg_{k}\|^{2}} + \frac{0.5\Delta t_{k}}{1 + 0.5\Delta t_{k}}
\leq \frac{L_{C}\Delta t_{k}}{1 + 0.5\Delta t_{k}}\,\frac{\|P\|^{2}\|H_{k}(Pg_{k})\|^{2}}{\|Pg_{k}\|^{2}} + \frac{0.5\Delta t_{k}}{1 + 0.5\Delta t_{k}}
\leq \frac{(L_{C}M_{H}^{2} + 0.5)\,\Delta t_{k}}{1 + 0.5\Delta t_{k}}. \tag{48}
\]
In the last inequality of equation (48), we use the property $\|P\|_{2} = 1$. We denote
\[
\delta_{\Delta t} \triangleq \frac{\eta_{1}}{L_{C}M_{H}^{2} + 0.5}. \tag{49}
\]
Then, from equations (48)-(49), when $\Delta t_{k} \leq \delta_{\Delta t}$, it is not difficult to verify
\[
|\rho_{k} - 1| \leq (L_{C}M_{H}^{2} + 0.5)\,\Delta t_{k} \leq \eta_{1}. \tag{50}
\]
We assume that $K$ is the first index such that $\Delta t_{K} \leq \delta_{\Delta t}$, where $\delta_{\Delta t}$ is defined by equation (49). Then, from equations (49)-(50), we know that $|\rho_{K} - 1| \leq \eta_{1}$. According to the time-stepping adjustment formula (36), $x_{K} + s_{K}$ will be accepted and the time-stepping size $\Delta t_{K+1}$ will be enlarged. Consequently, the time-stepping size $\Delta t_{k}$ satisfies $\Delta t_{k} \geq \gamma_{2}\,\delta_{\Delta t}$ for all $k = 0, 1, \ldots$. ⊓⊔

By using the results of Lemma 2 and Lemma 3, we prove the global convergence of Algorithm 1 for the linearly constrained optimization problem (1) as follows.
Theorem 1
Assume that $f : \Re^{n} \to \Re$ is continuously differentiable and its gradient $\nabla f(x)$ satisfies the Lipschitz continuity (40). Furthermore, we suppose that $f(x)$ is lower bounded when $x \in S_{f}$, where the constrained level set $S_{f}$ is defined by equation (39). The sequence $\{x_{k}\}$ is generated by Algorithm 1. Then, we have
\[
\liminf_{k \to \infty} \|Pg_{k}\| = 0, \tag{51}
\]
where $g_{k} = \nabla f(x_{k})$ and the matrix $P$ is defined by equation (10).

Proof.
According to Lemma 3 and Algorithm 1, we know that there exists an infinite subsequence $\{x_{k_{i}}\}$ such that the trial steps $s_{k_{i}}$ are accepted, i.e., $\rho_{k_{i}} \geq \eta_{a}$, $i = 1, 2, \ldots$. Otherwise, all steps would be rejected after a given iteration index; then the time-stepping size would keep decreasing, which contradicts (41). Therefore, from equations (35) and (37), we have
\[
f(x_{0}) - \lim_{k \to \infty} f(x_{k}) = \sum_{k=0}^{\infty}\left(f(x_{k}) - f(x_{k+1})\right) \geq \eta_{a}\sum_{i=0}^{\infty}\left(m_{k_{i}}(0) - m_{k_{i}}(s_{k_{i}})\right) \geq \eta_{a}\sum_{i=0}^{\infty}\frac{\Delta t_{k_{i}}}{4(\Delta t_{k_{i}} + 1)}\|Pg_{k_{i}}\|^{2}. \tag{52}
\]
From the result (41) of Lemma 3, we know that $\Delta t_{k} \geq \gamma_{2}\delta_{\Delta t}$ $(k = 0, 1, \ldots)$. By substituting it into equation (52), we have
\[
f(x_{0}) - \lim_{k \to \infty} f(x_{k}) \geq \eta_{a}\sum_{i=0}^{\infty}\frac{\gamma_{2}\delta_{\Delta t}}{4(\gamma_{2}\delta_{\Delta t} + 1)}\|Pg_{k_{i}}\|^{2}. \tag{53}
\]
Since $f(x)$ is lower bounded when $x \in S_{f}$ and the sequence $\{f(x_{k})\}$ is monotonically decreasing, we have $\lim_{k \to \infty} f(x_{k}) = f^{*}$. By substituting it into equation (53), we obtain the result (51). ⊓⊔

4 Numerical experiments

In this section, some numerical experiments are executed to test the performance of Algorithm 1 (the Eptctr method). The codes are executed on a Dell G3 notebook with an Intel quad-core CPU and 20 GB memory. We compare Eptctr with SQP (the built-in subroutine fmincon.m of the MATLAB R2018a environment) [12,14,29,30,42] and Ptctr [25] for some large-scale linearly equality-constrained optimization problems, which are listed in Appendix A. SQP is the traditional and representative optimization method for the constrained optimization problem. Ptctr is significantly better than SQP for linearly constrained optimization problems according to the numerical results in [25]. Therefore, we select these two typical methods as the basis for comparison.

The termination conditions of the three compared methods are all set by
\[
\|\nabla_{x}L(x_{k}, \lambda_{k})\|_{\infty} \leq \epsilon, \tag{54}
\]
\[
\|Ax_{k} - b\|_{\infty} \leq \epsilon, \quad k = 0, 1, \ldots, \tag{55}
\]
where $\epsilon$ is a small tolerance, the Lagrange function $L(x, \lambda)$ is defined by equation (4), and $\lambda$ is defined by equation (8).

We test those ten problems with $n \approx 5000$. The numerical results are listed in Table 1 and illustrated in Figure 1. Eptctr only requires three pairs of inner products of vectors and one matrix-vector product ($pg_{k} = Pg_{k}$) to obtain the trial step $s_{k}$, and involves about $O((n-m)n)$ flops at every iteration. However, Ptctr needs to solve a linear system of equations with an $n \times n$ symmetric definite coefficient matrix and involves about $O(n^{3})$ flops (p. 169, [15]) at every iteration. SQP needs to solve a linear system of equations of dimension $(m+n)$ when it solves a quadratic programming subproblem at every iteration (pp. 531-532, [30]) and involves about $O((m+n)^{3})$ flops (p. 116, [15]). Furthermore, Eptctr can save the storage of an $(n+m) \times (n+m)$ large-scale matrix, in comparison to SQP. The required memory of Eptctr is about one fifth of that of SQP.

Table 1: Numerical results of test problems with $n \approx 5000$

Problems                      | Ptctr: steps(time), f(x*), Mem/Gb | Eptctr: steps(time), f(x*), Mem/Gb | SQP: steps(time), f(x*), Mem/Gb
Exam. 1 (n = 5000, m = n/2)   | 11(15.46), 3.64E+04, 3.41         | 11(1.94), 3.64E+04, 0.51           | 2(34.93), 3.64E+04, 2.43
Exam. 2 (n = 4800, m = n/2)   | 17(16.06), 5.78E+03, 4.09         | 15(1.58), 5.78E+03, 0.31           | 14(116.16), 5.78E+03, 1.53
Exam. 3 (n = 4800, m = 2n/3)  | 12(23.51), 2.86E+03, 3.40         | 12(2.11), 2.86E+03, 0.54           | 3(56.06), 2.86E+03, 3.08
Exam. 4 (n = 5000, m = n/2)   | 11(17.07), 493.79, 3.41           | 11(2.02), 493.79, 0.51             | 8(123.04), 493.79, 2.43
Exam. 5 (n = 5000, m = n/2)   | 14(16.79), 432.15, 3.97           | 13(2.04), 432.15, 0.51             | 11(178.41), 432.15, 2.43
Exam. 6 (n = 4800, m = 2n/3)  | 13(23.58), 2.06E+03, 3.57         | 15(2.17), 2.06E+03, 0.54           | 11(211.35), 2.06E+03, 3.10
Exam. 7 (n = 5000, m = n/2)   | 10(15.02), 5.94E+04, 3.22         | 13(2.07), 5.94E+04, 0.51           | 20(344.37), 5.94E+04, 2.43
Exam. 8 (n = 4800, m = n/3)   | 38(38.29), 776.88, 7.36           | 133(3.21), -1.21E+04, 0.31         | 28(243.09), 784.94, 1.53
Exam. 9 (n = 5000, m = n/2)   | 12(22.59), 2.21E+05, 3.59         | 8(1.94), 2.21E+05, 0.51            | 29(490.36), 2.21E+05, 2.43
Exam. 10 (n = 4800, m = n/3)  | 16(13.51), 2.00, 3.92             | 20(1.39), 2.00, 0.31               | 14(109.59), 2.00, 1.53

5 Conclusions and discussions

In this paper, we give an explicit continuation method with the trusty time-stepping scheme and the L-BFGS updating formula (Eptctr) for linearly equality-constrained optimization problems. This method only involves three pairs of inner products of vectors and one matrix-vector product ($pg_{k} = Pg_{k}$) at every iteration, rather than solving a quadratic programming subproblem, as the traditional optimization method SQP does, or a linear system of equations, as the latest continuation method Ptctr [25] does. Thus, Eptctr involves about $O((n-m)n)$ flops, Ptctr involves about $O(n^{3})$ flops, and SQP involves about $O((m+n)^{3})$ flops at every iteration. This means that Eptctr can save much more computational time than SQP or Ptctr. Numerical results also show that the consumed time of Eptctr is about one tenth of that of Ptctr, or one fifteenth to 0.4 percent of that of SQP, for the test problems with $n \approx 5000$. Furthermore, Eptctr can save the storage of an $(n+m) \times (n+m)$ large-scale matrix, in comparison to SQP. The required memory of Eptctr is about one fifth of that of SQP. Therefore, Eptctr is worth exploring further, and we will extend it to the general nonlinear optimization problem in the future.
Acknowledgments

This work was supported in part by Grant 61876199 from the National Natural Science Foundation of China, Grant YBWL2011085 from Huawei Technologies Co., Ltd., and Grant YJCB2011003HI from the Innovation Research Program of Huawei Technologies Co., Ltd. The first author is grateful to Prof. Ya-xiang Yuan and Prof. Li-zhi Liao for their suggestions.
Fig. 1: The consumed CPU time (s) of Ptctr, Eptctr and SQP for test problems with $n \approx 5000$.

A Test Problems
Example 1 ($m = n/2$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/2}\left(x_{2k-1} + x_{2k}\right)$, subject to $x_{2i-1} + x_{2i} = \cdot$, $i = 1, 2, \ldots, m$. This problem is extended from the problem of [31]. We assume that the feasible initial point is $(\cdot, \cdot, \ldots, \cdot, \cdot)$.

Example 2 ($m = n/2$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/2}\left((x_{2k-1} - \cdot) + (x_{2k} - \cdot)\right) - \cdot$, subject to $x_{i-\cdot} + x_{i-\cdot} + x_{i} = \cdot$, $i = \cdot, \ldots, n/\cdot$. We assume that the infeasible initial point is $(-\cdot, \cdot, \cdot, \cdot, \ldots, \cdot, \cdot)$.

Example 3 ($m = 2n/3$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n} x_{k}$, subject to $x_{3i-2} + x_{3i-1} + x_{3i} = \cdot$ and $x_{3i-2} - x_{3i-1} - x_{3i} = \cdot$, $i = 1, 2, \ldots, n/3$. This problem is extended from the problem of [32]. The infeasible initial point is $(\cdot, \cdot, -\cdot, \ldots, \cdot, \cdot, -\cdot)$.

Example 4 ($m = n/2$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/2}\left(x_{2k-1} + x_{2k}\right) - \cdot$, subject to $x_{2i-1} + x_{2i} = \cdot$, $i = 1, 2, \ldots, n/2$. This problem is modified from the problem of [28]. We assume that the infeasible initial point is $(\cdot, \cdot, \ldots, \cdot)$.

Example 5 ($m = n/2$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/2}\left((x_{2k-1} - \cdot) + (x_{2k} - \cdot)\right) - \cdot$, subject to $x_{2i-1} + x_{2i} = \cdot$, $i = 1, 2, \ldots, m$. We assume that the feasible initial point is $(-\cdot, \cdot, -\cdot, \cdot, \ldots, -\cdot, \cdot)$.

Example 6 ($m = 2n/3$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/3}\left(x_{3k-2} + x_{3k-1} + x_{3k}\right)$, subject to $x_{3i-2} + x_{3i-1} + x_{3i} = \cdot$ and $x_{3i-2} - x_{3i-1} - x_{3i} = \cdot$, $i = 1, 2, \ldots, m/2$. This problem is extended from the problem of [32]. We assume that the infeasible initial point is $(\cdot, \cdot, \ldots, \cdot)$.

Example 7 ($m = n/2$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/2}\left(x_{2k-1} + x_{2k}\right)$, subject to $x_{2i-1} + x_{2i} = \cdot$, $i = 1, 2, \ldots, n/2$. This problem is extended from the problem of [7]. We assume that the infeasible initial point is $(\cdot, \cdot, \cdot, \ldots, \cdot, \cdot)$.

Example 8 ($m = n/3$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/3}\left(x_{3k-\cdot} + x_{3k-\cdot}\,x_{3k} + x_{3k-\cdot}\,x_{3k-\cdot} + x_{3k-\cdot} + x_{3k-\cdot}\right)$, subject to $2x_{3i-2} + x_{3i-1} + x_{3i} = \cdot$, $i = 1, 2, \ldots, m$. We assume that the infeasible initial point is $(\cdot, \cdot, \cdot, \ldots, \cdot)$.

Example 9 ($m = n/2$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/2}\left(x_{2k-1} + x_{2k}\right)$, subject to $x_{2i-1} + x_{2i} = \cdot$, $i = 1, 2, \ldots, m$. This problem is extended from the problem of [31]. We assume that the feasible initial point is $(\cdot, \cdot, \ldots, \cdot, \cdot)$.

Example 10 ($m = n/3$). $\min_{x \in \Re^{n}} f(x) = \sum_{k=1}^{n/3}\left(x_{3k-2} + x_{3k-1} + x_{3k}\right)$, subject to $x_{3i-2} + x_{3i-1} + x_{3i} = \cdot$, $i = 1, 2, \ldots, m$. This problem is modified from the problem of [43]. The feasible initial point is $(\cdot, \cdot, \cdot, \ldots, \cdot, \cdot, \cdot)$.
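The constraint matrices of these test problems share a simple banded structure: each row couples one pair or triple of consecutive variables. As a purely hypothetical illustration (the coefficients and right-hand sides here are placeholders, not the values used in the experiments), such a matrix can be assembled for Algorithm 1 as follows:

```python
import numpy as np

def pairwise_constraints(n, rhs=1.0):
    """Hypothetical constraint matrix for x_{2i-1} + x_{2i} = rhs,
    i = 1, ..., n/2; the actual right-hand sides of Examples 1-10
    are those of the cited sources, and rhs is a placeholder."""
    m = n // 2
    A = np.zeros((m, n))
    for i in range(m):
        A[i, 2 * i] = 1.0      # coefficient of x_{2i-1} (1-based indexing)
        A[i, 2 * i + 1] = 1.0  # coefficient of x_{2i}
    b = np.full(m, rhs)
    return A, b
```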
References

1. E. L. Allgower and K. Georg, Introduction to Numerical Continuation Methods, SIAM, Philadelphia, PA, 2003.
2. U. M. Ascher and L. R. Petzold, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM, Philadelphia, PA, 1998.
3. D. P. Bertsekas, Nonlinear Programming, 3rd ed., Tsinghua University Press, 2018.
4. A. A. Brown and M. C. Bartholomew-Biggs, ODE versus SQP methods for constrained optimization, Journal of Optimization Theory and Applications, (3): 371-386, 1989.
5. K. E. Brenan, S. L. Campbell and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations, SIAM, Philadelphia, PA, 1996.
6. R. Byrd, J. Nocedal and Y. X. Yuan, Global convergence of a class of quasi-Newton methods on convex problems, SIAM Journal on Numerical Analysis, 1171-1189, 1987.
7. K. Carlberg, Lecture notes of constrained optimization, 2009.
8. F. Caballero, L. Merino, J. Ferruz and A. Ollero, Vision-based odometry and SLAM for medium and high altitude flying UAVs, Journal of Intelligent and Robotic Systems, (1-3): 137-161, 2009.
9. T. S. Coffey, C. T. Kelley and D. E. Keyes, Pseudotransient continuation and differential-algebraic equations, SIAM Journal on Scientific Computing, 553-569, 2003.
10. A. R. Conn, N. Gould and Ph. L. Toint, Trust-Region Methods, SIAM, Philadelphia, USA, 2000.
11. A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, SIAM, 1990.
12. R. Fletcher and M. J. D. Powell, A rapidly convergent descent method for minimization, Computer Journal, 163-168, 1963.
13. B. S. Goh, Approximate greatest descent methods for optimization with equality constraints, Journal of Optimization Theory and Applications, (3): 505-527, 2011.
14. D. Goldfarb, A family of variable metric updates derived by variational means, Mathematics of Computation, 23-26, 1970.
15. G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., The Johns Hopkins University Press, 2013.
16. E. Hairer, C. Lubich and G. Wanner, Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, 2nd ed., Springer, Berlin, 2006.
17. U. Helmke and J. B. Moore, Optimization and Dynamical Systems, 2nd ed., Springer-Verlag, London, 1996.
18. D. J. Higham, Trust region algorithms and timestep selection, SIAM Journal on Numerical Analysis, 194-210, 1999.
19. C. T. Kelley, L.-Z. Liao, L. Qi, M. T. Chu, J. P. Reese and C. Winton, Projected pseudotransient continuation, SIAM Journal on Numerical Analysis, 3071-3083, 2008.
20. D. G. Liu and J. G. Fei, Digital Simulation Algorithms for Dynamic Systems (in Chinese), Science Press, Beijing, 2000.
21. S.-T. Liu and X.-L. Luo, A method based on Rayleigh quotient gradient flow for extreme and interior eigenvalue problems, Linear Algebra and its Applications, (7): 1851-1863, 2010.
22. X.-L. Luo, A dynamical method of DAEs for the smallest eigenvalue problem, Journal of Computational Science, (3): 113-119, 2012.
23. X.-L. Luo, C. T. Kelley, L.-Z. Liao and H.-W. Tam, Combining trust-region techniques and Rosenbrock methods to compute stationary points, Journal of Optimization Theory and Applications, (2): 265-286, 2009.
24. X.-L. Luo, J.-R. Lin and W.-L. Wu, A prediction-correction dynamic method for large-scale generalized eigenvalue problems, Abstract and Applied Analysis, Article ID 845459, 1-8, http://dx.doi.org/10.1155/2013/845459, 2013.
25. X.-L. Luo, J.-H. Lv and G. Sun, Continuation method with the trusty time-stepping scheme for linearly constrained optimization with noisy data, Optimization and Engineering, accepted, January 10, 2021; published online at http://arxiv.org/abs/2005.05965 or http://doi.org/10.1007/s11081-020-09590-z.
26. X.-L. Luo, H. Xiao and J.-H. Lv, Continuation Newton methods with the residual trust-region time-stepping scheme for nonlinear equations, June 2020, arXiv preprint, http://arxiv.org/abs/2006.02634.
27. X.-L. Luo and Y.-Y. Yao, Primal-dual path-following methods and the trust-region updating strategy for linear programming with noisy data, June 2020, arXiv preprint available at http://arxiv.org/abs/2006.07568; minor revision resubmitted to Journal of Computational Mathematics, January 16, 2021.
28. M.-W. Mak, Lecture notes of constrained optimization and support vector machines, 2019.
29. MATLAB 9.4.0 (R2018a), The MathWorks Inc., 2018.
30. J. Nocedal and S. J. Wright, Numerical Optimization, Springer-Verlag, 1999.
31. N. H. Kim, Lecture notes of constrained optimization, https://mae.ufl.edu/nkim/eas6939/ConstrainedOpt.pdf, 2010.
32. M. J. Osborne, Mathematical methods for economic theory, https://mjo.osborne.economics.utoronto.ca/index.php/tutorial/index/1/mem, 2016.
33. P.-Q. Pan, New ODE methods for equality constrained optimization (2): algorithms, Journal of Computational Mathematics, (2): 129-146, 1992.
34. M. J. D. Powell, Convergence properties of a class of minimization algorithms, in: O. L. Mangasarian, R. R. Meyer and S. M. Robinson, eds., Nonlinear Programming 2, Academic Press, New York, 1-27, 1975.
35. J. Schropp, A dynamical systems approach to constrained minimization, Numerical Functional Analysis and Optimization, (3-4): 537-551, 2000.
36. J. Schropp, One- and multistep discretizations of index 2 differential algebraic systems and their use in optimization, Journal of Computational and Applied Mathematics, 375-396, 2003.
37. L. F. Shampine, I. Gladwell and S. Thompson, Solving ODEs with MATLAB, Cambridge University Press, Cambridge, 2003.
38. W. Y. Sun and Y. X. Yuan, Optimization Theory and Methods: Nonlinear Programming, Springer, New York, 2006.
39. K. Tanabe, A geometric method in nonlinear programming, Journal of Optimization Theory and Applications, (2): 181-210, 1980.
40. T. E. Simos, New open modified Newton Cotes type formulae as multilayer symplectic integrators, Applied Mathematical Modelling, 1983-1991, 2013.
41. N. Ullah, J. Sabi and A. Shah, A derivative-free scaled memoryless BFGS method for solving a system of monotone nonlinear equations, submitted to Numerical Linear Algebra with Applications, October 2020.
42. R. B. Wilson, A Simplicial Method for Convex Programming, PhD thesis, Harvard University, 1963.
43. H. Yamashita, A differential equation approach to nonlinear programming, Mathematical Programming, 155-168, https://doi.org/10.1007/BF01588311, 1980.
44. Y. Yuan, Recent advances in trust region algorithms, Mathematical Programming, 151: 249-281, 2015.