On solving large-scale limited-memory quasi-Newton equations
JENNIFER B. ERWAY AND ROUMMEL F. MARCIA
Abstract.
We consider the problem of solving linear systems of equations arising with limited-memory members of the restricted Broyden class of updates and the symmetric rank-one (SR1) update. In this paper, we propose a new approach based on a practical implementation of the compact representation for the inverse of these limited-memory matrices. Numerical results suggest that the proposed method compares favorably in speed and accuracy to other algorithms and is competitive with several update-specific methods available to only a few members of the Broyden class of updates. Using the proposed approach has an additional benefit: the condition number of the system matrix can be computed efficiently.
Key words. Limited-memory quasi-Newton methods, compact representation, restricted Broyden class of updates, symmetric rank-one update, Broyden-Fletcher-Goldfarb-Shanno update, Davidon-Fletcher-Powell update, Sherman-Morrison-Woodbury formula

1. Introduction
We consider linear systems of the following form:

(1) $B_{k+1} r = z$,

where $r, z \in \mathbb{R}^n$ and $B_{k+1} \in \mathbb{R}^{n \times n}$ is a limited-memory quasi-Newton matrix obtained from applying $k+1$ Broyden class updates to an initial matrix $B_0$. We assume $n$ is large, and thus, explicitly forming and storing $B_{k+1}$ is impractical or impossible; moreover, in this setting, we assume limited-memory quasi-Newton matrices so that only the most recently-computed $k$ updates are stored and used to update $B_0$. In practice, the value of $k$ is small, i.e., less than 10 (see, e.g., [5]), making $k \ll n$. Problems such as (1) arise in quasi-Newton line-search and trust-region methods for large-scale optimization (see, e.g., [6, 7, 12, 18]), as well as in preconditioning iterative solvers (see, e.g., [16, 19]). In this paper, we propose a compact formulation of $B_{k+1}^{-1}$ that can be used to efficiently solve (1).

Traditional quasi-Newton methods for minimizing a continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ generate a sequence of iterates $\{x_k\}$ such that $f$ is strictly decreasing on this sequence. Moreover, at each iteration, the most-recently computed iterate $x_{k+1}$ is used to update the quasi-Newton matrix by defining a new quasi-Newton pair $(s_k, y_k)$ given by

$s_k \triangleq x_{k+1} - x_k$ and $y_k \triangleq \nabla f(x_{k+1}) - \nabla f(x_k)$.

Research supported in part by NSF grants CMMI-1334042 and CMMI-1333326.
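As a concrete illustration of these pairs (a minimal sketch, not from the paper: the quadratic test function and all variable names below are our own), note that for a quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ the pair satisfies $y_k = A s_k$ exactly, so each pair carries curvature information about the Hessian:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Quadratic test function f(x) = 0.5 x^T A x - b^T x with SPD Hessian A.
C = rng.standard_normal((n, n))
A = C @ C.T + n * np.eye(n)        # symmetric positive definite
b = rng.standard_normal(n)
grad = lambda x: A @ x - b         # gradient of f

x0 = rng.standard_normal(n)        # two consecutive iterates
x1 = rng.standard_normal(n)

s0 = x1 - x0                       # s_k = x_{k+1} - x_k
y0 = grad(x1) - grad(x0)           # y_k = grad f(x_{k+1}) - grad f(x_k)

# For a quadratic, the pair satisfies y_k = A s_k exactly.
pair_consistent = np.allclose(y0, A @ s0)
```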
In the case of the Broyden class updates, $B_{k+1}$ is updated as follows:

(2) $B_{k+1} = B_k - \frac{1}{s_k^T B_k s_k} B_k s_k s_k^T B_k + \frac{1}{y_k^T s_k} y_k y_k^T + \phi \, (s_k^T B_k s_k) \, w_k w_k^T$, with $w_k = \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k}$,

where $\phi \in \mathbb{R}$. The most well-known quasi-Newton update is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update, which is obtained by setting $\phi = 0$. While this is the most widely-used update, research has suggested that other values of $\phi$ may lead to faster convergence [4, 14, 20]. Because of this interest in other values of $\phi$, the entire Broyden class (i.e., $\phi \in \mathbb{R}$) of quasi-Newton methods has been generalized to solve minimization problems over Riemannian manifolds [13]. For these reasons, in this paper, we consider solving (1) for values of $\phi$ other than $\phi = 0$ (the BFGS update). In particular, we consider the restricted Broyden class of updates, where $\phi \in [0, 1]$.

Systems of the form (1) can be solved efficiently with the BFGS update ($\phi = 0$) using the well-known two-loop recursion [17] (see Appendix A for details). However, there is no known corresponding recursion method for other updates of the Broyden class (see [18, p. 190]); in fact, for this reason many researchers prefer the BFGS update. In this paper we present various methods for solving linear systems of equations with limited-memory quasi-Newton matrices. The main contribution of this paper is a new approach by formulating a compact representation for the inverse of any member of the restricted Broyden class. This representation allows us to efficiently solve linear systems of the form (1). The compact formulation for the inverse of these matrices is based on ideas found in [10], where a compact formulation for the restricted Broyden class of matrices is presented. An additional benefit of our proposed approach is the ability to calculate the eigenvalues of the limited-memory matrix, and hence, the condition number of the linear system.

This paper is organized in seven main sections.
In Section 2, we present current methods for solving linear systems with a limited-memory quasi-Newton matrix. In Section 3, we review the compact formulation for matrices obtained using the restricted Broyden class of updates. The compact formulation for the inverse of these matrices is presented in Section 4. In Section 5, we provide a practical implementation for solving systems of the form (1) when the system matrix is either a member of the restricted Broyden class or an SR1 matrix. In addition, we discuss how to compute the condition number of the linear system being solved and how to obtain additional computational savings when a new quasi-Newton pair is computed. We present numerical experiments in Section 6 that demonstrate the competitiveness of our proposed approach. Finally, concluding remarks are in Section 7.
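As a small numerical illustration of the update (2) (our own sketch with hypothetical names, not code from the paper), the following applies one Broyden-class update to a dense $B_0$ and checks the secant condition $B_{k+1} s_k = y_k$, which every member of the class satisfies by construction, along with positive definiteness for $\phi \in [0, 1]$ when $y_k^T s_k > 0$:

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """Apply one Broyden-class update (2) to B with pair (s, y) and parameter phi."""
    Bs = B @ s
    sBs = s @ Bs                    # s^T B s
    sy = y @ s                      # y^T s (positive here, so definiteness is preserved)
    w = y / sy - Bs / sBs           # w_k in (2)
    return (B - np.outer(Bs, Bs) / sBs
              + np.outer(y, y) / sy
              + phi * sBs * np.outer(w, w))

rng = np.random.default_rng(1)
n = 6
B0 = np.eye(n)
s = rng.standard_normal(n)
y = s + 0.1 * rng.standard_normal(n)          # keeps y^T s > 0
B1 = broyden_update(B0, s, y, phi=0.5)        # phi = 0.5: an arbitrary restricted-class member

secant_ok = np.allclose(B1 @ s, y)            # secant condition B_{k+1} s_k = y_k
pd_ok = np.all(np.linalg.eigvalsh(B1) > 0)    # positive definiteness preserved
```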
2. Current methods
One approach to solve linear equations with members of the Broyden class is to use the Sherman-Morrison-Woodbury (SMW) formula to update the inverse after each rank-one change (see, e.g., [11]). Thus, to compute the inverse after a rank-two update, the SMW formula may be applied twice. (For algorithmic details on this approach, see Appendix B.) Alternatively, linear solves with a general member of the Broyden class can be performed using the following recursion formula found in [7] for $H_{k+1}$, the inverse of $B_{k+1}$. Specifically, the inverse of $B_{k+1}$ in (2) is given by the following rank-two update:

(3) $H_{k+1} = H_k + \frac{1}{s_k^T y_k} s_k s_k^T - \frac{1}{y_k^T H_k y_k} H_k y_k y_k^T H_k + \Phi_k \, (y_k^T H_k y_k) \, v_k v_k^T$,

where

(4) $v_k = \frac{s_k}{y_k^T s_k} - \frac{H_k y_k}{y_k^T H_k y_k}$, and $\Phi_k = \frac{(1-\phi)(y_k^T s_k)^2}{(1-\phi)(y_k^T s_k)^2 + \phi \, (y_k^T H_k y_k)(s_k^T B_k s_k)}$,

and $H_k \triangleq B_k^{-1}$. (Algorithmic details associated with this approach can be found in Appendix C.)

The method proposed in this paper makes use of the so-called compact formulation of quasi-Newton matrices; it not only appears to be competitive in terms of speed but also yields additional information about the system matrix.

3. Compact representation
In this section, we briefly review the compact representation of members of the Broyden class of quasi-Newton matrices. Given $k+1$ update pairs of the form $\{(s_i, y_i)\}$, the compact formulation of $B_{k+1}$ is given by

(5) $B_{k+1} = B_0 + \Psi_k M_k \Psi_k^T$,

where $M_k$ is a square matrix and $B_0 \in \mathbb{R}^{n \times n}$ is an initial matrix, which is often taken to be a scalar multiple of the identity. (Note that the size of $\Psi_k$ is not specified.) The compact formulation of members of the Broyden class of updates is defined in terms of

$S_k = [\, s_0 \;\; s_1 \;\; \cdots \;\; s_k \,] \in \mathbb{R}^{n \times (k+1)}$, $\quad Y_k = [\, y_0 \;\; y_1 \;\; \cdots \;\; y_k \,] \in \mathbb{R}^{n \times (k+1)}$,

and the following decomposition of $S_k^T Y_k \in \mathbb{R}^{(k+1) \times (k+1)}$:

$S_k^T Y_k = L_k + D_k + R_k$,

where $L_k$ is strictly lower triangular, $D_k$ is diagonal, and $R_k$ is strictly upper triangular. Assuming all updates are well defined, Erway and Marcia [10] derive the compact formulation for all members of the Broyden class for $\phi \in [0, 1]$: for $\phi \in [0, 1]$, $B_{k+1}$ is given by (5) where

(6) $\Psi_k = [\, B_0 S_k \;\; Y_k \,]$ and $M_k = -\begin{bmatrix} S_k^T B_0 S_k - \phi \Lambda_k & L_k - \phi \Lambda_k \\ (L_k - \phi \Lambda_k)^T & -(D_k + \phi \Lambda_k) \end{bmatrix}^{-1}$,

where

(7) $\Lambda_k = \operatorname{diag}_{0 \le i \le k}(\lambda_i)$, where $\lambda_i = \left( \frac{-(1-\phi)}{s_i^T B_i s_i} - \frac{\phi}{s_i^T y_i} \right)^{-1}$ for $0 \le i \le k$.

When $\phi = 0$, this representation simplifies to the BFGS compact representation found in [5]. In the limited-memory case, $k \ll n$, and thus, $M_k$ is easily inverted.

4. Inverses
In this section, we present the compact formulation for the inverse of any member of the restricted Broyden class of matrices and for SR1 matrices. We then demonstrate how this compact formulation can be used to solve linear systems with these limited-memory matrices. Finally, we discuss computing the condition number of the linear system and potential computational savings when a new quasi-Newton pair is computed.

The main contribution of this paper is formulating a compact representation for the inverse of any member of the restricted Broyden class. It should be noted that the compact formulation for the inverse of a BFGS matrix is already known (see [5]). We now derive a general expression for the compact representation of any member of the restricted Broyden class. To do this, we apply the Sherman-Morrison-Woodbury formula (see, e.g., [11]) to the compact representation of $B_{k+1}$ found in (5):

$B_{k+1}^{-1} = B_0^{-1} + B_0^{-1} \Psi_k \left( -M_k^{-1} - \Psi_k^T B_0^{-1} \Psi_k \right)^{-1} \Psi_k^T B_0^{-1}$.

For quasi-Newton matrices it is conventional to let $H_i$ denote the inverse of $B_i$ for each $i$; using this notation, the inverse of $B_{k+1}$ is given by

(8) $H_{k+1} = H_0 + H_0 \Psi_k \left( -M_k^{-1} - \Psi_k^T H_0 \Psi_k \right)^{-1} \Psi_k^T H_0$.

Since $H_0 \Psi_k = H_0 [\, B_0 S_k \;\; Y_k \,] = [\, S_k \;\; H_0 Y_k \,]$, (8) can be written as

(9) $H_{k+1} = H_0 + [\, S_k \;\; H_0 Y_k \,] \left( -M_k^{-1} - \Psi_k^T H_0 \Psi_k \right)^{-1} \begin{bmatrix} S_k^T \\ Y_k^T H_0 \end{bmatrix}$.

Finally, (9) can be simplified using

$\Psi_k^T H_0 \Psi_k = \begin{bmatrix} S_k^T B_0 S_k & S_k^T Y_k \\ Y_k^T S_k & Y_k^T H_0 Y_k \end{bmatrix}$

to obtain

$\left( -M_k^{-1} - \Psi_k^T H_0 \Psi_k \right)^{-1} = \left( \begin{bmatrix} S_k^T B_0 S_k - \phi \Lambda_k & L_k - \phi \Lambda_k \\ (L_k - \phi \Lambda_k)^T & -(D_k + \phi \Lambda_k) \end{bmatrix} - \begin{bmatrix} S_k^T B_0 S_k & S_k^T Y_k \\ Y_k^T S_k & Y_k^T H_0 Y_k \end{bmatrix} \right)^{-1}$
$= \begin{bmatrix} -\phi \Lambda_k & L_k - \phi \Lambda_k - S_k^T Y_k \\ L_k^T - \phi \Lambda_k - Y_k^T S_k & -D_k - \phi \Lambda_k - Y_k^T H_0 Y_k \end{bmatrix}^{-1}$
$= \begin{bmatrix} -\phi \Lambda_k & -R_k - D_k - \phi \Lambda_k \\ -R_k^T - D_k - \phi \Lambda_k & -D_k - \phi \Lambda_k - Y_k^T H_0 Y_k \end{bmatrix}^{-1}$.
In other words, the compact formulation for the inverse of a member of the restricted Broyden class is given by

(10) $H_{k+1} = H_0 + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T$, where $\tilde{\Psi}_k \triangleq [\, S_k \;\; H_0 Y_k \,]$,

and

(11) $\tilde{M}_k \triangleq \begin{bmatrix} -\phi \Lambda_k & -R_k - D_k - \phi \Lambda_k \\ -R_k^T - D_k - \phi \Lambda_k & -D_k - \phi \Lambda_k - Y_k^T H_0 Y_k \end{bmatrix}^{-1}$.

In the following section, we discuss a practical method to compute $\tilde{M}_k$. In Appendix D, we explore remarkable relationships between various updates.

5. Recursive formulation and practical implementation
In this section, we present a practical method to compute $\tilde{M}_k$ in (10) given by (11). We begin by providing an alternative expression for $\tilde{M}_0$ that will allow us to define a recursion method to compute $\tilde{M}_k$.

Lemma 1. Suppose $H_1 = H_0 + \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T$ is the inverse of a member of the restricted Broyden class of updates after performing one update. Then,

(12) $\tilde{M}_0 = \begin{bmatrix} \tilde{\alpha}_0 & \tilde{\beta}_0 \\ \tilde{\beta}_0 & \tilde{\delta}_0 \end{bmatrix}$,

where

(13) $\tilde{\alpha}_0 = \frac{1}{s_0^T y_0} + \Phi_0 \frac{y_0^T H_0 y_0}{(s_0^T y_0)^2}$, $\quad \tilde{\beta}_0 = -\frac{\Phi_0}{y_0^T s_0}$, $\quad \tilde{\delta}_0 = -\frac{1 - \Phi_0}{y_0^T H_0 y_0}$,

and $\Phi_0$ is given in (4).

Proof.
Expanding (3) with $k = 0$ yields

$H_1 = H_0 + \left( \frac{1}{s_0^T y_0} + \Phi_0 \frac{y_0^T H_0 y_0}{(s_0^T y_0)^2} \right) s_0 s_0^T - \frac{\Phi_0}{y_0^T s_0} H_0 y_0 s_0^T - \frac{\Phi_0}{y_0^T s_0} s_0 y_0^T H_0 - \frac{1 - \Phi_0}{y_0^T H_0 y_0} H_0 y_0 y_0^T H_0$,

which simplifies to

(14) $H_1 = H_0 + [\, s_0 \;\; H_0 y_0 \,] \begin{bmatrix} \tilde{\alpha}_0 & \tilde{\beta}_0 \\ \tilde{\beta}_0 & \tilde{\delta}_0 \end{bmatrix} \begin{bmatrix} s_0^T \\ y_0^T H_0 \end{bmatrix}$,

where $\tilde{\alpha}_0$, $\tilde{\beta}_0$, and $\tilde{\delta}_0$ are defined as in (13) and $\Phi_0$ is given by (4). Note that (14) is of the form $H_1 = H_0 + \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T$.

We now show that $\tilde{M}_0$ defined by (12) and (13) is equivalent to (11) with $k = 0$. To see this, we simplify the entries of (12). First, define

$\Delta_0 \triangleq (1-\phi)(y_0^T s_0)^2 + \phi \, (y_0^T H_0 y_0)(s_0^T B_0 s_0)$.

Note that $\Delta_0 \ne 0$ since $H_0$ and $B_0$ are positive definite and $\phi \in [0, 1]$. Moreover, $\Phi_0 = (1-\phi)(y_0^T s_0)^2 / \Delta_0$, and thus,

$\tilde{\alpha}_0 = \frac{1}{s_0^T y_0} + \frac{(1-\phi) \, y_0^T H_0 y_0}{\Delta_0}$, $\quad \tilde{\beta}_0 = -\frac{(1-\phi)(s_0^T y_0)}{\Delta_0}$, and $\tilde{\delta}_0 = -\frac{\phi \, (s_0^T B_0 s_0)}{\Delta_0}$.
Consider the inverse of (12), which is given by

(15) $\begin{bmatrix} \tilde{\alpha}_0 & \tilde{\beta}_0 \\ \tilde{\beta}_0 & \tilde{\delta}_0 \end{bmatrix}^{-1} = \frac{1}{\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2} \begin{bmatrix} \tilde{\delta}_0 & -\tilde{\beta}_0 \\ -\tilde{\beta}_0 & \tilde{\alpha}_0 \end{bmatrix}$.

We now simplify the entries on the right side of (15). The determinant of (12) is given by

(16) $\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2 = -\frac{1 - \Phi_0}{(s_0^T y_0)(y_0^T H_0 y_0)} - \frac{\Phi_0 (1 - \Phi_0)}{(s_0^T y_0)^2} - \frac{\Phi_0^2}{(s_0^T y_0)^2} = \frac{1}{\Delta_0} \left( \frac{s_0^T B_0 s_0}{\lambda_0} \right)$,

where $\lambda_0$ in (16) is defined in (7). Thus, the first entry of (15) is given by

(17) $\frac{\tilde{\delta}_0}{\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2} = -\frac{\phi \, (s_0^T B_0 s_0)}{\Delta_0} \cdot \frac{\Delta_0 \lambda_0}{s_0^T B_0 s_0} = -\phi \lambda_0$.

The off-diagonal elements of the right-hand side of (15) simplify as follows:

(18) $\frac{\tilde{\beta}_0}{\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2} = -\left( \frac{(1-\phi)(s_0^T y_0)}{\Delta_0} \right) \left( \Delta_0 \left( \frac{\lambda_0}{s_0^T B_0 s_0} \right) \right) = s_0^T y_0 + \phi \lambda_0$.

Finally, the last entry of (15) can be simplified as follows:

(19) $\frac{\tilde{\alpha}_0}{\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2} = \frac{1}{s_0^T y_0} \cdot \frac{\Delta_0 \lambda_0}{s_0^T B_0 s_0} + \frac{(1-\phi) \, (y_0^T H_0 y_0) \, \lambda_0}{s_0^T B_0 s_0} = -s_0^T y_0 - \phi \lambda_0 - y_0^T H_0 y_0$.

(For details of the calculations in (16), (18), and (19), see Appendix F.) Thus, (15) together with (17), (18), and (19) gives

$\begin{bmatrix} \tilde{\alpha}_0 & \tilde{\beta}_0 \\ \tilde{\beta}_0 & \tilde{\delta}_0 \end{bmatrix}^{-1} = \begin{bmatrix} -\phi \lambda_0 & -s_0^T y_0 - \phi \lambda_0 \\ -s_0^T y_0 - \phi \lambda_0 & -s_0^T y_0 - \phi \lambda_0 - y_0^T H_0 y_0 \end{bmatrix}$,

showing that $\tilde{M}_0$ defined by (12) and (13) is equivalent to (11) with $k = 0$. $\square$

Together with Lemma 1, the following theorem shows how $\tilde{M}_k$ is related to $\tilde{M}_{k-1}$; from this, we present an algorithm to compute $\tilde{M}_k$ using recursion. This computation avoids explicitly forming $B_k$, which is used in the definition of $\Lambda_k$ and $\Phi_k$.

Theorem 1.
Suppose $H_{j+1} = H_0 + \tilde{\Psi}_j \tilde{M}_j \tilde{\Psi}_j^T$ as in (10), for all $j \in \{0, \ldots, k\}$, is the inverse of a member of the Broyden class of updates for a fixed $\phi \in [0, 1]$. If $\tilde{M}_0$ is defined by (12), then for all $j \in \{1, \ldots, k\}$, $\tilde{M}_j$ satisfies the recursion relation

(20) $\tilde{M}_j = \Pi_j^T \begin{bmatrix} \tilde{M}_{j-1} + \tilde{\delta}_j \tilde{u}_j \tilde{u}_j^T & \tilde{\beta}_j \tilde{u}_j & \tilde{\delta}_j \tilde{u}_j \\ \tilde{\beta}_j \tilde{u}_j^T & \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\delta}_j \tilde{u}_j^T & \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \Pi_j$,

where $\Pi_j$ is the permutation matrix such that $[\, \tilde{\Psi}_{j-1} \;\; s_j \;\; H_0 y_j \,] \Pi_j = \tilde{\Psi}_j$, and

$\tilde{\alpha}_j = \frac{1}{s_j^T y_j} + \Phi_j \frac{y_j^T H_j y_j}{(s_j^T y_j)^2}$, $\quad \tilde{\beta}_j = -\frac{\Phi_j}{y_j^T s_j}$, $\quad \tilde{\delta}_j = -\frac{1 - \Phi_j}{y_j^T H_j y_j}$, and $\tilde{u}_j = \tilde{M}_{j-1} \tilde{\Psi}_{j-1}^T y_j$.
Proof.
This proof is by induction on $j$. For the base case, we show that equation (20) holds for $j = 1$. Setting $k = 1$ in (3) yields

(21) $H_2 = H_1 + [\, s_1 \;\; H_1 y_1 \,] \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} s_1^T \\ y_1^T H_1 \end{bmatrix}$,

where

$\tilde{\alpha}_1 = \frac{1}{s_1^T y_1} + \Phi_1 \frac{y_1^T H_1 y_1}{(s_1^T y_1)^2}$, $\quad \tilde{\beta}_1 = -\frac{\Phi_1}{y_1^T s_1}$, and $\tilde{\delta}_1 = -\frac{1 - \Phi_1}{y_1^T H_1 y_1}$.

By Lemma 1, (21) can be written as

(22) $H_2 = H_0 + \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T + [\, s_1 \;\; H_1 y_1 \,] \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} s_1^T \\ y_1^T H_1 \end{bmatrix}$,

where $\tilde{\Psi}_0 = [\, S_0 \;\; H_0 Y_0 \,]$ and $\tilde{M}_0$ is given by (12). Letting $\tilde{u}_1 = \tilde{M}_0 \tilde{\Psi}_0^T y_1$, we have that

(23) $H_1 y_1 = \left( H_0 + \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T \right) y_1 = H_0 y_1 + \tilde{\Psi}_0 \tilde{u}_1$.

Substituting (23) into the last quantity on the right side of (22) yields

$[\, s_1 \;\; H_1 y_1 \,] \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} s_1^T \\ y_1^T H_1 \end{bmatrix} = [\, s_1 \;\; H_0 y_1 + \tilde{\Psi}_0 \tilde{u}_1 \,] \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} s_1^T \\ y_1^T H_0 + \tilde{u}_1^T \tilde{\Psi}_0^T \end{bmatrix}$
$= [\, \tilde{\Psi}_0 \;\; s_1 \;\; H_0 y_1 \,] \begin{bmatrix} \tilde{\delta}_1 \tilde{u}_1 \tilde{u}_1^T & \tilde{\beta}_1 \tilde{u}_1 & \tilde{\delta}_1 \tilde{u}_1 \\ \tilde{\beta}_1 \tilde{u}_1^T & \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\delta}_1 \tilde{u}_1^T & \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} \tilde{\Psi}_0^T \\ s_1^T \\ y_1^T H_0 \end{bmatrix}$.

Letting $\Pi_1$ be defined as in (20) with $j = 1$ gives that $[\, \tilde{\Psi}_0 \;\; s_1 \;\; H_0 y_1 \,] \Pi_1 = \tilde{\Psi}_1$, and thus,

$H_2 = H_0 + \tilde{\Psi}_1 \Pi_1^T \bar{M}_1 \Pi_1 \tilde{\Psi}_1^T$, where $\bar{M}_1 \triangleq \begin{bmatrix} \tilde{M}_0 + \tilde{\delta}_1 \tilde{u}_1 \tilde{u}_1^T & \tilde{\beta}_1 \tilde{u}_1 & \tilde{\delta}_1 \tilde{u}_1 \\ \tilde{\beta}_1 \tilde{u}_1^T & \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\delta}_1 \tilde{u}_1^T & \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix}$.

It remains to show that $\tilde{M}_1 = \Pi_1^T \bar{M}_1 \Pi_1$; however, for simplicity, we instead show $\tilde{M}_1^{-1} = \Pi_1^T \bar{M}_1^{-1} \Pi_1$.

Notice that $\bar{M}_1$ can be written as the product of three matrices:

$\bar{M}_1 = \begin{bmatrix} I & 0 & \tilde{u}_1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \tilde{M}_0 & 0 \\ 0 & \tilde{N}_1 \end{bmatrix} \begin{bmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ \tilde{u}_1^T & 0 & 1 \end{bmatrix}$, where $\tilde{N}_1 = \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix}$.

Then, $\bar{M}_1^{-1}$ is given by

(24) $\bar{M}_1^{-1} = \begin{bmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ -\tilde{u}_1^T & 0 & 1 \end{bmatrix} \begin{bmatrix} \tilde{M}_0^{-1} & 0 \\ 0 & \tilde{N}_1^{-1} \end{bmatrix} \begin{bmatrix} I & 0 & -\tilde{u}_1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

It can be shown, using similar computations as in the proof of Lemma 1, that

$\tilde{N}_1^{-1} = \frac{1}{\tilde{\alpha}_1 \tilde{\delta}_1 - \tilde{\beta}_1^2} \begin{bmatrix} \tilde{\delta}_1 & -\tilde{\beta}_1 \\ -\tilde{\beta}_1 & \tilde{\alpha}_1 \end{bmatrix} = \begin{bmatrix} -\phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 \\ -s_1^T y_1 - \phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 - y_1^T H_1 y_1 \end{bmatrix}$.

Substituting $\tilde{N}_1^{-1}$ into (24) and simplifying yields

(25) $\bar{M}_1^{-1} = \begin{bmatrix} \tilde{M}_0^{-1} & 0 & -\tilde{\Psi}_0^T y_1 \\ 0 & -\phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 \\ -y_1^T \tilde{\Psi}_0 & -s_1^T y_1 - \phi \lambda_1 & \tilde{u}_1^T \tilde{M}_0^{-1} \tilde{u}_1 - s_1^T y_1 - \phi \lambda_1 - y_1^T H_1 y_1 \end{bmatrix}$.

Terms in the (3,3)-entry on the right side of (25) can be simplified using that $\tilde{M}_0^{-1} \tilde{u}_1 = \tilde{\Psi}_0^T y_1$ and that $y_1^T H_1 y_1 = y_1^T H_0 y_1 + y_1^T \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T y_1$ as follows:

$\tilde{u}_1^T \tilde{M}_0^{-1} \tilde{u}_1 - y_1^T H_1 y_1 = y_1^T \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T y_1 - y_1^T H_1 y_1 = y_1^T H_1 y_1 - y_1^T H_0 y_1 - y_1^T H_1 y_1 = -y_1^T H_0 y_1$.

Thus,

(26) $\bar{M}_1^{-1} = \begin{bmatrix} \tilde{M}_0^{-1} & 0 & -\tilde{\Psi}_0^T y_1 \\ 0 & -\phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 \\ -y_1^T \tilde{\Psi}_0 & -s_1^T y_1 - \phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 - y_1^T H_0 y_1 \end{bmatrix}$.

Lemma 1, together with the substitution $\tilde{\Psi}_0^T y_1 = \begin{bmatrix} S_0^T y_1 \\ Y_0^T H_0 y_1 \end{bmatrix}$, gives that

$\Pi_1^T \bar{M}_1^{-1} \Pi_1 = \begin{bmatrix} -\phi \Lambda_0 & 0 & -R_0 - D_0 - \phi \Lambda_0 & -S_0^T y_1 \\ 0 & -\phi \lambda_1 & 0 & -s_1^T y_1 - \phi \lambda_1 \\ -R_0^T - D_0 - \phi \Lambda_0 & 0 & -D_0 - \phi \Lambda_0 - Y_0^T H_0 Y_0 & -Y_0^T H_0 y_1 \\ -y_1^T S_0 & -s_1^T y_1 - \phi \lambda_1 & -y_1^T H_0 Y_0 & -s_1^T y_1 - \phi \lambda_1 - y_1^T H_0 y_1 \end{bmatrix}$
$= \begin{bmatrix} -\phi \Lambda_1 & -R_1 - D_1 - \phi \Lambda_1 \\ -R_1^T - D_1 - \phi \Lambda_1 & -D_1 - \phi \Lambda_1 - Y_1^T H_0 Y_1 \end{bmatrix} = \tilde{M}_1^{-1}$,

proving the base case.

The inductive step is similar to the base case. For the inductive step, we assume the theorem holds for $k = 1, \ldots, j-1$ and show that it holds for $k = j$. We begin, as before, by setting $k = j$ in (3) to obtain

(27) $H_{j+1} = H_j + [\, s_j \;\; H_j y_j \,] \begin{bmatrix} \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} s_j^T \\ y_j^T H_j \end{bmatrix}$,

where

$\tilde{\alpha}_j = \frac{1}{s_j^T y_j} + \Phi_j \frac{y_j^T H_j y_j}{(s_j^T y_j)^2}$, $\quad \tilde{\beta}_j = -\frac{\Phi_j}{y_j^T s_j}$, and $\tilde{\delta}_j = -\frac{1 - \Phi_j}{y_j^T H_j y_j}$.

Using the induction hypothesis, (27) becomes

$H_{j+1} = H_0 + \tilde{\Psi}_{j-1} \tilde{M}_{j-1} \tilde{\Psi}_{j-1}^T + [\, s_j \;\; H_j y_j \,] \begin{bmatrix} \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} s_j^T \\ y_j^T H_j \end{bmatrix}$.

Similar to the base case,

$[\, s_j \;\; H_j y_j \,] \begin{bmatrix} \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} s_j^T \\ y_j^T H_j \end{bmatrix} = [\, s_j \;\; H_0 y_j + \tilde{\Psi}_{j-1} \tilde{u}_j \,] \begin{bmatrix} \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} s_j^T \\ y_j^T H_0 + \tilde{u}_j^T \tilde{\Psi}_{j-1}^T \end{bmatrix}$
$= [\, \tilde{\Psi}_{j-1} \;\; s_j \;\; H_0 y_j \,] \begin{bmatrix} \tilde{\delta}_j \tilde{u}_j \tilde{u}_j^T & \tilde{\beta}_j \tilde{u}_j & \tilde{\delta}_j \tilde{u}_j \\ \tilde{\beta}_j \tilde{u}_j^T & \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\delta}_j \tilde{u}_j^T & \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} \tilde{\Psi}_{j-1}^T \\ s_j^T \\ y_j^T H_0 \end{bmatrix}$,

where $\tilde{u}_j = \tilde{M}_{j-1} \tilde{\Psi}_{j-1}^T y_j$. With $\Pi_j$ defined as in (20), then $[\, \tilde{\Psi}_{j-1} \;\; s_j \;\; H_0 y_j \,] \Pi_j = \tilde{\Psi}_j$, and so $H_{j+1} = H_0 + \tilde{\Psi}_j \Pi_j^T \bar{M}_j \Pi_j \tilde{\Psi}_j^T$, where

(28) $\bar{M}_j \triangleq \begin{bmatrix} \tilde{M}_{j-1} + \tilde{\delta}_j \tilde{u}_j \tilde{u}_j^T & \tilde{\beta}_j \tilde{u}_j & \tilde{\delta}_j \tilde{u}_j \\ \tilde{\beta}_j \tilde{u}_j^T & \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\delta}_j \tilde{u}_j^T & \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix}$.

It remains to show that $\tilde{M}_j = \Pi_j^T \bar{M}_j \Pi_j$; however, for simplicity, we instead show $\tilde{M}_j^{-1} = \Pi_j^T \bar{M}_j^{-1} \Pi_j$ using a similar argument as in the base case. In particular, similar to (25), it can be shown that

$\bar{M}_j^{-1} = \begin{bmatrix} \tilde{M}_{j-1}^{-1} & 0 & -\tilde{\Psi}_{j-1}^T y_j \\ 0 & -\phi \lambda_j & -s_j^T y_j - \phi \lambda_j \\ -y_j^T \tilde{\Psi}_{j-1} & -s_j^T y_j - \phi \lambda_j & -s_j^T y_j - \phi \lambda_j - y_j^T H_0 y_j \end{bmatrix}$.

By the inductive hypothesis, (11) holds for $k = j-1$, i.e.,

$\tilde{M}_{j-1}^{-1} = \begin{bmatrix} -\phi \Lambda_{j-1} & -R_{j-1} - D_{j-1} - \phi \Lambda_{j-1} \\ -R_{j-1}^T - D_{j-1} - \phi \Lambda_{j-1} & -D_{j-1} - \phi \Lambda_{j-1} - Y_{j-1}^T H_0 Y_{j-1} \end{bmatrix}$,

and so

$\Pi_j^T \bar{M}_j^{-1} \Pi_j = \begin{bmatrix} -\phi \Lambda_{j-1} & 0 & -R_{j-1} - D_{j-1} - \phi \Lambda_{j-1} & -S_{j-1}^T y_j \\ 0 & -\phi \lambda_j & 0 & -s_j^T y_j - \phi \lambda_j \\ -R_{j-1}^T - D_{j-1} - \phi \Lambda_{j-1} & 0 & -D_{j-1} - \phi \Lambda_{j-1} - Y_{j-1}^T H_0 Y_{j-1} & -Y_{j-1}^T H_0 y_j \\ -y_j^T S_{j-1} & -s_j^T y_j - \phi \lambda_j & -y_j^T H_0 Y_{j-1} & -s_j^T y_j - \phi \lambda_j - y_j^T H_0 y_j \end{bmatrix}$
$= \begin{bmatrix} -\phi \Lambda_j & -R_j - D_j - \phi \Lambda_j \\ -R_j^T - D_j - \phi \Lambda_j & -D_j - \phi \Lambda_j - Y_j^T H_0 Y_j \end{bmatrix} = \tilde{M}_j^{-1}$,

proving the induction step.
$\square$

Using Theorem 1, Algorithm 1 computes $\tilde{M}_k$ and then uses the compact formulation for the inverse to solve linear systems of the form (1). There are two parts to the for loop in Algorithm 1: the first half of the loop is used to build products of the form $s_j^T B_j s_j$; the second half of the loop computes $\tilde{M}_k$, making use of $s_j^T B_j s_j$ to compute $\Phi_j$. In the special case when $\phi = 0$ or $\phi = 1$, $\Phi_j$ can be quickly computed as $\Phi_j = 1$ and $\Phi_j = 0$, respectively.

Define $\phi$, $B_0$, and $H_0$;
Form $M_0$ using (6) and $\tilde{M}_0$ using (12);
Let $\Psi_0 = [\, B_0 s_0 \;\; y_0 \,]$ and $\tilde{\Psi}_0 = [\, s_0 \;\; H_0 y_0 \,]$;
for $j = 1 : k$
    % Part 1: Compute $s_j^T B_j s_j$
    $u_j \leftarrow M_{j-1} (\Psi_{j-1}^T s_j)$;
    $s_j^T B_j s_j \leftarrow s_j^T B_0 s_j + (s_j^T \Psi_{j-1}) u_j$;
    $\alpha_j \leftarrow -(1-\phi)/(s_j^T B_j s_j)$;
    $\beta_j \leftarrow -\phi/(y_j^T s_j)$;
    $\delta_j \leftarrow (1 + \phi (s_j^T B_j s_j)/(y_j^T s_j))/(y_j^T s_j)$;
    $M_j \leftarrow \Pi_j \begin{bmatrix} M_{j-1} + \alpha_j u_j u_j^T & \alpha_j u_j & \beta_j u_j \\ \alpha_j u_j^T & \alpha_j & \beta_j \\ \beta_j u_j^T & \beta_j & \delta_j \end{bmatrix} \Pi_j^T$, where $\Pi_j$ is as in (20);
    % Part 2: Compute $\tilde{M}_j$
    $\tilde{u}_j \leftarrow \tilde{M}_{j-1} (\tilde{\Psi}_{j-1}^T y_j)$;
    $y_j^T H_j y_j \leftarrow y_j^T H_0 y_j + (y_j^T \tilde{\Psi}_{j-1}) \tilde{u}_j$;
    $\Phi_j \leftarrow (1-\phi)(y_j^T s_j)^2 / ((1-\phi)(y_j^T s_j)^2 + \phi (y_j^T H_j y_j)(s_j^T B_j s_j))$;
    $\tilde{\alpha}_j \leftarrow (1 + \Phi_j (y_j^T H_j y_j)/(y_j^T s_j))/(y_j^T s_j)$;
    $\tilde{\beta}_j \leftarrow -\Phi_j/(y_j^T s_j)$;
    $\tilde{\delta}_j \leftarrow -(1 - \Phi_j)/(y_j^T H_j y_j)$;
    $\tilde{M}_j \leftarrow \Pi_j^T \begin{bmatrix} \tilde{M}_{j-1} + \tilde{\delta}_j \tilde{u}_j \tilde{u}_j^T & \tilde{\beta}_j \tilde{u}_j & \tilde{\delta}_j \tilde{u}_j \\ \tilde{\beta}_j \tilde{u}_j^T & \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\delta}_j \tilde{u}_j^T & \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \Pi_j$, where $\Pi_j$ is as in (20);
end
$r = H_0 z + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T z$;

ALGORITHM 1: Computing $r = B_{k+1}^{-1} z$ using (10)

In Algorithm 1, the computations for $u_j$ and $\tilde{u}_j$ in lines 7 and 15, respectively, can be simplified by noting that

(29) $\Psi_{j-1}^T s_j = \begin{bmatrix} S_{j-1}^T B_0 s_j \\ Y_{j-1}^T s_j \end{bmatrix}$ and $\tilde{\Psi}_{j-1}^T y_j = \begin{bmatrix} S_{j-1}^T y_j \\ Y_{j-1}^T H_0 y_j \end{bmatrix}$.

In other words, $\Psi_{j-1}^T s_j$ is a $2j$-vector whose first $j$ entries are the first $j$ entries in the $(j+1)$th column of $S_k^T B_0 S_k$ and whose last $j$ entries are the first $j$ entries in the $(j+1)$th row of $S_k^T Y_k$; similarly, $\tilde{\Psi}_{j-1}^T y_j$ is a vector whose first $j$ entries are the first $j$ entries in the $(j+1)$th column of $S_k^T Y_k$ and whose last $j$ entries are the first $j$ entries in the $(j+1)$th column of $Y_k^T H_0 Y_k$. Thus, computing $u_j$ and $\tilde{u}_j$ only requires $2j(4j-1)$ additional flops each after precomputing and storing the quantities $S_k^T B_0 S_k$, $Y_k^T H_0 Y_k$, and $S_k^T Y_k$. Note that provided $B_0$ is a scalar multiple of the identity matrix and $H_0 = B_0^{-1}$, then forming $S_k^T B_0 S_k$ and $Y_k^T H_0 Y_k$ requires $(k+1)(k+2)n$ flops each, and forming $S_k^T Y_k$ requires $(k+1)^2 (2n-1)$ flops.

Computing the scalar $s_j^T B_j s_j$ in line 8 of Algorithm 1 requires performing one inner product between $\Psi_{j-1}^T s_j$ and $u_j$ and then summing this with $s_j^T B_0 s_j$, resulting in an operation cost of $4j$. The same cost is associated with forming $y_j^T H_j y_j$ in line 16. The scalar $\Phi_j$ requires ten flops to compute. The other scalars ($\alpha_j$, $\beta_j$, $\delta_j$, $\tilde{\alpha}_j$, $\tilde{\beta}_j$, and $\tilde{\delta}_j$) require a total of 14 flops. Forming the matrices $M_j$ and $\tilde{M}_j$ requires $24j^2 + 16j$ flops in total. Thus, excluding the cost of forming $S_k^T B_0 S_k$, $S_k^T Y_k$, and $Y_k^T H_0 Y_k$, the operation count for computing $\tilde{M}_k$ is given by

$\sum_{j=1}^{k} \left( 4j(4j-1) + 8j + 24 + 24j^2 + 16j \right) = \frac{1}{3} \left( 40k^3 + 90k^2 + 122k \right)$.

Finally, forming $r$ in line 23 requires $k+1$ vector inner products and $(2k+1)(n+k+1) + 2n$ additional flops. Thus, the overall flop count for Algorithm 1 is $(2k+1)(n+k+1) + 2n + (40k^3 + 90k^2 + 122k)/3 + (2n-1)(k+1)$, which includes the cost of $k+1$ vector inner products.

Table 1 presents the computational complexity of Algorithm 1 and the algorithms in Appendices A-C for linear solves where the system matrix is from the restricted Broyden class of updates. The operational costs do not include the computation of $S_k^T Y_k$, $S_k^T B_0 S_k$, and $Y_k^T H_0 Y_k$, which all can be efficiently updated as new quasi-Newton pairs are computed.

Alg. | Flop count (including vector inner products)
1 | $(2k+1)(n+k+1) + 2n + (40k^3 + 90k^2 + 122k)/3 + (2n-1)(k+1)$
3 | $4nk + 3k + n + 2k(2n-1)$
4 | $(k+1)(23kn + 52n + 12k + 42) + (2n-1)(k+1) + 7(k+1)$
6 | $(5n+3)(k+1)k + (10n+14)(k+1) + (2n-1)(k+2)(k+1)$

Table 1.
Computational complexity comparison of Algorithms 1, 3, 4, and 6 when the system matrix is a member of the restricted Broyden class of quasi-Newton updates.

5.1. Efficient solves after updates.
To compute the factors in (8), it is necessary to form the matrices $L_k$, $D_k$, and $R_k$. After a new quasi-Newton pair has been computed, updating (8) can be done efficiently by storing $S_k^T S_k$, $Y_k^T Y_k$, and $S_k^T Y_k$. In this case, to update $H_{k+1}$ we add a column to (and possibly delete a column from) $S_k$ and $Y_k$; the corresponding changes can then be made in $S_k^T S_k$, $Y_k^T Y_k$, and $S_k^T Y_k$ (see [5] for more details).

In the practical implementation given in Algorithm 1, it is necessary to compute $\Psi_{j-1}^T s_j$ and $\tilde{\Psi}_{j-1}^T y_j$ given by (29), which can be formed using the stored quantities $S_k^T S_k$, $Y_k^T Y_k$, and $S_k^T Y_k$. To compute these quantities when a new quasi-Newton pair of updates is obtained, we apply the same strategy as described above by adding (and possibly deleting) columns in $S_k$ and $Y_k$, enabling $\Psi_{j-1}^T s_j$ and $\tilde{\Psi}_{j-1}^T y_j$ to be updated efficiently. With these updates, the work required for Algorithm 1 is significantly reduced.

5.2. Compact representation of the inverse of an SR1 matrix.
In this subsection, we demonstrate that this same strategy can be applied to the SR1 update. The symmetric rank-one (SR1) update is obtained by setting $\phi_k = y_k^T s_k / (y_k^T s_k - s_k^T B_k s_k)$; in this case,

(30) $B_{k+1} = B_k + \frac{1}{s_k^T (y_k - B_k s_k)} (y_k - B_k s_k)(y_k - B_k s_k)^T$.

The SR1 update is a member of the Broyden class but not the restricted Broyden class. This update exhibits hereditary symmetry but not positive definiteness. This update is remarkable in that it is self-dual: initializing with $B_0^{-1}$ in place of $B_0$ and interchanging $y_k$ and $s_k$ everywhere in (30) yields a recursion relation for $B_{k+1}^{-1}$. Thus, like the BFGS update, solving linear systems with SR1 matrices can be performed using vector inner products (see Appendix E for details).

The compact formulation for the SR1 update is given by $B_{k+1} = B_0 + \hat{\Psi}_k \hat{M}_k \hat{\Psi}_k^T$, where

(31) $\hat{\Psi}_k = Y_k - B_0 S_k$ and $\hat{M}_k = (D_k + L_k + L_k^T - S_k^T B_0 S_k)^{-1}$,

with $\hat{\Psi}_k \in \mathbb{R}^{n \times (k+1)}$ and $\hat{M}_k \in \mathbb{R}^{(k+1) \times (k+1)}$ (see [5]). The SR1 update has the distinction of being the only rank-one update in the Broyden class of updates. It is for this reason that $\hat{\Psi}_k$ has only $k+1$ columns and $\hat{M}_k$ is a $(k+1) \times (k+1)$ matrix. (In contrast, the compact representations for rank-two updates have twice as many columns, and $M_k$ is twice the size as in the SR1 case.)

The compact formulation for the inverse of an
SR1 matrix can be derived using the same strategy as for the restricted Broyden class. The SMW formula applied to the compact formulation of an SR1 matrix, with $\hat{\Psi}_k$ and $\hat{M}_k$ defined as in (31), yields

$H_{k+1} = H_0 + H_0 (Y_k - B_0 S_k) \left( -\hat{M}_k^{-1} - \hat{\Psi}_k^T H_0 \hat{\Psi}_k \right)^{-1} (Y_k - B_0 S_k)^T H_0$.

Substituting in for $\hat{M}_k$ using (31), together with the identity

$\hat{\Psi}_k^T H_0 \hat{\Psi}_k = Y_k^T H_0 Y_k - S_k^T Y_k - Y_k^T S_k + S_k^T B_0 S_k$,

yields

$H_{k+1} = H_0 + (H_0 Y_k - S_k) \left( -(D_k + L_k + L_k^T - S_k^T B_0 S_k) - (Y_k^T H_0 Y_k - S_k^T Y_k - Y_k^T S_k + S_k^T B_0 S_k) \right)^{-1} (H_0 Y_k - S_k)^T$
$= H_0 + (S_k - H_0 Y_k)(D_k + R_k + R_k^T - Y_k^T H_0 Y_k)^{-1} (S_k - H_0 Y_k)^T$.

In other words, the compact formulation for the inverse of an
SR1 matrix is given by

(32) $H_{k+1} = H_0 + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T$, where $\tilde{\Psi}_k \triangleq S_k - H_0 Y_k$ and $\tilde{M}_k \triangleq (D_k + R_k + R_k^T - Y_k^T H_0 Y_k)^{-1}$.

Algorithm 2 details how to solve a linear system defined by an L-SR1 matrix via the compact representation for its inverse. Assuming $S_k^T Y_k$ is precomputed, Algorithm 2 requires $(2k+1)(n+k+1) + 2n + (2n-1)(k+1)$ flops, which includes $k+1$ vector inner products to compute $\tilde{\Psi}_k^T z$ in line 5. Note that this count does not include the cost of inverting the $(k+1) \times (k+1)$ matrix $\tilde{M}_k$ in line 4 since, when $k \ll n$, this cost is significantly smaller than the dominant costs for Algorithm 2.
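As a sanity check on (32), the solve performed by Algorithm 2 can be sketched in NumPy (our own illustration; the synthetic SPD test Hessian and all names are assumptions, not from the paper):

```python
import numpy as np

def sr1_inverse_solve(S, Y, gamma, z):
    """Solve B_{k+1} r = z via the SR1 inverse compact representation (32), B_0 = gamma*I."""
    H0Y = Y / gamma                                   # H_0 Y_k with H_0 = (1/gamma) I
    SY = S.T @ Y
    D = np.diag(np.diag(SY))                          # diagonal part of S^T Y
    R = np.triu(SY, k=1)                              # strictly upper triangular part
    Mtil = np.linalg.inv(D + R + R.T - Y.T @ H0Y)     # small (k+1) x (k+1) inverse
    Psitil = S - H0Y                                  # tilde{Psi}_k in (32)
    return z / gamma + Psitil @ (Mtil @ (Psitil.T @ z))

rng = np.random.default_rng(2)
n, k, gamma = 30, 4, 2.0
C = rng.standard_normal((n, n))
A = C @ C.T + n * np.eye(n)                           # SPD model Hessian

S = rng.standard_normal((n, k + 1))
Y = A @ S                                             # exact-curvature pairs y_j = A s_j

# Reference: build B_{k+1} by k+1 explicit SR1 updates (30).
B = gamma * np.eye(n)
for j in range(k + 1):
    v = Y[:, j] - B @ S[:, j]
    B = B + np.outer(v, v) / (S[:, j] @ v)

z = rng.standard_normal(n)
r = sr1_inverse_solve(S, Y, gamma, z)
rel_err = np.linalg.norm(B @ r - z) / np.linalg.norm(z)
```

Here the reference matrix is built by explicit SR1 updates, so the sketch verifies that (32) reproduces $B_{k+1}^{-1} z$ without ever forming $B_{k+1}$ or its inverse densely.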
Define $H_0$, $S_k$, and $Y_k$;
Form $S_k^T Y_k = L_k + D_k + R_k$;
Form $\tilde{\Psi}_k = S_k - H_0 Y_k$;
$\tilde{M}_k \leftarrow (D_k + R_k + R_k^T - Y_k^T H_0 Y_k)^{-1}$;
$r = H_0 z + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T z$;

ALGORITHM 2: Computing $r = B_{k+1}^{-1} z$ using (32) when $B_{k+1}$ is an SR1 matrix

Table 2 presents the computational complexity of Algorithm 2 and of Algorithm 8 in Appendix E for solving SR1 linear systems. The operational costs do not include the computation of $S_k^T Y_k$, $S_k^T B_0 S_k$, and $Y_k^T H_0 Y_k$, which all can be efficiently updated as new quasi-Newton pairs are computed.

Algorithm | Flop count (including vector inner products)
2 | $(2k+1)(n+k+1) + 2n + (2n-1)(k+1)$
8 | $(k+1)(k(2n+1) + 2(3n+1)) + 2n + (2n-1)(k+2)(k+1)$

Table 2. Computational complexity comparison of Algorithms 2 and 8 when the system matrix is an SR1 matrix. Note that the computational cost for Algorithm 2 does not include the cost of inverting a $(k+1) \times (k+1)$ matrix, where $k \ll n$.

5.3. Computing condition numbers.
Provided the initial approximate Hessian $B_0$ is taken to be a scalar multiple of the identity (i.e., $B_0 = \gamma I$), it is possible to compute the condition number of $B_{k+1}$ in (1). To do this, we consider the condition number of $H_{k+1} = B_{k+1}^{-1}$. The eigenvalues of $H_{k+1}$ can be obtained from the compact representation of $H_{k+1}$ together with the techniques from [3, 10]. For completeness, we review this approach below.

Suppose $B_{k+1}$ is a member of the restricted Broyden class. The eigenvalues of $H_{k+1}$ can be computed using the compact formulation for $H_{k+1}$ given in (10). Let $\tilde{\Psi}_k = QR$ be the QR decomposition of $\tilde{\Psi}_k$, where $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix and $R \in \mathbb{R}^{n \times 2(k+1)}$ is an upper triangular matrix. Then,

(33) $H_{k+1} = H_0 + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T = H_0 + Q R \tilde{M}_k R^T Q^T$.

The matrix $R \tilde{M}_k R^T$ is a real symmetric $n \times n$ matrix. However, since $\tilde{\Psi}_k \in \mathbb{R}^{n \times 2(k+1)}$ has at most rank $2(k+1)$, $R$ can be written in the form

$R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, where $R_1 \in \mathbb{R}^{2(k+1) \times 2(k+1)}$.

Thus,

$R \tilde{M}_k R^T = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \tilde{M}_k \begin{bmatrix} R_1^T & 0 \end{bmatrix} = \begin{bmatrix} R_1 \tilde{M}_k R_1^T & 0 \\ 0 & 0 \end{bmatrix}$.

Since $R_1 \tilde{M}_k R_1^T \in \mathbb{R}^{2(k+1) \times 2(k+1)}$, its spectral decomposition can be computed explicitly. Letting $V_1 D_1 V_1^T$ be the spectral decomposition of $R_1 \tilde{M}_k R_1^T$ and substituting into (33) yields

$H_{k+1} = H_0 + Q V D V^T Q^T$, where $V \triangleq \begin{bmatrix} V_1 & 0 \\ 0 & I \end{bmatrix} \in \mathbb{R}^{n \times n}$ and $D \triangleq \begin{bmatrix} D_1 & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{n \times n}$.

Thus,

$H_{k+1} = H_0 + Q V D V^T Q^T = Q V \left( \gamma^{-1} I + D \right) V^T Q^T$,

giving the spectral decomposition of $H_{k+1}$. In particular, the matrix $H_{k+1}$ has an eigenvalue of $\gamma^{-1}$ with multiplicity $n - 2(k+1)$ and $2(k+1)$ eigenvalues given by $\gamma^{-1} + d_i$ for $1 \le i \le 2(k+1)$, where $d_i$ denotes the $i$th diagonal element of $D_1$. Thus, the condition number of $H_{k+1}$ can be computed as the ratio of the largest eigenvalue to the smallest eigenvalue (in magnitude). The condition number of $B_{k+1}$ is the same value, since the eigenvalues of $B_{k+1}$ are the reciprocals of those of $H_{k+1}$.

When $B_{k+1}$ is a quasi-Newton matrix generated by SR1 updates, the procedure is similar except that $\tilde{\Psi}_k$ has half as many columns, resulting in $k+1$ eigenvalues given by $\gamma^{-1} + d_i$ for $1 \le i \le k+1$ and $n - (k+1)$ eigenvalues of $\gamma^{-1}$. After a new quasi-Newton pair is computed, it is possible to update the QR factorization (for details, see [10]).

6. Numerical experiments
In this section, we demonstrate the accuracy of the proposed method for solving quasi-Newton equations. We solve the following linear system:

(34) $B_{k+1} p_{k+1} = -g_{k+1}$,

where $B_{k+1}$ is a member of the restricted Broyden class of updates or an SR1 matrix. For members of the restricted Broyden class of updates, we considered three values of $\phi$ (0.00, 0.50, and 0.99) and compare the following methods:

(1) "Two-loop recursion" for the BFGS case ($\phi = 0.0$) (Algorithm 3)
(2) Recursive SMW approach (Algorithm 4)
(3) Recursion to compute products with $B_{k+1}^{-1} = H_{k+1}$ (Algorithm 6)
(4) Compact inverses formulation (Algorithm 1)

For SR1 matrices, we compare the following methods:

(1) The self-duality method (Algorithm 8)
(2) Compact inverses formulation (Algorithm 2)

We consider problem sizes $n$ ranging from 10,000 to 1,000,000 and report the relative error

$\frac{\| B_{k+1} p_{k+1} - (-g_{k+1}) \|}{\| g_{k+1} \|}$,

and the time (in seconds) needed to compute each solution. The number of limited-memory updates was set to five.

We simulate the first five iterations of an unconstrained line-search method. We generate the initial point $x_0$ and the gradients $g_j = \nabla f(x_j)$, for $0 \le j \le 5$, randomly and compute the subsequent iterates

$x_{j+1} = x_j - \alpha_j H_j g_j$, for $0 \le j \le 4$,
We ran each algorithm ten times for each value of φ (0.00, 0.50, and 0.99) and each problem size n (between 10,000 and 1,000,000); Tables 3–6 report the relative errors and Figure 1 reports the computational times. In the L-BFGS case (φ = 0), Algorithm 1 is comparable to the two-loop recursion (Algorithm 3), which is specific to the L-BFGS update. For the SR1 update, the compact inverse formulation outperformed the algorithm based on self-duality (Algorithm 8).
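To make the measured quantity concrete, the following small-scale sketch forms a BFGS matrix explicitly (feasible only for small n; dense linear algebra stands in for the limited-memory algorithms), solves (34) directly, and reports the relative error defined above. All names, sizes, and the synthetic "Hessian" are illustrative assumptions, not the authors' test setup.

```python
import numpy as np

# Small-scale analogue of the experiment: build a BFGS matrix B explicitly,
# solve B p = -g directly, and report ||B p + g|| / ||g||.
rng = np.random.default_rng(0)
n, k = 50, 5                                  # small n stands in for 10^4..10^6

A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = A @ A.T                                   # SPD matrix playing the "Hessian"
B = np.eye(n)                                 # B_0 = I
for _ in range(k):                            # apply k BFGS updates (phi = 0)
    s = rng.standard_normal(n)
    y = A @ s                                 # guarantees curvature s^T y > 0
    Bs = B @ s
    B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

g = rng.standard_normal(n)                    # simulated gradient
p = np.linalg.solve(B, -g)                    # direct solve stands in for Alg. 1
rel_err = np.linalg.norm(B @ p + g) / np.linalg.norm(g)
print(rel_err)
```

For the large-n regime reported in the tables, the dense solve above is replaced by the limited-memory algorithms compared in this section.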
Table 3. Relative error for Broyden class matrices with φ = 0.00 (BFGS).
n | Two-Loop Recursion (Alg. 3) | Recursive SMW (Alg. 4) | Recursive H_{k+1} (Alg. 6) | Compact Inverses (Alg. 1)
Table 4. Relative error for Broyden class matrices with φ = 0.50.
n | Recursive SMW (Alg. 4) | Recursive H_{k+1} (Alg. 6) | Compact Inverses (Alg. 1)
Table 5. Relative error for Broyden class matrices with φ = 0.99.
n | Recursive SMW (Alg. 4) | Recursive H_{k+1} (Alg. 6) | Compact Inverses (Alg. 1)
Table 6. Relative error for SR1 matrices.
n | Self-Duality (Alg. 8) | Compact Inverses (Alg. 2)

7. Concluding Remarks
We derived the compact formulation for members of the restricted Broyden class and the SR1 update. With this compact formulation, we showed how to solve linear systems defined by limited-memory quasi-Newton matrices. Numerical results suggest that this proposed approach is efficient and accurate. This approach has two distinct advantages over existing procedures for solving limited-memory quasi-Newton systems. First, there is a natural way to use the compact formulation for the inverse to obtain the condition number of the linear system. Second, when a new quasi-Newton pair is computed, computational savings can be achieved by simple updates to the matrix factors in the compact formulation. Future work includes integrating this linear solver inside line-search and trust-region methods for large-scale optimization.

8. Acknowledgements
The authors would like to thank Tammy Kolda for initial conversations on thissubject.
Appendix A. Two-loop recursion
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update is obtained by setting φ = 0 in (2). In this case, B_{k+1} simplifies to

(35) B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k).

The inverse of the BFGS matrix B_{k+1} is given by

B_{k+1}^{-1} = (I − (s_k y_k^T)/(y_k^T s_k)) B_k^{-1} (I − (y_k s_k^T)/(y_k^T s_k)) + (s_k s_k^T)/(y_k^T s_k),

which can be written recursively as

(36) B_{k+1}^{-1} = (V_k^T ··· V_0^T) B_0^{-1} (V_0 ··· V_k)
      + (1/(y_0^T s_0)) (V_k^T ··· V_1^T) s_0 s_0^T (V_1 ··· V_k)
      + (1/(y_1^T s_1)) (V_k^T ··· V_2^T) s_1 s_1^T (V_2 ··· V_k)
      + ··· + (1/(y_k^T s_k)) s_k s_k^T,

where V_i = I − (1/(y_i^T s_i)) y_i s_i^T (see, e.g., [18, pp. 177–178]). Solving (1) can be done efficiently using the well-known two-loop recursion [17]. Assuming that the inner products y_i^T s_i can be precomputed and B_0 is a scalar multiple of the identity matrix, the total operation count for Algorithm 3 is 4nk + 3k + n + 2k(2n − 1) flops, which includes 2k vector inner products.
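A minimal Python sketch of the two-loop recursion (Algorithm 3), assuming B_0 = γI and the k pairs stored in chronological order; the comparison against an explicitly formed BFGS matrix is for illustration only and is feasible only for small n.

```python
import numpy as np

def two_loop(z, S, Y, gamma):
    """Two-loop recursion: r = B^{-1} z for a BFGS matrix built from the
    pairs (s_i, y_i) with B_0 = gamma * I."""
    k = len(S)
    q = z.copy()
    alpha = np.empty(k)
    for i in range(k - 1, -1, -1):          # first loop: newest pair first
        alpha[i] = (S[i] @ q) / (Y[i] @ S[i])
        q -= alpha[i] * Y[i]
    r = q / gamma                           # r = B_0^{-1} q
    for i in range(k):                      # second loop: oldest pair first
        beta = (Y[i] @ r) / (Y[i] @ S[i])
        r += (alpha[i] - beta) * S[i]
    return r

# Check against an explicitly formed BFGS matrix (small n only).
rng = np.random.default_rng(1)
n, k, gamma = 30, 5, 2.0
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = A @ A.T                                 # SPD, so s^T y = s^T A s > 0
S = [rng.standard_normal(n) for _ in range(k)]
Y = [A @ s for s in S]
B = gamma * np.eye(n)
for s, y in zip(S, Y):
    Bs = B @ s
    B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
z = rng.standard_normal(n)
err = np.linalg.norm(two_loop(z, S, Y, gamma) - np.linalg.solve(B, z))
print(err)
```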
[Figure 1 appears here: semi-log plots of time (seconds) versus matrix size n for panels (a) φ = 0 (BFGS), (b) φ = 0.50, (c) φ = 0.99, and (d) SR1.]

Figure 1. Semi-log plots of the computational times (in seconds) for ten runs of each of the algorithms discussed in this paper. In (a), (b), and (c), the system matrix is a member of the restricted Broyden class of updates with φ = 0, 0.50, and 0.99, respectively. In (d), the system matrix is an SR1 matrix. The proposed method using compact inverses generally outperforms the other methods. Note that when φ = 0, our proposed method is competitive with the "two-loop recursion", which is specific to the BFGS update.

q ← z;
for i = k − 1, . . . , 0
    α_i ← (s_i^T q)/(y_i^T s_i);
    q ← q − α_i y_i;
end
r ← B_0^{-1} q;
for i = 0, . . . , k − 1
    β ← (y_i^T r)/(y_i^T s_i);
    r ← r + (α_i − β) s_i;
end
ALGORITHM 3: Two-loop recursion to compute r = B_k^{-1} z when B_k is a BFGS matrix

Appendix B. SMW recursion
In the symmetric case, the SMW formula for a rank-one change is given by

(37) (A + α u u^T)^{-1} = A^{-1} − (α/(1 + α u^T A^{-1} u)) A^{-1} u u^T A^{-1}.

We now show how (37) can be used to compute the inverse of B_{k+1}. First, for 0 ≤ j ≤ k, define

(38) α_{3j} = (y_j^T s_j)^{-1},          u_{3j} = y_j,
     α_{3j+1} = φ (s_j^T B_j s_j),       u_{3j+1} = y_j/(y_j^T s_j) − (B_j s_j)/(s_j^T B_j s_j),
     α_{3j+2} = −(s_j^T B_j s_j)^{-1},   u_{3j+2} = B_j s_j.

Letting C_0 = B_0 and U_i = α_i u_i u_i^T for 0 ≤ i ≤ 3k + 2, we define the matrices

(39) C_1 = C_0 + U_0,  C_2 = C_1 + U_1,  . . . ,  C_{3(k+1)} = C_{3(k+1)−1} + U_{3(k+1)−1}.

By construction, C_{3(k+1)} = B_{k+1}. It can be shown that each C_i is positive definite, and so u_i^T C_i^{-1} u_i > 0 for each i. For the restricted Broyden class, α_{3j} and α_{3j+1}, j = 0, . . . , k, are both strictly positive. Thus, the denominator in (37) is strictly positive for the rank-one changes associated with the indices 3j and 3j + 1, j = 0, . . . , k, and thus the SMW formula is well defined for these rank-one updates. Since C_{3j} = B_j and B_j is positive definite, the rank-one update associated with 3j + 2, j = 0, . . . , k, must result in a positive-definite matrix; in other words, (37) is well defined. Thus, solving the system B_{k+1} r = z is equivalent to computing r = C_{3(k+1)}^{-1} z. Applying the SMW formula (37) to obtain the inverse of C_{i+1} from C_i^{-1}, we obtain

(40) C_{i+1}^{-1} z = C_i^{-1} z − (α_i/(1 + α_i u_i^T C_i^{-1} u_i)) C_i^{-1} u_i u_i^T C_i^{-1} z, for 0 ≤ i < 3(k + 1).

Recursively applying (40), we obtain

(41) C_{i+1}^{-1} z = C_0^{-1} z − Σ_{j=0}^{i} (α_j/(1 + α_j u_j^T C_j^{-1} u_j)) C_j^{-1} u_j u_j^T C_j^{-1} z,

and, more importantly,

(42) C_{3(k+1)}^{-1} z = C_0^{-1} z − Σ_{j=0}^{3(k+1)−1} (α_j/(1 + α_j u_j^T C_j^{-1} u_j)) C_j^{-1} u_j u_j^T C_j^{-1} z.

Computing (42) can be simplified by defining the following two quantities:

(43) τ_j = α_j/(1 + α_j u_j^T C_j^{-1} u_j)  and  p_j = C_j^{-1} u_j.

Substituting (43) into (42) yields

(44) C_{3(k+1)}^{-1} z = C_0^{-1} z − Σ_{j=0}^{3(k+1)−1} τ_j (p_j^T z) p_j,

where τ_j = α_j (1 + α_j p_j^T u_j)^{-1}. The advantage of this representation is that computing C_{3(k+1)}^{-1} z requires only vector inner products. Moreover, the vectors p_j can be computed by evaluating (41) at z = u_j with i = j − 1; that is,

p_j = C_j^{-1} u_j = C_0^{-1} u_j − Σ_{i=0}^{j−1} τ_i (p_i^T u_j) p_i.

In Algorithm 4, we present the recursion to solve the linear system B_{k+1} r = z using the SMW formula (see related methods in [9, 15]). We assume B_0 is an easily invertible initial matrix.

r ← C_0^{-1} z;
for j = 0, . . . , 3(k + 1) − 1
    if mod(j, 3) = 0
        i ← j/3; u ← y_i; α ← (y_i^T s_i)^{-1};
    else if mod(j, 3) = 1
        i ← (j − 1)/3; u ← y_i/(y_i^T s_i) − (B_i s_i)/(s_i^T B_i s_i); α ← φ (s_i^T B_i s_i);
    else
        i ← (j − 2)/3; u ← B_i s_i; α ← −(s_i^T B_i s_i)^{-1};
    end
    p_j ← C_0^{-1} u;
    for l = 0, . . . , j − 1
        p_j ← p_j − τ_l (p_l^T u) p_l;
    end
    τ_j ← α/(1 + α p_j^T u);
    r ← r − τ_j (p_j^T z) p_j;
end
ALGORITHM 4: Computing r = B_{k+1}^{-1} z

At each iteration, Algorithm 4 computes α and u, which are defined in (38) and require matrix-vector products involving the matrices B_i for 0 ≤ i ≤ k. The main difficulty in computing α and u is the computation of matrix-vector products with the matrices B_i for each i. Note that if we are able to form u_{3j+2} = B_j s_j, then we are able to compute all other terms that use B_j s_j in (38). In what follows, we show how to compute u_{3j+2} without storing B_i for 0 ≤ i ≤ k. This idea is based on [18, Procedure 7.6]. We begin by writing C_{3(k+1)} in (39) as follows:

C_{3(k+1)} = B_0 + Σ_{j=0}^{k} [ α_{3j} u_{3j} u_{3j}^T + α_{3j+1} u_{3j+1} u_{3j+1}^T + α_{3j+2} u_{3j+2} u_{3j+2}^T ],

where the α_i and u_i are defined by (38). Since B_j = C_{3j}, then

u_{3j+2} = B_j s_j = ( B_0 + Σ_{i=0}^{j−1} [ α_{3i} u_{3i} u_{3i}^T + α_{3i+1} u_{3i+1} u_{3i+1}^T + α_{3i+2} u_{3i+2} u_{3i+2}^T ] ) s_j
         = B_0 s_j + Σ_{i=0}^{j−1} [ α_{3i} (u_{3i}^T s_j) u_{3i} + α_{3i+1} (u_{3i+1}^T s_j) u_{3i+1} + α_{3i+2} (u_{3i+2}^T s_j) u_{3i+2} ].

All the terms in the above summation involve only vectors that have been previously computed. Having computed u_{3j+2}, it is then possible to compute α_{3j+1}, α_{3j+2}, and u_{3j+1}. (The other terms α_{3j} and u_{3j} do not depend on B_j s_j.) The following algorithm computes the terms α and u that are used in Algorithm 4.

for j = 0, . . . , k
    u_{3j} ← y_j; α_{3j} ← 1/(s_j^T y_j);
    u_{3j+2} ← B_0 s_j + Σ_{i=0}^{j−1} [ α_{3i} (u_{3i}^T s_j) u_{3i} + α_{3i+1} (u_{3i+1}^T s_j) u_{3i+1} + α_{3i+2} (u_{3i+2}^T s_j) u_{3i+2} ];
    α_{3j+2} ← −1/(s_j^T u_{3j+2});
    u_{3j+1} ← α_{3j} y_j + α_{3j+2} u_{3j+2};
    α_{3j+1} ← −φ/α_{3j+2};
end
ALGORITHM 5: Unrolling the limited-memory Broyden convex class formula

Assuming s_j^T y_j is precomputed at each step, Algorithm 5 requires (k + 1)(3k + 2)/2 vector inner products and (5n + 3)(k + 2)(k + 1)/2 additional flops. With the α's and the u's computed in Algorithm 5, the rest of Algorithm 4 requires 3(k + 1)(3(k + 1) + 5)/2 vector inner products and 3(k + 1)(3(k + 1) + 1)(2n + 1)/2 + 3(k + 1)(3n + 4) additional flops. Thus, Algorithm 4 requires 6(k + 1)^2 + 7(k + 1) vector inner products in total, and its total flop count is (1/2)(k + 1)(23kn + 52n + 12k + 42) + (2n − 1)(6(k + 1)^2 + 7(k + 1)).

Appendix C. Computing products with H_{k+1} recursively

The inverse of B_{k+1} in (2) is given by

H_{k+1} = H_k + (s_k s_k^T)/(s_k^T y_k) − (H_k y_k y_k^T H_k)/(y_k^T H_k y_k) + Φ_k (y_k^T H_k y_k) v_k v_k^T,

where

v_k = s_k/(y_k^T s_k) − (H_k y_k)/(y_k^T H_k y_k),
Φ_k = (1 − φ)(y_k^T s_k)^2 / ((1 − φ)(y_k^T s_k)^2 + φ (y_k^T H_k y_k)(s_k^T B_k s_k)),

and H_k = B_k^{-1}. Thus, the solution to B_{k+1} r = z can be obtained from

B_{k+1}^{-1} z = ( H_0 + Σ_{i=0}^{k} [ (s_i s_i^T)/(s_i^T y_i) − (H_i y_i y_i^T H_i)/(y_i^T H_i y_i) + Φ_i (y_i^T H_i y_i) v_i v_i^T ] ) z
             = H_0 z + Σ_{i=0}^{k} [ ((s_i^T z)/(s_i^T y_i)) s_i − ((y_i^T H_i z)/(y_i^T H_i y_i)) H_i y_i + Φ_i (y_i^T H_i y_i)(v_i^T z) v_i ].

In order to avoid storing H_0, . . . , H_k for matrix-vector products, we make use of the following variables:

(45) α_{3i} = (s_i^T y_i)^{-1},          u_{3i} = s_i,
     α_{3i+1} = −(y_i^T H_i y_i)^{-1},   u_{3i+1} = H_i y_i,
     α_{3i+2} = Φ_i (y_i^T H_i y_i),     u_{3i+2} = s_i/(y_i^T s_i) − (H_i y_i)/(y_i^T H_i y_i).

With these definitions, we have that

(46) B_{k+1}^{-1} z = H_0 z + Σ_{i=0}^{k} [ α_{3i} (u_{3i}^T z) u_{3i} + α_{3i+1} (u_{3i+1}^T z) u_{3i+1} + α_{3i+2} (u_{3i+2}^T z) u_{3i+2} ],

and, thus, a recursion can be used for matrix-vector products with H_0, . . . , H_k. Algorithm 6 details how to solve for r in (1) using the expression for B_{k+1}^{-1} in (46) without explicitly storing H_0, . . . , H_k.

r ← H_0 z;
for j = 0, . . . , 3(k + 1) − 1
    if mod(j, 3) = 0
        i ← j/3; u ← s_i; α ← (y_i^T s_i)^{-1};
    else if mod(j, 3) = 1
        i ← (j − 1)/3; u ← H_i y_i; α ← −(y_i^T H_i y_i)^{-1};
    else
        i ← (j − 2)/3; u ← s_i/(y_i^T s_i) − (H_i y_i)/(y_i^T H_i y_i);
        Φ ← (1 − φ)(y_i^T s_i)^2 / ((1 − φ)(y_i^T s_i)^2 + φ (y_i^T H_i y_i)(s_i^T B_i s_i));
        α ← Φ (y_i^T H_i y_i);
    end
    r ← r + α (u^T z) u;
end
ALGORITHM 6: Computing r = H_{k+1} z

The following algorithm computes products involving the matrices H_i for 0 ≤ i ≤ k for use in Algorithm 6. The algorithm avoids storing any matrices, and matrix-vector products are computed recursively. The derivation of this algorithm is similar to that for Algorithm 5.

for j = 0, . . . , k
    û_{3j} ← s_j; α̂_{3j} ← 1/(s_j^T y_j);
    û_{3j+1} ← H_0 y_j + Σ_{i=0}^{j−1} [ α̂_{3i} (û_{3i}^T y_j) û_{3i} + α̂_{3i+1} (û_{3i+1}^T y_j) û_{3i+1} + α̂_{3i+2} (û_{3i+2}^T y_j) û_{3i+2} ];
    α̂_{3j+1} ← −1/(y_j^T û_{3j+1});
    û_{3j+2} ← α̂_{3j} s_j + α̂_{3j+1} û_{3j+1};
    Compute α_{3j+2} using Algorithm 5;
    Φ_j ← (1 − φ)/((1 − φ) + φ α̂_{3j}^2/(α̂_{3j+1} α_{3j+2}));
    α̂_{3j+2} ← −Φ_j/α̂_{3j+1};
end
ALGORITHM 7: Unrolling the limited-memory Broyden convex class formula

Assuming s_j^T y_j is precomputed at each step, Algorithm 7 requires (3k + 2)(k + 1) vector inner products and (5n + 3)(k + 1)k + (10n + 14)(k + 1) additional flops. With the α's and the u's computed in Algorithm 7, the rest of Algorithm 6 requires 3(k + 1) vector inner products and 3(k + 1)(2n + 1) + n additional flops. Thus, the total flop count for Algorithm 6 is (5n + 3)(k + 1)k + (k + 1)(16n + 17) + n + (2n − 1)(3k + 5)(k + 1), which includes (3k + 5)(k + 1) vector inner products.

Appendix D. Relationships between updates
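As an illustration of the recursion (43)-(44), the following sketch maintains the pairs (τ_j, p_j) and applies the inverse using only inner products. It assumes C_0 = γI and generic positive weights α_j (so every intermediate matrix stays positive definite); the rank-one terms here are generic, not the specific Broyden-class terms in (38).

```python
import numpy as np

# Recursion (43)-(44): given rank-one terms (alpha_j, u_j) added to C_0 = gamma*I,
# precompute p_j = C_j^{-1} u_j and tau_j, then apply C^{-1} z with inner products.
rng = np.random.default_rng(2)
n, m, gamma = 40, 6, 1.5
alphas = rng.uniform(0.5, 2.0, m)            # positive -> every C_j stays SPD
us = [rng.standard_normal(n) for _ in range(m)]

ps, taus = [], []
for a, u in zip(alphas, us):
    p = u / gamma                            # p_j = C_0^{-1} u_j ...
    for t, q in zip(taus, ps):               # ... corrected by earlier terms, eq. (41)
        p -= t * (q @ u) * q
    tau = a / (1.0 + a * (p @ u))            # tau_j = alpha_j / (1 + alpha_j p_j^T u_j)
    ps.append(p)
    taus.append(tau)

def apply_inverse(z):
    r = z / gamma                            # C_0^{-1} z
    for t, p in zip(taus, ps):
        r -= t * (p @ z) * p                 # equation (44)
    return r

# Check against a direct solve with the explicitly accumulated matrix.
C = gamma * np.eye(n) + sum(a * np.outer(u, u) for a, u in zip(alphas, us))
z = rng.standard_normal(n)
err = np.linalg.norm(apply_inverse(z) - np.linalg.solve(C, z))
print(err)
```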
We note that in the case of the BFGS update (φ = 0), the compact representation of the BFGS matrix is consistent with its known compact representation derived in [5]. In particular, M̃_k in (11) simplifies to

M̃_k = (−M_k^{-1} − Ψ_k^T H_0 Ψ_k)^{-1}
    = [ 0, −(R_k + D_k); −(R_k + D_k)^T, −D_k − Y_k^T H_0 Y_k ]^{-1}
    = [ R̄_k^{-T}(D_k + Y_k^T H_0 Y_k) R̄_k^{-1}, −R̄_k^{-T}; −R̄_k^{-1}, 0 ],

where R̄_k = R_k + D_k. Thus, the compact representation for the inverse of a BFGS matrix is given by

(47) H_{k+1} = H_0 + [ S_k  H_0 Y_k ] [ R̄_k^{-T}(D_k + Y_k^T H_0 Y_k) R̄_k^{-1}, −R̄_k^{-T}; −R̄_k^{-1}, 0 ] [ S_k^T; Y_k^T H_0 ],

which is equivalent to that found in [5, Equation (2.6)].

When φ = 1, then B_{k+1} in (2) is known as the Davidon-Fletcher-Powell (DFP) update, which preceded the BFGS update. The BFGS and DFP formulas are known to be duals of each other, meaning one update can be obtained from the other by interchanging s_k with y_k and B_k with H_k. Here, we demonstrate explicitly that the compact representations of the BFGS and DFP updates are also duals of each other. In addition, we show that the compact representation of the SR1 matrix is self-dual.

Consider the compact formulation (47) for the inverse of a BFGS matrix (i.e., φ = 0). This is equivalent to

H_{k+1} = H_0 + [ H_0 Y_k  S_k ] [ 0, −R̄_k^{-1}; −R̄_k^{-T}, R̄_k^{-T}(D_k + Y_k^T H_0 Y_k) R̄_k^{-1} ] [ Y_k^T H_0; S_k^T ]
        = H_0 − [ H_0 Y_k  S_k ] [ Y_k^T H_0 Y_k + D_k, R̄_k^T; R̄_k, 0 ]^{-1} [ Y_k^T H_0; S_k^T ].

Now we consider interchanging B_0 with H_0 and Y_k with S_k. Notice that if S_k and Y_k are interchanged, then the upper triangular part of S_k^T Y_k corresponds to the lower triangular part of Y_k^T S_k, implying that L_k and R_k must also be interchanged. Putting this all together yields

B_{k+1} = B_0 − [ B_0 S_k  Y_k ] [ S_k^T B_0 S_k + D_k, (L_k + D_k)^T; L_k + D_k, 0 ]^{-1} [ S_k^T B_0; Y_k^T ],

which is the compact formulation for a DFP matrix (see [8, Theorem 1]). In other words, the compact representations of BFGS and DFP are complementary updates of each other.

For the DFP update, Λ_k = −D_k in (7) since φ = 1. Thus, the compact formulation for the inverse of a DFP matrix is given by

H_{k+1} = H_0 + [ S_k  H_0 Y_k ] [ D_k, −R_k; −R_k^T, −Y_k^T H_0 Y_k ]^{-1} [ S_k^T; Y_k^T H_0 ]
        = H_0 − [ H_0 Y_k  S_k ] [ Y_k^T H_0 Y_k, R_k^T; R_k, −D_k ]^{-1} [ Y_k^T H_0; S_k^T ].

Interchanging H_0 with B_0 and S_k with Y_k (and R_k with L_k) yields

B_{k+1} = B_0 − [ B_0 S_k  Y_k ] [ S_k^T B_0 S_k, L_k^T; L_k, −D_k ]^{-1} [ S_k^T B_0; Y_k^T ],

which is the compact formulation of the BFGS matrix (see Eq. (2.17) in [5]).

Finally, it is worth noting that the compact formulation for the inverse of an SR1 matrix is self-dual in the sense of Section 5.2. That is, replacing H_0 with B_0 and S_k with Y_k (and, thus, R_k with L_k) in (32) yields the compact formulation for the SR1 matrix given by (31).
Appendix E. Solves with SR1 matrices.
The SR1 update is remarkable in that it is self-dual: initializing with B_0^{-1} instead of B_0, replacing s_k with y_k, and replacing y_k with s_k for all k in (30) results in B_{k+1}^{-1} (see, e.g., [18]). Thus, r = B_{k+1}^{-1} z can be computed using recursion (see Algorithm 8). The first loop of Algorithm 8 computes p_i = s_i − H_i y_i for i = 0, . . . , k; the final line of Algorithm 8 computes r = B_{k+1}^{-1} z = B_0^{-1} z + Σ_{i=0}^{k} ((p_i^T z)/(p_i^T y_i)) p_i.

for i = 0, . . . , k
    p_i ← s_i − H_0 y_i;
    for j = 0, . . . , i − 1
        p_i ← p_i − ((p_j^T y_i)/(p_j^T y_j)) p_j;
    end
end
r ← H_0 z + Σ_{i=0}^{k} ((p_i^T z)/(p_i^T y_i)) p_i;
ALGORITHM 8: Computing r = B_{k+1}^{-1} z when B_{k+1} is an SR1 matrix

Thus, in the case of SR1 updates, solving linear systems with SR1 matrices can be performed using vector inner products; the cost of solving a linear system with an SR1 matrix is the same as the cost of computing products with an SR1 matrix, which can be done with

Σ_{i=0}^{k} i + k + 1 = (k + 2)(k + 1)/2

vector inner products and

Σ_{i=0}^{k} [2n + (2n + 1)i] + 2n + (k + 1)(1 + n) = ((k + 1)/2)(k(2n + 1) + 2(3n + 1)) + 2n

additional flops. It should be noted that, unlike members of the restricted Broyden class, SR1 matrices can be indefinite and, in particular, numerically singular. Methods found in [3, 10] can be used to compute the eigenvalues (and thus the condition number) of SR1 matrices before performing linear solves to help avoid solving ill-conditioned systems.
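A Python sketch of Algorithm 8, assuming B_0 = γI and randomly generated pairs for which the SR1 denominators happen to be nonzero (the data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, gamma = 30, 4, 1.0
A = np.eye(n) + 0.2 * rng.standard_normal((n, n))
A = A @ A.T                                  # synthetic "Hessian" for the pairs
S = [rng.standard_normal(n) for _ in range(k)]
Y = [A @ s for s in S]
H0 = np.eye(n) / gamma                       # H_0 = B_0^{-1}

# First loop of Algorithm 8: p_i = s_i - H_i y_i, built recursively.
P = []
for i in range(k):
    p = S[i] - H0 @ Y[i]
    for j in range(i):
        p -= ((P[j] @ Y[i]) / (P[j] @ Y[j])) * P[j]
    P.append(p)

def solve_sr1(z):
    """Final line of Algorithm 8: r = H_0 z + sum_i ((p_i^T z)/(p_i^T y_i)) p_i."""
    r = H0 @ z
    for p, y in zip(P, Y):
        r += ((p @ z) / (p @ y)) * p
    return r

# Check against an explicitly formed SR1 matrix (small n only).
B = gamma * np.eye(n)
for s, y in zip(S, Y):
    w = y - B @ s
    B = B + np.outer(w, w) / (w @ s)
z = rng.standard_normal(n)
err = np.linalg.norm(solve_sr1(z) - np.linalg.solve(B, z))
print(err)
```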
Appendix F. Calculations in Lemma 1.
The determinant of (12) is given by

α̃δ̃ − β̃^2 = (1/∆)(s^T B_0 s / λ),

where λ in (16) is defined in (7). Thus, the first entry of (15) is given by

δ̃/(α̃δ̃ − β̃^2) = −φλ.

The off-diagonal elements of the right-hand side of (15) simplify to

β̃/(α̃δ̃ − β̃^2) = s^T y + φλ,

and the last entry of (15) simplifies to

α̃/(α̃δ̃ − β̃^2) = −s^T y − φλ − y^T H_0 y.

References

[1] J. Brust, O. Burdakov, J. B. Erway, R. F. Marcia, and Y.-x. Yuan, Shape-changing L-SR1 trust-region methods. Submitted.
[2] J. Brust, J. B. Erway, and R. F. Marcia, On solving L-SR1 trust-region subproblems.
[3] O. Burdakov, L. Gong, Y.-X. Yuan, and S. Zikrin, On efficiently combining limited memory and trust-region techniques, Tech. Rep. 2013:13, Linköping University, The Institute of Technology, 2013.
[4] R. H. Byrd, D. C. Liu, and J. Nocedal, On the behavior of Broyden's class of quasi-Newton methods, SIAM Journal on Optimization, 2 (1992), pp. 533–557.
[5] R. H. Byrd, J. Nocedal, and R. B. Schnabel, Representations of quasi-Newton matrices and their use in limited-memory methods, Math. Program., 63 (1994), pp. 129–156.
[6] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000.
[7] J. E. Dennis, Jr. and J. J. Moré, Quasi-Newton methods, motivation and theory, SIAM Review, 19 (1977), pp. 46–89.
[8] J. B. Erway, V. Jain, and R. F. Marcia, Shifted limited-memory DFP systems, in 2013 Asilomar Conference on Signals, Systems and Computers, Nov. 2013, pp. 1033–1037.
[9] J. B. Erway and R. F. Marcia, Limited-memory BFGS systems with diagonal updates, Linear Algebra and its Applications, 437 (2012), pp. 333–344.
[10] J. B. Erway and R. F. Marcia, On efficiently computing the eigenvalues of limited-memory quasi-Newton matrices, SIAM Journal on Matrix Analysis and Applications, 36 (2015), pp. 1338–1359.
[11] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, Maryland, third ed., 1996.
[12] I. Griva, S. G. Nash, and A. Sofer, Linear and Nonlinear Programming, Society for Industrial and Applied Mathematics, Philadelphia, 2009.
[13] W. Huang, K. A. Gallivan, and P.-A. Absil, A Broyden class of quasi-Newton methods for Riemannian optimization, SIAM Journal on Optimization, 25 (2015), pp. 1660–1685.
[14] C. Liu and S. A. Vander Wiel, Statistical quasi-Newton: A new look at least change, SIAM Journal on Optimization, 18 (2007), pp. 1266–1285.
[15] K. S. Miller, On the inverse of the sum of matrices, Mathematics Magazine, 54 (1981), pp. 67–72.
[16] J. L. Morales and J. Nocedal, Automatic preconditioning by limited memory quasi-Newton updating, SIAM Journal on Optimization, 10 (2000), pp. 1079–1096.
[17] J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation, 35 (1980), pp. 773–782.
[18] J. Nocedal and S. J. Wright, Numerical Optimization, Springer-Verlag, New York, second ed., 2006.
[19] D. P. O'Leary and A. Yeremin, The linear algebra of block quasi-Newton algorithms, Linear Algebra and Its Applications, 212 (1994), pp. 153–168.
[20] Y. Zhang and R. Tewarson, Quasi-Newton algorithms with updates from the preconvex part of Broyden's family, IMA Journal of Numerical Analysis, 8 (1988), pp. 487–509.
E-mail address: [email protected]
Department of Mathematics, PO Box 7388, Wake Forest University, Winston-Salem, NC 27109
E-mail address: [email protected]