On solving large-scale limited-memory quasi-Newton equations
JENNIFER B. ERWAY AND ROUMMEL F. MARCIA
Abstract.
We consider the problem of solving linear systems of equations arising with limited-memory members of the restricted Broyden class of updates and the symmetric rank-one (SR1) update. In this paper, we propose a new approach based on a practical implementation of the compact representation for the inverse of these limited-memory matrices. Numerical results suggest that the proposed method compares favorably in speed and accuracy to other algorithms and is competitive with several update-specific methods available to only a few members of the Broyden class of updates. Using the proposed approach has an additional benefit: the condition number of the system matrix can be computed efficiently.
Key words. Limited-memory quasi-Newton methods, compact representation, restricted Broyden class of updates, symmetric rank-one update, Broyden-Fletcher-Goldfarb-Shanno update, Davidon-Fletcher-Powell update, Sherman-Morrison-Woodbury formula

1. Introduction
We consider linear systems of the following form:

(1) $B_{k+1} r = z$,

where $r, z \in \mathbb{R}^n$ and $B_{k+1} \in \mathbb{R}^{n \times n}$ is a limited-memory quasi-Newton matrix obtained from applying $k+1$ Broyden class updates to an initial matrix $B_0$. We assume $n$ is large, and thus, explicitly forming and storing $B_{k+1}$ is impractical or impossible; moreover, in this setting, we assume limited-memory quasi-Newton matrices so that only the most recently-computed $k$ updates are stored and used to update $B_0$. In practice, the value of $k$ is small, i.e., less than 10 (see, e.g., [5]), making $k \ll n$. Problems such as (1) arise in quasi-Newton line-search and trust-region methods for large-scale optimization (see, e.g., [6, 7, 12, 18]), as well as in preconditioning iterative solvers (see, e.g., [16, 19]). In this paper, we propose a compact formulation of $B_{k+1}^{-1}$ that can be used to efficiently solve (1).

Traditional quasi-Newton methods for minimizing a continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ generate a sequence of iterates $\{x_k\}$ such that $f$ is strictly decreasing on this sequence. Moreover, at each iteration, the most-recently computed iterate $x_{k+1}$ is used to update the quasi-Newton matrix by defining a new quasi-Newton pair $(s_k, y_k)$ given by

$s_k \triangleq x_{k+1} - x_k$ and $y_k \triangleq \nabla f(x_{k+1}) - \nabla f(x_k)$.

Research supported in part by NSF grants CMMI-1334042 and CMMI-1333326.
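As a concrete illustration of these pairs (a minimal sketch, not from the paper: the quadratic test function and all variable names below are our own), note that for a quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ the pair satisfies $y_k = A s_k$ exactly, so each pair carries curvature information about the Hessian:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Quadratic test function f(x) = 0.5 x^T A x - b^T x with SPD Hessian A.
C = rng.standard_normal((n, n))
A = C @ C.T + n * np.eye(n)        # symmetric positive definite
b = rng.standard_normal(n)
grad = lambda x: A @ x - b         # gradient of f

x0 = rng.standard_normal(n)        # two consecutive iterates
x1 = rng.standard_normal(n)

s0 = x1 - x0                       # s_k = x_{k+1} - x_k
y0 = grad(x1) - grad(x0)           # y_k = grad f(x_{k+1}) - grad f(x_k)

# For a quadratic, the pair satisfies y_k = A s_k exactly.
pair_consistent = np.allclose(y0, A @ s0)
```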
In the case of the Broyden class updates, $B_{k+1}$ is updated as follows:

(2) $B_{k+1} = B_k - \frac{1}{s_k^T B_k s_k} B_k s_k s_k^T B_k + \frac{1}{y_k^T s_k} y_k y_k^T + \phi \, (s_k^T B_k s_k) \, w_k w_k^T$, with $w_k = \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k}$,

where $\phi \in \mathbb{R}$. The most well-known quasi-Newton update is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update, which is obtained by setting $\phi = 0$. While this is the most widely-used update, research has suggested that other values of $\phi$ may lead to faster convergence [4, 14, 20]. Because of this interest in other values of $\phi$, the entire Broyden class (i.e., $\phi \in \mathbb{R}$) of quasi-Newton methods has been generalized to solve minimization problems over Riemannian manifolds [13]. For these reasons, in this paper, we consider solving (1) for values of $\phi$ other than $\phi = 0$ (the BFGS update). In particular, we consider the restricted Broyden class of updates, where $\phi \in [0, 1]$.

Systems of the form (1) can be solved efficiently with the BFGS update ($\phi = 0$) using the well-known two-loop recursion [17] (see Appendix A for details). However, there is no known corresponding recursion method for other updates of the Broyden class (see [18, p. 190]); in fact, for this reason many researchers prefer the BFGS update. In this paper we present various methods for solving linear systems of equations with limited-memory quasi-Newton matrices. The main contribution of this paper is a new approach by formulating a compact representation for the inverse of any member of the restricted Broyden class. This representation allows us to efficiently solve linear systems of the form (1). The compact formulation for the inverse of these matrices is based on ideas found in [10], where a compact formulation for the restricted Broyden class of matrices is presented. An additional benefit of our proposed approach is the ability to calculate the eigenvalues of the limited-memory matrix, and hence, the condition number of the linear system.

This paper is organized in seven main sections.
In Section 2, we present current methods for solving linear systems with a limited-memory quasi-Newton matrix. In Section 3, we review the compact formulation for matrices obtained using the restricted Broyden class of updates. The compact formulation for the inverse of these matrices is presented in Section 4. In Section 5, we provide a practical implementation for solving systems of the form (1) when the system matrix is either a member of the restricted Broyden class or an SR1 matrix. In addition, we discuss how to compute the condition number of the linear system being solved and how to obtain additional computational savings when a new quasi-Newton pair is computed. We present numerical experiments in Section 6 that demonstrate the competitiveness of our proposed approach. Finally, concluding remarks are in Section 7.
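As a small numerical illustration of the update (2) (our own sketch with hypothetical names, not code from the paper), the following applies one Broyden-class update to a dense $B_0$ and checks the secant condition $B_{k+1} s_k = y_k$, which every member of the class satisfies by construction, along with positive definiteness for $\phi \in [0, 1]$ when $y_k^T s_k > 0$:

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """Apply one Broyden-class update (2) to B with pair (s, y) and parameter phi."""
    Bs = B @ s
    sBs = s @ Bs                    # s^T B s
    sy = y @ s                      # y^T s (positive here, so definiteness is preserved)
    w = y / sy - Bs / sBs           # w_k in (2)
    return (B - np.outer(Bs, Bs) / sBs
              + np.outer(y, y) / sy
              + phi * sBs * np.outer(w, w))

rng = np.random.default_rng(1)
n = 6
B0 = np.eye(n)
s = rng.standard_normal(n)
y = s + 0.1 * rng.standard_normal(n)          # keeps y^T s > 0
B1 = broyden_update(B0, s, y, phi=0.5)        # phi = 0.5: an arbitrary restricted-class member

secant_ok = np.allclose(B1 @ s, y)            # secant condition B_{k+1} s_k = y_k
pd_ok = np.all(np.linalg.eigvalsh(B1) > 0)    # positive definiteness preserved
```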
2. Current methods
One approach to solve linear equations with members of the Broyden class is to use the Sherman-Morrison-Woodbury (SMW) formula to update the inverse after each rank-one change (see, e.g., [11]). Thus, to compute the inverse after a rank-two update, the SMW formula may be applied twice. (For algorithmic details on this approach, see Appendix B.) Alternatively, linear solves with a general member of the Broyden class can be performed using the following recursion formula found in [7] for $H_{k+1}$, the inverse of $B_{k+1}$. Specifically, the inverse of $B_{k+1}$ in (2) is given by the following rank-two update:

(3) $H_{k+1} = H_k + \frac{1}{s_k^T y_k} s_k s_k^T - \frac{1}{y_k^T H_k y_k} H_k y_k y_k^T H_k + \Phi_k \, (y_k^T H_k y_k) \, v_k v_k^T$,

where

(4) $v_k = \frac{s_k}{y_k^T s_k} - \frac{H_k y_k}{y_k^T H_k y_k}$, and $\Phi_k = \frac{(1-\phi)(y_k^T s_k)^2}{(1-\phi)(y_k^T s_k)^2 + \phi \, (y_k^T H_k y_k)(s_k^T B_k s_k)}$,

and $H_k \triangleq B_k^{-1}$. (Algorithmic details associated with this approach can be found in Appendix C.)

The method proposed in this paper makes use of the so-called compact formulation of quasi-Newton matrices; it not only appears to be competitive in terms of speed but also yields additional information about the system matrix.

3. Compact representation
In this section, we briefly review the compact representation of members of the Broyden class of quasi-Newton matrices. Given $k+1$ update pairs of the form $\{(s_i, y_i)\}$, the compact formulation of $B_{k+1}$ is given by

(5) $B_{k+1} = B_0 + \Psi_k M_k \Psi_k^T$,

where $M_k$ is a square matrix and $B_0 \in \mathbb{R}^{n \times n}$ is an initial matrix, which is often taken to be a scalar multiple of the identity. (Note that the size of $\Psi_k$ is not specified.) The compact formulation of members of the Broyden class of updates is defined in terms of

$S_k = [\, s_0 \;\; s_1 \;\; \cdots \;\; s_k \,] \in \mathbb{R}^{n \times (k+1)}$, $\quad Y_k = [\, y_0 \;\; y_1 \;\; \cdots \;\; y_k \,] \in \mathbb{R}^{n \times (k+1)}$,

and the following decomposition of $S_k^T Y_k \in \mathbb{R}^{(k+1) \times (k+1)}$:

$S_k^T Y_k = L_k + D_k + R_k$,

where $L_k$ is strictly lower triangular, $D_k$ is diagonal, and $R_k$ is strictly upper triangular. Assuming all updates are well defined, Erway and Marcia [10] derive the compact formulation for all members of the Broyden class for $\phi \in [0, 1]$: for $\phi \in [0, 1]$, $B_{k+1}$ is given by (5) where

(6) $\Psi_k = [\, B_0 S_k \;\; Y_k \,]$ and $M_k = -\begin{bmatrix} S_k^T B_0 S_k - \phi \Lambda_k & L_k - \phi \Lambda_k \\ (L_k - \phi \Lambda_k)^T & -(D_k + \phi \Lambda_k) \end{bmatrix}^{-1}$,

where

(7) $\Lambda_k = \operatorname{diag}_{0 \le i \le k}(\lambda_i)$, where $\lambda_i = \left( \frac{-(1-\phi)}{s_i^T B_i s_i} - \frac{\phi}{s_i^T y_i} \right)^{-1}$ for $0 \le i \le k$.

When $\phi = 0$, this representation simplifies to the BFGS compact representation found in [5]. In the limited-memory case, $k \ll n$, and thus, $M_k$ is easily inverted.

4. Inverses
In this section, we present the compact formulation for the inverse of any member of the restricted Broyden class of matrices and for SR1 matrices. We then demonstrate how this compact formulation can be used to solve linear systems with these limited-memory matrices. Finally, we discuss computing the condition number of the linear system and potential computational savings when a new quasi-Newton pair is computed.

The main contribution of this paper is formulating a compact representation for the inverse of any member of the restricted Broyden class. It should be noted that the compact formulation for the inverse of a BFGS matrix is already known (see [5]). We now derive a general expression for the compact representation of any member of the restricted Broyden class. To do this, we apply the Sherman-Morrison-Woodbury formula (see, e.g., [11]) to the compact representation of $B_{k+1}$ found in (5):

$B_{k+1}^{-1} = B_0^{-1} + B_0^{-1} \Psi_k \left( -M_k^{-1} - \Psi_k^T B_0^{-1} \Psi_k \right)^{-1} \Psi_k^T B_0^{-1}$.

For quasi-Newton matrices it is conventional to let $H_i$ denote the inverse of $B_i$ for each $i$; using this notation, the inverse of $B_{k+1}$ is given by

(8) $H_{k+1} = H_0 + H_0 \Psi_k \left( -M_k^{-1} - \Psi_k^T H_0 \Psi_k \right)^{-1} \Psi_k^T H_0$.

Since $H_0 \Psi_k = H_0 [\, B_0 S_k \;\; Y_k \,] = [\, S_k \;\; H_0 Y_k \,]$, (8) can be written as

(9) $H_{k+1} = H_0 + [\, S_k \;\; H_0 Y_k \,] \left( -M_k^{-1} - \Psi_k^T H_0 \Psi_k \right)^{-1} \begin{bmatrix} S_k^T \\ Y_k^T H_0 \end{bmatrix}$.

Finally, (9) can be simplified using

$\Psi_k^T H_0 \Psi_k = \begin{bmatrix} S_k^T B_0 S_k & S_k^T Y_k \\ Y_k^T S_k & Y_k^T H_0 Y_k \end{bmatrix}$

to obtain

$\left( -M_k^{-1} - \Psi_k^T H_0 \Psi_k \right)^{-1} = \left( \begin{bmatrix} S_k^T B_0 S_k - \phi \Lambda_k & L_k - \phi \Lambda_k \\ (L_k - \phi \Lambda_k)^T & -(D_k + \phi \Lambda_k) \end{bmatrix} - \begin{bmatrix} S_k^T B_0 S_k & S_k^T Y_k \\ Y_k^T S_k & Y_k^T H_0 Y_k \end{bmatrix} \right)^{-1}$
$= \begin{bmatrix} -\phi \Lambda_k & L_k - \phi \Lambda_k - S_k^T Y_k \\ L_k^T - \phi \Lambda_k - Y_k^T S_k & -D_k - \phi \Lambda_k - Y_k^T H_0 Y_k \end{bmatrix}^{-1}$
$= \begin{bmatrix} -\phi \Lambda_k & -R_k - D_k - \phi \Lambda_k \\ -R_k^T - D_k - \phi \Lambda_k & -D_k - \phi \Lambda_k - Y_k^T H_0 Y_k \end{bmatrix}^{-1}$.
In other words, the compact formulation for the inverse of a member of the restricted Broyden class is given by

(10) $H_{k+1} = H_0 + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T$, where $\tilde{\Psi}_k \triangleq [\, S_k \;\; H_0 Y_k \,]$,

and

(11) $\tilde{M}_k \triangleq \begin{bmatrix} -\phi \Lambda_k & -R_k - D_k - \phi \Lambda_k \\ -R_k^T - D_k - \phi \Lambda_k & -D_k - \phi \Lambda_k - Y_k^T H_0 Y_k \end{bmatrix}^{-1}$.

In the following section, we discuss a practical method to compute $\tilde{M}_k$. In Appendix D, we explore remarkable relationships between various updates.

5. Recursive formulation and practical implementation
In this section, we present a practical method to compute $\tilde{M}_k$ in (10) given by (11). We begin by providing an alternative expression for $\tilde{M}_0$ that will allow us to define a recursion method to compute $\tilde{M}_k$.

Lemma 1. Suppose $H_1 = H_0 + \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T$ is the inverse of a member of the restricted Broyden class of updates after performing one update. Then,

(12) $\tilde{M}_0 = \begin{bmatrix} \tilde{\alpha}_0 & \tilde{\beta}_0 \\ \tilde{\beta}_0 & \tilde{\delta}_0 \end{bmatrix}$,

where

(13) $\tilde{\alpha}_0 = \frac{1}{s_0^T y_0} + \Phi_0 \frac{y_0^T H_0 y_0}{(s_0^T y_0)^2}$, $\quad \tilde{\beta}_0 = -\frac{\Phi_0}{y_0^T s_0}$, $\quad \tilde{\delta}_0 = -\frac{1 - \Phi_0}{y_0^T H_0 y_0}$,

and $\Phi_0$ is given in (4).

Proof.
Expanding (3) with $k = 0$ yields

$H_1 = H_0 + \left( \frac{1}{s_0^T y_0} + \Phi_0 \frac{y_0^T H_0 y_0}{(s_0^T y_0)^2} \right) s_0 s_0^T - \frac{\Phi_0}{y_0^T s_0} H_0 y_0 s_0^T - \frac{\Phi_0}{y_0^T s_0} s_0 y_0^T H_0 - \frac{1 - \Phi_0}{y_0^T H_0 y_0} H_0 y_0 y_0^T H_0$,

which simplifies to

(14) $H_1 = H_0 + [\, s_0 \;\; H_0 y_0 \,] \begin{bmatrix} \tilde{\alpha}_0 & \tilde{\beta}_0 \\ \tilde{\beta}_0 & \tilde{\delta}_0 \end{bmatrix} \begin{bmatrix} s_0^T \\ y_0^T H_0 \end{bmatrix}$,

where $\tilde{\alpha}_0$, $\tilde{\beta}_0$, and $\tilde{\delta}_0$ are defined as in (13) and $\Phi_0$ is given by (4). Note that (14) is of the form $H_1 = H_0 + \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T$.

We now show that $\tilde{M}_0$ defined by (12) and (13) is equivalent to (11) with $k = 0$. To see this, we simplify the entries of (12). First, define

$\Delta_0 \triangleq (1-\phi)(y_0^T s_0)^2 + \phi \, (y_0^T H_0 y_0)(s_0^T B_0 s_0)$.

Note that $\Delta_0 \ne 0$ since $H_0$ and $B_0$ are positive definite and $\phi \in [0, 1]$. Moreover, $\Phi_0 = (1-\phi)(y_0^T s_0)^2 / \Delta_0$, and thus,

$\tilde{\alpha}_0 = \frac{1}{s_0^T y_0} + \frac{(1-\phi) \, y_0^T H_0 y_0}{\Delta_0}$, $\quad \tilde{\beta}_0 = -\frac{(1-\phi)(s_0^T y_0)}{\Delta_0}$, and $\tilde{\delta}_0 = -\frac{\phi \, (s_0^T B_0 s_0)}{\Delta_0}$.
Consider the inverse of (12), which is given by

(15) $\begin{bmatrix} \tilde{\alpha}_0 & \tilde{\beta}_0 \\ \tilde{\beta}_0 & \tilde{\delta}_0 \end{bmatrix}^{-1} = \frac{1}{\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2} \begin{bmatrix} \tilde{\delta}_0 & -\tilde{\beta}_0 \\ -\tilde{\beta}_0 & \tilde{\alpha}_0 \end{bmatrix}$.

We now simplify the entries on the right side of (15). The determinant of (12) is given by

(16) $\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2 = -\frac{1 - \Phi_0}{(s_0^T y_0)(y_0^T H_0 y_0)} - \frac{\Phi_0 (1 - \Phi_0)}{(s_0^T y_0)^2} - \frac{\Phi_0^2}{(s_0^T y_0)^2} = \frac{1}{\Delta_0} \left( \frac{s_0^T B_0 s_0}{\lambda_0} \right)$,

where $\lambda_0$ in (16) is defined in (7). Thus, the first entry of (15) is given by

(17) $\frac{\tilde{\delta}_0}{\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2} = -\frac{\phi \, (s_0^T B_0 s_0)}{\Delta_0} \cdot \frac{\Delta_0 \lambda_0}{s_0^T B_0 s_0} = -\phi \lambda_0$.

The off-diagonal elements of the right-hand side of (15) simplify as follows:

(18) $\frac{\tilde{\beta}_0}{\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2} = -\left( \frac{(1-\phi)(s_0^T y_0)}{\Delta_0} \right) \left( \Delta_0 \left( \frac{\lambda_0}{s_0^T B_0 s_0} \right) \right) = s_0^T y_0 + \phi \lambda_0$.

Finally, the last entry of (15) can be simplified as follows:

(19) $\frac{\tilde{\alpha}_0}{\tilde{\alpha}_0 \tilde{\delta}_0 - \tilde{\beta}_0^2} = \frac{1}{s_0^T y_0} \cdot \frac{\Delta_0 \lambda_0}{s_0^T B_0 s_0} + \frac{(1-\phi) \, (y_0^T H_0 y_0) \, \lambda_0}{s_0^T B_0 s_0} = -s_0^T y_0 - \phi \lambda_0 - y_0^T H_0 y_0$.

(For details of the calculations in (16), (18), and (19), see Appendix F.) Thus, (15) together with (17), (18), and (19) gives

$\begin{bmatrix} \tilde{\alpha}_0 & \tilde{\beta}_0 \\ \tilde{\beta}_0 & \tilde{\delta}_0 \end{bmatrix}^{-1} = \begin{bmatrix} -\phi \lambda_0 & -s_0^T y_0 - \phi \lambda_0 \\ -s_0^T y_0 - \phi \lambda_0 & -s_0^T y_0 - \phi \lambda_0 - y_0^T H_0 y_0 \end{bmatrix}$,

showing that $\tilde{M}_0$ defined by (12) and (13) is equivalent to (11) with $k = 0$. $\square$

Together with Lemma 1, the following theorem shows how $\tilde{M}_k$ is related to $\tilde{M}_{k-1}$; from this, we present an algorithm to compute $\tilde{M}_k$ using recursion. This computation avoids explicitly forming $B_k$, which is used in the definition of $\Lambda_k$ and $\Phi_k$.

Theorem 1.
Suppose $H_{j+1} = H_0 + \tilde{\Psi}_j \tilde{M}_j \tilde{\Psi}_j^T$ as in (10), for all $j \in \{0, \ldots, k\}$, is the inverse of a member of the Broyden class of updates for a fixed $\phi \in [0, 1]$. If $\tilde{M}_0$ is defined by (12), then for all $j \in \{1, \ldots, k\}$, $\tilde{M}_j$ satisfies the recursion relation

(20) $\tilde{M}_j = \Pi_j^T \begin{bmatrix} \tilde{M}_{j-1} + \tilde{\delta}_j \tilde{u}_j \tilde{u}_j^T & \tilde{\beta}_j \tilde{u}_j & \tilde{\delta}_j \tilde{u}_j \\ \tilde{\beta}_j \tilde{u}_j^T & \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\delta}_j \tilde{u}_j^T & \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \Pi_j$,

where $\Pi_j$ is the permutation matrix such that $[\, \tilde{\Psi}_{j-1} \;\; s_j \;\; H_0 y_j \,] \Pi_j = \tilde{\Psi}_j$, and

$\tilde{\alpha}_j = \frac{1}{s_j^T y_j} + \Phi_j \frac{y_j^T H_j y_j}{(s_j^T y_j)^2}$, $\quad \tilde{\beta}_j = -\frac{\Phi_j}{y_j^T s_j}$, $\quad \tilde{\delta}_j = -\frac{1 - \Phi_j}{y_j^T H_j y_j}$, and $\tilde{u}_j = \tilde{M}_{j-1} \tilde{\Psi}_{j-1}^T y_j$.
Proof.
This proof is by induction on $j$. For the base case, we show that equation (20) holds for $j = 1$. Setting $k = 1$ in (3) yields

(21) $H_2 = H_1 + [\, s_1 \;\; H_1 y_1 \,] \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} s_1^T \\ y_1^T H_1 \end{bmatrix}$,

where

$\tilde{\alpha}_1 = \frac{1}{s_1^T y_1} + \Phi_1 \frac{y_1^T H_1 y_1}{(s_1^T y_1)^2}$, $\quad \tilde{\beta}_1 = -\frac{\Phi_1}{y_1^T s_1}$, and $\tilde{\delta}_1 = -\frac{1 - \Phi_1}{y_1^T H_1 y_1}$.

By Lemma 1, (21) can be written as

(22) $H_2 = H_0 + \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T + [\, s_1 \;\; H_1 y_1 \,] \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} s_1^T \\ y_1^T H_1 \end{bmatrix}$,

where $\tilde{\Psi}_0 = [\, S_0 \;\; H_0 Y_0 \,]$ and $\tilde{M}_0$ is given by (12). Letting $\tilde{u}_1 = \tilde{M}_0 \tilde{\Psi}_0^T y_1$, we have that

(23) $H_1 y_1 = \left( H_0 + \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T \right) y_1 = H_0 y_1 + \tilde{\Psi}_0 \tilde{u}_1$.

Substituting (23) into the last quantity on the right side of (22) yields

$[\, s_1 \;\; H_1 y_1 \,] \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} s_1^T \\ y_1^T H_1 \end{bmatrix} = [\, s_1 \;\; H_0 y_1 + \tilde{\Psi}_0 \tilde{u}_1 \,] \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} s_1^T \\ y_1^T H_0 + \tilde{u}_1^T \tilde{\Psi}_0^T \end{bmatrix}$
$= [\, \tilde{\Psi}_0 \;\; s_1 \;\; H_0 y_1 \,] \begin{bmatrix} \tilde{\delta}_1 \tilde{u}_1 \tilde{u}_1^T & \tilde{\beta}_1 \tilde{u}_1 & \tilde{\delta}_1 \tilde{u}_1 \\ \tilde{\beta}_1 \tilde{u}_1^T & \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\delta}_1 \tilde{u}_1^T & \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix} \begin{bmatrix} \tilde{\Psi}_0^T \\ s_1^T \\ y_1^T H_0 \end{bmatrix}$.

Letting $\Pi_1$ be defined as in (20) with $j = 1$ gives that $[\, \tilde{\Psi}_0 \;\; s_1 \;\; H_0 y_1 \,] \Pi_1 = \tilde{\Psi}_1$, and thus,

$H_2 = H_0 + \tilde{\Psi}_1 \Pi_1^T \bar{M}_1 \Pi_1 \tilde{\Psi}_1^T$, where $\bar{M}_1 \triangleq \begin{bmatrix} \tilde{M}_0 + \tilde{\delta}_1 \tilde{u}_1 \tilde{u}_1^T & \tilde{\beta}_1 \tilde{u}_1 & \tilde{\delta}_1 \tilde{u}_1 \\ \tilde{\beta}_1 \tilde{u}_1^T & \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\delta}_1 \tilde{u}_1^T & \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix}$.

It remains to show that $\tilde{M}_1 = \Pi_1^T \bar{M}_1 \Pi_1$; however, for simplicity, we instead show $\tilde{M}_1^{-1} = \Pi_1^T \bar{M}_1^{-1} \Pi_1$.

Notice that $\bar{M}_1$ can be written as the product of three matrices:

$\bar{M}_1 = \begin{bmatrix} I & 0 & \tilde{u}_1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \tilde{M}_0 & 0 \\ 0 & \tilde{N}_1 \end{bmatrix} \begin{bmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ \tilde{u}_1^T & 0 & 1 \end{bmatrix}$, where $\tilde{N}_1 = \begin{bmatrix} \tilde{\alpha}_1 & \tilde{\beta}_1 \\ \tilde{\beta}_1 & \tilde{\delta}_1 \end{bmatrix}$.

Then, $\bar{M}_1^{-1}$ is given by

(24) $\bar{M}_1^{-1} = \begin{bmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ -\tilde{u}_1^T & 0 & 1 \end{bmatrix} \begin{bmatrix} \tilde{M}_0^{-1} & 0 \\ 0 & \tilde{N}_1^{-1} \end{bmatrix} \begin{bmatrix} I & 0 & -\tilde{u}_1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

It can be shown, using similar computations as in the proof of Lemma 1, that

$\tilde{N}_1^{-1} = \frac{1}{\tilde{\alpha}_1 \tilde{\delta}_1 - \tilde{\beta}_1^2} \begin{bmatrix} \tilde{\delta}_1 & -\tilde{\beta}_1 \\ -\tilde{\beta}_1 & \tilde{\alpha}_1 \end{bmatrix} = \begin{bmatrix} -\phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 \\ -s_1^T y_1 - \phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 - y_1^T H_1 y_1 \end{bmatrix}$.

Substituting $\tilde{N}_1^{-1}$ into (24) and simplifying yields

(25) $\bar{M}_1^{-1} = \begin{bmatrix} \tilde{M}_0^{-1} & 0 & -\tilde{\Psi}_0^T y_1 \\ 0 & -\phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 \\ -y_1^T \tilde{\Psi}_0 & -s_1^T y_1 - \phi \lambda_1 & \tilde{u}_1^T \tilde{M}_0^{-1} \tilde{u}_1 - s_1^T y_1 - \phi \lambda_1 - y_1^T H_1 y_1 \end{bmatrix}$.

Terms in the (3,3)-entry on the right side of (25) can be simplified using that $\tilde{M}_0^{-1} \tilde{u}_1 = \tilde{\Psi}_0^T y_1$ and that $y_1^T H_1 y_1 = y_1^T H_0 y_1 + y_1^T \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T y_1$ as follows:

$\tilde{u}_1^T \tilde{M}_0^{-1} \tilde{u}_1 - y_1^T H_1 y_1 = y_1^T \tilde{\Psi}_0 \tilde{M}_0 \tilde{\Psi}_0^T y_1 - y_1^T H_1 y_1 = y_1^T H_1 y_1 - y_1^T H_0 y_1 - y_1^T H_1 y_1 = -y_1^T H_0 y_1$.

Thus,

(26) $\bar{M}_1^{-1} = \begin{bmatrix} \tilde{M}_0^{-1} & 0 & -\tilde{\Psi}_0^T y_1 \\ 0 & -\phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 \\ -y_1^T \tilde{\Psi}_0 & -s_1^T y_1 - \phi \lambda_1 & -s_1^T y_1 - \phi \lambda_1 - y_1^T H_0 y_1 \end{bmatrix}$.

Lemma 1, together with the substitution $\tilde{\Psi}_0^T y_1 = \begin{bmatrix} S_0^T y_1 \\ Y_0^T H_0 y_1 \end{bmatrix}$, gives that

$\Pi_1^T \bar{M}_1^{-1} \Pi_1 = \begin{bmatrix} -\phi \Lambda_0 & 0 & -R_0 - D_0 - \phi \Lambda_0 & -S_0^T y_1 \\ 0 & -\phi \lambda_1 & 0 & -s_1^T y_1 - \phi \lambda_1 \\ -R_0^T - D_0 - \phi \Lambda_0 & 0 & -D_0 - \phi \Lambda_0 - Y_0^T H_0 Y_0 & -Y_0^T H_0 y_1 \\ -y_1^T S_0 & -s_1^T y_1 - \phi \lambda_1 & -y_1^T H_0 Y_0 & -s_1^T y_1 - \phi \lambda_1 - y_1^T H_0 y_1 \end{bmatrix}$
$= \begin{bmatrix} -\phi \Lambda_1 & -R_1 - D_1 - \phi \Lambda_1 \\ -R_1^T - D_1 - \phi \Lambda_1 & -D_1 - \phi \Lambda_1 - Y_1^T H_0 Y_1 \end{bmatrix} = \tilde{M}_1^{-1}$,

proving the base case.

The inductive step is similar to the base case. For the inductive step, we assume the theorem holds for $k = 1, \ldots, j-1$ and show that it holds for $k = j$. We begin, as before, by setting $k = j$ in (3) to obtain

(27) $H_{j+1} = H_j + [\, s_j \;\; H_j y_j \,] \begin{bmatrix} \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} s_j^T \\ y_j^T H_j \end{bmatrix}$,

where

$\tilde{\alpha}_j = \frac{1}{s_j^T y_j} + \Phi_j \frac{y_j^T H_j y_j}{(s_j^T y_j)^2}$, $\quad \tilde{\beta}_j = -\frac{\Phi_j}{y_j^T s_j}$, and $\tilde{\delta}_j = -\frac{1 - \Phi_j}{y_j^T H_j y_j}$.

Using the induction hypothesis, (27) becomes

$H_{j+1} = H_0 + \tilde{\Psi}_{j-1} \tilde{M}_{j-1} \tilde{\Psi}_{j-1}^T + [\, s_j \;\; H_j y_j \,] \begin{bmatrix} \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} s_j^T \\ y_j^T H_j \end{bmatrix}$.

Similar to the base case,

$[\, s_j \;\; H_j y_j \,] \begin{bmatrix} \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} s_j^T \\ y_j^T H_j \end{bmatrix} = [\, s_j \;\; H_0 y_j + \tilde{\Psi}_{j-1} \tilde{u}_j \,] \begin{bmatrix} \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} s_j^T \\ y_j^T H_0 + \tilde{u}_j^T \tilde{\Psi}_{j-1}^T \end{bmatrix}$
$= [\, \tilde{\Psi}_{j-1} \;\; s_j \;\; H_0 y_j \,] \begin{bmatrix} \tilde{\delta}_j \tilde{u}_j \tilde{u}_j^T & \tilde{\beta}_j \tilde{u}_j & \tilde{\delta}_j \tilde{u}_j \\ \tilde{\beta}_j \tilde{u}_j^T & \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\delta}_j \tilde{u}_j^T & \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \begin{bmatrix} \tilde{\Psi}_{j-1}^T \\ s_j^T \\ y_j^T H_0 \end{bmatrix}$,

where $\tilde{u}_j = \tilde{M}_{j-1} \tilde{\Psi}_{j-1}^T y_j$. With $\Pi_j$ defined as in (20), then $[\, \tilde{\Psi}_{j-1} \;\; s_j \;\; H_0 y_j \,] \Pi_j = \tilde{\Psi}_j$, and so $H_{j+1} = H_0 + \tilde{\Psi}_j \Pi_j^T \bar{M}_j \Pi_j \tilde{\Psi}_j^T$, where

(28) $\bar{M}_j \triangleq \begin{bmatrix} \tilde{M}_{j-1} + \tilde{\delta}_j \tilde{u}_j \tilde{u}_j^T & \tilde{\beta}_j \tilde{u}_j & \tilde{\delta}_j \tilde{u}_j \\ \tilde{\beta}_j \tilde{u}_j^T & \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\delta}_j \tilde{u}_j^T & \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix}$.

It remains to show that $\tilde{M}_j = \Pi_j^T \bar{M}_j \Pi_j$; however, for simplicity, we instead show $\tilde{M}_j^{-1} = \Pi_j^T \bar{M}_j^{-1} \Pi_j$ using a similar argument as in the base case. In particular, similar to (25), it can be shown that

$\bar{M}_j^{-1} = \begin{bmatrix} \tilde{M}_{j-1}^{-1} & 0 & -\tilde{\Psi}_{j-1}^T y_j \\ 0 & -\phi \lambda_j & -s_j^T y_j - \phi \lambda_j \\ -y_j^T \tilde{\Psi}_{j-1} & -s_j^T y_j - \phi \lambda_j & -s_j^T y_j - \phi \lambda_j - y_j^T H_0 y_j \end{bmatrix}$.

By the inductive hypothesis, (11) holds for $k = j-1$, i.e.,

$\tilde{M}_{j-1}^{-1} = \begin{bmatrix} -\phi \Lambda_{j-1} & -R_{j-1} - D_{j-1} - \phi \Lambda_{j-1} \\ -R_{j-1}^T - D_{j-1} - \phi \Lambda_{j-1} & -D_{j-1} - \phi \Lambda_{j-1} - Y_{j-1}^T H_0 Y_{j-1} \end{bmatrix}$,

and so

$\Pi_j^T \bar{M}_j^{-1} \Pi_j = \begin{bmatrix} -\phi \Lambda_{j-1} & 0 & -R_{j-1} - D_{j-1} - \phi \Lambda_{j-1} & -S_{j-1}^T y_j \\ 0 & -\phi \lambda_j & 0 & -s_j^T y_j - \phi \lambda_j \\ -R_{j-1}^T - D_{j-1} - \phi \Lambda_{j-1} & 0 & -D_{j-1} - \phi \Lambda_{j-1} - Y_{j-1}^T H_0 Y_{j-1} & -Y_{j-1}^T H_0 y_j \\ -y_j^T S_{j-1} & -s_j^T y_j - \phi \lambda_j & -y_j^T H_0 Y_{j-1} & -s_j^T y_j - \phi \lambda_j - y_j^T H_0 y_j \end{bmatrix}$
$= \begin{bmatrix} -\phi \Lambda_j & -R_j - D_j - \phi \Lambda_j \\ -R_j^T - D_j - \phi \Lambda_j & -D_j - \phi \Lambda_j - Y_j^T H_0 Y_j \end{bmatrix} = \tilde{M}_j^{-1}$,

proving the induction step.
$\square$

Using Theorem 1, Algorithm 1 computes $\tilde{M}_k$ and then uses the compact formulation for the inverse to solve linear systems of the form (1). There are two parts to the for loop in Algorithm 1: the first half of the loop is used to build products of the form $s_j^T B_j s_j$; the second half of the loop computes $\tilde{M}_k$, making use of $s_j^T B_j s_j$ to compute $\Phi_j$. In the special case when $\phi = 0$ or $\phi = 1$, $\Phi_j$ can be quickly computed as $\Phi_j = 1$ and $\Phi_j = 0$, respectively.

Define $\phi$, $B_0$, and $H_0$;
Form $M_0$ using (6) and $\tilde{M}_0$ using (12);
Let $\Psi_0 = [\, B_0 s_0 \;\; y_0 \,]$ and $\tilde{\Psi}_0 = [\, s_0 \;\; H_0 y_0 \,]$;
for $j = 1 : k$
    % Part 1: Compute $s_j^T B_j s_j$
    $u_j \leftarrow M_{j-1} (\Psi_{j-1}^T s_j)$;
    $s_j^T B_j s_j \leftarrow s_j^T B_0 s_j + (s_j^T \Psi_{j-1}) u_j$;
    $\alpha_j \leftarrow -(1-\phi)/(s_j^T B_j s_j)$;
    $\beta_j \leftarrow -\phi/(y_j^T s_j)$;
    $\delta_j \leftarrow (1 + \phi (s_j^T B_j s_j)/(y_j^T s_j))/(y_j^T s_j)$;
    $M_j \leftarrow \Pi_j \begin{bmatrix} M_{j-1} + \alpha_j u_j u_j^T & \alpha_j u_j & \beta_j u_j \\ \alpha_j u_j^T & \alpha_j & \beta_j \\ \beta_j u_j^T & \beta_j & \delta_j \end{bmatrix} \Pi_j^T$, where $\Pi_j$ is as in (20);
    % Part 2: Compute $\tilde{M}_j$
    $\tilde{u}_j \leftarrow \tilde{M}_{j-1} (\tilde{\Psi}_{j-1}^T y_j)$;
    $y_j^T H_j y_j \leftarrow y_j^T H_0 y_j + (y_j^T \tilde{\Psi}_{j-1}) \tilde{u}_j$;
    $\Phi_j \leftarrow (1-\phi)(y_j^T s_j)^2 / ((1-\phi)(y_j^T s_j)^2 + \phi (y_j^T H_j y_j)(s_j^T B_j s_j))$;
    $\tilde{\alpha}_j \leftarrow (1 + \Phi_j (y_j^T H_j y_j)/(y_j^T s_j))/(y_j^T s_j)$;
    $\tilde{\beta}_j \leftarrow -\Phi_j/(y_j^T s_j)$;
    $\tilde{\delta}_j \leftarrow -(1 - \Phi_j)/(y_j^T H_j y_j)$;
    $\tilde{M}_j \leftarrow \Pi_j^T \begin{bmatrix} \tilde{M}_{j-1} + \tilde{\delta}_j \tilde{u}_j \tilde{u}_j^T & \tilde{\beta}_j \tilde{u}_j & \tilde{\delta}_j \tilde{u}_j \\ \tilde{\beta}_j \tilde{u}_j^T & \tilde{\alpha}_j & \tilde{\beta}_j \\ \tilde{\delta}_j \tilde{u}_j^T & \tilde{\beta}_j & \tilde{\delta}_j \end{bmatrix} \Pi_j$, where $\Pi_j$ is as in (20);
end
$r = H_0 z + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T z$;

ALGORITHM 1: Computing $r = B_{k+1}^{-1} z$ using (10)

In Algorithm 1, the computations for $u_j$ and $\tilde{u}_j$ in lines 7 and 15, respectively, can be simplified by noting that

(29) $\Psi_{j-1}^T s_j = \begin{bmatrix} S_{j-1}^T B_0 s_j \\ Y_{j-1}^T s_j \end{bmatrix}$ and $\tilde{\Psi}_{j-1}^T y_j = \begin{bmatrix} S_{j-1}^T y_j \\ Y_{j-1}^T H_0 y_j \end{bmatrix}$.

In other words, $\Psi_{j-1}^T s_j$ is a $2j$-vector whose first $j$ entries are the first $j$ entries in the $(j+1)$th column of $S_k^T B_0 S_k$ and whose last $j$ entries are the first $j$ entries in the $(j+1)$th row of $S_k^T Y_k$; similarly, $\tilde{\Psi}_{j-1}^T y_j$ is a vector whose first $j$ entries are the first $j$ entries in the $(j+1)$th column of $S_k^T Y_k$ and whose last $j$ entries are the first $j$ entries in the $(j+1)$th column of $Y_k^T H_0 Y_k$. Thus, computing $u_j$ and $\tilde{u}_j$ only requires $2j(4j-1)$ additional flops each after precomputing and storing the quantities $S_k^T B_0 S_k$, $Y_k^T H_0 Y_k$, and $S_k^T Y_k$. Note that provided $B_0$ is a scalar multiple of the identity matrix and $H_0 = B_0^{-1}$, then forming $S_k^T B_0 S_k$ and $Y_k^T H_0 Y_k$ requires $(k+1)(k+2)n$ flops each, and forming $S_k^T Y_k$ requires $(k+1)^2 (2n-1)$ flops.

Computing the scalar $s_j^T B_j s_j$ in line 8 of Algorithm 1 requires performing one inner product between $\Psi_{j-1}^T s_j$ and $u_j$ and then summing this with $s_j^T B_0 s_j$, resulting in an operation cost of $4j$. The same cost is associated with forming $y_j^T H_j y_j$ in line 16. The scalar $\Phi_j$ requires ten flops to compute. The other scalars ($\alpha_j$, $\beta_j$, $\delta_j$, $\tilde{\alpha}_j$, $\tilde{\beta}_j$, and $\tilde{\delta}_j$) require a total of 14 flops. Forming the matrices $M_j$ and $\tilde{M}_j$ requires $24j^2 + 16j$ flops in total. Thus, excluding the cost of forming $S_k^T B_0 S_k$, $S_k^T Y_k$, and $Y_k^T H_0 Y_k$, the operation count for computing $\tilde{M}_k$ is given by

$\sum_{j=1}^{k} \left( 4j(4j-1) + 8j + 24 + 24j^2 + 16j \right) = \frac{1}{3} \left( 40k^3 + 90k^2 + 122k \right)$.

Finally, forming $r$ in line 23 requires $k+1$ vector inner products and $(2k+1)(n+k+1) + 2n$ additional flops. Thus, the overall flop count for Algorithm 1 is $(2k+1)(n+k+1) + 2n + (40k^3 + 90k^2 + 122k)/3 + (2n-1)(k+1)$, which includes the cost of $k+1$ vector inner products.

Table 1 presents the computational complexity of Algorithm 1 and the algorithms in Appendices A-C for linear solves where the system matrix is from the restricted Broyden class of updates. The operational costs do not include the computation of $S_k^T Y_k$, $S_k^T B_0 S_k$, and $Y_k^T H_0 Y_k$, which all can be efficiently updated as new quasi-Newton pairs are computed.

Alg. | Flop count (including vector inner products)
1 | $(2k+1)(n+k+1) + 2n + (40k^3 + 90k^2 + 122k)/3 + (2n-1)(k+1)$
3 | $4nk + 3k + n + 2k(2n-1)$
4 | $(k+1)(23kn + 52n + 12k + 42) + (2n-1)(k+1) + 7(k+1)$
6 | $(5n+3)(k+1)k + (10n+14)(k+1) + (2n-1)(k+2)(k+1)$

Table 1.
Computational complexity comparison of Algorithms 1, 3, 4, and 6 when the system matrix is a member of the restricted Broyden class of quasi-Newton updates.

5.1. Efficient solves after updates.
To compute the factors in (8), it is necessary to form the matrices $L_k$, $D_k$, and $R_k$. After a new quasi-Newton pair has been computed, updating (8) can be done efficiently by storing $S_k^T S_k$, $Y_k^T Y_k$, and $S_k^T Y_k$. In this case, to update $H_{k+1}$ we add a column to (and possibly delete a column from) $S_k$ and $Y_k$; the corresponding changes can then be made in $S_k^T S_k$, $Y_k^T Y_k$, and $S_k^T Y_k$ (see [5] for more details).

In the practical implementation given in Algorithm 1, it is necessary to compute $\Psi_{j-1}^T s_j$ and $\tilde{\Psi}_{j-1}^T y_j$ given by (29), which can be formed using the stored quantities $S_k^T S_k$, $Y_k^T Y_k$, and $S_k^T Y_k$. To compute these quantities when a new quasi-Newton pair of updates is obtained, we apply the same strategy as described above by adding (and possibly deleting) columns in $S_k$ and $Y_k$, enabling $\Psi_{j-1}^T s_j$ and $\tilde{\Psi}_{j-1}^T y_j$ to be updated efficiently. With these updates, the work required for Algorithm 1 is significantly reduced.

5.2. Compact representation of the inverse of an SR1 matrix.
In this subsection, we demonstrate that this same strategy can be applied to the SR1 update. The symmetric rank-one (SR1) update is obtained by setting $\phi_k = y_k^T s_k / (y_k^T s_k - s_k^T B_k s_k)$; in this case,

(30) $B_{k+1} = B_k + \frac{1}{s_k^T (y_k - B_k s_k)} (y_k - B_k s_k)(y_k - B_k s_k)^T$.

The SR1 update is a member of the Broyden class but not the restricted Broyden class. This update exhibits hereditary symmetry but not positive definiteness. This update is remarkable in that it is self-dual: initializing with $B_0^{-1}$ in place of $B_0$ and interchanging $y_k$ and $s_k$ everywhere in (30) yields a recursion relation for $B_{k+1}^{-1}$. Thus, like the BFGS update, solving linear systems with SR1 matrices can be performed using vector inner products (see Appendix E for details).

The compact formulation for the SR1 update is given by $B_{k+1} = B_0 + \hat{\Psi}_k \hat{M}_k \hat{\Psi}_k^T$, where

(31) $\hat{\Psi}_k = Y_k - B_0 S_k$ and $\hat{M}_k = (D_k + L_k + L_k^T - S_k^T B_0 S_k)^{-1}$,

with $\hat{\Psi}_k \in \mathbb{R}^{n \times (k+1)}$ and $\hat{M}_k \in \mathbb{R}^{(k+1) \times (k+1)}$ (see [5]). The SR1 update has the distinction of being the only rank-one update in the Broyden class of updates. It is for this reason that $\hat{\Psi}_k$ has only $k+1$ columns and $\hat{M}_k$ is a $(k+1) \times (k+1)$ matrix. (In contrast, the compact representations for rank-two updates have twice as many columns, and $M_k$ is twice the size as in the SR1 case.)

The compact formulation for the inverse of an
SR1 matrix can be derived using the same strategy as for the restricted Broyden class. The SMW formula applied to the compact formulation of an SR1 matrix, with $\hat{\Psi}_k$ and $\hat{M}_k$ defined as in (31), yields

$H_{k+1} = H_0 + H_0 (Y_k - B_0 S_k) \left( -\hat{M}_k^{-1} - \hat{\Psi}_k^T H_0 \hat{\Psi}_k \right)^{-1} (Y_k - B_0 S_k)^T H_0$.

Substituting in for $\hat{M}_k$ using (31), together with the identity

$\hat{\Psi}_k^T H_0 \hat{\Psi}_k = Y_k^T H_0 Y_k - S_k^T Y_k - Y_k^T S_k + S_k^T B_0 S_k$,

yields

$H_{k+1} = H_0 + (H_0 Y_k - S_k) \left( -(D_k + L_k + L_k^T - S_k^T B_0 S_k) - (Y_k^T H_0 Y_k - S_k^T Y_k - Y_k^T S_k + S_k^T B_0 S_k) \right)^{-1} (H_0 Y_k - S_k)^T$
$= H_0 + (S_k - H_0 Y_k)(D_k + R_k + R_k^T - Y_k^T H_0 Y_k)^{-1} (S_k - H_0 Y_k)^T$.

In other words, the compact formulation for the inverse of an
SR1 matrix is given by

(32) $H_{k+1} = H_0 + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T$, where $\tilde{\Psi}_k \triangleq S_k - H_0 Y_k$ and $\tilde{M}_k \triangleq (D_k + R_k + R_k^T - Y_k^T H_0 Y_k)^{-1}$.

Algorithm 2 details how to solve a linear system defined by an L-SR1 matrix via the compact representation for its inverse. Assuming $S_k^T Y_k$ is precomputed, Algorithm 2 requires $(2k+1)(n+k+1) + 2n + (2n-1)(k+1)$ flops, which includes $k+1$ vector inner products to compute $\tilde{\Psi}_k^T z$ in line 5. Note that this count does not include the cost of inverting the $(k+1) \times (k+1)$ matrix $\tilde{M}_k$ in line 4 since, when $k \ll n$, this cost is significantly smaller than the dominant costs for Algorithm 2.
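As a sanity check on (32), the solve performed by Algorithm 2 can be sketched in NumPy (our own illustration; the synthetic SPD test Hessian and all names are assumptions, not from the paper):

```python
import numpy as np

def sr1_inverse_solve(S, Y, gamma, z):
    """Solve B_{k+1} r = z via the SR1 inverse compact representation (32), B_0 = gamma*I."""
    H0Y = Y / gamma                                   # H_0 Y_k with H_0 = (1/gamma) I
    SY = S.T @ Y
    D = np.diag(np.diag(SY))                          # diagonal part of S^T Y
    R = np.triu(SY, k=1)                              # strictly upper triangular part
    Mtil = np.linalg.inv(D + R + R.T - Y.T @ H0Y)     # small (k+1) x (k+1) inverse
    Psitil = S - H0Y                                  # tilde{Psi}_k in (32)
    return z / gamma + Psitil @ (Mtil @ (Psitil.T @ z))

rng = np.random.default_rng(2)
n, k, gamma = 30, 4, 2.0
C = rng.standard_normal((n, n))
A = C @ C.T + n * np.eye(n)                           # SPD model Hessian

S = rng.standard_normal((n, k + 1))
Y = A @ S                                             # exact-curvature pairs y_j = A s_j

# Reference: build B_{k+1} by k+1 explicit SR1 updates (30).
B = gamma * np.eye(n)
for j in range(k + 1):
    v = Y[:, j] - B @ S[:, j]
    B = B + np.outer(v, v) / (S[:, j] @ v)

z = rng.standard_normal(n)
r = sr1_inverse_solve(S, Y, gamma, z)
rel_err = np.linalg.norm(B @ r - z) / np.linalg.norm(z)
```

Here the reference matrix is built by explicit SR1 updates, so the sketch verifies that (32) reproduces $B_{k+1}^{-1} z$ without ever forming $B_{k+1}$ or its inverse densely.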
Define $H_0$, $S_k$, and $Y_k$;
Form $S_k^T Y_k = L_k + D_k + R_k$;
Form $\tilde{\Psi}_k = S_k - H_0 Y_k$;
$\tilde{M}_k \leftarrow (D_k + R_k + R_k^T - Y_k^T H_0 Y_k)^{-1}$;
$r = H_0 z + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T z$;

ALGORITHM 2: Computing $r = B_{k+1}^{-1} z$ using (32) when $B_{k+1}$ is an SR1 matrix

Table 2 presents the computational complexity of Algorithm 2 and of Algorithm 8 in Appendix E for solving SR1 linear systems. The operational costs do not include the computation of $S_k^T Y_k$, $S_k^T B_0 S_k$, and $Y_k^T H_0 Y_k$, which all can be efficiently updated as new quasi-Newton pairs are computed.

Algorithm | Flop count (including vector inner products)
2 | $(2k+1)(n+k+1) + 2n + (2n-1)(k+1)$
8 | $(k+1)(k(2n+1) + 2(3n+1)) + 2n + (2n-1)(k+2)(k+1)$

Table 2. Computational complexity comparison of Algorithms 2 and 8 when the system matrix is an SR1 matrix. Note that the computational cost for Algorithm 2 does not include the cost of inverting a $(k+1) \times (k+1)$ matrix, where $k \ll n$.

5.3. Computing condition numbers.
Provided the initial approximate Hessian $B_0$ is taken to be a scalar multiple of the identity (i.e., $B_0 = \gamma I$), it is possible to compute the condition number of $B_{k+1}$ in (1). To do this, we consider the condition number of $H_{k+1} = B_{k+1}^{-1}$. The eigenvalues of $H_{k+1}$ can be obtained from the compact representation of $H_{k+1}$ together with the techniques from [3, 10]. For completeness, we review this approach below.

Suppose $B_{k+1}$ is a member of the restricted Broyden class. The eigenvalues of $H_{k+1}$ can be computed using the compact formulation for $H_{k+1}$ given in (10). Let $\tilde{\Psi}_k = QR$ be the QR decomposition of $\tilde{\Psi}_k$, where $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix and $R \in \mathbb{R}^{n \times 2(k+1)}$ is an upper triangular matrix. Then,

(33) $H_{k+1} = H_0 + \tilde{\Psi}_k \tilde{M}_k \tilde{\Psi}_k^T = H_0 + Q R \tilde{M}_k R^T Q^T$.

The matrix $R \tilde{M}_k R^T$ is a real symmetric $n \times n$ matrix. However, since $\tilde{\Psi}_k \in \mathbb{R}^{n \times 2(k+1)}$ has at most rank $2(k+1)$, $R$ can be written in the form

$R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, where $R_1 \in \mathbb{R}^{2(k+1) \times 2(k+1)}$.

Thus,

$R \tilde{M}_k R^T = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \tilde{M}_k \begin{bmatrix} R_1^T & 0 \end{bmatrix} = \begin{bmatrix} R_1 \tilde{M}_k R_1^T & 0 \\ 0 & 0 \end{bmatrix}$.

Since $R_1 \tilde{M}_k R_1^T \in \mathbb{R}^{2(k+1) \times 2(k+1)}$, its spectral decomposition can be computed explicitly. Letting $V_1 D_1 V_1^T$ be the spectral decomposition of $R_1 \tilde{M}_k R_1^T$ and substituting into (33) yields

$H_{k+1} = H_0 + Q V D V^T Q^T$, where $V \triangleq \begin{bmatrix} V_1 & 0 \\ 0 & I \end{bmatrix} \in \mathbb{R}^{n \times n}$ and $D \triangleq \begin{bmatrix} D_1 & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{n \times n}$.

Thus,

$H_{k+1} = H_0 + Q V D V^T Q^T = Q V \left( \gamma^{-1} I + D \right) V^T Q^T$,

giving the spectral decomposition of $H_{k+1}$. In particular, the matrix $H_{k+1}$ has an eigenvalue of $\gamma^{-1}$ with multiplicity $n - 2(k+1)$ and $2(k+1)$ eigenvalues given by $\gamma^{-1} + d_i$ for $1 \le i \le 2(k+1)$, where $d_i$ denotes the $i$th diagonal element of $D_1$. Thus, the condition number of $H_{k+1}$ can be computed as the ratio of the largest eigenvalue to the smallest eigenvalue (in magnitude). The condition number of $B_{k+1}$ is the same value, since the eigenvalues of $B_{k+1}$ are the reciprocals of those of $H_{k+1}$.

When $B_{k+1}$ is a quasi-Newton matrix generated by SR1 updates, the procedure is similar except that $\tilde{\Psi}_k$ has half as many columns, resulting in $k+1$ eigenvalues given by $\gamma^{-1} + d_i$ for $1 \le i \le k+1$ and $n - (k+1)$ eigenvalues of $\gamma^{-1}$. After a new quasi-Newton pair is computed, it is possible to update the QR factorization (for details, see [10]).

6. Numerical experiments
In this section, we demonstrate the accuracy of the proposed method for solving quasi-Newton equations. We solve the following linear system:

(34) $B_{k+1} p_{k+1} = -g_{k+1}$,

where $B_{k+1}$ is a member of the restricted Broyden class of updates or an SR1 matrix. For members of the restricted Broyden class of updates, we considered three values of $\phi$ (0.00, 0.50, and 0.99) and compare the following methods:

(1) "Two-loop recursion" for the BFGS case ($\phi = 0.0$) (Algorithm 3)
(2) Recursive SMW approach (Algorithm 4)
(3) Recursion to compute products with $B_{k+1}^{-1} = H_{k+1}$ (Algorithm 6)
(4) Compact inverses formulation (Algorithm 1)

For SR1 matrices, we compare the following methods:

(1) The self-duality method (Algorithm 8)
(2) Compact inverses formulation (Algorithm 2)

We consider problem sizes $n$ ranging from 10,000 to 1,000,000 and report the relative error

$\frac{\| B_{k+1} p_{k+1} - (-g_{k+1}) \|}{\| g_{k+1} \|}$,

and the time (in seconds) needed to compute each solution. The number of limited-memory updates was set to five.

We simulate the first five iterations of an unconstrained line-search method. We generate the initial point $x_0$ and the gradients $g_j = \nabla f(x_j)$, for $0 \le j \le 5$, randomly and compute the subsequent iterates

$x_{j+1} = x_j - \alpha_j H_j g_j$, for $0 \le j \le 4$,
We ran each algorithm ten times for each value of φ (0.00, 0.50, and 0.99) and each problem size n (between 10,000 and 1,000,000); Tables 3–6 report the relative errors and Figure 1 reports the computational times. In the L-BFGS case (φ = 0), Algorithm 1 is comparable to the two-loop recursion (Algorithm 3), which is specific to the L-BFGS update. For the SR1 update, the compact inverse formulation outperformed the algorithm based on self-duality (Algorithm 8).
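To make the measured quantity concrete, the following small-scale sketch forms a BFGS matrix explicitly (feasible only for small n; dense linear algebra stands in for the limited-memory algorithms), solves (34) directly, and reports the relative error defined above. All names, sizes, and the synthetic "Hessian" are illustrative assumptions, not the authors' test setup.

```python
import numpy as np

# Small-scale analogue of the experiment: build a BFGS matrix B explicitly,
# solve B p = -g directly, and report ||B p + g|| / ||g||.
rng = np.random.default_rng(0)
n, k = 50, 5                                  # small n stands in for 10^4..10^6

A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = A @ A.T                                   # SPD matrix playing the "Hessian"
B = np.eye(n)                                 # B_0 = I
for _ in range(k):                            # apply k BFGS updates (phi = 0)
    s = rng.standard_normal(n)
    y = A @ s                                 # guarantees curvature s^T y > 0
    Bs = B @ s
    B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

g = rng.standard_normal(n)                    # simulated gradient
p = np.linalg.solve(B, -g)                    # direct solve stands in for Alg. 1
rel_err = np.linalg.norm(B @ p + g) / np.linalg.norm(g)
print(rel_err)
```

For the large-n regime reported in the tables, the dense solve above is replaced by the limited-memory algorithms compared in this section.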
Table 3. Relative error for Broyden class matrices with φ = 0.00 (BFGS).
n | Two-Loop Recursion (Alg. 3) | Recursive SMW (Alg. 4) | Recursive H_{k+1} (Alg. 6) | Compact Inverses (Alg. 1)
Table 4. Relative error for Broyden class matrices with φ = 0.50.
n | Recursive SMW (Alg. 4) | Recursive H_{k+1} (Alg. 6) | Compact Inverses (Alg. 1)
Table 5. Relative error for Broyden class matrices with φ = 0.99.
n | Recursive SMW (Alg. 4) | Recursive H_{k+1} (Alg. 6) | Compact Inverses (Alg. 1)
Table 6. Relative error for SR1 matrices.
n | Self-Duality (Alg. 8) | Compact Inverses (Alg. 2)

7. Concluding Remarks
We derived the compact formulation for members of the restricted Broyden class and the SR1 update. With this compact formulation, we showed how to solve linear systems defined by limited-memory quasi-Newton matrices. Numerical results suggest that this proposed approach is efficient and accurate. This approach has two distinct advantages over existing procedures for solving limited-memory quasi-Newton systems. First, there is a natural way to use the compact formulation for the inverse to obtain the condition number of the linear system. Second, when a new quasi-Newton pair is computed, computational savings can be achieved by simple updates to the matrix factors in the compact formulation. Future work includes integrating this linear solver inside line-search and trust-region methods for large-scale optimization.

8. Acknowledgements
The authors would like to thank Tammy Kolda for initial conversations on thissubject.
Appendix A. Two-loop recursion
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update is obtained by setting φ = 0 in (2). In this case, B_{k+1} simplifies to

(35) B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k).

The inverse of the BFGS matrix B_{k+1} is given by

B_{k+1}^{-1} = (I − (s_k y_k^T)/(y_k^T s_k)) B_k^{-1} (I − (y_k s_k^T)/(y_k^T s_k)) + (s_k s_k^T)/(y_k^T s_k),

which can be written recursively as

(36) B_{k+1}^{-1} = (V_k^T ··· V_0^T) B_0^{-1} (V_0 ··· V_k)
      + (1/(y_0^T s_0)) (V_k^T ··· V_1^T) s_0 s_0^T (V_1 ··· V_k)
      + (1/(y_1^T s_1)) (V_k^T ··· V_2^T) s_1 s_1^T (V_2 ··· V_k)
      + ··· + (1/(y_k^T s_k)) s_k s_k^T,

where V_i = I − (1/(y_i^T s_i)) y_i s_i^T (see, e.g., [18, pp. 177–178]). Solving (1) can be done efficiently using the well-known two-loop recursion [17]. Assuming that the inner products y_i^T s_i can be precomputed and B_0 is a scalar multiple of the identity matrix, the total operation count for Algorithm 3 is 4nk + 3k + n + 2k(2n − 1) flops, which includes 2k vector inner products.
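A minimal Python sketch of the two-loop recursion (Algorithm 3), assuming B_0 = γI and the k pairs stored in chronological order; the comparison against an explicitly formed BFGS matrix is for illustration only and is feasible only for small n.

```python
import numpy as np

def two_loop(z, S, Y, gamma):
    """Two-loop recursion: r = B^{-1} z for a BFGS matrix built from the
    pairs (s_i, y_i) with B_0 = gamma * I."""
    k = len(S)
    q = z.copy()
    alpha = np.empty(k)
    for i in range(k - 1, -1, -1):          # first loop: newest pair first
        alpha[i] = (S[i] @ q) / (Y[i] @ S[i])
        q -= alpha[i] * Y[i]
    r = q / gamma                           # r = B_0^{-1} q
    for i in range(k):                      # second loop: oldest pair first
        beta = (Y[i] @ r) / (Y[i] @ S[i])
        r += (alpha[i] - beta) * S[i]
    return r

# Check against an explicitly formed BFGS matrix (small n only).
rng = np.random.default_rng(1)
n, k, gamma = 30, 5, 2.0
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = A @ A.T                                 # SPD, so s^T y = s^T A s > 0
S = [rng.standard_normal(n) for _ in range(k)]
Y = [A @ s for s in S]
B = gamma * np.eye(n)
for s, y in zip(S, Y):
    Bs = B @ s
    B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
z = rng.standard_normal(n)
err = np.linalg.norm(two_loop(z, S, Y, gamma) - np.linalg.solve(B, z))
print(err)
```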
[Figure 1 appears here: semi-log plots of time (seconds) versus matrix size n for panels (a) φ = 0 (BFGS), (b) φ = 0.50, (c) φ = 0.99, and (d) SR1.]

Figure 1. Semi-log plots of the computational times (in seconds) for ten runs of each of the algorithms discussed in this paper. In (a), (b), and (c), the system matrix is a member of the restricted Broyden class of updates with φ = 0, 0.50, and 0.99, respectively. In (d), the system matrix is an SR1 matrix. The proposed method using compact inverses generally outperforms the other methods. Note that when φ = 0, our proposed method is competitive with the "two-loop recursion", which is specific to the BFGS update.

q ← z;
for i = k − 1, . . . , 0
    α_i ← (s_i^T q)/(y_i^T s_i);
    q ← q − α_i y_i;
end
r ← B_0^{-1} q;
for i = 0, . . . , k − 1
    β ← (y_i^T r)/(y_i^T s_i);
    r ← r + (α_i − β) s_i;
end
ALGORITHM 3: Two-loop recursion to compute r = B_k^{-1} z when B_k is a BFGS matrix

Appendix B. SMW recursion
In the symmetric case, the SMW formula for a rank-one change is given by

(37) (A + α u u^T)^{-1} = A^{-1} − (α/(1 + α u^T A^{-1} u)) A^{-1} u u^T A^{-1}.

We now show how (37) can be used to compute the inverse of B_{k+1}. First, for 0 ≤ j ≤ k, define

(38) α_{3j} = (y_j^T s_j)^{-1},          u_{3j} = y_j,
     α_{3j+1} = φ (s_j^T B_j s_j),       u_{3j+1} = y_j/(y_j^T s_j) − (B_j s_j)/(s_j^T B_j s_j),
     α_{3j+2} = −(s_j^T B_j s_j)^{-1},   u_{3j+2} = B_j s_j.

Letting C_0 = B_0 and U_i = α_i u_i u_i^T for 0 ≤ i ≤ 3k + 2, we define the matrices

(39) C_1 = C_0 + U_0,  C_2 = C_1 + U_1,  . . . ,  C_{3(k+1)} = C_{3(k+1)−1} + U_{3(k+1)−1}.

By construction, C_{3(k+1)} = B_{k+1}. It can be shown that each C_i is positive definite, and so u_i^T C_i^{-1} u_i > 0 for each i. For the restricted Broyden class, α_{3j} and α_{3j+1}, j = 0, . . . , k, are both strictly positive. Thus, the denominator in (37) is strictly positive for the rank-one changes associated with the indices 3j and 3j + 1, j = 0, . . . , k, and thus the SMW formula is well defined for these rank-one updates. Since C_{3j} = B_j and B_j is positive definite, the rank-one update associated with 3j + 2, j = 0, . . . , k, must result in a positive-definite matrix; in other words, (37) is well defined. Thus, solving the system B_{k+1} r = z is equivalent to computing r = C_{3(k+1)}^{-1} z. Applying the SMW formula (37) to obtain the inverse of C_{i+1} from C_i^{-1}, we obtain

(40) C_{i+1}^{-1} z = C_i^{-1} z − (α_i/(1 + α_i u_i^T C_i^{-1} u_i)) C_i^{-1} u_i u_i^T C_i^{-1} z, for 0 ≤ i < 3(k + 1).

Recursively applying (40), we obtain

(41) C_{i+1}^{-1} z = C_0^{-1} z − Σ_{j=0}^{i} (α_j/(1 + α_j u_j^T C_j^{-1} u_j)) C_j^{-1} u_j u_j^T C_j^{-1} z,

and, more importantly,

(42) C_{3(k+1)}^{-1} z = C_0^{-1} z − Σ_{j=0}^{3(k+1)−1} (α_j/(1 + α_j u_j^T C_j^{-1} u_j)) C_j^{-1} u_j u_j^T C_j^{-1} z.

Computing (42) can be simplified by defining the following two quantities:

(43) τ_j = α_j/(1 + α_j u_j^T C_j^{-1} u_j)  and  p_j = C_j^{-1} u_j.

Substituting (43) into (42) yields

(44) C_{3(k+1)}^{-1} z = C_0^{-1} z − Σ_{j=0}^{3(k+1)−1} τ_j (p_j^T z) p_j,

where τ_j = α_j (1 + α_j p_j^T u_j)^{-1}. The advantage of this representation is that computing C_{3(k+1)}^{-1} z requires only vector inner products. Moreover, the vectors p_j can be computed by evaluating (41) at z = u_j with i = j − 1; that is,

p_j = C_j^{-1} u_j = C_0^{-1} u_j − Σ_{i=0}^{j−1} τ_i (p_i^T u_j) p_i.

In Algorithm 4, we present the recursion to solve the linear system B_{k+1} r = z using the SMW formula (see related methods in [9, 15]). We assume B_0 is an easily invertible initial matrix.

r ← C_0^{-1} z;
for j = 0, . . . , 3(k + 1) − 1
    if mod(j, 3) = 0
        i ← j/3; u ← y_i; α ← (y_i^T s_i)^{-1};
    else if mod(j, 3) = 1
        i ← (j − 1)/3; u ← y_i/(y_i^T s_i) − (B_i s_i)/(s_i^T B_i s_i); α ← φ (s_i^T B_i s_i);
    else
        i ← (j − 2)/3; u ← B_i s_i; α ← −(s_i^T B_i s_i)^{-1};
    end
    p_j ← C_0^{-1} u;
    for l = 0, . . . , j − 1
        p_j ← p_j − τ_l (p_l^T u) p_l;
    end
    τ_j ← α/(1 + α p_j^T u);
    r ← r − τ_j (p_j^T z) p_j;
end
ALGORITHM 4: Computing r = B_{k+1}^{-1} z

At each iteration, Algorithm 4 computes α and u, which are defined in (38) and require matrix-vector products involving the matrices B_i for 0 ≤ i ≤ k. The main difficulty in computing α and u is the computation of matrix-vector products with the matrices B_i for each i. Note that if we are able to form u_{3j+2} = B_j s_j, then we are able to compute all other terms that use B_j s_j in (38). In what follows, we show how to compute u_{3j+2} without storing B_i for 0 ≤ i ≤ k. This idea is based on [18, Procedure 7.6]. We begin by writing C_{3(k+1)} in (39) as follows:

C_{3(k+1)} = B_0 + Σ_{j=0}^{k} [ α_{3j} u_{3j} u_{3j}^T + α_{3j+1} u_{3j+1} u_{3j+1}^T + α_{3j+2} u_{3j+2} u_{3j+2}^T ],

where the α_i and u_i are defined by (38). Since B_j = C_{3j}, then

u_{3j+2} = B_j s_j = ( B_0 + Σ_{i=0}^{j−1} [ α_{3i} u_{3i} u_{3i}^T + α_{3i+1} u_{3i+1} u_{3i+1}^T + α_{3i+2} u_{3i+2} u_{3i+2}^T ] ) s_j
         = B_0 s_j + Σ_{i=0}^{j−1} [ α_{3i} (u_{3i}^T s_j) u_{3i} + α_{3i+1} (u_{3i+1}^T s_j) u_{3i+1} + α_{3i+2} (u_{3i+2}^T s_j) u_{3i+2} ].

All the terms in the above summation involve only vectors that have been previously computed. Having computed u_{3j+2}, it is then possible to compute α_{3j+1}, α_{3j+2}, and u_{3j+1}. (The other terms α_{3j} and u_{3j} do not depend on B_j s_j.) The following algorithm computes the terms α and u that are used in Algorithm 4.

for j = 0, . . . , k
    u_{3j} ← y_j; α_{3j} ← 1/(s_j^T y_j);
    u_{3j+2} ← B_0 s_j + Σ_{i=0}^{j−1} [ α_{3i} (u_{3i}^T s_j) u_{3i} + α_{3i+1} (u_{3i+1}^T s_j) u_{3i+1} + α_{3i+2} (u_{3i+2}^T s_j) u_{3i+2} ];
    α_{3j+2} ← −1/(s_j^T u_{3j+2});
    u_{3j+1} ← α_{3j} y_j + α_{3j+2} u_{3j+2};
    α_{3j+1} ← −φ/α_{3j+2};
end
ALGORITHM 5: Unrolling the limited-memory Broyden convex class formula

Assuming s_j^T y_j is precomputed at each step, Algorithm 5 requires (k + 1)(3k + 2)/2 vector inner products and (5n + 3)(k + 2)(k + 1)/2 additional flops. With the α's and the u's computed in Algorithm 5, the rest of Algorithm 4 requires 3(k + 1)(3(k + 1) + 5)/2 vector inner products and 3(k + 1)(3(k + 1) + 1)(2n + 1)/2 + 3(k + 1)(3n + 4) additional flops. Thus, Algorithm 4 requires 6(k + 1)^2 + 7(k + 1) vector inner products in total, and its total flop count is (1/2)(k + 1)(23kn + 52n + 12k + 42) + (2n − 1)(6(k + 1)^2 + 7(k + 1)).

Appendix C. Computing products with H_{k+1} recursively

The inverse of B_{k+1} in (2) is given by

H_{k+1} = H_k + (s_k s_k^T)/(s_k^T y_k) − (H_k y_k y_k^T H_k)/(y_k^T H_k y_k) + Φ_k (y_k^T H_k y_k) v_k v_k^T,

where

v_k = s_k/(y_k^T s_k) − (H_k y_k)/(y_k^T H_k y_k),
Φ_k = (1 − φ)(y_k^T s_k)^2 / ((1 − φ)(y_k^T s_k)^2 + φ (y_k^T H_k y_k)(s_k^T B_k s_k)),

and H_k = B_k^{-1}. Thus, the solution to B_{k+1} r = z can be obtained from

B_{k+1}^{-1} z = ( H_0 + Σ_{i=0}^{k} [ (s_i s_i^T)/(s_i^T y_i) − (H_i y_i y_i^T H_i)/(y_i^T H_i y_i) + Φ_i (y_i^T H_i y_i) v_i v_i^T ] ) z
             = H_0 z + Σ_{i=0}^{k} [ ((s_i^T z)/(s_i^T y_i)) s_i − ((y_i^T H_i z)/(y_i^T H_i y_i)) H_i y_i + Φ_i (y_i^T H_i y_i)(v_i^T z) v_i ].

In order to avoid storing H_0, . . . , H_k for matrix-vector products, we make use of the following variables:

(45) α_{3i} = (s_i^T y_i)^{-1},          u_{3i} = s_i,
     α_{3i+1} = −(y_i^T H_i y_i)^{-1},   u_{3i+1} = H_i y_i,
     α_{3i+2} = Φ_i (y_i^T H_i y_i),     u_{3i+2} = s_i/(y_i^T s_i) − (H_i y_i)/(y_i^T H_i y_i).

With these definitions, we have that

(46) B_{k+1}^{-1} z = H_0 z + Σ_{i=0}^{k} [ α_{3i} (u_{3i}^T z) u_{3i} + α_{3i+1} (u_{3i+1}^T z) u_{3i+1} + α_{3i+2} (u_{3i+2}^T z) u_{3i+2} ],

and, thus, a recursion can be used for matrix-vector products with H_0, . . . , H_k. Algorithm 6 details how to solve for r in (1) using the expression for B_{k+1}^{-1} in (46) without explicitly storing H_0, . . . , H_k.

r ← H_0 z;
for j = 0, . . . , 3(k + 1) − 1
    if mod(j, 3) = 0
        i ← j/3; u ← s_i; α ← (y_i^T s_i)^{-1};
    else if mod(j, 3) = 1
        i ← (j − 1)/3; u ← H_i y_i; α ← −(y_i^T H_i y_i)^{-1};
    else
        i ← (j − 2)/3; u ← s_i/(y_i^T s_i) − (H_i y_i)/(y_i^T H_i y_i);
        Φ ← (1 − φ)(y_i^T s_i)^2 / ((1 − φ)(y_i^T s_i)^2 + φ (y_i^T H_i y_i)(s_i^T B_i s_i));
        α ← Φ (y_i^T H_i y_i);
    end
    r ← r + α (u^T z) u;
end
ALGORITHM 6: Computing r = H_{k+1} z

The following algorithm computes products involving the matrices H_i for 0 ≤ i ≤ k for use in Algorithm 6. The algorithm avoids storing any matrices, and matrix-vector products are computed recursively. The derivation of this algorithm is similar to that for Algorithm 5.

for j = 0, . . . , k
    û_{3j} ← s_j; α̂_{3j} ← 1/(s_j^T y_j);
    û_{3j+1} ← H_0 y_j + Σ_{i=0}^{j−1} [ α̂_{3i} (û_{3i}^T y_j) û_{3i} + α̂_{3i+1} (û_{3i+1}^T y_j) û_{3i+1} + α̂_{3i+2} (û_{3i+2}^T y_j) û_{3i+2} ];
    α̂_{3j+1} ← −1/(y_j^T û_{3j+1});
    û_{3j+2} ← α̂_{3j} s_j + α̂_{3j+1} û_{3j+1};
    Compute α_{3j+2} using Algorithm 5;
    Φ_j ← (1 − φ)/((1 − φ) + φ α̂_{3j}^2/(α̂_{3j+1} α_{3j+2}));
    α̂_{3j+2} ← −Φ_j/α̂_{3j+1};
end
ALGORITHM 7: Unrolling the limited-memory Broyden convex class formula

Assuming s_j^T y_j is precomputed at each step, Algorithm 7 requires (3k + 2)(k + 1) vector inner products and (5n + 3)(k + 1)k + (10n + 14)(k + 1) additional flops. With the α's and the u's computed in Algorithm 7, the rest of Algorithm 6 requires 3(k + 1) vector inner products and 3(k + 1)(2n + 1) + n additional flops. Thus, the total flop count for Algorithm 6 is (5n + 3)(k + 1)k + (k + 1)(16n + 17) + n + (2n − 1)(3k + 5)(k + 1), which includes (3k + 5)(k + 1) vector inner products.

Appendix D. Relationships between updates
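As an illustration of the recursion (43)-(44), the following sketch maintains the pairs (τ_j, p_j) and applies the inverse using only inner products. It assumes C_0 = γI and generic positive weights α_j (so every intermediate matrix stays positive definite); the rank-one terms here are generic, not the specific Broyden-class terms in (38).

```python
import numpy as np

# Recursion (43)-(44): given rank-one terms (alpha_j, u_j) added to C_0 = gamma*I,
# precompute p_j = C_j^{-1} u_j and tau_j, then apply C^{-1} z with inner products.
rng = np.random.default_rng(2)
n, m, gamma = 40, 6, 1.5
alphas = rng.uniform(0.5, 2.0, m)            # positive -> every C_j stays SPD
us = [rng.standard_normal(n) for _ in range(m)]

ps, taus = [], []
for a, u in zip(alphas, us):
    p = u / gamma                            # p_j = C_0^{-1} u_j ...
    for t, q in zip(taus, ps):               # ... corrected by earlier terms, eq. (41)
        p -= t * (q @ u) * q
    tau = a / (1.0 + a * (p @ u))            # tau_j = alpha_j / (1 + alpha_j p_j^T u_j)
    ps.append(p)
    taus.append(tau)

def apply_inverse(z):
    r = z / gamma                            # C_0^{-1} z
    for t, p in zip(taus, ps):
        r -= t * (p @ z) * p                 # equation (44)
    return r

# Check against a direct solve with the explicitly accumulated matrix.
C = gamma * np.eye(n) + sum(a * np.outer(u, u) for a, u in zip(alphas, us))
z = rng.standard_normal(n)
err = np.linalg.norm(apply_inverse(z) - np.linalg.solve(C, z))
print(err)
```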
We note that in the case of the BFGS update (φ = 0), the compact representation of the BFGS matrix is consistent with its known compact representation derived in [5]. In particular, M̃_k in (11) simplifies to

M̃_k = (−M_k^{-1} − Ψ_k^T H_0 Ψ_k)^{-1}
    = [ 0, −(R_k + D_k); −(R_k + D_k)^T, −D_k − Y_k^T H_0 Y_k ]^{-1}
    = [ R̄_k^{-T}(D_k + Y_k^T H_0 Y_k) R̄_k^{-1}, −R̄_k^{-T}; −R̄_k^{-1}, 0 ],

where R̄_k = R_k + D_k. Thus, the compact representation for the inverse of a BFGS matrix is given by

(47) H_{k+1} = H_0 + [ S_k  H_0 Y_k ] [ R̄_k^{-T}(D_k + Y_k^T H_0 Y_k) R̄_k^{-1}, −R̄_k^{-T}; −R̄_k^{-1}, 0 ] [ S_k^T; Y_k^T H_0 ],

which is equivalent to that found in [5, Equation (2.6)].

When φ = 1, then B_{k+1} in (2) is known as the Davidon-Fletcher-Powell (DFP) update, which preceded the BFGS update. The BFGS and DFP formulas are known to be duals of each other, meaning one update can be obtained from the other by interchanging s_k with y_k and B_k with H_k. Here, we demonstrate explicitly that the compact representations of the BFGS and DFP updates are also duals of each other. In addition, we show that the compact representation of the SR1 matrix is self-dual.

Consider the compact formulation (47) for the inverse of a BFGS matrix (i.e., φ = 0). This is equivalent to

H_{k+1} = H_0 + [ H_0 Y_k  S_k ] [ 0, −R̄_k^{-1}; −R̄_k^{-T}, R̄_k^{-T}(D_k + Y_k^T H_0 Y_k) R̄_k^{-1} ] [ Y_k^T H_0; S_k^T ]
        = H_0 − [ H_0 Y_k  S_k ] [ Y_k^T H_0 Y_k + D_k, R̄_k^T; R̄_k, 0 ]^{-1} [ Y_k^T H_0; S_k^T ].

Now we consider interchanging B_0 with H_0 and Y_k with S_k. Notice that if S_k and Y_k are interchanged, then the upper triangular part of S_k^T Y_k corresponds to the lower triangular part of Y_k^T S_k, implying that L_k and R_k must also be interchanged. Putting this all together yields

B_{k+1} = B_0 − [ B_0 S_k  Y_k ] [ S_k^T B_0 S_k + D_k, (L_k + D_k)^T; L_k + D_k, 0 ]^{-1} [ S_k^T B_0; Y_k^T ],

which is the compact formulation for a DFP matrix (see [8, Theorem 1]). In other words, the compact representations of BFGS and DFP are complementary updates of each other.

For the DFP update, Λ_k = −D_k in (7) since φ = 1. Thus, the compact formulation for the inverse of a DFP matrix is given by

H_{k+1} = H_0 + [ S_k  H_0 Y_k ] [ D_k, −R_k; −R_k^T, −Y_k^T H_0 Y_k ]^{-1} [ S_k^T; Y_k^T H_0 ]
        = H_0 − [ H_0 Y_k  S_k ] [ Y_k^T H_0 Y_k, R_k^T; R_k, −D_k ]^{-1} [ Y_k^T H_0; S_k^T ].

Interchanging H_0 with B_0 and S_k with Y_k (and R_k with L_k) yields

B_{k+1} = B_0 − [ B_0 S_k  Y_k ] [ S_k^T B_0 S_k, L_k^T; L_k, −D_k ]^{-1} [ S_k^T B_0; Y_k^T ],

which is the compact formulation of the BFGS matrix (see Eq. (2.17) in [5]).

Finally, it is worth noting that the compact formulation for the inverse of an SR1 matrix is self-dual in the sense of Section 5.2. That is, replacing H_0 with B_0 and S_k with Y_k (and, thus, R_k with L_k) in (32) yields the compact formulation for the SR1 matrix given by (31).
Appendix E. Solves with SR1 matrices.
The SR1 update is remarkable in that it is self-dual: initializing with B_0^{-1} instead of B_0, replacing s_k with y_k, and replacing y_k with s_k for all k in (30) results in B_{k+1}^{-1} (see, e.g., [18]). Thus, r = B_{k+1}^{-1} z can be computed using recursion (see Algorithm 8). The first loop of Algorithm 8 computes p_i = s_i − H_i y_i for i = 0, . . . , k; the final line of Algorithm 8 computes r = B_{k+1}^{-1} z = B_0^{-1} z + Σ_{i=0}^{k} ((p_i^T z)/(p_i^T y_i)) p_i.

for i = 0, . . . , k
    p_i ← s_i − H_0 y_i;
    for j = 0, . . . , i − 1
        p_i ← p_i − ((p_j^T y_i)/(p_j^T y_j)) p_j;
    end
end
r ← H_0 z + Σ_{i=0}^{k} ((p_i^T z)/(p_i^T y_i)) p_i;
ALGORITHM 8: Computing r = B_{k+1}^{-1} z when B_{k+1} is an SR1 matrix

Thus, in the case of SR1 updates, solving linear systems with SR1 matrices can be performed using vector inner products; the cost of solving a linear system with an SR1 matrix is the same as the cost of computing products with an SR1 matrix, which can be done with

Σ_{i=0}^{k} i + k + 1 = (k + 2)(k + 1)/2

vector inner products and

Σ_{i=0}^{k} [2n + (2n + 1)i] + 2n + (k + 1)(1 + n) = ((k + 1)/2)(k(2n + 1) + 2(3n + 1)) + 2n

additional flops. It should be noted that, unlike members of the restricted Broyden class, SR1 matrices can be indefinite and, in particular, numerically singular. Methods found in [3, 10] can be used to compute the eigenvalues (and thus the condition number) of SR1 matrices before performing linear solves to help avoid solving ill-conditioned systems.
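A Python sketch of Algorithm 8, assuming B_0 = γI and randomly generated pairs for which the SR1 denominators happen to be nonzero (the data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, gamma = 30, 4, 1.0
A = np.eye(n) + 0.2 * rng.standard_normal((n, n))
A = A @ A.T                                  # synthetic "Hessian" for the pairs
S = [rng.standard_normal(n) for _ in range(k)]
Y = [A @ s for s in S]
H0 = np.eye(n) / gamma                       # H_0 = B_0^{-1}

# First loop of Algorithm 8: p_i = s_i - H_i y_i, built recursively.
P = []
for i in range(k):
    p = S[i] - H0 @ Y[i]
    for j in range(i):
        p -= ((P[j] @ Y[i]) / (P[j] @ Y[j])) * P[j]
    P.append(p)

def solve_sr1(z):
    """Final line of Algorithm 8: r = H_0 z + sum_i ((p_i^T z)/(p_i^T y_i)) p_i."""
    r = H0 @ z
    for p, y in zip(P, Y):
        r += ((p @ z) / (p @ y)) * p
    return r

# Check against an explicitly formed SR1 matrix (small n only).
B = gamma * np.eye(n)
for s, y in zip(S, Y):
    w = y - B @ s
    B = B + np.outer(w, w) / (w @ s)
z = rng.standard_normal(n)
err = np.linalg.norm(solve_sr1(z) - np.linalg.solve(B, z))
print(err)
```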
Appendix F. Calculations in Lemma 1.
The determinant of (12) is given by

α̃δ̃ − β̃^2 = (1/∆)(s^T B_0 s / λ),

where λ in (16) is defined in (7). Thus, the first entry of (15) is given by

δ̃/(α̃δ̃ − β̃^2) = −φλ.

The off-diagonal elements of the right-hand side of (15) simplify to

β̃/(α̃δ̃ − β̃^2) = s^T y + φλ,

and the last entry of (15) simplifies to

α̃/(α̃δ̃ − β̃^2) = −s^T y − φλ − y^T H_0 y.

References

[1] J. Brust, O. Burdakov, J. B. Erway, R. F. Marcia, and Y.-x. Yuan, Shape-changing L-SR1 trust-region methods. Submitted.
[2] J. Brust, J. B. Erway, and R. F. Marcia, On solving L-SR1 trust-region subproblems.
[3] O. Burdakov, L. Gong, Y.-X. Yuan, and S. Zikrin, On efficiently combining limited memory and trust-region techniques, Tech. Rep. 2013:13, Linköping University, The Institute of Technology, 2013.
[4] R. H. Byrd, D. C. Liu, and J. Nocedal, On the behavior of Broyden's class of quasi-Newton methods, SIAM Journal on Optimization, 2 (1992), pp. 533–557.
[5] R. H. Byrd, J. Nocedal, and R. B. Schnabel, Representations of quasi-Newton matrices and their use in limited-memory methods, Math. Program., 63 (1994), pp. 129–156.
[6] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000.
[7] J. E. Dennis, Jr. and J. J. Moré, Quasi-Newton methods, motivation and theory, SIAM Review, 19 (1977), pp. 46–89.
[8] J. B. Erway, V. Jain, and R. F. Marcia, Shifted limited-memory DFP systems, in 2013 Asilomar Conference on Signals, Systems and Computers, Nov. 2013, pp. 1033–1037.
[9] J. B. Erway and R. F. Marcia, Limited-memory BFGS systems with diagonal updates, Linear Algebra and its Applications, 437 (2012), pp. 333–344.
[10] J. B. Erway and R. F. Marcia, On efficiently computing the eigenvalues of limited-memory quasi-Newton matrices, SIAM Journal on Matrix Analysis and Applications, 36 (2015), pp. 1338–1359.
[11] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, Maryland, third ed., 1996.
[12] I. Griva, S. G. Nash, and A. Sofer, Linear and Nonlinear Programming, Society for Industrial and Applied Mathematics, Philadelphia, 2009.
[13] W. Huang, K. A. Gallivan, and P.-A. Absil, A Broyden class of quasi-Newton methods for Riemannian optimization, SIAM Journal on Optimization, 25 (2015), pp. 1660–1685.
[14] C. Liu and S. A. Vander Wiel, Statistical quasi-Newton: A new look at least change, SIAM Journal on Optimization, 18 (2007), pp. 1266–1285.
[15] K. S. Miller, On the inverse of the sum of matrices, Mathematics Magazine, 54 (1981), pp. 67–72.
[16] J. L. Morales and J. Nocedal, Automatic preconditioning by limited memory quasi-Newton updating, SIAM Journal on Optimization, 10 (2000), pp. 1079–1096.
[17] J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation, 35 (1980), pp. 773–782.
[18] J. Nocedal and S. J. Wright, Numerical Optimization, Springer-Verlag, New York, second ed., 2006.
[19] D. P. O'Leary and A. Yeremin, The linear algebra of block quasi-Newton algorithms, Linear Algebra and Its Applications, 212 (1994), pp. 153–168.
[20] Y. Zhang and R. Tewarson, Quasi-Newton algorithms with updates from the preconvex part of Broyden's family, IMA Journal of Numerical Analysis, 8 (1988), pp. 487–509.
E-mail address: [email protected]
Department of Mathematics, PO Box 7388, Wake Forest University, Winston-Salem, NC 27109
E-mail address: [email protected]