Infinite GMRES for parameterized linear systems
Elias Jarlebring, Siobhán Correnty*

*Department of Mathematics, Royal Institute of Technology (KTH), SeRC, Swedish e-Science Research Center, Lindstedtsvägen 25, SE-100 44 Stockholm, Sweden. Email: {eliasj,correnty}@kth.se

Abstract.
We consider linear parameter-dependent systems $A(\mu)x(\mu) = b$ for many different $\mu$, where $A$ is large and sparse, and depends nonlinearly on $\mu$. Solving such systems individually for each $\mu$ would require great computational effort. In this work we propose to compute a partial parameterization $\tilde{x}(\mu) \approx x(\mu)$, where $\tilde{x}(\mu)$ is cheap to compute for many different $\mu$. Our methods are based on the observation that a companion linearization can be formed where the dependence on $\mu$ is only linear. In particular, we develop methods which combine the well-established Krylov subspace method for linear systems, GMRES, with algorithms for nonlinear eigenvalue problems (NEPs) to generate a basis for the Krylov subspace. Within this new approach, the basis matrix is constructed in three different ways. We show convergence factor bounds obtained similarly to those for the method GMRES for linear systems. More specifically, a bound is obtained based on the magnitude of the parameter $\mu$ and the spectrum of the linear companion matrix, whose eigenvalues correspond to the reciprocals of the solutions to the corresponding NEP. Numerical experiments illustrate the competitiveness of our methods for large-scale problems.

Key words. parameter-dependent linear systems, Krylov methods, shifted linear systems, infinite Arnoldi, low-rank
1. Introduction.
We are interested in the numerical solution of the large complex linear system
\[ A(\mu)x(\mu) = b, \tag{1.1} \]
with $A(\mu) \in \mathbb{C}^{n\times n}$ analytic, $A(0)$ nonsingular and $b \in \mathbb{C}^n$, for many possible values of $\mu \in \mathbb{C}\setminus\{0\}$.

Solving large linear systems of equations in an efficient manner is a topic of great importance in most scientific fields involving computation. Linear systems often arise from the application of the finite element method (FEM) to a partial differential equation (PDE), leading to a large and sparse matrix $A$. Since a finer FEM-discretization leads to higher accuracy and better modelling possibilities, but also to larger systems, improving the efficiency, accuracy and robustness of methods is critical [8].

We consider problems where there is a nonlinear dependence on a parameter, yet we are interested in obtaining solutions for many different $\mu$ simultaneously. Such problems arise naturally in many situations, for example in the study of PDEs with uncertainty; a parameter-dependent system which we will consider is the Helmholtz equation with a parameterized material coefficient. Since the parameter $\mu$ is not known in advance, the problem can be seen as an uncertainty quantification problem in the sense of [19]. Problems of this type can also appear in the context of model reduction, where $\mu$ is usually the Laplace variable. We present an example in this direction in Section 5.1. See [5] and [6] for samples of applications and literature on model order reduction. In contrast to many model reduction techniques, our setting does not lead to a parameterization which is of the same structure as the original problem; rather, we aim to obtain any computationally cheap parameterization of the solution vector. This freedom allows us to completely generalize iterative methods for linear systems.

The new methods we present are based on the well-established GMRES method for linear systems [25]. The main idea of our approach can be summarized as follows.
We define a new function
\[ B(\mu) := \tfrac{1}{\mu} A(0)^{-1}\big(A(0) - A(\mu)\big) \tag{1.2} \]
such that (1.1) can equivalently be written as
\[ \big(\mu B(\mu) - I\big)x(\mu) = -A(0)^{-1}b. \tag{1.3} \]
Let $B_N$ be the truncated Taylor series expansion of $B$:
\[ B_N(\mu) = B_0 + B_1\mu + \tfrac{1}{2!}B_2\mu^2 + \cdots + \tfrac{1}{N!}B_N\mu^N, \tag{1.4} \]
where $B_i = -\tfrac{1}{i+1}A(0)^{-1}A^{(i+1)}(0) \in \mathbb{C}^{n\times n}$. In order to handle the nonlinearity in (1.3), we use a technique called companion linearization, which is very commonly used to analyze polynomial eigenvalue problems (PEPs). As we show in Theorem 2.1, equation (1.3) is equivalent to the following linear system, where $\mu$ only appears linearly:
\[ (\mu B_N - I)v(\mu) = c. \tag{1.5} \]
The constant matrix $B_N$ (distinguished from the matrix function $B_N(\mu)$ by the absence of the argument) and the constant vector $c$ are defined as
\[ B_N := \begin{bmatrix} B_0 & B_1 & \cdots & \cdots & B_N \\ I & & & & \\ & \tfrac12 I & & & \\ & & \ddots & & \\ & & & \tfrac1N I & 0 \end{bmatrix} \in \mathbb{C}^{(N+1)n\times(N+1)n} \tag{1.6} \]
and
\[ c := -e_1 \otimes A(0)^{-1}b \in \mathbb{C}^{(N+1)n}, \tag{1.7} \]
where $\otimes$ denotes the Kronecker product and $e_1$ is the first unit vector. In Section 2 we completely describe this equivalence, analogous to the theory of PEPs, where $B_N$ is called a companion matrix.

If we apply GMRES to the linear system (1.5), we need to build the Krylov subspace
\[ \mathcal{K}_k(B_N, c) := \operatorname{span}\{c, B_N c, \dots, B_N^{k-1}c\}. \tag{1.8} \]
The same matrix $B_N$ appears frequently in the field of nonlinear eigenvalue problems (NEPs), where the same Krylov subspace (1.8) is used to construct numerical methods. The Krylov subspace has a particular structure which has been heavily exploited for NEP methods, e.g., the infinite Arnoldi method [15] and various improvements thereof [3, 12, 13, 14, 20]. We will now show that many of the same techniques can be applied here. In particular, we represent the action of $B(\mu)$ in a finite number of linear algebra operations without truncation error.

To our knowledge, this work contains the first result that exploits the connection between linear systems and NEPs in a way that allows the generalization of iterative methods for linear systems to parameterized linear systems. Although we have focused on the flavors of the infinite Arnoldi method, there are many algorithms for NEPs that may also lead to competitive approaches, e.g., CORK [4] or TOAR [16].

Krylov subspaces are invariant under shifts, a property previously shown to be useful in other works on parameterized linear systems [2, 9, 17, 26, 27]. Our linearization allows us to consider just one Krylov subspace, formed independently of $\mu$ by a linear combination of the power sequence associated with the matrix $B_N$ and the vector $c$ as in (1.8). We reuse the associated basis matrix to compute $\tilde{x}(\mu)$ for all values of $\mu$, solving a least squares problem for every value of $\mu$.

The first new method we propose in this paper considers an efficient way of handling the Krylov basis matrix as in [14], while the second is specifically designed to handle $A(\mu)$ with higher order terms of reduced rank, ultimately allowing for a more efficient construction of the Krylov basis matrix, analogous to the low-rank exploitation in [3]. We provide convergence theory and show that convergence factor bounds can be obtained in a way similar to the linear case. Eigenvalue-based bounds are derived from the solutions of NEPs.

The paper is organized as follows. In the next section, we explain the linearization we use and prove that we can easily recover the solution we seek from this linearization. In Section 3 we present our new methods, infinite GMRES and low-rank infinite GMRES. In Section 4 we show our convergence theory, and in Section 5 we provide numerical examples which illustrate the theory.
2. Companion Linearization.
Companion linearizations have been extensively used for PEPs (see, e.g., [18]) but also for linear systems [9]. We now show how companion linearization can be applied, leading to a matrix structure that is also present in the infinite Arnoldi method, where it is used to dynamically expand the linearization, as further explained in Section 3. Let $B_N(\mu)$ be the truncated Taylor expansion of $B(\mu)$ as in (1.4), and denote
\[ D_{N,N+1} := \begin{bmatrix} 1 & & & 0 \\ & \tfrac12 & & \vdots \\ & & \ddots & \\ & & \tfrac1N & 0 \end{bmatrix} \in \mathbb{R}^{N\times(N+1)}. \tag{2.1} \]
The companion linearization can be explicitly expressed as follows.

Theorem 2.1.
Suppose a given parameter-dependent matrix $A(\mu)$ is as in (1.1). Let $B_N(\mu)$ be the truncated expansion of $B$ as in (1.4). For any $\mu \in \mathbb{C}\setminus\{0\}$, consider the linear system (1.5), with $B_N$ as in (1.6), $c$ as in (1.7) and
\[ v(\mu) := \big[v_0(\mu)^T, \dots, v_N(\mu)^T\big]^T \in \mathbb{C}^{(N+1)n}. \]
Let $A_N(\mu) := \sum_{i=0}^{N+1}\tfrac{1}{i!}A^{(i)}(0)\mu^i$ refer to the Taylor series expansion of $A(\mu)$ truncated after $N+2$ terms. Then, the linear system (1.5) and
\[ A_N(\mu)x_N(\mu) = b \tag{2.2} \]
are equivalent in the following sense:
(a) Solutions to the linear system (1.5) are of the form $v_i(\mu) = \tfrac{\mu^i}{i!}x_N(\mu)$, $i = 0, \dots, N$, where $x_N(\mu) \in \mathbb{C}^n$ satisfies (2.2);
(b) if $x_N(\mu) \in \mathbb{C}^n$ is a solution to (2.2), then $v_i(\mu) := \tfrac{\mu^i}{i!}x_N(\mu)$, $i = 0, \dots, N$, defines a solution to (1.5).
Proof. (a) Let $v(\mu)$ solve (1.5). Block rows $2, \dots, N+1$ give $\mu\tfrac{1}{i}v_{i-1}(\mu) = v_i(\mu)$, so that $v_i(\mu) = \tfrac{\mu^i}{i!}v_0(\mu) =: \tfrac{\mu^i}{i!}x_N(\mu)$. The first block row of the matrix-vector product then reads
\[ -A(0)^{-1}b = \mu B_0 x_N(\mu) + \mu B_1 \mu x_N(\mu) + \cdots + \mu B_N \tfrac{\mu^N}{N!} x_N(\mu) - x_N(\mu) = \big(\mu B_N(\mu) - I\big)x_N(\mu) = \Big(\mu\,\tfrac{1}{\mu}A(0)^{-1}\big(A(0) - A_N(\mu)\big) - I\Big)x_N(\mu) = -A(0)^{-1}A_N(\mu)x_N(\mu). \]
Therefore, (2.2) holds.

(b) From (2.2) we directly conclude that
\[ \big(\mu B_N - I\big)\begin{bmatrix} x_N(\mu) \\ \mu x_N(\mu) \\ \vdots \\ \tfrac{\mu^N}{N!}x_N(\mu) \end{bmatrix} = \begin{bmatrix} -A(0)^{-1}b \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{2.3} \]
Block rows $2, \dots, N+1$ of (2.3) can be expressed using
\[ S := \begin{bmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \end{bmatrix} \in \mathbb{R}^{N\times(N+1)}, \qquad \bar{\mu} := \Big[1, \mu, \tfrac{\mu^2}{2!}, \dots, \tfrac{\mu^N}{N!}\Big]^T \in \mathbb{C}^{N+1}; \]
we have
\[ \big((\mu D_{N,N+1} - S)\otimes I\big)\big(\bar{\mu}\otimes x_N(\mu)\big) = \big((\mu D_{N,N+1} - S)\bar{\mu}\big)\otimes x_N(\mu) = 0, \]
since $(\mu D_{N,N+1} - S)\bar{\mu} = 0$, which corresponds to rows $2, \dots, N+1$. To show that the first $n$ equations in (1.5) are satisfied, we now consider the first block row of (2.3). The difference between the left-hand side and the right-hand side gives us
\[ \mu\Big(B_0 + \mu B_1 + \cdots + \tfrac{\mu^N}{N!}B_N\Big)x_N(\mu) - x_N(\mu) + A(0)^{-1}b = \big(\mu B_N(\mu) - I\big)x_N(\mu) + A(0)^{-1}b = A(0)^{-1}\big(b - A_N(\mu)x_N(\mu)\big) = 0. \]
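This equivalence can be verified directly in a few lines of code. The following is a minimal numerical sketch (not from the paper): it builds $B_N$ and $c$ from randomly generated Taylor coefficients, solves (1.5) with a dense solver, and checks property (a) of Theorem 2.1. All sizes and names are illustrative.

```python
# Minimal check of Theorem 2.1: solving the companion system (1.5) reproduces
# the solution of the truncated problem A_N(mu) x = b. Small, dense and random
# matrices are used purely for illustration.
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
n, N, mu = 6, 8, 0.3
A_der = [rng.standard_normal((n, n)) for _ in range(N + 2)]  # A^{(i)}(0), i = 0..N+1
A_der[0] += 10 * np.eye(n)                                   # keep A(0) well conditioned
b = rng.standard_normal(n)

A0inv = np.linalg.inv(A_der[0])
B = [-A0inv @ A_der[i + 1] / (i + 1) for i in range(N + 1)]  # B_i as in (1.4)

BN = np.zeros(((N + 1) * n, (N + 1) * n))                    # companion matrix (1.6)
for i in range(N + 1):
    BN[:n, i * n:(i + 1) * n] = B[i]                         # first block row
for i in range(1, N + 1):
    BN[i * n:(i + 1) * n, (i - 1) * n:i * n] = np.eye(n) / i # subdiagonal (1/i) I

c = np.zeros((N + 1) * n)
c[:n] = -A0inv @ b                                           # c = -e_1 kron A(0)^{-1} b
v = np.linalg.solve(mu * BN - np.eye((N + 1) * n), c)        # solve (1.5)

AN = sum(A_der[i] * mu**i / factorial(i) for i in range(N + 2))
x = np.linalg.solve(AN, b)                                   # solve (2.2)
for i in range(N + 1):                                       # v_i = mu^i/i! x  (Thm 2.1a)
    assert np.allclose(v[i * n:(i + 1) * n], mu**i / factorial(i) * x)
```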
3. Algorithms.

3.1. GMRES for the shifted system.
As a preparation for the algorithm derivation, we consider a shifted parameter-dependent linear system given generally as
\[ (\mu C - I)x(\mu) = b, \tag{3.1} \]
where $C \in \mathbb{C}^{p\times p}$, $\mu \in \mathbb{C}\setminus\{0\}$, $x(\mu) \in \mathbb{C}^p$ and $b \in \mathbb{C}^p$. GMRES for this type of shifted system is derived from the standard GMRES method for solving $Cx = b$ [25], summarized below.

On the $m$-th iteration of GMRES, we form an Arnoldi factorization, consisting of matrices $Q_m \in \mathbb{C}^{p\times m}$, $Q_{m+1} \in \mathbb{C}^{p\times(m+1)}$ and $\underline{H}_m \in \mathbb{C}^{(m+1)\times m}$, that satisfy
\[ CQ_m = Q_{m+1}\underline{H}_m, \]
where $\underline{H}_m$ is an upper Hessenberg matrix and $Q_m = (q_1, \dots, q_m) \in \mathbb{C}^{p\times m}$ is a matrix whose columns form an orthonormal basis for the Krylov subspace of dimension $m$ associated with the matrix $C$ and the vector $b$, defined as
\[ \mathcal{K}_m(C, b) := \operatorname{span}\{b, Cb, \dots, C^{m-1}b\}. \]
In practice, we perform one matrix-vector product, $y = Cq_m$ with $q_1 = b/\|b\|$, and orthogonalize this vector against the columns of $Q_m$ by a Gram-Schmidt process. This new vector is used to form $Q_{m+1}$. The orthogonalization coefficients are stored in $\underline{H}_m$.

Due to the shift-invariance property of Krylov subspaces, we have $\mathcal{K}_m(\mu C - I, b) = \mathcal{K}_m(C, b)$. Thus, we can directly form an Arnoldi factorization for $\mu C - I$:
\[ (\mu C - I)Q_m = Q_{m+1}(\mu\underline{H}_m - \underline{I}_m). \]
This is essentially the relation pointed out in [2, Equation (2.4)]. Therefore, the $m$-th iterate of GMRES for (3.1) is found by solving the shifted least squares problem
\[ x_m(\mu) = Q_m\Big(\operatorname*{argmin}_{z\in\mathbb{C}^m}\big\|(\mu\underline{H}_m - \underline{I}_m)z - e_1\|b\|\big\|\Big), \tag{3.2} \]
where $\underline{I}_m$ is the identity matrix of size $m\times m$ with an extra row of zeros added at the bottom. This least squares problem is equivalent to finding the vector $x_m(\mu) \in \mathcal{K}_m(C, b)$ which minimizes the residual of (3.1). Thus, solving (3.1) for a set $\{\mu_i\}_{i=1}^{j}$ reduces to constructing one Arnoldi factorization for the entire set, followed by a least squares problem for each $\mu_i$.

The overdetermined problem (3.2) is computationally cheap, since it is small in comparison to the size of the original problem. In practice, (3.2) can be solved using Givens rotations, as the matrix is a Hessenberg matrix, as pointed out, e.g., in [25]. We leave out the Givens rotations in our algorithms below, since other parts of the algorithm are computationally dominating, although they could also be used here. The Arnoldi method is efficient only if we can compute the corresponding matrix-vector products efficiently.
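To make the reuse across parameters concrete, the following sketch builds one Arnoldi factorization for a random $C$ and $b$ and then solves only the small least squares problem (3.2) for each value of $\mu$. It is an illustration under simplifying assumptions (dense matrices, no breakdown handling), not the implementation used in the experiments.

```python
# Shifted GMRES: one Arnoldi factorization C Q_m = Q_{m+1} H_m is reused for
# every shift; each mu costs only an (m+1) x m least squares problem (3.2).
import numpy as np

def arnoldi(C, b, m):
    """Arnoldi factorization; assumes no breakdown occurs."""
    p = len(b)
    Q = np.zeros((p, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(m):
        y = C @ Q[:, k]
        for i in range(k + 1):                     # modified Gram-Schmidt
            H[i, k] = Q[:, i].conj() @ y
            y = y - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(y)
        Q[:, k + 1] = y / H[k + 1, k]
    return Q, H

def shifted_solve(Q, H, b, mu):
    """m-th GMRES iterate for (mu*C - I) x = b via the least squares (3.2)."""
    m = H.shape[1]
    Ibar = np.vstack([np.eye(m), np.zeros((1, m))])  # identity plus a zero row
    e1 = np.zeros(m + 1, dtype=complex)
    e1[0] = np.linalg.norm(b)
    z, *_ = np.linalg.lstsq(mu * H - Ibar, e1, rcond=None)
    return Q[:, :m] @ z

rng = np.random.default_rng(1)
p, m = 200, 40
C = rng.standard_normal((p, p)) / np.sqrt(p)
b = rng.standard_normal(p)
Q, H = arnoldi(C, b, m)                            # built once ...
for mu in [0.1, 0.2, 0.5]:                         # ... reused for every shift
    x = shifted_solve(Q, H, b, mu)
    print(mu, np.linalg.norm((mu * C - np.eye(p)) @ x - b))
```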
3.2. Representation of the matrix-vector product. We consider a special representation of this product using the linearization in (1.5). This is completely analogous to the infinite Arnoldi method [15], and the proofs in this section are omitted for brevity. Denote the $i$-th column of a matrix $X$ by $x_i \in \mathbb{C}^n$ and the operation of stacking the columns of a matrix into a vector by $\operatorname{vec}(X) := (x_1^T, x_2^T, \dots, x_{N+1}^T)^T$.

Lemma 3.1. Let $B_N$ be as in (1.6). For any $X \in \mathbb{C}^{n\times(N+1)}$,
\[ B_N\operatorname{vec}(X) = \operatorname{vec}\big(\tilde{x},\ X D_{N,N+1}^T\big), \]
where
\[ \tilde{x} = -A(0)^{-1}\Big(\sum_{i=1}^{N+1}\tfrac{1}{i}A^{(i)}(0)x_i\Big). \]
We note that, since $A(0)^{-1}$ does not change, we can compute the LU factorization of $A(0)$ once, before beginning the algorithm. In this way, the linear system can be solved efficiently in every iteration.

As we mentioned in Section 3.1, the method GMRES forms the Arnoldi factorization starting with the right-hand side vector. In the case of solving (1.5), the right-hand side vector (1.7) has only $n$ non-zero entries, located in the first block. The corresponding matrix-vector product can be represented in a specific way. We consider the following theorem.
Theorem 3.2. Let $B_N$ be as in (1.6). Suppose $X = (\hat{X}, 0, \dots, 0) \in \mathbb{C}^{n\times(N+1)}$ with $\hat{X} \in \mathbb{C}^{n\times k}$, $k < N$. Then,
\[ B_N\operatorname{vec}(X) = \operatorname{vec}\big(\tilde{x},\ \hat{X}D_{k,k},\ 0, \dots, 0\big), \]
where $D_{k,k} := \operatorname{diag}(1, \tfrac12, \dots, \tfrac1k)$ is the leading $k\times k$ block of (2.1) and
\[ \tilde{x} = -A(0)^{-1}\Big(\sum_{i=1}^{k}\tfrac{1}{i}A^{(i)}(0)x_i\Big). \tag{3.3} \]

A direct result of Theorem 3.2 is that the structure of the resulting vector is changed only by an expansion of the number of non-zero entries in the first blocks, and that the number of floating point operations needed to compute $B_N\operatorname{vec}(\hat{X}, 0, \dots, 0)$ is independent of the number of zero elements. Hence, we can take the product of a vector with an infinite tail of zeros with an infinite companion matrix in a finite number of linear algebra operations, thus representing the Taylor series (1.4) without any truncation error.

Let $X_1, \dots, X_m$ be the matrix versions of the vectors after $m$ Arnoldi iterations. We note that the trailing zeros of the new vector $B_N\operatorname{vec}(X_m)$ are preserved after orthogonalization against $X_1, \dots, X_m$. As Arnoldi consists of just these operations, the representation described above is suitable.

3.3. Infinite GMRES. Based on the results presented in Section 3.2, we can directly state a generalization of GMRES, summarized in Algorithm 1. This algorithm consists of an Arnoldi factorization, where the matrix-vector products are performed as in Section 3.2. The resulting vectors are orthogonalized by a Gram-Schmidt process. The approximate solution is the result of a least squares problem, as described in (3.2). The error from this method thus comes entirely from GMRES, i.e., we extend the expansion to infinity without approximation error. We refer to this method as infinite GMRES. The complete algorithm is included in Algorithm 1.
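A compact sketch of the method follows. It is a dense illustration under stated assumptions: the Taylor coefficients are passed as a list A_der with A_der[i] $= A^{(i)}(0)$, a dense inverse stands in for the LU factorization of $A(0)$, and one step of reorthogonalization is always performed. It is a sketch of the technique, not the authors' implementation.

```python
# Infinite GMRES (Algorithm 1), dense sketch. The basis matrix Q grows by n
# rows per iteration, realizing the product with the infinite companion matrix
# exactly, as in Theorem 3.2.
import numpy as np
from math import factorial

def infinite_gmres(A_der, b, j):
    """A_der[i] = A^{(i)}(0), i = 0..j; returns a handle mu -> x_j(mu)."""
    n = len(b)
    A0inv = np.linalg.inv(A_der[0])               # in practice: sparse LU of A(0)
    chat = -A0inv @ b
    beta0 = np.linalg.norm(chat)
    Q = (chat / beta0).reshape(n, 1)
    H = np.zeros((1, 0))
    for m in range(1, j + 1):
        X = Q[:, m - 1].reshape(n, m, order='F')  # vec(X) = q_m
        xt = -A0inv @ sum(A_der[i] @ X[:, i - 1] / i for i in range(1, m + 1))
        Y = np.hstack([xt.reshape(n, 1), X / np.arange(1, m + 1)])  # [xt, X*D]
        y = Y.reshape(-1, order='F')
        Q = np.vstack([Q, np.zeros((n, m))])      # expand Q_m by n zero rows
        h = Q.conj().T @ y                        # Gram-Schmidt ...
        y = y - Q @ h
        g = Q.conj().T @ y                        # ... repeated once
        y = y - Q @ g
        h = h + g
        bm = np.linalg.norm(y)
        Q = np.hstack([Q, (y / bm).reshape(-1, 1)])
        H = np.vstack([np.hstack([H, h.reshape(-1, 1)]),
                       np.hstack([np.zeros((1, H.shape[1])), [[bm]]])])
    def handle(mu):                               # least squares as in (3.2)
        Ibar = np.vstack([np.eye(j), np.zeros((1, j))])
        e1 = np.zeros(j + 1, dtype=complex)
        e1[0] = beta0
        z, *_ = np.linalg.lstsq(mu * H - Ibar, e1, rcond=None)
        return Q[:n, :j] @ z
    return handle

rng = np.random.default_rng(2)
n, j = 40, 25
A_der = [rng.standard_normal((n, n)) / factorial(i) for i in range(j + 1)]
A_der[0] = A_der[0] + 10 * np.eye(n)
b = rng.standard_normal(n)
xfun = infinite_gmres(A_der, b, j)                # one factorization ...
mu = 0.1                                          # ... many evaluations
Amu = sum(A_der[i] * mu**i / factorial(i) for i in range(j + 1))
print(np.linalg.norm(Amu @ xfun(mu) - b))
```

The returned handle evaluates the approximation for any $\mu$ at the cost of a single small least squares problem, which is what makes the approach attractive when many parameter values are needed.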
3.3.1. Low-rank exploitation. The infinite Arnoldi method as presented in [15] has been extended and improved in various ways. We illustrate how the low-rank exploitation [3] can be adapted to this setting. Suppose now that $A(\mu) \in \mathbb{C}^{n\times n}$ is analytic and the higher order terms are of reduced rank, i.e.,
\[ A(\mu) = \sum_{i=0}^{s}\tfrac{1}{i!}A_i\mu^i + \sum_{i=s+1}^{\infty}\tfrac{1}{i!}U_iV^T\mu^i, \tag{3.4} \]
where $A_i = A^{(i)}(0) \in \mathbb{C}^{n\times n}$, $U_i = UF^{(i)}(0) \in \mathbb{C}^{n\times p}$, $U, V \in \mathbb{C}^{n\times p}$, and $F^{(i)}(0) \in \mathbb{C}^{p\times p}$. We can approximate the solution to (1.1) using our linearization as in (1.5), exploiting this structure in particular.
Corollary 3.3.
Suppose the higher order terms of $B(\mu)$ as in (1.2) are of reduced rank, and denote
\[ B_i = -\tfrac{1}{i+1}A_0^{-1}U_{i+1}V^T =: \tilde{U}_iV^T, \qquad i = s, s+1, \dots. \]
If (1.5) holds, then
\[ (\mu\tilde{B}_N - I)\tilde{v}(\mu) = c, \tag{3.5} \]
where $c$ is as in (1.7) with the zero blocks adapted to the dimension of $\tilde{v}(\mu)$,
\[ \tilde{B}_N := \begin{bmatrix} B_0 & \cdots & B_{s-1} & \tilde{U}_s & \cdots & \cdots & \tilde{U}_N \\ I & & & & & & \\ & \ddots & & & & & \\ & \tfrac{1}{s-1}I & & & & & \\ & & \tfrac{1}{s}V^T & & & & \\ & & & \tfrac{1}{s+1}I_p & & & \\ & & & & \ddots & & \\ & & & & & \tfrac{1}{N}I_p & 0 \end{bmatrix} \tag{3.6} \]
and
\[ \tilde{v}(\mu) := \Big[x(\mu)^T,\ \mu x(\mu)^T,\ \cdots,\ \tfrac{\mu^{s-1}}{(s-1)!}x(\mu)^T,\ \tfrac{\mu^s}{s!}\big(V^Tx(\mu)\big)^T,\ \cdots,\ \tfrac{\mu^N}{N!}\big(V^Tx(\mu)\big)^T\Big]^T. \]

Proof. Equation (3.5) is obtained directly by multiplying (1.5) from the left with the block diagonal matrix
\[ \operatorname{blkdiag}\big(\underbrace{I, \dots, I}_{s},\ \underbrace{V^T, \dots, V^T}_{N+1-s}\big). \]
The converse is established by noting that the first block row in (3.5), multiplied by $-A(0)$, is equation (2.2).

In every loop of Algorithm 2, we must compute a new matrix-vector product to be used in the expansion of $Q_m$ and $\underline{H}_m$. In theory, we multiply the nonzero blocks of the vector $q_m$ by the relevant parts of $\tilde{B}_N$ in (3.6) and orthogonalize the resulting vector against $(q_1, \dots, q_m)$. In practice, since we do not store an infinite tail of zeros, we compute the new vector $q_{m+1}$ as follows. Let
\[ y = \operatorname{vec}\big(\tilde{x},\ X_1D_{s-1,s-1},\ \tfrac{1}{s}V^TX_2,\ X_3\tilde{D}_{s+1,m}\big), \tag{3.7} \]
where
\[ \tilde{x} = -A_0^{-1}\Big(\sum_{i=1}^{s}\tfrac{1}{i}A_ix_i + \sum_{i=s+1}^{m}\tfrac{1}{i}U_ix_i\Big), \tag{3.8} \]
with $A_i$ and $U_i$ according to (3.4),
\[ \operatorname{vec}(X_1) = q_m\big(1 : (s-1)n\big), \tag{3.9a} \]
\[ X_2 = q_m\big((s-1)n + 1 : sn\big), \tag{3.9b} \]
\[ \operatorname{vec}(X_3) = q_m\big(sn + 1 : sn + p(m - s)\big), \tag{3.9c} \]
with
\[ x_i = \begin{cases} (X_1)_i, & i = 1, \dots, s-1 \quad \big(X_1 \in \mathbb{C}^{n\times(s-1)}\big) \\ X_2, & i = s \quad \big(X_2 \in \mathbb{C}^{n\times 1}\big) \\ (X_3)_{i-s}, & i = s+1, s+2, \dots \quad \big(X_3 \in \mathbb{C}^{p\times(m-s)}\big), \end{cases} \tag{3.10} \]
\[ \tilde{D}_{i,j} := \operatorname{diag}\big(\tfrac{1}{i}, \tfrac{1}{i+1}, \dots, \tfrac{1}{j}\big) \in \mathbb{R}^{(j-i+1)\times(j-i+1)}, \tag{3.11} \]
and $D_{s-1,s-1} := \operatorname{diag}(1, \tfrac12, \dots, \tfrac{1}{s-1}) \in \mathbb{R}^{(s-1)\times(s-1)}$, the leading square block of (2.1). A (possibly repeated) Gram-Schmidt procedure follows to orthogonalize $y$ against $\{q_1, \dots, q_m\}$. The normalized version of this vector becomes $q_{m+1}$.

We note that, since $A_0$ does not change, we can compute the LU factorization before beginning the algorithm. In this way, the linear system can be solved efficiently in every iteration.

The complete algorithm for low-rank infinite GMRES is included in Algorithm 2. An advantage of this method is that $Q_m$ grows only by $p$ rows after the first $s$ iterations, in contrast to Algorithm 1, where $Q_m$ grows by $n$ rows in each iteration.

3.3.2. Tensor infinite GMRES. In [14], a new method called tensor infinite Arnoldi (TIAR) was proposed. This method provides a factorization equivalent to that of infinite Arnoldi, but improved in terms of memory and computation time. We propose to use this way of generating the Arnoldi factorization within Algorithm 1. Algorithm 1 and infinite Arnoldi generate the same basis matrix for the Krylov subspace. Directly applying [14, Lemma 3.1], we can generate a tensor representation of the basis matrix instead of explicitly forming it, and then convert all necessary operations to work on this factorization; this is the method TIAR. We propose a new method which uses TIAR to form the Arnoldi factorization within Algorithm 1, which we call tensor infinite GMRES. This method is more efficient than Algorithm 1 in practice and leads to an equivalent solution.
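Returning to the low-rank variant, the following sketch illustrates the cost structure of the first-block computation (3.8): blocks with index $i > s$ enter only through the thin factors $U_i = UF^{(i)}(0)$ and coordinates of length $p$, so each tail term costs $O(np)$ rather than $O(n^2)$. All matrices and sizes are random, illustrative stand-ins.

```python
# Cost structure of (3.8): full n x n coefficients act on the first s blocks,
# thin n x p factors act on the (length-p) tail blocks.
import numpy as np
from scipy.sparse import identity
from scipy.sparse.linalg import splu

n, p, s, m = 300, 4, 3, 8
rng = np.random.default_rng(5)
lu = splu(identity(n, format='csc') * 10.0)       # stand-in for the LU of A(0)
A = {i: rng.standard_normal((n, n)) for i in range(1, s + 1)}          # A_i
Umat = rng.standard_normal((n, p))
Ui = {i: Umat @ rng.standard_normal((p, p))       # U_i = U F^{(i)}(0)
      for i in range(s + 1, m + 1)}

xb = {i: rng.standard_normal(n if i <= s else p)  # blocks x_i of q_m, cf. (3.10)
      for i in range(1, m + 1)}
acc = sum(A[i] @ xb[i] / i for i in range(1, s + 1)) \
    + sum(Ui[i] @ xb[i] / i for i in range(s + 1, m + 1))
xtilde = -lu.solve(acc)                           # first block of the new vector
```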
4. Convergence theory.
In the following we present a convergence characterization of Algorithm 1. The tensor variant in Section 3.3.2 is equivalent to Algorithm 1, and convergence results for the low-rank version, Algorithm 2, can be derived analogously.

More precisely, we will now show that the convergence of Algorithm 1 can be described by the magnitude of the parameter $\mu$ and the smallest solutions to an associated NEP. Solutions to the NEP are given generally as $\lambda \in \mathbb{C}$ such that $A(\lambda)v = 0$, where $A(\mu) \in \mathbb{C}^{n\times n}$ and $v \in \mathbb{C}^n\setminus\{0\}$. Essentially, the reciprocal eigenvalues of the NEP play the same role that eigenvalues play for the convergence of standard GMRES.

Algorithm 1: Infinite GMRES
input: $j$, the desired dimension of the Krylov subspace; $A_i$, $i = 0, \dots, j$, Taylor expansion coefficients of $A$; $b \in \mathbb{C}^n$; $\mu \in \mathbb{C}\setminus\{0\}$
output: approximate solution $x_j(\mu)$ of $A(\mu)x(\mu) = b$
1:  $\hat{c} = -A_0^{-1}b \in \mathbb{C}^n$
2:  $Q_1 = \hat{c}/\|\hat{c}\|$
3:  $\underline{H}_0$ = empty matrix
4:  for $m = 1, 2, \dots, j$ do
5:    Let $\operatorname{vec}(X) = q_m \in \mathbb{C}^{nm}$
6:    Compute $\tilde{x}$ according to (3.3)
7:    Compute $y = \operatorname{vec}(\tilde{x}, XD_{m,m})$ according to (2.1)
8:    Expand $Q_m$ by $n$ rows of zeros
9:    Orthogonalize $y$ against $q_1, \dots, q_m$ by a Gram-Schmidt process: $h_m = Q_m^*y$, $y_\perp = y - Q_mh_m$
10:   Possibly repeat Step 9
11:   Compute $\beta_m = \|y_\perp\|$
12:   Let $q_{m+1} = y_\perp/\beta_m$
13:   Expand $Q_m$ into $Q_{m+1} = [Q_m, q_{m+1}]$
14:   Let $\underline{H}_m = \begin{bmatrix}\underline{H}_{m-1} & h_m \\ 0 & \beta_m\end{bmatrix} \in \mathbb{C}^{(m+1)\times m}$
15: end
16: Return function handle (Theorem 2.1) $x_j(\mu): \mu \mapsto Q_j(1:n, :)\operatorname*{argmin}_{v\in\mathbb{C}^j}\|(\mu\underline{H}_j - \underline{I}_j)v - e_1\|\hat{c}\|\|$, where $\underline{I}_j$ is as in (3.2)

GMRES is a Krylov subspace method with finite termination for which the residual vectors satisfy the following for $A \in \mathbb{C}^{n\times n}$ nonsingular and $b \in \mathbb{C}^n$:
\[ \|r_k\| = \min_{x\in\mathcal{K}_k(A,b)}\|Ax - b\| = \min_{p\in P_k}\|p(A)b\|, \tag{4.1} \]
where $P_k = \{\text{polynomials } p \text{ of degree} \le k \text{ with } p(0) = 1\}$.

Various specializations of the min-max bound (4.1) can be found in the literature; see, e.g., the summary in [28]. We specialize a bound based on the $j+1$ largest eigenvalues of $A$, where $j$ is a free parameter, typically the number of outlier eigenvalues. When $j = 0$, this leads to the standard bound involving the largest eigenvalue. We need the following theorem, which is a generalization of Gelfand's theorem.

Theorem 4.1.
Suppose $A \in \mathbb{C}^{n\times n}$. Assume $\gamma_1, \dots, \gamma_j$ are the $j$ largest eigenvalues in modulus and that they are semi-simple. Moreover, assume $|\gamma_i| < \min(|\gamma_1|, \dots, |\gamma_j|)$ for $i = j+1, \dots, n$. Then, for any matrix norm $\|\cdot\|$,
\[ \lim_{k\to\infty}\big\|(A - \gamma_1I)(A - \gamma_2I)\cdots(A - \gamma_jI)A^k\big\|^{1/k} = |\gamma_{j+1}|. \]
For $j = 0$ this reduces to Gelfand's formula $\rho(A) = \lim_{k\to\infty}\|A^k\|^{1/k}$.

Algorithm 2: Low-Rank Infinite GMRES
input: $j$, the desired dimension of the Krylov subspace; $A_i$, $i = 0, \dots, s$, $U_i$, $i = s+1, \dots, j$, and $V^T$ as in (3.4); $b \in \mathbb{C}^n$; $\mu \in \mathbb{C}\setminus\{0\}$
output: approximate solution $x_j(\mu)$ of $A(\mu)x(\mu) = b$
1:  $\hat{c} = -A_0^{-1}b \in \mathbb{C}^n$
2:  $Q_1 = \hat{c}/\|\hat{c}\|$
3:  $\underline{H}_0$ = empty matrix
4:  for $m = 1, 2, \dots, j$ do
5:    Compute $y$ using (3.7)--(3.11)
6:    Expand $Q_m$ by $n$ rows of zeros if $m < s$, $p$ rows of zeros otherwise
7:    Orthogonalize $y$ against $q_1, \dots, q_m$ by a Gram-Schmidt process: $h_m = Q_m^*y$, $y_\perp = y - Q_mh_m$
8:    Possibly repeat Step 7
9:    Compute $\beta_m = \|y_\perp\|$
10:   Let $q_{m+1} = y_\perp/\beta_m$
11:   Expand $Q_m$ into $Q_{m+1} = [Q_m, q_{m+1}]$
12:   Let $\underline{H}_m = \begin{bmatrix}\underline{H}_{m-1} & h_m \\ 0 & \beta_m\end{bmatrix} \in \mathbb{C}^{(m+1)\times m}$
13: end
14: Return function handle (Corollary 3.3) $x_j(\mu): \mu \mapsto Q_j(1:n, :)\operatorname*{argmin}_{v\in\mathbb{C}^j}\|(\mu\underline{H}_j - \underline{I}_j)v - e_1\|\hat{c}\|\|$, where $\underline{I}_j$ is as in (3.2)

Although Gelfand's theorem is well known in many variations, we have not found this particular variant in the literature and therefore provide a proof of Theorem 4.1 in Appendix A.

Let $j \in \mathbb{N}_+$ be as in Theorem 4.1 for $B_N$ as in (1.6). We define
\[ q(z) := \frac{\Big(\prod_{i=1}^{j}(z + 1 - \mu\gamma_i)\Big)(z+1)^{k-j}}{\prod_{i=1}^{j}(1 - \mu\gamma_i)}, \]
which can be viewed as a generalization of the Zarantonello polynomial [24, p. 201]; note that $q(0) = 1$, so $q \in P_k$. With this specific $q$, the bound (4.1) can be simplified if we use the matrix $\mu B_N - I$ as in (1.5) and the right-hand side vector $c$ as in (1.7). We obtain the bound
\[ \|r_k\| \le \|q(\mu B_N - I)c\| \le \frac{\Big\|\Big(\prod_{i=1}^{j}(\mu B_N - \mu\gamma_iI)\Big)(\mu B_N)^{k-j}\Big\|}{\Big|\prod_{i=1}^{j}(1 - \mu\gamma_i)\Big|}\,\|c\|, \]
and therefore also
\[ \|r_k\|^{1/k} \le \frac{\big(|\mu|^k\big)^{1/k}}{\big|\prod_{i=1}^{j}(1 - \mu\gamma_i)\big|^{1/k}}\cdot\Big\|\Big(\prod_{i=1}^{j}(B_N - \gamma_iI)\Big)B_N^k\Big\|^{1/k}\cdot\big\|B_N^{-j}\big\|^{1/k}\cdot\|c\|^{1/k}. \]
Using Theorem 4.1, we have $\|r_k\|^{1/k} \le |\mu||\gamma_{j+1}|$ as $k \to \infty$, and consequently
\[ \|r_k\| \le \big(|\mu||\gamma_{j+1}|\big)^k \tag{4.2} \]
for sufficiently large $k$.

From equation (4.2) we conclude that the convergence factor bound is proportional to both $\mu$ and $\gamma_{j+1}$. The value of $\gamma_{j+1}$ can be further interpreted as follows. The matrix $B_N$ is a companion matrix with eigenvalues $\gamma_i$, and its reciprocal eigenvalues are solutions to a PEP. More specifically, $\gamma_i = 1/\lambda_i$, where the $\lambda_i$ are such that
\[ -A(0)^{-1}A_N(\lambda_i)v_i = 0, \tag{4.3} \]
where $v_i \in \mathbb{C}^n\setminus\{0\}$ and $A_N$ represents a truncated Taylor series expansion of $A$. Under the assumption that the eigenvalues converge (as a function of the truncation), the values $\lambda_1, \dots, \lambda_j$ will converge to the $j$ smallest eigenvalues of the nonlinear eigenvalue problem. Therefore, the convergence of Algorithm 1 for (1.5) is determined by the largest reciprocal eigenvalues of the nonlinear eigenvalue problem, in the same way that the largest eigenvalues of a matrix describe the convergence of GMRES.
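As a practical illustration of the bound (4.2), the predicted convergence factor can be estimated from the spectrum of the companion matrix. The sketch below assumes $B_N$ is small enough for a dense eigenvalue computation; for large problems, a few eigenvalues of largest modulus (e.g., via scipy.sparse.linalg.eigs) suffice.

```python
# Predicted convergence factor |mu| * |gamma_{j+1}| from (4.2): gamma_i are the
# eigenvalues of the companion matrix B_N, i.e., reciprocals of the PEP
# solutions (4.3), sorted by decreasing modulus.
import numpy as np

def predicted_factor(BN, mu, j):
    gamma = np.linalg.eigvals(BN)
    gamma = gamma[np.argsort(-np.abs(gamma))]     # outliers first
    return abs(mu) * abs(gamma[j])                # gamma[j] is gamma_{j+1}
```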
5. Simulations.

5.1. Time-delay system.
As a first illustration we consider the dynamical system with delays described by
\[ \dot{x}(t) = A_0x(t) + A_1x(t - \tau) - bu(t), \tag{5.1a} \]
\[ y(t) = C^Tx(t), \tag{5.1b} \]
where $A_0, A_1 \in \mathbb{C}^{n\times n}$. For simplicity we assume that the entire state is the output, i.e., $C = I \in \mathbb{C}^{n\times n}$. The vector $b \in \mathbb{C}^n$ is the external force, $x(t) \in \mathbb{C}^n$ is the state vector, $u(t)$ is the input, $y(t)$ is the output and $\tau > 0$ is the delay; we take $\tau = 1$. In the context of systems and control this is usually referred to as a time-delay system; see standard references for time-delay systems, e.g., [10, 22, 23].

The frequency domain formulation of (5.1) relates the input and the output as $\hat{y}(\omega) = H(i\omega)\hat{u}(\omega)$, where
\[ H(s) = \big(-sI + A_0 + A_1e^{-s}\big)^{-1}b. \tag{5.2} \]
The matrix $H(s)$ is called the transfer function and can be obtained by applying the Laplace transform to the state equation under the condition $x(0) = 0$. Note that (5.2) is a parameterized linear system of the form (1.1). We use our approach to evaluate the transfer function for many values of $s$.
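For this example, the Taylor coefficients required by Algorithm 1 are available in closed form: with $A(s) = -sI + A_0 + A_1e^{-s}$, we have $A^{(0)}(0) = A_0 + A_1$, $A^{(1)}(0) = -I - A_1$ and $A^{(i)}(0) = (-1)^iA_1$ for $i \ge 2$. A sketch, with the system matrices as inputs:

```python
# Closed-form Taylor coefficients of A(s) = -s*I + A0 + A1*exp(-s), tau = 1,
# to be passed as the coefficients A_i in Algorithm 1.
import numpy as np

def delay_taylor(A0, A1, jmax):
    n = A0.shape[0]
    ders = [A0 + A1, -np.eye(n) - A1]             # A(0) and A'(0)
    ders += [(-1.0) ** i * A1 for i in range(2, jmax + 1)]   # higher derivatives
    return ders
```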
The eigenvalues $\lambda_i$ of the PEP given in (4.3) are the reciprocals of the eigenvalues $\gamma_i$ of the constant matrix $B_N$ coming from the linearization of $A(\mu)$ (see Section 1). Figure 5.1a shows the solutions to this PEP, and Figure 5.1b shows the eigenvalues of $B_N$, plotted in the complex plane along with a circle centered around the origin with radius $|\gamma_{j+1}|$, where $j = 4$ and $\gamma_{j+1}$ is the non-outlier eigenvalue of largest magnitude, as described in Theorem 4.1. A result of Cauchy's residue theorem and the principle of the argument guarantees that within such a compact set we have only a finite number of solutions to the nonlinear eigenvalue problem. Figure 5.2 shows the observed relative residuals together with the predicted convergence factor $|\mu||\gamma_{j+1}|$ with $j = 4$, which illustrates the bound (4.2). In particular, in Figure 5.2b we see the residual is under the predicted bound after a certain number of iterations $k \ll N$.

In Figure 5.3a we show a plot of $\mu$ versus the number of iterations needed to achieve a residual below the prescribed tolerance when evaluating (5.2). We see that the number of iterations required for convergence increases with $\mu$. Figure 5.3b shows the observed and predicted convergence factors when evaluating (5.2) for different choices of $\mu$. The predicted convergence factor was calculated using the bound (4.2) with $j = 4$; for the observed factor we take the iteration $k$ at which the residual reduction was least impressive, representing a worst-case scenario. More precisely, we plot $\mu$ versus $\rho$, where
\[ \rho = \sup_k\frac{\|r_{k+1}\|}{\|r_k\|}. \tag{5.3} \]

Figure 5.1: Solutions $\lambda_i$ to the PEP (4.3) for (5.2) (a) and the spectrum $\gamma_i = 1/\lambda_i$ of the companion matrix $B_N$ (b).
Figure 5.2: Iterations versus norm of the relative residual (observed and predicted) for evaluating (5.2) with Algorithm 1, for two values of $\mu$.
Figure 5.3: (a) $\mu$ versus iterations to achieve a relative residual below the tolerance; (b) $\mu$ versus the observed convergence factor $\rho$ as in (5.3) and the predicted convergence factor as in (4.2).

5.2. Boundary value problem. The low-rank structure described in Section 3.3.1 arises naturally from artificial boundary conditions, similar to how the structure arises for NEPs [3]. We illustrate this with the following boundary value problem:
\[ \Big(-\frac{\partial^2}{\partial x^2} + \big(1 + \mu k(x)\big) + \beta(x)\Big)u(x) = h(x), \qquad u(a) = u(c) = 0, \]
where
\[ k(x) = \begin{cases} 1 + \tfrac{2x}{b}\sin\big(\tfrac{\alpha\pi}{b}x\big), & x \in [0, \tfrac{b}{2}) \\ 1 + \big(2 - \tfrac{2x}{b}\big)\sin\big(\tfrac{\alpha\pi}{b}x\big), & x \in [\tfrac{b}{2}, b) \\ 1, & x \ge b, \end{cases} \qquad \beta(x) = \begin{cases} \sin\big(\tfrac{x-a}{b-a}\pi\big), & x \in [a, b) \\ 0, & x \ge b, \end{cases} \]
\[ h(x) = \begin{cases} e^{-\sigma x}, & x \in [a, b) \\ 0, & x \ge b, \end{cases} \]
with a fixed decay rate $\sigma > 0$. Plots of $k(x)$ and $h(x)$ follow in Figure 5.4a and Figure 5.4b, respectively. We transform the problem to first-order form on the interval $[b, c]$:
\[ \frac{d}{dx}\begin{bmatrix}u(x)\\u'(x)\end{bmatrix} = \begin{bmatrix}0 & 1\\ \big(1 + \mu k(x)\big) + \beta(x) & 0\end{bmatrix}\begin{bmatrix}u(x)\\u'(x)\end{bmatrix} + \begin{bmatrix}0\\-h(x)\end{bmatrix}. \]
Since $k(x) \equiv 1$ and $\beta(x) \equiv h(x) \equiv 0$ on $[b, c]$, we can use the matrix exponential to solve this differential equation on that interval, i.e.,
\[ \begin{bmatrix}u(x)\\u'(x)\end{bmatrix} = \exp\Big((x - b)\begin{bmatrix}0 & 1\\1 + \mu & 0\end{bmatrix}\Big)\begin{bmatrix}u(b)\\u'(b)\end{bmatrix}. \]
The boundary condition at $u(c)$ can be imposed as
\[ 0 = u(c) = \begin{bmatrix}1 & 0\end{bmatrix}\begin{bmatrix}u(c)\\u'(c)\end{bmatrix} = \begin{bmatrix}1 & 0\end{bmatrix}\exp\Big((c - b)\begin{bmatrix}0 & 1\\1 + \mu & 0\end{bmatrix}\Big)\begin{bmatrix}u(b)\\u'(b)\end{bmatrix}. \]
We can evaluate the above matrix exponential using the hyperbolic-function formula for an antidiagonal two-by-two matrix. Thus, we obtain the relation
\[ 0 = g(\mu)u(b) + f(\mu)u'(b), \tag{5.5} \]
where
\[ g(\mu) := \cosh\big((c-b)\sqrt{1+\mu}\big), \qquad f(\mu) := \frac{\sinh\big((c-b)\sqrt{1+\mu}\big)}{\sqrt{1+\mu}}. \]
We note that a solution to the original boundary value problem will also satisfy (5.5). With this in mind, we split the domain into two parts and solve for $u(x)$ on $[a, b]$, i.e.,
\[ \Big(-\frac{\partial^2}{\partial x^2} + \big(1 + \mu k(x)\big) + \beta(x)\Big)u(x) = h(x), \tag{5.6a} \]
\[ u(a) = 0, \tag{5.6b} \]
\[ g(\mu)u(b) + f(\mu)u'(b) = 0, \tag{5.6c} \]
a boundary value problem on the reduced domain with a Robin boundary condition at $x = b$. This is essentially a Dirichlet-to-Neumann map absorbing boundary condition, where the additional parameter appears in the operator, in this case as a scalar coefficient in the boundary condition.

Figure 5.4: Potential function $k(x)$ (a) and right-hand side function $h(x)$ (b) for the boundary value problem (5.6).
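The relation (5.5) can be checked numerically in a few lines; the sketch below uses illustrative endpoint values (not necessarily those of the experiments) and the closed-form solution $u(x) = \sinh((x-c)\sqrt{1+\mu})$ of $u'' = (1+\mu)u$ with $u(c) = 0$.

```python
# Verify (5.5): any solution of u'' = (1 + mu) u with u(c) = 0 satisfies
# g(mu) u(b) + f(mu) u'(b) = 0.
import numpy as np

def g_and_f(mu, b, c):
    s = np.sqrt(1 + mu + 0j)                      # complex sqrt for general mu
    return np.cosh((c - b) * s), np.sinh((c - b) * s) / s

mu, b, c = 0.3, 1.0, 3.0                          # illustrative values
g, f = g_and_f(mu, b, c)
s = np.sqrt(1 + mu + 0j)
ub = np.sinh((b - c) * s)                         # u(b) for u(x) = sinh((x-c)s)
upb = s * np.cosh((b - c) * s)                    # u'(b)
print(abs(g * ub + f * upb))                      # ~ 0 up to rounding
```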
See [7, 11] for literature on artificial boundary conditions. In the examples and plots which follow, fixed values of the endpoints $a < b < c$ and of the parameter $\alpha$ were used. The interval is discretized with $x_k = k\Delta x$, $k = 1, \dots, n$, and $\Delta x = b/n$, so that $x_1 = \Delta x$ and $x_n = b$. To approximate the Robin boundary condition at $x = b$, we use a one-sided second-order difference scheme, i.e.,
\[ 0 = g(\mu)u(b) + \frac{f(\mu)}{2\Delta x}\big(3u(b) - 4u(x_{n-1}) + u(x_{n-2})\big) + \mathcal{O}(\Delta x^2). \]
Thus, the discretized boundary value problem can be expressed as
\[ A_n(\mu)u_n(\mu) = h_n, \qquad A_n(\mu) = D_n + K_n(\mu) + L_n + X_nFY_n^T, \]
with $D_n \in \mathbb{R}^{n\times n}$ the tridiagonal second-difference matrix with rows $\tfrac{1}{\Delta x^2}(-1, 2, -1)$ and a zero last row (the last row carries the boundary equation),
\[ K_n(\mu) = \operatorname{diag}\big(1 + \mu k(x_1), \dots, 1 + \mu k(x_{n-1}), 0\big) \in \mathbb{R}^{n\times n}, \]
\[ L_n = \operatorname{diag}\big(\beta(x_1), \dots, \beta(x_{n-1}), 0\big) \in \mathbb{R}^{n\times n}, \]
and
\[ X_n = [e_n, e_n] \in \mathbb{R}^{n\times 2}, \qquad Y_n = \Big[e_n,\ \frac{1}{2\Delta x}\big(3e_n - 4e_{n-1} + e_{n-2}\big)\Big] \in \mathbb{R}^{n\times 2}, \]
\[ F = \begin{bmatrix}g(\mu) & 0\\0 & f(\mu)\end{bmatrix} \in \mathbb{R}^{2\times 2}, \qquad h_n = \big[h(x_1), \dots, h(x_n)\big]^T \in \mathbb{R}^{n}. \]
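For concreteness, a sketch of this assembly with SciPy sparse matrices follows. The coefficient functions and endpoint values are illustrative stand-ins (the specific constants of the experiments are not assumed), and the rank-2 term $X_nFY_n^T$ is assembled directly into its single nonzero row.

```python
# Assembly of A_n(mu) = D_n + K_n(mu) + L_n + X_n F Y_n^T and h_n; the last row
# carries the discretized Robin condition (5.6c).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def assemble_An(n, b_end, c_end, k, beta, h, mu):
    dx = b_end / n
    x = dx * np.arange(1, n + 1)                  # grid x_1, ..., x_n = b
    main = np.full(n, 2.0)
    lo, up = -np.ones(n - 1), -np.ones(n - 1)
    main[-1], lo[-1] = 0.0, 0.0                   # zero out the last row of D_n
    Dn = sp.diags([lo, main, up], [-1, 0, 1]) / dx**2
    Kn = sp.diags(np.r_[1.0 + mu * k(x[:-1]), 0.0])
    Ln = sp.diags(np.r_[beta(x[:-1]), 0.0])
    s = np.sqrt(1.0 + mu + 0j)
    g = np.cosh((c_end - b_end) * s)              # g(mu), f(mu) as in (5.5)
    f = np.sinh((c_end - b_end) * s) / s
    R = sp.lil_matrix((n, n), dtype=complex)      # R = X_n F Y_n^T, one row
    R[n - 1, n - 1] = g + 3 * f / (2 * dx)
    R[n - 1, n - 2] = -4 * f / (2 * dx)
    R[n - 1, n - 3] = f / (2 * dx)
    hn = np.r_[h(x[:-1]), 0.0]
    return (Dn + Kn + Ln + R).tocsr(), hn

An, hn = assemble_An(400, 1.0, 3.0,
                     k=lambda x: np.sin(np.pi * x) ** 2,   # illustrative k
                     beta=lambda x: 0.0 * x,               # illustrative beta
                     h=lambda x: np.exp(-x), mu=0.1)       # illustrative h
un = spsolve(An, hn + 0j)                         # direct reference solution
```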
Figure 5.5: Numerical solution $u_n(x)$ of (5.6) for two different values of $\mu$, computed with Algorithm 2.
Figure 5.6: Convergence per iteration (a) and per simulation time (b) for (5.6), comparing Algorithm 1, the tensor variant and the low-rank variant.
Figure 5.7: Convergence per iteration (a) and per simulation time (b) for (5.6), as in Figure 5.6 but for a larger value of $\mu$.

In Figures 5.5a and 5.5b, we see the numerical solution of the boundary value problem (5.6) on the domain $[a, b]$ for two different values of $\mu$, calculated with Algorithm 2. Figures 5.6a and 5.7a show the convergence of Algorithm 1, the tensor version of Algorithm 1 and Algorithm 2 for solving (5.6). In general, the error is smaller for smaller $\mu$, and we can therefore expect that Figure 5.7a shows an approximate bound for all $\mu$ smaller than 0.2. Figures 5.6b and 5.7b specify the error as a function of CPU time for a given $\mu$, although after one run of the algorithm we have access to the solution approximation for many different $\mu$. We see that the tensor version of Algorithm 1 and Algorithm 2 offer an improvement in CPU time over Algorithm 1, especially for larger $\mu$.
6. A finite element discretization of the Helmholtz equation.
In order to illustrate the competitiveness of our approach, we consider a Helmholtz equation with a parameter-dependent material coefficient, using a discretization generated with the finite element software FEniCS [1]. Specifically, we consider the following Helmholtz equation with a homogeneous Dirichlet boundary condition:
\[ \Big(\nabla^2 + f_1(\mu)\big(1 + \mu k(x_1)\big) + f_2(\mu)\beta(x_2)\Big)u(x) = h(x), \quad x \in \Omega, \tag{6.1a} \]
\[ u(x) = 0, \quad x \in \partial\Omega, \tag{6.1b} \]
where $x = (x_1, x_2)$, $\Omega$ is as described on pp. 37-39 in [21], and
\[ k(x_1) = \begin{cases} 1 + 2x_1\sin(\alpha\pi x_1), & x_1 \in [0, \tfrac12) \\ 1 + (2 - 2x_1)\sin(\alpha\pi x_1), & x_1 \in [\tfrac12, 1], \end{cases} \]
\[ h(x) = e^{-\alpha x_1}, \qquad \beta(x_2) = \sin(\pi x_2), \qquad f_1(\mu) = \mu, \qquad f_2(\mu) = \sin(\mu). \]

Figures 6.1a and 6.1b show the solutions to (6.1) on $\Omega$ for two different values of $\mu$. We include also a plot of the mesh used to generate the finite element matrices needed for the linearization of this problem in Figure 6.2. Figures 6.3a-6.3d show the performance of Algorithm 1 and its tensor version for solving the same problem for two different values of $\mu$. We see the benefit of the tensor variation in terms of the time taken to build the Krylov basis matrix, in particular when $\mu$ is large.

Figure 6.1: Numerical solutions for (6.1) for two different values of $\mu$.
Figure 6.2: Mesh for the FEM discretization.
Figure 6.3: Algorithm 1 and the tensor variant applied to (6.1): relative residual per iteration and per unit of CPU time, for two values of $\mu$.

All simulations were carried out with a 2.3 GHz Dual-Core Intel Core i5 and 16 GB RAM.
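With $f_1(\mu) = \mu$ and $f_2(\mu) = \sin(\mu)$, the Taylor coefficients of the discretized operator are again explicit. The sketch below assumes a discretization of the form $A(\mu) = -S + \mu(M + \mu M_k) + \sin(\mu)M_b$, with stiffness matrix $S$ and (weighted) mass matrices $M$, $M_k$, $M_b$; the names and the sign convention are assumptions, and the diagonal matrices are stand-ins rather than FEniCS output.

```python
# Taylor coefficients A^{(i)}(0) for A(mu) = -S + mu*(M + mu*Mk) + sin(mu)*Mb.
# Diagonal stand-ins replace the FEM matrices for illustration.
import numpy as np
from scipy.sparse import identity

def helmholtz_taylor(S, M, Mk, Mb, jmax):
    ders = [-S, M + Mb, 2 * Mk]                   # i = 0, 1, 2 (sin'(0) = 1)
    for i in range(3, jmax + 1):
        ders.append([0, 1, 0, -1][i % 4] * Mb)    # sin^{(i)}(0) cycles 0, 1, 0, -1
    return ders

nd = 100
A_der = helmholtz_taylor(2.0 * identity(nd), 0.5 * identity(nd),
                         0.1 * identity(nd), 0.2 * identity(nd), jmax=30)
```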
7. Conclusion and outlook.
The result of this paper is a new Krylov subspace method to solve parameter-dependent systems of the form $A(\mu)x(\mu) = b$ for many $\mu$ simultaneously, where $A$ depends nonlinearly on $\mu$. We have constructed a companion linearization where $\mu$ appears only linearly and constructed a basis for the Krylov subspace in an efficient way without introducing truncation error. Numerical experiments verify that the convergence of our methods is predicted by the magnitude of the parameter $\mu$ and the solutions to the corresponding NEP.

We have shown how to specialize this method to solve a specific discretized boundary value problem where the higher order terms in the Taylor series are of a certain form due to a Robin boundary condition on one end. In this way we have incorporated the structure of the problem into the design of the algorithm.

There are several variants of infinite Arnoldi, e.g., the Chebyshev version [15] and restarting variations [12]. These strategies could be applied to the methods presented in this paper, but this would require adaptation based on the specific structure of the problem and further analysis.

Acknowledgements.
We are grateful to Prof. Tobias Damm, TU Kaiserslautern, for providing crucial ideas for the proof of this version of Gelfand's lemma.
Appendix A. Proof of Theorem 4.1.
Since $\gamma_1, \dots, \gamma_j$ are semi-simple, a Jordan decomposition can be expressed as
\[ A = V\operatorname{diag}(\gamma_1, \gamma_2, \dots, \gamma_j, J)V^{-1} = V\Gamma V^{-1}. \]
We have
\[ \big\|(A - \gamma_1I)(A - \gamma_2I)\cdots(A - \gamma_jI)A^k\big\| \le \kappa(V)\big\|(\Gamma - \gamma_1I)(\Gamma - \gamma_2I)\cdots(\Gamma - \gamma_jI)\Gamma^k\big\|, \tag{A.1} \]
\[ \big\|(A - \gamma_1I)(A - \gamma_2I)\cdots(A - \gamma_jI)A^k\big\| \ge \frac{1}{\kappa(V)}\big\|(\Gamma - \gamma_1I)(\Gamma - \gamma_2I)\cdots(\Gamma - \gamma_jI)\Gamma^k\big\|, \tag{A.2} \]
where (A.2) follows from properties of singular values, i.e., $\|AB\| \ge \sigma_{\min}(A)\|B\|$.

Due to the placement of the zeros on the diagonals of the matrices $(\Gamma - \gamma_iI)$, $i = 1, \dots, j$, and the upper triangular structure of all the matrices, we have the relation
\[ \Big(\prod_{i=1}^{j}(\Gamma - \gamma_iI)\Big)\Gamma^k = \prod_{i=1}^{j}\operatorname{diag}\big(\gamma_1 - \gamma_i, \dots, \gamma_j - \gamma_i, J - \gamma_iI\big)\cdot\operatorname{diag}\big(\gamma_1^k, \dots, \gamma_j^k, J^k\big) = \operatorname{diag}\Big(0, \dots, 0, \Big(\prod_{i=1}^{j}(J - \gamma_iI)\Big)J^k\Big). \]
Thus, we have
\[ \Big\|\Big(\prod_{i=1}^{j}(\Gamma - \gamma_iI)\Big)\Gamma^k\Big\| = \Big\|\Big(\prod_{i=1}^{j}(J - \gamma_iI)\Big)J^k\Big\| \le \alpha\|J^k\|, \qquad \Big\|\Big(\prod_{i=1}^{j}(\Gamma - \gamma_iI)\Big)\Gamma^k\Big\| \ge \beta\|J^k\|, \]
where
\[ \alpha = \sigma_{\max}\Big(\prod_{i=1}^{j}(J - \gamma_iI)\Big) > 0, \qquad \beta = \sigma_{\min}\Big(\prod_{i=1}^{j}(J - \gamma_iI)\Big) > 0. \]
We note that $\alpha > 0$ and $\beta > 0$ since $\gamma_1, \dots, \gamma_j$ are not eigenvalues of $J$ by assumption, and therefore all singular values are positive (non-zero).

So, we have
\[ \big\|(A - \gamma_1I)(A - \gamma_2I)\cdots(A - \gamma_jI)A^k\big\|^{1/k} \le \big(\kappa(V)\alpha\big)^{1/k}\|J^k\|^{1/k} \longrightarrow |\gamma_{j+1}| \quad \text{as } k \to \infty, \]
\[ \big\|(A - \gamma_1I)(A - \gamma_2I)\cdots(A - \gamma_jI)A^k\big\|^{1/k} \ge \Big(\frac{\beta}{\kappa(V)}\Big)^{1/k}\|J^k\|^{1/k} \longrightarrow |\gamma_{j+1}| \quad \text{as } k \to \infty, \]
where the limits follow from Gelfand's formula, since $\rho(J) = |\gamma_{j+1}|$, and thus
\[ \lim_{k\to\infty}\big\|(A - \gamma_1I)(A - \gamma_2I)\cdots(A - \gamma_jI)A^k\big\|^{1/k} = |\gamma_{j+1}|. \]

REFERENCES

[1] M. S. Alnaes, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. E. Rognes, and G. N. Wells. The FEniCS project version 1.5. Archive of Numerical Software, 3, 2015.
[2] M. Baumann and M. B. van Gijzen. Nested Krylov methods for shifted linear systems. SIAM J. Sci. Comput., 37, 2015.
[3] R. Van Beeumen, E. Jarlebring, and W. Michiels. A rank-exploiting infinite Arnoldi algorithm for nonlinear eigenvalue problems. Numer. Linear Algebra Appl., 23(4):607-628, 2016.
[4] R. Van Beeumen, K. Meerbergen, and W. Michiels. Compact rational Krylov methods for nonlinear eigenvalue problems. SIAM J. Sci. Comput., 36(2):820-838, 2015.
[5] P. Benner, S. Grivet-Talocia, A. Quarteroni, G. Rozza, W. Schilders, and L. Silveira, editors. Model Order Reduction. Volume 3: Applications. De Gruyter, 2021.
[6] P. W. Benner and J. A. Schneider. Uncertainty quantification for Maxwell's equations using stochastic collocation and model order reduction. Int. J. Uncertainty Quantification, 5:195-208, 2015.
[7] J.-P. Berenger. A perfectly matched layer for the absorption of electromagnetic waves. J. Comput. Phys., 114(2):185-200, 1994.
[8] D. Estep, P. Hansbo, C. Johnson, and K. Eriksson. Computational Differential Equations. Cambridge University Press, 2009.
[9] G.-D. Gu and V. Simoncini. Numerical solution of parameter-dependent linear systems. Numer. Linear Algebra Appl., 12(9):923-940, 2005.
[10] K. Gu, V. Kharitonov, and J. Chen. Stability of Time-Delay Systems. Control Engineering. Birkhäuser, Boston, MA, 2003.
[11] T. Hagstrom. New results on absorbing layers and radiation boundary conditions. In M. Ainsworth et al., editors, Topics in Computational Wave Propagation: Direct and Inverse Problems, Lect. Notes Comput. Sci. Eng., Springer, Berlin, 2003.
[12] E. Jarlebring, K. Meerbergen, and W. Michiels. Computing a partial Schur factorization of nonlinear eigenvalue problems using the infinite Arnoldi method. SIAM J. Matrix Anal. Appl., 35(2):411-436, 2014.
[13] E. Jarlebring, K. Meerbergen, and W. Michiels. An Arnoldi method with structured starting vectors for the delay eigenvalue problem. In Proceedings of the 9th IFAC Workshop on Time Delay Systems, Prague, 2010.
[14] E. Jarlebring, G. Mele, and O. Runborg. The waveguide eigenvalue problem and the tensor infinite Arnoldi method. SIAM J. Sci. Comput., 39(3), 2017.
[15] E. Jarlebring, W. Michiels, and K. Meerbergen. A linear eigenvalue algorithm for the nonlinear eigenvalue problem. Numer. Math., 122(1):169-195, 2012.
[16] D. Kressner and J. Roman. Memory-efficient Arnoldi algorithms for linearizations of matrix polynomials in Chebyshev basis. Numer. Linear Algebra Appl., 21(4):569-588, 2014.
[17] D. Kressner and C. Tobler. Low-rank tensor Krylov subspace methods for parametrized linear systems. SIAM J. Matrix Anal. Appl., 32:1288-1316, 2011.
[18] S. Mackey, N. Mackey, C. Mehl, and V. Mehrmann. Vector spaces of linearizations for matrix polynomials. SIAM J. Matrix Anal. Appl., 28:971-1004, 2006.
[19] H. G. Matthies, A. Litvinenko, O. Pajonk, B. V. Rosić, and E. Zander. Parametric and uncertainty computations with tensor product representations. In A. Dienstfrey and R. Boisvert, editors, Uncertainty Quantification in Scientific Computing, IFIP Advances in Information and Communication Technology, pages 139-150. Springer, Berlin, 2012.
[20] G. Mele and E. Jarlebring. On restarting the tensor infinite Arnoldi method. BIT, 58(1):133-162, 2018.
[21] G. Mele, E. Ringh, D. Ek, F. Izzo, P. Upadhyaya, and E. Jarlebring. Preconditioning for Linear Systems. KD Publishing, first edition, 2020.
[22] W. Michiels, E. Jarlebring, and K. Meerbergen. Krylov-based model order reduction of time-delay systems. SIAM J. Matrix Anal. Appl., 32(4):1399-1421, 2011.
[23] W. Michiels and S.-I. Niculescu. Stability and Stabilization of Time-Delay Systems: An Eigenvalue-Based Approach. Advances in Design and Control 12. SIAM Publications, Philadelphia, 2007.
[24] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2nd edition, 2003.
[25] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7:856-869, 1986.
[26] K. M. Soodhalter. Two recursive GMRES-type methods for shifted linear systems with general preconditioning. Electron. Trans. Numer. Anal., 45:499-523, 2016.
[27] K. M. Soodhalter, D. B. Szyld, and F. Xue. Krylov subspace recycling for sequences of shifted linear systems. Appl. Numer. Math., 81:105-118, 2014.
[28] Z. Strakoš and J. Liesen.