Any decreasing cycle-convergence curve is possible for restarted GMRES
Eugene Vecharynski and Julien Langou

November 21, 2018
Abstract
Given a matrix order $n$, a restart parameter $m$ ($m < n$), and a decreasing positive sequence $f(0) > f(1) > \dots > f(q) \geq 0$, where $q < n/m$, it is shown that there exists an $n$-by-$n$ matrix $A$ and a vector $r_0$ with $\|r_0\| = f(0)$ such that $\|r_k\| = f(k)$, $k = 1, \dots, q$, where $r_k$ is the residual at cycle $k$ of restarted GMRES with restart parameter $m$ applied to the linear system $Ax = b$, with initial residual $r_0 = b - Ax_0$. Moreover, the matrix $A$ can be chosen to have any desired eigenvalues. We can also construct arbitrary cases of stagnation, namely $f(0) > f(1) > \dots > f(s) > 0$ and $f(s) = f(s+1) = \dots = f(q)$, $s < q$. The restart parameter can be fixed or variable.

We consider the generalized minimal residual method (GMRES) [15] for the solution of a nonsingular non-Hermitian system of linear equations
$$Ax = b, \quad A \in \mathbb{C}^{n \times n}, \quad b \in \mathbb{C}^n. \qquad (1)$$
For a few classes of matrices, convergence estimates are available for restarted GMRES and full GMRES. For example, for real positive definite matrices (that is, matrices $A$ for which $H = (A + A^H)/2$ is positive definite, i.e., $x^H A x > 0$ for all nonzero $x \in \mathbb{R}^n$), Elman's bound [6, 7, 11, 15] can be stated as follows:
$$\|r_k\| \leq (1 - \rho)^{k/2} \|r_0\|, \quad \text{where } 0 < \rho \equiv \left( \lambda_{\min}(H) / \|A\| \right)^2 \leq 1.$$
The latter guarantees the linear convergence of GMRES($m$) for any value of $m \geq 1$. The convergence of full GMRES for normal matrices is known to be linear, and there exist convergence estimates governed solely by the spectrum of $A$ [17, 18]. The convergence of restarted GMRES for normal matrices, however, is sublinear [2, 19]. The current paper is concerned with the general case, for which the following theorem shows that convergence results cannot be established from the spectrum of the coefficient matrix alone.
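Elman's bound is easy to observe numerically. The sketch below is ours, not from the paper: the matrix, restart value, and cycle count are arbitrary choices. It runs GMRES($m$) cycles as orthogonal projections of the residual onto $A\mathcal{K}_m(A, r)$ and checks that every cycle contracts the residual norm by at least the single-step factor $\sqrt{1-\rho}$ (a full cycle can only do better than one GMRES step).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, cycles = 30, 5, 5

# A random real matrix whose symmetric part H = (A + A^T)/2 is positive definite.
G = rng.standard_normal((n, n))
N = rng.standard_normal((n, n))
A = G @ G.T / n + np.eye(n) + (N - N.T) / 2   # H = G G^T / n + I is SPD

H = (A + A.T) / 2
rho = (np.linalg.eigvalsh(H).min() / np.linalg.norm(A, 2)) ** 2  # 0 < rho <= 1

def gmres_m_cycle(A, r, m):
    """One GMRES(m) cycle: subtract the orthogonal projection of r
    onto the Krylov residual subspace A*K_m(A, r)."""
    C = np.column_stack([np.linalg.matrix_power(A, j + 1) @ r for j in range(m)])
    Q, _ = np.linalg.qr(C)
    return r - Q @ (Q.T @ r)

r = rng.standard_normal(n)
norms = [np.linalg.norm(r)]
for _ in range(cycles):
    r = gmres_m_cycle(A, r, m)
    norms.append(np.linalg.norm(r))

# One GMRES step already guarantees ||r_new|| <= sqrt(1 - rho) ||r_old||,
# and a full GMRES(m) cycle minimizes over a larger subspace.
ratios = [norms[k + 1] / norms[k] for k in range(cycles)]
```

Since $\rho$ involves only $\lambda_{\min}(H)$ and $\|A\|$, this check illustrates the kind of information that does yield bounds in the positive definite case, in contrast with the general case treated below.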
Theorem 1 (Greenbaum, Pták, and Strakoš, 1996, [12])
Given a nonincreasing positive sequence $f(0) \geq f(1) \geq \dots \geq f(n-1) > 0$, there exists an $n$-by-$n$ matrix $A$ and a vector $r_0$ with $\|r_0\| = f(0)$ such that $f(k) = \|r_k\|$, $k = 1, \dots, n-1$, where $r_k$ is the residual at step $k$ of the GMRES algorithm applied to the linear system $Ax = b$, with initial residual $r_0 = b - Ax_0$. Moreover, the matrix $A$ can be chosen to have any desired eigenvalues.

This result states that, in general, eigenvalues alone do not determine the convergence of full GMRES. Assuming that the coefficient matrix $A$ is diagonalizable, some characterizations of the convergence of full GMRES rely on the condition number of the eigenbasis [18]. Other characterizations of the convergence of full GMRES rely on pseudospectra [14]. More commonly, the field of values is used [3, 6, 7, 11, 15, 16, 21]. A discussion of how descriptive some of these bounds are is given by Embree [8].

The main result of this paper is given in the abstract. We repeat it here.

Theorem 2

Given a matrix order $n$, a restart parameter $m$ ($m < n$), and a decreasing positive sequence $f(0) > f(1) > \dots > f(q) \geq 0$, where $q < n/m$, there exists an $n$-by-$n$ matrix $A$ and a vector $r_0$ with $\|r_0\| = f(0)$ such that $\|r_k\| = f(k)$, $k = 1, \dots, q$, where $r_k$ is the residual at cycle $k$ of restarted GMRES with restart parameter $m$ applied to the linear system $Ax = b$, with initial residual $r_0 = b - Ax_0$. Moreover, the matrix $A$ can be chosen to have any desired eigenvalues.

Section 2 contains a proof of Theorem 2. Theorem 2 is to restarted GMRES what Theorem 1 is to full GMRES. The proof we provide is constructive and directly inspired by the article of Greenbaum, Pták, and Strakoš [12]. Although Greenbaum, Pták, and Strakoš laid the path, there are several specific difficulties ahead in the case of restarted GMRES.
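For reference, the construction behind Theorem 1 admits a compact numerical sketch. The function below is our own simplified real-arithmetic rendering: the function name and the specific frame, $r_0 = \sum_k g_k e_k$ with $g_k = \sqrt{f(k-1)^2 - f(k)^2}$, are choices we make for illustration. The operator maps $r_0 \mapsto e_1 \mapsto e_2 \mapsto \dots$, with the last image chosen so that the characteristic polynomial has the prescribed roots.

```python
import numpy as np

def gps_full_gmres(f, eigenvalues):
    """Greenbaum/Ptak/Strakos-style construction (Theorem 1), real arithmetic:
    returns (A, r0) with ||r_k|| = f[k] at step k of full GMRES, k = 0..n-1,
    and spec(A) = eigenvalues. Needs f nonincreasing, f[-1] > 0, and
    nonzero eigenvalues."""
    n = len(f)
    fe = np.append(np.asarray(f, dtype=float), 0.0)   # f(n) = 0
    g = np.sqrt(fe[:-1] ** 2 - fe[1:] ** 2)
    r0 = g.copy()                  # then ||r0 - proj_{e_1..e_k} r0|| = f[k]
    c = np.poly(eigenvalues)       # monic coefficients, highest degree first
    alpha = -c[1:][::-1]           # p(x) = x^n - sum_j alpha[j] x^j
    I = np.eye(n)
    M = np.zeros((n, n))           # images of the basis V = {r0, e_1, ..., e_{n-1}}
    M[:, 0] = I[:, 0]              # A r0 = e_1
    for j in range(1, n - 1):
        M[:, j] = I[:, j]          # A e_j = e_{j+1}
    M[:, n - 1] = alpha[0] * r0 + I[:, :n - 1] @ alpha[1:]
    V = np.column_stack([r0] + [I[:, j] for j in range(n - 1)])
    return M @ np.linalg.inv(V), r0

# Example: a curve that stagnates at step 2, with spectrum {1, ..., 6}.
n = 6
f = np.array([8.0, 4.0, 4.0, 2.0, 1.0, 0.5])
lam = np.arange(1.0, n + 1)
A, r0 = gps_full_gmres(f, lam)

# Full GMRES, step by step: project r0 onto span{A r0, ..., A^k r0}.
norms = [np.linalg.norm(r0)]
for k in range(1, n):
    C = np.column_stack([np.linalg.matrix_power(A, j + 1) @ r0 for j in range(k)])
    Q, _ = np.linalg.qr(C)
    norms.append(np.linalg.norm(r0 - Q @ (Q.T @ r0)))
```

Running the verification loop reproduces the prescribed curve $f$, including the stagnating step, while the eigenvalues of $A$ remain $1, \dots, 6$.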
Full GMRES has a nonincreasing convergence curve (for any $i \geq 0$, $f(i) \geq f(i+1)$) and computes the exact solution in at most $n$ steps ($f(n) = 0$). It is remarkable that Greenbaum, Pták, and Strakoš are able to characterize any admissible convergence behavior for GMRES. (See the assumptions on $f$ in Theorem 1.) At the same time, we would like to note that the cycle-convergence of restarted GMRES can follow two admissible scenarios: either $f(i) > f(i+1)$ for any $i$, in other words, the cycle-convergence is decreasing; or there exists $s$ such that $f(i) > f(i+1)$ for any $i < s$, and then $f(i) = f(s)$ for any $i > s$; in other words, if restarted GMRES stagnates at cycle $s + 1$, it stagnates forever. Theorem 2 considers the first case (decreasing cycle-convergence). In Section 3, we consider the second case (stagnation). Therefore, with Theorem 2 and Section 3, we prove that any admissible cycle-convergence curve is possible for the first $q$ cycles of restarted GMRES.

As mentioned above, full GMRES takes at most $n$ iterations and delivers the exact solution in a finite number of steps. Restarted GMRES, however, may never provide the exact solution. It will (hopefully) decrease the residual norm at each cycle, that is, provide a more and more accurate approximation to the exact solution. With $n^2$ parameters in $A$ and $n$ parameters in $b$, we are not able to control the convergence for an infinite number of cycles. For this reason, it is natural to consider only the first $q < n/m$ GMRES($m$) cycles. In fact, we provide the same level of control as Greenbaum, Pták, and Strakoš: $n$ iterations (or $q$ cycles with $q < n/m$) and $n$ eigenvalues.

In Section 4, we generalize the result given by Theorem 2 and Section 3 to the case of variable restart parameters. The sequence of restart parameters $m_k$ needs to be known a priori. We show that GMRES($m_k$) can produce any admissible cycle-convergence curve during the first $q$ cycles, regardless of the spectrum of the coefficient matrix, where $q$ is such that $\sum_{k=1}^{q} m_k < n$. We note that our construction can be a reasonable tool for generating examples/counterexamples for different strategies for varying the restart parameter, e.g., [2].

The cycle-convergence of restarted GMRES for normal matrices is sublinear [2, 19]. However, for general matrices, through Theorem 2, one can expect any convergence curve. In particular, it is possible to construct matrices for which the convergence of GMRES($m$) is fast (e.g., superlinear). This relates to the observations of Zhong and Morgan [20], who report superlinear cycle-convergence for their particular cases of nonnormal matrices, as well as to [19], where it is shown that the cycle-convergence can become superlinear as the coefficient matrix departs from normality.

In a pedagogical paper, Embree [9] presents a 3-by-3 linear system of equations and attempts to solve it with GMRES(1) and GMRES(2). While GMRES(1) converges to the exact solution in 3 cycles, GMRES(2) (almost) stagnates.
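This tortoise-and-hare behavior is easy to reproduce. In the sketch below, the 3-by-3 matrix and right-hand side are our recollection of the system used in [9] and should be checked against that paper; for this particular system, GMRES(1), run here as a sequence of projections onto $A\mathcal{K}_m(A, r)$, reaches the exact solution after 3 cycles, while GMRES(2) keeps a residual norm of order one.

```python
import numpy as np

# A 3-by-3 nonnormal system (our recollection of the example in [9]).
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
b = np.array([2.0, -4.0, 1.0])

def gmres_m_history(A, r0, m, cycles):
    """Cycle-convergence curve of GMRES(m): one cycle subtracts the
    orthogonal projection of r onto A*K_m(A, r)."""
    r = r0.copy()
    norms = [np.linalg.norm(r)]
    for _ in range(cycles):
        C = np.column_stack([np.linalg.matrix_power(A, j + 1) @ r for j in range(m)])
        Q, _ = np.linalg.qr(C)
        r = r - Q @ (Q.T @ r)
        norms.append(np.linalg.norm(r))
    return norms

h1 = gmres_m_history(A, b, 1, 3)   # GMRES(1): exact solution after 3 cycles
h2 = gmres_m_history(A, b, 2, 3)   # GMRES(2): residual norm stays O(1)
```

Despite the larger subspace built in each cycle, GMRES(2) is the tortoise here; see [9] for the full discussion.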
Our main result basically reaffirms this intuition, in the sense that an increase in the restart parameter (and thus in the computational complexity of each cycle) does not necessarily imply faster convergence.

In order to improve the convergence of restarted GMRES, several techniques [1, 4, 5, 10, 13] have been proposed which consist of augmenting (or enriching) the Krylov subspace with eigenvectors or, alternatively, deflating some of the eigenvalues from the spectrum of the original matrix $A$. The eigenvalues targeted are the ones closest to zero. These techniques have proved effective, and the convergence of restarted GMRES is, in practice, greatly improved. Theorem 2 states that, in the general case, eigenvalues alone do not determine the convergence of restarted GMRES; it is therefore hard to provide a theoretical justification for the choice of removing the eigenvalues closest to zero. A beginning of a theoretical understanding has been provided by Zítko [22].

We have written two Matlab functions that correspond to Theorem 1 and Theorem 2. Given a matrix size, a restart parameter, a convergence curve, and a spectrum, we construct the appropriate matrix and right-hand side.

The main message that we would like our readers to retain from this paper is that, in the context of GMRES($m$), for a certain number of initial cycles, any convergence curve is possible independently of the spectrum of the coefficient matrix. This means that eigenvalues alone do not determine the convergence of restarted GMRES.
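As a companion to those Matlab codes, the sketch below is our own Python rendering of the Theorem 2 construction for the decreasing case. The function name, the restriction to real arithmetic and a real nonzero spectrum, and the particular canonical-vector frame are our choices; the construction itself is the three-step procedure of Section 2 (Lemma 1 basis, operator (15), similarity (17)). The verification loop then re-runs GMRES($m$) as projections onto $A\mathcal{K}_m(A, r_{k-1})$.

```python
import numpy as np

def prescribed_gmres_m(n, m, f, eigenvalues):
    """Sketch of the Theorem 2 construction (real arithmetic, 2 <= m < n):
    returns (A, r0) such that GMRES(m) started with residual r0 produces
    ||r_k|| = f[k] for k = 0..q, q = len(f)-1 < n/m, and spec(A) = eigenvalues
    (assumed real and nonzero here). f must be decreasing and positive."""
    q = len(f) - 1
    assert 2 <= m < n and q * m < n
    I = np.eye(n)
    # Lemma 1 in a convenient frame: r_0 = f(0) e_1, the sets W_{m-1}^{(k)}
    # and the auxiliary unit vectors y^{(k)} are fresh canonical vectors.
    r = [f[0] * I[:, 0]]
    W = []                                    # W[k-1] = [w_1, ..., w_m] of cycle k
    for k in range(1, q + 1):
        Wk = [I[:, (k - 1) * m + j] for j in range(1, m)]
        cos = np.sqrt(f[k - 1] ** 2 - f[k] ** 2) / f[k - 1]    # Eq. (9)
        sin = f[k] / f[k - 1]
        wm = (r[k - 1] / f[k - 1]) * cos + I[:, k * m] * sin   # w_m^{(k)}
        r.append(r[k - 1] - (r[k - 1] @ wm) * wm)              # Eq. (7)
        W.append(Wk + [wm])
    # Basis of Eq. (12): {r_0, W^{(1)}, r_1, ..., W^{(q)}, r_q, s_1, ..., s_t}.
    cols = [r[0]]
    for k in range(1, q + 1):
        cols += W[k - 1][:m - 1] + [r[k]]
    extra = [I[:, j] for j in range(q * m + 1, n)]
    S = np.column_stack(cols + extra)
    # Coefficients of the monic polynomials (13)-(14): p(x) = x^d - sum_j a_j x^j.
    lam = np.asarray(eigenvalues, dtype=float)
    split = [lam[k * m:(k + 1) * m] for k in range(q)] + [lam[q * m:]]
    alphas = [-np.poly(roots)[1:][::-1] for roots in split]
    # Images of the basis vectors under the operator (15), in canonical coords.
    M = np.zeros((n, n))
    for k in range(1, q + 1):
        a, col = alphas[k - 1], (k - 1) * m
        M[:, col] = W[k - 1][0]                     # A r_{k-1} = w_1
        for j in range(1, m - 1):
            M[:, col + j] = W[k - 1][j]             # A w_j = w_{j+1}
        M[:, col + m - 1] = a[0] * (r[k - 1] - r[k]) + \
            sum(a[j] * W[k - 1][j - 1] for j in range(1, m))
    a, col, t = alphas[q], q * m, n - q * m - 1
    for i in range(t):
        M[:, col + i] = extra[i]                    # A r_q = s_1, A s_i = s_{i+1}
    M[:, col + t] = a[0] * r[q] + sum(a[i] * extra[i - 1] for i in range(1, t + 1))
    return M @ np.linalg.inv(S), r[0]

# Verify against a direct GMRES(m) run: each cycle projects r onto A*K_m(A, r).
n, m = 12, 3
f = [10.0, 5.0, 2.0, 0.5]                           # q = 3 < n/m
lam = np.arange(1.0, n + 1)
A, r0 = prescribed_gmres_m(n, m, f, lam)

rr, norms = r0.copy(), [np.linalg.norm(r0)]
for _ in range(len(f) - 1):
    C = np.column_stack([np.linalg.matrix_power(A, j + 1) @ rr for j in range(m)])
    Q, _ = np.linalg.qr(C)
    rr = rr - Q @ (Q.T @ rr)
    norms.append(np.linalg.norm(rr))
```

The recorded cycle norms match the prescribed curve $f$ while the spectrum of $A$ is $\{1, \dots, 12\}$, an arbitrarily chosen set of nonzero reals.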
2 Proof of Theorem 2

Let $n$ be a matrix order and $m$ a restart parameter ($m < n$), let $\Lambda = \{\lambda_1, \lambda_2, \dots, \lambda_n\} \subset \mathbb{C} \setminus \{0\}$ be a set of $n$ nonzero complex numbers, and let $\{f(k)\}_{k=0}^{q}$ be a decreasing sequence of positive real numbers, $q < n/m$. In this section we construct a matrix $A \in \mathbb{C}^{n \times n}$ and an initial residual vector $r_0 = b - Ax_0 \in \mathbb{C}^n$ such that GMRES($m$) applied to the system (1) with the initial approximate solution $x_0$ produces a sequence $\{x_k\}_{k=1}^{q}$ of approximate solutions with corresponding residual vectors $\{r_k\}_{k=0}^{q}$ having the prescribed norms $\|r_k\| = f(k)$. Moreover, the spectrum of $A$ is $\Lambda$.

The general approach described in this paper is similar to the approach of Greenbaum, Pták, and Strakoš [12]: we fix an initial residual vector, construct an appropriate basis of $\mathbb{C}^n$, and use this basis to define a linear operator $\mathcal{A}$. This operator is represented by the matrix $A$ in the canonical basis. It has the prescribed spectrum and provides the desired cycle-convergence at the first $q$ cycles of GMRES($m$). However, the presence of restarts somewhat complicates the construction: the choice of the basis vectors, as well as the structure of the resulting operator $\mathcal{A}$, becomes less transparent. Below we describe our three-step construction.

At the first step we construct $q$ sets of vectors $W_m^{(k)} = \{w_1^{(k)}, \dots, w_m^{(k)}\}$, $k = 1, \dots, q$, where each set $W_m^{(k)}$ is an orthonormal basis of the Krylov residual subspace $A\mathcal{K}_m(A, r_{k-1})$ generated at the $k$-th GMRES($m$) cycle, such that
$$\operatorname{span} W_j^{(k)} = A\mathcal{K}_j(A, r_{k-1}), \quad j = 1, \dots, m, \qquad (2)$$
where $\mathcal{K}_j(A, r) = \operatorname{span}\{r, Ar, \dots, A^{j-1}r\}$. (With this definition, $W_m^{(k)}$ is determined up to multiplication of its elements by complex numbers of unit modulus.) The orthonormal basis $W_m^{(k)}$ needs to be chosen so as to generate residual vectors $r_k$ with the prescribed norms $f(k)$ at the end of each cycle, subject to the additional requirement that the set of $mq + 1$ ($\leq n$) vectors
$$S = \{r_0, w_1^{(1)}, \dots, w_{m-1}^{(1)}, r_1, w_1^{(2)}, \dots, w_{m-1}^{(2)}, \dots, r_{q-1}, w_1^{(q)}, \dots, w_{m-1}^{(q)}, r_q\} \qquad (3)$$
is linearly independent.

Once we have the set $S$, we complete it to a basis of $\mathbb{C}^n$. When the number of vectors in $S$ is less than $n$, a basis $\mathcal{S}$ of $\mathbb{C}^n$ is obtained by completing $S$ with a set $\widehat{S}$ of $n - mq - 1$ vectors, $\mathcal{S} = \{S, \widehat{S}\}$. This provides a representation of $\mathbb{C}^n$ as the direct sum
$$\mathbb{C}^n = \operatorname{span} \mathcal{S} = \operatorname{span}\{r_0, W_{m-1}^{(1)}\} \oplus \dots \oplus \operatorname{span}\{r_{q-1}, W_{m-1}^{(q)}\} \oplus \operatorname{span}\{r_q, \widehat{S}\}. \qquad (4)$$
The latter translates in terms of Krylov subspaces into
$$\mathbb{C}^n = \operatorname{span} \mathcal{S} = \mathcal{K}_m(A, r_0) \oplus \dots \oplus \mathcal{K}_m(A, r_{q-1}) \oplus \operatorname{span}\{r_q, \widehat{S}\}.$$

At the second step of our construction, we define a linear operator $\mathcal{A} : \mathbb{C}^n \rightarrow \mathbb{C}^n$ with spectrum $\Lambda$ which generates the Krylov residual subspaces in Eq. (2) at each GMRES($m$) cycle, by its action on the basis vectors $\mathcal{S}$; the desired matrix $A$ is the operator $\mathcal{A}$'s representation in the canonical basis. The third step accomplishes the construction by a similarity transformation. The two following subsections show that the requirements (2)-(3)-(4) and the definition of the operator $\mathcal{A}$ with the prescribed spectrum are indeed attainable.

Step 1: Construction of a sequence of Krylov subspaces which provide the prescribed cycle-convergence

At the $k$-th GMRES($m$) cycle, the residual vector $r_k$ satisfies the following minimality condition:
$$\|r_k\| = \min_{u \in A\mathcal{K}_m(A, r_{k-1})} \|r_{k-1} - u\|. \qquad (5)$$
We assume that each set $W_m^{(k)}$ is an orthonormal basis of the corresponding Krylov residual subspace $A\mathcal{K}_m(A, r_{k-1})$; therefore condition (5) implies
$$r_k = r_{k-1} - \sum_{j=1}^{m} \langle r_{k-1}, w_j^{(k)} \rangle w_j^{(k)}, \quad k = 1, \dots, q. \qquad (6)$$

At this stage, in order to simplify the forthcoming justification of the linear independence of the set $S$, we impose a stricter requirement on the residual change inside a cycle. We require that the residual vector $r_{k-1}$ remain constant during the first $m - 1$ steps and change only at the last, $m$-th, step. Thus, the equality in (6) can be written as
$$r_k = r_{k-1} - \langle r_{k-1}, w_m^{(k)} \rangle w_m^{(k)}, \quad k = 1, \dots, q. \qquad (7)$$
This implies that the vectors $w_j^{(k)}$, $j = 1, \dots, m-1$, are orthogonal to the residual vector $r_{k-1}$, i.e.,
$$\langle r_{k-1}, w_j^{(k)} \rangle = 0, \quad j = 1, \dots, m-1, \quad k = 1, \dots, q. \qquad (8)$$
From Eq. (7), using the fact that $r_k \perp w_m^{(k)}$ and the Pythagorean theorem, we obtain
$$|\langle r_{k-1}, w_m^{(k)} \rangle| = \sqrt{\|r_{k-1}\|^2 - \|r_k\|^2}, \quad k = 1, \dots, q.$$
We rewrite the expression above in terms of the cosines of the angles $\psi_k = \angle(r_{k-1}, w_m^{(k)})$ by prescribing the expected values $f(k)$ for the norms of the residuals. We get
$$\cos \psi_k = \frac{\sqrt{f(k-1)^2 - f(k)^2}}{f(k-1)} \in (0, 1], \quad k = 1, \dots, q. \qquad (9)$$
This latter equation means that, given $r_{k-1}$, one way to ensure the desired cycle-convergence at cycle $k$ of GMRES($m$) is to choose the unit vectors $w_j^{(k)}$ such that (7)-(9) hold.

In the following lemma, we show constructively that the described approach (7)-(9) leads to an appropriate set $S$.

Lemma 1
Given an initial vector $r_0$, $\|r_0\| = f(0)$, there exist vectors $r_k$, $\|r_k\| = f(k)$, and orthonormal sets $W_m^{(k)}$, $k = 1, \dots, q < n/m$, such that Eqs. (7), (8), and (9) hold and the set $S$ is linearly independent.

Proof.
The proof is by induction.

Let $k = 1$. Given the initial vector $r_0$, $\|r_0\| = f(0)$, we pick $W_{m-1}^{(1)} = \{w_1^{(1)}, \dots, w_{m-1}^{(1)}\}$ as an orthonormal set in $r_0^{\perp}$ in order to satisfy Eq. (8). The set $\{r_0, W_{m-1}^{(1)}\}$ is linearly independent. In order to choose the unit vector $w_m^{(1)}$ orthogonal to the previously constructed vectors $W_{m-1}^{(1)}$ and satisfying Eq. (9), we introduce a unit vector $y^{(1)} \in \{r_0, W_{m-1}^{(1)}\}^{\perp}$, so that
$$w_m^{(1)} = \frac{r_0}{f(0)} \cos \psi_1 + y^{(1)} \sin \psi_1.$$
We find the vector $r_1$ by satisfying Eq. (7). Eq. (9) guarantees that $\|r_1\| = f(1)$, as desired. Finally, we append the constructed vector $r_1$ to $\{r_0, W_{m-1}^{(1)}\}$ and get the set $\{r_0, W_{m-1}^{(1)}, r_1\}$, which is linearly independent, since, by construction, $r_1$ is not in $\operatorname{span}\{r_0, W_{m-1}^{(1)}\}$.

The induction assumption is that we have constructed vectors $r_1, \dots, r_{k-1}$ with the prescribed norms $f(1), \dots, f(k-1)$ and orthonormal sets $W_m^{(1)}, \dots, W_m^{(k-1)}$, such that the equalities (7), (8), and (9) hold and the set
$$\{r_0, W_{m-1}^{(1)}, \dots, r_{k-2}, W_{m-1}^{(k-1)}, r_{k-1}\} \qquad (10)$$
is linearly independent. We want to show that we can construct the next vector $r_k$, $\|r_k\| = f(k)$, and the orthonormal set $W_m^{(k)}$, satisfying Eqs. (7), (8), and (9), such that
$$\{r_0, W_{m-1}^{(1)}, \dots, r_{k-2}, W_{m-1}^{(k-1)}, r_{k-1}, W_{m-1}^{(k)}, r_k\} \qquad (11)$$
is linearly independent, $k \leq q$.

We start by constructing orthonormal vectors $W_{m-1}^{(k)} = \{w_1^{(k)}, \dots, w_{m-1}^{(k)}\}$ satisfying Eq. (8), with the additional requirement that the set $W_{m-1}^{(k)}$ not lie in the span of the previously constructed vectors in the set (10). From these considerations we choose $W_{m-1}^{(k)}$ as an orthonormal set in the orthogonal complement of (10), i.e.,
$$w_j^{(k)} \in \{r_0, W_{m-1}^{(1)}, \dots, r_{k-2}, W_{m-1}^{(k-1)}, r_{k-1}\}^{\perp}, \quad j = 1, \dots, m-1.$$
Appending $W_{m-1}^{(k)}$ to the set (10) gives a linearly independent set.

To finish the proof, we need to construct the vector $w_m^{(k)}$ satisfying Eq. (9) and orthogonal to $W_{m-1}^{(k)}$. For this reason we introduce a unit vector
$$y^{(k)} \in \{r_0, W_{m-1}^{(1)}, \dots, r_{k-2}, W_{m-1}^{(k-1)}, r_{k-1}, W_{m-1}^{(k)}\}^{\perp},$$
so that
$$w_m^{(k)} = \frac{r_{k-1}}{f(k-1)} \cos \psi_k + y^{(k)} \sin \psi_k.$$
We define the vector $r_k$ with Eq. (7). Eq. (9) guarantees $\|r_k\| = f(k)$. The set (11) is linearly independent, since, by construction, the vector $r_k$ is not in
$$\operatorname{span}\{r_0, W_{m-1}^{(1)}, \dots, r_{k-2}, W_{m-1}^{(k-1)}, r_{k-1}, W_{m-1}^{(k)}\}. \qquad \Box$$

So far we have shown that, given an initial residual vector $r_0$, $\|r_0\| = f(0)$, it is possible to construct vectors $r_k$, $\|r_k\| = f(k)$, and orthonormal sets $W_m^{(k)}$, $k = 1, \dots, q$, satisfying Eqs. (7), (8), and (9), such that the set $S$ of $mq + 1$ vectors is linearly independent. In order to define a unique linear operator, we need a valid basis of $\mathbb{C}^n$ on hand. Thus, we expand the set $S$ by linearly independent vectors $\widehat{S} = \{\hat{s}_1, \dots, \hat{s}_t\}$, $t = n - mq - 1$:
$$\mathcal{S} = \{r_0, W_{m-1}^{(1)}, \dots, r_{q-1}, W_{m-1}^{(q)}, r_q, \hat{s}_1, \dots, \hat{s}_t\}, \qquad (12)$$
so that $\mathcal{S}$ is a basis of $\mathbb{C}^n$.

Before we define a linear operator $\mathcal{A}$, let us consider the set $\Lambda = \{\lambda_1, \lambda_2, \dots, \lambda_n\}$ of nonzero numbers in the complex plane that will define $\mathcal{A}$'s spectrum. We split $\Lambda$ into $q + 1$ disjoint subsets
$$\Lambda = \{\Lambda_1, \Lambda_2, \dots, \Lambda_q, \Lambda_{q+1}\},$$
such that each $\Lambda_k$, $k = 1, \dots, q$, contains $m$ elements of $\Lambda$, and the remaining $n - mq$ elements are included in $\Lambda_{q+1}$. For each set $\Lambda_k$ we define a monic polynomial $p_k(x)$ whose roots are exactly the elements of the corresponding $\Lambda_k$:
$$p_k(x) = x^m - \sum_{j=0}^{m-1} \alpha_j^{(k)} x^j, \quad k = 1, \dots, q; \qquad (13)$$
$$p_{q+1}(x) = x^{t+1} - \sum_{j=0}^{t} \alpha_j^{(q+1)} x^j, \quad t = n - mq - 1; \qquad (14)$$
with the $\alpha_j^{(k)}$'s being the coefficients of the respective polynomials, $\alpha_0^{(k)} \neq 0$, $k = 1, \dots, q+1$, since all elements of $\Lambda$ are nonzero. $p_k(x)$ can be considered as the characteristic polynomial of an $m$-by-$m$ matrix with spectrum $\Lambda_k$; $p_{q+1}(x)$ can be considered as the characteristic polynomial of a $(t+1)$-by-$(t+1)$ matrix with spectrum $\Lambda_{q+1}$.

We define the operator $\mathcal{A} : \mathbb{C}^n \rightarrow \mathbb{C}^n$ as follows:
$$\begin{aligned}
\mathcal{A} r_{k-1} &= w_1^{(k)}, \\
\mathcal{A} w_1^{(k)} &= w_2^{(k)}, \\
&\;\;\vdots \\
\mathcal{A} w_{m-2}^{(k)} &= w_{m-1}^{(k)}, \\
\mathcal{A} w_{m-1}^{(k)} &= -\alpha_0^{(k)} r_k + \alpha_0^{(k)} r_{k-1} + \alpha_1^{(k)} w_1^{(k)} + \dots + \alpha_{m-1}^{(k)} w_{m-1}^{(k)}, \quad k = 1, \dots, q; \\
\mathcal{A} r_q &= \hat{s}_1, \\
\mathcal{A} \hat{s}_1 &= \hat{s}_2, \\
&\;\;\vdots \\
\mathcal{A} \hat{s}_{t-1} &= \hat{s}_t, \\
\mathcal{A} \hat{s}_t &= \alpha_0^{(q+1)} r_q + \alpha_1^{(q+1)} \hat{s}_1 + \dots + \alpha_t^{(q+1)} \hat{s}_t,
\end{aligned} \qquad (15)$$
where the $\alpha_j^{(k)}$'s are the coefficients of the polynomials (13) and (14).

The following lemma shows that, given vectors $r_k$ and orthonormal sets $W_m^{(k)}$ constructed according to Lemma 1, the linear operator $\mathcal{A}$, defined by (15) and represented by a matrix $A$ in the canonical basis, generates the desired Krylov residual subspaces given in Eq. (2), and that the spectrum of $A$ can be chosen arbitrarily.

Lemma 2
Let the initial residual vector $r_0$, $\|r_0\| = f(0)$, as well as the residual vectors $r_k$ and orthonormal sets $W_m^{(k)}$, be constructed according to Lemma 1. Let $\mathcal{S}$ be the basis of $\mathbb{C}^n$ defined by Eq. (12). Assume the matrix $A$ to be the representation in the canonical basis of the linear operator $\mathcal{A}$ defined by Eq. (15). Then the linear operator $\mathcal{A}$ generates the Krylov residual subspaces given in Eq. (2). Moreover, $A$ has the prescribed spectrum $\Lambda$.

Proof.
Directly from the definition (15) of the linear operator $\mathcal{A}$, for $k = 1, \dots, q$, we have
$$\operatorname{span}\{\mathcal{A} r_{k-1}, \dots, \mathcal{A}^j r_{k-1}\} = \operatorname{span} W_j^{(k)}, \quad j = 1, \dots, m-1.$$
To see that, for each $k$, $\operatorname{span}\{\mathcal{A} r_{k-1}, \dots, \mathcal{A}^m r_{k-1}\} = \operatorname{span} W_m^{(k)}$, notice that, by Eq. (7), $(-\alpha_0^{(k)} r_k + \alpha_0^{(k)} r_{k-1}) \in \operatorname{span}\{w_m^{(k)}\}$. Thus, given the representation $A$ of the operator $\mathcal{A}$ in the canonical basis, Eq. (2) holds for each $k$, $k = 1, \dots, q$.

To prove that the arbitrarily chosen set $\Lambda$ is the spectrum of $A$, let us consider the matrix $[\mathcal{A}]_{\mathcal{S}}$ of the operator $\mathcal{A}$ in the basis $\mathcal{S}$:
$$[\mathcal{A}]_{\mathcal{S}} = \begin{pmatrix}
C^{(1)} & & & \\
E^{(1)} & C^{(2)} & & \\
& \ddots & \ddots & \\
& & E^{(q)} & C^{(q+1)}
\end{pmatrix}, \qquad (16)$$
where, for $k = 1, \dots, q$, $C^{(k)}$ is the $m$-by-$m$ companion matrix
$$C^{(k)} = \begin{pmatrix}
0 & \cdots & 0 & \alpha_0^{(k)} \\
1 & & & \alpha_1^{(k)} \\
& \ddots & & \vdots \\
& & 1 & \alpha_{m-1}^{(k)}
\end{pmatrix},$$
$C^{(q+1)}$ is the analogous $(t+1)$-by-$(t+1)$ companion matrix built from the $\alpha_j^{(q+1)}$'s, and each block $E^{(k)}$ is zero except for the entry $-\alpha_0^{(k)}$ in its upper-right corner. The matrix $[\mathcal{A}]_{\mathcal{S}}$ has a block lower triangular structure; hence its spectrum is the union of the eigenvalues of all diagonal blocks, which are the companion matrices corresponding to the sets $\Lambda_k$ with characteristic polynomials defined in (13) and (14). Thus, the spectrum of $\mathcal{A}$ is $\Lambda$. $\Box$

Finally, we define $A$ as the representation of the operator $\mathcal{A}$ in the canonical basis $\{e_1, e_2, \dots, e_n\}$:
$$A = S [\mathcal{A}]_{\mathcal{S}} S^{-1}, \qquad (17)$$
where the square matrix $S$ is formed by the vectors given in Eq. (12) written as columns and $[\mathcal{A}]_{\mathcal{S}}$ is defined by Eq. (16). The constructed matrix $A$ provides the prescribed norms of the residual vectors at the first $q$ GMRES($m$) cycles when starting with $r_0$, and its spectrum is $\Lambda$.

For the reader familiar with the work of Greenbaum, Pták, and Strakoš [12], it might be tempting to obtain the present result by pursuing the following scheme: fix $r_0$ and then consider the first restarted GMRES cycle as the initial part of a full GMRES run where the convergence is prescribed for the first $m$ iterations (and arbitrarily set for the remaining $n - m$ iterations). Then, similarly, given the starting residual vector $r_1$ provided by this first cycle, construct the next Krylov residual subspace which provides the desired convergence following the scheme of Greenbaum, Pták, and Strakoš [12]. Proceed identically for the remaining cycles. This approach, however, does not guarantee the linear independence of the set $S$, and hence one meets the problem of defining the linear operator $\mathcal{A}$. These considerations were the reason for the assumption (7) on the residual reduction inside a cycle, which allowed us to justify quite easily the linear independence of the set $S$ and, as well, to control the spectrum.

3 Generating stagnating examples of restarted GMRES
Theorem 2 handles the case of a decreasing positive sequence $\{f(k)\}_{k=0}^{q}$. In this section, we are concerned with the stagnation case: $f(0) > f(1) > \dots > f(s) > 0$ and $f(s) = f(s+1) = \dots = f(q)$.

Theorem 3

Given a matrix order $n$, a restart parameter $m$ ($m < n$), and a positive sequence $\{f(k)\}_{k=0}^{q}$ which is either decreasing, or such that $f(0) > f(1) > \dots > f(s) > 0$ and $f(s) = f(s+1) = \dots = f(q)$, where $q < n/m$, $s < q$, there exists an $n$-by-$n$ matrix $A$ and a vector $r_0$ with $\|r_0\| = f(0)$ such that $\|r_k\| = f(k)$, $k = 1, \dots, q$, where $r_k$ is the residual at cycle $k$ of restarted GMRES with restart parameter $m$ applied to the linear system $Ax = b$, with initial residual $r_0 = b - Ax_0$. Moreover, the matrix $A$ can be chosen to have any desired eigenvalues.

Proof.
The decreasing convergence case is handled by Theorem 2. Therefore, we only need to construct a matrix $A$ with spectrum $\Lambda$ and an initial residual vector $r_0$, $\|r_0\| = f(0)$, for which restarted GMRES stagnates at cycle $s + 1$ while $\|r_1\| = f(1) > \dots > \|r_s\| = f(s)$, $s < q$.

By Lemma 1, given the initial residual vector $r_0$, $\|r_0\| = f(0)$, we can construct residual vectors $r_k$ with the prescribed norms $f(k)$ and orthonormal sets $W_m^{(k)}$, $k = 1, \dots, s$, such that the set
$$\{r_0, W_{m-1}^{(1)}, \dots, r_{s-1}, W_{m-1}^{(s)}, r_s\} \qquad (18)$$
is linearly independent. In order to enforce stagnation at the $(s+1)$-st GMRES($m$) cycle, we want the next orthonormal set $W_m^{(s+1)}$ to be orthogonal to the residual vector $r_s$. (See Eq. (6) or (7).) Thus, following the pattern of Lemma 1, we choose $W_m^{(s+1)}$ from the orthogonal complement of the set (18) and append $W_{m-1}^{(s+1)}$ to the set (18), obtaining the linearly independent set
$$\{r_0, W_{m-1}^{(1)}, \dots, r_{s-1}, W_{m-1}^{(s)}, r_s, W_{m-1}^{(s+1)}\}. \qquad (19)$$

At this point, if we followed the proof of Lemma 1, we would append the new residual vector $r_{s+1}$ to the set (19). Since $r_s = r_{s+1}$, this would result in the loss of the linear independence of our set. Instead, we would like to expand the set (19) by some vector that will not spoil the linear independence and will allow for a proper definition of the operator $\mathcal{A}$ at the second step of the proof. To fulfill this task, we choose this vector to be $w_m^{(s+1)} + r_s$ and append it to (19). We obtain the set
$$\{r_0, W_{m-1}^{(1)}, \dots, r_{s-1}, W_{m-1}^{(s)}, r_s, W_{m-1}^{(s+1)}, w_m^{(s+1)} + r_s\}, \qquad (20)$$
which is linearly independent, since the vector $w_m^{(s+1)} + r_s$ has the component $w_m^{(s+1)}$ from the orthogonal complement of (19) and hence cannot be represented as a linear combination of vectors in this set.

Expanding (20) with vectors $\widehat{S} = \{\hat{s}_1, \dots, \hat{s}_t\}$, we finally construct the basis of $\mathbb{C}^n$:
$$\widetilde{\mathcal{S}} = \{r_0, W_{m-1}^{(1)}, \dots, r_{s-1}, W_{m-1}^{(s)}, r_s, W_{m-1}^{(s+1)}, w_m^{(s+1)} + r_s, \hat{s}_1, \dots, \hat{s}_t\}, \qquad (21)$$
where $t = n - m(s+1) - 1$. We next define an operator $\mathcal{A}$ with a prescribed spectrum $\Lambda$, represented by the matrix $A$ in the canonical basis, such that Eq. (2) is satisfied for $k = 1, \dots, s+1$. We split $\Lambda$ into the disjoint subsets $\Lambda = \{\Lambda_1, \Lambda_2, \dots, \Lambda_{s+1}, \Lambda_{s+2}\}$, so that each $\Lambda_k$ consists of $m$ sequential elements of $\Lambda$, $k = 1, \dots, s+1$, while the remaining $n - m(s+1)$ elements are included in $\Lambda_{s+2}$. Similarly to (13)-(14), for each $k$ we introduce the polynomials
$$p_k(x) = x^m - \sum_{j=0}^{m-1} \alpha_j^{(k)} x^j, \quad k = 1, \dots, s+1; \qquad (22)$$
$$p_{s+2}(x) = x^{t+1} - \sum_{j=0}^{t} \alpha_j^{(s+2)} x^j, \quad t = n - m(s+1) - 1; \qquad (23)$$
where the roots of each polynomial are the elements of the respective set $\Lambda_k$, $k = 1, \dots, s+2$.

Similarly to (15), we define the operator $\mathcal{A}$ as follows:
$$\begin{aligned}
\mathcal{A} r_{k-1} &= w_1^{(k)}, \\
\mathcal{A} w_1^{(k)} &= w_2^{(k)}, \\
&\;\;\vdots \\
\mathcal{A} w_{m-2}^{(k)} &= w_{m-1}^{(k)}, \\
\mathcal{A} w_{m-1}^{(k)} &= -\alpha_0^{(k)} r_k + \alpha_0^{(k)} r_{k-1} + \alpha_1^{(k)} w_1^{(k)} + \dots + \alpha_{m-1}^{(k)} w_{m-1}^{(k)}, \quad k = 1, \dots, s; \\
\mathcal{A} r_s &= w_1^{(s+1)}, \\
\mathcal{A} w_1^{(s+1)} &= w_2^{(s+1)}, \\
&\;\;\vdots \\
\mathcal{A} w_{m-2}^{(s+1)} &= w_{m-1}^{(s+1)}, \\
\mathcal{A} w_{m-1}^{(s+1)} &= -\alpha_0^{(s+1)} (w_m^{(s+1)} + r_s) + \alpha_0^{(s+1)} r_s + \alpha_1^{(s+1)} w_1^{(s+1)} + \dots + \alpha_{m-1}^{(s+1)} w_{m-1}^{(s+1)}, \\
\mathcal{A} (w_m^{(s+1)} + r_s) &= \hat{s}_1, \\
\mathcal{A} \hat{s}_1 &= \hat{s}_2, \\
&\;\;\vdots \\
\mathcal{A} \hat{s}_{t-1} &= \hat{s}_t, \\
\mathcal{A} \hat{s}_t &= \alpha_0^{(s+2)} (w_m^{(s+1)} + r_s) + \alpha_1^{(s+2)} \hat{s}_1 + \dots + \alpha_t^{(s+2)} \hat{s}_t,
\end{aligned} \qquad (24)$$
where the $\alpha_j^{(k)}$'s are the coefficients of the polynomials (22) and (23). From the definition (24) of the operator $\mathcal{A}$, one can observe that for each $k$, $k = 1, \dots, s+1$,
$$\operatorname{span}\{\mathcal{A} r_{k-1}, \dots, \mathcal{A}^j r_{k-1}\} = \operatorname{span} W_j^{(k)}, \quad j = 1, \dots, m.$$
Thus, given the representation $A$ of the operator $\mathcal{A}$ in the canonical basis, we can guarantee that Eq. (2) holds for each $k$, $k = 1, \dots, s+1$.

Similarly to Eq. (16), the matrix $[\mathcal{A}]_{\widetilde{\mathcal{S}}}$ of the operator $\mathcal{A}$ in the basis $\widetilde{\mathcal{S}}$ is block lower triangular, with each diagonal block being the companion matrix for the corresponding subset $\Lambda_k$ of $\mathcal{A}$'s eigenvalues, where the characteristic polynomials are defined by (22)-(23), and with the $-\alpha_0^{(k)}$'s as subdiagonal elements. The desired matrix $A$ is then obtained by the similarity transformation $A = \widetilde{S} [\mathcal{A}]_{\widetilde{\mathcal{S}}} \widetilde{S}^{-1}$, where the square matrix $\widetilde{S}$ is formed by the set of vectors (21) written as columns. $\Box$

4 Variable restart parameters

The result given by Theorem 3 generalizes to the case when the restart parameter $m$ is not fixed, but varies over the successive cycles, with a priori prescribed restart parameters $m_k$ for the corresponding GMRES($m_k$) cycles.

Corollary 1
Given a matrix order $n$, a sequence $\{m_k\}_{k=1}^{q}$ of restart parameters with $1 \leq m_k \leq n-1$ and $\sum_{k=1}^{q} m_k < n$, and a positive sequence $\{f(k)\}_{k=0}^{q}$ which is either decreasing, or such that $f(0) > f(1) > \dots > f(s) > 0$ and $f(s) = f(s+1) = \dots = f(q)$, where $s < q$, there exists an $n$-by-$n$ matrix $A$ and a vector $r_0$ with $\|r_0\| = f(0)$ such that $\|r_k\| = f(k)$, $k = 1, \dots, q$, where $r_k$ is the residual at cycle $k$ of restarted GMRES with variable restart parameter $m_k$ applied to the linear system $Ax = b$, with initial residual $r_0 = b - Ax_0$. Moreover, the matrix $A$ can be chosen to have any desired eigenvalues.

Proof. The proof follows directly from Lemma 1, Lemma 2, and Theorem 2. Note that the matrix of the constructed operator $\mathcal{A}$ will be block lower triangular with diagonal blocks of sizes $m_k$ (instead of $m$). $\Box$

When constructing a matrix $A$ and an initial residual vector $r_0$ which provide a prescribed decreasing cycle-convergence of GMRES($m$), we note that the last line of the definition (15) of the operator $\mathcal{A}$ implies
$$r_q \in A\mathcal{K}_{t+1}(A, r_q),$$
where $A$ is the representation of the operator $\mathcal{A}$ in the canonical basis and $t = n - mq - 1$. This equality implies that at the end of the $(q+1)$-st cycle GMRES($m$) converges to the exact solution of Eq. (1), i.e., $r_{q+1} = 0$. This fact might seem unnatural and undesirable, e.g., for constructing academic examples. The "drawback", however, can easily be fixed by a slight correction of the basis $\mathcal{S}$, somewhat similarly to how we handled the stagnation case in Theorem 3.

Given residuals $r_k$ and orthonormal sets $W_m^{(k)}$ constructed according to Lemma 1, instead of considering the set $\mathcal{S}$, we consider the following basis of $\mathbb{C}^n$:
$$\widetilde{\mathcal{S}} = \{r_0, w_1^{(1)}, \dots, w_{m-1}^{(1)}, \dots, r_{q-1}, w_1^{(q)}, \dots, w_{m-1}^{(q)}, r_q + \gamma r_{q-1}, \hat{s}_1, \dots, \hat{s}_t\}, \qquad (25)$$
where $\gamma \neq 0, -1$. Here we have substituted the vector $r_q + \gamma r_{q-1}$ for the basis vector $r_q$ in Eq. (12). The vector $r_q + \gamma r_{q-1}$ cannot be represented as a linear combination of the other vectors in $\widetilde{\mathcal{S}}$, since it contains the component $r_q$, which is not represented by these vectors. Hence, $\widetilde{\mathcal{S}}$ is indeed a basis of $\mathbb{C}^n$. Thus we can define the operator $\mathcal{A}$ by its action on $\widetilde{\mathcal{S}}$:
$$\begin{aligned}
\mathcal{A} r_{k-1} &= w_1^{(k)}, \\
\mathcal{A} w_1^{(k)} &= w_2^{(k)}, \\
&\;\;\vdots \\
\mathcal{A} w_{m-2}^{(k)} &= w_{m-1}^{(k)}, \\
\mathcal{A} w_{m-1}^{(k)} &= -\alpha_0^{(k)} r_k + \alpha_0^{(k)} r_{k-1} + \alpha_1^{(k)} w_1^{(k)} + \dots + \alpha_{m-1}^{(k)} w_{m-1}^{(k)}, \quad k = 1, \dots, q-1; \\
\mathcal{A} r_{q-1} &= w_1^{(q)}, \\
\mathcal{A} w_1^{(q)} &= w_2^{(q)}, \\
&\;\;\vdots \\
\mathcal{A} w_{m-2}^{(q)} &= w_{m-1}^{(q)}, \\
\mathcal{A} w_{m-1}^{(q)} &= -\frac{\alpha_0^{(q)}}{1+\gamma} (r_q + \gamma r_{q-1}) + \alpha_0^{(q)} r_{q-1} + \alpha_1^{(q)} w_1^{(q)} + \dots + \alpha_{m-1}^{(q)} w_{m-1}^{(q)}, \\
\mathcal{A} (r_q + \gamma r_{q-1}) &= \hat{s}_1, \\
\mathcal{A} \hat{s}_1 &= \hat{s}_2, \\
&\;\;\vdots \\
\mathcal{A} \hat{s}_{t-1} &= \hat{s}_t, \\
\mathcal{A} \hat{s}_t &= \alpha_0^{(q+1)} (r_q + \gamma r_{q-1}) + \alpha_1^{(q+1)} \hat{s}_1 + \dots + \alpha_t^{(q+1)} \hat{s}_t,
\end{aligned} \qquad (26)$$
where the $\alpha_j^{(k)}$'s are the coefficients of the corresponding characteristic polynomials (13) and (14). The fact that the operator $\mathcal{A}$ produces the correct Krylov residual subspace at cycle $q$, i.e.,
$$\operatorname{span}\{\mathcal{A} r_{q-1}, \dots, \mathcal{A}^m r_{q-1}\} = \operatorname{span} W_m^{(q)},$$
follows from
$$\begin{aligned}
\mathcal{A} w_{m-1}^{(q)} &= -\frac{\alpha_0^{(q)}}{1+\gamma} (r_q + \gamma r_{q-1}) + \alpha_0^{(q)} r_{q-1} + \alpha_1^{(q)} w_1^{(q)} + \dots + \alpha_{m-1}^{(q)} w_{m-1}^{(q)} \\
&= -\frac{\alpha_0^{(q)}}{1+\gamma} \big( r_q - r_{q-1} + (1+\gamma) r_{q-1} \big) + \alpha_0^{(q)} r_{q-1} + \alpha_1^{(q)} w_1^{(q)} + \dots + \alpha_{m-1}^{(q)} w_{m-1}^{(q)} \\
&= -\frac{\alpha_0^{(q)}}{1+\gamma} (r_q - r_{q-1}) + \alpha_1^{(q)} w_1^{(q)} + \dots + \alpha_{m-1}^{(q)} w_{m-1}^{(q)},
\end{aligned}$$
where, by Eq. (26), $\mathcal{A} w_{m-1}^{(q)} = \mathcal{A}^m r_{q-1}$ and, by Eq. (7), $(r_q - r_{q-1}) \in \operatorname{span}\{w_m^{(q)}\}$.

The matrix $[\mathcal{A}]_{\widetilde{\mathcal{S}}}$ of the operator $\mathcal{A}$, defined by Eq. (26), in the basis $\widetilde{\mathcal{S}}$ is identical to Eq. (16), with the only change being the subdiagonal element $-\alpha_0^{(q)}$ replaced by $-\alpha_0^{(q)}/(1+\gamma)$, $\gamma \neq 0, -1$. Hence, $A$ has the desired spectrum $\Lambda$. Thus, finally, according to Eq. (26),
$$r_q \in A\mathcal{K}_{t+1}(A, r_q) + \mathcal{K}_{t+2}(A, r_{q-1}),$$
providing that $r_{q+1}$ is nonzero.

References

[1] J. Baglama, D. Calvetti, G. H. Golub, and L. Reichel. Adaptively preconditioned GMRES algorithms.
SIAM Journal on Scientific Computing, 20(1):243–269, 1999.
[2] A. H. Baker, E. R. Jessup, and Tz. V. Kolev. A simple strategy for varying the restart parameter in GMRES(m). Journal of Computational and Applied Mathematics, 230(2):751–761, 2009.
[3] B. Beckermann, S. A. Goreinov, and E. E. Tyrtyshnikov. Some remarks on the Elman estimate for GMRES. SIAM Journal on Matrix Analysis and Applications, 27(3):772–778, 2005.
[4] A. Chapman and Y. Saad. Deflated and augmented Krylov subspace techniques. Numerical Linear Algebra with Applications, 4(1):43–66, 1997.
[5] I. S. Duff, L. Giraud, J. Langou, and É. Martin. Using spectral low rank preconditioners for large electromagnetic calculations. International Journal for Numerical Methods in Engineering, 62(3):416–434, 2005.
[6] S. C. Eisenstat, H. C. Elman, and M. H. Schultz. Variational iterative methods for nonsymmetric systems of linear equations. SIAM Journal on Numerical Analysis, 20:345–357, 1983.
[7] H. C. Elman. Iterative methods for large sparse nonsymmetric systems of linear equations. PhD thesis, Yale University, New Haven, CT, 1982.
[8] M. Embree. How descriptive are GMRES convergence bounds? Technical Report 99/08, Oxford University Computing Laboratory, 1999.
[9] M. Embree. The tortoise and the hare restart GMRES. SIAM Review, 45(2):259–266, 2003.
[10] J. Erhel, K. Burrage, and B. Pohl. Restarted GMRES preconditioned by deflation. Journal of Computational and Applied Mathematics, 69:303–318, 1996.
[11] A. Greenbaum. Iterative Methods for Solving Linear Systems. SIAM, 1997.
[12] A. Greenbaum, V. Pták, and Z. Strakoš. Any nonincreasing convergence curve is possible for GMRES. SIAM Journal on Matrix Analysis and Applications, 17(3):465–469, 1996.
[13] R. B. Morgan. GMRES with deflated restarting. SIAM Journal on Scientific Computing, 24(1):20–37, 2002.
[14] N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen. How fast are nonsymmetric matrix iterations? SIAM Journal on Matrix Analysis and Applications, 13(3):778–795, 1992.
[15] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3):856–869, 1986.
[16] V. Simoncini and D. B. Szyld. New conditions for non-stagnation of minimal residual methods. Numerische Mathematik, 109(3):477–487, 2008.
[17] V. Simoncini and D. B. Szyld. On the occurrence of superlinear convergence of exact and inexact Krylov subspace methods. SIAM Review, 47:247–272, 2005.
[18] H. A. van der Vorst and C. Vuik. The superlinear convergence behaviour of GMRES. Journal of Computational and Applied Mathematics, 48(3):327–341, 1993.
[19] E. Vecharynski and J. Langou. The cycle-convergence of restarted GMRES for normal matrices is sublinear. SIAM Journal on Scientific Computing, to appear.
[20] B. Zhong and R. B. Morgan. Complementary cycles of restarted GMRES. Numerical Linear Algebra with Applications, 15(6):559–571, 2008.
[21] J. Zítko. Generalization of convergence conditions for a restarted GMRES. Numerical Linear Algebra with Applications, 7(3):117–131, 2000.
[22] J. Zítko. Some remarks on the restarted and augmented GMRES method.