A Provably Componentwise Backward Stable O(n^2) QR Algorithm for the Diagonalization of Colleague Matrices
The roots of a monic polynomial expressed in a Chebyshev basis are known to be the eigenvalues of the so-called colleague matrix, which is a Hessenberg matrix that is the sum of a symmetric tridiagonal matrix and a rank-1 matrix. The rootfinding problem is thus reformulated as an eigenproblem, making the computation of the eigenvalues of such matrices a subject of significant practical importance. In this manuscript, we describe an $O(n^2)$ explicit structured QR algorithm for colleague matrices and prove that it is componentwise backward stable, in the sense that the backward error in the colleague matrix can be represented as relative perturbations to its components. A recent result of Noferini, Robol, and Vandebril shows that componentwise backward stability implies that the backward error $\delta c$ in the vector $c$ of Chebyshev expansion coefficients of the polynomial has the bound $\|\delta c\| \lesssim \|c\| u$, where $u$ is machine precision. Thus, the algorithm we describe has both the optimal backward error in the coefficients and the optimal cost $O(n^2)$. We illustrate the performance of the algorithm with several numerical examples.
K. Serkh†♦ and V. Rokhlin‡⊕

February 25, 2021
♦ This author's work was supported in part by the NSERC Discovery Grants RGPIN-2020-06022 and DGECR-2020-00356. ⊕ This author's work was supported in part under ONR N00014-18-1-2353 and NSF DMS-1952751. † Dept. of Math. and Computer Science, University of Toronto, Toronto, ON M5S 2E4. ‡ Dept. of Mathematics, Yale University, New Haven, CT 06511.

Contents
4.2 Backward Error Analysis of the QR Algorithms
5.1 p_rand(x): Polynomials with Random Coefficients
5.2 p_wilk(x): Wilkinson's Polynomial
5.3 f_sin(x): A Smooth Function
5.4 p_mult(x): A Polynomial with Multiple Roots
5.5 p_yuji(x): A Pathological Example from [26]
5.6 f_cas(x): A Pathological Example from [14]
5.7 CPU Times

1 Introduction

The problem of finding the roots of the polynomial
$$p(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1} + x^n \qquad (1)$$
is one of the oldest and most classical problems in mathematics. Countless methods have been proposed for its solution (see, for example, [28] for a history, and the two volumes [24] and [25] for a detailed account of such methods). In the 1800s, it was observed by Frobenius that the roots of the polynomial are the eigenvalues of a certain matrix called the companion matrix, formed using the polynomial coefficients. A matrix whose eigenvalues are the roots of $p(x)$ is called a linearization of $p(x)$. Given a linearization of $p(x)$, the roots of the polynomial can thus be recovered by computing the eigenvalues of the matrix.
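For concreteness, the companion-matrix linearization can be sketched in a few lines of NumPy. This is only an illustration of the idea; the helper `companion` and the coefficient ordering below are our own conventions, not notation from this manuscript.

```python
import numpy as np

def companion(c):
    """Companion matrix of the monic polynomial
    p(x) = c[0] + c[1] x + ... + c[n-1] x^(n-1) + x^n."""
    n = len(c)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)      # subdiagonal of ones
    C[:, -1] = -np.asarray(c)       # last column holds -c_i
    return C

# roots of p(x) = (x - 1)(x - 2)(x - 3) = -6 + 11x - 6x^2 + x^3
C = companion([-6.0, 11.0, -6.0])
roots = np.sort(np.linalg.eigvals(C).real)
print(roots)   # approximately [1. 2. 3.]
```

The eigenvalues of the companion matrix are exactly the roots of the monic polynomial, which is the sense in which it is a linearization.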
If the roots of the polynomial are found numerically, then the computed roots can be viewed as the exact roots of a perturbed polynomial $p(x) + \delta p(x)$ with coefficients $c_i + \delta c_i$, where the size of the vector $\delta c$ is called the backward error in the polynomial coefficients. The best possible bound on the backward error that a linearization method can have for general polynomials is
$$\|\delta c\| \lesssim \|c\| u, \qquad (2)$$
or, in other words, that the relative normwise backward error in the polynomial coefficients is bounded by machine precision $u$ (see, for example, [26]). The backward error in the companion matrix method was revealed by the influential paper [19], which analyzed the relationship between perturbations in the companion matrix and perturbations in the polynomial coefficients. There, the authors proved that if the companion matrix $C$ is perturbed by the matrix $E$, then the matrix $C + E$ is a linearization of the polynomial with perturbed coefficients $c_i + \delta c_i$, and that the perturbation satisfies the normwise bound
$$\|\delta c\| \lesssim \|c\| \|E\|. \qquad (3)$$
If the eigenvalues are computed by a standard QR algorithm, which is known to be backward stable (see, for example, [35]), then the computed eigenvalues are the exact eigenvalues of $C + E$, where $\|E\| \lesssim \|C\| u$. Since $\|C\| \approx \|c\|$, it follows that the backward error in the polynomial coefficients is bounded by
$$\|\delta c\| \lesssim \|c\|^2 u. \qquad (4)$$
Thus, as $\|c\|$ gets larger, the relative backward error in the coefficients increases.
The companion matrix method, at first glance, would appear then to have two drawbacks: it falls short of the optimal backward error bound (2), and it costs $O(n^3)$ operations as a result of using the QR algorithm. The situation improved dramatically in 2007, when Bini, Eidelman, Gemignani, and Gohberg published a paper [10] describing a stable, $O(n^2)$ explicit QR method for companion matrices (around the same time, Chandrasekaran, Gu, Xia, and Zhu also discovered an $O(n^2)$ method for companion matrices, see [15]). The algorithm is based on the observation that the companion matrix and its QR iterates have a certain structure which allows them to be represented by a collection of $O(n)$ parameters called generators (specifically, the companion matrix is a Hessenberg matrix that is the sum of a unitary matrix and a rank-1 perturbation; matrices of this form are called fellow matrices). In 2010, an implicit version of this algorithm, also stable and $O(n^2)$, and also based on generators, was introduced in [9]. Around the same time, Van Barel, Vandebril, Van Dooren, and Frederix discovered in [8] an alternative stable, $O(n^2)$ implicit QR algorithm based on representing the unitary part by so-called core transformations, which are rotation matrices acting only on two adjacent rows at a time (see, for example, [2]). The first example of a proof of backward stability for an implicit $O(n^2)$ QR algorithm for companion matrices was given by Aurentz, Mach, Vandebril, and Watkins in [4]; this algorithm is again based on core transformations. The backward stability result accompanying this QR algorithm guarantees only the sub-optimal bound (4), but has the optimal complexity of $O(n^2)$. Amazingly, the authors then discovered that the algorithm they had constructed, with some minor modifications, actually yields the optimal bound (2) in practice.
An investigation showed that the reason for this remarkable behavior is that their algorithm is not just backward stable, but is componentwise backward stable, meaning that the backward error in the companion matrix can be decomposed into proportional backward errors in each of its components. They published a proof of the componentwise backward stability of their algorithm, together with a proof that componentwise backward stability guarantees the bound (2), in [3], along with numerical experiments.

Thus, if the coefficients of the polynomial $p(x)$ in the monomial basis are known, then the algorithm of [3] is optimal in both error and time complexity. However, if the coefficients are not known, then the companion matrix cannot be used to find the roots accurately, since the relationship between the values of the polynomial $p(x)$ and the coefficients of its monomial expansion is highly unstable (this fact has been known for many decades, at least as early as Wilkinson [31]). If the polynomial $p(x)$ is instead expanded in a basis of Chebyshev polynomials
$$p(x) = c_0 + c_1 T_1(x) + \cdots + c_{n-1} T_{n-1}(x) + T_n(x), \qquad (5)$$
where $T_i(x)$ is the Chebyshev polynomial of order $i$, then the relationship between the coefficients and the polynomial is perfectly stable (see, for example, [36]). In fact, this observation is the basis for the Chebfun software package (see [6] and [16]). An analogue of the companion matrix, constructed from the Chebyshev expansion coefficients, was discovered in 1961 by Good [21], who called it the colleague matrix, and independently by Specht in 1957 [33]–[34]. The first $O(n^2)$ algorithm for colleague matrices was discovered by Bini, Gemignani, and Pan in 2005 (even before [10] appeared) and is a stable, explicit QR algorithm based on generators [11].
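Rootfinding directly from Chebyshev coefficients is easy to try in practice; the sketch below is our own NumPy illustration (the chosen roots are arbitrary). To our understanding, NumPy's `chebroots` obtains the roots as the eigenvalues of a scaled companion matrix of exactly this colleague type.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

# Chebyshev-basis coefficients of p(x) = (x + 0.25)(x - 0.5)(x - 0.9)
c = cheb.chebfromroots([-0.25, 0.5, 0.9])

# chebroots computes eigenvalues of a symmetrized (colleague-type)
# companion matrix built from the Chebyshev coefficients c
roots = np.sort(cheb.chebroots(c))
print(roots)   # approximately [-0.25  0.5   0.9 ]
```

The roots are recovered accurately from the Chebyshev coefficients alone, without ever forming the monomial expansion.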
Like the companion matrix, the colleague matrix has a special structure that is preserved over QR iterations (specifically, the colleague matrix is a Hessenberg matrix that is the sum of a Hermitian matrix and a rank-1 perturbation). In 2008, Eidelman, Gemignani, and Gohberg, in [20], introduced a stable, $O(n^2)$ implicit QR algorithm. The relationship between the backward error in the Chebyshev expansion coefficients and perturbations to the colleague matrix was first investigated by Nakatsukasa and Noferini in [26], where the authors found a lower bound for the backward error in the coefficients, showing that a backward stable QR algorithm can do no better than (4) (around the same time, Lawrence, Van Barel, and Van Dooren published a general analysis in [23], where they also proved a lower bound for colleague matrices). In [30], Pérez and Noferini improved on this result and found an upper bound as well, proving that if the perturbation to the colleague matrix is small, then the bound (2) is achieved. The relationship between componentwise perturbations to the colleague matrix and the backward error in the coefficients was described completely in 2019 by Noferini, Robol, and Vandebril in [27].

Recently (see [32] and [14]), it was observed that certain $O(n^2)$ structured QR algorithms for colleague matrices are surprisingly stable, attaining the bound (2) in many cases, an observation that mirrors the discovery in [3] for the case of companion matrices. However, unlike in [3], all previously proposed $O(n^2)$ structured QR algorithms for colleague matrices have polynomials for which the worst-case bound (4) is attained. Thus, the question of whether or not there exists a structured $O(n^2)$ QR algorithm that, when used to find the roots of a colleague matrix, attains the optimal bound (2), has remained open.
In this manuscript, we answer this question in the affirmative by presenting a new, explicit $O(n^2)$ QR algorithm for colleague matrices (in fact, for all Hessenberg matrices that have a Hermitian plus rank-1 structure), and prove that our algorithm is componentwise backward stable. Combined with the result in [27], this amounts to a proof that the optimal bound (2) is attained for all polynomials $p(x)$. We demonstrate that this is indeed the case with several numerical experiments.

The structure of this manuscript is as follows. Section 2 describes the mathematical and numerical preliminaries. Section 3 describes the algorithm, and explains the significance of each step. In Section 4, we prove rigorously that the algorithm is componentwise backward stable. Section 5 presents the results of several numerical experiments. In Section 6, we discuss possible extensions and generalizations of the algorithm.

2 Mathematical and Numerical Preliminaries

In this section, we describe the mathematical and numerical preliminaries.
The following lemma states that if the sum of a Hermitian matrix and a rank-1 update $pq^*$ is lower Hessenberg, then the matrix is determined entirely by its diagonal and superdiagonal together with the vectors $p$ and $q$.

Lemma 2.1 (Eidelman, Gemignani, Gohberg [20]). Suppose that $A \in \mathbb{C}^{n \times n}$ is Hermitian, and let $d$ and $\beta$ denote the diagonal and superdiagonal of $A$, respectively. Suppose that $p, q \in \mathbb{C}^n$ and that $A + pq^*$ is lower Hessenberg. Then
$$a_{i,j} = \begin{cases} -p_i q_j^* & \text{if } j > i+1, \\ \beta_i & \text{if } j = i+1, \\ d_i & \text{if } j = i, \\ \beta_j^* & \text{if } j = i-1, \\ -q_i p_j^* & \text{if } j < i-1, \end{cases} \qquad (6)$$
where $a_{i,j}$ denotes the $(i,j)$-th entry of $A$.

The following lemma states that if the sum of a matrix and a rank-1 update $pq^*$ is lower triangular, then the upper Hessenberg part of the matrix is determined entirely by its diagonal and subdiagonal, together with the vectors $p$ and $q$.

Lemma 2.2.
Suppose that $B \in \mathbb{C}^{n \times n}$, and let $d$ and $\gamma$ denote the diagonal and subdiagonal of $B$, respectively. Suppose that $p, q \in \mathbb{C}^n$ and that $B + pq^*$ is lower triangular. Then
$$b_{i,j} = \begin{cases} -p_i q_j^* & \text{if } j > i, \\ d_i & \text{if } j = i, \\ \gamma_j & \text{if } j = i-1, \end{cases} \qquad (7)$$
where $b_{i,j}$ denotes the $(i,j)$-th entry of $B$.

The following definition introduces two matrix seminorms that we will need in our error analysis.

Definition 2.1.
Suppose that $A \in \mathbb{C}^{n \times n}$ and let $a_{i,j}$ denote the $(i,j)$-th entry of $A$. We will use the notation $\|\cdot\|_H$ to denote the square root of the sum of squares of the entries in the upper Hessenberg part of a matrix, so that
$$\|A\|_H = \sqrt{\sum_{j \geq i-1} |a_{i,j}|^2}. \qquad (8)$$
Likewise, we will use the notation $\|\cdot\|_T$ to denote the square root of the sum of squares of the entries in the upper triangular part, so that
$$\|A\|_T = \sqrt{\sum_{j \geq i} |a_{i,j}|^2}. \qquad (9)$$

The following is a straightforward lemma stating that if a certain sequence of transformations is applied to a matrix on the right, then the upper triangular part of the result is determined by only the upper Hessenberg part of the original matrix.

Lemma 2.3.
Suppose that $B \in \mathbb{C}^{n \times n}$, and let $P_2, P_3, \ldots, P_n \in \mathbb{C}^{n \times n}$ be matrices such that $P_k$ only affects the $(k-1,k)$-plane of any vector it is applied to. Define $P \in \mathbb{C}^{n \times n}$ by the formula $P = P_2 P_3 \cdots P_n$. Then the upper triangular part of $BP^*$ is determined entirely by the upper Hessenberg part of $B$. Furthermore, if $P_2, P_3, \ldots, P_n$ are unitary, then $\|BP^*\|_T \leq \|B\|_H$.

The following definition introduces the notation used in the error analysis that appears in this manuscript. We follow the notation used in [22] and [3].
Definition 2.2.
Evaluation of an expression in floating point arithmetic is denoted by $\mathrm{fl}(\cdot)$, and we denote the unit roundoff (or machine epsilon) by $u$. We assume that
$$\mathrm{fl}(x \;\mathrm{op}\; y) = (x \;\mathrm{op}\; y)(1 + \delta), \quad |\delta| \leq u, \qquad (10)$$
where op stands for any of the basic arithmetic operations $+, -, *, /$. We denote computed quantities by a hat, so that $\hat{x}$ denotes the computed approximation to $x$. We use the notation $\lesssim$ to mean "less than or equal to the right hand side times a modest multiplicative constant depending on $n$ as a low-degree polynomial", where the meaning of $n$ is clear from the context. Whenever a matrix or vector norm appears to the left or right of $\lesssim$, we omit the particular choice of norm, since in finite dimensions all norms are equivalent. When $u$ appears in an expression on the right hand side of $\lesssim$, we ignore all higher order powers of $u$.

The following lemma bounds the forward error of the floating point computation of a complex plane rotation (see, for example, § 20 of [37]).

Lemma 2.4.
Suppose that $x = (x_1, x_2)^T \in \mathbb{C}^2$, and let $Q \in \mathrm{SU}(2)$ be the complex rotation matrix which eliminates the first entry, so that $(Qx)_1 = 0$. Let $\hat{Q} \in \mathbb{C}^{2 \times 2}$ be the floating point matrix defined by
$$\hat{Q}_{1,1} = c, \quad \hat{Q}_{1,2} = -s, \qquad (11)$$
$$\hat{Q}_{2,1} = s^*, \quad \hat{Q}_{2,2} = c^*, \qquad (12)$$
where $c = \mathrm{fl}\bigl(x_2 / \sqrt{|x_1|^2 + |x_2|^2}\bigr)$ and $s = \mathrm{fl}\bigl(x_1 / \sqrt{|x_1|^2 + |x_2|^2}\bigr)$, and where $c = 1$ and $s = 0$ if $\|x\| = 0$. Then
$$\|\hat{Q} - Q\| \lesssim u. \qquad (14)$$

The following lemma estimates the forward error of applying a plane rotation to a vector (see, for example, § 21 of [37]).
Lemma 2.5.
Suppose that $Q \in \mathrm{SU}(2)$ is a complex rotation matrix, and let $\hat{Q}$ be a floating point approximation to $Q$ satisfying (14). Suppose further that $x = (x_1, x_2)^T \in \mathbb{C}^2$. Then
$$\|\mathrm{fl}(\hat{Q}x) - Qx\| \lesssim \|x\| u. \qquad (15)$$

Suppose that $p(x)$ is a monic polynomial of order $n$ represented by
$$p(x) = \sum_{j=0}^{n} c_j T_j(x), \qquad (16)$$
where $c_j \in \mathbb{R}$, $c_n = 1$, and $T_j(x)$ is the Chebyshev polynomial of order $j$. It turns out that the roots of $p(x)$ are the eigenvalues of the (scaled) $n \times n$ colleague matrix
$$C = \begin{pmatrix} 0 & \frac{1}{\sqrt{2}} & & & \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{2} & & \\ & \frac{1}{2} & 0 & \ddots & \\ & & \ddots & \ddots & \frac{1}{2} \\ & & & \frac{1}{2} & 0 \end{pmatrix} - \frac{1}{2}\, e_n \begin{pmatrix} \sqrt{2}\, c_0 & c_1 & c_2 & \cdots & c_{n-1} \end{pmatrix}, \qquad (17)$$
where $e_n$ is the $n$-th unit vector (see, for example, [21]). A matrix $C$ whose eigenvalues are the roots of $p(x)$ is called a linearization of $p(x)$. Letting
$$A = \begin{pmatrix} 0 & \frac{1}{\sqrt{2}} & & & \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{2} & & \\ & \frac{1}{2} & 0 & \ddots & \\ & & \ddots & \ddots & \frac{1}{2} \\ & & & \frac{1}{2} & 0 \end{pmatrix} \qquad (18)$$
and
$$q^* = -\frac{1}{2} \begin{pmatrix} \sqrt{2}\, c_0 & c_1 & c_2 & \cdots & c_{n-1} \end{pmatrix}, \qquad (19)$$
we see that the colleague matrix $C$ can be written as
$$C = A + e_n q^*, \qquad (20)$$
where $A$ is Hermitian and $C$ is lower Hessenberg.

The following beautiful theorem by Noferini, Robol, and Vandebril (see Corollary 5.4 of [27]) bounds the change in the coefficients of the polynomial being linearized by the componentwise perturbations of the colleague matrix $C = A + e_n q^*$.

Theorem 2.6.
Let $C = A + e_n q^*$ be the linearization (17) of the monic polynomial $p(x)$, expressed in the Chebyshev basis. Consider the perturbations $\|\delta A\| \leq \epsilon_A$, $\|\delta e_n\| \leq \epsilon_n$, and $\|\delta q\| \leq \epsilon_q$. Then, the matrix
$$C + \delta C = A + \delta A + (e_n + \delta e_n)(q + \delta q)^* \qquad (21)$$
is a linearization of the polynomial
$$p(x) + \delta p(x) = \sum_{j=0}^{n} (c_j + \delta c_j) T_j(x), \qquad (22)$$
where $\|\delta c\| \lesssim \epsilon_n + \epsilon_q + \|c\| \epsilon_A$.

Suppose that $p(x)$ is a monic polynomial of order $n$ represented in the Chebyshev basis
$$p(x) = \sum_{j=0}^{n} c_j T_j(x), \qquad (23)$$
where $c_j \in \mathbb{R}$, $c_n = 1$, and $T_j(x)$ is the Chebyshev polynomial of order $j$, and let the roots of $p(x)$ be denoted by $x_1, x_2, \ldots, x_n \in \mathbb{C}$. Suppose that a rootfinding algorithm returns the computed roots $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n \in \mathbb{C}$. If the computed roots are the exact roots of some perturbed polynomial
$$p(x) + \delta p(x) = \sum_{j=0}^{n} (c_j + \delta c_j) T_j(x), \qquad (24)$$
where
$$\frac{\|\delta c\|}{\|c\|} \lesssim u, \qquad (25)$$
then we say that the rootfinding algorithm is backward stable. In fact, this is the best backward stability bound that can be hoped for, for general polynomials $p(x)$ (see the discussion in Appendix A of [26]).

Remark 2.1. Suppose that $p(x)$ is a polynomial of order $n$, that is not monic, expressed in the Chebyshev basis
$$p(x) = \sum_{j=0}^{n} a_j T_j(x), \qquad (26)$$
where $a_j \in \mathbb{R}$ and $T_j(x)$ is the Chebyshev polynomial of order $j$. Clearly, the roots of $p(x)$ are identical to the roots of $p(x)/a_n$. Let $c_j = a_j/a_n$, for $j = 0, 1, \ldots, n$.
If a backward stable rootfinding algorithm is applied to the monic polynomial $p(x)/a_n$, then, letting $\delta a = \delta c \cdot a_n$, the algorithm is also backward stable with respect to the original coefficients $a_j$, since
$$\frac{\|\delta a\|}{\|a\|} = \frac{|a_n| \|\delta c\|}{|a_n| \|c\|} = \frac{\|\delta c\|}{\|c\|} \lesssim u. \qquad (27)$$

When linearization is used as a rootfinding algorithm, the stability of the computed roots comes from the stability of the eigenvalue algorithm applied to the colleague matrix $C$. If the eigenvalues of $C$ are computed by an unstructured QR algorithm, then the backward error $\delta C$ on $C$ is bounded by $\|\delta C\| \lesssim \|C\| u$. Since the backward error is unstructured, it follows that $\|\delta A\| \approx \|C\| u$, so, by Theorem 2.6 together with the fact that $\|C\| \approx \|c\|$, the backward error in the coefficients is bounded by $\|\delta c\| \lesssim \|c\|^2 u$.

Remark 2.2.
This backward error can be reduced by partially balancing the matrix $C$. Suppose that, instead of computing the eigenvalues of $C$, we compute the eigenvalues of
$$\tilde{C} = \begin{pmatrix} 0 & \frac{1}{\sqrt{2}} & & & \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{2} & & \\ & \frac{1}{2} & 0 & \ddots & \\ & & \ddots & \ddots & \frac{\|c\|^{1/2}}{2} \\ & & & \frac{1}{2\|c\|^{1/2}} & 0 \end{pmatrix} - \frac{1}{2\|c\|^{1/2}}\, e_n \begin{pmatrix} \sqrt{2}\, c_0 & c_1 & c_2 & \cdots & \|c\|^{1/2} c_{n-1} \end{pmatrix}. \qquad (28)$$
Provided that the entry in the $(n,n)$-position is small, we have that $\|\tilde{C}\| \approx \|c\|^{1/2}$, so the backward error $\delta \tilde{C}$ of unstructured QR is bounded by $\|\delta \tilde{C}\| \lesssim \|c\|^{1/2} u$. In practice, it turns out that $\|\delta A\| \approx \|\delta \tilde{C}\| \lesssim \|c\|^{1/2} u$, so $\|\delta c\| \lesssim \|c\|^{3/2} u$. The assumption that the $(n,n)$-th element is small is not always satisfied. However, usually the norm of $c$ is large because the last coefficient $a_n$ in the non-monic expansion (26) is small. When this is the case, we can simply raise the order of the expansion by one by taking an additional term. The last two terms will both be small and roughly the same size, making the $(n,n)$-th element small. Notice also that, by adjusting the last row, the matrix $\tilde{C}$ can be represented as a symmetric tridiagonal matrix of magnitude $\|c\|^{1/2}$ plus a rank-1 matrix of magnitude $\|c\|^{1/2}$.

Remark 2.3.
Let $\mathrm{bal}(C)$ denote the matrix $C$ after complete balancing. Remarkably, in some situations, $\|\mathrm{bal}(C)\|$ is of moderate size even when $\|c\|$ is large. Thus, complete balancing can completely eliminate large entries in $C$, at the expense of destroying its symmetric tridiagonal plus rank-1 structure. See Remark 5.1, as well as the paper [29], for a more detailed discussion.

Remark 2.4. While unstructured QR, applied to the colleague matrix, is known to achieve only the backward error bound $\|\delta c\| \lesssim \|c\|^2 u$, the QZ algorithm, applied to an appropriately scaled matrix pencil, does result in a backward stable rootfinder with the bound $\|\delta c\| \lesssim \|c\| u$ (see, for example, [26]). This is because the eigenvalue problem for the colleague matrix can be written as a matrix pencil $A - \lambda B$, where both $A$ and $B$ are small, and a backward stable QZ algorithm applied to the pencil computes the exact eigenvalues of a perturbed pencil $(A + \delta A) - \lambda (B + \delta B)$, where $\|\delta A\| \lesssim \|A\| u$ and $\|\delta B\| \lesssim \|B\| u$. Unfortunately, it appears to be very difficult to construct structured, $O(n^2)$ QZ algorithms for colleague matrices that retain the nice stability properties of the unstructured, $O(n^3)$ QZ algorithm.

The following theorem, stated in a slightly different form in [27], says that if the eigenvalues of the colleague matrix $C = A + e_n q^*$ are computed using a componentwise backward stable algorithm, then linearization is backward stable as a rootfinding algorithm. It follows immediately from Theorem 2.6.

Theorem 2.7.
Suppose that the eigenvalues of the colleague matrix $C = A + e_n q^*$ are computed by a componentwise backward stable algorithm, in the sense that the computed eigenvalues are the exact eigenvalues of the matrix
$$C + \delta C = A + \delta A + (e_n + \delta e_n)(q + \delta q)^*, \qquad (29)$$
where $\|\delta A\| \lesssim \|A\| u \approx u$, $\|\delta e_n\| \lesssim \|e_n\| u \approx u$, and $\|\delta q\| \lesssim \|q\| u$. Then, linearization is backward stable as a rootfinding algorithm, with $\|\delta c\| \lesssim \|c\| u$.

It was pointed out to the authors that, while the Hessenberg matrices in this manuscript are all lower Hessenberg, the standard convention in numerical linear algebra is to study the transpose of the problem, and consider only upper Hessenberg matrices (see [18]). The upper Hessenberg form is much better notationally since, in upper Hessenberg form, the first elimination step eliminates the entry in the $(2,1)$-position, while in lower Hessenberg form the entry in the $(n-1,n)$-position is eliminated first. Furthermore, the upper Hessenberg form is more convenient when representing polynomials in the Lagrange basis (see, for example, [17]). Unfortunately, at the time that this was all pointed out, most of the writing and numerical codes were complete, and had been written in lower Hessenberg form because of a historical fluke related to the structure of old explicit QR codes that were used as a template for our algorithm.

3 The Algorithm

In this section, we give an overview of our algorithm. We begin by describing the class of matrices our algorithm can be applied to. Let $F_n \subset \mathbb{C}^{n \times n}$ be the set of lower Hessenberg matrices of the form
$$A + pq^*, \qquad (30)$$
where $A \in \mathbb{C}^{n \times n}$ is Hermitian and $p, q \in \mathbb{C}^n$. Eidelman, Gemignani, and Gohberg observed in [20] that the matrix $A$ is determined entirely by:

1. The diagonal entries $d_i = a_{i,i}$, for $i = 1, 2, \ldots, n$;
2. The superdiagonal entries $\beta_i = a_{i,i+1}$, for $i = 1, 2, \ldots, n-1$;
3. The vectors $p$ and $q$

(see Lemma 2.1).
Following [20], we call these four vectors the basic elements or generators of $A$. In [20], the authors construct an implicit QR algorithm that takes advantage of this structure to achieve a cost of $O(n^2)$. They also prove that their algorithm is backward stable, in the sense that, when the algorithm is used to compute the eigenvalues of a matrix $C \in F_n$, the computed eigenvalues are the exact eigenvalues of $C + \delta C$, where $\|\delta C\| \lesssim \|C\| u$. This is the same backward stability bound that is provided by an unstructured QR algorithm.

In this manuscript, we describe a new explicit QR algorithm for matrices $A + pq^* \in F_n$, that also has the cost $O(n^2)$, and prove that our algorithm is componentwise backward stable, in the sense that the computed eigenvalues are the exact eigenvalues of $(A + \delta A) + (p + \delta p)(q + \delta q)^*$, where $\|\delta A\| \lesssim \|A\| u$, $\|\delta p\| \lesssim \|p\| u$, and $\|\delta q\| \lesssim \|q\| u$.

To motivate our algorithm, consider first the naive unshifted QR algorithm in exact arithmetic, applied to the matrix $C = A + pq^*$. Let the matrix $U_n \in \mathbb{C}^{n \times n}$ be the unitary matrix that rotates the $(n-1,n)$-plane so that
$$(U_n C)_{n-1,n} = 0, \qquad (31)$$
eliminating the superdiagonal in the $(n-1,n)$-th position. Likewise, let $U_{n-1} \in \mathbb{C}^{n \times n}$ denote the unitary matrix rotating the $(n-2,n-1)$-plane so that
$$(U_{n-1} U_n C)_{n-2,n-1} = 0, \qquad (32)$$
eliminating the superdiagonal in the $(n-2,n-1)$-th position. Likewise, let $U_{n-2}, U_{n-3}, \ldots, U_2$ be the unitary matrices eliminating the superdiagonal entries in the $(n-3,n-2), (n-4,n-3), \ldots, (1,2)$ positions of the matrices $(U_{n-1} U_n C), (U_{n-2} U_{n-1} U_n C), \ldots, (U_3 \cdots U_{n-1} U_n C)$, respectively. Letting $U = U_2 U_3 \cdots U_n$, we have that $U C$ is lower triangular. This matrix has the form
$$U C = B + (Up) q^*, \qquad (33)$$
where the upper Hessenberg part of the matrix $B = UA$ is determined entirely by:

1. The diagonal entries $d_i = b_{i,i}$, for $i = 1, 2, \ldots, n$;
2. The subdiagonal entries $\gamma_i = b_{i+1,i}$, for $i = 1, 2, \ldots, n-1$;
3. The vectors $Up$ and $q$

(see Lemma 2.2). Like with the matrix $A$, we call these four vectors the basic elements or generators of (the upper Hessenberg part of) $B$.

Next, the matrix is multiplied by $U^*$ on the right; clearly,
$$U C U^* = B U^* + Up\,(Uq)^*, \qquad (34)$$
so
$$U C U^* = U A U^* + Up\,(Uq)^*. \qquad (35)$$
It's easy to show that, since $UC$ is lower triangular and $U^* = U_n^* U_{n-1}^* \cdots U_2^*$, where $U_k$ rotates the $(k-1,k)$-plane, the matrix $U C U^*$ is lower Hessenberg. Thus, $U C U^* \in F_n$, and the matrix $U A U^*$ is determined entirely by its diagonal and superdiagonal, together with $Up$ and $Uq$. Furthermore, the upper triangular part of $U A U^*$ is determined entirely by the upper Hessenberg part of $B$ (see Lemma 2.3), and since $U A U^*$ is Hermitian, it follows that the whole of the matrix $U A U^*$ is determined entirely by the upper Hessenberg part of $B$.

In our algorithm, we use only the basic elements of $A$ and $B$ to represent our matrices. This results in a single iteration of our QR algorithm requiring $O(n)$ operations. Furthermore, we prove that the matrix $\hat{A}$ and vectors $\hat{p}$ and $\hat{q}$, computed by a single iteration of our QR algorithm, have the componentwise forward error bounds $\|\hat{A} - U A U^*\| \lesssim \|A\| u$, $\|\hat{p} - Up\| \lesssim \|p\| u$, and $\|\hat{q} - Uq\| \lesssim \|q\| u$. We then show that these componentwise forward error bounds result in componentwise backward stability.

In this section, we describe how our algorithm performs a single elimination of a superdiagonal element (see Algorithm 1). Suppose that we have already eliminated the superdiagonal elements in the positions $(n-1,n), (n-2,n-1), \ldots, (k,k+1)$. Let $p^{(k+1)} = U_{k+1} U_{k+2} \cdots U_n p$ and $B^{(k+1)} = U_{k+1} U_{k+2} \cdots U_n A$. Suppose further that $\hat{p}^{(k+1)}$ and $\hat{B}^{(k+1)}$ are the computed approximations to $p^{(k+1)}$ and $B^{(k+1)}$, and that the upper Hessenberg part of the computed matrix $\hat{B}^{(k+1)}$ is represented by its generators:

1. The diagonal elements $\hat{d}^{(k+1)}_i = \hat{b}^{(k+1)}_{i,i}$, for $i = 1, 2, \ldots, n$;
2. The superdiagonal elements $\hat{\beta}^{(k+1)}_i = \hat{b}^{(k+1)}_{i,i+1}$, for $i = 1, 2, \ldots, k-1$;
3. The subdiagonal elements $\hat{\gamma}^{(k+1)}_i = \hat{b}^{(k+1)}_{i+1,i}$, for $i = 1, 2, \ldots, n-1$;
4. The vectors $\hat{p}^{(k+1)}$ and $q$, from which the remaining elements in the upper Hessenberg part are inferred.

Suppose that $\|\hat{B}^{(k+1)} - B^{(k+1)}\|_H \lesssim \|A\| u$ and $\|\hat{p}^{(k+1)} - p^{(k+1)}\| \lesssim \|p\| u$. Notice that, if we define $\hat{B}^{(n+1)} = B^{(n+1)} = A$ and $\hat{p}^{(n+1)} = p^{(n+1)} = p$, then this is obviously true for $k = n$.

To eliminate the superdiagonal element in the $(k-1,k)$ position of $\hat{B}^{(k+1)} + \hat{p}^{(k+1)} q^*$, we first compute the rotation matrix $Q_k \in \mathrm{SU}(2)$ that eliminates it by a rotation in the $(k-1,k)$-plane (see Line 4 of Algorithm 1). Next, we apply the rotation matrix separately to the generators of $\hat{B}^{(k+1)}$ and to the vector $\hat{p}^{(k+1)}$. Since we are only interested in computing the upper Hessenberg part of $\hat{B}^{(k)}$, we need to update the subdiagonal element in the $(k-1,k-2)$ position of $\hat{B}^{(k+1)}$, represented by $\hat{\gamma}^{(k+1)}_{k-2}$ (see Figure 1). However, this calculation requires the sub-subdiagonal entry in the $(k,k-2)$ position of $\hat{B}^{(k+1)}$, which is unknown to us since only the upper Hessenberg part of $\hat{B}^{(k+1)}$ is available. Fortunately, it can be recovered by the following trick.
2) position of (cid:98) B ( k +1) can be recovered by the formula (cid:98) b ( k +1) k,k − = ( − (cid:98) p ( k +1) k − q ∗ U ∗ n U ∗ n − · · · U ∗ k +1 ) k . (39)Defining (cid:101) q ( k +1) = U k +1 U k +2 · · · U n q , we have (cid:98) b ( k +1) k,k − = − (cid:101) q ( k +1) k (cid:98) p ( k +1) ∗ k − . (40)By computing the vector (cid:98)(cid:101) q ( k +1) (see Line 14 of Algorithm 1), we use this formula torecover the sub-subdiagonal element (cid:98) b ( k +1) k,k − .Thus, the element in the ( k − , k −
Thus, the element in the $(k-1,k-2)$ position of $\hat{B}^{(k+1)}$, represented by $\hat{\gamma}^{(k+1)}_{k-2}$, is updated in Line 6 of Algorithm 1. Next, the elements in the $(k-1,k-1)$ and $(k,k-1)$ positions of $\hat{B}^{(k+1)}$, represented by $\hat{d}^{(k+1)}_{k-1}$ and $\hat{\gamma}^{(k+1)}_{k-1}$, respectively, are updated in a straightforward way in Line 8. Finally, the elements in the $(k-1,k)$ and $(k,k)$ positions of $\hat{B}^{(k+1)}$, represented by $\hat{\beta}^{(k+1)}_{k-1}$ and $\hat{d}^{(k+1)}_k$, respectively, are updated in Line 9, and the vector $\hat{p}^{(k+1)}$ is rotated in Line 10.

Since we have eliminated the superdiagonal element in the $(k-1,k)$ position of $\hat{B}^{(k+1)} + \hat{p}^{(k+1)} q^*$, we have that the $(k-1,k)$ element of the matrix $\hat{B}^{(k)}$ is inferred from $\hat{p}^{(k)}$ and $q$ by the formula

$\hat{b}^{(k)}_{k-1,k} = -\hat{p}^{(k)}_{k-1} q_k^*$.   (41)

Now, we would like the upper Hessenberg part of $\hat{B}^{(k)}$ to have a small componentwise error, so that $\|\hat{B}^{(k)} - B^{(k)}\|_H \lesssim \|A\| u$. However, consider the following scenario. Suppose that the norm of $(\hat{p}^{(k+1)}_{k-1} q_k^*, \hat{p}^{(k+1)}_k q_k^*)^T$ is much larger than $\|A\|$. By Lemma 2.5, the error in $\hat{p}^{(k)}_{k-1} q_k^*$ will be approximately $\bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1} q_k^*|^2 + |\hat{p}^{(k+1)}_k q_k^*|^2} \bigr) u$, which will be much larger than $\|A\| u$. In this situation then, even if $\|\hat{p}^{(k+1)} - p^{(k+1)}\| \lesssim \|p\| u$ and $\|\hat{B}^{(k+1)} - B^{(k+1)}\|_H \lesssim \|A\| u$, we will not have $\|\hat{B}^{(k)} - B^{(k)}\|_H \lesssim \|A\| u$. To remedy this, we must apply a correction to $\hat{p}^{(k)}_{k-1}$. Recall that the rotation matrix $Q_k$ was defined to be the matrix eliminating the $(k-1,k)$-th entry of $\hat{B}^{(k+1)} + \hat{p}^{(k+1)} q^*$ in exact arithmetic. If we let $(\mathring{p}^{(k)}_{k-1}, \mathring{p}^{(k)}_k)^T$ denote the result of applying $Q_k$ to $(\hat{p}^{(k+1)}_{k-1}, \hat{p}^{(k+1)}_k)^T$ in exact arithmetic, and likewise let $(\mathring{\beta}^{(k)}_{k-1}, \mathring{d}^{(k)}_k)^T$ denote the result of applying $Q_k$ to $(\hat{\beta}^{(k+1)}_{k-1}, \hat{d}^{(k+1)}_k)^T$ in exact arithmetic, then, by the definition of $Q_k$, we have

$\mathring{\beta}^{(k)}_{k-1} + \mathring{p}^{(k)}_{k-1} q_k^* = 0$.   (42)

By Lemma 2.5, we have that

$|\mathring{\beta}^{(k)}_{k-1} - \hat{\beta}^{(k)}_{k-1}| \lesssim \bigl( \sqrt{|\hat{\beta}^{(k+1)}_{k-1}|^2 + |\hat{d}^{(k+1)}_k|^2} \bigr) u$   (43)

and

$|\mathring{p}^{(k)}_{k-1} q_k^* - \hat{p}^{(k)}_{k-1} q_k^*| \lesssim \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1} q_k^*|^2 + |\hat{p}^{(k+1)}_k q_k^*|^2} \bigr) u$.   (44)

Thus, if $|\hat{p}^{(k+1)}_{k-1} q_k^*| + |\hat{p}^{(k+1)}_k q_k^*| > |\hat{\beta}^{(k+1)}_{k-1}| + |\hat{d}^{(k+1)}_k|$, then we set

$\hat{p}^{(k)}_{k-1} q_k^* = -\hat{\beta}^{(k)}_{k-1}$,   (45)

so

$\hat{p}^{(k)}_{k-1} = -\hat{\beta}^{(k)}_{k-1} / q_k^*$   (46)

(see Line 12 of Algorithm 1). With this correction to $\hat{p}^{(k)}_{k-1}$, it is easy to see that $\|\hat{B}^{(k)} - B^{(k)}\|_H \lesssim \|A\| u$ and $\|\hat{p}^{(k)} - p^{(k)}\| \lesssim \|p\| u$. If, on the other hand, $|\hat{p}^{(k+1)}_{k-1} q_k^*| + |\hat{p}^{(k+1)}_k q_k^*| \le |\hat{\beta}^{(k+1)}_{k-1}| + |\hat{d}^{(k+1)}_k|$, then the correction is not necessary, since in this case the error in $\hat{p}^{(k)}_{k-1} q_k^*$ is smaller than the error in $\hat{\beta}^{(k)}_{k-1}$.

This process of eliminating the superdiagonal elements can be repeated, until the upper Hessenberg part of the matrix $\hat{B}$, approximating $B = U_2 U_3 \cdots U_n A$, is obtained, together with $\hat{p}$, approximating $p = U_2 U_3 \cdots U_n p$ (see Algorithm 1).
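The need for the correction can be seen in a few lines of floating point arithmetic. The following is our own illustration in float32 (so that the rounding is visible), with hypothetical values chosen so that the rank-1 part dominates the tridiagonal part:

```python
import numpy as np

# Our own float32 illustration: when |p| is huge, the rotated p entry is
# computed with absolute error ~ |p| u, so beta' + p'*q_k is far from zero,
# even though it vanishes exactly in exact arithmetic.  Resetting
# p' = -beta'/q_k restores a residual on the scale of ||A|| u.
f32 = np.float32
beta, dk = f32(0.7), f32(-1.3)            # O(1) generators of A
p0, p1 = f32(1.2345e12), f32(-0.9876e12)  # huge rank-1 generators
qk = f32(1e-8)                            # so |p*qk| ~ 1e4 >> ||A|| ~ 1

x0, x1 = beta + p0 * qk, dk + p1 * qk     # superdiagonal entry to eliminate
r = np.hypot(x0, x1)
c, s = x1 / r, x0 / r                     # Q = [[c, -s], [s, c]] zeroes x0

beta_new = c * beta - s * dk              # rotated superdiagonal generator
p_new = c * p0 - s * p1                   # rotated p (catastrophic cancellation)

res_uncorrected = abs(beta_new + p_new * qk)
p_corrected = -beta_new / qk              # the correction step
res_corrected = abs(beta_new + p_corrected * qk)
print(res_uncorrected, res_corrected)     # corrected residual is much smaller
```

The uncorrected residual is on the order of $|p|\,u\,|q_k|$, far above $\|A\| u$, while the corrected one is on the order of $\|A\| u$.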
In Section 4.1, Lemma 4.1, we prove that the forward errors in the upper Hessenberg part of $\hat{B}$ and in the vector $\hat{p}$ are proportional to $\|A\| u$ and $\|p\| u$, respectively.

Algorithm 1 (A single elimination of the superdiagonal)

Inputs: This algorithm accepts as inputs two vectors $d$ and $\beta$ representing the diagonal and superdiagonal, respectively, of an $n \times n$ Hermitian matrix $A$, as well as two vectors $p$ and $q$ of length $n$, where $A + pq^*$ is lower Hessenberg.

Outputs: It returns as its outputs the rotation matrices $Q_2, Q_3, \ldots, Q_n \in \mathbb{C}^{2 \times 2}$ so that, letting $U_k \in \mathbb{C}^{n \times n}$, $k = 2, 3, \ldots, n$, denote the matrices that rotate the $(k-1,k)$-plane by $Q_k$, $U_2 U_3 \cdots U_n (A + pq^*)$ is lower triangular. It also returns the vectors $d$, $\gamma$, and $p$, where $d$ and $\gamma$ represent the diagonal and subdiagonal, respectively, of the matrix $U_2 U_3 \cdots U_n A$, and $p = U_2 U_3 \cdots U_n p$.

1: Set $\gamma \leftarrow \beta$, where $\gamma$ represents the subdiagonal.
2: Make a copy of $q$, setting $\tilde{q} \leftarrow q$.
3: for $k = n, n-1, \ldots, 2$ do
4:   Construct the $2 \times 2$ matrix $Q_k \in \mathrm{SU}(2)$ so that $\bigl( Q_k \, (\beta_{k-1} + p_{k-1} q_k^*, \; d_k + p_k q_k^*)^T \bigr)_1 = 0$.
5:   if $k \neq 2$ then
6:     Rotate the subdiagonal and the sub-subdiagonal: $\gamma_{k-2} \leftarrow \bigl( Q_k \, (\gamma_{k-2}, \; -\tilde{q}_k p_{k-2}^*)^T \bigr)_1$.
7:   end if
8:   Rotate the diagonal and the subdiagonal: $(d_{k-1}, \gamma_{k-1})^T \leftarrow Q_k \, (d_{k-1}, \gamma_{k-1})^T$.
9:   Rotate the superdiagonal and the diagonal: $(\beta_{k-1}, d_k)^T \leftarrow Q_k \, (\beta_{k-1}, d_k)^T$.
10:  Rotate $p$: $(p_{k-1}, p_k)^T \leftarrow Q_k \, (p_{k-1}, p_k)^T$.
11:  if $|p_{k-1} q_k^*| + |p_k q_k^*| > |\beta_{k-1}| + |d_k|$ then
12:    Correct the vector $p$, setting $p_{k-1} \leftarrow -\beta_{k-1} / q_k^*$.
13:  end if
14:  Rotate $\tilde{q}$: $(\tilde{q}_{k-1}, \tilde{q}_k)^T \leftarrow Q_k \, (\tilde{q}_{k-1}, \tilde{q}_k)^T$.
15: end for
16: Set $d \leftarrow d$, $\gamma \leftarrow \gamma$, and $p \leftarrow p$.

[Figure 2: The $(k-2)$-nd to $k$-th columns of $\hat{A}^{(k+1)}$, represented by its generators.]
In this section, we describe how our algorithm rotates the triangular matrix produced by an elimination of the superdiagonal back to lower Hessenberg form (see Algorithm 2). Suppose that $B$ is an $n \times n$ matrix and that $p$ and $q$ are vectors such that $B + pq^*$ is lower triangular. Notice that this condition is satisfied by the matrix $\hat{B}$ and the vectors $\hat{p}$ and $q$ from the preceding section, produced by an elimination of the superdiagonal. Let $\gamma$ denote the subdiagonal of $B$. Suppose that we have already applied the rotation matrices $U_n^*, U_{n-1}^*, \ldots, U_{k+1}^*$ to the right of $B$ and $q^*$, and let $q^{(k+1)} = U_{k+1} U_{k+2} \cdots U_n q$ and $A^{(k+1)} = B U_n^* U_{n-1}^* \cdots U_{k+1}^*$. Suppose that $\hat{q}^{(k+1)}$ and $\hat{A}^{(k+1)}$ are the computed approximations to $q^{(k+1)}$ and $A^{(k+1)}$, respectively, and that the upper triangular part of $\hat{A}^{(k+1)}$ is represented by its generators:

1. The diagonal entries $\hat{d}^{(k+1)}_i = \hat{a}^{(k+1)}_{i,i}$, for $i = 1, 2, \ldots, n$;
2. The superdiagonal entries $\hat{\beta}^{(k+1)}_i = \hat{a}^{(k+1)}_{i,i+1}$, for $i = k, k+1, \ldots, n-1$;
3. The vectors $p$ and $\hat{q}^{(k+1)}$, from which the remaining elements in the upper triangular part are inferred.

Suppose that $\|\hat{A}^{(k+1)} - A^{(k+1)}\|_T \lesssim \|B\|_H u$ and $\|\hat{q}^{(k+1)} - q^{(k+1)}\| \lesssim \|q\| u$. Notice that, if we define $\hat{A}^{(n+1)} = A^{(n+1)} = B$ and $\hat{q}^{(n+1)} = q^{(n+1)} = q$, then this is obviously true for $k = n$.

To apply the matrix $U_k^*$ to $\hat{A}^{(k+1)} + p \hat{q}^{(k+1)*}$ on the right, we apply the matrix $Q_k^* \in \mathrm{SU}(2)$ separately to the generators of $\hat{A}^{(k+1)}$ and to the vector $\hat{q}^{(k+1)}$. We start by rotating the diagonal and superdiagonal elements in the $(k-1,k-1)$ and $(k-1,k)$ positions of $\hat{A}^{(k+1)}$, represented by $\hat{d}^{(k+1)}_{k-1}$ and $-p_{k-1} \hat{q}^{(k+1)*}_k$, respectively, in Line 2 of Algorithm 2, saving the superdiagonal element in $\hat{\beta}^{(k)}_{k-1}$ (see Figure 2). Next, we rotate the elements in the $(k,k-1)$ and $(k,k)$ positions, represented by $\gamma_{k-1}$ (the $(k,k-1)$ entry of $B$) and $\hat{d}^{(k+1)}_k$, respectively, in a straightforward way in Line 3; since we are only interested in computing the upper triangular part of $\hat{A}^{(k)}$, we only update the diagonal entry. Finally, we rotate the vector $\hat{q}^{(k+1)}$ in Line 4.

The process of applying the rotation matrices on the right can be repeated, until the upper triangular part of the matrix $\hat{A}$, approximating $A = B U_n^* U_{n-1}^* \cdots U_2^*$, is obtained, together with $\hat{q}$, approximating $q = U_2 U_3 \cdots U_n q$ (see Algorithm 2). In Section 4.1, Lemma 4.2, we prove that the forward errors in the upper triangular part of $\hat{A}$ and in the vector $\hat{q}$ are proportional to $\|B\|_H u$ and $\|q\| u$, respectively.

The elimination of the superdiagonal described in Algorithm 1, followed by the rotation back to Hessenberg form described in Algorithm 2, can be iterated to find the eigenvalues of $A + pq^*$. Our unshifted explicit QR algorithm, based on this iteration, is described in Algorithm 3. This unshifted QR algorithm can be accelerated by the introduction of shifts; our explicit shifted QR algorithm, with Wilkinson shifts, is described in Algorithm 4. In Section 4.1, we show that the forward error of one iteration of our QR algorithm (Algorithm 1 followed by Algorithm 2) satisfies componentwise forward error bounds. In Section 4.2, we use this result to prove that both our explicit unshifted QR algorithm (Algorithm 3) and our shifted QR algorithm (Algorithm 4) are componentwise backward stable.

Algorithm 2 (Rotating the matrix back to Hessenberg form)

Inputs: This algorithm accepts as inputs $n-1$ rotation matrices $Q_2, Q_3, \ldots, Q_n \in \mathbb{C}^{2 \times 2}$, two vectors $d$ and $\gamma$ representing the diagonal and subdiagonal, respectively, of an $n \times n$ complex matrix $B$, and two vectors $p$ and $q$ of length $n$, where $B + pq^*$ is lower triangular.

Outputs: Letting $U_k \in \mathbb{C}^{n \times n}$, $k = 2, 3, \ldots, n$, denote the matrices that rotate the $(k-1,k)$-plane by $Q_k$, this algorithm returns as its outputs the vectors $d$, $\beta$, and $q$, where $d$ and $\beta$ represent the diagonal and superdiagonal, respectively, of the matrix $B U_n^* U_{n-1}^* \cdots U_2^*$, and $q = U_2 U_3 \cdots U_n q$.

1: for $k = n, n-1, \ldots, 2$ do
2:   Rotate the diagonal and the superdiagonal: $(d_{k-1}, \beta_{k-1})^T \leftarrow Q_k \, (d_{k-1}, \; -p_{k-1} q_k^*)^T$.
3:   Rotate the subdiagonal and the diagonal: $d_k \leftarrow \bigl( Q_k \, (\gamma_{k-1}, d_k)^T \bigr)_2$.
4:   Rotate $q$: $(q_{k-1}, q_k)^T \leftarrow Q_k \, (q_{k-1}, q_k)^T$.
5: end for
6: Set $d \leftarrow d$, $\beta \leftarrow \beta$, and $q \leftarrow q$.
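A matching NumPy sketch of Algorithm 2 (our own code, real case, 0-based indices): given the rotations from the elimination step and the generators of the triangular matrix, it forms the diagonal and superdiagonal of the right-rotated matrix and the rotated $q$, again touching only O(n) data.

```python
import numpy as np

def rotate_back_to_hessenberg(rots, d, gamma, p, q):
    """Sketch of Algorithm 2 (real case, 0-based indices).  Applies the
    rotations on the right of B, given by generators d (diagonal),
    gamma (subdiagonal), p, q, with B + p q^T lower triangular.
    Returns the diagonal d, superdiagonal beta, and rotated q of
    A = B U_n^T ... U_2^T."""
    n = len(d)
    d = np.array(d, dtype=float)
    q = np.array(q, dtype=float)
    beta = np.zeros(n - 1)
    for j in range(n - 1, 0, -1):        # columns (j-1, j), rotation rots[j]
        Q = rots[j]
        # row j-1: the (j-1, j) entry of the running matrix is represented
        # by the generators as -p[j-1]*q[j] (with the current, rotated q)
        d[j - 1], beta[j - 1] = Q @ np.array([d[j - 1], -p[j - 1] * q[j]])
        # row j: only the new diagonal entry is needed
        d[j] = (Q @ np.array([gamma[j - 1], d[j]]))[1]
        q[j - 1], q[j] = Q @ np.array([q[j - 1], q[j]])
    return d, beta, q
```

A dense check confirms that the outputs match the diagonal and superdiagonal of $B U^T$ and that the result plus the rank-1 part is again lower Hessenberg.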
Algorithm 3 (Unshifted explicit QR)

Inputs: This algorithm accepts as inputs two vectors $d$ and $\beta$ representing the diagonal and superdiagonal, respectively, of an $n \times n$ Hermitian matrix $A$, as well as two vectors $p$ and $q$ of length $n$, where $A + pq^*$ is lower Hessenberg. It also accepts a tolerance $\epsilon > 0$, which determines the accuracy to which the eigenvalues are computed.

Outputs: It returns as its output the vector $\lambda$ of length $n$ containing the eigenvalues of the matrix $A + pq^*$.

1: for $i = 1, 2, \ldots, n-1$ do
2:   while $|\beta_i + p_i q_{i+1}^*| \geq \epsilon$ do   ▷ Check if $(A + pq^*)_{i,i+1}$ is close to zero
3:     Perform one iteration of QR (one step of Algorithm 1 followed by one step of Algorithm 2) on the submatrix $(A + pq^*)_{i:n,i:n}$ defined by the vectors $d_{i:n}$, $\beta_{i:n-1}$, $p_{i:n}$, and $q_{i:n}$.
4:   end while
5: end for
6: Set $\lambda \leftarrow d$.
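As a point of comparison, the outer logic of explicit QR iteration with deflation and an optional Wilkinson-style shift can be sketched densely (our own $O(n^3)$-per-sweep illustration, not the structured algorithm). Note that this textbook variant deflates at the bottom-right of a symmetric tridiagonal matrix, whereas Algorithms 3 and 4 deflate from the top of the lower Hessenberg $A + pq^*$ while updating only its generators.

```python
import numpy as np

def dense_qr_eigs(M0, shifted=True, eps=1e-12, max_sweeps=500):
    """Dense sketch of explicit QR iteration with deflation (our own
    bottom-deflating variant on a symmetric tridiagonal matrix)."""
    M = M0.astype(float).copy()
    m = M.shape[0]                         # active leading block M[:m, :m]
    for _ in range(max_sweeps):
        if m <= 1:
            break
        if abs(M[m - 1, m - 2]) < eps:
            m -= 1                         # eigenvalue M[m-1, m-1] has converged
            continue
        mu = 0.0
        if shifted:                        # Wilkinson-style shift from trailing 2x2
            ev = np.linalg.eigvals(M[m - 2:m, m - 2:m])
            mu = ev[np.argmin(np.abs(ev - M[m - 1, m - 1]))]
        Q, R = np.linalg.qr(M[:m, :m] - mu * np.eye(m))
        M[:m, :m] = R @ Q + mu * np.eye(m)  # similarity: spectrum preserved
    return np.sort(np.diag(M))
```

On a symmetric tridiagonal test matrix, both variants reproduce `np.linalg.eigvalsh`, with the shifted variant needing far fewer sweeps; this is the acceleration that motivates Algorithm 4 below.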
Algorithm 4 (Shifted explicit QR)

Inputs: This algorithm accepts as inputs two vectors $d$ and $\beta$ representing the diagonal and superdiagonal, respectively, of an $n \times n$ Hermitian matrix $A$, as well as two vectors $p$ and $q$ of length $n$, where $A + pq^*$ is lower Hessenberg. It also accepts a tolerance $\epsilon > 0$, which determines the accuracy to which the eigenvalues are computed.

Outputs: It returns as its output the vector $\lambda$ of length $n$ containing the eigenvalues of the matrix $A + pq^*$.

1: for $i = 1, 2, \ldots, n-1$ do
2:   Set $\mu_{\mathrm{sum}} \leftarrow 0$.
3:   while $|\beta_i + p_i q_{i+1}^*| \geq \epsilon$ do   ▷ Check if $(A + pq^*)_{i,i+1}$ is close to zero
4:     Compute the eigenvalues $\mu_1$ and $\mu_2$ of the $2 \times 2$ matrix $\begin{pmatrix} d_i + p_i q_i^* & \beta_i + p_i q_{i+1}^* \\ \beta_i + p_{i+1} q_i^* & d_{i+1} + p_{i+1} q_{i+1}^* \end{pmatrix}$.   ▷ This is just $(A + pq^*)_{i:i+1,i:i+1}$
5:     Set $\mu$ to whichever of $\mu_1$ and $\mu_2$ is closest to $d_i + p_i q_i^*$.
6:     Set $\mu_{\mathrm{sum}} \leftarrow \mu_{\mathrm{sum}} + \mu$.
7:     Set $d_{i:n} \leftarrow d_{i:n} - \mu$.
8:     Perform one iteration of QR (one step of Algorithm 1 followed by one step of Algorithm 2) on the submatrix $(A + pq^*)_{i:n,i:n}$ defined by the vectors $d_{i:n}$, $\beta_{i:n-1}$, $p_{i:n}$, and $q_{i:n}$.
9:   end while
10:  Set $d_{i:n} \leftarrow d_{i:n} + \mu_{\mathrm{sum}}$.
11: end for
12: Set $\lambda_i \leftarrow d_i + p_i q_i^*$, for $i = 1, 2, \ldots, n$.

4 Componentwise Backward Stability
The principal results of this section are Theorems 4.6 and 4.7, which state that our unshifted and shifted QR algorithms, respectively, are componentwise backward stable. In Section 4.1, we prove that the forward error of a single sweep of our unshifted QR algorithm satisfies componentwise bounds. In Section 4.2, we use these bounds to show componentwise backward stability of our QR algorithms.

4.1 Forward Error Bounds for a Single Sweep of QR

Suppose that $A$ is Hermitian and $A + pq^*$ is lower Hessenberg. In this section, we prove in Theorem 4.3 that the forward errors in $A$, $p$, and $q$ of a single sweep of our explicit QR algorithm are proportional to $\|A\| u$, $\|p\| u$, and $\|q\| u$, respectively. The following lemma bounds the forward error of Algorithm 1 (the elimination of the superdiagonal).

Lemma 4.1.
Suppose that $A \in \mathbb{C}^{n \times n}$ is a Hermitian matrix, and that $p, q \in \mathbb{C}^n$. Suppose further that $A + pq^*$ is lower Hessenberg, and let $d$ and $\beta$ denote the diagonal and superdiagonal of $A$, respectively. Suppose that Algorithm 1 is carried out in floating point arithmetic with $d$, $\beta$, $p$, and $q$ as inputs, and let $Q_2, Q_3, \ldots, Q_n \in \mathrm{SU}(2)$ be the unitary matrices generated by an exact step of Line 4 of Algorithm 1 applied to the computed vectors at that step. Let $U_k \in \mathbb{C}^{n \times n}$, $k = 2, 3, \ldots, n$, denote the matrices that rotate the $(k-1,k)$-plane by $Q_k$, and define $U \in \mathbb{C}^{n \times n}$ by the formula $U = U_2 U_3 \cdots U_n$. Suppose finally that $\hat{d}$, $\hat{\gamma}$, and $\hat{p}$ are the outputs generated by Algorithm 1, and define the upper Hessenberg part of the matrix $\hat{B} \in \mathbb{C}^{n \times n}$ by the formula

$\hat{b}_{i,j} = \begin{cases} -\hat{p}_i q_j^* & \text{if } j > i, \\ \hat{d}_i & \text{if } j = i, \\ \hat{\gamma}_j & \text{if } j = i-1, \end{cases}$   (47)

where $\hat{b}_{i,j}$ denotes the $(i,j)$-th entry of $\hat{B}$. Let $B = UA$ and $p = Up$. Then

$\|\hat{B} - B\|_H \lesssim \|A\| u$   (48)

and

$\|\hat{p} - p\| \lesssim \|p\| u$,   (49)

where $\|\cdot\|_H$ denotes the square root of the sum of squares of the entries in the upper Hessenberg part of its argument (see Definition 2.1).

Proof.
Suppose that $\hat{d}^{(k)}$, $\hat{\gamma}^{(k)}$, $\hat{\beta}^{(k)}$, $\hat{p}^{(k)}$, and $\hat{\tilde{q}}^{(k)}$ denote the computed vectors in Algorithm 1 after the elimination of the superdiagonal elements in the positions $(n-1,n), (n-2,n-1), \ldots, (k-1,k)$. Suppose further that the upper Hessenberg part of the matrix $\hat{B}^{(k)} \in \mathbb{C}^{n \times n}$ is defined by the formula

$\hat{b}^{(k)}_{i,j} = \begin{cases} -\hat{p}^{(k)}_i q_j^* & \text{if } j > i+1 \text{, or if } j = i+1 \text{ and } j \ge k, \\ \hat{\beta}^{(k)}_i & \text{if } j = i+1 \text{ and } j < k, \\ \hat{d}^{(k)}_i & \text{if } j = i, \\ \hat{\gamma}^{(k)}_j & \text{if } j = i-1, \end{cases}$   (50)

where $\hat{b}^{(k)}_{i,j}$ denotes the $(i,j)$-th entry of $\hat{B}^{(k)}$. Clearly, $\hat{d} = \hat{d}^{(2)}$, $\hat{\gamma} = \hat{\gamma}^{(2)}$, $\hat{p} = \hat{p}^{(2)}$, and $\hat{B} = \hat{B}^{(2)}$. Let $B^{(k)} = U_k U_{k+1} \cdots U_n A$ and $p^{(k)} = U_k U_{k+1} \cdots U_n p$. We will prove that $\|\hat{B}^{(k)} - B^{(k)}\|_H \lesssim \|A\| u$ and $\|\hat{p}^{(k)} - p^{(k)}\| \lesssim \|p\| u$, for each $k = n, n-1, \ldots, 2$, beginning with the case $k = n$.

From Line 4, we have that the matrix $Q_n \in \mathrm{SU}(2)$ satisfies

$\bigl( Q_n \, (\beta_{n-1} + p_{n-1} q_n^*, \; d_n + p_n q_n^*)^T \bigr)_1 = 0$,   (51)

with the computed matrix $\hat{Q}_n$ satisfying $\|\hat{Q}_n - Q_n\| \lesssim u$ by Lemma 2.4. In Line 6, we have

$\hat{\gamma}^{(n)}_{n-2} = \mathrm{fl}\bigl( \hat{Q}_n \, (\gamma_{n-2}, \; -\tilde{q}_n p_{n-2}^*)^T \bigr)_1$.   (52)

At this stage $\tilde{q}$ is still equal to $q$ and, according to Lemma 2.1, $a_{n,n-2} = -q_n p_{n-2}^*$. By definition, $a_{n-1,n-2} = \gamma_{n-2}$. Therefore, by Lemma 2.5, we have that $|\hat{\gamma}^{(n)}_{n-2} - b^{(n)}_{n-1,n-2}| \lesssim \|A\| u$, where $b^{(n)}_{i,j}$ denotes the $(i,j)$-th entry of $B^{(n)}$. In Line 8, we have

$(\hat{d}^{(n)}_{n-1}, \hat{\gamma}^{(n)}_{n-1})^T = \mathrm{fl}\bigl( \hat{Q}_n \, (d_{n-1}, \gamma_{n-1})^T \bigr)$.   (53)

Since $a_{n-1,n-1} = d_{n-1}$ and $a_{n,n-1} = \gamma_{n-1}$, by Lemma 2.5, we have that $|\hat{d}^{(n)}_{n-1} - b^{(n)}_{n-1,n-1}| \lesssim \|A\| u$ and $|\hat{\gamma}^{(n)}_{n-1} - b^{(n)}_{n,n-1}| \lesssim \|A\| u$. In Line 9, we have

$(\hat{\beta}^{(n)}_{n-1}, \hat{d}^{(n)}_n)^T = \mathrm{fl}\bigl( \hat{Q}_n \, (\beta_{n-1}, d_n)^T \bigr)$.   (54)

Since, by definition, $a_{n-1,n} = \beta_{n-1}$ and $a_{n,n} = d_n$, it follows from Lemma 2.5 that $|\hat{\beta}^{(n)}_{n-1} - b^{(n)}_{n-1,n}| \lesssim \bigl( \sqrt{|\beta_{n-1}|^2 + |d_n|^2} \bigr) u \le \|A\| u$ and $|\hat{d}^{(n)}_n - b^{(n)}_{n,n}| \lesssim \bigl( \sqrt{|\beta_{n-1}|^2 + |d_n|^2} \bigr) u \le \|A\| u$. In Line 10, we have

$(\hat{p}^{(n)\dagger}_{n-1}, \hat{p}^{(n)}_n)^T = \mathrm{fl}\bigl( \hat{Q}_n \, (p_{n-1}, p_n)^T \bigr)$,   (55)

where $\hat{p}^{(n)\dagger}_{n-1}$ is a temporary value that will be corrected later. Once again, Lemma 2.5 tells us that $|\hat{p}^{(n)\dagger}_{n-1} - p^{(n)}_{n-1}| \lesssim \bigl( \sqrt{|p_{n-1}|^2 + |p_n|^2} \bigr) u \le \|p\| u$ and $|\hat{p}^{(n)}_n - p^{(n)}_n| \lesssim \bigl( \sqrt{|p_{n-1}|^2 + |p_n|^2} \bigr) u \le \|p\| u$. Next, we observe that, by the definition of $Q_n$, we have that

$b^{(n)}_{n-1,n} + p^{(n)}_{n-1} q_n^* = 0$.   (56)

In Line 12, we apply a correction to $\hat{p}^{(n)\dagger}_{n-1}$, so that

$\hat{p}^{(n)}_{n-1} = \begin{cases} -\hat{\beta}^{(n)}_{n-1}/q_n^* & \text{if } |p_{n-1} q_n^*| + |p_n q_n^*| > |\beta_{n-1}| + |d_n|, \\ \hat{p}^{(n)\dagger}_{n-1} & \text{if } |p_{n-1} q_n^*| + |p_n q_n^*| \le |\beta_{n-1}| + |d_n|. \end{cases}$   (57)

From this, we see that, if $|p_{n-1} q_n^*| + |p_n q_n^*| > |\beta_{n-1}| + |d_n|$, then

$|\hat{p}^{(n)}_{n-1} - p^{(n)}_{n-1}| = |-\hat{\beta}^{(n)}_{n-1}/q_n^* - p^{(n)}_{n-1}| = |b^{(n)}_{n-1,n}/q_n^* - \hat{\beta}^{(n)}_{n-1}/q_n^*| = \frac{1}{|q_n^*|} |b^{(n)}_{n-1,n} - \hat{\beta}^{(n)}_{n-1}| \lesssim \frac{\sqrt{|\beta_{n-1}|^2 + |d_n|^2}}{|q_n^*|} u \le \bigl( \sqrt{|p_{n-1}|^2 + |p_n|^2} \bigr) u \le \|p\| u$,   (58)

where the second equality is due to (56). Furthermore,

$|-\hat{p}^{(n)}_{n-1} q_n^* - b^{(n)}_{n-1,n}| = |\hat{\beta}^{(n)}_{n-1} - b^{(n)}_{n-1,n}| \lesssim \bigl( \sqrt{|\beta_{n-1}|^2 + |d_n|^2} \bigr) u \le \|A\| u$.   (59)

If, on the other hand, $|p_{n-1} q_n^*| + |p_n q_n^*| \le |\beta_{n-1}| + |d_n|$, then

$|\hat{p}^{(n)}_{n-1} - p^{(n)}_{n-1}| = |\hat{p}^{(n)\dagger}_{n-1} - p^{(n)}_{n-1}| \lesssim \bigl( \sqrt{|p_{n-1}|^2 + |p_n|^2} \bigr) u \le \|p\| u$.   (60)

Moreover,

$|-\hat{p}^{(n)}_{n-1} q_n^* - b^{(n)}_{n-1,n}| = |-\hat{p}^{(n)\dagger}_{n-1} q_n^* - b^{(n)}_{n-1,n}| = |-\hat{p}^{(n)\dagger}_{n-1} q_n^* + p^{(n)}_{n-1} q_n^*| \lesssim |q_n^*| \bigl( \sqrt{|p_{n-1}|^2 + |p_n|^2} \bigr) u \le \bigl( \sqrt{|\beta_{n-1}|^2 + |d_n|^2} \bigr) u \le \|A\| u$,   (61)

where the second equality follows from (56). This completes the proof that $\|\hat{B}^{(n)} - B^{(n)}\|_H \lesssim \|A\| u$ and $\|\hat{p}^{(n)} - p^{(n)}\| \lesssim \|p\| u$.

Now, we will show that, if $\|\hat{B}^{(k+1)} - B^{(k+1)}\|_H \lesssim \|A\| u$ and $\|\hat{p}^{(k+1)} - p^{(k+1)}\| \lesssim \|p\| u$, then $\|\hat{B}^{(k)} - B^{(k)}\|_H \lesssim \|A\| u$ and $\|\hat{p}^{(k)} - p^{(k)}\| \lesssim \|p\| u$.
From Line 4, we have that the matrix $Q_k \in \mathrm{SU}(2)$ satisfies

$\bigl( Q_k \, (\hat{\beta}^{(k+1)}_{k-1} + \hat{p}^{(k+1)}_{k-1} q_k^*, \; \hat{d}^{(k+1)}_k + \hat{p}^{(k+1)}_k q_k^*)^T \bigr)_1 = 0$,   (62)

with the computed matrix $\hat{Q}_k$ satisfying $\|\hat{Q}_k - Q_k\| \lesssim u$ by Lemma 2.4. In Line 6, we have

$\hat{\gamma}^{(k)}_{k-2} = \mathrm{fl}\bigl( \hat{Q}_k \, (\hat{\gamma}^{(k+1)}_{k-2}, \; -\hat{\tilde{q}}^{(k+1)}_k \hat{p}^{(k+1)*}_{k-2})^T \bigr)_1$.   (63)

We must first show that

$|-\hat{\tilde{q}}^{(k+1)}_k \hat{p}^{(k+1)*}_{k-2} - b^{(k+1)}_{k,k-2}| \lesssim \|A\| u$.   (64)

We begin by observing that

$b^{(k+1)}_{k,k-2} = (B^{(k+1)})_{k,k-2} = (B^{(k+1)} U_n^* U_{n-1}^* \cdots U_{k+1}^*)_{k,k-2}$,   (65)

since right-multiplication by $U_j^*$ only affects columns $j$ and $j-1$. We now observe that

$B^{(k+1)} U_n^* U_{n-1}^* \cdots U_{k+1}^* = U_{k+1} U_{k+2} \cdots U_n A U_n^* U_{n-1}^* \cdots U_{k+1}^*$   (66)

is Hermitian, so from (65) we have that

$b^{(k+1)}_{k,k-2} = \overline{(B^{(k+1)} U_n^* U_{n-1}^* \cdots U_{k+1}^*)_{k-2,k}}$.   (67)

By the induction hypothesis,

$\hat{b}^{(k+1)}_{k-2,\ell} = -\hat{p}^{(k+1)}_{k-2} q_\ell^*$   (68)

and

$|\hat{b}^{(k+1)}_{k-2,\ell} - b^{(k+1)}_{k-2,\ell}| \lesssim \|A\| u$,   (69)

for all $\ell = k, k+1, \ldots, n$. Thus,

$|(-\hat{p}^{(k+1)}_{k-2} \, q^* U_n^* U_{n-1}^* \cdots U_{k+1}^*)_k - (B^{(k+1)} U_n^* U_{n-1}^* \cdots U_{k+1}^*)_{k-2,k}| \lesssim \|A\| u$.   (70)

Combining (70) with (67),

$|-\hat{p}^{(k+1)}_{k-2} \tilde{q}^{(k+1)*}_k - \overline{b^{(k+1)}_{k,k-2}}| \lesssim \|A\| u$,   (71)

where $\tilde{q}^{(k+1)} = U_{k+1} U_{k+2} \cdots U_n q$. From Line 14 we have

$\hat{\tilde{q}}^{(k+1)} = \mathrm{fl}(\hat{U}_{k+1} \hat{U}_{k+2} \cdots \hat{U}_n q)$,   (72)

and, by repeated application of Lemma 2.5,

$|\hat{\tilde{q}}^{(k+1)}_k - \tilde{q}^{(k+1)}_k| \lesssim \bigl( \sqrt{|q_k|^2 + |q_{k+1}|^2 + \cdots + |q_n|^2} \bigr) u$.   (73)

Thus,

$|\hat{p}^{(k+1)}_{k-2} \hat{\tilde{q}}^{(k+1)*}_k - \hat{p}^{(k+1)}_{k-2} \tilde{q}^{(k+1)*}_k| \lesssim \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-2} q_k^*|^2 + |\hat{p}^{(k+1)}_{k-2} q_{k+1}^*|^2 + \cdots + |\hat{p}^{(k+1)}_{k-2} q_n^*|^2} \bigr) u = \bigl( \sqrt{|\hat{b}^{(k+1)}_{k-2,k}|^2 + |\hat{b}^{(k+1)}_{k-2,k+1}|^2 + \cdots + |\hat{b}^{(k+1)}_{k-2,n}|^2} \bigr) u \lesssim \|A\| u$,   (74)

where the first equality follows by (68) and the second inequality by (69). Combining (71) and (74),

$|-\hat{p}^{(k+1)}_{k-2} \hat{\tilde{q}}^{(k+1)*}_k - \overline{b^{(k+1)}_{k,k-2}}| \lesssim \|A\| u$,   (75)

or, equivalently,

$|-\hat{\tilde{q}}^{(k+1)}_k \hat{p}^{(k+1)*}_{k-2} - b^{(k+1)}_{k,k-2}| \lesssim \|A\| u$.   (76)

Finally, since, by the induction hypothesis,

$|\hat{\gamma}^{(k+1)}_{k-2} - b^{(k+1)}_{k-1,k-2}| \lesssim \|A\| u$,   (77)

we use (76) and (77) and apply Lemma 2.5 to (63) to find that $|\hat{\gamma}^{(k)}_{k-2} - b^{(k)}_{k-1,k-2}| \lesssim \|A\| u$. In Line 8, we have

$(\hat{d}^{(k)}_{k-1}, \hat{\gamma}^{(k)}_{k-1})^T = \mathrm{fl}\bigl( \hat{Q}_k \, (\hat{d}^{(k+1)}_{k-1}, \hat{\gamma}^{(k+1)}_{k-1})^T \bigr)$.   (78)

By the induction hypothesis, $|\hat{d}^{(k+1)}_{k-1} - b^{(k+1)}_{k-1,k-1}| \lesssim \|A\| u$ and $|\hat{\gamma}^{(k+1)}_{k-1} - b^{(k+1)}_{k,k-1}| \lesssim \|A\| u$. Thus, another application of Lemma 2.5 shows that $|\hat{d}^{(k)}_{k-1} - b^{(k)}_{k-1,k-1}| \lesssim \|A\| u$ and $|\hat{\gamma}^{(k)}_{k-1} - b^{(k)}_{k,k-1}| \lesssim \|A\| u$. In Line 9, we have

$(\hat{\beta}^{(k)}_{k-1}, \hat{d}^{(k)}_k)^T = \mathrm{fl}\bigl( \hat{Q}_k \, (\hat{\beta}^{(k+1)}_{k-1}, \hat{d}^{(k+1)}_k)^T \bigr)$.   (79)

By the induction hypothesis, $|\hat{\beta}^{(k+1)}_{k-1} - b^{(k+1)}_{k-1,k}| \lesssim \|A\| u$ and $|\hat{d}^{(k+1)}_k - b^{(k+1)}_{k,k}| \lesssim \|A\| u$, so it follows from Lemma 2.5 that $|\hat{\beta}^{(k)}_{k-1} - b^{(k)}_{k-1,k}| \lesssim \|A\| u$ and $|\hat{d}^{(k)}_k - b^{(k)}_{k,k}| \lesssim \|A\| u$.
In Line 10, we then have

$(\hat{p}^{(k)\dagger}_{k-1}, \hat{p}^{(k)}_k)^T = \mathrm{fl}\bigl( \hat{Q}_k \, (\hat{p}^{(k+1)}_{k-1}, \hat{p}^{(k+1)}_k)^T \bigr)$,   (80)

where $\hat{p}^{(k)\dagger}_{k-1}$ is a temporary value that will be corrected later. By the induction hypothesis, $|\hat{p}^{(k+1)}_{k-1} - p^{(k+1)}_{k-1}| \lesssim \|p\| u$ and $|\hat{p}^{(k+1)}_k - p^{(k+1)}_k| \lesssim \|p\| u$, so it follows from Lemma 2.5 that $|\hat{p}^{(k)\dagger}_{k-1} - p^{(k)}_{k-1}| \lesssim \|p\| u$ and $|\hat{p}^{(k)}_k - p^{(k)}_k| \lesssim \|p\| u$. Define $\mathring{\beta}^{(k)}_{k-1}$ and $\mathring{d}^{(k)}_k$ by the formula

$(\mathring{\beta}^{(k)}_{k-1}, \mathring{d}^{(k)}_k)^T = Q_k \, (\hat{\beta}^{(k+1)}_{k-1}, \hat{d}^{(k+1)}_k)^T$,   (81)

and define $\mathring{p}^{(k)}_{k-1}$ and $\mathring{p}^{(k)}_k$ by

$(\mathring{p}^{(k)}_{k-1}, \mathring{p}^{(k)}_k)^T = Q_k \, (\hat{p}^{(k+1)}_{k-1}, \hat{p}^{(k+1)}_k)^T$.   (82)

Clearly,

$\mathring{\beta}^{(k)}_{k-1} + \mathring{p}^{(k)}_{k-1} q_k^* = 0$,   (83)

by the definition of $Q_k$ (see (62)). Also, by Lemma 2.5, we have that $|\mathring{\beta}^{(k)}_{k-1} - \hat{\beta}^{(k)}_{k-1}| \lesssim \bigl( \sqrt{|\hat{\beta}^{(k+1)}_{k-1}|^2 + |\hat{d}^{(k+1)}_k|^2} \bigr) u \le \|A\| u$ and $|\mathring{p}^{(k)}_{k-1} - \hat{p}^{(k)\dagger}_{k-1}| \lesssim \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1}|^2 + |\hat{p}^{(k+1)}_k|^2} \bigr) u \le \|p\| u$. In Line 12, we apply a correction to $\hat{p}^{(k)\dagger}_{k-1}$, so that

$\hat{p}^{(k)}_{k-1} = \begin{cases} -\hat{\beta}^{(k)}_{k-1}/q_k^* & \text{if } |\hat{p}^{(k+1)}_{k-1} q_k^*| + |\hat{p}^{(k+1)}_k q_k^*| > |\hat{\beta}^{(k+1)}_{k-1}| + |\hat{d}^{(k+1)}_k|, \\ \hat{p}^{(k)\dagger}_{k-1} & \text{if } |\hat{p}^{(k+1)}_{k-1} q_k^*| + |\hat{p}^{(k+1)}_k q_k^*| \le |\hat{\beta}^{(k+1)}_{k-1}| + |\hat{d}^{(k+1)}_k|. \end{cases}$   (84)

Thus, if $|\hat{p}^{(k+1)}_{k-1} q_k^*| + |\hat{p}^{(k+1)}_k q_k^*| > |\hat{\beta}^{(k+1)}_{k-1}| + |\hat{d}^{(k+1)}_k|$, then

$|\hat{p}^{(k)}_{k-1} - p^{(k)}_{k-1}| = |-\hat{\beta}^{(k)}_{k-1}/q_k^* - p^{(k)}_{k-1}|$.   (85)

We then observe that

$|-\mathring{\beta}^{(k)}_{k-1}/q_k^* - p^{(k)}_{k-1}| = |\mathring{p}^{(k)}_{k-1} - p^{(k)}_{k-1}| \lesssim \|p\| u$,   (86)

and

$|-\mathring{\beta}^{(k)}_{k-1}/q_k^* + \hat{\beta}^{(k)}_{k-1}/q_k^*| = \frac{1}{|q_k^*|} |\hat{\beta}^{(k)}_{k-1} - \mathring{\beta}^{(k)}_{k-1}| \lesssim \frac{\sqrt{|\hat{\beta}^{(k+1)}_{k-1}|^2 + |\hat{d}^{(k+1)}_k|^2}}{|q_k^*|} u \le \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1}|^2 + |\hat{p}^{(k+1)}_k|^2} \bigr) u \le \|p\| u$.   (87)

Finally, combining (85), (86), and (87), we find that $|\hat{p}^{(k)}_{k-1} - p^{(k)}_{k-1}| \lesssim \|p\| u$. Furthermore,

$|-\hat{p}^{(k)}_{k-1} q_k^* - b^{(k)}_{k-1,k}| = |\hat{\beta}^{(k)}_{k-1} - b^{(k)}_{k-1,k}| \lesssim \|A\| u$.   (88)

If, conversely, $|\hat{p}^{(k+1)}_{k-1} q_k^*| + |\hat{p}^{(k+1)}_k q_k^*| \le |\hat{\beta}^{(k+1)}_{k-1}| + |\hat{d}^{(k+1)}_k|$, then

$|\hat{p}^{(k)}_{k-1} - p^{(k)}_{k-1}| = |\hat{p}^{(k)\dagger}_{k-1} - p^{(k)}_{k-1}| \lesssim \|p\| u$.   (89)

Next, we observe that

$|-\hat{p}^{(k)}_{k-1} q_k^* - b^{(k)}_{k-1,k}| = |-\hat{p}^{(k)\dagger}_{k-1} q_k^* - b^{(k)}_{k-1,k}|$.   (90)

Since

$|-\mathring{p}^{(k)}_{k-1} q_k^* - b^{(k)}_{k-1,k}| = |\mathring{\beta}^{(k)}_{k-1} - b^{(k)}_{k-1,k}| \lesssim \|A\| u$,   (91)

and

$|-\mathring{p}^{(k)}_{k-1} q_k^* + \hat{p}^{(k)\dagger}_{k-1} q_k^*| = |q_k^*| \, |-\mathring{p}^{(k)}_{k-1} + \hat{p}^{(k)\dagger}_{k-1}| \lesssim |q_k^*| \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1}|^2 + |\hat{p}^{(k+1)}_k|^2} \bigr) u \le \bigl( \sqrt{|\hat{\beta}^{(k+1)}_{k-1}|^2 + |\hat{d}^{(k+1)}_k|^2} \bigr) u \le \|A\| u$,   (92)

we combine (90), (91), and (92) to see that $|-\hat{p}^{(k)}_{k-1} q_k^* - b^{(k)}_{k-1,k}| \lesssim \|A\| u$.

Now all that is left is to show that $|-\hat{p}^{(k)}_{k-1} q_\ell^* - b^{(k)}_{k-1,\ell}| \lesssim \|A\| u$ and $|-\hat{p}^{(k)}_k q_\ell^* - b^{(k)}_{k,\ell}| \lesssim \|A\| u$, for all $\ell = k+1, k+2, \ldots, n$. By the induction hypothesis,

$|-\hat{p}^{(k+1)}_{k-1} q_\ell^* - b^{(k+1)}_{k-1,\ell}| \lesssim \|A\| u$   (93)

and

$|-\hat{p}^{(k+1)}_k q_\ell^* - b^{(k+1)}_{k,\ell}| \lesssim \|A\| u$,   (94)

for all $\ell = k+1, k+2, \ldots, n$. Multiplying (82) by $q_\ell^*$, we have

$(\mathring{p}^{(k)}_{k-1} q_\ell^*, \mathring{p}^{(k)}_k q_\ell^*)^T = Q_k \, (\hat{p}^{(k+1)}_{k-1} q_\ell^*, \hat{p}^{(k+1)}_k q_\ell^*)^T$,   (95)

which, combined with (93) and (94), means that

$|-\mathring{p}^{(k)}_{k-1} q_\ell^* - b^{(k)}_{k-1,\ell}| \lesssim \|A\| u$   (96)

and

$|-\mathring{p}^{(k)}_k q_\ell^* - b^{(k)}_{k,\ell}| \lesssim \|A\| u$,   (97)

for all $\ell = k+1, k+2, \ldots, n$.
It is not difficult to show (see (87)) that

$|\mathring{p}^{(k)}_{k-1} - \hat{p}^{(k)}_{k-1}| \lesssim \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1}|^2 + |\hat{p}^{(k+1)}_k|^2} \bigr) u$   (98)

and

$|\mathring{p}^{(k)}_k - \hat{p}^{(k)}_k| \lesssim \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1}|^2 + |\hat{p}^{(k+1)}_k|^2} \bigr) u$,   (99)

from which it follows that

$|\mathring{p}^{(k)}_{k-1} q_\ell^* - \hat{p}^{(k)}_{k-1} q_\ell^*| \lesssim \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1} q_\ell^*|^2 + |\hat{p}^{(k+1)}_k q_\ell^*|^2} \bigr) u \lesssim \|A\| u$   (100)

and

$|\mathring{p}^{(k)}_k q_\ell^* - \hat{p}^{(k)}_k q_\ell^*| \lesssim \bigl( \sqrt{|\hat{p}^{(k+1)}_{k-1} q_\ell^*|^2 + |\hat{p}^{(k+1)}_k q_\ell^*|^2} \bigr) u \lesssim \|A\| u$,   (101)

for all $\ell = k+1, k+2, \ldots, n$, where the second inequality follows from (93) and (94). Finally, combining (96), (97), (100), and (101), we find that $|-\hat{p}^{(k)}_{k-1} q_\ell^* - b^{(k)}_{k-1,\ell}| \lesssim \|A\| u$ and $|-\hat{p}^{(k)}_k q_\ell^* - b^{(k)}_{k,\ell}| \lesssim \|A\| u$, for all $\ell = k+1, k+2, \ldots, n$, and we are done. $\square$

The following lemma bounds the forward error of Algorithm 2 (the rotation back to Hessenberg form).
Lemma 4.2. Suppose that $B \in \mathbb{C}^{n \times n}$ and $p, q \in \mathbb{C}^n$. Suppose further that $B + pq^*$ is lower triangular, and let $d$ and $\gamma$ denote the diagonal and subdiagonal of $B$, respectively. Suppose that $Q_2, Q_3, \ldots, Q_n \in \mathrm{SU}(2)$, and suppose that Algorithm 2 is carried out in floating point arithmetic, using $d$, $\gamma$, $p$, $q$, and $Q_2, Q_3, \ldots, Q_n$ as inputs. Suppose finally that $\hat{d}$, $\hat{\beta}$, and $\hat{q}$ are the outputs generated by Algorithm 2, and define the upper triangular part of the matrix $\hat{A} \in \mathbb{C}^{n \times n}$ by the formula

$\hat{a}_{i,j} = \begin{cases} -p_i \hat{q}_j^* & \text{if } j > i+1, \\ \hat{\beta}_i & \text{if } j = i+1, \\ \hat{d}_i & \text{if } j = i, \end{cases}$   (102)

where $\hat{a}_{i,j}$ denotes the $(i,j)$-th entry of $\hat{A}$. Let $U_k \in \mathbb{C}^{n \times n}$, $k = 2, 3, \ldots, n$, denote the matrices that rotate the $(k-1,k)$-plane by $Q_k$. Define $U \in \mathbb{C}^{n \times n}$ by the formula $U = U_2 U_3 \cdots U_n$, and let $A = BU^*$ and $q = Uq$. Then

$\|\hat{A} - A\|_T \lesssim \|B\|_H u$   (103)

and

$\|\hat{q} - q\| \lesssim \|q\| u$,   (104)

where $\|\cdot\|_T$ denotes the square root of the sum of squares of the entries in the upper triangular part of its argument and $\|\cdot\|_H$ denotes the square root of the sum of squares of the entries in the upper Hessenberg part (see Definition 2.1).

Proof.
Suppose that (cid:98) d ( k ) , (cid:98) β ( k ) , and (cid:98) q ( k ) denote the computed vectors in Algorithm 2after rotations in the positions ( n − , n ) , ( n − , n − , . . . , ( k − , k ). Suppose furtherthat the upper triangular part of the matrix (cid:98) A ( k ) ∈ C n × n is defined by the formula (cid:98) a ( k ) i,j = − p i (cid:98) q ( k ) ∗ j if j > i + 1 or if j = i + 1 and j < k, (cid:98) β ( k ) i if j = i + 1 and j ≥ k, (cid:98) d ( k ) i if j = i, (105)where (cid:98) a ( k ) i,j denotes the ( i, j )-th entry of (cid:98) A ( k ) . Clearly, (cid:98) d = (cid:98) d (2) , (cid:98) β = (cid:98) β (2) , (cid:98) q = (cid:98) q (2) , and (cid:98) A = (cid:98) A (2) . Let A ( k ) = BU ∗ n U ∗ n − · · · U ∗ k and q ( k ) = U k U k +1 · · · U n q . We will prove that (cid:107) (cid:98) A ( k ) − A ( k ) (cid:107) T (cid:46) (cid:107) B (cid:107) H u and (cid:107) (cid:98) q ( k ) − q ( k ) (cid:107) (cid:46) (cid:107) q (cid:107) u , for each k = n, n − , . . . , (cid:98) d ( n +1) = d , (cid:98) q ( n +1) = q , q ( n +1) = q , and A ( n +1) = B . Obviously, (cid:98) A ( n +1) = A ( n +1) and (cid:98) q ( n +1) = q ( n +1) , so the above statement is true for k = n + 1. We will proveit for the cases k = n, n − , . . . , (cid:34) (cid:98) d ( k ) k − (cid:98) β ( k ) k − (cid:35) = fl (cid:18) (cid:99) Q k (cid:34) (cid:98) d ( k +1) k − − p k − (cid:98) q ( k +1) ∗ k (cid:35)(cid:19) . (106)27y the induction hypothesis, | (cid:98) d ( k +1) k − − a ( k +1) k − ,k − | (cid:46) (cid:107) B (cid:107) H u and |− p k − (cid:98) q ( k +1) ∗ k − a ( k +1) k − ,k | (cid:46) (cid:107) B (cid:107) H u . Since, by Lemma 2.3, (cid:107) A ( k +1) (cid:107) T ≤ (cid:107) B (cid:107) H , an application of Lemma 2.5 gives us | (cid:98) d ( k ) k − − a ( k ) k − ,k − | (cid:46) (cid:107) B (cid:107) H u and |− p k − (cid:98) q ( k ) ∗ k − a ( k ) k − ,k | (cid:46) (cid:107) B (cid:107) H u . 
In Line 3, we have

$\hat d^{(k)}_{k} = \mathrm{fl}\left(\hat Q_k \begin{bmatrix} \gamma_{k-1} \\ \hat d^{(k+1)}_{k} \end{bmatrix}\right)_2.$   (107)

We first observe that

$\gamma_{k-1} = b_{k,k-1} = (B U^*_n U^*_{n-1} \cdots U^*_{k+1})_{k,k-1} = a^{(k+1)}_{k,k-1},$   (108)

since right multiplication by $U^*_j$ only affects columns $j$ and $j-1$. Since, by the induction hypothesis, $|\hat d^{(k+1)}_{k} - a^{(k+1)}_{k,k}| \lesssim \|B\|_H\, u$, an application of Lemma 2.5 together with the inequality $\|A^{(k+1)}\|_T \le \|B\|_H$ gives us $|\hat d^{(k)}_{k} - a^{(k)}_{k,k}| \lesssim \|B\|_H\, u$. In Line 4,

$\begin{bmatrix} \hat q^{(k)}_{k-1} \\ \hat q^{(k)}_{k} \end{bmatrix} = \mathrm{fl}\left(\hat Q_k \begin{bmatrix} \hat q^{(k+1)}_{k-1} \\ \hat q^{(k+1)}_{k} \end{bmatrix}\right).$   (109)

By the induction hypothesis, $|\hat q^{(k+1)}_{k-1} - q^{(k+1)}_{k-1}| \lesssim \|q\|\, u$ and $|\hat q^{(k+1)}_{k} - q^{(k+1)}_{k}| \lesssim \|q\|\, u$, so it follows from Lemma 2.5 that $|\hat q^{(k)}_{k-1} - q^{(k)}_{k-1}| \lesssim \|q\|\, u$ and $|\hat q^{(k)}_{k} - q^{(k)}_{k}| \lesssim \|q\|\, u$.

All that remains is to prove that $|-p_\ell \hat q^{(k)*}_{k-1} - a^{(k)}_{\ell,k-1}| \lesssim \|B\|_H\, u$ and $|-p_\ell \hat q^{(k)*}_{k} - a^{(k)}_{\ell,k}| \lesssim \|B\|_H\, u$, for all $\ell = 1, 2, \ldots, k-2$. By the induction hypothesis,

$|-p_\ell \hat q^{(k+1)*}_{k-1} - a^{(k+1)}_{\ell,k-1}| \lesssim \|B\|_H\, u$   (110)

and

$|-p_\ell \hat q^{(k+1)*}_{k} - a^{(k+1)}_{\ell,k}| \lesssim \|B\|_H\, u,$   (111)

for all $\ell = 1, 2, \ldots, k-2$. Define $\mathring q^{(k)}_{k-1}$ and $\mathring q^{(k)}_{k}$ by

$\begin{bmatrix} \mathring q^{(k)}_{k-1} \\ \mathring q^{(k)}_{k} \end{bmatrix} = Q_k \begin{bmatrix} \hat q^{(k+1)}_{k-1} \\ \hat q^{(k+1)}_{k} \end{bmatrix}.$   (112)

Multiplying (112) by $p^*_\ell$, we have

$\begin{bmatrix} \mathring q^{(k)}_{k-1} p^*_\ell \\ \mathring q^{(k)}_{k} p^*_\ell \end{bmatrix} = Q_k \begin{bmatrix} \hat q^{(k+1)}_{k-1} p^*_\ell \\ \hat q^{(k+1)}_{k} p^*_\ell \end{bmatrix},$   (113)

which, combined with (110) and (111) and the fact that $\|A^{(k+1)}\|_T \le \|B\|_H$, means that

$|-p_\ell \mathring q^{(k)*}_{k-1} - a^{(k)}_{\ell,k-1}| \lesssim \|B\|_H\, u$   (114)

and

$|-p_\ell \mathring q^{(k)*}_{k} - a^{(k)}_{\ell,k}| \lesssim \|B\|_H\, u,$   (115)

for all $\ell = 1, 2, \ldots, k-2$. By Lemma 2.5,

$|\mathring q^{(k)}_{k-1} - \hat q^{(k)}_{k-1}| \lesssim \big(|\hat q^{(k+1)}_{k-1}| + |\hat q^{(k+1)}_{k}|\big)\, u$   (116)

and

$|\mathring q^{(k)}_{k} - \hat q^{(k)}_{k}| \lesssim \big(|\hat q^{(k+1)}_{k-1}| + |\hat q^{(k+1)}_{k}|\big)\, u,$   (117)

from which it follows that

$|\mathring q^{(k)}_{k-1} p^*_\ell - \hat q^{(k)}_{k-1} p^*_\ell| \lesssim \big(|\hat q^{(k+1)}_{k-1} p^*_\ell| + |\hat q^{(k+1)}_{k} p^*_\ell|\big)\, u \lesssim \|B\|_H\, u$   (118)

and

$|\mathring q^{(k)}_{k} p^*_\ell - \hat q^{(k)}_{k} p^*_\ell| \lesssim \big(|\hat q^{(k+1)}_{k-1} p^*_\ell| + |\hat q^{(k+1)}_{k} p^*_\ell|\big)\, u \lesssim \|B\|_H\, u,$   (119)

for all $\ell = 1, 2, \ldots, k-2$, where the second inequality follows from (110) and (111) and the inequality $\|A^{(k+1)}\|_T \le \|B\|_H$. Finally, combining (114), (115), (118), and (119), we find that $|-p_\ell \hat q^{(k)*}_{k-1} - a^{(k)}_{\ell,k-1}| \lesssim \|B\|_H\, u$ and $|-p_\ell \hat q^{(k)*}_{k} - a^{(k)}_{\ell,k}| \lesssim \|B\|_H\, u$, for all $\ell = 1, 2, \ldots, k-2$, and we are done. ∎
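The bounds (98), (99), (116), and (117) all come from Lemma 2.5, which controls the rounding error of applying a single 2×2 rotation in floating point. A bound of this kind is easy to check numerically. The sketch below is our own illustration (in Python, not the paper's FORTRAN 77 code): it applies rotations in single precision, uses double precision as the reference, and verifies that the componentwise error stays within a modest multiple of $(|x_1| + |x_2|)\,u$.

```python
import numpy as np

rng = np.random.default_rng(0)
u32 = np.finfo(np.float32).eps  # machine precision of the working (single) precision

worst = 0.0
for _ in range(1000):
    theta = rng.uniform(0.0, 2.0 * np.pi)
    # A real rotation (an element of SO(2), viewed as a special case of SU(2)).
    Q = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])
    x = rng.uniform(-1.0, 1.0, size=2) * 10.0 ** rng.integers(-6, 7)
    exact = Q @ x                                    # reference in double precision
    computed = Q.astype(np.float32) @ x.astype(np.float32)
    err = np.abs(computed - exact).max()
    worst = max(worst, err / ((abs(x[0]) + abs(x[1])) * u32))

# The normalized error stays bounded by a modest constant, as the lemma predicts.
print(worst)
```

The normalizing quantity $(|x_1| + |x_2|)\,u$ is exactly the right-hand side appearing in (98)–(99) and (116)–(117), up to the constant hidden in $\lesssim$.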
The following theorem bounds the forward errors of a full sweep of our QR algorithm, and is the principal result of this subsection.

Theorem 4.3. Suppose that $A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix, that $p, q \in \mathbb{C}^n$, and that $A + pq^*$ is lower Hessenberg. Let $d$ and $\beta$ denote the diagonal and superdiagonal of $A$, respectively. Suppose that Algorithm 1 is carried out in floating point arithmetic, and let $Q_2, Q_3, \ldots, Q_n \in \mathrm{SU}(2)$ be the unitary matrices generated by an exact step of Line 4 of Algorithm 1 applied to the computed vectors at that step. Let $U_k \in \mathbb{C}^{n\times n}$, $k = 2, 3, \ldots, n$, denote the matrices that rotate the $(k-1,k)$-plane by $Q_k$, and define $U \in \mathbb{C}^{n\times n}$ by the formula $U = U_2 U_3 \cdots U_n$. Suppose that Algorithm 2 is then carried out in floating point arithmetic, using the outputs of Algorithm 1 as inputs. Suppose finally that $\hat p$ is an output of Algorithm 1 and $\hat q$, $\hat d$, and $\hat\beta$ are all outputs of Algorithm 2, and define the matrix $\hat A$ by the formula

$\hat a_{i,j} = \begin{cases} -\hat p_i \hat q^*_j & \text{if } j > i+1, \\ \hat\beta_i & \text{if } j = i+1, \\ \hat d_i & \text{if } j = i, \\ (\hat\beta_j)^* & \text{if } j = i-1, \\ -\hat q_j \hat p^*_i & \text{if } j < i-1, \end{cases}$   (120)

where $\hat a_{i,j}$ denotes the $(i,j)$-th entry of $\hat A$. Let $\bar A = U A U^*$, $\bar p = U p$, and $\bar q = U q$. Then

$\|\hat A - \bar A\| \lesssim \|A\|\, u,$   (121)

$\|\hat p - \bar p\| \lesssim \|p\|\, u,$   (122)

and

$\|\hat q - \bar q\| \lesssim \|q\|\, u.$   (123)

Proof.
Suppose that $\hat B$ (defined by (47)), $\hat p$, and $\hat Q_2, \hat Q_3, \ldots, \hat Q_n \in \mathbb{C}^{2\times 2}$ are outputs of Algorithm 1. Let $\bar B = UA$ and $\bar p = Up$. By Lemma 4.1,

$\|\hat B - \bar B\|_H \lesssim \|A\|\, u$   (124)

and

$\|\hat p - \bar p\| \lesssim \|p\|\, u,$   (125)

where $\|\cdot\|_H$ denotes the square root of the sum of squares of the entries in the upper Hessenberg part of its argument (see Definition 2.1). Now suppose that $\hat B$, $\hat p$, $q$, and $\hat Q_2, \hat Q_3, \ldots, \hat Q_n$ are used as inputs to Algorithm 2. Let $\hat U_k \in \mathbb{C}^{n\times n}$, $k = 2, 3, \ldots, n$, denote the matrices that rotate the $(k-1,k)$-plane by $\hat Q_k$, and define $\hat U \in \mathbb{C}^{n\times n}$ by the formula $\hat U = \hat U_2 \hat U_3 \cdots \hat U_n$. Let $\tilde A = \hat B \hat U^*$ and $\tilde q = \hat U q$, and observe that the upper triangular part of $\tilde A$ is well-defined due to Lemma 2.3. By Lemma 4.2 we have that

$\|\hat A - \tilde A\|_T \lesssim \|\hat B\|_H\, u$   (126)

and

$\|\hat q - \tilde q\| \lesssim \|q\|\, u,$   (127)

where $\|\cdot\|_T$ denotes the square root of the sum of squares of the entries in the upper triangular part of its argument and $\|\cdot\|_H$ denotes the square root of the sum of squares of the upper Hessenberg part (see Definition 2.1).

Let $\bar A = \bar B U^* = U A U^*$ and let $\bar q = U q$. We observe that

$\|\tilde A - \bar A\|_T = \|\hat B \hat U^* - \bar B U^*\|_T \le \|\hat B \hat U^* - \bar B \hat U^*\|_T + \|\bar B \hat U^* - \bar B U^*\|_T = \|(\hat B - \bar B)\hat U^*\|_T + \|\bar B(\hat U^* - U^*)\|_T \lesssim \|A\|\, u,$   (128)

where the last inequality follows from (124) and the fact that $\|\hat U - U\| \lesssim u$. Since, clearly, $\|\hat B\|_H\, u \lesssim \|A\|\, u$, we combine (126) and (128) to get

$\|\hat A - \bar A\|_T \lesssim \|A\|\, u.$   (129)

Now we observe that, since both $\bar A = U A U^*$ and $\hat A$ are Hermitian,

$\|\hat A - \bar A\| \lesssim \|A\|\, u.$   (130)

Next, we observe that

$\|\tilde q - \bar q\| = \|\hat U q - U q\| = \|(\hat U - U) q\| \lesssim \|q\|\, u,$   (131)

so, combining (127) and (131), we have

$\|\hat q - \bar q\| \lesssim \|q\|\, u,$   (132)

and we are done. ∎

4.2 Backward Error Analysis of the QR Algorithms
Suppose that $A$ is Hermitian and $A + pq^*$ is lower Hessenberg. In this section, we prove in Theorems 4.6 and 4.7 that the backward errors in $A$, $p$, and $q$ of both our explicit unshifted QR algorithm (see Algorithm 3) and explicit shifted QR algorithm (see Algorithm 4) are proportional to $\|A\|\, u$, $\|p\|\, u$, and $\|q\|\, u$, respectively.

The following lemma states that a single iteration of our QR algorithm is componentwise backward stable.

Lemma 4.4. Suppose that $A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix, that $p, q \in \mathbb{C}^n$, and that $A + pq^*$ is lower Hessenberg. Let $d$ and $\beta$ denote the diagonal and superdiagonal of $A$, respectively. Suppose that a single iteration of our QR algorithm (Algorithm 1 followed by Algorithm 2) is carried out in floating point arithmetic, and let $\hat p$, $\hat q$, $\hat d$, and $\hat\beta$ denote the outputs of the algorithm. Define the matrix $\hat A$ by the formula (120). Then there exist a unitary matrix $U \in \mathbb{C}^{n\times n}$, a matrix $\delta A \in \mathbb{C}^{n\times n}$, and vectors $\delta p, \delta q \in \mathbb{C}^n$, such that

$\hat A = U(A + \delta A)U^*,$   (133)

$\hat p = U(p + \delta p),$   (134)

and

$\hat q = U(q + \delta q),$   (135)

where $\|\delta A\| \lesssim \|A\|\, u$, $\|\delta p\| \lesssim \|p\|\, u$, and $\|\delta q\| \lesssim \|q\|\, u$.

Proof.
Let $U \in \mathbb{C}^{n\times n}$ be the unitary matrix defined in the statement of Theorem 4.3, and let $\bar A = U A U^*$, $\bar p = U p$, and $\bar q = U q$. By Theorem 4.3,

$\hat A = \bar A + \delta\bar A,$   (136)

where $\|\delta\bar A\| \lesssim \|A\|\, u$. Thus,

$\hat A = U A U^* + \delta\bar A = U(A + \delta A)U^*,$   (137)

where $\delta A = U^* \delta\bar A\, U$. Since $U$ is unitary, clearly $\|\delta A\| \lesssim \|A\|\, u$. Likewise, by Theorem 4.3,

$\hat p = \bar p + \delta\bar p,$   (138)

where $\|\delta\bar p\| \lesssim \|p\|\, u$, so

$\hat p = U(p + \delta p),$   (139)

where $\delta p = U^* \delta\bar p$ and $\|\delta p\| \lesssim \|p\|\, u$. Similarly,

$\hat q = U(q + \delta q),$   (140)

where $\|\delta q\| \lesssim \|q\|\, u$. ∎

The following lemma states that repeated iterations of our QR algorithm are componentwise backward stable.
Lemma 4.5.
Suppose that $A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix, that $p, q \in \mathbb{C}^n$, and that $A + pq^*$ is lower Hessenberg. Let $d$ and $\beta$ denote the diagonal and superdiagonal of $A$, respectively. Suppose that $k$ iterations of our QR algorithm (Algorithm 1 followed by Algorithm 2) are carried out in floating point arithmetic, and let $\hat p^{(k)}$, $\hat q^{(k)}$, $\hat d^{(k)}$, and $\hat\beta^{(k)}$ denote the outputs of the algorithm. Define the matrix $\hat A^{(k)}$ by the formula (120), making the obvious substitutions. Then there exist a unitary matrix $U \in \mathbb{C}^{n\times n}$, a matrix $\delta A \in \mathbb{C}^{n\times n}$, and vectors $\delta p, \delta q \in \mathbb{C}^n$, such that

$\hat A^{(k)} = U(A + \delta A)U^*,$   (141)

$\hat p^{(k)} = U(p + \delta p),$   (142)

and

$\hat q^{(k)} = U(q + \delta q),$   (143)

where $\|\delta A\| \lesssim \|A\|\, u$, $\|\delta p\| \lesssim \|p\|\, u$, and $\|\delta q\| \lesssim \|q\|\, u$.

Proof.
We will prove this statement only for the matrix $\hat A^{(k)}$, since the proofs for $\hat p^{(k)}$ and $\hat q^{(k)}$ are essentially identical. By repeated application of Lemma 4.4, we know that there exist unitary matrices $U^{(1)}, U^{(2)}, \ldots, U^{(k)}$ and matrices $\delta A^{(0)}, \delta\hat A^{(1)}, \delta\hat A^{(2)}, \ldots, \delta\hat A^{(k-1)}$ such that

$\hat A^{(k)} = U^{(k)}(\hat A^{(k-1)} + \delta\hat A^{(k-1)})U^{(k)*},$   (144)

$\hat A^{(k-1)} = U^{(k-1)}(\hat A^{(k-2)} + \delta\hat A^{(k-2)})U^{(k-1)*},$   (145)

$\vdots$

$\hat A^{(2)} = U^{(2)}(\hat A^{(1)} + \delta\hat A^{(1)})U^{(2)*},$   (146)

$\hat A^{(1)} = U^{(1)}(A + \delta A^{(0)})U^{(1)*},$   (147)

where $\|\delta A^{(0)}\| \lesssim \|A\|\, u$ and $\|\delta\hat A^{(\ell)}\| \lesssim \|A\|\, u$, for $\ell = 1, 2, \ldots, k-1$. Combining (144)–(147) and expanding, we find that

$\hat A^{(k)} = U^{(k)} U^{(k-1)} \cdots U^{(1)} A\, U^{(1)*} U^{(2)*} \cdots U^{(k)*} + U^{(k)} \delta\hat A^{(k-1)} U^{(k)*} + U^{(k)} U^{(k-1)} \delta\hat A^{(k-2)} U^{(k-1)*} U^{(k)*} + \cdots + U^{(k)} U^{(k-1)} \cdots U^{(1)} \delta A^{(0)} U^{(1)*} \cdots U^{(k-1)*} U^{(k)*}.$   (148)

Letting $U = U^{(k)} U^{(k-1)} \cdots U^{(1)}$, this becomes

$\hat A^{(k)} = U A U^* + U^{(k)} \delta\hat A^{(k-1)} U^{(k)*} + U^{(k)} U^{(k-1)} \delta\hat A^{(k-2)} U^{(k-1)*} U^{(k)*} + \cdots + U^{(k)} U^{(k-1)} \cdots U^{(1)} \delta A^{(0)} U^{(1)*} \cdots U^{(k-1)*} U^{(k)*}.$   (149)

Suppose now that the matrix $\delta A$ is defined by

$\delta A = U^*\big(U^{(k)} \delta\hat A^{(k-1)} U^{(k)*} + U^{(k)} U^{(k-1)} \delta\hat A^{(k-2)} U^{(k-1)*} U^{(k)*} + \cdots + U^{(k)} U^{(k-1)} \cdots U^{(1)} \delta A^{(0)} U^{(1)*} \cdots U^{(k-1)*} U^{(k)*}\big)U.$   (150)

Clearly, $\|\delta A\| \lesssim \|A\|\, u$. Combining (149) and (150), we have

$\hat A^{(k)} = U(A + \delta A)U^*,$   (151)

and we are done. ∎

The following theorem states that our explicit unshifted QR algorithm is componentwise backward stable.
Theorem 4.6 (Explicit unshifted QR). Suppose that $A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix, that $p, q \in \mathbb{C}^n$, and that $A + pq^*$ is lower Hessenberg. Let $d$ and $\beta$ denote the diagonal and superdiagonal of $A$, respectively. Suppose that Algorithm 3 is carried out in floating point arithmetic with $\epsilon \lesssim \|A\|\, u$, and let $\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_n$ denote the outputs. Then there exist a matrix $\delta A \in \mathbb{C}^{n\times n}$ and vectors $\delta p, \delta q \in \mathbb{C}^n$ such that $\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_n$ are the exact eigenvalues of the matrix

$(A + \delta A) + (p + \delta p)(q + \delta q)^*,$   (152)

where $\|\delta A\| \lesssim \|A\|\, u$, $\|\delta p\| \lesssim \|p\|\, u$, and $\|\delta q\| \lesssim \|q\|\, u$.

Proof.
Suppose that we carry out QR iterations until the entry in the $(1,2)$ position is less than $\epsilon$ in absolute value. Let $\hat d^{(1)}$, $\hat\beta^{(1)}_\dagger$, $\hat p^{(1)}$, and $\hat q^{(1)}$ denote the resulting outputs, and let $\hat A^{(1)}_\dagger$ be the resulting matrix, defined by formula (120) (making the obvious substitutions). By Lemma 4.5, there exist a unitary matrix $U^{(1)} \in \mathbb{C}^{n\times n}$, a matrix $\delta A^{(0)}_\dagger \in \mathbb{C}^{n\times n}$, and vectors $\delta p^{(0)}, \delta q^{(0)} \in \mathbb{C}^n$ such that

$\hat A^{(1)}_\dagger = U^{(1)}(A + \delta A^{(0)}_\dagger)U^{(1)*},$   (153)

$\hat p^{(1)} = U^{(1)}(p + \delta p^{(0)}),$   (154)

and

$\hat q^{(1)} = U^{(1)}(q + \delta q^{(0)}),$   (155)

where $\|\delta A^{(0)}_\dagger\| \lesssim \|A\|\, u$, $\|\delta p^{(0)}\| \lesssim \|p\|\, u$, and $\|\delta q^{(0)}\| \lesssim \|q\|\, u$. Let $\hat A^{(1)}$ be equal to $\hat A^{(1)}_\dagger$, except that the entry in the $(1,2)$ position of $\hat A^{(1)}$ is equal to $-\hat p^{(1)}_1 \hat q^{(1)*}_2$, so that $\hat A^{(1)}_{1,2} + \hat p^{(1)}_1 \hat q^{(1)*}_2 = 0$. Since $|\hat A^{(1)}_{\dagger,1,2} + \hat p^{(1)}_1 \hat q^{(1)*}_2| < \epsilon$, we have

$\|\hat A^{(1)}_\dagger - \hat A^{(1)}\|_F < \epsilon,$   (156)

where $\|\cdot\|_F$ denotes the Frobenius norm, and since $\epsilon \lesssim \|A\|\, u$,

$\|\hat A^{(1)}_\dagger - \hat A^{(1)}\| \lesssim \|A\|\, u.$   (157)

Letting

$\delta A^{(0)} = \delta A^{(0)}_\dagger + U^{(1)*}(\hat A^{(1)} - \hat A^{(1)}_\dagger)U^{(1)}$   (158)

and combining (153) and (158), we have

$\hat A^{(1)} = U^{(1)}(A + \delta A^{(0)})U^{(1)*},$   (159)

where $\|\delta A^{(0)}\| \lesssim \|A\|\, u$ by (157). Clearly, since $\hat A^{(1)}_{1,2} + \hat p^{(1)}_1 \hat q^{(1)*}_2 = 0$ and $\hat A^{(1)} + \hat p^{(1)} \hat q^{(1)*}$ is lower Hessenberg, $\hat\lambda_1 = \hat A^{(1)}_{1,1} + \hat p^{(1)}_1 \hat q^{(1)*}_1$ is an eigenvalue of $\hat A^{(1)} + \hat p^{(1)} \hat q^{(1)*}$. Thus, from (154), (155), and (159), we see that $\hat\lambda_1$ is an eigenvalue of $(A + \delta A^{(0)}) + (p + \delta p^{(0)})(q + \delta q^{(0)})^*$.

Now suppose that we deflate the matrix, and perform QR iterations on the submatrix $\hat A^{(1)}_{2:n,2:n} + \hat p^{(1)}_{2:n} \hat q^{(1)*}_{2:n}$, until the entry in the $(1,2)$ position of the deflated matrix is less than $\epsilon$. Let $\tilde d^{(2)} \in \mathbb{C}^{n-1}$, $\tilde\beta^{(2)}_\dagger \in \mathbb{C}^{n-2}$, $\tilde p^{(2)} \in \mathbb{C}^{n-1}$, and $\tilde q^{(2)} \in \mathbb{C}^{n-1}$ denote the resulting outputs, and let $\tilde A^{(2)}_\dagger \in \mathbb{C}^{(n-1)\times(n-1)}$ be the resulting matrix, defined by formula (120) (again making the obvious substitutions). By Lemma 4.5, there exist a unitary matrix $\tilde U^{(2)} \in \mathbb{C}^{(n-1)\times(n-1)}$, a matrix $\delta\tilde A^{(1)}_\dagger \in \mathbb{C}^{(n-1)\times(n-1)}$, and vectors $\delta\tilde p^{(1)}, \delta\tilde q^{(1)} \in \mathbb{C}^{n-1}$ such that

$\tilde A^{(2)}_\dagger = \tilde U^{(2)}(\hat A^{(1)}_{2:n,2:n} + \delta\tilde A^{(1)}_\dagger)\tilde U^{(2)*},$   (160)

$\tilde p^{(2)} = \tilde U^{(2)}(\hat p^{(1)}_{2:n} + \delta\tilde p^{(1)}),$   (161)

and

$\tilde q^{(2)} = \tilde U^{(2)}(\hat q^{(1)}_{2:n} + \delta\tilde q^{(1)}),$   (162)

where $\|\delta\tilde A^{(1)}_\dagger\| \lesssim \|A\|\, u$, $\|\delta\tilde p^{(1)}\| \lesssim \|p\|\, u$, and $\|\delta\tilde q^{(1)}\| \lesssim \|q\|\, u$. Like before, let $\tilde A^{(2)} \in \mathbb{C}^{(n-1)\times(n-1)}$ be equal to $\tilde A^{(2)}_\dagger$, except that the entry in the $(1,2)$ position of $\tilde A^{(2)}$ is equal to $-\tilde p^{(2)}_1 \tilde q^{(2)*}_2$, so that $\tilde A^{(2)}_{1,2} + \tilde p^{(2)}_1 \tilde q^{(2)*}_2 = 0$. Since $|\tilde A^{(2)}_{\dagger,1,2} + \tilde p^{(2)}_1 \tilde q^{(2)*}_2| < \epsilon$, we have

$\|\tilde A^{(2)}_\dagger - \tilde A^{(2)}\|_F < \epsilon,$   (163)

where $\|\cdot\|_F$ denotes the Frobenius norm, and since $\epsilon \lesssim \|A\|\, u$,

$\|\tilde A^{(2)}_\dagger - \tilde A^{(2)}\| \lesssim \|A\|\, u.$   (164)

Letting

$\delta\tilde A^{(1)} = \delta\tilde A^{(1)}_\dagger + \tilde U^{(2)*}(\tilde A^{(2)} - \tilde A^{(2)}_\dagger)\tilde U^{(2)},$   (165)

we have

$\tilde A^{(2)} = \tilde U^{(2)}(\hat A^{(1)}_{2:n,2:n} + \delta\tilde A^{(1)})\tilde U^{(2)*},$   (166)

where $\|\delta\tilde A^{(1)}\| \lesssim \|A\|\, u$. Since $\tilde A^{(2)}_{1,2} + \tilde p^{(2)}_1 \tilde q^{(2)*}_2 = 0$ and $\tilde A^{(2)} + \tilde p^{(2)} \tilde q^{(2)*}$ is lower Hessenberg, $\hat\lambda_2 = \tilde A^{(2)}_{1,1} + \tilde p^{(2)}_1 \tilde q^{(2)*}_1$ is an eigenvalue of $\tilde A^{(2)} + \tilde p^{(2)} \tilde q^{(2)*}$. Now define the unitary matrix $U^{(2)} \in \mathbb{C}^{n\times n}$ by the formula

$U^{(2)} = \begin{bmatrix} 1 & 0 \\ 0 & \tilde U^{(2)} \end{bmatrix},$   (167)

the matrix $\delta\hat A^{(1)} \in \mathbb{C}^{n\times n}$ by

$\delta\hat A^{(1)} = \begin{bmatrix} 0 & 0 \\ 0 & \delta\tilde A^{(1)} \end{bmatrix},$   (168)

and the vectors $\delta\hat p^{(1)}, \delta\hat q^{(1)} \in \mathbb{C}^n$ by

$\delta\hat p^{(1)} = \begin{bmatrix} 0 \\ \delta\tilde p^{(1)} \end{bmatrix}$   (169)

and

$\delta\hat q^{(1)} = \begin{bmatrix} 0 \\ \delta\tilde q^{(1)} \end{bmatrix}.$   (170)

Clearly, $\|\delta\hat A^{(1)}\| \lesssim \|A\|\, u$, $\|\delta\hat p^{(1)}\| \lesssim \|p\|\, u$, and $\|\delta\hat q^{(1)}\| \lesssim \|q\|\, u$. Let $\hat A^{(2)} \in \mathbb{C}^{n\times n}$ be defined by

$\hat A^{(2)} = U^{(2)}(\hat A^{(1)} + \delta\hat A^{(1)})U^{(2)*},$   (171)

and $\hat p^{(2)}, \hat q^{(2)} \in \mathbb{C}^n$ by

$\hat p^{(2)} = U^{(2)}(\hat p^{(1)} + \delta\hat p^{(1)})$   (172)

and

$\hat q^{(2)} = U^{(2)}(\hat q^{(1)} + \delta\hat q^{(1)}).$   (173)

We first notice that $\hat A^{(2)}_{1,\ell} + \hat p^{(2)}_1 \hat q^{(2)*}_\ell = 0$ for $\ell = 2, 3, \ldots, n$. Next, we observe that $\hat A^{(2)}_{1,1} + \hat p^{(2)}_1 \hat q^{(2)*}_1 = \hat A^{(1)}_{1,1} + \hat p^{(1)}_1 \hat q^{(1)*}_1 = \hat\lambda_1$; therefore, $\hat\lambda_1$ is an eigenvalue of $\hat A^{(2)} + \hat p^{(2)} \hat q^{(2)*}$. We then observe that $(\hat A^{(2)} + \hat p^{(2)} \hat q^{(2)*})_{2:n,2:n} = \tilde A^{(2)} + \tilde p^{(2)} \tilde q^{(2)*}$; therefore, $\hat\lambda_2$ is an eigenvalue of $\hat A^{(2)} + \hat p^{(2)} \hat q^{(2)*}$. Finally, letting $U = U^{(2)} U^{(1)}$ and substituting (154), (155), and (159) into (171)–(173) and expanding, it is straightforward to show that there exist a matrix $\delta A \in \mathbb{C}^{n\times n}$ and vectors $\delta p, \delta q \in \mathbb{C}^n$ such that

$\hat A^{(2)} + \hat p^{(2)} \hat q^{(2)*} = U(A + \delta A)U^* + U(p + \delta p)(q + \delta q)^* U^*,$   (174)

where $\|\delta A\| \lesssim \|A\|\, u$, $\|\delta p\| \lesssim \|p\|\, u$, and $\|\delta q\| \lesssim \|q\|\, u$. Therefore, $\hat\lambda_1$ and $\hat\lambda_2$ are eigenvalues of the matrix

$(A + \delta A) + (p + \delta p)(q + \delta q)^*,$   (175)

where $\|\delta A\| \lesssim \|A\|\, u$, $\|\delta p\| \lesssim \|p\|\, u$, and $\|\delta q\| \lesssim \|q\|\, u$. The same proof can be repeated inductively to show this for all $\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_n$. ∎

The following theorem states that our explicit shifted QR algorithm is componentwise backward stable, for those eigenvalues for which the shifts are small.
Theorem 4.7 (Explicit shifted QR). Suppose that $A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix, that $p, q \in \mathbb{C}^n$, and that $A + pq^*$ is lower Hessenberg. Let $d$ and $\beta$ denote the diagonal and superdiagonal of $A$, respectively. Suppose that Algorithm 4 is carried out in floating point arithmetic with $\epsilon \lesssim \|A\|\, u$, and suppose that $\mu^{(\ell)}$ is the largest total shift encountered at any point during the course of the algorithm from $i = 1, 2, \ldots, \ell$ in the outer loop. Let $\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_n$ denote the outputs of the algorithm. Then, for each $\ell = 1, 2, \ldots, n$, there exist a matrix $\delta A \in \mathbb{C}^{n\times n}$ and vectors $\delta p, \delta q \in \mathbb{C}^n$ such that $\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_\ell$ are exact eigenvalues of the matrix

$(A + \delta A) + (p + \delta p)(q + \delta q)^*,$   (176)

where $\|\delta A\| \lesssim (\|A\| + |\mu^{(\ell)}|)\, u$, $\|\delta p\| \lesssim \|p\|\, u$, and $\|\delta q\| \lesssim \|q\|\, u$.

Proof.
The proof is essentially identical to the proof of Theorem 4.6, and we omit it. ∎
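The deflation strategy analyzed in Theorems 4.6 and 4.7 (iterate until an eigenvalue decouples, record it, deflate, and continue on the smaller block) can be illustrated with a dense, unstructured sketch. The code below is our own stand-in, not the paper's algorithm: it works on a full Hermitian matrix, deflates at the bottom-right corner (the usual textbook orientation), and uses a Wilkinson shift, whereas the paper's structured sweeps act on the lower Hessenberg matrix $A + pq^*$ and deflate at the $(1,2)$ entry.

```python
import numpy as np

def shifted_qr_eigs(M, eps=1e-13, max_sweeps=200):
    """Explicit shifted QR with deflation on a Hermitian matrix: each step
    M <- R @ Q + mu*I (where Q R = M - mu*I) is a unitary similarity, so the
    eigenvalues are preserved; once the last row has (numerically) decoupled,
    M[-1, -1] is accepted as an eigenvalue and the matrix is deflated."""
    M = np.array(M, dtype=float)
    scale = np.linalg.norm(M)
    eigs = []
    while M.shape[0] > 1:
        for _ in range(max_sweeps):
            if np.abs(M[-1, :-1]).max() < eps * scale:
                break
            # Wilkinson shift: the eigenvalue of the trailing 2x2 block closest
            # to the bottom-right entry (a standard, reliably convergent choice
            # in the symmetric case).
            a, b, c = M[-2, -2], M[-2, -1], M[-1, -1]
            if b == 0.0:
                mu = c
            else:
                d = (a - c) / 2.0
                mu = c - b * b / (d + np.copysign(np.hypot(d, b), d))
            I = np.eye(M.shape[0])
            Q, R = np.linalg.qr(M - mu * I)
            M = R @ Q + mu * I
        eigs.append(M[-1, -1])
        M = M[:-1, :-1]
    eigs.append(M[0, 0])
    return np.array(eigs)

# Test matrix: the symmetric tridiagonal part of a colleague matrix has 0 on
# the diagonal and 1/2 on the off-diagonals (we ignore the 1/sqrt(2) entry
# coupling T_0 and T_1 here).
n = 8
M = np.diag(np.full(n - 1, 0.5), 1) + np.diag(np.full(n - 1, 0.5), -1)
print(np.abs(np.sort(shifted_qr_eigs(M)) - np.sort(np.linalg.eigvalsh(M))).max())
```

This dense sketch costs $O(n^3)$ per sweep; the point of the paper's structured representation is to achieve the same sequence of similarity transformations while touching only the components $d$, $\beta$, $p$, and $q$.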
Remark 4.1.
Notice that Theorems 4.6 and 4.7 do not make any mention of convergence. What they say is that, if the algorithm converges, then it is componentwise backward stable. We observe that, in practice, Algorithm 4 always converges rapidly, at least quadratically, for $\epsilon \approx \|A\|\, u$.

Remark 4.2.
Notice that the bound on $\delta A$ in Theorem 4.7 involves $\mu^{(\ell)}$, which is the largest total shift encountered at any point during the calculation of $\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_\ell$. While this bound appears weaker than the corresponding bound in Theorem 4.6, in practice it turns out to be essentially the same, as follows. We can always assume that $\|A\|$ is much smaller than $\|p\|$, or $\|q\|$, or both; if this isn't the case, then componentwise stability no longer has any special meaning, since it follows immediately from the usual Bauer–Fike perturbation bounds (see [7]). Furthermore, we tend to be interested in the componentwise backward stability of small eigenvalues $\hat\lambda_i$, where $|\hat\lambda_i| \lesssim \|A\|$. If we perform a few iterations of unshifted QR on the matrix, then the total shifts satisfy $|\mu^{(i)}| \approx |\hat\lambda_i|$, with the eigenvalues recovered in (approximately) increasing order of magnitude, $\hat\lambda_1 < \hat\lambda_2 < \cdots < \hat\lambda_i$. For $\hat\lambda_i$ such that $|\hat\lambda_i| \lesssim \|A\|$, we have then that the bound $\|\delta A\| \lesssim (\|A\| + |\mu^{(i)}|)\, u$ becomes $\|\delta A\| \lesssim \|A\|\, u$. Finally, we point out that the dependence of the bound on $\mu^{(\ell)}$ could likely be removed entirely by reformulating our QR algorithm as an implicit method.

5 Numerical Experiments

In this section, we demonstrate the componentwise backward stability of our shifted QR algorithm (see Algorithm 4) by illustrating its stability when it is used as a rootfinding algorithm (see Sections 2.3 and 2.4). Consider a polynomial $p(x)$ of order $n$, not necessarily monic, expressed in a Chebyshev polynomial basis

$p(x) = \sum_{j=0}^{n} a_j T_j(x),$   (177)

where $a_j \in \mathbb{R}$ and $T_j(x)$ is the Chebyshev polynomial of order $j$. By Theorem 2.7 and Remark 2.1, we have that if the eigenvalues of the linearization (17), where $c_j = a_j/a_n$, $j = 0, 1, \ldots, n-1$, are computed by a componentwise backward stable algorithm, then the computed roots $\hat x_1, \hat x_2, \ldots, \hat x_n$ are the exact roots of the perturbed polynomial

$p(x) + \delta p(x) = \sum_{j=0}^{n} (a_j + \delta a_j) T_j(x),$   (178)

where

$\frac{\|\delta a\|}{\|a\|} \lesssim u.$   (179)

By applying our QR algorithm to linearizations of various polynomials $p(x)$, we demonstrate our algorithm's componentwise backward stability by showing that the bound (179) always holds.

We estimate the size of the backward error $\delta a$ in the coefficients by using the following observation (see the discussion accompanying Table 1 in [26]). By the definition of $p(x) + \delta p(x)$, we have that $(p + \delta p)(\hat x_i) = 0$ for $i = 1, 2, \ldots, n$. From (177) and (178), it follows that

$p(\hat x_i) = p(\hat x_i) - (p + \delta p)(\hat x_i) = -\sum_{j=0}^{n} \delta a_j T_j(\hat x_i).$   (180)

Since $-1 \le T_j(x) \le 1$ when $x \in [-1,1]$,

$|p(\hat x_i)| \approx \|\delta a\|,$   (181)

whenever $\hat x_i \in \mathbb{C}$ is not too far from the interval $[-1,1]$. Even though $\hat x_i$ is already a floating point number, the value $p(\hat x_i)$ cannot be computed exactly in floating point arithmetic. Letting $\hat p(\hat x_i)$ denote the approximation to $p(\hat x_i)$ computed in floating point arithmetic, we know that

$\hat p(\hat x_i) \approx p(\hat x_i) + \kappa(p; \hat x_i)\, u,$   (182)

where

$\kappa(p; \hat x_i) = |\hat x_i|\,|p'(\hat x_i)|$   (183)

is the absolute condition number of $p(x)$ at $x = \hat x_i$. When $\kappa(p; \hat x_i)$ is large, the error in evaluating $p(\hat x_i)$ dominates, while when $\kappa(p; \hat x_i)$ is of modest size, we have $\hat p(\hat x_i) \approx p(\hat x_i)$. In this section, we investigate the quantity

$\eta(p; \hat x_i) = \frac{\hat p(\hat x_i)}{\max(\kappa(p; \hat x_i), \|a\|)},$   (184)

for various polynomials $p(x)$.
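The diagnostic (184) is straightforward to compute from a coefficient vector with standard Chebyshev utilities. The helper below is our own sketch (using NumPy's Chebyshev module as a dense stand-in for the paper's FORTRAN rootfinder): it evaluates $p$ and $p'$ at the computed roots and forms $\eta$; under componentwise backward stability, the resulting values should be on the order of machine precision.

```python
import numpy as np
from numpy.polynomial import chebyshev as Ch

def eta(a, xhat):
    """The quantity eta(p; x_i) of (184): |p-hat(x_i)| divided by
    max(kappa(p; x_i), ||a||), with the absolute condition number
    kappa(p; x) = |x| |p'(x)| of (183); `a` holds the Chebyshev
    coefficients a_0, ..., a_n of p."""
    a = np.asarray(a, dtype=float)
    p = Ch.chebval(xhat, a)                 # p-hat: p evaluated in floating point
    dp = Ch.chebval(xhat, Ch.chebder(a))    # p'(x_i)
    kappa = np.abs(xhat) * np.abs(dp)
    return np.abs(p) / np.maximum(kappa, np.linalg.norm(a))

# Roots of a random Chebyshev series, computed (densely) from the colleague
# matrix by numpy's chebroots; keep the real roots close to [-1, 1], in the
# spirit of (188).
rng = np.random.default_rng(2)
a = rng.standard_normal(21)
z = Ch.chebroots(a)
delta = 1e-8
xhat = z[(np.abs(z.imag) < delta) & (np.abs(z.real) < 1 + delta)].real
print(eta(a, xhat).max())
```

For modest coefficient norms, the printed maximum is close to machine precision; the interesting regime in the experiments below is when $\|c\|$ is large, where balanced dense eigensolvers begin to lose this property.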
When κ ( p ; (cid:98) x i ) ≥ (cid:107) a (cid:107) , we have | η ( p ; (cid:98) x i ) | = | (cid:98) p ( (cid:98) x i ) | κ ( p ; (cid:98) x i ) ≈ | p ( (cid:98) x i ) | κ ( p ; (cid:98) x i ) + u ≤ | p ( (cid:98) x i ) |(cid:107) a (cid:107) + u. (185)When κ ( p ; (cid:98) x i ) ≤ (cid:107) a (cid:107) , | η ( p ; (cid:98) x i ) | = | (cid:98) p ( (cid:98) x i ) |(cid:107) a (cid:107) ≈ | p ( (cid:98) x i ) |(cid:107) a (cid:107) + κ ( p ; (cid:98) x i ) (cid:107) a (cid:107) u ≤ | p ( (cid:98) x i ) |(cid:107) a (cid:107) + u. (186)Thus, if our QR algorithm is indeed componentwise backward stable and (179) is satisfied,then, by (181), we expect to find that η ( p ; (cid:98) x i ) ≈ u , (187)for all polynomials p ( x ).For p ( (cid:98) x i ) to be a good approximation to (cid:107) δa (cid:107) (see (181)), we stated that (cid:98) x i ∈ C should be “not too far from the interval [ − , z i , i = 1 , , . . . , n denote the exact roots of the order- n polynomial p ( x ), and let (cid:98) z i denote the computed roots. We select roots close to the interval [ − ,
1] by choosing some δ > δ = 10 − ), and letting (cid:98) x i ∈ R denote the real part of all roots (cid:98) z i thatare inside the rectangle { z ∈ C : 1 − δ < Re( z ) < δ, − δ < Im( z ) < δ } . (188)If the polynomial p ( x ) has the real root z i , then taking the real part of (cid:98) z i will not resultin any additional error. The number of real roots inside the region (188) will often beless than the order n , and we denote the number of such roots by n roots .38n our numerical experiments, we compute the eigenvalues of the colleague matrix usingthree different algorithms: our Algorithm 4; MATLAB’s eig function; and MATLAB’s eig function with balancing turned off (using the option ’nobalance’ ), which we call eig nb . For our experiments in extended (quadruple) precision, we use the AdvanpixMultiprecision Computing Toolbox and its implementation of eig (see [1]). Since theAdvanpix Multiprecision Computing Toolbox’s eig function always balances the matrix,and does not support the ’nobalance’ option, we omit the test of eig nb in extendedprecision.For each example, we report the degree of the underlying polynomial, the order n ofthe Chebyshev expansion used to approximate it, the size of the vector c in the Euclideannorm, the Frobenius norm of the completely balanced colleague matrix, which we denoteby bal ( C ), the number n roots of computed roots inside the region (188) for the givenvalue of δ >
0, the size max i | z i | of the largest complex root of the colleague matrix, andthe value of max i | η ( p ; (cid:98) x i ) | , where the maximum is taken over all of the real parts of thecomputed roots inside the region (188).We implemented our algorithm in FORTRAN 77, and compiled it using Lahey/FujitsuFortran 95 Express, Release L6.20e. For the timing experiments, the Fortran codes werecompiled using the Intel Fortran Compiler, version 19.0.2.187, with the -fast flag. TheMATLAB experiments were performed using MATLAB R2019b, version 9.7.0.1190202,and the extended precision MATLAB experiments were performed in quadruple precision( mp.Digits(34) ) using the Advanpix Multiprecision Computing Toolbox, version 4.8.0,Build 14100. All experiments we conducted on a ThinkPad laptop, with 16GB of RAMand an Intel Core i7-8550U CPU. p rand ( x ) : Polynomials with Random Coefficients Following [14], we construct polynomials p rand ( x ) by sampling Chebyshev expansioncoefficients a i independently from a standard normal distribution, so that a i ∼ N (0 , i = 0 , , . . . , n −
1. Then, we choose the desired value of ‖c‖ by setting a_n = ‖a‖/‖c‖, so that the vector of coefficients c appearing in the colleague matrix (17), where c_i = a_i/a_n, has the specified norm. For this example, we choose n = 30 and set δ = 10^{−…} to extract the real roots (see formula (188)).

We report the results in Figure 3. We see that our algorithm shows the expected backward stability over the entire range of ‖c‖, while MATLAB's eig, both balanced and unbalanced, shows the expected growth with ‖c‖ (see the discussion in Section 2.4). Interestingly, for this example, balancing appears to improve the error by only an order of magnitude or two, while leaving the growth in the error with respect to ‖c‖ unchanged. This turns out to be completely consistent with the following explanation. Balancing the colleague matrix can reduce the magnitudes of all of the large elements, except for the element in the (n, n)-position, which balancing cannot change. In this example, all of the elements of the vector c are around the same size, so there are n large elements in the colleague matrix. Thus, balancing reduces the number of large elements from n to 1, resulting in an n-fold reduction in the norm of the matrix. In this example, n = 30, which corresponds well with the approximately 30-fold reduction in error due to balancing that we observe in Figure 3.

We found that the colleague matrix has a single large eigenvalue of size ‖c‖, and the rest of the eigenvalues are small.
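The overall pipeline used in these experiments, forming the colleague matrix from the monic Chebyshev coefficients, computing its eigenvalues, and keeping the real parts of those falling inside the rectangle (188), can be sketched as follows. This is our own simplified stand-in in Python/numpy, not the paper's code: a dense general-purpose eigensolver plays the role of Algorithm 4, the matrix orientation is one of several equivalent forms of the colleague matrix, and the default value of δ is illustrative.

```python
import numpy as np

def colleague(c):
    """Colleague matrix of p(x) = T_n(x) + c[n-1] T_{n-1}(x) + ... + c[0] T_0(x).

    Built from the recurrences x T_0 = T_1 and x T_k = (T_{k-1} + T_{k+1})/2;
    reducing x T_{n-1} modulo p(x) puts the rank-1 correction -c/2 into the
    last row.  One of several equivalent orientations of the matrix.
    """
    n = len(c)
    C = np.zeros((n, n))
    C[0, 1] = 1.0                        # x T_0 = T_1
    for k in range(1, n - 1):
        C[k, k - 1] = C[k, k + 1] = 0.5  # x T_k = (T_{k-1} + T_{k+1})/2
    C[n - 1, n - 2] += 0.5
    C[n - 1, :] -= np.asarray(c) / 2.0   # T_n = p(x) - sum_i c[i] T_i
    return C

def real_roots(c, delta=1e-13):
    """Eigenvalues of the colleague matrix, filtered by the rectangle (188)."""
    z = np.linalg.eigvals(colleague(c))
    keep = ((np.abs(z.imag) < delta)
            & (-1 - delta < z.real) & (z.real < 1 + delta))
    return np.sort(z[keep].real)

# roots of (x - 0.25)(x + 0.5), converted to monic Chebyshev form
a = np.polynomial.chebyshev.chebfromroots([0.25, -0.5])
print(real_roots(a[:-1] / a[-1]))        # approximately [-0.5, 0.25]
```

Here numpy's eigvals stands in for the dense eigensolver; the point of the paper is that Algorithm 4 attains componentwise backward stability at O(n²) cost instead.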
That the colleague matrix has a single large eigenvalue is not surprising: there is an entry of size ‖c‖ in the (n, n)-position of the matrix (from which it follows that Ce_n ≈ ‖c‖ e_n). Thus, for all three algorithms, max_i |ẑ_i| ≈ ‖c‖.

Figure 3: The values of max_i |η(p; x̂_i)| for various values of ‖c‖, in double precision (left) and quadruple precision (right), for the polynomials p_rand(x) of order n = 30, computed by our algorithm, eig, and eig_nb, with δ = 10^{−…}. The values are indicated for our algorithm with purple crosses (+), for eig with green x's (×), and for eig_nb with blue stars (⋆).

5.2  p_wilk(x): Wilkinson's Polynomial

Here we consider the famous Wilkinson polynomial, normalized so that all of its roots are inside the interval [−1, 1],

  p_wilk(x) = ∏_{i=1}^{m} ( x − ( 2i/(m + 1) − 1 ) ).   (189)

We construct an order-n Chebyshev expansion of this degree-m polynomial, sampling it at n Chebyshev points and applying a linear transformation to obtain the expansion coefficients (see [36]). We then compute the eigenvalues of the colleague matrix, and set δ = 10^{−…} to extract the real roots (see formula (188)). The results of our numerical experiment are shown in Tables 1 and 2. We observe that our algorithm is backward stable, while eig loses accuracy whenever the roots of the colleague matrix are large. Plots of the real and complex roots of the order-100 Chebyshev expansion are shown for various degrees of p_wilk(x) in Figure 4. When the order of the Wilkinson polynomial becomes large, spurious real roots begin appearing in the middle of the interval [−1, 1].

Remark 5.1.
The remarkable stability of eig for many of the examples in Tables 1 and 2 is explained by the following observation. The colleague matrix is the sum of a tridiagonal matrix and a matrix that is all zeros except for the last row, which is essentially equal
Figure 4: The roots of the Chebyshev expansion of order 100 of the Wilkinson polynomial p_wilk(x), of various degrees, computed by our algorithm. The complex roots ẑ_i are plotted with purple crosses (+) and the real roots x̂_i are plotted with blue stars (⋆). The Wilkinson polynomial has, in order from top to bottom, orders 24, 34, 44, and 54. Observe that the spurious complex roots are well-separated from the interval [−1,
1] when the order is low, but eventually meet the interval when the order is large.

to the coefficient vector c (see formula (17)). When there are large elements of c near the tail of the vector, the corresponding large entries in the colleague matrix cannot be balanced away, since they are very close to the diagonal of the matrix. On the other hand, when all of the elements of c near the tail of the vector are relatively small, and the large elements of c appear near the head of the vector, these large elements can be easily balanced away, since they are far from the diagonal, and the corresponding elements on the other side of the diagonal are all zero. The coefficient vector c is usually large only because the last coefficient of the corresponding non-monic Chebyshev expansion is small. If the function being approximated by this non-monic Chebyshev expansion has been adequately represented, then taking additional terms in the expansion will result in corresponding expansion coefficients which are all machine epsilon in size. Thus, adding terms to the Chebyshev expansion has the effect of adding elements of size approximately one to the tail of the coefficient vector c; if enough such elements are added, then all the large elements of c will be closer to the head of the vector, and can be balanced away. We also observe that, not unexpectedly, the size of the largest eigenvalue of the colleague matrix is approximately the same size as the norm of the colleague matrix after balancing. Thus, if enough terms are taken in a Chebyshev expansion, all of the
Table 1: The results of computing the roots of the Wilkinson polynomial p_wilk(x), using our algorithm and eig, with δ = 10^{−…}. For each degree and order n, the table lists ‖c‖, ‖bal(C)‖, max_i |z_i|, and, for each of eig and Algorithm 4, n_roots and max_i |η(p; x̂_i)|. [numerical entries not recoverable] † The error was so large here that some roots were outside of the region (188).
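The balancing mechanism described in Remark 5.1, namely that large entries of c far from the diagonal can be scaled away by a diagonal similarity while a large entry near the (n, n) corner cannot, can be checked directly. The sketch below is our own illustration in Python/numpy, not the paper's code: the balancing loop is a naive Osborne-style iteration, and the colleague-matrix orientation is one common form.

```python
import numpy as np

def colleague(c):
    # Colleague matrix of T_n + sum_i c[i] T_i (one common orientation).
    n = len(c)
    C = np.zeros((n, n))
    C[0, 1] = 1.0
    for k in range(1, n - 1):
        C[k, k - 1] = C[k, k + 1] = 0.5
    C[n - 1, n - 2] += 0.5
    C[n - 1, :] -= np.asarray(c) / 2.0
    return C

def balance(A, sweeps=50):
    # Naive Osborne-style balancing: repeatedly apply diagonal similarities
    # that equalize the off-diagonal row and column norms, so the eigenvalues
    # are unchanged while the matrix norm shrinks.
    A = A.copy()
    n = A.shape[0]
    for _ in range(sweeps):
        for i in range(n):
            r = np.sum(np.abs(A[i, :])) - np.abs(A[i, i])  # off-diag row norm
            s = np.sum(np.abs(A[:, i])) - np.abs(A[i, i])  # off-diag col norm
            if r > 0 and s > 0:
                f = np.sqrt(r / s)
                A[:, i] *= f        # scale column i ...
                A[i, :] /= f        # ... and row i: a diagonal similarity
    return A

n = 30
head = np.zeros(n); head[0] = 1e8   # large coefficient at the head of c
tail = np.zeros(n); tail[-1] = 1e8  # large coefficient at the tail of c

# The head-heavy matrix balances down to a modest norm; the tail-heavy one
# cannot, because a diagonal similarity leaves the (n, n) entry untouched.
print(np.linalg.norm(balance(colleague(head))),
      np.linalg.norm(balance(colleague(tail))))
```

This is the pattern seen in Tables 1 and 2: eig is accurate precisely when the large coefficients sit where balancing can reach them.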
Table 2: The results of computing the roots of the Wilkinson polynomial p_wilk(x) in extended precision, using our algorithm and eig, with δ = 10^{−…}. The columns are as in Table 1. [numerical entries not recoverable] † The error was so large here that some roots were outside of the region (188).

eigenvalues of the colleague matrix will eventually be small. All of this indicates that, provided enough terms are taken, a dense eigensolver combined with balancing can result in a backward stable rootfinding algorithm. Of course, we note that balancing the colleague matrix destroys the Hermitian plus rank-1 structure, which bars the use of structured QR algorithms depending on this property.

5.3  f_sin(x): A Smooth Function

Here we construct an order-n Chebyshev expansion of the smooth function

  f_sin(x) = sin(2 + 20(x + 0.…)).   (190)

Since f_sin(x) is analytic, its expansion coefficients decay exponentially. When i ≥
80, the coefficients a_i are around 10^{−…} in size (see Figure 5). If the coefficients are computed in extended precision, then, when i ≥ …, the coefficients a_i are around 10^{−…} in size. Since the function is approximated accurately by a Chebyshev expansion, its roots can be computed from the corresponding colleague matrix (see, for example, [12] for a nice discussion). The results of our numerical experiment are shown in Tables 3 and 4. Plots of the real and complex roots of the order-100 Chebyshev expansion are shown in Figure 6.

Figure 5: The function f_sin(x), shown on the left, and the magnitude of its Chebyshev expansion coefficients a_i, shown on the right.
Table 3: The results of computing the roots of the Chebyshev expansion of the function f_sin(x), using our algorithm and eig, with δ = 10^{−…}. The columns are as in Table 1. [numerical entries not recoverable]
Table 4: The results of computing the roots of the Chebyshev expansion of the function f_sin(x) in extended precision, using our algorithm and eig, with δ = 10^{−…}. The columns are as in Table 1. [numerical entries not recoverable]

5.4  p_mult(x): A Polynomial with Multiple Roots

Here we construct an order-n Chebyshev expansion of the degree-m polynomial

  f_mult(x) = (x + …)(x + …)(x + 0.…)(x − 0.…) ∏_{i=1}^{m−4} ( x − (1 − 10^{−…}) ).   (191)

This polynomial has four simple roots on the interval [−1, 1], and one root of multiplicity (m −
4) at the point 1 − 10^{−…} (see Figure 7). The results of our numerical experiments are shown in Tables 5 and 6. We observe that, in double precision, when the multiplicity of the root is greater than or equal to 5, not all real roots are found. This is because the error in these roots is approximately equal to ε^{1/5}, and when ε ≈ 10^{−16}, we have that ε^{1/5} ≈ 6.3 · 10^{−4}; when δ = 10^{−…}, this means that some of these roots can be outside the region (188). Likewise, since ε^{1/12} ≈ 1.5 · 10^{−3} when ε ≈ 10^{−34}, it follows that in extended precision, real roots are missed when their multiplicity is greater than or equal to approximately 12. See the excellent discussion in [13] for more details.

Figure 6: The roots of the Chebyshev expansion of order 100 of the function f_sin(x). The complex roots ẑ_i are plotted with purple crosses (+) and the real roots x̂_i are plotted with blue stars (⋆).

Figure 7: The polynomial p_mult(x) of order 7 on the left, and order 9 on the right. The roots are indicated with blue stars (⋆).

5.5  p_yuji(x): A Pathological Example from [26]

Here we consider the order-8 polynomial

  p_yuji(x) = ∑_{i=0}^{8} a_i T_i(x),   (192)

where the coefficient vector a is given by

  a = ( … ),   (193)

described in [26] (there, the last coefficient is taken to be 10^{−…}; we set it close to machine epsilon instead). Observe that the entry in the
Table 5: The results of computing the roots of the polynomial p_mult(x), using our algorithm and eig, with δ = 10^{−…}. For each degree and order n, the table lists ‖c‖, ‖bal(C)‖, max_i |z_i|, and, for each of eig and Algorithm 4, n_roots and max_i |η(p; x̂_i)|. [numerical entries not recoverable] † The error was so large here that some roots were outside of the region (188). ♦ The multiplicity of the rightmost root was so large here that some roots were outside of the region (188).
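The ε^{1/μ} sensitivity invoked above to explain the missed multiple roots is easy to reproduce in isolation. The sketch below is our own illustration of the general phenomenon in Python/numpy, using the power basis and numpy's companion-matrix rootfinder rather than the colleague matrix; the constants are illustrative.

```python
import numpy as np

mu = 5                                  # multiplicity of the root
coeffs = np.poly([0.5] * mu)            # power-basis coefficients of (x - 0.5)^5

# Perturbing the polynomial by eps splits the mu-fold root into mu simple
# roots on a circle of radius ~ eps^(1/mu) about 0.5.
eps = 1e-15
perturbed = coeffs.copy()
perturbed[-1] -= eps                    # the polynomial now equals eps at x = 0.5
displacement = np.max(np.abs(np.roots(perturbed) - 0.5))
print(displacement)                     # on the order of eps**(1/mu) = 1e-3
```

A coefficient perturbation of size 10^{−15} thus moves a quintuple root by roughly 10^{−3}, which is why a small tolerance δ in (188) can exclude such roots.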
Table 6: The results of computing the roots of the polynomial p_mult(x) in extended precision, using our algorithm and eig, with δ = 10^{−…}. The columns are as in Table 5. [numerical entries not recoverable] † The error was so large here that some roots were outside of the region (188). ♦
The multiplicity of the rightmost root was so large here that some roots were outside of the region (188).

bottom right corner of the corresponding colleague matrix is around 10^{…} in size. This polynomial has seven real roots on the interval [−1, 1]. For this example, eig struggles to produce any accuracy at all, while our algorithm returns all the roots to machine precision.

Table 7: The results of computing the roots of the polynomial p_yuji(x), using our algorithm and eig, with δ = 10^{−…}. [numerical entries not recoverable]

5.6  f_cas(x): A Pathological Example from [14]

Here we consider the order-n Chebyshev expansion of the smooth function

  f_cas(x) = sin( 1 / (x + 1 + 10^{−…}) ),   (194)

described in [14]. The first 1430 Chebyshev expansion coefficients of f_cas(x) are shown in Figure 8. This function is highly oscillatory, and requires a Chebyshev expansion of order at least 1430 to resolve it. Our numerical experiments are shown in Table 8. Plots of the real and complex roots of the order-1600 Chebyshev expansion are shown in Figure 9.

Figure 8: The magnitudes of the first 1430 Chebyshev expansion coefficients of f_cas(x).
Table 8: The results of computing the roots of the Chebyshev expansion of the function f_cas(x), using our algorithm and eig, with δ = 10^{−…}. The columns are as in Table 3. [numerical entries not recoverable]

Figure 9: The roots of the Chebyshev expansion of order 1600 of the function f_cas(x). The complex roots ẑ_i are plotted with purple crosses (+) and the real roots x̂_i are plotted with blue stars (⋆).

5.7  CPU Times

The CPU times of our algorithm are compared to the times of MATLAB's eig in Figure 10. These timing experiments were performed on polynomials with random, independent, normally distributed Chebyshev expansion coefficients, with the last coefficient chosen so that the vector c has the desired norm (see Section 5.1). We found that the CPU times do not depend on ‖c‖, so we report the results only for ‖c‖ = 2. We observe that our algorithm is strictly faster than eig, even for small inputs, except perhaps for n = 7, for which our algorithm and eig cost about the same. The growth in the CPU times taken by our algorithm agrees nicely with the expected asymptotic cost of O(n²), while eig shows a growth of O(n³).

Figure 10: The CPU times of our algorithm, plotted with purple crosses (+), and the CPU times of eig, plotted with green x's (×), for various values of n, where n is the dimensionality of the colleague matrix.

In this manuscript, we describe an explicit, O(n²) structured QR algorithm for colleague matrices (more generally, for Hessenberg matrices that are the sum of a Hermitian matrix and a rank-1 matrix), and prove that it is componentwise backward stable. These results can be generalized in several directions, of which we describe four. First, the algorithm can be modified in a fairly straightforward way to work on Hessenberg matrices that are the sum of a Hermitian matrix and a rank-k perturbation (as opposed to a rank-1 perturbation).
As in the rank-1 case, most of the entries in the Hermitian part are inferred from the low-rank part, except that they are inferred from a rank-k matrix instead of a rank-1 matrix. The QR iteration proceeds similarly, the main difference being that the correction in Line 12 of Algorithm 1 becomes a correction to a row of an n × k matrix.

Second, the extension of this algorithm to an implicit, O(n²) structured QR algorithm that is also componentwise backward stable is fairly straightforward. The key observation of this manuscript (that, to maintain componentwise error bounds, a correction must be applied to the rank-1 part whenever an entry of the matrix is eliminated) can be applied to a bulge-chasing algorithm where the matrix is similarly represented by generators.

Third, we observe that this algorithm can be used to accelerate the calculation of eigenvalues of general matrices, not necessarily in Hessenberg form, that are representable as the sum of a Hermitian matrix and a rank-1 (or rank-k) matrix. Such matrices can be quickly reduced to Hessenberg form in O(n²) operations, and once they are in Hessenberg form, our O(n²) algorithm can be used to compute the eigenvalues. Thus, the cost of the algorithm is dominated by the reduction to Hessenberg form, which will have a much smaller constant than the standard algorithm for the evaluation of the eigenvalues of the original dense matrix. Furthermore, if the reduction to Hessenberg form can be done in a componentwise backward stable fashion, then this scheme results in a componentwise backward stable eigensolver for general matrices of the form Hermitian plus rank-1 (or rank-k).

Fourth, we observe that our algorithm can be used to find the roots of polynomials expressed in other bases besides Chebyshev polynomials.
It was observed in [5] that, given any orthogonal polynomial basis that satisfies a three-term recurrence relation, and given a polynomial expressed in that basis, it is possible to construct an analogue of the colleague matrix from the expansion coefficients. This matrix is a Hessenberg matrix that is the sum of a (not necessarily symmetric) tridiagonal matrix and a rank-1 matrix; matrices of this form are called comrade matrices. For all classical orthogonal polynomials, the tridiagonal part can be made symmetric by balancing, without making any entries of the matrix much larger or much smaller. Our algorithm can then be applied to this new matrix, which is a Hessenberg matrix that is the sum of a symmetric tridiagonal matrix and a rank-1 matrix.

References

[1] Advanpix Multiprecision Computing Toolbox for MATLAB. Yokohama, Japan: Advanpix LLC. See .

[2] Aurentz, J., T. Mach, L. Robol, R. Vandebril, and D.S. Watkins. Core-Chasing Algorithms for the Eigenvalue Problem.
SIAM, 2018.

[3] Aurentz, J.L., T. Mach, L. Robol, R. Vandebril, and D.S. Watkins. "Fast and backward stable computation of roots of polynomials, part II: backward error analysis; companion matrix and companion pencil." SIAM J. Matrix Anal. Appl., 39.3 (2018): 1245–1269.

[4] Aurentz, J.L., T. Mach, R. Vandebril, and D.S. Watkins. "Fast and backward stable computation of roots of polynomials." SIAM J. Matrix Anal. Appl., 36.3 (2015): 942–973.

[5] Barnett, S. "A companion matrix analogue for orthogonal polynomials." Linear Algebra Appl., 12 (1975): 197–208.

[6] Battles, Z. and L.N. Trefethen. "An extension of Matlab to continuous functions and operators." SIAM J. Sci. Comput., 25 (2004): 1743–1770.

[7] Bauer, F.L. and C.T. Fike. "Norm and exclusion theorems." Numer. Math., …

[8] … Numer. Math., 116 (2010): 177–212.

[9] Bini, D.A., P. Boito, Y. Eidelman, L. Gemignani, and I. Gohberg. "A fast implicit QR eigenvalue algorithm for companion matrices." Linear Algebra Appl., 432 (2010): 2006–2031.

[10] Bini, D.A., Y. Eidelman, L. Gemignani, and I. Gohberg. "Fast QR eigenvalue algorithms for Hessenberg matrices which are rank-one perturbations of unitary matrices." SIAM J. Matrix Anal. Appl., …

[11] … Numer. Math., 100 (2005): 373–408.

[12] Boyd, J.P. "Finding the Zeros of a Univariate Equation: Proxy Rootfinders, Chebyshev Interpolation, and the Companion Matrix." SIAM Rev., …

[13] … J. Comput. Appl. Math., 205 (2007): 281–295.

[14] Casulli, A. and L. Robol. "Rank-structured QR for Chebyshev rootfinding." arXiv:2010.11416v1 [math.NA], Oct. 2020.

[15] Chandrasekaran, S., M. Gu, J. Xia, and J. Zhu. "A fast QR algorithm for companion matrices." Recent Advances in Matrix and Operator Theory, Oper. Theory Adv. Appl., 179 (2008): 111–143.

[16] Chebfun for MATLAB. See .

[17] Corless, R.M. "Generalized companion matrices in the Lagrange basis." Proceedings EACA (pp. 317–322). Santander, Spain: Universidad de Cantabria, 2004.

[18] Corless, R.M. Personal communication, 13 July 2020.

[19] Edelman, A. and H. Murakami. "Polynomial roots from companion matrix eigenvalues." Math. Comput., …

[20] … Numer. Algor., …

[21] … Q. J. Math., 2.12 (1961): 61–68.

[22] Higham, N.J. Accuracy and Stability of Numerical Algorithms. …

[23] … SIAM J. Matrix Anal. Appl., …

[24] … Numerical Methods for Roots of Polynomials, Part I. Elsevier, 2007.

[25] McNamee, J.M. and V.Y. Pan. Numerical Methods for Roots of Polynomials, Part II. Elsevier, 2013.

[26] Nakatsukasa, Y. and V. Noferini. "On the stability of computing polynomial roots via confederate linearizations." Math. Comput., 85.301 (2016): 2391–2425.

[27] Noferini, V., L. Robol, and R. Vandebril. "Structured backward errors in linearizations." arXiv:1912.04157v1 [math.NA], Dec. 2019.

[28] Pan, V.Y. "Solving a polynomial equation: some history and recent progress." SIAM Rev., …

[29] … Numer. Math., 13 (1963): 293–304.

[30] Pérez, J. and V. Noferini. "Chebyshev rootfinding via computing eigenvalues of colleague matrices: when is it stable?" Math. Comput., …

[31] … J. Inst. Maths. Appl., …

[32] … July 2020. See https://meetings.siam.org/sess/dsp_talk.cfm?p=106559.

[33] Specht, W. "Die Lage der Nullstellen eines Polynoms III." Math. Nachr., 16 (1957): 369–389.

[34] Specht, W. "Die Lage der Nullstellen eines Polynoms IV." Math. Nachr., 21 (1960): 201–222.

[35] Tisseur, F. "Backward stability of the QR algorithm." Technical Report 239, Equipe d'Analyse Numérique, Université Jean Monnet de Saint-Etienne, 1996.

[36] Trefethen, L.N. Approximation Theory and Approximation Practice. SIAM, 2013.

[37] Wilkinson, J.H.