A fast iterative algorithm for near-diagonal eigenvalue problems
Maseim Kenmoe, Ronald Kriemann, Matteo Smerlak, Anton S. Zadorin
Abstract.
We introduce a novel iterative eigenvalue algorithm for near-diagonal matrices termed iterative perturbative theory (IPT). Built upon a “perturbative” partitioning of the matrix into diagonal and off-diagonal parts, IPT computes one or all eigenpairs with a complexity per iteration of one matrix-vector or one matrix-matrix multiplication, respectively. Thanks to the high parallelism of these basic linear algebra operations, we obtain excellent performance on multi-core processors and GPUs, with large speed-ups over standard methods (up to ∼50x with respect to LAPACK and ARPACK). For matrices which are not close to being diagonal but have well-separated eigenvalues, IPT can be used to refine low-precision eigenpairs obtained by other methods. We give sufficient conditions for linear convergence and demonstrate performance on dense and sparse test matrices. In a real-world application from quantum chemistry, we find that IPT performs similarly to the Davidson algorithm.
Key words. eigenvalue algorithm, perturbation theory, iterative refinement
AMS subject classifications.
1. Introduction.
Computing the eigenvalues and eigenvectors of a matrix that is already close to being diagonal (or diagonalizable in a known basis) is a classic problem throughout science.
Ab initio studies of electronic and nuclear structures, for instance, often involve Hamiltonian operators expressed in bases that nearly diagonalize them, e.g. for self-consistent field or configuration interaction calculations [1]. In classical electromagnetism, perturbative problems arise when the solution to a slightly deformed problem is sought, for instance when boundary conditions are shifted or optical properties are varied [2]. In mechanical engineering, one often wants to know the effect of slight deformations of stiffness or density on spectra, as in Rayleigh’s pioneering analysis of vibrating strings [3]. More generally, we might know approximate eigenpairs through some method, and the problem is to refine them to a given precision.

Eigenvalue perturbation theory provides estimates and bounds for the variation of eigenpairs with respect to perturbations of the matrix [4]. Here our aim is, instead, to compute these eigenpairs exactly (up to numerical error), i.e. we are interested in eigenvalue algorithms for perturbative problems. In the symmetric case, algorithms that take advantage of near-diagonality are available. Thus, when just a few extremal eigenpairs are desired, Davidson-type subspace iteration methods [5] can be efficient, with large speed-ups over Lanczos [6]. To obtain the complete set of eigenvectors of near-diagonal symmetric matrices, Jacobi’s algorithm may be advised: its local quadratic convergence and parallelizability imply that converged eigenvectors can sometimes be obtained in a fraction of the time required for tri-diagonalization. This is especially true on modern graphical processing units (GPUs), whose massive parallelism allows for extremely high flop rates that can sometimes offset the unfavourable complexity of Jacobi’s algorithm [7].

∗ Submitted to the editors in Dec. 2020.
Funding:
Funding for this work was provided by the Alexander von Humboldt Foundation in the framework of the Sofja Kovalevskaja Award endowed by the German Federal Ministry of Education and Research.
† University of Dschang, Cameroon, and Max Planck Institute for the Mathematical Sciences, Leipzig, Germany
‡ Max Planck Institute for the Mathematical Sciences, Leipzig, Germany

Here we introduce a novel iterative algorithm for the computation of one or all eigenpairs of a near-diagonal matrix. While more restricted in its applicability than Davidson- or Jacobi-like methods, its performance is higher. This follows from our algorithm’s simple linear-algebraic structure: each iteration consists of just one matrix-vector multiplication (for one eigenpair) or one matrix-matrix multiplication (for all eigenpairs). These operations are highly optimized in the Basic Linear Algebra Subprograms (BLAS Level 2 and 3, respectively), resulting in comparatively lower execution times. Moreover, since matrix-matrix multiplication has sub-cubic theoretical complexity O(N^2.376), so does our algorithm, in contrast with classical algorithms which are O(N^3).

Iterative perturbation theory (IPT) was inspired by the Rayleigh-Schrödinger (RS) perturbative expansion familiar from textbook quantum mechanics [8]. (The relationship between the two methods, together with applications to physical and chemical examples, will be discussed in a companion paper [9].) Its structure, however, is different: instead of seeking the perturbed eigenvectors as power series in the perturbation, we compute them iteratively, as fixed points of a quadratic equation. One consequence of this difference is that, unlike for Rayleigh-Schrödinger expansions [10], the domain of convergence of our method as a scalar perturbation parameter is varied is not restricted to a disk in the complex plane bounded by exceptional points (values of the parameter such that the matrix is defective); instead, for large perturbations IPT behaves like the logistic map, i.e. it follows the period-doubling route to chaos.

Our presentation starts with a reformulation of the eigenvalue equation as a fixed-point equation for a quadratic map in complex projective space (section 2). We then establish a sufficient condition for fixed-point iteration to converge and illustrate its divergence for larger perturbations with a simple two-dimensional example (section 3). Next, we consider the computational efficiency of our method on multi-CPU and GPU architectures using random test matrices (section 4). We then discuss one possible application of our algorithm as a refinement method for eigenvectors (section 5). We conclude with some possible directions for future work.

Below we denote complex vectors and functions to vector spaces by boldface lower case letters: v ∈ C^n. We denote the coordinates of a vector v in the standard basis of C^n by (v)_i or by v_i, using the same letter but in Roman. For matrices and functions to spaces of matrices we use upper case Roman letters. Superscripts in parentheses (as in A^{(k)}) denote the order of an approximation, while superscripts in square brackets (as in A^{[ℓ]}) enumerate terms in asymptotic series (so-called corrections).
Everywhere in the text we understand by genericity of an object in some set C^n with the standard topology the condition that the object belongs to a fixed, open, everywhere dense subset of C^n. A phrase “condition A is false for a generic object O ∈ S” should be read as “condition A corresponds to a closed nowhere dense subset of S”. For example, genericity of an n × n matrix M is understood with M viewed as an element of C^{n×n}. A phrase “eigenvalues of a generic n × n matrix are pairwise different” should be understood as “the set of matrices with eigenvalues of higher multiplicity forms a nowhere dense closed subset of C^{n×n}”. Genericity of a family of partitions (defined below) D + λ∆, where D = diag(v), v ∈ C^n, ∆ ∈ C^{n×n} and λ ∈ C, is understood as genericity of a point (v, ∆, λ) ∈ C^n × C^{n×n} × C, and so on.
2. Eigenvectors as fixed points.
Consider an n × n complex matrix M. Its eigenvectors are elements z ∈ C^n.

Lemma 2.1. The eigenvectors of M are in one-to-one correspondence with the non-zero solutions of the system {E_ij}_{i,j} of polynomial equations in the coordinates of z, for all i and j, where

E_ij : (Mz)_j z_i = (Mz)_i z_j.

Proof.
The system {E_ij}_{i,j} states that the exterior product Mz ∧ z of Mz and z, seen as a polyvector constructed from elements of the vector space C^n, is equal to 0. This statement is equivalent to the statement that Mz and z are collinear.

Being defined up to a multiplicative constant, the eigenvectors of the matrix M are naturally identified with elements of the complex projective space CP^{n−1}, built from C^n in the standard way by factorization on rescaling. Then the same standard coordinates of C^n define a tuple [z_1 : ⋯ : z_n] of standard homogeneous coordinates of [z] ∈ CP^{n−1}, where by [z] we denote the equivalence class of z with respect to the factorization. These special coordinates define in CP^{n−1} the standard (holomorphic) atlas of affine charts {U_i}_{1≤i≤n}, U_i = {[z] ∈ CP^{n−1} : z_i ≠ 0}, with coordinates {ζ_j = z_j/z_i}_{j : j≠i}. There are also standard embeddings of U_i into C^n as affine subspaces for each i, obtained by setting z_i = 1 and by the identification z_j = ζ_j for each j. From this point of view, the system {E_ij}_{i,j} defines a projective variety.

Fix an index i. To find the full set of eigenvectors in a single chart U_i it is enough to solve a smaller set of equations.
Lemma 2.2. Eigenvectors z of M such that [z] ∈ U_i are in one-to-one correspondence with the solutions of the system of equations {E_ij}_{j : j≠i} in the same chart.

Proof. We need to prove that for [z] ∈ U_i the two following propositions are equivalent: (i) z is a root of (Mz)_j z_i = (Mz)_i z_j for all j ≠ i, and (ii) ∃ ε ∈ C : Mz = εz.

The (i) ⇒ (ii) direction. For [z] ∈ U_i it is enough to consider only vectors with z_i = 1. Then from {E_ij}_{j : j≠i} we conclude that (Mz)_j = (Mz)_i z_j for all j ≠ i. If we now denote the common factor in these expressions as ε = (Mz)_i, then (ii) immediately follows.

The (ii) ⇒ (i) direction. Consider an eigenvector z of M such that z_i ≠ 0. From the eigenvalue equation Mz = εz we can express its eigenvalue as ε = (Mz)_i/z_i; inserting this back into Mz = εz shows that z is a root of the system {E_ij}_{j : j≠i}.

Denoting by V_i the projective variety defined by the system {E_ij}_{j : j≠i} for a fixed i, and recalling that the set {U_i}_i forms an atlas, we conclude that the set of eigenvectors of M can be identified with the set ∪_i (V_i ∩ U_i), which is in fact the projective variety ∩_i V_i. Since eigenvectors generically have non-zero coordinates in all directions, each component V_i ∩ U_i typically contains the complete set of eigenvectors.

It should be noted, however, that V_i taken alone may have points unrelated to the eigenvectors of M, but by Lemma 2.2 these additional points can only be at infinity in U_i. Indeed, consider points of V_i such that z_i = 0. At the very least they include points that are given by (Mz)_i = 0 in addition to z_i = 0. Thus, in general, there is a whole (n−3)-dimensional projective subspace of such points in CP^{n−1} (corresponding to an (n−2)-dimensional linear subspace of C^n).

We now further assume a partitioning

M = D + ∆
Algorithm 2.1
One eigenvector of a near-diagonal matrix M
Choose a partition M = D + ∆ with D diagonal
Compute g_i ← ((D_jj − D_ii)^{−1})_{j : j≠i} and set (g_i)_i = 0
Initialize z ← e_i
while z not converged do
  z ← f_i(z) with f_i defined by (2.1)
end while
Return eigenvector z
Return eigenvalue ε_i = D_ii + (∆z)_i

into a diagonal part D and the residual part ∆. The motivation comes from physics, where the diagonal part D often consists of unperturbed eigenvalues ϵ_i and ∆ represents a perturbation, as in so-called perturbation theory. In practice D can be taken as the diagonal elements of M, although different partitionings can sometimes be more appropriate [11]. Provided the ϵ_i’s are all simple (non-degenerate, that is pairwise different), we can rewrite the polynomial system {E_ij}_{j : j≠i} for a particular fixed i as

z_i z_j = (ϵ_j − ϵ_i)^{−1} (z_j (∆z)_i − z_i (∆z)_j) for j ≠ i.

Thus, the eigenvectors of M in U_i can be identified with the solutions of the fixed-point equation z = f_i(z), with f_i : C^n → C^n the map

(2.1) f_i(z) = e_i + g_i ∘ (z (∆z)_i − ∆z),

where e_i is the i-th standard basis vector of C^n, and g_i is the i-th column of the inverse-gaps matrix G with components G_jk = (g_k)_j = (ϵ_j − ϵ_k)^{−1} for j ≠ k and G_jj = (g_j)_j = 0. Here ∘ denotes the component-wise product of vectors in the standard basis. To obtain the eigenvector of M closest to the basis vector e_i, we can try to solve (2.1) by fixed-point iteration; this is the basic idea of our method. As noted, each iteration of f_i consists of a single multiplication of the present vector by the perturbation ∆, followed by element-wise multiplication by the vector g_i. The resulting solution is automatically normalized such that its i-th coordinate is equal to 1.

The same iterative technique can be used to compute all eigenvectors of M in parallel. For this it suffices to bundle all n candidate eigenvectors (one for each i) into a matrix A and apply the map f_i to the i-th column of A. This corresponds to the matrix map

(2.2) F(A) ≡ I + G ∘ (A D(∆A) − ∆A),

where ∘ denotes the Hadamard (element-wise) product of matrices and D(M) is the diagonal matrix built from the diagonal elements of M. Starting from A^{(0)} = I, we obtain a sequence of matrices A^{(k)} = F(A^{(k−1)}) whose limit as k → ∞, if it exists, is the full set of eigenvectors. We call this approach iterative perturbation theory (IPT).
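To make the iteration concrete, here is a minimal NumPy sketch of the full-spectrum map (2.2) (Algorithm 2.2 below). The function name ipt_full, the convergence test and the iteration cap are our choices for illustration, not the implementation benchmarked in section 4; we assume the diagonal entries of M are pairwise distinct so that the inverse gaps exist.

```python
import numpy as np

def ipt_full(M, tol=1e-12, maxiter=100):
    """Iterate A <- F(A) = I + G o (A D(Delta A) - Delta A), cf. (2.2)."""
    n = M.shape[0]
    d = np.diag(M).astype(complex)          # unperturbed eigenvalues eps_i
    Delta = M - np.diag(d)                  # residual (off-diagonal) part
    G = 1.0 / (d[:, None] - d[None, :] + np.eye(n))  # inverse gaps; eye avoids 0/0
    np.fill_diagonal(G, 0.0)                # G_jj = 0
    A = np.eye(n, dtype=complex)
    for _ in range(maxiter):
        DA = Delta @ A
        A_new = np.eye(n) + G * (A * np.diag(DA)[None, :] - DA)
        if np.linalg.norm(A_new - A) < tol:
            return d + np.diag(Delta @ A_new), A_new  # eigenvalues, eigenvectors
        A = A_new
    raise RuntimeError("IPT did not converge: perturbation too large?")
```

Each sweep costs a single product ∆A plus O(n²) element-wise work, which is the source of the BLAS-3 speed-ups reported in section 4.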
3. Convergence and divergence.
In this section we look at the convergence of fixed-point iteration for the map (2.2). In a nutshell, the off-diagonal elements of ∆ must be small compared to the diagonal gaps ϵ_i − ϵ_j, a typical condition for eigenvector perturbation theory [10].
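To anticipate the precise statement below, the relevant smallness parameter can be computed directly; the following short sketch (our naming, not part of the paper's code) evaluates the product of spectral norms appearing in condition (3.1) of Theorem 3.1.

```python
import numpy as np

def ipt_condition(M):
    """Return ||G||_2 * ||Delta||_2 for the diagonal partition of M."""
    d = np.diag(M).astype(complex)
    Delta = M - np.diag(d)
    G = 1.0 / (d[:, None] - d[None, :] + np.eye(len(d)))
    np.fill_diagonal(G, 0.0)
    return np.linalg.norm(G, 2) * np.linalg.norm(Delta, 2)

# Theorem 3.1: convergence is guaranteed if this value is < 3 - 2*sqrt(2) ~ 0.172.
```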
Algorithm 2.2 Full eigendecomposition of a near-diagonal matrix M
Choose a partition M = D + ∆ with D diagonal
Compute G ← ((D_ii − D_jj)^{−1})_{i≠j} and set G_ii = 0
Initialize A ← I
while A not converged do
  A ← F(A) with F defined by (2.2)
end while
Return eigenmatrix A
Return eigenvalues E = D + D(∆A)

Let ‖·‖ denote the spectral norm of a matrix, i.e. its largest singular value. Let M ∈ C^{n×n} be a matrix, and let M = D + ∆ be its partition into a diagonal matrix D and the residual matrix ∆ such that the inverse-gaps matrix G corresponding to D is defined, which implies that all diagonal elements of D are pairwise different. Let F : C^{n×n} → C^{n×n} be the mapping defined by G and ∆ as in the previous section.

Theorem 3.1. If

(3.1) ‖G‖‖∆‖ < 3 − 2√2,

then the dynamical system defined by iterations of F and A^{(0)} = I converges to a unique, asymptotically stable fixed point in the ball B_{√2}(I).

Proof. The proof uses the Banach fixed-point theorem based on the ‖·‖ norm, which is sub-multiplicative with respect to both the matrix and Hadamard products [12].

First, the estimate

‖F(A) − I‖ ≤ ‖G‖‖∆‖ (‖A‖² + ‖A‖)

implies that F maps a closed ball B_r(I) of radius r centered on I into itself whenever ‖G‖‖∆‖ ≤ r/[(1 + r)(2 + r)]. Next, from (2.2) we have the estimate

‖F(A) − F(B)‖ ≤ ‖G‖‖∆‖ (1 + ‖A + B‖) ‖A − B‖.

Hence, F is contracting in B_r(I) provided ‖G‖‖∆‖ < 1/[1 + 2(1 + r)]. When both conditions on ‖G‖‖∆‖ hold, the Banach fixed-point theorem implies that A^{(k)} = F^k(I) converges exponentially to a unique fixed point A* within B_r(I) as k → ∞. Choosing the optimal radius

argmax_{r>0} min( r/[(1 + r)(2 + r)], 1/[1 + 2(1 + r)] ) = √2,

we see that ‖G‖‖∆‖ < 3 − 2√2 ≈ 0.17 guarantees convergence to the fixed point A*.

The set of solutions of the equation A = F(A) contains matrices composed of all combinations of eigenvectors of M with the appropriate normalization (the i-th coordinate of the column at the i-th position is set to 1). Some solutions may contain several repeated eigenvectors. Such solutions are singled out by their dropped rank, viz. rank A < n. In principle, there is a danger that the iterative algorithm converges to one such solution, with a loss of information about the eigenvectors of M as a result. The following lemma guarantees that under condition (3.1) this does not happen.
Lemma 3.2. Let condition (3.1) hold for a partition of a matrix M, and let A* be the unique fixed point of F in B_{√2}(I). Then A* has full rank.

Proof. The proof uses additional lemmas and one theorem provided in Appendix A.

A square matrix is rank-deficient if either two of its columns are collinear or more than two of its columns are linearly dependent but not pairwise collinear. For A* the latter is only possible if the corresponding eigenvectors belong to an eigenspace of M of dimension higher than one, which is ruled out by Lemma A.1. By Lemma A.2, A* does not contain any defective eigenvectors of M. Then the rank can be lost only by a repeat of an eigenvector (with renormalization) corresponding to an eigenvalue of multiplicity one.

Embed M into the family M_λ = D + λ∆ with λ ∈ [0, 1], M_1 = M and M_0 = D. Consider a partition for each member of the family, M_λ = D_λ + ∆_λ, such that D_λ = D and ∆_λ = λ∆. Let G_λ = G be the inverse-gaps matrix of D_λ, and let F_λ be the mapping defined by G_λ and ∆_λ according to (2.2), with G substituted by G_λ and ∆ by ∆_λ. It follows that ‖G_λ‖‖∆_λ‖ < (3 − 2√2)λ ≤ 3 − 2√2. Thus, according to Theorem 3.1, each F_λ has a unique fixed point A*_λ in B_{√2}(I). Furthermore, by Lemmas A.1 and A.2 we can be sure that none of the columns of A*_λ, for any value of λ, corresponds to an eigenvector with algebraic multiplicity higher than one, or lies in an eigenspace of M_λ of dimension higher than one. Thus they all correspond to eigenvalues that are simple roots of the characteristic polynomial of M_λ. Then all projective points that induce columns of A*_λ are differentiable functions of λ by Theorem A.3. It means that if some two columns of A* = A*_1 are collinear (and thus correspond to the same projective point), the same columns stay collinear in A*_λ for all λ. But none of the columns of I = A*_0 are collinear. We conclude that A* cannot have collinear columns.

Corollary 3.3. If M has eigenvalues with multiplicity higher than one, then for any partition M̃ = D̃ + ∆̃ of any matrix M̃ similar to M, where D̃ is diagonal and such that the corresponding inverse-gaps matrix G̃ is defined, the relation ‖G̃‖‖∆̃‖ ≥ 3 − 2√2 holds.

It is interesting to contrast the present iterative method with conventional RS perturbation theory, where the eigenvectors of a parametric matrix M = D + λ∆ are sought as power series in λ, viz. z = Σ_ℓ λ^ℓ z^{[ℓ]}, where the vector-terms (so-called corrections) z^{[ℓ]} do not depend on λ. Provided that D has distinct diagonal entries, it is indeed possible to express the matrix of eigenvectors A (with the same normalization as in Section 2, and with the order of eigenvectors such that A → I as λ → 0) as a power series
A = Σ_ℓ λ^ℓ A^{[ℓ]}, which converges in some disk around λ = 0 [10]. Then the order-k approximation of A takes the form A^{(k)}_RS = Σ_{ℓ=0}^{k} λ^ℓ A^{[ℓ]}. The matrix corrections A^{[ℓ]} are obtained from A^{[0]} = I via the recursion (Appendix B)

(3.2) A^{[ℓ]} = G ∘ ( Σ_{s=0}^{ℓ−1} A^{[ℓ−1−s]} D(∆A^{[s]}) − ∆A^{[ℓ−1]} ).

The iterative scheme A^{(k)} completely contains this RS series in the sense that A^{(k)} = A^{(k)}_RS + O(λ^{k+1}); this can be seen by induction (Appendix C). In other words, we can recover the usual perturbative expansion of A to order k by iterating the map F k times and dropping all terms O(λ^{k+1}).
Fig. 1: Convergence of IPT in the two-dimensional example (3.3). Left: In the complex λ-plane, RS perturbation theory (RS-PT) converges inside a circle of radius 1/2, set by the exceptional points λ = ±i/2 where M is not diagonalizable. Dynamical perturbation theory (IPT) converges inside the domain bounded by the blue cardioid, which is larger, especially along the real axis, where there is no singularity. Outside this domain, the map can converge to a periodic cycle, be chaotic or diverge to infinity, following flip bifurcations (along the real axis) and fold bifurcations (at the singularities). The domain where the map remains bounded (black area) is a conformal transformation of the Mandelbrot set. Right: The bifurcation diagram for the quadratic map f along the real λ-axis illustrates the period-doubling route to chaos as λ increases away from 0 (in absolute value). Orange and blue vertical lines indicate the boundaries of the convergence domains of RS-PT and IPT respectively.

Moreover, the parameter whose smallness determines the convergence of the RS series is the product of the perturbation magnitude |λ| with the inverse diagonal gaps ‖G‖ [10], just as it determines the contraction property of F.

But IPT also differs from the RS series in two key ways. First, the complexity of each iteration is constant (essentially just one matrix product with ∆), whereas computing the RS corrections A^{[ℓ]} involves the sum of increasingly many matrix products. Second, not being defined as a power series, the convergence of A^{(k)} as k → ∞ is not a priori restricted to a disk in the complex λ-plane. Together, these two differences suggest that IPT has the potential to converge faster, and in a larger domain, than RS perturbation theory. This is what we now examine, starting from an elementary but explicit example.

3.1. A 2 × 2 example. To build intuition, let us consider the parametric matrix

(3.3) M = [[0, λ], [λ, 1]].
This matrix has eigenvalues ε_± = (1 ± √(1 + 4λ²))/2, both of which are analytic inside the disk |λ| < 1/2, with square-root singularities at λ = ±i/2. (These singularities are in fact exceptional points, i.e. M is not diagonalizable for these values.) Because the RS series is a power series, these imaginary points contaminate its convergence also on the real axis, where no singularity exists: A_RS diverges for any value of λ outside the disk of radius 1/2, and in particular for real λ > 1/2. The IPT iterates, on the other hand, take the form

A^{(k)} = [[1, −f^k(0)], [f^k(0), 1]],

where f(x) = λ(x² − 1)
and the superscripts indicate k-fold iterates. This one-dimensional map has two fixed points at x*_± = ε_±/λ. Of these two fixed points x*_+ is always unstable, while x*_− is stable for λ ∈ (−√3/2, √3/2) and loses its stability at λ = ±√3/2. For larger real values of λ, the iterated map f^k, and hence the fixed-point iterations A^{(k)}, follows the period-doubling route to chaos familiar from the logistic map [13]. For values of λ along the imaginary axis, we find that the map is stable if ℑλ ∈ (−1/2, 1/2) and loses stability in a fold bifurcation at the exceptional points λ = ±i/2. The full domain of convergence of the system is strictly larger than the RS disk, as shown in Figure 1. We also observe that in the disk where both schemes converge, the dynamical scheme does so with a better rate than RS perturbation theory: we check that |f^k(0) − x*_−| ∼ |1 − √(1 + 4λ²)|^k = O(|λ|^{2k}), while the remainder of the RS series decays as O(|λ|^k). This is a third way in which the dynamical scheme outperforms the usual RS series, at least in this case: not only is each iteration computationally cheaper, but the number of iterations required to reach a given precision is lower.

The set where IPT loses stability (the blue curve in Figure 1) can be computed as 4λ² + e^{it}(2 − e^{it}) = 0 (the same equation for both eigenvectors). The cusps of this curve are at λ = ±i/2.
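The bifurcation cascade is easy to reproduce; the toy loop below (ours, with arbitrarily chosen λ values) prints four consecutive late iterates of the critical point 0 under f(x) = λ(x² − 1).

```python
for lam in [0.3, 0.9, 1.15, 1.3]:    # first flip at sqrt(3)/2 ~ 0.866
    x = 0.0
    for _ in range(2000):             # discard the transient
        x = lam * (x**2 - 1.0)
    tail = []
    for _ in range(4):                # record four consecutive iterates
        x = lam * (x**2 - 1.0)
        tail.append(round(x, 4))
    print(lam, tail)                  # fixed point, 2-cycle, 4-cycle, chaos
```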
In this particular case, the convergence circle of the RS perturbation theory is completely contained in the convergence domain of the iterative perturbation theory, and their boundaries intersect only at the cusp points. This convergence domain for IPT is directly related to the main cardioid of the classical Mandelbrot set: the set of complex values of the parameter c that lead to a bounded trajectory of the classical quadratic (holomorphic) dynamical system x^{(k+1)} = (x^{(k)})² + c. The main cardioid of the Mandelbrot set (the domain of stability of a steady state) is bounded by the curve 4c − e^{it}(2 − e^{it}) = 0. The boundary of the stability domain of our 2 × 2 example is obtained from it by the parameter change c ↦ −λ², followed by the variable change x ↦ λx. This brings the classical system to the dynamical system of the only nontrivial component of the first column for our 2 × 2 matrix: x^{(k+1)} = λ((x^{(k)})² − 1). The nontrivial component of the second column follows an equivalent (up to the sign change of the variable) equation: x^{(k+1)} = λ(1 − (x^{(k)})²).

The link between the convergence of IPT and the singularities of the spectrum of M (as a function of the complex perturbation parameter λ) generalizes to higher dimensions. Consider a map f_i for some fixed i. An attracting equilibrium point of the corresponding dynamical system z^{(k)} = f_i(z^{(k−1)}) loses its stability when the Jacobian matrix ∂f_i of f_i, (∂f_i)_{js} ≡ ∂(f_i)_j/∂z_s = ∂F_{ji}/∂z_s, has an eigenvalue (or multiplier) with absolute value equal to 1 at this point. The convergence domain of the dynamical system (2.2) for the whole matrix A of eigenvectors is equal to the intersection of the convergence domains for its individual columns (2.1).

Consider the system of n + 1 polynomial equations in n + 2 complex variables (z_j for 1 ≤ j ≤ n, λ, and µ):

z = f_i(z),
det(∂f_i − µI) = 0.

The variable µ here plays the role of a multiplier of a steady state. Either by successively computing resultants or by constructing a Groebner basis with the correct lexicographical order, one can eliminate the variables z_j from this system, which results in a single polynomial of two variables, (λ, µ) ↦ P(λ, µ). This polynomial defines a complex 1-dimensional variety. The projection to the λ-plane of the real 1-dimensional variety defined by {P = 0, |µ| = 1} corresponds to some curve C. A more informative way is to represent this curve as a complex function of a real variable t, implicitly defined by P(λ, e^{it}) = 0.

The curve C is the locus where a fixed point of f_i has a multiplier on the unit circle. In particular, the fixed point that at λ = 0 corresponds to z_i = 1 and z_j = 0, j ≠ i, loses its stability along a particular subset of this curve. The convergence domain of the iterative perturbation theory is the domain that is bounded by these parts of the curve and that contains 0. It is possible to show that C is a smooth curve with cusps (return points), which correspond to the values of λ such that M has a nontrivial Jordan normal form (is non-diagonalizable), see Appendix E. In a typical case, all cusps are related to the merging of a pair of eigenvectors of M. For the dynamical system (2.1), the cusps thus correspond to the fold bifurcations of its steady states [14]. One of the multipliers equals 1 at such a merged point [15], so these values of λ can be found as a subset of the roots of the univariate polynomial P(λ, 1). These roots are in turn a subset of the λ-roots of the discriminant polynomial Disc_x det(M − xI). These roots may correspond to an actually defective matrix M (when two eigenvectors merge and M acquires a nontrivial Jordan normal form) or to a simple degeneration of its eigenvalues (when two eigenvalues become equal but M retains a trivial Jordan normal form). Simple degenerations of eigenvalues are not related to cusps. Thus, the cusp points of C can be found as the intersection of the root sets of the two polynomials.
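For the 2 × 2 example this elimination can be done in a few lines of computer algebra; the sketch below (our code, using SymPy's resultant) recovers the curve quoted above.

```python
import sympy as sp

lam, mu, x = sp.symbols("lambda mu x")
f = lam * (x**2 - 1)                              # scalar map of the first column
P = sp.resultant(x - f, mu - sp.diff(f, x), x)    # eliminate the fixed-point variable
print(sp.factor(P))   # nontrivial factor: mu**2 - 2*mu - 4*lambda**2 (up to sign)
```

Setting µ = e^{it} in the nontrivial factor gives back 4λ² + e^{it}(2 − e^{it}) = 0, and solving P(λ, 1) = 0 returns λ = ±i/2, the two cusps.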
4. Performance.
We now compare the efficiency and accuracy of IPT to reference eigenvalue routines. Our system consists of a dual AMD EPYC 7702 with 128 CPU cores using MATLAB (wrapping LAPACK and ARPACK) and an NVidia Titan V GPU using MAGMA [16] and Nvidia cuSOLVER. Test matrices are of the form

(4.1) M = diag(n)_{1≤n≤N} + λR,

where R is an array of standard normal random variables and λ a positive parameter. We consider four cases: R can either be dense or sparse (with density 50/N, corresponding to ∼50N non-zero elements), and non-symmetric or symmetric (via the symmetrization R ← (R + R^T)/2). The time to compute all N eigenpairs in double precision is denoted T_eig, and the time to compute the lowest-lying eigenpair is denoted T_eigs. As a reference we use the time for one matrix-matrix multiplication T_mm or one matrix-vector multiplication T_mv, respectively. In all figures below we use continuous lines for experiments on the CPU and dashed lines for experiments on the GPU, and brackets indicate the wrapper we used to call the relevant LAPACK or ARPACK routine.

For the complete diagonalization problem we compare IPT to MATLAB’s eig function, which calls LAPACK’s routines GEEV for non-symmetric
Fig. 2: Runtime vs. dimension N for dense, non-symmetric perturbative matrices of the form (4.1) in single precision. Blue lines refer to IPT and orange lines to LAPACK’s GEEV routine, implemented on 128 CPU cores (continuous lines) or on a Titan V GPU (dashed lines). Within its perturbative convergence domain, IPT is up to 32 and 91 times faster than GEEV, respectively.

matrices and SYEV for symmetric matrices; on the GPU, MAGMA is used instead. Figure 3 shows the corresponding CPU timings in the dense, sparse, symmetric and non-symmetric cases (N = 4096). We make two observations. First, unlike LAPACK, our algorithm has a similar complexity for symmetric and non-symmetric matrices, leading to larger speed-ups in the latter case. Second, because it is based on matrix-matrix multiplication, IPT performs especially well on sparse matrices. This is another difference with standard full-spectrum algorithms, which do not take advantage of sparsity.

Fig. 3: IPT vs. LAPACK for the complete diagonalization of perturbative matrices with increasing λ, in double precision. The largest speed-ups are obtained for the non-symmetric problem.

As noted earlier, Hessenberg reduction is inefficient for matrices which are already near diagonal. In the next experiment we compare IPT with Nvidia’s GPU implementation of Jacobi’s algorithm, which does not rely on reduction and is known to have local quadratic convergence. In spite of this advantage, Jacobi (labeled SYEVJ in cuSOLVER) only proved faster than divide-and-conquer SYEVD for small matrix sizes; for N = 8192, SYEVJ is slower than SYEVD for all values of λ (Figure 4, center and right panels). By contrast, our implementation of IPT in CUDA is faster than SYEVD for λ ≤ 0.05. For dense, non-symmetric matrices, IPT is much faster than GEEV for all λ compatible with convergence (Figure 4, left panel).

Fig. 4: Timing of full-spectrum routines on the GPU for dense perturbative matrices. Here IPT proves to be faster than both divide-and-conquer (SYEVD) and Jacobi (SYEVJ) when λ is sufficiently small.

Next we consider the computation of just the lowest eigenpair of the matrices above, corresponding to the column n = 1. The reference routines here are the implicitly restarted Lanczos/Arnoldi methods in ARPACK [17] (eigs in MATLAB), and also the Davidson algorithm [5] for symmetric matrices (using the code from [18]).

Figure 5 presents our results. In all cases (dense, sparse, symmetric or non-symmetric) IPT runs much faster than ARPACK. Davidson, an algorithm designed for diagonally dominant matrices and used e.g. in quantum chemistry applications, provides a more meaningful reference point. Here we find that IPT is up to several times faster than Davidson for small enough λ. This can be explained by the fact that, although IPT requires a similar number of iterations as Davidson, each iteration is cheaper, with just one matrix-vector multiplication per step.

Fig. 5: IPT vs. ARPACK (Lanczos/Arnoldi) and Davidson for the lowest-lying eigenpair of perturbative matrices of the form (4.1) and increasing λ. Although Davidson is known to be especially fast for near-diagonal matrices, IPT is yet faster within its convergence domain.

Underlying the performance of IPT is the high parallelism of matrix-vector and, especially, matrix-matrix multiplication. Such parallelism is not shared by Householder reflections, which is why eigenvalue routines based on Hessenberg or tri-diagonal reduction do not scale as well on multi-core or GPU architectures. We illustrate this in Figure 6, where the time to diagonalize a dense, non-symmetric matrix of the form (4.1) is plotted against the number of CPU cores.

Fig. 6: Parallel scaling. Unlike direct methods based on Hessenberg reduction (here GEEV for a near-diagonal matrix of the form (4.1) with parameters as indicated), IPT scales as well as BLAS (here matrix-matrix multiplication). As a result, the speed-up of IPT over GEEV is only limited by the number of CPU cores available.

Finally, we compared the accuracy of IPT on the full-spectrum problem with LAPACK via MATLAB’s eig. For this we used near-diagonal matrices of the form (4.1) with N = 1024 in double precision, for a range of values of λ. For each λ we ran IPT and eig to obtain the matrix of eigenvectors A and the corresponding eigenvalues E = diag(ε_n)_{1≤n≤N}. We then measured the accuracy of each routine through the residual error ‖MA − AE‖_F, where ‖·‖_F denotes the Frobenius norm. We find that IPT is more than an order of magnitude more accurate than LAPACK, as measured by the median residual error.
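The residual measurement is easy to reproduce; the sketch below (our code, reusing the ipt_full function sketched in section 2, with NumPy's LAPACK-backed eig standing in for MATLAB's) computes ‖MA − AE‖_F for both routines on a matrix of the form (4.1).

```python
import numpy as np

N = 1024
rng = np.random.default_rng(0)
M = np.diag(np.arange(1.0, N + 1)) + 1e-2 * rng.standard_normal((N, N))

def residual(M, w, A):
    return np.linalg.norm(M @ A - A * w[None, :])  # ||MA - AE||_F with E = diag(w)

w_ipt, A_ipt = ipt_full(M)           # IPT, as sketched in section 2
w_lap, A_lap = np.linalg.eig(M)      # LAPACK GEEV via NumPy
print(residual(M, w_ipt, A_ipt), residual(M, w_lap, A_lap))
```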
5. Applications.
In this final section we consider two possible applications of IPT: as a mixed-precision eigenvalue algorithm (full-spectrum problem), and as a potential competitor to Davidson in quantum chemistry (lowest-lying eigenvector).
The condition that a matrix be near-diagonal of course severely restricts the applicability of IPT, although relevant examples exist in quantum chemistry, see below. However, IPT can be used more generally as a refinement method for well-conditioned eigenvalue problems. Consider a generic matrix M with well-separated eigenvalues, and assume given an approximation of its eigenmatrix A_0. For instance, we could compute A_0 using standard drivers in single precision, resulting in a single-precision eigenmatrix A_{0,s}. Next, we convert this matrix to double precision, denoted A_{0,d}, and apply IPT to the near-diagonal matrix M′ = A_{0,d}^{−1} M A_{0,d}. Once iterations have converged to a new eigenmatrix A′, we can rotate back and obtain accurate eigenvectors for M as A = A_{0,d} A′. In practice the linear-solve and matrix-multiplication steps are fast, and so computing A takes about the same time as the low-precision diagonalization A_{0,s}.

Obviously, this method cannot be used for ill-conditioned eigenvalue problems, because then (i) the inverse-gaps matrix G becomes large and IPT ceases to converge, and (ii) the condition number of A_{0,d} becomes large and accuracy is lost in the linear-solve step. To measure the robustness of the mixed-precision algorithm to eigenvalue clustering, we considered
Algorithm 5.1
Iterative refinement of eigenvectors of a matrix M with well-separated eigenvalues
Compute eigenvectors A_{0,s} in single precision, e.g. using SGEEV or SSYEV
Set A_{0,d} as A_{0,s} in double precision
Compute M′ ← A_{0,d}^{−1} M A_{0,d} using a linear solver in double precision
Compute eigenvectors A′ and E using IPT(M′)
Return eigenvectors A = A_{0,d} A′
Return eigenvalues E

matrices of the form

(5.1) J_α = Q^T D_α Q with D_α = diag(10^{−αn/N})_{1≤n≤N},

where α is a parameter controlling the spacing between eigenvalues and Q a random orthogonal matrix. For matrices of size N = 1024 we found that IPT remained applicable up to α ≃ 4, corresponding to a minimal spectral gap of order 10^{−6}. (For this matrix eigs failed to converge with the default parameters.) For all values of α below this threshold, we obtain acceptable residual errors (∼5x larger than DGEEV) and significant speed-ups over DGEEV (2-3x faster), see Figure 8.

Fig. 8: IPT as a refinement algorithm for ill-conditioned problems (5.1) with increasingly small eigengap. Combining single-precision GEEV with IPT gives eigenvectors with acceptable accuracy at a fraction of the computational cost of double-precision GEEV.
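A compact NumPy rendering of Algorithm 5.1 might look as follows (our naming; np.linalg.eig in single precision stands in for SGEEV, and ipt_full is the sketch from section 2).

```python
import numpy as np

def refine_eigenpairs(M):
    """Mixed-precision refinement: single-precision eig, then IPT polishing."""
    _, A_s = np.linalg.eig(M.astype(np.float32))  # low-precision first pass
    A_d = A_s.astype(np.complex128)               # promote to double precision
    Mp = np.linalg.solve(A_d, M @ A_d)            # M' = A_d^{-1} M A_d, near-diagonal
    w, Ap = ipt_full(Mp)                          # IPT on the rotated problem
    return w, A_d @ Ap                            # rotate back: A = A_d A'
```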
Finally, we tested IPT on a real-world problem from quantum chemistry, the full configuration interaction (FCI) approach to ab initio electronic structure calculation [1]. Given a molecule, its geometry (the positions of the nuclei) and a basis set, FCI provides the most accurate available estimate of the energy and wavefunction of the electrons. Formally, an approximation of the eigenvectors of the Hamiltonian matrix is first computed with self-consistent field (Hartree-Fock) methods; in that basis the Hamiltonian matrix H is near-diagonal (as well as symmetric), and FCI proceeds by computing iteratively the eigenvector z with smallest eigenvalue ε, usually using the Davidson algorithm already mentioned above. Because the dimension N of the Hamiltonian matrix H for n_e electrons distributed over n_b basis functions is the binomial coefficient (n_b choose n_e) (hence grows factorially with system size), the bottleneck in iterative FCI computations is the matrix-vector product Hz. (Fig. 7 shows the sparsity pattern of a small FCI matrix.)

In order to reduce the number of iterations as much as possible, we applied Anderson acceleration to the fixed-point iteration z ← f(z) in Algorithm 2.1. We recall that Anderson acceleration with memory m replaces the current iterate z^{(k)} with a convex combination of m_k = min{m, k} previous iterates, with weights α^{(k−j)} (0 ≤ j ≤ m_k) defined by a least-squares minimization problem (see [19] and references therein). We compared IPT with Anderson acceleration (memory m = 5) against the implementation of the Davidson algorithm provided in the open-source quantum chemistry suite PySCF [20] (using the function pyscf.lib.davidson); for a python implementation of Anderson acceleration we used the code from Ref. [21]. The convergence criterion was based on the residual norm ‖Hz − εz‖ dropping below a fixed tolerance.

Fig. 7: Sparsity pattern of a small FCI matrix (nz = 6197), here for H₂O in the minimal basis set sto-3g.

Molecule   Basis set   N         (K, T) Davidson   (K, T) IPT-AA
He         cc-pvqz     900       (9, 0.098)        (8, 0.055)
Be         cc-pvtz     189225    (13, 1.82)        (13, 1.82)
H₂O        sto-6g      441       (9, 0.003)        (8, 0.002)
H₂O        6-31G       1656369   (18, 5.36)        (20, 4.75)
LiH        6-31g       3025      (12, 0.028)       (12, 0.027)
LiH        cc-pvtz     894916    (15, 39.95)       (15, 41.86)

Table 1: Full configuration interaction calculations on small molecules, using different geometries and basis sets. With Anderson acceleration (memory m = 5), IPT is as efficient as the Davidson algorithm which is standard in the field. Here K and T denote the number of matrix-vector evaluations and the total CPU time (in s), respectively.

As shown in Table 1, we find that IPT-AA converges for realistic FCI problems with a number of iterations and timing comparable to Davidson. One conceptual difference between the two methods, however, is that IPT is self-contained, whereas Davidson relies on other eigenvalue algorithms to obtain Ritz values.
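For concreteness, here is a bare-bones Anderson-accelerated version of the single-eigenvector iteration (our code: a textbook least-squares variant of the acceleration, not the exact implementation of Refs. [19, 21]).

```python
import numpy as np

def ipt_anderson(M, i=0, m=5, tol=1e-6, maxiter=200):
    """Single-eigenpair IPT (Algorithm 2.1) with Anderson acceleration."""
    n = M.shape[0]
    d = np.diag(M).astype(complex)
    Delta = M - np.diag(d)
    mask = np.arange(n) != i
    g = np.zeros(n, dtype=complex)
    g[mask] = 1.0 / (d[mask] - d[i])             # (g_i)_j = 1/(eps_j - eps_i)
    e_i = np.zeros(n, dtype=complex); e_i[i] = 1.0

    def f(z):                                     # the map (2.1)
        Dz = Delta @ z
        return e_i + g * (z * Dz[i] - Dz)

    z, Zs, Fs = e_i.copy(), [], []
    for _ in range(maxiter):
        fz = f(z)
        if np.linalg.norm(fz - z) < tol:
            break
        Zs.append(z); Fs.append(fz)
        Zs, Fs = Zs[-(m + 1):], Fs[-(m + 1):]     # keep at most m+1 past pairs
        R = np.stack([fk - zk for zk, fk in zip(Zs, Fs)], axis=1)  # residuals
        RtR = R.conj().T @ R + 1e-12 * np.eye(R.shape[1])
        alpha = np.linalg.solve(RtR, np.ones(R.shape[1]))
        alpha = alpha / alpha.sum()               # weights summing to one
        z = np.stack(Fs, axis=1) @ alpha          # mix of past f-values
    return d[i] + (Delta @ z)[i], z               # eigenvalue, eigenvector
```

With m = 0 this reduces to plain fixed-point iteration; the memory term is what brings the iteration count down to Davidson-like levels in Table 1.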
6. Discussion.
We have presented a new eigenvalue algorithm for near-diagonal matrices, be they symmetric or non-symmetric, dense or sparse. Its theoretical complexity is lower than that of standard methods (being equal to that of matrix-matrix multiplication for the full-spectrum problem), as is its runtime for matrices of the form diag(n)_{1≤n≤N} plus a perturbation. Its structure is elementary: relying on machine-optimized BLAS routines for linear-algebraic operations, our program consists of less than ten lines of python or MATLAB code. The largest speed-ups are obtained for non-symmetric problems.

We emphasize that the near-diagonality condition is quite restrictive: off-diagonal matrix elements must be small compared to diagonal spacings. While there are important applications with this property (e.g. full configuration interaction), it is fair to say that IPT is more akin to a refinement procedure than a full-fledged eigenvalue algorithm. Even so IPT can be useful: using a mixed-precision approach we computed the eigenvectors of a general dense matrix with random entries to double precision in a fraction of the time required by DGEEV.

Future work should focus on stabilizing our procedure for larger perturbations. For instance, when the perturbation is too large and IPT blows up, we may shrink λ to λ/Q for some integer Q, diagonalize the matrix with this smaller perturbation, restart the algorithm using the diagonal matrix thus obtained as initial condition, and repeat the operation Q times (a naive version is sketched at the end of this section). This approach is similar to the homotopy continuation method for finding polynomial roots [22] and can effectively extend the domain of convergence of the present iterative scheme. Another idea is to leverage the projective-geometric structure outlined above, for instance by using charts on the complex projective space other than z_i = 1, which would lead to different maps with different convergence properties. A third possibility is to use the freedom in choosing the diagonal matrix D to construct maps with larger convergence domains, a trick which is known to sometimes improve the convergence of the RS series [11]. Finally, we saw that the convergence of IPT can be improved using Anderson acceleration, which can be viewed as an ad hoc non-linear Krylov subspace method. It would be interesting to see whether there exists an optimal combination of previous iterates to find the fixed points of the IPT map.
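To make the first suggestion concrete, a naive restart scheme could look like the sketch below (entirely our construction, not an implementation provided with the paper): the perturbation is switched on in Q increments, and the eigenbasis found at each step makes the next step near-diagonal again. It reuses the ipt_full sketch from section 2.

```python
import numpy as np

def ipt_homotopy(M, Q=4):
    """Diagonalize M = D + Delta by switching Delta on in Q steps."""
    n = M.shape[0]
    D = np.diag(np.diag(M))
    Delta = M - D
    B = np.eye(n, dtype=complex)                 # accumulated eigenbasis
    for q in range(1, Q + 1):
        Mq = D + (q / Q) * Delta                 # partially switched-on perturbation
        Mq = np.linalg.solve(B, Mq @ B)          # rotate into the current basis
        w, A = ipt_full(Mq)                      # near-diagonal by construction
        B = B @ A
    return w, B                                  # eigenvalues and eigenvectors of M
```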
Acknowledgments. We thank Rostislav Matveev and Alexander Heaton for useful discussions.
REFERENCES

[1] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory. Dover Publications, 1996.
[2] D. Pozar, Microwave Engineering (2nd edition). Wiley, New York, 1998.
[3] J. W. S. Rayleigh, Theory of Sound. I (2nd ed.). London: McMillan, 1894.
[4] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory. Academic Press, New York, 1990.
[5] E. R. Davidson, “The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices,” Journal of Computational Physics, vol. 17, pp. 87–94, Jan. 1975.
[6] C. Lanczos, “An iteration method for the solution of the eigenvalue problem of linear differential and integral operators,” Journal of Research of the National Bureau of Standards, vol. 45, p. 255, Oct. 1950.
[7] J. W. Demmel, Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997.
[8] E. M. Lifshitz and L. D. Landau, Quantum Mechanics: Non-relativistic Theory. Pergamon Press, 1965.
[9] M. Kenmoe, M. Smerlak, and A. Zadorin, “Iterative perturbation theory: physical and chemical applications,” in preparation, 2021.
[10] T. Kato, Perturbation Theory for Linear Operators. Springer, 1995.
[11] P. R. Surján and Á. Szabados, “Appendix to “Studies in Perturbation Theory”: The Problem of Partitioning,” in Fundamental World of Quantum Chemistry, pp. 129–185, Springer Netherlands, 2004.
[12] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 2012.
[13] R. M. May, “Simple mathematical models with very complicated dynamics,” Nature, vol. 261, pp. 459–467, Jun. 1976.
[14] V. Dolotin and A. Morozov, “On the shapes of elementary domains, or why Mandelbrot set is made from almost ideal circles?,” International Journal of Modern Physics A, vol. 23, pp. 3613–3684, Sep. 2008.
[15] Y. A. Kuznetsov, Elements of Applied Bifurcation Theory. Springer, 2004.
[16] S. Tomov, J. Dongarra, and M. Baboulin, “Towards dense linear algebra for hybrid GPU accelerated manycore systems,” Parallel Computing, vol. 36, pp. 232–240, Jun. 2010.
[17] R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM, 1998.
[19] H. F. Walker and P. Ni, “Anderson acceleration for fixed-point iterations,” SIAM Journal on Numerical Analysis, vol. 49, pp. 1715–1735, Jan. 2011.
[20] Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu, J. D. McClain, E. R. Sayfutyarova, S. Sharma, S. Wouters, and G. K. Chan, “PySCF: the Python-based simulations of chemistry framework,” 2017.
[21] J. Zhang, B. O’Donoghue, and S. Boyd, “Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations,” SIAM Journal on Optimization, vol. 30, no. 4, pp. 3170–3197, 2020.
[22] A. Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems. Society for Industrial and Applied Mathematics, Jan. 2009.
[23] P. Lax, Linear Algebra and Its Applications. John Wiley & Sons, Hoboken, New Jersey, 2007.
[24] R. Roth and J. Langhammer, “Padé-resummed high-order perturbation theory for nuclear structure calculations,” Physics Letters B, vol. 683, pp. 272–277, Jan. 2010.
Appendix A. Additional lemmas for Section 3.
Here, as in the main text, we consider an n × n matrix M with complex entries along with its partition M = D + ∆, where D is diagonal with pairwise distinct diagonal elements. The partition defines a mapping F : C^{n×n} → C^{n×n} as in the main text.

Lemma A.1. Let M = D + ∆ be a matrix with a partition that obeys (3.1). Then no eigenvector of M from an eigenspace of dimension higher than one appears as a column of the corresponding unique fixed point A* in B_{√2}(I).

Proof. If A* contains an eigenvector from an eigenspace of M of dimension higher than one, then any matrix A obtained by substituting in A* that eigenvector with another vector from this eigenspace is a fixed point of F, too. This means that there is a whole subspace of fixed points of F of dimension higher than zero which includes A*. But this contradicts the uniqueness of A* in B_{√2}(I).
Lemma A.2. If M and its partition obey (3.1) and M is defective, then A* does not contain any eigenvector causing the defect.

Proof. By Lemma A.1, it is enough to assume that M does not have geometrically multiple eigenvalues and all its eigenvectors are isolated projective points, so all fixed points of F are isolated matrices. Let such an M be defective. Let J be a Jordan normal form of M and S be the transition matrix that turns M into J. Consider a deformation J_d of J given by J_d = J + diag(d), where d ∈ C^n. This induces a deformation of M given by M_d = M + S diag(d) S^{−1}. It is clear that one can find d arbitrarily close to 0 (in the standard topology of C^n) such that all diagonal entries of J_d are different. As J_d is upper triangular, its eigenvalues (and thus those of M_d) are equal to its diagonal entries. But then M_d has a full eigenbasis of isolated eigenvectors. The formerly merged eigenvectors split into distinct ones. Furthermore, the new eigenvectors that split from an old defective eigenvector can be made arbitrarily close to it (as projective points, in the standard topology of CP^{n−1}) by choosing d close enough to 0. This is obvious from considering the eigenvectors spawned from a single block of J_d and their behavior in the limit d → 0. Thus we obtain a deformation M_d of M such that in its partition M_d = D + ∆_d, ∆_d = ∆ + S diag(d) S^{−1}, the norm ‖∆_d − ∆‖ is arbitrarily small, and thus ‖G‖‖∆_d‖ < 3 − 2√2 still holds. The fixed point A* then splits into several fixed points of the mapping F_d defined by this partition (arbitrarily close to A*, and thus still contained in B_{√2}(I) for small enough d), with two or more of them being full-rank, plus multiple associated lower-rank fixed points. But this contradicts the uniqueness of the fixed point of F_d computed for the deformed matrix M_d.
Theorem A.3 ([23], Theorem 8, page 130). Let M_λ be a differentiable matrix-valued function of real λ, and a_λ an eigenvalue of M_λ of multiplicity one. Then we can choose an eigenvector h_λ of M_λ pertaining to the eigenvalue a_λ to depend differentiably on λ.

The theorem guarantees that the eigenvector corresponding to an eigenvalue with multiplicity one is a differentiable function of the parameter, as a projective-space-valued function.
Appendix B. Rayleigh-Schrödinger perturbation theory.
Here we recall the derivation of the Rayleigh-Schrödinger recursion, given e.g. in [24]. The idea behind the original perturbation theory is the following. Let a parametric family of matrices M_λ = D + λ∆ be given, where λ ∈ C is the parameter, M_λ ∈ C^{n×n}, M_0 = D is diagonal and is called the unperturbed matrix, and ∆ is the perturbation matrix. The problem is to find the eigenvectors of M_λ in the form of an asymptotic series in powers of the parameter λ. We specifically assume the generic case of the unperturbed matrix D having distinct diagonal entries. The perturbation matrix ∆ can be arbitrary.

Let ε_i ∈ C and z_i ∈ C^n respectively be the i-th eigenvalue and the i-th eigenvector of M_λ, ordered such that

(B.1) lim_{λ→0} ε_i = ϵ_i, lim_{λ→0} z_i = e_i,

where ϵ_i = D_ii and the e_i are the standard basis vectors of C^n, and normalized according to ⟨e_i, z_i⟩ = 1 for all λ (a choice known in physics as “intermediate normalization”), where by ⟨·,·⟩ we understand the standard scalar product in C^n. Such a normalization corresponds to singling out eigenvectors from the affine charts U_i seen as affine subspaces of C^n, as defined in Section 2 of the main text. Therefore, the matrix composed of the z_i in the chosen order is the matrix A from the main text. The possibility to order the eigenvectors in such a way that (B.1) holds follows from the well known fact that, given the condition on D and the chosen normalization, ε_i and z_i are holomorphic in λ in a certain disk around 0, the z_i are distinct, and these functions are given by series convergent in that disk [10]:

(B.2) ε_i = Σ_{ℓ≥0} λ^ℓ ε_i^{[ℓ]} and z_i = Σ_{ℓ≥0} λ^ℓ z_i^{[ℓ]}.

As a consequence, the matrix A itself is holomorphic in the same disk and can be represented by a power series in λ:

(B.3) A = Σ_{ℓ≥0} λ^ℓ A^{[ℓ]}.

Substituting the expressions (B.2) into the eigenvalue equation Mz = εz and making use of the Cauchy product formula yields D z_i^{[0]} = ε_i^{[0]} z_i^{[0]} at zeroth order (hence ε_i^{[0]} = ϵ_i) and, for ℓ ≥ 1,

(D − ϵ_i I) z_i^{[ℓ]} = Σ_{s=1}^{ℓ} ε_i^{[s]} z_i^{[ℓ−s]} − ∆ z_i^{[ℓ−1]},

where I is the unit n × n matrix.

It is convenient to expand z_i^{[ℓ]} in the basis of the eigenvectors of D as z_i^{[ℓ]} = Σ_{j=1}^{n} A_{ji}^{[ℓ]} e_j, with A_{ij}^{[0]} = δ_{ij} and A_{ii}^{[ℓ]} = 0 for ℓ ≥ 1. By construction, the A_{ij}^{[ℓ]} are the elements of the matrices A^{[ℓ]} from (B.3). This gives for each ℓ ≥ 1 and 1 ≤ j ≤ n

(ϵ_i − ϵ_j) A_{ij}^{[ℓ]} = Σ_{s=1}^{ℓ} ε_j^{[s]} A_{ij}^{[ℓ−s]} − (∆A^{[ℓ−1]})_{ij}.

The equation for the eigenvalue corrections is extracted by setting i = j in this equation and using A_{ii}^{[ℓ]} = δ_{ℓ,0}. This leads to ε_i^{[ℓ]} = (∆A^{[ℓ−1]})_{ii}. Injecting this back into the equation above, we arrive at

(B.4) A_{ij}^{[ℓ]} = G_{ij} ( Σ_{s=1}^{ℓ} (∆A^{[s−1]})_{jj} A_{ij}^{[ℓ−s]} − (∆A^{[ℓ−1]})_{ij} ).
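A direct transcription of the recursion (B.4) might read as follows (our code; summing the returned corrections against powers of λ reproduces the truncated RS series A_RS^{(k)} of section 3).

```python
import numpy as np

def rs_corrections(d, Delta, kmax):
    """Matrix corrections A^[0..kmax] of the RS series, via recursion (B.4)."""
    n = len(d)
    G = 1.0 / (d[:, None] - d[None, :] + np.eye(n))  # G_ij = 1/(eps_i - eps_j)
    np.fill_diagonal(G, 0.0)
    A = [np.eye(n, dtype=complex)]
    for ell in range(1, kmax + 1):
        S = sum(A[ell - s] * np.diag(Delta @ A[s - 1])[None, :]  # (X D(Y))_ij
                for s in range(1, ell + 1))
        A.append(G * (S - Delta @ A[ell - 1]))
    return A

# order-k RS series: sum(lam**l * Al for l, Al in enumerate(rs_corrections(d, Delta, k)))
```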
Appendix C. Iterative perturbation theory contains the RS series.
We prove A^{(k)} = A^{(k)}_RS + O(λ^{k+1}) by induction. Obviously A^{(0)} = A^{(0)}_RS = I. Suppose that A^{(k−1)} = A^{(k−1)}_RS + O(λ^k) or, more specifically,

A^{(k−1)} = Σ_{ℓ=0}^{k−1} λ^ℓ A^{[ℓ]} + O(λ^k),

where the matrices A^{[ℓ]} satisfy the recursion (B.4). Then from A^{(k)} = F(A^{(k−1)}) we have

A^{(k)} = I + λ G ∘ ( Σ_{m=0}^{k−1} λ^m A^{[m]} D( Σ_{ℓ=0}^{k−1} λ^ℓ ∆A^{[ℓ]} ) − Σ_{ℓ=0}^{k−1} λ^ℓ ∆A^{[ℓ]} ) + O(λ^{k+1}).

From this expression it is easy to see that the term of s-th order in λ in A^{(k)} is given by the terms with ℓ + m = s − 1, viz.

λ^s G ∘ ( Σ_{ℓ=0}^{s−1} A^{[s−1−ℓ]} D(∆A^{[ℓ]}) − ∆A^{[s−1]} ).

This term matches exactly the RS correction term λ^s A^{[s]}. This concludes the proof.
Appendix D. Further explicit examples.

The 2 × 2 iteration scheme for M = D + λ∆, for any D and ∆, takes the form

A^{(k)} = [[1, f_2^k(0)], [f_1^k(0), 1]],

where f_1 and f_2 are univariate quadratic polynomial functions related by g_2(x) = x² g_1(1/x), with g_j(x) ≡ x − f_j(x). The first special feature of this recursion scheme is that it is equivalent to a 1-dimensional quadratic discrete-time dynamical system for each column of A. This implies that the only critical point of either f_j (which is 0 when the diagonal elements of ∆ are equal) is necessarily attracted by the at most unique stable fixed point. The second special feature is the fact that both columns have exactly the same convergence domains in the λ-plane. To see this, suppose that x_1 and x_2 are the roots of g_1. Then it follows that 1/x_1 and 1/x_2 are the roots of g_2. As g_2′(1/x) = 2g_1(x)/x − g_1′(x), and (as for any quadratic polynomial) g_1′(x_1) = −g_1′(x_2), we also see that g_1′(x_{1,2}) = g_2′(1/x_{2,1}), and thus f_1′(x_{1,2}) = f_2′(1/x_{2,1}). Therefore, the fixed point of the dynamical system defined by x^{(k+1)} = f_1(x^{(k)}) corresponding to one eigenvector and the fixed point of x^{(k+1)} = f_2(x^{(k)}) corresponding to the other eigenvector are stable or unstable simultaneously.

These two properties are not generic when n >
2. Therefore, we provide another explicit example, of a 3 × 3 matrix of the form M = D + λ∆. The polynomial P that defines the fixed-point degeneration curve C here takes the form, for z_1,

(D.1) P(λ, µ) = 63792 λ − λ µ − λ − λ µ + 89352 λ µ − λ + 960 λ µ + 14516 λ µ − λ µ + 12116 λ + 5616 λ µ − λ µ + 29988 λ µ − λ µ − λ + 468 λ µ − λ µ + 12820 λ µ − λ µ + 7584 λ µ − λ − λµ + 1404 λµ − λµ + 1350 λµ + 432 λµ + 108 µ − µ + 1980 µ − µ + 432 µ,

for z_2,

(D.2) P(λ, µ) = 113408 λ − λ µ − λ + 7416 λ µ + 53208 λ µ + 36424 λ + 6525 λ µ − λ µ − λ µ − λ − λ µ − λ µ + 10332 λ µ + 3696 λ µ + 3088 λ − λ µ + 1800 λ µ − λ µ − λ µ − λ µ − λ + 128 λµ − λµ − λµ − λµ + 1328 λµ − µ + 108 µ + 360 µ − µ − µ,

and for z_3,

(D.3) P(λ, µ) = 35440 λ − λ µ − λ − λ µ + 110080 λ µ − λ + 29112 λ µ − λ µ − λ µ + 51024 λ + 14640 λ µ − λ µ + 116760 λ µ − λ µ + 5400 λ − λ µ − λ µ + 70296 λ µ − λ µ + 41904 λ µ − λ − λµ + 7920 λµ − λµ + 19440 λµ − λµ + 1296 µ − µ + 17712 µ − µ + 5184 µ.

As we can see, the dynamical systems for all three columns of A (z_1, z_2, and z_3) have different domains of convergence in the λ-plane. The corresponding curves are depicted in Figure 9, Figure 10, and Figure 11, respectively.

There are differences also between the curves for individual columns and those of the 2 × 2 case. In particular, C alone does not contain enough information to find the convergence domain itself. The domains in Figure 9–Figure 11 were found empirically. Of course, they are bounded by some parts of C and include the point λ = 0. The reason for some parts of C not forming the boundary of the domain is that its different parts correspond to different eigenvectors. In other words, they belong to different branches of a multivalued eigenvector function of λ, the cusps being the branching points.

Consider as a particular example the case of z_2. The curve C intersects itself at λ ≈ −0.
Consider as a particular example the case of $z_1$. The curve $C$ intersects itself at $\lambda \approx -0.49$. Above the real axis near this point, one of the two intersecting branches of the curve forms the convergence boundary; below the real axis, the other one takes its place. This indicates that the two branches correspond to different multipliers of the same fixed point. When $\Im\lambda > 0$, one of them crosses the unit circle at the boundary of the convergence domain; when $\Im\lambda < 0$, the other one does. At $\lambda \approx -0.49$, both of them cross the unit circle simultaneously. This situation corresponds, thus, to a Neimark-Sacker bifurcation (the discrete-time analog of the Andronov-Hopf bifurcation).
Both branches are in fact parts of the same continuous curve, which also passes through the point $\lambda \approx 0.56$. Around this point, the curve poses no problem to the convergence of the dynamical system. The reason for this is that an excursion around cusps (branching points of eigenvectors) permutes some eigenvectors. As a consequence, the curve at $\lambda \approx -0.49$ corresponds to the loss of stability of the eigenvector that is a continuation of the unique stable eigenvector at $\lambda = 0$ along the path $[0, -0.49]$, while the same curve at $\lambda \approx 0.56$ indicates a multiplier of unit absolute value for an eigenvector that is not a continuation of the initial one along the path $[0, 0.56]$.

The symmetry of $C$ under complex conjugation of $\lambda$ is not specific to our particular $3 \times 3$ $D$ and $\Delta$, but is a generic feature of matrices with real components. In this particular case, due to this symmetry, the only possible bifurcations for real values of $\lambda$ are
the flip bifurcation (a multiplier equal to $-1$, typically followed by the period-doubling cascade), the Neimark-Sacker bifurcation (two multipliers assume complex conjugate values $e^{\pm it}$ for some $t$), and, if the matrices are not symmetric, the fold bifurcation (a multiplier equal to $1$). With symmetric real matrices, the fold bifurcation is not encountered because the cusps cannot be real but instead form complex conjugate pairs. These features are consequences of the behavior of $\det(D + \lambda\Delta - xI)$ under complex conjugation and of the fact that symmetric real matrices cannot have nontrivial Jordan forms. Likewise, Hermitian matrices produce complex conjugate nonreal cusp pairs, but the curve itself is not necessarily symmetric. As a result, there are many more ways for a steady state to lose its stability, from which the fold bifurcation is, however, excluded: generic bifurcations at $\lambda \in \mathbb{R}$ here consist in a multiplier attaining a value $e^{it}$ for some $t \neq 0$. The situation for general complex matrices lacks any symmetry at all. Here steady states lose their stability by a multiplier crossing the unit circle at any value of $t$, and thus the fold bifurcation, although possible, is not generic; it is generic for one-parameter (in addition to $\lambda$) families of matrices.
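In practice, which of these bifurcations occurs can be read off the multipliers of the fixed point, i.e. the eigenvalues of the Jacobian of the column map at the converged eigenvector. The helper below is a sketch under the same stand-in conventions as the previous snippets; the analytic derivative is our own differentiation of the fixed-point map.

```python
import numpy as np

def multipliers(z, lam, d, Delta, i):
    """Eigenvalues of the Jacobian of the column-i map at z.
    All |mu| < 1: stable fixed point; a real multiplier at -1 signals a
    flip, a conjugate pair at e^{+-it} a Neimark-Sacker, one at +1 a fold."""
    n = len(d)
    w = Delta @ z
    J = np.zeros((n, n), dtype=complex)
    for j in range(n):
        if j == i:
            continue
        J[j, :] = lam * (Delta[j, :] - z[j] * Delta[i, :]) / (d[i] - d[j])
        J[j, j] -= lam * w[i] / (d[i] - d[j])
    J = np.delete(np.delete(J, i, axis=0), i, axis=1)  # affine chart U_i
    return np.linalg.eigvals(J)
```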
As already noted, for the holomorphic dynamics of any $2 \times 2$ problem, the iteration of $F$ starting from the identity matrix converges to the needed solution provided that $\lambda$ is in the convergence domain. This is not true anymore for $n > 2$, starting already from the fact that there are no discrete critical points in larger dimensions. The problem of finding a good initial condition becomes non-trivial. As can be seen in Figure 10, for the particular $3 \times 3$ example the naive iteration with $A^{(0)} = I$ does not converge for the second column ($z_2$) to the existing attracting fixed point of the dynamical system near some boundaries of its convergence domain. Our current understanding of this phenomenon is that the boundary of the attraction basin (in the $z$-space) crosses the initial point; this boundary is generally fractal. Perhaps this explains the eroded appearance of the empirical convergence domain of the autonomous iteration.

To somewhat mitigate this complication, we applied a nonautonomous iteration scheme of the form (omitting details) $z^{(k+1)} = f(z^{(k)}, \lambda(1 - \alpha^{k}))$ with $z^{(0)} = (0, 0)^{T}$, where $\alpha$ is some positive number, $\alpha < 1$, so that $\lim_{k \to \infty} \lambda(1 - \alpha^{k}) = \lambda$, and where we explicitly indicated the dependence of $f(z, \lambda)$ on $\lambda$. The idea of this ad hoc approach is the continuation of the steady state in the extended $(z, \lambda)$-phase space from values of $\lambda$ that put $z^{(0)}$ inside the convergence domain of that steady state.
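A minimal version of this annealed scheme, assuming the same column map as in the previous sketches; the parameter `alpha` and the stopping rule are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def column_map(z, lam, d, Delta, i):
    # one IPT step for column i at perturbation strength lam
    den = d[i] - d
    den[i] = 1.0
    w = Delta @ z
    znew = lam * (w - z * w[i]) / den
    znew[i] = 1.0
    return znew

def annealed_ipt(lam, d, Delta, i=0, alpha=0.9, kmax=5000, tol=1e-12):
    """Nonautonomous iteration z^(k+1) = f(z^(k), lam*(1 - alpha**k)),
    continuing the steady state from small effective lam up to lam."""
    z = np.zeros(len(d), dtype=complex)
    z[i] = 1.0                       # (0, 0)^T in the affine chart
    for k in range(kmax):
        znew = column_map(z, lam * (1.0 - alpha**k), d, Delta, i)
        if alpha**k < tol and np.max(np.abs(znew - z)) < tol:
            return znew
        z = znew
    return z
```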
Doing so, we were able to recover convergence in almost all of the empirical domain, with the exception of isolated values of $\lambda$ near its boundary where the scheme still fails. For $z_1$ this situation takes place at a positive real value of $\lambda$, for $z_2$ at $\lambda \approx 0.56$, and for $z_3$ at $\lambda \approx 0.2$. All three points are on the real axis, as is expected from the symmetry considerations above. There is no cusp at these points and no fold bifurcation (no merging of eigenvectors), as it should be for symmetric real matrices. Instead, another multiplier of the same fixed point goes to infinity at the same value of $\lambda$ (the point becomes super-repelling). As a result, the theorem on reduction to the center manifold is not applicable.

Fig. 9: The convergence domain on the $\lambda$-plane for the first column of $A$ (the first eigenvector $z_1$) for the $3 \times 3$ example. Various shades of grey show the values of $\lambda$ that lead to divergence to infinity (the darker the slower the divergence). In red are the values of $\lambda$ where the matrix is non-diagonalizable.
Fig. 10: Same as Figure 9 for the second eigenvector $z_2$ (the second column of $A$). Small components correspond to stability of various periodic orbits. Various shades of grey show the values of $\lambda$ that lead to divergence to infinity (the darker the slower the divergence).

Fig. 11: Same as Figure 9 for the third eigenvector $z_3$ (the third column of $A$).

Appendix E. Exceptional points and cusps of $C$.

In a large part of this section we drop the previously adopted notation conventions for vectors. We use instead the more appropriate notation style from differential geometry.

We will prove that if $M$ is defective at some value of $\lambda$, then $d\lambda/dt = 0$ for the curve $C$ at that point (as above, $\mu$ is a multiplier of the dynamical perturbation theory and $C$ is defined by $\mu = e^{it}$ and is locally considered as a curve $t \mapsto \lambda(t)$).

Fix $i$ and consider the dynamical system for the $i$-th column only. Let $x = (x_1, \ldots, x_{n-1})$ be a tuple of affine coordinates in the affine chart $U_i$. Note that the coordinates are rearranged in comparison to the $z_j$: $x_1 = z_1, \ldots, x_{i-1} = z_{i-1}, x_i = z_{i+1}, \ldots, x_{n-1} = z_n$. We consider the representation of the corresponding dynamical system in $U_i$ too. Let it be given by the tuple of functions $h = (h_1, \ldots, h_{n-1})$ that correspond to the functions in $f_i$ with $(f_i)_i$ omitted ($(f_i)_i$ is trivially $1$ in $U_i$); the same rearrangement is implied for the $h_j$ as for the $x_j$. Thus, the dynamical system is defined by $x^{(k+1)} = h(x^{(k)})$ and the stationary states are defined by the system of equations $x = h(x)$.

Consider $\mathbb{C}^{n+1}$ with coordinates $(x_1, \ldots, x_{n-1}, \lambda, \nu)$, where $\nu \equiv \mu - 1$,
as a complex analytic manifold with these coordinates as global holomorphic coordinates on it. Let us define the polynomial functions $F_j : \mathbb{C}^{n+1} \to \mathbb{C}$, $(x, \lambda, \nu) \mapsto h_j(x, \lambda) - x_j$, and $F = (F_1, \ldots, F_{n-1})$. Let us denote by $J \equiv \partial F/\partial x \equiv \partial(F_1, \ldots, F_{n-1})/\partial(x_1, \ldots, x_{n-1})$ the Jacobian matrix of $F$ with respect to the variables $x_j$. Let $I$ be the identity $(n-1) \times (n-1)$ matrix.
Consider the complex 1-dimensional variety $\mathcal{C} \subset \mathbb{C}^{n+1}$ defined by the polynomial system
$$\begin{cases} F = 0, \\ \det(J - \nu I) = 0. \end{cases}$$
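For small $n$ this system can be handled symbolically: eliminating $x$ (e.g. with a lexicographic Gröbner basis) yields the defining polynomial of the projection to the $(\lambda, \nu)$-plane, the same kind of computation that produces (D.1)–(D.3). The SymPy sketch below uses a stand-in $3 \times 3$ example and our own chart conventions; the basis computation is not guaranteed to finish quickly for every input.

```python
import sympy as sp

# Eliminate x from {F = 0, det(J - nu*I) = 0} for a stand-in 3x3 example.
# Note det(dh/dx - mu*I) = det(J - nu*I) with nu = mu - 1, since J = dh/dx - I.
x1, x2, lam, nu = sp.symbols('x1 x2 lam nu')
d = [0, 1, 2]
Dl = sp.Matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
z = sp.Matrix([1, x1, x2])                      # chart U_1: z_1 = 1
w = Dl * z
h = [lam * (w[j] - z[j] * w[0]) / (d[0] - d[j]) for j in (1, 2)]
F = sp.Matrix([h[0] - x1, h[1] - x2])
J = F.jacobian([x1, x2])
G = (J - nu * sp.eye(2)).det()
# lexicographic order eliminates x1, x2 first
gb = sp.groebner([sp.numer(sp.together(F[0])),
                  sp.numer(sp.together(F[1])),
                  sp.numer(sp.together(G))],
                 x1, x2, lam, nu, order='lex')
P = [g for g in gb.exprs if not g.has(x1) and not g.has(x2)]
print(P)   # polynomial(s) in (lam, nu); substitute nu = mu - 1 to compare with C
```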
The curve $C$ is the projection to the $\lambda$-plane of the real 1-dimensional variety $\tilde{\mathcal{C}} = \mathcal{C} \cap \{|\nu + 1| = 1\}$ in $\mathbb{C}^{n+1}$ considered as $\mathbb{R}^{2(n+1)}$.

First, note that if $M$ is defective at some value of $\lambda$, then the system of equations $F = 0$ (with this value of $\lambda$ fixed and considered for the unknowns $x \in \mathbb{C}^{n-1}$) has a root $x$ of multiplicity greater than $1$. This means that the hypersurfaces $\{F_j = 0\}$ are not in general position at the intersection that corresponds to this root, which implies $\det \partial F/\partial x = 0$. Therefore, the point $p \in \mathbb{C}^{n+1}$ with the same $x$ and $\lambda$ and with $\nu = 0$ belongs to $\mathcal{C}$ and represents this non-diagonalizability of $M$.

Let $d$ denote the exterior derivative and $\wedge$ denote the exterior product on the complex of holomorphic exterior forms $\Omega^{\bullet}(\mathbb{C}^{n+1})$. Alternatively, one may treat it in a purely axiomatic way as the Kähler differential on the algebra of holomorphic functions on $\mathbb{C}^{n+1}$ with the appropriate factorization in the end. Let $p \in \mathcal{C}$ be a point of geometric degeneration of $M$ with $\nu = 0$ as above.
Theorem E.1. Assume that the following nondegeneracy condition holds: $d_p \det J \wedge \bigwedge_j d_p F_j \neq 0$. Then $\mathcal{C}$ can be locally parametrized by $\nu$ around $p$ and, with this parametrization, $d\lambda/d\nu = 0$ at $p$.

Proof. Let $T_p\mathbb{C}^{n+1}$ be the holomorphic tangent space to $\mathbb{C}^{n+1}$ at $p$, that is, the tangent space spanned by the holomorphic vector fields $\partial_j \equiv \partial/\partial x_j$, $\partial_\lambda \equiv \partial/\partial\lambda$, $\partial_\nu \equiv \partial/\partial\nu$ at $p$. Let us denote by $\iota_u \sigma_p$ the contraction of a holomorphic form $\sigma$ ($\sigma_p \in \bigwedge^{\bullet}_p \mathbb{C}^{n+1}$) with a tangent vector $u \in T_p\mathbb{C}^{n+1}$ at the point $p$.

Let us denote $\omega_p \equiv d_p \det(J - \nu I) \wedge \bigwedge_j d_p F_j$ and $\varpi_p \equiv d_p \det J \wedge \bigwedge_j d_p F_j$. By the premise, $\varpi_p \neq 0$, which also implies $\omega_p \neq 0$. Indeed, the free term (with respect to $\nu$) of the polynomial $\det(J - \nu I)$ is equal to $\det J$, and thus
$$d_p \det(J - \nu I) = \partial_\nu \det(J - \nu I)|_p \, d_p\nu + d_p \det J, \tag{E.1}$$
where the two terms are linearly independent because $\partial_\nu \det J = 0$. Therefore, as none of the $F_j$ depends on $\nu$, $\omega_p$ differs from $\varpi_p$ by the addition of a linearly independent term.

Let $v \in T_p\mathbb{C}^{n+1}$ be a nonzero vector with coordinates $(v_i, v_\lambda, v_\nu)$ tangent to $\mathcal{C}$. This means that $v \det(J - \nu I) = v F_j = 0$, where by $vf$ we denote the action of a vector $v$ on a function $f$. This, in turn, implies $\iota_v \omega_p = 0$.

As all the $F_j$ depend only on $x$ and $\lambda$, we have $d_p\lambda \wedge \bigwedge_j d_p F_j = \det J|_p \, d_p\lambda \wedge \bigwedge_j d_p x_j = 0$ (recall that $\det J = 0$ at $p$, since $\nu = 0$ there), and thus $d_p\lambda \wedge \omega_p = 0$. Therefore, we have
$$\iota_v (d_p\lambda \wedge \omega_p) = -d_p\lambda \wedge \iota_v \omega_p + v_\lambda\, \omega_p = v_\lambda\, \omega_p = 0.$$
This implies $v_\lambda = 0$.

On the other hand, by (E.1) we have $d_p\nu \wedge d_p \det(J - \nu I) = d_p\nu \wedge d_p \det J$, and thus $d_p\nu \wedge \omega_p = d_p\nu \wedge \varpi_p \neq 0$. But $d\nu \wedge \omega \in \Omega^{n+1}(\mathbb{C}^{n+1})$, and therefore, for any holomorphic tangent vector $u \in T_p\mathbb{C}^{n+1}$, $u \neq 0$ is equivalent to $\iota_u (d_p\nu \wedge \omega_p) \neq 0$. This results in
$$\iota_v (d_p\nu \wedge \omega_p) = -d_p\nu \wedge \iota_v \omega_p + v_\nu\, \omega_p = v_\nu\, \omega_p \neq 0,$$
and thus $v_\nu \neq 0$. By the holomorphic implicit function theorem, it follows that $\mathcal{C}$ can be holomorphically parametrized by $\nu$ in a neighborhood of $p$. Together with $v_\lambda = 0$ this implies that we have $d\lambda/d\nu|_p = 0$ on $\mathcal{C}$.

Now, consider a smooth real curve parametrized by a real parameter $t$ on the complex $\nu$-plane that passes through $0$ without degeneracy. This curve is lifted to $\mathcal{C}$, and the resulting smooth real curve is parametrized by $t$. From $d\lambda/d\nu = 0$ we conclude that $d\lambda/dt = 0$ at $p$ too. In our case we have the curve $\mu(t) = e^{it}$, or $\nu(t) = e^{it} - 1$,
which passes through $\mu = 1$ at $t = 0$. $\tilde{\mathcal{C}}$ is projected without degeneration to $\mathbb{C} \times \mathbb{R} \ni (\lambda, t)$ locally near $p$, and then to $\mathbb{C} \ni \lambda$ with degeneration at the projection of $p$, given by $d\lambda/dt = 0$. $\square$