[PDF] Polynomial filter diagonalization of large Floquet unitary operators

Abstract

Periodically driven quantum many-body systems play a central role for our understanding of nonequilibrium phenomena. For studies of quantum chaos, thermalization, many-body localization and time crystals, the properties of eigenvectors and eigenvalues of the unitary evolution operator, and their scaling with physical system size L are of interest. While for static systems, powerful methods for the partial diagonalization of the Hamiltonian were developed, the unitary eigenproblem remains daunting. In this paper, we introduce a Krylov space diagonalization method to obtain exact eigenpairs of the unitary Floquet operator with eigenvalue closest to a target on the unit circle. Our method is based on a complex polynomial spectral transformation given by the geometric sum, leading to rapid convergence of the Arnoldi algorithm. We demonstrate that our method is much more efficient than the shift invert method in terms of both runtime and memory requirements, pushing the accessible system sizes to the realm of 20 qubits, with Hilbert space dimensions \geq 10^6.

Full PDF

PPolynomial ﬁlter diagonalization of large Floquet unitary operators

David J. Luitz

Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01167 Dresden, Germany ∗ (Dated: February 11, 2021)Periodically driven quantum many-body systems play a central role for our understanding ofnonequilibrium phenomena. For studies of quantum chaos, thermalization, many-body localizationand time crystals, the properties of eigenvectors and eigenvalues of the unitary evolution operator,and their scaling with physical system size L are of interest. While for static systems, powerfulmethods for the partial diagonalization of the Hamiltonian were developed, the unitary eigenproblemremains daunting.In this paper, we introduce a Krylov space diagonalization method to obtain exact eigenpairs ofthe unitary Floquet operator with eigenvalue closest to a target on the unit circle. Our method isbased on a complex polynomial spectral transformation given by the geometric sum, leading to rapidconvergence of the Arnoldi algorithm. We demonstrate that our method is much more eﬃcient thanthe shift invert method in terms of both runtime and memory requirements, pushing the accessiblesystem sizes to the realm of 20 qubits, with Hilbert space dimensions ≥ . Introduction —

Periodically driven, Floquet quantummany-body systems host fascinating nonequilibrium phe-nomena. While they generically relax to a feature-less state [1], independent of the initial state, they canundergo nonequilibrium phase transitions, such as themany-body localization transition [2–5]. In this con-text, Floquet systems are often cleaner counterparts ofHamiltonian systems, capturing the essence of these phe-nomenona, due to the absence of an energy structure anda uniform density of states. This is for example use-ful for high quality tests of the eigenstate thermalizationhypothesis [1, 6–11].Interestingly, Floquet quantum many-body systemscan exhibit altogether new physics, absent in Hamil-tonian systems [12], such as robust [13–18], ﬁne tuned[19, 20] or prethermal [21–25] time crystals .Many questions, in particular in the context of themany-body localization transition, rely on studying thescaling of the properties of eigenvectors of the unitaryone period evolution operator U with system size. Whilein static systems powerful methods like shift-invert diag-onalization [26, 27] and polynomial ﬁlter diagonalization[28–31] were developed for ﬁnding interior eigenpairs ofa large hermitian and sparse Hamiltonian up to Hilbertspace dimensions of 10 , the case of the dense unitaryeigenproblem remains daunting.Although progress was made for the special case deepin the many-body localized phase based on matrix prod-uct state variants of the shift invert technique [32, 33]for the Floquet operator [34], a general purpose methodwhich can go beyond full diagonalization of the unitarymatrix U , limited to system sizes of about L = 14 . . . g k ( U ) ∗ [email protected] of order k . The spectral transformation is a complexpolynomial of U and an eﬃcient matrix vector product g k ( U ) | ψ (cid:105) can be deﬁned, provided there is an eﬃcient ma-trix vector product U | ψ (cid:105) . This is the case in local Floquetsystems (e.g. in a matrix product operator formulation).The spectral transformation is designed to enhance theabsolute value of eigenvalues of U close to an arbitrarytarget on the unit circle, and to reduce the absolute valueof all other eigenvalues, thus allowing rapid convergenceof the Arnoldi algorithm to the requested eigenpairs of g k ( U ).We show that this procedure is eﬀective and can becarried out with a low memory footprint, compared todense shift-invert or full diagonalization. Eﬀectively, itgives access to system sizes up to L ≥

20 qubits, whilefull diagonalization is limited to L ≈ Model —

To investigate the performance of ourmethod, we consider a simple generic model for a timeperiodic one dimensional quantum many-body system of L qubits, with a Hilbert space of dimension d = 2 L . Theevolution operator U over one driving period is given bya two layer brickwork circuit, composed of two site uni-taries. U = (1)Each of the boxes in Eq. (1) represents a unitary u i,i +1 ∈ C × , acting on qubit i and its right neighbor( i +1). At the boundaries, we ﬁll in single qubit unitaries u , u L ∈ C × if needed. Our construction ensures thateach link in the chain of qubits is represented by a 4 × U in terms of the two layers U a and a r X i v : . [ c ond - m a t . d i s - nn ] F e b U b (for even L ): U = U a U b ,U a = u , × u , × · · · × u L − ,L U b = u × u , × · · · × u L − ,L − × u L . (2)It is important to note that the unitary matrix U is dense in the computational basis. To construct an ef-ﬁcient matrix vector product | ψ (cid:48) (cid:105) ← U | ψ (cid:105) , we split thecircuit in a left part U L ∈ C d L × d L (red tensors in Eq. (1))and a right part U R ∈ C d R × d R (blue tensors in Eq. (1)).It is advisable to chose d L , d R ≈ √ L .The Floquet operator is then decomposed into U =( U L × )( × U R ), and we can calculate U | ψ (cid:105) by twomatrix products, ﬁrst calculating ψ d L × d/d L ← U L ψ d L × d/d L and then ψ d/d R × d R ← ψ d/d R × d R U T R . (3)Note, that here ψ d L × d/d L and ψ d/d R × d R are two diﬀerent reshapings of the vector | ψ (cid:105) into matrices of dimensionsindicated in the subscript.The advantage of this procedure is that instead of thevery large matrix U ∈ C d × d , we only need to store twomuch smaller matrices U L and U T R , and we have expressedthe matrix vector product in terms of two matrix prod-ucts, with eﬃcient memory access per ﬂoating point op-eration.This decomposition is speciﬁc to brickwork circuits,but for general one dimensional local Floquet problemseﬃcient matrix free matrix vector products can be formu-lated, for example based on a matrix product operatorformulation of U [34]. The matrix product operator isguaranteed by locality to have a constant bond dimen-sion due to the area law of the operator entanglemententropy of U [35, 36]. Method —

Our goal is to calculate a subset of the eigen-pairs { ω n , | n (cid:105)} of the large unitary matrix U in such away that we obtain all n ev eigenpairs with eigenvalue ω n closest to a target z tgt ∈ C . The eigenvalues ω n of U lie on the complex unit circle, | ω n | = 1 and can thereforenot be separated by magnitude, which is necessary for theconvergence of Krylov space diagonalization techniques.To achieve this, spectral transformations f ( U ) can beused to transform the eigenvalues ω n → f ( ω n ), whileleaving the the eigenvectors | n (cid:105) invariant. If the magni-tude | f ( ω n ) | is large for eigenvalues ω n close to the targetand small otherwise, e.g. the Arnoldi algorithm can beused to calculate the eigenpairs of interest of f ( U ), whileonly the matrix vector product | ψ (cid:48) (cid:105) ← f ( U ) | ψ (cid:105) is needed.One of the most eﬀective spectral transformations isthe “shift and invert” transformation f sinvert ( U ) = ( U − z tgt ) − (4)which provides excellent convergence of the Arnoldi al-gorithm if the target is chosen on or close to the unit cir-cle. The downside of f sinvert is that for the matrix vector g k (e i φ )) − − I m ( g k ( e i φ )) k = 30 − π φ tgt πφ | g k ( e i φ ) | FIG. 1. Left: Mapping of the complex unit circle | z | = 1(inset) by the geometric sum g k ( z ) with polynomial order k = 30. The “target” arc of the unit circle highlighted in redis mapped to the red line in the main panel by g k ( z ). Right:Absolute value of the mapped values as a function of phaseangle φ . The part of the curve marked in red is the mapped“target” arc of the unit circle, given by the target phase φ tgt . product ( U − z tgt ) − | ψ (cid:105) , an inversion is involved. Dueto the dense spectrum of U , the condition number of U − z tgt is exponentially large in system size and there-fore only a direct solution using a LU decomposition ofthe dense matrix U − z tgt can be used, with a memorycomplexity O ( d ) and runtime complexity O ( d ). Weprovide benchmark results of this technique in Tab. I forcomparison.Polynomial spectral transformations are useful, be-cause they do not suﬀer from large memory requirementssince any power of U can be applied to a vector | ψ (cid:105) by re-peated matrix vector products U k | ψ (cid:105) = U ( U . . . ( U | ψ (cid:105) )).Generally, the polynomial of degree kp k ( U ) = k (cid:88) m =0 a m U m (5)can be eﬃciently multiplied onto a vector to obtain p k ( U ) | ψ (cid:105) .We argue here that an eﬀective complex polynomialspectral transformation to single out eigenpairs witheigenvalue closest to z tgt = e i φ tgt is given by the geo-metric sum g k ( U ) = k (cid:88) m =0 e − i mφ tgt U m . (6)The phase factor can be understood by noticing that mul-tiplication by e − i φ tgt rotates the eigenvalues of U closestto e i φ tgt to the proximity of 1.The geometric sum g k ( U ) has the closed form (if ω n (cid:54) =1) g k ( U ) = (cid:104) − e − i( k +1) φ tgt U k +1 (cid:105) (cid:2) − e − i φ tgt U (cid:3) − , (7)and has some similarity with f sinvert .Fig. 1 illustrates the mapping of the unit circle by g k ( z ). The left panel shows the transformed unit circleunder g k ( z ) on the complex plane, while the right paneldepicts | g k (e i φ ) | as a function of the phase angle φ .A simple analysis shows that the arc of the unit circlewith φ tgt − πk +1 ≤ φ ≤ φ tgt + πk +1 is mapped to the “apple”shaped outer line in the left panel of Fig. 1, while therest of the circle is compressed into the inner spirals. Thetarget arc is shown in red in Fig. 1. It is noteworthy thatthe target arc is not only strongly enhanced in magnitudeby g k , but also shows the largest separation of eigenvalueson the complex plane. These features are important fora rapid convergence of the Arnoldi algorithm.Quantum many-body Floquet systems typically havea uniform eigenvalue density on the unit cirle. Therefore,on average for a ﬁxed polynomial order k , the 2 d/ ( k + 1)eigenvalues closest to z tgt will be mapped to the outer“apple” line by g k . Arnoldi algorithm for g k ( U ) — The Arnoldi algo-rithm [37] is a generalization of the Lanczos method[38] to nonhermitian matrices. It is a numericallystable variant of the power iteration, and iterativelybuilds an orthonormal basis { v j } of the Krylov spacespan (cid:0) | ψ (cid:105) , g k ( U ) | ψ (cid:105) , g k ( U ) | ψ (cid:105) . . . g n cv − k | ψ (cid:105) (cid:1) of dimension n cv , starting from a random initial vector | ψ (cid:105) . Raising g k to high powers in this process ﬁlters out componentsof eigenvectors of g k which are in the target space (i.e.whose eigenvalues of g k have a large magnitude and aretherefore enhanced). At the same time, g k ( U ) is pro-jected into the Krylov space by the matrix V ∈ C d × n cv with columns given by v j , yielding the upper Hessenbergmatrix H m = V † g k ( U ) V. (8)The eigenvalues λ i and eigenvectors x i of H m yield theRitz pairs ( λ i , V x i ), which are approximations of theeigenpairs of g k ( U ). If the dimension of the Krylov space n cv is suﬃciently large, the Ritz pairs will converge withthe number of Arnoldi iterations. Typically, λ i with thelargest magnitude converge ﬁrst, and therefore an eﬀec-tive spectral transformation is important. One shouldnot expect to converge all n cv Ritz pairs, but rather chosea maximal size of the Krylov space n cv > n ev , if n ev eigenvalues are required.Once n ev Ritz pairs are converged, we obtain excellentapproximations for eigenpairs of g k ( U ). The eigenvectors | n (cid:105) of g k ( U ) are also eigenvectors of U . The eigenvalues λ n of g k ( U ) are related to the corresponding eigenvalues ω n via g k ( ω n ) = λ n . Rather than solving this equationby root ﬁnding, we calculate them from ω n = (cid:104) n | U | n (cid:105) .This has the advantage that we can at the same timecheck the residuals [39] r n = (cid:107) U | n (cid:105) − ω n | n (cid:105)(cid:107) (9)as a measure of eigenpair quality. k L = 12, n optcv = 170, k opt = 42optimal runtime = 1 . L +1 /n cv . · L +1 /n cv k L = 14, n optcv = 250, k opt = 108optimal runtime = 36 . n cv k L = 16, n optcv = 430, k opt = 243optimal runtime = 611 . FIG. 2. Benchmark calculation of n ev = 50 eigenpairs closestto z tgt = 1 using 4 cores of an AMD EPYC 7H12 2.6 GHzCPU. The colormap shows the measure runtimes in seconds,the black dashed line indicates Krylov space dimensions equalto the number of eigenvalues on the outer apple line in Fig.1. The red dashed line shows the Krylov space dimension80% smaller than this value. Red crosses show the position ofabsolute minimal runtime. Scanning algorithmic parameters n cv and k disabled core boost and therefore numbers can diﬀerfrom Tab. I. We pursue here the following strategy: Since the eigen-values on the outer “apple” line of g k ( z ) are well sepa-rated from the rest of the spectrum, it is advisable to usea Krylov space dimension n cv = 2 d/ ( k + 1) (in practice,one can take it about 80% smaller as we will see below).If we want to calculate n ev eigenpairs, we furthermoreuse n cv > n ev . We use the reference implementation ofthe implicitly restarted Arnoldi method as provided by arpack [40]. Benchmarks —

To ﬁnd the optimal algorithmic param-eters k and n cv , we perform benchmark calculations toobtain n ev = 50 eigenpairs closest to z tgt = 1 of Floquetrandom circuits from Eq. (1) of length L = 12 , ,

16 fora range of Krylov space dimensions n cv and polynomialorders k . The results are shown in Fig. 2. The blackdashed line corresponds to the n cv = 2 L +1 /k , where theKrylov space dimension is equal to the number of eigen-values on the outer apple line in Fig. 1. The colormapshows the obtained runtimes in seconds. There is a dis- method L time [s] mem. [GiB] max res.geom. sum 10 0 . .

167 4 . × − geom. sum 11 0 . .

224 2 . × − geom. sum 12 2 . .

265 1 . × − geom. sum 13 6 . .

288 2 . × − geom. sum 14 26 . .

369 2 . × − geom. sum 15 46 . .

434 2 . × − geom. sum 16 193 . .

861 2 . × − geom. sum 17 903 . .

802 2 . × − geom. sum 18 4965 . .

601 3 . × − geom. sum 19 16 579 . .

081 3 . × − geom. sum 20 97 706 . .

323 4 . × − full diag. 10 3 . .

289 1 . × − full diag. 11 17 . .

556 1 . × − full diag. 12 102 . .

341 2 . × − full diag. 13 687 . .

476 4 . × − full diag. 14 3622 . .

622 4 . × − full diag. 15 23 025 . .

581 5 . × − full diag. 16 198 568 . .

614 6 . × − shift invert 10 0 . .

264 5 . × − shift invert 11 1 . .

454 6 . × − shift invert 12 5 . .

996 9 . × − shift invert 13 18 . .

25 1 . × − shift invert 14 78 . .

263 1 . × − shift invert 15 352 . .

19 2 . × − shift invert 16 2197 . .

248 3 . × − shift invert 17 11 433 . .

247 3 . × − TABLE I. Average runtime and memory consumption for thecalculation of n ev = 50 eigenpairs close to z tgt = 1 of ourproposed geometric sum polynomial ﬁlter method with n cv = (cid:98) L/ (cid:99) and the optimal polynomial order k = 0 . · L +1 /n cv (cf. Fig. 2) in comparison with full diagonalization usingMKL’s zgeev , as well as a custom shift invert implementa-tion using MKL’s zgetrf . Runtimes were measured using 16cores of an AMD EPYC 7H12 2.6 GHz CPU. Memory us-age is estimated from the maximum resident set size (maxRSS). Memory footprints are only indicative since our simplebenchmark codes were not optimized for memory usage. tinct minimum visible in the runtime, when the Krylovspace dimension is about 0 . · L +1 /k (red dashed) forlarge n cv . This corresponds to the number of eigenvalueswith larger modulus than the rest of the spectrum andis therefore the optimal size of the Krylov space for each k . We observe a tendency that for larger sizes, largerKrylov space dimensions are better. And we are therforeusing n cv = (cid:98) L/ (cid:99) , k = 0 . · L +1 /n cv in the followingbenchmarks for larger sizes.For an assessment of the performance of our method incomparison with the state of the art, we carry out a cal-culation of 50 eigenpairs of U with eigenvalues closest to1 using (i) full diagonalization of U using the zgeev Rou-tine from MKL, (ii) shift invert diagonalization based onthe LU decomposition of U − using the zgetrf Routinefrom MKL in combination with arpack ’s Arnoldi algo-rithm (iii) our new geometric sum ﬁltered diagonalizationusing arpack ’s Arnoldi algorithm.We measure the total runtime of the calculation as well

10 12 14 16 18 20 L r un t i m e [ s ] geom. sumshift invertfull diag. FIG. 3. Scaling of total runtime for the calculation of n ev = 50eigenpairs close to z tgt = 1 for diﬀerent methods. The datais the same as in Tab. I as the memory footprint and show the results in Tab.I and Fig. 3. We also calculate the maximal residuemax n =1 (cid:107) U | n (cid:105) − ω n | n (cid:105)(cid:107) to check the quality of the ob-tained eigenpairs. Fig. 3 reveals that the scaling of alltechniques is exponential in system size L due to thenature of the problem. However, with a ﬁxed runtime,geometric sum ﬁlter diagonalization yields eigenpairs ofsystems about 4 sites larger, i.e. for Hilbert spaces about16 times larger. Due to the very small memory footprint,system sizes up to about L = 20 are therefore reach-able, which would require about 50 TiB of memory withshift-invert. Despite the use of quite large polynomialorders, the algorithm is stable and yields eigenpairs ofexcellent quality with residuals about one order of mag-nitude smaller than obtained from full diagonalization. Conclusion —

We have shown that the geometric sum isan eﬀective polynomial ﬁlter to obtain interior eigenpairsof local Floquet unitary operators. Due to the locality, aneﬃcient matrix vector product U | ψ (cid:105) can be deﬁned andthe geometric sum can be eﬃciently applied to any wavefunction. This allows the application of the implicitlyrestarted Arnoldi algorithm for ﬁnding eigenpairs closestto an arbitrary target on the complex unit circle.Although the overall exponential scaling of the prob-lem remains, the method has a moderate memory foot-print compared to full diagonalization and shift-invert,and is roughly one order of magnitude faster than shift-invert, making systems of L ≥

20 qubits accessible.We note that a large fraction of the runtime of thealgorithm is spent in the matrix vector product due tothe high order of the polynomial, and that runtimes canbe reduced signiﬁcantly by optimizing it.Comments from Luis Colmenarez are gratefully ac-knowledged. This work was in part supported bythe Deutsche Forschungsgemeinschaft through SFB 1143(project-id 247310070). [1] Achilleas Lazarides, Arnab Das, and Roderich Moessner,“Equilibrium states of generic quantum systems subjectto periodic driving,” Phys. Rev. E , 012110 (2014).[2] P. W. Anderson, “Absence of Diﬀusion in Certain Ran-dom Lattices,” Phys. Rev. , 1492–1505 (1958).[3] D. M. Basko, I. L. Aleiner, and B. L. Alt-shuler, “Metal–insulator transition in a weakly interact-ing many-electron system with localized single-particlestates,” Annals of Physics , 1126–1205 (2006).[4] Achilleas Lazarides, Arnab Das, and Roderich Moessner,“Fate of Many-Body Localization Under Periodic Driv-ing,” Phys. Rev. Lett. , 030402 (2015).[5] Pedro Ponte, Zlatko Papi´c, Fran¸cois Huveneers, andDmitry A. Abanin, “Many-Body Localization in Peri-odically Driven Systems,” Phys. Rev. Lett. , 140401(2015).[6] J. M. Deutsch, “Quantum statistical mechanics in aclosed system,” Phys. Rev. A , 2046–2049 (1991).[7] Mark Srednicki, “Chaos and quantum thermalization,”Phys. Rev. E , 888–901 (1994).[8] Marcos Rigol, Vanja Dunjko, and Maxim Olshanii,“Thermalization and its mechanism for generic isolatedquantum systems,” Nature , 854–858 (2008).[9] Sthitadhi Roy, Yevgeny Bar Lev, and David J. Luitz,“Anomalous thermalization and transport in disorderedinteracting Floquet systems,” Phys. Rev. B , 060201(2018).[10] Laura Foini and Jorge Kurchan, “Eigenstate Thermal-ization and Rotational Invariance in Ergodic QuantumSystems,” Phys. Rev. Lett. , 260601 (2019).[11] Amos Chan, Andrea De Luca, and J. T. Chalker, “Eigen-state Correlations, Thermalization, and the Butterﬂy Ef-fect,” Phys. Rev. Lett. , 220601 (2019).[12] Haruki Watanabe and Masaki Oshikawa, “Absence ofQuantum Time Crystals,” Phys. Rev. Lett. , 251603(2015).[13] Vedika Khemani, Achilleas Lazarides, Roderich Moess-ner, and S. L. Sondhi, “Phase Structure of Driven Quan-tum Systems,” Phys. Rev. Lett. , 250401 (2016).[14] Dominic V. Else, Bela Bauer, and Chetan Nayak,“Floquet Time Crystals,” Phys. Rev. Lett. , 090402(2016).[15] R. Moessner and S. L. Sondhi, “Equilibration and orderin quantum Floquet matter,” Nature Physics , 424–428 (2017).[16] Vedika Khemani, Roderich Moessner, and S. L. Sondhi,“A Brief History of Time Crystals,” arXiv:1910.10745(2019).[17] Dominic V. Else, Christopher Monroe, Chetan Nayak,and Norman Y. Yao, “Discrete Time Crystals,” AnnualReview of Condensed Matter Physics , 467–499 (2020).[18] Krzysztof Sacha and Jakub Zakrzewski, “Time crystals:a review,” Rep. Prog. Phys. , 016401 (2017).[19] David J. Luitz, Achilleas Lazarides, and YevgenyBar Lev, “Periodic and quasiperiodic revivals in periodi-cally driven interacting quantum systems,” Phys. Rev. B , 020303 (2018).[20] Robin Sch¨afer, G¨otz S. Uhrig, and Joachim Stolze,“Time-crystalline behavior in an engineered spin chain,”Phys. Rev. B , 184301 (2019).[21] Dmitry A. Abanin, Wojciech De Roeck, and Fran¸cois Huveneers, “Exponentially Slow Heating in PeriodicallyDriven Many-Body Systems,” Phys. Rev. Lett. ,256803 (2015).[22] Dmitry Abanin, Wojciech De Roeck, Wen Wei Ho, andFran¸cois Huveneers, “A Rigorous Theory of Many-BodyPrethermalization for Periodically Driven and ClosedQuantum Systems,” Commun. Math. Phys. , 809–827 (2017).[23] Dmitry A. Abanin, Wojciech De Roeck, Wen WeiHo, and Fran¸cois Huveneers, “Eﬀective Hamiltonians,prethermalization, and slow energy absorption in peri-odically driven many-body systems,” Phys. Rev. B ,014112 (2017).[24] Dominic V. Else, Bela Bauer, and Chetan Nayak,“Prethermal Phases of Matter Protected by Time-Translation Symmetry,” Phys. Rev. X , 011026 (2017).[25] David J. Luitz, Roderich Moessner, S. L. Sondhi, andVedika Khemani, “Prethermalization without Tempera-ture,” Phys. Rev. X , 021046 (2020).[26] David J. Luitz, Nicolas Laﬂorencie, and FabienAlet, “Many-body localization edge in the random-ﬁeldHeisenberg chain,” Phys. Rev. B , 081103 (2015).[27] Francesca Pietracaprina, Nicolas Mac´e, David J. Luitz,and Fabien Alet, “Shift-invert diagonalization of largemany-body localizing spin chains,” SciPost Physics ,045 (2018).[28] Y. Saad, Numerical Methods for Large Eigenvalue Prob-lems (Manchester University Press, 1992).[29] Andreas Pieper, Moritz Kreutzer, Andreas Alvermann,Martin Galgon, Holger Fehske, Georg Hager, BrunoLang, and Gerhard Wellein, “High-performance imple-mentation of Chebyshev ﬁlter diagonalization for inte-rior eigenvalue computations,” Journal of ComputationalPhysics , 226–243 (2016).[30] Moritz Kreutzer, Dominik Ernst, Alan R. Bishop, Hol-ger Fehske, Georg Hager, Kengo Nakajima, and GerhardWellein, “Chebyshev Filter Diagonalization on ModernManycore Processors and GPGPUs,” in