Lanczos Boosted Numerical Linked-Cluster Expansion for Quantum Lattice Models
LLanczos Boosted Numerical Linked-Cluster Expansion for Quantum Lattice Models
Krishnakumar Bhattaram
1, 2 and Ehsan Khatami Lynbrook High School, 1280 Johnson Ave, San Jose, CA 95129, USA Department of Physics and Astronomy, San Jose State University, San Jose, CA 95192, USA
Numerical linked-cluster expansions allow one to calculate finite-temperature properties ofquantum lattice models directly in the thermodynamic limit through exact solutions of small clusters.However, full diagonalization is often the limiting factor for these calculations. Here, we show thata partial diagonalization of the largest clusters in the expansion using the Lanczos algorithm canbe as useful as full diagonalization for the method while mitigating some of the time and memoryissues. As a test case, we consider a frustrated Heisenberg model on the checkerboard lattice andfind that our approach can lead to much more efficient calculations in a parallel environment.
I. INTRODUCTION
Over the past decade, the numerical linked-clusterexpansion (NLCE) [1–3] has become a versatile andpowerful technique for solving quantum lattice modelsand studying finite-temperature properties of strongly-correlated systems in the thermodynamic limit. So far, ithas seen applications to a variety of problems includingmagnetic models [2, 4–6], itinerant electron models onseveral different geometries [3, 7–9], quantum quenchesin the thermodynamic limit [10, 11], entanglement [12–14], and even the hard problem of dynamics inthe thermodynamic limit for system in or out ofequilibrium [15–17].At the heart of the method lies the exact solutionof model Hamiltonians on relatively small clusters,which have open boundaries and unusual topologies,using exact diagonalization. This step is the mostcomputationally expensive and therefore the limitingfactor of the method. For models such as the Fermi-Hubbard model, the growth in the size of the Hilbertspace with cluster size puts the calculations quicklyup against an exponential wall. In other cases, thefactorially growing number of clusters that need to bediagonalized is the limiting factor.In cases where only the ground state is of interest,and in the absence of long-range entanglement, whichis unfavorable to the convergence of the series, othertechniques such as the Lanczos algorithm or the densitymatrix renormalization group have been employed [6, 14].The main idea behind our study is to employ theLanczos technique to obtain not only the ground state,but also a part of the low-energy spectrum of the model inorder to access nonzero, but low, temperatures for largeclusters in the NLCE. The goal is to combine the partial-diagonalization for solving clusters in the last order ofthe series with the traditional full diagonalization ofsmaller cluster at lower orders to push the convergencebeyond temperatures that have been accessible before.The larger the clusters at higher orders, the lower thetemperatures they contribute to in the NLCE, and thesmaller the percentage of the lowest-lying eigenenergiesthat is needed.We benchmark our results from this technique against those obtained from full diagonalization in the NLCEfor the antiferromagnetic Heisenberg model on thecheckerboard lattice with frustration. We find thatin a massively parallel environment, the Lanczos-based method even with the very expensive full re-orthogonalization step can offer a large speedup over thetraditional diagonalization method.The exposition is as follows. In Sec. II, we introducethe model. In Sec. III, we provide the basics of theNLCE and discuss how it can benefit from the Lanczosalgorithm. We discuss the results for the model, scalingand some details of the parallelization scheme in Sec. IV
II. CHECKERBOARD LATTICE HEISENBERGMODEL
The Hamiltonian we consider here is the spin-1/2quantum Heisenberg model on the checkerboard lattice,which is written as ˆ H = (cid:88) ij J ij ˆS i · ˆS j , (1)where ˆS i is the spin-1/2 vector at site i , and J ij = J for nearest neighbor i and j and J ij = J (cid:48) for nextnearest neighbor i and j on every other plaquette ina checkerboard pattern as shown in Fig. 1. J (cid:48) = 0represents the square lattice Heisenberg model. Here,we consider the test case J = 0 . J (cid:48) = 1 .
0. So, J (cid:48) sets the unit of the energy throughout the paper FIG. 1. A 3 × J is the strength of the exchange interaction onnearest neighbor bonds and J (cid:48) is the interaction on nextnearest neighbor bonds in every other 2 × a r X i v : . [ c ond - m a t . s t r- e l ] O c t III. NUMERICAL LINKED-CLUSTEREXPANSION
In the numerical linked-cluster expansion a givenextensive property P ( L ) of the lattice model is expressedas a sum over the contributions to that property fromevery cluster that can be embedded in the lattice L .While a version of the method can be employed forfinite system sizes, its main advantage has been in itsapplications to the models in the thermodynamic limit, L → ∞ . In that limit, the series expansion is given as: P ( L ) / L = (cid:88) c L ( c ) W P ( c ) (2)where c are topologically-distinct clusters that can beembedded in L , L ( c ) is the number of ways per sitecluster c can be embedded in the lattice, and W P ( c ) is thecontribution to property P computed via the inclusion-exclusion principle: W P ( c ) = P ( c ) − (cid:88) s ⊂ c W P ( s ) , (3)where s is a cluster that can be embedded in c (a sub-cluster of c ), and P ( c ) is the property calculated forcluster c using exact diagonalization (ED). L ( c ) canbe thought of as the number of point group symmetryoperations for the underlying lattice that give the clustera distinct orientation. More information on how L ( c )are calculated and other details of the algorithm can befound in Ref. [18].The calculation of properties P ( c ) using ED is themost computationally expensive part of the NLCEs. Forexample, in a site expansion, for which order l of theseries includes all clusters up to l sites, calculations forthe square or honeycomb lattice Fermi-Hubbard modelshave so far been limited to l = 9 [7, 8, 19, 20]. Eventhough there are only 112 topologically-distinct clustersto diagonalize for the square lattice in the 9th orderin that case, the calculations are limited by memoryand time requirements for diagonalization of the largestmatrices for clusters with a size larger than 9 sites.For quantum magnetic models in which the size of theHilbert space ( n Hilbert ) grows significantly slower withthe system size than for Fermi-Hubbard models, thecalculations have been limited to about 15 orders on thesquare lattice [18, 21], not by memory requirements, butby the large number of clusters that need to be solved inhigher orders.In those cases, one can put the limitation back on theED and push the convergence to lower temperatures byemploying expansion schemes that access much largerclusters more quickly as the order increases. That isaccomplished through increasing the size of the buildingblock used to generate clusters. For example, anexpansion with squares as building blocks was used tostudy the checkerboard and square lattice Heisenbergmodels with clusters up to 19 sites in only 6 orders [5].
FIG. 2. Square expansion for the checkerboard latticeHeisenberg model, where clusters at higher orders aregenerated by adding corner-sharing 2 × Figure 2 shows the first five topologically-distinct clustersin the square expansion. In previous studies, triangularbuilding blocks were used to study magnetic modelson the Kagome lattice [6, 22], and rectangular clusterswere considered in the study of entanglement at a two-dimensional critical point [12].Here, we aim to address, to the extent possible, thelimitation on time and memory for full diagonalization,which remains the main issue in the NLCE. We combinethe square expansion NLCE with an efficient partialdiagonalization offered by the Lanczos algorithm toaccess low temperatures much faster than previouslypossible. We base our approach on a key observationthat only small clusters in low orders of the expansioncontribute to properties at the highest temperatures;the inclusion of larger clusters in the series athigher orders does not change the results at highenergies/temperatures. Therefore, a partial knowledgeof the eigenvalues and corresponding eigenvectors at thelow-energy end of the spectrum may be sufficient todeduce new information from larger clusters
A. Efficiency of the Lanczos Algorithm for theNLCE
The Lanczos algorithm [23] has had profoundapplications in solving discrete quantum systems,notably when the ground state is of interest. Thefollowing section will give an overview of the Lanczosmethod in terms of its applicability to the NLCE.For further information concerning the mechanics andderivation of the Lanczos algorithm, see Ref. [23]The algorithm in its most basic form is an iterativeKrylov subspace method in which an orthogonal basisis built and approximations of eigenvectors are found byprojecting our matrix onto the subspace. It is well knownfor its superior space and time efficiency in solving forthe ground state of very large systems. However, themethod is known to have numerical instabilities in finite-precision mathematics [24], which manifest themselves ina loss of orthogonality of the Lanczos vectors which definethe subspace. This leads to spurious degeneracies of thelargest-magnitude eigenvalues, which means that while -4-3.5-3-2.5-2-1.5-1-0.5 0 0.5 1 1.5 0 50 100 150 200 250 300 350 400 E i g e n v a l u e Index
Exact DiagonalizationLanczos n iter = n
Hilbert
Lanczos Reortho n iter = n
Hilbert
Lanczos Reortho n iter =0.5n
Hilbert
FIG. 3. Sorted first 400 eigenvalues of the model Hamiltonianfor a 16-site cluster in the four spin-up sector with n Hilbert =1820. Results from the Lanczos algorithm deviate from theexact ones very early on in the spectrum due to the existenceof degeneracies and the loss of orthogonalization betweenvectors in the Krylov space. A full re-orthogonalization stepin the algorithm restores orthogonality of the Lanczos vectorat every iteration and captures a larger percentage of theexact spectrum as the number of iteration increases. the ground-state behavior may be fully captured, finite-temperature properties are very difficult to ascertain.The solutions to maintaining orthogonality arevaried [25], however, the most basic in capturingthe whole set of eigenvalues to high precision isthe application at every iteration of a Gram-Schmidtorthogonalization with all previous vectors, a fullreorthogonalization scheme. Figure 3 shows the exacteigenvalues of the Hamiltonian for a 16-site cluster inthe spin sector with four spin-ups and twelve spin-downscompared with the values from the Lanczos method withand without full reorthogonalization and two differentnumbers of iterations ( n iter ). It is expected that with n iter = n Hilbert the Lanczos algorithm can find most(but not all) eigenvalues with high degree of precision.The severe degeneracy of the lowest eigenvalues inthe basic Lanczos method and its resolution afterreorthogonalization can be seen in the figure.The reorthogonalization process requires the storage ofabout n iter Lanczos vectors, which has similar memoryrequirements to storing the Hamiltonian matrix when n iter ∼ n Hilbert , and also greatly increases the timecomplexity of the process due to a n iter × n Hilbert scaling. The question thus raised is whether or not thereorthogonalized Lanczos scheme is in fact advantageousover exact diagonalization. Figure 3 includes a plot forthe case where the number of iterations is half the size ofthe Hilbert space. It can be seen that a large fraction oflow-energy states have been captured with high accuracyin that case. A key feature of the NCLE is that withthe first few orders, the series is already convergent
T/J ’ -0.5-0.4-0.3-0.2-0.10 E / J ’ Orders 5 & 6, ED (No Resum.)Order 4, EDOrder 5, EDOrder 6, EDOrder 5, ED, 0.9n
Hilbert
Order 5, n iter =0.9 n
Hilbert
Order 5, n iter =0.5 n
Hilbert
Order 6, n iter =0.5n
Hilbert
J/J ’ =0.5 FIG. 4. Energy per site of the Heisenberg model on acheckerboard lattice with J = 0 . J (cid:48) = 1 . at high temperatures where correlations in the systemare short ranged. Larger clusters in higher ordersprovide information for properties in the thermodynamiclimit only at lower temperatures where correlations growlarger. On the other hand, the Lanczos algorithmcan provide very accurate information about the low-temperature behavior of the largest clusters in the serieseven when the number of iterations is less than thesize of the Hilbert space as low-lying eigenvalues can beconverged, leading to much smaller memory and timerequirements in comparison with ED. In other words,as system sizes increase by increasing the order and theregion of convergence of the series is pushed to lowertemperatures, one can also expect that fewer iterationsare necessary. Furthermore, the Lanczos algorithmis readily parallelizable both in the Hamiltonianmultiplication and in the reorthogonalization step inorder to further reduce the diagonalization time. Adiscussion about scaling is provided in Sec. IV A. IV. RESULTS
To benchmark the performance of the Lanczos-boostedNCLE, we start with the energy per site of our
T/J ’ C v T/J ’ S J/J ’ =0.5 (a) (b) FIG. 5. (a) The heat capacity, and (b) entropy of theHeisenberg model on a checkerboard lattice with J = 0 . J (cid:48) = 1 . n iter of 50% of the size of the Hilbert space is enough toreach the convergence temperature for the entropy with 5 or6 orders. Heisenberg model with J = 0 . J (cid:48) = 1 . n iter = 0 . n Hilbert issufficient to reach the lowest convergence temperatureof the series with five or six orders. The latter are shownas green and red curves in Fig. 4. One can see that theyperfectly agree with results from ED up to temperaturesaround 0 .
3, slightly above the temperature at which theconvergence of the series with six orders is lost. The exactenergies in the thermodynamic limit beyond this pointare already accessible to lower orders. We also show thatby increasing n iter one can systematically reach highertemperatures. With n iter = 0 . n Hilert the agreementwith exact results extends to T ∼ . n iter does not represent the numberof exact eigenvalues obtained in the Lanczos algorithm.We demonstrate that by using the smallest 90% of exacteigenvalues from ED in the Boltzmann average of theenergy in the fifth order, which results in the bluedashed curve in Fig. 4, surpassing the performance ofthe Lanczos algorithm with n iter = 0 . n Hilert . As the order increases and the convergent region isextended to lower temperatures, the number of iterationscan be increasingly smaller fractions of n Hilbert , leadingto the superiority in both space and time complexity ofthe Lanczos-assisted diagonalization over ED. This is dueto the ability of the Lanczos method to capture the lowesteigenvalues with the least number of iterations. Thecalculation of properties in the sixth order with clustersup to 19 sites was only possible previously through theuse of Scalapack routines as done in Ref. [5]. Here, wehave obtained results for the sixth order with n iter =0 . n Hilert in significantly less time. As one can see inFig. 4, those results already capture the average energybefore the series loses convergence around T = 0 . A. Parallelization and Scaling
We use message passing interface (MPI) parallelizationfor ED in our NLCE implementation. We assign acollection of clusters in a given order to a differentnode to be diagonalized. Every compute node in thehigh-performance computer cluster we have used for ourcalculations contains 128 gigabytes of random accessmemory (RAM) and 28 cores. Therefore, we speed upthe diagonalization by using the threaded version of MKLand assigning all 28 cores of each node to diagonalizeone cluster at a time with multiple nodes simultaneouslyat work. In this picture, the run time is expected tobe proportional to the number of clusters assigned toeach node for diagonalization, or inversely proportionalto the number of nodes requested, in the limit of largenumber of clusters. In Fig. 6, we show the inverse runtime (speedup) as a function of number of cores (= 28 × number of nodes) for ED in the fifth order. Since thereare only 11 clusters in that order, the speed generallyincreases with increasing the number of nodes, until wereach 11 nodes (308 cores). Beyond that, there cannotbe any improvement in the run time of ED.We also use MPI to parallelize the inner loops of ourLanczos algorithm. Noting that the largest matriceswe diagonalize in the fifth order have a linear sizeof 12870, and therefore, every core can store at leastup to n Hilbert double-precision Lanczos vectors in theRAM, we distribute the computational tasks for boththe operation of the Hamiltonian on a Lanczos vector
100 1000Number of Cores00.0010.0020.0030.0040.0050.006 / R un T i m e [ / S ec ond s ] Lanczos, n iter =0.5n
Hilbert
Lanczos, n iter =n Hilbert ED FIG. 6. Scaling of the Lanczos partial diagonalizationalgorithm with full re-orthogonalization for up to the 5thorder. For comparison, we also show the same scaling forED for which we use the threaded version of Intel’s MKLto diagonalize every cluster on a node with 28 cores. Theperformance of ED improves rapidly by increasing the numberof nodes until we have as many nodes as the number of clustersin the last order of the series. No speedup can be achievedbeyond that. The Lanczos algorithm with as n iter = n Hilbert quickly surpasses ED in performance and continues to offersome speedup by increasing the number of cores. Choosing n iter = 0 . n Hilbert reduces the computational time by roughlya factor of three. and the reorthogonalization step over all cores available.For this reason, ideally the run time would be inverselyproportional to the number of cores. However, as shownin Fig. 6, we find that the overhead due to communicationrapidly increases and the speedup eventually saturates.However, for n iter = n Hilert our Lanczos-based methodis already faster than our parallel ED scheme when using5 nodes and remains larger than ED as the numberof nodes increases. The reorthogonalization step scaleslike n iter × n Hilbert and is the most expensive part ofour routine. Therefore, decreasing n iter can significantlyspeed up the calculations. We find that choosing n iter = 0 . n Hilert , cuts the run time by a bout a factor of threein the case of the fifth order as shown in Fig. 6.In summary, we have shown that the Lanczosalgorithm can be an efficient and flexible techniqueto partially diagonalize Hamiltonian matricescorresponding to the largest clusters in the NLCE,allowing one to access temperatures not previouslyaccessible using Lapack full diagonalization routines.We benchmark our approach using the square expansionfor the checkerboard lattice Heisenberg model withfrustration and show that by computing less than50% of the eigenvalues through a parallelized Lanczosalgorithm in significantly less time than ED, we can reachtemperatures relevant to the convergence regions of theenergy and entropy of the model in the thermodynamiclimit.More efficient implementations of the Lanczosalgorithm, such as restart Lanczos or smart partialreorthogonalization, can be employed in the future tofurther reduce the computational cost. Our Lanczosapproach can be generalized to produce eigenvectorsin addition to eigenvalues, at an additional cost, forthe calculation of other properties of interest. It canalso be employed for NLCE studies of other quantumlattice models in which the diagonalization steps sets thecomputational limit. Finally, highly-precise results fromquantum Monte Carlo methods can also be used in thelast order in a similar fashion to speedup the calculations,at least in cases where the calculations are not hinderedby the fermion sign problem.
ACKNOWLEDGMENTS
We acknowledge Marcos Rigol for his early idea ofemploying the Lanczos algorithm for full diagonalizationof clusters in the NLCE. EK acknowledges support fromthe National Science Foundation (NSF) under Grant No.DMR-1609560. The computations were performed on theSpartan high-performance computing facility at San Jos´eState University supported by the NSF under Grant No.OAC-1626645. [1] M. Rigol, T. Bryant, and R. R. P. Singh, Phys. Rev.Lett. , 187202 (2006).[2] M. Rigol, T. Bryant, and R. R. P. Singh, Phys. Rev. E , 061118 (2007).[3] M. Rigol, T. Bryant, and R. R. P. Singh, Phys. Rev. E , 061119 (2007).[4] M. Rigol and R. R. P. Singh, Phys. Rev. Lett. , 207204(2007).[5] E. Khatami and M. Rigol, Phys. Rev. B , 134431(2011).[6] E. Khatami, R. R. P. Singh, and M. Rigol, Phys. Rev.B , 224411 (2011). [7] E. Khatami and M. Rigol, Phys. Rev. A , 053611(2011).[8] B. Tang, T. Paiva, E. Khatami, and M. Rigol, Phys.Rev. Lett. , 205301 (2012).[9] E. Khatami, Phys. Rev. B , 125114 (2016).[10] M. Rigol, Phys. Rev. Lett. , 170601 (2014).[11] B. Tang, D. Iyer, and M. Rigol, Phys. Rev. B , 161109(2015).[12] A. B. Kallin, K. Hyatt, R. R. P. Singh, and R. G. Melko,Phys. Rev. Lett. , 135702 (2013).[13] A. B. Kallin, E. M. Stoudenmire, P. Fendley, R. R. P.Singh, and R. G. Melko, Journal of StatisticalMechanics: Theory and Experiment , P06009 (2014).[14] E. M. Stoudenmire, P. Gustainis, R. Johal, S. Wessel,and R. G. Melko, Phys. Rev. B , 235106 (2014).[15] K. Mallayya and M. Rigol, Phys. Rev. Lett. , 070603(2018).[16] I. G. White, B. Sundar, and K. R. A. Hazzard, (2017),arXiv:1710.07696.[17] M. A. Nichols, L. W. Cheuk, M. Okan, T. R. Hartke,E. Mendez, T. Senthil, E. Khatami, H. Zhang, andM. W. Zwierlein, (2018), arXiv:1802.10018.[18] B. Tang, E. Khatami, and M. Rigol, Computer PhysicsCommunications , 557 (2013). [19] E. Khatami and M. Rigol, Phys. Rev. A , 023633(2012).[20] B. Tang, T. Paiva, E. Khatami, and M. Rigol, Phys.Rev. B , 125127 (2013).[21] B. Tang, D. Iyer, and M. Rigol, Phys. Rev. B , 174413(2015).[22] E. Khatami, J. S. Helton, and M. Rigol, Phys. Rev. B , 064401 (2012).[23] C. Lanczos, J. Res. Nat. BUT. Standards , 282(1950).[24] C. Paige, I. Inst. Math. Appl. , 373 (1972).[25] H. D. Simon, Linear Algebra and its Applications61