Imaginary time propagation code for large-scale two-dimensional eigenvalue problems in magnetic fields
P.J.J. Luukko a,∗, E. Räsänen b,a
a Nanoscience Center, University of Jyväskylä, PL 35, FI-40014, Finland
b Department of Physics, Tampere University of Technology, FI-33101 Tampere, Finland
Abstract
We present a code for solving the single-particle, time-independent Schrödinger equation in two dimensions. Our program utilizes the imaginary time propagation (ITP) algorithm, and it includes the most recent developments in the ITP method: the arbitrary order operator factorization and the exact inclusion of a (possibly very strong) magnetic field. Our program is able to solve thousands of eigenstates of a two-dimensional quantum system in reasonable time with commonly available hardware. The main motivation behind our work is to allow the study of highly excited states and energy spectra of two-dimensional quantum dots and billiard systems with a single versatile code, e.g., in quantum chaos research. In our implementation we emphasize a modern and easily extensible design, simple and user-friendly interfaces, and an open-source development philosophy.
Keywords:
Schrödinger equation, Imaginary time propagation, Diffusion algorithm, Quantum chaos
PACS:
Program summary
Program title: itp2d
Journal reference:
Catalogue identifier:
Licensing provisions:
GNU General Public License, version 3
Programming languages:
C++ and Python
Computer:
Tested on x86 and x86-64 architectures.
Operating system:
Tested under Linux with the g++ compiler. Any POSIX-compliant OS with a C++ compiler and the required external routines should suffice.
RAM:
Has the code been vectorised or parallelized?:
Yes, with OpenMP.
Classification:
∗ Corresponding author
Email address: [email protected] (P.J.J. Luukko)
Preprint submitted to Computer Physics Communications, 7 September 2012

External routines/libraries:
Nature of problem:
Numerical calculation of the lowest energy solutions (up to a few thousand, depending on available memory) of a single-particle, time-independent Schrödinger equation in two dimensions, with or without a homogeneous magnetic field.
Solution method:
Imaginary time propagation (also known as the diffusion algorithm), with arbitrary even order factorization of the imaginary time evolution operator.
Additional comments:
Please see the README file distributed with the program for more information. The source code of our program is also available at https://bitbucket.org/luukko/itp2d.
Running time:
Seconds to hours, depending on system size.
1. Introduction
In this paper we present itp2d: a modern implementation of the imaginary time propagation (ITP) scheme for solving the eigenstates of a two-dimensional, single-particle quantum system from the time-independent Schrödinger equation. Our implementation includes the most recent developments in imaginary time propagation, namely, the arbitrary order operator factorization by Chin [1], and the exact method of including a homogeneous magnetic field developed by Aichinger et al. [2, 3].

The computational methods used in our program have been implemented before [2, 4, 5], but itp2d is the first to combine them all in a two-dimensional code. In addition, with itp2d we focus on a clear, object-oriented, and extensible implementation, without compromising efficiency. We have also emphasized making the implementation suitable for solving a great number of eigenstates – up to thousands – in reasonable time. With this we intend to make itp2d a useful tool for studying the energy level spectra and highly excited wave functions of two-dimensional quantum systems, for example in the field of quantum chaos [6]. Since solving the single-particle equation is also a crucial part of the many-particle formalism of density-functional theory, itp2d can also be used as a fast eigensolver for realistic electronic structure calculations.

There are other algorithms designed for probing the highly excited states of quantum billiards, such as the various boundary methods (see, e.g., Refs. [7, 8] and references therein), but we have chosen ITP for its versatility and its ability to handle strong magnetic fields. To our knowledge, ITP with its latest developments has not been thoroughly benchmarked against other general-purpose iterative eigensolvers. In one reported case [9], ITP was found to have a much better scalability with respect to the grid size compared to the Lanczos method, but worse scalability with respect to the number of states.
However, these conclusions were based on a single 3D system solved only up to 10 eigenstates with the older, 4th order operator factorization. In Sec. 4.3 we present benchmark results which indicate that itp2d provides competitive numerical efficiency compared to publicly available, general-purpose eigensolvers.

ITP can be, and has been [4], implemented in any number of dimensions. We have chosen to limit our implementation to two dimensions for simplicity, and because operating in two dimensions allows systems with thousands of eigenstates to fit into system RAM, minimizing slow I/O operations.

In order to give a reasonably self-contained presentation, we will first introduce the ITP scheme and the underlying algorithms used in our implementation. A reader who is already familiar with ITP can advance directly to Sec. 3, where we will describe the implementation in more detail.

All formulas in this paper are given in Hartree atomic units. The magnetism-related units follow the SI-based convention, i.e., the atomic unit of magnetic field is ℏ/(ea₀²), where e is the unit charge and a₀ is the Bohr radius.
2. Algorithms
Imaginary time propagation (also known as the diffusion method) is a general algorithm for solving eigenvalue problems such as the time-independent Schrödinger equation

Hψ = Eψ,

where H is the Hamiltonian of the system, and the solution of the eigenvalue problem is some set of energies {E_j} and the associated eigenstates {ψ_j}.

The ITP algorithm is based on the idea that the (initially unknown) eigenstates of H form a complete basis of the associated Hilbert space, and thus any arbitrary state φ can be expanded in the eigenstates of H as

φ = Σ_j c_j ψ_j,

where {c_j} is some set of coefficients. By choosing an operator T = exp(−εH), where ε >
0, we can “filter out” the lower energy states of this expansion by repeatedly applying T to our initial state φ. After one iteration we get

T φ = Σ_j c_j T ψ_j = Σ_j c_j exp(−εE_j) ψ_j,    (1)

in other words, components in the expansion are damped, with the damping coefficient approaching zero exponentially with increasing energy and increasing value of ε. By repeatedly applying the operator T and subsequently normalizing the state, all higher energy components are removed from the expansion and only the ground state of the Hamiltonian remains. The operator T has the same form as the time evolution operator of a quantum system in the Schrödinger picture, exp(−ıtH), only with an imaginary number inserted for time, giving the ITP algorithm its name. For this reason the value ε is also called the imaginary time step.

In order to get more states besides the ground state, one chooses a set of arbitrary, linearly independent initial states and repeatedly applies T on each state, followed by orthonormalizing the set of states. By choosing a set of N initial states this procedure leads to the set of N lowest energy eigenstates of the Hamiltonian. The convergence speed of each state typically depends on its energy, with the ground state converging first and the highest energy states last. This also means that it is often advantageous to include some extra states in the computation – with more states each iteration takes longer, but since the absolutely highest states are not required to converge, the number of required iterations is reduced.

Since the method is based on removing unwanted components to project out the wanted ones, it is essential that the initial states really include all desired components in the first place, i.e., even though the initial states are otherwise arbitrary, they should not be orthogonal to any eigenstate.
A simple way to make sure that this requirement is practically always fulfilled is to choose random noise as the initial states.

To apply the ITP method in computer simulations we need a method to compute Tφ for an arbitrary state φ, an efficient method to orthonormalize a set of states, and a method to check for convergence, i.e., to assess whether the current set of states is close enough to the true eigenstates of H. After these steps are implemented the iterative ITP scheme can be used as summarized in the following:

1. Start with a set of N arbitrary, linearly independent states φ_n.
2. Apply the operator T = exp(−εH) to each state.
3. Orthonormalize the set of states.
4. Check whether convergence has been achieved. If yes, terminate; if not, go back to step 2.

The orthonormalization of states can be executed with a standard Gram-Schmidt process, but in our implementation we have opted for the subspace orthonormalization algorithm [10], which has been shown to be better suited for ITP in previous implementations [5]. In the Gram-Schmidt method one state is chosen to be a starting point of the iterative orthonormalization scheme, whereas the subspace orthonormalization method treats all states equally.

The subspace method for orthonormalizing a set {φ_j} of N linearly independent states can be summarized as follows. First the overlap matrix M with M_ij = ⟨φ_i|φ_j⟩ is computed. Then the Hermitian matrix M is diagonalized, which results in a unitary matrix U and a diagonal matrix D = diag(m_1, m_2, …, m_N) such that M = UDU†. Now the linear combinations

φ′_i = (1/√m_i) Σ_j U_ji φ_j    (2)

form an orthonormalized set, which can be confirmed easily by a direct calculation of the inner product ⟨φ′_i|φ′_j⟩. In total, the subspace orthonormalization algorithm for N states amounts computationally to

1. Computing the N(N + 1)/2 distinct inner products of the overlap matrix M
2. Diagonalizing the N × N Hermitian matrix M
3. Forming the linear combinations (2), which, if the states are stored as N arrays of M numbers each, amounts to the computation of a matrix product between an N × N unitary matrix and an N × M matrix.

All these steps can be implemented easily and efficiently with standard linear algebra routines. In addition, all these steps can be computed without allocating memory for a second set of states.

The subspace orthonormalization also provides a way to approximate the energy of each state based on the eigenvalues {m_n}, since with successive iterations m_n → exp(−εE_n). However, in systems with a high number of states, the energy values E_n become very large, which causes the values of m_n to get very close to zero. In turn, the accuracy of this approximation becomes quite poor.

Regardless of the algorithm used, a requirement for orthonormalization to work is of course that the states are linearly independent. This can be asserted for the initial states, but it can happen that the states lose linear independence during the propagation. If the states are propagated with a too large time step, several states can, for example, get so close to the ground state that they are essentially linearly dependent. This poses limitations on how large time steps can be used.

For practical Hamiltonians the exponential T = exp(−εH) cannot be implemented directly in computations. The traditional approach to this problem is to approximately factorize the exponential into an easier form. For a Hamiltonian of the form H = T + V, the most simple approximation is the second-order factorization

T = exp(−εV/2) exp(−εT) exp(−εV/2) + O(ε³).    (3)

If T represents the kinetic energy operator, T ∝ ∇², and V is a local potential operator, both remaining exponentials exp(−εV/2) and exp(−εT) can be implemented easily: the exponential of V is still a local potential, so in the coordinate basis it is diagonal, and likewise the exponential of T is diagonal in the wave vector basis.
This means that for wave functions in the coordinate basis an exponential of V is simply a pointwise multiplication, and an exponential of T is a combination of a Fourier transform and a pointwise multiplication.

As pointed out, the factorization in Eq. (3) is only approximate for general Hamiltonians. When using an approximate evolution operator T′, we essentially replace the original Hamiltonian H with an approximation H′ such that T′ = exp(−εH′), and the eigenstates we get with the ITP method are actually the eigenstates of H′. The better the approximation for T, the more accurately the eigenenergies and eigenstates of H′ match the true result we would get for H. Since some kind of approximation for T is required, we need to find a balance between two opposing effects when choosing a value for the imaginary time step ε: a larger value of ε causes the ITP scheme to converge faster (as seen directly from the “damping factor” in Eq. (1)), but the ε-dependent approximation causes the scheme to converge further away from the true solution.

In order to improve the ITP method, several higher order approximations for T have been developed beyond the second-order factorization of Eq. (3). The most recent improvement is the factorization by Chin [1], which constructs an arbitrarily high order approximation for T from the second-order factorization:

T = Σ_{k=1}^{n} c_k T₂^k(ε/k) + O(ε^{2n+1}),    (4)

where T₂(ε) is the second-order approximation from Eq. (3), T₂^k denotes its k-fold application, and the coefficients {c_i} are given by

c_i = Π_{j=1, j≠i}^{n} i²/(i² − j²).

Using a higher-order approximation for T allows for higher values of ε, which improves the convergence rate of the ITP scheme. This is highly advantageous, since even though more complicated approximations for T make the propagation step more computationally intensive, fewer iterations are needed due to the faster convergence rate.
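As a concrete illustration, the coefficients {c_i} of the multi-product expansion (4) can be generated with a few lines of Python (a sketch, not part of itp2d; exact rational arithmetic is used to avoid any rounding):

```python
from fractions import Fraction

def chin_coefficients(n):
    """Coefficients c_i = prod_{j != i} i^2 / (i^2 - j^2) of the
    n-term multi-product expansion, Eq. (4)."""
    coeffs = []
    for i in range(1, n + 1):
        c = Fraction(1)
        for j in range(1, n + 1):
            if j != i:
                c *= Fraction(i * i, i * i - j * j)
        coeffs.append(c)
    return coeffs

# For n = 2 this reproduces the well-known fourth-order scheme
# T = (4/3) T2^2(eps/2) - (1/3) T2(eps):
print(chin_coefficients(2))  # [Fraction(-1, 3), Fraction(4, 3)]
```

The coefficients sum to unity for every n, as required for the expansion to reproduce the identity operator in the limit ε → 0.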
As the number of states N is increased, the computational cost of the propagation step in the ITP scheme scales as ∝ N (each state is simply propagated independently of the others), but the orthonormalization step usually scales as ∝ N² or worse. This means that regardless of how complicated the propagation operator T is, the orthonormalization step quickly starts to dominate the computation completely, and thus for solving a high number of states it is critical to keep the total number of iterations at a minimum by using a high-order approximation for T. However, since higher-order factorizations involve an increasing number of arithmetic operations, finite precision arithmetic poses limits on how high order expansions of type (4) are reasonable. As reported in the case of a separate implementation of ITP [4], we confirm that order 12 is usually the limit for double-precision arithmetic.

In the presence of a magnetic field B characterized by a vector potential A, the canonical momentum operator of an electron is, in SI-based Hartree atomic units, Π = −ı∇ + A, and the kinetic energy operator is T = Π²/2. This operator is no longer diagonal in wave vector space, so applying the operator exp(−εT) is no longer trivial. However, as noted by Aichinger et al. [2], when the magnetic field is homogeneous and parallel to the z-axis, the required exponential can be factorized exactly:

exp(−εT) = exp(−(ε/2)(Π_x² + Π_y² + Π_z²))
         = exp(−(ε/4) f_x(ξ) Π_x²) exp(−(ε/2) f_y(ξ) Π_y²) exp(−(ε/4) f_x(ξ) Π_x²) exp(−(ε/2) Π_z²),    (5)

where Π_x, Π_y and Π_z are the x-, y- and z-components of Π, respectively, and the coefficients f_x and f_y are

f_x(ξ) = 2(cosh(ξ) − 1)/(ξ sinh(ξ))  and  f_y(ξ) = sinh(ξ)/ξ,    (6)

given as functions of ξ = εB.

The next step is choosing the gauge of the vector potential in a way that each term in the factorization (5) can be implemented efficiently.
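The coefficient functions (6) can be checked numerically. Imaginary-time Gaussian propagators of quadratic Hamiltonians compose like 2×2 matrices, with the exponentials of single squared momenta acting as triangular matrices and the full quadratic flow as a hyperbolic rotation. The mapping below is our own consistency sketch of this standard correspondence, not code from itp2d:

```python
import math

def matmul(p, q):
    # plain 2x2 matrix product on nested tuples
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def check_factorization(eps, B):
    # Coefficient functions of Eq. (6), with xi = eps * B:
    xi = eps * B
    fx = 2.0 * (math.cosh(xi) - 1.0) / (xi * math.sinh(xi))
    fy = math.sinh(xi) / xi
    # In the 2x2 representation the outer factors exp(-(eps/4) f_x Pi_x^2)
    # become lower-triangular matrices with entry a, the middle factor
    # exp(-(eps/2) f_y Pi_y^2) an upper-triangular matrix with entry b,
    # and the exact exp(-(eps/2)(Pi_x^2 + Pi_y^2)) a hyperbolic rotation.
    a = 0.5 * eps * B * B * fx
    b = eps * fy
    lower = ((1.0, 0.0), (a, 1.0))
    upper = ((1.0, b), (0.0, 1.0))
    exact = ((math.cosh(xi), math.sinh(xi) / B),
             (B * math.sinh(xi), math.cosh(xi)))
    prod = matmul(matmul(lower, upper), lower)
    return max(abs(prod[i][j] - exact[i][j])
               for i in range(2) for j in range(2))

print(check_factorization(0.7, 2.5) < 1e-12)  # True
```

The deviation stays at machine precision for any ε and B, confirming that the factorization is exact rather than a Trotter-type approximation.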
The linear gauge A = (−By, 0, 0), where B is the magnetic field strength, is a good choice, because then the components of Π are simply Π_x = k_x − By, Π_y = k_y, Π_z = k_z, in terms of the wave vector k = (k_x, k_y, k_z). This means that the factorized exp(−εT) can be applied to a wave function by first Fourier transforming from the (x, y, z) basis to (k_x, y, k_z), where both exp(−(ε/2)Π_z²) and exp(−(ε/4) f_x(ξ)Π_x²) are diagonal and easily applied. Then we can Fourier transform the remaining y-coordinate in order to get to the basis (k_x, k_y, k_z), where exp(−(ε/2) f_y(ξ)Π_y²) is diagonal. Finally we transform back to (k_x, y, k_z) in order to apply exp(−(ε/4) f_x(ξ)Π_x²) again. All steps require only Fourier transforms and pointwise multiplications, and moreover, the number of required Fourier transforms is not increased from the case of zero magnetic field.

This method of exact factorization of the kinetic energy part allows a (possibly very strong) homogeneous external magnetic field to be included in ITP simulations without any additional approximations. Another attractive feature of this method is that it can be made gauge-invariant regardless of the discretization [3], removing gauge-origin problems that often affect computations with magnetic fields.

Throughout the previous discussion, the use of Fourier transforms to go from the position to the wave-vector basis has implied the use of periodic boundary conditions. Switching to Dirichlet boundary conditions would allow the study of billiard systems, which are common model systems in quantum chaos research [6].

A simple way to enforce Dirichlet boundary conditions for a rectangular calculation box is to replace the Fourier transforms with sine transforms, or in other words, to expand the wave functions in eigenstates of a particle in a rectangular box instead of plane waves. However, there are two complications in this simple approach.
First of all, the use of the sine transform still implies periodicity across the boundary. The wave functions will be periodic because of the Dirichlet boundary conditions, but the wave functions can have a discontinuous derivative at the boundary. Because of this possible discontinuity of the derivative, the expansions in sine waves can have spurious, high-frequency “ringing” artifacts. These artifacts will be dampened by the ITP iterations, but they will worsen convergence.

Secondly, with an external magnetic field the sine waves are no longer as good a basis. For example, applying the Hamiltonian to a combination of sine functions results in a combination of sine and cosine functions, since with a magnetic field the Hamiltonian also includes first derivatives. Previously, operators such as the kinetic energy and the exponential of the kinetic energy turned out to be simple: a Fourier transform to the wave-vector space, a multiplication, and a transform back. With a magnetic field and Dirichlet boundary conditions they become a sine transform, two multiplications, and two inverse transforms, because the sine and cosine parts need to be handled separately. The correct basis to use would be the eigenfunctions of a particle in a rectangular box with a magnetic field, but to this problem no simple solution is known – computing these eigenfunctions was a major goal for itp2d, and the reason Dirichlet boundary conditions were implemented in the code.

It should be pointed out, however, that these problems do not prevent combining Dirichlet boundary conditions with an external magnetic field; they only cause slower convergence. Our implementation can, for example, solve the first few thousand eigenstates of the particle in a box with a magnetic field. Improving the combination of Dirichlet boundary conditions and a magnetic field will be a major goal for future development of the code.
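The sine-transform treatment of Dirichlet boundary conditions can be illustrated with a short numpy sketch (not itp2d code): a type-I discrete sine transform is built from an odd periodic extension and the FFT, and the zero-field kinetic energy operator acts by multiplying the sine coefficients of ψ by (nπ/L)²/2:

```python
import numpy as np

N, L = 127, 1.0                          # interior grid points, box length
x = np.arange(1, N + 1) * L / (N + 1)    # interior points of [0, L]

def dst1(f):
    # type-I discrete sine transform via an odd periodic extension
    ext = np.concatenate(([0.0], f, [0.0], -f[::-1]))
    return -np.fft.fft(ext).imag[1:N + 1] / 2.0

def idst1(b):
    # DST-I is its own inverse up to the factor 2/(N+1)
    return dst1(b) * 2.0 / (N + 1)

def kinetic(psi):
    # -(1/2) d^2/dx^2 with psi(0) = psi(L) = 0
    n = np.arange(1, N + 1)
    return idst1(dst1(psi) * 0.5 * (n * np.pi / L) ** 2)

# A particle-in-a-box eigenstate is reproduced exactly:
psi = np.sin(3 * np.pi * x / L)
print(np.allclose(kinetic(psi), 0.5 * (3 * np.pi / L) ** 2 * psi))  # True
```

For a smooth ψ that does not vanish at the walls with zero derivative, the same expansion exhibits exactly the slowly decaying high-frequency coefficients (“ringing”) described above.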
As discussed previously, the ITP scheme with a fixed imaginary time step ε converges faster with larger ε, but to a more inaccurate solution. For this reason the ITP scheme is traditionally coupled with time step adjustment, i.e., the states are first converged with a larger time step and the time step is subsequently decreased, iterating this converge-decrease cycle until some final criterion of convergence is fulfilled. There are therefore two “levels” of convergence involved: convergence with respect to the current value of ε, and final convergence.

A natural measure of final convergence for a state ψ is the standard deviation of energy

σ_H(ψ) = √(⟨ψ|H²|ψ⟩ − ⟨ψ|H|ψ⟩²),

where H is the system Hamiltonian. This quantity also gives an error estimate for the computed eigenenergies. Convergence with respect to the time step can be considered by looking at the change of σ_H between successive iterations – when the standard deviation no longer decreases by a significant amount between iterations, the state can be considered converged with the current time step.

Simpler measures of convergence can be implemented by looking directly at the values of energy obtained at each iteration and considering the state converged when either the relative or absolute change in energy between successive iterations gets small enough. Another simple way of defining final convergence is the point when decreasing the time step seems to be of no use, i.e., the point when, after decreasing the time step, the state converges with respect to the decreased time step within only one iteration.

Since the convergence checks represent in any case an insignificant share of the total computational resources, it is usually best to use the standard deviation as a measure of convergence. The simpler methods come into play only when something prevents the use of the standard deviation.
This occurs, for example, when using an external magnetic field combined with the method of enforcing Dirichlet boundary conditions discussed in Sec. 2.5, since the ringing artifacts near the edges make the computation of ⟨ψ|H²|ψ⟩ inaccurate.
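The whole scheme of Sec. 2 – second-order split-step propagation (3), subspace orthonormalization (2), a converge-decrease cycle over ε, and σ_H as the convergence measure – fits in a short numpy sketch. The one-dimensional harmonic oscillator and all numerical parameters below are our own illustrative choices, not itp2d defaults:

```python
import numpy as np

rng = np.random.default_rng(0)
nstates, M, L = 4, 256, 16.0          # states, grid points, box length
h = L / M
x = (np.arange(M) - M / 2) * h
k = 2.0 * np.pi * np.fft.fftfreq(M, d=h)
V = 0.5 * x**2                        # harmonic oscillator, E_n = n + 1/2

def propagate(psis, eps):
    # second-order factorization exp(-eps V/2) exp(-eps T) exp(-eps V/2)
    half = np.exp(-0.5 * eps * V)
    out = np.fft.fft(psis * half, axis=1)
    out = np.fft.ifft(np.exp(-0.5 * eps * k**2) * out, axis=1)
    return out * half

def orthonormalize(psis):
    # subspace orthonormalization, Eq. (2)
    overlap = psis.conj() @ psis.T * h
    m, U = np.linalg.eigh(overlap)    # overlap = U diag(m) U^dagger
    return (U.T @ psis) / np.sqrt(m)[:, None]

def apply_H(psis):
    return np.fft.ifft(0.5 * k**2 * np.fft.fft(psis, axis=1), axis=1) + V * psis

psis = rng.standard_normal((nstates, M)) + 1j * rng.standard_normal((nstates, M))
eps = 1.0
for _ in range(3):                    # converge-decrease cycle over eps
    for _ in range(40):
        psis = orthonormalize(propagate(psis, eps))
    eps /= 4.0

Hpsi = apply_H(psis)
E = np.real(np.sum(psis.conj() * Hpsi, axis=1) * h)
sigma = np.sqrt(np.abs(np.sum(np.abs(Hpsi)**2, axis=1) * h - E**2))
print(np.sort(E))                     # close to [0.5, 1.5, 2.5, 3.5]
```

In a production code `sigma` (the discrete σ_H) would be monitored between iterations to drive the converge-decrease cycle; for the converged states above it is small compared to the level spacing.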
3. Implementation
Our implementation of ITP for two-dimensional systems, itp2d, is based on a high-level, object-oriented design, with calls to optimized external routines for time-consuming low-level operations. This makes the program easier to maintain and extend without compromising computational efficiency. All computations are done in SI-based Hartree atomic units, and for simplicity, Hartree atomic units are also used for all input and output (except timing data, which is given in seconds).

The complete ITP simulation implementation is encapsulated in a single high-level C++ class, making our program easy to include in separate programs needing a fast Schrödinger equation solver, e.g., for solving the Kohn-Sham equations in density-functional theory calculations. This high-level C++ interface is supplemented with a simple (but complete) command line interface, which provides an easy way to run simulations with different parameters without recompiling. The command line interface is implemented using the Templatized C++ Command Line Parser Library (TCLAP).

The program is distributed with a separate documentation file that covers the use of itp2d from a more practical point of view. The command line interface also includes inline documentation, accessible with the command line argument --help.

For compiling itp2d a simple GNU Makefile is provided. The Makefile is designed for the free and portable g++ compiler from the GNU Compiler Collection. The actual program code in itp2d should be standards compliant C++, so other standards compliant compilers can also be used, but this requires modifications to the Makefile. In a similar way, itp2d is only tested on computers running Linux, but the program should work on other systems with minimal effort, provided a C++ compiler and the required external routines are available.
The wave functions operated on by ITP are implemented as two-dimensional arrays of double precision complex numbers on a rectangular grid with uniform spacing. This low-level memory layout is supplemented with a high-level class interface providing the necessary arithmetic operations. Similar class interfaces are provided for arrays of wave functions, for easily handling several wave functions as a whole. Operators acting on wave functions are similarly defined in an object-oriented fashion, with support for defining sums and products of operators with simple arithmetic operations.

The potential part of the Hamiltonian operator is implemented with direct pointwise multiplication of the wave function with precomputed values. A few common potential types are provided, and implementing new ones is as easy as providing a C++ routine which gives the values of the potential as a function of position. There is also rudimentary support for adding arbitrary types of random noise to the potentials.

In the case of periodic boundary conditions with no magnetic field, the kinetic energy part of the Hamiltonian is implemented by simply expanding the wave function in plane waves via a discrete Fourier transform, multiplying with k²/2, where k is the wave vector, and returning to the position basis via an inverse discrete Fourier transform. An external magnetic field only shifts the eigenvalues of the momentum operator with the vector potential A, so in the case of nonzero magnetic fields, the states simply need to be multiplied with (k + A)²/
2. As discussed in Sec. 2.4, the magnetic field is assumed to be homogeneous and perpendicular to the calculation plane, and for numerical efficiency all wave functions and operators are expressed in the linear gauge A = (−By, 0, 0). With Dirichlet boundary conditions the kinetic energy operator is split as (−ı∇ + A)²/2 = (−∇² + A²)/2 − ıA·∇, so that the first part is implemented with simple multiplications, and the second one turns the sine functions into cosines multiplied by a suitable factor, i.e., it is a sine transform followed by a multiplication and an inverse cosine transform.

The exponentiated operators exp(−εV) and exp(−εT) required for imaginary time propagation are implemented using Fourier or sine transforms in a similar way as the original potential and kinetic energy operators. The exponentiated potential operator is still a pointwise multiplication in the position basis, and as discussed in Sec. 2.4, the exponentiated kinetic energy operator can be factorized into parts that can be implemented with discrete Fourier transforms and pointwise multiplications. In both cases the multiplication arrays are precomputed and only recalculated when the time step ε is changed. The full imaginary time propagation operator is then built from the two operators by operator arithmetic as specified by the Chin factorization [1] given in Eq. (4), up to the order specified by the user. The resulting chain of operator sums and products is simplified when possible by absorbing constant prefactors into the operators themselves and combining adjacent multiplications.

The orthonormalization of wave functions is implemented using the subspace orthonormalization method described in Sec. 2.2. The inner products and the diagonalization of the overlap matrix are simply delegated to external linear algebra routines.
For the linear combination two alternative algorithms are provided: the default one, in which the large matrix product (2) is split into matrix-vector products in order to use as little extra memory as possible, and one where the product is calculated directly, requiring an extra copy of the wave functions. The latter algorithm may be faster in some cases, since it makes maximal use of optimized external routines, but this is offset by the roughly doubled memory requirement, which makes a huge difference for large systems.

The ITP cycle is started with random noise as the initial wave functions. This helps to ensure that no eigenstate of the Hamiltonian is missing from the expansions of the initial states due to accidental orthogonality, as discussed in Sec. 2.1. The desired number of states to be converged is provided by the user, as is the total number of states to be included in the computation. If the latter is missing, the program adds 25% to the number required to converge. This default value was empirically determined to provide a good convergence speed. The initial imaginary time step ε is also provided by the user, and after all the required states have converged with respect to the time step size, it is decreased by dividing by a user-provided constant. It is also possible to fine-tune the convergence by explicitly listing all time step values that are to be used. Each time convergence with respect to the time step is reached, the states are also tested against the criteria of final convergence, and if these are fulfilled by all the required states the computation ends. The criteria for convergence are also provided by the user. By default the program uses the standard deviation with respect to the Hamiltonian, but the other criteria listed in Sec. 2.6 have also been implemented.

In addition to the textual output given by the command-line interface, itp2d saves its results and parameters as portable HDF5 data files. All data coming from a single simulation run are saved in a single data file.
The user can specify whether, in addition to the parameters given to the simulation, only the final energies (along with their error estimates) are saved, or also the final wave functions, or even intermediate wave functions after each iteration. Using a common (as opposed to application-specific) data file format has several advantages. First of all, the data can be imported easily into common data analysis software, and accessing the data is easy: the HDF5 format presents the data as a directory of data sets, with descriptive names for each set. With HDF5 even complicated, multidimensional data can be saved without trouble and without complicating later data access.
Many parts of the ITP computation are suitable for shared-memory parallelization. The most trivial case is the actual propagation step, where each state is operated on by the imaginary time propagation operator independently of the others. In a similar way, the task of calculating the energy and the standard deviation of energy for each state is trivially split into several independent processing threads. The orthonormalization step is not as easily parallelized, but most of the work can be distributed by calculating the inner products for the overlap matrix in parallel, and splitting the matrix product of Eq. (2) into matrix-vector products which are executed in parallel.

In all the above cases our implementation uses high-level OpenMP instructions for parallelization, making the parallelized code simple and readable. It should be noted that due to the large amount of data that needs to be passed to the execution threads, especially for parallelizing the orthonormalization step, ITP works best with shared-memory parallelization, i.e., several execution cores accessing the same physical memory. When using the program on large cluster computers, special care should be taken to ensure that there is no unnecessary memory access across slow network links.
To avoid needless reimplementation, most low-level numerical operations used in itp2d are delegated to external routines. This also allows the user to use routines heavily optimized for the current hardware. All linear algebra routines are accessed via standard LAPACK and CBLAS interfaces. Our program has been tested with the portable ATLAS implementation, the MKL library from Intel, and the ACML library from AMD. Discrete Fourier, sine and cosine transforms are computed using the heavily optimized library FFTW3 [11]. Intel's MKL library provides an FFTW3-compliant interface, but using MKL for the transforms has not been tested with our program.
Our program is distributed with a comprehensive unit test suite, implemented with the Google C++ Testing Framework. The unit tests cover several low-level details, such as the accuracy of the external Fourier transform routines and the internal logic of several arithmetic operations, as well as high-level features, such as running ITP simulations using potentials with known analytic eigenstates and comparing the results. It is advisable to always run this unit test suite before important calculations to protect against unforeseen errors.
We release itp2d under an open-source license with the intention that it will foster wide use and future development of the code. Users are encouraged to improve and extend the code and to share their changes with other users of itp2d. More information about getting involved can be found in the README file distributed with itp2d.
4. Numerical tests
The harmonic oscillator is an example of a system with a known analytic solution, with or without a magnetic field. The harmonic oscillator with potential V(r) = r^2/2 is also the default potential used in itp2d, so simply running the program with no additional command line parameters will produce the first few eigenstates of the harmonic oscillator. The energy levels of this potential follow the Fock-Darwin spectrum

E_nl(B) = (2n + |l| + 1)√(1 + B^2/4) − lB/2,  (7)

with n = 0, 1, 2, ... and l = 0, ±1, ±2, ....

[Table 1: Calculated energy levels of a harmonic oscillator potential V(r) = r^2/2 with magnetic field strength B = 1, together with the standard deviation of energy σ_H, the exact value of the energy, and the actual error of the result. All results are from a single simulation with 5000 states; the final convergence criterion was that the first 4000 states reach the prescribed σ_H/E tolerance. As is evident from the results, the lower-energy states converge to a much higher accuracy. Among the listed values, E = 4.5901699437497 for state 10, E = 14.152475842500 for state 100, and E = 28.311529493753 for state 400.]

Table 1 collects some eigenenergies computed for the harmonic oscillator by itp2d from a simulation with 4000 states required to converge (5000 states in total) and magnetic field strength B = 1. The final convergence criterion used was a relative standard deviation of energy σ_H/E below the chosen tolerance. As seen in the table, the accuracy of the computed energies is very good up to highly excited states, and in most cases σ_H gives a good upper-bound estimate of the actual error in the result. For state 1000 the standard deviation σ_H is less than the actual error; in general, very small values of σ_H are not reliable due to discretization errors. This simulation required 5 iterations of ITP, starting with time step ε = 0.1. The simulation used a 500 by 500 grid and 12th-order operator splitting.
The particle in a box, i.e., a potential that is zero inside a rectangular box and infinite elsewhere, is another example of a potential with a known energy spectrum, except for the case with a nonzero magnetic field. With a magnetic field the system is no longer trivial, and no analytic solution is known. Another interesting feature of this system is that the corresponding classical system shows chaotic behavior [12, 13], making the particle in a box with a magnetic field an interesting testbed for quantum chaos studies.
Since the system combines Dirichlet boundary conditions and a magnetic field, it is subject to the problems discussed in Sec. 2.5, i.e., slower convergence and inaccurate calculation of σ_H. To illustrate that these problems do not prevent the study of this system with itp2d, we have calculated a thousand eigenstates of it. However, since no analytic expression for the energy is known, and since we are not aware of any other program that could compute this many eigenstates of this particular system, assessing the accuracy of the calculation is difficult. The ITP calculation still converges, and the wave functions show no sign of numerical error. Due to the complicated interplay of the magnetic field and the "hard" potential walls, the eigenstate wave functions have a very intricate form, as shown in Figure 1. The wave functions of a square billiard in a magnetic field have previously been reported only for the first few eigenstates [14, 15].

[Figure 1: Density plots of selected eigenstates (the 365th, 546th, 622nd, 750th, 890th, and 975th) of the particle in a box with a magnetic field. The eigenstates were calculated by itp2d for a π by π box with magnetic field strength B = 1.]

In our future studies we will focus on the chaotic properties of the present and other billiard systems in magnetic fields by examining their spectral properties in detail. The itp2d code is a versatile tool for that purpose.
To assess the numerical efficiency of our program, we have benchmarked itp2d against publicly available general-purpose eigensolvers. The solvers used in our test were PRIMME [16] (version 1.1) and SLEPc [17] (version 3.3-p2), the latter also functioning as an interface for ARPACK (arpack-ng version 3.1.2). Our benchmark consisted of solving an increasing number of eigenstates of a quartic oscillator potential in zero magnetic field and measuring the elapsed wall time. The computations were run without parallelization. The results of this test are shown in Figure 2.
Figure 2: Elapsed wall time of an eigenstate computation as a function of the number of eigenstates. The results are shown for four different solvers: SLEPc using its default solver algorithm (Krylov-Schur), the SLEPc interface for ARPACK, PRIMME (which implements several solver algorithms and chooses the optimal one dynamically), and itp2d. All computations were repeated three times, and the average wall time was used for the figure. The error bars on the data points show the minimum and maximum wall times. The same Hamiltonian operator implementation, the same grid size, and the same convergence criterion were used for all computations. The computations were done on a single, dedicated workstation.
As with any benchmark, the results need to be interpreted with care. It is very difficult to compare the performance of different programs and different algorithms completely fairly. As does itp2d, the tested eigensolvers only use the Hamiltonian of the system to compute the eigenstates. In this test all eigensolvers use the Hamiltonian implemented in itp2d, which provides itp2d with an advantage. The other solvers compute matrix-vector products of the Hamiltonian and a state vector out-of-place (i.e., without overwriting the original state), whereas the Hamiltonian implementation of itp2d is in-place. This incurs an overhead, since each matrix-vector product requires the vector to be copied. This overhead was measured to be small (a few percent of the total runtime), and it was subsequently subtracted from the results shown in Figure 2. Some advantage still remains from the fact that in-place Fourier transforms computed with FFTW3 are somewhat slower than their out-of-place variants [11]. Another issue which complicates the interpretation of this simple benchmark is that all the tested programs and algorithms have several parameters, and truly optimal performance would require fine-tuning these parameters for each system and problem size. For example, different preconditioning strategies were not tested for any of the contestants.
In conclusion, even though our simple benchmark does not capture the whole truth regarding the performance of itp2d compared to other programs, we are confident that itp2d performs on a level comparable to other eigensolver implementations.
5. Summary
The program we have presented, itp2d, is a modern implementation of the imaginary time propagation algorithm for solving the single-particle, time-independent Schrödinger equation in two dimensions. Its strengths include a clear, object-oriented design and the ability to include a strong, homogeneous magnetic field. It is released with the aim of providing researchers with a flexible and extensible code package for solving the eigenstates and energy spectra of two-dimensional quantum systems.
As immediate applications, we find appealing possibilities in the field of quantum chaos in terms of spectral statistics. Furthermore, it is straightforward to combine itp2d with real-space electronic-structure methods based on density-functional theory, e.g., with the Octopus code package [18].
6. Acknowledgements
This work was supported by the Finnish Cultural Foundation, the University of Jyväskylä, and the Academy of Finland. We are grateful to CSC, the Finnish IT Center for Science, for providing computational resources. We wish to thank P. Heliste for assistance in testing itp2d, and Prof. A. Stathopoulos for useful discussions.