GPU-accelerated large-scale quantum molecular dynamics simulation of 3-dimensional C60 polymers
Toshiaki Iitaka
Computational Astrophysics Laboratory, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
E-mail: [email protected]
Abstract.
Polymerization of the C60 molecular crystal under high pressure and high temperature is simulated by using linear-scaling tight-binding molecular dynamics (TBMD) with a Graphics Processing Unit (GPU) as a computational accelerator for matrix-matrix multiplication. Two sets of tight-binding parameters were tested.
1. Introduction
The high-pressure transformation of the C60 molecular crystal has attracted a great deal of attention in the past years because of its potential superconductivity and super-hardness. At ambient conditions, C60 molecules crystallize into a face-centered-cubic (fcc) structure with weak van der Waals interactions. Under extreme compression, the fcc C60 crystal may transform into polycrystalline diamond, diamond-like materials, or graphitic forms. Yamanaka et al. [1, 2, 3] have succeeded in synthesizing two-dimensional and three-dimensional C60 polymers at moderate pressure and temperature by forming covalent bonds between C60 molecules via a hybridization change from sp2 bonds to sp3 bonds. Several polymorphs of 3-d C60 have been theoretically proposed [3, 4, 5, 6, 7] and studied by using electronic structure calculation, geometry optimization and phonon calculation. However, the experimentally observed structure is not fully consistent with any of the theoretically predicted models. Therefore molecular dynamics simulation of the polymerization of C60 is important for understanding the crystal structure. In this paper we introduce a large-scale tight-binding molecular dynamics simulation of 3-d C60 polymer formation with thousands of carbon atoms in the unit cell. Such a large-scale quantum simulation was achieved by the combination of linear-scaling tight-binding molecular dynamics (TBMD) [8, 9, 10, 11, 12] and computational acceleration with a graphics processing unit (GPU) [13, 14, 15, 16].
2. Linear scaling method
The Hamiltonian of the non-self-consistent tight-binding method is written as

H = \sum_{i,\alpha} \epsilon_{i,\alpha} c^{\dagger}_{i,\alpha} c_{i,\alpha} + \sum_{i,\alpha,j,\beta} t_{i,\alpha;j,\beta} c^{\dagger}_{i,\alpha} c_{j,\beta}    (1)

where i and j are the indices of atoms, α and β are the indices of orbitals of the atom, and c† and c are the creation and annihilation operators of the electron.

Figure 1. Krylov basis of order r with the seed vector |i = 0⟩ for the one-dimensional uniform chain.

The one-body density matrix of the Hamiltonian (1) is defined as

\rho = \sum_m |m\rangle f(\epsilon_m) \langle m|    (2)

where |m⟩ are the eigenstates of (1) and f(ε_m) is the occupation number of the eigenstate |m⟩. The density matrix contains all the information on the state of the system. For example, the force on the i-th atom can be calculated as ⟨F_i⟩ = Tr[ρ F_i] by using the density matrix and the force matrix. The computational cost is O(N^3) due to the diagonalization of the Hamiltonian, where N is the total number of atoms.

Linear scaling of the computational cost with respect to N is achieved by exploiting the locality of the density matrix, i.e., by solving the local electronic properties of the target atom taking into account only the atoms within a cutoff radius from the target atom [8, 9, 10, 11, 12]. Then the computational cost becomes O(nN), where n is the number of atoms within the cutoff radius from the target atom.
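The density-matrix relations above can be sketched numerically. The following NumPy snippet is an illustration only (a toy one-dimensional chain at half filling, not the C60 Hamiltonian or the ELSES code): it builds ρ from the eigenstates as in Eq. (2), evaluates a one-body observable as a trace over ρ, and shows the spatial decay of ρ that the linear-scaling truncation exploits.

```python
import numpy as np

# Toy 1-d tight-binding chain (illustrative assumption):
# on-site energies eps_i = 0, hopping t_{i,i+1} = 1.
n = 60
H = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

# Eigenstates |m> and zero-temperature occupations f(eps_m):
# fill the lowest n//2 levels with occupation 1 (spin omitted).
eps, U = np.linalg.eigh(H)
f = (np.arange(n) < n // 2).astype(float)

# One-body density matrix rho = sum_m |m> f(eps_m) <m|  -- Eq. (2).
rho = (U * f) @ U.conj().T

# Any one-body observable is Tr[rho A]; e.g. the band energy Tr[rho H].
E_band = np.trace(rho @ H)

# Locality: |rho_{0j}| decays with distance j, which is what the
# linear-scaling method exploits by truncating at a cutoff radius.
decay = np.abs(rho[0, :])
print(E_band, decay[1], decay[31])
```

For a force calculation the observable matrix would be the force matrix F_i instead of H, but the trace structure is the same.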
3. Krylov subspace
The Hilbert space of the Hamiltonian of the atoms within the cutoff radius from the target atom is expressed as

\mathrm{span}\{ |1,\alpha\rangle, |2,\alpha\rangle, |3,\alpha\rangle, \ldots, |i,\alpha\rangle, \ldots, |n,\alpha\rangle \}    (3)

where n is the number of atoms in the cutoff sphere and α is the index of the atomic orbital, which is hereafter omitted for simplicity. The Krylov subspace of order r with the seed vector |φ⟩, K_r(H, |φ⟩), is generated by multiplying H repeatedly on the seed vector |φ⟩:

K_r(H, |\phi\rangle) = \mathrm{span}\{ |\phi\rangle, H|\phi\rangle, H^2|\phi\rangle, \ldots, H^{r-1}|\phi\rangle \}.    (4)

Here we use the atomic orbital of the i-th atom as the seed vector, |φ⟩ = |i⟩, to calculate the force on the i-th atom. This is quite different from the usual Lanczos diagonalization methods, where a random vector is used as the seed vector.

Figure 2. Equations of state of cubic diamond: (1) calculated data with Xu's parameters [17], (2) calculated data with Omata's parameters [18], (3) experimental data by Dewaele et al. [19].

The accuracy of the Krylov subspace as an approximant of the original Hilbert space increases rapidly as r increases up to m, where m < n is the degree of the minimal polynomial of H with respect to |i⟩. It was shown [8] that a Krylov subspace of order r ≪ n can describe the system with sufficient accuracy. The efficiency of the Krylov subspace can be understood from a physical point of view. Since the tight-binding Hamiltonian (1) contains the hopping terms t_{i,α;j,β}, multiplication of H on |i⟩ makes the electron hop from the i-th atom to the neighboring atoms linked by the hopping terms. Multiplying H twice on |i⟩ makes the electron spread to the next-nearest neighbors. Therefore the Krylov basis generated from the seed vector |i⟩ is localized at the target atom and then gradually spreads out to nearby atoms. Let us examine this with the simplest example, the one-dimensional uniform chain whose Hamiltonian matrix elements (1) are all zero except t_{i,i±1} = 1 for all i. Then the Krylov basis of order r can be expressed in terms of binomial coefficients,

H^r |i=0\rangle = \sum_{k=0}^{r} C_{r,k} |2k - r\rangle.    (5)

In Fig. 1 the Krylov basis in the real-space representation is shown. A Krylov subspace of small order (r ≈ 30) can represent quantum states localized around the target atom accurately because it samples the wave function around the atom with larger weight.

Since the Krylov basis (4) is not orthogonal, the Lanczos algorithm is often used to construct an orthogonal basis set from it. The Lanczos algorithm transforms a Hermitian matrix H into a tridiagonal real symmetric matrix T = V†HV, where the transform matrix V = (v_1, v_2, ..., v_r) consists of the orthogonal basis vectors v_k called Lanczos vectors. The Lanczos vectors v_k and the diagonal and off-diagonal matrix elements of T (α_k and β_k) are calculated by the three-term recursion formula:

|l_k\rangle = H|v_k\rangle - \alpha_k |v_k\rangle - \beta_{k-1} |v_{k-1}\rangle    (6)

|v_{k+1}\rangle = \frac{1}{\beta_k} |l_k\rangle    (7)

where v_0 = 0, v_1 = |i⟩, α_k = ⟨v_k|H|v_k⟩, β_k = √⟨l_k|l_k⟩, and β_0 = 0. Once T is obtained, the eigenvalues and eigenvectors of this small (r × r) symmetric tridiagonal matrix are easily calculated by using common linear algebra libraries such as LAPACK.

In order to calculate all the forces acting on the N atoms we have to repeat the above procedure for each atom. By rearranging the atoms into small groups of m atoms (m ≪ n ≪ N) and calculating the forces acting on the m atoms simultaneously, we can rewrite the three-term recursion formula (6) in matrix-matrix multiplication form (Level 3 BLAS form) [20], which is efficiently calculated with a GPU:

L_k = H V_k - V_k A_k - V_{k-1} B_{k-1}    (8)

V_{k+1} = L_k B_k^{-1}    (9)

where L_k = (|l_k^{(1)}⟩, |l_k^{(2)}⟩, ..., |l_k^{(m)}⟩), V_k = (v_k^{(1)}, v_k^{(2)}, ..., v_k^{(m)}), A_k is the diagonal matrix diag[α_k^{(1)}, α_k^{(2)}, ..., α_k^{(m)}], and B_k is the diagonal matrix diag[β_k^{(1)}, β_k^{(2)}, ..., β_k^{(m)}].
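The blocked recursion (8)-(9) can be sketched as follows. This is an illustrative NumPy transcription, not the ELSES or GPU implementation: it applies the three-term recursion (without reorthogonalization, as in the plain algorithm) to m seed vectors at once on the one-dimensional uniform chain of Eq. (5), so that each step costs one dense matrix-matrix product, and then checks that the r × r tridiagonal matrix T for one seed already captures the band edges of H.

```python
import numpy as np

def blocked_lanczos(H, seeds, r):
    """Blocked three-term Lanczos recursion, Eqs. (6)-(9).

    seeds: (N, m) matrix whose columns are the seed vectors |i>.
    Returns alpha, beta of shape (r, m): the diagonal and
    off-diagonal elements of the m tridiagonal matrices T.
    """
    N, m = seeds.shape
    V_prev = np.zeros((N, m))
    V = seeds / np.linalg.norm(seeds, axis=0)
    beta_prev = np.zeros(m)
    alphas, betas = [], []
    for k in range(r):
        L = H @ V                               # one Level 3 BLAS product
        alpha = np.einsum('ij,ij->j', V, L)     # alpha_k = <v_k|H|v_k>
        L = L - V * alpha - V_prev * beta_prev  # Eq. (8); A_k, B_{k-1} diagonal
        beta = np.linalg.norm(L, axis=0)        # beta_k = sqrt(<l_k|l_k>)
        alphas.append(alpha)
        betas.append(beta)
        V_prev, V, beta_prev = V, L / beta, beta  # Eq. (9): V_{k+1} = L_k B_k^{-1}
    return np.array(alphas), np.array(betas)

# One-dimensional uniform chain, t_{i,i+-1} = 1 (the example of Eq. (5)).
N, m, r = 200, 4, 30
H = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
seeds = np.zeros((N, m))
seeds[[90, 95, 100, 105], range(m)] = 1.0   # atomic-orbital seeds |i>

alpha, beta = blocked_lanczos(H, seeds, r)

# Tridiagonal T for the first seed; its extreme eigenvalues approximate
# the band edges of H (here -2 and +2) even though r << N.
T = np.diag(alpha[:, 0]) + np.diag(beta[:-1, 0], 1) + np.diag(beta[:-1, 0], -1)
ev = np.linalg.eigvalsh(T)
print(ev.min(), ev.max())
```

In the actual linear-scaling scheme, H would be the Hamiltonian block of the atoms inside the cutoff sphere rather than the full matrix, and T would be diagonalized with LAPACK to obtain the local density matrix and forces.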
4. Molecular dynamics of C60

The tight-binding method was applied to the equation of state (EOS) of diamond and to molecular dynamics of C60 polymerization. For this purpose, two types of tight-binding model parameters, Xu's parameters [17] and Omata's parameters [18], were used. Both parameter sets have the same functional forms, but Omata's parameters were constructed to reproduce the LDA energetics not only of the sp2 graphene and sp3 diamond covalent bonds but also of the sp2 interlayer interaction. The cutoff radius of Xu's parameters is 2.6 Å, while that of Omata's parameters is 7.0 Å, so that the interaction between sp2 layers, which may be important for the polymerization of C60, can be included.

The EOS of a 4x4x4 supercell of cubic diamond was calculated with Xu's and Omata's TB parameters (Fig. 2). Convergence was tested with respect to the number of atoms within the cutoff radius n, the order of the Krylov subspace r, and the size of the supercell (by using an 8x8x8 supercell). The EOS with Xu's TB parameters reproduces the experimental result well, while Omata's parameters produce a stiffer EOS.

Fig. 3 shows the 2x2x2 fcc structure of C60 at 0 K and 0 GPa used as the initial structure, Fig. 4 shows the snapshot after a 3 ps NPT simulation with Xu's parameters at 25 GPa and 500 K, and Fig. 5 shows the snapshot after a 3 ps NPT simulation with Omata's parameters at 25 GPa and 500 K. The simulation with Omata's parameters successfully reproduced the polymerization of C60 under high temperature and high pressure. However, the produced polymer was not a perfect crystal. Finer control of temperature and pressure would be required to obtain better results.
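The cost implication of the two cutoff radii can be illustrated by counting the neighbors n inside each cutoff for the diamond structure, since the linear-scaling cost grows with n and a large cutoff makes the local Hamiltonian blocks dense. The following Python sketch is an illustration only (the diamond lattice constant a = 3.567 Å is an assumed literature value, not a number from this paper):

```python
import itertools
import numpy as np

a = 3.567  # assumed diamond lattice constant in angstroms (conventional cell)

# Conventional-cell basis of the diamond structure (8 atoms, fractional coords).
basis = np.array([[0, 0, 0], [0, 2, 2], [2, 0, 2], [2, 2, 0],
                  [1, 1, 1], [1, 3, 3], [3, 1, 3], [3, 3, 1]]) / 4.0

def neighbors_within(rcut, ncells=3):
    """Count atoms within rcut (angstroms) of the atom at the origin."""
    count = 0
    for shift in itertools.product(range(-ncells, ncells + 1), repeat=3):
        for b in basis:
            pos = (np.array(shift) + b) * a
            d = np.linalg.norm(pos)
            if 0 < d <= rcut:
                count += 1
    return count

n_xu = neighbors_within(2.6)     # Xu's cutoff radius
n_omata = neighbors_within(7.0)  # Omata's cutoff radius
print(n_xu, n_omata)
```

With these assumed values the 2.6 Å cutoff encloses only the first and second neighbor shells (16 atoms), while the 7.0 Å cutoff encloses a few hundred atoms, which is why the short-ranged parameters favor sparse CPU arithmetic and the long-ranged ones favor dense GPU arithmetic, as discussed in the next section.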
5. Acceleration by GPU
A Graphics Processing Unit (GPU) is an electronic device for accelerating graphics processing on personal computers and game machines. GPUs can perform numerical calculations much faster than CPUs in some types of computation such as matrix-matrix multiplication [13], fast Fourier transformation (FFT) [13], and the N-body problem [15]. The application of GPU computation is rapidly expanding to many fields of science including brain science, quantum science, astronomy and fluid dynamics [16]. In this paper, we apply GPU computation to tight-binding molecular dynamics. According to Amdahl's law [21] the speedup of a program using a GPU is limited by the time needed for the CPU fraction of the program. The breakdown of the computational time of the original CPU version of ELSES [10] shows that the most time-consuming part is the matrix-vector multiplication in the Lanczos algorithm, which is 99.92% of the total computation time in our case. Therefore the calculation can be significantly accelerated by using fast matrix operations. For this purpose, we adopted CUBLAS, a linear algebra library on the GPU [13]. The elapsed time for calculating the pressure of a 4x4x4 supercell of diamond with Omata's parameters was 109.53 s for the GPU version with m = 4 and 504.3 s for the original CPU version of ELSES [10]; the GPU version is thus about five times faster than the CPU version. The elapsed time with Xu's parameters was 55.65 s for the GPU version and 29.19 s for the CPU version. The CPU calculation is efficient when the interaction is short ranged and H is a sparse matrix, while the GPU becomes very efficient when the interaction is long ranged and the Hamiltonian becomes a dense matrix. We used a Sun Fire X2250 eight-core workstation for the CPU computation and an NVIDIA GeForce 8800 GTX for the GPU computation.

Figure 3. The 2x2x2 fcc structure of C60 at 0 GPa and 0 K.

Figure 4. Snapshot after a 3 ps NPT simulation with Xu's parameters at 25 GPa and 500 K.

Figure 5. Snapshot after a 3 ps NPT simulation with Omata's parameters at 25 GPa and 500 K.

In summary, the formation process of C60 polymer under high pressure and high temperature was simulated by linear-scaling tight-binding molecular dynamics with hardware acceleration by GPU. Two sets of tight-binding parameters were tested. Xu's parameters have short-ranged interactions, resulting in fast simulation on the CPU, but do not faithfully reproduce the polymerization. Omata's parameters have long-ranged interactions and enjoy the acceleration by GPU; they reproduce the sp2-sp3 hybridization change successfully but do not reproduce the experimental EOS accurately. Further development of accurate tight-binding parameters for C60 polymers would be necessary.

This research was supported partially by a Grant-in-Aid for Scientific Research on Innovative Areas 'Earth science based on the high pressure and temperature neutron experiments' (No. 20103001-20103005) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan.

References
[1] Chen X and Yamanaka S 2002 Chem. Phys. Lett. 360 501
[2] Yamanaka S, Kubo A, Inumaru K, Komaguchi K, Kini N S, Inoue T and Irifune T 2006 Phys. Rev. Lett. 96 076602
[3] Yamanaka S, Kini N S, Kubo A, Jida S and Kuramoto H 2008 J. Am. Chem. Soc. 130 4303
[4] Yang J, Tse J S, Yao Y and Iitaka T 2007 Angew. Chem. Int. Ed. 46 6275
[5] Yang J, Tse J S and Iitaka T 2007 J. Chem. Phys. 127 134906
[6] Zipoli F and Bernasconi M 2008 Phys. Rev. B 77 115432
[7] Yamagami Y and Saito S 2009 Phys. Rev. B 79 045425
[8] Takayama R, Hoshi T and Fujiwara T 2004 J. Phys. Soc. Jpn. 73 1519
[9] Takayama R, Hoshi T, Sogabe T, Zhang S L and Fujiwara T 2006 Phys. Rev. B 73 165108
[10] Hoshi T and Fujiwara T 2009 J. Phys.: Condens. Matter 21 064233
[11] Ordejon P, Drabold D A, Grumbach M P and Martin R M 1993 Phys. Rev. B 48 14646
[12] Ozaki T 2006 Phys. Rev. B 74 245101
[13] NVIDIA, CUDA
[14] Iitaka T, "Introduction to Scientific Computation with GPU", Journal of the Japan Society for Computational Engineering and Science (2007-2008)
[15] Hamada T and Iitaka T, "The Chamomile Scheme: An Optimized Algorithm for N-body Simulations on Programmable Graphics Processing Units", http://uk.arxiv.org/abs/astro-ph/0703100
[16] Harvard-RIKEN Joint Symposium: Application of GPU Computation to Brain Science, Quantum Science, Astronomy, Fluid Dynamics and other sciences
[17] Xu C H, Wang C Z, Chan C T and Ho K M 1992 J. Phys.: Condens. Matter 4 6047
[18] Omata Y, Yamagami Y, Tadano K, Miyake T and Saito S 2005 Physica E 29 454
[19] Dewaele A, Datchi F, Loubeyre P and Mezouar M 2008 Phys. Rev. B 77 094106
[20]
[21]
360 501[2] Yamanaka S, Kubo A, Inumaru K, Komaguchi K, Kini N S, Inoue T and Irifune T 2006 Phys. Rev. Lett. 96076602[3] Yamanaka S, Kini N S, Kubo A, Jida S, and Kuramoto H 2008 J. Am. Chem. Soc. 130 4303[4] Yang J, Tse J S, Yao Y, and Iitaka T 2007, Angewandte Chemie International Edition 46 6275[5] Yang J, Tse J S, and Iitaka T 2007 J. Chem. Phys. 127 134906[6] Zipoli F and Bernasconi M 2008 Phys. Rev. B 77 115432[7] Yamagami Y and Saito S 2009 Phys. Rev. B 79 045425[8] Takayama R, Hoshi T, Fujiwara T 2004 J. Phys. Soc. Jpn 73 1519[9] Takayama R, Hoshi T, Sogabe T, Zhang S L and Fujiwara T 2006 Phys. Rev. B 73 165108[10] Hoshi T and Fujiwara T 2009 J. Phys.: Condens. Matter 21 064233; [11] Ordejon P, Drabold D A, Grumbach M P and Martin R M 1993 Phys. Rev. B 48 14646; [12] Ozaki T 2006 Phys. Rev. B 74 245101; [13] NVIDIA, CUDA. [14] Iitaka T, ”Introduction to Scientific Compution with GPU”, Journal of the Japan Society for ComputationalEngineering and Science, (2007-2008).[15] Hamada T and Iitaka T, ”The Chamomile Scheme: An Optimized Algorithm for N-body simulations onProgrammable Graphics Processing Units”. http://uk.arxiv.org/abs/astro-ph/0703100 [16] Harvard-Riken Joint Symposium : Application of GPU Computation to Brain Science, Quantum Science,Astronomy, Fluid Dynamics and other sciences [17] Xu C H, Wang C Z, Chan C T and Ho K M 1992 J. Phys.: Condens. Matter 4 6047[18] Omata Y, Yamagami Y, Tadano K, Miyake T and Saito S 2005 Physica E 29 454[19] Dewaele A, Datchi F, Loubeyre P and Mezouar M 2008 Phys Rev B 77 094106[20] [21][21]