[PDF] Cation interstitial diffusion in lead telluride and cadmium telluride studied by means of neural network potential based molecular dynamics simulations

Abstract

Using a recently developed approach to represent ab initio based force fields by a neural network potential, we perform molecular dynamics simulations of lead telluride (PbTe) and cadmium telluride (CdTe) crystals. In particular, we study the diffusion of a single cation interstitial in these two systems. Our simulations indicate that the interstitials migrate via two distinct mechanisms: through hops between interstitial sites and through exchanges with lattice atoms. We extract activation energies for both of these mechanisms and show how the temperature dependence of the total diffusion coefficient deviates from Arrhenius behaviour. The accuracy of the neural network approach is estimated by comparing the results for three different independently trained potentials.

Full PDF

CCation interstitial diﬀusion in lead telluride and cadmium telluride studied by meansof neural network potential based molecular dynamics simulations

Marcin Mi´nkowski, Kerstin Hummer, and Christoph Dellago

1, 2 Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria Vienna Research Platform on Accelerating Photoreaction Discovery,University of Vienna, Whringer Str. 17, 1090 Vienna, Austria

Using a recently developed approach to represent ab initio based force ﬁelds by a neural networkpotential, we perform molecular dynamics simulations of lead telluride (PbTe) and cadmium tel-luride (CdTe) crystals. In particular, we study the diﬀusion of a single cation interstitial in thesetwo systems. Our simulations indicate that the interstitials migrate via two distinct mechanisms:through hops between interstitial sites and through exchanges with lattice atoms. We extract ac-tivation energies for both of these mechanisms and show how the temperature dependence of thetotal diﬀusion coeﬃcient deviates from Arrhenius behaviour. The accuracy of the neural networkapproach is estimated by comparing the results for three diﬀerent independently trained potentials.

PACS numbers:

I. INTRODUCTION

Lead telluride (PbTe) and cadmium telluride (CdTe)are semiconductors that combined constitute an im-miscible system, which is often employed in the man-ufacturing of various nanostructures such as two-dimensional nanolayers, one-dimensional nanowires andzero-dimensional quantum dots [1, 2]. Experiments showthat even at very high temperatures, but below the melt-ing point, these two compounds remain separated andmorphological transformations of the shape of the partic-ular phases occur, both, during annealing of an as-grownsample [3] and multilayer crystal growth [4].By the appropriate choice of the experimental con-ditions, such as the thickness of the deposited layers,growth rate and temperature, one can design nanoob-jects of desired shape and size [4–6]. From the applicationpoint of view, the most interesting structures are PbTequantum dots and quantum wells embedded in CdTe,which can emit and detect light in the mid-infrared range[1, 2, 7].Despite many experimental studies concerningPbTe/CdTe growth, the theoretical understanding ofthe dynamics of the underlying processes occurringduring the morphological transformations is scarce. Themultistage disintegration of a single PbTe layer insertedbetween two bulk CdTe materials into separate quantumdots during anneling at high temperatures was repro-duced by the Cahn-Hilliard model [3]. Furthermore, thegrowth of PbTe/CdTe multilayers was studied by meansof kinetic Monte Carlo [5, 6]. However, the modelsused in these studies have coarse-grained characterwithout explicitly taking into account the underlyingprocesses on the microscopic level. Hence, the atomisticmechanism of the morphological transformations acrossthe PbTe/CdTe interface is still unknown.On the atomic scale one can expect that the morpho-logical transformations described above are the result ofdiﬀusion of atoms across the PbTe/CdTe interface [8]. Point defects such as interstitials or vacancies could playa crucial role. For instance, interstitials from one com-pound could diﬀuse across the interface to the other oneand participate there in the transformation of the crystalstructure. Therefore, the ﬁrst step in studying the mech-anism of the morphological transformations is to focuson diﬀusion of native defects in PbTe and CdTe, withoutconsidering the interface.Molecular dynamics (MD) simulations are a commonmethod for studying the time evolution of condensedmatter systems. In order to perform such simulations oneneeds to determine the forces acting on the atoms, whichare often described by empirical potentials. There existempirical potentials for PbTe [9, 10], and CdTe [11, 12]crystals. However, the accuracy of empirical potentialsis in general not adequate due to their limited ﬂexibil-ity. Alternatively, ab initio MD simulations can be per-formed, which are, however, computationally expensiveand therefore not practical for the analysis of the dif-fusion of defects, where large systems and long simula-tion times are required. In order to overcome the diﬃ-culties related to both these approaches, we employ theneural network method of Behler and Parrinello [13–15],in which a high-dimensional artiﬁcial neural network istrained with energies and forces determined by means ofab initio calculations. It has been shown that such neuralnetworks are capable of accurately reproducing the ref-erences data and deliver the accuracy of ab initio meth-ods at a fraction of their cost. To date, this approachhas been applied to study properties of various systems[16, 19–21].In this work, neural networks for bulk PbTe and CdTeare trained and used to carry out extensive molecularsimulations for the diﬀusion of single cationic interstitialsin these materials over a range of temperatures. From theobtained MD trajectories we calculate the diﬀusion coef-ﬁcients for the interstitials as a function of temperatureand analyse the diﬀerent diﬀusion mechanisms occurringin the two systems. For both materials, the same cal-culations are performed for three independently trained a r X i v : . [ phy s i c s . c o m p - ph ] S e p neural network potentials in order to estimate the overallaccuracy of the method.The remainder of the article is organized as follows. InSec. II we review the main physical properties of PbTeand CdTe crystals. Then, in Sec. III, we introduce thecomputational techniques that we employed in our studyand in Sec. IV we present the results of our simulation.Finally, in Sec. V we conclude by discussing our ﬁndings. II. SYSTEMS

PbTe and CdTe are semiconductors belonging to thegroups IV-VI and II-VI, respectively. They possess verysimilar lattice constants, however, their lattice structuresare distinct. According to diﬀraction experiments the lat-tice constant a of PbTe at room temperature is 6.46 ˚A[22] and that of CdTe is 6.48 ˚A [23]. The lattice constantmismatch is therefore very small. However, there exists alattice-type mismatch. PbTe crystallizes in the rocksaltstructure and CdTe in the zinc-blende structure. Theirlattices can be in fact represented as two interpenetrat-ing face-centered cubic (fcc) cation and anion sublatticesshifted with respect to each other by a/ a/ T a and T c ,respectively. III. METHODS

In order to study the diﬀusion of cation interstitials inPbTe and CdTe, the following computational strategy isemployed. First, the Vienna Ab initio Simulation Pack-age (VASP) [17] is used to calculate the reference data,i.e. electronic ground state energies and forces for a givenset of conﬁgurations of PbTe and CdTe with density-functional theory (DFT). These energies and forces arethen used to train neural networks by means of the Neu-ral Network Potential Package (n2p2) [27]. Finally, thetrained neural networks are used for performing MD sim-ulations with LAMMPS [28]. In the post-processing ofthe trajectories obtained at diﬀerent temperatures theposition of the interstitial is identiﬁed at each time stepand from the interstitial’s trajectory its diﬀusion coeﬃ-cient is extracted.This approach is used iteratively, that is the trainingset is repeatedly expanded with new conﬁgurations, forwhich new energies and forces are calculated, and theneural network is retrained at each iteration with theexpanded set. These retrained neural networks are sub-sequently used to perform new MD simulations.Below we describe the methods used in this work inmore detail.

A. Density-functional theory

All our DFT calculations are carried out with theexchange-correlation functional PBEsol [18]. The spin-orbit coupling is not considered. As the starting pointfor generating the set of reference conﬁgurations, 4 × × k -point mesh revealed that the forces obtainedwith a denser 3 × × k -mesh did not signiﬁcantly diﬀerfrom the Γ-point calculations but lasted roughly 10 timeslonger. B. Neural network potential

The data obtained from VASP then can be used totrain a neural network potential, which represents theforce ﬁeld used in the MD simulations. The input of theneural network is an atomic conﬁguration and the outputis its electronic ground state energy. Since the neuralnetwork is an analytic function of the atomic coordinates,the forces acting on each of the atoms can be determinedby simple diﬀerentiation.For training the neural network potential and employ-ing it in MD simulations we use the approach introducedby Behler and Parrinello [13]. In this approach, the in-formation about the atomic conﬁgurations is encoded inso called symmetry functions [29]. They transform theCartesian coordinates into values that are invariant withrespect to the translation and rotation of the whole sys-tem, and to the exchange of two arbitrary atoms of thesame type. The values of these symmetry functions serveas input for the neural network.The ﬁrst thing to consider in constructing suitablesymmetry functions is the cutoﬀ radius R c , which de-ﬁnes the size of the local environment around a givenatom. Its value should be selected in such a way thatall the relevant interactions are taken into account. Here R c = 6 ˚A is chosen, which guarantees that in additionto all the nearest neighbours other atoms within a singlefcc elementary cell are included in the cutoﬀ sphere. Thecutoﬀ is implemented by cutoﬀ functions f ( R ), whosevalues go to zero beyond R c . Several forms of the cut-oﬀ function have been proposed [27]. In this work, wechoose the polynomial f poly2 ( x ) = (cid:40) x [ x (15 − x ) −

10] + 1 for x ≤

10 for x > x = R ij /R c and R ij is the distance between thepair of atoms i and j . The derivatives of this particularfunction are continuous at the cutoﬀ radius up to sec-ond order, such that forces change continuously as atomsmove in and out of the cutoﬀ sphere.Since the purpose of the symmetry functions is to pro-vide a structural ﬁngerprint of the environment of everyatom in the system within a certain cutoﬀ radius, theydepend on the relative position of the given atom (calledthe central atom) and each of its neighbours. Symmetryfunctions are classiﬁed either as radial or angular [29].Radial symmetry functions depend only on the distancesbetween the central atom and its neighbours and are ex-pressed as a sum of two-body functions. On the otherhand, angular symmetry functions depend additionallyon the angle spanned by the central atom with each pair of its neighbouring atoms. Therefore, they are expressedas a sum of three-body functions.It is important to note that since both our systems con-tain two diﬀerent elements, separate symmetry functionsmust be provided to describe the distribution of atomsof each element and with each of them as the centralatom. For radial functions there are four possibilities,whereas for angular functions there are six possibilities,i.e. two types of central atoms times two types of neigh-bour atoms, and two types of central atoms times threepossible types of neighbour pairs, respectively.We choose the following radial symmetry function G i = (cid:88) j e − ηR ij f c ( R ij ) , (2)which is a sum of Gaussian functions multiplied by thecutoﬀ function. The parameter η determines the widthof the Gaussian. Depending on the value of η , thefunction has a speciﬁc range of arguments in which itchanges most steeply and is therefore most sensitiveto changes of the interatomic distances. We choosethe values of η in such a way that the radial sym-metry functions are equally spaced and the whole rel-evant range of interatomic distances is covered, η =0 . . .

4; 0 .

07; 0 .

12; 0 .

2; 0 .

3; 0 .

5; 0 . − .In addition to the radial functions we also choose thefollowing angular symmetry functions G i =2 − ζ (cid:88) j,k (cid:54) = i (1 + λ cos θ ijk ) ζ e − η ( R ij + R ik + R jk ) × f c ( R ij ) f c ( R ik ) f c ( R jk ) , (3) G i =2 − ζ (cid:88) j,k (cid:54) = i (1 + λ cos θ ijk ) ζ e − η ( R ij + R ik ) × f c ( R ij ) f c ( R ik ) . (4)Here, θ ijk is the angle spanned by the atoms i , j and k with the atom i at the center. The functions G i and G i are called narrow and wide symmetry functions, respec-tively [27, 30]. In contrast to G i , in the case of G i nocutoﬀ function acts on the distance between the atoms j and k , such that the contributions of wider angles arenot suppressed. The radial parts of the angular symme-try functions are identical with those of the radial sym-metry functions. Their angular parts, on the other hand,depend on the angular distribution of the neighbouringatoms. Therefore, they are in fact sums of three-bodyfunctions. The parameter λ is either 1 or -1, which setsthe maximum of the angular part at 0 ◦ and 180 ◦ , re-spectively. The value of ζ , on the other hand, controlsthe width of the function. We choose angular symmetryfunctions with λ = ± ζ = 1, therefore withonly two diﬀerent angular parts. The values of η in theradial part are chosen the same as for the radial functions(2).The values of the symmetry functions for a given con-ﬁguration are passed as the input to the neural network,which gives the total energy of that conﬁguration as theoutput. Between the input and the output layer are hid-den layers, which consist of nodes. The value at eachnode depends on the values in the nodes in the precedinglayer. The particular way in which the values from onelayer are transformed into the values in the nodes in thenext layer is determined by the parameters of the neuralnetwork, called weights and biases and by the choice ofthe activation functions.A neural network architecture with two hidden layers,each of which has 25 nodes is chosen. The contributionof the atom i to the total energy of the system is givenby E i = f (cid:40) b + N (cid:88) l =1 a l · f l (cid:34) b l + N (cid:88) k =1 a kl · f k  b k + j max (cid:88) j =1 a jk · G ij  . (5)Here G ij is the j th symmetry function centered at theatom i , j max is the total number of symmetry functionsfor each atom, a kk +1 ij is the weight of the contribution ofnode i in layer k to node j in layer k + 1, b ji is the biasfor node i in layer j and f ji is the activation function fornode i in layer j . For the ﬁrst and the second layer wechoose the hyperbolic tangent as activation function andfor the output layer the linear function is used.The sum of energy contributions from each atom inthe system E = (cid:80) i E i is the total potential energy ofthe given conﬁguration. The force F i acting on atom i isobtained by calculating the gradient of the total potentialenergy with respect to the coordinates of atom i , F i = −∇ E . For a three-dimensional system with N atoms,there are in total 3 N force components (3 for each atom).During the training of the neural network, the weightsand biases are adjusted to minimize the diﬀerence be-tween the energies and forces predicted by the neuralnetwork and the reference data. This diﬀerence is quan-tiﬁed by the cost functionΓ = n (cid:88) i =1 ( E ref i − E NN i ) + β n (cid:88) i =1 N n (cid:88) j =1 ( F ref ij − F NN ij ) , (6)where indices i and j enumerate conﬁgurations andatoms, respectively. The total number of conﬁgurationsis n , each of which contains N n atoms. The parameter β tunes the importance of the forces with respect to the en-ergies. For the purpose of training the reference data aredivided into a training set and a test set. This is done toensure that no overﬁtting occurs, that is the error of thetest set should be comparable with that of the trainingset. The neural network training proceeds in 50 epochs,during each of which all the parameters of the neural net-work are updated. Since the number of forces exceeds byfar the number of energies, we use only a 0.0008 fractionof all the forces for the training so the number of energy and force updates is similar. That fraction is randomlyselected at the beginning of each epoch.For optimization of the cost function (6) we use theextended Kalman ﬁlter [31–33], which has been imple-mented in n2p2 [30]. The Kalman ﬁlter is an algorithmoriginally used for estimating a dynamical system’s un-known state based on a series of noisy measurements.Later it was shown that it can be also used for trainingneural networks [34]. C. Molecular dynamics

Once the neural network for a given system is trained,it can be used for performing MD simulations. The pro-gram n2p2 provides an interface to LAMMPS [27], whichis used for all MD simulations in the present work.Since our aim is to determine the diﬀusion coeﬃcientof a cation interstitial, we carry out MD simulations for4 × × n steps,the mean square displacement (MSD) [35, 36] is esti-mated using (cid:104) r ( n ) (cid:105) = (cid:80) N − ni =1 ( r i + n − r i ) N − n . (7)Here, r i is the position of the interstitial after i stepswith respect to its starting position and N is the totalnumber of collected conﬁgurations. Expression (7) takesinto account that for a time lag of n steps, there are N − n positions that can be used to determine the MDS. Theestimated MSD is more accurate for shorter times, sincethere are more short time lags than long ones.In order to extract the diﬀusion coeﬃcient from theMSD, expression (7) is plotted as a function of time t .By ﬁtting a linear function to these data, the diﬀusioncoeﬃcient D is obtained from the slope according to D = MSD slope2 d , (8)where d is the number of dimensions. In our case d = 3.With this procedure the diﬀusion coeﬃcient can bestudied in the whole range of temperatures. This fur-ther allows to calculate the activation energy by ﬁttingthe data to the Arrhenius law. Moreover, the number ofjumps in the trajectory is counted by analysing the dis-placements of the interstitial. In doing so, two diﬀerenttypes of diﬀusion jumps are distinguished, i.e., exchangeand direct jump. IV. RESULTSA. Neural network training

For each of the two investigated materials, PbTe andCdTe, three independent neural networks with diﬀerentrandom seeds chosen for the initialization of the valuesof the networks parameters were trained. Consequently,the ﬁnal values of these parameters were diﬀerent for eachnetwork also at the end of the training, even though allthree networks represented the same system.The conﬁgurations used for the training were createditeratively. First, the lattice constants of the PbTe rock-salt and CdTe zinc blende unit cells were optimized us-ing VASP. The initial set consisted of a few hundredsof 4x4x4 supercells created with these optimized latticeconstants. In addition to the perfect structures, conﬁg-urations with the atoms displaced randomly from theirequilibrium positions were also used. To the subsequentsets conﬁgurations with slightly modiﬁed lattice con-stants were added to account for the thermal expansionof crystals at ﬁnite temperatures. Moreover, new conﬁg-urations were also taken from MD trajectories. Duringthe MD simulations, an extrapolation warning was pro-duced each time the value of some symmetry function wasout of range for which the neural network was initiallytrained. The number of such warnings was monitoredduring the simulations and the conﬁgurations with thelargest number of extrapolation warnings were added tothe set. Additionally, predictions for the same trajec-tory by two independently trained neural networks werecompared and those conﬁgurations for which the predic-tions diﬀered considerably were also added. This pro-cedure was repeated until there were no extrapolationswarnings and no signiﬁcant diﬀerences between diﬀerentneural networks in the relevant range of temperatures.After training the neural networks for defect-free crys-tals, a single cation interstitial was introduced to the sys-tems, that is Pb in PbTe and Cd in CdTe. This requiredfurther ab initio calculations and further retraining ofthe neural network. For creating conﬁgurations with an interstitial the same procedure was used as for the defect-free crystal. Special attention was paid to conﬁgurationswith the interstitial around the transition states. Theywere identiﬁed visually and added to the training set.During the MD simulations there are fewer such conﬁgu-rations than those with the interstitial around one of theenergetic minima but the former are very important forreproducing the correct energy barrier for the diﬀusion.In total 4898 conﬁgurations of PbTe and 2866 of CdTewere generated, of which 90% were used as training setand 10% as test set. For the training all the energies fromthe test set were used and 0.08 % of the forces, which gavea comparable number of energy and force updates in eachepoch. The relative importance of the forces was set to β = 5. The training proceeded for 50 epochs and theweights and biases from the last epoch were used for theMD simulations. NNP energy RMSE force RMSE [meV/atom] [meV/˚A]PbTe nnp-1 0.493/0.569 72.7/68.9PbTe nnp-2 0.475/0.568 72.9/71.3PbTe nnp-3 0.498/0.569 70.5/70.3CdTe nnp-1 0.255/0.381 54.8/63.3CdTe nnp-2 0.254/0.365 59.7/61.6CdTe nnp-3 0.250/0.544 56.9/72.3TABLE I: Root mean square errors for energies and forces ofthe neural networks trained in this work. The ﬁrst value ineach of the pairs is the error of the training set and the secondone is that of the test set.

In Table I the root mean square errors (RMSE) forall the neural networks trained and used subsequentlyin this work are listed. They are comparable to RMSEobtained previously with neural network potentials forother materials [20, 30]. In all cases the RMSE is sim-ilar and the error for the training and the test set is ofthe same order. We notice that for CdTe the errors arealways smaller than for PbTe.

B. Diﬀusion of cation interstitials

During the MD simulations the interstitial atom dif-fuses in the supercell. No creation of other defects due tothermal ﬂuactions was observed, not even at the highesttemperature studied, 1200 K. Examples of trajectories ofthe defect obtained from the MD simulations are shownin Fig. 1. Along these trajectories, two distinct mecha-nisms of diﬀusion were observed. One of them involvedsimple hops of the diﬀusing interstitial atom betweenneighbouring interstitial sites, while the other mecha-nism was an exchange of the interstitial atom with oneof the lattice atoms. During the latter process the inter-stitial took over the position of the lattice atom, whichsubsequently became a new interstitial. In PbTe andCdTe, both, hops and exchanges were observed. How-ever, in PbTe exchanges represent the dominant mecha-nism, whereas in CdTe hops are the preferred one. Bothmechanisms will be discussed in more detail in the nextsubsection.In Fig. 2 the mean square displacement for a singlecation interstitial in PbTe and CdTe is shown as the func-tion of the time lag. The plots correspond to the trajec-tories generated at various temperatures for the potentialnnp-1. As expected, for short time lags the dependenceof MSD on time is linear. For longer times the averag-ing is done over fewer time lags with more correlationsleading to larger statistical errors. Therefore, in order toobtain the diﬀusion coeﬃcient we ﬁtted a linear function

FIG. 1: Trajectory of a Pb interstitial in PbTe (top) anda Cd interstitial in CdTe (bottom) at T = 700K for nnp-1projected on the xy -plane. The green and red circles denotethe beginning and the end of the trajectory, respectively. FIG. 2: Mean squared displacement measured for a single Pbinterstitial in PbTe (top) and a single Cd interstitial in CdTe(bottom) at temperatures in the range from 700 K to 1200 Kfor nnp-1. The dashed lines represent a linear ﬁt to the MSDcurve in the range 2-14ps, from which the diﬀusion coeﬃcienthas been extracted. . to the linear part of the MSD curve, that is for short timesbetween 2 and 14 ps. The same procedure was appliedto extract the diﬀusion coeﬃcient in all the trajectoriesstudied.Moreover, the convergence of the method applied istested by means of block averaging [37, 38]. For thispurpose, the series of instantaneous square displacementsat 10 ps, for which the MSD is still linear, is obtainedfor the whole trajectory. This series corresponds to thesubsequent terms in the sum in Eq. (7) and contains M = N − n data points, which is equal to the number ofthe time lags. Then the series is divided into n B equalblocks, each of which has M B = M/n B data points. Foreach of the blocks the MSD is extracted by taking theaverage over M B data points, yielding M SD ( i ) B . Then FIG. 3: Block averaging for the MSD at 10 ps of a singleinterstitial at 700 K in PbTe (red) and CdTe (blue) for nnp-1. The product of the block size and the variance of theaverage MSD reaches a plateau when plotted as a function ofthe block size. the average over all block averages is calculated as (cid:104)

M SD (cid:105) B = 1 n B n B (cid:88) i =1 M SD ( i ) B . (9)The variance of the block average is estimated by σ B ( M SD ) = 1 n B n B (cid:88) i =1 ( M SD ( i ) B − (cid:104) M SD (cid:105) B ) . (10)If the simulation is properly converged, the product M B σ B ( M SD ) should reach a plateau s in the limit M B → ∞ . The variance of the average MSD can bethen estimated as σ ( (cid:104) M SD (cid:105) ) = sM (11)and the error of the diﬀusion coeﬃcient is equal to∆ D = σ ( (cid:104) M SD (cid:105) ) / t, (12)where t = 10 ps.In Fig. 3 the method is illustrated for nnp-1 for theinstersitial trajectories in PbTe and CdTe at the temper-ature of 700 K. The product of the block size and thevariance of the average MSD is plotted as the functionof the block size. For both systems it reaches a plateauaround M B = 3500. It corresponds to the values of s around 8 . · ˚A in PbTe and 1 . · ˚A in CdTe, andthe errors of the diﬀusion coeﬃcient 7 . · − cm /sand 9 . · − cm /s, respectively. The same procedureis performed in the whole range of the temperatures stud-ied and for all three neural network potentials.The temperature dependence of the diﬀusion coeﬃ-cients is shown for all neural network potentials in the Arrhenius plots in Fig. 4. The values of the diﬀusioncoeﬃcient at particular temperatures are represented bythe dots with the corresponding error bars. Additionally,the data points are ﬁtted with the Arrhenius equation D ( T ) = D exp( − βE a ) , (13)where β = 1 / ( k B T ). Here, E a is the activation energyfor the diﬀusion and D is the diﬀusion prefactor. Bothparameters can be extracted from the ﬁt. The error ofthe activation energy is estimated by converting Eq. (13)into E a = − ln( D/D ) β (14)and performing error propagation with respect to the dif-fusion coeﬃcient ∆ E a = ∆ DβD . (15)

FIG. 4: Diﬀusion coeﬃcient for a Pb interstitial in PbTe (top)and a Cd interstitial in CdTe (bottom) for three diﬀerentneural network potentials ﬁtted to the Arrhenius law.

FIG. 5: Scheme of the hop and the exchange mechanism of diﬀusion of an interstitial Pb atom in PbTe. The arrows representthe jumps of the atoms involved in the respective mechanism. The ﬁnal position of the interstitial atom is shown with thetransparent colour. During a hop the interstitial Pb atom jumps along one of the [100] directions between energeticallyequivalent minima. On the other hand, during an exchange the interstitial Pb atom replaces one of the nearest lattice Pbatoms, which subsequently becomes the new interstitial. As seen in the scheme, one exchange can be equivalent to two or threehops.FIG. 6: Scheme of the hop and the exchange mechanism of diﬀusion of an interstitial Cd atom in CdTe. The arrows representthe jumps of the atoms involved in the respective mechanism. The ﬁnal position of the interstitial atom is shown with thetransparent colour. During a hop the interstitial Cd atom jumps between T c and T a positions. On the other hand, during anexchange the interstitial replaces one of the nearest lattice Cd atoms, which subsequently becomes the new interstitial. As seenin the scheme, one exchange can be equivalent to two hops. The above result depends on the temperature, here forthe estimation of the error T=1000 K is chosen, whichis the value in the middle of the range of the studiedtemperatures.

NNP E a [meV] D [cm /s]PbTe nnp-1 322 ± · − PbTe nnp-2 250 ± · − PbTe nnp-3 338 ± · − CdTe nnp-1 368 ± · − CdTe nnp-2 391 ± · − CdTe nnp-3 389 ± · − TABLE II: Activation energies E a and diﬀusion prefactors D for Pb and Cd interstitials in PbTe and CdTe, respectively,extracted from the Arrhenius plot for each of the neural net-work potentials. In Table II the activation energies and the diﬀusionprefactors for all the Arrhenius plots in Fig. 4 are col-lected. For PbTe the activation energies tend to be lowerthan for CdTe, which suggests that diﬀusion of intersti-tials in PbTe occurs more easily than in CdTe. On theother hand, the diﬀusion prefactors in CdTe are higherthan in PbTe. Due to competition between these twoeﬀects, the diﬀusion coeﬃcient is larger in PbTe at lowtemperatures, but at high temperatures it is higher inCdTe. For PbTe nnp-2 the diﬀusion prefactor is twice aslow as for the other two PbTe neural network potentials.However, it is compensated by the lower activation en-ergy for nnp-2. As can be seen in Fig. 4, the diﬀusioncoeﬃcient for PbTe nnp-2 at low temperature is slightlyhigher than for nnp-1 and nnp-3.

C. Diﬀusion mechanisms

As already mentioned, two diﬀerent mechanisms of in-terstitial diﬀusion were observed in the MD simulationsof PbTe and CdTe. In one of them the interstitial atomsimply jumps between two diﬀerent energetic minima, inthe other one it exchanges its position with one of thelattice atoms. We refer to these mechanisms as hops andexchanges, respectively.The mechanisms of diﬀusion in PbTe are illustratedin Fig. 5. Since there is only one equilibrium positionfor the interstitial atom in PbTe and two in CdTe, thediﬀusion in the former is simpler. During a hop, the inter-stitial Pb atom jumps between the nearest-neighbouringminima along the [100] direction. The length of the jumpis equal to a/

2. On the other hand, during the exchangeprocess the interstitial Pb atom kicks out one of its fourneighbouring Pb lattice atoms, which in turn assumes theposition in one of the neighbouring minima, either in the[110] or [111] direction relative to the original position ofthe ﬁrst interstitial. The eﬀective jump of the interstitialPb atom is therefore ( a/ , a/ ,

0) or ( a/ , a/ , a/

2) and analogously in all the equivalent crystallographic direc-tions. Hence, during an exchange the interstitial movesby a distance of a √ / a √ /

2, respectively.Analogously, in Fig. 6 the diﬀusion mechanisms inCdTe are shown. The diﬀusion of interstitial Cd atomsis more complex. As mentioned in Sec. II, for the intersti-tial Cd atom there are two diﬀerent equilibrium positionslabelled T a (tetrahedral anion site) and T c (tetrahedralcation site). Moreover, the Cd interstitial can exist inthree charge states [39, 40]: one neutral and two chargedones, Cd + and Cd . The height of the energy barrierbetween the interstitial sites depends on the particularcharge state. In the present case the interstitial atomoccupies most of the time the T a site and only for a veryshort time it is found in T c position, which correspondsto the state Cd discussed in Refs. 39 and 40. Thediﬀusion through hops proceeds between T a and T c sitesalong the [110] direction, as described in Ref. 39. It con-sists of jumps of the interstitial Cd atom between thesesites in the directions [111] and [11 − √ a . Exchanges, on the other hand, occurmostly between two T c sites, also in the [110] direction.One exchange is therefore eﬀectively equivalent to twosuccessive hops.As can be seen from the above analysis of the diﬀu-sion mechanisms, in both materials, PbTe and CdTe,the interstitial jump is longer for an exchange than fora hop. As a consequence, even though there are fewerexchanges than hops in CdTe, the contribution of bothmechanisms to the total diﬀusion coeﬃcient is compa-rable. From the MD trajectories the number of jumpevents for both hops and exchanges was extracted. Sincethe identity of the interstitial (speciﬁed in LAMMPS byits index) and its position are known at each step, a jumpis counted when the particle is displaced by the lengthcomparable with the distance between the neighbouringinterstitial sites. If the index of the particle changes afterthe jump, it is considered as an exchange, otherwise it isa hop. The corresponding jump rates k are shown as anArrhenius plot for all considered neural network poten-tials and compared for PbTe and CdTe in Fig. 7. Thisquantitative analysis conﬁrms the observation that ex-changes dominate in PbTe and hops in CdTe. However,for both materials their ratio becomes closer to 1 as thetemperature increases. At 700 K one of the jump ratesis always around one order of magnitude higher than theother one, while at 1200 K they are of the same order.As can be inferred from Fig. 7, for both mechanismsthe jump rates follow the Arrhenius law. Therefore, inanalogy to the total diﬀusion activation energy, the par-ticular activation energies for hops and exchanges can bedetermined by ﬁtting a line to the Arrhenius plots in Fig.7. In Table III, these activation energies, E hops for hopsand E ex for exchanges, are summarized for PbTe andCdTe. As expected, for PbTe the activation energy ishigher for hops than for exchanges and for CdTe it is theopposite. For each neural network potential the eﬀective0 FIG. 7: Arrhenius plots of hop and exchange rates in PbTe (left) and CdTe (right). The dots represent the values of theserates extracted from the MD trajectories at given temperatures. The lines are the corresponding Arrhenius ﬁts.

NNP E hops [meV] E ex [meV]PbTe nnp-1 515 298PbTe nnp-2 461 224PbTe nnp-3 564 309CdTe nnp-1 218 402CdTe nnp-2 256 399CdTe nnp-3 254 447TABLE III: Activation energies for the two diﬀerent diﬀusionmechanisms in PbTe and CdTe, hops ( E hops ) and exchanges( E ex ), extracted from the Arrhenius plots for the correspond-ing jump rates. activation energy given in Table II is located between thecorresponding energies listed in Table III. These separateactivation energies can be further used to ﬁt a more com-plex function to the data than the Arrhenius law. Thediﬀusion coeﬃcient for a particle which jumps by meansof two independent energetically activated mechanisms can be written as the sum D = D hops0 exp( − βE hops ) + D ex0 exp( − βE ex ) , (16)where D hops0 and D ex0 are the diﬀusion prefactors for hopsand exchanges, respectively. The results of this ﬁtting aresummarized in Table IV and the corresponding curvesare shown in Fig. 8. By comparing the curves with thosein Fig. 4, it can been seen that considering separatemechanisms of diﬀusion allows for a more accurate ﬁttingof the temperature dependence of the diﬀusion coeﬃcientto the simulation data.Diﬀusion of Pb in PbTe was studied experimentally inRef. 41. The activation energy was measured to be 249meV, which is close to the values reported in this work(322, 250, 338 meV). In contrast, the diﬀusion prefactorwas found to be 3 . × − cm /s, which is three orders ofmagnitude smaller than our result. However, the methodused in Ref. 41 is based on radioactive isotopes, whichdoes not take into account the exchange mechanism.The diﬀusion coeﬃcient for a Cd interstitial in CdTewas measured in Ref. 42. The value obtained there for1 FIG. 8: Diﬀusion coeﬃcient for a Pb interstitial in PbTe (top)and a Cd interstitial in CdTe (bottom) for three diﬀerentneural network potentials ﬁtted to the relation (16).

NNP D hops0 [cm /s] D ex0 [cm /s]PbTe nnp-1 3.106 · − · − PbTe nnp-2 2.074 · − · − PbTe nnp-3 4.731 · − · − CdTe nnp-1 1.146 · − · − CdTe nnp-2 6.373 · − · − CdTe nnp-3 2.542 · − · − TABLE IV: Diﬀusion prefactors for hops ( D hops0 ) and ex-changes ( D ex ) of interstitials in PbTe and CdTe for the dif-ferent neural network potentials. the temperature 800 K was 1.75 · − cm /s, which is oneorder of magnitude smaller than the value calculated inthis work (1.95 · − ; 1.80 · − ; 1.61 · − cm /s).Moreover, diﬀusion of a cation interstitial atom inCdTe was studied in Ref. 26 by means of nudged elas-tic band method (NEB) [43]. The corresponding energybarrier determined there is 330 meV, which is close to theactivation barriers reported in this work (368, 391, 389 meV). However, the diﬀerence between the NEB methodand the approach used here is that in the former oneﬁnds the minimum energy path between two ﬁxed end-points, which allows to get the energy barrier but doesnot take into account ﬁnite temperature eﬀects. In MDsimulations the system evolves according to the equa-tions of motion at speciﬁed external conditions (such astemperature). Moreover, because in NEB the initial andthe ﬁnal conﬁgurations are ﬁxied, it is not possible toﬁnd any new diﬀusion mechanisms, as it was done in thiswork. V. CONCLUSIONS

In this work, the diﬀusion processes of interstitial Pband Cd atoms have been studied in a supercell of PbTeand CdTe, respectively. Futhermore, a procedure of ex-tracting the value of the diﬀusion coeﬃcient from thetrajectories generated by neural network based MD sim-ulations has been demonstrated. For the training of theneural network potentials ab initio data calculated withthe PBEsol functional were used. For both systems theresults for the diﬀusion coeﬃcients for three diﬀerent in-dependently trained neural network potentials were com-pared. The corresponding activation energies were ex-tracted and it was found that they diﬀer from each otherby no more than 100 meV, which can be viewed as the ac-curacy of the neural network approach used in this workfor the calculation of activation energies.Both in PbTe and CdTe two diﬀerent mechanisms ofinterstitial diﬀusion have been observed, namely hopsand exchanges, for which separate activation energieshave been extracted. Hops were more frequent in CdTeand exchanges in PbTe. However, in both materials ex-changes had longer eﬀective jump lengths. Therefore, in-terstitial diﬀusion is dominantly controlled by exchangesin both materials. Taking into account two diﬀusionmechanisms with diﬀerent activation energies explainsthe deviations of the temperature dependence of the dif-fusion coeﬃcient from the Arrhenius law.The mechanism of atom exchange is particularly in-teresting in the context of the morphological transforma-tions observed experimentally in PbTe/CdTe systems. Inthis work, only self-diﬀusion of cations within either PbTeor CdTe has been considered. However, the frameworkpresented here can be also used to study the diﬀusion offoreign atoms in the crystal. One can expect that Pbcations introduced in CdTe exchange with the host Cdcations and Cd cations in PbTe exchange with the hostPb cations. This could lead to a subsequent rebuild of thelocal crystal structure, which in turn could be an under-lying mechanism for the morphological transformationsobserved in the PbTe/CdTe growth and annealing exper-iments. Further studies in this direction will be necessaryto clarify this question.2

VI. ACKNOWLEDGEMENTS

The research has been supported by the Austrian Sci-ence Fund (FWF) project M 2382-N28. Calculations have been performed on the Vienna Scientiﬁc Cluster(VSC). [1] W. Heiss, H. Groiss, E. Kaufmann, G. Hesser, M. B¨oberl,G. Springholz, F. Sch¨aﬄer, Appl. Phys. Lett. , 192109(2006)[2] K. Koike, H. Harada, T. Itakura, M. Yano, W. Heiss, H.Groiss, E. Kaufmann, G. Hesser, F. Sch¨aﬄer, J. Cryst.Growth , 722-725 (2007)[3] H. Groiss, I. Daruka, K. Koike, M. Yano, G. Hesser, G.Springholz, N. Zakharov, P. Werner, F. Sch¨aﬄer, APLMaterials , 012105 (2014)[4] G. Karczewski, M. Szot, L. Kowalczyk, S. Chusnutdinow,T. Wojtowicz, S. Schreyeck, K. Brunner, C. Schumacher,L. W. Molenkamp, Nanotechnology , 135601 (2015)[5] M. Mi´nkowski, M. A. Za(cid:32)luska-Kotur, (cid:32)L. A. Turski, G.Karczewski, J. Appl. Phys. , 124305 (2016)[6] M. Mi´nkowski, M. A. Za(cid:32)luska-Kotur, S. Kret, S. Chus-nutdinow, S. Schreyeck, K. Brunner, L. W. Molenkamp,G. Karczewski, J. Alloys and Compounds , 809-814(2018)[7] M. Buka(cid:32)la, P. Sankowski, R. Buczko, P. Kacman,Nanoscale Research Letters , 126 (2011)[8] R. Leitsmann, L. E. Ramos, F. Bechstedt, Phys. Rev. B , 085309 (2006)[9] T. Chonan, S. Katayama, J. Phys. Soc. Jpn. , 064601(2006)[10] B. Qiu, H. Bao, X. Ruan, ASME 2008 3rd Energy Nan-otechnology International Conference, 2008, p. 45[11] Z.Q. Wang, D. Stroud, A.J. Markworth, Phys. Rev. B , 3129 (1989)[12] J. Oh, C.H. Grein, J. Cryst. Growth , 241 (1998)[13] J. Behler, M. Parrinello, Phys. Rev. Lett. , 146401(2007)[14] J. Behler, J. Phys.: Condens. Matter , 183001 (2014)[15] J. Behler, J. Chem. Phys. , 170901 (2016)[16] J. Behler, R. Martoˇn´ak, D. Donadio, M. Parrinello, Phys.Rev. Lett. , 185501 (2008)[17] G. Kresse and J. Furthm¨uller, Phys. Rev. B , 11169(1996)[18] J. P. Perdew, A. Ruzsinszky, G. I. Csonka, O. A. Vydrov,G. E. Scuseria, L. A. Constantin, X. Zhou, and K. Burke,Phys. Rev. Lett. , 136406 (2008)[19] R. Z. Khaliullin, H. Eshet, T. D. K¨uhne, J. Behler, M.Parrinello, Phys. Rev. B , 100103(R) (2010)[20] T. Morawietz, A. Singraber, C. Dellago, J. Behler, Proc.Natl. Acad. Sci. U. S. A. , 8368-8373 (2016)[21] A. Singraber, T. Morawietz, J. Behler, C. Dellago, J.Phys.: Condens. Matter , 254005 (2018) [22] A. Bali, R. Chetty, A. Sharma, G. Rogl, P. Heinrich, S.Suwas, D. K. Misra, P. Rogl, E. Bauer, R. C. Mallik, J.Appl. Phys. , 175101 (2016)[23] A. J. Strauss, Rev. Phys. Appl. (Paris) , 167-184(1977)[24] R. Leitsmann, F. Bechstedt, Semicond. Sci. Technol. ,014005 (2011)[25] W.-F. Li, C.-M. Fang, M. Dijkstra, M. A. van Huis, J.Phys.: Condens. Matter , 355801 (2015)[26] J. L. Roehl, S. V. Khare, Solar Energy , 245-253(2014)[27] A. Singraber, J. Behler, C. Dellago, J. Chem. TheoryComput. , 1827-1840 (2019)[28] S. Plimpton, J. Comput. Phys. , 1-19 (1995)[29] J. Behler, J. Chem. Phys. , 074106 (2011)[30] A. Singraber, T. Morawietz, J. Behler, C. Dellago, J.Chem. Theory Comput. , 3075-3092 (2019)[31] R. E. Kalman, J. Basic. Eng. , 35-45 (1960)[32] R. E. Kalman, R. S. Bucy, J. Basic. Eng. , 95-108(1961)[33] G. L. Smith, S. F. Schmidt, L. A. McGee, Technical Re-ports R-135 (1962)[34] S. Singhal, L. Wu in ”Advances in Neural InformationProcessing Systems 1”; Ed. D. S. Touretzky; Morgan-Kaufmann; p. 133-140 (1989)[35] H. Qian, M. P. Sheetz, E. L. Elson, Biophys. J. , 910-921 (1991)[36] X. Michalet, Phys. Rev. E , 041914 (2010)[37] M. E. J. Newman, G. T. Barkema, ”Monte Carlo Meth-ods in Statistical Physics”, Oxford University Press(1999)[38] M. C. Rapaport, ”The Art of Molecular Dynamics Sim-ulation”, Cambridge University Press (1995)[39] J. Ma, J. Yang, S.-H. Wei, J. L. F. Da Silva, Phys. Rev.B , 155208 (2014)[40] J.-H. Yang, J.-S. Park, J. Kang, S.-H. Wei, Phys. Rev. B , 075202 (2015)[41] M. P. Gomez, D. A. Stevenson, R. A. Huggins, J. Phys.Chem. Solids , 335-344 (1971)[42] H. Wolf, F. Wagner, Th. Wichert, Phys. Rev. Lett.94