High-throughput discovery of novel cubic crystal materials using deep generative neural networks
Yong Zhao, Mohammed Al-Fahdi, Ming Hu, Edirisuriya MD Siriwardane, Yuqi Song, Alireza Nasiri, Jianjun Hu
A PREPRINT
Yong Zhao
Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201
Mohammed Al-Fahdi, Ming Hu *
Department of Mechanical Engineering, University of South Carolina, Columbia, SC 29201 [email protected]
Edirisuriya MD Siriwardane, Yuqi Song, Alireza Nasiri
Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201
Jianjun Hu *
Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201 [email protected]
February 9, 2021

Abstract

High-throughput screening has become one of the major strategies for the discovery of novel functional materials. However, its effectiveness is severely limited by the lack of quantity and diversity of known materials deposited in current materials repositories such as ICSD and OQMD. Recent progress in machine learning, and especially deep learning, has enabled a generative strategy that learns implicit chemical rules for creating chemically valid hypothetical materials with new compositions and structures. However, current materials generative models have difficulty generating structurally diverse, chemically valid, and stable materials. Here we propose CubicGAN, a generative adversarial network (GAN) based deep neural network model for large-scale generation of novel cubic crystal structures. When trained on 375,749 ternary crystal materials from the OQMD database, our model is able not only to rediscover most of the currently known cubic materials but also to generate hypothetical materials of new structure prototypes. A total of 506 such new materials (all of them either ternary or quaternary) have been verified by DFT-based phonon dispersion stability checks, several of which have been found to potentially have exceptional functional properties. Considering the importance of cubic materials in wide applications such as solar cells and lithium batteries, our GAN model provides a promising approach to significantly expand the current repository of materials, enabling the discovery of new functional materials via screening. The new crystal structures finally verified by DFT are freely accessible at the Carolina Materials Database.

Keywords: crystal structure generation · generative adversarial network · deep neural networks · cubic crystals

Data-driven accelerated design of new materials is emerging as one of the most promising approaches for addressing the challenges in finding next-generation materials.
Currently, one of the main strategies for materials discovery is screening existing materials databases [1, 2, 3, 4]. However, such approaches are severely limited by the scale and diversity of the existing structures in the repositories, such as ICSD and Materials Project (MP), which have about 165,000 and 125,000 materials, respectively, compared to the almost infinite chemical design space. For example, lithium compounds are widely used in electric vehicles and mobile phone batteries, but there are only 16,000 different lithium compounds in the MP database, which has been almost exhaustively screened for better lithium-ion batteries [5, 6]. Large-scale generation of stable hypothetical crystal structures is strongly needed to significantly expand the current materials repositories in quantity as well as in compositional and structural diversity, and thereby to increase the success rate of high-throughput screening of novel functional materials.

The properties of materials are closely linked to their crystal structures. Traditionally, materials scientists discover new materials by either trial-and-error or heuristic random-guess approaches, both of which are notoriously labor-intensive. One example database is the Inorganic Crystal Structure Database (ICSD) [7], which collects almost all discovered materials since 1913. To date, only around 165,000 experimental structures are reported in ICSD. Considering the number of elements in the periodic table and their possible combinations, the design space of materials is combinatorially vast. Hence, better approaches for new materials discovery are needed.

Several directions have been investigated for the generation of new materials [8, 9, 10, 11, 12, 13, 14, 15, 16]. There are mainly three ways to generate or discover new crystal structures: doping/element substitution [17, 5, 18, 19], composition generation plus crystal structure prediction [9], and generative machine learning models [12, 20, 14, 21, 15, 16]. The element substitution approach is the most widely used strategy, but it is constrained by the extremely limited number of known prototype structures in the databases compared to the vast chemical design space.
The second approach can exploit recently developed generative models [12] to generate a large number of hypothetical material compositions and then use crystal structure prediction codes to predict their structures. Many global optimization methods have been developed to search for appropriate compositions and structures, including simulated annealing [22], basin hopping [23], minima hopping [24], and genetic and evolutionary algorithms [10, 11]. These approaches generally guide the search towards local minima of the free energy to identify stable or meta-stable structures, either by initial configuration space or by chemical composition. However, these crystal structure prediction algorithms are usually too computationally expensive due to their reliance on DFT-based formation energy calculations and can thus only handle relatively simple structures. For complex structures, these methods most often fail to find the ground-truth structures corresponding to the global minimum formation energy.

One of the most promising approaches for creating new materials structures is deep generative machine learning models [12, 14, 15, 16, 25, 26, 13, 27]. Both variational autoencoders (VAE) [15, 14, 26, 27] and generative adversarial networks (GAN) [12, 13, 16, 25] have been adapted for inverse design of inorganic materials with different crystal structure representations. A VAE model contains two parts: an encoder and a decoder [28, 29, 30]. The encoder encodes the crystal structure distribution into a latent space, and the decoder reconstructs the material structures from the latent space. After training, new material structures can be generated by sampling in the latent space. In contrast, a GAN model consists of two neural networks, a discriminator (critic) and a generator, which are trained simultaneously.
The discriminator is trained to differentiate real materials from fake ones generated by the generator, while the generator tries to generate fake materials that are as realistic as possible to fool the discriminator. The Nash equilibrium reached by the discriminator and the generator helps a GAN learn the distribution of the materials implicitly.

In the past few years, several generative models for inorganic materials have been proposed. Those works are limited to a specific chemical family (e.g., special oxides [15, 16, 13] or hydrides [25]) or to formula generation [12]. Noh et al. [15] present a framework for learning a continuous vector representation for vanadium oxides using a VAE, which is trained on a 3D image-like representation to attain the continuous materials space. Two sampling strategies are used in the latent space to generate only VxOy materials. Training the VAE model using the 3D grid representation is computationally demanding and memory-hungry. In [16], Kim and Noh et al. trained a composition-family-specific GAN model on the Mg-Mn-O system using atom coordinates as the representation of materials. The crystal GAN model is composed of three modules: a generator, a critic (discriminator), and a classifier. The critic calculates the Wasserstein distance between real and fake materials [31]. The classifier module ensures that the generator generates the desired composition and atom numbers in the unit cell. However, this model can only be used to generate structures of the Mg-Mn-O system, and the model quality is limited by the small dataset, since only a limited number of compounds are known for this chemical system. CrystalGAN [25], proposed by Nouira et al., consists of a cross-domain GAN model, which maps one hydride system into another using the CycleGAN schema [32]. All these works focus on generating materials of a specific material system. In a more recent work, Ren et al. [14] proposed a new VAE model that directly uses the atom coordinates and unit cell lattice parameters to encode the structures.
To constrain the neural network model behavior, their invertible representation encodes the crystallographic information into descriptors of crystal properties in both real space and reciprocal (Fourier) space. Their model is trained with 24,785 unique ternary materials and can generate interesting new structures. However, most of their new structures are generated by perturbing the latent vectors of known materials. Large-scale generation of stable crystal structures remains a challenging problem. Other than generating materials structures, Dan et al. [12] proposed MatGAN to generate millions of novel materials formulas with chemical validity, which expands the pool of candidates for inverse design of new solid materials.

In this work, we propose a novel deep generative model called
CubicGAN to generate cubic materials structures on a large scale. Ternary materials selected from the OQMD [33] database are chosen as our training set because of its large number of materials and diverse compositions. In our model, material structures are represented by their lattice parameters, atom coordinates, element embedding, and space group. The conditions of a specific space group and three elements are fed to the generator to generate the desired crystal material structures. We trained a ternary and a quaternary GAN model to generate novel cubic (ternary and quaternary) crystal structures of the space groups 216, 225, and 221. Materials of these three space groups make up 78.5% of all ternary and quaternary cubic materials in OQMD, covering a majority of the known cubic materials space.

Our systematic experiments show that our CubicGAN model can not only recover many of the known cubic structures but also discover many new materials with new prototypes (different anonymous formulas). Additional large-scale DFT-based validation has led to the discovery of 506 new cubic crystal materials of new prototypes. The details of the CubicGAN model are explained in the following sections. Compared to [15, 16, 25], our framework can generate a large variety of materials of different chemical systems. The only work that is similar to ours in terms of variety of materials is [14], in which Ren et al. use a VAE rather than a GAN as the generative model, trained with ternary materials in the Materials Project [34] database. However, their model tends to generate new samples by interpolation. The second major difference is that a much simpler representation, without the momentum space representations, is used in our work.

Our contributions can be summarized as follows:
• We propose a novel GAN model to generate cubic materials on a large scale, conditioned on the elements and a specified space group.
In total, we generate 10 million hypothetical ternary and 10 million quaternary crystal structures for downstream analysis.
• We perform three-stage checks on generated materials and extensively match the generated materials against existing databases. The results show that our method can rediscover a majority of cubic materials in the existing databases. In addition, most of the rediscovered materials from MP are confirmed as stable or meta-stable materials in terms of energy-above-hull.
• We perform DFT simulations on 108,897 hypothetical materials, of which 33.8% are successfully relaxed. By further analysis, we demonstrate that new crystal structure prototypes (with different anonymized formula types) can be found, such as ABC -216, ABC D -216, and AB C -221.
• By further stability verification, 506 new-prototype materials have been generated and confirmed to be stable by phonon dispersion calculation.

In this work, we focus on training generators of ternary and quaternary cubic crystal structures of three space groups (216, 221, 225) to simplify our model design while ensuring coverage of a majority of the cubic design space. We find that in the OQMD dataset with 813,839 materials, 85.8% are ternary or quaternary materials. In addition, out of all the cubic crystals, 97.8% belong to these three space groups, again covering the majority of the known cubic materials space. These three space groups are selected because we find that, for materials of these three space groups, most of the nonequivalent fractional atom coordinates in the CIF files are multiples of 0.25, that is, they belong to the set {0, 0.25, 0.5, 0.75}. So, instead of generating cubic structures with arbitrary real-valued atom coordinates, we aim to train a cubic material generator that only generates structures whose atoms sit at positions with fractional coordinate values from the set {0, 0.25, 0.5, 0.75}.
In this case, the special discrete fractional coordinates are much easier for our deep neural networks to generate accurately. This decision has dramatically simplified our generation model, and thus we choose the training data with these two criteria: ternary and quaternary cubic crystal structures of three space groups (216, 221, 225).
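The coordinate restriction described above can be illustrated with a short sketch. The helper names `is_on_grid` and `snap_to_grid` and the tolerance value are assumptions made for illustration only; they are not taken from the CubicGAN code.

```python
# Minimal sketch (hypothetical helpers): test whether fractional coordinates
# lie on the 0.25 grid {0, 0.25, 0.5, 0.75}, and snap near-grid values onto it.

GRID = (0.0, 0.25, 0.5, 0.75)


def wrap(value):
    """Wrap a fractional coordinate into the unit cell, i.e. into [0, 1)."""
    return value % 1.0


def is_on_grid(coords, tol=1e-3):
    """Return True if every coordinate is a multiple of 0.25 (modulo 1)."""
    for c in coords:
        w = wrap(c)
        # 1.0 is included because it is equivalent to 0.0 under periodicity.
        distance = min(abs(w - g) for g in GRID + (1.0,))
        if distance > tol:
            return False
    return True


def snap_to_grid(coords):
    """Round each coordinate to the nearest multiple of 0.25, wrapped into [0, 1)."""
    return tuple((round(wrap(c) * 4) % 4) / 4 for c in coords)
```

A site such as (0, 0.25, 0.75) passes the check, while (0.1, 0.5, 0.5) does not; a slightly noisy site such as (0.26, 0.98, 0.49) snaps to (0.25, 0.0, 0.5).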
We collect the training data from OQMD [35, 33], which is an open-source database of experimental and DFT-calculated materials. In total, 813,839 entries are retrieved from version 1.3 of OQMD. Entries calculated with the local-density approximation (LDA) are also included. Among them, we successfully build 556,839 and 141,100 POSCAR files for ternary and quaternary materials in the OQMD, of which 505,456 and 127,659 structures belong to cubic crystal systems, respectively. After converting the POSCAR files to symmetrized CIF files, 411,646 ternary materials have three unique nonequivalent atom sites, of which 388,680 materials of cubic crystal systems are found; 129,514 quaternary materials have four unique nonequivalent atom sites, of which 127,523 materials belong to the cubic crystal systems.

Table 1 shows the statistics of OQMD materials distributions. Ternary materials of cubic crystal systems are the largest chunk (91%) of all ternary materials. Similarly, the ternary cubic structures with 3 nonequivalent sites make up 94% of all ternary materials with 3 nonequivalent sites. For quaternary materials, these two percentages are 90% and 98%, respectively. This means that our CubicGAN model can be used to generate hypothetical cubic materials, which are the majority category of known materials.

Table 1: Statistics of OQMD ternary and quaternary materials (Total 813,839)

| | Ternary | Ternary cubic | Ternary with 3 nonequivalent sites | Ternary cubic with 3 nonequivalent sites |
|---|---|---|---|---|
| Count | 556,839 | 505,456 | 411,646 | 388,680 |
| Cubic percentage | | 505,456/556,839 = 91% | | 388,680/411,646 = 94% |

| | Quaternary | Quaternary cubic | Quaternary with 4 nonequivalent sites | Quaternary cubic with 4 nonequivalent sites |
|---|---|---|---|---|
| Count | 141,100 | 127,659 | 129,514 | 127,523 |
| Cubic percentage | | 127,659/141,100 = 90% | | 127,523/129,514 = 98% |

Another key criterion for selecting our training samples is that we only pick cubic structures with three nonequivalent atom positions (in CIF files) for training the ternary GAN model (for the quaternary GAN, the number is 4). This choice allows us to use a unified matrix of dimension ( × ) to represent all ternary cubic materials (for quaternary materials, the dimension is × , where only one space group is used in this work). For a given material, once we have its nonequivalent positions and space group, the full set of atom positions within the conventional unit cell can be obtained by symmetry operations.
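As a rough illustration of how one nonequivalent position expands into several atom positions, the sketch below applies only the face-centering translations shared by the F-centered space groups 216 and 225. This is a simplification assumed here for brevity: a full expansion applies every symmetry operation of the space group (typically through a crystallography library), and `expand_with_centering` is a hypothetical helper, not a function from this work.

```python
# Minimal sketch (assumption: only the F-centering translations are applied,
# not the full set of space-group operations).

F_CENTERING = (
    (0.0, 0.0, 0.0),
    (0.0, 0.5, 0.5),
    (0.5, 0.0, 0.5),
    (0.5, 0.5, 0.0),
)


def expand_with_centering(site, translations=F_CENTERING):
    """Return the distinct images of one fractional site under the given
    lattice translations, each wrapped back into the unit cell."""
    images = set()
    for shift in translations:
        images.add(tuple((s + d) % 1.0 for s, d in zip(site, shift)))
    return sorted(images)
```

For a site at the origin this yields the four face-centered lattice points; sites whose orbit also involves rotations or inversion need the remaining symmetry operations to be complete.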
We have identified 411,646 ternary materials with only three nonequivalent positions, of which 388,680 (94%) belong to cubic crystal systems, as shown in Table 1. Among these 388,680 materials, 22 space groups are found, as shown in supplementary Figure 1. Among them, the space groups with the largest numbers of materials are Fm-3m and F-43m (together these two space groups account for 97.2%). Pm-3m is the third, with only 6,462 samples or 1.7%. After removing the duplicate cubic materials within MP and ICSD, almost all the materials (375,749 out of 384,215) with the space groups Fm-3m, F-43m, and Pm-3m follow this criterion.

Table 2 shows the overall statistics of our finalized training and validation datasets. In total, we have selected 375,749 ternary materials of three cubic space groups from OQMD to form the OQMD-TC3 (T: ternary, C: cubic, 3: three space groups) training dataset, with Fm-3m, F-43m, and Pm-3m having 186,344, 184,162, and 5,243 materials, respectively. These materials together correspond to 249,646 unique formulas. With this diversity of formulas, our CubicGAN model can efficiently learn valid combinations of ternary elements. The 84 unique elements in the datasets are used to generate random three-element combinations during GAN training. The same steps are applied to quaternary materials in OQMD. As shown in supplementary Figure 1, materials with space group F-43m make up 95% of the quaternary data. So for training the quaternary GAN model, we only choose materials of space group F-43m.

In the MP-TC3 validation set of 6,545 materials (Table 2), the numbers of materials for space groups Fm-3m, F-43m, and Pm-3m are 4,576, 520, and 1,449, respectively, and there are 6,431 unique formulas in the whole retrieved data. From the ICSD database, 1,875 cubic materials are found to satisfy our selection criteria, of which 804, 280, and 791 belong to space groups Fm-3m, F-43m, and Pm-3m, respectively. For quaternary materials, the OQMD-QC1 training dataset has 121,008 samples. However, only 39 and 8 quaternary materials are found in MP and ICSD that satisfy our two selection criteria (see Table 2). We have removed these samples from our training dataset selected from OQMD by removing the crystal structures whose cube lengths differ only slightly from those of the samples in the validation sets.
Table 2: Statistics of our training and validation datasets from OQMD, MP, and ICSD. Here the cubic materials (6,545 + 1,875) from MP/ICSD are used as the validation set and are excluded from the training set.

Ternary Materials

| Dataset | Total | Fm-3m | F-43m | Pm-3m | Unique formulas | Unique elements |
|---|---|---|---|---|---|---|
| Training: OQMD-TC3 | 375,749 | 186,344 | 184,162 | 5,243 | 249,646 | 84 |
| Validation: MP-TC3 | 6,545 | 4,576 | 520 | 1,449 | 6,431 | 84 |
| Validation: ICSD-TC3 | 1,875 | 804 | 280 | 791 | 1,034 | 84 |

Quaternary Materials

| Dataset | Total | Unique formulas | Unique elements |
|---|---|---|---|
| Training: OQMD-QC1 | 121,008 | 39,767 | 56 |
| Validation: MP-QC1 | 39 | 39 | 39 |
| Validation: ICSD-QC1 | 8 | 7 | 12 |

In terms of prototypes in the validation datasets MP-TC3 and ICSD-TC3, supplementary Table 3 shows details of the existing prototypes for materials that satisfy our selection criteria. We take the prototype "ABC2-225" as an example. Here ABC2 is the anonymous formula of the crystal prototype and 225 is the space group number; together they denote a prototype, and we use this format in the following content. Overall, the three databases have the same set of prototypes, except that MP has an extra one: AB6C6-225. However, only one material (mp-1147668) is found under AB6C6-225, and it is unstable. For quaternary materials in OQMD, there are only two prototypes: ABCD-216, with 121,006 materials, and ABCD -216, with two materials. Moreover, we find that the distribution of quaternary cubic materials is highly biased, with 121,018 belonging to space group 216, only 5,674 belonging to space group 225, and no samples found for space group 221. For simplicity, we train the quaternary CubicGAN using only the samples from space group 216, so it can only generate samples of this space group.

Figure 1: The workflow of our CubicGAN framework

Figure 1 illustrates the main framework of our method. The framework primarily contains two steps: GAN training and material generation. Our goal is to train a generator that learns the distribution of known materials data and then to sample from it. To achieve this, the generator is trained to create fake material structures, conditioned on a given space group and a specification of three elements. The three elements are randomly chosen from the 84 elements in the dataset. The 84 elements are one-hot encoded and are converted to a × element matrix by the embedding layer. The parameters of the embedding layer are initialized by 23 element properties as shown in supplementary Table 1. Taking a randomly selected space group (one-hot encoded), a 3-element combination (one-hot encoded), and random noise Z as inputs, the generator then generates material structures with the specified space group and element constituents. Space groups and elements are mapped into dense vectors by their corresponding embedding layers. The number of atoms for each element does not need to be specified, as it can be determined by the space group symmetry operations. The random selection of space groups follows the proportions of the three cubic space groups considered in our model: Fm-3m, F-43m, and Pm-3m. The detailed architecture is shown in Supplementary Figure 2.

An input to the discriminator has four parts: nonequivalent atomic coordinates, element properties, unit cell parameters, and space groups, as shown in Figure 1. The coordinates part includes the fractional coordinates of three nonequivalent atoms. For the three unique elements in each material, each element is represented by 23 properties as shown in supplementary Table 1. Since the lattice lengths a, b, c are the same in cubic crystals, we only need one value to represent them. The three cubic space groups are one-hot encoded. As shown in supplementary Figure 2, the four parts are concatenated to form a tensor with the dimension of × . The input is then forwarded to four 1D convolutional layers with a kernel size of × , which are used to capture the implicit relationships among the four parts. We use two CNN layers to reduce the dimension from three to one. Then, a few fully connected layers are used to map the result to the Wasserstein estimate [31]. The detailed network settings are shown in supplementary Figure 2. In a standard conditional GAN, the input of the generator includes the random noise and a condition vector [36]. Here, we add a space group embedding layer and an element embedding layer, as shown in Figure 1, to map the randomly selected one-hot encoded space group (chosen from 216/221/225) and three randomly selected elements (one-hot encoded) into latent vectors. The reasons for this design are as follows: 1) As only three dominant cubic space groups are used in this work, the combination of atom positions with corresponding elements, unit cell lengths, and one-hot encoded space group symmetry is sufficient to describe a material structure; 2) Using element properties as part of the representation helps the generator learn to generate chemically valid materials, e.g., structures that do not violate Pauling’s rules.
As our previous work [12] shows, the composition constraints can be learned from the compositions of existing materials. Here, our CubicGAN is also configured to learn both implicit compositional and structural constraints so that the generator generates valid ternary or quaternary formulas as often as possible; 3) Our 2D representation of the cubic structures also matches well with the convolutional layers used in the discriminator, in which the convolutional operations can extract implicit relationships among the four parts of information.

The generator and the discriminator of the CubicGAN model are trained with a Wasserstein distance loss [31], which measures the dissimilarity between the distributions of real and fake materials. Compared to the loss functions used in the traditional GAN [37], the Wasserstein distance improves model stability and helps prevent mode collapse. Instead of clipping weights, we use the gradient penalty to improve the stability of training, as done by Gulrajani et al. [38]. The penalty on the gradient norm with respect to the inputs works as a regularization term to stabilize the training process of the GAN. More formally, our cost function for GAN training is as follows:

$$L = \mathbb{E}_{\tilde{x} \sim P_g}\left[D(\tilde{x})\right] - \mathbb{E}_{x \sim P_r}\left[D(x)\right] + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] \tag{1}$$

where $D$ is the discriminator, $P_{\hat{x}}$ is the distribution of samples interpolated between the distribution of real materials $P_r$ and the distribution of generated materials $P_g$, and $\lambda$ is the balancing parameter, which is set to 10 in this work.

After inspecting the structures generated by the GAN, we find that the generated lattice parameter a is often not accurate enough, leading to overlapping atom clusters.
To address this issue, an additional post-processing step is introduced to predict the lattice length a using a composition-based machine learning model that we recently developed [39], which achieves an R² score of 0.979 for cubic lattice a prediction.

During GAN training, real materials are randomly picked in batches. Together with the fused matrices of generated materials, as shown in Figure 1, they are fed to the discriminator in a mixed manner. We set the number of discriminator iterations per generator iteration to 5. The GAN model is developed using the open-source libraries TensorFlow [40] and Keras [41]. More details regarding the model architecture and hyper-parameter settings can be found in Supplementary Table 2 of the supplementary materials.
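For readers who want to see the cost function in Eq. (1) term by term, here is a minimal sketch in plain Python. It assumes a toy linear critic D(x) = w · x, whose gradient with respect to the input is simply w, in place of the convolutional discriminator. All function names are hypothetical; the actual model is built with TensorFlow and Keras, where the gradient would be obtained by automatic differentiation.

```python
import math
import random


def critic(x, w):
    """Toy linear critic D(x) = w . x (stand-in for the real discriminator)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def critic_grad_norm(x_hat, w):
    """L2 norm of dD/dx at x_hat. For a linear critic the gradient is w
    everywhere, so x_hat is unused; a neural critic would need autodiff."""
    return math.sqrt(sum(wi * wi for wi in w))


def wgan_gp_loss(real, fake, w, lam=10.0, rng=random):
    """Critic loss: E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)|| - 1)^2].

    `real` and `fake` are equally sized lists of feature vectors.
    """
    n = len(real)
    d_fake = sum(critic(x, w) for x in fake) / n
    d_real = sum(critic(x, w) for x in real) / n
    penalty = 0.0
    for x_r, x_f in zip(real, fake):
        eps = rng.random()  # interpolation coefficient in [0, 1)
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x_r, x_f)]
        penalty += (critic_grad_norm(x_hat, w) - 1.0) ** 2
    return d_fake - d_real + lam * penalty / n
```

With the settings stated above, `lam` would be 10 and this critic loss would be minimized for five updates per generator update. A critic whose gradient norm equals 1 incurs no penalty, which is the behavior the regularization term encourages.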
(a) Validity, structural uniqueness, and compositional uniqueness of CubicGAN in terms of the number of samplings. Axes: percentage vs. number of sampled materials (10000). Legend: Pymatgen-readable, Validity, Unique Structure, Unique Formula.
(b) Rediscovery rates of CubicGAN in terms of the number of sampled materials. Axes: percentage of materials rediscovered by GAN vs. number of sampled materials (10000). Legend: OQMD (training), MP, ICSD.
Figure 2: Performance evaluation of CubicGAN

There are three major criteria for evaluating generative models, namely, validity, diversity, and uniqueness [42]. After training the ternary CubicGAN using the OQMD-TC3 dataset, we generate 10 million cubic structures of the three specified cubic space groups (225, 216, 221). The proportions of the samplings are set identical to those of the training set, which are 49.6%, 49.0%, and 1.4%, respectively. To evaluate the generation performance, we first check how the percentage of generated charge-neutral samples changes with respect to the total number of generated samples. The charge neutrality check is based on Pymatgen [43], using the common valence values of elements as defined in Pymatgen. As shown in Figure 2a, the percentage of charge-neutral samples stays around 41% over the whole process of generating 10 million samples, which means that when we generate 10 million samples, we can get approximately 4.1 million charge-neutral samples for downstream screening. We then check the percentages of generated samples that have pymatgen-readable CIFs (Crystallographic Information Files), unique CIFs, and unique formulas, which reflect the diversity and uniqueness of the generator. In Figure 2a, the blue line shows the percentage of CIFs readable by pymatgen as a function of sampling size, which ranges from ten thousand to ten million. In this work, pymatgen-readable means that the CIF is recognized as having the space group that was assigned to it. The percentage of readable CIF files is stable regardless of the sampling size. After removing the duplicate materials, we calculate the percentages of unique CIFs and unique formulas, denoted by the yellow and red lines in Figure 2a. Only materials that have the same formula and the same corresponding atom positions are considered duplicates here. The percentages of unique CIFs and unique formulas decrease and then flatten out.
From these observations, we believe that our GAN model might have explored the majority of the cubic crystal structure space but has not exhausted it yet.

Another effective way to evaluate CubicGAN's performance is to check how soon it can rediscover the known cubic crystals in left-out datasets from existing databases. To do this, we have removed from our training dataset all the materials of the three cubic space groups (216, 225, 221) existing in the MP and ICSD databases, which number 6,545 and 1,875, respectively. It is interesting to see how many of those left-out cubic materials can be rediscovered by our GAN model as the sampling size goes from ten thousand to ten million.

Figure 2b shows how the rediscovery percentages of the cubic crystals of the three space groups (216, 225, 221) change as the sampling size increases. First, we check how the percentage of rediscovered cubic samples out of all training samples (blue line) changes while generating more samples. This training set rediscovery rate increases consistently over the sampling process. It soars quickly to 88% by the time 5 million samplings are reached. At the end of 10 million samplings, the rediscovery rate reaches 95.5%. Similar patterns can be observed in the rediscovery rate curve for the MP-TC3 validation dataset, as shown by the green line. With the increasing number of samplings, the rediscovery rate reaches 72.0%. This saturated percentage is much lower than that of the training set because the MP-TC3 data has different proportions over the three space groups (225, 216, 221), namely 69.9%, 7.9%, and 22.1%, compared to 49.6%, 49.0%, and 1.4% for the training set. Since our generation process is based on the space group proportions of the training set (which focuses on generating candidates of space groups 225 and 216), the 72% rediscovery rate is close to the percentage of these two types of samples in MP-TC3 (69.9% + 7.9% = 77.8%). We also find that half of the rediscovered materials in MP-TC3 are stable based on the formation energy and e-above-hull criteria. Details of the stability analysis can be found in Supplementary Figure 3. The rediscovery rate pattern over ICSD-TC3 is similar to that of MP-TC3 except that the highest rediscovery rate is 50.7% at the sampling size of ten million, which is close to the percentage of total samples of space groups 225 and 216 (42.9% + 14.9% = 57.8%). These high rediscovery rates over the training set and the two validation sets demonstrate that our CubicGAN has learned the implicit chemical rules of cubic structures and generates them far more effectively than random sampling. After the sampling size reaches 7 million, the number of rediscovered materials converges, indicating that ten million samplings is a reasonable size to cover most of the cubic structures, since the samplings seem to have almost exhaustively explored the search space of materials that meet our criteria. Therefore, we use ten million samplings for further analysis.

To compare our CubicGAN with random sampling or exhaustive enumeration, we calculate the enrichment score for our ternary CubicGAN.
As we are searching for candidates of three cubic space groups with three unique sites of three distinct elements, and the only possible fractional coordinates are 0, 0.25, 0.5, and 0.75, the total number of possible configurations is (4 ) ∗ ∗ ∗ ∗ , , , , which is much larger than the corresponding number of combinations in the ternary composition space [12]. Considering that with 10 million samplings we have rediscovered 95.5% of the OQMD-TC3 dataset, the enrichment score is approximately 44,507, which is a significant boost for generating chemically valid crystal structures compared to exhaustive enumeration.

Table 3: Statistics of generated materials

| | Valid CIFs | Unique formulas | Crystal prototypes |
|---|---|---|---|
| Ternary | 2,558,678 | 990,319 | 31 (24) |
| Quaternary | 5,498,267 | 1,797,592 | 3 (1) |
| No Lanthanoid and Actinoid | | | |
| Ternary | 1,064,650 | 403,337 | 31 (24) |
| Quaternary | 4,382,130 | 1,431,500 | 3 (1) |

While the rediscovery rate analysis over the MP-TC3 and ICSD-TC3 validation sets has demonstrated the accelerated sampling in cubic structure space, there are only 6,545 + 1,875 = 8,420 validation samples plus the 358,840 rediscovered training samples. It is still desirable to check the chemical validity of the remaining 96.33% of generated samples and to identify promising new materials among them. With 10 million hypothetical cubic materials, it is impractical to perform DFT calculations for all of them to verify their chemical validity and stability. Here we adopt three stages of validation checks to reduce the pool of samples for DFT validation. We use the CGCNN-based graph neural network model for formation energy prediction, which was trained with samples from the Materials Project database [44]. We then scan the generated materials in the order of space group match, charge neutrality, and formation energy filtering. The nonequivalent coordinates are transformed by the symmetry operations of the relevant space group used when generating the samples.
With the full set of coordinates, the elements, and the unit cell parameters (the unit cell length a and the angles, which are always 90 degrees in cubic systems), we can write a Crystallographic Information File (CIF). The space group check is performed first, using Pymatgen [43] (we refer to this check as a Pymatgen-recognizable check). If a generated sample cannot be recognized by Pymatgen, or if the space group determined by Pymatgen is inconsistent with the space group assigned to the generated sample, the sample is considered a failed generation. As shown in Table 3, in total 2,558,678 valid ternary CIFs and 5,498,267 valid quaternary CIFs are obtained from 10 million generations. From them, candidate materials with charge neutrality and CGCNN-predicted negative formation energy are retained for further DFT-based verification.

A major evaluation of our CubicGAN model is to check whether it can generate new cubic materials with novel prototypes, which are represented by distinct anonymized formulas in Pymatgen. As shown in Table 3, 24 novel ternary prototypes and 1 novel quaternary prototype that do not exist in our training data are found in our generated samples. To relieve the burden of DFT calculations, we remove the samples that contain Lanthanoid and Actinoid elements. In total, 1,064,650 ternary materials are left, of which 209,744 are of new crystal prototypes. The distribution of prototypes for the 1,064,650 materials is shown in Supplementary Figure 4.
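To illustrate how a prototype label of the kind used in this section (an anonymized formula plus a space group number) can be derived, the sketch below builds the label from a composition and checks it against a set of known labels. It is a simplified pure-Python stand-in written for illustration, not Pymatgen's implementation; the function names, the label format, and the example composition are assumptions.

```python
from functools import reduce
from math import gcd
from string import ascii_uppercase


def anonymized_formula(composition: dict[str, int]) -> str:
    """Replace element symbols by A, B, C, ... in order of increasing amount."""
    divisor = reduce(gcd, composition.values())
    amounts = sorted(n // divisor for n in composition.values())
    return "".join(
        f"{letter}{amount if amount > 1 else ''}"
        for letter, amount in zip(ascii_uppercase, amounts)
    )


def prototype_label(composition: dict[str, int], space_group_number: int) -> str:
    """Combine the anonymized formula with the space group number."""
    return f"{anonymized_formula(composition)}-{space_group_number}"


def is_novel_prototype(
    composition: dict[str, int], space_group_number: int, known_labels: set[str]
) -> bool:
    """A sample has a novel prototype if its label is absent from the known set."""
    return prototype_label(composition, space_group_number) not in known_labels


if __name__ == "__main__":
    # Hypothetical example: a cubic perovskite-like composition in space group 221.
    known = {"ABC3-221"}
    print(prototype_label({"Sr": 1, "Ti": 1, "O": 3}, 221))
    print(is_novel_prototype({"Sr": 1, "Ti": 1, "O": 3}, 216, known))
```

Because the label ignores which elements occupy which letters, many distinct generated formulas collapse onto one prototype, which is why the number of prototypes in Table 3 is so much smaller than the number of unique formulas.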
Similarly, 4,382,130 quaternary materials are left after removing Lanthanoid and Actinoid elements, of which 260,891 are of the new crystal prototype (the prototype ABC D -216). Since only two ABCD -216 materials exist in the quaternary training dataset OQMD-QC1, we also include ABCD -216 materials in the downstream DFT analysis, considering the huge number of generated ABCD -216 samples (1,655,407). After searching thoroughly in the OQMD, MP, and ICSD databases, only a limited number of ABCD -216 materials are found, as shown in Supplementary Table 4. We then perform the charge neutrality check with Pymatgen and the CGCNN formation energy filtering on the 209,744 ternary materials and the 1,916,298 (260,891 + 1,655,407) quaternary materials. Although each material might have different atom arrangements in the unit cell that map to the same space group, in this work we choose only one of them for DFT calculations. Finally, 17,303 ternary materials and 91,594 quaternary materials are left for DFT optimization. In total, 36,847 candidate materials have been relaxed successfully, comprising 14,433 ternary and 22,414 quaternary samples.

After filtering down to materials with novel prototypes, we perform DFT optimization on the materials with CGCNN-predicted negative formation energy, and we use the Γ point frequencies and the elastic constants to further narrow down the successfully relaxed structures. Phonon dispersion is the final criterion for determining the stability of structures.

Γ point and elastic constants filtering
The vibrational frequencies at the Γ point, together with the elastic constants of the screened structures, were obtained by calculating the Hessian matrix (the matrix of second derivatives of the energy with respect to the atomic positions) [45], which can be done by setting IBRION=6 (NFREE=4) in a VASP run. For cubic structures, the mechanical stability of the lattice is verified by the criteria $C_{11} > 0$, $C_{44} > 0$, $C_{11} > |C_{12}|$, and $C_{11} + 2C_{12} > 0$, where $C_{ij}$ are the components of the elastic constant matrix [46]. After screening the materials with these mechanical criteria, we further narrow down the materials by checking the vibrational frequencies at the Γ point. All materials with negative Γ point frequencies were discarded.

Phonon dispersion calculation
After the structures pass the mechanical stability criteria and all Γ point frequencies are positive, we further calculate the full phonon dispersions in the first Brillouin zone (BZ). All second-order interatomic force constants (IFCs) of the cubic structures were computed in a 2x2x2 supercell based on the corresponding primitive cell. The phonon dispersions were then calculated using the PHONOPY package [47] along the high-symmetry path Γ → X → U → K → Γ → L → W → X [48].

In total, four prototypes with stable materials are discovered: ABC -216, AB C -225, ABCD -216, and ABC D -216. The number of materials for each prototype is shown in Supplementary Figure 5. To the best of our knowledge, ABC -216 and ABC D -216 are novel prototypes that appear neither in our training dataset nor in the validation sets MP-TC3 and ICSD-TC3. The AB C -225 prototype is also absent from the training dataset, and only one unstable material of this prototype can be found in MP, whereas our method finds 42 stable ones. Two materials of the ABCD -216 prototype are in the training dataset, and several others are in MP and ICSD. We expand the datasets by finding 62 stable materials of prototype ABCD -216. Overall, we find 183 stable ternary materials and 323 stable quaternary materials. Figure 3 shows four newly discovered stable cubic materials with their phonon dispersion curves. The CIF files of the 506 new-prototype materials can be found in the supplementary file.

Some interesting features have been observed in the phonon dispersions of the newly discovered materials. For instance, a couple of hundred of the screened cubic structures possess significant but tunable phonon bandgaps (e.g., CaCO as shown in Figure 3(a)). Such phonon bandgaps could lead to extraordinary hot carrier performance [49, 50, 51, 52], which is very promising for potential applications in photovoltaics, nonlinear optics (e.g., ultrashort pulsed lasers), multi-exciton generation devices, and even photocatalysis.
Large phonon bandgaps at extremely high frequencies (such as in H-containing materials, not shown herein) deserve further investigation for their electron-phonon coupling properties [53, 54, 55], which could be beneficial for designing novel superconductors. Also, many cubic materials possess very soft acoustic modes, e.g., the longitudinal acoustic (LA) phonon branch in KYNbSi (Figure 3(c)), which indicates strong phonon anharmonicity; such materials could be good candidates for waste-heat energy recovery (thermoelectrics). Last but not least, the phonon dispersion of the Y AlTe structure exhibits a very large gradient in the high-frequency optical phonon modes, so their phonon group velocities will be very high. This could lead to a significant contribution of these optical modes to the overall thermal transport and thus to an unusual temperature dependence of the lattice thermal conductivity [56].
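The mechanical and Γ point screening described earlier in this section can be sketched as a simple filter. The Python example below applies the cubic stability criteria to the three independent elastic constants and rejects structures with negative Γ point frequencies. The data layout, the function names, the numerical tolerance for the near-zero acoustic modes, and the example values are assumptions made for illustration and are not results from this work.

```python
from dataclasses import dataclass


@dataclass
class ScreenedStructure:
    formula: str
    c11: float  # elastic constants in GPa
    c12: float
    c44: float
    gamma_frequencies: list[float]  # THz; imaginary modes stored as negative values


def is_mechanically_stable(s: ScreenedStructure) -> bool:
    """Cubic criteria: C11 > 0, C44 > 0, C11 > |C12|, C11 + 2*C12 > 0."""
    return (
        s.c11 > 0
        and s.c44 > 0
        and s.c11 > abs(s.c12)
        and s.c11 + 2 * s.c12 > 0
    )


def has_stable_gamma_modes(s: ScreenedStructure, tol: float = 1e-2) -> bool:
    """Reject negative Gamma point frequencies.

    ASSUMPTION: acoustic modes at Gamma are numerically close to zero, so a
    small tolerance is allowed; the text does not specify one.
    """
    return all(f > -tol for f in s.gamma_frequencies)


def passes_screening(s: ScreenedStructure) -> bool:
    """Keep only structures that pass both checks before phonon dispersion."""
    return is_mechanically_stable(s) and has_stable_gamma_modes(s)


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    candidates = [
        ScreenedStructure("X1", 120.0, 40.0, 35.0, [0.0, 0.0, 0.0, 4.2, 4.2, 6.1]),
        ScreenedStructure("X2", 50.0, 70.0, 20.0, [0.0, 0.0, 0.0, 3.0, 3.0, 5.0]),
        ScreenedStructure("X3", 90.0, 30.0, 25.0, [0.0, 0.0, 0.0, -1.5, 2.0, 2.0]),
    ]
    print([c.formula for c in candidates if passes_screening(c)])
```

Running the cheap elastic and Γ point checks first means that the more expensive supercell phonon calculation is only spent on structures that have a chance of being dynamically stable.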
Figure 3: Examples of four new prototype materials: (a) CaCO, (b) Li N Cl, (c) KYNbSi, and (d) Y AlTe As. The top row shows the crystal structures and the second row shows the corresponding phonon dispersions.

Figure 4: Visualization of the structural distributions of the materials in the training set and the generated new-prototype ABC materials, both belonging to the space group F$\bar{4}$3m. (a) Distribution of ABC materials and the training samples of space group F$\bar{4}$3m. (b) Local zoomed region in (a).

From Figure 4(a), we find that the new-prototype materials (dark green dots) form distinct clusters, and there are apparent boundaries between known and unknown materials, which indicates that our model can generate materials beyond the scope of existing prototypes with significant structural deviations. Figure 4(b) shows a zoomed region of clusters of novel ABC -216 materials, which implies that even samples of the same prototype can form structurally different clusters. For the other three prototypes, the distributions of known structures and our new-prototype structures are shown in Supplementary Figures 6 to 9. In Supplementary Figure 6, we find that the new-prototype (AB C) materials are mostly located at the peripheral regions of known material clusters, indicating their structural closeness to known structures. In contrast, Supplementary Figure 7 shows that materials of two new prototypes (ABC and AB C) tend to form clusters distinct from the known cubic materials in the MP-TC3 and ICSD-TC3 validation sets, indicating their structural deviation from known materials. Additionally, for most of these new-prototype clusters, we have identified one or more DFT-verified stable materials. Supplementary Figure 8 shows that materials of the new prototype ABCD form multiple new clusters, each of which contains multiple DFT-verified stable materials. In contrast, materials of the new prototype ABC D form far fewer clusters compared to the training set OQMD-TC3.

Large-scale generation of new materials with distinct structures and functions is highly desirable for the widely used high-throughput screening-based approach to materials discovery. Faced with an astronomically large structural design space (compared to the space of chemical compositions), generator models have to exploit the implicit, sophisticated physicochemical and geometric rules and constraints embedded in existing crystal materials. Here we propose a novel GAN-based deep generative model for large-scale generation of three major types (space groups 216, 225, and 221) of cubic materials structures.
Trained with 375,749 ternary cubic crystal structures from OQMD, our CubicGAN model can rediscover, within 10 million samplings, most of the known cubic structures curated over more than 100 years of history. In particular, further analysis shows that our GAN model can generate not only new materials of existing prototypes but also new-prototype materials with distinct structural novelty. In total, we have identified 24 new prototypes of cubic materials. With rigorous DFT-based relaxation and phonon dispersion calculations, we have identified and verified 506 new-prototype cubic materials, which are shared via our public database. Among them, we have already identified several crystal structures with exceptional properties to be exploited in the future. Together, these results show that our CubicGAN offers a promising path to large-scale generation and discovery of new materials.
Conceptualization, J.H.; methodology, Y.Z., J.H., M.H.; software, Y.Z., J.H., Y.S.; validation, Y.Z., M.H., J.H., D.S., M.A.; investigation, J.H., Y.Z., M.H., D.S., Y.S.; resources, J.H.; data curation, Y.Z. and J.H.; writing–original draft preparation, Y.Z., J.H. and M.H.; writing–review and editing, J.H., M.H., Y.S., A.N.; visualization, Y.Z., Y.S.; supervision, J.H. and M.H.; funding acquisition, J.H. and M.H.
All training data were downloaded from the OQMD and Materials Project websites.
Research reported in this work was supported in part by NSF under grant 1940099. This material is based upon work supported by the National Science Foundation Harnessing the Data Revolution Big Idea under Grant No. 1905775. The views, perspectives, and content do not necessarily represent the official views of the NSF. We thank Yuxin Li for his help on lattice parameter prediction.
References

[1] Philipp Wollmann, Matthias Leistner, Ulrich Stoeck, Ronny Grünker, Kristina Gedrich, Nicole Klein, Oliver Throl, Wulf Grählert, Irena Senkovska, Frieder Dreisbach, et al. High-throughput screening: speeding up porous materials discovery. Chemical Communications, 47(18):5151–5153, 2011.
[2] Austin D Sendek, Qian Yang, Ekin D Cubuk, Karel-Alexander N Duerloo, Yi Cui, and Evan J Reed. Holistic computational structure screening of more than 12000 candidates for solid lithium-ion conductor materials. Energy & Environmental Science, 10(1):306–320, 2017.
[3] Kyoungdoc Kim, Logan Ward, Jiangang He, Amar Krishna, Ankit Agrawal, and C Wolverton. Machine-learning-accelerated high-throughput materials screening: Discovery of novel quaternary heusler compounds. Physical Review Materials, 2(12):123801, 2018.
[4] Murat Cihan Sorkun, Séverin Astruc, JM Vianney A Koelman, and Süleyman Er. An artificial intelligence-aided virtual screening recipe for two-dimensional materials discovery. npj Computational Materials, 6(1):1–10, 2020.
[5] Yunsheng Liu, Shuo Wang, Adelaide M Nolan, Chen Ling, and Yifei Mo. Tailoring the cation lattice for chloride lithium-ion conductors. Advanced Energy Materials, 10(40):2002356, 2020.
[6] Adelaide M Nolan, Yizhou Zhu, Xingfeng He, Qiang Bai, and Yifei Mo. Computation-accelerated design of materials and interfaces for all-solid-state lithium-ion batteries. Joule, 2(10):2016–2046, 2018.
[7] G Bergerhoff, ID Brown, F Allen, et al. Crystallographic databases. International Union of Crystallography, Chester, 360:77–95, 1987.
[8] AR Oganov, Andriy Lyakhov, Mario Valle, and Gilles Frapper. Crystal structure prediction using the uspex code. In CECAM-Workshop Lausanne, pages 22–26, 2012.
[9] Artem R Oganov, Chris J Pickard, Qiang Zhu, and Richard J Needs. Structure prediction drives materials discovery. Nature Reviews Materials, 4(5):331–348, 2019.
[10] Yanchao Wang, Jian Lv, Li Zhu, and Yanming Ma. Crystal structure prediction via particle-swarm optimization. Physical Review B, 82(9):094116, 2010.
[11] Colin W Glass, Artem R Oganov, and Nikolaus Hansen. Uspex—evolutionary crystal structure prediction. Computer physics communications, 175(11-12):713–720, 2006.
[12] Yabo Dan, Yong Zhao, Xiang Li, Shaobo Li, Ming Hu, and Jianjun Hu. Generative adversarial networks (gan) based efficient sampling of chemical composition space for inverse design of inorganic materials. npj Computational Materials, 6(1):1–7, 2020.
[13] Teng Long, Nuno M Fortunato, Ingo Opahle, Yixuan Zhang, Ilias Samathrakis, Chen Shen, Oliver Gutfleisch, and Hongbin Zhang. Ccdcgan: Inverse design of crystal structures. arXiv preprint arXiv:2007.11228, 2020.
[14] Zekun Ren, Juhwan Noh, Siyu Tian, Felipe Oviedo, Guangzong Xing, Qiaohao Liang, Armin Aberle, Yi Liu, Qianxiao Li, Senthilnath Jayavelu, et al. Inverse design of crystals using generalized invertible crystallographic representation. arXiv preprint arXiv:2005.07609, 2020.
[15] Juhwan Noh, Jaehoon Kim, Helge S Stein, Benjamin Sanchez-Lengeling, John M Gregoire, Alan Aspuru-Guzik, and Yousung Jung. Inverse design of solid-state materials via a continuous representation. Matter, 1(5):1370–1384, 2019.
[16] Sungwon Kim, Juhwan Noh, Geun Ho Gu, Alán Aspuru-Guzik, and Yousung Jung. Generative adversarial networks for crystal structure prediction. arXiv preprint arXiv:2004.01396, 2020.
[17] Geoffroy Hautier, Chris Fischer, Virginie Ehrlacher, Anubhav Jain, and Gerbrand Ceder. Data mined ionic substitutions for the discovery of new compounds. Inorganic chemistry, 50(2):656–663, 2011.
[18] Jimmy-Xuan Shen, Matthew Horton, and Kristin A Persson. A charge-density-based general cation insertion algorithm for generating new li-ion cathode materials. npj Computational Materials, 6(1):1–7, 2020.
[19] Yuqi Song, Edirisuriya M Dilanga Siriwardane, Yong Zhao, and Jianjun Hu. Computational discovery of new 2d materials using deep learning generative models. arXiv preprint arXiv:2012.09314, 2020.
[20] Juhwan Noh, Geun Ho Gu, Sungwon Kim, and Yousung Jung. Machine-enabled inverse design of inorganic solid materials: promises and challenges. Chemical Science, 11(19):4871–4881, 2020.
[21] Yousung Jung. Machine learning approaches for materials discovery: Predictive and generative models. In Telluride Workshop Machine Learning and Informatics for Chemistry and Materials. Telluride Science Research Center, 2018.
[22] LT Wille. Searching potential energy surfaces by simulated annealing. Nature, 324(6092):46–48, 1986.
[23] David J Wales and Jonathan PK Doye. Global optimization by basin-hopping and the lowest energy structures of lennard-jones clusters containing up to 110 atoms. The Journal of Physical Chemistry A, 101(28):5111–5116, 1997.
[24] Stefan Goedecker. Minima hopping: An efficient search method for the global minimum of the potential energy surface of complex molecular systems. The Journal of chemical physics, 120(21):9911–9917, 2004.
[25] Asma Nouira, Nataliya Sokolovska, and Jean-Claude Crivello. Crystalgan: learning to discover crystallographic structures with generative adversarial networks. arXiv preprint arXiv:1810.11203, 2018.
[26] Callum Court, Batuhan Yildirim, Apoorv Jain, and Jacqueline M Cole. 3-d inorganic crystal structure generation and property prediction via representation learning. Journal of Chemical Information and Modeling, 2020.
[27] Vadim Korolev, Artem Mitrofanov, Artem Eliseev, and Valery Tkachenko. Machine-learning-assisted search for functional materials over extended chemical space. Materials Horizons, 7(10):2710–2718, 2020.
[28] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006.
[29] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[30] Carl Doersch. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908, 2016.
[31] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein gan. arXiv preprint arXiv:1701.07875, 2017.
[32] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017.
[33] Scott Kirklin, James E Saal, Bryce Meredig, Alex Thompson, Jeff W Doak, Muratahan Aykol, Stephan Rühl, and Chris Wolverton. The open quantum materials database (oqmd): assessing the accuracy of dft formation energies. npj Computational Materials, 1(1):1–15, 2015.
[34] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. Apl Materials, 1(1):011002, 2013.
[35] James E Saal, Scott Kirklin, Muratahan Aykol, Bryce Meredig, and Christopher Wolverton. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (oqmd). Jom, 65(11):1501–1509, 2013.
[36] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[37] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
[38] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In Advances in neural information processing systems, pages 5767–5777, 2017.
[39] Yuxin Li, Wenhui Yang, Rongzhi Dong, and Jianjun Hu. Mlatticeabc: generic lattice constant prediction of crystal materials using machine learning. arXiv preprint arXiv:2010.16099, 2020.
[40] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In USENIX symposium on operating systems design and implementation (OSDI), pages 265–283, 2016.
[41] François Chollet et al. Keras. https://keras.io, 2015.
[42] Benjamin Sanchez-Lengeling and Alán Aspuru-Guzik. Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400):360–365, 2018.
[43] Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent L Chevrier, Kristin A Persson, and Gerbrand Ceder. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 68:314–319, 2013.
[44] Tian Xie and Jeffrey C Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical review letters, 120(14):145301, 2018.
[45] Yvon Le Page and Paul Saxe. Symmetry-general least-squares extraction of elastic data for strained materials from ab initio calculations of stress. Phys. Rev. B, 65:104104, Feb 2002.
[46] S.H. Zhang and R.F. Zhang. Aelas: Automatic elastic property derivations via high-throughput first-principles computation. Computer Physics Communications, 220:403–416, 2017.
[47] A Togo and I Tanaka. First principles phonon calculations in materials science. Scr. Mater., 108:1–5, Nov 2015.
[48] Yoyo Hinuma, Giovanni Pizzi, Yu Kumagai, Fumiyasu Oba, and Isao Tanaka. Band structure diagram paths based on crystallography. Computational Materials Science, 128:140–184, 2017.
[49] S Chung, S Shrestha, X Wen, Y Feng, N Gupta, H Xia, P Yu, J Tang, and G Conibeer. Evidence for a large phononic band gap leading to slow hot carrier thermalisation. In IOP Conference Series: Materials Science and Engineering, volume 68, page 012002. IOP Publishing, 2014.
[50] B. Thapa, M. Dubajic, M. P. Nielsen, R. Patterson, G. Conibeer, and S. Shrestha. Hot carrier dynamics in nitrogen-rich hafnium nitride thin film. In , pages 0793–0797, 2020.
[51] Xiu Zhang, Hai-Ying Song, X. C. Nie, Shi-Bing Liu, Yang Wang, Cong-Ying Jiang, Shi-Zhong Zhao, Genfu Chen, Jian-Qiao Meng, Yu-Xia Duan, and H. Y. Liu. Ultrafast hot carrier dynamics of zrte from time-resolved optical reflectivity. Phys. Rev. B, 99:125141, Mar 2019.
[52] Adam D Wright, Carla Verdi, Rebecca L Milot, Giles E Eperon, Miguel A Pérez-Osorio, Henry J Snaith, Feliciano Giustino, Michael B Johnston, and Laura M Herz. Electron–phonon coupling in hybrid lead halide perovskites. Nature communications, 7(1):1–9, 2016.
[53] Sheng-Ying Yue, Long Cheng, Bolin Liao, and Ming Hu. Electron–phonon interaction and superconductivity in the high-pressure ci 16 phase of lithium from first principles. Physical Chemistry Chemical Physics, 20(42):27125–27130, 2018.
[54] Jia-Yue Yang and Ming Hu. Strong electron–phonon interaction retarding phonon transport in superconducting hydrogen sulfide at high pressures. Physical Chemistry Chemical Physics, 20(37):24222–24226, 2018.
[55] Jia-Yue Yang, Wenjie Zhang, Chengying Xu, Jun Liu, Linhua Liu, and Ming Hu. Strong electron-phonon coupling induced anomalous phonon transport in ultrahigh temperature ceramics zrb2 and tib2. International Journal of Heat and Mass Transfer, 152:119481, 2020.
[56] Guangzhao Qin, Zhenzhen Qin, Huimin Wang, and Ming Hu. Anomalously temperature-dependent thermal conductivity of monolayer gan with large deviations from the traditional 1/t law. Physical Review B, 95(19):195416, 2017.
[57] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.