[PDF] Recent Progress of the Computational 2D Materials Database (C2DB)

Abstract

The C2DB is a highly curated open database organizing a wealth of computed properties for more than 4000 atomically thin two-dimensional (2D) materials. Here we report on new materials and properties that were added to the database since its first release in 2018. The set of new materials comprise several hundred monolayers exfoliated from experimentally known layered bulk materials, (homo)bilayers in various stacking configurations, native point defects in semiconducting monolayers, and chalcogen/halogen Janus monolayers. The new properties include exfoliation energies, Bader charges, spontaneous polarisations, Born charges, infrared polarisabilities, piezoelectric tensors, band topology invariants, exchange couplings, Raman- and second harmonic generation spectra. We also describe refinements of the employed material classification schemes, upgrades of the computational methodologies used for property evaluations, as well as significant enhancements of the data documentation and provenance. Finally, we explore the performance of Gaussian process-based regression for efficient prediction of mechanical and electronic materials properties. The combination of open access, detailed documentation, and extremely rich materials property data sets make the C2DB a unique resource that will advance the science of atomically thin materials.

Full PDF

RRecent Progress of the Computational 2D MaterialsDatabase (C2DB)

Morten Niklas Gjerding , Alireza Taghizadeh , , AsbjørnRasmussen , Sajid Ali , Fabian Bertoldo , ThorstenDeilmann , Urko Petralanda Holguin , Nikolaj RørbækKnøsgaard , Mads Kruse , Simone Manti , Thomas GarmPedersen , Thorbjørn Skovhus , Mark Kamper Svendsen ,Jens Jørgen Mortensen , Thomas Olsen and KristianSommer Thygesen Computational Atomic-scale Materials Design (CAMD), Department ofPhysics, Technical University of Denmark, 2800 Kgs. Lyngby Denmark Department of Materials and Production, Aalborg University, 9220 AalborgØst, Denmark Institut f¨ur Festk¨orpertheorie, Westf¨alische Wilhelms-Universit¨at M¨unster,48149 M¨unster, GermanyE-mail: [email protected]

Abstract.

The C2DB is a highly curated open database organizing a wealthof computed properties for more than 4000 atomically thin two-dimensional(2D) materials. Here we report on new materials and properties that wereadded to the database since its ﬁrst release in 2018. The set of newmaterials comprise several hundred monolayers exfoliated from experimentallyknown layered bulk materials, (homo)bilayers in various stacking conﬁgurations,native point defects in semiconducting monolayers, and chalcogen/halogen Janusmonolayers. The new properties include exfoliation energies, Bader charges,spontaneous polarisations, Born charges, infrared polarisabilities, piezoelectrictensors, band topology invariants, exchange couplings, Raman- and secondharmonic generation spectra. We also describe reﬁnements of the employedmaterial classiﬁcation schemes, upgrades of the computational methodologiesused for property evaluations, as well as signiﬁcant enhancements of the datadocumentation and provenance. Finally, we explore the performance of Gaussianprocess-based regression for eﬃcient prediction of mechanical and electronicmaterials properties. The combination of open access, detailed documentation,and extremely rich materials property data sets make the C2DB a unique resourcethat will advance the science of atomically thin materials.

Submitted to:

2D Mater. a r X i v : . [ c ond - m a t . m t r l - s c i ] F e b ecent Progress of the Computational 2D Materials Database (C2DB)

1. Introduction

The discovery of new materials, or new properties ofknown materials, to meet a speciﬁc industrial or scien-tiﬁc requirement, is an exciting intellectual challenge ofthe utmost importance for our environment and econ-omy. For example, the successful transition to a societybased on sustainable energy sources and the realisa-tion of quantum technologies (e.g. quantum computersand quantum communication) depend critically on newmaterials with novel functionalities. First-principlesquantum mechanical calculations, e.g. based on den-sity functional theory (DFT)[1], can predict the prop-erties of materials with high accuracy even before theyare made in the lab. They provide insight into mecha-nisms at the most fundamental (atomic and electronic)level and can pinpoint and calculate key propertiesthat determine the performance of the material at themacroscopic level. Powered by high-performance com-puters, atomistic quantum calculations in combinationwith data science approaches, have the potential to rev-olutionize the way we discover and develop new mate-rials.Atomically thin, two-dimensional (2D) crystalsrepresent a fascinating class of materials with excit-ing perspectives for both fundamental science andtechnology[2, 3, 4, 5]. The family of 2D materialshas been growing steadily over the past decade andcounts about a hundred materials that have been re-alised in single- or few-layer form[6, 7, 8, 9, 10]. Whilesome of these materials, including graphene, hexago-nal boron-nitride (hBN), and transition metal dichalco-genides (TMDs), have been extensively studied, themajority have only been scarcely characterized and re-main poorly understood. Computational studies indi-cate that around 1000 already known layered crystalshave suﬃciently weak interlayer bonding to allow theindividual layers to be mechanically exfoliated[11, 12].Supposedly, even more 2D materials could be realizedbeyond this set of already known crystals. Addingto this the possibility of stacking individual 2D layers(of the same or diﬀerent kinds) into ultrathin van derWaals (vdW) crystals[13], and tuning the propertiesof such structures by varying the relative twist anglebetween adjacent layers[14, 15] or intercalating atomsinto the vdW gap[16, 17], it is clear that the prospectsof tailor made 2D materials are simply immense. Tosupport experimental eﬀorts and navigate the vast 2Dmaterials space, ﬁrst-principles calculations play a piv- otal role. In particular, FAIR ‡ [18] databases populatedby high-throughput calculations can provide a conve-nient overview of known materials and point to newpromising materials with desired (predicted) proper-ties. Such databases are also a fundamental require-ment for the successful introduction and deploymentof artiﬁcial intelligence in materials science.Many of the unique properties exhibited by2D materials have their origin in quantum conﬁne-ment and reduced dielectric screening. These ef-fects tend to enhance many-body interactions andlead to profoundly new phenomena such as stronglybound excitons[19, 20, 21] with nonhydrogenic Ryd-berg series[22, 23, 24], phonons and plasmons withanomalous dispersion relations[25, 26], large dielec-tric band structure renormalizations[27, 28], unconven-tional Mott insulating and superconducting phases[14,15], and high-temperature exciton condensates[29].Recently, it has become clear that long range magneticorder can persist[30, 31] and (in-plane) ferroelectric-ity even be enhanced[32], in the single layer limit. Inaddition, ﬁrst-principles studies of 2D crystals have re-vealed rich and abundant topological phases[33, 34].The peculiar physics ruling the world of 2D mate-rials entails that many of the conventional theoriesand concepts developed for bulk crystals break downor require special treatments when applied to 2Dmaterials[35, 26, 36]. This means that computationalstudies must be performed with extra care, which inturn calls for well-organized and well-documented 2Dproperty data sets that can form the basis for the devel-opment, benchmarking, and consolidation of physicaltheories and numerical implementations.The Computational 2D Materials Database(C2DB)[6, 37] is a highly curated and fully opendatabase containing elementary physical properties ofaround 4000 two-dimensional (2D) monolayer crys-tals. The data has been generated by automatic high-throughput calculations at the level of density func-tional theory (DFT) and many-body perturbation the-ory as implemented in the GPAW electronic structurecode. The computational workﬂow is constructed us-ing the Atomic Simulation Recipes (ASR) – a recentlydeveloped Python framework for high-throughput ma-terials modeling building on the Atomic Simulation En-vironment (ASE) – and managed/executed using the ‡ FAIR data are data which meet principles of ﬁndability,accessibility, interoperability, and reusability ecent Progress of the Computational 2D Materials Database (C2DB)

2. Selection, classiﬁcation, and stability

Figure 1 illustrates the workﬂow behind the C2DB. Inthis section we describe the ﬁrst part of the workﬂowuntil the property calculations (red box), focusingon aspects related to selection criteria, classiﬁcation,and stability assessment, that have been changed orupdated since the 2018 paper.

Given a prospect 2D material, the ﬁrst step is tocarry out a structure optimization. This calculationis performed with spin polarization and with thesymmetries of the original structure enforced. Thelatter is done to keep the highest level of control over the resulting structure by avoiding “uncontrolled”symmetry breaking distortions. The prize to payis a higher risk of generating dynamically unstablestructures.

A dimensionality analysis[43] is performed to identifyand ﬁlter out materials that have disintegrated intonon-2D structures during relaxation. Covalentlybonded clusters are identiﬁed through an analysis ofthe connectivity of the structures where two atomsare considered to belong to the same cluster if theirdistance is less than some scaling of the sum of theircovalent radii, i.e. d < k ( r cov i + r cov j ), where i and j are atomic indices. A scaling factor of k = 1 . Se , and Pb O ) of structures and theircluster dimensionalities before and after relaxation. Allstructures initially consist of a single 2D cluster, butupon relaxation Ge Se and Pb O disintegrate intotwo 2D clusters as well as one 2D and two 0D clusters,respectively. On the other hand, the relaxation ofgraphene decreases the in-plane lattice constant butdoes not aﬀect the dimensionality. According to thecriterion deﬁned above only graphene will enter thedatabase. Maintaining a high-throughput database inevitablyrequires a strategy for comparing similar structuresand ranking them according to their relevance. Inparticular, this is necessary in order to identifydiﬀerent representatives of the same material e.g.resulting from independent relaxations, and therebyavoid duplicate entries and redundant computations.The C2DB strategy to this end involves a combinationof structure clustering and Pareto analysis.First, a single-linkage clustering algorithm is usedto group materials with identical reduced chemicalformula and ”similar” atomic conﬁgurations. Toquantify conﬁguration similarity a slightly modiﬁedversion of PyMatGen’s[44] distance metric is employedwhere the cell volume normalization is removed tomake it applicable to 2D materials surrounded byvacuum. Roughly speaking, the metric measures themaximum distance an atom must be moved (in unitsof ˚A) in order to match the two atomic conﬁgurations.Two atomic conﬁgurations belong to the same clusterif their distance is below an empirically determinedthreshold of 0.3 ˚A.At this point, the simplest strategy would beto remove all but the most stable compound within ecent Progress of the Computational 2D Materials Database (C2DB) Structure relaxation(symmetry constrained)Dimensionality analysis:Exactly one 2D cluster ?Classification of Magnetic state:NM: max{|µ atom |} < 0.1µ B Classification of Crystal type:‘Stoichiometry – SG – Wykoff ’Duplicate check:Unique material ∉ C2DB ?

C2DB monolayer workflow

Formation and hull energy( Δ H form and Δ H hull )min{eig( C )} ≥ D )} ≥ Δ H form < 0 ?Stiffness tensor ( C )Dynamical matrix ( D ) Property workflow (see Table 1) YES NONONOYES

MultilayersPoint defectsMonolayers dyn a m i ca ll y un s t a b l e thermodynamicallyunstable • Thermodynamic • Elastic • Electronic • Magnetic • Topological • … no t n e w / no t D C2DB

YESYESYESNONO

Figure 1.

The workﬂow behind the C2DB. After the structural relaxation, the dimensionality of the material is checked andit is veriﬁed that the material is not already present in the database. Next, the material is classiﬁed according to its chemicalcomposition, crystal structure, and magnetic state. Finally, the thermodynamic- and dynamic stability is assessed from the energyabove the convex hull and the sign of the minimum eigenvalues of the dynamical matrix and stiﬀness tensor. Unstable materials arestored in the database; stable materials are subject to the property workﬂow. The C2DB monolayer database is interlinked withdatabases containing structures and properties of multilayer stacks and point defects in monolayers from the C2DB. a cluster. However, this procedure would removemany high symmetry crystals for which a more stabledistorted version exists. For example, the well knownT-phase of MoS would be removed in favor of the morestable T’-phase. This is undesired as high-symmetrystructures, even if dynamically unstable at T = 0, mayprovide useful information and might in fact becomestabilized at higher temperatures[45]. Therefore, thegeneral strategy adopted for the C2DB, is to keep amaterial that is less stable than another material of thesame cluster if it has fewer atoms in its primitive unitcell (and thus typically higher symmetry). Precisely,materials within a given cluster are kept only if theyrepresent a deﬁning point of the ( N , ∆ H )-Pareto front,where N is the number of atoms in the unit cell and∆ H is the heat of formation. A graphical illustrationof the Pareto analysis is shown in Figure 3 for the caseof ReS . The original C2DB employed a crystal prototype classiﬁcation scheme where speciﬁc materials werepromoted to prototypes and used to label groupsof materials with the same or very similar crystalstructure. This approach was found to be diﬃcult tomaintain (as well as being non-transparent). Instead,materials are now classiﬁed according to their crystal type deﬁned by the reduced stoichiometry, space groupnumber, and the alphabetically sorted labels of theoccupied Wyckoﬀ positions. As an example, MoS inthe H-phase has the crystal type: AB2-187-bi. In the new version of the C2DB, materials areclassiﬁed according to their magnetic state as either non-magnetic or magnetic . A material is consideredmagnetic if any atom has a local magnetic momentsgreater than 0.1 µ B .In the original C2DB, the magnetic category wasfurther subdivided into ferromagnetic (FM) and anti-ferromagnetic (AFM). But since the simplest anti-ferromagnetically ordered state typically does notrepresent the true ground state, all material entrieswith an AFM state have been removed from the C2DBand replaced by the material in its FM state. Althoughthe latter is less stable, it represents a more welldeﬁned state of the material. Crucially, the nearestneighbor exchange couplings for all magnetic materialshave been included in the C2DB (see Sec. 5.8). Thisenables a more detailed and realistic description ofthe magnetic order via the Heisenberg model. Inparticular, the FM state of a material is not expectedto represent the true magnetic ground if the exchangecoupling J < ecent Progress of the Computational 2D Materials Database (C2DB) E PBEgap > E PBEgap >

0, non-magnetic,non-centrosymmetric 375Electronic band structure PBE PBE* 3496Magnetic anisotropies PBE* Magnetic 823Deformation potentials PBE* E PBEgap > E PBEgap > E PBEgap = 0 2505Plasma frequency PBE* E PBEgap = 0 3144Work function PBE* E PBEgap = 0 4044Optical polarisability RPA@PBE 3127Electronic band structure HSE06@PBE* 3155Electronic band structure G W @PBE* E PBEgap > N atoms < E PBEgap > E PBEgap , non-centrosym. 353Optical absorbance BSE@G W * E PBEgap > N atoms < E PBEgap >

0, nearly centrosym.polar space group 151Topological invariants PBE*, Berry phase 0 < E

PBEgap < . Table 1.

Properties calculated by the C2DB monolayer workﬂow. The computational method and the criteria used to decidewhether the property should be evaluation for a given material is also shown. A ’*’ indicates that spin-orbit coupling (SOC) isincluded. All calculations are performed with the GPAW code using a plane wave basis except for the Raman calculations, whichemploy a double-zeta polarized (DZP) basis of numerical atomic orbitals.

The heat of formation, ∆ H , of a compound isdeﬁned as its energy per atom relative to itsconstituent elements in their standard states.[46] The thermodynamic stability of a compound is evaluatedin terms of its energy above the convex hull , ∆ H hull ,which gives the energy of the material relative to othercompeting phases of the same chemical composition,including mixed phases[6], see Fig. 4 for an example. ecent Progress of the Computational 2D Materials Database (C2DB) Figure 2.

Three example structures from C2DB (top:graphene, middle: Ge Se , bottom: Pb O ) with theirrespective cluster dimensionalities cluster before (left) and after(right) relaxation. The number N x D denotes the number ofclusters of dimensionality x . Clearly, ∆ H hull , depends on the pool of referencephases, which in turn deﬁnes the convex hull. Theoriginal C2DB employed a pool of reference phasescomprised by 2807 elemental and binary bulk crystalsfrom the convex hull of the Open Quantum MaterialsDatabase (OQMD)[46]. In the new version, this set Thermodynamicstability indicator CriterionLOW ∆ H >

H < H hull > . H < H hull < . Table 2.

Thermodynamic stability indicator assigned to allmaterials in the C2DB. has been extended by approximately 6783 ternary bulkcompounds from the convex hull of OQMD, making atotal of 9590 stable bulk reference compounds.As a simple indicator for the thermodynamicstability of a material, the C2DB employs three labels(low, medium, high) as deﬁned in Table 2.It should be emphasized that the energies of bothmonolayers and bulk reference crystals are calculatedwith the PBE xc-functional. This implies thatsome inaccuracies must be expected, in particularfor materials with strongly localized d -electrons, e.g.certain transition metal oxides, and materials for whichdispersive interactions are important, e.g. layered vander Waals crystals. The latter implies that the energyof a monolayer and its layered bulk parent (if suchexists in the pool of references) will have the sameenergy. For further details and discussions see Ref.[6]. Dynamically stable materials are situated at a localminimum of the potential energy surface and are thusstable to small structural perturbations. Structuresresulting from DFT relaxations can end up in saddlepoint conﬁgurations because of imposed symmetryconstraints or an insuﬃcient number of atoms in theunit cell.In C2DB, the dynamical stability is assessed fromthe signs of the minimum eigenvalues of (i) the stiﬀnesstensor (see Sec. 3.1) and (ii) the Γ-point Hessianmatrix for a supercell containing 2 × × ecent Progress of the Computational 2D Materials Database (C2DB) Number of atoms per unit cell − . − . − . − . − . − . − . ∆ H [ e V /a t o m ] T’ T”T

Pareto frontIncluded materialsExcluded materials

Figure 3.

Illustration of the Pareto analysis used to ﬁlter out duplicates or irrelevant structures from the C2DB. All points representmaterials with the same reduced chemical formula (in this case ReS ) that belong to the same cluster deﬁned by the structure metric.Only structures lying on the ( N, ∆ H )-Pareto front are retained (black circles) while other materials are excluded (red circles). Thephilosophy behind the algorithm is to keep less stable materials if they contain fewer atoms per unit cell than more stable materialsand thus represent structures of higher symmetry. Bi Te TeBiITeBiI IBi < / > Figure 4.

Convex hull diagram for (Bi,I,Te)-compounds. Green(red) coloring indicate materials that have a convex hull energyof less than (greater than) 5 meV. The monolayers BiI , Bi Te and BiITe lie on the convex hull. The monolayers are degeneratewith their layered bulk parent because the vdW interactions arenot captured by the PBE xc-functional.

3. Improved property methodology

The new version of the C2DB has been generatedusing a signiﬁcantly extended and improved workﬂowfor property evaluations. This section focuses onimprovements relating to properties that were alreadypresent in the original version of the C2DB while newproperties are discussed in the next section.

The stiﬀness tensor, C , is a rank-4 tensor that relatesthe stress of a material to the applied strain. In Mandelnotation (a variant of Voigt notation) C is expressedas an N × N matrix relating the N independentcomponents of the stress and strain tensors. For a 2Dmaterial N = 3 and the tensor takes the form C =  C xxxx C xxyy √ C xxxy C xxyy C yyyy √ C yyxy √ C xxxy √ C yyxy C xyxy  (1)where the indices on the matrix elements refer to therank-4 tensor. The factors multiplying the tensorelements account for their multiplicities in the fullrank-4 tensor. In the C2DB workﬂow, C is calculatedas a ﬁnite diﬀerence of the stress under an appliedstrain with full relaxation of atomic coordinates. Anegative eigenvalue of C signals a dynamical instability,see Sec. 2.7 ecent Progress of the Computational 2D Materials Database (C2DB) × For all materials with a ﬁnite band gap the eﬀectivemasses of electrons and holes are calculated forbands within 100 meV of the conduction bandminimum (CBM) and valence band maximum (VBM),respectively. The Hessian matrices at the bandextrema (BE) are determined by ﬁtting a second orderpolynomium to the PBE band structure includingSOC, and the eﬀective masses are obtained bysubsequent diagonalization of the Hessian. The mainﬁtting-procedure is unaltered from the ﬁrst version ofC2DB, but two important improvements have beenmade.The ﬁrst improvement consists in an additional k -mesh reﬁnement step for better localization of theBE in the Brillouin zone. After the location of theBE have been estimated based on a uniformly sampledband structure with k -point density of 12 ˚A, anotherone-shot calculation is perform with a denser k -mesharound the estimated BE positions. This ensures amore accurate and robust determination of the locationof the BE, which can be important in cases with asmall but still signiﬁcant spin-orbit splitting or whenthe band is very ﬂat or non-quadratic around the BE.The second reﬁnement step is the same as in the ﬁrstversion of C2DB, i.e. the band energies are calculatedon a highly dense k -mesh in a small disc around theBE, and the Hessian is obtained by ﬁtting the bandenergies in the range up to 1 meV from the bandminimum/maximum.The second improvement is the calculation of themean absolute relative error (MARE) of the poly-nomial ﬁt in a 25 meV range from the band mini-mum/maximum. The value of 25 meV corresponds tothe thermal energy at room temperature and is thusthe relevant energy scale for many applications. TheMARE provides a useful measure of the parabolicity ofthe energy bands and thus the validity of the eﬀectivemass approximation over this energy scale.Figure 5 shows two examples of band structureswith the eﬀective mass ﬁts and corresponding ﬁt errorsindicated. Additionally, the distribution of MARE forall the eﬀective mass ﬁts in the C2DB are presented.Most materials have an insigniﬁcant MARE, but afew materials have very large errors. Materials witha MARE above a few tens of percentages fall into twoclasses. For some materials the algorithm does notcorrectly ﬁnd the position of the BE. An example is Ti S in the space group C2/m. For others, the ﬁt andBE location are both correct, but the band ﬂattensaway from the BE which leads to a large MARE. Anexample of this latter class is Cl Tl in the space groupP-1. In general a small MARE indicates a parabolicband while materials with large MARE should behandled on a case-by-case basis. To facilitate a state-speciﬁc analysis of the PBEKohn-Sham wave functions, an orbital projectedband structure (PBS) is provided to complementthe projected density of states (PDOS). In thePAW methodology, the all-electron wave functions areprojected onto atomic orbitals inside the augmentationspheres centered at the position of each atom. The PBSresolves these atomic orbital contributions to the wavefunctions as a function of band and k -point whereasthe PDOS resolves the atomic orbital character of thetotal density of states as a function of energy. The spin-orbit coupling is not included in the PBS or PDOS, asits eﬀect is separately visualized by the spin-projectedband structure also available in the C2DB.As an example, Figure 6 shows the PBS (left)and PDOS (right) of monolayer MoS calculated withPBE. The relative orbital contribution to a given Blochstate is indicated by a pie chart symbol. In thepresent example, one can deduce from the PBS thateven though Mo- p orbitals and S- p orbitals contributeroughly equally to the DOS in the valence band, theMo- p orbital contributions are localized to a region inthe BZ around the M -point, whereas the S- p orbitalscontribute throughout the entire BZ. W band structures The C2DB contains G W quasiparticle (QP) bandstructures of 370 monolayers covering 14 diﬀerentcrystal structures and 52 chemical elements. Thedetails of these calculations can be found in the originalC2DB paper[6]. A recent in-depth analysis of the61.716 G W data points making up the QP bandstructures led to several important conclusions relevantfor high-throughput G W calculations. In particular,it identiﬁed the linear QP approximation as asigniﬁcant error source in standard G W calculationsand proposed an extremely simple correction scheme(the empirical Z (empZ) scheme), that reduces thiserror by a factor of two on average.The empZ scheme divides the electronic states intotwo classes according to the size of the QP weight, Z . States with Z ∈ [0 . , .

0] are classiﬁed as QPconsistent (QP-c) while states with Z (cid:54)∈ [0 . , . ecent Progress of the Computational 2D Materials Database (C2DB) − . . . ∆ k [1 / ˚A] − . − . − . − . − . E − E v a c [ e V ]

25 meV Rh Br − . − . . . . h S z i − .

05 0 .

00 0 . ∆ k [1 / ˚A] − . − . − . E − E v a c [ e V ]

25 meV

MoS − . − . . . . h S z i MARE [%] C o un t [ % ] MARE [%] − C o un t [ % ] Figure 5.

Left: The PBE band structures of Rh Br and MoS (colored dots) in regions around the conduction band minimum.The dashed red line shows the ﬁt made to estimate the eﬀective masses of the lowest conduction band. The shaded grey regionhighlights the error between the ﬁt and the true band structure. The mean absolute relative error (MARE) is calculated for energieswithin 25 meV of the band minimum. For MoS the ﬁt is essentially ontop of the band energies. Right: The distribution of theMARE of all eﬀective mass ﬁts in the C2DB. The inset shows the full distribution on a log scale. As mentioned in the main text,very large MAREs indicate that the band minimum/maximum was incorrectly identiﬁed by the algorithm and/or that the band isvery ﬂat. Γ M K Γ k -points − − − − − − − E − E v a c [ e V ] Mo (s)S (s) Mo (p)S (p) Mo (d) 0 1 2 3 4

Projected DOS [states / eV] − − − − − − − E − E v a c [ e V ] Mo (s)S (s) Mo (p)S (p) Mo (d)

Figure 6.

Orbital projected band structure and orbital projected density of states of MoS in the H-phase. The pie chart symbolsindicate the fractional atomic orbital character of the Kohn-Sham wave functions. ecent Progress of the Computational 2D Materials Database (C2DB) . . . . . . Quasiparticle weight, Z C o un t( % ) QP-c ( Z ∈ [0.5, 1.0])QP-ic ( Z [0.5, 1.0])Γ M K Γ k -points − − − − − − E − E v a c [ e V ] G W G W @empZ Figure 7.

Top: Distribution of the 61716 QP weights ( Z )contained in the C2DB. The blue part of the distribution showsQP-consistent (QP-c) Z -values while the orange part shows QP-inconsistent (QP-ic) Z values. In general, the linear expansion ofthe self-energy performed when solving the QP equation worksbetter for Z closer to 1. About 0.3% of the Z -values lie outsidethe interval from 0 to 1 and are not included in the distribution.Bottom: G W band structure before (blue) and after (orange)applying the empZ correction, which replaces Z by the mean ofthe distribution for QP-ic states. In the case of MoS only onestate at K is QP-ic. spectral weight in the QP peak. The distribution ofthe 60.000+ Z -values is shown in Figure 7. It turnsout that the linear approximation to the self-energy,which is the gist of the QP approximation, introducessigniﬁcantly larger errors for QP-ic states than for QP-c states. Consequently, the empZ method replaces thecalculated Z of QP-ic states with the mean of the Z -distribution, Z ≈ .

75. This simple replacementreduces the average error of the linear approximationfrom 0.11 eV to 0.06 eV.An illustration of the method applied to MoS isshown in Figure 7. The original uncorrected G W band structure is shown in blue while the empZcorrected band structure is shown in orange. MoS has only one QP-ic state in the third conduction bandat the K -point. Due to a break-down of the QPapproximation for this state, the G W correction isgreatly overestimated leading to a local discontinuityin the band structure. The replacement of Z by Z forthis particular state resolves the problem. All G W band structures in the C2DB are now empZ corrected. In the ﬁrst version of the C2DB, the optical absorbancewas obtained from the simple expression [6] A ( ω ) ≈ ω Im (cid:8) α ( ω ) (cid:9) (cid:15) c , (2)where α is the long wavelength limit of the in-planesheet polarisability density (Note that the equationis written here in SI units). The sheet polarisabilityis related to the sheet conductivity via σ ( ω ) = − iωα ( ω ). The expression (2) assumes that theelectric ﬁeld inside the layer equals the incomingﬁeld (i.e. reﬂection is ignored), and hence, it mayoverestimate the absorbance.In the new version, the absorbance is evaluatedfrom A = 1 − R − T , where R and T are the reﬂectedand transmitted powers of a plane wave at normalincidence, respectively. These can be obtained fromthe conventional transfer matrix method applied to amonolayer suspended in vacuum. The 2D materialis here modelled as an inﬁnitely thin layer with asheet conductivity. Alternatively, it can be modelledas quasi-2D material of thickness d with a “bulk”conductivity of σ = σ /d [48], but the two approachesyield very similar results, since the optical thicknessof a 2D material is much smaller than the opticalwavelength. Within this model, the expression for theabsorbance of a suspended monolayer with the sheetconductivity σ reads A ( ω ) = Re (cid:110) σ ( ω ) η (cid:111)(cid:12)(cid:12)(cid:12)(cid:12)

22 + σ ( ω ) η (cid:12)(cid:12)(cid:12)(cid:12) , (3)where η = 1 / ( (cid:15) c ) ≈

377 Ω is the vacuum impedance.If the light-matter interaction is weak, i.e. | σ η | (cid:28)

1, Eq. (3) reduces to Eq. (2). Nonetheless,due the strong light-matter interaction in some 2Dmaterials, this approximation is not reliable in general.In fact, it can be shown that the maximum possibleabsorption from Eq. (3) is 50%, which is known as theupper limit of light absorption in thin ﬁlms [49]. Thislimit is not guaranteed by Eq. (2), which can even yieldan absorbance above 100%.As an example, Fig. 8 shows the absorptionspectrum of monolayer MoS for in- and out-of-planepolarized light as calculated with the exact Eq. (3) andthe approximate Eq. (2), respectively. In all cases the ecent Progress of the Computational 2D Materials Database (C2DB) z -polarized light, the two approaches agree quite well,but noticeable diﬀerences are observed in regions withstronger light-matter interaction.

4. New materials in the C2DB

In this section we discuss the most signiﬁcantextensions of the C2DB in terms of new materials.The set of materials presented here is not complete,but represents the most important and/or well deﬁnedclasses. The materials discussed in Secs. 4.1 and4.2 (MXY Janus monolayers and monolayers extractedfrom experimental crystal structure databases) arealready included in the C2DB. The materials describedin Secs. 4.3 and 4.4 (homo-bilayers and monolayerpoint defect systems) will soon become available asseparate C2DB-interlinked databases.

The class of transition metal dichalcogenide (TMDC)monolayers of the type MX (where M is the transitionmetal and X is a chalcogen) exhibits a large variety ofinteresting and unique properties and has been widelydiscussed in the literature [50]. Recent experimentshave shown that it is not only possible to synthesizediﬀerent materials by changing the metal M or thechalcogen X, but also by exchanging the X on oneside of the layer by another chalcogen (or halogen)[51, 52, 53]. This results in a class of 2D materialsknown as MXY Janus monolayers with broken mirrorsymmetry and ﬁnite out-of-plane dipole moments. Theprototypical MXY crystal structures are shown inFig. 9 for the case of MoSSe and BiTeI, which have bothbeen experimentally realized [51, 52, 53]. Adopting thenomenclature from the TMDCs, the crystal structuresare denoted as H- or T-phase, depending on whetherX and Y atoms are vertically aligned or displaced,respectively.In a recent work [54], the C2DB workﬂow wasemployed to scrutinize and classify the basic electronicand optical properties of 224 diﬀerent MXY Janusmonolayers. All data from the study is available in theC2DB. Here we provide a brief discussion of the Rashbaphysics in these materials and refer the interestedreader to Ref. [54] for more details and analysis of otherproperties.A key issue when considering hypothetical mate-rials, i.e. materials not previously synthesized, is theirstability. The experimentally synthesized MoSSe andBiTeI are both found to be dynamically stable and liewithin 10 meV of the convex hull conﬁrming their ther-modynamic stability. Out of the 224 initial monolayers 93 are classiﬁed as stable according to the C2DB cri-teria (dynamically stable and ∆ H hull < . z -axis). Close tothe band extremum, the energy of the two spin bandsis described by the Rashba Hamiltonian [55, 56] H = α R ( σ × k ) · ˆ e z , (4)where σ is the vector of Pauli matrices, k = p / (cid:126) is the wave number, and the Rashba parameter isproportional to the electric ﬁeld strength, α R ∝ E ,Although the Rashba Hamiltonian is only meantas a qualitative model, it is of interest to testits validity on the Janus monolayers. The electricﬁeld of the Rashba model is approximately given by E = ∆ V vac /d , where ∆ V vac is the shift in vacuumpotential on the two sides of the layer (see left insetof Fig. 10) and d is the layer thickness. Assuming asimilar thickness for all monolayers, the electric ﬁeld isproportional to the potential shift. Not unexpected,the latter is found to correlate strongly with thediﬀerence in electronegativity of the X and Y atoms,see left panel of Fig. 10.The Rashba energy, E R , can be found by ﬁtting E ( k ) = (cid:126) k / m ∗ ± α R k to the band structure (seeright inset of Fig. 10) and should scale with the electricﬁeld strength. However, as seen from the right panelof Fig. 10, there is no correlation between the twoquantities. Hence we conclude that the simple Rashbamodel is completely inadequate and that the strengthof the perpendicular electric ﬁeld cannot be used toquantify the eﬀect of spin-orbit interactions on bandenergies. The C2DB has been extended with a number of mono-layers that are likely exfoliable from experimentallyknown layered bulk compounds. Speciﬁcally, the In-organic Crystal Structure Database (ICSD)[57] andCrystallography Open Database (COD)[58] have ﬁrstbeen ﬁltered for corrupted, duplicate and theoreticalcompounds, which reduce the initial set of 585.485database entries to 167.767 unique materials. All ofthese have subsequently been assigned a ”dimensional-ity score” based on a purely geometrical descriptor. Ifthe 2D score is larger than the sum of 0D, 1D and 3D ecent Progress of the Computational 2D Materials Database (C2DB) Photon energy [eV] A b s o r p t i o n [ % ] x / y -polarization | s | 1 Exact 0 1 2 3 4 5 6

Photon energy [eV] z -polarization x y Figure 8.

Optical absorption of standalone monolayer MoS for x/y -polarization (left) and z -polarization (right) at normal incidentin the BSE framework, obtained using Eq. (2) (blue) or Eq. (3) (orange). The crystal structure cross-sectional views are shown inthe inset with the deﬁnition of directions. H-phase (MoSSe) T-phase (BiTeI)

Figure 9.

Atomic structure of the MXY Janus monolayersin the H-phase (left) and T-phase (right). The two prototypematerials MoSSe and BiTeI are examples of experimentallyrealized monolayers adopting these crystal structures (not toscale). scores we regard the material as being exfoliable andwe extract the individual 2D components that com-prise the material (see also Sec. 2.2). We refer to theoriginal work on the method for details [43] and notethat similar approaches were applied in Refs. [12, 11]to identify potentially exfoliable monolayers from theICSD and COD.The search has been limited to bulk compoundscontaining less than 6 diﬀerent elements and no rareearth elements. This reduces the set of relevant bulkmaterials to 2991. For all of these we extracted the 2Dcomponents containing less than 21 atoms in the unitcell, which were then relaxed and sorted for duplicatesfollowing the general C2DB workﬂow steps describedin Secs. 2.1-2.3. At this point 781 materials remain.This set includes most known 2D materials and 207 of the 781 were already present in the C2DB prior to thisaddition. All the materials (including those that werealready in C2DB) have been assigned an ICSD/CODidentiﬁer that refers to the parent bulk compound fromwhich the 2D material was computationally exfoliated.We emphasize that we have not considered exfoliationenergies in the analysis and a subset of these materialsmay thus be rather strongly bound and challenging toexfoliate even if the geometries indicate van der Waalsbonded structures of the parent bulk compounds.Fig. 11 shows the distribution of energiesabove the convex hull for materials derived fromparent structures in ICSD or COD as well as forthe entire C2DB, which includes materials obtainedfrom combinatorial lattice decoration as well. Asexpected, the materials derived from experimental bulkmaterials are situated rather close to the convex hullwhereas those obtained from lattice decoration extendto energies far above the convex hull. It is alsoobserved that a larger fraction of the experimentallyderived materials are dynamically stable. There are,however, well known examples of van der Waalsbonded structures where the monolayer undergoes asigniﬁcant lattice distortion, which will manifest itselfas a dynamical instability in the present context. Forexample, bulk MoS exists in van der Waals bondedstructures composed of either 2H-MoS or 1T-MoS layers, but a monolayer of the 1T phase undergoesa structural deformation involving a doubling of theunit cell[59] and is thus categorized as dynamicallyunstable by the C2DB workﬂow. The dynamicallystable materials derived from parent bulk structures ecent Progress of the Computational 2D Materials Database (C2DB) (Pauling scale) BiClTeBiTeI SbTeClCrSTeMoSSe MoSTe AsTeBr

H-phaseT-phase V a c uu m l e v e l s h i f t [ e V ] Vacuum level shift [eV] R a s hb a e n e r g y [ m e V ] k BiSeI BiSI BiSeClBiTeI WSTeMoSTeBiClSeBiSeBr E [ m e V ] E [ e V ] z [ Å ] Δ χ = χ X – χ Y E R Figure 10.

Left: Correlation between the electronegativity diﬀerence of X and Y in MXY Janus monolayers and the vacuum levelshift across the layer. Right: Correlation between the Rashba energy and the vacuum level shift. Structures in the H-phase (e.g.MoSSe) are shown in black while structures in the T-phase (e.g. BiTeI) are shown in orange. The linear ﬁt has a slope 1 .

35 eV / ∆ χ (Pauling scale). The insets show the deﬁnition of the vacuum level shift and the Rashba energy, respectively. Modiﬁed from [54]. in the ICSD and COD may serve as a useful subsetof the C2DB that are likely to be exfoliable fromknown compounds and thus facilitate experimentalveriﬁcation. As a ﬁrst application the subset hasbeen used to search for magnetic 2D materials, whichresulted in a total of 85 ferromagnets and 61 anti-ferromagnets [60]. The C2DB is concerned with the properties ofcovalently bonded monolayers (see discussion ofdimensionality ﬁltering in Sec. 2.2). However,multilayer structures composed of two or moreidentical monolayers are equally interesting andoften have properties that deviate from those ofthe monolayer. In fact, the synthesis of layeredvdW structures with a controllable number oflayers represents an interesting avenue for atomic-scale materials design. Several examples of novelphenomena emerging in layered vdW structures havebeen demonstrated including direct-indirect band gaptransitions in MoS2 [61, 62], layer-parity selectiveBerry curvatures in few-layer WTe [63], thickness-dependent magnetic order in CrI [64, 65], andemergent ferroelectricity in bilayer hBN[66].As a ﬁrst step towards a systematic explorationof multilayer 2D structures, the C2DB has beenused as basis for generating homobilayers in variousstacking conﬁgurations and subsequently computing their properties following a modiﬁed version of theC2DB monolayer workﬂow. Speciﬁcally, the moststable monolayers (around 1000) are combined intobilayers by applying all possible transformations(unit cell preserving point group operations andtranslations) of one layer while keeping the otherﬁxed. The candidate bilayers generated in this wayare subject to a stability analysis, which evaluates thebinding energy and optimal interlayer distance basedon PBE-D3 total energy calculations keeping the atomsof the monolayers ﬁxed in their PBE relaxed geometry,see Fig. 12 and Table 3.The calculated interlayer binding energies aregenerally in the range from a few to a hundredmeV/˚A and interlayer distances range from 1.5˚A to3.8˚A. A scatter plot of preliminary binding energiesand interlayer distances is shown in Fig. 13. Theanalysis of homobilayers provides an estimate of theenergy required to peel a monolayer oﬀ a bulkstructure. In particular, the binding energy for themost stable bilayer conﬁguration provides a measureof the exfoliation energy of the monolayer. This keyquantity is now available for all monolayers in theC2DB, see Sec. 5.1. The C2DB is concerned with the properties of 2Dmaterials in their pristine crystalline form. However,as is well known the perfect crystal is an idealized ecent Progress of the Computational 2D Materials Database (C2DB) Energy above convex hull [eV] N u m b e r o f m a t e r i a l s Known in 3D bulk phase

Dynamically stableDynamically unstable0.25 0.00 0.25 0.50 0.75 1.00 1.25 1.50

Energy above convex hull [eV] N u m b e r o f m a t e r i a l s Entire C2DB

Dynamically stableDynamically unstable

Figure 11.

Distribution of energies above the convex hull forthe 2D materials extracted from bulk compounds in ICSD andCOD (top) and for the entire C2DB including those constructedfrom combinatorial lattice decoration (bottom). Dynamicallystable materials are indicated in blue. model of real materials, which always contain defectsin smaller or larger amounts depending on the intrinsicmaterials properties and growth conditions. Crystaldefects often have a negative impact on physicalproperties, e.g. they lead to scattering and lifetime-reduction of charge carriers in semiconductors.However, there are also important situations wheredefects play a positive enabling role, e.g. indoping of semiconductors, as color centers for photonemission[67, 68] or as active sites in catalysis.To reduce the gap between the pristine modelmaterial and real experimentally accessible samples,a systematic evaluation of the basic properties ofthe simplest native point defects in a selected subsetof monolayers from the C2DB has been initiated.The monolayers are selected based on the stabilityof the pristine crystal. Moreover, only non-magneticsemiconductors with a PBE band gap satisfying E gap > .

00 3 .

25 3 .

50 3 .

75 4 .

00 4 . Interlayer distance [˚A] − . − . − . − . − . − . − . − . B i nd i n g e n e r g y [ e V / ˚A ] MoS - AA Figure 12.

An illustration of the optimization of the interlayer(IL) distance for MoS in the AA stacking. The black crossesare the points sampled by the optimization algorithm while theblue curve is a spline interpolation of the black crosses. Theinset shows the MoS AA stacking and the deﬁnition of the ILdistance is indicated with a black double-sided arrow.1 . . . . . . Interlayer distance [˚A] − − B i nd i n g e n e r g y [ e V / ˚A ] MoS C BN Figure 13.

Scatter plot of the calculated interlayer distanceand binding energies of (homo)bilayers of selected materialsfrom C2DB. A few well known materials are highlighted: MoS ,graphene (C ), and hexagonal boron-nitride (BN). The bilayerbinding energies provide an estimate of the monolayer exfoliationenergies, see Sec. 5.1 ecent Progress of the Computational 2D Materials Database (C2DB) Figure 14.

Overview of some of the properties included in the 2D defect database project for the example host material CH Si. (a)The supercell used to represent the defects (here a Si vacancy). The supercell is deliberately chosen to break the symmetry of thehost crystal lattice. (b) Formation energies of a C vacancy (green) and C-Si substitutional defect (purple). (c) Energy and orbitalsymmetry of the localized single-particle states of the V Si defect for both spin channels (left and right). The Fermi level is shown bythe dotted line. (d) Schematic excited state conﬁguration energy diagram. The transitions corresponding to the vertical absorptionand the zero-phonon emission are indicated. qubits. Following these selection criteria around 300monolayers are identiﬁed and their vacancies andintrinsic substitutional defects are considered, yieldinga total of about 1500 defect systems.Each defect system is subject to the sameworkﬂow, which is brieﬂy outlined below. Toenable point defects to relax into their lowest energyconﬁguration, the symmetry of the pristine host crystalis intentionally broken by the chosen supercell, seeFig. 14 (a). In order to minimize defect-defectinteraction, supercells are furthermore chosen suchthat the minimum distance between periodic imagesof defects is larger than 15 ˚A. Unique point defects arecreated based on the analysis of equivalent Wyckoﬀpositions for the host material. To illustrate some ofthe properties that will feature in the upcoming pointdefect database, we consider the speciﬁc example ofmonolayer CH Si. First, the formation energy[69, 70] of a givendefect is calculated from PBE total energies. Next,Slater-Janak transition state theory is used to obtainthe charge transition levels[71, 72]. By combiningthese results, one obtains the formation energy of thedefect in all possible charge states as a function ofthe Fermi level. An example of such a diagram isshown in Fig. 14 (b) for the case of the V C andC Si defects in monolayer CH Si. For each defect andeach charge state, the PBE single-particle energy leveldiagram is calculated to provide a qualitative overviewof the electronic structure. A symmetry analysis[73] isperformed for the defect structure and the individualdefect states lying inside the band gap. The energylevel diagram of the neutral V Si defect in CH Si isshown in Fig. 14 (c), where the defect states are labeledaccording to the irreducible representations of the C s point group. ecent Progress of the Computational 2D Materials Database (C2DB) Si, the DO-MOMmethod yields E ZPL = 3 .

84 eV, λ reorggs = 0 .

11 eV and λ reorgexc = 0 .

16 eV. For systems with large electron-phonon coupling (i.e. Huang-Rhys factor >

1) a one-dimensional approximation for displacements along themain phonon mode is used to produce the conﬁgurationcoordinate diagram (see Fig. 14 (d)). In additionto the ZPL energies and reorganization energies,the Huang-Rhys factors, photoluminescence spectrumfrom the 1D phonon model, hyperﬁne coupling andzero ﬁeld splitting are calculated.

5. New properties in the C2DB

This section reports on new properties that havebecome available in the C2DB since the ﬁrst release.The employed computational methodology is describedin some detail and results are compared to theliterature where relevant. In addition, some interestingproperty correlations are considered along with generaldiscussions of the general signiﬁcance and potentialapplication of the available data.

The exfoliation energy of a monolayer is estimated asthe binding energy of its bilayer in the most stablestacking conﬁguration (see also Sec. 4.3). Thebinding energy is calculated using the PBE+D3 xc-functional[75] with the atoms of both monolayers ﬁxedin the PBE relaxed geometry. Table 3 comparesexfoliation energies obtained in this way to values fromMounet et al. [11] for a representative set of monolayers.

For all monolayers we calculate the net charge onthe individual atoms using the Bader partitioningscheme[76]. The analysis is based purely on the Material SG PBE+D3 DF2 rVV10MoS P-6m2 28.9 21.6 28.8MoTe P-6m2 30.3 25.2 30.4ZrNBr Pmmn 18.5 10.5 18.5C P6/mmm 18.9 20.3 25.5P Pmna 21.9 38.4 30.7BN P-6m2 18.9 19.4 24.4WTe P-6m2 32.0 24.7 30.0PbTe P3m1 23.2 27.5 33.0

Table 3.

Exfoliation energies for selected materials calculatedwith the PBE+D3 xc-functional as described in Sec. 4.3 andcompared with the DF2 and rVV10 results from Ref. [11]. Thespacegroups are indicated in the column ”SG”. All numbers arein units of meV/˚A . electron density, which we calculate from the PAWpseudo density plus compensation charges using thePBE xc-functional. Details of the method and itsimplementation can be found in Tang et al. [77]. InSec. 5.4 we compare and discuss the relation betweenBader charges and Born charges. The spontaneous polarization ( P s ) of a bulk materialis deﬁned as the charge displacement with respect tothat of a reference centrosymmetric structure [78, 79].Ferroelectric materials exhibit a ﬁnite value of P s thatmay be switched by an applied external ﬁeld andhave attracted a large interest for a wide range ofapplications [80, 81, 82].The spontaneous polarization in bulk materialscan be regarded as electric dipole moment per unitvolume, but in contrast to the case of ﬁnite systemsthis quantity is ill-deﬁned for periodic crystals [78].Nevertheless, one can deﬁne the formal polarizationdensity P = 12 π eV (cid:88) l φ l a l (5)where a l (with l ∈ { , , } ) are the lattice vectorsspanning the unit cell, V is the cell volume and e isthe elementary charge. φ l is the polarization phasealong the lattice vector deﬁned by φ l = (cid:88) i Z i b l · u i − φ elec l (6)where b l is the reciprocal lattice vector satisfying b l · R l = 2 π and u i is the position of nucleus i ecent Progress of the Computational 2D Materials Database (C2DB) − − λ − . − . − . − . . . . . . P o l a r i z a t i o n [ n C / m ] − . − . . . . E n e r g y [ e V ] PolarizationEnergy

Figure 15.

Depicted in the blue plot is the formalpolarization calculated along the adiabatic path for GeSe,using the methods described in the main text. The orangeplot shows the energy potential along the path as well asoutside. Figure inset: The structure of GeSe in the 2 non-centrosymmetric conﬁgurations corresponding to - P s and P s andthe centrosymmetric conﬁguration. with charge eZ i . The electronic contribution to thepolarization phase is deﬁned as φ elec l = 1 N k ⊥ b l Im (cid:88) k ∈ BZ ⊥ b l × ln N k (cid:107) b l − (cid:89) j =0 det occ (cid:20)(cid:68) u n k + jδ k (cid:12)(cid:12)(cid:12) u m k +( j +1) δ k (cid:69)(cid:21) , (7)where BZ ⊥ b l = { k | k · b l = 0 } is a plane of k -points orthogonal to b l , δ k is the distance betweenneighbouring k-points in the b l direction and N k (cid:107) b l ( N k ⊥ b l ) is the number of k -points along (perpendicularto) the b l direction. These expression generalizestraightforwardly to 2D.The formal polarization is only well-deﬁnedmodulo e R n /V where R n is any lattice vector.However, changes in polarization are well deﬁned andthe spontaneous polarization may thus be obtained by P s = (cid:90) d P ( λ ) dλ dλ, (8)where λ is a dimensionless parameter that deﬁnes anadiabatic structural path connecting the polar phase( λ = 1) with a non-polar phase ( λ = 0).The methodology has been implemented in GPAWand used to calculate the spontaneous polarizationof all stable materials in the C2DB with a PBEband gap above 0.01 eV and a polar space groupsymmetry. For each material, the centrosymmetricphase with smallest atomic displacement from the polar phase is constructed and relaxed under theconstraint of inversion symmetry. The adiabaticpath connecting the two phases is then used tocalculate the spontaneous polarization using Eqs. (5)-(8). An example of a calculation for GeSe isshown in Fig. 15 where the polarization along thepath connecting two equivalent polar phases via thecentrosymmetric phase is shown together with the totalenergy. The spontaneous polarization obtained fromthe path is 39 . The Born charge of an atom a at position u a in a solidis deﬁned as Z aij = Ve ∂P i ∂u aj (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E =0 . (9)It can be understood as an eﬀective charge assignedto the atom to match the change in polarization indirection i when its position is perturbed in direction j .Since the polarization density and the atomic positionare both vectors, the Born charge of an atom is arank-2 tensor. The Born charge is calculated as aﬁnite diﬀerence and relies on the Modern theory ofpolarization [84] for the calculation of polarizationdensities, see Ref. [85] for more details. The Borncharge has been calculated for all stable materials inC2DB with a ﬁnite PBE band gap.It is of interest to examine the relation betweenthe Born charge and the Bader charge (see Sec. 5.2).In materials with strong ionic bonds one would expectthe charges to follow the atoms. On the other hand, incovalently bonded materials the hybridization patternand thus the charge distribution, depends on the atompositions in a complex way, and the idea of chargesfollowing the atom is expected to break down. Inagreement with this idea, the (in-plane) Born chargesin the strongly ionic hexagonal boron-nitride ( ± . e for B and N, respectively) are in good agreement withthe calculated Bader charges ( ± . e ). In contrast,(the in-plane) Born charges in MoS ( − . e and 0 . e for Mo and S, respectively) deviate signiﬁcantly fromthe Bader charges (1 . e and − . e for Mo and S,respectively). In fact, the values disagree even on thesign of the charges underlining the non-intuitive natureof the Born charges in covalently bonded materials.Note that the out-of-plane Born charges nevermatch the Bader charges, even for strongly ionicinsulators, and are consistently smaller in valuethan the in-plane components. The smaller out-of-plane values are consistent with the generally smallerout-of-plane polarisability of 2D materials (for bothelectronic and phonon contributions) and agrees with ecent Progress of the Computational 2D Materials Database (C2DB) − − Bader charge [e] − − − − B o r n c h a r g e ( T r ( Z ) / ) [ e ] . . . . . I o n i c i t y Figure 16.

Born charges, Tr( Z ) /

3, vs. Bader charges for3025 atoms in the 585 materials for which the Born charges arecalculated. The colors indicate the ionicity of the atoms (seemain text). the intuitive expectation that it is more diﬃcult topolarize a 2D material in the out-of-plane direction ascompared to the in-plane direction.Fig. 16 shows the average of the diagonal ofthe Born charge tensor, Tr( Z a ) /

3, plotted against theBader charges for all 585 materials in the C2DB forwhich the Born charges have been computed. The datapoints have been colored according to the ionicity ofthe atom a deﬁned as I ( a ) = | χ a − (cid:104) χ (cid:105)| , where χ a and (cid:104) χ (cid:105) are the Pauling electronegativity of atom a andthe average electronegativity of all atoms in the unitcell, respectively. The ionicity is thus a measure of thetendency of an atom to donate/accept charge relativeto the average tendency of atoms in the material. Itis clear from Fig. 16 that there is a larger propensityfor the Born and Bader charges to match in materialswith higher ionicity.Fig. 17 plots the average (in-plane) Born chargeand the Bader charge versus the band gap. It is clearthat large band gap materials typically exhibit integerBader charges, whereas there is no clear correlationbetween the Born charge and the band gap. The original C2DB provided the frequency dependentpolarisability computed in the random phase approx-imation (RPA) with inclusion of electronic interbandand intraband (for metals) transitions[6]. However,phonons carrying a dipole moment (so-called infrared(IR) active phonons) also contribute to the polariz-ability at frequencies comparable to the frequency ofoptical phonons. This response is described by the IRpolarizability,

Band gap [eV] C h a r g e ( m e a n a b s o l u t e v a l u e ) [ e ] Born chargeBader charge

Figure 17.

Bader and in-plane Born charges vs. band gap.0 100 200 300 400 500

Energy [meV] − P o l a r i z a b ili t y [ ˚A ] α ∞ imagreal Figure 18.

Total polarizability, including both electronsand phonons, of monolayer hBN in the infrared frequencyregime. The resonance at around 180 meV is due to theΓ-point longitudinal optical phonon. At energies above allphonon frequencies (but below the band gap) the polarizabilityis approximately constant and equal to the static limit of theelectronic polarizability, α ∞ . α IR ( ω ) = e A Z T M − / (cid:88) i d i d Ti ω i − ω − iγω  M − / Z , (10)where Z and M are matrix representations of the Borncharges and atomic masses, ω i and d i are eigenvectorsand eigenvalues of the dynamical matrix, A is thein-plane cell area and γ is a broadening parameterrepresenting the phonon lifetime and is set to 10meV. The total polarizability is then the sum of theelectronic polarizability and the IR polariability.The new C2DB includes the IR polarisability ofall monolayers for which the Born charges have been ecent Progress of the Computational 2D Materials Database (C2DB) The piezoelectric eﬀect is the accumulation of charges,or equivalently the formation of an electric polarisa-tion, in a material in response to an applied mechanicalstress or strain. It is an important material character-istic with numerous scientiﬁc and technological appli-cations in sonar, microphones, accelerometers, ultra-sonic transducers, energy conversion etc [86, 87]. Thechange in polarization originates from the movementof positive and negative charge centers as the materialis deformed.Piezoelectricity can be described by the (proper)piezoelectric tensor c ijk with i, j, k ∈ { x, y, z } , given by[88] c ijk = e πV (cid:88) l ∂φ l ∂(cid:15) jk a li . (11)which diﬀers from Eq. (5) only by a derivative of thepolarization phase with respect to the strain tensor (cid:15) jk .Note that c ijk does not depend on the chosen branchcut. The piezoelectric tensor is a symmetric tensor withat most 18 independent components. Furthermore,the point group symmetry restricts the number ofindependent tensor elements and their relationshipsdue to the well-known Neumann’s principle [89]. Forexample, monolayer MoS with point group D h , hasonly one non-vanishing independent element of c ijk .Note that c ijk vanishes identically for centrosymmetricmaterials. Using a ﬁnite-diﬀerence technique with aﬁnite but small strain (1% in our case), Eq. (11) hasbeen used to compute the proper piezoelectric tensorfor all non-centrosymmetric materials in the C2DBwith a ﬁnite band gap. Table 4 shows a comparison ofthe piezoelectric tensors in the C2DB with literaturefor a selected set of monolayer materials. Goodagreement is obtained for all these materials. For all materials in the C2DB exhibiting a direct bandgap below 1 eV, the k -space Berry phase spectrum ofthe occupied bands has been calculated from the PBEwave functions. Speciﬁcally, a particular k -point iswritten as k b + k b and the Berry phases γ n ( k )of the occupied states on the path k = 0 → k = 1is calculated for each value of k . The connectivity ofthe Berry phase spectrum determines the topologicalproperties of the 2D Bloch Hamiltonian [92, 93]. Material Exp. Theory [90] C2DBBN - 0.14 0.13MoS - 0.39 0.38MoTe - 0.54 0.48WS - 0.25 0.24WSe - 0.27 0.26WTe - 0.34 0.34 Table 4.

Comparison of computed piezoelectric tensor versusexperimental values and previous calculations for hexagonal BNand a selected set of TMDs (space group 187). All number arein units of nC/m. Experimental data for MoS is obtained fromRef. [91]. The calculated Berry phase spectra of the relevantmaterials are available for visual inspection on theC2DB webpage. Three diﬀerent topological invariantshave been extracted from these spectra and arereported in the C2DB: 1) The Chern number, C , takesan integer value and is well deﬁned for any gapped2D material. It determines the number of chiral edgestates on any edge of the material. For any non-magnetic material the Chern number vanishes due totime-reversal symmetry. It is determined from theBerry phase spectrum as the number of crossings atany horizontal line in the spectrum. 2) The mirrorChern number, C M , deﬁned for gapped materialswith a mirror plane in the atomic layer[94]. Forsuch materials, all states may be chosen as mirroreigenstates with eigenvalues ± i and the Chern numbers C ± can be deﬁned for each mirror sector separately.For a material with vanishing Chern number, themirror Chern number is deﬁned as C M = ( C + − C − ) / Z invariant, ν , which can take the values 0 and1, is deﬁned for materials with time-reversal symmetry.Materials with ν = 1 are referred to as quantum spinHall insulators and exhibit helical edge states at anytime-reversal conserving edge. It is determined fromthe Berry phase spectrum as the number of crossingpoints modulus 2 at any horizontal line in the interval k ∈ [0 , / C , C M and ν as well as a trivial insulator.The four materials are: OsCl (space group 147) -a Chern insulator with C = 1, OsTe (space group14) - a mirror crystalline insulator with C M = 2, SbI ecent Progress of the Computational 2D Materials Database (C2DB) − . . k π π γ n › S x ﬁ / − . . k π π γ n › S z ﬁ / − . . k π π γ n › S z ﬁ / − . . k π π γ n › S z ﬁ / Figure 19.

Berry phase spectra of the Chern insulator OsCl (top left), the crystalline topological insulator OsTe (top right), thequantum spin Hall insulator SbI (lower left) and the trivial insulator BiITe (lower right). (spacegroup 1) - a quantum spin Hall insulator with ν = 1 and BiITe (spacegroup 156) - a trivial insulator.Note that a gap in the Berry phase spectrum alwaysimplies a trivial insulator.In Ref. [95] the C2DB was screened for materialswith non-trivial topology. At that point it wasfound that the database contained 7 Chern insulators,21 mirror crystalline topological insulators and 48quantum spin Hall insulators. However, that does notcompletely exhaust the the topological properties ofmaterials in the C2DB. In particular, there may bematerials that can be topologically classiﬁed based oncrystalline symmetries other than the mirror plane ofthe layer. In addition, second order topological eﬀectsmay be present in certain materials, which imply thatﬂakes will exhibit topologically protected corner states.Again, the Berry phase spectra may be used to unravelthe second order topology by means of nested Wilsonloops [96]. The general C2DB workﬂow described in Secs. 2.1-2.3 will identify the ferromagnetic ground state of a material and apply it as starting point for subsequentproperty calculations, whenever it is more stable thanthe spin-paired ground state. In reality, however, theferromagnetic state is not guaranteed to comprise themagnetic ground state. In fact, anti-ferromagneticstates often have lower energy than the ferromagneticone, but in general it is non-trivial to obtain the truemagnetic ground state. We have chosen to focus on theferromagnetic state due to its simplicity and becauseits atomic structure and stability are often very similarto those of other magnetic states. Whether or not theferromagnetic state is the true magnetic ground stateis indicated by the nearest neighbor exchange couplingconstant as described below.When investigating magnetic materials the ther-modynamical properties (for example the critical tem-peratures for ordering) are of crucial interest. In twodimensions the Mermin-Wagner theorem[97] comprisesan extreme example of the importance of thermal ef-fects since it implies that magnetic order is only pos-sible at T = 0 unless the spin-rotational symmetry isexplicitly broken. The thermodynamic properties can-not be accessed directly by DFT. Consequently, mag-netic models that capture the crucial features of mag- ecent Progress of the Computational 2D Materials Database (C2DB) H = − J (cid:88) (cid:104) ij (cid:105) S i · S j − λ (cid:88) (cid:104) ij (cid:105) S zi S zj − A (cid:88) i (cid:0) S zi (cid:1) (12)where J is the nearest neighbor exchange constant, λ isthe nearest neighbor anisotropic exchange constant and A measures the strength of single-ion anisotropy. Wealso neglect oﬀ-diagonal exchange coupling constantsthat give rise to terms proportional to S xi S yj , S yi S zj and S zi S xj . The out-of-plane direction has been chosenas z and (cid:104) ij (cid:105) implies that for each site i we sumover all nearest neighbor sites j . The parameters J , λ and A may be obtained from an energy mappinganalysis involving four DFT calculations with diﬀerentspin conﬁgurations[99, 60, 100]. The thermodynamicproperties of the resulting ”ﬁrst principles Heisenbergmodel” may subsequently be analysed with classicalMonte Carlo simulations or renormalized spin wavetheory [101, 36].The C2DB provides the values of J , λ , and A as well as the number of nearest neighbors N nn andthe maximum eigenvalue of S z ( S ), which is obtainedfrom the total magnetic moment per atom in theferromagnetic ground state (rounded to nearest half-integer for metals). These key parameters facilitateeasy post-processing analysis of thermal eﬀects on themagnetic structure. In Ref. [102] such an analysiswas applied to estimate the critical temperature of allferromagnetic materials in the C2DB based on a modelexpression for T C and the parameters from Eq. (12).For metals, the Heisenberg parameters availablein C2DB should be used with care because theHeisenberg model is not expected to provide anaccurate description of magnetic interactions in thiscase. Nevertheless, even for metals the sign andmagnitude of the parameters provide an importantqualitative measure of the magnetic interactions thatmay be used to screen and select materials for moredetailed investigations of magnetic properties.A negative value of J implies the existence of ananti-ferromagnetic state with lower energy than theferromagnetic state used in C2DB. This parameter isthus crucial to consider when judging the stabilityand relevance of a material classiﬁed as magneticin C2DB (see Sec. 2.5). Fig. 20 shows thedistribution of exchange coupling constants (weightedby S ) of the magnetic materials in the C2DB. Thedistribution is slightly skewed to the positive side indicating that ferromagnetic order is more commonthan anti-ferromagnetic order.The origin of magnetic anisotropy may stem fromeither single-ion anisotropy or anisotropic exchangeand it is in general diﬃcult a priori to determine, whichmechanism is most important. There is, however,a tendency in the literature to neglect anisotropicexchange terms in a Heisenberg model descriptionof magnetism and focus solely on the single-ionanisotropy. In Fig. 20 we show a scatter plot of theanisotropy parameters A and λ for the ferromagneticmaterials ( J > A (2 S −

1) + λN nn > A < A (2 S −

1) + λN nn >

0. This is in fact the case for 11ferromagnetic insulators and 31 ferromagnetic metalsin the C2DB.

Raman spectroscopy is an important technique usedto probe the vibrational modes of a solid (ormolecule) by means of inelastic scattering of light[103]. In fact, Raman spectroscopy is the dominantmethod for characterising 2D materials and canyield detailed information about chemical composition,crystal structure and layer thickness. There existseveral diﬀerent types of Raman spectroscopies thatdiﬀer mainly by the number of photons and phononsinvolved in the scattering process [103]. The ﬁrst-order Raman process, in which only a single phonon isinvolved, is the dominant scattering process in sampleswith low defect concentrations.In a recent work, the ﬁrst-order Raman spec-tra of 733 monolayer materials from the C2DB werecalculated, and used as the basis for an automaticprocedure for identifying a 2D material entirely fromits experimental Raman spectrum[104]. The Ra-man spectrum is calculated using third-order pertur-bation theory to obtain the rate of scattering pro-cesses involving creation/annihilation of one phononand two photons, see Ref. [104] for details. Thelight ﬁeld is written as F ( t ) = F in u in exp( − iω in t ) + F out u out exp( − iω out t )+c.c., where F in / out and ω in / out denote the amplitudes and frequencies of the in-put/output electromagnetic ﬁelds, respectively. In ad-dition, u in / out = (cid:80) i u i in / out e i are the correspondingpolarization vectors, where e i denotes the unit vec-tor along the i -direction with i ∈ { x, y, z } . Using thislight ﬁeld, the ﬁnal expression for the Stokes Raman in-tensity involving scattering events by only one phonon ecent Progress of the Computational 2D Materials Database (C2DB)

75 50 25 0 25 50 75 JS [meV] N u m b e r o f m a t e r i a l s MetalsInsulators20 10 0 10 20 λNS [meV] A ( S − ) [ m e V ] MetalsInsulators

Figure 20.

Top: Distribution of exchange coupling constants inC2DB. Bottom: Single-ion anisotropy A vs anisotropic exchange λ for ferromagnetic materials with S > /

2. The shaded areaindicates the part of parameter space where the model (Eq. (12))does not yield an ordered state at ﬁnite temperatures. reads [104] I ( ω ) = I (cid:88) ν n ν + 1 ω ν (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) ij u i in R νij u j out (cid:12)(cid:12)(cid:12)(cid:12) δ ( ω − ω ν ) . (13)Here, I is an unimportant constant (since Ramanspectra are always reported normalized), and n ν isobtained from the Bose–Einstein distribution, i.e. n ν ≡ (exp (cid:2) (cid:126) ω ν /k B T (cid:3) − − at temperature T for aRaman mode with energy (cid:126) ω ν . Note that only phononsat the Brillouin zone center (with zero momentum)contribute to the one-phonon Raman processes dueto momentum conservation. In Eq. (13), R νij is theRaman tensor for phonon mode ν , which involveselectron-phonon and dipole matrix elements as wellas the electronic transition energies and the incidentexcitation frequency. Eq. (13) has been used tocompute the Raman spectra of the 733 most stable,non-magnetic monolayers in C2DB for a range ofexcitation frequencies and polarization conﬁgurations.Note that the Raman shift (cid:126) ω is typically expressed in cm − with 1 meV equivalent to 8.0655 cm − . Inaddition, for generating the Raman spectra, we haveused a Gaussian [ G ( ω ) = ( σ √ π ) − exp (cid:8) ( − ω / σ ) (cid:9) ]with a variance σ = 3 cm − to replace the Diracdelta function, which accounts for the inhomogeneousbroadening of phonon modes.As an example, Fig. 21 shows the calculatedRaman spectrum of monolayer MoS and the Janusmonolayer MoSSe (see Sec. 4.1). ExperimentalRaman spectra extracted from Ref. [52] are shownfor comparison. For both materials, good agreementbetween theory and experiment is observed for thepeak positions and relative amplitudes of the mainpeaks. The small deviations can presumably beattributed to substrate interactions and defects in theexperimental samples as well as the neglect of excitoniceﬀects in the calculations. The qualitative diﬀerencesbetween the Raman spectra can be explained by thediﬀerent point groups of the materials ( C v and D h ,respectively), see Ref. [104]. In particular, the lowersymmetry of MoSSe results in a lower degeneracy of itsvibrational modes leading to more peaks in the Ramanspectrum. Nonlinear optical (NLO) phenomena such as harmonicgeneration, Kerr, and Pockels eﬀects are of great tech-nological importance for lasers, frequency converters,modulators, etc. In addition, NLO spectroscopy hasbeen extensively employed to obtain insight into mate-rials properties [105] that are not accessible by e.g. lin-ear optical spectroscopy. Among numerous nonlinearprocesses, second-harmonic generation (SHG) has beenwidely used for generating new frequencies in lasers aswell as identifying crystal orientations and symmetries.Recently, the SHG spectrum was calculated for375 non-magnetic, non-centrosymmetric semiconduct-ing monolayers of the C2DB, and multiple 2D mate-rials with giant optical nonlinearities were identiﬁed[106]. In the SHG process, two incident photons at fre-quency ω generate an emitted photon at frequency of2 ω . Assume that a mono-harmonic electric ﬁeld writ-ten F ( t ) = (cid:80) i F i e i e − iωt +c.c. is incident on the ma-terial, where e i denotes the unit vector along direction i ∈ { x, y, z } . The electric ﬁeld induces a SHG polar-ization density P (2) , which can be obtained from thequadratic susceptibility tensor χ (2) ijk , P (2) i ( t ) = (cid:15) (cid:88) jk χ (2) ijk ( ω, ω ) F i F j e − iωt + c.c. , (14)where (cid:15) denotes the vacuum permittivity. χ (2) ijk is asymmetric (due to intrinsic permutation symmetry i.e. χ (2) ijk = χ (2) ijk ) rank-3 tensor with at most 18 independentelements. Furthermore, similar to the piezoelectric ecent Progress of the Computational 2D Materials Database (C2DB)

200 300 400 500

Raman shift [cm − ] E A MoS Exp.Theory

200 300 400 500

Raman shift [cm − ] E A E A MoSSe I n t e n s i t y [ a . u . ] Figure 21.

Comparison of the calculated and experimental (extracted from Ref. [52]) Raman spectrum of MoS (left) and MoSSe(right). The excitation wavelength is 532 nm, and both the polarization of both the incoming and outgoing photons are alongthe y -direction. The Raman peaks are labeled according to the irreducible representations of the corresponding vibrational modes.Adapted from Ref. [104]. tensor, the point group symmetry reduces the numberof independent tensor elements.In the C2DB, the quadratic susceptibility iscalculated using density matrices and perturbationtheory [107, 108] with the involved transition dipolematrix elements and band energies obtained fromDFT. The use of DFT single-particle orbitals impliesthat excitonic eﬀects are not accounted for. Thenumber of empty bands included in the sum overbands was set to three times the number of occupiedbands. The width of the Fermi-Dirac occupationfactor was set to k B T = 50 meV, and a line-shapebroadening of η = 50 meV was used in all spectra.Furthermore, time-reversal symmetry was imposed inorder to reduce the k -integrals to half the BZ. Forvarious 2D crystal classes, it was veriﬁed by explicitcalculation that the quadratic tensor elements fulﬁllthe expected symmetries, e.g. that they all vanishidentically for centrosymmetric crystals.As an example, the calculated SHG spectra formonolayer Ge Se is shown in Fig. 22 (left panel).Monolayer Ge Se has 5 independent tensor elements, χ (2) xxx , χ (2) xyy , χ (2) xzz , χ (2) yyx = χ (2) yxy , and χ (2) zzx = χ (2) zxz , sinceit is a group-IV dichalcogenide with an orthorhombiccrystal structure (space group 31 and point group C v ). Note that, similar to the linear susceptibility, thebulk quadratic susceptibility (with SI units of m/V)is ill-deﬁned for 2D materials (since the volume isambiguous) [106]. Instead, the unambiguous sheet quadratic susceptibility (with SI units of m /V) isevaluated. In addition to the frequency-dependentSHG spectrum, the angular dependence of the static( ω = 0) SHG intensity at normal incidence forparallel and perpendicular polarizations (relative to the incident electric ﬁeld) is calculated, see Fig. 22(right panel). Such angular resolved SHG spectroscopyhas been widely used for determining the crystalorientation of 2D materials. The calculated SHGspectra for all non-vanishing inequivalent polarizationconﬁgurations and their angular dependence, areavailable in the C2DB.Since C2DB has already gathered various materialproperties of numerous 2D materials, it providesa unique opportunity to investigate interrelationsbetween diﬀerent material properties. For example, thestrong dependence of the quadratic optical responseon the electronic band gap was demonstrated on basisof the C2DB data [106]. As another example of auseful correlation, the static quadratic susceptibilityis plotted versus the static linear susceptibility for67 TMDCs (with formula MX , space group 187)in Fig. 23. Note that for materials with severalindependent tensor elements, only the largest is shown.There is a very clear correlation between the twoquantities. This is not unexpected as both thelinear and quadratic optical responses are functionsof the transition dipole moments and transitionenergies. More interestingly, the strength of thequadratic response seems to a very good approximationto be given by a universal constant times thelinear susceptibility to the power of three (ignoringpolarisation indices), i.e. χ (2) (0 , ≈ Aχ (1) (0) , (15)where A is only weakly material dependent. Note thatthis scaling law is also known in classical optics assemi-empirical Miller’s rule for non-resonant quadraticresponses [109], which states that the second order ecent Progress of the Computational 2D Materials Database (C2DB) Pump photon energy [eV] | χ ( ) i j k | [ n m / V ] yyxxyy xxxxzz zzxx y θ = 0 Parallel: | χ (2) θθθ |Perpendicular: | χ (2)( θ + 90) θθ | Figure 22. (Left panel) SHG spectra of monolayer Ge Se , where only non-vanishing independent tensor elements are shown. Thevertical dashed lines mark (cid:126) ω = E g / (cid:126) ω = E g , respectively. The crystal structure of Ge Se structure is shown in the inset.(Right panel) The rotational anisotropy of the static ( ω = 0) SHG signal for parallel (blue) and perpendicular (red) polarizationconﬁgurations with θ deﬁned with respect to the crystal x -axis. s l o p e : max{| χ (1) ij (0)|} [nm] -4 -3 -2 -1 m a x { | χ ( ) i j k ( , ) | } [ n m / V ] WS MoS CrTe Figure 23.

Scatter plot (double log scale) of the static sheetquadratic susceptibility | χ (2) ijk | versus the static sheet linearsusceptibility | χ (1) ij | for 67 TMDCs (with chemical formulaMX and space group 187). A few well known materials arehighlighted. electric susceptibility is proportional to the product ofthe ﬁrst-order susceptibilities at the three frequenciesinvolved.

6. Machine learning properties

In recent years, material scientists have shown greatinterest in exploiting the use of machine learning(ML) techniques for predicting materials propertiesand guiding the search for new materials. ML is the scientiﬁc study of algorithms and statistical modelsthat computer systems can use to perform a speciﬁctask without using explicit instructions but insteadrelying on patterns and inference. Within the domainof materials science, one of the most frequent problemsis the mapping from atomic conﬁguration to materialproperty, which can be used e.g. to screen largematerial spaces in search of optimal candidates forspeciﬁc applications. [110, 111]In the ML literature, the mathematical represen-tation of the input observations is often referred to asa ﬁngerprint. Any ﬁngerprint must satisfy a number ofgeneral requirements. [112] In particular, a ﬁngerprintmust be

Complete:

The ﬁngerprint should incorporate all therelevant input for the underlying problem, i.e.materials with diﬀerent properties should havediﬀerent ﬁngerprints.

Compact:

The ﬁngerprint should contain no ora minimal number of features redundant tothe underlying problem. This includes beinginvariant to rotations, translations and othertransformations that leave the properties of thesystem invariant.

Descriptive:

Materials with similar target valuesshould have similar ﬁngerprints.

Simple:

The ﬁngerprint should be eﬃcient toevaluate. In the present context, this means thatcalculating the ﬁngerprint should be signiﬁcantlyfaster than calculating the target property. ecent Progress of the Computational 2D Materials Database (C2DB) HSE06 gap [eV]N=1059 σ =1.39 − − ∆ H [eV/atom] E B [eV]

100 200 h α i i [A]

100 200 h C ij i [N/m] H S E p [ e V ] − − ∆ H [ e V /a t o m ] N=3381 σ =0.63 − − ∆ H [ e V /a t o m ] E B [ e V ] N=357 σ =0.56 . . . . . E B [ e V ] h α i i [ A ] N=2990 σ =32.14 h α i i [ A ] HSE06 gap [eV] h C i j i [ N / m ] − − ∆ H [eV/atom] E B [eV] h α i i [A] h C ij i [N/m]N=3339 σ =41.83 Figure 24.

Pair-plot of selected properties from C2DB. The diagonal contains the single property histograms. Below the diagonalare two-property scatter plots showing the correlation between properties and above the diagonal are two-property histograms.properties include the HSE06 band gap, the PBE heat of formation (∆ H ), the exciton binding energy ( E B ) calculated from theBethe-Salpeter equation (BSE), the in-plane static polarisability calculated in the random phase approximation (RPA) and averagedover the x and y polarisation directions ( (cid:104) α i (cid:105) ), and the in-plane Voigt modulus ( (cid:104) C ii (cid:105) ) deﬁned as ( C + C + 2 C ), where C ij is a component of the elastic stiﬀness tensor. Several types of atomic-level materials ﬁngerprintshave been proposed in the literature, including generalpurpose ﬁngerprints based on atomistic properties[113,114] possibly encoding information about the atomicstructure, i.e. atomic positions[115, 112, 116], andspecialized ﬁngerprints tailored for speciﬁc applications(materials/properties)[117, 118]. The aim of this section is to demonstrate howthe C2DB may be utilized for ML-based predictionof general materials properties. Moreover, thestudy serves to illustrate the important role of theﬁngerprint for such problems. The 2D materials arerepresented using three diﬀerent ﬁngerprints: twopopular structural ﬁngerprints and a more advanced ecent Progress of the Computational 2D Materials Database (C2DB) H ),the exciton binding energy ( E B ) obtained from themany-body Bethe-Salpeter equation (BSE), the in-plane static polarisability calculated in the randomphase approximation (RPA) averaged over the x and y polarisation directions ( (cid:104) α i (cid:105) ), and the in-planeVoigt modulus ( (cid:104) C ii (cid:105) ) deﬁned as ( C + C + 2 C ),where C ij is a component of the elastic stiﬀness tensorin Mandel notation.To introduce the data, Figure 24 shows pair-plots of the dual-property relations of these properties.The plots in the diagonal show the single-propertyhistograms, whereas the oﬀ-diagonals show dual-property scatter plots below the diagonal andhistograms above the diagonal. Clearly, there are onlyweak correlations between most of the properties, withthe largest degree of correlation observed between theHSE06 gap and exciton binding energy. The lackof strong correlations motivates the use of machinelearning for predicting the properties.The prediction models are build using the Ewaldsum matrix and many-body tensor representation(MBTR) as structural ﬁngerprints. The Ewaldﬁngerprint is a version of the simple Coulomb matrixﬁngerprint[115] modiﬁed to periodic systems [112].The MBTR encodes ﬁrst, second and third order termslike atomic numbers, distances and angles betweenatoms in the system [116]. As an alternative to thestructural ﬁngerprints, a representation based on thePBE projected density of states (PDOS) is also tested.This ﬁngerprint (to be published) encodes the couplingbetween the PDOS at diﬀerent atomic orbitals and thedistance between atoms. Since this ﬁngerprint requiresa DFT-PBE calculations to be performed, additionalfeatures derivable from the DFT calculation can beadded to the ﬁngerprint. In this study, the PDOSﬁngerprint is amended by the PBE band gap. Thelatter can in principle be extracted from the PDOS, butits explicit inclusion improves performance (see below).A Gaussian process regression using a simpleGaussian kernel with a noise component is used aslearning algorithm. The models are trained using 5-fold cross validation on a training set consisting of 80%of the materials with the remaining 20% held aside astest data. Prior to training the model, the input spaceis reduced to 50 features using principal componentanalysis (PCA). This step is neccesary to reduce thehuge number of features in the MBTR ﬁngerprint to amanageable size. Although this is not required for theEwald and PDOS ﬁngerprints, we perform the samefeature reduction in all cases. The optimal number offeatures depends on the choice of ﬁngerprint, target h α i i [ A ] E B [ e V ] h C i j i [ N / m ] ∆ H [ e V /a t o m ] H S E p [ e V ] . . . . . . M A E / σ EwaldMBTRPDOS

Figure 25.

Prediction scores (MAE normalized to standarddeviation of property values) for the test sets of selectedproperties using a Gaussian process regression. property and learning algorithm, but for consistency50 PCA components are used for all ﬁngerprints andproperties in this study.Figure 25 shows the prediction scores obtained forthe 5 properties using the three diﬀerent ﬁngerprints.The employed prediction score is the mean absoluteerror of the test set normalized by the standarddeviation of the property values (standard deviationsare annotated in the diagonal plots in Fig. 24).In general, the PDOS ﬁngerprint outperforms thestructural ﬁngerprints. The diﬀerence betweenprediction scores is smallest for the static polarisability (cid:104) α i (cid:105) and largest for the HSE06 gap. It should bestressed that although the evaluation of the PBE-PDOS ﬁngerprint is signiﬁcantly more time consumingthan the evaluation of the structural ﬁngerprints, it isstill much faster than the evaluation of all the targetproperties. Moreover, structural ﬁngerprints requirethe atomic structure, which in turns requires a DFTstructure optimization (unless the structure is availableby other means).The HSE06 band gap shows the largest sensitivityto the employed ﬁngerprint. To elaborate on theHSE06 results, Fig. 26 shows the band gap predictedusing each of the three diﬀerent ﬁngerprints plottedagainst the true band gap. The mean absolute errorson the test set is 0.95 eV and 0.74 eV for Ewaldand MBTR ﬁngerprints, respectively, while the PDOSsigniﬁcantly outperforms the other ﬁngerprints witha test MAE of only 0.21 eV. This improvement inprediction accuracy is partly due to the presence of ecent Progress of the Computational 2D Materials Database (C2DB) DFT HSE06 gap [eV] M L H S E p [ e V ] Ewald

Train: MAE=0.82Test: MAE=0.95

DFT HSE06 gap [eV]MBTR

Train: MAE=0.35Test: MAE=0.74

DFT HSE06 gap [eV]PDOS

Train: MAE=0.10Test: MAE=0.21

Figure 26.

ML predicted HSE06 gap values vs. true values for Ewald, MBTR and PDOS ﬁngerprints with MAE’s for train andtest set included. The PDOS is found to perform signiﬁcantly better for the prediction of HSE06 gap. the PBE gap in the PDOS ﬁngerprint. However,our analysis shows that the pure PDOS ﬁngerprintwithout the PBE gap still outperforms the structuralﬁngerprints. Using only the PBE gap as feature resultsin a test MAE of 0.28 eV.The current results show that the precision of ML-based predictions are highly dependent on the type oftarget property and the chosen material representation.For some properties, the mapping between atomicstructure and property is easier to learn while othersmight require more/deeper information, e.g. in termsof electronic structure ﬁngerprints. Our results clearlydemonstrate the potential of encoding electronicstructure information into the material ﬁngerprint, andwe anticipate more work on this relevant and excitingtopic in the future.

7. Summary and outlook

We have documented a number of extensions andimprovements of the Computational 2D MaterialsDatabase (C2DB) made in the period 2018-2020.The new developments include: (i) A reﬁned andmore stringent workﬂow for ﬁltering prospect 2Dmaterials and classifying them according to theircrystal structure, magnetic state and stability. (ii)Improvements of the methodology used to computecertain challenging properties such as the full stiﬀnesstensor, eﬀective masses, G W band structures, andoptical absorption spectra. (iii) New materialsincluding 216 MXY Janus monolayers and 574monolayers exfoliated from experimentally knownbulk crystals. In addition, ongoing eﬀorts tosystematically obtain and characterize bilayers inall possible stacking conﬁgurations as well as pointdefects in the semiconducting monolayers, have been described. (iv) New properties including exfoliationenergies, spontaneous polarisations, Bader charges,piezoelectric tensors, infrared (IR) polarisabilities,topological invariants, magnetic exchange couplings,Raman spectra, and second harmonic generationspectra. It should be stressed that the C2DB willcontinue to grow as new structures and properties arebeing added, and thus the present paper should notbe seen as a ﬁnal report on the C2DB but rather asnapshot of its current state.In addition to the above mentioned improvementsrelating to data quantity and quality, the C2DB hasbeen endowed with a comprehensive documentationlayer. In particular, all data presented on theC2DB website are now accompanied by an informationﬁeld that explains the meaning and representation(if applicable) of the data and details how it wascalculated thus making the data easier to understand,reproduce, and deploy.The C2DB has been produced using the AtomicSimulation Recipes (ASR) in combination with theGPAW electronic structure code and the MyQueuetask and workﬂow scheduling system. The ASR isa newly developed Python-based framework designedfor high-throughput materials computations. Thehighly ﬂexible and modular nature of the ASR andits strong coupling to the well established community-driven ASE project, makes it a versatile framework forboth high- and low-throughput materials simulationprojects. The ASR and the C2DB-ASR workﬂoware distributed as open source code. A detaileddocumentation of the ASR will be published elsewhere.While the C2DB itself is solely concerned withthe properties of perfect monolayer crystals, ongoingeﬀorts focus on the systematic characterisation ofhomo-bilayer structures as well as point defects ecent Progress of the Computational 2D Materials Database (C2DB)

8. Acknowledgments

The Center for Nanostructured Graphene (CNG)is sponsored by the Danish National ResearchFoundation, Project DNRF103. This project hasreceived funding from the European Research Council(ERC) under the European Union’s Horizon 2020research and innovation program grant agreement No773122 (LIMA). T.D. acknowledges ﬁnancial supportfrom the German Research Foundation (DFG ProjectsNo. DE 2749/2-1).

References [1] Kohn W and Sham L J 1965

Physical review

A1133[2] Schwierz F 2010

Nature nanotechnology Science [4] Ferrari A C, Bonaccorso F, Fal’Ko V, Novoselov K S,Roche S, Bøggild P, Borini S, Koppens F H, PalermoV, Pugno N et al.

Nanoscale et al. ACS nano et al.

2D Materials et al. Nature communications et al. Nature

NatureReviews Materials et al. Science et al.

Nature nanotechnology Physical review letters

Nature

Proceedings of theNational Academy of Sciences et al.

Nature

Chemical Society Reviews et al. Scientiﬁcdata Physical reviewletters Physical ReviewB et al. Scientiﬁc reports Physical review letters

Physical review letters arXiv preprint arXiv:2009.12317 [25] Felipe H, Xian L, Rubio A and Louie S G 2020

Naturecommunications Nano letters et al. Nature materials

2D Materials Nature et al.

Nature et al.

Science

Physical Review Materials Nano Letters

2D Materials

2D Mater. The Journal ofPhysical Chemistry C

Journal of Open Source Software https://doi.org/10.21105/joss.01844 [39] Saal J E, Kirklin S, Aykol M, Meredig B and WolvertonC 2013 Jom et al. Apl Materials et al. Computational Materials Science et al. Scientiﬁcdata Phys. Rev. Mater.

Computational Materials Science ecent Progress of the Computational 2D Materials Database (C2DB) Physical Review B npjComputational Materials

2D Materials https://doi.org/10.1088/2053-1583/ab2ef3 [48] Li Y and Heinz T F 2018

2D Mater. https://doi.org/10.1088/2053-1583/aab0cf [49] Hadley L N and Dennison D 1947 JOSA Reviews of ModernPhysics Nature Nanotechnology ACSNano https://doi.org/10.1021/acsnano.7b03186 [53] F¨ul¨op B, Tajkov Z, Pet˝o J, Kun P, Koltai J, Oroszl´anyL, T´ov´ari E, Murakawa H, Tokura Y, Bord´acs S et al.

2D Materials ACS Nano Journal of physics C:Solid state physics Surface science

Acta Crystallogr. Sect. AFound. Crystallogr. Nucleic Acids Res. D420–D427[59] Qian X, Liu J, Fu L and Li J 2014

Science npj Computational Materials Physical review letters

Nano letters et al. Nature Physics Nano letters Scientiﬁc reports arXiv preprint arXiv:2010.06600 [67] Northup T and Blatt R 2014 Nature photonics Photonics vol 3 p 687[69] Zhang S and Northrup J E 1991

Physical review letters Physical Review B Physical Review B Nano letters The Journal ofPhysical Chemistry A

FaradayDiscussions [75] Grimme S, Antony J, Ehrlich S and Krieg H 2010

TheJournal of chemical physics

Atoms in molecules : a quantum theory

The International series of monographs on chemistry ;22 (Oxford: Clarendon Press) ISBN 0198551681[77] Tang W, Sanville E and Henkelman G 2009

Journal ofPhysics: Condensed Matter https://doi.org/10.1088/0953-8984/21/8/084204 [78] Resta R 1992 Ferroelectrics https://doi.org/10.1080/00150199208016065 [79] King-Smith R D and Vanderbilt D 1993

Physical ReviewB (3) URL https://doi.org/10.1103/PhysRevB.47.1651 [80] Zhang S and Yu F 2011 Journal of the American CeramicSociety Journal ofElectroceramics Ferroelectric memories vol 3 (SpringerScience & Business Media)[83] Rangel T, Fregoso B M, Mendoza B S, Morimoto T, MooreJ E and Neaton J B 2017

Phys. Rev. Lett. (6-11) URL https://doi.org/10.1103/PhysRevLett.119.067402 [84] Resta R and Vanderbilt D 2007 Theory of Polarization:A Modern Approach

Physics of Ferroelectrics vol105 (Springer Berlin Heidelberg) pp 31–68 ISBN3540345906[85] Gjerding M N, Cavalcante L S R, Chaves A and ThygesenK S 2020

The Journal of Physical Chemistry C https://doi.org/10.1021/acs.jpcc.0c01635 [86] Ye Z G 2008

Handbook of advanced dielectric, piezoelectricand ferroelectric materials: Synthesis, properties andapplications (Elsevier)[87] Ogawa T 2016

Piezoelectric Materials (BoD–Books onDemand)[88] Vanderbilt D 1999

Journal of Physics and Chem-istry of Solids (2) URL https://doi.org/10.1016/S0022-3697(99)00273-5 [89] Authier A 2003 International tables for crystallography:Volume D: Physical properties of crystals (WileyOnline Library)[90] Duerloo K A N, Ong M T and Reed E J 2012

The Journalof Physical Chemistry Letters https://doi.org/10.1021/jz3012436 [91] Zhu H, Wang Y, Xiao J, Liu M, Xiong S, Wong Z J, Ye Z,Ye Y, Yin X and Zhang X 2015 Nature Nanotechnology https://doi.org/10.1038/nnano.2014.309 [92] Taherinejad M, Garrity K F and Vanderbilt D 2014 Physical Review B Phys. Rev. B Physical Review Letters

Physical Review Materials Physical Review B Physical Review Letters MRS Communications Physical Review B Journal of Physics: CondensedMatter

2D Mater.

2D Materials The Raman Eﬀect: A UniﬁedTreatment of the Theory of Raman Scattering byMolecules (John Wiley & Sons Ltd, Chichester) ISBN0471490288 URL https://onlinelibrary.wiley.com/ ecent Progress of the Computational 2D Materials Database (C2DB) doi/book/10.1002/0470845767 [104] Taghizadeh A, Leﬀers U, Pedersen T G and Thygesen K S2020 Nat. Commun. https://doi.org/10.1038/s41467-020-16529-6 [105] Prylepa A, Reitb¨ock C, Cobet M, Jesacher A, JinX, Adelung R, Schatzl-Linder M, Luckeneder G,Stellnberger K H, Steck T, Faderl J, Stehrer T andStifter D 2018 J. Phys. D: Appl. Phys https://doi.org/10.1088/1361-6463/aa9df4 [106] Taghizadeh A, Thygesen K S and Pedersen T G 2020 arXiv:2010.11596 [cond-mat, physics:physics] ArXiv:2010.11596 URL http://arxiv.org/abs/2010.11596 [107] Aversa C and Sipe J E 1995

Phys. Rev. B (20)14636–14645 URL https://link.aps.org/doi/10.1103/PhysRevB.52.14636 [108] Taghizadeh A, Hipolito F and Pedersen T G 2017 Phys.Rev. B (19) 195413 URL https://link.aps.org/doi/10.1103/PhysRevB.96.195413 [109] Miller R C 1964 Appl. Phys. Lett. https://doi.org/10.1063/1.1754022 [110] Schmidt J, Marques M R G, Botti S and Marques M A L2019 npj Computational Materials

83 ISSN 2057-3960URL https://doi.org/10.1038/s41524-019-0221-0 [111] Zhuo Y, Mansouri Tehrani A and Brgoch J 2018

The Journal of Physical Chemistry Letters Preprint https://doi.org/10.1021/acs.jpclett.8b00124 ) URL https://doi.org/10.1021/acs.jpclett.8b00124 [112] Faber F, Lindmaa A, von Lilienfeld O A and ArmientoR 2015

International Journal of Quantum Chemistry

Preprint https://onlinelibrary.wiley.com/doi/pdf/10.1002/qua.24917 ) URL https://onlinelibrary.wiley.com/doi/abs/10.1002/qua.24917 [113] Ward L, Agrawal A, Choudhary A and Wolverton C 2016 npj Computational Materials Physical review letters

Physical review letters

Preprint )[117] Jørgensen P B, Mesta M, Shil S, Garc´ıa Lastra J M,Jacobsen K W, Thygesen K S and Schmidt M N 2018

The Journal of chemical physics

Chemistry of Materials30