Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems
John A. Keith, Valentin Vassilev-Galindo, Bingqing Cheng, Stefan Chmiela, Michael Gastegger, Klaus-Robert Müller, Alexandre Tkatchenko
John A. Keith,*,† Valentin Vassilev-Galindo,‡ Bingqing Cheng,¶ Stefan Chmiela,§ Michael Gastegger,§ Klaus-Robert Müller,*,∥ and Alexandre Tkatchenko*,‡

†Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, USA
‡Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
¶Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom, and Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
§Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587 Berlin, Germany
∥Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany; Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea; Max-Planck-Institut für Informatik, Saarbrücken, Germany; Google Research, Brain team, Berlin, Germany
E-mail: [email protected]; [email protected]; [email protected]
Abstract

Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We then follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.

Introduction
A lasting challenge in applied physical and chemical sciences has been to answer the question: how can one identify and make chemical compounds or materials that have optimal properties for a given purpose? A substantial part of research in physics, chemistry, and materials science concerns the discovery and characterization of novel compounds that can benefit society, but most advances are still generally attributed to trial-and-error experimentation, which requires significant time and cost. Current global challenges create greater urgency for faster, better, and less expensive research and development efforts. Computational chemistry (CompChem) methods have significantly improved over time, and they promise paradigm shifts in how compounds are fundamentally understood and designed for specific applications.

Machine learning (ML) methods have in the last decades witnessed an unprecedented technological evolution, enabling a plethora of applications, some of which have become daily companions in our lives.
Applications of ML include technological fields such as web search, translation, natural language processing, self-driving vehicles, and control architectures, as well as the sciences, e.g. medical diagnostics, particle physics, nano sciences, bioinformatics, brain-computer interfaces, social media analysis, robotics, and team, social, or board games. These methods have also become popular for accelerating the discovery and design of new materials, chemicals, and chemical processes. At the same time, we have witnessed hype, criticism, and misunderstanding about how ML tools are to be used in chemical research. From this, we see a need for researchers working at the intersection of CompChem+ML to more critically recognize the true strengths and weaknesses of each component in any given study. Specifically, we wanted to review why and how CompChem+ML can provide useful insights into the study of molecules and materials.

While developing this review, we polled the scientific community with an anonymous online survey that asked for questions and concerns regarding the use of ML models with chemistry applications. Respondents raised excellent points, including:

1. ML methods are becoming less understood while they are also more regularly used as black box tools.
2. Many publications show inadequate technical expertise in ML (e.g. inappropriate splitting of training, testing, and validation sets).
3. It can be difficult to compare different ML methods and know which is the best for a particular application, or whether ML should even be used at all.
4. Data quality and context are often missing from ML modeling, and data sets need to be made freely available and clearly explained.

Additionally, when asked about the most exciting active and emerging areas of ML in the next five years, respondents mentioned a wide range of topics, from catalysis discovery, drug and peptide design, and "above the arrow" reaction predictions to generative models that promise to fundamentally transform chemical discovery. When asked about challenges that ML will not surmount in the next five years, respondents mentioned modeling complex photochemical and electrochemical environments, discovering exact exchange-correlation functionals, and completely autonomous reaction discovery. This review will give our perspective on many of these topics.

As context for this review, Figure 1 shows a heatmap depicting the frequency of ML keywords found in scientific articles that also have keywords associated with different American Chemical Society (ACS) technical divisions. Preparing this figure required several steps. First, lists of ML keywords were chosen. Second, lists of keywords were created by perusing ACS division symposia titles from over the past five years. Third, Python scripts used Scopus Application Programming Interfaces (APIs) to identify the number of scientific publications that matched sets of ML and division symposia keywords. Figure 1 elucidates several interesting points. First, the most popular ML approaches across all divisions are clearly neural networks, followed by genetic algorithms and support vector machines/kernel methods. Second, divisions such as physical (PHYS), analytical (ANYL), and environmental (ENVR) are already using diverse sets of ML approaches, while divisions such as inorganic (INOR), nuclear (NUCL), and carbohydrate (CARB) are primarily employing more distinct subsets of approaches, and other divisions such as educational (CHED), history (HIST), law (CHAL), and business-oriented divisions (BMGT and SCHB), i.e.
divisions that produce far fewer scholarly journal articles, are not linking to publications that mention ML. Third, ML has gained prevalence across practically all divisions over time. For further insight, Table 1 lists the top four keywords obtained from recent ACS symposium titles as well as their respective contribution percentage reflected in Figure 1. There, one sees that a handful of keywords can dominate matches in some of the bins, e.g. 'electro', 'sensor', 'protein', and 'plastic'. With any ML application, there will be a risk of imperfect data and/or user bias, but this is a useful launch point to appreciate how and where ML is being used in chemical sciences.
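For readers who want to reproduce this kind of literature analysis, the sketch below shows the general shape of such a query against the Scopus Search API. It is a minimal illustration, not the actual scripts used for Figure 1 (those are linked in the Figure 1 caption below); the query syntax and response field names follow Elsevier's public API documentation as we understand it, and the API key is a placeholder.

```python
import requests

SCOPUS_URL = "https://api.elsevier.com/content/search/scopus"
API_KEY = "YOUR_SCOPUS_API_KEY"  # placeholder; keys are issued by Elsevier

def count_comentions(ml_term: str, division_term: str) -> int:
    """Count publications whose title/abstract/keywords match both terms."""
    query = f'TITLE-ABS-KEY("{ml_term}") AND TITLE-ABS-KEY("{division_term}")'
    resp = requests.get(SCOPUS_URL,
                        params={"query": query, "apiKey": API_KEY, "count": 1})
    resp.raise_for_status()
    # Total hit count reported by the search service
    return int(resp.json()["search-results"]["opensearch:totalResults"])

# e.g. how often "neural network" co-occurs with a CATL division keyword
print(count_comentions("neural network", "catalysis"))
```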
The survey results and literature analysis above showed an opportunity for a tutorial reference to help readers address future research challenges that will require joint applications of CompChem, ML, and chemical and physical intuition (CPI). We will classify concepts in this review using a popular rendition of a "data to wisdom" hierarchy, Figure 2. Scholars have noted shortcomings with similar constructs, but we mean for this figure to reflect scientific progress, starting from data and ending in impact. CompChem, ML, and CPI each bring something to the table since all have different strengths and weaknesses. CPI can lead to knowledge, insight, and wisdom from data and information, but the applicability of CPI may be limited when one is faced with large data sets. Alternatively, CompChem is extraordinarily well-suited for generating high quality data that contain useful information (vide infra). Subsequently, non-linear ML models can provide numerical representations of partial or complete sets of data. The application of CPI to these ML models can then be used to extract knowledge and insight (vide infra), a task that is especially difficult for even the most expert-level CPI alone. However, useful ML requires robust datasets that can be provided by CompChem as long as the CPI component is selecting and correctly interpreting appropriate methods for the task at hand. We furthermore note that the knowledge generation process shown in Fig. 2 is by no means a linear one; on the contrary, it contains many loops and dead ends. As we show later, within the troika of CompChem+ML+CPI, ML acts as a catalyst since it can be used in place of explorative data-driven hypothesis generation. Automatically generated hypotheses are then validated and calibrated with CompChem and CPI to yield further improved ML modeling (enriched by more physical prior knowledge), which then loops back with improved hypotheses. This feedback loop is the key to modern knowledge discovery leading to insight, wisdom, and, hopefully, a positive impact on society.

Figure 1: Heatmaps illustrating the extent to which ML terms appear in scientific papers aligned with American Chemical Society (ACS) technical divisions from 2000-2010 (a) and from 2010-present (b). (c) A line graph showing the number of occurrences of any ML term being found in papers attributed to the ACS PHYS division, from 2000-present. Figures were made by Charles D. Griego. Python scripts used to generate these figures and the corresponding Table 1 are freely available with a Creative Commons attribution license. Readers are welcome to use, adapt, and share these scripts with appropriate attribution: https://github.com/keithgroup/scopus_searching_ML_in_chem_literature

Table 1: List of top-ranked keywords (per ACS division) matching any ML term (the corresponding match percentages were not recoverable from the source and are omitted).

| Division | Rank 1 | Rank 2 | Rank 3 | Rank 4 |
|---|---|---|---|---|
| PHYS | electro* | spectroscopy | ion* | nano* |
| ANYL | sensor* | spectroscopy | characterization* | spectrometry |
| ENVR | sensor* | soil* | water quality | environmental monitor* |
| AGFD | protein* | agricultur* | food | fruit* |
| ENFL | fuel* | petroleum | energy efficiency | batter* |
| AGRO | soil | crop* | groundwater | developing countr* |
| ORGN | protein* | amino acid* | peptide* | aromatic* |
| POLY | plastic* | polymer* | polymeriz* | polymeric |
| PMSE | polymer* | peptide* | thin film* | tissue engineering |
| BIOT | biochemi* | biophysic* | systems biology | biotechnology |
| GEOC | groundwater | mining | *geochem* | anthropogenic |
| MEDI | protein interaction* | drug discovery | drug design | antibiotic* |
| COMP | drug discovery | drug design | molecular model* | protein database* |
| COLL | nanoparticle* | adsorption | thin film* | tribolog* |
| BIOL | drug discovery | protein folding | biosynthesis | cytochrome* |
| TOXI | toxi* | chemical exposure* | antibody drug conjugate* | |
| CATL | cataly* | metal oxides | photocataly* | surface chemistry |
| CINF | drug discovery | computational chemistry | bio* modeling | chem* database* |
| INOR | electrochem* | nanomaterial* | organometallic* | metal organic framework* |
| NUCL | nuclear fuel* | isotope* | radioisotope* | nuclear medicine* |
| CARB | carbohydrate* | glycoprotein* | glycan* | oligosaccharide* |
| RUBB | rubber* | | | |
| CELL | cellulose | polysaccharide* | lignin | lignocellulos* |
| I&EC | water purification | industrial chem* | rare earth element* | industrial and engineering chemistry |
| FLUO | fluorine* | radiopharmaceutical chem* | | |
| CHED | chem* class* | chem* communication* | chem* educat* | lab* safety |
| CHAS | chem* safety | lab* safety | environmental health and safety | chem* regulations |
| BMGT | chem* compan* | chem* enterprise* | chem* business* | chem* research and development |
| SCHB | commercial chem* | chem* sector* | academic entrepreneur* | science advoca* |
| HIST | chem* histor* | evolution of chem* | history of chem* | |
| PROF | chem* education | | | |
| CHAL | pharmaceutical patent* | chem* in commerce | chem* patent* | |

CompChem and Notable Intersections with ML
We consider quantum mechanics as described by the non-relativistic time-independent Schrödinger equation as our "standard model" because it accurately represents the physics of charged particles (electrons and nuclei) that make up almost all molecules and materials. Indeed, this opinion has been held by some for almost a century:

"The fundamental laws necessary for the mathematical treatment of a large part of physics and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved." – P. A. M. Dirac, 1929

Any theoretical method for predicting molecular and/or material phenomena must first be rooted in quantum mechanics theory and then suitably coarse-grained and approximated so that it can be applied in a practical setting. CompChem, or more precisely computational quantum chemistry, defines computationally-driven numerical analyses based on quantum mechanics. In this section we will explain how and why different CompChem methods capture different aspects of the underlying physics. Specifically, this section provides a concise overview of the broad range of CompChem methods that are available for generating datasets that would be useful for ML-assisted studies of molecules and/or materials.
A traditional example of a “good model” is the ideal gas equation:
$PV = nRT$, which can be considered 'simple', 'useful', and 'insightful'. The ideal gas equation relates the macroscopic pressure ($P$), volume ($V$), amount of gas in moles ($n$), and temperature ($T$) of gases under idealized conditions, without requiring explicit knowledge of the processes occurring on an atomic scale. Its simple functional form needs just one parameter, the ideal gas constant $R$, and this makes it possible to formulate useful insights, such as how at constant pressure a gas expands with rising temperature. On the other hand, this elegant equation only holds for conditions where the gas behaves as an ideal gas. The derivation of more accurate models of gases requires more mathematically complicated equations of state that rely on more free parameters, which in turn obfuscate physical insights, require more computational effort to solve, and thus make the model less "good". This example also offers a convenient connection to ML models that will be discussed later in Section 3. As mathematical models for complex phenomena become more complicated and less intuitive to derive, ML models that can infer non-linear relationships from data become more applicable as increasing amounts of empirical data become available.

Alternatively, the conventional CompChem treatment entails first determining the system's relevant geometry and its total ground state energy; from these, physical properties of interest (e.g. pressure, volume, band gap, polarizability, etc.) can be obtained using quantum and/or statistical mechanics. In this Section, we discuss the relevant CompChem methods for these tasks. While the mathematical physics behind these methods might occasionally be too complicated for a user to fully understand, many algorithms exist so that they can still be easily run in a 'black-box' way with modern computational chemistry software and accompanying tutorials. CompChem thus serves as an invaluable tool to generate data and information for knowledge and insights across many length and time scales. Fig. 3 is an adaptation of a multiscale hierarchy of different classes of CompChem methods, showing their applicability for modeling different length and time scales, and a depiction of how large scale models may be developed based on smaller scale theories.

Figure 3: Hierarchy of computational methods and corresponding time and length scales.

Integral to every CompChem study is the user's representation for the system, i.e. how the user chooses to describe the system. CompChem representations can range from simple and lucid (e.g. a precise chemical system such as a water molecule isolated in a vacuum) to complex and ambiguous (e.g. a putative but unproved depiction of a solid-liquid interface under electrochemical conditions). Approximate wavefunctions (expressed on a basis set of mathematical functions) or approximate Hamiltonians (referred to as levels of theory) as described below in this section can also be considered representations. One might then say that many representations for different components of a system will constitute an overall representation, and this is true. The point we make is that the validity of any computational result depends on the overall representation, and sometimes an incorrect representation may provide a correct result due to "fortuitous error cancellation". In CompChem studies, a valid representation is one that captures the nature of the physical phenomena of a system.
For a molecular example, if one is determining the bond energy of a large biodiesel molecule using CompChem methods, it may or may not be justified to approximate a nearby long-chain alkyl group ($-\mathrm{C}_n\mathrm{H}_{2n+1}$) simply as a methyl group ($-\mathrm{CH}_3$) or even a hydrogen atom. Indeed, choosing such a representation can sometimes be a useful example of CPI since alkyl bonds usually exhibit relatively short-ranged interactions (a feature that will be discussed in the context of ML in more detail in Section 4.1.3). An atomic scale geometry with fewer atoms would reduce the computational cost of the study or allow a more accurate but more computationally expensive calculation to be run. On the other hand, it might also be a poor choice if the chemical group, e.g. a substituted alkyl group, participated in physical organic interactions such as subtle steric, induction, or resonance effects. For a solid-state example, a user might exercise good CPI by assuming that a relatively small unit cell under periodic boundary conditions would capture salient features of a bulk material or a material surface (as is often the case for many metals). On the other hand, subtle symmetry-breaking effects in materials (e.g. distortions arising from tilting octahedra groups in perovskites, or surface reconstruction phenomena that occur on single crystals) might only be observed when considering larger and more computationally expensive unit cells. Relevant to both examples, it may also be that the CompChem method itself brings errors that obfuscate phenomena that the user intends to model. In general, CompChem errors may arise from how the calculation was initially set up or from how the CompChem method treats the physics of the system, and both factors reflect the representation used in the study. In Section 3, we will discuss how the choice of ML representation also plays similarly critical roles in determining whether and to what extent an ML model is useful.

The quantitative accuracy of a CompChem model stems from its suitability in describing the system. As explained above, an observed accuracy will depend on the representation being used. High quality CompChem calculations have traditionally been benchmarked against datasets that consist of well-controlled and relatively precise thermochemistry experiments on small, isolated molecules.
The error bars for standard calorimetry experiments are approximately 4 kJ/mol (or 1 kcal/mol or 0.04 eV), and computational methods that can provide greater accuracy than this are said to achieve 'chemical accuracy'. Note that this term should be used when describing the accuracy of the method compared to the most accurate data possible; for example, if one CompChem method was found to reproduce another CompChem method within 1 kJ/mol, but both methods reproduce experimental data with errors of 20 kJ/mol, then neither method should be called chemically accurate. There are many well-established reasons why CompChem models can bring errors. For example, errors may be due to size consistency or size extensivity problems that are intrinsic to the CompChem method; larger systems sometimes embody significant medium and long range interactions (e.g. van der Waals forces) or self-interaction errors that might not be noticeable in small test cases. The recommended path forward is to consider which fundamental interactions are in play in the system of interest, and then to use a CompChem model that is adequate at describing those interactions. Besides this, users should make use of existing tutorial references that provide practical knowledge about which parameters in a CompChem calculation should be carefully noted, for example Ref. 37. Historically, the most popular CompChem methods for molecular and materials modeling (the B3LYP and PBE exchange-correlation functionals, see Section 2.2.3) are often said to have an expected accuracy of about 10-15 kJ/mol (or 2-4 kcal/mol or 0.1-0.2 eV) when modeling differences between the total energies of two similar systems, and errors are expected to be somewhat larger when considering transition state energies. Though this is used as a simple rule, it is obviously an oversimplification, and actual accuracy can only be assessed by thoughtful benchmarking of the case being considered.

In CompChem, one normally assumes that any two users using the same representation for the system with the same code on the same computing architecture will obtain the exact same result within the numerical precision of the computers being used. This is not always the case, especially for molecular dynamics (MD) simulations that often rely on stochastic methods. Computational precision also becomes more concerning when there are different versions of codes in circulation, errors that might arise from different compilers and libraries, and a lack of consensus in the community about which computational methods and which default settings should be used for specific application systems, e.g. grid density selections,
or standard keywords for molecular dynamics simulations.
There have been efforts to confirm that different codes can reproduce energies for the same system representation, but some commercial codes hold proprietary licenses that restrict publications that critically benchmark calculation accuracy and timings across different codes. A path forward to benefit the advancement of insight is the development of (open) source codes that perform as well as, if not better than, commercial codes. While increased access to computational algorithms is beneficial, it also raises the need for enforcing high standards of quality and reproducibility. We are also glad to see active developments to more lucidly show how any set of computational data is generated, precisely with which codes, keywords, and auxiliary scripts and routines.
We are now in an era where truly massive amounts of data and information can be generated for CompChem+ML efforts. To go forward, one needs to know what constitutes good and useful data, and the next section provides an overview of how to do this using CompChem.
Earlier we mentioned that a usual task in CompChem is to calculate the ground state energy of an atomic scale system. Indeed, CompChem methods can determine the energy for a hypothetical configuration of atoms, and this constitutes the potential energy surface (PES) of the system (Fig. 4). The PES is a hypersurface spanning $3N$ dimensions, where $N$ is the number of atoms in the system. Since the PES is used to analyze chemical bonding between atoms within the system, the PES can also be simplified by ignoring translational and rotational degrees of freedom for the entire system. This reduces the dimensionality of the PES from $3N$ to $3N - 6$ (or $3N - 5$ for linear systems). When a PES is depicted in two coordinates, as in Fig. 4, the $z$-axis is conventionally used to represent the scale for system energy.

Figure 4: Potential energy surface (PES) of a fictional system with the two coordinates $R_1$ and $R_2$. The minima of the PES correspond to stable states of a system, such as equilibrium configurations and reactants or products. Minima can be connected by paths (red line), along which rearrangements and reactions can occur. The maximum along such a path is called a transition state. Transition states are first order saddle points, a maximum in one coordinate and minima in all others. They correspond to the minimum energy required to transition between two PES minima and play a crucial role in the description of chemical transformations.

Any arbitrary PES will contain several interesting features. Minima on the PES correspond to mechanically stable configurations of a molecule or material, for example reactant and product states of a chemical reaction or different conformational isomers of a molecule. Because they are minima, the second derivative of the energy given by the PES with respect to any dimension will be positive. Minima can also be connected by pathways, which indicate chemical transformations (Fig. 4, red line). Along such pathways, the second derivative can be positive, zero, or negative, but all other second derivatives must be positive. Transition states are first order saddle points, and thus represent a maximum in one coordinate and a minimum along all others. They correspond to the lowest energy barriers connecting two minima on the PES and are hence important for characterizing transitions between PES minima (e.g. chemical reactions). Second order saddle points and bifurcating pathways can also occur on a PES, though they play a less prominent role in describing chemical transformations.
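To make these features concrete, the short sketch below relaxes two starting points into neighboring minima of the Müller-Brown potential, a standard two-dimensional toy PES. The potential parameters are the commonly published ones, and the starting guesses are our illustrative choices, not taken from the review:

```python
import numpy as np
from scipy.optimize import minimize

# Standard parameters of the Mueller-Brown toy PES
A  = np.array([-200.0, -100.0, -170.0, 15.0])
a  = np.array([-1.0, -1.0, -6.5, 0.7])
b  = np.array([0.0, 0.0, 11.0, 0.6])
c  = np.array([-10.0, -10.0, -6.5, 0.7])
x0 = np.array([1.0, 0.0, -0.5, -1.0])
y0 = np.array([0.0, 0.5, 1.5, 1.0])

def pes(q):
    """Energy at point q = (x, y) on the model surface."""
    x, y = q
    return float(np.sum(A * np.exp(a * (x - x0) ** 2
                                   + b * (x - x0) * (y - y0)
                                   + c * (y - y0) ** 2)))

# Local optimizations from two guesses land in two different PES minima,
# i.e. two mechanically stable configurations connected by a reaction path.
for guess in ([0.6, 0.0], [-0.5, 1.5]):
    res = minimize(pes, guess)
    print(f"minimum at {res.x}, energy {res.fun:.2f}")
```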
In standard computational quantum chemistry, a system's energy can be computed in terms of the Schrödinger equation. The wavefunction that will be used to represent the positions of electrons and nuclei in the system, $\Psi(\mathbf{r}, \mathbf{R})$, is hard to intuit since it can be complex valued. However, its square describes the real probability density of the nuclear ($\mathbf{R}$) and electronic ($\mathbf{r}$) positions. In a real system, the position and interactions of a single particle with respect to all other particles will be correlated, and this makes exactly solving the Schrödinger equation impossible for almost all systems of practical interest. To make the problem more tractable, one may exploit the Born-Oppenheimer approximation; since nuclei are expected to move much more slowly than the electrons, they can be approximated as stationary at any point along the PES. This allows the energy to be calculated using the time-independent Schrödinger equation and solving the eigenvalue problem:

$$\hat{H}\Psi = (\hat{T} + \hat{V})\Psi = E\Psi \qquad (1)$$

Figure 5: a) A 'magic cube' depiction of hierarchies of correlated wavefunction approaches. b) A 'Jacob's Ladder' depiction of hierarchies of Kohn-Sham DFT approaches. c) An illustration of expected hierarchies of atomistic potentials adapted from Ref. 60. d) An illustration of overall hierarchies in predictive atomic scale modeling methods.

Here, the Hamiltonian operator ($\hat{H}$) is the sum of the kinetic ($\hat{T}$) and potential ($\hat{V}$) operators, $\Psi$ is the wavefunction (i.e. an eigenfunction) that represents particles in the system, and $E$ is the energy (i.e. an eigenvalue). In this way, nuclei can be treated as fixed point charges, and then Eq. 1 can be transformed into the so-called electronic Schrödinger equation, where the Hamiltonian $\hat{H}_{el}$ and wavefunction $\Psi_{el}(\mathbf{r};\mathbf{R})$ now only depend on the nuclear coordinates in a parametric fashion:

$$\hat{H}_{el}\Psi_{el}(\mathbf{r};\mathbf{R}) = \left[\hat{T}_e(\mathbf{r}) + \hat{V}_{eN}(\mathbf{r};\mathbf{R}) + \hat{V}_{NN}(\mathbf{R}) + \hat{V}_{ee}(\mathbf{r})\right]\Psi_{el}(\mathbf{r};\mathbf{R}) = E_{el}\Psi_{el}(\mathbf{r};\mathbf{R}) \qquad (2)$$

The above expression has $\hat{H}_{el}$ composed of single electron (e), electron-nuclear (eN), nuclear-nuclear (NN), and electron-electron (ee) terms. Here, we will now implicitly assume the Born-Oppenheimer approximation throughout and leave off the subscript indicating the electronic problem. However, we note that the Born-Oppenheimer approximation is not always sufficient, and computationally intensive nonadiabatic quantum dynamics may be required. In certain cases, semi-classical treatments are appropriate; for example, nonadiabatic effects between electrons and nuclei can be considered using nuclear-electronic orbital methods.

A second common approximation is to expand the total electronic wavefunction in terms of one-electron wave functions (i.e. spin orbitals), $\phi(\mathbf{r}_i)$. Electrons are fermions and therefore exhibit antisymmetry, which in turn results in the Pauli exclusion principle. Antisymmetry means that the interchange of any two electrons within the system should bring an overall sign change to the wavefunction (i.e. from $+$ to $-$, or vice versa). This property is conveniently captured mathematically by combining one-electron spin orbitals into the form of a Slater determinant:

$$\Psi(\mathbf{r}_1, \ldots, \mathbf{r}_n) = \frac{1}{\sqrt{n!}}\begin{vmatrix} \phi_1(\mathbf{r}_1) & \cdots & \phi_n(\mathbf{r}_1) \\ \vdots & \ddots & \vdots \\ \phi_1(\mathbf{r}_n) & \cdots & \phi_n(\mathbf{r}_n) \end{vmatrix} \qquad (3)$$
Note that a determinant's sign changes whenever two columns or rows are interchanged, and in a Slater determinant this corresponds to interchanging electrons and thus the physically appropriate sign change for the overall wavefunction. Additionally, $1/\sqrt{n!}$ is a normalizing factor to ensure the wavefunction is normalized.

The spin orbitals can be treated as a mathematical expansion using a basis set of functions $\chi_\mu$, each having coefficients $c_{\mu i}$, which are generally Gaussian basis functions, Slater-type hydrogenic orbitals, or plane waves under periodic boundary conditions:

$$\phi_i = \sum_\mu c_{\mu i}\,\chi_\mu \qquad (4)$$

The different types of mathematical functions bring different strengths and weaknesses, but these will not be discussed further here. A universal point is that larger basis sets have more basis functions and thus give a more flexible and physical representation of electrons within the system. On one hand this can be crucial for capturing subtle electronic structure effects due to electron correlation. On the other hand, larger basis sets also necessitate significantly higher computational effort. A standard technique to avoid high computational effort in electronic structure calculations is to replace non-reacting core electrons with analytic functions using effective core potentials (ECPs, i.e. pseudopotentials). This requires reformulating the basis sets that describe the valence space of the atoms, for example see Refs. 90, 91. Larger nuclei that bring higher atomic numbers and larger numbers of electrons will also exhibit relativistic effects, and relativistic Hamiltonians are based on the Dirac equation or quantum electrodynamics. These methods can range from reasonably cost-effective methods to those bringing extremely high computational cost. Practical applications have traditionally used standard non-relativistic Hamiltonian methods along with ECPs (or pseudopotentials) that have been explicitly developed to account for compressed core orbitals that result from relativistic effects.

Using the Born-Oppenheimer approximation (Eq. 2) together with a Slater determinant wavefunction (Eq. 3) expressed in a finite basis set (Eq. 4) brings about the simplest wavefunction based method, the Hartree-Fock (HF) approach (for historical context see Refs. 99-101). The HF method is a mean field approach, where each electron is treated as if it moves within the average field generated by all other electrons. It is generally considered inaccurate when describing many chemical systems, but it continues to serve as a critical pillar for CompChem electronic structure calculations since it either establishes the foundation for all other accurate methods or provides energy contributions (i.e. exact exchange) that are not provided in some CompChem methods. CompChem methods that achieve accuracy higher than HF theory are said to contain electron correlation, a critical component for understanding molecules and materials (as described in more detail in Section 2.2.2).
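The antisymmetry encoded by Eq. 3 is easy to verify numerically. In the sketch below, a random matrix stands in for spin orbitals evaluated at electron positions (purely illustrative numbers, not a physical wavefunction); swapping two rows, i.e. exchanging two electrons, flips the sign of the determinant:

```python
from math import factorial
import numpy as np

rng = np.random.default_rng(seed=0)
n = 4
# phi[k, i]: stand-in value of spin orbital i at electron position r_k
phi = rng.normal(size=(n, n))

psi = np.linalg.det(phi) / np.sqrt(factorial(n))  # Eq. 3

swapped = phi.copy()
swapped[[0, 1]] = swapped[[1, 0]]  # exchange electrons 1 and 2 (two rows)
psi_swapped = np.linalg.det(swapped) / np.sqrt(factorial(n))

assert np.isclose(psi_swapped, -psi)  # overall sign change, as required
```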
Expressing $\Psi$ as a Slater determinant and rearranging Eq. 2 while temporarily neglecting nuclear-nuclear interactions allows one to define the HF energy in terms of integrals of the electronic spin orbitals:

$$E_{HF} = -\sum_i \int d\mathbf{r}_1\, \phi_i^*(\mathbf{r}_1)\, \tfrac{1}{2}\nabla^2\, \phi_i(\mathbf{r}_1) \;-\; \sum_i \int d\mathbf{r}_1\, \phi_i^*(\mathbf{r}_1) \sum_\alpha^N \frac{Z_\alpha}{|\mathbf{r}_1 - \mathbf{R}_\alpha|}\, \phi_i(\mathbf{r}_1) \;+\; \sum_{i,\,j>i} \iint d\mathbf{r}_1\, d\mathbf{r}_2\; \phi_i^*(\mathbf{r}_1)\phi_i(\mathbf{r}_1)\, \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}\, \phi_j^*(\mathbf{r}_2)\phi_j(\mathbf{r}_2) \;-\; \sum_{i,\,j>i} \iint d\mathbf{r}_1\, d\mathbf{r}_2\; \phi_i^*(\mathbf{r}_1)\phi_j(\mathbf{r}_1)\, \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}\, \phi_j^*(\mathbf{r}_2)\phi_i(\mathbf{r}_2) \qquad (5)$$

where the first two terms are referred to as one-electron integrals and represent the kinetic energy of the electrons ($\hat{T}_e$) and the potential energy contributions from electron-nuclei interactions ($\hat{V}_{eN}$). The remaining terms are two-electron integrals ($\hat{V}_{ee}$) that describe the potential energy arising from electron-electron interactions and are called Coulomb and exchange integrals. Using Lagrange multipliers, one can express the Hartree-Fock equations in a compact matrix form, the so-called Roothaan-Hall equations, which allow for an efficient solution:

$$\mathbf{F}\mathbf{C} = \mathbf{S}\mathbf{C}\boldsymbol{\epsilon} \qquad (6)$$

Each matrix has a size of $\mu \times \mu$, where $\mu$ is the number of basis functions used to express the orbitals of the system. $\mathbf{C}$ is a coefficient matrix collecting the basis coefficients $c_{\mu i}$ (see Eq. 4), while $\mathbf{S}$ is the overlap matrix measuring the degree of overlap between individual basis functions, and $\boldsymbol{\epsilon}$ is a diagonal matrix of the spin orbital energies. Finally, $\mathbf{F}$ is the Fock matrix, with elements of a similar form as in Eq. 5, but expressed in terms of the basis functions $\chi_\mu$. One important detail not readily apparent in Eq. 6 is that the Fock matrix depends on the orbital coefficients, which must be provided before Eq. 6 can be solved. As such, Eq. 6 cannot be solved in closed form, but instead requires a so-called self-consistent field approach. Starting from an arbitrary set of trial (i.e. initial guess) functions, one iteratively solves for optimal molecular orbital coefficients which are then used to construct a new Fock matrix, until a minimum energy is reached in accordance with the variational principle of quantum mechanics. Evaluating and transforming the two-electron integrals in Eq. 5 is a significant bottleneck for these calculations, and thus the computational effort of the HF method formally scales as $O(\mu^4)$ with the number of basis functions. This means that a calculation on a system twice as large will require at least $2^4 = 16$ times as much computing time. The electronic exchange interaction resulting from the antisymmetry of the wavefunction imposes a strong constraint on the mathematical form of ML models for electronic wavefunctions. Construction of efficient and reliable antisymmetric ML models for the many-body wavefunction is an important area of current research.
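In practice, this entire self-consistent field procedure is automated by modern electronic structure packages. As one example, the open-source PySCF package (our illustrative choice; the review itself does not prescribe a particular code) solves Eq. 6 for a water molecule in a few lines:

```python
from pyscf import gto, scf

# Representation: geometry (Angstrom) plus a small Gaussian basis set
mol = gto.M(atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",
            basis="sto-3g")

mf = scf.RHF(mol)
e_hf = mf.kernel()  # iterates the Roothaan-Hall equations to self-consistency
print(f"HF total energy: {e_hf:.6f} Hartree")
```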
Correlation energies are obtained by calculatingadditional electron-electron interaction energies that arise from different arrangements of24lectron configurations (i.e. effectively, different possible excited states) that are not treatedwith the mean field approach of HF theory.The most complete correlation treatment is the full configuration interaction (FCI)method, which is the exact numerical solution of the electronic Schr¨odinger equation (inthe complete basis limit) that considers interactions arising from all possible excited con-figurations of electrons. The FCI wavefunction takes the form of a linear combination ofall possible excited Slater determinants which can be generated from a single HF referencewavefunction by electron excitations:Ψ
$$\Psi_{FCI} = a_0\Psi_{HF} + \sum_{\alpha,\beta} a_\alpha^\beta\,\Psi_\alpha^\beta + \sum_{\alpha,\beta,\gamma,\delta} a_{\alpha\gamma}^{\beta\delta}\,\Psi_{\alpha\gamma}^{\beta\delta} + \ldots \qquad (7)$$

where $\Psi_\alpha^\beta$ represents the Slater determinant obtained by exciting an electron from orbital $\alpha$ into an unoccupied orbital $\beta$, and the $a$'s are expansion coefficients determining the weight of the different contributing configurations. Expectedly, FCI calculations scale extremely poorly with the number of electrons in the system ($O(n!)$), as the number of possible configurations grows rapidly, making them feasible only for small molecules. For an example of the state of the art, FCI calculations have been used to benchmark highly accurate methods on calculations on a benzene molecule.

Most correlated wavefunction methods use a subset of the possible configurations in Eq. 7 to be computationally tractable. The configuration interaction (CI) method for example only includes determinants up to a certain permutation level (e.g. 's'ingle and 'd'ouble excitations in CISD). Alternatively, MPn methods (e.g. MP2) recover the correlation energy by applying different orders of perturbation theory. Coupled cluster theory, another widely used post-HF method, includes additional electron configurations via cluster operators. One coupled cluster method that involves single, double, and perturbative triple excitations, CCSD(T), is referred to as the "gold-standard" approach for CompChem electronic structure methods since it brings high accuracy for molecular energies. However, there are far fewer advances that improve upon CCSD(T).
Note that just because a method has a reputation for being accurate does not mean that it will be accurate for all systems. For example, consider again the benzene molecule, which is best illustrated with dotted resonance bonds depicting a planar molecule with equal C–C bond lengths. Such a geometry will not be found to be stable with many different CompChem methods, in part because of subtle chemical bonding interactions and/or errors that arise from specific choices of basis sets used with different levels of theory.
A key point to reiterate is that correlated wavefunction methods are founded on HF theory, and so they are even more computationally demanding than HF calculations, e.g. $O(n^5)$ for MP2, $O(n^6)$ for CCSD and CISD, and $O(n^7)$ for CCSD(T). However, this computational expense is alleviated by continually improving computing resources (e.g. the usability of graphics processing units (GPUs)) and the development of efficiency enhancing algorithms such as pseudospectral methods, resolution of the identity (RI), domain-based local pair natural orbital methods (DLPNO), and explicitly correlated R12/F12 methods. There are also ongoing efforts to develop other CompChem methods based on quantum Monte Carlo and density matrix renormalization group theory (DMRG) to provide high accuracy with competitive scaling with other computational methods. Efforts are also beginning to be implemented that use ML to accelerate these types of calculations.
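The hierarchy described in this paragraph can be run back-to-back in open-source packages; the following is a hedged PySCF sketch (again our illustrative package choice, with a molecule and basis chosen for the example):

```python
from pyscf import gto, scf, mp, cc

mol = gto.M(atom="N 0 0 0; N 0 0 1.098", basis="cc-pvdz")
mf = scf.RHF(mol).run()                # O(mu^4) mean-field reference

e_mp2 = mp.MP2(mf).run().e_tot         # perturbative correlation, O(n^5)
mycc = cc.CCSD(mf).run()               # coupled cluster, O(n^6)
e_ccsd_t = mycc.e_tot + mycc.ccsd_t()  # perturbative triples, O(n^7)

print(e_mp2, mycc.e_tot, e_ccsd_t)
```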
Schemes have also been developed to exploit systematic errors between different levels of theory with different basis sets so that approximations can be extrapolated toward an exact result. Examples include the complete basis set (CBS),
Gaussian Gn, and Weizmann (W-n) methods, and high accuracy extrapolated ab initio thermochemistry (HEAT) methods. For a recent review on these and other methods see Ref. 134. These schemes are also becoming a target of recent work using ML methods.

HF determinants provide good baseline approximations of the ground state electronic structure of many molecules, but they may poorly describe more complicated bonding that arises during bond dissociation events, excited states, and conical intersections.
Some many-body wavefunctions are best described as a superposition of two or more configurations, e.g. when other configurations in Eq. 7 can have similar or higher expansion coefficients $a$ than the HF determinant. For this reason, high quality single reference methods like CCSD(T) fail because the theory assumes that salient electronic effects are captured by the initial single HF configuration. (In fact, methods such as CCSD(T) have been implemented with diagnostic approaches available that let users know when there may be cause for concern.) In these cases, it may no longer be trivial to find reliable black-box or automated procedures (e.g. in situations involving resonance states, chemical reactions, molecular excited states, transition metal complexes, metallic materials, etc.).
So-called multi-configuration approaches, such as the generalized valence bond (GVB) method, the complete active space self-consistent field (CASSCF) method, multireference CI (MRCI) methods, complete active space perturbation theory (CASPT2), or multireference coupled cluster (MRCC), can more physically model these systems since they employ several suitable reference configurations with different degrees of correlation treatments. These methods are not black-box and should be expected to require an experienced practitioner with CPI to choose the reference states that can substantially influence the quality of results.
This is an area, though, where ML can bring progress in automating the selection of physically justified active spaces.
In closing, there are a large number of available correlated wavefunction methods, but many are even more costly than HF theory by virtue of requiring an HF reference energy expression as shown in Eq. 5. Fig. 5a depicts a so-called 'magic cube' (an extension beyond a traditional 'Pople diagram') that concisely shows a full hierarchy of computational approaches across different Hamiltonians, basis sets, and correlation treatment methods. This makes it easy to identify different wavefunction methods that should be more accurate and more likely to provide useful atomic scale insights (as well as those that would be more computationally intensive). Another important aspect highlighted in the 'magic cube' is that higher level wavefunction methods require larger basis sets to successfully model electron correlation effects. A CCSD(T) computation carried out with a small basis set, for example, might only offer the same accuracy as MP2 while being two orders of magnitude more expensive to evaluate.
As was mentioned earlier with the benzene system, spurious errors with different basis sets might still be found that indicate problems with specific combinations of levels of theory and basis sets. The deep complexity of correlated wavefunction methods makes this a promising area for continued efforts in CompChem+ML research.
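As a simple worked example of the basis-set extrapolation idea mentioned above, a widely used two-point inverse-cubic formula estimates the complete-basis-set limit of a correlation energy from two calculations with correlation-consistent basis sets of cardinal numbers X and Y. This particular formula is one standard recipe from the literature, assumed here for illustration; it is not the only scheme in use:

```python
def cbs_two_point(e_corr_x: float, x: int, e_corr_y: float, y: int) -> float:
    """Two-point 1/X^3 extrapolation of correlation energies to the CBS limit.

    e_corr_x, e_corr_y: correlation energies obtained with basis sets of
    cardinal numbers x and y (e.g. x=3 for cc-pVTZ, y=4 for cc-pVQZ).
    """
    return (y**3 * e_corr_y - x**3 * e_corr_x) / (y**3 - x**3)

# Usage: feed in correlation energies from two calculations, e.g.
# e_cbs = cbs_two_point(e_corr_tz, 3, e_corr_qz, 4)
```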
Density-functional theory (DFT) is another method to calculate the quantum mechanical internal energy of a system, using an energy expression that relies on functionals (i.e. a function of a function) of the electronic density $\rho$:

$$E[\rho] = T[\rho] + V[\rho] \qquad (8)$$

Compared to wavefunction theory, DFT should be far more efficient since the dimensionality of a density representation for electrons will always be three, rather than the $3n$ dimensions for any $n$-electron system described by a many-body wavefunction method. DFT brings an important drawback: the exact expression for the energy functional is currently unknown, and all approximations bring some degree of uncontrollable error. This has precipitated disagreeable opinions from purists in chemical physics, especially those who are developing correlated wavefunction methods. However, there is also substantial evidence that DFT approximations are reasonably reliable and accurate for many practical applications that bring information, knowledge, and sometimes insight. We now provide a bird's-eye view of DFT-based methods.

One thrust of DFT developments since its inception has focused on designing accurate expressions strictly in terms of a density representation, and these approaches are referred to as 'kinetic energy (KE-)' or 'orbital-free (OF-)' DFT. Some energy contributions (e.g. nuclear-electron energy and classical electron-electron energy terms) can be expressed exactly, but other terms such as the kinetic energy as a function of the density are not known and must be approximated. OF-DFT is very computationally efficient (these methods should scale linearly with system size), but these formulations have not yet been developed to rival the accuracy or transferability of wavefunction methods, though they have been used for studying different classes of chemical and materials systems.
OF-DFT methods are also used in exciting applications modeling chemistry and materials under extreme conditions.
One should expect that once highly accurate forms are developed and matured, accurate CompChem calculations on the electronic structures of systems having more than a million atoms might become commonplace. Indeed, there are efforts to use ML to develop more physical OF-DFT methods.
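The flavor of OF-DFT can be conveyed with the oldest kinetic energy functional, the Thomas-Fermi form $T_{TF}[\rho] = C_F \int \rho^{5/3}\,d\mathbf{r}$. The sketch below evaluates it on a radial grid for the exact hydrogen-atom ground state density; the point is only to show how an energy follows from the density alone (the exact kinetic energy for this density is 0.5 Hartree, which the crude functional misses):

```python
import numpy as np

C_F = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)  # Thomas-Fermi constant (a.u.)

# Exact ground state density of the hydrogen atom: rho(r) = exp(-2r)/pi
r = np.linspace(1e-6, 20.0, 20001)
rho = np.exp(-2.0 * r) / np.pi

# T_TF[rho] = C_F * integral of rho^(5/3) over all space (radial quadrature)
t_tf = C_F * np.trapz(rho ** (5.0 / 3.0) * 4.0 * np.pi * r**2, r)
print(f"Thomas-Fermi kinetic energy: {t_tf:.3f} Hartree (exact: 0.500)")
```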
The most commonly used form of DFT (which is also one of the most widely used CompChem methods today) is called Kohn-Sham (KS-)DFT.
In KS-DFT, one assumes a fictitious system of non-interacting electrons with the same ground state density as the real system of interest. This makes it possible to split the energy functional in Eq. 8 into a new form that involves an exact expression of the kinetic energy for non-interacting electrons:

$$E[\rho] = T_{ni}[\rho] + V_{eN}[\rho] + V_{ee}[\rho] + \Delta T[\rho] + \Delta V_{ee}[\rho] \qquad (9)$$

Here, $T_{ni}[\rho]$ is the kinetic energy of the non-interacting electrons, $V_{eN}[\rho]$ is the exact nuclear-electron potential, and $V_{ee}[\rho]$ is the Coulombic (classical) energy of the non-interacting electrons. The last two terms are corrections due to the interacting nature of electrons and non-classical electron-electron repulsion. KS-DFT also expands the three dimensional electron density into a spin orbital basis $\phi$, similar to HF theory, to define the one-electron kinetic energy in a straightforward manner. This allows the $T_{ni}$, $V_{eN}$, and $V_{ee}$ expressions to be evaluated exactly, and one arrives at the KS energy:

$$E_{KS}[\rho] = -\sum_i \int d\mathbf{r}\,\phi_i^*(\mathbf{r})\,\tfrac{1}{2}\nabla^2\,\phi_i(\mathbf{r}) \;-\; \sum_i \int d\mathbf{r}\,\phi_i^*(\mathbf{r}) \sum_\alpha^N \frac{Z_\alpha}{|\mathbf{r} - \mathbf{R}_\alpha|}\,\phi_i(\mathbf{r}) \;+\; \sum_i \int d\mathbf{r}\,\phi_i^*(\mathbf{r}) \left[\int d\mathbf{r}'\,\frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\right] \phi_i(\mathbf{r}) \;+\; E_{xc}[\rho] \qquad (10)$$

The last two correction terms in Eq. 9 arise from electron interactions, and these are combined into the so-called 'exchange-correlation' term ($E_{xc}$), which uniquely defines which scheme of KS-DFT is being used. In theory, an exact $E_{xc}$ term would capture all differences between the exact FCI energy and the system of non-interacting electrons for a ground state.

The KS-DFT equations can be cast in a similar form as the Roothaan-Hall equations (Eq. 6), which allows for a computationally efficient solution. Moreover, the elements of the Kohn-Sham matrix (which replaces the Fock matrix $\mathbf{F}$) are easier to evaluate due to the fact that several of the computationally intensive integrals are now accounted for via $E_{xc}$. Hence, the formal scaling for KS-DFT is $O(n^3)$ with respect to the number of electrons. Even though this is much poorer scaling than ideally linear scaling OF-DFT, the exact treatment of non-interacting electrons makes KS-DFT more accurate. Furthermore, there are several modern exchange-correlation functionals that routinely achieve much higher accuracy than HF theory with less computational cost, and thus KS-DFT is a competitive alternative to many correlated wavefunction methods in many modern applications.

A remaining problem is constructing a practical expression for the exchange-correlation functional, as its exact functional form remains unknown. This has spawned a wealth of approximations that have been founded with different degrees of first principles and/or empirical schemes. Classes of KS-DFT functionals are defined by whether the exchange-correlation functional is based on just the homogeneous electron gas (i.e. the 'local density approximation', LDA), that and its derivative (i.e. the 'generalized gradient approximation', GGA), as well as other additional terms that should result in physically improved descriptions and/or error cancellations. The resulting hierarchy of KS-DFT functionals is often referred to as a 'Jacob's Ladder' of DFT (Fig. 5b). Generally, the higher up the ladder one goes, the more accurate but more computationally demanding the calculation. While the intrinsic inexactness of DFT makes it difficult to assess which functionals are physically better than others,
the Jacob's Ladder hierarchy is useful for clearly designating how and why newer methods should perform in specific applications (for perspective see Refs. 167-169), though there remains substantial benefit from those who bring CPI to the development efforts. Indeed, by being based on a ground-state representation of the homogeneous electron gas, DFT calculations can sometimes more easily bring physical insight into some systems that are very challenging for wavefunction theory to examine (e.g. metals, where HF theory provides divergent exchange energy behaviors). On the other hand, DFT is also generally not well-suited for studying physical phenomena involving localized orbitals or band structures, such as those found in semiconducting materials with small band gaps, molecular or material excited charge transfer states, or interaction forces that can arise due to excited states, e.g. dispersion (or London) forces. The former features can normally be treated using Hubbard-corrected DFT+U models that require a system-specific $U - J$ parameter or more generalizable but much more computationally expensive hybrid DFT approaches. Dispersion forces (i.e. van der Waals interactions) are non-existent in semilocal DFT approximations, and it is now commonplace to introduce them into DFT calculations using a variety of different methods. There is also growing interest in using embedded QC calculation schemes that can partition systems into discrete regions that could be treated with highly accurate correlated wavefunction theory and computationally efficient KS-DFT schemes separately.
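Running KS-DFT with functionals from different rungs of the ladder is routine in practice. A hedged PySCF illustration (the functional names follow its LibXC-backed interface, and the molecule and basis are our example choices):

```python
from pyscf import gto, dft

mol = gto.M(atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",
            basis="def2-svp")

mf = dft.RKS(mol)
mf.xc = "pbe"        # GGA rung of Jacob's Ladder
e_gga = mf.kernel()

mf.xc = "b3lyp"      # hybrid rung: mixes in a fraction of exact exchange
e_hyb = mf.kernel()

print(e_gga, e_hyb)
```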
DFT has also been extended to the modeling of excited states in the form of time-dependent (TD-)DFT.
Similar to ground state DFT, TDDFT is a computationally inexpensive alternative to excited state wavefunction-based methods. The approach yields reasonable results where excitations induce only small changes in the ground state density, e.g. low lying excited states.
However, due to its single reference nature, TDDFT tends to break down in situations where more than one electronic configuration contributes significantly to the excited state. Just as with correlated wavefunction methods, there are already signs of CompChem+ML efforts to improve the applicability of DFT-based methods.
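A minimal excited-state workflow along these lines, again sketched with PySCF as an assumed package choice (the TDDFT driver and attribute names here reflect its documented interface as we understand it):

```python
from pyscf import gto, dft, tdscf

mol = gto.M(atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",
            basis="def2-svp")
mf = dft.RKS(mol)
mf.xc = "pbe0"
mf.kernel()            # ground state KS-DFT reference

td = tdscf.TDDFT(mf)
td.nstates = 3         # request the three lowest excitations
td.kernel()
print(td.e)            # excitation energies in Hartree
```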
Correlated wavefunction methods and, to a lesser degree, KS-DFT are still very computationally demanding and only of limited use for large scale simulations. Further approximations based on wavefunction and DFT methods have been developed to simplify and accelerate energy calculations. These so-called semi-empirical methods still explicitly consider the electronic structure of a molecule, but in a more approximate way than the methods described above. Semi-empirical approaches based on wavefunction theory include methods like extended Hückel theory and neglect of diatomic differential overlap (NDDO).
Both approaches are simplifications of the Hartree-Fock equations (Eq. 5) that introduce approximations to the different integrals. In the NDDO approach, only those two-electron integrals in Eq. 5 are considered where the two orbitals on the right and left hand side of the $1/|\mathbf{r}_i - \mathbf{r}_j|$ operator are located on the same atom. The remaining two-center (and one-center) integrals are then approximated by introducing a set of empirical functions, one for each unique type of integral. Moreover, the overlap matrix in Eq. 6 is assumed to be diagonal, which greatly simplifies the energy evaluation. This reduces the required computational effort tremendously and allows the scaling of these approaches to be reduced to $O(N^2)$. NDDO serves as a basis for more sophisticated semi-empirical schemes, such as AM1, PM7, and MNDO, where the energy is usually determined self-consistently using a minimally sized basis set. Inadequacies in theory can be compensated by different empirical parametrization schemes that can allow these calculations to rival the accuracy of higher level theory for some systems. For example, Dral et al. provided a recent 'big-data' analysis of the performance of several semi-empirical methods with large datasets.

Semi-empirical schemes are also carried over to approximate KS-DFT with so-called density functional tight binding (DFTB).
DFTB simplifies the Kohn-Sham equations (Eq. 10) by decomposing the total electron density $\rho$ into a density of free and neutral atoms $\rho_0$ and a small perturbation term $\delta\rho$ ($\rho = \rho_0 + \delta\rho$). Expanding Eq. 10 in the perturbation $\delta\rho$ makes it possible to partition the total energy into three terms amenable to different approximation schemes:

$$E[\delta\rho] = E_{rep} + E_{Coul}[\delta\rho] + E_{BS}[\delta\rho] \qquad (11)$$

$E_{rep}$ is a repulsive potential containing interactions between the nuclei and contributions from the exchange-correlation functional (these are typically approximated via pairwise potentials). The charge fluctuation term $E_{Coul}$ is modeled as a Coulomb potential of Gaussian charge distributions computed from the approximate density. Finally, $E_{BS}$ refers to the 'band structure' term, which considers the electronic structure and contains contributions from $T_{ni}$, $V_{eN}$, and the exchange-correlation functional (see Eq. 10). To compute $E_{BS}$, the density is expressed in a minimal basis of atomic orbitals, similar to NDDO. The necessary Hamiltonian and overlap integrals are then evaluated via an approximate scheme based on Slater-Koster transformations. In addition to the energy, atomic partial charges are also computed in this step, which are then used in $E_{Coul}$. As a consequence, DFTB equations can also be solved self-consistently. DFTB methods are parameterized by finding suitable forms for the repulsive potential and adjusting the parameters used in the Slater-Koster integrals. Non-self-consistent and self-consistent tight-binding DFT methods have been developed for simulating large scale systems. Semi-empirical methods have also been a target of different ML schemes, yielding improved parametrization schemes and more accurate functional approximations.

2.2.5 Nuclear quantum effects

The quantum nature of lighter elements such as H-Li, and even of heavier elements that form strong chemical bonds (the C-C bond in graphene, for example), gives rise to significant nuclear quantum effects (NQEs). Such effects are responsible for large differences from the Dulong-Petit limit of the heat capacity of solids, isotope effects, and deviations of the particle momentum distribution from the Maxwell-Boltzmann distribution.
To capture NQEs, path-integral molecular dynamics (PIMD) or centroid molecular dynamics (CMD) can be used, but these methods are associated with much higher computational costs (usually about 30 times higher) compared with classical MD simulations using point nuclei. Moreover, because systems may be influenced by competing NQEs, the extent of NQEs is sensitive to the potential energy surface assumed. (Semi-)local DFT approaches may not even qualitatively predict isotope fractionation ratios, and usually hybrid DFT is needed to reach quantitative accuracy.
However, employing hybrid DFT calculations or beyond in PIMD/CMD simulations can accrue extremely high computational costs. For this reason, ML force fields have been proposed as efficient means to carry out PIMD simulations, enabling essentially exact quantum-mechanical treatment of both electronic and nuclear degrees of freedom, at least for small molecules with dozens of atoms.
Interatomic potentials introduce an additional level of abstraction compared to the methods described above. Instead of using exact quantum mechanical expressions to create the PES for the system, analytic functions are used to model a pre-supposed PES that contains explicit interactions between atoms, while electrons are treated in an implicit manner (sometimes using partial charge schemes).
Interatomic potentials thus are (oftentimes dramatically) more computationally efficient than correlated wavefunction, DFT, and semi-empirical approaches. This efficiency makes it possible to study even larger systems of atoms (e.g. biomolecules, surfaces, and materials) than is possible with other computational methods.

Table 2: Types of interatomic potentials and their areas of application.

| Potential | Reactive | Typical applications | Examples |
|---|---|---|---|
| Pairwise-distance-based | sometimes | materials, liquids | Lennard-Jones, Morse, Coulomb interactions, Buckingham |
| Distance- and angle-based | usually no | materials, liquids | many water potentials (e.g. SPC, TIP4P, mW), Stillinger–Weber |
| Class I (non-polarizable) force fields | no | proteins, lipids, polymers, nucleic acids, carbohydrates, organic molecules, liquids | AMBER, GAFF, CHARMM, GROMOS, OPLS, DREIDING, MMFF94, UFF, COMPASS, INTERFACE, interatomic potentials for ionic systems |
| Class II (polarizable) force fields | no | proteins, lipids, polymers, nucleic acids, carbohydrates, organic molecules, liquids | AMOEBA, classical Drude oscillator models, fluctuating charge (FQ) models, MB-Pol, distributed point polarizable models (DPP2), and many more |
| Embedded atom method (EAM)-like | yes | reactions within solid materials | EAM, MEAM, Finnis–Sinclair, Sutton–Chen |
| Bond-order potentials (BOPs) | yes | reactions within solids, liquids, gases | Brenner, Tersoff, REBO, COMB, ReaxFF, APT |
| Other quantum-mechanics-derived force fields | yes | reactions within liquids and gases | EVB and related models |

Note that different empirical potentials bring substantially different computational efficiencies; for example, LJ potentials are more efficient than classical forcefields like AMBER and CHARMM, while those are more efficient than most bond-order potentials such as ReaxFF.
The degree of efficiency arises from the balance of using accurate and/or physically justified functional forms, approximations, and model parameterizations. There are many different formulations (see Fig. 5c), and we will discuss the most general classes. An overview of the different types of potentials and their features is provided in Tab. 2. For extensive discussions of these methods, including semi-empirical approaches, we refer to the extensive review by Akimov and Prezhdo (Ref. 257). An excellent review of interatomic potentials is provided by Harrison et al. (Ref. 258), and an excellent overview of modern methods can be found in a special issue of J. Chem. Phys.
The distinctions between different types of forcefields can sometimes be blurry, and we will differentiate categories in ascending complexity. One of the simplest interatomic potentials is the Lennard–Jones potential:

$$E_{\rm LJ} = \sum_{i,j>i} \varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - 2\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \qquad (12)$$

It models the total energy as the sum of all pairwise interactions between atoms i and j using an attractive and a repulsive term depending on the interatomic distance r_ij. ε_ij modulates the strength of the interaction function, while σ_ij defines where it reaches its minimum. The Lennard–Jones potential is a prototypical "good model" of interatomic potentials, as it has a sufficiently simple physical form with only two parameters while still yielding useful results.

For covalent systems such as bulk carbon or silicon, pairwise distances alone are not sufficient to capture the local coordination of the atoms, and many empirical potentials for these systems were expressed as a function of pairwise distances and three-body terms within a certain cutoff distance. The pairwise term can take the form of LJ-type, electrostatic, or harmonic potentials, and the three-body term is usually a function of the angles formed by sets of three atoms.

So-called Class I classical force fields introduce a more complicated energy expression:

$$E_{\rm tot} = \sum_{\rm bonds} k_{ij}\,(r_{ij} - \bar{r}_{ij})^2 + \sum_{\rm angles} k_{ijk}\,(\theta_{ijk} - \bar{\theta}_{ijk})^2 + \sum_{\rm dihedrals}\sum_{\gamma} k^{(\gamma)}_{ijkl}\left[1 - \cos\!\left(\gamma\phi_{ijkl} - \bar{\phi}^{(\gamma)}_{ijkl}\right)\right] + \sum_{i,j>i} \frac{q_i q_j}{r_{ij}} + E_{\rm LJ} \qquad (13)$$

The first three terms are the energy contributions of the distances (r_ij), angles (θ_ijk), and dihedral angles (φ_ijkl) between bonded atoms. Because of this, they are also referred to as bonded contributions. Bond and angle energies are modeled via harmonic potentials, with the k_ij and k_ijk parameters modulating the potential strength and r̄_ij and θ̄_ijk being the equilibrium distances and angles. The dihedral term is modeled with a Fourier series to capture the periodicity of dihedral angles, with k_ijkl and φ̄_ijkl as free parameters. The last two terms account for non-bonded interactions. The long-range electrostatics are modeled as the Coulomb energy between charges q_i and q_j, and the van der Waals energy is treated via a Lennard–Jones potential (Eq. 12). In Class I/II force fields, empirical parameters are tabulated for a variety of elements in wide ranges of chemical environments (for example Ref. 262). Parameters for any one system should not necessarily be assumed to transfer well to other systems, and reparametrizations may be needed depending on the application. Different sets of parametrization schemes give rise to different types of classical FFs, with CHARMM, Amber,
GROMOS, and OPLS being a few of many examples. An extension beyond these FFs are Class II (i.e. "polarizable") force fields, where the static charges are replaced by environment-dependent functions (e.g. AMOEBA). A significant advantage of Class I and II types of forcefields is that they are computationally efficient, which makes them well suited for molecular dynamics simulations of complex and extended (bio)molecules, such as proteins, lipids, or polymers. Implementations of forcefield calculations on GPUs make these simulations extremely productive.
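To make Eq. 12 concrete, the following minimal Python sketch evaluates the Lennard–Jones energy of a small cluster. It assumes, for simplicity, a single global ε and σ for every pair instead of the per-pair parameters ε_ij and σ_ij of Eq. 12, and it uses the form whose pair minimum lies at r = σ with depth −ε; all names and values are illustrative.

```python
import numpy as np

def lennard_jones_energy(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy (Eq. 12), with a single global epsilon
    and sigma for every pair instead of per-pair parameters."""
    n = len(coords)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):                       # unique pairs, j > i
            r = np.linalg.norm(coords[i] - coords[j])
            sr6 = (sigma / r) ** 6
            energy += epsilon * (sr6 ** 2 - 2.0 * sr6)  # repulsion - attraction
    return energy

# A dimer separated by exactly sigma sits at the pair minimum, E = -epsilon
dimer = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(lennard_jones_energy(dimer))  # -1.0
```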
A disadvantage of Class I and II types of interatomic potentials is that they rely on predefined bonding patterns to compute the total energy, and this limits their transferability. In general, bonds between atoms are defined at the beginning of the simulation run and cannot change. Furthermore, bonding terms make use of harmonic potentials that are not suitable for modeling bond dissociation. Reactive potentials, which eschew harmonic potential dependencies and thus can describe the formation and breaking of chemical bonds, include the embedded atom method (EAM, Fig. 5c), which is used widely in materials science.
EAM is a type of many-body potential primarily used for metals, where each atom is embedded in the environment of all others. The total energy is given by

$$E_{\rm tot} = \sum_i^N F_i(\tilde{\rho}_i) + \frac{1}{2}\sum_{j \neq i} V_{ij}(r_{ij}) \qquad (14)$$

F_i is an embedding function and ρ̃_i an approximation to the local electron density based on the environment of atom i. F_i(ρ̃_i) can be seen as a contribution due to non-localized electrons in a metal. V_ij is a term describing the core-core repulsion between atoms. An EAM potential is determined by the functional forms used for F_i and V_ij as well as by how the density is expressed. Its dependence on the local environment without the need for predefined bonds makes EAM well suited for modeling material properties of metals. An extension of EAM is modified EAM (MEAM), which includes directional dependence in the description of the local density ρ̃_i, but this brings greater computational cost. EAMs also form the conceptual basis of the embedded atom neural network (EANN) MLPs.

Another common type of reactive potentials are bond-order potentials (BOPs). In general, BOPs model the total energy of a system as interactions between the neighboring atoms:

$$E_{\rm tot} = \sum_{i,j>i}\left[V_{\rm rep}(r_{ij}) - b_{ij(k)}\,V_{\rm att}(r_{ij})\right] f_{\rm cut}(r_{ij}) \qquad (15)$$

V_rep and V_att are repulsive and attractive potentials depending on the interatomic distance r_ij. A cutoff function f_cut restricts all interactions to the local atomic environment. b_ij(k) is the bond-order term, from which the potential takes its name. This term measures the bond order between atoms i and j (i.e., '1' for a single bond, '2' for a double bond, '0.6' for a partially dissociated bond). Bond orders can also depend on neighboring atoms k in some implementations. BOPs are typically used for covalently bound systems, such as bulk solids and liquids containing hydrogen, carbon, or silicon (e.g. carbon nanotubes and graphene). Depending on the exact form of the expressions in Eq. 15, different types of BOPs are obtained, such as Tersoff and REBO potentials. BOPs can also be extended to incorporate dynamically assigned charges, yielding potentials like COMB or ReaxFF. As with EAMs, BOPs have also been used as a starting point for constructing more elaborate machine learning potentials (MLPs) that will be discussed in more detail in Section 4. While efficient and versatile, all interatomic potentials described above are inherently constrained by their functional forms. A different approach is pursued by machine-learned potentials (MLPs), such as Behler–Parrinello neural networks, q-SNAP, and GAP potentials (Fig. 5c). In MLPs, suitable functional expressions for interactions and energy are determined in a fully data-driven manner and are ultimately only limited by the amount and quality of available reference data. One can then use substantially more data to generate a much more accurate MLP than would be possible when using, for instance, a ReaxFF potential trained on similar data sets.
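As a schematic illustration of the EAM energy expression (Eq. 14), the sketch below uses deliberately simple, hypothetical functional forms: an exponential density contribution, a square-root embedding function, and a power-law core-core repulsion. These are placeholders chosen for readability and do not correspond to any published EAM parameterization.

```python
import numpy as np

def eam_energy(coords, a=1.0, beta=2.0, c=1.0, p=6):
    """Schematic EAM total energy (Eq. 14) with made-up functional forms."""
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(r, np.inf)              # exclude self-interaction
    rho = np.sum(np.exp(-beta * r), axis=1)  # local density from neighbors
    embedding = -a * np.sqrt(rho)            # F_i(rho_i), attractive
    repulsion = 0.5 * np.sum(c / r ** p, axis=1)  # (1/2) sum_j V_ij(r_ij)
    return np.sum(embedding + repulsion)

# A small tetrahedral test cluster
cluster = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1]], float)
print(eam_energy(cluster))
```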
For the sake of completeness, we note that all approaches described here are fully atomistic: each atom is modeled as an individual entity. It is also possible to combine groups of atoms into pseudo-particles, giving rise to so-called coarse-grained methods. On an even higher level of abstraction, whole environments can be modeled as a single continuum. As such approaches are not the subject of the present review, we refer the interested reader, e.g., to Refs. 278 and 279.
Once an energy calculation is completed by one of the CompChem methods above, many other interesting molecular properties can be calculated. Most of these properties can be obtained as the response of the energy to a perturbation, e.g. changes in the nuclear coordinates R, external electric (ε) or magnetic (B) fields, or the nuclear magnetic moments {I_i}. Given an expression for the energy, which depends on the above quantities, so-called response properties can be computed via the corresponding partial derivatives of the energy. A general response property Π then takes the form

$$\Pi^{(n_R, n_\epsilon, n_B, n_I)} = \frac{\partial^{\,n_R + n_\epsilon + n_B + n_I}\, E(\mathbf{R}, \boldsymbol{\epsilon}, \mathbf{B}, \{\mathbf{I}_i\})}{\partial \mathbf{R}^{n_R}\, \partial \boldsymbol{\epsilon}^{n_\epsilon}\, \partial \mathbf{B}^{n_B}\, \partial \mathbf{I}_i^{n_I}}, \qquad (16)$$

where the n's indicate the n-th order partial derivative with respect to the quantity in the subscript. A common response property is the nuclear forces F = −Π^(1,0,0,0), which are the negative first derivative of the energy with respect to the nuclear positions. Such calculations enable a plethora of different geometry optimization schemes for chemical structures on the PES. Hessian calculations, corresponding to the second derivative of the energy with respect to nuclear positions, are necessary to confirm the location of first-order saddle points on the PES and to identify normal modes and their frequencies for vibrational partition functions, which are useful for modeling temperature dependencies based on statistical thermodynamics. Hessian calculations are computationally costly, since they normally involve calculations based on finite-difference methods involving many nuclear force calculations. Many methods have been developed to allow CompChem algorithms to sample minimum-energy regions of the PES or precisely locate points of interest.
Historically, many of these techniques have relied on approximate or full Hessian calculations, but other approaches such as the nudged-elastic-band and string methods are popular alternatives that do not require a Hessian calculation. There have also been efforts using different forms of ML to accelerate procedures or overcome long-standing challenges in efficient sampling of and optimization on the PES.
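As a minimal illustration of the response formalism, the sketch below evaluates forces F = −Π^(1,0,0,0) by central finite differences on a toy harmonic PES. Production codes use analytic gradients wherever available, but the same displacement logic underlies finite-difference Hessian builds; the function names and the model surface are illustrative assumptions.

```python
import numpy as np

def numerical_forces(energy_fn, coords, h=1e-5):
    """Forces F = -dE/dR by central finite differences; energy_fn maps
    an (N, 3) geometry to a scalar energy."""
    forces = np.zeros_like(coords)
    for idx in np.ndindex(coords.shape):
        displaced = coords.copy()
        displaced[idx] += h
        e_plus = energy_fn(displaced)
        displaced[idx] -= 2 * h
        e_minus = energy_fn(displaced)
        forces[idx] = -(e_plus - e_minus) / (2 * h)
    return forces

# Toy harmonic "PES" centered at the origin: E = (1/2) sum x^2
harmonic = lambda x: 0.5 * np.sum(x ** 2)
geom = np.array([[0.1, 0.0, 0.0], [0.0, -0.2, 0.0]])
print(numerical_forces(harmonic, geom))  # equals -geom for this PES
```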
The general expression above can provide a wealth of other quantities, some of which are relevant for molecular spectroscopy and/or provide a direct connection to experiment (see Tab. 3). Infrared spectra can be simulated based on dipole moments µ = −Π^(0,1,0,0), while polarizabilities α = −Π^(0,2,0,0) offer access to polarized and depolarized Raman spectra. A central response property involving the magnetic field are the nuclear magnetic shielding tensors σ_i = Π^(0,0,1,1), with isotropic shielding σ̄_i = (1/3) Tr[σ_i].

Table 3: Response properties of the potential energy.

| n_R | n_ε | n_B | n_I | Property |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | forces |
| 2 | 0 | 0 | 0 | Hessian (harmonic vibrational frequencies) |
| 0 | 1 | 0 | 0 | dipole moment |
| 0 | 2 | 0 | 0 | polarizability |
| 0 | 0 | 1 | 1 | nuclear magnetic shielding tensor |

The beauty of this formalism lies in the fact that a single energy calculation method provides access to a wide range of quantum chemical properties in a highly systematic manner. A large number of modern MLPs use the response of the potential energy with respect to nuclear positions to obtain energy-conserving forces. However, far fewer applications model perturbations with respect to electric and magnetic fields. Ref. 299 extends the descriptor used in the FCHL kernel by an explicit field-dependent term, making it possible to predict dipole moments across chemical compound space. Ref. 300 introduces a general NN framework to model interactions of a system with vector fields, which was then used to predict dipole moments, polarizabilities, and nuclear magnetic shielding tensors as response properties.

An important aspect of CompChem is the description of molecules within a solution environment. Simulating a dynamical environment composed of many surrounding molecules is usually not feasible with electronic-structure methods. To circumvent this problem, solvation modeling schemes have been devised (see Refs. 301–306 for discussions on this topic). The most popular approaches are so-called polarizable continuum solvent models (PCM).
They model the electrostatic interaction of a solute molecule with its environment by representing the charge distribution of the solvent molecules as a continuous electric field, the reaction field. This dielectric continuum can be interpreted as a thermally averaged representation of the environment and is typically assigned a constant permittivity depending on the particular solvent to be modeled (ε ≈ 80 for water). The solute resides in a cavity within the dielectric, and the electrostatic problem is governed by the Poisson equation

$$-\nabla\cdot\left[\epsilon(\mathbf{r})\,\nabla V(\mathbf{r})\right] = 4\pi\rho_m(\mathbf{r}) \qquad (17)$$

Here, ρ_m(r) is the charge distribution of the solute and ε(r) is the position-dependent permittivity, which is usually set to one within the cavity and to the ε of the solvent on the outside. V(r) is the electrostatic potential composed of the two terms

$$V(\mathbf{r}) = V_m(\mathbf{r}) + V_s(\mathbf{r}), \qquad (18)$$

where V_m(r) is the solute potential and V_s(r) is the apparent potential due to the surface charge distribution σ(s),

$$V_s(\mathbf{r}) = \int_\Gamma \frac{\sigma(\mathbf{s})}{|\mathbf{r} - \mathbf{s}|}\, d\mathbf{s}. \qquad (19)$$

Γ indicates the surface of the cavity. Eq. 17 is solved numerically to obtain the surface charge distribution σ(s). Once σ(s) has been determined in this fashion, the potential is computed according to Eq. 19 and used to construct an effective Hamiltonian of the form

$$\hat{H}_{\rm eff} = \hat{H} + V_s(\mathbf{r}), \qquad (20)$$

where Ĥ is the vacuum Hamiltonian. These equations are then solved self-consistently in a Roothaan–Hall or Kohn–Sham approach, yielding the electrostatic solvent-solute interaction energy. This scheme is also called the self-consistent reaction field (SCRF) approach. Continuum models differ in how the cavities are constructed and how Eq. 17 is solved to obtain the surface charge distribution. Variants include the original PCM model, also referred to as dielectric PCM (D-PCM), the integral equation formulation of PCM (IEFPCM), SMD, conductor PCM (C-PCM), and the conductor-like screening model (COSMO).
The latter two approaches replace the dielectric medium by a perfect conductor to allow for a particularly efficient computation of σ(s). PCMs can be further extended with statistical thermodynamics treatments to account for solutes having different sizes and for concentration effects, and this leads to models such as COSMO-RS. A drawback of most PCM-like approaches is that they neglect local solvent structure. Thus, they cannot reliably account for situations where explicit solvent interactions are important, e.g. when hydrogen bonding stabilizes specific sites of a transition state.
Furthermore, while implicit models might be parametrized to fit bulk-like properties of mixed or ionic solvents (e.g. Ref. 313), the complex local solvent environments presented by these systems must be treated by other means. For mixed-solvent systems, a range of hybrid schemes such as COSMO-RS, reference interaction site models (RISMs), or QM/MM approaches have been developed. As an in-depth discussion of these alternative schemes exceeds the scope of this review, we instead refer to other references.
ML models are increasingly being used to describe solvent effects. Ref. 300 introduces a continuum ML model based on a reaction field that can predict energies and response properties for continuum solvents; it can extrapolate to solvents not seen during training, and it can be extended to operate in a QM/MM fashion to account for explicit solvent effects in a Claisen rearrangement reaction. Ref. 321 implemented automatable calculation schemes and unsupervised ML to allow predictions of single-ion solvation energies for monovalent and divalent cations and anions based on physically rigorous quasi-chemical theory.
Ref. 324 used convolutional neural networks and molecular dynamics simulations to carry out high-throughput screening of mixed-solvent systems. Ref. 325 implemented efficient ways to carry out ML-based QM/MM molecular dynamics simulations.
By solving for electronic structures, by whatever means is appropriate, one obtains molecular energies and energy spectra (typically corresponding to quasiparticles given by Kohn–Sham or Hartree–Fock orbitals). From these, one can then compute molecular or material properties that arise from quantum mechanical and statistical operators, e.g. thermodynamic energies, response properties, highest occupied and lowest unoccupied molecular orbital energies, and band gaps, among other properties. Many properties are defined by the characters of the orbitals, and having knowledge of these should always be helpful and aid in deriving useful insight into designing molecules and materials for a particular function. Furthermore, one is often interested in how these molecules behave over time (i.e. the dynamics given some statistical ensemble that depends on temperature, pressure, etc.) over all possible degrees of freedom. By understanding how energies and forces change over time, one can predict thermal and pressure dependencies as well as spectroscopic properties, building toward insightful predictions.

Molecular and materials chemistry is vastly complex and variable, and one often faces the question of whether to span wider chemical spaces or to take deeper explorations of a specific phenomenon. A key problem is that even after either effort, it is not clear how information for one system might be related to another to provide more knowledge. For instance, one may decide to calculate all possible properties of ethanol with a CompChem method, but understanding how any calculated property would correlate to an analogous property of isopropanol is still usually difficult. There is great interest in understanding chemical and materials space through applications of quantitative structure-activity/property relationships, cheminformatics, conceptual DFT, and alchemical perturbation DFT.
All these applications benefit from greater access to CompChem data, and all have promise when interfaced with ML for transformative applications that catalyze wisdom and impact.
Machine Learning Tutorial and Intersections with Chemistry
Machine learning (ML) has had a dramatic impact on many aspects of our daily lives and has arguably become one of the most far-reaching technologies of our era. It is hard to overstate its importance in solving long-standing computer science challenges such as image classification or natural language processing, tasks that require knowledge that is hard to capture in a traditional computer program. Previous classical artificial intelligence (AI) approaches relied on very large sets of rules and heuristics, but these were unable to cover the full scope of these complex problems. Over the past decade, advances in ML algorithms and computer technology have made it possible to learn underlying regularities and relevant patterns from massive datasets, enabling automatic construction of powerful models that can sometimes even outperform humans at those tasks.

This development inspired researchers to approach challenges in science with the same tools, driven by the hope that ML would revolutionize their respective fields in a similar way. Here, we give an overview of these developments in chemistry and physics to serve as an orientation for newcomers to ML. We will first explain what tasks ML is good at and when it might not be the best solution to a problem. We will start by introducing the field of ML in general terms and dissect its strengths and weaknesses.
In the most general sense, ML algorithms estimate functional relationships without being given any explicit instructions on how to analyze or draw conclusions from the data. Learning algorithms can recover mappings between a set of inputs and corresponding outputs, or from the inputs alone. Without output labels, the algorithm is left on its own to discover structure in the data.
Universal approximators are commonly used for that purpose. These reconstruct any function that fulfills a few basic properties, such as continuity and smoothness, as long as enough data is available. Smoothness is a crucial ingredient that makes a function learnable, because it implies that neighboring points are correlated with the outputs Y in similar ways. That property means that one can draw successful conclusions about unknown points as long as they are close to the training data (coming from the same underlying probability distribution). In contrast, completely random processes in the above sense allow no predictions.

An association that immediately springs to mind is traditional regression analysis, but ML goes a step further. Regression analyses aim to reconstruct the function that goes through a set of known data points with the lowest error, whereas ML techniques aim to identify functions that interpolate between data points and thus minimize the prediction error for new data points that might later appear.
Those contrasting objectives are mirrored in the different optimization targets: in traditional regression, the optimization task

$$\hat{f} = \arg\min_{f \in \mathcal{F}} \left[ \sum_i^M L(f(\mathbf{x}_i), y_i) \right] \qquad (21)$$

only measures the fit to the data, but learning algorithms typically aim to find models f̂ that satisfy

$$\hat{f} = \arg\min_{f \in \mathcal{F}} \left[ \sum_i^M L(f(\mathbf{x}_i), y_i) + \|\Gamma\Theta\|^2 \right]. \qquad (22)$$

Both optimization targets reward a close fit, often using the squared loss L(f̂(x), y) = (f̂(x) − y)². However, the key difference is an additional regularization term in Eq. 22, which influences the selection of candidate models by introducing additional properties that promote generalization. To understand why this is necessary, it is helpful to consider that Eq. 22 is only a proxy for the optimization problem

$$\hat{f} = \arg\min_{f \in \mathcal{F}} \left[ \int L(f(\mathbf{x}), y)\, dp(\mathbf{x}, y) \right] \qquad (23)$$

that we would actually like to solve. In an ideal world, we would minimize the loss function over the complete distribution of inputs and labels p(x, y). However, this is obviously impossible in practice, so we apply the principle of Occam's razor, which presumes that simpler (parsimonious) hypotheses are more likely to be correct. With this additional consideration we hope to be able to recover a reasonably general model, despite only having seen a finite training set. A common way to favor simpler models is via an additional term in the cost function, which is what ||ΓΘ||² in Eq. 22 expresses. Here, Γ is a matrix that defines "simplicity" with regard to the model parameters Θ. Usually, Γ = λI (with λ > 0) is chosen to simply favor a small L₂-norm on the parameters, such that the solution does not rely too strongly on individual input features. This particular approach is called Tikhonov regularization, but other regularization techniques also exist.

A model that is heavily regularized (i.e. using a large λ) will eventually become biased, in that it is too simplistic to fit the data well. In contrast, a lack of regularization might yield an overly complex model with high variance. Such an "overly fit" model will follow the data exactly, to the point that it also models the noise components and consequently fails to generalize (see Fig. 6). Finding the appropriate amount of regularization λ to manage under- and over-fitting is known as attaining a good bias-variance trade-off. We will introduce a process called cross-validation to address this challenge further below (see Section 3.4.3).
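A short numerical sketch of Tikhonov regularization in its simplest setting (squared loss and Γ = λI in Eq. 22, i.e. ridge regression); the synthetic data and the choice of λ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                    # 50 samples, 5 features
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)

lam = 1e-2                                      # regularization strength lambda
# Closed-form Tikhonov solution: Theta = (X^T X + lambda I)^-1 X^T y,
# i.e. Gamma = sqrt(lambda) * I in the ||Gamma Theta||^2 penalty of Eq. 22
theta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(theta)  # close to the true coefficients; shrinks toward 0 as lam grows
```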
ML algorithms can infer functional relationships from data in a statistically rigorous way without detailed knowledge about the problem at hand. ML thus captures implicit knowledge from a dataset, even aspects where CPI might not be available. Traditional modeling approaches, like the classical forcefields discussed in Section 2.2.6, rely on preconceived notions about the PES they are modeling and thus about the way the physical system behaves. In contrast, ML algorithms start from a loss function and a much more general model class. Within the limits permitted by the noise inherent to the data, generalization can be improved to arbitrary accuracy given increasingly larger informative training data sets. This process allows us to explore a problem even before there is a reasonably full understanding. An ML predictor can serve as a starting point for theory building and be regarded as a versatile tool in the modelling loop: building predictive models, improving them, enriching them with formal insight, improving further, and ultimately extracting a formal understanding. More and more research efforts are starting to combine data-driven learning algorithms with rigorous scientific or engineering theory to yield novel insights and applications.
Redundancy in first principles calculations
For a quantum chemical property of compounds in a dataset, first-principles calculations need to be repeated independently for each input, even if the inputs are very similar. No formally rigorous method exists to exploit redundancies in the calculations in such a scenario. The empiricism of learning algorithms, however, does provide a pathway to extract information based on compound structure similarity. A data-driven angle allows one to ask questions in new ways and can give rise to new perspectives on established problems. For example, unsupervised algorithms like clustering or projection methods group objects according to latent structural patterns and provide insights that would remain hidden when only looking at individual compounds.
Some difficult problems in chemistry and physics can be solved accurately with CompChem, but doing so would require significant resources. For example, enumerating all pairwise interactions in a many-body system will inevitably scale quadratically, and there is no obvious path around this. One might ask if empirical approaches can address such fundamental problems more efficiently, but this is unfortunately not possible, since ML is more suited for finding solutions in general function spaces rather than in deterministic algorithms where constraints guide the solution process. However, if we are not as interested in finding a full solution but rather some aspect of it, the stochastic nature of ML can be beneficial. For instance, a traditional ML approach might not be the best tool for explicitly solving the Schrödinger equation, but it might be a far more useful tool for developing a forcefield that returns the energy of a system without the need for a cumbersome wavefunction and a self-consistent algorithm. As an example, Hermann and Noé used deep neural networks to show how ML methods may be suitable for overcoming challenges faced by traditional CompChem approaches.
Reliance on high-quality data
ML algorithms require a large amount of high-quality data, and it is hard to decide a priori when a data set is sufficient. Sometimes a data set may be large, but it may not adequately sample all the relevant systems one intends to model; e.g. a molecular dynamics simulation might generate many thousands of molecular conformations used to train an ML forcefield, but perhaps that sampling only occurred in a local region of the PES. In this case, the ML forcefield would be effective at modeling the regions of the PES it was trained on, but useless in other regions until more data and broader sampling occurred. This limitation is common to all empirical models, which are generally limited in their extrapolation abilities.
Inability to derive high-level concepts
Standard ML algorithms cannot conceptualize knowledge from a data set. Two of the main reasons are the non-linearity and excessive parametric complexity of most models, which allow many equally viable solutions for the same problem.
It can be hard to gain insight into the modeled relationship because it is not based on a small set of simple rules. Techniques have emerged to make ML models interpretable (explainable AI, XAI). While helpful, drawing scientific insight clearly still requires human expertise.
Furthermore, the path from an ML model back to a physical set of equations is being explored, but it is far from being fully established automatically.

Prone to artifacts

Despite following the rules of best practice, ML algorithms can give unexpected and undesired results. Instead of extracting meaningful relationships, they may occasionally exploit nuisance patterns within the underlying experimental design, like the model architecture, the loss function, or artifacts in the dataset. This results in a "clever Hans" predictor, which technically manages the learning problem but uses a trivial solution that is only applicable within the narrow scope of the particular experimental setup at hand. The predictor will appear to be performing well, while actually harvesting the wrong information, and therefore not allowing any generalization or transferable insights. For example, a recently proposed random forest predictor for the success of Buchwald–Hartwig coupling reactions was later revealed to give almost the same performance when the original inputs were replaced by Gaussian noise.
This finding strongly suggested that the ML algorithm exploited some hidden underlying structure in the input data, irrespective of the chemical knowledge that was provided through the descriptor. Even though the model might appear quite useful, any conclusions that rely on the importance of the chemical features used in the model were thus rendered questionable at best. This example demonstrates that out-of-sample validation alone is often not sufficient to establish that a proposed model has indeed learned something meaningful. Therefore, the hypothesis described by the model must be challenged by extensive testing in practically relevant scenarios like actual physical simulations. In other words, the ML model needs to lead to a better understanding of the modeling itself and the underlying chemistry.
ML models are classified by the type of learning problem they solve. Consider, for instance, a data scientist who develops an ML model that can predict acidity constants (pK_a's) for any molecule. A researcher with knowledge of physical organic chemistry might be aware of the empirical Taft equation, which provides a linear free energy relationship between molecules on the basis of empirical parameters that account for a molecule's fundamental field, inductive, resonance, and steric effects (e.g. values related to Hammett ρ and σ values). There are several ways the data scientist might develop an ML model to do this. Examples mentioned here include supervised, unsupervised, and reinforcement learning.

Supervised learning addresses learning problems where the ML model f̂: X → Y connects a set of known inputs X and outputs Y, either to perform a regression or a classification task. While the former maps onto a continuous space (e.g. energy, polarizability), the latter outputs a categorical value (e.g. acid or base; metal or insulator) for each data point. Using the pK_a predictor example, a supervised learning algorithm could be trained to correlate recognizable chemical patterns or structures to experimentally known pK_a's. The goal would be to deduce the relationship between these inputs and outputs, such that the model is able to generalize beyond the known training set. A standard universal approximator has to accomplish this learning task without any preconceived notion about the problem at hand and will therefore likely require many examples before it can make accurate predictions. Recently, much research has investigated ways to incorporate high-level concepts into the learning algorithm in the form of prior knowledge. In this vein, one could take into account chemically relevant parameters such as Hammett constants so that the parameterized ML model incorporates the modified Hammett or Taft equation. An example of a classification problem in materials science is the categorization of materials, where identifying characteristics of the electronic structure can be used to distinguish between insulators and metals.
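A minimal sketch of such a supervised pK_a regressor, here using kernel ridge regression from scikit-learn. The two-component descriptors (meant to evoke Hammett/Taft-like substituent parameters) and the pK_a labels are fabricated for illustration only.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical training set: each row is a small descriptor vector for an
# acid (Hammett/Taft-like substituent parameters), labels are pKa values.
X_train = np.array([[0.00, 0.0], [0.23, 0.1], [0.45, 0.3], [0.71, 0.4]])
y_train = np.array([4.76, 4.20, 3.75, 2.85])   # illustrative values only

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=5.0)
model.fit(X_train, y_train)                    # supervised: inputs + labels

X_new = np.array([[0.35, 0.2]])                # an unseen compound
print(model.predict(X_new))                    # interpolated pKa estimate
```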
Unsupervised learning describes problems where only the inputs X are known, with no corresponding labels. In this setting, the goal is to recover some of the underlying structure of the data to gain a higher-level understanding. Unsupervised learning problems are not as rigorously defined as supervised problems, in the sense that there can be multiple correct answers, depending on the model and objective function that is applied.

For example, one might be interested in separating conformers of a molecule from a molecular dynamics trajectory, given exclusively the positions of the atoms. A clustering algorithm (like the k-means algorithm) could identify those conformers by grouping the data based on common patterns. Alternatively, a projection technique could reveal a low-dimensional representation of the dataset.
Often data is represented in high dimension despite being intrinsically low-dimensional. With the right projection technique, it is possible to retain the meaningful properties in a representation with fewer degrees of freedom. A conceptually simple embedding method is principal component analysis (PCA), in which the relationship that is sought to be preserved is the scalar product between the data points.
There are many other linear and non-linear projection methods, such as multi-dimensional scaling, kernel PCA, t-distributed stochastic neighbor embedding (t-SNE), sketch-map, and the uniform manifold approximation and projection (UMAP).
Finally, anomaly detection is a further variant of unsupervised learning, in which 'outliers' to the available data can be discovered.
However, without knowing the labels (in this example, the potential energy associated with each geometry), there is no way to conclusively verify that the result is correct. The literature is gradually seeing more instances of unsupervised learning, particularly to reveal important chemical properties to efficiently explore chemical/materials spaces.
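To make the projection idea concrete, here is a minimal PCA via the singular value decomposition, applied to a synthetic stand-in for an MD trajectory containing two noisy 'conformer' clusters; all data are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for an MD trajectory: 200 frames, 10 atoms -> 30 coordinates,
# generated as two noisy clusters mimicking two conformers.
frames = np.concatenate([rng.normal(0.0, 0.1, size=(100, 30)),
                         rng.normal(1.0, 0.1, size=(100, 30))])

centered = frames - frames.mean(axis=0)          # PCA requires centered data
# The principal components are the right singular vectors of the data matrix
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
projection = centered @ Vt[:2].T                 # embed into the top-2 PCs
print(projection[:3])                            # low-dimensional coordinates
explained = S ** 2 / np.sum(S ** 2)
print(explained[:2])                             # variance captured by PC1, PC2
```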
Reinforcement learning describes problems that combine aspects of supervised and unsupervised learning. Reinforcement learning problems often involve defining an agent within an environment that learns by receiving feedback in the form of punishments and rewards. The progress of the agent is characterized by a combination of explorative activity and exploitation of already gathered knowledge.
For chemistry applications, reinforcement learning techniques are being increasingly used for finding molecules with desired properties in large chemical spaces.

Universal approximators have their origins in the 1960s, when the hope was to construct "learning machines" with capabilities similar to those of the human brain. An early mathematical model of a single simplified neuron emerged that was called a perceptron (Eq. 24):

$$f(\mathbf{x}) = \mathrm{sign}\!\left(\sum_{i=1}^{N} w_i x_i - b\right) \qquad (24)$$

Here, x denotes the N-dimensional input to the perceptron. It has N + 1 parameters, consisting of the w_i (so-called weights) and a single b (a so-called threshold), that are adapted to the data. This adaptation process is typically called "learning" (vide infra), and it amounts to minimizing a predefined loss function.

In the 1960s, this simple neural network had very limited use, as it was only able to model a linear separating hyperplane. Even simple non-linear functions like the XOR were out of reach. Thus, excitement waned but then reappeared two decades later with the emergence of novel models consisting of more neurons arranged in multi-layer neural network structures (see Eq. 25). Recent algorithmic and hardware advances now allow deep and increasingly complex architectures:

$$f_k(\mathbf{x}) = g\!\left(\sum_{j=1}^{H} w_{kj}\, g\!\left(\sum_{i=1}^{N} w_{ji}\, x_i - b_j\right)\right) \qquad (25)$$

In Eq. 25, g(·) denotes an activation function, a non-linear transformation that allows complex mappings between input and output. As with the perceptron, the parameters of multi-layer NNs can be learned efficiently using iterative algorithms that compute the gradient of the loss function using the so-called back-propagation (BP) algorithm. In the late 1980s, artificial neural networks were then proven to be universal approximators of smooth nonlinear functions, and so they gained broad interest even outside the ML community, which was then still relatively small.

In 1995, a novel technique called Support Vector Machines (SVMs), together with kernel-based learning, was proposed, which came with some useful theoretical guarantees. SVMs implement a nonlinear predictor:

$$f(\mathbf{x}) = \sum_{j=1}^{N} y_j\, \alpha_j\, K(\mathbf{x}_j, \mathbf{x}) - b, \qquad (26)$$

where K is the so-called kernel. The kernel implicitly defines an inner product in some feature space and thus avoids an explicit mapping of the inputs. This "kernel trick" makes it possible to introduce non-linearity into any learning algorithm that can be expressed in terms of inner products of the input. It has since been applied to many other algorithms beyond SVMs, such as Gaussian processes (GP),
PCA, and independent component analysis (ICA).
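To ground Eqs. 24 and 25, the following sketch evaluates a perceptron and a small two-layer network forward pass in plain NumPy. The weights are random placeholders rather than trained values, and, following our reading of Eq. 25, only the hidden layer carries thresholds.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=4)                     # an N = 4 dimensional input

# Perceptron (Eq. 24): a linear hyperplane followed by a sign non-linearity
w, b = rng.normal(size=4), 0.5
print(np.sign(w @ x - b))                  # binary output, +1 or -1

# Two-layer network (Eq. 25) with H = 8 hidden neurons and g = tanh
g = np.tanh                                # activation function g(.)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2 = rng.normal(size=(1, 8))
hidden = g(W1 @ x - b1)                    # inner sum over the N inputs
output = g(W2 @ hidden)                    # outer sum over the H hidden units
print(output)                              # a smooth, non-linear mapping of x
```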
The most effective kernels are tailored to the specific learning task at hand, but there are many generic choices, such as the polynomial kernel K(x_j, x) = (x_j · x − b)^d, which describes inner products between degree-d polynomials. Another popular choice is the Gaussian kernel K(x_j, x) = exp(−‖x_j − x‖² / 2σ²). It is one of the most versatile kernels because it only imposes smoothness assumptions on the solution, depending on the width parameter σ.

As seen in Eq. 26, an SVM can also be understood as a shallow neural network with a fixed set of non-linearities. In other words, the kernel explicitly defines a similarity metric to compare data points, whereas neural networks have some freedom to shape this transformation during training because they nest parameterizable non-linear transformations on multiple scales. This difference gives both techniques some unique strengths and drawbacks. Despite that, there exists a duality between both approaches that allows neural networks to be translated into kernel machines and analyzed more formally (see Refs. 399–401).

In the context of computational chemistry, both NNs and kernel-based methods are the most used ML approaches. Simpler learners, such as nearest-neighbor models or decision trees, can still be surprisingly effective. Those have also been successfully used to solve a wide spectrum of problems including drug design, chemical synthesis planning, and crystal structure classification.
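A minimal sketch of a kernel predictor of the form of Eq. 26 with the Gaussian kernel. For simplicity, the coefficients are obtained here by a regularized least-squares fit (kernel ridge regression) rather than the SVM margin optimization, and the offset b is omitted; the data are synthetic.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0])                              # smooth target function

# Kernel predictor f(x) = sum_j alpha_j K(x_j, x), cf. Eq. 26
lam = 1e-6
alpha = np.linalg.solve(gaussian_kernel(X, X) + lam * np.eye(len(X)), y)

X_test = np.array([[0.5], [1.5]])
print(gaussian_kernel(X_test, X) @ alpha)        # close to sin(0.5), sin(1.5)
print(np.sin(X_test[:, 0]))
```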
In the following, we summarize the overall ML process, starting from a dataset all the way to a trained and tested model. The ML workflow typically includes the following stages:

• Gathering and preparing the data
• Choosing a representation
• Training the model
  – Train model candidates
  – Evaluate model accuracy
  – Tune hyperparameters
• Testing the model out of sample

Note that the progression to a good ML model is not necessarily linear, and some steps (except the out-of-sample test) may require reiteration as we learn about the problem at hand.
On a fundamental level, ML models can simply be regarded as sophisticated parametrizations of datasets. While the architectural details of the model matter, the reference dataset forms the backbone that ultimately determines its effectiveness. If the dataset is not representative of the problem at hand, the model will be incomplete and will behave unpredictably in situations that have been improperly captured. The same applies to any other shortcomings of the dataset, such as biases or noise artifacts, which will also be reflected in the model. Some of these dataset issues are likely to remain unnoticed when following the standard model selection protocol, since training and test datasets are usually sampled from the same distribution. If the sampling method is too narrow, errors seen during the cross-validation procedure may appear encouragingly small, but the ML model will fail catastrophically when applied to a real problem. If the training and test sets come from different distributions, then techniques to compensate for this covariate shift can be used.
Robust models can generally only be constructed from comprehensive datasets, but it is possible to incorporate certain patterns into models to make them more data-efficient. Prior scientific knowledge or intuition about specific problems can be used to reduce the function space from which an ML algorithm has to select a solution. If some of the unphysical solutions are removed a priori, less data is necessary to identify a good model. This is why NNs and kernel methods, despite both being broad universal function classes, bring different scaling behaviors. The choice of the kernel function provides a direct way to include prior knowledge such as invariances, symmetries, or conservation laws, whereas NNs are typically used if the learning problem cannot be as specifically characterized.
In general, without prior knowledge, NNs often require larger datasets to produce the same accuracy as well-constrained kernel methods that embody problem knowledge. This consideration is particularly important if the data is expensive, e.g. if it comes from high-quality experiments and/or expensive computations.
In order to apply ML, the dataset needs to be encoded into a numerical representation (i.e. features/descriptors) that allows the learning algorithm to extract meaningful patterns and regularities.
This is particularly challenging for unstructured data like molecular graphs that have well-defined invariant or equivariant characteristics that are hard to capture in a vectorial representation. For example, atoms of the same type are indistinguishable from each other, but it is hard to represent them without imposing some kind of order (which inevitably assigns an identity to each atom). Furthermore, physical systems can be translated and rotated in space without affecting many attributes. Only a representation that is adapted to those transformations can solve the learning problem efficiently. It has turned out to be a major challenge to reconcile all invariances of molecular systems in a descriptor without sacrificing its uniqueness or computability. Some representations cannot avoid collisions, where multiple geometries map onto the same representation. Others are unique, but prohibitively expensive to generate. Many solutions to this problem have been proposed, based on general strategies such as invariant integration, parameter sharing, density representations, or fingerprinting techniques.
Alternatively, an NN model infers the representation from data.
To date, none of the proposed approaches is without compromise, which is why the optimal choice of descriptor depends on the learning task at hand.
The training process is the key step that ties together the dataset and the model architecture. Through the choice of the model architecture, we implicitly define a function space of possible solutions, which is then conditioned on the training dataset by selecting suitable parameters. This optimization task is guided by a loss function that encodes two somewhat opposing objectives: (1) achieving a good fit to the data, while (2) keeping the parametrization general enough that the trained model becomes applicable to data not covered in the training set (see the two terms in Eq. 22). Satisfying the latter objective involves a process called model selection, in which a suitable model is chosen from a set of variants that have been trained with exclusive focus on the first objective. Depending on the model architecture, more or less sophisticated optimization algorithms can be applied to train the set of model candidates.

Kernel-based learning algorithms are typically linear in their parameters α⃗ (see Eq. 26). Coupled with a quadratic loss function, L(f̂(x), y) = (f̂(x) − y)², they yield a convex optimization problem. Convex problems can be solved quickly and reliably due to having only a single solution that is guaranteed to be globally optimal. This solution can be found algebraically by taking the derivative of the loss function and setting it to zero. For example, KRR and GPs then yield a linear system of the form

$$\nabla_{\vec{\alpha}}\, L(\hat{f}(\mathbf{x}), \mathbf{y}) = (\mathbf{K} + \lambda\mathbf{I})\,\vec{\alpha} - \mathbf{y} = 0, \qquad (27)$$

which is typically solved in a numerically robust way by factorizing the kernel matrix K. There exists a broad spectrum of matrix factorization algorithms, such as the Cholesky decomposition, that exploit the symmetry and positive-definiteness properties of kernel matrices.
Factorization approaches are, however, only feasible if enough memory is available to store the matrix factors, and this can be a limitation for large-scale problems. In that case, numerical optimization algorithms provide an alternative: they take a multi-step approach to solve the optimization problem iteratively by following the gradient:

$$\vec{\alpha}_t = \vec{\alpha}_{t-1} - \gamma\, \underbrace{\nabla_{\vec{\alpha}}\, L(\hat{f}(\mathbf{x}), \mathbf{y})}_{\text{e.g. } (\mathbf{K} + \lambda\mathbf{I})\,\vec{\alpha}_{t-1} - \mathbf{y}}, \qquad (28)$$

where γ is the step size (or learning rate). Iterative solvers follow the gradient of the loss function until it vanishes at a minimum. Each step is much less computationally demanding, because it only requires the evaluation of the model f̂. In particular, kernel models can be evaluated without storing K (see Eq. 28).

Neural networks are constructed by nesting non-linear functions in multiple layers, which yields non-convex optimization problems. Closed-form solutions similar to Eq. 27 do not exist, which means that NNs can only be trained iteratively, i.e. analogously to Eq. 28. Several variants of this standard gradient descent algorithm exist, including stochastic or mini-batch gradient descent, where only an n-sized portion of the training data (x, y)_{i:i+n} is considered in every step. Due to multiple local minima and saddle points on the loss surface, the global minimum is exponentially hard to obtain, so these algorithms usually converge to a local minimum. However, thanks to the strong modelling power of NNs, local solutions are usually good enough.
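The sketch below contrasts the two solution strategies for Eq. 27: a direct Cholesky factorization of (K + λI) versus the gradient iteration of Eq. 28. The kernel, data, step size, and iteration count are illustrative choices.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(30, 1))
y = np.cos(X[:, 0])
sq = (X[:, None, 0] - X[None, :, 0]) ** 2
K = np.exp(-sq / 2.0)                            # Gaussian kernel matrix
lam = 0.1
A = K + lam * np.eye(len(X))                     # (K + lambda I), SPD

# Direct solution of Eq. 27 via Cholesky factorization
alpha_direct = cho_solve(cho_factor(A), y)

# Iterative solution via gradient descent (Eq. 28); the step size gamma
# must stay below 2 / largest eigenvalue of A for the iteration to converge
alpha = np.zeros(len(X))
gamma = 1.0 / np.linalg.eigvalsh(A)[-1]
for _ in range(5000):
    alpha = alpha - gamma * (A @ alpha - y)      # follow the loss gradient

print(np.max(np.abs(alpha - alpha_direct)))     # the two solutions agree
```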
Hyper-parameters

In addition to the parameters that are determined when fitting an ML model to the dataset (i.e. the node weights/biases or regression coefficients), many models contain so-called hyper-parameters that need to be fixed before training. Two types of hyper-parameters can be distinguished: ones that influence the model, such as the type of kernel or the NN architecture, and ones that affect the optimization algorithm, e.g. the choice of regularization scheme or the aforementioned learning rate. Both tune a given model to the prior beliefs about the dataset and thus play a significant role in model effectiveness. Hyper-parameters can be used to gauge the generalization behavior of a model. Hyper-parameter spaces are often rather complex: certain parameters might need to be selected from unbounded value spaces, others could be restricted to integers or have interdependencies. This is why they are usually optimized using primitive exhaustive search schemes like grid or random search, in combination with educated guesses for suitable search ranges. Common gradient-based optimization methods typically cannot be applied to this task. Instead, the performance of a given set of hyper-parameters is measured by evaluating the respective model on a separate part of the dataset called the validation dataset (see Fig. 6). This process is also referred to as model selection.
Model selection
Cross-validation or out-of-sample testing is a technique to assess how a trained ML model will generalize to previously unseen data.
Figure 6: Supervised learning algorithms have to balance two sources of error during training: the bias and the variance of the model. A highly biased model is based on flawed assumptions about the problem at hand (under-fitting). Conversely, a high variance causes a model to follow small variations in the data too closely, making it susceptible to picking up random noise (over-fitting). The optimal bias-variance trade-off minimizes the generalization error of the model, i.e. how well it performs on unknown data. It can be estimated with cross-validation techniques.

For a reasonably complex model, it is typically not challenging to generate the right responses for the data known from the training set. This is why the training error is not indicative of how the model will fulfill its ultimate purpose of predicting responses for new inputs. Alas, since the probability distribution of the data is typically unknown, it is not possible to determine this so-called generalization error exactly. Instead, this error is often estimated using an independent test subset that is held back and later passed through the trained model to compare its responses to the known test labels. If the model suffers from over-fitting on the training data, this test will yield large errors. It is important to remember not to tweak any parameters in response to these test results, as this will skew the assessment of the model performance and will lead to overfitting on the test set.
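A compact sketch of this protocol with scikit-learn: hyper-parameters are selected by k-fold cross-validation on the training portion only, and the held-out test set is evaluated exactly once at the end. The data and the search grid are illustrative.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Hold out a test set that is touched exactly once, after model selection
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Hyper-parameters are chosen by 5-fold cross-validation on the training
# set only; the held-out test error then estimates the generalization error
search = GridSearchCV(KernelRidge(kernel="rbf"),
                      {"alpha": [1e-4, 1e-2, 1.0], "gamma": [0.1, 1.0, 10.0]},
                      cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_test, y_test))  # R^2 on unseen data; do not tune further
```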
Besides cross-validation, there are alternative ways to estimate the generalization error, for example via maximization of the marginal likelihood in Bayesian inference.
Some well-defined learning scenarios even allow the computation of rigorous upper bounds for the generalization error.

Applications of Machine Learning to Chemical Systems
We now discuss ways that the CompChem methods described in Section 2 and the ML methods in Section 3 can be implemented as CompChem+ML approaches for insights into chemical systems. We often notice a lack of detail about why an ML model is used and how it actually contributes worthwhile scientific insights. Thus, we will summarize the underlying attributes of conventional CompChem+ML efforts and then explain why these attributes are important for specific applications.

To begin, consider molecules or materials in a dataset; any entry will be related to another based on an abstract concept of "similarity". While similarity is an application-dependent concept, it should go hand in hand with CPI. For instance, physical properties of chemical systems can be attributed to the structure and/or composition of the chemical fragments within those systems. Thus, if the chemical structures and/or compositions of two entries in the database were similar, then their physical properties would also likely be similar.

For CompChem+ML using a supervised algorithm, a CompChem prediction might be made on a hypothetical system, pinpointed by an ML model that was trained to identify chemical fragments that correlate with labeled physical properties. This would be a direct exploitation of chemical similarity. Alternatively, for CompChem+ML using an unsupervised algorithm, the ML model would identify an underlying distribution or key features based on the similarity between pairs of entries in the dataset without labeled data. This would be a more nuanced leveraging of chemical similarity. In both cases, the accuracy, efficiency, and reliability of the ML models depend strongly on how similarity is defined and measured.

In this section we will first describe state-of-the-art descriptors and kernels for atomic systems that can be used to quantify the similarity between chemical systems. We will then explain the essential attributes of good atomic descriptors. Lastly for this section, we will elucidate why and how specific combinations of these descriptors and ML algorithms are beginning to revolutionize the field of CompChem.
In CompChem, molecules and materials are usually represented by the Cartesian coordinates and chemical elements of all atoms, i.e. by the sets {R_N} and {Z_N} for a system of size N. Even though these atomic coordinates provide a complete description of the system, they are hardly ever used as the input of an ML model, because this vector would introduce substantial superfluous redundancy. For instance, an ML model might treat two identical molecules that are rotated or translated as different molecules, and that in turn might cause the ML model to predict different physical properties for the two otherwise indistinguishable molecules. There are further difficulties when comparing molecules having different numbers of atoms. To work around these problems, atomic coordinates are usually converted into an appropriate representation ψ that is suitable for a particular task. Such conversions are useful because they allow the incorporation of physical invariances. Mathematically speaking, the representation fulfills

$$\psi(\{\mathcal{R}_N, Z_N\}) = \psi(S(\{\mathcal{R}_N, Z_N\})), \qquad (29)$$

where S indicates a symmetry operation, e.g. a rigid rotation about an axis C_i, an exchange of two identical atoms, or a translation of the whole system in Cartesian space.

It can also be advantageous to adopt a coarse-grained representation of the system. For example, the dihedral angles of a peptide might be accounted for without the positions of the side-chains, the positions of ions in a solution might be accounted for without the explicit coordinates of solvents, or just the center of mass of a water molecule might be used in place of the full three-centered atomistic representation. The choice of these coarse-grained representations provides a way to incorporate prior knowledge of the data, or such representations can be learned from an unsupervised learning step.
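A minimal example of a representation satisfying Eq. 29: sorting the vector of pairwise distances yields a descriptor that is invariant to rotations, translations, and atom exchange (although, as discussed below, such a descriptor is not guaranteed to be unique).

```python
import numpy as np

def sorted_distance_descriptor(coords):
    """psi satisfying Eq. 29: the sorted pairwise distances are unchanged
    by rotation, translation, and exchange of identical atoms."""
    n = len(coords)
    i, j = np.triu_indices(n, k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    return np.sort(dists)

mol = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.5, 0.0]])

# Rotate by 90 degrees about z, translate, and permute the atoms
R = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
transformed = (mol @ R.T + np.array([5.0, -2.0, 3.0]))[[2, 0, 1]]

print(np.allclose(sorted_distance_descriptor(mol),
                  sorted_distance_descriptor(transformed)))  # True
```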
Atomistic systems can be represented in a myriad of ways. Some descriptions are designed to emphasise particular aspects of a system, while others aim to disambiguate similar chemical or physical principles across a wide range of molecules or materials. The set of desirable properties in a representation thus depends on the task at hand. The following overview gives a coarse characterization of the most popular representations. Table 4 summarizes representations for chemical systems.
All respect the aforementioned physical symmetries and invariances needed for chemical systems. Many have similar theoretical foundations that can be understood as the basis onto which the atomic density is projected, and the connection between them has been summarised in a recent review.
The descriptors in Table 4 can be classified into two categories, global and atomic, as indicated in the column "Global". Traditional descriptors used in cheminformatics are global descriptors based on the covalent connectivity of atoms. These include simple valence counting and common neighbor analysis, the presence or absence of predefined atomic fragments (e.g. the Morgan fingerprints), and pairwise distances between atoms (e.g. the Coulomb Matrix, Sine Matrix, Ewald Sum Matrix, and Bag of Bonds (BoB)). However, atomic descriptors are generally more popular in ML and CompChem than global ones. In this case, a chemical system is described as a set of atomic environments, $X_1, \ldots, X_i, \ldots, X_N$, each consisting of the atoms (chemical species and positions) within a sphere of radius $r_{\mathrm{cut}}$ centered at a specific atom $i$. One needs to combine the set of atomic descriptors of all environments to construct a descriptor for the entire atomic structure. The most straightforward way to do this is to average the atomic descriptors,

$$\Phi(A) = \frac{1}{N_A} \sum_{i \in A} \psi(X_i), \quad (30)$$

where the sum runs over all $N_A$ atoms $i$ in structure $A$, and $X_i$ is the environment around atom $i$. When there are multiple chemical species, the descriptors for the local environments of different species can either be included in the single sum, or the averaging can be performed for the environments of each species separately and the species-specific averaged local descriptors can be concatenated. The combination can also be done by considering the Root Mean Square Displacement (RMSD), the best match between the environments of the two structures (best-match), or a regularized entropy match (REMatch).
Table 4: Descriptors and similarity metrics for chemical systems.[a]

Descriptor | Eff.[b] | Construction | Periodic | Unique | T R P [c] | Global | Smooth
Atom-centered Symmetry Functions (ACSF) | B | 1,2,3-body terms, cut-off | ✓ | ✗ | ✓ ✓ ✓ | ✗ | ✓
Smooth Overlap of Atomic Positions (SOAP) | B | density based, SO(3) integration | ✓ | ✗ | ✓ ✓ ✓ | ✗ | ✓
Coulomb Matrix (CM) | A | 1,2-body terms | ✗ | ✓ | ✓ ✓ ✗ | ✓ | ✓
Sine Matrix | A | 1,2-body terms | ✓ | ✓ | ✓ ✓ ✗ | ✓ | ✓
Ewald Sum Matrix | A | 1,2-body terms | ✓ | ✓ | ✓ ✓ ✗ | ✓ | ✓
Bag of Bonds (BoB) | A | 1,2-body terms | ✗ | ✗ | ✓ ✓ ◐ | ✓ | ✗
Faber-Christensen-Huang-Lilienfeld (FCHL) | C | 1,2,3-body terms | ✓ | ✗ | ✓ ✓ ✓ | ✗ | ✓
Spectrum of London and Axilrod-Teller-Muto potential (SLATM) | D | 1,2,3,4-body terms | ✓ | ✗ | ✓ ✓ ✓ | ✗ | ✓
Many-body Tensor Representation (MBTR) | C | 1,2,3-body terms | ✗ | ✗ | ✓ ✓ ✓ | ✓ | ✓
Atomic cluster expansion | A | 1,2-body terms | ✓ | ✗ | ✓ ✓ ✓ | ✓ | ✓
Invariant many-body interaction descriptor (MBI) | B | 1,2,3-body terms | ✗ | ✗ | ✓ ✓ ✓ | ✗ | ✓
Neural Network Architectures: | | | | | | |
Deep Potential - Smooth Edition (DeepPot-SE) | B | 1,2,3-body terms, cut-off | ✓ | ✗ | ✓ ✓ ✓ | ✗ | ✓
MPNN, SchNet | A/B | 1,2-body terms, hierarchical | ✓ | ✗ | ✓ ✓ ✓ | ✗ | ✓
Cormorant | B | 1,2-body terms, hierarchical | ✗ | ✗ | ✓ ✓ ✓ | ✗ | ✓
Tensor Field Networks | B | 1,2-body terms | ✓ | ✗ | ✓ ✓ ✓ | ✗ | ✓
Similarity metrics: | | | | | | |
Root mean square deviation of atomic positions (RMSD) | A | 1,2-body terms, input matching | ✗ | ✗ | ◐ ◐ ✗ | ✓ | ✗
Overlap Matrix | A | 1,2-body terms, input matching | ✗ | ✗ | ✓ ✓ ✓ | ✓ | ✗
REMatch | C | 1,2-body terms, input matching | ✗ | ✗ | ✓ ✓ ✓ | ✓ | ✗
sGDML | A | 1,2-body terms | ✓ | ✓ | ✓ ✓ ◐[d] | ✓ | ✓

[a] '✓' = satisfies condition; '◐' = partially satisfies condition; '✗' = does not satisfy condition.
[b] Computational efficiency is ranked with grades A-D in descending order. The efficiency class reflects the extent to which the descriptor requires expensive operations (e.g. hierarchical processing or matching of inputs).
[c] 'T' = translational; 'R' = rotational; 'P' = permutational.
[d] Only invariant to permutations represented in the training data.

4.1.2 Representing local environments

We will now describe the Smooth Overlap of Atomic Positions (SOAP) descriptors, since many other descriptors based on the atomic density are similar and differ mainly by how the density is projected onto basis functions.
To construct SOAP descriptors, one first considers an atomic environment $\mathcal{X}$ that contains only one atomic species; a Gaussian function of width $\sigma$ is then placed on each atom $i$ in $\mathcal{X}$ to make an atomic density function:

$$\rho_{\mathcal{X}}(\mathbf{r}) = \sum_{i \in \mathcal{X}} \exp\left(-\frac{|\mathbf{r} - \mathbf{r}_i|^2}{2\sigma^2}\right) f_{\mathrm{cut}}(|\mathbf{r}|). \quad (31)$$

Here, $\mathbf{r}$ denotes a point in Cartesian space, $\mathbf{r}_i$ is the position of atom $i$ relative to the central atom of $\mathcal{X}$, and the cutoff function $f_{\mathrm{cut}}$ smoothly decays to zero beyond the radius $r_{\mathrm{cut}}$. This density representation ensures invariance with respect to translations and permutations of atoms of the same species, but not rotations. To obtain a rotationally-invariant descriptor, one expands the density in a basis of spherical harmonics $Y_{lm}(\hat{\mathbf{r}})$ and a set of orthogonal radial functions $g_n(r)$,

$$\rho_{\mathcal{X}}(\mathbf{r}) = \sum_{nlm} c_{nlm}\, g_n(|\mathbf{r}|)\, Y_{lm}(\hat{\mathbf{r}}), \quad (32)$$

and constructs the power spectrum of the density from the expansion coefficients:

$$\psi_{nn'l}(\mathcal{X}) = \pi \sqrt{\frac{8}{2l+1}} \sum_m (c_{nlm})^* c_{n'lm}. \quad (33)$$

One then obtains a vector of descriptors $\psi = \{\psi_{nn'l}\}$ by considering all components $l \le l_{\max}$ and $n, n' \le n_{\max}$, which act as band limits controlling the spatial resolution of the atomic density. The generalization to more than one chemical species is straightforward: one constructs separate densities for each species $\alpha$ and then computes the power spectra $\psi^{\alpha\alpha'}_{nn'l}(\mathcal{X})$ for each pair of elements $\alpha$ and $\alpha'$, where the two species indices correspond to the $c^*$ and $c$ coefficients, respectively. The resulting vectors corresponding to each of the $\alpha$ and $\alpha'$ pairs are then concatenated to obtain the descriptor vector of the complete environment.

Atom-centered Symmetry Functions (ACSFs), sometimes called Behler-Parrinello symmetry function descriptors, differ from SOAP descriptors in that they project the atomic densities onto selected 2-body or 3-body symmetry functions. Faber-Christensen-Huang-Lilienfeld (FCHL) descriptors follow similar principles while also considering the correlations between the atomic densities coming from different chemical species. The Many-Body Tensor Representation (MBTR) approach involves taking histograms of atom counts, inverse pairwise distances, and angles. Atomic cluster expansion (ACE) descriptors first express atomic densities using spherical harmonics and then generate invariant products by contracting the spherical harmonics with Clebsch-Gordan coefficients.
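In practice one rarely codes Eqs. 31-33 by hand. As an illustration, the sketch below computes SOAP power-spectrum vectors with the third-party DScribe and ASE packages. Note that the hyperparameter names (r_cut, n_max, l_max) vary between DScribe versions, so this is a sketch rather than a definitive recipe:

```python
from ase.build import molecule
from dscribe.descriptors import SOAP

water = molecule("H2O")

# Hyperparameter names follow recent DScribe releases; older versions
# use rcut/nmax/lmax instead of r_cut/n_max/l_max
soap = SOAP(species=["H", "O"], periodic=False,
            r_cut=4.0, n_max=6, l_max=4, sigma=0.3)

psi = soap.create(water)  # one power-spectrum vector per atomic environment
print(psi.shape)          # (3, n_features)
```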
Length-scale hyperparameters. Most atomic descriptors use length-scale hyperparameters specifically chosen for a given problem and system. There are several ways to automate hyperparameter selection. Ref. 374 introduced general heuristics for choosing the SOAP hyperparameters for a system with arbitrary chemical composition based on characteristic bond lengths. Ref. 464 adopts the strategy of first generating a comprehensive set of ACSFs and then selecting a subset using sparsification methods such as farthest point sampling (FPS) and CUR matrix decomposition.
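A minimal implementation of farthest point sampling over a candidate feature matrix might look as follows; the candidate ACSF feature matrix here is random and purely illustrative:

```python
import numpy as np

def farthest_point_sampling(X, n_select, seed=0):
    """Greedy FPS: pick rows of the feature matrix X that are maximally spread out."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]
    # distance of every point to the current selection
    dist = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

X = np.random.default_rng(1).normal(size=(500, 64))  # stand-in candidate features
print(farthest_point_sampling(X, 10))
```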
Incompleteness of atomic descriptors
A structural descriptor is complete when there is no pair of distinct configurations that produces the same descriptor. For atomic descriptors, this means that different atomic environments — after accounting for the invariances of rotation, translation, and permutation of identical atoms — should map to distinct descriptors. Without completeness, an ML model using the descriptors as input will give identical predictions for physically different systems. Ensuring completeness while preserving the invariances is non-trivial, however. One of the simplest descriptors is based on permutationally-invariant pairwise atomic distances (2-body descriptors), and Ref. 412 demonstrated that these are generally not complete, since one can construct two distinct tetrahedra using the same set of distances. Many have assumed that permutationally-invariant 3-body atomic descriptors uniquely specify atomic environments due to the tremendous success of ML models for chemical systems, and particularly MLPs. However, Refs. 468 and 467 show that structural degeneracies can be found even when using 3- or 4-body descriptors. This underscores an important shortcoming of state-of-the-art 3-body descriptors such as ACSF, SOAP, FCHL, and MBTR. The atomic cluster expansion should be a complete descriptor of local environments, but its reliance on spherical harmonics expansions and the subsequent contraction makes its evaluation expensive. Hence, there are still opportunities to develop improved atomic descriptors.
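The incompleteness of 2-body descriptors can be demonstrated in a few lines. The sketch below uses a classic pair of homometric one-dimensional point sets: the two configurations are clearly different, yet their pairwise-distance multisets — and hence any descriptor built only from pairwise distances — are identical:

```python
from itertools import combinations

def distance_multiset(points):
    """2-body descriptor: the multiset of all pairwise distances."""
    return sorted(abs(a - b) for a, b in combinations(points, 2))

# A classic homometric pair: distinct 1D configurations with
# identical pairwise-distance multisets
A = [0, 1, 4, 10, 12, 17]
B = [0, 1, 8, 11, 13, 17]
assert distance_multiset(A) == distance_multiset(B)
print(distance_multiset(A))
```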
Representing a many-body chemical system in terms of atomic environments brings physical significance, since certain extensive physical properties (e.g. the total energy, total electrostatic charge, and polarizability of a system) can be approximated by the sum of atomic contributions from each atomic environment, i.e. $\Theta = \sum_i \theta(X_i)$. This approximation is valid because the atomic contribution associated with a central atom is largely determined by its neighbors, and long-range interactions can be approximated in a mean-field manner without explicitly considering distant atoms. Such "locality" is tacitly assumed in many ML models for CompChem, and it is a crucial necessity for most common atomistic potentials and MLPs (Section 2.2.6). Most MLPs (e.g. BPNN, GAP, and DeepMD) approximate the total energy of a system as a sum of local atomic energies; a minimal sketch of this ansatz is given below.

Figure 7 illustrates locality by showing a KPCA map of the atomic environments of carbon in the QM9 set (see Section 4.3 for more detailed descriptions of the data set). By color-coding the KPCA plot with the local energies from a SOAP-based GAP model trained on QM9 energies, one observes a systematic and smooth trend in energies across clusters.

Figure 7: KPCA maps of carbon atom environments in the QM9 database. Maps are color-coded according to Mulliken charges (a), hybridization (b), whether atoms are found in rings (c), and according to local energies predicted by a machine learning potential (d). Reprinted with permission from Ref. 374. Copyright (2020) American Chemical Society.

The total molecular energy can then be accurately predicted by the sum of local energies, which means the total energy can be approximated on the basis of all the local environments contained in the molecule. For example, an NN potential trained on liquid water simulations can predict the densities, lattice energies, and vibrational properties of diverse ice phases because the local atomic environments found in liquid water span environments similar to those observed in ice phases. Another GAP potential of carbon, trained on amorphous structures and other crystalline phases, predicted novel carbon structures in random structure searches as well as approximate reaction barriers.
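The locality ansatz $\Theta = \sum_i \theta(X_i)$ amounts to one line of code once a per-atom regressor exists. In the sketch below, both the regressor and the environment descriptors are random stand-ins for a trained model and real SOAP/ACSF vectors:

```python
import numpy as np

def total_property(environments, atomic_model):
    """Locality ansatz: Theta_total = sum_i theta(X_i) over atomic environments."""
    return sum(atomic_model(x) for x in environments)

# Hypothetical stand-in for a trained per-atom regressor theta(X_i),
# acting on fixed-length environment descriptors
w = np.linspace(0.1, 1.0, 8)
theta = lambda x: float(w @ x)

environments = [np.random.default_rng(i).random(8) for i in range(5)]
print(total_property(environments, theta))  # e.g. the total energy
```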
The locality approximation is typically rationalized based on the multiscale nature of interatomic interactions in chemical systems. It is generally expected that shorter interatomic distances correspond to stronger interactions, such that a cutoff may be imposed beyond a certain radial distance $d$ for a given energy accuracy threshold $\epsilon$. The multiscale nature of interactions underlies the usual classification of chemical interactions, from strong covalent bonds and ionic interactions to weaker non-covalent hydrogen bonds and van der Waals interactions. However, our understanding of non-covalent interactions in large molecules and materials is still emerging, and no general rules-of-thumb exist to define the cutoff distance $d$ corresponding to a defined $\epsilon$. Hence, for systems having long-range interactions (which includes most chemical systems), the locality assumption needs revision. Models such as (s-)GDML that learn global interactions can be better suited in these cases. Global models tend to be more data-efficient because they are more specialized toward learning a full PES for a particular molecule or material, but this restricts the use of a single ML model to only the system it was trained upon.

Built-in symmetry in ML models substantially compresses the dimensionality of atomic representations and ensures that physically equivalent systems are predicted to have identical properties. One of the most rigorous ways of imposing symmetry onto a model $f$ is via invariant integration over the relevant group $\mathcal{S}$,

$$f_{\mathrm{sym}}(x) = \int_{\pi \in \mathcal{S}} f(P_\pi x), \quad (34)$$

where $P_\pi x$ is a permutation of the input. However, the cardinality of even basic symmetry groups is exceedingly high, which makes this operation prohibitively expensive. This combinatorial challenge can be solved by limiting the invariant integral to the physical point group and fluxional symmetries that actually occur in the training dataset, as done in sGDML. Alternative approaches such as parameter sharing or density representations have also proven effective. For example, the DeepMD potential has two versions: the Smooth Edition (DeepPot-SE) explicitly preserves all the natural symmetries of the molecular system, while the other version does not. The DeepPot-SE offers much improved stability and accuracy.
For ML predictions of scalar properties, the rotationally-invariant atomic descriptor framework described earlier is appropriate. However, one may also wish to predict vectorial or tensorial properties, including dipole moments, polarizability, and elasticity. A covariant version of the descriptors may then be advantageous, which can be expressed as

$$S\big(\psi(\{\mathcal{R}^N\})\big) = \psi\big(S(\{\mathcal{R}^N\})\big), \quad (35)$$

where $S$ indicates a symmetry operation such as a rigid rotation about an axis. Ref. 473 proposed a general method for transforming a standard kernel for fitting scalar properties into a covariant one. Ref. 474 derived a rotational-symmetry-adapted SOAP kernel, which can be understood as using the angular-dependent SOAP vectors based on spherical harmonics expansions as the descriptors. Note that the SOAP kernels for learning scalar properties introduced in Ref. 412 remove angular dependencies by summing up the SOAP vectors in separate spherical harmonics channels.

Symmetry can be further exploited in "alchemical" representations that incorporate similarity between chemical species that are relatable by changing one atom into another. The FCHL representation considers the similarity between elements in the same rows and columns of the periodic table and performs very well on chemical compounds across chemical space. Ref. 475 compiled a data-driven periodic table of the elements by fitting to an elpasolite data set using an alchemical representation.

All descriptors introduced above rely on a suitable set of hyperparameters (e.g. length scales, radial and angular resolution). Determining an optimal set of hyperparameters can be a tedious process, especially when heuristics are unavailable or fail due to the structural and compositional complexity of the system. A poor choice of descriptors can limit the accuracy of the final ML model, e.g. when certain interatomic distances cannot be resolved.

End-to-end NN representations follow a different strategy and learn a representation directly from reference data. Using the atom types and positions of a system as inputs, end-to-end NNs construct a set of atom-wise features $x_i$. These features are then used to predict the property of interest, e.g. the energy as a sum of atom-wise contributions. Unlike static descriptors, the representation is optimized as part of the overall training process. This way, end-to-end NNs can adapt to structural features in the data and the target properties in a fully automatic fashion, eliminating the need for extensive feature engineering by the practitioner.

The deep tensor NN framework (DTNN) introduced a procedure to iteratively refine a set of atom-wise features $\{x_i\}$ based on interactions with neighboring atoms. Higher-order interactions can then be captured in a hierarchical fashion. For example, a first information pass would only capture radial information, but further interactions would recover angular relations and so on. In DTNN, a learnable representation depending only on atom types, $x_i^0 = e_{Z_i}$, serves as an initial set of features. These are then refined by successive applications of an update function depending on the atomic environment, which takes the general form:

$$x_i^{l+1} = F^l\left( x_i^l + \sum_j G^l\left[ x_i^l,\, x_j^l,\, g^l(|\mathbf{r}_i - \mathbf{r}_j|) \right] f_{\mathrm{cut}}(|\mathbf{r}_i - \mathbf{r}_j|) \right). \quad (36)$$

Here, $l$ indicates the number of overall update steps. The sum runs over all atoms $j$ in the local environment, and a cutoff function $f_{\mathrm{cut}}$ ensures smoothness of the representation. Each feature is updated with information from all neighboring atoms through the interaction function $G$.
Apart from the neighbor features $x_j$, $G$ also depends on the interatomic distance $|\mathbf{r}_i - \mathbf{r}_j|$, which is usually expressed in the form of a radial basis vector $g$. After the update, an atom-wise transformation $F$ can be applied to further modulate the features. Since each update depends only on interatomic distances and the summation over neighboring atoms is commutative, end-to-end NNs of this type automatically achieve a representation that is invariant to rotations, translations, and permutations of atoms. Using these atom-type-dependent embeddings compactly encodes elemental information, which is advantageous for systems comprised of many different chemical elements. Such multi-component systems can be problematic to treat with pre-defined descriptors (e.g. ACSFs or SOAP), as these typically introduce additional entries for each possible combination of atom types, resulting in a large number of descriptor dimensions.

Since the introduction of DTNN, many different types of end-to-end NNs have been developed, and these vary by the choice of the functions $F$ and $G$. For example, SchNet uses continuous convolutions inspired by convolutional neural networks (CNNs) to describe the interatomic interactions. In this case, the update in Eq. 36 takes the form

$$x_i^{l+1} = x_i^l + \mathrm{MLP}^l_{\mathrm{tr}}\left( \sum_j x_j^l \, \mathrm{MLP}^l_{\mathrm{rad}}\big( g^l(|\mathbf{r}_i - \mathbf{r}_j|) \big) f_{\mathrm{cut}}(|\mathbf{r}_i - \mathbf{r}_j|) \right), \quad (37)$$

where the feature transformation ($\mathrm{MLP}_{\mathrm{tr}}$) and the radial dependence ($\mathrm{MLP}_{\mathrm{rad}}$) are both modeled as trainable multilayer perceptrons.

Other ML models introduce additional physical information. The hierarchical interacting particle NN (HIP-NN) enforces a physically motivated partitioning of the overall energy between the different refinement steps, while the PhysNet architecture introduces explicit terms for long-range electrostatic and dispersion interactions. In Ref. 420, Gilmer et al. categorize graph networks of this general type as message-passing NNs (MPNNs) and introduce the concept of edge updates. These make it possible to use interatomic information beside the radial distance metric in the refinement procedure, and they have since been adapted for other architectures. Another interesting extension are end-to-end NNs incorporating higher-order features beside the scalars $x_i$ used in the original DTNN framework. These are equivariant features that encode rotational symmetry and can be based on angles, dipole moment vectors, or features that can be expressed as spherical harmonics with $l > 0$. This makes it possible to exchange more than just radial information between atoms in each interaction pass and instead include higher structural information, such as dipole-dipole interactions or angular information. In addition, equivariant end-to-end NNs can also be used to predict vectorial or tensorial properties in a manner similar to the rotational-symmetry-adapted SOAP kernel. Examples include Tensor Field Networks, Cormorant, DimeNet, PiNet, and FieldSchNet.
After a descriptor vector for each chemical structure is defined, one can construct the design matrix and the kernel matrix for a set of structures. These matrices can then be used as the input of ML models. As described in Section 3, supervised ML methods such as NNs and GPs can be used to approximate non-linear and high-dimensional functions, particularly when massive amounts of training data are available. Thus, CompChem can be very useful for generating large amounts of nearly noise-free training data for specific systems or atomic configurations, as long as a physically accurate method is applied correctly with appropriate computational resources. In contrast, experimental observations can be difficult to measure and reproduce precisely. Note that most CompChem+ML efforts have a similar scope as decades-old quantitative structure activity/property relationship (QSAR/QSPR) models, which are often based on experiments or CompChem modeling. Thus, researchers in CompChem+ML should be aware of potentially relatable work done by the QSAR/QSPR communities, and of the extent to which the questions being posed have already been answered. On the other hand, ML usually provides higher accuracy than non-ML statistical models, and so QSAR/QSPR efforts have been turning toward ML models as well.
Depending on the types of chemical problems being solved, different CompChem methods suitable for treating different chemical interactions will be used for training ML models having different approximations for the underlying high-dimensional functions. For example, research efforts are underway toward learning electron densities, density functionals, and molecular polarizabilities.
Besides these direct learning strategies, ML has been used to enhance the performance and suitability of CompChem models. As mentioned in Section 2, the ∆-ML approach is now a common technique for training an ML model that improves the quality of a theoretically insufficient but computationally affordable method. This approach has been used to learn many-body corrections for water molecules, allowing a relatively inexpensive KS-DFT approach like BLYP to more accurately reproduce CCSD(T) data.
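A minimal ∆-ML sketch is shown below: a kernel ridge model is trained only on the difference between a cheap baseline and an expensive reference, and predictions add the learned correction back onto the baseline. All data here are synthetic stand-ins for real descriptor vectors and energies:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
w_base = rng.normal(size=32)

# Synthetic stand-ins: descriptors X, cheap baseline energies E_low
# (e.g. a GGA functional) and expensive references E_high (e.g. CCSD(T))
X = rng.normal(size=(200, 32))
E_low = X @ w_base
E_high = E_low + 0.05 * np.sin(X[:, 0]) + 0.01 * rng.normal(size=200)

# Delta-ML: learn only the (typically smoother) correction E_high - E_low
delta_model = KernelRidge(kernel="rbf", alpha=1e-6, gamma=0.05)
delta_model.fit(X, E_high - E_low)

# Prediction = cheap baseline calculation + learned correction
x_new = rng.normal(size=(1, 32))
e_pred = float(x_new @ w_base) + delta_model.predict(x_new)[0]
```

Because the correction varies more smoothly than the total energy, far fewer expensive reference calculations are typically needed than for learning the target property from scratch.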
Along similar lines, Shaw and coworkers used CompChem features along with a neural network to reweight terms from an MP2 interaction energy to provide ML-enhanced methods with increased performance. Miller and coworkers have developed ML models where molecular orbitals themselves are learned to generate a density matrix functional that provides CCSD(T)-quality potential energy surfaces with a single reference calculation. von Lilienfeld and coworkers have investigated how the choice of regressors and molecular representations for ML models impacts accuracy, and their findings suggest ways that ML models may be trained to be more accurate and less computationally expensive than hybrid DFT methods. Burke and co-workers have studied how ML methods can result in improved understanding and more physical, exact KS-DFT and OFDFT functionals. Brockherde et al. presented an approach where ML models directly learn the Hohenberg-Kohn map from the one-body potential to efficiently find the functional and its derivative. Akashi and coworkers have also reported the out-of-training transferability of neural networks that capture total energies, which shows a path forward to generalizable methods.
Besides the above-mentioned applications of supervised learning, one can exploit the "universal approximator" nature of ML architectures to find a function that gives the best solution in a variational setting, for instance, by using restricted Boltzmann machines or deep neural networks as basis representations of wavefunctions in quantum Monte Carlo calculations.
We have laid out the general framework for CompChem+ML studies, but this discussion would not be complete without more details about training data (i.e., garbage in, garbage out). We now review the landscape of data sets in CompChem and how they will likely evolve over time. The past decade has seen continually increasing usefulness and availability of "big data" from CompChem, including community-wide data repositories comprised of millions of atomistic structures along with diverse physical and chemical properties. Such repositories are becoming the norm, and it is now customary for different users to deposit raw or processed simulation data there for the benefit of the research community. This brings the possibility of robust validation tests for ML models, but it also necessitates approaches that are well-equipped to handle large and complex data sets. Typical data sets may come from diverse origins such as MD trajectories from ab initio simulations, data sets of small molecules and molecular conformers, or other training sets used for developing ML and non-ML forcefields for specific applications. As the data sets grow, so does the scope of publications that involve ML, as shown in Fig. 1.
ML models must be validated before they can be trusted for predictions. Validations of descriptors or model trainings are performed on benchmark data sets, and the most popular ones are summarized in Table 5. These allow ML models to be compared on the same ground and provide large amounts of data for robust training. Their public availability also ensures that the data sets can evolve with time and be extended as part of community efforts.
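Validation hinges on a disciplined split of the benchmark data. A minimal sketch with scikit-learn (on synthetic stand-in data) holds out a test set that is touched only once, after all model and hyperparameter choices are frozen:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for a descriptor matrix X and target property y
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 64)), rng.normal(size=1000)

# Hold out a test set that is never used during model or hyperparameter
# selection; tune only against the validation split
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.1,
                                                random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.1,
                                                  random_state=0)
```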
Amongst the entries in Table 5, the most often used one is the QM9 set, which consists of approximately 134,000 of the smallest organic molecules containing up to 9 heavy atoms (C, O, N, or F; excluding H), along with their CompChem-computed molecular properties such as total energies, dipole moments, HOMO-LUMO gaps, etc. Several ML studies have already been published using this dataset (see Fig. 8, Ref. 488). A popular challenge is to develop a next-generation ML model that learns the electronic energies of random assortments of organic molecules with higher accuracy and less required training data than other existing models.

Table 5: ML databases for CompChem
Database | Description | Location
AFLOWLIB | Databases containing calculated properties of over 625k materials. |
ANI-1 | Large computational DFT database consisting of more than 20M off-equilibrium conformations for 57.5k small organic molecules. | https://github.com/isayev/ANI1_dataset
ANI-1x / ANI-1ccx | ANI-1x contains multiple QM properties from 5M DFT calculations, while ANI-1ccx contains 500k data points obtained with an accurate CCSD(T)/CBS extrapolation. | https://github.com/aiqm/ANI1x_datasets
BindingDB | Measured binding affinities focusing on interactions of proteins considered to be candidate drug targets; 1,200,000 binding data for 5,500 proteins and over 520,000 drug-like molecules. |
Clean Energy Project | Database of candidate compounds for organic photovoltaics. | http://cepdb.molecularspace.org
CoRE MOF | Database containing over 4,700 porous metal-organic framework structures with publicly available atomic coordinates; includes important physical and chemical properties. | http://dx.doi.org/10.11578/1118280
FreeSolv | Experimental and calculated hydration free energies for neutral molecules in water. |
GDB | GDB-11, GDB-13, and GDB-17; together these databases contain billions of small organic molecules following simple chemical stability and synthetic feasibility rules. | http://gdb.unibe.ch/downloads/
Hypothetical zeolites | Contains approximately 1M zeolite structures. |
Materials Project | Contains computed structural, electronic, and energetic data for over 500k compounds. |
MD17 | Datasets ranging in size from 150k to nearly 1M conformational geometries; all trajectories are calculated at a temperature of 500 K and a resolution of 0.5 fs. |
MoleculeNet | Contains data on the properties of over 700k compounds. | http://moleculenet.ai
Open Catalyst Project | 1.2M molecular relaxations with results from over 250M DFT calculations relevant for renewable energy storage. | https://opencatalystproject.org/index.html
OQMD | Consists of DFT-predicted crystallographic parameters and formation energies for over 200k experimentally observed crystal structures. | http://oqmd.org
PubChemQC | Provides ca. 3 million molecular structures optimized by density functional theory (DFT) and excited states for over 2 million molecules using time-dependent DFT (TDDFT). | http://pubchemqc.riken.jp/
QM7-X | Comprehensive dataset of 42 physicochemical properties for equilibrium and non-equilibrium structures of small organic molecules. | https://zenodo.org/record/4288677
QM9 | Geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules out of GDB-17. | https://figshare.com/collections/Quantum_chemistry_structures_and_properties_of_134_kilo_molecules/978904
Synthesis Project | Collection of aggregated synthesis parameters extracted from the text of over 640,000 journal articles. |
quantum-machine.org | A repository of diverse datasets, including valence electron densities, chemical reactions, solvated protein fragments, and molecular Hamiltonians. | http://quantum-machine.org/datasets/
As the structural data sets grow, it becomes infeasible to manually identify hidden patterns or curate the data, and data-driven, automated frameworks for visualizing these data sets are becoming increasingly popular. Dimensionality reduction effectively translates the high-dimensional data (i.e. the xyz-coordinates of molecules or materials in different atomic configurations) into a low-dimensional space easily visualized on paper or a computer screen. In this way, entries such as those in the QM9 set can be shown (see Fig. 9).

Figure 9: KPCA maps of the QM9 database using a global SOAP kernel. The frames are color-coded according to structural descriptors (b,c,d,g) and quantum mechanical properties (a,e,f). Reprinted with permission from Ref. 374. Copyright (2020) American Chemical Society.

This map is thus a useful tool to help navigate the QM9 set by showing how molecules of different compositions (represented as colored dots) are similar or dissimilar based on molecular properties (i.e. atomization energies) along the principal axes. Similarly, Ref. 321 used SOAP-sketchmaps in conjunction with quasi-chemical theory to show an unsupervised learning procedure for identifying local solvent environments that significantly impact solvation energies of small ions.

These data-driven maps are generated by processing the design matrix (or kernel matrix) associated with a data set using the dimensionality reduction techniques introduced in Section 3.2. A simple option is to use the ASAP code, a Python-based command-line tool that automates analysis and mapping. Figs. 7 and 9 were generated using ASAP with only the two commands displayed in each figure. Data sets can also be explored in an intuitive manner using interactive visualizers that run in a web browser and display the 3D structure corresponding to each atomistic entry in the data set.
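The kind of map shown in Figs. 7 and 9 can be sketched in a few lines with scikit-learn; here the design matrix is a random stand-in for per-structure (e.g. averaged SOAP) descriptors:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Stand-in design matrix: one descriptor vector per structure
X = np.random.default_rng(0).normal(size=(500, 128))

# Two KPCA coordinates for plotting, as in the maps of Figs. 7 and 9;
# points would then be color-coded by a property of interest
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1e-2)
coords = kpca.fit_transform(X)  # shape (500, 2)
```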
Conventional publications are an essential part of the CompChem knowledge base, and ML is becoming useful for accelerating information extraction from the scientific literature via text mining. This topic was previously comprehensively reviewed in the context of cheminformatics. Natural language processing has already driven text-mining efforts in materials science discovery and in extracting experimental synthesis conditions of oxides. CompChem+ML can also amplify existing efforts in chemometrics, the science of data-driven extraction of chemical information. This area has also branched into the related disciplines of data mining for specific classes of materials and catalysis informatics. These approaches have great promise, especially for deriving information and knowledge from data, but it remains challenging to implement them in ways that achieve insight (and true impact).

Some have shown paths forward for doing so. For example, ML models can obtain knowledge from failed experimental data more reliably than humans, who are more susceptible to survivor bias, and ML can also be used to distill physical laws and fundamental equations from experimental and computational data. ML models can also be used to reliably predict SMILES representations, allowing encoded information to be derived from low-resolution images found in the literature. ML models can interpret experimental X-ray absorption near edge structure (XANES) data and predict real-space information about coordination environments. Likewise, scanning tunneling microscopy (STM) data can be used to classify structural and rotational states on surfaces, and name indicators can be used to predict tandem mass spectrometry (MS/MS) properties. In closing, we see exciting opportunities for future applications that connect data and text mining to chemometrics across chemical space.
We previously mentioned that ML can handle large data sets and extract insights while circumventing the high cost of quantum-mechanical calculations via statistical learning. CompChem+ML also has great potential for developing MLPs. Car and Parrinello proposed running MD using electronic-structure methods in 1985. Such simulations are now mainstream, but they are also quite computationally demanding and normally restricted to small system sizes and short time scales. Alternatively, the accurate atomistic potentials introduced in Section 2.2.6 have been developed to allow Monte Carlo and MD calculations, but sufficiently accurate potentials are sometimes not available. MLPs have thus emerged as a way to achieve accuracy as high as KS-DFT or correlated wavefunction methods, but at a fraction of the cost. MLPs have been constructed for far-reaching systems, from small organic molecules to bulk condensed materials and interfaces. Several of the co-authors of the current review have also written a separate review focused more narrowly on this topic, and so we only provide a brief overview here.

Training an MLP to reproduce a system's PES usually requires generating diverse and high-quality CompChem data points that cover the relevant temperature and pressure conditions, reaction pathways, polymorphs, defects, compositions, etc.
After data points comprised of atomic configurations, system energies, and forces are obtained, different methods for constructing MLPs employ either different descriptors (see the examples in Table 4) or different ML architectures to interpolate the full PES. Again, smoothness is an essential feature of any PES, so special considerations are needed to avoid numerical noise that would result in discontinuities. Kernel-based MLPs such as GAP and sGDML ensure smoothness by relying on smoothly varying basis functions, but the scaling of kernel-based methods with respect to the number of training points becomes challenging without reduction mechanisms. As a much more efficient but somewhat less accurate alternative to GAP, the Spectral Neighbor Analysis Potential (SNAP) uses the coefficients of the SOAP descriptors and assumes a linear or quadratic relation between energies and the SOAP bispectrum components. The most popular MLPs are currently NN-based due to their flexibility and capacity to train on large amounts of data. Amongst these, the ANI and BPNN potentials use ACSF descriptors as inputs, while deep neural networks such as SchNet and DeepMD use the coordinates and nuclear charges of atoms. We now focus on a few example applications.
Many CompChem efforts focus on predicting thermodynamic properties at finite temperatures, such as heat capacity, density, and chemical potential. Although many physical properties are already accessible from MD simulations, estimating the free energies that establish the relative stability of different states using electronic structure methods remains difficult. The configurational part of the Gibbs free energy of a bulk system that has $N$ distinguishable particles with atomic coordinates $\mathbf{r} = \{\mathbf{r}_{1 \ldots N}\}$ and the associated potential energy $U(\mathbf{r})$ can be expressed as

$$G(N, P, T) = -k_B T \ln \int d\mathbf{r}\, \exp\left(-\frac{U(\mathbf{r}) + PV}{k_B T}\right), \quad (38)$$

integrated over all possible coordinates $\mathbf{r}$, where $k_B$ is the Boltzmann constant. In order to rigorously determine $G$, one must exhaustively sample the configuration space that has relatively high weight $\exp\left(-\frac{U(\mathbf{r}) + PV}{k_B T}\right)$. This normally requires thermodynamic integration or enhanced sampling methods (e.g. umbrella sampling, metadynamics, transition path sampling) that demand simulation times and scales far beyond what is accessible with MD simulations based on KS-DFT or correlated wavefunction methods.

However, MLPs have lifted both limits on time scale and system size. An early example used an MLP with umbrella sampling and the free energy perturbation method to reveal the influence of van der Waals corrections on the thermodynamic properties of liquid water. Later, the combination of an MLP trained on hybrid DFT data and free energy methods reproduced several thermodynamic properties of water from quantum mechanics, including the density of ice and water, the difference in melting temperature between normal and heavy water, and the stability of different forms of ice. Ref. 567 employed the DeepMD approach to study the relatively long-time-scale nucleation of gallium. MLPs for high-pressure hydrogen provided evidence on how hydrogen gradually turns into a metal in giant planets. In all these examples, high accuracy and long timescales were required to model the specific phenomena and reveal physical insights, and it is precisely the combination of CompChem+ML that enables both.
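As a small illustration of the free energy machinery these studies rely on, the sketch below implements the Zwanzig free energy perturbation estimator on synthetic energy differences; in a real application the samples would come from an MLP-driven MD trajectory of the reference system:

```python
import numpy as np

k_B = 1.380649e-23 / 4.359744e-18  # Boltzmann constant in hartree/K (approx.)

def fep_delta_G(delta_U, T):
    """Zwanzig free energy perturbation: dG = -kT ln <exp(-dU/kT)>_0,
    averaged over configurations sampled from the reference potential."""
    beta = 1.0 / (k_B * T)
    return -np.log(np.mean(np.exp(-beta * np.asarray(delta_U)))) / beta

# Synthetic energy differences U_target - U_reference (hartree) standing in
# for values collected along a reference-system trajectory
delta_U = np.random.default_rng(0).normal(5e-4, 2e-4, size=10000)
print(fep_delta_G(delta_U, T=300.0))
```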
As mentioned in Sec. 2.2.5, NQEs in chemical systems having light elements bring challenges for atomistic modelling because the added mobility of lighter atoms in dynamics simulations requires higher computational cost to treat. To make matters even more complicated, many atomistic potentials (see Sec. 2.2.6), particularly those for water or organic molecules, cannot be used to model NQEs because they often describe covalent bonds as rigid and thus cannot describe the fluctuations of bond lengths and angles. As a remedy, several studies have trained an MLP using higher rungs of KS-DFT (e.g. hybrid DFT or meta-GGA) and then used this potential in PIMD simulations. The study of water mentioned in the previous section, which used MLPs trained on hybrid DFT, revealed that NQEs are critical for promoting the hexagonal packing of molecules inside ice that ultimately leads to the six-fold symmetry of snowflakes. Highly data-efficient ML potentials can even be trained on reference data at the computationally very expensive quantum-chemical CCSD(T) level of accuracy. For example, the sGDML approach has been shown to faithfully reproduce such force fields for small molecules, which were then used to perform simulations with effectively fully quantized electrons and nuclei.

4.5 ML for structure search, sampling, and generation
Locating stationary points on the potential-energy surface (PES) is a frequent task in CompChem, since minima represent thermodynamically stable states, and first-order saddle points represent transition states that define the barrier heights explaining reaction kinetics. Explorations for stationary points normally require many energy and force evaluations, and ML approaches are being implemented to dramatically accelerate minimum-energy as well as saddle-point optimizations. Bernstein et al. proposed an automated protocol that iteratively explores structural space using a GAP potential. Bisbo and Hammer employed an actively-learned surrogate model of the potential energy surface to perform local relaxations while only carrying out single-point quantum-mechanical calculations for selected structures with high acquisition values. Refs. 293 and 295-297 accelerate nudged elastic band (NEB) calculations by incorporating surrogate ML models.
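The surrogate-driven search idea can be sketched with a Gaussian process on a toy one-dimensional PES: the model is refit after each acquisition-selected single-point evaluation, so expensive calculations are spent only where the surrogate is uncertain or promising. The toy PES and the acquisition rule here are purely illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1D "PES" standing in for an expensive quantum-mechanical call
true_pes = lambda x: np.sin(3 * x) + 0.5 * x ** 2
X_seen = np.array([[-1.5], [0.2], [1.4]])
y_seen = true_pes(X_seen).ravel()

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-8)
    gp.fit(X_seen, y_seen)
    X_cand = np.linspace(-2, 2, 400).reshape(-1, 1)
    mu, std = gp.predict(X_cand, return_std=True)
    acq = -(mu - 2.0 * std)  # lower-confidence-bound acquisition
    x_next = X_cand[np.argmax(acq)]
    # Only now pay for a single-point calculation at the selected structure
    X_seen = np.vstack([X_seen, x_next])
    y_seen = np.append(y_seen, true_pes(x_next))

print(X_seen[np.argmin(y_seen)])  # best minimum candidate found
```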
ML can also dramatically accelerate the challenge of efficiently sampling equilibrium or transition states by enhancing sampling methods such as umbrella sampling and metadynamics. These procedures make use of collective variables (CVs) that define a reaction coordinate, and computing the associated free energy surface (FES) amounts to generating the marginal probability distribution in these CVs. Unfortunately, the choice of CVs is not always clear for specific systems, and ML has shown some promise in guiding their determination. Another direction exploits the fact that ML models can be considered universal approximators of free energy surfaces. For example, there are reports of adaptive enhanced sampling methods using a Gaussian mixture model, and of using an NN architecture to represent the FES or the bias function in variational sampling simulations.
ML methods also offer fundamentally new ways to explore chemical compound and configuration space. Generative models can learn the structural and elemental distribution underlying chemical systems, and once trained, these models can be used to sample directly from this distribution. It is furthermore possible to bias the generated structures towards exhibiting desired properties, e.g. drug activity or thermal conductivity. As a consequence, generative models offer exciting new avenues in drug and materials design. Generative methods in CompChem include recurrent neural networks (RNNs), which can be used for the sequential generation of molecules encoded as SMILES strings (a string-based representation of molecular graphs).
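A minimal character-level SMILES RNN can be sketched in PyTorch as below. The three training strings and the tiny architecture are purely illustrative; a real model would be trained on a large corpus (e.g. ZINC) and then sample new strings character-by-character until the end token:

```python
import torch
import torch.nn as nn

# Toy corpus; "^" and "$" mark the start and end of each SMILES string
smiles = ["CCO$", "CC(=O)O$", "c1ccccc1$"]
vocab = sorted(set("".join(smiles)) | {"^"})
idx = {c: i for i, c in enumerate(vocab)}

class SmilesRNN(nn.Module):
    def __init__(self, n_vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(n_vocab, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_vocab)

    def forward(self, x, h=None):
        y, h = self.gru(self.emb(x), h)
        return self.out(y), h  # next-character logits at every position

model = SmilesRNN(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(200):  # teacher forcing on "^" + string
    for s in smiles:
        seq = torch.tensor([[idx[c] for c in "^" + s]])
        logits, _ = model(seq[:, :-1])
        loss = loss_fn(logits.squeeze(0), seq[0, 1:])
        opt.zero_grad(); loss.backward(); opt.step()
```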
Segler et al. demonstrated how such a recurrent model can first learn general molecular motifs and then be fine-tuned to sample molecules exhibiting activity against a variety of medical targets. Autoencoders (AEs) are another frequently used ML method for molecular generation. AEs learn to transform molecular graphs or SMILES into a low-dimensional feature space and back. The resulting feature vector represents a smooth encoding of the molecular distribution and can be used to effectively sample chemical space. By applying a variational AE to the QM9 and ZINC databases, Gomez-Bombarelli et al. could generate several optimized functional compounds. An interesting extension of AEs are conditional AEs, which capture not only the distribution of molecular structures but also its dependence on various properties. This makes it possible to directly generate structures exhibiting certain property ranges or combinations without the need for biasing or additional optimization steps. AEs can also form the basis of another approach for exploring chemical space called generative adversarial networks (GANs). In a GAN, a generator model (often an AE) attempts to create samples that closely match the underlying data, while a discriminator tries to distinguish true from generated samples. These architectures can be enhanced by using reinforcement learning (RL) objectives. RL learns an optimal sequence of actions (e.g. placement of atoms) leading to a desired outcome (e.g. a molecule with a certain property). This makes it possible to drive the generative process towards certain objectives, allowing for the targeted generation of molecules with particular properties. RL in general is a promising alternative strategy for generative models, and it offers the possibility of tightly integrating them into drug design cycles. Alternative approaches combine autoregressive models with graph convolution networks.
Gebauer et al. proposed an autoregressive generative model based on the SchNet architecture, called g-SchNet. Once trained on the QM9 dataset, g-SchNet was able to generate equilibrium structures without the need for optimization procedures. It was further found that the model could be biased towards certain properties. In another promising approach, Noé et al. used an invertible neural network based on normalizing flows to learn the distribution of atomic positions (e.g. sampled from a molecular dynamics trajectory). This network can then be used to directly sample molecular configurations from this distribution without performing costly simulations.
Multiscale modeling is a term for combining simulations or information from different scales (see Fig. 3). ML has been introduced into QM/MM-like schemes that enable improved multiscale simulations, as well as on the side of coarse-graining. Coarse-grained potentials have been developed for decades, but their inherent functional forms rely on CPI as well as trial-and-error procedures. Several works have instead used ML to construct coarse-grained potentials by matching mean forces.
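Because coarse-grained (CG) forces are linear in the expansion coefficients of a pair potential written in a fixed basis, force matching reduces to a least-squares problem. The sketch below illustrates this on synthetic reference forces; the Gaussian basis and data are invented for illustration:

```python
import numpy as np

# Force matching for a CG pair potential U(r) = sum_k c_k B_k(r):
# the CG force is linear in the coefficients c, so matching the mapped
# all-atom (mean) forces is an ordinary least-squares fit.
rng = np.random.default_rng(0)
n_frames, n_basis = 2000, 12
centers = np.linspace(3.0, 9.0, n_basis)

def basis_force(r):
    """-dB_k/dr for Gaussian basis functions B_k = exp(-(r - c_k)^2 / 0.5)."""
    return (r - centers) / 0.25 * np.exp(-(r - centers) ** 2 / 0.5)

r = rng.uniform(3.0, 9.0, size=n_frames)  # CG pair distances per frame
F_ref = 4.0 * (r - 6.0) + 0.3 * rng.normal(size=n_frames)  # mapped forces

A = np.stack([basis_force(ri) for ri in r])    # design matrix
c, *_ = np.linalg.lstsq(A, F_ref, rcond=None)  # CG potential coefficients
```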
In closing, we see promise for incorporating experimental priors into ML models, for instance, using experimental measurements to improve an ML potential energy surface. We are not aware of such efforts for developing highly accurate MLPs beyond the atomic scale, although much work has been done along this line to refine force fields for RNAs and proteins, often incorporating methods from ML, including the maximum entropy approach.

5 Selected applications and paths toward insights
The central challenge posed at the beginning of this review was how to identify and make chemical compounds or materials having optimal properties for a given purpose. Doing so would help address critical and broad issues ranging from pollution and global warming to human diseases. Traditional developments are often slow, expensive, and restricted by non-transferable empirical optimizations, and so efforts have turned to CompChem+ML to alleviate this. CompChem+ML is enabling searches through larger areas of chemical space much faster than before. This section does not extensively review the large amount of work using CompChem+ML in these different areas, but rather highlights examples of applications that have resulted in notable insights, so that others might use these works as templates for future efforts.
Molecular and materials design is usually considered an optimization problem. Thus, a comprehensive understanding of chemical space is needed to identify compounds with desired properties that are subject to certain required constraints (e.g. a specific thermal stability or a suitable optical gap for absorbing sunlight). Those properties will also depend on many key variables (e.g. constitutive elements, crystal forms, and geometrical and electronic characteristics, among others), which makes property prediction complex. CompChem calculations as explained in Section 2 should provide a continuous description of properties across a continuous representation (i.e. a descriptor or fingerprint) of molecules that is used to map molecular configurations to target properties, and vice versa. ML methods can then be implemented to search large databases to extract structure-property relationships for designing compounds with specific characteristics.
Optimizations would then be performed on the structure-based function learned from training configurations, and the composition of the chemical compound would then be recovered from the continuous representation.

As a prototypical example of molecular design via high-throughput screening, Gomez-Bombarelli et al. showed a computation-driven search for novel thermally activated delayed fluorescence organic light-emitting diode (OLED) emitters. That work first filtered a search space of 1.6 million molecules down to approximately 400,000 candidates using ML to anticipate criteria for desirable OLEDs. For the purpose of evaluating candidates, they estimated an upper bound on the delayed fluorescence rate constant (k_TADF). Time-dependent DFT calculations were then used to provide refined predictions of specific properties of thousands of promising novel OLED molecules across the visible spectrum so that synthetic chemists, device scientists, and industry partners would be able to choose the most promising molecules for experimental validation and implementation. Notably, this example of CompChem+ML resulted in new devices that exhibited an external quantum efficiency of over 22%. Fig. 10 shows the high accuracy of ML in predicting useful properties for high-throughput screening of molecules and materials based on k_TADF calculations. This work exemplifies how ML can accelerate the design of novel compounds in a way that would not be possible using traditional CompChem methods alone.

Figure 10: Neural network predictions compared to TD-DFT derived data of log k_TADF (R = 0.94). ML models computed molecular properties needed for screening with an accuracy comparable to CompChem calculations, but at a fraction of the computational cost. See Ref. 619. Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Nature Materials. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach, Gomez-Bombarelli, R., et al., Copyright (2016). Permission pending.

One can also exploit the flexibility of ML models to increase the quality of chemical insights. Integrating features relevant to the learning task allows one to improve the accuracy of ML predictions for a given target property. Park and Wolverton improved the performance of the crystal graph convolution neural network (CGCNN) by adding to the original framework information about the Voronoi-tessellated crystal structure, explicit 3-body correlations of neighboring constituent atoms, and an optimized representation of interatomic bonds. The new approach, labeled iCGCNN, achieved a predictive accuracy 20% higher than that of the original CGCNN when determining the thermodynamic stabilities of compounds (i.e. predictions of hull distances). When used for high-throughput searches, iCGCNN exhibited a success rate higher than an undirected high-throughput search and higher than that of CGCNN. Fig. 11 shows the improvement in predictions of nearly stable compounds after using more appropriate descriptors. This study showcases how descriptors can be tailored to further enhance the success of ML-aided high-throughput screening.

Figure 11: DFT vs. ML predicted hull distances of nearly stable compounds (hull distances smaller than 50 meV/atom) for CGCNN and iCGCNN. The flexibility of ML approaches enables construction of robust models tailored for specific target properties. See Ref. 625. https://doi.org/10.1103/PhysRevMaterials.4.063801

A grand challenge in chemistry is to understand synthetic pathways to desired molecules.
Retrosynthesis involves the design of chemical steps to produce molecules and materials that would be crucial to drug discovery, medicinal chemistry, and materials science. As a different kind of optimization problem, the general tactic is to analyze compounds recursively at the atomic scale, map them onto synthetically achievable building blocks, and then assemble those blocks into the desired compound. This tactic faces at least three challenges. First, simple combinatorics make the space of possible reactions greater than the space of possible molecules. Second, reactants seldom contain only one reactive functional group, and thus predictions must account for multiple functional groups. Third, because organic synthesis is a multistep process, one failed step in the route can invalidate the entire synthesis.
Computer-aided synthesis planning was actually first attemptedin the 1960s.
Many have since attempted to formalize chemical perception and syntheticthinking using computer programs.
These programs are typically based on one of threepossible algorithms:
1. Algorithms that use reaction rules (manually encoded or automatically derived fromdatabases).2. Algorithms that use principles of physical chemistry based on ab initio calculations topredict energy barriers.3. Algorithms based on ML techniques.ML approaches are used to try to overcome the generalization issues of rule-based al-gorithms (that normally suffering from incompleteness, infeasible suggestions, and humanbias) while also avoiding the high cost of CompChem calculations. It is now possible toobtain purely data-driven approaches for synthesis planning, which are promoting a rapidadvancement in the field. For example, Coley and co-workers designed a data-driven met-ric, SCScore, for describing a real synthesis modeled after the idea that products are, onaverage, more synthetically complex than each of their reactants. The definition of a metricfor selecting the most promising disconnections that produce easily synthesizable compoundsis crucial for avoiding combinatorial explosion. Fig. 12 shows that a data-driven metric, theSCScore, is more suitable than other heuristic metrics to perceive the complexity of eachstep in a given synthesis. This work offered a valuable contribution to the retrosynthesisworking pipeline by providing a method that implicitly learns what structures and motifs92re more prevalent as reactants.Figure 12: Use of different metrics to analyze the synthesis of a precursor to lenvatinib. Onlythe SCScore, a data-driven metric, correctly perceives a monotonic increase in complexity.ML models can give insights into which compounds are either reactants or products. SeeRef. 634. Reprinted with permission from American Chemical Society, Accounts of ChemicalResearch. Machine Learning in Computer-Aided Synthesis Planning, Coley, C. W., et al. ,Copyright (2018) American Chemical Society.Apart from isolated approaches or algorithms to deal with specific tasks within retrosyn-thesis, there is already software available to advance this field. One example is the Chematicaprogram, which has implemented a new module that combines network theory, modernhigh-power computing, artificial intelligence, and expert chemical knowledge to design syn-thetic pathways. A scoring function is used to promote synthetic brevity and penalize anyreactivity conflicts or non-selectivities, thus allowing it to find solutions that might be hardfor a human to identify. Fig. 13A shows the decision tree for one of the almost 50,000reaction rules used in Chematica. Reaction rules can be considered as the allowed moves93rom which the synthetic pathways are built, and such moves lead to an enormous syntheticspace (the number of possibilities within n steps scales as 100 n ) as the one shown by thegraph in Fig. 13B. Chematica explores this large synthetic space by truncating and revertingfrom unpromising connections and drives its searches to the most efficient sequences of steps.Moreover, in the pathways presented to the user, each substance can be further analyzedwith molecular mechanics tools. This software was used to obtain insights into the syntheticpathways to eight targets (seven bioactive substances and one natural product). All of thecomputer-planned routes were not only successfully carried out in the laboratory, but theyalso resulted in improved yields and cost savings over previous known paths. This workopened an avenue for chemists to finally obtain reliable pathways from in silico retrosynthe-sis. For further reading we recommend the two-part reviews of Coley and co-workers. 
Figure 13: (A) Decision tree of one of the reaction rules within Chematica (double stereodifferentiating condensation of esters with aldehydes). The different conditions in the tree specify the range of admissible and possible substituents or atom types. (B) Reaction rules are used to explore the graph of synthetic possibilities (similar to the one shown here). Each node corresponds to a set of substrates. The combination of expert chemical knowledge, CompChem calculations, and ML enables finding synthesizable paths. See Ref. 638. Reprinted from Chem, 4 (3), Klucznik, T., et al., Efficient Syntheses of Diverse, Medicinally Relevant Targets Planned by Computer and Executed in the Laboratory, 522-532, Copyright (2018), with permission from Elsevier. Permission pending.
Catalysis research requires multiscale approaches to determine chemical compounds that caninfluence barrier heights of reaction mechanisms to impact product yields and selectivitieswithout otherwise being generated or consumed by the reaction.
Traditional catalysis isnormally discussed in textbooks in terms of homogeneous (i.e. within a solution phase), het-erogeneous (occurring at a solid/liquid interface), and biological (occurring within enzymesand riboenzymes), but it is best not to use these terms too strictly because actual reactionmechanisms can be quite complex and overall processes may sometimes exhibit characteris-94ics (by design) of two or more of these classical processes.
Modern research in catal-ysis has been interested in studying chemical reactivity and reaction selectivity arising fromstimuli from solar thermal energy, electrochemical potentials, photons, plas-mas, or other external resonances.
Catalysis makes up roughly 35% of the world’sgross domestic product, and it is important to guide toward the end goal of achievinggreater sustainability with catalytic processes.
These reasons help make catalysis a fertile training ground for applying and developing theoretical models (e.g. Refs. 659–661) that can be used along with CompChem or CompChem+ML. The research field is also burgeoning with many reports and review articles that discuss perspectives and progress using ML methods for catalysis science; here, we will mention notable examples that present a broad range of ways that CompChem+ML can be used for insights. For example, CompChem+ML methods are enabling more data generation by allowing costly QC calculations to be run more efficiently, and more information means more comprehensive predictions of chemical and materials phase diagrams for catalysis as well as stability and reactivity descriptors identified on the fly.
Figure 14 shows examples of the palettes of insight available using state-of-the-art CompChem+ML modeling for identifying activity and selectivity maps, as well as visualizations of data using t-Distributed Stochastic Neighbor Embedding (t-SNE).
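As a rough illustration of what a t-SNE map like the one in Fig. 14c involves, the sketch below embeds hypothetical site-descriptor vectors with scikit-learn; the features and "site families" are synthetic stand-ins, not the DFT data of the cited study.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for DFT-derived adsorption-site descriptors:
# rows = sites, columns = features (e.g. coordination numbers, d-band
# moments, local stoichiometry). Real studies use thousands of DFT-labeled sites.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 8))
               for c in (-1.0, 0.0, 1.0)])        # three loose "site families"

# Project to 2D for visual cluster inspection, as in t-SNE maps of catalysts.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X)
print(emb.shape)  # (300, 2) -> ready to scatter-plot and color by family
```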
Figure 14: CompChem+ML screening of hypothetical Cu and Cu-based catalyst sites. a) A two-dimensional activity volcano plot for CO reduction. TOF, turnover frequency. b) A two-dimensional selectivity volcano plot for CO reduction. CO and H adsorption energies in panels a and b were calculated using DFT. Yellow data points are average adsorption energies of monometallics; green data points are average adsorption energies of copper alloys; and magenta data points are average, low-coverage adsorption energies of Cu-Al surfaces. c) t-Distributed Stochastic Neighbor Embedding (t-SNE) representation of approximately 4,000 adsorption sites on which we performed DFT calculations with Cu-containing alloys. The Cu-Al clusters are labelled numerically. d) Representative coordination sites for each of the clusters labelled in the t-SNE diagram. Each site archetype is labelled by the stoichiometric balance of the surface, that is, Al-heavy, Cu-heavy or balanced, and the binding site of the surface. Ref. 675. Permission pending.

Regarding modeling of deeply complex chemical environments, Artrith and Kolpak developed MLPs for investigating the relationships between solvent, surface composition and morphology, surface electronic structure, and catalytic activity in systems composed of thousands of atoms at interfaces. We expect such simulations for electro- and photo-catalysis elucidation will continue to improve in size, scale, and accuracy. For other physical insights, new approaches by Kulik and Getman and co-workers have also focused on developing ML models appropriate for elucidating complex d-orbital participation in homogeneous catalysis. Rappe and co-workers have used regularized random forests to analyze how local chemical pressure affects adsorbate states on surface sites for the hydrogen evolution reaction.
Almost trivially simple ML approaches can be used in catalysis studies to deduce insights into interaction trends between single metal atoms and oxide supports, to identify the significance of features (e.g. adsorbate type or coverage) where CompChem theories break down, or to identify trends that result in optimal catalysis across multiple objectives such as activity and cost (Fig. 15); a sketch of such a trend analysis is given below.
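The following hedged sketch fits a random forest to synthetic adsorption energies built from hypothetical descriptors (coverage, d-band center, coordination number) and reads off feature importances; none of the names or values come from the cited studies.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch of a "trivially simple" trend analysis: regress adsorption energy
# on tabulated site features and inspect which features matter most.
rng = np.random.default_rng(2)
n = 500
coverage  = rng.uniform(0.1, 1.0, n)              # hypothetical descriptors
d_band    = rng.normal(-2.5, 0.5, n)
coord_num = rng.integers(6, 12, n).astype(float)
# Synthetic target: energy dominated by d-band center, weakly by coverage.
E_ads = 0.8 * d_band + 0.2 * coverage + rng.normal(0, 0.05, n)

X = np.column_stack([coverage, d_band, coord_num])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, E_ads)
for name, imp in zip(["coverage", "d-band center", "coordination"],
                     model.feature_importances_):
    print(f"{name:>15s}: {imp:.2f}")
```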
Figure 15: Estimated price (for one mmol, in US dollars) of the catalysts in the selected range (for ligand nos. 72–90). The price is calculated as a summation of the commercial price of transition metal precursors (one mmol) and one mmol of each ligand. The cheapest complex for each metal is shown on the right. The estimated price of all the 557 catalysts is detailed in Ref. 681. Published by The Royal Society of Chemistry.

ML is also opening opportunities for CompChem+ML studies on highly detailed and complex networks of reactions. Such models in principle can then significantly extend the range of utility of microkinetics modeling for predictions of products from catalysis.
ML also enables studies of complicated reaction networks that can allow predictions of regioselective products based on CompChem data, asymmetric catalysis important for natural product synthesis, and biochemical reactions. A toy microkinetic model of the kind such predictions would feed into is sketched below.
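For readers unfamiliar with microkinetics, the example below integrates rate equations for a hypothetical A → B → C sequence; in a CompChem+ML workflow the assumed rate constants would instead come from ML-predicted barriers (e.g. via the Eyring equation).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy microkinetic model A -> B -> C. The rate constants k1, k2 are simply
# assumed values here; in practice they would derive from (ML-accelerated)
# CompChem barrier predictions.
k1, k2 = 2.0, 0.5  # s^-1, illustrative

def rates(t, c):
    A, B, C = c
    return [-k1 * A, k1 * A - k2 * B, k2 * B]

sol = solve_ivp(rates, t_span=(0.0, 10.0), y0=[1.0, 0.0, 0.0],
                t_eval=np.linspace(0, 10, 6))
for t, (A, B, C) in zip(sol.t, sol.y.T):
    print(f"t={t:4.1f}  [A]={A:.3f}  [B]={B:.3f}  [C]={C:.3f}")
```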
Efforts to better understand 'above-the-arrow' optimizations of reaction conditions relate back to the retrosynthesis challenges discussed above.
Ideally, these efforts will continue while making use of rapid advances in CompChem+ML that enable predictive atomistic simulations to be run faster and more accurately. We see reason for excitement for different approaches, but we again stress the importance of ensuring that models will provide unique and physical results (see Section 3 where we discuss the risk of "clever Hans" predictors).

Drug Design

The central objective for drug discovery is to find structurally novel molecules with precise selectivity for a medicinal function. This involves identifying new chemical entities and obtaining structures with different physicochemical and polypharmacological properties (i.e., combinations of beneficial pharmacological effects and/or adverse side-effects).
Drug discovery involves the identification of targets (a property optimization task, as in material design) and the determination of compounds with good on-target effects and minimal off-target effects.
Traditionally, a drug discovery program may take around six years before a drug candidate can be used in clinical trials, and an additional six or seven years are required for the three clinical phases. Thus, it is important to identify adverse effects as soon as possible to minimize time and monetary costs.
Accelerating drug discovery relies on predicting how and where a certain drug binds to more than one protein, a phenomenon that sometimes results in polypharmacology. Researchers are developing ready-to-use tools aimed to facilitate research for drug discovery, but CompChem+ML is expected to continue providing even more benefits to the drug development pipeline.
In a recent study, Zhavoronkov et al. developed a deep generative model for de novo small-molecule design: the generative tensorial reinforcement learning (GENTRL) model that was used to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases. The drug discovery process was carried out in only 46 days, beginning with the collection of appropriate data for training and finishing with the synthesis and experimental testing of some compounds (Fig. 16A). GENTRL was used to screen a total of 30,000 structures (some examples compared to the parent DDR1 kinase inhibitor are shown in Fig. 16B) down to only 40 structures that were randomly selected ensuring a coverage of the resulting chemical space and distribution of root-mean-square deviation values. Six of these molecules were then selected for experimental validation (see Fig. 16C), with one of them demonstrating favorable pharmacokinetics in mice. The predicted conformation of the successful compound according to pharmacophore modelling was very similar to the one predicted to be preferred and stable by CompChem methods. This work illustrates the utility of CompChem+ML approaches to give insights into drug design by rapidly giving compound candidates that are synthetically feasible and active against a desired target.

Figure 16: (A) Workflow and timeline for the design of candidates employing GENTRL. (B) Representative examples of the initial 30,000 structures compared to the parent DDR1 kinase inhibitor. (C) Compounds found to have the highest inhibition activity against human DDR1 kinase. CompChem+ML methods can considerably accelerate the discovery of drugs that are effective against a desired target. See Ref. 604. Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Nature Biotechnology. Deep learning enables rapid identification of potent DDR1 kinase inhibitors, Zhavoronkov, A., et al., COPYRIGHT (2019). Permission pending.

Besides generating new chemical structures with favorable pharmacokinetics, ML methods are also used in pharmaceutical research and development for peptide design, compound activity prediction, and for assisting in scoring protein-ligand interactions (docking).
An example of the latter was proposed by Batra et al. for efficiently identifying ligands that can potentially limit the host-virus interactions of SARS-CoV-2. Those authors designed a high-throughput strategy based on CompChem+ML that involved high-fidelity docking studies to find candidates displaying high binding affinities. The ML model was used to search through thousands of ligands approved by the Food and Drug Administration (FDA) and a million biomolecules in the BindingDB database.
From these, insights were obtained for more than 19,000 molecules satisfying the Vina score (i.e. an important physicochemical measure of the therapeutic potential of a molecule that is used to rank molecular conformations and predict the free energy of binding). Fig. 17 shows the Vina score predictions that led to the selection of the best candidates, some of which are also illustrated in the figure. The Vina scores for the top ligands were further confirmed using expensive docking approaches, resulting in the identification of 75 FDA-approved and 100 other ligands potentially useful to treat SARS-CoV-2. This study highlights a reasonable CompChem+ML strategy for making useful suggestions that help expert biologists and medical professionals focus on fewer candidates when performing either robust CompChem efforts or synthesis and trial experiments; the essence of such a surrogate-screening loop is sketched below.

Figure 17: Vina score predictions for the isolated protein (S-protein) and the protein-receptor complex (interface) for all the molecules in the BindingDB datasets and some exemplary top cases that satisfy the screening criteria. ML models trained on accurate CompChem databases are of utmost importance to efficiently gain insights into possible treatments, even for newly discovered diseases. Figure taken from Ref. 705. https://doi.org/10.1021/acs.jpclett.0c02278.
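A minimal sketch of the surrogate-screening strategy follows: train a fast ML model on a small batch of expensive docking scores, rank a large library, and forward only the top candidates to full docking. The fingerprints, scores, and library sizes are random stand-ins, not the actual BindingDB data or the model of Ref. 705.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Surrogate screening: a cheap regressor trained on a small docked subset
# ranks a much larger library so that expensive docking is run only on the
# most promising candidates. All data below are synthetic.
rng = np.random.default_rng(3)
fp_train = rng.integers(0, 2, (300, 128)).astype(float)      # docked subset
vina_train = (-8.0 - 0.2 * fp_train[:, :8].sum(axis=1)
              + rng.normal(0, 0.3, 300))                     # fake scores (kcal/mol)

surrogate = GradientBoostingRegressor().fit(fp_train, vina_train)

fp_library = rng.integers(0, 2, (10_000, 128)).astype(float)  # "huge" library
pred = surrogate.predict(fp_library)
top = np.argsort(pred)[:100]        # most negative predicted binding scores
print("candidates forwarded to full docking:", top[:5], "...")
```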
Conclusions and Outlook
Recent CompChem methods, algorithms, and codes have empowered new studies for a wealth of physical and chemical insights into molecules and materials. Today, the combination of CompChem+ML can be equipped to address new and more challenging questions in different domains of physics, materials science, chemistry, biology, and medicine. Productive research efforts in this direction necessitate interdisciplinary teams and increasing availability of high-quality data across appropriate regions of chemical compound space. Discovering new chemicals and materials requires thorough investigations. One needs to predict reaction pathways and interactions between molecules, optimize environmental conditions for catalytic reactions, enhance selectivities that eliminate undesired side reactions and/or side effects, and navigate other system-specific degrees of freedom. Addressing this complexity calls for a statistical view on chemical design and discovery, and CompChem+ML provides a natural synergy for obtaining predictive insights that lead to wisdom and impact.

This review provided an essential background of CompChem and ML and how they can be used together to make transformative impacts in the chemical sciences by amplifying insights available from CompChem methods. The successes of CompChem+ML are particularly visible in physical chemistry and include drastic acceleration of molecular and materials modeling, discovery and prediction of chemicals with desired properties, prediction of reaction pathways, and design of new catalysts and drug candidates. Nevertheless, we have only begun to scratch the surface of how successful applications of ML in chemistry can bring impact. There are many conceptual, theoretical, and practical challenges waiting to be solved to enable further synergies within the troika of CompChem, ML, and CPI. Here we enumerate some of the challenges that we consider to be the most pressing and interesting at this moment:
1. Reliance on ML in CompChem algorithms must be increased:
ML algorithms can be integrated into CompChem algorithms at almost any simulation level (Fig. 3). ML algorithms are already available to accelerate calculations of CompChem energies, navigations along reaction pathways, and sampling of larger regions of the PES, but reluctance toward their use impedes progress. In general, these algorithms must be made more effective, efficient, accessible, user-friendly, and reproducible to benefit fundamental and applied research (see, for example, Ref. 706).
2. More general ML approaches are needed:
ML methods must continue to evolve beyond now-common applications of learning a narrow region of a PES or identifying straightforward structure/property relationships. New ML methods should have the capacity to predict energetic and electronic properties and their more convoluted relationships across chemical space. Such approaches should grow toward uniformly describing compositional (chemical arrangement of atoms in a molecule) and configurational (physical arrangement of atoms in space) degrees of freedom on equal footing. Further progress in this field requires developing new universal ML models suitable for insights across diverse systems and physicochemical properties.
3. ML representations must include the right physics:
ML methods that are claimed to be accurate but incorrectly describe the true physics of a system will eventually fail to achieve meaningful insights while lowering the reputation of other work in the field. Current ML representations (descriptors) can successfully describe local chemical bonding, but few if any treat long-range electrostatics, polarization, and van der Waals dispersion interactions that are critical for rationalizing physical systems, both large and small. Combining intermolecular interaction theory (a key focus of advanced CompChem methods) with ML is an important direction for future progress towards studying complex molecular systems.
4. CompChem+ML applications need to strive toward achieving realistic complexity:
Investigations using highly accurate CompChem methods normally require overly simplified model systems, while more realistic model systems necessitate less accurate but computationally efficient CompChem methods. This compromise should no longer be necessary. We are due for a paradigm shift in how thermodynamics, kinetics, and dynamics of systems in complex chemical environments (e.g. for multiscale biological processes like drug design and/or catalytic processes at solid/liquid interfaces under photochemical excitations, etc.) can be treated more faithfully with less corner-cutting. An emerging idea is to dispatch ML approaches into computationally efficient model Hamiltonians for electronic interactions based on correlated wavefunction, KS-DFT, tight-binding, molecular orbital techniques, and/or the many-body dispersion method. ML can predict Hamiltonian parameters, and the quantum-mechanical observables would be calculated via diagonalization of the corresponding Hamiltonian (a minimal sketch of this idea follows below). The challenge is to find an appropriate balance between prediction accuracy and computational efficiency to dramatically enhance larger scale simulations.
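A minimal sketch of the ML-parameterized-Hamiltonian idea, assuming a one-dimensional tight-binding chain and a trivial stand-in for the trained parameter model:

```python
import numpy as np

# ML-parameterized Hamiltonian, minimal version: a model maps geometry to
# on-site energies and hoppings; observables then come from exact
# diagonalization, so physics such as level repulsion is built in rather
# than learned.
def predict_tb_params(bond_lengths):
    # Stand-in for a trained regressor: hopping decays with bond length.
    onsite = np.zeros(len(bond_lengths) + 1)
    hopping = -1.0 * np.exp(-(bond_lengths - 1.0))
    return onsite, hopping

bonds = np.array([0.9, 1.0, 1.1, 1.0])          # hypothetical geometry
eps, t = predict_tb_params(bonds)

H = np.diag(eps) + np.diag(t, k=1) + np.diag(t, k=-1)
eigvals = np.linalg.eigh(H)[0]                  # quantum observables
print("orbital energies:", np.round(eigvals, 3))
```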
5. (Much) more experimental data is needed:

Validations of ML predictions require extensive comparisons with experimental observables such as reaction rates, spectroscopic observations, solvation energies, and melting temperatures. Such experiments may have previously been considered too routine, too mundane, or not insightful enough alone, but all high-quality data bring great value for future CompChem+ML efforts that tightly integrate quantum mechanics, statistical simulations, and fast ML predictions, all within a comprehensive molecular simulation framework.

6. (Much) more comprehensive data sets need to be assembled and curated:

Current CompChem+ML efforts have profited heavily from the availability of benchmark data sets for relatively small molecules that allow a comparison of existing models.
While efforts fixated on boosting prediction accuracies and shrinking requisite training set sizes for ML models have had their merits, it is time to move on, as further improvements are meaningless if the ML models are not making useful and insightful predictions themselves. More useful predictions will require knowledge from larger datasets, and these will inevitably contain heterogeneous combinations of different levels of theory and/or experiments that must be analyzed, 'cleaned', and their uncertainties adequately quantified in order for models to productively learn. Such hybrid data sets may be the key to arriving at novel hypotheses in chemistry that could then be experimentally tested.
7. Bolder and deeper explorations of chemical space are needed:
So far most efforts to generate chemical data have focused on exploring parts of chemical space for new compounds for a targeted purpose. This should change. Combining ML model uncertainty estimates across broader swaths of chemical space could open pathways for fruitful statistical explorations, say, in an active learning framework (a minimal sketch is given below). This could lead to discovering new synergies between data that otherwise would not have been possible, to enable advances in scientific understanding and improve ML models. Generative models can bridge the gap between sampling and targeted structure generation that imposes optimal compound properties, e.g. for inverse chemical design.
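A minimal sketch of such an uncertainty-driven loop, using the disagreement among trees of a random forest as a stand-in for proper uncertainty quantification; features and the target property are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Uncertainty-guided exploration: the ensemble's disagreement (variance
# across random-forest trees) selects which unexplored compounds to label
# next with an expensive calculation or experiment.
rng = np.random.default_rng(4)
X_pool = rng.uniform(-3, 3, (2000, 5))          # "unexplored chemical space"
X_lab = rng.uniform(-1, 1, (50, 5))             # small labeled seed set
y_lab = np.sin(X_lab).sum(axis=1)               # fake property

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_lab, y_lab)
per_tree = np.stack([t.predict(X_pool) for t in model.estimators_])
uncertainty = per_tree.std(axis=0)

query = np.argsort(uncertainty)[-10:]           # most uncertain candidates
print("next compounds to compute/measure:", query)
```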
This and other reviews have stated how ML has become instrumental for recent progress in CompChem. We would also like to mention inspirations that ML has drawn from being applied to physical and chemical problems.

ML methods generally assume that data are subject to measurement noise, while CompChem data are generally approximate but also noise-free from a statistical perspective. ML modeling still requires regularization, but the regularizers should here particularly reflect the underlying physics of molecular and materials systems. ML models used in applications of vision contain discrete convolution filters that are suboptimal for chemical modeling, but recognition of this shortcoming has led to novel continuous convolution filters that are well suited for chemistry and have also become a popular novel architecture for core ML methods; the idea is sketched below.
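The continuous-filter idea can be sketched in a few lines: a filter-generating map turns each interatomic distance into a filter vector that weights neighbor features, so atoms at arbitrary positions can be convolved. In SchNet-like models the generator is a trained MLP; here a fixed random linear map over radial basis functions stands in for it.

```python
import numpy as np

# Minimal numpy sketch of a continuous-filter convolution: instead of a
# discrete grid filter, distances are expanded in radial basis functions
# and mapped to per-pair filter weights.
rng = np.random.default_rng(5)
n_atoms, n_feat, n_rbf = 5, 8, 16
x = rng.normal(size=(n_atoms, n_feat))          # atom feature vectors
pos = rng.uniform(0, 3, size=(n_atoms, 3))
d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)  # distance matrix

centers = np.linspace(0, 3, n_rbf)
rbf = np.exp(-10.0 * (d[..., None] - centers) ** 2)       # (n, n, n_rbf)
W_gen = rng.normal(size=(n_rbf, n_feat)) * 0.1            # "filter generator"
filters = rbf @ W_gen                                     # (n, n, n_feat)

# Continuous convolution: aggregate neighbor features weighted by the
# distance-dependent filters, excluding self-interaction via a cutoff mask.
np.fill_diagonal(d, np.inf)
mask = (d < 2.5).astype(float)[..., None]
x_new = ((filters * x[None, :, :]) * mask).sum(axis=1)
print(x_new.shape)   # (5, 8): updated atom-wise features
```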
Furthermore, invariances, symmetries, and conservation laws are key ingredients of physical and chemical systems. Incorporating them into ML has led to novel and useful models for chemistry, since they can learn from significantly less data, which then makes it possible to build force fields at unprecedentedly high levels of theory (see the sketch below for how conservative forces can be built in by construction).
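One way such a conservation law enters by construction: model the energy and differentiate it exactly, so that F = -dE/dx is curl-free and Newton's third law holds automatically. In practice this is done by automatic differentiation of a learned energy; the sketch below substitutes a fixed Morse-like pair energy with its analytic gradient.

```python
import numpy as np

# Energy-conserving force field by construction: forces are the exact
# gradient of a scalar energy model (here a Morse-like pair energy
# standing in for a trained ML model).
def pair_energy(r, D=1.0, a=2.0, r0=1.0):
    return D * (1.0 - np.exp(-a * (r - r0))) ** 2

def pair_energy_grad(r, D=1.0, a=2.0, r0=1.0):
    e = np.exp(-a * (r - r0))
    return 2.0 * D * a * e * (1.0 - e)   # dE/dr, analytic

def energy_and_forces(pos):
    n = len(pos)
    E, F = 0.0, np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]
            r = np.linalg.norm(rij)
            E += pair_energy(r)
            g = pair_energy_grad(r) * rij / r   # dE/dpos_i
            F[i] -= g                           # F = -dE/dx
            F[j] += g
    return E, F

pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0]])
E, F = energy_and_forces(pos)
print(f"E = {E:.4f}; net force = {F.sum(axis=0)}")  # ~0: momentum conserved
```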
Using these powerful ML techniques for computer vision, natural language processing, and other applications is currently being explored. Structural information from molecular graphs provides the basis for novel tensor neural networks or message passing architectures as well as graph explanation methods.
Many further challenges exist that have led or will lead to mutual, bidirectional cross-fertilization between ML and chemistry. These interdisciplinary efforts also initiate progress in the respective application domains. The power of this path is that solving a burning problem in chemistry with a novel crafted ML model may also result in unforeseen insights into how to better design core ML methods. Interestingly, the exploratory usage of ML for knowledge discovery in chemistry typically requires novel ML models and unforeseen scientific innovations, and this can lead to interesting insight that is not necessarily limited to chemistry alone; rather, it is likely to go beyond it.

To conclude, the past decade has shown that it has not been enough to just apply existing ML algorithms; breakthroughs are happening through a handshaking of innovations resulting in novel ML algorithms and architectures driven by the pursuit of novel insights in chemistry.

Acknowledgement
JAK was supported by the Luxembourg National Research Fund (INTER/MOBILITY/19/13511646) and the U.S. National Science Foundation (CBET-1653392 and CBET-1705592).

VVG acknowledges financial support from the Luxembourg National Research Fund (FNR) under the program DTU PRIDE MASSENA (PRIDE/15/10935404).

BC acknowledges funding from the Swiss National Science Foundation (Project P2ELP2-184408).

KRM was supported in part by Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea Government (No. 2017-0-00451, Development of BCI based Brain and Cognitive Computing Technology for Recognizing User's Intentions using Deep Learning; No. 2019-0-00079, Artificial Intelligence Graduate School Program, Korea University), and was partly supported by the German Ministry for Education and Research (BMBF) under Grants 01IS14013A-E, 01GQ1115, 01GQ0850, 01IS18025A, 031L0207D and 01IS18037A, and by the German Research Foundation (DFG) under Grant Math+, EXC 2046/1, Project ID 390685689.

AT acknowledges financial support from the European Research Council (ERC Consolidator Grant BeStMo and ERC-POC Grant DISCOVERER).

We gratefully acknowledge helpful comments on the manuscript by Hartmut Maennel.

Address correspondence to: [email protected], [email protected], [email protected]
Author Bios
Figure 18: Picture of John A. Keith

John A. Keith is an associate professor and R.K. Mellon Faculty Fellow in Energy at the University of Pittsburgh in the department of chemical and petroleum engineering. He obtained his bachelor's degree in chemistry at Wesleyan University and a Ph.D. degree in computational chemistry at Caltech in 2007. After an Alexander von Humboldt postdoctoral fellowship at the Universität Ulm, he was an Associate Research Scholar at Princeton University. He was a recipient of an NSF-CAREER award in 2017. His research interests lie in the applications and development of computational chemistry for engineering chemical reactions and materials for electrocatalysis, anticorrosion coatings, and the development of chemicals having less of an environmental footprint. He was a recipient of a Luxembourg Science Foundation INTER Mobility award in 2019-2020 to do a research sabbatical in Prof. Alexandre Tkatchenko's group at the University of Luxembourg. This review is a primary product of that visit.

Figure 19: Picture of Valentin Vassilev-Galindo

Valentin Vassilev-Galindo graduated with honors from the University of Veracruz (Mexico) with a Bachelor's degree in Chemical Engineering in 2014. He then enrolled in the Master program in Physical Chemistry at Cinvestav-Mérida (Mexico), where he worked under the supervision of Professor Gabriel Merino until receiving the M.Sc. degree in 2017. He is currently pursuing a Ph.D. degree at the University of Luxembourg in the research group of Professor Alexandre Tkatchenko. His research is mainly related to machine learning potentials.

Figure 20: Picture of Bingqing Cheng

Bingqing Cheng is a Departmental Early Career Fellow at the Computer Laboratory, University of Cambridge, and a Junior Research Fellow at Trinity College. She received her Ph.D. from the École polytechnique fédérale de Lausanne (EPFL) in 2019. Her work focuses on theoretical predictions of material properties.

Figure 21: Picture of Stefan Chmiela

Stefan Chmiela is a senior researcher at the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He received his Ph.D. from Technische Universität Berlin in 2019. His research interests include Hilbert space learning methods for applications in quantum chemistry, with particular focus on data efficiency and robustness.

Figure 22: Picture of Michael Gastegger

Michael Gastegger is a postdoctoral researcher in the BASLEARN project of the Machine Learning Group at Technische Universität Berlin. He received his Ph.D. in Chemistry from the University of Vienna in Austria in 2017. His research interests include the development of machine learning methods for quantum chemistry and their application in simulations.

Figure 23: Picture of Klaus-Robert Müller

Klaus-Robert Müller has been a professor of computer science at Technische Universität Berlin since 2006; at the same time he is directing and co-directing the Berlin Machine Learning Center and the Berlin Big Data Center, respectively. He studied physics in Karlsruhe from 1984 to 1989 and obtained his Ph.D. degree in computer science at Technische Universität Karlsruhe in 1992. After completing a postdoctoral position at GMD FIRST in Berlin, he was a research fellow at the University of Tokyo from 1994 to 1995. In 1995, he founded the Intelligent Data Analysis group at GMD-FIRST (later Fraunhofer FIRST) and directed it until 2008. From 1999 to 2006, he was a professor at the University of Potsdam.
He was awarded the Olympus Prize for Pattern Recognition (1999), the SEL Alcatel Communication Award (2006), the Science Prize of Berlin by the Governing Mayor of Berlin (2014), and the Vodafone Innovations Award (2017). In 2012, he was elected member of the German National Academy of Sciences-Leopoldina, in 2017 of the Berlin Brandenburg Academy of Sciences, and also in 2017 external scientific member of the Max Planck Society. In 2019 and 2020 he became a Highly Cited Researcher in the cross-disciplinary area. His research interests are intelligent data analysis and machine learning in the sciences (neuroscience (specifically brain-computer interfaces), physics, chemistry) and in industry.

Figure 24: Picture of Alexandre Tkatchenko

Alexandre Tkatchenko is a Professor of Theoretical Chemical Physics at the University of Luxembourg and Visiting Professor at Technische Universität Berlin. He obtained his bachelor degree in Computer Science and a Ph.D. in Physical Chemistry at the Universidad Autonoma Metropolitana in Mexico City. Between 2008 and 2010, he was an Alexander von Humboldt Fellow at the Fritz Haber Institute of the Max Planck Society in Berlin. Between 2011 and 2016, he led an independent research group at the same institute. Tkatchenko serves on the editorial boards of two society journals: Physical Review Letters (APS) and Science Advances (AAAS). He has received a number of awards, including elected Fellow of the American Physical Society, the 2020 Dirac Medal from WATOC, the Gerhard Ertl Young Investigator Award of the German Physical Society, and two flagship grants from the European Research Council: a Starting Grant in 2011 and a Consolidator Grant in 2017. His group pushes the boundaries of quantum mechanics, statistical mechanics, and machine learning to develop efficient methods to enable accurate modeling and obtain new insights into complex materials.
References

(1) LeCun, Y.; Bengio, Y.; Hinton, G.
Nature , , 436–444. (2) Schmidhuber, J. Neural Netw. , , 85–117. (3) Goodfellow, I.; Bengio, Y.; Courville, A. Deep learning; MIT press, 2016. (4) others, et al.
Nature , , 469.
(5) others, et al.
Semin. Cancer Biol. , , 151.(6) others„ et al. Sci. Transl. Med. , , eaaw8513.(7) others„ et al. Nat. Med. , , 954–961.(8) Baldi, P.; Sadowski, P.; Whiteson, D. Nat. Commun. , , 4308.(9) Leinen, P.; Esders, M.; Sch¨utt, K. T.; Wagner, C.; M¨uller, K.-R.; Tautz, F. S. Sci. Adv. , ,eabb6987.(10) Lengauer, T.; Sander, O.; Sierra, S.; Thielen, A.; Kaiser, R. Nat. Biotechnol. , , 1407.(11) others„ et al. Nature , , 706–710.(12) Blankertz, B.; Tomioka, R.; Lemm, S.; Kawanabe, M.; Muller, K.-R. IEEE Signal Process. Mag. , , 41–56.(13) Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. Proceedingsof the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014;pp 701–710.(14) Thrun, S.; Burgard, W.; Fox, D. Probabilistic robotics ; MIT press, 2005.(15) Won, D.-O.; M¨uller, K.-R.; Lee, S.-W.
Sci. Robot. , , eabb9764.(16) Lewis, M. M. Moneyball: The art of winning an unfair game ; W. W. Norton, 2003.(17) Ferrucci, D.; Levas, A.; Bagchi, S.; Gondek, D.; Mueller, E. T.
Artif. Intell. , , 93–105.(18) others„ et al. Nature , , 484.(19) Tkatchenko, A. Nat. Commun. , , 4125.(20) Rowley, J. J. Inf. Sci. , , 163–180.(21) Box, G. E. J. Am. Stat. Assoc. , , 791–799.(22) McQuarrie, D.; Simon, J.; Cox, H.; Choi, J. Physical Chemistry: A Molecular Approach ; UniversityScience Books, 1997.(23) Cramer, C. J.
Essentials of computational chemistry: theories and models ; John Wiley & Sons, 2004.
(24) Frenkel, D.; Smit, B.
Understanding Molecular Simulation: From Algorithms to Applications ; ElsevierScience, 2001.(25) Foresman, J.; Frisch, A.; Gaussian, I.
Exploring Chemistry with Electronic Structure Methods ; Gaus-sian, Incorporated, 1996.(26) Eastman, P.; Swails, J.; Chodera, J. D.; McGibbon, R. T.; Zhao, Y.; Beauchamp, K. A.; Wang, L.-P.;Simmonett, A. C.; Harrigan, M. P.; Stern, C. D.; Wiewiora, R. P.; Brooks, B. R.; Pande, V. S.
PLoSComput. Biol. , , e1005659.(27) Oyeyemi, V. B.; Keith, J. A.; Carter, E. A. J. Phys. Chem. A , , 7392–7403.(28) Anslyn, E.; Dougherty, D.; Dougherty, E.; Books, U. S. Modern Physical Organic Chemistry ; Univer-sity Science Books, 2006.(29) Glazer, A.
Acta Crystallogr. B , , 3384–3392.(30) Giessibl, F. J. Science , , 68–71.(31) Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Pople, J. A. J. Chem. Phys. , , 7374.(32) Haunschild, R.; Klopper, W. J. Chem. Phys. , , 164102.(33) Taylor, P. R. European summer school in quantum chemistry ; Springer, Berlin, 1994; Vol. 125; pp125–202.(34) Bartlett, R. J.
Annu. Rev. Phys. Chem. , , 359–401.(35) St¨ohr, M.; Van Voorhis, T.; Tkatchenko, A. Chem. Soc. Rev. , , 4118–4154.(36) Lundberg, M.; Siegbahn, P. E. J. Chem. Phys. , , 224103.(37) Morgante, P.; Peverati, R. Int. J. Quantum Chem. , , e26332.(38) Becke, A. D. J. Chem. Phys. , , 5648.(39) Perdew, J. P.; Burke, K.; Ernzerhof, M. Phys. Rev. Lett. , , 3865.(40) Goerigk, L.; Hansen, A.; Bauer, C.; Ehrlich, S.; Najibi, A.; Grimme, S. Phys. Chem. Chem. Phys. , , 32184–32215.(41) Zhao, Y.; Gonz´alez-Garda, N.; Truhlar, D. G. J. Phys. Chem. A , , 2012–2018.
(42) Sim, E.; Song, S.; Burke, K.
J. Phys. Chem. Lett. , , 6385–6392.(43) Riley, K. E.; Op’t Holt, B. T.; Merz, K. M. J. Chem. Theory Comput. , , 407–433.(44) Maldonado, A. M.; Hagiwara, S.; Choi, T. H.; Eckert, F.; Schwarz, K.; Sundararaman, R.; Otani, M.;Keith, J. A. J. Phys. Chem. A , , 154–164.(45) Abraham, M. J. et al. J. Chem. Inf. Model. , , 4093–4099.(46) Wheeler, S. E.; Houk, K. N. J. Chem. Theory Comput. , , 395–404.(47) Bonomi, M. et al. Nat. Methods , , 670–673.(48) Lejaeghere, K. et al. Science , , aad3000.(49) others„ et al. J. Mach. Learn. Res. , , 2443–2466.(50) Durrani, J. Chemistry World , .(51) Perkel, J. M. Nature , , 656–658.(52) Govoni, M.; Munakami, M.; Tanikanti, A.; Skone, J. H.; Runesha, H. B.; Giberti, F.; de Pablo, J.;Galli, G. Sci. Data , , 190002.(53) Kitchin, J. R. ACS Catal. , , 3894–3899.(54) ´Alvarez-Moreno, M.; De Graaf, C.; L´opez, N.; Maseras, F.; Poblet, J. M.; Bo, C. J. Chem. Inf. Model. , , 95–103.(55) Huber, S. P. et al. Sci. Data , , 300.(56) Heidrich, D.; Quapp, W. Theor. Chim. Acta , , 89–98.(57) Ess, D. H.; Wheeler, S. E.; Iafe, R. G.; Xu, L.; C¸ elebi- ¨Ol¸c¨um, N.; Houk, K. N. Angew. Chem. Int. Ed. , , 7592–7601.(58) Tarczay, G.; Cs´asz´ar, A. G.; Klopper, W.; Quiney, H. M. Mol. Phys. , , 1769–1794.(59) Perdew, J. P.; Schmidt, K. Jacob’s ladder of density functional approximations for the exchange-correlation energy. AIP Conference Proceedings. 2001; pp 1–20.
(60) Thompson, A. P.
Predictive Atomistic Simulations of Materials using SNAP Machine-Learning Inter-atomic Potentials. ; 2019.(61) Schr¨odinger, E.
Ann. Physik , , 361–376.(62) Schr¨odinger, E. Ann. Physik , , 489–527.(63) Schr¨odinger, E. Ann. Physik , , 109–139.(64) Born, M.; Oppenheimer, R. Ann. Phys. , , 457–484.(65) Curchod, B. F.; Mart´ınez, T. J. Chem. Rev. , , 3305–3336.(66) Pavoˇsevi´c, F.; Culpitt, T.; Hammes-Schiffer, S. Chem. Rev. , , 4222–4253.(67) Peterson, K. A.; Dunning, T. H. J. Chem. Phys. , , 10548.(68) Hehre, W. J.; Stewart, R. F.; Pople, J. A. J. Chem. Phys. , , 2657.(69) Sch¨afer, A.; Horn, H.; Ahlrichs, R. J. Chem. Phys. , , 2571.(70) Van Lenthe, E.; Baerends, E. J. J. Comput. Chem. , , 1142–1156.(71) Slater, J. C. Adv. Quantum Chem. , , 35–58.(72) MacDonald, A. H.; Picket, W. E.; Koelling, D. D. J. Phys. C Solid State Phys. , , 2675.(73) Louie, S. G.; Ho, K. M.; Cohen, M. L. Phys. Rev. B , , 1774.(74) Goedecker, S.; Teter, M. Phys. Rev. B Condens. Matter Mater. Phys. , , 1703.(75) Melius, C. F.; Goddard, W. A. Phys. Rev. A , , 1528.(76) Wadt, W. R.; Hay, P. J. J. Chem. Phys. , , 284.(77) Hay, P. J.; Wadt, W. R. J. Chem. Phys. , , 270.(78) Cao, X.; Dolg, M. J. Mol. Struct.-THEOCHEM , , 139–147.(79) Metz, B.; Stoll, H.; Dolg, M. J. Chem. Phys. , , 2563.(80) Dolg, M. In Handbook of Relativistic Quantum Chemistry ; Liu, W., Ed.; Springer, Berlin, Heidelberg,2016; pp 449–478.
(81) Shaw, R. W.; Harrison, W. A.
Phys. Rev. , , 604.(82) Kahn, L. R.; Baybutt, P.; Truhlar, D. G. J. Chem. Phys. , , 3826.(83) Christiansen, P. A.; Lee, Y. S.; Pitzer, K. S. J. Chem. Phys. , , 4445.(84) Hamann, D. R.; Schl¨uter, M.; Chiang, C. Phys. Rev. Lett. , , 1494.(85) Vanderbilt, D. Phys. Rev. B , , 8412.(86) Garrity, K. F.; Bennett, J. W.; Rabe, K. M.; Vanderbilt, D. Comput. Mater. Sci , , 446–452.(87) Kresse, G.; Hafner, J. J. Phys. Condens. Matter. , , 8245.(88) Joubert, D. Phys. Rev. B Condens. Matter Mater. Phys. , , 1758.(89) Troullier, N.; Martins, J. Solid State Commun. , , 613–616.(90) Peterson, K. A.; Figgen, D.; Goll, E.; Stoll, H.; Dolg, M. J. Chem. Phys. , , 11113.(91) Roy, L. E.; Hay, P. J.; Martin, R. L. J. Chem. Theory Comput. , , 1029–1031.(92) Pyykk¨o, P. Annu. Rev. Phys. Chem. , , 45–64.(93) Dirac, P. A. M.; Fowler, R. H. Proc. R. Soc. Lond. A , , 610–624.(94) Tecmer, P.; Boguslawski, K.; K¸edziera, D. In Handbook of Computational Chemistry ; Leszczynski, J.,Ed.; Springer, Dordrecht, 2016; pp 1–43.(95) Feynman, R.
Quantum Electrodynamics ; Avalon Publishing, 1998.(96) Nakajima, T.; Hirao, K.
Chem. Rev. , , 385–402.(97) Van Lenthe, J. H.; Faas, S.; Snijders, J. G. Chem. Phys. Lett. , , 107–112.(98) Visscher, L. J. Comput. Chem. , , 759–766.(99) Hartree, D. R.; Hartree, W. Proc. R. Soc. Lond. A , , 9–33.(100) Slater, J. C. Phys. Rev. , , 385.(101) Fock, V. Zeitschrift f¨ur Physik , , 126–148.(102) Jensen, F. Introduction to Computational Chemistry ; Wiley, 2007.
Reviews of Modern Physics , , 69.(104) Hall, G. G. Proc. R. Soc. Lond. A , , 541–552.(105) Hermann, J.; Sch¨atzle, Z.; No´e, F. Nat. Chem. , , 891–897.(106) Pfau, D.; Spencer, J. S.; Matthews, A. G.; Foulkes, W. M. C. Phys. Rev. Res. , , 033429.(107) Eriksen, J. J. et al. J. Phys. Chem. Lett. , 8922–8929.(108) Helgaker, T.; Jorgensen, P.; Olsen, J.
Molecular Electronic-Structure Theory ; Wiley, 2013.(109) Bartlett, R. J.; Musia l, M.
Rev. Mod. Phys. , , 291–352.(110) ˇRez´aˇc, J.; Hobza, P. J. Chem. Theory Comput. , , 2151–2155.(111) Moran, D.; Simmonett, A. C.; Leach, F. E.; Allen, W. D.; Schleyer, P. V.; Schaefer, H. F. J. Am.Chem. Soc. , , 9342–9343.(112) Samala, N. R.; Jordan, K. D. Chem. Phys. Lett. , , 230–232.(113) Titov, A. V.; Ufimtsev, I. S.; Luehr, N.; Martinez, T. J. J. Chem. Theory Comput. , , 213–221.(114) Seritan, S.; Bannwarth, C.; Fales, B. S.; Hohenstein, E. G.; Isborn, C. M.; Kokkila-Schumacher, S. I.;Li, X.; Liu, F.; Luehr, N.; Snyder, J. W.; Song, C.; Titov, A. V.; Ufimtsev, I. S.; Wang, L. P.;Mart´ınez, T. J. Wiley Interdiscip. Rev. Comput. Mol. Sci. , e1494.(115) Anderson, A. G.; Goddard, W. A.; Schr¨oder, P.
Comput. Phys. Commun. , , 298–306.(116) Andrade, X.; Aspuru-Guzik, A. J. Chem. Theory Comput. , , 4360–4373.(117) Friesner, R. A. J. Chem. Phys. , , 1462.(118) Martinez, T. J.; Mehta, A.; Carter, E. A. J. Chem. Phys. , , 1876.(119) Friesner, R. A.; Murphy, R. B.; Ringnalda, M. N. Encyclopedia of Computational Chemistry ; 2002.(120) Sierka, M.; Hogekamp, A.; Ahlrichs, R.
J. Chem. Phys. , , 9136–9148.(121) Riplinger, C.; Neese, F. J. Chem. Phys. , , 034106.(122) Kong, L.; Bischoff, F. A.; Valeev, E. F. Chem. Rev. , , 75–107. Chem. Rev. , , 263–288.(124) Marti, K. H.; Reiher, M. Z. Phys. Chem. , , 583–599.(125) Sch¨utt, K.; Gastegger, M.; Tkatchenko, A.; M¨uller, K.-R.; Maurer, R. Nat. Commun. , , 5024.(126) McGibbon, R. T.; Taube, A. G.; Donchev, A. G.; Siva, K.; Hern´andez, F.; Hargus, C.; Law, K. H.;Klepeis, J. L.; Shaw, D. E. J. Chem. Phys. , , 161725.(127) Townsend, J.; Vogiatzis, K. D. J. Chem. Theory, Comput. , , 7453–7461.(128) Coe, J. P. J. Chem. Theory, Comput. , , 5739–5749.(129) Jeong, W. S.; Stoneburner, S. J.; King, D.; Li, R.; Walker, A.; Lindh, R.; Gagliardi, L. J. Chem.Theory Comput. , , 2389–2399.(130) Montgomery, J. A.; Frisch, M. J.; Ochterski, J. W.; Petersson, G. A. J. Chem. Phys. , , 6532.(131) Curtiss, L. A.; Raghavach Ari, K.; Redfern, P. C.; Rassolov, V.; Pople, J. A. J. Chem. Phys. , , 7764.(132) Karton, A.; Rabinovich, E.; Martin, J. M.; Ruscic, B. J. Chem. Phys. , , 144108.(133) Tajti, A.; Szalay, P. G.; Cs´asz´ar, A. G.; K´allay, M.; Gauss, J.; Valeev, E. F.; Flowers, B. A.; V´azquez, J.;Stanton, J. F. J. Chem. Phys. , , 11599.(134) Karton, A. Wiley Interdiscip. Rev. Comput. Mol. Sci. , , 292–310.(135) Zaspel, P.; Huang, B.; Harbrecht, H.; Von Lilienfeld, O. A. J. Chem. Theory Comput. , ,1546–1559.(136) Park, J. W.; Al-Saadon, R.; Macleod, M. K.; Shiozaki, T.; Vlaisavljevich, B. Chem. Rev. , ,5878–5909.(137) Hirao, K. Chem. Phys. Lett. , , 374–380.(138) Buenker, R. J.; Peyerimhoff, S. D.; Butscher, W. Mol. Phys. , , 771–791.(139) Levine, B. G.; Coe, J. D.; Mart´ınez, T. J. J. Phys. Chem. B , , 405–413.(140) Jiang, W.; Deyonker, N. J.; Wilson, A. K. J. Chem. Theory Comput. , , 460–468. Chem. Phys. Lett. , , 362–367.(142) Duan, C.; Liu, F.; Nandy, A.; Kulik, H. J. J. Chem. Theory Comput. , , 4373–4387.(143) Bobrowicz, F. W.; Goddard, W. A. In Methods of Electronic Structure Theory ; Schaefer, H. F., Ed.;Springer, Boston, MA, 1977; pp 79–127.(144) Roos, B. O.; Taylor, P. R.; Sigbahn, P. E.
Chem. Phys. , , 157–173.(145) Szalay, P. G.; M¨uller, T.; Gidofalvi, G.; Lischka, H.; Shepard, R. Chem. Rev. , , 108–181.(146) Pulay, P. Int. J. Quantum Chem. , , 3273–3279.(147) Lyakh, D. I.; Musia l, M.; Lotrich, V. F.; Bartlett, R. J. Chem. Rev. , , 182–243.(148) Evangelista, F. A. J. Chem. Phys. , , 030901.(149) Jensen, K. P.; Roos, B. O.; Ryde, U. J. Inorg. Biochem. , , 978.(150) Pople, J. A. J. Chem. Phys. , , S229.(151) Parr, R.; Weitao, Y. Density-Functional Theory of Atoms and Molecules ; International Series of Mono-graphs on Chemistry; Oxford University Press, 1994.(152) Witt, W. C.; Del Rio, B. G.; Dieterich, J. M.; Carter, E. A.
J. Mater. Res. , , 777–795.(153) Hung, L.; Huang, C.; Shin, I.; Ho, G. S.; Lign`eres, V. L.; Carter, E. A. Comput. Phys. Commun. , , 2208–2209.(154) Mi, W.; Shao, X.; Su, C.; Zhou, Y.; Zhang, S.; Li, Q.; Wang, H.; Zhang, L.; Miao, M.; Wang, Y.;Ma, Y. Comput. Phys. Commun. , , 87–95.(155) Mi, W.; Genova, A.; Pavanello, M. J. Chem. Phys. , , 184107.(156) Ayers, P. W. J. Math. Phys. , , 062107.(157) Huang, C.; Carter, E. A. Phys. Rev. B Condens. Matter Mater. Phys. , , 045206.(158) Burakovsky, L.; Ticknor, C.; Kress, J. D.; Collins, L. A.; Lambert, F. Phys. Rev. E Stat. Nonlin. Soft.Matter Phys. , , 023104.(159) Sjostrom, T.; Daligault, J. Phys. Rev. E Stat. Nonlin. Soft. Matter Phys. , , 063304. Matter Radiat. at Extremes , , 064403.(161) Snyder, J. C.; Rupp, M.; Hansen, K.; Blooston, L.; M¨uller, K. R.; Burke, K. J. Chem. Phys. , , 224104.(162) Brockherde, F.; Vogt, L.; Li, L.; Tuckerman, M. E.; Burke, K.; M¨uller, K. R. Nat. Commun. , ,872.(163) Kohn, W.; Sham, L. J. Phys. Rev. , , A1133–A1138.(164) Maurer, R. J.; Freysoldt, C.; Reilly, A. M.; Brandenburg, J. G.; Hofmann, O. T.; Bj¨orkman, T.;Leb`egue, S.; Tkatchenko, A. Annu. Rev. Mater. Res. , , 1–30.(165) Jacobsen, H.; Cavallo, L. In Handbook of Computational Chemistry ; Leszczynski, J., Ed.; Springer,Dordrecht, 2012; pp 95–133.(166) Learn Density Functional Theory. https://dft.uci.edu/learnDFT.php , Accessed: 2020-11-30.(167) Tran, F.; Stelzl, J.; Blaha, P.
J. Chem. Phys. , , 204120.(168) Kozuch, S.; Martin, J. M. J. Comput. Chem. , , 2327–2344.(169) Janesko, B. G. Phys. Chem. Chem. Phys. , , 4793–4801.(170) Gerber, I. C.; ´Angy´an, J. G.; Marsman, M.; Kresse, G. J. Chem. Phys. , , 054101.(171) Pisani, C.; Dovesi, R.; Roetti, C. Hartree-Fock Ab Initio Treatment of Crystalline Systems ; SpringerScience & Business Media, 2012; Vol. 48.(172) Shishkin, M.; Sato, H.
J. Chem. Phys. , , 024102.(173) Petukhov, A. G.; Mazin, I. I.; Chioncel, L.; Lichtenstein, A. I. Phys. Rev. B Condens. Matter Mater.Phys. , , 153106.(174) Sun, Q.; Chan, G. K. L. Acc. Chem. Res. , , 2705–2712.(175) Cortona, P. Phys. Rev. B , , 8454.(176) Huang, P.; Carter, E. A. Annu. Rev. Phys. Chem. , , 261–290.(177) Manby, F. R.; Stella, M.; Goodpaster, J. D.; Miller, T. F. J. Chem. Theory Comput. , , 2564–2568. Acc. Chem. Res. , , 2768–2775.(179) Casida, M.; Huix-Rotllant, M. Annu. Rev. Phys. Chem. , , 287–323.(180) Casida, M. E.; Casida, K. C.; Salahub, D. R. Int. J. Quantum Chem. , , 933–941.(181) Snyder, J. C.; Rupp, M.; Hansen, K.; M¨uller, K. R.; Burke, K. Phys. Rev. Lett. , , 253002.(182) Schmidt, J.; Benavides-Riveros, C. L.; Marques, M. A. J. Phys. Chem. Lett. , , 6425–6431.(183) Meyer, R.; Weichselbaum, M.; Hauser, A. W. J. Chem. Theory, Comput. , , 5685–5694.(184) Bogojeski, M.; Vogt-Maranto, L.; Tuckerman, M. E.; M¨uller, K.-R.; Burke, K. Nat. Commun. , , 5223.(185) Nagai, R.; Akashi, R.; Sugino, O. Npj Comput. Mater. , , 1–8.(186) Thiel, W. Wiley Interdiscip. Rev. Comput. Mol. Sci. , , 145–157.(187) Pople, J.; Beveridge, D. Approximate Molecular Orbital Theory ; McGraw-Hill, 1970.(188) Dewar, M. J.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J.
J. Am. Chem. Soc. , , 3902–3909.(189) Stewart, J. J. J. Mol. Model. , , 1–32.(190) Dewar, M. J.; Thiel, W. J. Am. Chem. Soc. , , 4899–4907.(191) Dral, P. O.; Wu, X.; Sp¨orkel, L.; Koslowski, A.; Thiel, W. J. Chem. Theory Comput. , ,1097–1120.(192) Koskinen, P.; M¨akinen, V. Comput. Mater. Sci. , , 237–253.(193) Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T. Phys. Rev. B Condens.Matter Mater. Phys. , , 7260.(194) Bannwarth, C.; Ehlert, S.; Grimme, S. J. Chem. Theory Comput. , , 1652–1671.(195) Dral, P. O.; von Lilienfeld, O. A.; Thiel, W. J. Chem. Theory, Comput. , , 2120–2125.(196) Hegde, G.; Bowen, R. C. Sci. Rep. , , 42669.(197) St¨ohr, M.; Medrano Sandonas, L.; Tkatchenko, A. J. Phys. Chem. Lett. , , 6835–6843. J. Chem. Theory, Comput. , ,5764–5776.(199) Poltavsky, I.; Zheng, L.; Mortazavi, M.; Tkatchenko, A. J. Chem. Phys. , , 204707.(200) Ceriotti, M.; Fang, W.; Kusalik, P. G.; McKenzie, R. H.; Michaelides, A.; Morales, M. A.; Mark-land, T. E. Chem. Rev. , , 7529–7550.(201) Marx, D.; Parrinello, M. J. Chem. Phys. , , 4077–4082.(202) Chandler, D.; Wolynes, P. G. J. Chem. Phys. , , 4078–4095.(203) Cao, J.; Voth, G. A. J. Chem. Phys. , , 6168–6183.(204) Hele, T. J. H.; Willatt, M. J.; Muolo, A.; Althorpe, S. C. J. Chem. Phys. , , 191101.(205) Wang, L.; Ceriotti, M.; Markland, T. E. J. Chem. Phys. , , 104502.(206) Sauceda, H. E.; Vassilev-Galindo, V.; Chmiela, S.; M¨uller, K.-R.; Tkatchenko, A. Nat. Commun. , , 442.(207) Chmiela, S.; Sauceda, H. E.; M¨uller, K. R.; Tkatchenko, A. Nat. Commun. , , 3887.(208) Wang, X.; Ram´ırez-Hinestrosa, S.; Dobnikar, J.; Frenkel, D. Phys. Chem. Chem. Phys. , ,10624–10633.(209) Li, P.; Song, L. F.; Merz, K. M. J. Phys. Chem. B , , 883–895.(210) Girifalco, L. A.; Weizer, V. G. Phys. Rev. , , 687.(211) Buckingham, R. A. Proc. R. Soc. Lond. A , , 264–283.(212) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. , , 926.(213) Stillinger, F. H.; Weber, T. A. Phys. Rev. B , , 5262.(214) Weiner, P. K.; Kollman, P. A. J. Comput. Chem. , , 287–303.(215) Salomon-Ferrer, R.; Case, D. A.; Walker, R. C. Wiley Interdiscip. Rev. Comput. Mol. Sci. , ,198–210. J. Comput. Chem. , ,1157–1174.(217) others„ et al. J. Comput. Chem. , , 1545–1614.(218) Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. F. J. Comp. Chem. , , 1656–1676.(219) Schmid, N.; Eichenberger, A. P.; Choutko, A.; Riniker, S.; Winger, M.; Mark, A. E.; Van Gun-steren, W. F. Eur. Biophys. J. , , 843.(220) Daura, X.; Mark, A. E.; Van Gunsteren, W. F. J. Comput. Chem. , , 535–547.(221) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. J. Am. Chem. Soc , , 11225–11236.(222) Jorgensen, W. L.; Madura, J. D.; Swenson, C. J. J. Am. Chem. Soc. , , 6638–6646.(223) Mayo, S. L.; Olafson, B. D.; Goddard, W. A. J. Phys. Chem. , , 8897–8909.(224) Halgren, T. A. J. Comput. Chem. , , 490–519.(225) Rapp´e, A. K.; Casewit, C. J.; Colwell, K. S.; Goddard, W. A.; Skiff, W. M. J. Am. Chem. Soc. , , 10024–10035.(226) Sun, H. J. Phys. Chem. B , , 7338–7364.(227) Heinz, H.; Lin, T. J.; Kishore Mishra, R.; Emami, F. S. Langmuir , , 1754–1765.(228) Gale, J. D. Philos. Mag. B , , 3–19.(229) Ponder, J. W.; Wu, C.; Ren, P.; Pande, V. S.; Chodera, J. D.; Schnieders, M. J.; Haque, I.; Mob-ley, D. L.; Lambrecht, D. S.; DiStasio, R. A.; Head-Gordon, M.; Clark, G. N. I.; Johnson, M. E.;Head-Gordon, T. J. Phys. Chem. B , , 2549–2564.(230) Lemkul, J. A.; Huang, J.; Roux, B.; Mackerell, A. D. Chem. Rev. , , 4983–5013.(231) Banks, J. L.; Kaminski, G. 
A.; Zhou, R.; Mainz, D. T.; Berne, B. J.; Friesner, R. A. J. Chem. Phys. , , 741.(232) Babin, V.; Leforestier, C.; Paesani, F. J. Chem. Theory Comput. , , 5395–5403.(233) Kumar, R.; Wang, F. F.; Jenness, G. R.; Jordan, K. D. J. Chem. Phys. , , 014309. J. Chem. Phys. , , 090901.(235) Daw, M. S.; Foiles, S. M.; Baskes, M. I. Mater. Sci. Rep. , , 251–310.(236) Baskes, M. I. Phys. Rev. B , , 2727.(237) Finnis, M. W.; Sinclair, J. E. Philos. Mag. A , , 45–55.(238) Sutton, A. P.; Chen, J. Philos. Mag. Lett. , , 139–146.(239) Brenner, D. W. Phys. Rev. B , , 9458.(240) Tersoff, J. Phys. Rev. B , , 6991.(241) Tersoff, J. Phys. Rev. B , , 5566–5568.(242) Brenner, D. W.; Shenderova, O. A.; Harrison, J. A.; Stuart, S. J.; Ni, B.; Sinnott, S. B. J. Phys.Condens. Matter. , , 783.(243) Liang, T.; Shan, T. R.; Cheng, Y. T.; Devine, B. D.; Noordhoek, M.; Li, Y.; Lu, Z.; Phillpot, S. R.;Sinnott, S. B. Mater. Sci. Eng. R Rep. , , 255–279.(244) Yu, J.; Sinnott, S. B.; Phillpot, S. R. Phys. Rev. B , , 085311.(245) Senftle, T. P.; Hong, S.; Islam, M. M.; Kylasa, S. B.; Zheng, Y.; Shin, Y. K.; Junkermeier, C.; Engel-Herbert, R.; Janik, M. J.; Aktulga, H. M.; Verstraelen, T.; Grama, A.; Van Duin, A. C. Npj Comput.Mater. , , 15011.(246) Van Duin, A. C.; Dasgupta, S.; Lorant, F.; Goddard, W. A. J. Phys. Chem. A , , 9396–9409.(247) Rapp´e, A. K.; Bormann-Rochotte, L. M.; Wiser, D. C.; Hart, J. R.; Pietsch, M. A.; Casewit, C. J.;Skiff, W. M. Mol. Phys. , , 301–324.(248) Warshel, A.; Weiss, R. M. J. Am. Chem. Soc. , , 6218–6226.(249) Wu, Y.; Chen, H.; Wang, F.; Paesani, F.; Voth, G. A. J. Phys. Chem. B , , 467–482.(250) Hartke, B.; Grimme, S. Phys. Chem. Chem. Phys. , , 16715–16718.(251) Singh, U. C.; Kollman, P. A. J. Comput. Chem. , , 129–145. J. Comput. Aided Mol. Des. , ,87–110.(253) Mehler, E. L.; Solmajer, T. Protein Eng. Des. Sel. , , 903–910.(254) Chen, J.; Mart´ınez, T. J. Chem. Phys. Lett. , , 315–320.(255) Poier, P. P.; Jensen, F. J. Chem. Theory Comput. , , 3093–3107.(256) Rapp´e, A. K.; Goddard, W. A. J. Phys. Chem. , , 3358–3363.(257) Akimov, A. V.; Prezhdo, O. V. Chem. Rev. , , 5797–5890.(258) Harrison, J. A.; Schall, J. D.; Maskey, S.; Mikulski, P. T.; Knippenberg, M. T.; Morrow, B. H. Appl.Phys. Rev. , , 031104.(259) Piquemal, J. P.; Jordan, K. D. J. Chem. Phys. , , 161401.(260) Lennard-Jones, J. E. Proc. Phys. Soc. , , 461–482.(261) Tersoff, J. Phys. Rev. B , , 9902.(262) Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Gu-vench, O.; Lopes, P.; Vorobyov, I.; Mackerell, A. D. J. Comput. Chem. , , 671–690.(263) Zhang, C.; Lu, C.; Jing, Z.; Wu, C.; Piquemal, J.-P.; Ponder, J. W.; Ren, P. J. Chem. Theory Comput. , , 2084–2108.(264) G¨otz, A. W.; Williamson, M. J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R. C. J. Chem. TheoryComput. , , 1542–1555.(265) Salomon-Ferrer, R.; G¨otz, A. W.; Poole, D.; Le Grand, S.; Walker, R. C. J. Chem. Theory Comput. , , 3878–3888.(266) Stone, J. E.; Hardy, D. J.; Ufimtsev, I. S.; Schulten, K. J. Mol. Graph. Model. , , 116–125.(267) Glaser, J.; Nguyen, T. D.; Anderson, J. A.; Lui, P.; Spiga, F.; Millan, J. A.; Morse, D. C.; Glotzer, S. C. Comput. Phys. Commun. , , 97–107.(268) Lagard`ere, L.; Jolly, L. H.; Lipparini, F.; Aviat, F.; Stamm, B.; Jing, Z. F.; Harger, M.; Torabifard, H.;Cisneros, G. A.; Schnieders, M. J.; Gresh, N.; Maday, Y.; Ren, P. Y.; Ponder, J. W.; Piquemal, J. P. Chem. Sci. , , 956–972. J. Phys. Chem. Lett. , , 4962–4967.(270) Agrawal, A.; Choudhary, A. APL Mater. , , 053208.(271) Pun, G. 
P.; Batra, R.; Ramprasad, R.; Mishin, Y. Nat. Commun. , , 1–10.(272) Guo, F.; Wen, Y.-S.; Feng, S.-Q.; Li, X.-D.; Li, H.-S.; Cui, S.-X.; Zhang, Z.-R.; Hu, H.-Q.; Zhang, G.-Q.; Cheng, X.-L. Comput. Mater. Sci. , , 109393.(273) Narayanan, B.; Chan, H.; Kinaci, A.; Sen, F. G.; Gray, S. K.; Chan, M. K.; Sankaranarayanan, S. K. Nanoscale , , 18229–18239.(274) Behler, J.; Parrinello, M. Phys. Rev. Lett. , , 146401.(275) Wood, M. A.; Thompson, A. P. J. Chem. Phys. , , 241721.(276) Bart´ok, A. P.; Payne, M. C.; Kondor, R.; Cs´anyi, G. Phys. Rev. Lett. , , 136403.(277) Boes, J. R.; Groenenboom, M. C.; Keith, J. A.; Kitchin, J. R. Int. J. Quantum Chem. , ,979–987.(278) Ing´olfsson, H. I.; Lopez, C. A.; Uusitalo, J. J.; de Jong, D. H.; Gopal, S. M.; Periole, X.; Marrink, S. J. Wiley Interdiscip. Rev. Comput. Mol. Sci , , 225–248.(279) Mennucci, B. Wiley Interdiscip. Rev. Comput. Mol. Sci. , , 386–404.(280) J¨ager, M.; Sch¨afer, R.; Johnston, R. L. Adv. Phys. X , , 1516514.(281) Dieterich, J. M.; Hartke, B. Mol. Phys. , , 279–291.(282) Wales, D. J.; Doye, J. P. J. Phys. Chem. A , , 5111–5116.(283) Zhang, J.; Dolg, M. Phys. Chem. Chem. Phys. , , 24173–24181.(284) Goedecker, S. J. Chem. Phys. , , 9911.(285) Schlegel, H. B. Wiley Interdiscip. Rev. Comput. Mol. Sci. , , 790–809.(286) Sheppard, D.; Terrell, R.; Henkelman, G. J. Chem. Phys. , , 134106.(287) Schlegel, B. H. Theor. Chim. Acta , , 333–340.(288) Henkelman, G.; Uberuaga, B. P.; J´onsson, H. J. Chem. Phys. , , 9901. J. Chem. Phys. , ,074103.(290) Zimmerman, P. M. J. Chem. Phys. , , 184102.(291) Samanta, A.; Weinan, E. Commun. Comput. Phys. , , 265–275.(292) Burger, S. K.; Yang, W. J. Chem. Phys. , , 054109.(293) Peterson, A. A. J. Chem. Phys. , , 074106.(294) Del R´ıo, E. G.; Mortensen, J. J.; Jacobsen, K. W. Phys. Rev. B , , 104103.(295) Garrido Torres, J. A.; Jennings, P. C.; Hansen, M. H.; Boes, J. R.; Bligaard, T. Phys. Rev. Lett. , , 156001.(296) Meyer, R.; Schmuck, K. S.; Hauser, A. W. J. Chem. Theory Comput. , 6513–6523.(297) Koistinen, O. P.; ´Asgeirsson, V.; Vehtari, A.; J´onsson, H.
J. Chem. Theory Comput. , ,6738–6751.(298) No´e, F.; Olsson, S.; K¨ohler, J.; Wu, H. Science , , eaaw1147.(299) Christensen, A. S.; Faber, F. A.; von Lilienfeld, O. A. J. Chem. Phys. , , 064105.(300) Gastegger, M.; Sch¨utt, K. T.; M¨uller, K.-R. arXiv preprint , arXiv:2010.14942.(301) Varghese, J. J.; Mushrif, S. H. React. Chem. Eng. , , 165–206.(302) Basdogan, Y.; Maldonado, A. M.; Keith, J. A. Wiley Interdiscip. Rev. Comput. Mol. Sci. , ,e1446.(303) Tomasi, J.; Mennucci, B.; Cammi, R. Chem. Rev. , , 2999–3094.(304) Cramer, C. J.; Truhlar, D. G. Acc. Chem. Res. , , 760–768.(305) Klamt, A. Wiley Interdiscip. Rev. Comput. Mol. Sci. , , 699–709.(306) Hirata, F. Molecular theory of solvation ; Springer Science & Business Media, 2003; Vol. 24.(307) Miertuˇs, S.; Scrocco, E.; Tomasi, J.
Chem. Phys. , , 117–129.(308) Cances, E.; Mennucci, B.; Tomasi, J. J. Chem. Phys. , , 3032–3041. J. Phys. Chem. B , , 6378–6396.(310) Barone, V.; Cossi, M. J. Phys. Chem. A , , 1995–2001.(311) Klamt, A.; Sch¨u¨urmann, G. J. Chem. Soc., Perkin Trans. 2 , 799–805.(312) Klamt, A.