Perspective on integrating machine learning into computational chemistry and materials science
Julia Westermayr, Michael Gastegger, Kristof T. Schütt, Reinhard J. Maurer
Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, United Kingdom
Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
Electronic mail: [email protected]
(Dated: 18 February 2021)
Machine learning (ML) methods are being used in almost every conceivable area of electronic structure theory and molecular simulation. In particular, ML has become firmly established in the construction of high-dimensional interatomic potentials. Not a day goes by without another proof of principle being published on how ML methods can represent and predict quantum mechanical properties, be they observable, such as molecular polarizabilities, or not, such as atomic charges. As ML is becoming pervasive in electronic structure theory and molecular simulation, we provide an overview of how various aspects of atomistic computational modelling are being transformed by the incorporation of ML approaches. From the perspective of the practitioner in the field, we assess how common workflows to predict structure, dynamics, and spectroscopy are affected by ML. Finally, we discuss how a deeper and lasting integration of ML methods with computational chemistry and materials science can be achieved and what it will mean for research practice, software development, and postgraduate training.

Keywords: electronic structure theory, quantum chemistry, artificial intelligence, molecular dynamics simulation, materials discovery
I. INTRODUCTION
Atomistic and electronic structure simulations based on quantum theoretical calculations form a central aspect of modern chemistry and materials research. By predicting molecular and materials properties from first principles, i.e. by solving equations without experimental input, and by simulating atomic-scale dynamics, computational chemists and physicists in academia and industry contribute to fundamental mechanistic understanding of chemical processes, to the identification of novel materials, and to the optimization of existing ones. Over the last few decades, computational molecular simulation has become firmly established in the chemical sciences as an important part of the method portfolio. This was accompanied by a move to streamline and optimize common workflows for model building and simulation (see Figure 1). Algorithms to perform molecular geometry optimization, efficient chemical dynamics simulations, and electronic structure calculations have been optimized to perform highly specialized tasks while being massively scalable and parallelized across a diverse range of hardware architectures.
Simultaneously, PhD graduates in the field have been trained to be expert users of existing simulation workflows and developers of new ones. This is the status quo at the time when machine learning (ML) methods enter the stage. The application of ML to atomistic simulation and electronic structure theory has been developing rapidly since its earliest works in a modern context at the end of this century's first decade.
A number of excellent reviews have recently been written to highlight progress in various contexts, including the role of ML in catalyst design, in the development of force fields and interatomic potentials for ground-state properties and excited states, in quantum chemistry, in finding solutions to the Schrödinger equation, and the role of unsupervised learning in atomistic simulation (see Table I for a non-exhaustive list). An excellent retrospective of the last decade of ML in the context of chemical discovery has recently been published by von Lilienfeld and Burke, predicting a bright future for ML in quantum chemistry. Indeed, not a day goes by without another novel ML approach being published that promises to predict atomic and electronic properties of molecules and materials at ever greater accuracy and efficiency. Goals of ML models include the parametrization of electronic structure within highly efficient analytical models that can be evaluated extremely fast, speeding up simulations to reach longer time and length scales, or the provision of descriptors to more efficiently chart the vast space of chemical compounds and materials. These approaches have the potential to fundamentally change day-to-day practices, workflows, and paradigms in atomistic and quantum simulation.
But how exactly will ML affect the method portfolio of future computational scientists working in electronic structure theory and molecular simulation?
How will this affect a practitioner who wants to determine the equilibrium structure and ground-state energy of a molecular system using electronic structure theory? How will it change the required expertise and demands on PhD graduates?

FIG. 1. Schematic depiction of the key workflow steps in computational molecular and materials modelling: model building and method choice, electronic structure calculations, structure exploration and dynamics, and connection to experiment. All of these steps can benefit from ML models. In many cases, ML methods do not just enhance existing approaches, but also open avenues towards new workflows.
Despite the comparative novelty of ML in this field, it is easy for the uninitiated to get lost in the vast array of ML models, which might soon be comparable to the zoo of exchange-correlation functionals available in density functional theory (DFT). What will become the ML equivalent of go-to DFT functionals for practitioners? At the moment, there are relatively few examples where ML models have become generally applicable to researchers outside the immediate circle of developers. In this perspective, we discuss recent advances through the lens of their potential benefit to a wide community of computational molecular scientists, in order to identify future possibilities for a permanent integration of ML-based approaches into workflows and into electronic structure and simulation software packages. Central to this perspective are the questions of how ML can effectively address the computational bottlenecks in electronic structure calculations and molecular simulations, and what steps are needed to make ML an integral part of the method portfolio of this field. Our goal is to make this account as accessible as possible and to highlight approaches that the community might want to keep track of in the future. We stress that our aim is not to provide a comprehensive review of existing approaches, which is beyond the scope of this perspective. We rather intend to provide a first point of entry for computational chemists and physicists, and we suggest further reading material in Table I. Following the key steps of molecular modelling shown in Figure 1, each section focuses on how ML methods can benefit a central workflow or aspect of computational molecular and materials science. We place a particular focus on approaches that have the potential to augment existing prevalent approaches or introduce new ones.
Topic of ML Review       Year   References
Physical Sciences        2019   Carleo et al.
Quantum Chemistry        2020   Dral; Noé et al.; von Lilienfeld et al.; Keith et al.
Materials Science        2019   Schleder et al.; Ceriotti
Interatomic Potentials   2017   Behler et al.; Manzhos et al.; P. Gkeka et al.; Unke et al.
Catalysis                2018   Goldsmith et al.; Freeze et al.; Elton et al.
Electronic Structure     2020   Manzhos
Excited States           2020   Westermayr et al.

TABLE I. Overview of recent reviews of machine learning methods in electronic structure theory and atomistic simulation. This is not intended to be a complete list of all reviews on the subject, but a selection of suggested further reading.
II. MACHINE LEARNING PRIMER
We start with a brief overview of basic terminology in ML to make this perspective as accessible as possible. ML is concerned with algorithms that improve with an increasing amount of available data under some performance measure. In contrast to conventional physical models, where one often starts with clear assumptions about the system to be modelled, ML focuses on universal approximators. These are able to represent any function with arbitrary accuracy when given enough training data and parameters. Examples of this class of models are neural networks (NNs) with one hidden layer and an arbitrary number of neurons, or Gaussian processes using the squared exponential kernel. Starting from such a flexible approximator, the required amount of training data can be reduced by introducing regularizers and constraints to obtain, e.g., smooth functions or to encode known symmetries. This general framework makes ML broadly applicable to many domains where the regularities of the data are only partially known or cannot be expressed in analytic form.
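As a minimal illustration of such a universal approximator, the sketch below fits a Gaussian process with a squared exponential (RBF) kernel to a one-dimensional toy function using scikit-learn; the data, kernel width, and regularization value are illustrative assumptions, not choices taken from the literature.

```python
# Gaussian process regression with a squared exponential kernel:
# a universal approximator that also returns an uncertainty estimate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X_train = np.linspace(-2.0, 2.0, 12).reshape(-1, 1)  # toy inputs, e.g. a bond length
y_train = np.sin(3.0 * X_train).ravel()              # toy targets, e.g. an energy

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-8)
gp.fit(X_train, y_train)

X_test = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
y_mean, y_std = gp.predict(X_test, return_std=True)  # prediction and confidence
```

The predictive standard deviation returned alongside the mean is what makes Gaussian processes attractive for the uncertainty-driven workflows discussed in later sections.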
FIG. 2. Schematic depiction of different ML model categories. Unsupervised learning techniques use unlabeled data and are often used for dimensionality reduction or clustering, whereas supervised ML models perform regression or classification tasks on labelled data.

The tasks tackled by ML can be broadly separated into unsupervised, supervised, and reinforcement learning.
Supervised ML methods predict properties that are known for a limited number of examples. This includes classification and regression for categorical and continuous properties, respectively (see also Fig. 2). ML force fields are examples of regression tasks (see Section IV), while classifiers can, for instance, be used to automatically select appropriate quantum chemistry methods for a given system (see Section III). In contrast, unsupervised ML aims to find patterns in the data that are specified by an optimization objective without having access to the ground truth. Tasks falling under this category include clustering, dimensionality reduction, and the estimation of probability densities. In the context of computational chemistry, unsupervised ML finds application in the post-processing and analysis of simulation data, e.g. in identifying collective variables (CVs) and reaction pathways, which will be discussed in Section VI (see also Fig. 2). Another important application is the sampling of new data, e.g. with generative models for novel molecular structures. The probability distribution over molecular space can be modeled explicitly, for example using variational autoencoders, or implicitly, e.g. by generative adversarial networks that provide access to the distribution only through sampling. In a supervised setting, generative models can facilitate inverse design by learning a probability distribution of chemical structures conditioned on a desired target range of one or multiple properties. Finally, reinforcement learning is concerned with learning the optimal action in a given state to maximize a specified future reward. An example is an unfolded protein (state), where one applies changes to the geometry (action) in order to come closer to the folded structure with minimum energy (future reward). Reinforcement learning includes an exploration strategy such that more data are collected during the training process. It can therefore be used, for example, for molecular design without requiring a representative set of reference structures before training.
III. ML IMPROVES MODEL BUILDING, METHOD CHOICE, AND OPENS NEW MULTI-SCALE APPROACHES
The first task one faces when investigating a chemical problem in silico is to determine a suitable computational model. The modeling process involves the design of the atomistic structural model and the choice of electronic structure or molecular mechanics method. Both choices are traditionally based on achieving a balance between a sufficiently accurate description of the chemical phenomena to be studied and a limited computational effort that renders the calculations feasible. The model building stage furthermore involves a range of decisions on how to represent the system, for example, how to treat environments such as solvents, what size the simulation cell should have, or which atoms to model explicitly. All these decisions can influence the quality of the results at a fundamental level and hence need to be considered carefully. Unfortunately, the choices are often ambiguous, and different methods can yield similar results or may only work in certain combinations. The associated design choices typically require a mix of expertise and the chemical intuition of experienced practitioners. This makes it hard to see how ML could help to automate the process. Nevertheless, ML models can, e.g., learn to infer decision rules or categorize complex patterns in a purely data-driven fashion. This makes them a promising tool to provide support during the model building stage, making balanced model building choices more widely available and potentially achieving fully automated decision making in the future. As stated above, one of the central aspects of computational chemistry is the selection of a suitable reference method. Methods range from electronic structure theory (e.g. correlated wavefunction or density functional approaches) to more approximate empirical force fields. Depending on the particular approximations used, a method can be appropriate for certain chemical systems while giving unreliable predictions for others. Moreover, the applicability of a method often depends on particular details of the electronic structure of a system and how it transforms during a study. All these effects greatly complicate the method selection process, making it nearly impossible to determine a universal set of rules. One strategy towards transparent method selection protocols is uncertainty quantification.
Currently, theoretical predictions tend to be reduced to a single number, without considering the spread due to, e.g., method-specific modeling errors. Access to confidence intervals can provide several key advantages beyond determining how well a particular method is suited for a task. Trends in method predictions can be analyzed in a more general manner, going beyond the snapshots provided by traditional benchmark studies. When combined with experiment, uncertainties assigned to theoretical predictions allow for a better separation of error sources and interpretation of results. Recently, some progress has been made in tackling this problem with ML algorithms, and Bayesian approaches in particular. Bayesian error estimation has been successfully used to construct multiple density functionals. Wellendorff et al. reported a Bayesian functional with a non-local van der Waals correlation term. This so-called BEEF-vdW functional provides predictions as well as computational error estimates. They demonstrated the utility of BEEF-vdW on two surface science problems, modeling graphene adsorption on a Ni(111) surface and the binding of CO to Pt(111) and Rh(111) substrates. Bayesian frameworks for density functionals were also developed by Aldegunde, Kermode, and Zabaras and by Simm and Reiher. All these approaches allow for the construction of specialized density functionals which yield confidence intervals for computed energies. This makes it possible to automatically probe the reliability of the method for different compounds and structures and to identify problematic situations. By applying this approach to chemical reaction networks, Proppe et al. demonstrated how it can further be used to provide uncertainty estimates for chemical reaction rates. Recently, the approach by Simm and Reiher has been extended to handle long-range dispersion effects.
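The ensemble idea behind this kind of Bayesian error estimation can be summarized in a few lines. The sketch below is a toy illustration under stated assumptions, not the BEEF-vdW implementation: the parameter ensemble, the observable, and all numbers are placeholders.

```python
# Toy sketch of ensemble-based error estimation: sample an ensemble of model
# parameters around the best fit and report the spread of predictions as an
# error bar. In BEEF-vdW-like functionals, the ensemble is calibrated so that
# this spread reproduces known errors against reference data.
import numpy as np

rng = np.random.default_rng(0)
theta_best = np.array([1.0, -0.5, 0.2])                       # fitted parameters
ensemble = theta_best + 0.05 * rng.standard_normal((200, 3))  # parameter samples

def observable(theta, x):
    """Placeholder observable; stands in for a calculation with parameters theta."""
    return theta @ np.array([x, x**2, np.sin(x)])

x = 1.3                                                       # a system descriptor
values = np.array([observable(t, x) for t in ensemble])
print(f"prediction = {values.mean():.3f} +/- {values.std():.3f}")
```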
Beyond error estimates, ML has been employed to automatically construct adaptive basis sets for electronic structure methods. These basis sets are tailored to a system based only on local structural information and have been shown to significantly improve the accuracy and efficiency of subsequent electronic structure calculations. Similarly, local pseudopotentials have been constructed based on kernel ridge regression. Another important decision in method selection is whether the problem of interest exhibits strong electron correlation (also referred to as multi-reference character or static correlation). In this case, a single configuration of electrons is no longer sufficient to describe the system, and single-reference methods (e.g. Hartree-Fock or single-reference coupled cluster theory) yield unreliable predictions. Duan et al. have proposed an ML approach to automatically classify chemical systems according to their multi-reference character in an efficient manner, and thus to help avoid one potential source of error in the method selection process. In some situations, it can be advantageous to rely not on a single method, but instead to employ a combination of electronic structure theories and basis set levels. Such composite methods profit from the cancellation of errors at different levels of theory and can offer improved accuracy at lower computational cost. Zaspel et al. have leveraged ML and combination techniques to derive a composite method in a data-driven fashion. They could demonstrate that their method achieved coupled cluster accuracy using only lower levels of theory.

The model building process encompasses many other aspects apart from method selection. This includes decisions on which structural aspects of the system need to be modeled at a certain accuracy (e.g. implicit versus explicit solvation models), whether periodic boundary conditions are required, or which boundary box shapes and sizes are appropriate. Other aspects concern the electronic structure, especially in the context of multi-reference methods. Most of these approaches require decisions on which particular electronic reference configurations, often referred to as the active space, to include in the description of a system. This problem is highly nontrivial, as it depends not only on the intrinsic electronic structure of a system but also on the chemical phenomenon to be studied. As a consequence, these methods (e.g. CASSCF) have been hard to use in a black-box manner. Jeong et al. recently introduced an ML-based protocol for active space selection in bond dissociation studies. Their approach is able to predict good active spaces with a reasonable success rate and constitutes an important step towards black-box applications of multi-reference methods.

ML approaches further show great potential in the context of multi-scale modeling. Multi-scale approaches combine information from different levels of theory to bridge different physical scales. Examples include hybrid quantum mechanics/molecular mechanics (QM/MM) simulations. For example, Zhang, Shen, and Yang have shown how a simple ∆-learning-based model can improve the accuracy of solvent free energy calculations. A similar scheme has been employed by Böselt, Thürlemann, and Riniker to simulate the interactions of organic compounds in water. Gastegger, Schütt, and Müller used an ML/MM approach where an ML model completely replaced the QM region to model solvent effects on molecular spectra and reactions. Combining fragment methods with ML techniques, Chen, Fang, and Cui were able to investigate excited states in extended systems. Finally, Caccin et al. have introduced a general framework for leveraging multi-scale models using ML to simulate crack propagation through materials.

While a complete automation of the model building stage has not yet been achieved, ML-based algorithms have nevertheless led to significant progress towards this goal. Due to the complexity of the model building process, there is still a large number of untouched subjects which may serve as fruitful substrate for future ML research.

IV. ML WILL BOOST THE ACCURACY AND APPLICABILITY OF ELECTRONIC STRUCTURE THEORY
The solution to the electronic Schrödinger equation can be approximated in various ways, where a tug-of-war between accuracy and computational efficiency is crucial to any choice of method. The bottlenecks that need to be addressed to achieve more efficient electronic structure calculations are mainly:

(1) the evaluation of multi-centre and multi-electron interaction integrals, which requires optimally tuned basis representations to construct Hamiltonians and sets of secular equations, and

(2) the (iterative) solution of coupled sets of equations to predict total energies, wave functions, electron densities, and other properties derived thereof.

To overcome these bottlenecks, developments of correlated wave-function-based methods, exchange-correlation functionals within DFT, and methods based on many-body perturbation theory must go hand in hand with algorithmic advances. Progress on challenge (2) has been propelled by algorithmic ingenuity and a collective community effort to develop massively scalable linear algebra algorithms, collected in central libraries such as the Electronic Structure Library (ESL) and the Electronic Structure Interface (ELSI). It is challenge (1) where ML methods can potentially have the biggest impact in eliminating computational bottlenecks while maintaining high predictive power.

Currently, the most pervasive application of ML is to replace ab-initio electronic structure calculations with ab-initio-quality interatomic potentials. In principle, ML models can parametrize any function, such as the ground-state total energy, the forces, and other derived properties obtained from a first-principles calculation. Such ML models have already been reviewed extensively (see Table I for examples). However, the speed-up comes at the cost of losing access to the electronic structure beyond the target quantity that the ML model was trained on. Many ML representations of excited-state properties, such as HOMO-LUMO gaps, excited-state energies, or band gaps, have been proposed as well. Recently, ML models have also been applied to derive such properties based on predictions of the density of states and to obtain excitation spectra. One challenge that is frequently encountered when fitting such many-state properties is the non-smoothness of the target functions, which is particularly true when the underlying reference method is a multi-reference method.
Avoided crossings represent a good example of this behaviour: when two potential energies become degenerate and form a cusp, the respective coupling values become singular at this point in conformational space. Consequently, directly learning such properties is impractical, and a smoothing of the target property or the fitting of a spectrum is required in many cases.
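The standard two-state textbook picture, written out below for clarity, makes this concrete (the equation is added here for illustration and is not reproduced from the original text):

```latex
\begin{equation}
  \mathbf{W}(R) =
  \begin{pmatrix}
    W_{11}(R) & W_{12}(R) \\
    W_{12}(R) & W_{22}(R)
  \end{pmatrix},
  \qquad
  V_{\pm}(R) = \bar{W}(R) \pm \sqrt{\Delta W(R)^{2} + W_{12}(R)^{2}},
\end{equation}
```

with the mean energy $\bar{W} = (W_{11} + W_{22})/2$ and the splitting $\Delta W = (W_{11} - W_{22})/2$. The diabatic matrix elements $W_{ij}(R)$ are smooth functions of the nuclear coordinates $R$ and are therefore easy to learn, whereas the adiabatic surfaces $V_{\pm}(R)$ develop a cusp wherever both $W_{12} \to 0$ and $\Delta W \to 0$. This is also the motivation behind the diabatization strategies discussed in Section VI.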
The ML parametrization is limited by the unfavorable scaling associated with bottleneck (1): many highly accurate electronic structure methods are too computationally costly to generate sufficiently large training datasets for a reliable parametrization. Sometimes, better accuracy can be achieved with ∆-ML approaches. These are based on the assumption that the difference in energy between two electronic structure methods, a low-level one and a high-level one, is easier to represent than either of the two methods themselves. An alternative to the ∆-learning approach is transfer learning, where a model is trained on data from a low level of theory and retrained with fewer data points of a more accurate method. In both cases, the ML model ideally yields an accuracy that is comparable to the higher-level theory. The prediction of energies with coupled-cluster accuracy for QM data sets was shown by Smith, Isayev, and Roitberg using transfer learning and mostly range-separated semi-local DFT data. Very recently, Bogojeski et al. have demonstrated that similar accuracy can be achieved using mostly semi-local DFT reference data and only a few data points calculated with coupled-cluster theory.

Data efficiency can also be improved by designing NN architectures that implicitly satisfy symmetry constraints (i.e. rotational equivariance and permutational invariance) and, as a consequence, require far fewer data points to achieve a given accuracy. This is only one of many possible strategies to include more physical information in ML model architectures. Including the mathematical structures and the physical boundary conditions relevant to electronic structure methods in deep learning models leads to a further boost of data efficiency and model transferability. This has recently been shown in the context of an ML-based parametrization of Density Functional Tight-Binding (DFTB). Similarly, the MOB-ML approach uses localized two-electron interaction integrals from Hartree-Fock calculations as input to construct a highly accurate and transferable Gaussian Process Regression model. This has been applied to the prediction of CCSD correlation energies for a diverse range of molecular systems.
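To make the ∆-ML idea described above concrete, the sketch below trains a regressor on the difference between a cheap and an expensive method and uses it to correct new cheap calculations. Kernel ridge regression stands in for any regressor, and the descriptors and energies are synthetic placeholders.

```python
# Delta-ML sketch: learn E_high - E_low, then predict E_high = E_low + correction.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.random((100, 10))                      # molecular descriptors (placeholder)
E_low = rng.random(100)                        # e.g. semi-local DFT energies
E_high = E_low + 0.1 * np.sin(X.sum(axis=1))   # e.g. coupled-cluster energies

delta_model = KernelRidge(kernel="rbf", alpha=1e-6)
delta_model.fit(X, E_high - E_low)             # only the correction is learned

X_new = rng.random((5, 10))                    # new systems
E_low_new = rng.random(5)                      # one cheap calculation each
E_pred = E_low_new + delta_model.predict(X_new)  # high-level estimate
```

Because the correction is smoother and smaller in magnitude than the total energy, far fewer high-level reference points are needed than for learning the high-level energy directly.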
Alternatively, rather than circumventing the solution of the iterative equations of correlated wavefunction methods, ML models may also be used to facilitate faster convergence. Townsend and Vogiatzis have trained an ML model to accelerate the convergence of coupled-cluster methods based on lower-level electronic structure data. Besides being powerful tools to accelerate the computation of target properties, ML models can also be used to predict correlated total energies of molecules based on Hartree-Fock or DFT results. Examples are NeuralXC, DeePHF, and OrbNet, with NN representations based on atomic orbital features. We expect that, in future work, highly accurate electronic structure predictions will make heavy use of ML models that employ "physics-heavy" features derived from efficient low-level methods such as Hartree-Fock or MP2 theory. ML is also becoming increasingly important as an integrated element of solving quantum many-body problems. First attempts to solve non-homogeneous ordinary and partial differential equations using ML algorithms date back more than 20 years for model systems, and such approaches have recently been applied to solve the quantum many-body problem for small organic molecular systems.
These efforts have recently been summarized in a comprehensive review and perspective. While they are conceptually exciting and potentially transformative for solving the many-body problem, their integration into existing, widely accessible electronic structure software may not be fully practicable yet, as existing models are limited to small system sizes and are hardly transferable. Rather than using ML methods to learn a representation of quantum states, they can also be used to parametrize the electronic structure in an already known representation that is compatible with well-established electronic structure packages.
Such ML models are on their way to becoming an integrated element of electronic structure codes. The resulting surrogate models thereby provide not only predictions of total energies and their derivatives, but further enable the derivation of many additional properties. One such example is the SchNOrb model (SchNet for Orbitals), which is based on the deep tensor NN SchNet. SchNOrb predicts Hamiltonians and overlap matrices in a local atomic orbital representation compatible with most quantum chemistry software packages. Thus, it can be trained with data from quantum chemistry codes, and its predictions can directly enter further quantum chemical calculations, e.g. as an initial guess for the wave function in self-consistent field calculations or to perform perturbation-theory calculations of correlation energies. Beyond that, it has been shown that the model can represent interaction integrals in localized effective minimal basis representations. Such representations can be integrated into existing electronic structure software to mitigate existing bottlenecks in integral evaluation.
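Once Hamiltonian and overlap matrices are available from such a surrogate, the downstream steps are plain linear algebra. The sketch below shows the generalized eigenvalue problem a quantum chemistry code would solve with the predicted matrices; the random matrices merely stand in for a SchNOrb-like prediction, and the basis size and occupation are assumed values.

```python
# From ML-predicted H and S matrices to orbital energies: solve H C = S C eps.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n_basis, n_occ = 10, 5                     # assumed basis size and occupation

A = rng.standard_normal((n_basis, n_basis))
H = 0.5 * (A + A.T)                        # stand-in for a predicted Hamiltonian
B = rng.standard_normal((n_basis, n_basis))
S = B @ B.T + n_basis * np.eye(n_basis)    # stand-in for a predicted overlap (SPD)

eps, C = eigh(H, S)                        # orbital energies and coefficients
gap = eps[n_occ] - eps[n_occ - 1]          # e.g. a HOMO-LUMO gap, no SCF loop needed
```

In practice, the same eigenvectors can also serve as an improved initial guess for a self-consistent field cycle, as discussed above.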
Alternatively, an ML model may predict the electron density or the density functional. A recent example of a deep learning framework to predict the electron density, or properties related to the density, of a reference DFT method is DeepDFT. In order to assess the electron correlation effects of a target compound, which most often cannot be described accurately using DFT, the on-top pair density can be used to achieve better accuracy. Symmetry-adapted frameworks also improve and facilitate ML representations of the electron density. The density can also be used as input for an ML model, which has proven successful for a harmonic and random external potential model system to predict the potential, total, correlation, external, kinetic, and exchange energies.

A universal density functional provided by an ML model could potentially eliminate the need to exhaustively compare different types of functionals for a given chemical problem. So far, ML has been used to generate new DFT functionals or to adjust the energy functional, bypassing the need to solve the iterative Kohn-Sham equations and significantly accelerating simulations for the ground state and for excited states. These models further promise better transferability between different types of molecular systems. Orbital-free DFT is another effort that allows for more reliable DFT calculations, but it requires the kinetic energy density functional.
However, various approaches have been put forward to parametrize the kinetic energy density functional with different kernel-based and deep learning methods. Li et al. recently presented an approach that integrates the iterative self-consistent field algorithm into an ML model to construct a learned representation of the exchange-correlation potential for 1D model systems of H2 and H4.

The concept of ML-based Hamiltonian and density-functional surrogate models directly leads to the construction of approximate electronic structure models based on ML. Recently reported approaches include an ML-based Hückel model, parametrized Frenkel and tight-binding (TB) Hamiltonians, as well as novel semi-empirical methods with ML-tuned parameters. Beyond that, several groups have proposed to combine established DFTB Slater-Koster parametrizations with kernel ridge regression or NN representations of the repulsive energy contributions to improve the accuracy and transferability of DFTB. Electronic properties have further been obtained from self-consistent field calculations with an NN layer that represents the tight-binding Hamiltonian, with substantial error reduction for hydrocarbons.

We expect vivid development regarding the deep integration of ML within electronic structure software, an approach that some package developers already pursue (e.g. in the case of entos and DFTB+). Already in recent years, electronic structure software has started to move away from monolithic (all-in-one) designs towards more modular ones with interfaces to general-purpose standalone libraries (see Fig. 3). These developments will be helpful in the future to achieve integrated ML/QM solutions in computational workflows.

FIG. 3. Electronic structure software is increasingly becoming more modular. By moving away from monolithic (all-in-one) code models to a modular design, atomistic ML toolkits and data repositories, together with other standardized libraries, can be more deeply integrated into electronic structure workflows.

As can be seen in Fig. 3, existing atomistic ML packages such as AMP, sGDML, or SchNetPack could be interfaced with electronic structure packages that heavily expose internal routines (e.g. FHI-aims, PSI4, or PySCF) and be used alongside dynamics packages such as i-PI and SHARC, as well as algebra and electronic structure libraries such as ELSI and ESL. The structure generation, workflow, and parser tool Atomic Simulation Environment (ASE) is, for example, already interfaced with the above examples of AMP and SchNetPack. This could also involve a closer integration with existing data repositories such as NOMAD, the Materials Project, or the Quantum Machine repository.
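As an illustration of how such interfacing looks from the user side, the sketch below swaps a calculator into a standard ASE geometry optimization. ASE's built-in EMT potential is used as a stand-in so the example runs; in an ML/QM workflow, the calculator object would instead come from, e.g., SchNetPack or AMP, with the rest of the script unchanged.

```python
# ASE workflow with an interchangeable calculator: an ML potential only has
# to implement the Calculator interface to be a drop-in replacement for DFT.
from ase.build import molecule
from ase.calculators.emt import EMT   # stand-in; use an ML calculator here
from ase.optimize import BFGS

atoms = molecule("H2O")
atoms.calc = EMT()                    # e.g. a SchNetPack or AMP calculator instead

opt = BFGS(atoms)
opt.run(fmax=0.01)                    # relax until forces fall below 0.01 eV/Å
print(atoms.get_potential_energy())   # energy from the active calculator
```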
V. ML WILL IMPROVE OUR ABILITY TO EXPLORE MOLECULAR STRUCTURE AND MATERIALS COMPOSITION

A key objective of computational chemistry and materials science is the prediction of new stable structures and of viable reaction pathways to synthesize them. Beyond its significance for the discovery of new drugs and materials, finding stable equilibrium geometries and accessible transition states is a crucial element of computational molecular and materials discovery that typically involves tailored workflows.
As shown in Fig. 4, optimization problems in atomistic simulation span different scales, from searching for stable molecules across chemical space, to charting the global energy landscape spanned by the chemical coordinates of a given molecule, down to local structure relaxation and transition state search. Even without considering the computational cost of electronic structure calculations, high-dimensional structure search is uniquely challenging and can be greatly facilitated by ML methods. Efficient chemical exploration methods need to be able to identify CVs in high-dimensional spaces that are associated with relevant reaction events occurring at vastly different time scales, ranging from the femtosecond regime (electron transfer and vibrational motion) to multiple nanoseconds (configurational dynamics of biomolecules). It is therefore not surprising that the use of a variety of methods that fall under the umbrella of ML has led to a significant boost in our capability to explore chemical structure space. Even a task that is nominally as simple as finding the nearest equilibrium structure, i.e. the local minimum of the potential energy landscape, can benefit from ML approaches. The most common geometry optimization algorithms are based on quasi-Newton methods that determine trial steps from an approximate Hessian. Finding optimal initial guesses and preconditioners for the Hessian is key to minimizing the number of geometry optimization steps that are required. Recently, several more sophisticated preconditioning schemes based on Gaussian Process Regression have been proposed that significantly reduce the required number of geometry optimization steps for molecules and transition metal complexes, for correlated quantum chemistry methods that require numerical differentiation, and for bulk materials and molecules adsorbed at surfaces.
Furthermore, unsupervised ML can be used to automatically identify whether a geometry optimization has failed or has led to an irrelevant outcome, as recently shown for transition metal complexes. ML methods have also recently been used to accelerate the search for first-order saddle points or transition states. Denzel and Kästner have used Gaussian Process Regression to improve gradient-based transition state search starting from an equilibrium structure (one-ended search).
Simultaneously, several approaches have been proposed to incorporate aspects of ML into double-ended transition state search based on the Nudged Elastic Band (NEB) method, as illustrated by the sketch below.
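The sketch shows a conventional double-ended NEB setup in ASE; this is the baseline workflow that GP surrogates accelerate by replacing most of the force calls. EMT and the Cu/Au adatom system are stand-ins chosen so the example runs, and the endpoints would normally be pre-relaxed first.

```python
# Double-ended transition state search with the NEB method in ASE.
from ase.build import fcc100, add_adsorbate
from ase.calculators.emt import EMT
from ase.neb import NEB
from ase.optimize import BFGS

slab = fcc100("Cu", size=(3, 3, 3), vacuum=8.0)
add_adsorbate(slab, "Au", height=1.7, position="hollow")
initial = slab.copy()
final = slab.copy()
final.positions[-1, 0] += 2.55        # adatom shifted to a neighbouring hollow

images = [initial] + [initial.copy() for _ in range(3)] + [final]
for image in images:
    image.calc = EMT()                # one force call per image per step

band = NEB(images)
band.interpolate()                    # linear initial guess along the band
BFGS(band).run(fmax=0.05)             # converge the band; a GP surrogate
                                      # would replace most of these force calls
```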
Garrido Torres et al. have proposed a surrogate Gaussian Process Regression model to accelerate the NEB method, leading to a factor of 5 to 25 fewer energy and force evaluations compared to the conventional NEB method. One of the most challenging tasks, namely identifying the global minimum of a potential energy landscape associated with the most stable structure, can also be significantly facilitated by the use of ML. Established methods to perform global optimization are often evolutionary algorithms or stochastic methods; two examples are genetic algorithms and random structure search or basin hopping.
A prominent example of a global optimisation problem on a complex high-dimensional energy landscape is protein folding. Here, the alphaFold and alphaFold2 deep NN models have recently shown what can be achieved when ML and structure optimisation methods are combined. In alphaFold, the ML model predicts residue distances and torsional angle distributions. On this basis, a coarse-grained potential is constructed to perform a sequence of random structure search and optimization cycles. Hammer and coworkers have proposed a global structure prediction algorithm, called ASLA, based on image recognition and reinforcement learning.
The use of image recognition to identify structural characteristics removes the need for encodings such as SMILES strings or descriptors of the atomic environment. The approach is applicable to molecules as well as materials and has been showcased on graphene formation and oxide surface reconstructions.
FIG. 4. Exploration methods can target different scales of molecular and material space. At the highest level, chemical space, both chemical composition and structure are varied. Global exploration targets a single potential energy surface with constant chemical composition and explores different structural conformations and their relative stability. At the lowest level, local details of the potential energy surface, such as reaction pathways and transition states, are investigated.

Bayesian optimisation has become a common tool to achieve efficient structure prediction for crystals, surface reconstructions, and hybrid organic/inorganic interfaces, to name just a few examples. Such Bayesian approaches often outperform evolutionary algorithms in terms of efficiency.
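A minimal sketch of the underlying loop, using scikit-optimize's GP-based minimizer: a surrogate model plus an acquisition function chooses where to evaluate next, so far fewer expensive energy evaluations are needed than with random or evolutionary search. The one-dimensional toy landscape stands in for, e.g., a surface reconstruction energy along one coordinate.

```python
# Bayesian optimisation of a toy "energy landscape" with a GP surrogate.
import numpy as np
from skopt import gp_minimize

def energy(x):
    """Toy potential with several minima; stands in for an expensive DFT call."""
    return float(np.sin(3.0 * x[0]) + 0.1 * x[0] ** 2)

result = gp_minimize(energy,                    # objective to minimize
                     dimensions=[(-4.0, 4.0)],  # search interval
                     n_calls=25,                # budget of expensive evaluations
                     random_state=0)
print(result.x, result.fun)                     # best coordinate and its energy
```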
As shown in Figure 4, one level above the search for stable structures in energy landscapes lies the search for stable molecular compositions in chemical space. Generative ML models have recently shown great utility for predicting molecules with tailored properties, for example using SMILES representations or molecular graphs. While these are supervised approaches that require reference data for training, several related approaches have been proposed that use reinforcement learning.
These models can further be constrained to only predict SMILES strings that are chemically valid, as illustrated below.
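The validity filter itself is one line with RDKit: invalid SMILES fail to parse. The generated strings below are hypothetical model outputs.

```python
# Filtering generated SMILES for chemical validity with RDKit.
from rdkit import Chem

generated = ["CCO", "c1ccccc1", "C1CC"]   # hypothetical generator outputs;
                                          # "C1CC" has an unclosed ring
valid = [s for s in generated if Chem.MolFromSmiles(s) is not None]
print(valid)                              # ['CCO', 'c1ccccc1']
```

Generative models that build validity into the decoding step avoid discarding samples in this way.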
Well beyond providing stability ranking, this approach can be used to generate molecules with arbitrary target properties for use in drug and materials discovery. Unfortunately, graph-based generative models are limited in their applicability, since they cannot distinguish between different conformations that lead to the same graph. However, for applications such as protein folding, optimizing reaction environments, or finding reaction paths, it is paramount to have full access to conformational space. Mansimov et al. proposed a generative model to sample 3D conformations from SMILES. This approach suffers from the same limitations as the graph representation it is built upon when properties are directly related to the 3D structure. There have been several recent efforts to directly generate 3D molecular structures: Köhler, Klein, and Noé proposed equivariant normalizing flows, which are able to estimate a probability density over many-particle systems. This has been applied to finding meta-stable states of large Lennard-Jones systems. Gebauer, Gastegger, and Schütt introduced G-SchNet, which places atoms successively, incorporating rotational and translational symmetries. The model can be fine-tuned to generate molecules with properties in a specified target range. In a similar manner, Simm et al. have employed reinforcement learning to find stable molecules.

With ML methods affecting every aspect of our ability to explore molecular configurations and compositions, their routine application to facilitate continuous exploration across composition space is not far off; this would allow for the variation of the number and type of atoms in the system via ML-enabled alchemical optimization.
So-called alchemical potentials have long been applied to rational drug design and to changing reaction barriers. ML methods, such as NNs, have been shown to be capable of modeling alchemical potentials, as well as of producing smooth paths through alchemical space. We expect a lot of activity in this area in the future.
VI. ML ENABLES CLASSICAL AND QUANTUM DYNAMICS FOR SYSTEMS OF UNPRECEDENTED SCALE AND COMPLEXITY
The dynamical motion of atoms is a central target of a large part of computational research. In molecular simulation, we study the time evolution of electrons and atoms to predict static and dynamic equilibrium properties of molecules and materials at realistic temperature and pressure conditions, but also to understand nonequilibrium dynamics and rare events that govern chemical reactions. Dynamics methods range from classical molecular dynamics, via mixed quantum-classical methods (incorporating electronic quantum effects), to quantum dynamics in full quantum or semi-classical formalisms. In all cases, equations of motion need to be integrated over time, which involves numerous evaluations of forces and other properties that govern the dynamics. ML methods can address bottlenecks in such simulations on various levels: by facilitating property evaluations in each time step, by supporting coarse-graining and the use of larger time steps, and by directly predicting dynamical properties, expectation values, and correlation functions.

The most obvious way in which ML can facilitate MD simulations is the use of ML-based interatomic potentials instead of on-the-fly ab-initio MD. Replacing the electronic structure evaluation during dynamics with ML-based interatomic potentials is by now commonly established (see e.g. Refs. 186, 15, and 187) and has enabled simulations of unprecedented complexity and scale. For example, a recent breakthrough by Deringer et al. showed that Gaussian Approximation Potentials can be used to predict phase transitions and electronic properties of systems containing more than 100,000 atoms. Jiang, Li, and Guo have recently reviewed the transformative role that ML-based high-fidelity PESs play in gas-surface dynamics simulations.

A key factor in building ML force fields is the efficient and comprehensive sampling of relevant data points. Active learning schemes have been proposed to efficiently sample the relevant configuration space for a subsequent MD simulation. These schemes are based on an uncertainty measure evaluated during ML dynamics, which can be used to detect unexplored or undersampled conformational regions. The uncertainty measure can be, for instance, the deviation between two NNs or the statistical uncertainty estimate of the inferences made with, e.g., Gaussian Process Regression. By using gradient-domain ML models that are trained on gradients rather than energies, energy-conserving ML force fields can be obtained with high accuracy and a small amount of training data.
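A minimal sketch of such a committee-based uncertainty measure: two models trained on the same data disagree most where configuration space is undersampled, and configurations exceeding a disagreement threshold are flagged for ab-initio recomputation. The models, descriptors, and threshold below are placeholders.

```python
# Query-by-committee active learning: flag configurations where two models
# trained on the same data disagree beyond a tolerance.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 6))                   # descriptors of labelled configurations
y = np.sin(X).sum(axis=1)                  # placeholder reference energies

committee = [MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                          random_state=seed).fit(X, y)
             for seed in (0, 1)]           # two NNs, different initializations

X_md = rng.random((1000, 6))               # configurations visited during ML-MD
preds = np.stack([m.predict(X_md) for m in committee])
disagreement = preds.std(axis=0)           # uncertainty proxy per configuration

threshold = 0.05                           # assumed tolerance (model units)
to_label = X_md[disagreement > threshold]  # recompute these ab initio, retrain
```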
ML methods have enabled the simulation of rare events that occur on time scales inaccessible to conventional MD. A perspective review that recently arose from a CECAM conference on "Coarse-graining with ML in molecular dynamics" provides a comprehensive overview of ML for free-energy sampling, coarse-graining, and long-time molecular dynamics. ML methods help to identify CVs which characterize the long-time dynamics of molecular systems. This is important to identify long-lived attractor states in phase space and to find strategies to efficiently explore dynamics in complex hierarchical energy landscapes, e.g. for protein folding. ML methods in this domain include principal component analysis (PCA) and kernel PCA, diffusion maps, Markov state models, and various types of autoencoders. Diffusion map techniques enable the use of previously identified CVs to perform dynamics. Several ML models have been developed that aim to achieve bottom-up coarse-graining by representing the potential of mean force or free energy surface as a function of coarse-grained variables. This has been done, for instance, using NNs to infer conformational free energies for oligomers, or to construct a coarse-grained liquid water potential or a Gaussian-approximation-based coarse-grained potential for alanine dipeptide and molecular liquids.
Mixed quantum-classical dynamics (MQCD) simulations, i.e. classical dynamics of nuclei coupled to the time-dependent quantum mechanical evolution of electrons, are commonly used to simulate the light-induced nonadiabatic dynamics of molecules, as well as coupled electron-nuclear dynamics in extended systems.
While on-the-fly MQCD simulations have become feasible in the last decade, the accessible time scales and the number of non-equilibrium trajectories that can feasibly be simulated on-the-fly are too limited to enable comprehensive statistical analysis and ensemble averaging.
ML shows great promise in nonadiabatic excited-state simulations, as documented by recent works using NNs to construct excited-state energy landscapes and to perform fewest-switches surface hopping MD at longer time scales or with more comprehensive ensemble averaging.
Similar progress has been achieved in nonadiabatic dynamics at metal surfaces, where NNs have been used to construct excited-state landscapes and continuous representations of the electronic friction tensor used in molecular dynamics with electronic friction simulations. It is evident that ML methods will play an important role in extending the range of applications of MQCD methods in the coming years. Even full quantum dynamics simulations have recently seen an increasing uptake of ML methodology to push beyond longstanding limitations in the dimensionality of the systems that can be simulated. The main bottleneck in quantum dynamics simulations is not the evaluation of the temporal evolution of the electrons, but the temporal evolution of the nuclear wavefunction, which involves computations that (formally) scale exponentially with the number of atoms in the system. Potential energy landscapes in quantum dynamics are typically represented in a diabatic basis rather than the adiabatic representation (directly output by electronic structure codes), in a process called (quasi-)diabatization.
However, quasi-diabatization requires expert knowledge and is highly complex for more than two coupled electronic states. The construction of diabatic representations with deep NNs has recently shown potential to simplify and automate this laborious task.
Besides the PES generation itself, recent works use Gaussian Process Regression to fit the diabatic PESs in reduced dimensions.
One of the largest ML-enhanced quantum dynamics simulations was performed on a 14-dimensional energy landscape for a mycosporine-like amino acid. ML-based interatomic potentials and continuous regression models already play an important role across all domains of MD simulation, and this role will only grow in the coming years. In addition, ML methods are beneficial for assessing the validity of different approximations in dynamical simulations, as recently shown by Jasinski et al. with a Bayesian model to estimate errors
due to different approximations in quantum scattering simulations. Going forward, complex dynamical simulation methods will become more accessible to non-expert users with the help of ML and will open avenues to tackle complex systems in solvent environments or dynamics at hybrid organic-inorganic interfaces. A recent work by Brieuc et al., employing ML methods to achieve converged path-integral MD simulations of reactive molecules in superfluid helium under cryogenic conditions, is an exemplary showcase of what the synergy of ML and quantum dynamics methods can achieve.

FIG. 5. Depiction of how ML methods can act as a bridge between theory and experiment. ML models trained on theory predict spectra with realistic lineshapes. At the same time, ML models can be used to infer structural information from experimental measurements.

VII. ML HELPS TO CONNECT THEORY AND EXPERIMENT
The ultimate goal of computational molecular and materials simulation is to connect theory and experiment. This can mean supporting the explanation of experimental outcomes or finding new theoretical rules in observations, in both cases leading to a deeper understanding of the physical world and its laws. Forming this connection is a hard task. A plethora of different effects needs to be considered in even the simplest atomistic systems, making it very difficult to faithfully reproduce experimental conditions in silico. On the other hand, experimental observations can be obscured by a variety of influences or by the sheer complexity of the measured signal. The search for new insights then becomes the figurative search for a needle in a haystack. As we have seen in the preceding sections,
ML approaches can increase the accuracy of predictions and the speed with which they can be obtained.
This makes it possible to carry out computational studies which close the gap between theory and experiment by more efficiently incorporating experimental parameters such as finite temperature, measurement conditions, and solvent effects. Moreover, ML techniques can also provide invaluable support in extracting information from experimental observations, uncovering trends that are not directly apparent to the practitioner.

One field which has greatly profited from these developments is computational spectroscopy. The prediction of spectroscopic properties is a central aspect of computational modeling, as it provides results which can be directly compared against experiment. Examples of successful ML applications include the prediction of different vibrational spectra, combined with different response properties to the electric field. Gastegger, Behler, and Marquetand have combined a latent charge dipole model with interatomic potentials in order to efficiently predict infrared (IR) spectra of organic molecules in the gas phase. This approach has further been applied to model absorption spectra.
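In such workflows, the ML model supplies the dipole moment along an MD trajectory, and the IR spectrum then follows from the Fourier transform of the dipole autocorrelation function. The sketch below shows that post-processing step on a synthetic dipole time series; the time step, trajectory length, and windowing are illustrative choices, not parameters from the cited works.

```python
# IR spectrum from the Fourier transform of the dipole (derivative)
# autocorrelation function of an MD trajectory.
import numpy as np

dt = 0.5e-15                              # MD time step in seconds (assumed)
n = 4096
t = np.arange(n) * dt
# Synthetic dipole time series (n, 3); an ML dipole model would supply this.
mu = np.cos(2.0 * np.pi * 1.0e14 * t)[:, None] * np.ones(3)

dmu = np.gradient(mu, dt, axis=0)         # dipole time derivative
acf = sum(np.correlate(dmu[:, i], dmu[:, i], mode="full")[n - 1:]
          for i in range(3))              # autocorrelation, positive lags only

window = np.hanning(2 * n)[n:]            # decaying window to suppress noise
spectrum = np.abs(np.fft.rfft(acf * window))   # absorption intensity
freq = np.fft.rfftfreq(n, d=dt)           # frequency axis in Hz
```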
Raimbault et al. introduced a kernel approach for predicting the Raman spectra of organic crystals based on molecular polarizabilities. Using an NN-based approach, Sommers et al. have demonstrated that ML can also be used to simulate accurate Raman spectra of extended systems such as liquid water. In addition to vibrational spectra, ML models are also capable of modeling response properties, allowing the simulation of electronic excitations using, e.g., mixed quantum-classical approaches (see Section VI). For example, Zhang et al. use NN models to obtain transition dipole moments, which in turn can be used to predict UV and visible light spectra. ML approaches have further been used to predict nuclear magnetic resonance (NMR) spectra from molecular simulations. Paruzzo et al., for example, have used the kernel model from Ref. 237 to predict the chemical shifts in molecular solids. Recently, Christensen et al. have introduced an electric-field-dependent descriptor in the FCHL kernel framework. Based on this, they have derived molecular dipole moments as a general response to the electric field, which can be used to simulate IR spectra of small organic molecules. Gastegger, Schütt, and Müller have applied a response theory approach in combination with a deep NN architecture which explicitly depends on electric and magnetic fields. They could show that, in this manner, a single ML model can predict IR, Raman, and NMR spectra. Moreover, by introducing the field generated by a molecular environment, they were able to model the effect of solvents on the resulting spectra.

Beyond that, ML offers the possibility to directly extract information from experimental observations and relate it to fundamental chemical concepts. Modern ML techniques excel at uncovering trends hidden behind other signals or complex patterns, a situation frequently encountered when searching for interesting new phenomena in experimental measurements. This utility extends to routine experimental tasks, where ML has shown promise for automation. One example is the use of ML to interpret different types of spectroscopic measurements to determine structural or electronic properties of molecules and materials. Fine et al. have recently presented an ML approach to extract data on functional groups from infrared and mass spectrometry data, while Kiyohara et al. have successfully applied an ML scheme to obtain chemical, elemental, and geometric information from the X-ray spectra of materials. Another application where ML shows promise is the automated interpretation of nuclear magnetic resonance spectra with respect to atomic structure, which typically relies heavily on experience.
However, ML can also be used to leverage experimental results in exciting new ways. The majority of chemical knowledge is collected in the form of publications. ML approaches such as natural language processing and image recognition offer the possibility to directly distill functional relationships and chemical insights from the massive body of scientific literature. For instance, Tshitoyan et al. have used natural language processing to extract complex materials science concepts, such as structure-property relationships, from a large collection of research literature. They could further demonstrate that their model was able to generalize from the learned concepts and recommend materials for different functional applications. Raccuglia et al. recently trained an ML model using information on failed experiments extracted from archived laboratory notebooks to predict the reaction success for the crystallization of templated vanadium selenites. Their model was able to learn general reaction conditions and even revealed new hypotheses regarding the conditions for successful product formation.

Finally, ML offers new ways in which theory can guide experiment. Two fields where ML has played a transformative role are molecular/materials discovery and computational high-throughput screening, with several reviews summarizing recent advances.
The combination of high-throughput screening with accurate and efficient ML models has proven highly valuable, as it allows most of the required electronic structure calculations to be substituted. Examples of what is possible in this space include the objective-free exploration of light-absorbing molecules, drug design, the computational search for highly active transition metal complexes that catalyse C-C cross-coupling reactions, and the discovery of new perovskite materials or polymers for organic photovoltaic applications.
Still, chemical space is estimated to cover more than 10^60 molecules; hence, exhaustive computational screening remains infeasible, even with fast and accurate ML models. In this context, ML-enabled inverse design offers a promising alternative by reversing the usual paradigm of obtaining properties from structure. Instead, the aim is to create structures exhibiting a range of desired properties. Since such ML models readily provide analytic gradients, an application to property-based structure optimization is straightforward. First steps in applying ML in these areas have recently been taken. Examples include the optimization of the HOMO-LUMO gap, as demonstrated by Schütt et al., and relaxation for crystal structure prediction, as investigated by Podryabinkin et al. While ML only provides gradient-based local optimization in these examples, it can be combined with genetic algorithms or with global optimization methods such as simulated annealing or minima hopping.

VIII. OUTLOOK
In the last ten years, ML methods or, more specifically, computational methods for high-dimensional nonlinear parametrization and pattern recognition, have become pervasive in the field of computational molecular science, facilitating and enabling a diverse range of tasks. The vast number of applications reported to date shows great benefits and contributes positively to the computational method portfolio. In some instances, this pervasive use of ML methodology might have led to cases where similar or better outcomes could have been achieved with conventional, established approaches. Even in such instances, establishing this knowledge is an important part of the enthusiastic exploratory phase of ML application that we are witnessing in the field. The key question from the perspective of the practitioners in computational molecular and materials science, i.e. the PhD students, industrial research scientists, and academics that apply electronic structure and molecular simulation software on a daily basis, is to what extent this will translate into tools and software that improve existing workflows and enable new ones.

We expect that ML methods will soon become an integral part of electronic structure and molecular simulation software, pushing the boundaries of existing techniques towards more accurate and computationally efficient simulations. ML methods may, for example, replace complex integral evaluations in the construction of Hamiltonians and secular equations, or they can provide improved initial guesses for iteratively solving integro-differential equations. ML methods can further help to describe non-local effects in time and space and provide mechanisms for on-the-fly uncertainty quantification and accuracy improvements. The beneficial scaling properties of ML algorithms with respect to the size of atomistic systems will play an important role in extending the range of application of existing electronic structure and dynamics simulation methods. The application of ML to mixed quantum-classical simulations will enable currently unfeasible time and length scales. As we explore systems of increasing size, we will be able to better study the boundary between quantum effects at the nanoscale and collective many-body effects and fluctuations at the meso- and macroscale.
A necessary requirement is the establishment and distribution of user-friendly and well-maintained simulation software with deep integration of ML methodology in chemistry and materials science. Software solutions will need to be modular to allow interfacing with well-established deep learning platforms such as TensorFlow or PyTorch (a minimal example of such an interface is sketched below). This should involve the establishment of common data standards to easily communicate atomistic simulation and electronic structure data between chemistry and ML packages. In many ways, this requirement is in line with recent trends towards increased modularity of codes via general libraries such as ESL and ELSI (see Fig. 3). An exciting initiative towards a deeper integration of ML is the ENTOS quantum chemistry package and ENTOS AI.

Another challenge ahead is related to establishing a culture of openness and willingness to share data and ML models, as the availability of training data is a crucial aspect of driving advances in this field. Well-defined materials data standards, as put forward by the FAIR Data Infrastructure project (FAIR-DI), and ab initio data repositories, such as the NOMAD repository and the Materials Project, are needed. The need for open access to vast amounts of data will have to be balanced against other needs, such as commercial interests that arise from industrial research or commercial software projects.

Sustainable integration of ML methods into widely used software will require long-term community effort and might be less glamorous than exciting proof-of-principle applications of ML in chemistry and materials science. Research funding agencies, reviewers, and industrial stakeholders need to acknowledge this and ensure that sustained funding for such efforts is put in place. If achieved, a deeper integration of ML methodology into electronic structure and molecular simulation software will induce lasting change in workflows and capabilities for computational molecular scientists. Furthermore, it will offer the opportunity to reconsider many of the underpinning design choices of electronic structure and molecular simulation software packages, which, in many cases, historically arose from considerations of computational efficiency. For example, Gaussian basis representations were chosen decades ago in quantum chemistry because of the ease of evaluating multi-centre integrals. If ML methods can vastly facilitate the evaluation of multi-centre integrals, are Gaussian basis functions still the best choice of basis representation?

Deeper integration of ML and molecular simulation will drastically widen participation in the field and uptake of our methods and problem-solving approaches. If codes require dramatically fewer computing resources and offer the ability to directly predict experimentally accessible quantities, computational simulation will become more appealing as a complementary tool in synthetic and analytical labs. In many industrial applications, cost-benefit analysis requires a clear correspondence between the cost of delivering predictions and the accuracy and precision that an application demands. A deeper integration of ML methods will hopefully also provide a drive towards establishing better measures of uncertainty in atomistic simulation.
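To make the modular interfacing mentioned above concrete, the sketch below (our own illustration; MLCalculator and the wrapped model are hypothetical) exposes an ML potential through the calculator interface of the Atomic Simulation Environment (ASE), so that any ASE-based optimizer or dynamics driver can consume ML predictions in place of an electronic structure code:

from ase.calculators.calculator import Calculator, all_changes

class MLCalculator(Calculator):
    """Minimal ASE calculator wrapping a (hypothetical) trained ML potential."""

    implemented_properties = ["energy", "forces"]

    def __init__(self, model, **kwargs):
        super().__init__(**kwargs)
        self.model = model  # callable returning (energy, forces)

    def calculate(self, atoms=None, properties=("energy",),
                  system_changes=all_changes):
        super().calculate(atoms, properties, system_changes)
        energy, forces = self.model(self.atoms.positions, self.atoms.numbers)
        self.results = {"energy": energy, "forces": forces}

A geometry optimization or molecular dynamics script written against this interface runs unchanged whether the energies and forces come from DFT or from an ML surrogate.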
Finally, the method portfolio and skill set of computational molecular scientists will need to adapt as a consequence of the growing importance of ML methods in electronic structure theory and molecular simulation. In many cases, the presence of some aspects of ML "under the hood" of existing methods and workflows will not change how we apply these methods. For example, a DFT functional parametrized by an ML approach can be applied like any existing functional (although its range of applicability might be very different). In other cases, the presence of ML methods will fundamentally change basic workflows, as we have discussed across the sections of this perspective. In those instances, practitioners need a basic understanding of ML concepts and of the different models that they are working with. This involves knowledge of the capabilities and limitations of the most standard applications in order to avoid pitfalls. As such, ML methodology will have to become an integral part of education in computational chemistry and materials science.

ACKNOWLEDGMENTS
This work was funded by the Austrian Science Fund (FWF) [J 4522-N] (J.W.), the Federal Ministry of Education and Research (BMBF) for the Berlin Center for Machine Learning / BIFOLD (01IS18037A) (K.T.S.), and the UKRI Future Leaders Fellowship programme (MR/S016023/1) (R.J.M.). M.G. works at the BASLEARN – TU Berlin/BASF Joint Lab for Machine Learning, co-financed by TU Berlin and BASF SE.

M. J. T. Oliveira et al., "The CECAM electronic structure library and the modular software development paradigm," J. Chem. Phys., 024117 (2020).
V. W.-Z. Yu et al., "ELSI - An open infrastructure for electronic structure solvers," Comput. Phys. Commun., 107459 (2020).
J. Behler and M. Parrinello, "Generalized neural-network representation of high-dimensional potential-energy surfaces," Phys. Rev. Lett., 146401 (2007).
C. Carbogno, J. Behler, A. Groß, and K. Reuter, "Fingerprints for Spin-Selection Rules in the Interaction Dynamics of O2 at Al(111)," Phys. Rev. Lett., 096104 (2008).
R. Dawes, D. L. Thompson, A. F. Wagner, and M. Minkoff, "Interpolating moving least-squares methods for fitting potential energy surfaces: A strategy for efficient automatic data point placement in high dimensions," J. Chem. Phys., 084107 (2008).
S. Manzhos and T. Carrington, "An improved neural network method for solving the Schrödinger equation," Can. J. Chem., 864–871 (2009).
A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, "Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons," Phys. Rev. Lett., 136403 (2010).
M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Fast and accurate modeling of molecular atomization energies with machine learning," Phys. Rev. Lett., 058301 (2012).
J. G. Freeze, H. R. Kelly, and V. S. Batista, "Search for catalysts by inverse design: artificial intelligence, mountain climbers, and alchemists," Chem. Rev., 6595–6612 (2019).
D. C. Elton, Z. Boukouvalas, M. D. Fuge, and P. W. Chung, "Deep Learning for Molecular Design – A Review of the State of the Art," Mol. Syst. Des. Eng., 828–849 (2019).
J. Behler, "First Principles Neural Network Potentials for Reactive Simulations of Large Molecular and Condensed Systems," Angew. Chem. Int. Ed., 12828–12840 (2017).
T. Mueller, A. Hernandez, and C. Wang, "Machine learning for interatomic potential models," J. Chem. Phys., 50902 (2020).
S. Manzhos and T. Carrington, "Neural Network Potential Energy Surfaces for Small Molecules and Reactions," Chem. Rev., in press, doi:10.1021/acs.chemrev.0c00665 (2020).
P. Gkeka et al., "Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems," J. Chem. Theory Comput., 4757–4775 (2020).
O. T. Unke, S. Chmiela, H. E. Sauceda, M. Gastegger, I. Poltavsky, K. T. Schütt, A. Tkatchenko, and K.-R. Müller, "Machine learning force fields," arXiv:2010.07067 (2020).
J. Westermayr and P. Marquetand, "Machine learning for electronically excited states of molecules," Chem. Rev., in press, doi:10.1021/acs.chemrev.0c00749 (2020).
J. Westermayr and P. Marquetand, "Machine learning and excited-state molecular dynamics," Mach. Learn.: Sci. Technol., 043001 (2020).
P. O. Dral, "Quantum Chemistry in the Age of Machine Learning," J. Phys. Chem. Lett., 2336–2347 (2020).
O. A. von Lilienfeld, K.-R. Müller, and A. Tkatchenko, "Exploring chemical compound space with quantum-based machine learning," Nat. Rev. Chem., 347–358 (2020).
S. Manzhos, "Machine learning for the solution of the Schrödinger equation," Mach. Learn.: Sci. Technol., 013002 (2020).
M. Ceriotti, "Unsupervised machine learning in atomistic simulations, between predictions and understanding," J. Chem. Phys., 150901 (2019).
G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, "Machine learning and the physical sciences," Rev. Mod. Phys., 045002 (2019).
F. Noé, A. Tkatchenko, K.-R. Müller, and C. Clementi, "Machine learning for molecular simulation," Annu. Rev. Phys. Chem., 361–390 (2020).
J. A. Keith, V. Vassilev-Galindo, B. Cheng, S. Chmiela, M. Gastegger, K.-R. Müller, and A. Tkatchenko, "Combining machine learning and computational chemistry for predictive insights into chemical systems," arXiv:2102.06321 (2021).
G. R. Schleder, A. C. M. Padilha, C. M. Acosta, M. Costa, and A. Fazzio, "From DFT to machine learning: recent approaches to materials science – a review," J. Phys. Mater., 032001 (2019).
B. R. Goldsmith, J. Esterhuizen, J. X. Liu, C. J. Bartel, and C. Sutton, "Machine learning for heterogeneous catalyst design and discovery," AIChE J., 2311–2323 (2018).
X. Yang, Y. Wang, R. Byrne, G. Schneider, and S. Yang, "Concepts of artificial intelligence for computer-assisted drug discovery," Chem. Rev., 10520–10594 (2019).
T. Toyao, Z. Maeno, S. Takakusagi, T. Kamachi, I. Takigawa, and K. I. Shimizu, "Machine Learning for Catalysis Informatics: Recent Applications and Prospects," ACS Catal., 2260–2297 (2020).
O. A. von Lilienfeld and K. Burke, "Retrospective on a decade of machine learning for chemical discovery," Nat. Commun., 4895 (2020).
O. A. von Lilienfeld, "Quantum Machine Learning in Chemical Compound Space," Angew. Chem. Int. Ed., 4164–4169 (2018).
K. T. Schütt, S. Chmiela, O. A. von Lilienfeld, A. Tkatchenko, K. Tsuda, and K.-R. Müller, eds., Machine Learning Meets Quantum Physics (Springer, 2020).
A. D. Becke, "Perspective: Fifty years of density-functional theory in chemical physics," J. Chem. Phys., 18A301 (2014).
M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Netw., 861–867 (1993).
K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, and K.-R. Müller, "Assessment and validation of machine learning methods for predicting molecular atomization energies," J. Chem. Theory Comput., 3404–3419 (2013).
D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proceedings of the 2nd International Conference on Learning Representations (2014).
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Advances in Neural Information Processing Systems, 2672–2680 (2014).
Z. Shamsi, K. J. Cheng, and D. Shukla, "Reinforcement learning based adaptive sampling: Reaping rewards by exploring protein conformational landscapes," J. Phys. Chem. B, 8386–8395 (2018).
B. Ruscic, "Uncertainty quantification in thermochemistry, benchmarking electronic structure computations, and active thermochemical tables," Int. J. Quantum Chem., 1097–1101 (2014).
A. Chernatynskiy, S. R. Phillpot, and R. LeSar, "Uncertainty quantification in multiscale simulation of materials: A prospective," Annu. Rev. Mater. Res., 157–182 (2013).
J. Wellendorff, K. T. Lundgaard, A. Møgelhøj, V. Petzold, D. D. Landis, J. K. Nørskov, T. Bligaard, and K. W. Jacobsen, "Density functionals for surface science: Exchange-correlation model development with Bayesian error estimation," Phys. Rev. B, 235149 (2012).
M. Aldegunde, J. R. Kermode, and N. Zabaras, "Development of an exchange–correlation functional with uncertainty quantification capabilities for density functional theory," J. Comput. Phys., 173–195 (2016).
G. N. Simm and M. Reiher, "Systematic error estimation for chemical reaction energies," J. Chem. Theory Comput., 2762–2773 (2016).
J. Proppe, T. Husch, G. N. Simm, and M. Reiher, "Uncertainty quantification for quantum chemical models of complex reaction networks," Faraday Discuss., 497–520 (2017).
J. Proppe, S. Gugler, and M. Reiher, "Gaussian process-based refinement of dispersion corrections," J. Chem. Theory Comput., 6046–6060 (2019).
O. Schütt and J. VandeVondele, "Machine learning adaptive basis sets for efficient large scale density functional theory simulation," J. Chem. Theory Comput., 4168–4175 (2018).
J. Lüder and S. Manzhos, "Nonparametric local pseudopotentials with machine learning: A tin pseudopotential built using Gaussian process regression," J. Phys. Chem. A, 11111–11124 (2020).
C. Duan, F. Liu, A. Nandy, and H. J. Kulik, "Semi-supervised Machine Learning Enables the Robust Detection of Multireference Character at Low Cost," J. Phys. Chem. Lett., 6640–6648 (2020).
P. Zaspel, B. Huang, H. Harbrecht, and O. A. von Lilienfeld, "Boosting quantum machine learning models with a multilevel combination technique: Pople diagrams revisited," J. Chem. Theory Comput., 1546–1559 (2018).
W. Jeong, S. J. Stoneburner, D. King, R. Li, A. Walker, R. Lindh, and L. Gagliardi, "Automation of active space selection for multireference methods via machine learning on chemical bond dissociation," J. Chem. Theory Comput., 2389–2399 (2020).
Y.-J. Zhang, A. Khorshidi, G. Kastlunger, and A. A. Peterson, "The potential for machine learning in hybrid QM/MM calculations," J. Chem. Phys., 241740 (2018).
P. Zhang, L. Shen, and W. Yang, "Solvation free energy calculations with quantum mechanics/molecular mechanics and machine learning models," J. Phys. Chem. B, 901–908 (2018).
L. Böselt, M. Thürlemann, and S. Riniker, "Machine learning in QM/MM molecular dynamics simulations of condensed-phase systems," arXiv:2010.11610 (2020).
M. Gastegger, K. T. Schütt, and K.-R. Müller, "Machine learning of solvent effects on molecular spectra and reactions," arXiv:2010.14942 (2020).
W.-K. Chen, W.-H. Fang, and G. Cui, "Integrating machine learning with the multilayer energy-based fragment method for excited states of large systems," J. Phys. Chem. Lett., 7836–7841 (2019).
M. Caccin, Z. Li, J. R. Kermode, and A. De Vita, "A framework for machine-learning-augmented multiscale atomistic simulations on parallel supercomputers," Int. J. Quantum Chem., 1129–1139 (2015).
W. Pronobis, K. T. Schütt, A. Tkatchenko, and K.-R. Müller, "Capturing intensive and extensive DFT/TDDFT molecular properties with machine learning," Eur. Phys. J. B, 178 (2018).
K. Ghosh, A. Stuke, M. Todorović, P. B. Jørgensen, M. N. Schmidt, A. Vehtari, and P. Rinke, "Deep learning spectroscopy: Neural networks for molecular excitation spectra," Adv. Sci., 1801367 (2019).
K. T. Schütt, M. Gastegger, A. Tkatchenko, K.-R. Müller, and R. J. Maurer, "Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions," Nat. Commun., 5024 (2019).
R. Ramakrishnan, M. Hartmann, E. Tapavicza, and O. A. von Lilienfeld, "Electronic spectra from TDDFT and machine learning in chemical space," J. Chem. Phys., 084111 (2015).
J. Westermayr and P. Marquetand, "Deep learning for UV absorption spectra with SchNarc: First steps toward transferability in chemical compound space," J. Chem. Phys., 154112 (2020).
K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K.-R. Müller, and E. K. Gross, "How to represent crystal structures for machine learning: Towards fast prediction of electronic properties," Phys. Rev. B, 205118 (2014).
Y. Zhuo, A. Mansouri Tehrani, and J. Brgoch, "Predicting the band gaps of inorganic solids by machine learning," J. Phys. Chem. Lett., 1668–1673 (2018).
J. Lee, A. Seko, K. Shitara, K. Nakayama, and I. Tanaka, "Prediction model of band gap for inorganic compounds by combination of density functional theory calculations and machine learning techniques," Phys. Rev. B, 115104 (2016).
G. Pilania, J. Gubernatis, and T. Lookman, "Multi-fidelity machine learning models for accurate bandgap predictions of solids," Comput. Mater. Sci., 156–163 (2017).
C. B. Mahmoud, A. Anelli, G. Csányi, and M. Ceriotti, "Learning the electronic density of states in condensed matter," Phys. Rev. B, 235130 (2020).
J. Westermayr, M. Gastegger, M. F. S. J. Menger, S. Mai, L. González, and P. Marquetand, "Machine Learning Enables Long Time Scale Molecular Photodynamics Simulations," Chem. Sci., 8100–8107 (2019).
Y. Zhang, C. Hu, and B. Jiang, "Embedded atom neural network potentials: efficient and accurate machine learning with a physically inspired representation," J. Phys. Chem. Lett., 4962–4967 (2019).
J. Westermayr, M. Gastegger, and P. Marquetand, "Combining SchNet and SHARC: The SchNarc Machine Learning Approach for Excited-State Dynamics," J. Phys. Chem. Lett., 3828–3834 (2020).
P. O. Dral, A. Owens, A. Dral, and G. Csányi, "Hierarchical machine learning of potential energy surfaces," J. Chem. Phys., 204110 (2020).
M. Bogojeski, L. Vogt-Maranto, M. Tuckerman, K.-R. Müller, and K. Burke, "Quantum chemical accuracy from density functional approximations via machine learning," Nat. Commun., 5223 (2020).
S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., 1345–1359 (2010).
J. S. Smith, O. Isayev, and A. E. Roitberg, "ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost," Chem. Sci., 3192–3203 (2017).
S. Batzner, T. E. Smidt, L. Sun, J. P. Mailoa, M. Kornbluth, N. Molinari, and B. Kozinsky, "SE(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials," arXiv:2101.03164 (2021).
K. T. Schütt, O. T. Unke, and M. Gastegger, "Equivariant message passing for the prediction of tensorial properties and molecular spectra," arXiv:2102.03150 (2021).
H. Li, C. Collins, M. Tanha, G. J. Gordon, and D. J. Yaron, "A Density Functional Tight Binding Layer for Deep Learning of Chemical Hamiltonians," J. Chem. Theory Comput., 5764–5776 (2018).
M. Welborn, L. Cheng, and T. F. Miller, "Transferability in Machine Learning for Electronic Structure via the Molecular Orbital Basis," J. Chem. Theory Comput., 4772–4779 (2018).
L. Cheng, M. Welborn, A. S. Christensen, and T. F. Miller, "A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules," J. Chem. Phys., 131103 (2019).
T. Husch, J. Sun, L. Cheng, S. J. R. Lee, and T. F. Miller III, "Improved accuracy and transferability of molecular-orbital-based machine learning: Organics, transition-metal complexes, non-covalent interactions, and transition states," arXiv:2010.03626 (2020).
J. Townsend and K. D. Vogiatzis, "Data-driven acceleration of the coupled-cluster singles and doubles iterative solver," J. Phys. Chem. Lett., 4129–4135 (2019).
S. Dick and M. Fernandez-Serra, "Machine learning accurate exchange and correlation functionals of the electronic density," Nat. Commun., 3509 (2020).
Y. Chen, L. Zhang, H. Wang, and W. E, "Ground State Energy Functional with Hartree-Fock Efficiency and Chemical Accuracy," J. Phys. Chem. A, 7155–7165 (2020).
Z. Qiao, M. Welborn, A. Anandkumar, F. R. Manby, and T. F. Miller, "OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features," J. Chem. Phys., 124111 (2020).
I. Lagaris, A. Likas, and D. Fotiadis, "Artificial neural network methods in quantum mechanics," Comput. Phys. Commun., 1–14 (1997).
I. E. Lagaris, A. Likas, and D. G. Papageorgiou, "Neural network methods for boundary value problems defined in arbitrarily shaped domains," CoRR cs.NE/9812003 (1998).
M. Sugawara, "Numerical solution of the Schrödinger equation by neural network and genetic algorithm," Comput. Phys. Commun., 366–380 (2001).
G. Carleo and M. Troyer, "Solving the quantum many-body problem with artificial neural networks," Science, 602–606 (2017).
H. Saito, "Solving the Bose–Hubbard model with machine learning," J. Phys. Soc. Jpn., 093001 (2017).
Y. Nomura, A. S. Darmawan, Y. Yamaji, and M. Imada, "Restricted Boltzmann machine learning for solving strongly correlated quantum systems," Phys. Rev. B, 205152 (2017).
J. Han, L. Zhang, and W. E, "Solving Many-Electron Schrödinger Equation using Deep Neural Networks," J. Comput. Phys., 108929 (2019).
D. Pfau, J. S. Spencer, A. G. D. G. Matthews, and W. M. C. Foulkes, "Ab initio solution of the many-electron Schrödinger equation with deep neural networks," Phys. Rev. Res., 033429 (2020).
J. Hermann, Z. Schätzle, and F. Noé, "Deep-neural-network solution of the electronic Schrödinger equation," Nat. Chem., 891–897 (2020).
K. Choo, G. Carleo, N. Regnault, and T. Neupert, "Symmetries and many-body excitations with neural-network quantum states," Phys. Rev. Lett., 167204 (2018).
F. Zheng, X. Gao, and A. Eisfeld, "Excitonic wave function reconstruction from near-field spectra using machine learning techniques," Phys. Rev. Lett., 163202 (2019).
K. T. Schütt, H. E. Sauceda, P. J. Kindermans, A. Tkatchenko, and K.-R. Müller, "SchNet - A deep learning architecture for molecules and materials," J. Chem. Phys., 241722 (2018).
K. T. Schütt, P. Kessel, M. Gastegger, K. A. Nicoli, A. Tkatchenko, and K.-R. Müller, "SchNetPack: A deep learning toolbox for atomistic systems," J. Chem. Theory Comput., 448–455 (2019).
M. Gastegger, A. McSloy, M. Luya, K. T. Schütt, and R. J. Maurer, "A deep neural network for molecular wave functions in quasi-atomic minimal basis representation," J. Chem. Phys., 044123 (2020).
L. Li, S. Hoyer, R. Pederson, R. Sun, E. D. Cubuk, P. Riley, and K. Burke, "Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics," Phys. Rev. Lett., 036401 (2021).
P. B. Jørgensen and A. Bhowmik, "DeepDFT: Neural message passing network for accurate charge density prediction," arXiv:2011.03346 (2020).
A. Fabrizio, K. R. Briling, D. D. Girardier, and C. Corminboeuf, "Learning on-top: Regressing the on-top pair density for real-space visualization of electron correlation," J. Chem. Phys., 204111 (2020).
A. Grisafi, A. Fabrizio, B. Meyer, D. M. Wilkins, C. Corminboeuf, and M. Ceriotti, "Transferable machine-learning model of the electron density," ACS Cent. Sci., 57–64 (2019).
A. Fabrizio, A. Grisafi, B. Meyer, M. Ceriotti, and C. Corminboeuf, "Electron density learning of non-covalent systems," Chem. Sci., 9424–9432 (2019).
A. Fabrizio, K. Briling, A. Grisafi, and C. Corminboeuf, "Learning (from) the electron density: Transferability, conformational and chemical diversity," CHIMIA Int. J. Chem., 232–236 (2020).
K. Ryczko, D. A. Strubbe, and I. Tamblyn, "Deep learning and density-functional theory," Phys. Rev. A, 022512 (2019).
J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, "Finding density functionals with machine learning," Phys. Rev. Lett., 253002 (2012).
F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller, "Bypassing the Kohn-Sham equations with machine learning," Nat. Commun., 872 (2017).
M. Babaei, Y. T. Azar, and A. Sadeghi, "Locality meets machine learning: Excited and ground-state energy surfaces of large systems at the cost of small ones," Phys. Rev. B, 115132 (2020).
J. Schmidt, C. L. Benavides-Riveros, and M. A. L. Marques, "Machine learning the physical nonlocal exchange–correlation functional of density-functional theory," J. Phys. Chem. Lett., 6425–6431 (2019).
J. Nelson, R. Tiwari, and S. Sanvito, "Machine learning density functional theory for the Hubbard model," Phys. Rev. B, 075132 (2019).
X. Lei and A. J. Medford, "Design and analysis of machine learning exchange-correlation functionals via rotationally invariant convolutional descriptors," Phys. Rev. Mater., 063801 (2019).
Y. Suzuki, R. Nagai, and J. Haruyama, "Machine learning exchange-correlation potential in time-dependent density-functional theory," Phys. Rev. A, 050501 (2020).
V. L. Lignères and E. A. Carter, "An introduction to orbital-free density functional theory," in Handbook of Materials Modeling: Methods, edited by S. Yip (Springer Netherlands, Dordrecht, 2005) pp. 137–148.
Y. A. Wang, N. Govind, and E. A. Carter, "Orbital-free kinetic-energy density functionals with a density-dependent kernel," Phys. Rev. B, 16350–16358 (1999).
P. Golub and S. Manzhos, "Kinetic energy densities based on the fourth order gradient expansion: performance in different classes of materials and improvement via machine learning," Phys. Chem. Chem. Phys., 378–395 (2019).
J. Seino, R. Kageyama, M. Fujinami, Y. Ikabata, and H. Nakai, "Semi-local machine-learned kinetic energy density functional demonstrating smooth potential energy curves," Chem. Phys. Lett., 136732 (2019).
R. Meyer, M. Weichselbaum, and A. W. Hauser, "Machine learning approaches toward orbital-free density functional theory: Simultaneous training on the kinetic energy density functional and its functional derivative," J. Chem. Theory Comput., 5685–5694 (2020).
T. Zubatyuk, B. Nebgen, N. Lubbers, J. S. Smith, R. Zubatyuk, G. Zhou, C. Koh, K. Barros, O. Isayev, and S. Tretiak, "Machine Learned Hückel Theory: Interfacing Physics and Deep Neural Networks," arXiv:1909.12963 (2019).
A. Farahvash, C.-K. Lee, Q. Sun, L. Shi, and A. P. Willard, "Machine learning Frenkel Hamiltonian parameters to accelerate simulations of exciton dynamics," J. Chem. Phys., 074111 (2020).
F. Häse, S. Valleau, E. Pyzer-Knapp, and A. Aspuru-Guzik, "Machine learning exciton dynamics," Chem. Sci., 5139–5147 (2016).
Y. Zhang, S. Ye, J. Zhang, J. Jiang, and B. Jiang, "Towards efficient and accurate spectroscopic simulations in extended systems with symmetry-preserving neural network models for tensorial properties," arXiv:2004.13605 (2020).
M. Krämer, P. M. Dohmen, W. Xie, D. Holub, A. S. Christensen, and M. Elstner, "Charge and exciton transfer simulations using machine-learned Hamiltonians," J. Chem. Theory Comput., 4061–4070 (2020).
Z. Wang, S. Ye, H. Wang, J. He, Q. Huang, and S. Chang, "Machine learning method for tight-binding Hamiltonian parameterization from ab-initio band structure," npj Comput. Mater., 11 (2021).
P. O. Dral, O. A. von Lilienfeld, and W. Thiel, "Machine learning of parameters for accurate semiempirical quantum chemical calculations," J. Chem. Theory Comput., 2120–2125 (2015).
C.-P. Chou, Y. Nishimura, C.-C. Fan, G. Mazur, S. Irle, and H. A. Witek, "Automatized parameterization of DFTB using particle swarm optimization," J. Chem. Theory Comput., 53–64 (2016).
M. Stöhr, L. Medrano Sandonas, and A. Tkatchenko, "Accurate many-body repulsive potentials for density-functional tight binding from deep tensor neural networks," J. Phys. Chem. Lett., 6835–6843 (2020).
C. Panosetti, A. Engelmann, L. Nemec, K. Reuter, and J. T. Margraf, "Learning to use the force: Fitting repulsive potentials in density-functional tight-binding with Gaussian process regression," J. Chem. Theory Comput., 2181–2191 (2020).
F. Manby, T. Miller, P. Bygrave, F. Ding, T. Dresselhaus, F. Batista-Romero, A. Buccheri, C. Bungey, S. Lee, R. Meli, K. Miyamoto, C. Steinmann, T. Tsuchiya, M. Welborn, T. Wiles, and Z. Williams, "entos: A Quantum Molecular Simulation Package," ChemRxiv, doi:10.26434/chemrxiv.7762646.v2 (2019).
B. Hourahine et al., "DFTB+, a software package for efficient approximate density functional theory based atomistic simulations," J. Chem. Phys., 124101 (2020).
"Quantum Chemistry's Modular Movement," Chem. Eng. News, 26 (2014).
A. Khorshidi and A. A. Peterson, "Amp: A modular approach to machine learning in atomistic simulations," Comput. Phys. Commun., 310–324 (2016).
S. Chmiela, H. E. Sauceda, I. Poltavsky, K.-R. Müller, and A. Tkatchenko, "sGDML: Constructing accurate and data efficient molecular force fields using machine learning," Comput. Phys. Commun., 38–45 (2019).
K. T. Schütt, P. J. Kindermans, H. E. Sauceda, S. Chmiela, A. Tkatchenko, and K.-R. Müller, "SchNet: A continuous-filter convolutional neural network for modeling quantum interactions," in Advances in Neural Information Processing Systems (2017) pp. 992–1002.
V. Blum, R. Gehrke, F. Hanke, P. Havu, V. Havu, X. Ren, K. Reuter, and M. Scheffler, "Ab initio molecular simulations with numeric atom-centered orbitals," Comput. Phys. Commun., 2175–2196 (2009).
D. G. A. Smith et al., "Psi4 1.4: Open-source software for high-throughput quantum chemistry," J. Chem. Phys., 184108 (2020).
Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu, J. D. McClain, E. R. Sayfutyarova, S. Sharma, S. Wouters, and G. K. Chan, "PySCF: the Python-based simulations of chemistry framework," WIREs Comput. Mol. Sci., e1340 (2017).
V. Kapil, M. Rossi, O. Marsalek, R. Petraglia, Y. Litman, T. Spura, B. Cheng, A. Cuzzocrea, R. H. Meißner, D. M. Wilkins, B. A. Helfrecht, P. Juda, S. P. Bienvenue, W. Fang, J. Kessler, I. Poltavsky, S. Vandenbrande, J. Wieme, C. Corminboeuf, T. D. Kühne, D. E. Manolopoulos, T. E. Markland, J. O. Richardson, A. Tkatchenko, G. A. Tribello, V. Van Speybroeck, and M. Ceriotti, "i-PI 2.0: A universal force engine for advanced molecular simulations," Comput. Phys. Commun., 214–223 (2019).
S. Mai, P. Marquetand, and L. González, "Nonadiabatic Dynamics: The SHARC Approach," WIREs Comput. Mol. Sci., e1370 (2018).
M. Richter, P. Marquetand, J. González-Vázquez, I. Sola, and L. González, "SHARC: Ab initio molecular dynamics with surface hopping in the adiabatic representation including arbitrary couplings," J. Chem. Theory Comput., 1253–1258 (2011).
A. H. Larsen et al., "The atomic simulation environment—a Python library for working with atoms," J. Phys.: Condens. Matter, 273002 (2017).
C. Draxl and M. Scheffler, "NOMAD: The FAIR concept for big data-driven materials science," MRS Bull., 676–682 (2018).
C. Draxl and M. Scheffler, "The NOMAD laboratory: from data sharing to artificial intelligence," J. Phys. Mater., 036001 (2019).
"Quantum machine repository," http://quantum-machine.org/datasets/.
A. R. Oganov, C. J. Pickard, Q. Zhu, and R. J. Needs, "Structure prediction drives materials discovery," Nat. Rev. Mater., 331–348 (2019).
G. R. Fleming and P. G. Wolynes, "Chemical dynamics in solution," Phys. Today, 36–43 (1990).
A. Denzel and J. Kästner, "Gaussian Process Regression for Transition State Search," J. Chem. Theory Comput., 5777–5786 (2018).
R. Meyer and A. W. Hauser, "Geometry optimization using Gaussian process regression in internal coordinate systems," J. Chem. Phys., 84112 (2020).
G. Raggi, I. F. Galván, C. L. Ritterhoff, M. Vacher, and R. Lindh, "Restricted-Variance Molecular Geometry Optimization Based on Gradient-Enhanced Kriging," J. Chem. Theory Comput., 3989–4001 (2020).
G. Schmitz and O. Christiansen, "Gaussian process regression to accelerate geometry optimizations relying on numerical differentiation," J. Chem. Phys., 241704 (2018).
E. Garijo del Río, J. J. Mortensen, and K. W. Jacobsen, "Local Bayesian optimizer for atomic structures," Phys. Rev. B, 104103 (2019).
E. Garijo del Río, S. Kaappa, J. A. Garrido Torres, T. Bligaard, and K. W. Jacobsen, "Machine learning with bond information for local structure optimizations in surface science," J. Chem. Phys., 234116 (2020).
A. Denzel and J. Kästner, "Hessian Matrix Update Scheme for Transition State Search Based on Gaussian Process Regression," J. Chem. Theory Comput., 5083–5089 (2020).
A. A. Peterson, "Acceleration of saddle-point searches with machine learning," J. Chem. Phys., 074106 (2016).
O. P. Koistinen, F. B. Dagbjartsdóttir, V. Ásgeirsson, A. Vehtari, and H. Jónsson, "Nudged elastic band calculations accelerated with Gaussian process regression," J. Chem. Phys., 152720 (2017).
J. A. Garrido Torres, P. C. Jennings, M. H. Hansen, J. R. Boes, and T. Bligaard, "Low-Scaling Algorithm for Nudged Elastic Band Calculations Using a Surrogate Machine Learning Model," Phys. Rev. Lett., 156001 (2019).
F. Curtis, X. Li, T. Rose, Á. Vázquez-Mayagoitia, S. Bhattacharya, L. M. Ghiringhelli, and N. Marom, "GAtor: A First-Principles Genetic Algorithm for Molecular Crystal Structure Prediction," J. Chem. Theory Comput., 2246–2264 (2018).
C. J. Pickard and R. J. Needs, "Ab initio random structure searching," J. Phys.: Condens. Matter, 053201 (2011).
D. J. Wales and J. P. K. Doye, "Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms," J. Phys. Chem. A, 5111–5116 (1997).
C. Panosetti, K. Krautgasser, D. Palagin, K. Reuter, and R. J. Maurer, "Global materials structure search with chemically-motivated coordinates," Nano Lett., 8044–8048 (2015).
A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver, K. Kavukcuoglu, and D. Hassabis, "Improved protein structure prediction using potentials from deep learning," Nature, 706–710 (2020).
J. Jumper et al., "High Accuracy Protein Structure Prediction Using Deep Learning," in Fourteenth Critical Assessment of Techniques for Protein Structure Prediction (2020).
M. S. Jørgensen, H. L. Mortensen, S. A. Meldgaard, E. L. Kolsbjerg, T. L. Jacobsen, K. H. Sørensen, and B. Hammer, "Atomistic structure learning," J. Chem. Phys., 054111 (2019).
H. L. Mortensen, S. A. Meldgaard, M. K. Bisbo, M.-P. V. Christiansen, and B. Hammer, "Atomistic structure learning algorithm with surrogate energy model relaxation," Phys. Rev. B, 075427 (2020).
S. A. Meldgaard, H. L. Mortensen, M. S. Jørgensen, and B. Hammer, "Structure prediction of surface reconstructions by deep reinforcement learning," J. Phys.: Condens. Matter, 404005 (2020).
T. Yamashita, N. Sato, H. Kino, T. Miyake, K. Tsuda, and T. Oguchi, "Crystal structure prediction accelerated by Bayesian optimization," Phys. Rev. Mater., 013803 (2018).
V. L. Deringer, D. M. Proserpio, G. Csányi, and C. J. Pickard, "Data-driven learning and prediction of inorganic crystal structures," Faraday Discuss., 45–59 (2018).
M. K. Bisbo and B. Hammer, "Efficient Global Structure Optimization with a Machine-Learned Surrogate Model," Phys. Rev. Lett., 086102 (2020).
M. Todorović, M. U. Gutmann, J. Corander, and P. Rinke, "Bayesian inference of atomistic structure in functional materials," npj Comput. Mater., 35 (2019).
L. Hörmann, A. Jeindl, A. T. Egger, M. Scherbela, and O. T. Hofmann, "SAMPLE: Surface structure search enabled by coarse graining and statistical learning," Comput. Phys. Commun., 143–155 (2019).
B. Sanchez-Lengeling and A. Aspuru-Guzik, "Inverse molecular design using machine learning: Generative models for matter engineering," Science, 360–365 (2018).
D. Schwalbe-Koda and R. Gómez-Bombarelli, "Generative models for automatic chemical design," in Machine Learning Meets Quantum Physics (Springer, 2020) pp. 445–467.
R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik, "Automatic chemical design using a data-driven continuous representation of molecules," ACS Cent. Sci., 268–276 (2018).
Q. Liu, M. Allamanis, M. Brockschmidt, and A. Gaunt, "Constrained graph variational autoencoders for molecule design," in Advances in Neural Information Processing Systems (2018) pp. 7795–7804.
E. Putin, A. Asadulaev, Y. Ivanenkov, V. Aladinskiy, B. Sanchez-Lengeling, A. Aspuru-Guzik, and A. Zhavoronkov, "Reinforced Adversarial Neural Computer for de Novo Molecular Design," J. Chem. Inf. Model., 1194–1204 (2018).
M. Popova, O. Isayev, and A. Tropsha, "Deep reinforcement learning for de novo drug design," Sci. Adv., eaap7885 (2018).
M. J. Kusner, B. Paige, and J. M. Hernández-Lobato, "Grammar variational autoencoder," arXiv:1703.01925 (2017).
Z. Zhou, S. Kearnes, L. Li, R. N. Zare, and P. Riley, "Optimization of Molecules via Deep Reinforcement Learning," Sci. Rep., 10752 (2019).
E. Mansimov, O. Mahmood, S. Kang, and K. Cho, "Molecular geometry prediction using a deep generative graph neural network," Sci. Rep., 1–13 (2019).
J. Köhler, L. Klein, and F. Noé, "Equivariant flows: sampling configurations for multi-body systems with symmetric energies," in Proceedings of the 37th International Conference on Machine Learning (2019).
N. Gebauer, M. Gastegger, and K. Schütt, "Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules," Advances in Neural Information Processing Systems, 7566–7578 (2019).
G. N. C. Simm, R. Pinsler, G. Csányi, and J. M. Hernández-Lobato, "Symmetry-Aware Actor-Critic for 3D Molecular Design," arXiv:2011.12747 (2020).
O. A. von Lilienfeld, R. D. Lins, and U. Rothlisberger, "Variational particle number approach for rational compound design," Phys. Rev. Lett., 153002 (2005).
O. A. von Lilienfeld and M. Tuckerman, "Alchemical variations of intermolecular energies according to molecular grand-canonical ensemble density functional theory," J. Chem. Theory Comput., 1083–1090 (2007).
D. Sheppard, G. Henkelman, and O. A. von Lilienfeld, "Alchemical derivatives of reaction energetics," J. Chem. Phys., 084104 (2010).
F. A. Faber, A. S. Christensen, B. Huang, and O. A. von Lilienfeld, "Alchemical and structural distribution based representation for universal quantum machine learning," J. Chem. Phys., 241717 (2018).
S. De, A. P. Bartók, G. Csányi, and M. Ceriotti, "Comparing molecules and solids across structural and alchemical space," Phys. Chem. Chem. Phys., 13754–13769 (2016).
K. T. Schütt, F. Arbabzadah, S. Chmiela, K.-R. Müller, and A. Tkatchenko, "Quantum-chemical insights from deep tensor neural networks," Nat. Commun., 13890 (2017).
J. Behler, "Perspective: Machine learning potentials for atomistic simulations," J. Chem. Phys., 170901 (2016).
V. Botu, R. Batra, J. Chapman, and R. Ramprasad, "Machine learning force fields: Construction, validation, and outlook," J. Phys. Chem. C, 511–522 (2017).
V. L. Deringer, N. Bernstein, G. Csányi, C. B. Mahmoud, M. Ceriotti, M. Wilson, D. A. Drabold, and S. R. Elliott, "Origins of structural and electronic transitions in disordered silicon," Nature, 59–64 (2021).
B. Jiang, J. Li, and H. Guo, "High-Fidelity Potential Energy Surfaces for Gas Phase and Gas-Surface Scattering Processes from Machine Learning," J. Phys. Chem. Lett., 5120–5131 (2020).
Z. Li, J. R. Kermode, and A. De Vita, "Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces," Phys. Rev. Lett., 096405 (2015).
V. Botu and R. Ramprasad, "Learning scheme to predict atomic forces and accelerate materials simulations," Phys. Rev. B, 094306 (2015).
J. Behler, "Constructing high-dimensional neural network potentials: A tutorial review," Int. J. Quantum Chem., 1032–1050 (2015).
M. Gastegger, J. Behler, and P. Marquetand, "Machine learning molecular dynamics for the simulation of infrared spectra," Chem. Sci., 6924–6935 (2017).
A. V. Akimov, "A simple phase correction makes a big difference in nonadiabatic molecular dynamics," J. Phys. Chem. Lett., 6096–6102 (2018).
S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, and K.-R. Müller, "Machine learning of accurate energy-conserving molecular force fields," Sci. Adv., e1603015 (2017).
S. Chmiela, H. E. Sauceda, K.-R. Müller, and A. Tkatchenko, "Towards exact molecular dynamics simulations with machine-learned force fields," Nat. Commun., 3887 (2018).
F. Noé, G. De Fabritiis, and C. Clementi, "Machine learning for protein folding and dynamics," Curr. Opin. Struct. Biol., 77–84 (2020).
M. A. Balsera, W. Wriggers, Y. Oono, and K. Schulten, "Principal component analysis and long time protein dynamics," J. Phys. Chem., 2567–2572 (1996).
B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., 1299–1319 (1998).
Z. Zhang and W. Wriggers, "Coarse-graining protein structures with local multivariate features from molecular dynamics," J. Phys. Chem. B, 44 (2008).
O. F. Lange and H. Grubmüller, "Full correlation analysis of conformational protein dynamics," Proteins: Struct., Funct., Bioinf., 1294–1312 (2008).
J. Preto and C. Clementi, "Fast recovery of free energy landscapes via diffusion-map-directed molecular dynamics," Phys. Chem. Chem. Phys., 19181–19191 (2014).
W. Zheng, M. A. Rohrdanz, and C. Clementi, "Rapid exploration of configuration space with diffusion-map-directed molecular dynamics," J. Phys. Chem. B, 12769–12776 (2013).
A. Mardt, L. Pasquali, H. Wu, and F. Noé, "VAMPnets for deep learning of molecular kinetics," Nat. Commun., 5 (2018).
F. Noé and E. Rosta, "Markov Models of Molecular Kinetics," J. Chem. Phys., 190401 (2019).
W. Chen, A. R. Tan, and A. L. Ferguson, "Collective variable discovery and enhanced sampling using autoencoders: Innovations in network architecture and error function design," J. Chem. Phys., 072312 (2018).
J. M. L. Ribeiro, P. Bravo, Y. Wang, and P. Tiwary, "Reweighted autoencoded variational Bayes for enhanced sampling (RAVE)," J. Chem. Phys., 072301 (2018).
R. R. Coifman and S. Lafon, "Diffusion maps," Appl. Comput. Harmon. Anal., 5–30 (2006).
T. Lemke and C. Peter, "Neural Network Based Prediction of Conformational Free Energies - A New Route toward Coarse-Grained Simulation Models," J. Chem. Theory Comput., 6213–6221 (2017).
L. Zhang, J. Han, H. Wang, R. Car, and W. E, "DeePCG: Constructing coarse-grained models via deep neural networks," J. Chem. Phys., 034101 (2018).
J. Wang, S. Chmiela, K.-R. Müller, F. Noé, and C. Clementi, "Ensemble learning of coarse-grained molecular dynamics force fields with a kernel approach," J. Chem. Phys., 194106 (2020).
S. T. John and G. Csányi, "Many-Body Coarse-Grained Interactions Using Gaussian Approximation Potentials," J. Phys. Chem. B, 10934–10949 (2017).
M. Barbatti, "Nonadiabatic dynamics with trajectory surface hopping method," WIREs Comput. Mol. Sci., 620–633 (2011).
L. González and R. Lindh, eds., Quantum Chemistry and Dynamics of Excited States: Methods and Applications (John Wiley & Sons, 2020).
S. Mai and L. González, "Molecular photochemistry: Recent developments in theory," Angew. Chem. Int. Ed., 16832–16846 (2020).
B. Smith and A. V. Akimov, "Modeling nonadiabatic dynamics in condensed matter materials: Some recent advances and applications," J. Phys.: Condens. Matter, 073001 (2020).
J. Li, P. Reiser, A. Eberhard, P. Friederich, and S. Lopez, "Nanosecond photodynamics simulations of a cis-trans isomerization are enabled by machine learning," ChemRxiv, doi:10.26434/chemrxiv.13047863.v1 (2020).
C. Carbogno, J. Behler, K. Reuter, and A. Groß, "Signatures of nonadiabatic O2 dissociation at Al(111): First-principles fewest-switches study," Phys. Rev. B, 035410 (2010).
Y. Zhang, R. J. Maurer, and B. Jiang, "Symmetry-Adapted High Dimensional Neural Network Representation of Electronic Friction Tensor of Adsorbates on Metals," J. Phys. Chem. C, 186–195 (2020).
Y. Zhang, R. J. Maurer, H. Guo, and B. Jiang, "Hot-electron effects during reactive scattering of H2 from Ag(111): the interplay between mode-specific electronic friction and the potential energy landscape," Chem. Sci., 1089–1097 (2019).
C. L. Box, Y. Zhang, R. Yin, B. Jiang, and R. J. Maurer, "Determining the effect of hot electron dissipation on molecular scattering experiments at metal surfaces," JACS Au, in press, doi:10.1021/jacsau.0c00066 (2020).
D. R. Yarkony, "Nonadiabatic quantum chemistry - past, present, and future," Chem. Rev., 481–498 (2012).
H. Köppel, W. Domcke, and L. S. Cederbaum, in Conical Intersections, edited by W. Domcke, D. R. Yarkony, and H. Köppel (World Scientific, New York, 2004).
Y. Shu and D. G. Truhlar, "Diabatization by Machine Intelligence," J. Chem. Theory Comput., 1 (2020).
B. Jiang, J. Li, and H. Guo, "Potential energy surfaces from high fidelity fitting of ab initio points: The permutation invariant polynomial - neural network approach," Int. Rev. Phys. Chem., 479–506 (2016).
T. Lenzen and U. Manthe, "Neural network based coupled diabatic potential energy surfaces for reactive scattering," J. Chem. Phys., 084105 (2017).
D. M. G. Williams and W. Eisfeld, "Neural Network Diabatization: A New Ansatz for Accurate High-Dimensional Coupled Potential Energy Surfaces," J. Chem. Phys., 204106 (2018).
C. Xie, X. Zhu, D. R. Yarkony, and H. Guo, "Permutation invariant polynomial neural network approach to fitting potential energy surfaces. IV. Coupled diabatic potential energy matrices," J. Chem. Phys., 144107 (2018).
D. M. G. Williams and W. Eisfeld, "Complete nuclear permutation inversion invariant artificial neural network (CNPI-ANN) diabatization for the accurate treatment of vibronic coupling problems," J. Phys. Chem. A, in press, doi:10.1021/acs.jpca.0c05991 (2020).
G. W. Richings and S. Habershon, "Direct grid-based quantum dynamics on propagated diabatic potential energy surfaces," Chem. Phys. Lett., 228–233 (2017).
G. W. Richings and S. Habershon, "MCTDH on-the-fly: Efficient grid-based quantum dynamics without pre-computed potential energy surfaces," J. Chem. Phys., 134116 (2018).
G. W. Richings, C. Robertson, and S. Habershon, "Improved on-the-fly MCTDH simulations with many-body-potential tensor decomposition and projection diabatization," J. Chem. Theory Comput., 857–870 (2019).
G. W. Richings and S. Habershon, "A new diabatization scheme for direct quantum dynamics: Procrustes diabatization," J. Chem. Phys., 154108 (2020).
G. W. Richings, C. Robertson, and S. Habershon, "Can we use on-the-fly quantum simulations to connect molecular structure and sunscreen action?" Faraday Discuss., 476–493 (2019).
A. Jasinski, J. Montaner, R. C. Forrey, B. H. Yang, P. C. Stancil, N. Balakrishnan, J. Dai, R. A. Vargas-Hernández, and R. V. Krems, "Machine learning corrected quantum dynamics calculations," Phys. Rev. Research, 3 (2020).
F. Brieuc, C. Schran, F. Uhl, H. Forbert, and D. Marx, "Converged quantum simulations of reactive solutes in superfluid helium: The Bochum perspective," J. Chem. Phys., 210901 (2020).
N. Raimbault, A. Grisafi, M. Ceriotti, and M. Rossi, "Using Gaussian process regression to simulate the vibrational Raman spectra of molecular crystals," New J. Phys., 105001 (2019).
G. M. Sommers, M. F. C. Andrade, L. Zhang, H. Wang, and R. Car, "Raman spectrum and polarizability of liquid water from deep neural networks," Phys. Chem. Chem. Phys., 10592–10602 (2020).
F. M. Paruzzo, A. Hofstetter, F. Musil, S. De, M. Ceriotti, and L. Emsley, "Chemical shifts in molecular solids by machine learning," Nat. Commun., 1–10 (2018).
A. S. Christensen, F. A. Faber, and O. A. von Lilienfeld, "Operators in quantum machine learning: Response properties in chemical space," J. Chem. Phys., 064105 (2019).
J. A. Fine, A. A. Rajasekar, K. P. Jethava, and G. Chopra, "Spectral deep learning for prediction and prospective validation of functional groups," Chem. Sci., 4618–4630 (2020).
S. Kiyohara, T. Miyata, K. Tsuda, and T. Mizoguchi, "Data-driven approach for the prediction and interpretation of core-electron loss spectroscopy," Sci. Rep., 1–12 (2018).
C. Cobas, "NMR signal processing, prediction, and structure verification with machine learning techniques," Magn. Reson. Chem., 512–519 (2020).
V. Tshitoyan, J. Dagdelen, L. Weston, A. Dunn, Z. Rong, O. Kononova, K. A. Persson, G. Ceder, and A. Jain, "Unsupervised word embeddings capture latent knowledge from materials science literature," Nature, 95–98 (2019).
P. Raccuglia, K. C. Elbert, P. D. Adler, C. Falk, M. B. Wenny, A. Mollo, M. Zeller, S. A. Friedler, J. Schrier, and A. J. Norquist, "Machine-learning-assisted materials discovery using failed experiments," Nature, 73–76 (2016).
K. McCullough, T. Williams, K. Mingle, P. Jamshidi, and J. Lauterbach, "High-throughput experimentation meets artificial intelligence: A new pathway to catalyst discovery," Phys. Chem. Chem. Phys., 11174–11196 (2020).
J. L. Melville, E. K. Burke, and J. D. Hirst, "Machine learning in virtual screening," Comb. Chem. High Throughput Screening, 332–343 (2009).
K. Terayama, M. Sumita, R. Tamura, D. T. Payne, M. K. Chahal, S. Ishihara, and K. Tsuda, "Pushing property limits in materials discovery via boundless objective-free exploration," Chem. Sci., 5959–5968 (2020).
S. Ekins, A. C. Puhl, K. M. Zorn, T. R. Lane, D. P. Russo, J. J. Klein, A. J. Hickey, and A. M. Clark, "Exploiting machine learning for end-to-end drug discovery and development," Nat. Mater., 435 (2019).
B. Meyer, B. Sawatlon, S. Heinen, O. A. von Lilienfeld, and C. Corminboeuf, "Machine learning meets volcano plots: Computational discovery of cross-coupling catalysts," Chem. Sci., 7069–7077 (2018).
J. I. Gómez-Peralta and X. Bokhimi, "Discovering new perovskites with artificial intelligence," J. Solid State Chem., 121253 (2020).
P. B. Jørgensen, M. Mesta, S. Shil, J. M. García Lastra, K. W. Jacobsen, K. S. Thygesen, and M. N. Schmidt, "Machine learning-based screening of complex molecules for polymer solar cells," J. Chem. Phys., 241735 (2018).
P. C. St John, C. Phillips, T. W. Kemper, A. N. Wilson, Y. Guan, M. F. Crowley, M. R. Nimlos, and R. E. Larsen, "Message-passing neural networks for high-throughput polymer screening," J. Chem. Phys., 234111 (2019).
C. M. Dobson, "Chemical space and biology," Nature, 824–828 (2004).
T. Weymuth and M. Reiher, "Inverse quantum chemistry: Concepts and strategies for rational compound design," Int. J. Quantum Chem., 823–837 (2014).
A. Zunger, "Inverse design in search of materials with target functionalities," Nat. Rev. Chem., 1–16 (2018).
E. V. Podryabinkin, E. V. Tikhonov, A. V. Shapeev, and A. R. Oganov, "Accelerating crystal structure prediction by machine-learning interatomic potentials with active learning," Phys. Rev. B, 064114 (2019).
J. Noh, G. H. Gu, S. Kim, and Y. Jung, "Machine-enabled inverse design of inorganic solid materials: promises and challenges," Chem. Sci., 4871–4881 (2020).
A. Ambrosetti, N. Ferri, R. A. DiStasio, and A. Tkatchenko, "Wavelike charge density fluctuations and van der Waals interactions at the nanoscale," Science, 1171–1176 (2016).
M. Wilkinson et al., "The FAIR guiding principles for scientific data management and stewardship," Sci. Data, 160018 (2016).
A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, and K. A. Persson, "The Materials Project: A materials genome approach to accelerating materials innovation," APL Mater., 011002 (2013).