An Intermediate Level of Abstraction for Computational Systems Chemistry

Abstract

Computational techniques are required for narrowing down the vast space of possibilities to plausible prebiotic scenarios, since precise information on the molecular composition, the dominant reaction chemistry, and the conditions for that era is scarce. The exploration of large chemical reaction networks is a central aspect in this endeavour. While quantum chemical methods can accurately predict the structures and reactivities of small molecules, they are not efficient enough to cope with large-scale reaction systems. The formalization of chemical reactions as graph grammars provides a generative system, well grounded in category theory, at the right level of abstraction for the analysis of large and complex reaction networks. An extension of the basic formalism into the realm of integer hyperflows allows for the identification of complex reaction patterns, such as autocatalysis, in large reaction networks using optimization techniques.

Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 152-8550, Japan
Department of Mathematics and Computer Science, University of Southern Denmark, Denmark
Research Network Chemistry Meets Microbiology, University of Vienna, Wien A-1090, Austria
Institute for Theoretical Chemistry, University of Vienna, Austria
Department of Computer Science, University of Leipzig, Germany
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark
Santa Fe Institute, Santa Fe, USA

From a fundamental physics point of view, chemical systems, or more precisely molecules and their reactions, are just time-dependent multi-particle quantum systems, completely described by the fundamental principles of quantum field theory (QFT) [38]. At this level of description, however, almost all questions of interest to a chemist are intractable in practice. A hierarchy of approximations and simplifications is therefore employed to reach models of more practical value. These are guided, at least in part, by conceptual notions that distinguish chemistry from other quantum systems. Among these are constraints such as the immutability of atomic nuclei and the idea that chemical reactions comprise only a redistribution of electrons.
On the formal side, the Born-Oppenheimer approximation [7] stipulates a complete separation of the wave functions of nuclei and electrons. It leads to the concept of the potential energy surface (PES), which explains molecular geometries and provides a consistent, if not completely accurate, view of chemical reactions as classical paths of nuclear coordinates on the PES. The PES itself is the result of solving the Schrödinger equation with nuclear coordinates and charges as parameters [32, 21]. Quantum chemistry (QC) has developed a plethora of computational schemes for this purpose, typically trading off accuracy against computational cost. Among them are the so-called semi-empirical methods, which use the fact that chemical bonds are usually formed by pairs of electrons to decompose the electronic wave function into contributions of electron pairs.

Molecular mechanics (MM) and molecular dynamics (MD) [30, 9] abandon quantum mechanics altogether and instead treat chemical bonds as classical springs. Sacrificing accuracy, MM and MD can treat macromolecules and supramolecular complexes that are well outside the reach of exact and even semi-empirical quantum-chemical methods. For special classes of molecules, even coarser approximations have been developed. Many properties of aromatic ring systems, for instance, can be explained in terms of graph-theoretical models known as Hückel theory [26, 24]. For nucleic acids, on the other hand, models have been developed that aggregate molecular building blocks (nucleotides) into elementary objects, so that Watson-Crick base pairs become edges in the graph representation [41].

A common theme in the construction of coarser approximations is that more and more external information needs to be supplied to the model. While QFT requires no more than a few fundamental constants, practical QC computations take nuclear masses and charges as given.
Semi-empirical methods, in addition, require empirically determined parameters for electron correlation effects. MM and MD models use extensive tables of parameters that specify properties of localised bonds as a function of bond type and incident atoms. Similarly, RNA folding depends on a plethora of empirically determined energy contributions for base pair stacking and loop regions [36]. A second feature of coarse-grained methods is that they are specialised to answer different types of questions, or that different classes of systems resort to different, mutually incompatible approximations.

This well-established hierarchy of internally consistent models of molecular structures is in stark contrast to our present capability to model chemical reactions. While transition state theory [17] does provide a means to infer reaction rate constants from the PES, it requires prior knowledge of the educt and product states.

A systematic investigation of large chemical reaction systems requires the development of a theoretical framework that is sufficiently coarse-grained to be computationally tractable. Any such model must satisfy consistency conditions that are inherited from the underlying physics. A recent study [33] on complex chemical reaction networks supports this view and concludes that efficient QM-based computational approaches exist at the level of structure and reactivity of small molecules, but that heuristic approaches are indispensable at the level of large-scale networks. Here we argue that chemistry offers a coarse-grained level of description that allows the construction of mathematically sound and consistent formal models that are nevertheless conceptually and structurally different from the formalism of quantum chemistry. Much of chemistry is taught in terms of abstracted molecular structures and rules (named reactions) that transform molecular graphs into each other.
In the following section we showhow this level of modelling can indeed be made mathematically precise and howit can accommodate key concepts of chemistry such as transition state theoryand fundamental conservation laws inherited from the underlying physics.

The starting point of an inherently discrete model of chemistry is a simplified, graphical representation of molecules. This coarse-graining maps the atoms and bonds of a molecule to the vertices and edges of the corresponding graph. All type information (atom types C, H, N, O, etc., and bond multiplicities: single, double, triple) is mapped to labels on the respective vertices or edges of the graph. In this setting, all information on properties tied to three-dimensional space, such as chirality and cis/trans isomerism of double bonds, is neglected.

It is possible, however, to extend the model to retain local geometric information and thus capture the part of stereochemistry that is tied to stereogenic centres. Helicity, for instance, cannot be expressed by the extended model, since this property is generated by an extended spatial arrangement around an axis and not by a single point or centre. The basis for the extended model is the valence-shell electron-pair repulsion (VSEPR) model, which, albeit comprising only a set of simple rules, has a firm grounding in quantum chemical modelling [19]. VSEPR theory determines approximate bond angles around an atom depending on the incident bond types, i.e., in terms of information conveyed by the labelled graph representation. Stereochemical information involving chiral centres as well as cis/trans isomerism can thus be encoded simply by listing the bonds in a specific order and augmenting the labelled graph with local permutation groups.

[Figure 1 graphic: panels labelled "Potential energy surface", "Reaction coordinate" and "Graph grammar" (rule span L ← K → R); molecule drawings not recoverable.]

Figure 1: Levels of abstraction for computational approaches in chemistry. Shown is the hierarchy of approximations from quantum mechanics at the top to graph grammars at the bottom. The coarse-graining via the introduction of constraints (such as the Born-Oppenheimer approximation, or reducing the coordinates of spatial objects to neighbourhood relations on graphs) is accompanied by a dramatic speedup in computation time.

Still, certain aspects of chemistry cannot be described in this form. For instance, the concept of bonds as edges fails for multi-centre bonds, in which three or more atoms share a pair of bonding electrons. These are frequently observed in boranes or in organometallic compounds such as ferrocenes. Non-local chirality, found for instance in helical molecules, does not rely on local, atom-centred symmetries and thus is not captured by local orientation information.

With molecules represented as graphs, the mechanism of a chemical reaction is naturally expressed as a graph transformation rule. Graph transformation thus retains the semantics familiar from organic chemistry textbooks.
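The labelled-graph representation described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the encoding of any particular tool: the helper names `make_graph`, `formula` and `bond_order_sum` are hypothetical, atom types are vertex labels, bond multiplicities are edge labels, and no spatial information is stored, so stereoisomers would map to the same graph.

```python
from collections import Counter

# Hypothetical minimal encoding: atoms = {vertex: element label},
# bonds = {frozenset({u, v}): bond order}, with 1 = single, 2 = double, 3 = triple.
# No coordinates are stored, so chirality and cis/trans isomerism are lost.

def make_graph(atoms, bonds):
    """Build a labelled molecular graph from {(u, v): order} bond entries."""
    edges = {}
    for (u, v), order in bonds.items():
        if u not in atoms or v not in atoms:
            raise ValueError(f"bond ({u}, {v}) refers to an unknown atom")
        edges[frozenset((u, v))] = order
    return atoms, edges

def formula(graph):
    """Molecular formula read off the vertex labels (C, then H, then the rest alphabetically)."""
    atoms, _ = graph
    counts = Counter(atoms.values())
    first = [e for e in ("C", "H") if e in counts]
    rest = sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in first + rest)

def bond_order_sum(graph, v):
    """Sum of the bond orders incident to vertex v (its valence in the graph)."""
    _, edges = graph
    return sum(order for e, order in edges.items() if v in e)

# Acetaldehyde, CH3-CHO: vertex 0 is the methyl carbon, 1 the carbonyl carbon.
acetaldehyde = make_graph(
    {0: "C", 1: "C", 2: "O", 3: "H", 4: "H", 5: "H", 6: "H"},
    {(0, 1): 1, (1, 2): 2, (0, 3): 1, (0, 4): 1, (0, 5): 1, (1, 6): 1},
)
print(formula(acetaldehyde))            # C2H4O
print(bond_order_sum(acetaldehyde, 1))  # 4
```

Everything the model knows about the molecule is in the two dictionaries; a reaction rule only has to rewrite these labels and adjacencies.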
As a research discipline in computer science, graph transformation dates back to the 1970s. It has been studied extensively in the context of formal language theory, pattern recognition, software engineering, concurrency theory, compiler construction, and verification, among other fields of computer science [34]. Several formalisms have been developed to formalize and implement the process of transforming graphs. Algebraic approaches, of which several variants based on category theory exist, are of particular interest for modelling chemistry. For example, different semantics can be expressed using the Single Pushout (SPO) approach, the more restrictive Double Pushout (DPO) approach, or the recently developed Sesqui-Pushout (SqPO) approach [14].

In the context of chemical reactions, DPO graph transformation is the formalism of choice because it facilitates the construction of transformation rules that are chemical in nature. DPO guarantees that all chemical reactions are reversible [4]. The conservation of atoms translates into a simple formal condition: the graph morphisms relating the context graph to the left-hand and right-hand sides of a rule must be bijective on vertices. In turn, this requirement guarantees the existence of well-defined atom maps.

Figure 2 illustrates how the Meisenheimer rearrangement [31], a thermally induced rearrangement of aliphatic amine oxides into N-alkoxylamines, translates into the DPO formalism. The reaction transforms the educt graph G (amine oxide) into the product graph H (N-alkoxylamine). All arrows in the diagram are morphisms. The reaction centre, i.e., the subset of atoms and bonds of the reactant molecules directly involved in the bond breaking and forming steps of the reaction, is expressed as a graph transformation rule. The information on how to change the connectivity and the charges of the atoms is specified by three graphs (L, K, R). The left graph L (resp. right graph R) expresses the local state of the molecules before (resp. after) applying the reaction rule. The context graph K encodes the invariant part of the reaction centre and formally relates L and R to each other. The left graph L is thus the precondition for applying the rule: the rule can only be applied if there exists a subgraph match m that embeds L in the host graph G (see the red and blue highlighted parts of graph G in Figure 2). In this case, L can be replaced by the right graph R (see the green and blue highlighted parts of graph H in Figure 2), which transforms the educt graph G into the product graph H.

A computationally very demanding step when performing graph transformation (e.g., for generating large chemical spaces) is the enumeration of subgraph matches. Deciding whether a single subgraph match exists in a host graph is known to be NP-complete [11, 18]. Better theoretical results exist for certain classes of graphs, e.g., for the so-called partial k-trees of bounded degree (to which almost all molecule graphs belong [40, 25, 1]), where the subgraph matching problem can be solved in polynomial time [29, 15]. In practice, however, it is faster to use simpler algorithms, e.g., VF2 [13, 12].

[Figure 2 graphic: top row L ← K → R (morphisms l, r), bottom row G ← D → H (morphisms g, h), connected by vertical morphisms including the match m; molecule drawings not recoverable.]

Figure 2: Double-pushout (DPO) representation of the application of a graph transformation rule. The actual reaction (the top span L ← K → R) is the Meisenheimer rearrangement, which transforms the educt graph G into the product graph H. All arrows in the diagram are morphisms, i.e., functions that map vertices/edges from the graph at the arrow tail to the graph at the arrow head. For the transformation to be valid, the two squares of the diagram must form so-called pushouts.

Chemical reactions are often compositions of elementary reactions. In the latter, the reaction centre can always be expressed as a cycle [22, 23], with an even number of vertices for homovalent reactions and an odd number for ambivalent reactions, better known as redox reactions. Graph transformation has a natural mechanism for rule composition that allows multi-step reactions (e.g., enzyme-mediated reactions or even complete metabolic pathways) to be expressed as compositions of elementary transformation rules. The properties of elementary rules in terms of mass conservation or atom-to-atom mapping carry over to the composed "overall transformation rules". Since chemical reactions redistribute atoms along complex reaction sequences, rule composition can be used to trace individual atoms along these sequences in a chemically as well as mathematically correct fashion. Rule composition can be completely automated and thus opens the possibility for model reduction (see [5] for further details). We illustrate rule composition in the context of prebiotic chemistry in Figure 3.

Classical synthetic chemistry has traditionally been concerned with the step-wise application of chemical reactions in carefully crafted synthesis plans.
In living organisms, in contrast, complex networks of intertwined reactions are active concurrently. These intricate reaction webs harbour complex reaction patterns such as branch points, autocatalytic cycles, and interferences between reaction sequences. The emerging field of Systems Chemistry has set out to leverage the systemic, network-centred view as a framework for synthetic chemistry as well. Consequently, large-scale chemical networks are no longer just a subject of analysis in the context of understanding the workings of a living cell's metabolism, but are becoming a prerequisite for understanding the possibilities within chemical spaces, i.e., the universe of chemical compounds and the possible chemical reactions connecting them. A predictive theory of chemical space must be rooted in a strict mathematical formalization, and it requires abstracting the overwhelming amount of anecdotal knowledge collected at the level of single reactions and functional subnetworks into generalizing principles.

Graph transformation systems, as discussed earlier, provide the basis for such a formalism, allowing for a systematic and step-wise construction of arbitrary chemical spaces. A chemical system is then specified as a formal graph grammar that encapsulates a set of transformation rules, encoding the reaction chemistry, together with a set of molecules that provide the starting points for rule application. Iterating the graph grammar yields reaction networks in the form of directed hypergraphs as explicit instantiations of the chemical space. Usually, a simple iterative expansion of the chemical space leads to a combinatorial explosion in the number of novel molecules. Therefore, a sophisticated strategy framework for the targeted exploration of the parts of interest of the chemical space has been developed [6]. Such strategies are indispensable if, for example, polymerization/cyclization spaces are the subject of investigation. Spaces of this type are found, for example, in the important natural product classes of polyketides and terpenes, and in prebiotic HCN chemistry [3]. The strategy framework allows chemical space exploration to be guided not only by physico-chemical properties of the generated molecules, but also by experimental data such as mass spectra. Importantly, the hypergraphs (reaction networks) are automatically annotated with atom-to-atom maps during generation, as defined implicitly in the underlying graph grammar. For large and complex reaction networks it is thus possible to construct atom flow networks in an automated fashion, even including corrections for molecule and subnetwork symmetries, as required for the interpretation of isotope labelling experiments [8].

The origin of life can be viewed as an intricate process that has been shaped by external constraints provided by early Earth's environment and by intrinsic constraints stemming from the reaction chemistry itself. Higher-order chemical transformation motifs, such as network autocatalysis, are believed to have played a key role in the amplification of the building blocks of life [27, 28, 37]. A combination of the constructive graph grammar approach with techniques from combinatorial optimization sets the proper formal stage for attacking some of these origin-of-life questions. The key idea is to rephrase the topological requirements for a particular chemical behaviour, e.g., network autocatalysis, as an optimization problem on the underlying reaction network (hypergraph). An example of this is the enumeration of pathways with specific properties, which can be formally modelled as a constrained hyperflow problem. Many of these problems are computationally hard in theory [2], though in practice methods such as Integer Linear Programming (ILP) can be used successfully to identify such transformation motifs in arbitrary chemical spaces.
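As a toy illustration of phrasing a network motif as a constrained integer hyperflow problem, the sketch below searches a hypothetical three-reaction network for flows that are autocatalytic in species A. The network, the species names and the autocatalysis criterion stated in the docstring are simplifying assumptions chosen for this example (the general problem is treated formally in [2]); brute-force enumeration over bounded flows stands in for an ILP solver and only scales to tiny examples like this one.

```python
from itertools import product

# Hypothetical toy network, not taken from the paper. Each reaction is a
# hyperedge: (educts {species: coefficient}, products {species: coefficient}).
REACTIONS = {
    "r1": ({"A": 1, "F": 1}, {"B": 1}),  # A + F -> B
    "r2": ({"B": 1, "F": 1}, {"A": 2}),  # B + F -> 2 A
    "r3": ({"B": 1}, {"W": 1}),          # B -> W (a side reaction)
}
SPECIES = ("A", "B", "F", "W")
INPUTS = ("A", "F")   # species the environment may supply
OUTPUTS = ("A", "W")  # species the environment may remove

def net_production(flow):
    """Net stoichiometric production of each species for integer reaction flows."""
    net = dict.fromkeys(SPECIES, 0)
    for r, k in flow.items():
        educts, products = REACTIONS[r]
        for s, c in educts.items():
            net[s] -= c * k
        for s, c in products.items():
            net[s] += c * k
    return net

def autocatalytic_flows(target, max_flow=2):
    """Yield (flow, inputs, outputs) for conserved integer hyperflows that are
    autocatalytic for `target` under a simplified, assumed criterion: the target
    has a positive input, leaves in a larger amount than it entered, and is
    consumed by at least one reaction carrying flow."""
    names = sorted(REACTIONS)
    for ks in product(range(max_flow + 1), repeat=len(names)):
        flow = dict(zip(names, ks))
        net = net_production(flow)
        consumed = any(flow[r] > 0 and target in REACTIONS[r][0] for r in names)
        for ins in product(range(max_flow + 1), repeat=len(INPUTS)):
            inp = dict(zip(INPUTS, ins))
            # Flow conservation at every species: input + net production = output.
            out = {s: inp.get(s, 0) + net[s] for s in SPECIES}
            if any(v < 0 for v in out.values()):
                continue
            if any(out[s] != 0 for s in SPECIES if s not in OUTPUTS):
                continue
            if consumed and inp.get(target, 0) > 0 and out[target] > inp[target]:
                yield flow, inp, out

# Smallest solution by total reaction flow plus total input flow.
best = min(autocatalytic_flows("A"), key=lambda s: sum(s[0].values()) + sum(s[1].values()))
print(best)
# ({'r1': 1, 'r2': 1, 'r3': 0}, {'A': 1, 'F': 2}, {'A': 2, 'B': 0, 'F': 0, 'W': 0})
```

In an ILP formulation, the conservation checks become linear equality constraints over integer variables and the cost passed to `min` becomes the objective function.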
The enumeration of transformation motifs is the first step in the computer-assisted large-scale analysis of reaction networks. Other mathematical formalisms, such as Petri nets and concurrency theory in general, can subsequently be used to model properties of chemical systems at an even higher level. Complicated chemical spaces, such as the one formed by the formose process [10], can thus be dismantled into coupled functional modules, advancing the understanding of how a particular reaction chemistry induces specified behaviour at the level of the reaction network. More generally, and emphasizing the need for new approaches, it is foreseeable that the future of chemistry is closely tied to a deeper understanding of complex chemical systems [39, 20], and the skills needed to analyze such systems will become increasingly important.
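Each step of the graph grammar iteration described above is a single DPO rule application. The sketch below illustrates the chemistry-specific case discussed for Figure 2, in which the rule's morphisms are bijective on vertices: match enumeration by plain backtracking (exponential in the worst case, consistent with the NP-completeness of subgraph matching noted earlier), followed by deletion and addition of matched bonds. The encoding, the function names `matches` and `apply_rule`, and the toy keto-enol rule are illustrative assumptions, not the API of any existing tool; vertex relabelling (e.g., charge changes as in the Meisenheimer rearrangement) is not handled.

```python
# Hypothetical encoding, not the API of any particular tool: a graph is
# (atoms, bonds) with atoms = {vertex: element} and
# bonds = {frozenset({u, v}): bond order}.

def bonds(pairs):
    """Convert {(u, v): order} into the frozenset-keyed bond dictionary."""
    return {frozenset(p): o for p, o in pairs.items()}

def matches(pattern, host):
    """Enumerate injective, label-preserving embeddings of pattern into host
    by plain backtracking (a simple stand-in for matchers such as VF2)."""
    p_atoms, p_bonds = pattern
    h_atoms, h_bonds = host
    order = sorted(p_atoms)

    def extend(m):
        if len(m) == len(order):
            yield dict(m)
            return
        v = order[len(m)]
        for w, label in h_atoms.items():
            if label != p_atoms[v] or w in m.values():
                continue
            # Every pattern bond from v to an already mapped vertex must exist
            # in the host with the same bond order.
            if all(h_bonds.get(frozenset((w, m[u]))) == o
                   for e, o in p_bonds.items() if v in e
                   for u in e - {v} if u in m):
                m[v] = w
                yield from extend(m)
                del m[v]

    yield from extend({})

def apply_rule(rule, host, m):
    """One DPO step for a rule (atoms, L bonds, R bonds) whose span is bijective
    on vertices. Since no atom is created or deleted, the pushout complement D
    is the host minus the matched bonds of L that are not in K, and the result
    is D plus the bonds of R that are not in K. Atom labels stay unchanged."""
    _, left, right = rule
    context = {e for e, o in left.items() if right.get(e) == o}  # bonds of K

    def image(e):
        return frozenset(m[x] for x in e)

    h_atoms, h_bonds = host
    removed = {image(e) for e in left if e not in context}
    d = {e: o for e, o in h_bonds.items() if e not in removed}
    for e, o in right.items():
        if e in context:
            continue
        if image(e) in d:
            raise ValueError("application would create a parallel bond")
        d[image(e)] = o
    return dict(h_atoms), d

# Toy keto-enol rule H-C-C=O -> C=C-O-H on rule atoms 1..4 (K keeps only the atoms).
KETO_ENOL = (
    {1: "H", 2: "C", 3: "C", 4: "O"},
    bonds({(1, 2): 1, (2, 3): 1, (3, 4): 2}),  # L
    bonds({(2, 3): 2, (3, 4): 1, (4, 1): 1}),  # R
)
ACETALDEHYDE = (
    {0: "C", 1: "C", 2: "O", 3: "H", 4: "H", 5: "H", 6: "H"},
    bonds({(0, 1): 1, (1, 2): 2, (0, 3): 1, (0, 4): 1, (0, 5): 1, (1, 6): 1}),
)

found = list(matches(KETO_ENOL[:2], ACETALDEHYDE))
print(len(found))  # 3, one per methyl hydrogen
enol = apply_rule(KETO_ENOL, ACETALDEHYDE, found[0])

# Reversibility: the inverse rule (L and R swapped) maps the product back.
inverse = (KETO_ENOL[0], KETO_ENOL[2], KETO_ENOL[1])
back = apply_rule(inverse, enol, next(matches(inverse[:2], enol)))
print(back == ACETALDEHYDE)  # True
```

The final check mirrors the reversibility of DPO rules: swapping L and R yields a rule that undoes the transformation along the same atom map.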

Narrowing down potential pathways for prebiotic scenarios requires novel systemic approaches that allow for the investigation of large chemical reaction systems. While the development of mathematically well-grounded methods for the abstraction and coarse-graining of (concurrent) systems is a very active research area in computer science, the interdisciplinary endeavour to integrate these approaches with chemistry is more often treated as a conceptual possibility than as a predictive approach. Many of the problems to be solved in this process are computationally hard (i.e., NP-hard) but still allow for practical in silico solutions. This discrepancy led to a relatively new and successful subfield of computer science called algorithm engineering [35], one of whose goals is to bridge the gap between theoretical results and practical solutions to hard problems. Clearly, results from that field should be taken into account when large chemical systems with a plethora of underlying hard problems have to be solved.

As an illustration of the integrative potential, we sketch an example in Figure 3 (see http://mod.imada.sdu.dk for further examples). It shows how chemical space exploration based on graph transformation (rooted in graph theory, category theory, and concurrency theory), followed by solution enumeration (using diverse optimization techniques), can be applied to a reaction schema presented in [16], in which Eschenmoser describes how aldehydes act as catalysts for the hydrolysis of CN groups of the HCN tetramer. Given a set of chemical reactions p_1, ..., p_n (encoded as graph grammar rules) and a set of initial molecules, the iterative application of these rules (potentially guided by an underlying strategy for the space expansion) leads to a chemical space encoded as a hypergraph. This hypergraph is the source for solving the subsequent problem of inferring and enumerating declaratively defined reaction motifs or pathways.

[Figure 3 graphic: left column, the mechanism involving DAMN and GLX; middle column, left and right graphs of each rule; right column, the successively composed rules; molecule drawings not recoverable.]

Figure 3: Automatic inference of an overall rule by successive composition of graph transformation rules. The example is based on a sequence of reactions from [16], in which Eschenmoser describes how aldehydes act as catalysts for the hydrolysis of CN groups of the HCN tetramer. The left column depicts the mechanism as presented in [16]. An automated approach will, based on graph transformation rules, first generate a (potentially very large) chemical space (not depicted here). The depicted mechanism is then found as one of the solutions to the general question of enumerating hydrolysis pathways of HCN polymers that use glyoxylate (GLX: glyoxylate, CID 760) as catalyst. In the depicted pathway, the tetramer of HCN (DAMN: tetramer of HCN, CID 2723951) is hydrolysed. The middle column depicts the left and right graphs of each transformation rule p_1, ..., p_n that models the generalized reactions. The successively inferred composed rules, from the first rule alone (top) to the overall rule composed of all n rules (bottom), are depicted in the right column. Note that in general there can be several composed overall rules, each expressing different atom traces. In this example there is only a single one.

In Figure 3 we do not illustrate the expansion step; the depicted mechanism (left column) is, however, found automatically as one of the potentially many solutions to the general question of enumerating hydrolysis pathways of HCN polymers that use glyoxylate as catalyst. Formally, such a solution is encoded as an integer hyperflow within the underlying hypergraph. Given the depicted reaction sequence, rule composition can be exploited: the overall, more coarse-grained rule composed of all rules in the sequence is automatically inferred by consecutive composition of the simpler transformation rules p_1, ..., p_n. All intermediate steps are depicted in the right column of Figure 3. Note that the automated coarse-graining implemented by rule composition keeps track of the possible different atom traces, expressed as the atom-to-atom maps from educt to product in the composed rules. An obvious next step is therefore the analysis, as well as the design, of isotope labelling experiments based on the in silico generative chemistry approach with subsequent trace analysis.

Clearly, the illustration in Figure 3 serves only as an example. The modelling of essential chemical parameters, including kinetics and thermodynamics, is still missing. Nevertheless, the approach is already highly automated and will bring wet-lab and in silico experiments closer together. We argue that the intermediate-level theory outlined here holds promise in many fields of chemistry. In particular, we suggest that it is a plausible substrate for a predictive theory of prebiotic chemistry.

Acknowledgements

This work is supported by the Danish Council for Independent Research, Natural Sciences, the COST Action CM1304 "Emergence and Evolution of Complex Chemical Systems", and the ELSI Origins Network (EON), which is supported by a grant from the John Templeton Foundation. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.

References

[1] Tatsuya Akutsu and Hiroshi Nagamochi. Comparison and enumeration of chemical graphs. Comput Struct Biotechnol J, 5:e201302004, 2013.
[2] J. L. Andersen, C. Flamm, D. Merkle, and P. F. Stadler. Maximizing output and recognizing autocatalysis in chemical reaction networks is NP-complete. J Syst Chem, 3:1, 2012.
[3] Jakob L Andersen, Tommy Andersen, Christoph Flamm, Martin Hanczyc, Daniel Merkle, and Peter F Stadler. Navigating the chemical space of HCN polymerization and hydrolysis: Guiding graph grammars by mass spectrometry data. Entropy, 15:4066–4083, 2013.
[4] Jakob L Andersen, Christoph Flamm, Daniel Merkle, and Peter F Stadler. Inferring chemical reaction patterns using graph grammar rule composition. J Syst Chem, 4:4, 2013.
[5] Jakob L Andersen, Christoph Flamm, Daniel Merkle, and Peter F Stadler. 50 Shades of rule composition: From chemical reactions to higher levels of abstraction. In François Fages and Carla Piazza, editors, Proceedings of the 1st International Conference on Formal Methods in Macro-Biology, volume 8738 of LNCS, pages 117–135. Springer-Verlag, Berlin Heidelberg, 2014.
[6] Jakob L Andersen, Christoph Flamm, Daniel Merkle, and Peter F Stadler. Generic strategies for chemical space exploration. Int J Comp Biol Drug Des, 7:225–258, 2014.
[7] Max Born and J Robert Oppenheimer. Zur Quantentheorie der Molekülen. Ann Physik, 389:457–484, 1927.
[8] Joerg M Buescher, Maciek R Antoniewicz, Laszlo G Boros, Shawn C Burgess, Henri Brunengraber, Clary B Clish, Ralph J DeBerardinis, Olivier Feron, Christian Frezza, Bart Ghesquiere, Eyal Gottlieb, Karsten Hiller, Russell G Jones, Jurre J Kamphorst, Richard G Kibbey, Alec C Kimmelman, Jason W Locasale, Sophia Y Lunt, Oliver D K Maddocks, Craig Malloy, Christian M Metallo, Emmanuelle J Meuillet, Joshua Munger, Katharina Nöh, Joshua D Rabinowitz, Markus Ralser, Uwe Sauer, Gregory Stephanopoulos, Julie St-Pierre, Daniel A Tennant, Christoph Wittmann, Matthew G Vander Heiden, Alexei Vazquez, Karen Vousden, Jamey D Young, Nicola Zamboni, and Sarah-Maria Fendt. A roadmap for interpreting 13C metabolite labeling patterns from cells. Curr Opin Biotechnol, 34:189–201, 2015.
[9] U Burkert and NL Allinger. Molecular Mechanics, volume 177. American Chemical Society, Washington, 1982.
[10] AM Butlerov. Einiges über die chemische Structur der Körper. Z Chem, 4:549–560, 1861.
[11] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC '71, pages 151–158, New York, NY, USA, 1971. ACM.
[12] L.P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(10):1367, 2004.
[13] Luigi Pietro Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. An improved algorithm for matching large graphs. In Proc. of the 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, pages 149–159, 2001.
[14] A Corradini, T Heindel, F Hermann, and B König. Sesqui-pushout rewriting. In Graph Transformations, Third International Conference, ICGT 2006, Proceedings, volume 4178 of Lecture Notes in Computer Science, pages 30–45. Springer, 2006.
[15] A Dessmark, A Lingas, and A Proskurowski. Faster algorithms for subgraph isomorphism of k-connected partial k-trees. Algorithmica, 27:337–347, 2000.
[16] Albert Eschenmoser. On a hypothetical generational relationship between HCN and constituents of the reductive citric acid cycle. Chem Biodivers, 4:554–573, 2007.
[17] Henry Eyring. The activated complex in chemical reactions. J Chem Phys, 3:107–114, 1935.
[18] MR Garey and DS Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco, 1979.
[19] RJ Gillespie. Fifty years of the VSEPR model. Coord Chem Rev, 252:1315–1327, 2008.
[20] Chris M. Gothard, Siowling Soh, Nosheen A. Gothard, Bartlomiej Kowalczyk, Yanhu Wei, Bilge Baytekin, and Bartosz A. Grzybowski. Rewiring chemistry: Algorithmic discovery and experimental validation of one-pot reactions in the network of organic chemistry. Angewandte Chemie, 51(32):7922–7927, 2012.
[21] Dietmar Heidrich, Wolfgang Kliesch, and Wolfgang Quapp. Properties of Chemically Interesting Potential Energy Surfaces, volume 56 of Lecture Notes in Chemistry. Springer-Verlag, Berlin, 1991.
[22] James B Hendrickson. Comprehensive system for classification and nomenclature of organic reactions. J Chem Inf Comput Sci, 37:852–860, 1997.
[23] James B Hendrickson. Systematic signatures for organic reactions. J Chem Inf Model, 50:1319–1329, 2010.
[24] Roald Hoffmann. An extended Hückel theory. I. Hydrocarbons. J Chem Phys, 39:1397–1412, 1963.
[25] Tamás Horváth and Jan Ramon. Efficient frequent connected subgraph mining in graphs of bounded tree-width. Theoret Comput Sci, 411:2784–2797, 2010.
[26] Ernst Hückel. Quantentheoretische Beiträge zum Benzolproblem. I. Die Elektronenkonfiguration des Benzols und verwandter Verbindungen. Z Physik, 70:204–286, 1931.
[27] S. Kauffman. At Home in the Universe: The Search for the Laws of Self-Organization and Complexity. Oxford University Press, 1995.
[28] S. Kauffman. Question 1: Origin of life and the living state. Orig Life Evol Biosph, 37(4):315–322, 2007.
[29] Jiří Matoušek and Robin Thomas. On the complexity of finding iso- and other morphisms for partial k-trees. Disc Math, 108:343–364, 1992.
[30] JA McCammon, BR Gelin, and M Karplus. Dynamics of folded proteins. Nature, 267:585–590, 1977.
[31] J Meisenheimer. Über eine eigenartige Umlagerung des Methyl-allyl-anilin-N-oxyds. Chem Ber, 52:1667–1677, 1919.
[32] PG Mezey. Potential Energy Hypersurfaces. Elsevier, Amsterdam, 1987.
[33] Dmitrij Rappoport, Cooper J. Galvin, Dmitry Yu. Zubarev, and Alán Aspuru-Guzik. Complex chemical reaction networks from heuristics-aided quantum chemistry. J Chem Theory Comput, 10(3):897–907, 2014.
[34] Grzegorz Rozenberg, editor. Handbook of Graph Grammars and Computing by Graph Transformation. World Scientific, 1997.
[35] Peter Sanders. Algorithm engineering - an attempt at a definition. In Efficient Algorithms, volume 5760 of Lecture Notes in Computer Science, pages 321–340. Springer, 2009.
[36] DH Turner and DH Mathews. NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res, 38:280–282, 2010.
[37] Robert Ulanowicz. Ecology, the Ascendent Perspective. Columbia Univ. Press, 1997.
[38] S. Weinberg. The Quantum Theory of Fields. Cambridge University Press, 2005.
[39] George M. Whitesides. Reinventing chemistry. Angew Chem Int Ed, 54(11):3196–3209, 2015.
[40] Atsuko Yamaguchi, Kiyoko F Aoki, and Hiroshi Mamitsuka. Finding the maximum common subgraph of a partial k-tree and a graph with a polynomially bounded number of spanning trees. Inf Proc Lett, 92:57–63, 2004.
[41] M Zuker and P Stiegler. Optimal computer folding of larger RNA sequences using thermodynamics and auxiliary information.