When Machine Learning Meets Multiscale Modeling in Chemical Reactions
Wuyue Yang,a) Liangrong Peng,a) Yi Zhu, and Liu Hongb)
Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing, 100084, P.R. China
College of Mathematics and Data Science, Minjiang University, Fuzhou, 350108, P.R. China
(Dated: 2 June 2020)
Due to the intrinsic complexity and nonlinearity of chemical reactions, direct applications of traditional machine learning algorithms may face many difficulties. In this study, through two concrete examples with biological background, we illustrate how the key ideas of multiscale modeling can help to greatly reduce the computational cost of machine learning, as well as how machine learning algorithms can perform model reduction automatically in a time-scale separated system. Our study highlights the necessity and effectiveness of integrating machine learning algorithms with multiscale modeling in the study of chemical reactions.

a) These authors have contributed equally to this work.
b) Electronic mail: [email protected]

I. INTRODUCTION
Accompanied by matter synthesis and decomposition, energy storage and release, and bio-function activation and deactivation, chemical reactions play a fundamental role in multiple disciplines, including biology, chemical engineering, materials science and so on. They help to model complicated phenomena in nature by an explicit reaction network, to allow the interpretation of observed data through quantitative mathematical equations, and to translate varied experimental conditions into tunable reaction rates and reaction orders. Due to their high complexity and nonlinearity, previous studies of chemical reactions heavily relied on sophisticated mathematical analysis and first-principles calculations, like quantum chemistry.

The first mission of studies on chemical reactions is to obtain a proper mathematical model which can interpret the observed phenomena and data. Even though there are some empirical laws and some pre-knowledge of the reaction networks which may help to build the model, parameters like reaction rates are usually deeply buried inside the massive data. The recent rapid development of various machine learning algorithms, especially deep neural networks, makes inferring reaction networks and parameters possible and efficient. Mangan et al. proposed an implicit sparse identification of nonlinear dynamics to infer hidden biochemical reaction networks, with emphasis on the rational nonlinear forms of the governing dynamics. Hu et al.
constructed a so-called ODENet (short for Ordinary Differential Equations Network), which was used for explicitly modeling the Lotka-Volterra type dynamics and actin growth in the presence of medium-level noises. From a stochastic perspective, the chemical reaction system was modeled as a continuous-time Markov chain, whose propensity function was reconstructed as a combination of pre-designed basis functions based on the maximization of the log-likelihood function. Costello and Martin showed that a supervised learning method can predict the metabolic pathway dynamics from proteomics data, which may be used to design various bioengineered systems. Yang et al. revealed that the aggregation rates of amyloid proteins could be reliably estimated based on a feedforward fully connected neural network and feature selection. In organic chemistry, given some reactants and external conditions, all possible reactions were ranked by a machine learning approach, including a reactive site classifier and a ranking model, with the top-ranked mechanism corresponding to the major products. Gaussian process regression was utilized to construct the potential energy surface of the HOx system, which could reduce the computational cost and meanwhile guarantee convergence with fewer training points. For more applications of machine learning to chemical reactions, including supervised and unsupervised learning, see e.g. Refs.

The successful attempts of machine-learning-based modeling pave a new way to understand the complicated dynamics of chemical reactions. However, most chemical reactions involve plenty of reactants, multiple potential reaction routines, diverse reaction rates and so on.
Without considering this intrinsic multi-component and multiscale nature of the system, direct applications of machine learning algorithms may face inevitable difficulties (see examples below for details). Motivated by the requirements of real complex systems, especially a simultaneous maintenance of the efficiency of macroscopic models and the accuracy of microscopic models, the view of multiscale modeling is introduced. It focuses on a proper separation of the system or phenomenon into several scales with minimum overlap, a correct characterization of the relation between different levels of physical models, as well as a systematic procedure of coarse-graining. Multiscale modeling offers a unified way to examine the system of chemical reactions, by looking into the reactions occurring at different time scales and the relations between them. Therefore, it is expected that a proper integration of machine learning algorithms with the ideas and methodology of multiscale modeling and analysis will shed some light on this field. This is the major motivation of our current study.

To be concrete, we will justify our arguments from two aspects: (1) By using the explicit correspondence between mesoscopic chemical master equations and macroscopic mass-action equations in Kurtz's limit, the challenging task of learning the detailed probability distribution function (PDF) is converted into learning low-order moments. Obviously, the latter is much easier. In this case, the computational cost of direct machine learning is greatly reduced by incorporating multiscale modeling. (2) When fast and slow reactions appear simultaneously in the same system, meaning there is a time-scale separation among the chemical reactions, the ODENet, a kind of machine learning algorithm with sparse identification, shows an astonishing ability to derive simplified models under the Quasi Steady State Approximation (QSSA) automatically.
Therefore, machine learning can help to model multiscale chemical reactions too. These two examples clearly demonstrate that machine learning and multiscale modeling are closely related to each other. A proper integration of the two approaches will greatly facilitate our study of chemical reactions.

The whole paper is organized as follows. A basic architecture of the ODENet, a special kind of machine learning algorithm designed to derive the explicit form of ODEs from pre-given time series data, is introduced in Section II, along with the basic ideas and techniques of multiscale modeling and analysis for chemical reactions, including Kurtz's limit from chemical master equations to mass-action equations, and the quasi steady-state approximation. In Section III, we illustrate our key ideas through two examples: the development and differentiation of cells, as well as self-regulatory gene transcription and translation. The usefulness of an integration of machine learning and multiscale modeling can be clearly learned there. The last section contains some discussions.

II. METHODS

A. Basic Architecture of ODENet
The ordinary differential equations network was proposed as a continuous version of the famous ResNet for dealing with time series data modeled by ordinary differential equations (ODEs). Mathematically, the consecutively repeating building blocks, i.e. each layer of a residual network, can be expressed as y_{k+1} = y_k + f(y_k; θ_k), where y_k is the output of the k-th hidden layer, y_{k+1} is the output of the (k+1)-th hidden layer, and f(y_k; θ_k) represents the function of a network layer parameterized by θ_k. After a simple algebraic transformation, we get (y_{k+1} − y_k)/h = f(y_k; θ_k)/h, which is Euler's discretization scheme of the ODEs

dy/dt = f(y; θ)/h.  (1)

As a consequence, the forward propagation process of a residual network is actually equivalent to the numerical solution of a group of corresponding ordinary differential equations. Alternatively, it also means that if we use an ODE solver to solve the ODEs directly, the process of forward propagation in a residual network is accomplished too. This significant finding lays down the theoretical foundation of ODENet. The application of ODE solvers makes it easy to cope with input data with unequal time intervals, fight against medium-level noises, control the numerical errors and dynamically adjust the convergence criteria.

To enhance the ability of learning the explicit governing ODEs from the pre-given time series data, in a previous work we combined the ODENet with symbolic regression and sparse identification. Symbolic regression means the explicit form of f(y; θ) is characterized through parameters θ by expanding f(y) on a complete set of orthogonal basis functions Γ(y), i.e. f(y; θ) = θΓ(y). Consequently, learning the ODEs becomes determining the unknown parameters θ from the data. In practice, polynomials are the most often used basis functions.
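The equivalence between a residual network's forward pass and the explicit Euler scheme can be illustrated with a minimal sketch. This is plain NumPy with a toy linear layer, not the actual ODENet implementation:

```python
import numpy as np

def residual_forward(y0, f, thetas):
    """Forward pass of a residual network: y_{k+1} = y_k + f(y_k; theta_k)."""
    y = y0
    for theta in thetas:
        y = y + f(y, theta)
    return y

def euler_solve(y0, f, theta, h, n_steps):
    """Explicit Euler scheme for dy/dt = f(y; theta)/h with step size h."""
    y = y0
    for _ in range(n_steps):
        y = y + h * (f(y, theta) / h)   # one Euler step reproduces one residual block
    return y

# toy linear layer f(y; theta) = theta * y, shared across 5 blocks
f = lambda y, theta: theta * y
y_res = residual_forward(1.0, f, [0.1] * 5)
y_ode = euler_solve(1.0, f, 0.1, h=1.0, n_steps=5)
assert np.isclose(y_res, y_ode)   # the two computations coincide exactly
```

With shared parameters and unit step size the two loops are literally the same arithmetic; an adaptive ODE solver generalizes the right-hand loop to unequal time intervals.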
An integration of ODENet with multi-scale modeling in the study of chemical reactions.
The upper panel illustrates the ODENet-based learning procedure of reaction mechanisms with the help of multiscale modeling, while the lower panel gives the automatic procedure for model reduction aided by ODENet. The flowchart of ODENet is shown in the middle.

Sparse identification means that an additional regularization term ‖θ‖ is added to the loss function L, in order to remove as many redundant free parameters θ as possible, so that the loss function contains two parts:

L = ‖y − ŷ‖ + ε‖θ‖.  (2)

The first part controls the difference between the training data y and the data ŷ predicted by ODENet, while the second part aims at a minimal model according to Occam's razor. Here ε is a hyperparameter. To obtain the optimal parameters θ, the classical Back Propagation (BP) algorithm is adopted to make updates, which are repeated for many iterations until the loss function converges or falls below a threshold. Please see Fig. 1 for the flowchart of ODENet, or refer to Ref. for further details.

B. Multiscale Modeling of Chemical Reactions
Without loss of generality, we consider a chemical system with N species and M reactions,

ν_{1j}S_1 + ν_{2j}S_2 + ··· + ν_{Nj}S_N --k_j--> ν'_{1j}S_1 + ν'_{2j}S_2 + ··· + ν'_{Nj}S_N, j = 1, 2, ···, M,  (3)

where k_j > 0. The nonnegative integers {ν_{ij}} and {ν'_{ij}} denote the stoichiometric coefficients of the reactants and products, respectively. The stoichiometric matrix is introduced as U = (u_{ij})_{N×M} with elements u_{ij} = ν'_{ij} − ν_{ij}.
1. Chemical Master Equations
We focus on the molecular numbers of the species (S_1, S_2, ···, S_N), represented by a stochastic variable n = (n_1, n_2, ···, n_N)^T, in a reaction vessel of volume V. When the magnitude of n is relatively small compared with Avogadro's constant, randomness comes into play due to the intrinsic stochasticity of molecular collisions. From the perspective of ensemble average, we denote the probability of the system being in state n at time t by p(n, t), where the time-dependence is usually omitted as p(n).

With respect to the reactions in (3), the probability distribution obeys the following chemical master equations (CMEs), in a compact form,

(d/dt) p(n) = Σ_{j=1}^{M} [p(n − u_j)Φ_j(n − u_j) − p(n)Φ_j(n)],  (4)

accompanied by the initial condition p(n)|_{t=0} = p_0(n). Here u_j is the j-th column of the stoichiometric matrix U = (u_1, u_2, ···, u_M), and Φ_j(n) is the mesoscopic propensity function, characterizing the probability Φ_j(n)dt that the j-th reaction occurs once within the time interval [t, t + dt). In general, the state-dependent mesoscopic propensity function Φ_j(n) of the CMEs is assumed to follow the law of mass action,

Φ_j(n) = k_j V ∏_{l=1}^{N} V^{−ν_{lj}} C_{n_l}^{ν_{lj}},  (5)

which is the product of the rate constant k_j and a polynomial in the molecular numbers, for n_l ≥ ν_{lj}, ∀l = 1, 2, ···, N; here C_{n_l}^{ν_{lj}} denotes the binomial coefficient. When there are not enough particles to form a reactant, say S_l, such that n_l < ν_{lj}, the propensity reduces to zero, Φ_j(n) = 0.
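Equation (5), together with its large-volume behavior used later in Kurtz's limit, can be checked with a short sketch. Python's `math.comb` returns 0 whenever n_l < ν_lj, so the zero-propensity convention is automatic; all numbers below are illustrative:

```python
from math import comb, factorial

def propensity(n, nu_j, k_j, V):
    """Mesoscopic mass-action propensity of Eq. (5):
    Phi_j(n) = k_j * V * prod_l [ V^{-nu_lj} * C(n_l, nu_lj) ]."""
    phi = k_j * V
    for n_l, nu_lj in zip(n, nu_j):
        phi *= comb(n_l, nu_lj) / V ** nu_lj   # comb(n, m) = 0 whenever n < m
    return phi

# a dimerization-type channel (nu = 2): Phi = k * C(n, 2) / V
assert propensity([10], [2], 1.0, 1.0) == 45.0   # C(10, 2) = 45
assert propensity([1], [2], 1.0, 1.0) == 0.0     # too few particles: zero propensity

# the volume density V^{-1} Phi at n = c*V approaches phi(c) = k c^2 / 2! as V grows
c, k = 2.0, 1.0
phi_macro = k * c ** 2 / factorial(2)
errors = [abs(propensity([int(c * V)], [2], k, V) / V - phi_macro)
          for V in (10, 100, 1000)]
assert errors[0] > errors[1] > errors[2]          # O(1/V) convergence
```

The last three lines numerically verify the V^{-1}Φ_j(n) = φ_j(V^{-1}n) + O(V^{-1}) expansion invoked in Sec. II.B.3.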
2. Stochastic Simulations
In most cases, the chemical master equations in (4) form a huge group of ordinary differential equations, which is computationally expensive to solve, so efficient sampling algorithms are needed as an alternative. The Gillespie algorithm (GA), which generates typical time-evolutionary trajectories of species according to the reaction mechanisms and reaction rates in a stochastic way, is perhaps the most famous one.

Gillespie implemented two stochastic simulation algorithms: the direct method (DM) and the first-reaction method (FRM). These two methods are theoretically equivalent, so here we only implement the first-reaction method. The FRM generates a putative firing time for every reaction and executes the reaction with the earliest putative time, i.e. the reaction that would occur before any other.

By independently running the Gillespie algorithm again and again, statistics over the resulting stochastic trajectories converge to the correct probability distribution given by the chemical master equations. These trajectories constitute the training data set to feed into the machine learning algorithms.
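A compact sketch of the first-reaction method is given below. The birth-death network and its rates are illustrative placeholders, not the reaction systems studied later:

```python
import numpy as np

rng = np.random.default_rng(0)

def gillespie_frm(n0, stoich, propensities, t_max):
    """First-reaction method: draw a putative firing time for every reaction
    and execute the one with the earliest time, repeating until t_max."""
    t, n = 0.0, np.array(n0, dtype=int)
    times, states = [t], [n.copy()]
    while t < t_max:
        a = np.array([prop(n) for prop in propensities], dtype=float)
        mask = a > 0
        if not mask.any():                # no reaction can fire anymore
            break
        taus = np.full(len(a), np.inf)
        taus[mask] = rng.exponential(1.0 / a[mask])   # tau_j ~ Exp(a_j)
        j = int(np.argmin(taus))          # the first reaction to fire
        t += taus[j]
        n = n + stoich[:, j]
        times.append(t)
        states.append(n.copy())
    return np.array(times), np.array(states)

# toy birth-death process: 0 -> A at rate 1.0, A -> 0 at rate 0.5 * n_A
stoich = np.array([[1, -1]])
props = [lambda n: 1.0, lambda n: 0.5 * n[0]]
times, states = gillespie_frm([0], stoich, props, t_max=50.0)
```

Averaging many such independent trajectories yields the sample moments used as training data below.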
3. Moment-Closure Equations in Kurtz’s Limit
Although CMEs provide a relatively accurate way to model general chemical reaction systems, they lead to a heavy burden in both modeling and experiments, since the dimensionality of the transition matrix is usually extremely high. Moreover, the time-consuming numerical simulation of CMEs becomes a common bottleneck when the number of species or reactions is large. To make a simplification, we turn to the mean density of species,

c_i = Σ_n V^{−1} n_i p(n),  (6)

when the molecular numbers of reactants become large.

To deduce the macroscopic kinetics of the concentration c_i, we multiply (4) by the number density V^{−1} n_i and take the summation over all admissible states {n} on both sides, which yields

(d/dt) Σ_n V^{−1} n_i p(n) = Σ_{j=1}^{M} Σ_n V^{−1} n_i [p(n − u_j)Φ_j(n − u_j) − p(n)Φ_j(n)] = Σ_{j=1}^{M} u_{ij} [Σ_n V^{−1} p(n)Φ_j(n)],  (7)

where in the last step we have used the variable substitution n − u_j = n′ and neglected the boundary terms. Direct calculation shows that the volume density of the mesoscopic propensity function satisfies V^{−1}Φ_j(n) = φ_j(V^{−1}n) + O(V^{−1}), with φ_j(c) = k_j ∏_{l=1}^{N} c_l^{ν_{lj}}/ν_{lj}! being the usual macroscopic propensity function.

Taking the limit V → +∞, n → +∞ while keeping V^{−1}n finite, we arrive at the following mass-action equations (MAEs),

(d/dt) c_i(t) = Σ_{j=1}^{M} (ν'_{ij} − ν_{ij}) φ_j(c),  (8)

on the nonnegative continuous state space {c | c ∈ R^N_{≥0}}. The MAEs in (8) are the macroscopic description derived from the mesoscopic CMEs of the reaction system (3). A rigorous mathematical justification of the above limit process was first given by Kurtz in the 1970s. A similar procedure can be carried out for the high-order moments of the PDF, like the second-order variance studied in the first example in Section III.

Remark II.1
According to the results proved by Kurtz, in the limit V → +∞, for any finite time the solution of the CMEs in (4) converges in probability to the solution of the corresponding MAEs in (8), provided the initial conditions satisfy lim_{V→+∞} V^{−1} n(t = 0) = c(t = 0); this is a straightforward consequence of the Central Limit Theorem. Our derivation above from the CMEs in (4) to the MAEs in (8) for the reaction system (3) serves as a formal illustration of Kurtz's theorem.

C. Model Reduction by QSSA
Consider a very general chemical reaction system with time-scale separation, written in an abstract matrix form,

dA/dt = F(A, B),  ε dB/dt = G(A, B),  (9)

where A and B respectively stand for the slow and fast variables after some proper non-dimensionalization, and ε ≪ 1. The fast variables B can be regarded as remaining at a dynamically equilibrium state (quasi steady state) due to their fast reactive nature, meaning that approximately we have G(A, B) = 0. If B can be uniquely solved from this algebraic relation, i.e. B = g(A), the original time-scale separated dynamics can be simplified as

dA/dt = F(A, g(A)),  B = g(A).  (10)

QSSA is a very classical model reduction approach and has been widely used in the study of chemical reactions; see e.g. Ref. for details.

III. RESULTS AND DISCUSSION
In this section, through two concrete examples, the single proliferative compartment model (SPCM) of IFE (interfollicular epidermis) maintenance and a gene network with autoregulatory negative feedback, we are going to show how machine learning and multiscale modeling help each other in the study of chemical reactions.
A. Single Proliferative Compartment Model
1. The Basic Model
In the first example, the SPCM of IFE maintenance considered by Clayton et al. is adopted to illustrate how multiscale modeling helps to reduce the computational cost of machine learning when inferring the detailed reaction mechanisms and reaction rates. According to the observations by Clayton et al., the clone fate of proliferating epidermal progenitor cells (EPCs) plays an essential role in adult epidermal homeostasis, and the key clone size distribution is modeled by chemical master equations, whose explicit forms are the major goal of machine learning. By exploiting the explicit correspondence between mesoscopic chemical master equations and macroscopic mass-action equations in Kurtz's limit, the challenging task of learning the detailed probability distribution function is converted into learning low-order moments. Obviously, the latter is much easier. A similar idea has been previously applied by one of the authors to investigate the kinetics of amyloid aggregation, but without referring to machine learning.

Consider two reactant species in the single-proliferative compartment model: proliferating EPCs (denoted as A) and post-mitotic cells in the basal layer (B). There are four reactions, involving symmetric and asymmetric cell divisions. As shown in Fig. 2a, A has an unlimited self-reproduction potential (A → A + A) at a rate r_1 λ in order to maintain the epidermis, where λ is the integrated division rate of proliferating EPCs. A can also differentiate into A + B or B + B at the rate (1 − r_1 − r_2)λ or r_2 λ, respectively. In the transfer process, B cells in the basal layer leak from the clone-size distribution at rate Γ. The above reaction system reduces to the one studied by Clayton et al. with symmetric division rates r_1 = r_2.

FIG. 2.
Single proliferative compartment model. (a) The mechanism of the single proliferative compartment model. EPCs (red circles) have an unlimited self-division potential to maintain the epidermis at a rate of r_1 λ. Proliferating EPCs divide into two post-mitotic basal cells (blue stars) at a rate of r_2 λ. Asymmetric divisions of EPCs into itself and a post-mitotic basal cell occur at a rate of (1 − r_1 − r_2)λ. After mitosis in the basal layer, the post-mitotic basal cells leak at a rate of Γ. k_1 and k_2 represent the rate constants of two additional possible reactions inferred by the ODENet. 10 typical stochastic trajectories for (b) n_A and (c) n_B are generated by GA. The learned results of ODENet are compared with the training data generated by GA on the (d) average and (e) variance of cell numbers.

Here the rate constants in the SPCM are set as λ = 1.1, r_1 = 0.0836, r_2 = 0.0764, Γ = 0.31 per week, in accordance with Clayton et al. The initial PDF is taken as a delta distribution. Independent stochastic simulations are performed by using the Gillespie algorithm; they constitute the training data set fed into our following ODENet-based machine learning procedure.
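The training-data generation can be sketched directly from the four SPCM reactions. For brevity this sketch uses Gillespie's direct method, which is statistically equivalent to the FRM of Sec. II.B.2; the observation time and sample count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# SPCM channels: A->2A (r1*lam), A->A+B ((1-r1-r2)*lam), A->2B (r2*lam), B->0 (Gam)
r1, r2, lam, Gam = 0.0836, 0.0764, 1.1, 0.31          # per week, as in the text
stoich = np.array([[ 1,  0, -1,  0],                   # change in n_A per channel
                   [ 0,  1,  2, -1]])                  # change in n_B per channel

def rates(n):
    nA, nB = n
    return np.array([r1 * lam * nA, (1 - r1 - r2) * lam * nA,
                     r2 * lam * nA, Gam * nB])

def sample_at(t_obs, n0=(1, 0)):
    """One direct-method Gillespie trajectory, returning the state at t_obs."""
    t, n = 0.0, np.array(n0)
    while True:
        a = rates(n)
        a0 = a.sum()
        if a0 == 0:                       # absorbing state: clone extinct
            return n
        t += rng.exponential(1.0 / a0)    # waiting time ~ Exp(a0)
        if t > t_obs:
            return n
        j = rng.choice(4, p=a / a0)       # pick a channel with probability a_j/a0
        n = n + stoich[:, j]

samples = np.array([sample_at(5.0) for _ in range(2000)])
mean_A = samples[:, 0].mean()             # estimate of <n_A> at 5 weeks
```

Averaging such ensembles over a grid of observation times produces the moment trajectories ⟨n_A⟩(t), ⟨n_B⟩(t) fed to ODENet below.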
2. Learning Mass-Action Equations by ODENet
Here our major goal is to obtain the SPCM in Fig. 2a and the explicit rate constants. However, a direct application of ODENet to learn the time evolution of p(n_A, n_B) (or the chemical master equations) from the training data generated by stochastic simulations is prohibitive due to the heavy computational cost. Therefore, by taking advantage of the knowledge of multiscale modeling in chemical reactions, especially the explicit correspondence between mesoscopic chemical master equations and macroscopic mass-action equations in Kurtz's limit, we turn to learn the low-order moments, governed by mass-action type equations, instead of the probability distribution function.

With respect to the training data ⟨n_A⟩ = Σ n_A p(n_A, n_B) and ⟨n_B⟩ = Σ n_B p(n_A, n_B), obtained by averaging the stochastic trajectories generated through the Gillespie algorithm, we need to determine the exact types of chemical reactions involving these two reactants, their reaction orders and reaction rate constants. Without loss of generality, here we make a cutoff on the chemical reactions up to the second order, corresponding to a combination of A, B, A + A, A + B, B + B, which reads

d⟨n_A⟩/dt = α_1⟨n_A⟩ + α_2⟨n_B⟩ + α_3⟨n_A⟩² + α_4⟨n_A⟩⟨n_B⟩ + α_5⟨n_B⟩²,
d⟨n_B⟩/dt = α_6⟨n_A⟩ + α_7⟨n_B⟩ + α_8⟨n_A⟩² + α_9⟨n_A⟩⟨n_B⟩ + α_10⟨n_B⟩².  (11)

Now we implement the ODENet to learn the dynamics in (11). Clearly, not all reaction rate constants will appear in the final model; the redundant coefficients are picked out by ODENet and removed through sparse identification. After training and regression, only three non-zero coefficients (α_1, α_6 and α_7) survive.
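The sparse-identification step can be mimicked by a sequentially thresholded least-squares fit over the second-order library of (11). The sketch below uses synthetic moment trajectories generated from known first-order dynamics (the coefficient values are illustrative, only loosely inspired by the SPCM rates), not the actual GA training data:

```python
import numpy as np

def stlsq(Theta, dXdt, threshold, n_iter=10):
    """Sequentially thresholded least squares, a simple stand-in for
    the sparse identification performed inside ODENet."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        Xi[np.abs(Xi) < threshold] = 0.0              # prune small coefficients
        for k in range(dXdt.shape[1]):
            big = np.abs(Xi[:, k]) >= threshold
            if big.any():                              # refit on surviving terms
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k],
                                             rcond=None)[0]
    return Xi

# synthetic averages obeying d<nA>/dt = a <nA>, d<nB>/dt = b <nA> - g <nB>
a, b, g = 0.1, 1.09, 0.31
t = np.linspace(0.0, 20.0, 400)
nA = np.exp(a * t)                                     # <nA>(0) = 1
nB = b / (a + g) * (np.exp(a * t) - np.exp(-g * t))    # exact solution, <nB>(0) = 0
dX = np.stack([a * nA, b * nA - g * nB], axis=1)
Theta = np.stack([nA, nB, nA**2, nA * nB, nB**2], axis=1)   # library of Eq. (11)
Xi = stlsq(Theta, dX, threshold=0.02)

# only the three true first-order coefficients survive the thresholding
assert np.allclose(Xi[:, 0], [a, 0, 0, 0, 0], atol=1e-4)
assert np.allclose(Xi[:, 1], [b, -g, 0, 0, 0], atol=1e-4)
```

The quadratic library columns are present but get zero weight, which is exactly the signature used in the next subsection to conclude that only first-order reactions are active.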
3. Learning High-Order Moment Equations
During the learning procedure of ODENet, since all coefficients in front of the quadratic terms in (11) are removed, we can conclude that only first-order reactions are present in the current system. Then, with respect to the above learned dynamics and coefficients, the desired single proliferative compartment model shown in Fig. 2a is reconstructed by ODENet. However, the existence of two additional reactions, A --k_1--> ∅ and A --k_2--> B (see the orange box in Fig. 2a), could not be excluded in principle, which means that at the moment the probability distribution of A-type and B-type cells follows

(d/dt) p(n_A, n_B) = k_1[(n_A + 1)p(n_A + 1, n_B) − n_A p(n_A, n_B)]
+ k_2[(n_A + 1)p(n_A + 1, n_B − 1) − n_A p(n_A, n_B)]
+ λ[r_1(n_A − 1)p(n_A − 1, n_B) + r_2(n_A + 1)p(n_A + 1, n_B − 2)
+ (1 − r_1 − r_2)n_A p(n_A, n_B − 1) − n_A p(n_A, n_B)]
+ Γ[(n_B + 1)p(n_A, n_B + 1) − n_B p(n_A, n_B)].  (12)

Here the same notations are borrowed just for simplicity. It should be noted that at the moment we still have no precise knowledge of all six reaction rate constants k_1, k_2, r_1, r_2, λ and Γ.

To determine the unknown coefficients, we further go to the second order of the PDF, the variance of cell numbers to be exact. Based on (12), the first-order moments (averages) and second-order moments (variances) of n_A and n_B evolve according to

d⟨n_A⟩/dt = [(r_1 − r_2)λ − k_1 − k_2]⟨n_A⟩,
d⟨n_B⟩/dt = [k_2 + (1 − r_1 + r_2)λ]⟨n_A⟩ − Γ⟨n_B⟩,
dV_A/dt = β_1⟨n_A⟩ + β_2 V_A,
dV_B/dt = β_3⟨n_A⟩ + β_4⟨n_B⟩ + β_5 V_B + β_6 Cov,
dCov/dt = β_7⟨n_A⟩ + β_8 V_A + β_9 Cov,  (13)

where V_A = ⟨n_A²⟩ − ⟨n_A⟩², V_B = ⟨n_B²⟩ − ⟨n_B⟩² and Cov = ⟨n_A n_B⟩ − ⟨n_A⟩⟨n_B⟩. Furthermore, we have β_1 = k_1 + k_2 + (r_1 + r_2)λ, β_2 = −2(k_1 + k_2) + 2(r_1 − r_2)λ, β_3 = k_2 + (1 − r_1 + 3r_2)λ, β_4 = Γ, β_5 = −2Γ, β_6 = 2[(1 − r_1 + r_2)λ + k_2], β_7 = −(2r_2 λ + k_2), β_8 = (1 − r_1 + r_2)λ + k_2, β_9 = (r_1 − r_2)λ − k_1 − k_2 − Γ. Now, following the same procedure, the unknown values of the β's can be learned by ODENet; they are summarized in the SI.
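Since every α and β above is linear in the six quantities (r_1 λ, r_2 λ, k_1, k_2, λ, Γ), the rates can be recovered by least squares, as done in the next subsection. The sketch below encodes our reading of these linear relations (the row ordering of the matrix is our reconstruction, and the "true" rates are taken from the SPCM values for illustration):

```python
import numpy as np

# illustrative true rates, matching the SPCM values quoted in the text
r1, r2, lam, k1, k2, Gam = 0.0836, 0.0764, 1.1, 0.0, 0.0, 0.31

def coefficients(r1, r2, lam, k1, k2, Gam):
    """Learned coefficients of Eqs. (11) and (13) as functions of the rates."""
    return np.array([
        (r1 - r2) * lam - k1 - k2,            # alpha_1
        k2 + (1 - r1 + r2) * lam,             # alpha_6
        -Gam,                                 # alpha_7
        k1 + k2 + (r1 + r2) * lam,            # beta_1
        -2 * (k1 + k2) + 2 * (r1 - r2) * lam, # beta_2
        k2 + (1 - r1 + 3 * r2) * lam,         # beta_3
        Gam,                                  # beta_4
        -2 * Gam,                             # beta_5
        2 * ((1 - r1 + r2) * lam + k2),       # beta_6
        -(2 * r2 * lam + k2),                 # beta_7
        (1 - r1 + r2) * lam + k2,             # beta_8
        (r1 - r2) * lam - k1 - k2 - Gam,      # beta_9
    ])

# each coefficient is linear in u = (r1*lam, r2*lam, k1, k2, lam, Gamma)
V = np.array([
    [ 1, -1, -1, -1, 0,  0],   # alpha_1
    [-1,  1,  0,  1, 1,  0],   # alpha_6
    [ 0,  0,  0,  0, 0, -1],   # alpha_7
    [ 1,  1,  1,  1, 0,  0],   # beta_1
    [ 2, -2, -2, -2, 0,  0],   # beta_2
    [-1,  3,  0,  1, 1,  0],   # beta_3
    [ 0,  0,  0,  0, 0,  1],   # beta_4
    [ 0,  0,  0,  0, 0, -2],   # beta_5
    [-2,  2,  0,  2, 2,  0],   # beta_6
    [ 0, -2,  0, -1, 0,  0],   # beta_7
    [-1,  1,  0,  1, 1,  0],   # beta_8
    [ 1, -1, -1, -1, 0, -1],   # beta_9
])
b_hat = coefficients(r1, r2, lam, k1, k2, Gam)
u_hat = np.linalg.lstsq(V, b_hat, rcond=None)[0]   # least squares (V^T V)^{-1} V^T b
assert np.allclose(u_hat, [r1 * lam, r2 * lam, k1, k2, lam, Gam], atol=1e-8)
```

With noiseless coefficients the recovery is exact; with the noisy α's and β's learned from data, the same least-squares solve yields the small relative errors reported in the next subsection.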
4. Deriving Chemical Master Equations
The relations among the desired rate constants k_1, k_2, r_1, r_2, λ, Γ and the learned parameters α's and β's can be stated as a linear system,

V û = b̂,  (14)

where the vector of unknowns û = (r_1 λ, r_2 λ, k_1, k_2, λ, Γ)^T collects the rate constants, the right-hand side b̂ collects the learned coefficients α_1, α_6, α_7 and β_1, ···, β_9, and each row of the coefficient matrix V encodes one of the linear relations given in (11) and (13). Direct calculation shows that the rank of the augmented matrix, rank(V|b̂) = 7, is larger than that of the coefficient matrix, rank(V) = 6, so the overdetermined system (14) admits no exact solution. Instead, we adopt the least-squares solution û = (V^T V)^{−1} V^T b̂, whose relative errors with respect to the true values are less than 8%.

Parameters     | k_1     | k_2     | r_1    | r_2    | λ      | Γ
true value     | 0       | 0       | 0.0836 | 0.0764 | 1.1    | 0.31
learned value  | 0.0123  | −0.0197 | 0.0831 | 0.0824 | 1.1059 | 0.3074

Even though k_1 and k_2 are not exactly identified as zero, their values are about one order of magnitude smaller than the others. In this sense, we have successfully reconstructed the original SPCM based on the stochastic time trajectories of n_A and n_B in the training data set. As further validated in Fig. 3, the joint probability distribution of the desired single-proliferative compartment model is faithfully reproduced (see SI for the marginal probability distributions), which highlights the efficiency and effectiveness of our integrated approach of ODENet with multiscale modeling in the study of chemical reactions.

FIG. 3.
Comparison of PDF generated by GA with the learned results of ODENet.
Joint probability distributions p(n_A, n_B) are shown at (a,e) 1, (b,f) 5, (c,g) 10 and (d,h) 20 weeks, respectively.

B. A Gene Network with Autoregulatory Negative Feedback
1. The Basic Model
In the second example, we plan to show how machine learning can be used for model reduction, an important aspect of multiscale modeling with vast applications in chemical reactions. To illustrate our ideas, let us consider a gene network with autoregulatory negative feedback, which includes five reactants: the gene (G), mRNA (M), protein (P), and two gene-protein complexes (GP, GP_2). Among them, there are eight reactions (see Fig. 4a). k, k_s, k_dm are the rate constants of transcription from the gene G, translation into the protein P, and mRNA degradation, respectively. The gene can bind with either one or two proteins, whose forward and backward reaction rate constants are denoted as k_1, k_{-1}, k_2 and k_{-2} separately. Furthermore, it is assumed that GP produces mRNA at the same rate k as the transcription rate of G alone.
Validation of ODENet-aided model reduction. (a) A cartoon illustration of the gene network with negative feedback, including transcription, translation, degradation and a negative feedback loop. Predictions of the reduced model in (16) (blue crosses) are compared with the original model in (15) (red solid lines) on the concentrations of (b) protein and mRNA in the slow time scale, and (c) gene and (d-e) gene-protein complexes in the fast time scale.

According to the law of mass action, the deterministic dynamics of this system reads

dc_P/dt = k_s c_M − k_1 c_G c_P + k_{-1} c_GP − k_2 c_P c_GP + k_{-2} c_GP2,
dc_M/dt = k c_G + k c_GP − k_dm c_M,
dc_G/dt = −k_1 c_G c_P + k_{-1} c_GP,
dc_GP/dt = k_1 c_G c_P − k_{-1} c_GP − k_2 c_P c_GP + k_{-2} c_GP2,
dc_GP2/dt = k_2 c_P c_GP − k_{-2} c_GP2.  (15)

It is noted that the total gene concentration is a constant due to the conservation law, i.e. c_G + c_GP + c_GP2 = c_total. To produce a time-scale separation of reactions, the binding and unbinding rate constants k_1, k_{-1}, k_2, k_{-2} are chosen to be much larger than k, k_s and k_dm = 0.01, meaning that the concentrations of G, GP, GP_2 quickly reach a dynamical balance in comparison with those of mRNA and protein. The initial conditions for c_G, c_GP, c_GP2, c_P and c_M are fixed accordingly.
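As a quick consistency check, the right-hand side of (15) can be coded directly; the conservation of the total gene concentration then holds identically. The rate constants and concentrations below are illustrative placeholders, not the values used in the paper:

```python
import numpy as np

def gene_network_rhs(c, k0, ks, kdm, k1, km1, k2, km2):
    """Right-hand side of Eq. (15); state c = (c_P, c_M, c_G, c_GP, c_GP2).
    k0 is the transcription rate constant (denoted k in the text)."""
    cP, cM, cG, cGP, cGP2 = c
    bind1 = k1 * cG * cP - km1 * cGP      # net flux of G + P <-> GP
    bind2 = k2 * cP * cGP - km2 * cGP2    # net flux of GP + P <-> GP2
    return np.array([
        ks * cM - bind1 - bind2,          # dc_P/dt
        k0 * cG + k0 * cGP - kdm * cM,    # dc_M/dt (GP transcribes at the same rate)
        -bind1,                           # dc_G/dt
        bind1 - bind2,                    # dc_GP/dt
        bind2,                            # dc_GP2/dt
    ])

# hypothetical state and rates: fast binding/unbinding, slow transcription/translation
c = np.array([0.5, 0.2, 1.0, 0.3, 0.1])
dc = gene_network_rhs(c, k0=0.02, ks=0.05, kdm=0.01, k1=10.0, km1=5.0, k2=10.0, km2=6.0)

# the conservation law c_G + c_GP + c_GP2 = c_total holds along the flow
assert abs(dc[2] + dc[3] + dc[4]) < 1e-12
```

The last assertion is exact by construction, since the two binding fluxes cancel telescopically in the gene-species equations.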
2. Model Reduction by ODENet
Due to the existence of time-scale separation, it is possible to simplify the reaction system in (15). Classically, this is done by analytical methods, like the Quasi Steady-State Assumption and the Partial Equilibrium Assumption. Here, we are going to show how the simplification procedure can be carried out automatically by ODENet.
TABLE II. Pearson's correlation coefficients between the concentration derivatives dc_P/dt, dc_M/dt, dc_G/dt, dc_GP/dt and dc_GP2/dt (numerical values omitted here).

At the first step, with the help of traditional classification algorithms, like the correlation analysis based on Pearson's coefficient between concentration derivatives (see Table II), the fast and slow variables can easily be separated into two groups. Inspired by the classical results of Michaelis-Menten kinetics, we suppose the three fast variables c_G, c_GP, c_GP2 (see Fig. 4b-4e) are characterized by fractional functions, whose numerators and denominator are polynomials of c_P (up to the second order in the current study). In contrast, c_M does not appear in the fractional functions, since the last three formulas in (15) contain no terms of c_M. Consequently, the simplified model we are seeking is given by

dc_P/dt = k_s c_M − k_1 c_P H(c_G) + k_{-1} H(c_GP) − k_2 c_P H(c_GP) + k_{-2} H(c_GP2),
dc_M/dt = −k_dm c_M + k H(c_G) + k H(c_GP),
H(c_G) = Ω_1/Ω_0, H(c_GP) = Ω_2/Ω_0, H(c_GP2) = Ω_3/Ω_0,  (16)

where Ω_0 = β_0 + β_1 c_P + β_2 c_P², Ω_1 = α_1 + α_2 c_P + α_3 c_P², Ω_2 = α_4 + α_5 c_P + α_6 c_P², Ω_3 = α_7 + α_8 c_P + α_9 c_P², and β_0, β_1, β_2 and α_1, ···, α_9 are twelve free parameters to be specified.

TABLE III. Comparison of the parameters learned by ODENet with those given by QSSA.
Parameters     | β_0   | β_1   | β_2 | α_1    | α_2 | α_3
QSSA           | 0.53  | 0.67  | 1   | 0.0159 | 0   | 0
ODENet         | 0.54  | 0.66  | 1   | 0.0163 | 0   | 0
Relative error | 1.89% | 1.49% | —   | —      | —   | —
Parameters     | α_4   | α_5    | α_6 | α_7    | α_8 | α_9
QSSA           | 0     | 0.0201 | 0   | 0      | 0   | 0.03
ODENet         | 0     | 0.020  | 0   | 0      | 0   | 0.03

As summarized in Table III, the simplified model learned by ODENet is very close to that given by QSSA (see the next section). In particular, the terms α_2 c_P, α_3 c_P², α_4, α_6 c_P², α_7 and α_8 c_P are removed by sparse identification during the learning procedure. A major difference between the two simplification methods lies in the four extra binding/unbinding terms on the right-hand side of the first formula in (16). In QSSA, these four terms are exactly cancelled by each other.
During the simplification procedure aided by ODENet, in contrast, we can only conclude that their sum is quite small, rather than exactly zero (see SI).
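The derivative-correlation screening of Table II can be sketched as follows: the derivatives of two slow variables correlate strongly, while the derivative of a fast variable is dominated by high-frequency content and decorrelates from the slow ones. The signals here are toy stand-ins, purely illustrative:

```python
import numpy as np

t = np.linspace(0.0, 10.0, 1000)
slow1 = np.exp(-0.10 * t)                      # slow decay, standing in for c_P
slow2 = np.exp(-0.12 * t)                      # a second slow variable, e.g. c_M
# a fast variable slaved to slow1, with a small rapid ripple on top
fast = 1.0 / (1.0 + slow1**2) + 0.01 * np.sin(50.0 * t)

d_slow1 = np.gradient(slow1, t)
d_slow2 = np.gradient(slow2, t)
d_fast = np.gradient(fast, t)

r_slow = np.corrcoef(d_slow1, d_slow2)[0, 1]   # two slow derivatives: |r| near 1
r_mixed = np.corrcoef(d_slow1, d_fast)[0, 1]   # slow vs fast: |r| small
assert abs(r_slow) > 0.9
assert abs(r_mixed) < 0.3
```

Thresholding the pairwise |r| values then splits the species into the slow group (P, M) and the fast group (G, GP, GP_2), which is the grouping used above.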
3. Comparison with QSSA
Our above ODENet-aided model reduction is consistent with the classical quasi steady-state approximation. Since G, GP, GP_2 are considered as the fast intermediates, in contrast to the slow species P and M, a direct application of QSSA to (15) leads to

c_G/c_total = K_3/Ω, c_GP/c_total = K_2 c_P/Ω, c_GP2/c_total = c_P²/Ω,  (17)

where Ω = K_3 + K_2 c_P + c_P², K_1 = k_{-1}/k_1, K_2 = k_{-2}/k_2, K_3 = k_{-1}k_{-2}/(k_1 k_2) = K_1 K_2, and c_total = c_G + c_GP + c_GP2 is a constant. The corresponding reduced equations are

dc_P/dt = k_s c_M,
dc_M/dt = −k_dm c_M + k(K_3 + K_2 c_P)c_total/Ω,
c_G/c_total = K_3/Ω, c_GP/c_total = K_2 c_P/Ω, c_GP2/c_total = c_P²/Ω,  (18)

which have been used to evaluate the performance of our ODENet-aided model reduction.

IV. CONCLUSION
Nowadays, various machine learning algorithms, like deep learning and reinforcement learning, have found applications in diverse fields with great success. In the field of chemical reactions, however, related studies have only begun to emerge and are still quite few. In the current paper, through two concrete biochemical examples, the single proliferative compartment model and a gene network with autoregulatory negative feedback, we present our key ideas on how machine learning and multiscale modeling can help each other in the study of chemical reactions. As we believe, an effective integration of the two approaches will be crucial for the success of related studies in this direction.

Potential generalizations of our current work include but are not limited to:

(1)
The spatial heterogeneity of chemical reactions.
In the current study, all reactions are assumed to proceed under well-mixed conditions, which means we can adopt a relatively simple ODE-based description. However, it is well known that spatial heterogeneity can produce far more complicated and also interesting phenomena, like Turing patterns, phase separation, active matter, etc. How to generalize our results to PDEs would therefore be of general interest. Recently, PDE-based machine learning algorithms have shed light on this aspect.

(2)
Bistability, oscillation, and bifurcation of chemical reactions.
Even restricted to ODEs, a chemical reaction system can possess more complex dynamical behaviors than one can imagine, like bistability, oscillation, bifurcation, blow-up, etc. In the presence of noise, the situation becomes even more complicated. The high nonlinearity of chemical reactions poses great challenges to our ODENet-based model derivation and model reduction.

(3)
Extension to other model reduction methods.
Here we test the possibility and accuracy of ODENet-aided model reduction with respect to the QSSA method. Extension of our ideas to the partial equilibrium approximation, the maximum entropy principle, maximum likelihood estimation, as well as other statistics- or probability-based approximations, would be worthy of further study.

ACKNOWLEDGMENT
This work was supported by the National Science Foundation of China (Grant Nos. 21877070 and 11871299), the Startup Research Funding of Minjiang University (mjy19033), and the Special Project of COVID-19 Epidemic Prevention and Control by Fuzhou Science and Technology Bureau (2020-XG-002). The authors would like to thank Dr. Pipi Hu for helpful discussions.
AIP PUBLISHING DATA SHARING POLICY
All the data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES

T. Janos and E. Peter, Mathematical Models of Chemical Reactions: Theory and Applications of Deterministic and Stochastic Models (Princeton University Press, Princeton, 1989).
K. H. Johnson, "Quantum chemistry," Annual Review of Physical Chemistry, 39–57 (2003).
N. M. Mangan, S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Inferring biological networks by sparse identification of nonlinear dynamics," IEEE Transactions on Molecular, Biological and Multi-Scale Communications, 52–63 (2016).
P. Hu, W. Yang, Y. Zhu, and L. Hong, "Revealing hidden dynamics from time-series data by ODENet," arXiv preprint arXiv:2005.04849 (2020).
W. Zhang, S. Klus, T. Conrad, and C. Schütte, "Learning chemical reaction networks from trajectory data," SIAM Journal on Applied Dynamical Systems, 2000–2046 (2019).
Z. Costello and H. G. Martin, "A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data," NPJ Systems Biology and Applications, 19 (2018).
W. Yang, P. Tan, X. Fu, and L. Hong, "Prediction of amyloid aggregation rates by machine learning and feature selection," The Journal of Chemical Physics, 084106 (2019).
M. A. Kayala, C.-A. Azencott, J. H. Chen, and P. Baldi, "Learning to predict chemical reactions," Journal of Chemical Information and Modeling, 2209–2222 (2011).
Q. Song, Q. Zhang, and Q. Meng, "Revisiting the Gaussian process regression for fitting high-dimensional potential energy surface and its application to the OH + HO2 → O2 + H2O reaction," The Journal of Chemical Physics, 134309 (2020).
A. F. Villaverde and J. R. Banga, "Reverse engineering and identification in systems biology: strategies, perspectives and challenges," Journal of the Royal Society Interface, 20130505 (2014).
S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proceedings of the National Academy of Sciences, 3932–3937 (2016).
L. Boninsegna, F. Nüske, and C. Clementi, "Sparse learning of stochastic dynamical equations," The Journal of Chemical Physics, 241723 (2018).
K. Choi, "Robust approaches to generating reliable predictive models in systems biology," in Systems Biology. RNA Technologies, edited by N. Rajewsky, S. Jurga, and J. Barciszewski (Springer, Cham, 2018) pp. 301–312.
B. C. Daniels and I. Nemenman, "Automated adaptive inference of phenomenological dynamical models," Nature Communications, 8133 (2015).
E. Weinan, Principles of Multiscale Modeling (Cambridge University Press, Cambridge, 2011).
R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, "Neural ordinary differential equations," in Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Curran Associates, Inc., 2018) pp. 6571–6583.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, 541–551 (1989).
H. G. Othmer, "Analysis of complex reaction networks," Lecture Notes, School of Mathematics, University of Minnesota (2003).
D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," The Journal of Physical Chemistry, 2340–2361 (1977).
T. G. Kurtz, "The relationship between stochastic and deterministic models for chemical reactions," The Journal of Chemical Physics, 2976–2978 (1972).
L. A. Segel and M. Slemrod, "The quasi-steady-state assumption: A case study in perturbation," SIAM Review, 446–477 (1989).
E. Clayton, D. P. Doupé, A. M. Klein, D. J. Winton, B. D. Simons, and P. H. Jones, "A single type of progenitor cell maintains normal epidermis," Nature, 185–189 (2007).
L. Hong and W.-A. Yong, "Simple moment-closure model for the self-assembly of breakable amyloid filaments," Biophysical Journal, 533–540 (2013).
P. Tan and L. Hong, "Modeling fibril fragmentation in real-time," Journal of Chemical Physics, 084904 (2013).
Y. J. Huang, L. Hong, and W. A. Yong, "Partial equilibrium approximations in apoptosis. II. The death-inducing signaling complex subsystem," Mathematical Biosciences, 126–134 (2015).
S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Data-driven discovery of partial differential equations," Science Advances, e1602614 (2017).
M. Raissi, P. Perdikaris, and G. Karniadakis, "Numerical Gaussian processes for time-dependent and nonlinear partial differential equations," SIAM Journal on Scientific Computing, A172–A198 (2017).
H. Qian, P.-Z. Shi, and J. Xing, "Stochastic bifurcation, slow fluctuations, and bistability as an origin of biochemical complexity," Physical Chemistry Chemical Physics, 4861 (2009).
L. Bishop and H. Qian, "Stochastic bistability and bifurcation in a mesoscopic signaling system with autocatalytic kinase," Biophysical Journal, 1–11 (2010).
E. T. Jaynes, "Information theory and statistical mechanics," Physical Review, 620 (1957).
F. E. Harrell, "Overview of maximum likelihood estimation," in