Population-based Optimization for Kinetic Parameter Identification in Glycolytic Pathway in Saccharomyces cerevisiae
Ewelina Weglarz-Tomczak, Jakub M. Tomczak, Agoston E. Eiben, Stanley Brul
PPopulation-based Optimization for Kinetic Parameter Identification inGlycolytic Pathway in
Saccharomyces cerevisiae
Ewelina Weglarz-Tomczak a , ∗ ,1 , Jakub M. Tomczak b , ∗ ,2 , Agoston E. Eiben b and Stanley Brul a a Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, the Netherlands b Department of Computer Science, Faculty of Science, Vrije Universiteit Amsterdam, the Netherlands
A R T I C L E I N F O
Keywords :Dynamic ModelsMetabolismGlycolysisYeastEvolutionary ComputingDerivative-free Optimization
A B S T R A C T
Models in systems biology are mathematical descriptions of biological processes that are used toanswer questions and gain a better understanding of biological phenomena. Dynamic models representthe network through rates of the production and consumption for the individual species. The ordinarydifferential equations that describe rates of the reactions in the model include a set of parameters.The parameters are important quantities to understand and analyze biological systems. Moreover, theperturbation of the kinetic parameters are correlated with upregulation of the system by cell-intrinsicand cell-extrinsic factors, including mutations and the environment changes.Here, we aim at using well-established models of biological pathways to identify parameter val-ues and point their potential perturbation/deviation. We present our population-based optimizationframework that is able to identify kinetic parameters in the dynamic model based on only input andoutput data ( i.e. , timecourses of selected metabolites). Our approach can deal with the identifica-tion of the non-measurable parameters as well as with discovering deviation of the parameters. Wepresent our proposed optimization framework on the example of the well-studied glycolytic pathwayin
Saccharomyces cerevisiae .
1. Introduction
Mathematical models in systems biology provide a rep-resentation of the information obtained from experimentalobservations about the structure and function of a particularbiological network [14, 24]. The models that include dy-namics of the network typically consist of systems of ordi-nary differential equations (ODE) [41, 46]. We call themdynamic or kinetic models, and their crucial element are pa-rameters. Many parameters are generally unknown, therebyit hampers the possibility for obtaining quantitative predic-tions [3]. Kinetic parameters characterize the particular re-action catalyzed by a specific enzyme in particular condi-tions. Therefore, the deviation from the standard value couldbe correlated with a mutation, an epigenetics or a change ofthe environment. In other words, determining values of pa-rameters for the considered biological systems could be usedfor further understanding and analysis of the system.In traditional systems biology approaches the kinetic pa-rameters in a dynamic model can be identified by fitting themodel to experimental data or are measured for individualreactions separately. Such models can be used to confirmhypotheses, to draw predictions and to find those (time vary-ing) stimulation conditions that result in a particular desiredbehavior [13, 24, 44]. We propose to go a step forward and ∗ Corresponding author [email protected] (E. Weglarz-Tomczak); [email protected] (J.M. Tomczak)
ORCID (s): (E. Weglarz-Tomczak); (J.M. Tomczak); (A.E. Eiben); (S. Brul) Tel. +31 20 525 8626, Address: Science Park 904, 1098 XH, Ams-terdam, the Netherlands Tel. +31 20 598 8206, Address: De Boelelaan 1111, 1081 HV, Ams-terdam, the Netherlands we aim at using established models to predict a perturbationin the biological system and to point out the step (reaction)where it occurs via identification of the kinetic parametersof differential equations. Therefore, the goal of our work isto develop a computational-based framework for: (i) iden-tifying the non-measurable parameters so as to reproduce,insofar as it is possible, the experimental data, and (ii) pre-dicting kinetic parameters of any reaction in the biochemicalnetwork based on the timecourse of only input and outputmetabolites.
Parameter identificationInputs
Dynamic model+initial conditions Timecourse data (limited)
Outputs
PySCeSStep 1
Generation
Step 2
Evaluation
Step 3
SelectionPopulation-basedoptimization
Timecourses of all metabolitesParameters values
Figure 1:
A schematic representation of our approach.Weglarz-Tomczak et al.:
Preprint
Page 1 of 11 a r X i v : . [ q - b i o . B M ] S e p inetic Parameters Identification Among many branches of computational methods, theoptimization algorithms seemed for us to be most promisingfor achieving our goal. The optimization strategies, namely,deterministic, stochastic and heuristics have been already ap-plied in systems biology for parameter identification for dia-betes dynamics [22], biomarker identification, in-silico sim-ulations of biological phenomena, and for a variety of statis-tical inferences and time course estimations [33].In general, optimization is about finding a solution thatminimizes (or maximizes) an objective function for givenconstraints, i.e. , possible values that solutions can take. Asubset of optimization problems with non-differentiable or black-box objective functions constitute derivative-free op-timization (DFO) problems. In general, a black-box is anyprocess that for given input, returns an output, but its analyt-ical description is unavailable or it is non-differentiable [1].Moreover, the big advantage of the black-box optimizationand the derivative-free optimization is that they make almostno assumptions about the problem [7]. Therefore, they arewidely used in many domains, e.g. , in optimizing computerprograms [4], biochemical processes [11, 21, 38, 40], bio-engineering [47], or in evolutionary robotics [5].There exists a vast of derivative-free optimization (DFO)methods, ranging from classical algorithms like iterative lo-cal search or direct search [1] to modern approaches likeBayesian optimization (BO) [35] and evolutionary algorithms(EA) [2, 8]. The main drawback of classical approaches isthat they become infeasible for high-dimensional problems,and they require additional strategies like multiple starts toobtain good solutions. Bayesian optimization is currentlythe state-of-the-art approach for black-box optimization. Thisapproach combines a surrogate model with an active query-ing strategy to find high quality candidate solutions. How-ever, typically the surrogate model is non-parametric thatleads to a cubic complexity with respect to the number ofstored solutions. Therefore, BO is typically employed to op-timize expensive-to-evaluate functions. The last group ofDFO methods, evolutionary algorithms or, more generally,population-based methods [9], utilize a population of solu-tions that share information and point the search to region ofhigh potential. If evaluating the objective function is rela-tively low, and, thus, one can afford to have a large popula-tion, then these approaches give very good results in a largeclass of optimization problems. Here, since we deal with bi-ological systems in silico , we decide to follow this approachand use the population-based optimization methods for theparameter identification, see Figure 1.In this study, we chose glycolysis that is a crucial metabolicpathway and its upregulation is correlated with diseases likecancer [10, 30]. Nearly all living organisms carry out gly-colysis as a part of cellular metabolism. Glycolytic paththat consists of a series of reactions breaks down glucoseinto two three-carbon compounds and extracts energy forcellular metabolism. Therefore, glycolysis is at the heartof classical biochemistry and, as such, it is very well de-scribed. One of the most intensively studied organisms incontext of, among others, glycolysis is Saccharomyces cere- glucosev1v2fructose-1,6-biphosphateATP v11triosephosphates v3 v4 NADNADATP v5triphosphoglyceratepyruvateacetaldehydeNAD externalacetaldehydev6v7 v9 v10v8ATP
Figure 2:
The glycolysis process in the yeast
Saccharomycescerevisiae proposed in [45]. There are reactions governingthe process with parameters in total, and metabolites.Blue circles depict observable metabolites, red circles denoteunobservable metabolites, and green squares represent reac-tions. A white circle with a diagonal line corresponds to asink. The model is taken from the JWS database [27]. visiae species, also known also known as bakerâĂŹs yeast[6, 19, 20, 25, 28]. Whereas, the dynamic model of glycoly-sis in Saccharomyces cerevisiae is of big interest in systemsbiology dynamic modeling literature [12, 17, 37, 43, 45].We applied our optimization framework to a model ofglycolysis in yeast proposed in [45] that suffices to describethe essence of our research goal, see Figure 2. This modelcontains lumped reactions of the glycolytic pathway and in-cludes production of glycerol, fermentation to ethanol andexchange of acetaldehyde between the cells, and trapping ofacetaldehyde by cyanide.This paper has a multidisciplinary character. Therefore,
Weglarz-Tomczak et al.:
Preprint
Page 2 of 11inetic Parameters Identification we state research goals that are of interest for computationalbiology, systems biology and derivative-free optimization,namely:• Apply the population-based optimization methods tothe parameter identification of the glycolysis processand analyze their performance.• Determine whether it is possible to identify parame-ters if only a subset of metabolites are observed.• Determine whether it is possible to identify parame-ters if one parameter in the system is slightly changed, i.e. , in the case of a mutation.The first research goal require to implement the population-based optimization methods and combine them with an ODEsolver. Moreover, we must be able to express a biologicalmodel and process it. For this purpose, we build on top ofthe Python Simulator for Cellular Systems (PySCeS) library[26]. We also propose two surrogate-assisted population-based optimization methods to reduce computational com-plexity and enhance exploration. Further, we slightly mod-ify parameter values of the model from [45] in order to an-swer our another two research questions. Finally, we conductextensive experiments in silico and provide qualitative andquantitative analysis of the population models.The contribution of the paper is threefold:• We provide a population-based optimization frame-work for parameter identification and showcase its per-formance on the example of the glycolysis of
Saccha-romyces cerevisiae , one of the most studied species inbiology.• We analyze the performance of the population-basedoptimization framework in the considered problem andindicate its high potential for future research.• We extend the Python framework PySCeS [26] by im-plementing the population-based optimization meth-ods ( methods known in literature, and new meth-ods) in Python. The code for the methods togetherwith the experiments is available online: https://github.com/jmtomczak/popi .
2. Methods
We consider an optimization problem of a function 𝑓 ∶ 𝕏 → ℝ , where 𝕏 ⊆ ℝ 𝐷 is the search space. In this paperwe focus on the minimization problem, namely: 𝐱 ∗ = arg min 𝐱 ∈ 𝕏 𝑓 ( 𝐱 ; ) , (1)where denotes observed data.Further, we assume that the analytical form of the func-tion 𝑓 is unknown or cannot be used to calculate derivatives,however, we can query it through a simulation or experi-mental measurements. Problems of this sort are known as derivative-free or black-box optimization problems [1, 16].Additionally, we consider a bounded search space, i.e. , weinclude inequality constraints for all dimensions in the fol-lowing form: 𝑙 𝑑 ≤ 𝑥 𝑑 ≤ 𝑢 𝑑 , where 𝑙 𝑑 , 𝑢 𝑑 ∈ ℝ and 𝑙 𝑑 < 𝑢 𝑑 ,for 𝑑 = 1 , , … , 𝐷 . We consider the glycolysis process in yeast as a biochem-ical system with inputs and outputs (see Figure 2). The in-put to the system is glucose ( glu ), and the outputs are ATP( atp ), NAD ( nad ), acetaldehyde ( ac ) and external acetalde-hyde ( ace ). The other metabolites, i.e. , triose phosphates( triop ), pyruvate ( pyr ), fructose-1,6-biphosphate ( fru ) andtriphoshoglycerate ( tp ) are considered to be unobserved quan-tities. The system is governed by reactions with pa-rameters in total (see Appendix for details). Each reactionis represented by an ordinary differential equation that isknown. We assume that we have measurements of the in-puts and outputs, i.e. , glu , atp , nad , ac , and ace , and eachquantity is represented as a timecourse of length 𝑇 . We de-note these measurements by = { glu , atp , nad , ac , ace } . Further, following the nomenclature presented in [4], weconsider the system of differential equations representing theglycolysis process as the simulator that for given values ofparameters and initial conditions provides timecourses of allmetabolites. Then, we can denote parameters by 𝐱 and thesimulator by sim ∶ → ℝ 𝑇 , i.e. , sim takes parameters 𝐱 and simulates timcourses of length 𝑇 for all metabolites,including glu , atp , nad , ac , ace . In order to calculate the ob-jective (or the fitness) of the parameter values, we use thefollowing function: 𝑓 ( 𝐱 ; ) = ∑ 𝑖 =1 𝛾 ⋅ 𝑇 𝑇 ∑ 𝑡 =1 ‖ 𝐲 𝑖,𝑡 − sim 𝑖,𝑡 ( 𝐱 ) ‖ , (2)where 𝐲 𝑖,𝑡 corresponds to one of the observed metabolitesat the 𝑡 -th time step, and sim 𝑖,𝑡 ( 𝐱 ) is the corresponding syn-thetically generated signal given by the simulator with pa-rameters 𝐱 , 𝛾 > specifies the strength of penalizing a mis-take. Notice that this is the (unnormalized) logarithm ofthe product of Gaussian distributions with means given by sim( 𝐱 ) and the diagonal covariance matrix with shared vari-ance 𝛾 . One group of widely-used methods for derivative-freeoptimization problems is population-based optimization al-gorithms. The idea behind these methods is to use a popu-lation of individuals , i.e. , a collection of candidate solutions = { 𝐱 , … , 𝐱 𝑁 } , instead of a single individual in the iter-ative manner. The premise of utilizing the population over In general, a black-box problem means that a formal description of aproblem is unknown, however, very often non-differentiable problems withknown mathematical representation ( e.g. , differential equations) are treatedas black-box.
Weglarz-Tomczak et al.:
Preprint
Page 3 of 11inetic Parameters Identification
Generation: 0
Generation: 1
Generation: 2
Generation: 3
Figure 3:
An illustration of a population-based optimization of a quadratic function (bluesolid line). At each generation a population is selected (blue nodes) and weakest individualsare discarded (red crosses). New candidate solutions are generated by sampling from thenormal distribution fit to the previous population (orange solid line). a single candidate solution is to obtain better exploration ofthe search space and exploiting potential local optima [9, 8].In the essence, every population-based algorithm con-sists of three following steps that utilize a procedure for gen-erating new individuals 𝐺 , and a selection procedure 𝑆 , thatis: (Init) Initialize = { 𝐱 , … , 𝐱 𝑁 } and evaluate all in-dividuals 𝑥 = { 𝑓 𝑛 ∶ 𝑓 𝑛 = 𝑓 ( 𝑥 𝑛 ) , 𝑥 𝑛 ∈ } . (Generation) Generate new candidate solutions usingthe current population, = 𝐺 ( , 𝑥 ) . (Evaluation) Evaluate all candidates solutions: 𝑐 = { 𝑓 𝑛 ∶ 𝑓 𝑛 = 𝑓 ( 𝑥 𝑛 ) , 𝑥 𝑛 ∈ } . (Selection) Select a new population using the candi-date solutions and the old population ∶= 𝑆 ( , 𝑥 , , 𝑐 ) . Go to
Generate or terminate.An exemplary population-based optimization approach is de-picted in Figure 3.In general, the population-based optimization methodsare favorable over standard DFO algorithms in problems whenquerying the objective function is relatively cheap. If timerequired to obtain a value of the objective function (or thefitness function in the context of EA) is low, then their com-putational complexity is linear with respect to the size ofthe population 𝑁 . Bayesian Optimization, for instance, isknown to give good performance, but its complexity typi-cally scales cubicly with respect to the number of queries[35]. Here, we take advantage of very low execution timeof running a simulator (the glycolysis model) and propose touse the population-based methods for the parameter identi-fication task.There is a plethora of population-based DFO algorithms[8], however, our goal is to verify whether this approach ingeneral could be successfully used in the considered task.Therefore, we decide to choose four instances of group ofmethods that are easy-to-use and are proven to work wellin practice: evolutionary strategies (ES), differential evolu-tion (DE), estimation of distribution algorithms (EDA), andrecently proposed reversible differential evolution (RevDE). Moreover, we propose to enhance EDA and RevDE with asurrogate model to allow better exploration and speed up cal-culations. Evolutionary strategies can be seen as a specialization ofevolutionary algorithms with very specific choices of 𝐺 and 𝑆 . The core of ES is to formulate 𝐺 using the multivari-ate Gaussian distribution. Here, we follow the widely-used(1+1)-ES that generates a new candidate using the Gaussianmutation parameterized by 𝜎 > , namely: 𝐱 ′ = 𝐱 + 𝜎 ⋅ 𝜀, (3)where 𝜀 ∼ (0 , I) , and (0 , I) denotes the Gaussian dis-tribution with zero mean and the identity covariance matrix I . Next, if the fitness value of 𝐱 ′ is smaller than the value offitness function of 𝐱 , the new candidate is accepted and theold one is discarded.The crucial element of this approach is determining thevalue of 𝜎 . In order to overcome possibly time-consuminghyperparameter search, the following adaptive procedure isproposed [2]: 𝜎 ∶= ⎧⎪⎨⎪⎩ 𝜎 ⋅ 𝑐 if 𝑝 𝑠 < ,𝜎 ∕ 𝑐 if 𝑝 𝑠 > ,𝜎 if 𝑝 𝑠 = 1∕5 . (4)where 𝑝 𝑠 is the number of accepted individuals of the off-spring divided by the population size 𝑁 , and 𝑐 is equal . following the recommendation in [34]. Differential evolution is another population-based methodthat is loosely based on the Nelder-Mead method [36, 32]. Anew candidate is generated by randomly picking a triple fromthe population, ( 𝐱 𝑖 , 𝐱 𝑗 , 𝐱 𝑘 ) ∈ , and then 𝐱 𝑖 is perturbed byadding a scaled difference between 𝐱 𝑗 and 𝐱 𝑘 , that is: 𝐲 = 𝐱 𝑖 + 𝐹 ( 𝐱 𝑗 − 𝐱 𝑘 ) , (5)where 𝐹 ∈ ℝ + is the scaling factor. This operation could beseen as an adaptive mutation operator that is widely knownas differential mutation [32].Further, the authors of [36] proposed to sample a binarymask 𝐦 ∈ {0 , 𝐷 according to the Bernoulli distribution Weglarz-Tomczak et al.:
Preprint
Page 4 of 11inetic Parameters Identification with probability 𝑝 = 𝑃 ( 𝑚 𝑑 = 1) shared across all 𝐷 di-mensions, and calculate the final candidate according to thefollowing formula: 𝐯 = 𝐦 ⊙ 𝐲 + (1 − 𝐦 ) ⊙ 𝐱 𝑖 , (6)where ⊙ denotes the element-wise multiplication. In theevolutionary computation literature this operation is knownas uniform crossover operator [8]. In this paper, we fix 𝑝 =0 . following general recommendations in literature [29] anduse the uniform crossover in all methods.The last component of a population-based method is aselection mechanism. There are multiple variants of selec-tion [8], however, here we use the “survival of the fittest”approach, i.e. , we combine the old population with the newone and select 𝑁 candidates with highest fitness values, i.e. ,the deterministic ( 𝜇 + 𝜆 ) selection.This variant of DE is referred to as âĂIJDE/ rand / / bin âĂİ,where rand stands for randomly selecting a base vector, isfor adding a single perturbation and bin denotes the uniformcrossover. Sometimes it is called classic DE [32]. The mutation operator in DE perturbs candidates usingother individuals in the population to generate a single newcandidate. As a result, having too small population couldlimit exploration of the search space. In order to overcomethis issue, a modification of DE was proposed that utilizedall three individuals to generate three new points in the fol-lowing manner [39]: 𝐲 = 𝑥 𝑖 + 𝐹 ( 𝐱 𝑗 − 𝐱 𝑘 ) 𝐲 = 𝑥 𝑗 + 𝐹 ( 𝐱 𝑘 − 𝐲 ) (7) 𝐲 = 𝑥 𝑘 + 𝐹 ( 𝐲 − 𝐲 ) . New candidates 𝐲 and 𝐲 could be further used to calculateperturbations using points outside the population. This ap-proach does not follow a typical construction of an EA whereonly evaluated candidates are mutated. Further, we can ex-press (7) as a linear transformation using matrix notation byintroducing matrices as follows: ⎡⎢⎢⎣ 𝐲 𝐲 𝐲 ⎤⎥⎥⎦ = ⎡⎢⎢⎣ 𝐹 − 𝐹 − 𝐹 𝐹 𝐹 + 𝐹 𝐹 + 𝐹 − 𝐹 + 𝐹 + 𝐹 𝐹 − 𝐹 ⎤⎥⎥⎦ ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ = 𝐑 ⎡⎢⎢⎣ 𝐱 𝐱 𝐱 ⎤⎥⎥⎦ . (8)In order to obtain the matrix 𝐑 , we need to plug 𝐲 to the sec-ond and third equation in (7), and then 𝐲 to the last equationin (7). As a result, we obtain 𝑀 = 3 𝑁 new candidate so-lutions. This version of DE is called Reversible DifferentialEvolution , because the linear transformation 𝐑 is reversible[39]. Most of the population-based optimization methods aimat finding a solution and the information about the distri- bution of the search space and the fitness function is repre-sented implicitly by the population. However, this distribu-tion could be modeled explicitly using a probabilistic model[9]. These methods have become known as estimation ofdistribution algorithms [18, 23, 31].The key difference between EDA and EA is the genera-tion step. While an EA uses evolutionary operators like mu-tation and cross-over to generate new candidate solutions,EDA fits a probabilistic model to the population, and thennew individuals are sampled from this model.Therefore, fitting a distribution to the population is thecrucial part of an EDA. There are various probabilistic mod-els that could be used for this purpose. Here, we proposeto fit the multivariate Gaussian distribution ( 𝜇, Σ) to thepopulation . For this purpose, we can use the empiricalmean and the empirical covariance matrix: ̂𝜇 = 1 𝑁 𝑁 ∑ 𝑛 =1 𝐱 𝑛 , (9)and ̂ Σ = 1 𝑁 𝑁 ∑ 𝑛 =1 ( 𝐱 𝑛 − ̂𝜇 )( 𝐱 𝑛 − ̂𝜇 ) ⊤ . (10)An efficient manner of sampling new candidates is to firstcalculate the Cholesky decomposition of the covariance ma-trix, ̂ Σ = 𝐋𝐋 ⊤ , where 𝐋 is the lower-triangular matrix, andthen computing: 𝐱 ′ = 𝜇 + 𝐋 𝜀, (11)where 𝜀 ∼ (0 , I) . The Eq. 11 is repeated 𝑀 times togenerate a new set of candidate solutions. Here, we set 𝑀 to the size of the population, i.e. , 𝑀 = 𝑁 .Once new candidate solutions are generated, the selec-tion mechanism is applied. In this paper, we use the sameselection procedure as the one used for DE. A possible drawback of population-based methods is thenecessity of evaluating large populations that, even thoughwe assume a low time cost per a single evaluation, couldsignificantly slow down the whole optimization process. Inorder to overcome this issue, a surrogate model could beused to partially replace querying the fitness function [15].The surrogate model is either a probabilistic model or a ma-chine learning model that gathers previously evaluated popu-lations, and allows to mimic the behavior of the fitness func-tion. It is assumed that the computational costs is lower oreven significantly lower than the computational cost of run-ning the simulator.There are multiple possible surrogate models, however,non-parametric models, e.g. , Gaussian processes [35], arepreferable, because they do not suffer from catastrophic for-getting ( i.e. , overfitting to last population and forgetting first
Weglarz-Tomczak et al.:
Preprint
Page 5 of 11inetic Parameters Identification populations). Here, we consider K -Nearest-Neighbor ( K -NN) regression model that stores all previously seen indi-viduals with evaluations, and the prediction of a new can-didate solution is an average over 𝐾 ( e.g. , 𝐾 = 3 ) closestpreviously seen individuals. Current implementations of the K -NN regressor provide efficient search procedures that re-sult in the computational complexity better than 𝑁 ⋅ 𝐷 , e.g. ,using KD-trees results in 𝑂 ( 𝐷 log 𝑁 ) . RevDE+
In the RevDE approach we generate 𝑁 new can-didate solutions and all of them are further evaluated. How-ever, this introduces and extra computational cost of runningthe simulator. This issue could be alleviated by using the K -NN regressor to approximate the fitness values of the newcandidates. Further, we can select 𝑁 most promising points.We refer to this approach as RevDE+. EDA+
The outlined procedure of EDA produces 𝑀 newcandidate solutions and in order to keep a similar computa-tional cost as ES and DE, we set 𝑀 to 𝑁 . However, thiscould significantly limit the potential of modeling a searchspace, because sampling in high-dimensional search spacesrequires a significantly large number of point. A potentialsolution to this problem could be the application of the K -NN regressor to quickly verify of the 𝐿 new points. As longas the time cost of providing the approximated value of thefitness function is lower than the running time of the simu-lator, we can afford to take 𝐿 > 𝑁 ( e.g. , 𝐿 = 5 𝑁 ). We referto this approach as EDA+.
3. Results
Model
In order to verify whether it is possible to identifyparameters in the glycolysis process in
Saccharomyces cere-visiae by observing only a subset of metabolites, we considera model presented in [45]. The model consists of ordinarydifferential equations and reaction with parameters. Inthe Appendix A, we present details about the model, as wellas initial conditions, and parameter values measured in [45]( real parameter values ). We treat the system of ordinary dif-ferential equations as the simulator.In the original model in [45], the authors were focusedon oscillatory character of the system, therefore, they assumea constant injection of glu (see reaction 𝑣 in Figure 2, i.e. ,the glucose transporter). However, we consider other sce-nario where there is only an initial input of glucose. For thispurpose, we set 𝑘 in 𝑣 equal . Observations
In the experiments we assume only glu , atp , nad , ac , and ace are observed. We generate the observedmetabolites by running the simulator with the real parametervalues. In order to mimic real measurements that are typi-cally noisy, we add a Gaussian noise with zero mean andthe standard deviation equal of a generated value of ametabolite at a given time step. We notice that adding noiseprohibits finding a solution ( i.e. , values of parameters) thatachieves error defined in Eq. 2 equal zero. We repeat experiments three times. For each repetition,we set the length of a timecourse to 𝑇 = 30 . Two cases
Our main research goal is the parameter iden-tification of a partially observable system of multiple bio-chemical reactions. However, as highlighted in the introduc-tion, a proper identification of parameters could be used forfingerprinting normal and abnormal biochemical processes.Therefore, we aim at developing optimization methods thatallow to identify parameters of reactions that are not directlyobserved. For this purpose, we distinguish two cases. Inthe
Case 1 we use the model as described before. In the
Case 2 we assume a mutation of the reaction 𝑣 that com-bines two unobserved metabolites ( fru and triop , see Figure2). We simulate the perturbation by changing the value ofthe parameter 𝑘 from . to . . It this difficult to predicthow such relatively small modification influences the wholesystem, and, thus, it serves as an import case study for opti-mization methods. Quantitative and qualitative evaluation
In order to an-swer our research questions, we use the following evaluationmeasures. First, we monitor a convergence of the optimiza-tion methods by plotting the error in Eq. 2. We are interestedwhether an optimizer converges to a minimum, and how fastit is achieved. The speed of convergence is defined by thenumber of evaluated individuals by a population-based meth-ods.Second, we qualitatively inspect the difference betweenthe simulator output of unobserved metabolites and the realmetabolites. The qualitative evaluation is given by the Eq.2, however, it is also important to obtain an insight into howa potentially misidentified parameter result in a metabolitetimecourse.Third, since we know the real parameter values, we canalso evaluate a difference between them and the best val-ues found by the optimization methods. We use the absolutevalue of the difference of two values. We calculate the meanand the standard deviations of the difference from three runs,and use the cumulative distribution function of the foldednormal distribution to visualize the distribution of differ-ences (the ideal case is ). Hyperparameters of optimizers
For all optimization meth-ods, we set the population size to 𝑁 = 100 . All optimiz-ers run maximally generations. In the case of ES, weuse the initial value of 𝜎 equal . . For DE, RevDE, andRevDE+, we use 𝐹 = 0 . , and 𝑝 = 0 . . For EDA we take 𝑀 = 100 . In the case of EDA+ and RevDE+, we use the 𝐾 -NN as the surrogate model with 𝐾 = 3 , and we do notstore more than , evaluated individuals. Implementation details
All computational methods are im-plemented in Python using standard packages ( e.g. , numpy , scipy ). We also use the Python Simulator for Cellular Sys- The difference between two real-valued random variable is normallydistributed. However, taking the absolute value of a normally distributedrandom variable results in the folded normal distribution.
Weglarz-Tomczak et al.:
Preprint
Page 6 of 11inetic Parameters Identification A No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e DE exp1: 3.676exp2: 3.676exp3: 3.676 No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e RevDE exp1: 3.676exp2: 3.676exp3: 3.676
No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e EDA exp1: 3.676exp2: 3.677exp3: 3.679
No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e ES exp1: 3.676exp2: 3.676exp3: 3.676 No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e RevDE+ exp1: 3.676exp2: 3.676exp3: 3.676
No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e EDA+ exp1: 3.676exp2: 3.676exp3: 3.676 B No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e DE exp1: 3.676exp2: 3.676exp3: 3.676 No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e RevDE exp1: 3.676exp2: 3.676exp3: 3.676
No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e EDA exp1: 3.679exp2: 3.677exp3: 3.677
No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e ES exp1: 3.677exp2: 3.676exp3: 3.677 No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e RevDE+ exp1: 3.676exp2: 3.676exp3: 3.676
No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e EDA+ exp1: 3.676exp2: 3.676exp3: 3.676
Figure 4:
The convergence of the population-based optimization methods over runs: A Case 1, B Case 2 ( mutation ). In the legends, we indicate the value of the fitness functionafter the methods converged. tems (PySCeS) library [26] that is an extendable toolkit forthe analysis and investigation of cellular systems. It allowsto import a model represented in a human-readable manner,and solve a system of differential equation using a built-insolver. In the experiments, we downloaded the model pro-posed in [45] from the JWS Online Database [27], availableunder the following link [42]. All implemented population-based optimization methods as well as the experiments areavailable online: http://XXX . Fitness value
In Figure 4 we present convergence of themethods for Case 1 (Figure 4.A) and Case 2 (Figure 4.B).We notice that all methods were able to converge and achievevery similar fitness values. However, the (1+1)-ES methodwas slowest due to the slow exploration capabilities. EDAalso required more evaluations to obtain better results. Inter-estingly, DE, RevDE, RevDE+ and EDA+ achieved almostidentical values of the fitness function (the differences werebeyond the three digit precision). An important observationis that application of the surrogate model (the 𝐾 -NN regres- Weglarz-Tomczak et al.:
Preprint
Page 7 of 11inetic Parameters Identification A a t p RevDE+ atp (real)atp (sim)atp (3*std) 0 10 20 30time0123 t p RevDE+ tp (real)tp (sim)tp (3*std) B a t p RevDE+ atp (real)atp (sim)atp (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p RevDE+ tp (real)tp (sim)tp (3*std)
Figure 5:
A comparison of the timecourses of the selected twometabolite for RevDE+: A : Case 1, B : Case 2 ( mutation ).Real timecourses are depicted in red, and the average valueand a confidence interval ( standard deviation) over runsof the simulator are in blue. sor) allowed to speed up the convergence of RevDE+ com-pared to RevDE significantly. Moreover, in the case of EDA,the surrogate model allowed a better exploration ( 𝑀 = 5 𝑁 )and, thus, EDA+ obtained better results in a significantlyless number of evaluations. We conclude that all population-based methods were able to converge and achieved almostidentical scores, and our proposition of applying the surro-gate model led to improving RevDE and EDA Timecourses
The final value of the fitness function tellsus how well the simulator models the observed timecoursesfor given parameters provided by an optimizer. Addition-ally, we can also qualitatively inspect the timecourses boththe observed and unobserved metabolites. In Figure 5 wepresent exemplary timecourses for atp (the observed metabo-lite), and tp (the unobserved metabolite), for parameter val-ues found by RevDE+. Since all methods obtained very lowerrors close to noise in data (see Figure 4), we show the time-courses of the unobserved metabolites in Figures 7 and 8 forCase 1 and Case 2, respectively, in the Appendix B.In all cases except ES for tp , the best parameter valuesfound by the optimizers resulted in timecourses that are al-most identical to the real observations. For all unobservedmetabolites the average over repetitions of the experimentsoverlapped with the real value, or lied within the confidenceinterval ( standard deviation). This is a result that wehoped for since being able to generate unobserved metabo-lite is extremely important for analysing biological systems. Differences in parameters
In this paper, we know pre-cisely the values of the parameters since they were mea-sured in [45] and, in Case 2, we modify one parameter byhand. Hence, we can compare the parameter values found by the optimization methods with the real parameter values.In Figure 6 we present four parameters for which we see aclear difference between Case 1 and Case 2. Difference ofall parameters are included in the Appendix B, in Figures 9and 10. In general, the differences are marginal and we canconclude that all parameter values were properly identified.The biggest problems though appear for parameters that havevery large values, e.g. , 𝑘 or 𝑘 .Interestingly, we noticed some significant differences forparameter values found by the methods between Case 1 andCase 2. For instance, for 𝑘 , most methods achieved a dif-ference around in Case 1, while it was almost doubledin Case 2. This result shows that parameter identification incomplex biological systems is a challenging task and its be-havior cannot be easily predicted upfront.
4. Conclusion
Here, we propose a new optimization framework for pa-rameter identification of biological systems. We assume adynamical model of the system with initial conditions, mea-surements of selected metabolites, and an ODE solver to-gether with other tools to represent the model in machine-readable format as inputs in our framework (see Figure 1).Since the cost of solving the model is relatively low, we pro-pose to utilize the population-based optimization methodsto identify the parameters of the system. As a result, oncewe have found the parameters values, we can run simula-tions and generate timecourses of all metabolites, includingthe ones that are unobserved, and further analyze the bio-logical system. In this paper, we provide a proof-of-conceptof one of the most important metabolic processes, namely,the glycolysis pathway, of the well-studied
Saccharomycescerevisiae , known also as baker’s yeast.We outlined the general scheme of population-based op-timization methods, followed by a description of three clas-sic population-based DFO algorithms, namely, differentialequation (DE), an evolutionary strategy, and the univariateGaussian estimation of distribution algorithm (EDA). Next,we described recently published extension of the differentialevolution called Reversible Differential Evolution (RevDE)[39]. Further, in order to decrease the computational com-plexity of the RevDE, we proposed to utilize the 𝐾 -NN re-gressor as a surrogate model. Similarly, we used the 𝐾 -NNbased surrogate model to increase exploration capabilities ofthe EDA. In the experiments, we showed that all population-based methods could be successfully used to identify param-eters of a complex biological networks. However, it seemsthat too simplistic approaches ( e.g. , (1+1)-ES) could be slowand not accurate enough (see Figures 7 and 8). The surrogate-based methods indeed achieved better scores than their vanillacounterparts with almost negligible computational burden.In the introduction, we stated three research goals and inthe experiments we achieved them all. First, we applied vari-ous population-based optimization algorithms to the param-eter identification of the glycolysis pathway. In the experi-ments, we analyzed the performance of the methods and we Weglarz-Tomczak et al.:
Preprint
Page 8 of 11inetic Parameters Identification A . . . Difference in k20.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k330.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k80.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in ntot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ B . . . . . Difference in k20.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k330.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k80.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in ntot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Figure 6:
The cumulative distribution functions (cdfs) of the differences for selected pa-rameters: A : Case 1, B : Case 2 ( mutation ). Ideally, a cdf of an optimization methodshould resemble a step-function centered at . The averages and the scales are calculatedover repetitions of the experiment. noticed that: (i) all algorithms converged to (local) minima,however, ES and EDA needed more evaluations, (ii) in bothcases ( i.e. , with the real parameter values and with the mu-tation) the methods converged, (iii) enhancing RevDE andEDA with surrogate models led to speeding up convergenceand increasing exploration capabilities.Second, we show in the experiments that indeed it ispossible to identify parameter values while only a subset ofmetabolites are observed. This result is important and en-couraging for further studies with larger networks.Last, we consider a case study with a mutation of one re-action, i.e. , a value of a parameter is modified. We chose thereaction 𝑣 since it combines two unobserved metabolites( fru and triop ) that makes the problem more challenging. Inthis case, the population-based optimization methods wereable to find good solutions anyway that again is an essentialindication of usefulness of the presented optimizers.From the biological perspective, our work is among firstthat showed that the parameter identification problem of com-plex biological systems with limited access of observed metabo-lites is possible. Our paper is a positive proof-of-conceptand should be further investigated. A possible research di-rection is to consider larger dynamical models ( i.e. , morereactions and more parameters), and a more thorough anal-ysis of the parameter identification with various number ofobserved metabolites. The possible application of our ap-proach is the identification of single mutation as well as phe-notypic heterogeneity within a population. Therefore, ourmethod opens up a new perspective in the field of medicine,and biology, and may result in an innovative computational-aided diagnostic and / or analytical tool.From the computational perspective, our work indicatesa great potential of population-based optimization methods in the field of biology and biochemistry. In the case of rela-tively low computational costs of obtaining an evaluation ofparameters, the population-based methods seem to be suffi-cient to solve the parameter identification problem. More-over, our results for applying surrogate models to the opti-mizers can be highly effective. It is a well-known fact ( e.g. ,see [15]), nevertheless, we believe that the optimization withsurrogate models has a great future and should be investi-gated in more detail. CRediT authorship contribution statement
Ewelina Weglarz-Tomczak:
Conceptualization, Vali-dation, Investigation, Writing - Original Draft, Writing - Re-view & Editing.
Jakub M. Tomczak:
Methodology, Soft-ware, Investigation, Writing - Original Draft, Writing - Re-view & Editing.
Agoston E. Eiben:
Writing - Review &Editing.
Stanley Brul:
Supervision, Writing - Review &Editing.
Acknowledgments
EW-T is financed by a grant within Mobilność Plus V fromthe Polish Ministry of Science and Higher Education (GrantNo. 1639/MOB/V/2017/0).
References [1] Audet, C., Hare, W., 2017. Derivative-free and Blackbox Optimiza-tion. Springer.[2] Bäck, T., Foussette, C., Krause, P., 2013. Contemporary EvolutionStrategies. Springer.[3] Balsa-Canto, E., Alonso, A.A., Banga, J.R., 2010. An iterative iden-tification procedure for dynamic modeling of biochemical networks.BMC systems biology 4, 11.
Weglarz-Tomczak et al.:
Preprint
Page 9 of 11inetic Parameters Identification [4] Cranmer, K., Brehmer, J., Louppe, G., 2020. The frontier ofsimulation-based inference. Proceedings of the National Academyof Sciences .[5] Doncieux, S., Bredeche, N., Mouret, J.B., Eiben, A.E., 2015. Evolu-tionary robotics: what, why, and where to. Frontiers in Robotics andAI 2, 4.[6] Duarte, N.C., Herrgård, M.J., Palsson, B.Ø., 2004. Reconstructionand validation of saccharomyces cerevisiae ind750, a fully compart-mentalized genome-scale metabolic model. Genome research 14,1298–1309.[7] Eiben, A.E., Smith, J., 2015a. From evolutionary computation to theevolution of things. Nature 521, 476–482.[8] Eiben, A.E., Smith, J.E., 2015b. Introduction to Evolutionary Com-puting. volume 53. Springer.[9] Gallagher, M., Frean, M., 2005. Population-based continuous opti-mization, probabilistic modelling and mean shift. Evolutionary Com-putation 13, 29–42.[10] Gatenby, R.A., Gillies, R.J., 2004. Why do cancers have high aerobicglycolysis? Nature reviews cancer 4, 891–899.[11] Gerard, M.F., Stegmayer, G., Milone, D.H., 2013. An evolutionaryapproach for searching metabolic pathways. Computers in Biologyand Medicine 43, 1704–1712.[12] Hynne, F., Danø, S., Sørensen, P.G., 2001. Full-scale model of gly-colysis in saccharomyces cerevisiae. Biophysical chemistry 94, 121–163.[13] Ideker, T., Galitski, T., Hood, L., 2001. A new approach to decodinglife: systems biology. Annual review of genomics and human genetics2, 343–372.[14] Ingalls, B.P., 2013. Mathematical modeling in systems biology: anintroduction. MIT press.[15] Jin, Y., 2011. Surrogate-assisted evolutionary computation: Recentadvances and future challenges. Swarm and Evolutionary Computa-tion 1, 61–70.[16] Jones, D.R., Schonlau, M., Welch, W.J., 1998. Efficient global op-timization of expensive black-box functions. Journal of Global opti-mization 13, 455–492.[17] Kourdis, P.D., Goussis, D.A., 2013. Glycolysis in saccharomycescerevisiae: algorithmic exploration of robustness and origin of os-cillations. Mathematical biosciences 243, 190–214.[18] Larrañaga, P., Lozano, J.A., 2001. Estimation of distribution algo-rithms: A new tool for evolutionary computation. Springer Science& Business Media.[19] Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Ger-ber, G.K., Hannett, N.M., Harbison, C.T., Thompson, C.M., Simon,I., et al., 2002. Transcriptional regulatory networks in saccharomycescerevisiae. science 298, 799–804.[20] Mensonides, F.I., Brul, S., Hellingwerf, K.J., Bakker, B.M., Teix-eira de Mattos, M.J., 2014. A kinetic model of catabolic adaptationand protein reprofiling in saccharomyces cerevisiae during tempera-ture shifts. The FEBS Journal 281, 825–841.[21] Moles, C.G., Mendes, P., Banga, J.R., 2003. Parameter estimation inbiochemical pathways: a comparison of global optimization methods.Genome research 13, 2467–2474.[22] Morbiducci, U., Di Benedetto, G., Kautzky-Willer, A., Deriu, M.A.,Pacini, G., Tura, A., 2011. Identification of a model of non-esterifiedfatty acids dynamics through genetic algorithms: The case of womenwith a history of gestational diabetes. Computers in Biology andMedicine 41, 146–153.[23] Mühlenbein, H., Paass, G., 1996. From recombination of genes tothe estimation of distributions i. binary parameters, in: InternationalConference on Parallel Problem Solving from Nature, Springer. pp.178–187.[24] Nielsen, J., 2017. Systems biology of metabolism. Annual review ofbiochemistry 86, 245–275.[25] Nielsen, J., 2019. Yeast systems biology: model organism and cellfactory. Biotechnology journal 14, 1800421.[26] Olivier, B.G., Rohwer, J.M., Hofmeyr, J.H.S., 2005. Modelling cel-lular systems with PySCeS. Bioinformatics 21, 560–561. [27] Olivier, B.G., Snoep, J.L., 2004. Web-based kinetic modelling usingjws online. Bioinformatics 20, 2143–2144.[28] Orij, R., Urbanus, M.L., Vizeacoumar, F.J., Giaever, G., Boone, C.,Nislow, C., Brul, S., Smits, G.J., 2012. Genome-wide analysis ofintracellular ph reveals quantitative control of cell division rate by phc in saccharomyces cerevisiae. Genome biology 13, R80.[29] Pedersen, M.E.H., 2010. Good parameters for differential evolution.Technical Report HL1002. Hvass Laboratories.[30] Pelicano, H., Martin, D., Xu, R., Huang, P., 2006. Glycolysis inhibi-tion for anticancer treatment. Oncogene 25, 4633–4646.[31] Pelikan, M., Hauschild, M.W., Lobo, F.G., 2015. Estimation of dis-tribution algorithms, in: Springer Handbook of Computational Intel-ligence. Springer, pp. 899–928.[32] Price, K., Storn, R.M., Lampinen, J.A., 2006. Differential Evolution:A Practical Approach to Global Optimization. Springer Science &Business Media.[33] Reali, F., Priami, C., Marchetti, L., 2017. Optimization algorithmsfor computational systems biology. Frontiers in Applied Mathematicsand Statistics 3, 6.[34] Schwefel, H.P., 1977. Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Springer.[35] Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., De Freitas, N.,2015. Taking the human out of the loop: A review of Bayesian opti-mization. Proceedings of the IEEE 104, 148–175.[36] Storn, R., Price, K., 1997. Differential evolution–a simple and effi-cient heuristic for global optimization over continuous spaces. Journalof Global Optimization 11, 341–359.[37] Teusink, B., Passarge, J., Reijenga, C.A., Esgalhado, E., Van der Wei-jden, C.C., Schepper, M., Walsh, M.C., Bakker, B.M., Van Dam, K.,Westerhoff, H.V., et al., 2000. Can yeast glycolysis be understoodin terms of in vitro kinetics of the constituent enzymes? testing bio-chemistry. European Journal of Biochemistry 267, 5313–5329.[38] Tomczak, J.M., Węglarz-Tomczak, E., 2019. Estimating kinetic con-stants in the Michaelis–Menten model from one enzymatic assay us-ing Approximate Bayesian Computation. FEBS letters 593, 2742–2750.[39] Tomczak, J.M., Weglarz-Tomczak, E., Eiben, A.E., 2020. Differentialevolution with reversible linear transformations, in: GECCO 2020.[40] Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.P., 2009. Ap-proximate Bayesian computation scheme for parameter inference andmodel selection in dynamical systems. Journal of the Royal SocietyInterface 6, 187–202.[41] Tsiantis, N., Balsa-Canto, E., Banga, J.R., 2018. Optimality and iden-tification of dynamic models in systems biology: an inverse optimalcontrol framework. Bioinformatics 34, 2433–2440.[42] https://jjj.bio.vu.nl/models/wolf/ , 07-August-2020. [accessed on-line].[43] Van Eunen, K., Bouwman, J., Daran-Lapujade, P., Postmus, J.,Canelas, A.B., Mensonides, F.I., Orij, R., Tuzun, I., Van Den Brink,J., Smits, G.J., et al., 2010. Measuring enzyme activities under stan-dardized in vivo-like conditions for systems biology. The FEBS jour-nal 277, 749–760.[44] Westerhoff, H.V., Palsson, B.O., 2004. The evolution of molecularbiology into systems biology. Nature biotechnology 22, 1249–1252.[45] Wolf, J., Passarge, J., Somsen, O.J., Snoep, J.L., Heinrich, R., West-erhoff, H.V., 2000. Transduction of intracellular and intercellular dy-namics in yeast glycolytic oscillations. Biophysical journal 78, 1145–1153.[46] Wolkenhauer, O., Ullah, M., Kolch, W., Cho, K.H., 2004. Model-ing and simulation of intracellular dynamics: choosing an appropriateframework. IEEE transactions on nanobioscience 3, 200–207.[47] Yang, X., Cheng, X., Liu, Q., Zhang, C., Song, Y., 2020. The re-sponse surface method-genetic algorithm for identification of the lum-bar intervertebral disc material parameters. Computers in Biology andMedicine 124, 103920.
Weglarz-Tomczak et al.:
Preprint
Page 10 of 11inetic Parameters Identification
A. Appendix: The glycolysis modeldescription
In the considered model of the glycolysis we distinguishthe following metabolites:• glycolysis ( glu );• fructose-1,6-bisphosphate ( fru );• triosephosphates ( triop );• triphosphoglycerate ( tp );• pyruvate ( pyr );• acetaldehyde ( ac );• external acetaldehyde ( ace ).Following the same assumptions as in [45] ( i.e. , a homo-geneous distribution of the metabolites in the intracellularand in the extracellular solution), the systems of ordinary dif-ferential equations of the glycolysis model in Saccharomycescerevisiae is the following [42]: ̇𝑔𝑙𝑢 = 𝑣 − 𝑣 (12) ̇𝑓 𝑟𝑢 = 𝑣 − 𝑣 (13) ̇𝑡𝑟𝑖𝑜𝑝 = 2 𝑣 − 𝑣 − 𝑣 (14) ̇𝑡𝑝 = 𝑣 − 𝑣 (15) ̇𝑝𝑦𝑟 = 𝑣 − 𝑣 (16) ̇𝑎𝑐 = 𝑣 − 𝑣 − 𝑣 (17) ̇𝑎𝑐𝑒 = 0 . 𝑣 − 𝑣 (18) ̇𝑎𝑡𝑝 = −2 𝑣 + 𝑣 + 𝑣 − 𝑣 (19) ̇𝑛𝑎𝑑 = 𝑣 − 𝑣 − 𝑣 (20)with the rate equations: 𝑣 = 𝑘 (21) 𝑣 = 𝑘 ⋅ 𝑔𝑙𝑢 ⋅ 𝑎𝑡 𝑎𝑡 ∕ 𝑘 𝑖 ) 𝑛 (22) 𝑣 = 𝑘 ⋅ 𝑓 𝑟𝑢 (23) 𝑣 = 𝑘 ⋅ 𝑘 ⋅ 𝑡𝑟𝑖𝑜𝑝 ⋅ 𝑛𝑎𝑑𝐴 − 𝑘 ⋅ 𝑘 ⋅ 𝑡𝑝 ⋅ 𝑎𝑡𝑝 𝑁𝑘 ⋅ 𝑁 + 𝑘 ⋅ 𝐴 (24) 𝑣 = 𝑘 ⋅ 𝑡𝑝 ⋅ 𝐴 (25) 𝑣 = 𝑘 ⋅ 𝑝𝑦𝑟 (26) 𝑣 = 𝑘 ⋅ 𝑎𝑐 ⋅ 𝑛𝑎𝑑 (27) 𝑣 = 𝑘 ⋅ 𝑎𝑡𝑝 (28) 𝑣 = 𝑘 ⋅ 𝑡𝑟𝑖𝑜𝑝 ⋅ 𝑛𝑎𝑑 (29) 𝑣 = 𝑘 ⋅ 𝑎𝑐𝑒 (30) 𝑣 = 𝑘 ⋅ 𝑎𝑡𝑝 (31)where: 𝐴 = ( 𝑎 𝑡𝑜𝑡 − 𝑎𝑡𝑝 ) (32) 𝑁 = ( 𝑛 𝑡𝑜𝑡 − 𝑛𝑎𝑑 ) (33)The initial conditions are the following: 𝑎𝑡𝑝 = 2 . (34) 𝑛𝑎𝑑 = 0 . (35) 𝑔𝑙𝑢 = 5 . (36) 𝑓 𝑟𝑢 = 5 . (37) 𝑡𝑟𝑖𝑜𝑝 = 0 . (38) 𝑡𝑝 = 0 . (39) 𝑝𝑦𝑟 = 8 . (40) 𝑎𝑐 = 0 . (41) 𝑎𝑐𝑒 = 0 . . (42)The model is schematically depicted in Figure 2.The real values of the parameters are the following [42]: 𝑎 𝑡𝑜𝑡 = 4 ∈ [0 , (43) 𝑘 = 0 ∈ [0 , (44) 𝑘 = 550 ∈ [550 , (45) 𝑘 = 9 . , (46) 𝑘 = 323 . , (47) 𝑘 = 76411 . , (48) 𝑘 = 57823 . , (49) 𝑘 = 23 . , (50) 𝑘 = 80 ∈ [80 , (51) 𝑘 = 9 . , (52) 𝑘 = 2000 ∈ [2000 , (53) 𝑘 = 28 . , (54) 𝑘 = 85 . , (55) 𝑘 = 0 ∈ [0 , (56) 𝑘 = 375 ∈ [350 , (57) 𝑘 𝑖 = 1 ∈ [0 , (58) 𝑛 = 4 ∈ [0 , (59) 𝑛 𝑡𝑜𝑡 = 1 ∈ [0 , , (60)where we indicate the set of possible values of the parame-ters in the square brackets.We note that for the sake of our experiments, we set 𝑘 to (originally: 𝑘 = 50 [42]) in order to forbid a constantinjection of 𝑔𝑙𝑢 , and 𝑘 to (originally: 𝑘 = 80 [42]) inorder to avoid oscillatory behavior of the system. B. Appendix: Additional results
The detailed results for the timecourses of the unobservedmetabolites are presented in Figures 7 and 8.The detailed results of the differences between the realparameters and found parameters are depicted in Figures 9and 10.
Weglarz-Tomczak et al.:
Preprint
Page 11 of 11inetic Parameters Identification f r u DE fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p DE tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.8 t r i o p DE triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r DE pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u RevDE fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p RevDE tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.81.0 t r i o p RevDE triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r RevDE pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u RevDE+ fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p RevDE+ tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.8 t r i o p RevDE+ triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r RevDE+ pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u EDA fru (real)fru (sim)fru (3*std) 0 10 20 30time012 t p EDA tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.81.0 t r i o p EDA triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r EDA pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u EDA+ fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p EDA+ tp (real)tp (sim)tp (3*std) 0 10 20 30time0.000.250.500.751.00 t r i o p EDA+ triop (real)triop (sim)triop (3*std) 0 10 20 30time0246810 p y r EDA+ pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u ES fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p ES tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.8 t r i o p ES triop (real)triop (sim)triop (3*std) 0 10 20 30time0246810 p y r ES pyr (real)pyr (sim)pyr (3*std) Figure 7:
A comparison of the timecourses of the unobserved metabolites in the Case 1.Real timecourses are depicted in red, and the average value and a confidence interval ( standard deviation) over runs of the simulator are in blue. The titles of the plots indicateoptimization methods.Weglarz-Tomczak et al.: Preprint
Page 12 of 11inetic Parameters Identification f r u DE fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p DE tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p DE triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r DE pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u RevDE fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p RevDE tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p RevDE triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r RevDE pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u RevDE+ fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p RevDE+ tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p RevDE+ triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r RevDE+ pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u EDA fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.51.0 t p EDA tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p EDA triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r EDA pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u EDA+ fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p EDA+ tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p EDA+ triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r EDA+ pyr (real)pyr (sim)pyr (3*std)0 10 20 30time02468 f r u ES fru (real)fru (sim)fru (3*std) 0 10 20 30time1012 t p ES tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p ES triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r ES pyr (real)pyr (sim)pyr (3*std) Figure 8:
A comparison of the timecourses of the unobserved metabolites in the Case 2.Real timecourses are depicted in red, and the average value and a confidence interval ( standard deviation) over runs of the simulator are in blue. The titles of the plots indicateoptimization methods.Weglarz-Tomczak et al.: Preprint
Page 13 of 11inetic Parameters Identification . . . . . Difference in atot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k00.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k10.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in k20.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k310.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k320.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k330.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k340.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k40.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k50.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k60.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in k70.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k80.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in k90.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k100.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in ki0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in n0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in ntot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Figure 9:
The cumulative distribution functions (cdfs) of the differences for all parameters.Ideally, a cdf of an optimization method should resemble a step-function centered at .The averages and the scales are calculated over repetitions of the experiment in the Case1.Weglarz-Tomczak et al.: Preprint
Page 14 of 11inetic Parameters Identification
Difference in atot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k00.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k10.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in k20.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k310.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k320.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k330.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k340.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k40.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in k50.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k60.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k70.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k80.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in k90.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in k100.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Difference in ki0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in n0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in ntot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+
Figure 10: