[PDF] Population-based Optimization for Kinetic Parameter Identification in Glycolytic Pathway in Saccharomyces cerevisiae

Abstract

Models in systems biology are mathematical descriptions of biological processes that are used to answer questions and gain a better understanding of biological phenomena. Dynamic models represent the network through rates of the production and consumption for the individual species. The ordinary differential equations that describe rates of the reactions in the model include a set of parameters. The parameters are important quantities to understand and analyze biological systems. Moreover, the perturbation of the kinetic parameters are correlated with upregulation of the system by cell-intrinsic and cell-extrinsic factors, including mutations and the environment changes. Here, we aim at using well-established models of biological pathways to identify parameter values and point their potential perturbation/deviation. We present our population-based optimization framework that is able to identify kinetic parameters in the dynamic model based on only input and output data (i.e., timecourses of selected metabolites). Our approach can deal with the identification of the non-measurable parameters as well as with discovering deviation of the parameters. We present our proposed optimization framework on the example of the well-studied glycolytic pathway in Saccharomyces cerevisiae.

Full PDF

PPopulation-based Optimization for Kinetic Parameter Identiﬁcation inGlycolytic Pathway in

Saccharomyces cerevisiae

Ewelina Weglarz-Tomczak a , ∗ ,1 , Jakub M. Tomczak b , ∗ ,2 , Agoston E. Eiben b and Stanley Brul a a Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, the Netherlands b Department of Computer Science, Faculty of Science, Vrije Universiteit Amsterdam, the Netherlands

A R T I C L E I N F O

Keywords :Dynamic ModelsMetabolismGlycolysisYeastEvolutionary ComputingDerivative-free Optimization

A B S T R A C T

Models in systems biology are mathematical descriptions of biological processes that are used toanswer questions and gain a better understanding of biological phenomena. Dynamic models representthe network through rates of the production and consumption for the individual species. The ordinarydiﬀerential equations that describe rates of the reactions in the model include a set of parameters.The parameters are important quantities to understand and analyze biological systems. Moreover, theperturbation of the kinetic parameters are correlated with upregulation of the system by cell-intrinsicand cell-extrinsic factors, including mutations and the environment changes.Here, we aim at using well-established models of biological pathways to identify parameter val-ues and point their potential perturbation/deviation. We present our population-based optimizationframework that is able to identify kinetic parameters in the dynamic model based on only input andoutput data ( i.e. , timecourses of selected metabolites). Our approach can deal with the identiﬁca-tion of the non-measurable parameters as well as with discovering deviation of the parameters. Wepresent our proposed optimization framework on the example of the well-studied glycolytic pathwayin

Saccharomyces cerevisiae .

1. Introduction

Mathematical models in systems biology provide a rep-resentation of the information obtained from experimentalobservations about the structure and function of a particularbiological network [14, 24]. The models that include dy-namics of the network typically consist of systems of ordi-nary diﬀerential equations (ODE) [41, 46]. We call themdynamic or kinetic models, and their crucial element are pa-rameters. Many parameters are generally unknown, therebyit hampers the possibility for obtaining quantitative predic-tions [3]. Kinetic parameters characterize the particular re-action catalyzed by a speciﬁc enzyme in particular condi-tions. Therefore, the deviation from the standard value couldbe correlated with a mutation, an epigenetics or a change ofthe environment. In other words, determining values of pa-rameters for the considered biological systems could be usedfor further understanding and analysis of the system.In traditional systems biology approaches the kinetic pa-rameters in a dynamic model can be identiﬁed by ﬁtting themodel to experimental data or are measured for individualreactions separately. Such models can be used to conﬁrmhypotheses, to draw predictions and to ﬁnd those (time vary-ing) stimulation conditions that result in a particular desiredbehavior [13, 24, 44]. We propose to go a step forward and ∗ Corresponding author [email protected] (E. Weglarz-Tomczak); [email protected] (J.M. Tomczak)

ORCID (s): (E. Weglarz-Tomczak); (J.M. Tomczak); (A.E. Eiben); (S. Brul) Tel. +31 20 525 8626, Address: Science Park 904, 1098 XH, Ams-terdam, the Netherlands Tel. +31 20 598 8206, Address: De Boelelaan 1111, 1081 HV, Ams-terdam, the Netherlands we aim at using established models to predict a perturbationin the biological system and to point out the step (reaction)where it occurs via identiﬁcation of the kinetic parametersof diﬀerential equations. Therefore, the goal of our work isto develop a computational-based framework for: (i) iden-tifying the non-measurable parameters so as to reproduce,insofar as it is possible, the experimental data, and (ii) pre-dicting kinetic parameters of any reaction in the biochemicalnetwork based on the timecourse of only input and outputmetabolites.

Parameter identificationInputs

Dynamic model+initial conditions Timecourse data (limited)

Outputs

PySCeSStep 1

Generation

Step 2

Evaluation

Step 3

SelectionPopulation-basedoptimization

Timecourses of all metabolitesParameters values

Figure 1:

A schematic representation of our approach.Weglarz-Tomczak et al.:

Preprint

Page 1 of 11 a r X i v : . [ q - b i o . B M ] S e p inetic Parameters Identiﬁcation Among many branches of computational methods, theoptimization algorithms seemed for us to be most promisingfor achieving our goal. The optimization strategies, namely,deterministic, stochastic and heuristics have been already ap-plied in systems biology for parameter identiﬁcation for dia-betes dynamics [22], biomarker identiﬁcation, in-silico sim-ulations of biological phenomena, and for a variety of statis-tical inferences and time course estimations [33].In general, optimization is about ﬁnding a solution thatminimizes (or maximizes) an objective function for givenconstraints, i.e. , possible values that solutions can take. Asubset of optimization problems with non-diﬀerentiable or black-box objective functions constitute derivative-free op-timization (DFO) problems. In general, a black-box is anyprocess that for given input, returns an output, but its analyt-ical description is unavailable or it is non-diﬀerentiable [1].Moreover, the big advantage of the black-box optimizationand the derivative-free optimization is that they make almostno assumptions about the problem [7]. Therefore, they arewidely used in many domains, e.g. , in optimizing computerprograms [4], biochemical processes [11, 21, 38, 40], bio-engineering [47], or in evolutionary robotics [5].There exists a vast of derivative-free optimization (DFO)methods, ranging from classical algorithms like iterative lo-cal search or direct search [1] to modern approaches likeBayesian optimization (BO) [35] and evolutionary algorithms(EA) [2, 8]. The main drawback of classical approaches isthat they become infeasible for high-dimensional problems,and they require additional strategies like multiple starts toobtain good solutions. Bayesian optimization is currentlythe state-of-the-art approach for black-box optimization. Thisapproach combines a surrogate model with an active query-ing strategy to ﬁnd high quality candidate solutions. How-ever, typically the surrogate model is non-parametric thatleads to a cubic complexity with respect to the number ofstored solutions. Therefore, BO is typically employed to op-timize expensive-to-evaluate functions. The last group ofDFO methods, evolutionary algorithms or, more generally,population-based methods [9], utilize a population of solu-tions that share information and point the search to region ofhigh potential. If evaluating the objective function is rela-tively low, and, thus, one can aﬀord to have a large popula-tion, then these approaches give very good results in a largeclass of optimization problems. Here, since we deal with bi-ological systems in silico , we decide to follow this approachand use the population-based optimization methods for theparameter identiﬁcation, see Figure 1.In this study, we chose glycolysis that is a crucial metabolicpathway and its upregulation is correlated with diseases likecancer [10, 30]. Nearly all living organisms carry out gly-colysis as a part of cellular metabolism. Glycolytic paththat consists of a series of reactions breaks down glucoseinto two three-carbon compounds and extracts energy forcellular metabolism. Therefore, glycolysis is at the heartof classical biochemistry and, as such, it is very well de-scribed. One of the most intensively studied organisms incontext of, among others, glycolysis is Saccharomyces cere- glucosev1v2fructose-1,6-biphosphateATP v11triosephosphates v3 v4 NADNADATP v5triphosphoglyceratepyruvateacetaldehydeNAD externalacetaldehydev6v7 v9 v10v8ATP

Figure 2:

The glycolysis process in the yeast

Saccharomycescerevisiae proposed in [45]. There are reactions governingthe process with parameters in total, and metabolites.Blue circles depict observable metabolites, red circles denoteunobservable metabolites, and green squares represent reac-tions. A white circle with a diagonal line corresponds to asink. The model is taken from the JWS database [27]. visiae species, also known also known as bakerâĂŹs yeast[6, 19, 20, 25, 28]. Whereas, the dynamic model of glycoly-sis in Saccharomyces cerevisiae is of big interest in systemsbiology dynamic modeling literature [12, 17, 37, 43, 45].We applied our optimization framework to a model ofglycolysis in yeast proposed in [45] that suﬃces to describethe essence of our research goal, see Figure 2. This modelcontains lumped reactions of the glycolytic pathway and in-cludes production of glycerol, fermentation to ethanol andexchange of acetaldehyde between the cells, and trapping ofacetaldehyde by cyanide.This paper has a multidisciplinary character. Therefore,

Weglarz-Tomczak et al.:

Preprint

Page 2 of 11inetic Parameters Identiﬁcation we state research goals that are of interest for computationalbiology, systems biology and derivative-free optimization,namely:• Apply the population-based optimization methods tothe parameter identiﬁcation of the glycolysis processand analyze their performance.• Determine whether it is possible to identify parame-ters if only a subset of metabolites are observed.• Determine whether it is possible to identify parame-ters if one parameter in the system is slightly changed, i.e. , in the case of a mutation.The ﬁrst research goal require to implement the population-based optimization methods and combine them with an ODEsolver. Moreover, we must be able to express a biologicalmodel and process it. For this purpose, we build on top ofthe Python Simulator for Cellular Systems (PySCeS) library[26]. We also propose two surrogate-assisted population-based optimization methods to reduce computational com-plexity and enhance exploration. Further, we slightly mod-ify parameter values of the model from [45] in order to an-swer our another two research questions. Finally, we conductextensive experiments in silico and provide qualitative andquantitative analysis of the population models.The contribution of the paper is threefold:• We provide a population-based optimization frame-work for parameter identiﬁcation and showcase its per-formance on the example of the glycolysis of

Saccha-romyces cerevisiae , one of the most studied species inbiology.• We analyze the performance of the population-basedoptimization framework in the considered problem andindicate its high potential for future research.• We extend the Python framework PySCeS [26] by im-plementing the population-based optimization meth-ods ( methods known in literature, and new meth-ods) in Python. The code for the methods togetherwith the experiments is available online: https://github.com/jmtomczak/popi .

2. Methods

We consider an optimization problem of a function 𝑓 ∶ 𝕏 → ℝ , where 𝕏 ⊆ ℝ 𝐷 is the search space. In this paperwe focus on the minimization problem, namely: 𝐱 ∗ = arg min 𝐱 ∈ 𝕏 𝑓 ( 𝐱 ;  ) , (1)where  denotes observed data.Further, we assume that the analytical form of the func-tion 𝑓 is unknown or cannot be used to calculate derivatives,however, we can query it through a simulation or experi-mental measurements. Problems of this sort are known as derivative-free or black-box optimization problems [1, 16].Additionally, we consider a bounded search space, i.e. , weinclude inequality constraints for all dimensions in the fol-lowing form: 𝑙 𝑑 ≤ 𝑥 𝑑 ≤ 𝑢 𝑑 , where 𝑙 𝑑 , 𝑢 𝑑 ∈ ℝ and 𝑙 𝑑 < 𝑢 𝑑 ,for 𝑑 = 1 , , … , 𝐷 . We consider the glycolysis process in yeast as a biochem-ical system with inputs and outputs (see Figure 2). The in-put to the system is glucose ( glu ), and the outputs are ATP( atp ), NAD ( nad ), acetaldehyde ( ac ) and external acetalde-hyde ( ace ). The other metabolites, i.e. , triose phosphates( triop ), pyruvate ( pyr ), fructose-1,6-biphosphate ( fru ) andtriphoshoglycerate ( tp ) are considered to be unobserved quan-tities. The system is governed by reactions with pa-rameters in total (see Appendix for details). Each reactionis represented by an ordinary diﬀerential equation that isknown. We assume that we have measurements of the in-puts and outputs, i.e. , glu , atp , nad , ac , and ace , and eachquantity is represented as a timecourse of length 𝑇 . We de-note these measurements by  = { glu , atp , nad , ac , ace } . Further, following the nomenclature presented in [4], weconsider the system of diﬀerential equations representing theglycolysis process as the simulator that for given values ofparameters and initial conditions provides timecourses of allmetabolites. Then, we can denote parameters by 𝐱 and thesimulator by sim ∶  → ℝ 𝑇 , i.e. , sim takes parameters 𝐱 and simulates timcourses of length 𝑇 for all metabolites,including glu , atp , nad , ac , ace . In order to calculate the ob-jective (or the ﬁtness) of the parameter values, we use thefollowing function: 𝑓 ( 𝐱 ;  ) = ∑ 𝑖 =1 𝛾 ⋅ 𝑇 𝑇 ∑ 𝑡 =1 ‖ 𝐲 𝑖,𝑡 − sim 𝑖,𝑡 ( 𝐱 ) ‖ , (2)where 𝐲 𝑖,𝑡 corresponds to one of the observed metabolitesat the 𝑡 -th time step, and sim 𝑖,𝑡 ( 𝐱 ) is the corresponding syn-thetically generated signal given by the simulator with pa-rameters 𝐱 , 𝛾 > speciﬁes the strength of penalizing a mis-take. Notice that this is the (unnormalized) logarithm ofthe product of Gaussian distributions with means given by sim( 𝐱 ) and the diagonal covariance matrix with shared vari-ance 𝛾 . One group of widely-used methods for derivative-freeoptimization problems is population-based optimization al-gorithms. The idea behind these methods is to use a popu-lation of individuals , i.e. , a collection of candidate solutions  = { 𝐱 , … , 𝐱 𝑁 } , instead of a single individual in the iter-ative manner. The premise of utilizing the population over In general, a black-box problem means that a formal description of aproblem is unknown, however, very often non-diﬀerentiable problems withknown mathematical representation ( e.g. , diﬀerential equations) are treatedas black-box.

Weglarz-Tomczak et al.:

Preprint

Page 3 of 11inetic Parameters Identiﬁcation

Generation: 0

Generation: 1

Generation: 2

Generation: 3

Figure 3:

An illustration of a population-based optimization of a quadratic function (bluesolid line). At each generation a population is selected (blue nodes) and weakest individualsare discarded (red crosses). New candidate solutions are generated by sampling from thenormal distribution ﬁt to the previous population (orange solid line). a single candidate solution is to obtain better exploration ofthe search space and exploiting potential local optima [9, 8].In the essence, every population-based algorithm con-sists of three following steps that utilize a procedure for gen-erating new individuals 𝐺 , and a selection procedure 𝑆 , thatis: (Init) Initialize  = { 𝐱 , … , 𝐱 𝑁 } and evaluate all in-dividuals  𝑥 = { 𝑓 𝑛 ∶ 𝑓 𝑛 = 𝑓 ( 𝑥 𝑛 ) , 𝑥 𝑛 ∈  } . (Generation) Generate new candidate solutions usingthe current population,  = 𝐺 (  ,  𝑥 ) . (Evaluation) Evaluate all candidates solutions:  𝑐 = { 𝑓 𝑛 ∶ 𝑓 𝑛 = 𝑓 ( 𝑥 𝑛 ) , 𝑥 𝑛 ∈  } . (Selection) Select a new population using the candi-date solutions and the old population  ∶= 𝑆 (  ,  𝑥 ,  ,  𝑐 ) . Go to

Generate or terminate.An exemplary population-based optimization approach is de-picted in Figure 3.In general, the population-based optimization methodsare favorable over standard DFO algorithms in problems whenquerying the objective function is relatively cheap. If timerequired to obtain a value of the objective function (or theﬁtness function in the context of EA) is low, then their com-putational complexity is linear with respect to the size ofthe population 𝑁 . Bayesian Optimization, for instance, isknown to give good performance, but its complexity typi-cally scales cubicly with respect to the number of queries[35]. Here, we take advantage of very low execution timeof running a simulator (the glycolysis model) and propose touse the population-based methods for the parameter identi-ﬁcation task.There is a plethora of population-based DFO algorithms[8], however, our goal is to verify whether this approach ingeneral could be successfully used in the considered task.Therefore, we decide to choose four instances of group ofmethods that are easy-to-use and are proven to work wellin practice: evolutionary strategies (ES), diﬀerential evolu-tion (DE), estimation of distribution algorithms (EDA), andrecently proposed reversible diﬀerential evolution (RevDE). Moreover, we propose to enhance EDA and RevDE with asurrogate model to allow better exploration and speed up cal-culations. Evolutionary strategies can be seen as a specialization ofevolutionary algorithms with very speciﬁc choices of 𝐺 and 𝑆 . The core of ES is to formulate 𝐺 using the multivari-ate Gaussian distribution. Here, we follow the widely-used(1+1)-ES that generates a new candidate using the Gaussianmutation parameterized by 𝜎 > , namely: 𝐱 ′ = 𝐱 + 𝜎 ⋅ 𝜀, (3)where 𝜀 ∼  (0 , I) , and  (0 , I) denotes the Gaussian dis-tribution with zero mean and the identity covariance matrix I . Next, if the ﬁtness value of 𝐱 ′ is smaller than the value ofﬁtness function of 𝐱 , the new candidate is accepted and theold one is discarded.The crucial element of this approach is determining thevalue of 𝜎 . In order to overcome possibly time-consuminghyperparameter search, the following adaptive procedure isproposed [2]: 𝜎 ∶= ⎧⎪⎨⎪⎩ 𝜎 ⋅ 𝑐 if 𝑝 𝑠 < ,𝜎 ∕ 𝑐 if 𝑝 𝑠 > ,𝜎 if 𝑝 𝑠 = 1∕5 . (4)where 𝑝 𝑠 is the number of accepted individuals of the oﬀ-spring divided by the population size 𝑁 , and 𝑐 is equal . following the recommendation in [34]. Diﬀerential evolution is another population-based methodthat is loosely based on the Nelder-Mead method [36, 32]. Anew candidate is generated by randomly picking a triple fromthe population, ( 𝐱 𝑖 , 𝐱 𝑗 , 𝐱 𝑘 ) ∈  , and then 𝐱 𝑖 is perturbed byadding a scaled diﬀerence between 𝐱 𝑗 and 𝐱 𝑘 , that is: 𝐲 = 𝐱 𝑖 + 𝐹 ( 𝐱 𝑗 − 𝐱 𝑘 ) , (5)where 𝐹 ∈ ℝ + is the scaling factor. This operation could beseen as an adaptive mutation operator that is widely knownas diﬀerential mutation [32].Further, the authors of [36] proposed to sample a binarymask 𝐦 ∈ {0 , 𝐷 according to the Bernoulli distribution Weglarz-Tomczak et al.:

Preprint

Page 4 of 11inetic Parameters Identiﬁcation with probability 𝑝 = 𝑃 ( 𝑚 𝑑 = 1) shared across all 𝐷 di-mensions, and calculate the ﬁnal candidate according to thefollowing formula: 𝐯 = 𝐦 ⊙ 𝐲 + (1 − 𝐦 ) ⊙ 𝐱 𝑖 , (6)where ⊙ denotes the element-wise multiplication. In theevolutionary computation literature this operation is knownas uniform crossover operator [8]. In this paper, we ﬁx 𝑝 =0 . following general recommendations in literature [29] anduse the uniform crossover in all methods.The last component of a population-based method is aselection mechanism. There are multiple variants of selec-tion [8], however, here we use the “survival of the ﬁttest”approach, i.e. , we combine the old population with the newone and select 𝑁 candidates with highest ﬁtness values, i.e. ,the deterministic ( 𝜇 + 𝜆 ) selection.This variant of DE is referred to as âĂĲDE/ rand / / bin âĂİ,where rand stands for randomly selecting a base vector, isfor adding a single perturbation and bin denotes the uniformcrossover. Sometimes it is called classic DE [32]. The mutation operator in DE perturbs candidates usingother individuals in the population to generate a single newcandidate. As a result, having too small population couldlimit exploration of the search space. In order to overcomethis issue, a modiﬁcation of DE was proposed that utilizedall three individuals to generate three new points in the fol-lowing manner [39]: 𝐲 = 𝑥 𝑖 + 𝐹 ( 𝐱 𝑗 − 𝐱 𝑘 ) 𝐲 = 𝑥 𝑗 + 𝐹 ( 𝐱 𝑘 − 𝐲 ) (7) 𝐲 = 𝑥 𝑘 + 𝐹 ( 𝐲 − 𝐲 ) . New candidates 𝐲 and 𝐲 could be further used to calculateperturbations using points outside the population. This ap-proach does not follow a typical construction of an EA whereonly evaluated candidates are mutated. Further, we can ex-press (7) as a linear transformation using matrix notation byintroducing matrices as follows: ⎡⎢⎢⎣ 𝐲 𝐲 𝐲 ⎤⎥⎥⎦ = ⎡⎢⎢⎣ 𝐹 − 𝐹 − 𝐹 𝐹 𝐹 + 𝐹 𝐹 + 𝐹 − 𝐹 + 𝐹 + 𝐹 𝐹 − 𝐹 ⎤⎥⎥⎦ ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ = 𝐑 ⎡⎢⎢⎣ 𝐱 𝐱 𝐱 ⎤⎥⎥⎦ . (8)In order to obtain the matrix 𝐑 , we need to plug 𝐲 to the sec-ond and third equation in (7), and then 𝐲 to the last equationin (7). As a result, we obtain 𝑀 = 3 𝑁 new candidate so-lutions. This version of DE is called Reversible DiﬀerentialEvolution , because the linear transformation 𝐑 is reversible[39]. Most of the population-based optimization methods aimat ﬁnding a solution and the information about the distri- bution of the search space and the ﬁtness function is repre-sented implicitly by the population. However, this distribu-tion could be modeled explicitly using a probabilistic model[9]. These methods have become known as estimation ofdistribution algorithms [18, 23, 31].The key diﬀerence between EDA and EA is the genera-tion step. While an EA uses evolutionary operators like mu-tation and cross-over to generate new candidate solutions,EDA ﬁts a probabilistic model to the population, and thennew individuals are sampled from this model.Therefore, ﬁtting a distribution to the population is thecrucial part of an EDA. There are various probabilistic mod-els that could be used for this purpose. Here, we proposeto ﬁt the multivariate Gaussian distribution  ( 𝜇, Σ) to thepopulation  . For this purpose, we can use the empiricalmean and the empirical covariance matrix: ̂𝜇 = 1 𝑁 𝑁 ∑ 𝑛 =1 𝐱 𝑛 , (9)and ̂ Σ = 1 𝑁 𝑁 ∑ 𝑛 =1 ( 𝐱 𝑛 − ̂𝜇 )( 𝐱 𝑛 − ̂𝜇 ) ⊤ . (10)An eﬃcient manner of sampling new candidates is to ﬁrstcalculate the Cholesky decomposition of the covariance ma-trix, ̂ Σ = 𝐋𝐋 ⊤ , where 𝐋 is the lower-triangular matrix, andthen computing: 𝐱 ′ = 𝜇 + 𝐋 𝜀, (11)where 𝜀 ∼  (0 , I) . The Eq. 11 is repeated 𝑀 times togenerate a new set of candidate solutions. Here, we set 𝑀 to the size of the population, i.e. , 𝑀 = 𝑁 .Once new candidate solutions are generated, the selec-tion mechanism is applied. In this paper, we use the sameselection procedure as the one used for DE. A possible drawback of population-based methods is thenecessity of evaluating large populations that, even thoughwe assume a low time cost per a single evaluation, couldsigniﬁcantly slow down the whole optimization process. Inorder to overcome this issue, a surrogate model could beused to partially replace querying the ﬁtness function [15].The surrogate model is either a probabilistic model or a ma-chine learning model that gathers previously evaluated popu-lations, and allows to mimic the behavior of the ﬁtness func-tion. It is assumed that the computational costs is lower oreven signiﬁcantly lower than the computational cost of run-ning the simulator.There are multiple possible surrogate models, however,non-parametric models, e.g. , Gaussian processes [35], arepreferable, because they do not suﬀer from catastrophic for-getting ( i.e. , overﬁtting to last population and forgetting ﬁrst

Weglarz-Tomczak et al.:

Preprint

Page 5 of 11inetic Parameters Identiﬁcation populations). Here, we consider K -Nearest-Neighbor ( K -NN) regression model that stores all previously seen indi-viduals with evaluations, and the prediction of a new can-didate solution is an average over 𝐾 ( e.g. , 𝐾 = 3 ) closestpreviously seen individuals. Current implementations of the K -NN regressor provide eﬃcient search procedures that re-sult in the computational complexity better than 𝑁 ⋅ 𝐷 , e.g. ,using KD-trees results in 𝑂 ( 𝐷 log 𝑁 ) . RevDE+

In the RevDE approach we generate 𝑁 new can-didate solutions and all of them are further evaluated. How-ever, this introduces and extra computational cost of runningthe simulator. This issue could be alleviated by using the K -NN regressor to approximate the ﬁtness values of the newcandidates. Further, we can select 𝑁 most promising points.We refer to this approach as RevDE+. EDA+

The outlined procedure of EDA produces 𝑀 newcandidate solutions and in order to keep a similar computa-tional cost as ES and DE, we set 𝑀 to 𝑁 . However, thiscould signiﬁcantly limit the potential of modeling a searchspace, because sampling in high-dimensional search spacesrequires a signiﬁcantly large number of point. A potentialsolution to this problem could be the application of the K -NN regressor to quickly verify of the 𝐿 new points. As longas the time cost of providing the approximated value of theﬁtness function is lower than the running time of the simu-lator, we can aﬀord to take 𝐿 > 𝑁 ( e.g. , 𝐿 = 5 𝑁 ). We referto this approach as EDA+.

3. Results

Model

In order to verify whether it is possible to identifyparameters in the glycolysis process in

Saccharomyces cere-visiae by observing only a subset of metabolites, we considera model presented in [45]. The model consists of ordinarydiﬀerential equations and reaction with parameters. Inthe Appendix A, we present details about the model, as wellas initial conditions, and parameter values measured in [45]( real parameter values ). We treat the system of ordinary dif-ferential equations as the simulator.In the original model in [45], the authors were focusedon oscillatory character of the system, therefore, they assumea constant injection of glu (see reaction 𝑣 in Figure 2, i.e. ,the glucose transporter). However, we consider other sce-nario where there is only an initial input of glucose. For thispurpose, we set 𝑘 in 𝑣 equal . Observations

In the experiments we assume only glu , atp , nad , ac , and ace are observed. We generate the observedmetabolites by running the simulator with the real parametervalues. In order to mimic real measurements that are typi-cally noisy, we add a Gaussian noise with zero mean andthe standard deviation equal of a generated value of ametabolite at a given time step. We notice that adding noiseprohibits ﬁnding a solution ( i.e. , values of parameters) thatachieves error deﬁned in Eq. 2 equal zero. We repeat experiments three times. For each repetition,we set the length of a timecourse to 𝑇 = 30 . Two cases

Our main research goal is the parameter iden-tiﬁcation of a partially observable system of multiple bio-chemical reactions. However, as highlighted in the introduc-tion, a proper identiﬁcation of parameters could be used forﬁngerprinting normal and abnormal biochemical processes.Therefore, we aim at developing optimization methods thatallow to identify parameters of reactions that are not directlyobserved. For this purpose, we distinguish two cases. Inthe

Case 1 we use the model as described before. In the

Case 2 we assume a mutation of the reaction 𝑣 that com-bines two unobserved metabolites ( fru and triop , see Figure2). We simulate the perturbation by changing the value ofthe parameter 𝑘 from . to . . It this diﬃcult to predicthow such relatively small modiﬁcation inﬂuences the wholesystem, and, thus, it serves as an import case study for opti-mization methods. Quantitative and qualitative evaluation

In order to an-swer our research questions, we use the following evaluationmeasures. First, we monitor a convergence of the optimiza-tion methods by plotting the error in Eq. 2. We are interestedwhether an optimizer converges to a minimum, and how fastit is achieved. The speed of convergence is deﬁned by thenumber of evaluated individuals by a population-based meth-ods.Second, we qualitatively inspect the diﬀerence betweenthe simulator output of unobserved metabolites and the realmetabolites. The qualitative evaluation is given by the Eq.2, however, it is also important to obtain an insight into howa potentially misidentiﬁed parameter result in a metabolitetimecourse.Third, since we know the real parameter values, we canalso evaluate a diﬀerence between them and the best val-ues found by the optimization methods. We use the absolutevalue of the diﬀerence of two values. We calculate the meanand the standard deviations of the diﬀerence from three runs,and use the cumulative distribution function of the foldednormal distribution to visualize the distribution of diﬀer-ences (the ideal case is ). Hyperparameters of optimizers

For all optimization meth-ods, we set the population size to 𝑁 = 100 . All optimiz-ers run maximally generations. In the case of ES, weuse the initial value of 𝜎 equal . . For DE, RevDE, andRevDE+, we use 𝐹 = 0 . , and 𝑝 = 0 . . For EDA we take 𝑀 = 100 . In the case of EDA+ and RevDE+, we use the 𝐾 -NN as the surrogate model with 𝐾 = 3 , and we do notstore more than , evaluated individuals. Implementation details

All computational methods are im-plemented in Python using standard packages ( e.g. , numpy , scipy ). We also use the Python Simulator for Cellular Sys- The diﬀerence between two real-valued random variable is normallydistributed. However, taking the absolute value of a normally distributedrandom variable results in the folded normal distribution.

Weglarz-Tomczak et al.:

Preprint

Page 6 of 11inetic Parameters Identiﬁcation A No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e DE exp1: 3.676exp2: 3.676exp3: 3.676 No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e RevDE exp1: 3.676exp2: 3.676exp3: 3.676

No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e EDA exp1: 3.676exp2: 3.677exp3: 3.679

No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e ES exp1: 3.676exp2: 3.676exp3: 3.676 No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e RevDE+ exp1: 3.676exp2: 3.676exp3: 3.676

No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e EDA+ exp1: 3.676exp2: 3.676exp3: 3.676 B No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e DE exp1: 3.676exp2: 3.676exp3: 3.676 No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e RevDE exp1: 3.676exp2: 3.676exp3: 3.676

No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e EDA exp1: 3.679exp2: 3.677exp3: 3.677

No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e ES exp1: 3.677exp2: 3.676exp3: 3.677 No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e RevDE+ exp1: 3.676exp2: 3.676exp3: 3.676

No. of evaluations3.6753.6803.6853.690 O b j e c t i v e v a l u e EDA+ exp1: 3.676exp2: 3.676exp3: 3.676

Figure 4:

The convergence of the population-based optimization methods over runs: A Case 1, B Case 2 ( mutation ). In the legends, we indicate the value of the ﬁtness functionafter the methods converged. tems (PySCeS) library [26] that is an extendable toolkit forthe analysis and investigation of cellular systems. It allowsto import a model represented in a human-readable manner,and solve a system of diﬀerential equation using a built-insolver. In the experiments, we downloaded the model pro-posed in [45] from the JWS Online Database [27], availableunder the following link [42]. All implemented population-based optimization methods as well as the experiments areavailable online: http://XXX . Fitness value

In Figure 4 we present convergence of themethods for Case 1 (Figure 4.A) and Case 2 (Figure 4.B).We notice that all methods were able to converge and achievevery similar ﬁtness values. However, the (1+1)-ES methodwas slowest due to the slow exploration capabilities. EDAalso required more evaluations to obtain better results. Inter-estingly, DE, RevDE, RevDE+ and EDA+ achieved almostidentical values of the ﬁtness function (the diﬀerences werebeyond the three digit precision). An important observationis that application of the surrogate model (the 𝐾 -NN regres- Weglarz-Tomczak et al.:

Preprint

Page 7 of 11inetic Parameters Identiﬁcation A a t p RevDE+ atp (real)atp (sim)atp (3*std) 0 10 20 30time0123 t p RevDE+ tp (real)tp (sim)tp (3*std) B a t p RevDE+ atp (real)atp (sim)atp (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p RevDE+ tp (real)tp (sim)tp (3*std)

Figure 5:

A comparison of the timecourses of the selected twometabolite for RevDE+: A : Case 1, B : Case 2 ( mutation ).Real timecourses are depicted in red, and the average valueand a conﬁdence interval ( standard deviation) over runsof the simulator are in blue. sor) allowed to speed up the convergence of RevDE+ com-pared to RevDE signiﬁcantly. Moreover, in the case of EDA,the surrogate model allowed a better exploration ( 𝑀 = 5 𝑁 )and, thus, EDA+ obtained better results in a signiﬁcantlyless number of evaluations. We conclude that all population-based methods were able to converge and achieved almostidentical scores, and our proposition of applying the surro-gate model led to improving RevDE and EDA Timecourses

The ﬁnal value of the ﬁtness function tellsus how well the simulator models the observed timecoursesfor given parameters provided by an optimizer. Addition-ally, we can also qualitatively inspect the timecourses boththe observed and unobserved metabolites. In Figure 5 wepresent exemplary timecourses for atp (the observed metabo-lite), and tp (the unobserved metabolite), for parameter val-ues found by RevDE+. Since all methods obtained very lowerrors close to noise in data (see Figure 4), we show the time-courses of the unobserved metabolites in Figures 7 and 8 forCase 1 and Case 2, respectively, in the Appendix B.In all cases except ES for tp , the best parameter valuesfound by the optimizers resulted in timecourses that are al-most identical to the real observations. For all unobservedmetabolites the average over repetitions of the experimentsoverlapped with the real value, or lied within the conﬁdenceinterval ( standard deviation). This is a result that wehoped for since being able to generate unobserved metabo-lite is extremely important for analysing biological systems. Diﬀerences in parameters

In this paper, we know pre-cisely the values of the parameters since they were mea-sured in [45] and, in Case 2, we modify one parameter byhand. Hence, we can compare the parameter values found by the optimization methods with the real parameter values.In Figure 6 we present four parameters for which we see aclear diﬀerence between Case 1 and Case 2. Diﬀerence ofall parameters are included in the Appendix B, in Figures 9and 10. In general, the diﬀerences are marginal and we canconclude that all parameter values were properly identiﬁed.The biggest problems though appear for parameters that havevery large values, e.g. , 𝑘 or 𝑘 .Interestingly, we noticed some signiﬁcant diﬀerences forparameter values found by the methods between Case 1 andCase 2. For instance, for 𝑘 , most methods achieved a dif-ference around in Case 1, while it was almost doubledin Case 2. This result shows that parameter identiﬁcation incomplex biological systems is a challenging task and its be-havior cannot be easily predicted upfront.

4. Conclusion

Here, we propose a new optimization framework for pa-rameter identiﬁcation of biological systems. We assume adynamical model of the system with initial conditions, mea-surements of selected metabolites, and an ODE solver to-gether with other tools to represent the model in machine-readable format as inputs in our framework (see Figure 1).Since the cost of solving the model is relatively low, we pro-pose to utilize the population-based optimization methodsto identify the parameters of the system. As a result, oncewe have found the parameters values, we can run simula-tions and generate timecourses of all metabolites, includingthe ones that are unobserved, and further analyze the bio-logical system. In this paper, we provide a proof-of-conceptof one of the most important metabolic processes, namely,the glycolysis pathway, of the well-studied

Saccharomycescerevisiae , known also as baker’s yeast.We outlined the general scheme of population-based op-timization methods, followed by a description of three clas-sic population-based DFO algorithms, namely, diﬀerentialequation (DE), an evolutionary strategy, and the univariateGaussian estimation of distribution algorithm (EDA). Next,we described recently published extension of the diﬀerentialevolution called Reversible Diﬀerential Evolution (RevDE)[39]. Further, in order to decrease the computational com-plexity of the RevDE, we proposed to utilize the 𝐾 -NN re-gressor as a surrogate model. Similarly, we used the 𝐾 -NNbased surrogate model to increase exploration capabilities ofthe EDA. In the experiments, we showed that all population-based methods could be successfully used to identify param-eters of a complex biological networks. However, it seemsthat too simplistic approaches ( e.g. , (1+1)-ES) could be slowand not accurate enough (see Figures 7 and 8). The surrogate-based methods indeed achieved better scores than their vanillacounterparts with almost negligible computational burden.In the introduction, we stated three research goals and inthe experiments we achieved them all. First, we applied vari-ous population-based optimization algorithms to the param-eter identiﬁcation of the glycolysis pathway. In the experi-ments, we analyzed the performance of the methods and we Weglarz-Tomczak et al.:

Preprint

Page 8 of 11inetic Parameters Identiﬁcation A . . . Difference in k20.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k330.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k80.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in ntot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ B . . . . . Difference in k20.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k330.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k80.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in ntot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Figure 6:

The cumulative distribution functions (cdfs) of the diﬀerences for selected pa-rameters: A : Case 1, B : Case 2 ( mutation ). Ideally, a cdf of an optimization methodshould resemble a step-function centered at . The averages and the scales are calculatedover repetitions of the experiment. noticed that: (i) all algorithms converged to (local) minima,however, ES and EDA needed more evaluations, (ii) in bothcases ( i.e. , with the real parameter values and with the mu-tation) the methods converged, (iii) enhancing RevDE andEDA with surrogate models led to speeding up convergenceand increasing exploration capabilities.Second, we show in the experiments that indeed it ispossible to identify parameter values while only a subset ofmetabolites are observed. This result is important and en-couraging for further studies with larger networks.Last, we consider a case study with a mutation of one re-action, i.e. , a value of a parameter is modiﬁed. We chose thereaction 𝑣 since it combines two unobserved metabolites( fru and triop ) that makes the problem more challenging. Inthis case, the population-based optimization methods wereable to ﬁnd good solutions anyway that again is an essentialindication of usefulness of the presented optimizers.From the biological perspective, our work is among ﬁrstthat showed that the parameter identiﬁcation problem of com-plex biological systems with limited access of observed metabo-lites is possible. Our paper is a positive proof-of-conceptand should be further investigated. A possible research di-rection is to consider larger dynamical models ( i.e. , morereactions and more parameters), and a more thorough anal-ysis of the parameter identiﬁcation with various number ofobserved metabolites. The possible application of our ap-proach is the identiﬁcation of single mutation as well as phe-notypic heterogeneity within a population. Therefore, ourmethod opens up a new perspective in the ﬁeld of medicine,and biology, and may result in an innovative computational-aided diagnostic and / or analytical tool.From the computational perspective, our work indicatesa great potential of population-based optimization methods in the ﬁeld of biology and biochemistry. In the case of rela-tively low computational costs of obtaining an evaluation ofparameters, the population-based methods seem to be suﬃ-cient to solve the parameter identiﬁcation problem. More-over, our results for applying surrogate models to the opti-mizers can be highly eﬀective. It is a well-known fact ( e.g. ,see [15]), nevertheless, we believe that the optimization withsurrogate models has a great future and should be investi-gated in more detail. CRediT authorship contribution statement

Ewelina Weglarz-Tomczak:

Conceptualization, Vali-dation, Investigation, Writing - Original Draft, Writing - Re-view & Editing.

Jakub M. Tomczak:

Methodology, Soft-ware, Investigation, Writing - Original Draft, Writing - Re-view & Editing.

Agoston E. Eiben:

Writing - Review &Editing.

Stanley Brul:

Supervision, Writing - Review &Editing.

Acknowledgments

EW-T is ﬁnanced by a grant within Mobilność Plus V fromthe Polish Ministry of Science and Higher Education (GrantNo. 1639/MOB/V/2017/0).

References [1] Audet, C., Hare, W., 2017. Derivative-free and Blackbox Optimiza-tion. Springer.[2] Bäck, T., Foussette, C., Krause, P., 2013. Contemporary EvolutionStrategies. Springer.[3] Balsa-Canto, E., Alonso, A.A., Banga, J.R., 2010. An iterative iden-tiﬁcation procedure for dynamic modeling of biochemical networks.BMC systems biology 4, 11.

Weglarz-Tomczak et al.:

Preprint

Page 9 of 11inetic Parameters Identiﬁcation [4] Cranmer, K., Brehmer, J., Louppe, G., 2020. The frontier ofsimulation-based inference. Proceedings of the National Academyof Sciences .[5] Doncieux, S., Bredeche, N., Mouret, J.B., Eiben, A.E., 2015. Evolu-tionary robotics: what, why, and where to. Frontiers in Robotics andAI 2, 4.[6] Duarte, N.C., Herrgård, M.J., Palsson, B.Ø., 2004. Reconstructionand validation of saccharomyces cerevisiae ind750, a fully compart-mentalized genome-scale metabolic model. Genome research 14,1298–1309.[7] Eiben, A.E., Smith, J., 2015a. From evolutionary computation to theevolution of things. Nature 521, 476–482.[8] Eiben, A.E., Smith, J.E., 2015b. Introduction to Evolutionary Com-puting. volume 53. Springer.[9] Gallagher, M., Frean, M., 2005. Population-based continuous opti-mization, probabilistic modelling and mean shift. Evolutionary Com-putation 13, 29–42.[10] Gatenby, R.A., Gillies, R.J., 2004. Why do cancers have high aerobicglycolysis? Nature reviews cancer 4, 891–899.[11] Gerard, M.F., Stegmayer, G., Milone, D.H., 2013. An evolutionaryapproach for searching metabolic pathways. Computers in Biologyand Medicine 43, 1704–1712.[12] Hynne, F., Danø, S., Sørensen, P.G., 2001. Full-scale model of gly-colysis in saccharomyces cerevisiae. Biophysical chemistry 94, 121–163.[13] Ideker, T., Galitski, T., Hood, L., 2001. A new approach to decodinglife: systems biology. Annual review of genomics and human genetics2, 343–372.[14] Ingalls, B.P., 2013. Mathematical modeling in systems biology: anintroduction. MIT press.[15] Jin, Y., 2011. Surrogate-assisted evolutionary computation: Recentadvances and future challenges. Swarm and Evolutionary Computa-tion 1, 61–70.[16] Jones, D.R., Schonlau, M., Welch, W.J., 1998. Eﬃcient global op-timization of expensive black-box functions. Journal of Global opti-mization 13, 455–492.[17] Kourdis, P.D., Goussis, D.A., 2013. Glycolysis in saccharomycescerevisiae: algorithmic exploration of robustness and origin of os-cillations. Mathematical biosciences 243, 190–214.[18] Larrañaga, P., Lozano, J.A., 2001. Estimation of distribution algo-rithms: A new tool for evolutionary computation. Springer Science& Business Media.[19] Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Ger-ber, G.K., Hannett, N.M., Harbison, C.T., Thompson, C.M., Simon,I., et al., 2002. Transcriptional regulatory networks in saccharomycescerevisiae. science 298, 799–804.[20] Mensonides, F.I., Brul, S., Hellingwerf, K.J., Bakker, B.M., Teix-eira de Mattos, M.J., 2014. A kinetic model of catabolic adaptationand protein reproﬁling in saccharomyces cerevisiae during tempera-ture shifts. The FEBS Journal 281, 825–841.[21] Moles, C.G., Mendes, P., Banga, J.R., 2003. Parameter estimation inbiochemical pathways: a comparison of global optimization methods.Genome research 13, 2467–2474.[22] Morbiducci, U., Di Benedetto, G., Kautzky-Willer, A., Deriu, M.A.,Pacini, G., Tura, A., 2011. Identiﬁcation of a model of non-esteriﬁedfatty acids dynamics through genetic algorithms: The case of womenwith a history of gestational diabetes. Computers in Biology andMedicine 41, 146–153.[23] Mühlenbein, H., Paass, G., 1996. From recombination of genes tothe estimation of distributions i. binary parameters, in: InternationalConference on Parallel Problem Solving from Nature, Springer. pp.178–187.[24] Nielsen, J., 2017. Systems biology of metabolism. Annual review ofbiochemistry 86, 245–275.[25] Nielsen, J., 2019. Yeast systems biology: model organism and cellfactory. Biotechnology journal 14, 1800421.[26] Olivier, B.G., Rohwer, J.M., Hofmeyr, J.H.S., 2005. Modelling cel-lular systems with PySCeS. Bioinformatics 21, 560–561. [27] Olivier, B.G., Snoep, J.L., 2004. Web-based kinetic modelling usingjws online. Bioinformatics 20, 2143–2144.[28] Orij, R., Urbanus, M.L., Vizeacoumar, F.J., Giaever, G., Boone, C.,Nislow, C., Brul, S., Smits, G.J., 2012. Genome-wide analysis ofintracellular ph reveals quantitative control of cell division rate by phc in saccharomyces cerevisiae. Genome biology 13, R80.[29] Pedersen, M.E.H., 2010. Good parameters for diﬀerential evolution.Technical Report HL1002. Hvass Laboratories.[30] Pelicano, H., Martin, D., Xu, R., Huang, P., 2006. Glycolysis inhibi-tion for anticancer treatment. Oncogene 25, 4633–4646.[31] Pelikan, M., Hauschild, M.W., Lobo, F.G., 2015. Estimation of dis-tribution algorithms, in: Springer Handbook of Computational Intel-ligence. Springer, pp. 899–928.[32] Price, K., Storn, R.M., Lampinen, J.A., 2006. Diﬀerential Evolution:A Practical Approach to Global Optimization. Springer Science &Business Media.[33] Reali, F., Priami, C., Marchetti, L., 2017. Optimization algorithmsfor computational systems biology. Frontiers in Applied Mathematicsand Statistics 3, 6.[34] Schwefel, H.P., 1977. Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Springer.[35] Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., De Freitas, N.,2015. Taking the human out of the loop: A review of Bayesian opti-mization. Proceedings of the IEEE 104, 148–175.[36] Storn, R., Price, K., 1997. Diﬀerential evolution–a simple and eﬃ-cient heuristic for global optimization over continuous spaces. Journalof Global Optimization 11, 341–359.[37] Teusink, B., Passarge, J., Reijenga, C.A., Esgalhado, E., Van der Wei-jden, C.C., Schepper, M., Walsh, M.C., Bakker, B.M., Van Dam, K.,Westerhoﬀ, H.V., et al., 2000. Can yeast glycolysis be understoodin terms of in vitro kinetics of the constituent enzymes? testing bio-chemistry. European Journal of Biochemistry 267, 5313–5329.[38] Tomczak, J.M., Węglarz-Tomczak, E., 2019. Estimating kinetic con-stants in the Michaelis–Menten model from one enzymatic assay us-ing Approximate Bayesian Computation. FEBS letters 593, 2742–2750.[39] Tomczak, J.M., Weglarz-Tomczak, E., Eiben, A.E., 2020. Diﬀerentialevolution with reversible linear transformations, in: GECCO 2020.[40] Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.P., 2009. Ap-proximate Bayesian computation scheme for parameter inference andmodel selection in dynamical systems. Journal of the Royal SocietyInterface 6, 187–202.[41] Tsiantis, N., Balsa-Canto, E., Banga, J.R., 2018. Optimality and iden-tiﬁcation of dynamic models in systems biology: an inverse optimalcontrol framework. Bioinformatics 34, 2433–2440.[42] https://jjj.bio.vu.nl/models/wolf/ , 07-August-2020. [accessed on-line].[43] Van Eunen, K., Bouwman, J., Daran-Lapujade, P., Postmus, J.,Canelas, A.B., Mensonides, F.I., Orij, R., Tuzun, I., Van Den Brink,J., Smits, G.J., et al., 2010. Measuring enzyme activities under stan-dardized in vivo-like conditions for systems biology. The FEBS jour-nal 277, 749–760.[44] Westerhoﬀ, H.V., Palsson, B.O., 2004. The evolution of molecularbiology into systems biology. Nature biotechnology 22, 1249–1252.[45] Wolf, J., Passarge, J., Somsen, O.J., Snoep, J.L., Heinrich, R., West-erhoﬀ, H.V., 2000. Transduction of intracellular and intercellular dy-namics in yeast glycolytic oscillations. Biophysical journal 78, 1145–1153.[46] Wolkenhauer, O., Ullah, M., Kolch, W., Cho, K.H., 2004. Model-ing and simulation of intracellular dynamics: choosing an appropriateframework. IEEE transactions on nanobioscience 3, 200–207.[47] Yang, X., Cheng, X., Liu, Q., Zhang, C., Song, Y., 2020. The re-sponse surface method-genetic algorithm for identiﬁcation of the lum-bar intervertebral disc material parameters. Computers in Biology andMedicine 124, 103920.

Weglarz-Tomczak et al.:

Preprint

Page 10 of 11inetic Parameters Identiﬁcation

A. Appendix: The glycolysis modeldescription

In the considered model of the glycolysis we distinguishthe following metabolites:• glycolysis ( glu );• fructose-1,6-bisphosphate ( fru );• triosephosphates ( triop );• triphosphoglycerate ( tp );• pyruvate ( pyr );• acetaldehyde ( ac );• external acetaldehyde ( ace ).Following the same assumptions as in [45] ( i.e. , a homo-geneous distribution of the metabolites in the intracellularand in the extracellular solution), the systems of ordinary dif-ferential equations of the glycolysis model in Saccharomycescerevisiae is the following [42]: ̇𝑔𝑙𝑢 = 𝑣 − 𝑣 (12) ̇𝑓 𝑟𝑢 = 𝑣 − 𝑣 (13) ̇𝑡𝑟𝑖𝑜𝑝 = 2 𝑣 − 𝑣 − 𝑣 (14) ̇𝑡𝑝 = 𝑣 − 𝑣 (15) ̇𝑝𝑦𝑟 = 𝑣 − 𝑣 (16) ̇𝑎𝑐 = 𝑣 − 𝑣 − 𝑣 (17) ̇𝑎𝑐𝑒 = 0 . 𝑣 − 𝑣 (18) ̇𝑎𝑡𝑝 = −2 𝑣 + 𝑣 + 𝑣 − 𝑣 (19) ̇𝑛𝑎𝑑 = 𝑣 − 𝑣 − 𝑣 (20)with the rate equations: 𝑣 = 𝑘 (21) 𝑣 = 𝑘 ⋅ 𝑔𝑙𝑢 ⋅ 𝑎𝑡 𝑎𝑡 ∕ 𝑘 𝑖 ) 𝑛 (22) 𝑣 = 𝑘 ⋅ 𝑓 𝑟𝑢 (23) 𝑣 = 𝑘 ⋅ 𝑘 ⋅ 𝑡𝑟𝑖𝑜𝑝 ⋅ 𝑛𝑎𝑑𝐴 − 𝑘 ⋅ 𝑘 ⋅ 𝑡𝑝 ⋅ 𝑎𝑡𝑝 𝑁𝑘 ⋅ 𝑁 + 𝑘 ⋅ 𝐴 (24) 𝑣 = 𝑘 ⋅ 𝑡𝑝 ⋅ 𝐴 (25) 𝑣 = 𝑘 ⋅ 𝑝𝑦𝑟 (26) 𝑣 = 𝑘 ⋅ 𝑎𝑐 ⋅ 𝑛𝑎𝑑 (27) 𝑣 = 𝑘 ⋅ 𝑎𝑡𝑝 (28) 𝑣 = 𝑘 ⋅ 𝑡𝑟𝑖𝑜𝑝 ⋅ 𝑛𝑎𝑑 (29) 𝑣 = 𝑘 ⋅ 𝑎𝑐𝑒 (30) 𝑣 = 𝑘 ⋅ 𝑎𝑡𝑝 (31)where: 𝐴 = ( 𝑎 𝑡𝑜𝑡 − 𝑎𝑡𝑝 ) (32) 𝑁 = ( 𝑛 𝑡𝑜𝑡 − 𝑛𝑎𝑑 ) (33)The initial conditions are the following: 𝑎𝑡𝑝 = 2 . (34) 𝑛𝑎𝑑 = 0 . (35) 𝑔𝑙𝑢 = 5 . (36) 𝑓 𝑟𝑢 = 5 . (37) 𝑡𝑟𝑖𝑜𝑝 = 0 . (38) 𝑡𝑝 = 0 . (39) 𝑝𝑦𝑟 = 8 . (40) 𝑎𝑐 = 0 . (41) 𝑎𝑐𝑒 = 0 . . (42)The model is schematically depicted in Figure 2.The real values of the parameters are the following [42]: 𝑎 𝑡𝑜𝑡 = 4 ∈ [0 , (43) 𝑘 = 0 ∈ [0 , (44) 𝑘 = 550 ∈ [550 , (45) 𝑘 = 9 . , (46) 𝑘 = 323 . , (47) 𝑘 = 76411 . , (48) 𝑘 = 57823 . , (49) 𝑘 = 23 . , (50) 𝑘 = 80 ∈ [80 , (51) 𝑘 = 9 . , (52) 𝑘 = 2000 ∈ [2000 , (53) 𝑘 = 28 . , (54) 𝑘 = 85 . , (55) 𝑘 = 0 ∈ [0 , (56) 𝑘 = 375 ∈ [350 , (57) 𝑘 𝑖 = 1 ∈ [0 , (58) 𝑛 = 4 ∈ [0 , (59) 𝑛 𝑡𝑜𝑡 = 1 ∈ [0 , , (60)where we indicate the set of possible values of the parame-ters in the square brackets.We note that for the sake of our experiments, we set 𝑘 to (originally: 𝑘 = 50 [42]) in order to forbid a constantinjection of 𝑔𝑙𝑢 , and 𝑘 to (originally: 𝑘 = 80 [42]) inorder to avoid oscillatory behavior of the system. B. Appendix: Additional results

The detailed results for the timecourses of the unobservedmetabolites are presented in Figures 7 and 8.The detailed results of the diﬀerences between the realparameters and found parameters are depicted in Figures 9and 10.

Weglarz-Tomczak et al.:

Preprint

Page 11 of 11inetic Parameters Identiﬁcation f r u DE fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p DE tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.8 t r i o p DE triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r DE pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u RevDE fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p RevDE tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.81.0 t r i o p RevDE triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r RevDE pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u RevDE+ fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p RevDE+ tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.8 t r i o p RevDE+ triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r RevDE+ pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u EDA fru (real)fru (sim)fru (3*std) 0 10 20 30time012 t p EDA tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.81.0 t r i o p EDA triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r EDA pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u EDA+ fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p EDA+ tp (real)tp (sim)tp (3*std) 0 10 20 30time0.000.250.500.751.00 t r i o p EDA+ triop (real)triop (sim)triop (3*std) 0 10 20 30time0246810 p y r EDA+ pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u ES fru (real)fru (sim)fru (3*std) 0 10 20 30time0123 t p ES tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.60.8 t r i o p ES triop (real)triop (sim)triop (3*std) 0 10 20 30time0246810 p y r ES pyr (real)pyr (sim)pyr (3*std) Figure 7:

A comparison of the timecourses of the unobserved metabolites in the Case 1.Real timecourses are depicted in red, and the average value and a conﬁdence interval ( standard deviation) over runs of the simulator are in blue. The titles of the plots indicateoptimization methods.Weglarz-Tomczak et al.: Preprint

Page 12 of 11inetic Parameters Identiﬁcation f r u DE fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p DE tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p DE triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r DE pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u RevDE fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p RevDE tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p RevDE triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r RevDE pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u RevDE+ fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p RevDE+ tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p RevDE+ triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r RevDE+ pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u EDA fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.51.0 t p EDA tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p EDA triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r EDA pyr (real)pyr (sim)pyr (3*std)0 10 20 30time0246 f r u EDA+ fru (real)fru (sim)fru (3*std) 0 10 20 30time0.00.20.40.60.81.0 t p EDA+ tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p EDA+ triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r EDA+ pyr (real)pyr (sim)pyr (3*std)0 10 20 30time02468 f r u ES fru (real)fru (sim)fru (3*std) 0 10 20 30time1012 t p ES tp (real)tp (sim)tp (3*std) 0 10 20 30time0.00.20.40.6 t r i o p ES triop (real)triop (sim)triop (3*std) 0 10 20 30time02468 p y r ES pyr (real)pyr (sim)pyr (3*std) Figure 8:

A comparison of the timecourses of the unobserved metabolites in the Case 2.Real timecourses are depicted in red, and the average value and a conﬁdence interval ( standard deviation) over runs of the simulator are in blue. The titles of the plots indicateoptimization methods.Weglarz-Tomczak et al.: Preprint

Page 13 of 11inetic Parameters Identiﬁcation . . . . . Difference in atot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k00.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k10.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in k20.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k310.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k320.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k330.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k340.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k40.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k50.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k60.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in k70.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k80.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in k90.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k100.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in ki0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in n0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in ntot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Figure 9:

The cumulative distribution functions (cdfs) of the diﬀerences for all parameters.Ideally, a cdf of an optimization method should resemble a step-function centered at .The averages and the scales are calculated over repetitions of the experiment in the Case1.Weglarz-Tomczak et al.: Preprint

Page 14 of 11inetic Parameters Identiﬁcation

Difference in atot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k00.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k10.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in k20.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k310.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k320.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k330.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k340.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k40.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in k50.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k60.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k70.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k80.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in k90.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in k100.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Difference in ki0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . . . Difference in n0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+ . . . Difference in ntot0.00.20.40.60.81.0 P ( X x ) RevDE+RevDEDEESEDAEDA+

Figure 10: