Finding Acceptable Parameter Regions of Stochastic Hill functions for Multisite Phosphorylation Mechanism
Minghan Chen
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061
[email protected]

Mansooreh Ahmadian
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061
[email protected]

Layne Watson
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061
[email protected]

Yang Cao
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061
[email protected]

Abstract
Multisite phosphorylation plays an important role in regulating switchlike protein activity and has been used widely in mathematical models. With the development of new experimental techniques and more molecular data, molecular phosphorylation processes emerge in many systems with increasing complexity and sizes. These developments call for simple yet valid stochastic models to describe various multisite phosphorylation processes, especially in large and complex biochemical networks. To reduce model complexity, this work aims to simplify the multisite phosphorylation mechanism by a stochastic Hill function model. Further, this work optimizes regions of parameter space to match simulation results from the stochastic Hill function with the distributive multisite phosphorylation process. While traditional parameter optimization methods have focused on finding the best parameter vector, in most circumstances modelers would like to find a set of parameter vectors that generate similar system dynamics and results. This paper proposes a general α-β-γ rule to return an acceptable parameter region of the stochastic Hill function based on a quasi-Newton stochastic optimization (QNSTOP) algorithm. Different objective functions are investigated characterizing different features of the simulation-based empirical data, among which the approximate maximum log-likelihood method is recommended for general applications. Numerical results demonstrate that with an appropriate parameter vector value, the stochastic Hill function model depicts the multisite phosphorylation process well except for the initial (transient) period.

1 Introduction

Protein phosphorylation is a fundamental molecular mechanism that regulates protein function through the addition of a phosphate group. Many proteins have multiple phosphorylation sites, and the phosphorylation process often targets multiple distinct sites, referred to as multisite phosphorylation.
Multisite phosphorylation facilitates the complexity and scope of protein activity and is ubiquitous in biological processes such as cell cycle regulation [2]. Over the years, researchers have revealed that multisite phosphorylation forms the basis of switchlike response [14] and can exhibit bistability and periodic oscillations [21, 34, 5].

Based on the enzyme (kinase) processivity, there are two categories of multisite phosphorylation mechanisms [36]. The processive mechanism occurs when the enzyme phosphorylates all sites without dissociation from the substrate, while in the distributive mechanism, the enzyme dissociates from its substrate after one site becomes phosphorylated. This paper considers the multisite phosphorylation process with an ordered distributive mechanism, shown by

$$B_0 \underset{b_1}{\overset{f_1[A]}{\rightleftharpoons}} B_1 \underset{b_2}{\overset{f_2[A]}{\rightleftharpoons}} B_2 \;\cdots\; \underset{b_{n-1}}{\overset{f_{n-1}[A]}{\rightleftharpoons}} B_{n-1} \underset{b_n}{\overset{f_n[A]}{\rightleftharpoons}} B_n. \tag{1}$$

B is a protein (substrate) with $n$ phosphorylation sites. A is the enzyme (kinase) that can bind to substrate B. $B_0$ represents the unphosphorylated form (unbound to any phosphoryl group), and $B_n$ is the fully phosphorylated form. $B_n$ is formed in sequential steps, with intermediate stages increasing by one phosphorylated site until all $n$ sites are phosphorylated. Assume that there are enough phosphate groups for the phosphorylation process in system (1) and that all binding sites are the same (no preference over the selection of binding sites). Enzyme A controls the phosphorylation of $B_i$ with the phosphorylation activity $f_i[A]$, where $f_i$ is the phosphorylation rate and $[A]$ denotes the quantity (population or concentration) of the enzyme A. In the stochastic regime, $[A]$ represents the enzyme population. $b_i$ ($i = 1, 2, \ldots, n$) denotes the dephosphorylation activity of $B_i$, where $b_i = k_i[\text{phosphatase}]$.
Assuming that the phosphatase level is constant in the enzyme-substrate system (1), one can simply use $b_i$ to represent the product of the dephosphorylation rate $k_i$ and the phosphatase level.

The kinetic behavior of multisite phosphorylation can be mathematically modeled by a set of differential equations based on a series of elementary reactions describing the enzyme-substrate binding and catalysis process. Thus, the concentrations of all possible phosphorylated forms $[B_i]$ are governed by

$$
\begin{cases}
\dfrac{d[B_0]}{dt} = b_1[B_1] - f_1[A][B_0], \\[4pt]
\dfrac{d[B_i]}{dt} = f_i[A][B_{i-1}] + b_{i+1}[B_{i+1}] - \left(f_{i+1}[A] + b_i\right)[B_i], & \text{for } 0 < i < n, \\[4pt]
\dfrac{d[B_n]}{dt} = f_n[A][B_{n-1}] - b_n[B_n].
\end{cases} \tag{2}
$$

The above differential equations are a traditional deterministic method for modeling and simulating multisite phosphorylation. In the stochastic regime, the enzyme-substrate system is instead governed by a list of reaction propensities, which describe the probabilities of reaction events. Table 1 lists the reactions and corresponding propensities for the multisite phosphorylation system (1), where $0 < i \le n$.

Table 1: Chemical reactions and propensity calculation

Reaction                                  Propensity
$B_{i-1} \xrightarrow{f_i[A]} B_i$        $a = f_i[A][B_{i-1}]$
$B_i \xrightarrow{b_i} B_{i-1}$           $a = b_i[B_i]$

In the above chemical reactions, if we treat $[A]$ as a constant (parameter), system (1) becomes a linear reaction network. There are many studies on the stochastic kinetics and efficient modeling and simulation of such chemical reaction networks [6, 26, 12, 16, 23]. For example, Gadgil et al. [16] formulated the governing master equations of first-order reactions and showed that the distribution of all system components in an open system follows a Poisson distribution at steady state. Lente [23] developed a stochastic mapping for first-order reaction networks to help choose the appropriate method between stochastic and deterministic approaches.
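To make the propensity calculation concrete, the following is a minimal Gillespie (SSA) sketch of system (1) using the Table 1 propensities. Every numeric value here (rates, enzyme level, substrate count) is a hypothetical illustration, not a value from this paper's experiments, and all $f_i$ and $b_i$ are taken equal for brevity.

```python
import random

def ssa_multisite(n=4, A=10, f=0.05, b=1.0, B0=100, t_end=50.0, seed=1):
    """Gillespie SSA for the ordered distributive chain (1).

    B[i] holds the population of the i-times-phosphorylated form B_i.
    Propensities follow Table 1: a = f_i*[A]*[B_{i-1}] for phosphorylation
    and a = b_i*[B_i] for dephosphorylation.
    """
    rng = random.Random(seed)
    B = [B0] + [0] * n                      # [B_0, B_1, ..., B_n]
    t = 0.0
    while t < t_end:
        # first n entries: B_{i-1} -> B_i; last n entries: B_i -> B_{i-1}
        props = [f * A * B[i - 1] for i in range(1, n + 1)] \
              + [b * B[i] for i in range(1, n + 1)]
        a0 = sum(props)
        if a0 == 0.0:
            break
        t += rng.expovariate(a0)            # time to the next reaction
        r, acc = rng.uniform(0.0, a0), 0.0
        for mu, a in enumerate(props):      # pick a reaction channel mu
            acc += a
            if r <= acc:
                break
        if mu < n:                          # phosphorylate: B_mu -> B_{mu+1}
            B[mu] -= 1; B[mu + 1] += 1
        else:                               # dephosphorylate: B_i -> B_{i-1}
            i = mu - n + 1
            B[i] -= 1; B[i - 1] += 1
    return B
```

Sampling the returned $[B_n]$ entry every $\tau$ time units would yield an empirical data sequence of the kind used later for the objective functions.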
Our recent work [26] also presented a numerical analysis of the accuracy of the hybrid ODE/SSA method on linearly reacting systems.

However, for complex biochemical networks involving multisite protein phosphorylation, the above kinetic system in both deterministic and stochastic regimes increases the model size and complexity, especially when $n$ is large (a protein with $n$ phosphorylation sites has potentially $n+1$ to $2^n$ distinct phosphorylation forms). On the other hand, when trying to formulate a realistic multisite phosphorylation system, little is known beyond specific enzyme-substrate systems. Usually, most reaction rates are tuned to match empirical observations, and assumptions are made with limited biological justification. Considering the complexity and lack of knowledge, this paper proposes to model multisite phosphorylation with a Hill function system and to study the dynamic behavior of the fully phosphorylated form $B_n$.

The Hill equation is widely used in biochemical networks to model fast signal response and complex binding processes. It was first introduced to model the observed curves of ligand binding to the receptor [20]. The equation is a nonlinear function of the ligand concentration, and the Hill exponent defines the degree of cooperativity of ligand binding. Other complex models have been proposed to describe different cooperativity of ligand binding [1, 30, 22, 28]. The sigmoidal curves generated from the Hill function system are similar to the switchlike protein activity. Furthermore, the Hill equation requires little prior knowledge of the binding mechanism and is much simpler than the kinetic model (2).
Therefore, this work investigates the simple Hill function system for the ordered distributive enzyme-substrate system (1):

$$B_0 \underset{k_d}{\overset{k_a[A]^{\sigma}/(k_m^{\sigma} + [A]^{\sigma})}{\rightleftharpoons}} B_n, \tag{3}$$

where $k_a$ and $k_d$ are the forward and reverse reaction constants, respectively, $k_m$ is the dissociation constant, and $\sigma$ is the Hill exponent, a real number with range $1 \le \sigma \le n$.

Since the Hill equation has been thoroughly studied and extensively used in traditional deterministic models (differential equations), its unexplored discrete stochastic representation is receiving more attention from researchers in the presence of low population levels and molecular noise. Previous studies have discovered that the sigmoidal behavior of the Hill function dynamics may reduce to a linear function in the stochastic regime, especially under the reaction-diffusion master equation framework [11]. This paper models the Hill function system in the stochastic regime and restricts attention to a homogeneous domain. Using the stochastic simulation algorithm (SSA) [18], the Hill function system (3) is then governed by the forward and reverse reactions with propensities

$$a_f = \frac{k_a[A]^{\sigma}}{k_m^{\sigma} + [A]^{\sigma}}[B_0]; \qquad a_r = k_d[B_n]. \tag{4}$$

Biologists are able to quantify species populations at molecular levels with improved experimental techniques [32], which provides good resources (observed empirical data) for validating stochastic models and optimizing "stochastic parameters". Assuming the ordered distributive mechanism (1) is the ground truth for the multisite phosphorylation process, the trajectories of $[B_n]$ generated from the stochastic enzyme-substrate system (1) can be considered empirical data, while the trajectories of $[B_n]$ generated from the stochastic Hill function system (3) are considered simulated data.
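Simulated trajectories of the reduced model (3) can be generated the same way, from just the two propensities in (4). The sketch below is a minimal SSA for this two-reaction system; all parameter values are hypothetical illustrations.

```python
import random

def hill_propensities(b0, bn, A, ka, kd, km, sigma):
    """Propensities (4): a_f for B_0 -> B_n, a_r for B_n -> B_0."""
    a_f = ka * A**sigma / (km**sigma + A**sigma) * b0
    a_r = kd * bn
    return a_f, a_r

def ssa_hill(A=10.0, ka=1.0, kd=0.5, km=10.0, sigma=2.0,
             B0=100, t_end=20.0, seed=1):
    """SSA for the two-reaction stochastic Hill system (3)."""
    rng = random.Random(seed)
    b0, bn, t = B0, 0, 0.0
    while t < t_end:
        a_f, a_r = hill_propensities(b0, bn, A, ka, kd, km, sigma)
        a0 = a_f + a_r
        if a0 == 0.0:
            break
        t += rng.expovariate(a0)
        if rng.uniform(0.0, a0) < a_f:      # forward jump: B_0 -> B_n
            b0, bn = b0 - 1, bn + 1
        else:                               # reverse jump: B_n -> B_0
            b0, bn = b0 + 1, bn - 1
    return b0, bn
```

The substrate count $[B_0] + [B_n]$ is conserved, which is what makes the reduced model's state space one-dimensional.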
This paper, instead of validating the stochastic results against the deterministic results, will focus on optimizing the stochastic Hill function model for the multisite phosphorylation process, that is, optimizing the parameter vector $\theta = [k_a, k_d, \sigma, k_m]$ defining the stochastic Hill function model.

The challenges of solving the inverse problem of parameter estimation for modeling biological systems are manifold. One major challenge is large biological networks with many unknown parameters that are hard or impossible to measure in experiments. The nonlinear nature of the models leads to a nonconvex problem with multiple local optima, at which local optimization methods may be trapped. Global optimization methods can be very expensive in high dimensions, and global optimality is often not guaranteed [4]. Moreover, most of the efforts in solving such problems thus far have focused on deterministic models, particularly on estimating the parameters of models formulated by nonlinear differential equations [27]. Parameter estimation in stochastic models is even more challenging, as the amount of empirical data must be large enough to obtain statistically valid parameter estimates. Two well-known approaches for stochastic optimization problems are stochastic approximation (SA) and response surface methodology (RSM). The class of quasi-Newton methods for stochastic optimization extends state-of-the-art numerical optimization methods (e.g., secant updates, trust regions) [8], can also be used for deterministic global optimization with minor variations [13, 3], and has been successfully applied to various stochastic optimization problems, such as cell cycle models [9], bistable models [10], and biomechanics problems [31]. Other approaches such as Bayesian inference [25, 33] and the Kalman filter and its variants [24] have been applied to this problem as well.
Recently, machine learning techniques have been tailored with sparsity-promoting methods to identify not only the parameters but also the structure of both ordinary differential equations [7] and partial differential equations [35].

Most parameter optimization methods return only a single best parameter vector, despite the fact that many parameter vectors could generate similar system dynamics and characteristics. Take the bistable switch in the cell cycle as an example [17]. Bistable phenomena occur when model parameters are inside the bistability region. The entire bistability region, or at least most parameter values sampled in the region, may be considered acceptable if the goal is simply to model the bistable switch process, rather than to minimize the objective function. This work focuses on finding an "acceptable" parameter region, where parameter vectors in the acceptable region are good alternatives to the best parameter vector minimizing the objective function. In other words, the system parameters are given by a region in parameter space rather than a single point.

This work studies acceptable parameter regions of the stochastic Hill function (3) for the multisite phosphorylation system (1). Section 2 introduces the chemical master equation for deriving the transition probability matrix and the quasi-Newton algorithm used for stochastic optimization. In Section 3, three objective functions measuring different features of the simulation-based empirical data are investigated. Section 4 then presents the proposed α-β-γ rule for defining the acceptable parameter regions found by the quasi-Newton stochastic optimization (QNSTOP) algorithm. Numerical results and detailed analyses are given in Section 5.

2 Background
This section first reviews the chemical master equation and the analytical solution of a general biochemical system.Then, the essential steps of the quasi-Newton stochastic optimization algorithm (QNSTOP) are summarized.
Consider a well-mixed system of $N$ distinct species and $M$ reaction channels with $\hat{N}$ possible states. The chemical master equation [19] that describes the probabilistic time evolution of the system dynamics is

$$\frac{d}{dt}P(X;t) = P(X;t)\,C, \tag{5}$$

where $X = [x_1, x_2, \ldots, x_{\hat{N}}]$ collects all possible state vectors $x_i$ at any time $t$, $P(X;t)$ represents the probabilities of those state vectors at time $t$, and $C$ is the state reaction matrix [29], given by

$$C_{ij} = \begin{cases} -\sum_{\mu=1}^{M} a_\mu(x_j), & \text{for } i = j, \\ a_\mu(x_i), & \text{for } i \text{ such that } x_j = x_i + v_\mu, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

where $v_\mu$ is the stoichiometric transition vector for reaction channel $\mu$.

From equation (5), the solution is

$$P = P_0\, e^{Ct}, \tag{7}$$

where $P_0$ is the initial probability of all the possible states, and the transition probability matrix can be calculated as

$$T = e^{C\tau}, \tag{8}$$

where $\tau$ is the period of time the system has evolved from a previous time [29].

QNSTOP is a class of quasi-Newton methods developed for stochastic optimization [8], where the objective function $f(X)$ is a random variable and $X$ is contained in a box $L \le X \le U$. The essential steps of QNSTOP are summarized here; further details on the algorithm and implementation can be found in Ref. [3].

• In each iteration $k$, construct a quadratic model
$$\hat{m}_k(X - X_k) = \hat{f}_k + \hat{g}_k^T (X - X_k) + \frac{1}{2}(X - X_k)^T \hat{H}_k (X - X_k)$$
centered at $X_k$, where $\hat{g}_k$ is the gradient vector and $\hat{H}_k$ is the Hessian matrix. Note that $\hat{f}_k$ is generally not $f(X_k)$, which is stochastic.

• QNSTOP uses an ellipsoidal region centered at the current iterate $X_k \in \mathbb{R}^n$ with radius $\tau_k$,
$$E_k(\tau_k) = \left\{ X \in \mathbb{R}^n : (X - X_k)^T W_k (X - X_k) \le \tau_k^2 \right\},$$
where $W_k$ is a symmetric, positive definite scaling matrix satisfying $W_k \in \mathcal{W}_\gamma$,
$$\mathcal{W}_\gamma = \left\{ W \in \mathbb{R}^{n \times n} : W = W^T,\ \det(W) = 1,\ \gamma^{-1} I_n \preceq W \preceq \gamma I_n \right\}$$
for some $\gamma \ge 1$, where $I_n$ is the $n \times n$ identity matrix, and $A \preceq B$ means $B - A$ is positive semidefinite.
The elements of the set $\mathcal{W}_\gamma$ are valid scaling matrices that control the shape of the ellipsoidal design regions, with eccentricity constrained by $\gamma$.

• QNSTOP then estimates the gradient based on a set of $N$ uniformly sampled design sites $\{X_{k1}, \ldots, X_{kN}\} \subset E_k(\tau_k) \cap \Theta$ (where $\Theta = [L, U]$ is the feasible set of parameters defined initially). For the Hessian matrix, QNSTOP uses either a variation of the SR1 (symmetric, rank one) quasi-Newton update (stochastic $f$) or the unconstrained BFGS quasi-Newton update (global optimization of deterministic $f$).

• For the next iteration, utilizing an ellipsoidal trust region concentric with the design region to control the step length, $X_{k+1}$ is updated as
$$X_{k+1} = \left( X_k - \left[ \hat{H}_k + \mu_k W_k \right]^{-1} \hat{g}_k \right)_\Theta,$$
where $\mu_k$ is the Lagrange multiplier of a trust region subproblem, and $(\cdot)_\Theta$ denotes projection onto the feasible set $\Theta = [L, U]$.

• Finally, the experimental design region $E_k(\tau_k)$ is updated to approximate a confidence set by updating the scaling matrix $W_k$. The updated scaling matrix is given by
$$W_{k+1} = \left( \hat{H}_k + \mu_k W_k \right)^T V_k^{-1} \left( \hat{H}_k + \mu_k W_k \right),$$
where $V_k$ is the covariance matrix of $\nabla \hat{m}_k(X_{k+1} - X_k)$. For numerical stability, $W_{k+1}$ is constrained (by modifying its eigenvalues) to satisfy $\gamma^{-1} I_n \preceq W_{k+1} \preceq \gamma I_n$ and $\det(W_{k+1}) = 1$, so that $W_{k+1} \in \mathcal{W}_\gamma$.

3 Objective Functions

This section presents three different objective functions characterizing different aspects of the population trajectory of $B_n$. In particular, we propose a general simulation-based objective function that can be applied to large biochemical networks. In the enzyme-substrate system (1), suppose $D = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m]$ is a sequence of the molecular populations of $B_n$ collected from stochastic simulation results after every time interval $\tau$ (at times $[t_1, t_2, \ldots, t_m]$), where $m$ is the data size.
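Given such a data sequence $D$ and a matching simulated sequence, the 1-norm trajectory mismatch used below (the distance area of Eq. (9)) reduces to a few lines; the sample values in the usage check are hypothetical.

```python
def distance_area(x_emp, y_sim, tau):
    """Distance area of Eq. (9): sum_i |x_i - y_i| * tau (1-norm mismatch
    between the empirical and simulated B_n trajectories, sampled every tau)."""
    if len(x_emp) != len(y_sim):
        raise ValueError("trajectories must have equal length")
    return sum(abs(x - y) for x, y in zip(x_emp, y_sim)) * tau
```

The 1-norm (rather than a higher $p$-norm) keeps single noisy outliers from dominating the objective, matching the reasoning below.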
This subsection considers the population difference of $B_n$ between the empirical data (from multisite phosphorylation) and the simulation results (from the stochastic Hill function) over time, called the distance area. Given the empirical data and the simulated data as two vectors, the $p$-norm is one traditional way to measure the vectors' difference. The problem is stochastic and the time series data can be quite noisy, and outliers have more influence for $p > 1$; hence the 1-norm is used to measure the distance. Define the distance area as the objective function

$$f_d(\theta) = \int |p(t) - q(t)|\,dt \approx \sum_{i=1}^{m} |\hat{x}_i - \hat{y}_i|\,\tau, \tag{9}$$

where $\theta$ is the vector of model parameter values, and $p(t)$ and $q(t)$ are the trajectory functions of the empirical and simulated populations in continuous domains. For discrete time series data, $\hat{x}_i$ and $\hat{y}_i$ represent the populations of $B_n$ from the empirical and simulated data, respectively. The stochastic optimization problem to be solved is

$$\min_{\theta \in \Theta} f_d(\theta), \tag{10}$$

where $\Theta \subset \mathbb{R}^n$ defines the feasible set (allowable values for the model parameter vector $\theta$).

The minimum distance area, like other traditional optimization methods, builds its objective function on "mean" measurements from stochastic simulations, and hence cannot reflect the intrinsic noise in stochastic models. To capture the stochastic fluctuations, measure the transition probability that the system jumps from one state to the next after a certain time step. The likelihood function of time series data can then be factorized into the product of transition probabilities. For convenience, the logarithm of the likelihood, which changes a product into a sum, is used. The logarithm of the likelihood of the observed data $D$ is

$$\log L(\theta \mid D) = \log\left( \prod_{i=1}^{m} T_{x_{i-1}, x_i} \right) = \sum_{i=1}^{m} \log T_{x_{i-1}, x_i}, \tag{11}$$

where $\theta \in \mathbb{R}^n$ is the vector of model parameter values and $T$ is the transition probability matrix.
Specifically, $T_{x_{i-1}, x_i}$ is the transition probability that the system changes from state $x_{i-1}$ to state $x_i$. Note that we take the logarithm of the likelihood because the transition probabilities are usually very close to zero. The log-likelihood function expresses the probabilities of the observed empirical data for different values of the parameter vector $\theta$. A larger value of the log-likelihood indicates a better fit to the empirical data.

Using the maximum log-likelihood, the objective function is

$$f_l(\theta) = -\log L(\theta \mid D), \tag{12}$$

and the stochastic optimization problem to be solved is

$$\min_{\theta \in \Theta} f_l(\theta), \tag{13}$$

where $\Theta$ is a set in $\mathbb{R}^n$ defining the feasible set (allowable values for the model parameter vector $\theta$).

When a system has a finite number of states, we can calculate $T$ directly from Eq. (8). When a system is small but has an infinite number of states, the finite state projection (FSP) method [29] projects the infinite state vector $X$ to a finite state vector, approximating the CME solution with an error $\epsilon$. Accordingly, $C$ and $T$ are approximated by $\hat{C}$ and $\hat{T}$, respectively. Fox et al. [15] proved that the FSP-derived likelihood converges monotonically to the exact likelihood value.

In the above maximum log-likelihood method, the approximation of the transition matrix $T$ is tractable only for small systems; it is difficult or nearly impossible to solve the CME for large systems. To overcome this limitation, this subsection proposes an approximate maximum log-likelihood method, a general-purpose objective function that is applicable to large complex biochemical networks.

The transition probability of the system state going from $x_{i-1}$ to $x_i$ is

$$T_{x_{i-1}, x_i} = \Pr(x_i \mid x_{i-1}). \tag{14}$$

Thus, the logarithm of the likelihood of the empirical data $D$ is

$$\log L(\theta \mid D) = \log\left( \prod_{i=1}^{m} \Pr(x_i \mid x_{i-1}) \right) = \sum_{i=1}^{m} \log \Pr(x_i \mid x_{i-1}), \tag{15}$$

where $\Pr(x_i \mid x_{i-1})$ can be approximated using simulation data (an example will be shown later in (19)).
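For a system with a small finite state space, the transition matrix of Eq. (8) and the exact log-likelihood (11) can be computed directly. Below is a pure-Python sketch for a hypothetical birth-death chain shaped like the Hill system (3) (state $k$ counts $B_n$ molecules out of a small total $N$, with made-up lumped rates `cf` and `cr`); production code would use a library matrix exponential rather than this naive scaled Taylor scheme.

```python
import math

def expm(C, tau, squarings=10, terms=30):
    """T = e^{C*tau} via scaling-and-squaring with a truncated Taylor
    series; adequate only for the tiny state spaces discussed here."""
    n = len(C)
    s = 2 ** squarings
    A = [[C[i][j] * tau / s for j in range(n)] for i in range(n)]
    T = [[float(i == j) for j in range(n)] for i in range(n)]   # identity
    term = [row[:] for row in T]
    for k in range(1, terms + 1):                # T += A^k / k!
        term = [[sum(term[i][l] * A[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]
        T = [[T[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):                   # undo the scaling
        T = [[sum(T[i][l] * T[l][j] for l in range(n)) for j in range(n)]
             for i in range(n)]
    return T

# Hypothetical birth-death chain: state k = [B_n], total substrate N.
N, cf, cr, tau = 5, 0.4, 0.2, 0.5
C = [[0.0] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):
    if k < N:
        C[k][k + 1] = cf * (N - k)   # gain one B_n molecule
    if k > 0:
        C[k][k - 1] = cr * k         # lose one B_n molecule
    C[k][k] = -sum(C[k][j] for j in range(N + 1) if j != k)

T = expm(C, tau)

def log_likelihood(T, states):
    """Eq. (11): log L = sum_i log T[x_{i-1}][x_i] along the observed path."""
    return sum(math.log(T[a][b]) for a, b in zip(states, states[1:]))
```

Each row of `T` sums to one (the rows of the generator `C` sum to zero), so `T` is a valid transition probability matrix for the sampling interval `tau`.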
The objective function of the approximate maximum log-likelihood is

$$f_p(\theta) = -\log L(\theta \mid D). \tag{16}$$

The stochastic optimization problem to be solved is

$$\min_{\theta \in \Theta} f_p(\theta), \tag{17}$$

where $\Theta$ is a set in $\mathbb{R}^n$ defining the feasible set.

Algorithm 1 summarizes the essential steps of the approximate maximum log-likelihood method. In Line 6, by simulating the system $q$ times from time $t_{i-1}$ to $t_i$ with initial system state $[B_n] = \hat{x}_{i-1}$, we get a list of simulation results for $[B_n]$ at time $t_i$, represented as $S_i = [s_1, s_2, \ldots, s_q]$. $\Pr(x_i \mid x_{i-1})$ can then be approximated from the distribution of $S_i$. For example, assuming $x_i$ (sampled in $S_i$) follows a normal distribution

$$x_i \sim N(\mu, \sigma^2), \tag{18}$$

where $\mu$ and $\sigma^2$ are the mean and variance of $x_i$, we can approximate the probability as

$$\Pr(x_i \mid x_{i-1}) \approx \Pr(x_i - 0.5 < \mu + \sigma z_i < x_i + 0.5), \qquad z_i = \frac{x_i - \mu}{\sigma} \sim N(0, 1). \tag{19}$$

Note that the empirical data could comprise multiple species' population trajectories, in which case $x_i$ in $\Pr(x_i \mid x_{i-1})$ becomes a vector of the species' populations. In Line 4, if the simulation results do not include $[B_n] = \hat{x}_i$, then choose the one closest to $\hat{x}_i$.

4 Acceptable Parameter Region

As mentioned before, for most systems, especially those with multidimensional parameters, there are possibly many combinations of parameter values producing similar system dynamics and behaviors. This section introduces the α-β-γ rule to find the acceptable parameter regions of the stochastic Hill function. Parameter values sampled in an acceptable region should yield system results similar to those of the best parameter values. In particular, we applied the α-β-γ rule to QNSTOP, which has been used to find the best parameter values of several stochastic problems.

α-β-γ Rule
Based on the fact that QNSTOP creates an ellipsoidal design region at each iteration, we can utilize this ellipsoid to define the acceptable parameter region. For an ellipsoid $E$, define the set of objective function values as $f(E) = \{f(x) \mid x \in E\}$. To accept a parameter region, intuitively, most parameters sampled from the region should have relatively small objective values, close to the minimum objective function value of the ellipsoidal region, written as $\min f(E)$. Based on scrutinizing all possible distributions of $f(E)$, define a stable ellipsoidal region $E$ as follows.

Algorithm 1: Approximate maximum log-likelihood
Input:
Empirical data $D = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m]$, the population trajectory of species $[B_n]$.
Output: $-\sum_{i=1}^{m} \log \Pr(\hat{x}_i \mid \hat{x}_{i-1})$
1: Initialization: $i = 1$, system state $\hat{x}_0$.
2: while $i \le m$ do
3:   if $i = 1$ then go to line 6
4:   else initialize the system with the simulation result where $[B_n] = \hat{x}_{i-1}$ at time $t_{i-1}$
5:   end if
6:   Simulate the system $q$ times from time $t_{i-1}$ to $t_i$, generating a list of simulation results for $B_n$ at time $t_i$, denoted $S_i = [s_1, s_2, \ldots, s_q]$.
7:   Construct $\Pr(\hat{x}_i \mid \hat{x}_{i-1})$ based on the distribution of $S_i$.
8:   $i = i + 1$
9: end while

Definition 4.1. Assume $f(\theta) \ge 0$ always. $E$ is a min-stable region if
$$\Pr\left[ f(\theta) \le (1 + \alpha) \min f(E) \right] \ge \beta, \qquad \theta \in E,\ \alpha \in (0, 1),\ \beta \in (0, 1].$$

In the above definition, α measures how close the objective function values are to the minimum value $\min f(E)$, and β controls the percentage of parameter values that generate objective function values close to that minimum. α and β can be assigned different values depending on the problem. For any parameter region $E$ with α fixed, define the percentage of points with objective function values no larger than $(1 + \alpha)\min f(E)$, that is, the value of $\Pr[f(\theta) \le (1 + \alpha)\min f(E)]$, as the region stability.

As QNSTOP designs a specific ellipsoidal region at each iteration, the minimum objective function values found may vary dramatically between iterations. We do not want to accept the ellipsoidal region $E$ of the first iteration even if $E$ is min-stable, because $\min f(E)$ is usually much larger than the minimum objective value $f_{\min}$ found over all iterations and all starting points (if QNSTOP is run with multiple starting points). Thus, to ensure that the parameter region is globally min-stable, we need to choose those min-stable regions whose local minimum objective function values are close to the lowest minimum found so far. The acceptable parameter region is defined as a union of min-stable ellipsoids with local minimum objective function values close to $f_{\min}$.

Definition 4.2.
An acceptable region is
$$R = \bigcup_{k \in B} E_k, \qquad B = \left\{ k \mid E_k \text{ is min-stable and } \min f(E_k) \le (1 + \gamma) f_{\min} \right\},\ \gamma \in [0, 1).$$

In the above definition, $k$ is the iteration number, and γ controls how close $\min f(E_k)$ is to the lowest minimum found over all iterations; γ may vary according to the problem. Choosing values for α, β, γ is referred to as the α-β-γ rule. Note that if optimization algorithms use multiple starting points, then $k$ indexes iterations over all starting points, and $f_{\min}$ is the minimum found over all starting points.

Values for α, β, and γ are derived from analyzing the objective function values in parameter regions. Fig. 1 shows the distributions of the three objective function values over several iterations. At iteration 10, $f(X)$ of the maximum log-likelihood method is almost uniformly distributed, with values spread out over the domain. As the iterations continue, more objective function values get closer to $f_{\min}$, the minimum value found so far, and the distribution of $f(X)$ gradually forms a peak around $f_{\min}$. While these features hold for all three methods, the maximum log-likelihood has the highest percentage of small objective values. Based on the distributions, the values of α for the three methods are chosen as 0.5 (minimum distance area), 0.2 (maximum log-likelihood), and 0.3 (approximate maximum log-likelihood), respectively. Fig. 2 illustrates how the ellipsoidal regions of parameters $k_a$, $k_d$ shrink over iterations.

Figure 3 shows that, for all three methods, the average region stability (defined above) over 100 starting points increases with the iteration number. The maximum log-likelihood has the highest region stability compared to the other two objective functions. For the maximum log-likelihood method, α = γ = 0.2 and β = 0.8 based on the region stability. In this way, at least 80% of the sampled points from the acceptable region have objective function values within 20% relative error of the minimum value, called the 80%-20% rule.
For the other two methods, α = γ = 0.3 (with β chosen from the corresponding region stability) for the approximate maximum log-likelihood, and α = β = γ = 0.5 for the minimum distance area.
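Definitions 4.1 and 4.2 translate directly into code. The sketch below checks min-stability and assembles an acceptable region from per-ellipsoid samples of objective function values; the α, β, γ values and the sample data in the usage check are hypothetical.

```python
def region_stability(f_vals, alpha):
    """Fraction of sampled objective values within (1+alpha) of the
    region's minimum (the 'region stability' of Definition 4.1)."""
    fmin = min(f_vals)
    return sum(f <= (1.0 + alpha) * fmin for f in f_vals) / len(f_vals)

def is_min_stable(f_vals, alpha, beta):
    """Definition 4.1: the region is min-stable if its stability >= beta."""
    return region_stability(f_vals, alpha) >= beta

def acceptable_region(ellipsoid_samples, alpha, beta, gamma):
    """Definition 4.2: indices of min-stable ellipsoids whose local minimum
    is within (1+gamma) of the lowest minimum over all ellipsoids."""
    f_min = min(min(fv) for fv in ellipsoid_samples)
    return [k for k, fv in enumerate(ellipsoid_samples)
            if is_min_stable(fv, alpha, beta)
            and min(fv) <= (1.0 + gamma) * f_min]
```

With the 80%-20% choice (α = γ = 0.2, β = 0.8), an ellipsoid is accepted exactly when at least 80% of its sampled values sit within 20% of its own minimum, and that minimum sits within 20% of the overall best.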
Figure 1: Distributions of objective function values from three methods (minimum distance area, maximum log-likelihood, and approximate maximum log-likelihood) based on 1000 sampled points inside the ellipsoidal regions for iterations 10, 40, 70, and 100.
5 Numerical Results

This section first discusses the experimental setup, including the empirical data and the input parameters. The optimization results are then divided into two parts: the first part demonstrates the result of optimizing two parameters, $k_a$ and $k_d$; the second part compares the three objective functions and studies the full parameter vector $\theta$.

This work assumes the ordered distributive multisite phosphorylation system (1) as the ground truth. In particular, consider the system size $n = 4$, with a fixed enzyme population level $[A]$ and fixed reaction constants $f_i$ and $b_i$ for $i = 1, 2, 3, 4$. The initial condition is $[B_0] = 100$ and $[B_i] = 0$ for $0 < i \le n$. Use the stochastic simulation algorithm to simulate the system and sample one population trajectory of $B_n$ with a time step $\tau$ ($t_1 = \tau, t_2 = 2\tau, \ldots, t_m = m\tau$) as a single set $D$ of empirical data. Since both systems (1) and (3) stabilize at a steady state after a certain time (the transition period), if the empirical data $D$ contain more steady-state information than transition-period information, then the system parameters will be optimized in a way that minimizes the difference of steady states but overlooks the transition dynamics before the system stabilizes, and vice versa. Therefore, the empirical data points are sampled to cover both the transition and stable periods equally.

The stochastic Hill function system (3) has the initial condition $[B_0] = 100$ and $[B_n] = 0$; thus there are at most 101 system states. Table 2 lists the bounds for each parameter in the system.
Figure 2: QNSTOP ellipsoids at iterations 10, 40, 70, and 100 from the maximum log-likelihood method.
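Membership in an ellipsoidal design region $E_k(\tau_k)$ like those plotted in Fig. 2 is a single quadratic-form test; the minimal sketch below assumes the radius enters squared, i.e. $(X - X_k)^T W_k (X - X_k) \le \tau_k^2$, and the numeric values in the usage check are hypothetical.

```python
def in_ellipsoid(X, Xk, W, tau):
    """Test (X - Xk)^T W (X - Xk) <= tau^2 for a QNSTOP design ellipsoid,
    where W is the symmetric positive definite scaling matrix."""
    d = [x - y for x, y in zip(X, Xk)]
    n = len(d)
    quad = sum(d[i] * W[i][j] * d[j] for i in range(n) for j in range(n))
    return quad <= tau * tau
```

With `W` equal to the identity this reduces to an ordinary ball test; a non-identity `W` stretches the region along the directions QNSTOP has learned to trust.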
Figure 3: Average region stability from iteration 1 to 200 based on 100 starting points from minimum distance area( α “ . ), maximum log-likelihood ( α “ . ), and approximate maximum log-likelihood ( α “ . ).Note that parameters k a and k d (rate constants of association and dissociation) control the system’s time scale. Forfixed parameters k m and σ , while parameter region r . , s for k a and k d occupies 0.0001% of the entire search box,the system time scale varies by three orders of magnitude. In Fig. 4, the final population of B n initially grows linearly(in logarithm) with k a { k d and then levels off around 100. When k a ą , k d “ , the objective function value does notchange much because all B molecules are phosphorylated at the stable state. Thus, the decimal region r L, s ( L ă is the lower bound), while sensitive, is minimized or overlooked when the upper bound U is much larger than one( U " ). This decimal parameter sensitivity lost phenomenon can affect the optimization performance, especially9able 2: Parameter boundary in the stochastic Hill function system.Parameter k a k d k m σ r L, U s r . , s r . , s r . , s r . , s log pr L, U sq r´ , s r´ , s r´ , s r´ , s for systems that are sensitive to the r , s domain. To solve this problem, simply use the logarithm of the parameters.Thus the bound for log p k a q and log p k d q is r´ , s , shown in Table 2. For numerical stability, QNSTOP scales thesearch box r L, U s to the unit cube. k a /k d -5 B n ( T f i n a l ) -2 -1 Figure 4: The population of B n at stable state under the stochastic Hill function system with respect to different k a { k d values, where k d “ . This subsection studies the two parameter case, in which k a and k d are unknown while σ “ and k m “ in thestochastic Hill function system. The QNSTOP settings are: total iterations = 100, sample point number N “ , initialellipsoid radius TAU “ . 
This subsection studies the two-parameter case, in which k_a and k_d are unknown while σ and k_m are held fixed in the stochastic Hill function system. The QNSTOP settings are: 100 total iterations, N sample points per iteration, MODE = 'G' (global optimization mode), an initial ellipsoid radius TAU of one-tenth of the [L, U] box diameter, and a decay factor GAIN for the ellipsoid radius τ_k. The empirical data are m points collected at a fixed time step τ from one SSA-simulated trajectory of the B_n population. The α-β-γ acceptable region is defined by α = 0.8, β = 0.2, γ = 0.2, meaning that 80% of the sampled values must have objective function values within 20% relative error of the local minimum, and the local minimum must be within 20% relative error of the lowest minimum found over all starting points and iterations.

Fig. 5 shows the objective function values of maximum log-likelihood (f_l) over the entire (k_a, k_d) domain. The dark blue region at the center of the graph attains the minimum objective function value, and any (log(k_a), log(k_d)) pair sampled in that region is an acceptable value for the stochastic Hill function system (3).
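A minimal sketch of how the α-β-γ acceptance test for a single ellipsoid could be implemented, under our reading that α is the required fraction of sampled points and β, γ are the two relative-error tolerances (the function name and interface are hypothetical):

```python
def ellipsoid_acceptable(f_samples, f_local_min, f_best,
                         alpha=0.8, beta=0.2, gamma=0.2):
    """Sketch of the alpha-beta-gamma test for one ellipsoid: at least a
    fraction alpha of the sampled objective values must lie within relative
    error beta of the ellipsoid's local minimum, and that local minimum must
    lie within relative error gamma of the best minimum found overall.
    Assumes positive objective values, as in the traces shown here."""
    n_close = sum(1 for f in f_samples if f <= f_local_min * (1.0 + beta))
    frac_ok = n_close / len(f_samples) >= alpha
    local_ok = f_local_min <= f_best * (1.0 + gamma)
    return frac_ok and local_ok
```

The acceptable region of parameter space is then the union of all ellipsoids that pass this test.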
Influence of starting points. Fig. 6 shows the execution traces of QNSTOP and the corresponding acceptable parameter regions from three different starting points: the lower boundary point L, the upper boundary point U, and the center point (L + U)/2. There are three execution traces in Fig. 6(a, c, e): the solid red line gives the objective function value at the ellipsoid center at each iteration; the dashed blue line gives the minimum objective function value sampled in each iteration's ellipsoidal region; and the dotted black line gives the corresponding maximum sampled value.

Figure 5: Exhaustive search of objective function values from the maximum log-likelihood method over the domain (log(k_a), log(k_d)) in [-3, 3] x [-3, 3].

For all three starting points, the best sampled objective function value f_l(X) decreases quickly in the first 20 iterations and then oscillates around 200; the worst sampled value also settles to oscillate around 200 after 60 iterations. In Fig. 6(b, d, f), all ellipsoids satisfying the α-β-γ rule, which together form the acceptable region of parameter space, are plotted in the domain. The acceptable parameter regions are similar for all three starting points and concentrate in the same central range of log(k_a) and log(k_d), apart from subtle differences in region size and in the number of acceptable ellipsoids. Since QNSTOP can choose multiple random starting points from a Latin hypercube experimental design, the rest of the paper reports acceptable parameter regions collected from 20 such starting points.
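A Latin hypercube design of the kind used for the starting points can be sketched as follows (a standard construction; the function here is illustrative, not QNSTOP's internal routine):

```python
import numpy as np

def latin_hypercube(n_points, dim, seed=0):
    """Latin hypercube design on the unit cube: each dimension is divided
    into n_points equal strata, with exactly one point per stratum,
    independently shuffled across dimensions."""
    rng = np.random.default_rng(seed)
    strata = (np.arange(n_points)[:, None] + rng.random((n_points, dim))) / n_points
    for j in range(dim):
        rng.shuffle(strata[:, j])  # shuffles the column in place
    return strata

starts = latin_hypercube(20, 2)  # e.g., 20 starting points in the scaled (k_a, k_d) box
```

Each column covers all 20 strata exactly once, so the starting points spread evenly over the scaled search box.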
Influence of empirical data. To study the robustness of the algorithm, vary the empirical data and the data size. Fig. 7 shows the acceptable regions obtained from the same population trajectory of B_n but with different numbers of data points: Fig. 7(a) samples ten data points of the B_n population with time step τ = 1, while Fig. 7(c) and 7(e) sample 50 and 200 data points with time steps τ = 0.2 and τ = 0.05, respectively. The acceptable region of parameter space grows with the data size m.

Fig. 8 shows the optimization results for three different empirical datasets of B_n population trajectories sampled from the stochastic multisite phosphorylation system. Across the different population trajectories, the acceptable parameter regions are consistent in size and shape. Thus, the parameter optimization of the stochastic Hill function system is more sensitive to the size of the dataset than to variation in its content.

This subsection studies the full parameter vector, in which all four parameters k_a, k_d, σ, and k_m of the stochastic Hill function system are unknown. The QNSTOP settings are the same as before except for the initial ellipsoid radius TAU and the number of sample points N. The empirical data are one simulated trajectory of the B_n population with m = 50 and τ = 0.2. The α-β-γ values defining the acceptable regions are set differently for each objective function.
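The empirical data throughout are SSA trajectories recorded on a fixed time grid. A generic Gillespie SSA skeleton with fixed-step recording might look like this; the two-reaction system in the usage example is a toy phosphorylation toggle, not the paper's ordered distributive mechanism:

```python
import numpy as np

def ssa_on_grid(propensities, apply_reaction, x0, t_final, tau, seed=1):
    """Generic Gillespie SSA that records the state on a fixed grid with
    step tau, mimicking how the empirical B_n data points are collected."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, np.array(x0, dtype=float)
    grid = np.arange(tau, t_final + 1e-9, tau)
    recorded, gi = [], 0
    while gi < len(grid):
        a = np.array([p(x) for p in propensities])
        a0 = a.sum()
        dt = np.inf if a0 == 0.0 else rng.exponential(1.0 / a0)
        while gi < len(grid) and grid[gi] <= t + dt:  # record grid points passed
            recorded.append(x.copy())
            gi += 1
        if a0 == 0.0:
            break
        t += dt
        x = apply_reaction(x, rng.choice(len(a), p=a / a0))
    return grid, np.array(recorded)

# Toy usage: phosphorylation B -> B_p (rate kp*B) and back (rate kd*B_p).
kp, kd = 1.0, 0.5
props = [lambda x: kp * x[0], lambda x: kd * x[1]]
def step(x, j):
    out = x.copy()
    out += np.array([-1.0, 1.0]) if j == 0 else np.array([1.0, -1.0])
    return out

times, states = ssa_on_grid(props, step, [100, 0], t_final=10.0, tau=0.2)
```

With t_final = 10 and τ = 0.2 this yields the m = 50 data points used above.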
Influence of objective functions. Fig. 9 presents the results for the three objective functions: minimum distance area, maximum log-likelihood, and approximate maximum log-likelihood. For all three methods, the acceptable region for the (log(k_a), log(k_d)) pair is an ellipsoidal shape centered in the middle of the domain, with only subtle differences in size. For the (log(k_m), log(σ)) pair, in contrast, the acceptable regions cover the entire domain, as shown in Fig. 10a. This happens for all three methods, indicating that the system is not sensitive to the parameters k_m and σ. Note that the population level of enzyme A is fixed in this stochastic Hill function; the acceptable regions for the (log(k_m), log(σ)) pair will be significantly different if multiple population levels of enzyme A are considered.

Figure 6: Execution traces of QNSTOP (left column) and acceptable regions of parameter space (right column) from the maximum log-likelihood method with different starting points: (a, b) the lower box corner (log(k_a), log(k_d)) = (-3, -3); (c, d) the box center (0, 0); (e, f) the upper box corner (3, 3). The three execution traces are: the objective function value at the ellipsoid center at each iteration (solid red line); the minimum objective function value sampled in each iteration's ellipsoidal region (dashed blue line); and the maximum sampled value (dotted black line).
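The precise form of the approximate maximum log-likelihood objective is defined earlier in the paper; one generic way to approximate a log-likelihood from simulation ensembles, shown here only as a hypothetical stand-in, is a normal approximation to each marginal distribution:

```python
import numpy as np

def approx_neg_log_likelihood(data, ensemble):
    """Approximate negative log-likelihood of observed data points under a
    stochastic model, using a normal approximation to each marginal built
    from an ensemble of simulated trajectories (shape: runs x time points).
    This is a generic construction, not necessarily the paper's exact one."""
    mu = ensemble.mean(axis=0)
    var = ensemble.var(axis=0) + 1e-9  # guard against zero sample variance
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (data - mu) ** 2 / var)
```

Minimizing this quantity over the parameter vector corresponds to maximizing the approximate likelihood of the empirical B_n data.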
Figure 7: Different empirical data of the B_n population (left column) and the corresponding acceptable parameter regions (right column) from the maximum log-likelihood method. The empirical data are sampled with different time steps τ and sample sizes m: (a, b) τ = 1, m = 10; (c, d) τ = 0.2, m = 50; (e, f) τ = 0.05, m = 200. Each acceptable parameter region is the union of results from 20 starting points.

Fig. 10b shows the results for an objective function that sums over 11 population levels of enzyme A, with [A] ranging over values including 300, 400, 520, 620, 800, 1050, 1400, 2100, 5000, and 20000. This summed objective function will be explored in future work.

Figure 8: Optimization results of maximum log-likelihood with three different empirical datasets, each containing 50 data points collected every 0.2 time unit. The acceptable region is the union of results from 20 starting points.

To compare the stochastic Hill function system with the ordered distributive enzyme-substrate phosphorylation process, sample 100 points from the returned acceptable parameter region. Fig. 11 shows the average population trajectory of B_n in the stochastic Hill function using the sampled parameter values.
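The summed objective over enzyme levels can be sketched as below; it illustrates the idea of forcing one parameter vector to fit data at several fixed levels of [A] at once, and all names are placeholders:

```python
def summed_objective(single_level_objective, datasets, A_levels, params):
    """Sum a single-enzyme-level objective over several fixed levels of [A],
    so that one parameter vector must match all enzyme levels jointly.
    A sketch of the summed objective mentioned above."""
    return sum(single_level_objective(params, data, A)
               for data, A in zip(datasets, A_levels))
```

Because k_m and σ enter the Hill function together with [A], fitting across several levels should constrain the (k_m, σ) pair far more tightly than a single fixed level does.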
The average population dynamics from all three methods match the empirical data well. Fig. 12 further shows the population distributions of the stochastic Hill function system based on the 100 parameter vector values sampled from the acceptable regions. Except for the significant difference at the initial stage of the B_n transition (time t < 2), the empirical data fall within the 25th-75th percentile range at the other stages (t = 2, 3, 6, 8, 10) for the maximum log-likelihood and approximate maximum log-likelihood methods. The 25th-75th percentile range of the B_n population from minimum distance area is narrower than that from the other two methods.

Figure 9: Average execution traces of QNSTOP (left column) based on 20 starting points and the corresponding acceptable parameter regions (right column) projected onto the two-dimensional domain of k_a and k_d: (a, b) minimum distance area; (c, d) maximum log-likelihood; (e, f) approximate maximum log-likelihood.

Figure 10: Acceptable parameter regions projected onto the two-dimensional domain of k_m and σ from the maximum log-likelihood method. (a) The population of enzyme A is fixed at a single value. (b) 11 population levels of enzyme A are considered in the stochastic Hill function system.
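The percentile bands used in this comparison can be computed directly from the ensemble of simulated trajectories; the arrays below are synthetic stand-ins for the model's output:

```python
import numpy as np

# Synthetic stand-in for trajectories simulated from 100 sampled parameter
# vectors (rows) at 50 recording times (columns).
rng = np.random.default_rng(3)
trajectories = rng.normal(loc=100.0, scale=5.0, size=(100, 50))

# 25th and 75th percentiles of the ensemble at each recording time.
q25, q75 = np.percentile(trajectories, [25, 75], axis=0)

# Fraction of empirical data points falling inside the band.
empirical = rng.normal(loc=100.0, scale=5.0, size=50)
coverage = np.mean((q25 <= empirical) & (empirical <= q75))
```

A narrower band, as produced by minimum distance area, covers fewer empirical points for the same trajectories.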
Figure 11: Average population dynamics of B_n in the stochastic Hill function system based on 100 parameter vector values sampled from the acceptable regions returned by minimum distance area, maximum log-likelihood, and approximate maximum log-likelihood. The simulated empirical data (50 data points) are marked by red crosses.

This paper formulated a stochastic Hill function model for the multisite phosphorylation mechanism and matched it to SSA simulation-based empirical population trajectory data for the ordered distributive enzyme-substrate phosphorylation process, using three different objective functions: minimum distance area, maximum log-likelihood, and approximate maximum log-likelihood. The approximate maximum log-likelihood method works well and is applicable to large, complex biochemical networks. The main contributions are (1) an α-β-γ rule that returns acceptable parameter regions instead of a single best parameter vector, (2) the observation that the Hill function system model must be matched to data with varying enzyme levels [A], and (3) a further demonstration that QNSTOP is well suited to parameter estimation for stochastic biological system models. The results show that the optimized stochastic Hill function can model the switching behavior and the steady state of the multisite phosphorylation process, although it cannot capture the initial transient period.
Figure 12: Population distributions of B_n in the stochastic Hill function system (3) at times t = 1, 2, 3, 6, 8, 10, corresponding to 10%, 20%, 30%, 60%, 80%, and 100% of the total simulation time, based on 100 points sampled in each acceptable parameter region from minimum distance area, maximum log-likelihood, and approximate maximum log-likelihood.

QNSTOP and the α-β-γ rule are generally applicable to other stochastic models. Meanwhile, for the aforementioned first-order reaction networks, where theoretical solutions or efficient modeling and simulation methods for the stochastic kinetics are available, one may take advantage of those theoretical solutions or reduced models to develop more efficient parameter estimation strategies.

Acknowledgments
This work was partially supported by the National Science Foundation under awards CCF-1526666, MCB-1613741 andCCF-1909122.