BioMETA: A multiple specification parameter estimation system for stochastic biochemical models
BBioMETA: A multiple specification parameter estimation systemfor stochastic biochemical models
Arfeen Khalid
Department of Computer ScienceUniversity of Central FloridaOrlando, FL 32816 [email protected] A BSTRACT
The inherent behavioral variability exhibited by stochastic biochemical systems makes it a challengingtask for human experts to manually analyze them. Computational modeling of such systems helps ininvestigating and predicting the behaviors of the underlying biochemical processes but at the sametime introduces the presence of several unknown parameters. A key challenge faced in this scenariois to determine the values of these unknown parameters against known behavioral specifications. Thesolutions that have been presented so far estimate the parameters of a given model against a singlespecification whereas a correct model is expected to satisfy all the behavioral specifications wheninstantiated with a single set of parameter values. We present a new method, BioMETA, to addressthis problem such that a single set of parameter values causes a parameterized stochastic biochemicalmodel to satisfy all the given probabilistic temporal logic behavioral specifications simultaneously.Our method is based on combining a multiple hypothesis testing based statistical model checkingtechnique with simulated annealing search to look for a single set of parameter values so that thegiven parameterized model satisfies multiple probabilistic behavioral specifications. We study twostochastic rule-based models of biochemical receptors, namely, Fc (cid:15)
RI and T-cell as our benchmarksto evaluate the usefulness of the presented method. Our experimental results successfully estimate parameters of Fc (cid:15) RI and parameters of T-cell receptor model against three probabilistic temporallogic behavioral specifications each. K eywords cell signaling; multiple hypothesis testing; parameter estimation; rule-based models; signal temporal logic;statistical model checking; stochastic biochemical models; systems biology Stochastic rule-based models serve as a natural and compact representation for biochemical reactions. Biochemicalmodeling languages [1, 2] are used to succinctly describe biochemical systems, such as cell signaling pathways. TheGillespie stochastic simulation algorithm [3] and its variants [4] are then employed to predict the behavior of thebiochemical system modeled by the stochastic rule-based model.However, it is often not feasible to create a complete stochastic rule-based model from first principles. Instead, ourknowledge of the biochemical system is used to obtain the set of chemical reactions or the core structure of the stochasticrule-based model. The lack of knowledge about the rate constants of biochemical reactions is readily modeled asunknown parameters in stochastic rule-based models.A primary challenge in the use of such a parameterized stochastic rule-based model for predicting the behavior ofa biological system is the determination of the parameters of the model from multiple experimental observations.Traditionally, parameter values of a stochastic model have been estimated against quantitative time series data. A fewrelatively recent efforts [5, 6] have focused on estimating the parameters of a biological model against qualitativestochastic experimental observations encoded as probabilistic temporal logic formulas. a r X i v : . [ q - b i o . Q M ] J a n owever, the focus of these efforts has been on discovering parameter values of a paramaterized stochastic biologicalmodel from a single probabilistic temporal logic specification. In practice, a biological model must satisfy multipleexperimental observations made on the biological system being modeled. Hence, it is important to estimate a singleset of parameter values that causes a parameterized stochastic model to satisfy multiple probabilistic temporal logicspecifications simultaneously.In this study, we make the following contributions towards parameter estimation of stochastic biochemical models:(i) We design a new approach that utilizes a quantitative tightness metric to estimate a single set of parametervalues for a parameterized stochastic biochemical model such that the model satisfies all the given probabilistictemporal logic behavioral specifications simultaneously . Our approach employs multiple hypothesis testingbased statistical model checking to evaluate stochastic models against multiple specifications. It further usessimulated annealing search to estimate the unknown parameters present in the model.(ii) Our experimental results demonstrate that our method BioMETA is able to estimate parameters of theFc (cid:15) RI model and parameters of the T-cell receptor model satisfying three temporal logic properties each. In the following subsections, we first describe rule-based modeling using software tool BioNetGen [2]. We thenformally define Signal Temporal Logic (STL) [7] that is used to encode behavioral specifications as temporal logicformulas. We also describe TeLEx [8] that computes the quantitative tightness metric describing how well the givenmodel satisfies a behavioral specification.
Rule-based modeling is a relatively new formalism that allows to model biochemical systems by representing moleculesas structured objects and molecular interactions as rules [1, 9, 10, 11]. The interactions can be of various types includingassociations, dissociations, modifications to the internal state of a molecule as well as the production or consumption ofmolecular species [12]. In other words, a rule specifies how the states of reactants are modified to generate productsin a biochemical reaction. Molecular interactions can result in a large number (hundreds to thousands) of possiblemolecular species. Conventional approaches, involving translating the model into ODEs or CTMCs, are unable tohandle such combinatorial complexity. Rule-based modeling addresses this problem by expressing models with ahigh degree of modularity and avoid the explicit enumeration of all possible molecular species or all the states of asystem; hence, providing a succinct representation of the model [1]. The rules are then simulated to generate a reactionnetwork comprised of all chemically distinct species and reactions [2]. Therefore, a rule-based model is a compact andgeneralized representation of conventional biochemical models.
Biological Network Generator (BioNetGen) is an open source software tool that can be used to construct, visualize,simulate and analyze rule-based models. The tool is based on a formal language, known as BioNetGen language(BNGL), for specifying molecules and rules to model biological systems. A BNGL model file is composed of six primaryblocks [13], namely, (i) parameters that include rate constants and values for initial concentrations of species and areresponsible for governing the dynamics of the system, (ii) molecule types defining molecules, including componentsand allowed component states, (iii) seed species that describe the initial state of the system, (iv) observables thatoutput functions of concentrations of species having particular attributes, (v) reaction rules describing how moleculesinteract with each other and (vi) actions that provide various methods for generating and simulating the network. In ourexperiments, we use the stochastic simulation algorithm (SSA) implemented in the software to simulate our biochemicalmodels. Each model simulation results in a time-stamped simulation trace capturing the behavior of the model wheninstantiated with a particular set of parameter values.
It is required that the acceptable behaviors of a model are represented as temporal logic formulas so that computationalmethods can be applied to verify whether a model satisfies a behavior or not. In this study, we use Signal TemporalLogic (STL) to encode the known behaviors of a stochastic biochemical model. It is a linear time temporal logic usedto represent continuous time behaviors of a model [14, 7]. We use a time bounded variant of STL where all temporaloperators are associated with a lower and upper time-bound. The logical operators in STL formulas consist of ∧ (and), ∨ (or), ¬ (negation), and bounded time operators consist of G (global), F (future), and U (until).2 efinition 1 (Signal Temporal Logic) . A Signal Temporal Logic formula representing a model’s known behavior isdefined recursively by the following grammar: φ := µ | ¬ µ | φ ∧ φ | φ ∨ φ | G [ t ,t ] φ | F [ t ,t ] φ | φ U [ t ,t ] φ where φ , φ and φ are STL formulas, 0 (cid:54) t < t < ∞ and µ is a predicate whose value is determined by the sign of afunction of an underlying signal x ∈ R , i.e., µ ≡ µ ( x ) > x is termed as a simulation trace or simulation trajectory and is obtained by simulating a given model in BioNetGen software. The validity of a formula φ with respect to signal x at time t is defined inductively as follows: ( x , t ) (cid:15) µ ⇔ µ ( x t ) > ( x , t ) (cid:15) ¬ µ ⇔ ¬ (( x , t ) (cid:15) µ )( x , t ) (cid:15) φ ∧ φ ⇔ ( x , t ) (cid:15) φ ∧ ( x , t ) (cid:15) φ ( x , t ) (cid:15) φ ∨ φ ⇔ ( x , t ) (cid:15) φ ∨ ( x , t ) (cid:15) φ ( x , t ) (cid:15) G [ t ,t ] φ ⇔ ∀ t (cid:48) ∈ [ t + t , t + t ] s.t. ( x , t (cid:48) ) (cid:15) φ ( x , t ) (cid:15) F [ t ,t ] φ ⇔ ∃ t (cid:48) ∈ [ t + t , t + t ] s.t. ( x , t (cid:48) ) (cid:15) φ ( x , t ) (cid:15) φ U [ t ,t ] φ ⇔ ∃ t (cid:48) ∈ [ t + t , t + t ] s.t. ( x , t (cid:48) ) (cid:15) φ ∧ ∀ t (cid:48)(cid:48) ∈ [ t, t (cid:48) ] , ( x , t (cid:48)(cid:48) ) (cid:15) φ A signal x satisfies φ , denoted by x (cid:15) φ , if ( x ,0) (cid:15) φ . Informally, x (cid:15) G [ t ,t ] φ if φ holds at every time step between t and t , and x (cid:15) F [ t ,t ] φ if φ holds at some time step between t and t . Also, x (cid:15) φ U [ t ,t ] φ if φ holds at everytime step before φ holds, and φ holds at some time step between t and t .Due to the stochastic nature of the given biochemical models, their behavioral specifications are also often probabilisticin nature. Therefore, we use a probabilistic variant of Signal Temporal Logic to encode model behaviors. ProbabilisticSTL can be further explained with the help of following example: Example 1.
Consider the following probabilistic Signal Temporal Logic formula: P ≥ . ( G [0 , (( GP rotein > ∧ F [100 , ( GP rotein <
It says that G protein should always have a high value i.e. greater than 6000 units during the first 100 (0-100) timeunits and should fall below 6000 at some point in time during the next 100 (100-200) time units with a probability of atleast 0.85.
To verify the model against a given STL property we use the algorithm implemented by a software framework TemporalLogic Extractor (TeLEx) [8]. Given a time-stamped simulation trace and a STL behavioral specification formula, TeLExquantifies the degree of model satisfiability by computing a tightness metric that describes how well a model satisfies thespecification. TeLEx uses smooth functions, such as sigmoid and exponentials, to compute tight-satisfiability of STLformulas. A successful simulation trajectory of a model refers to such a trajectory that satisfies a given STL formula.Whereas an unsuccessful simulation trajectory refers to a trajectory that does not satisfy the given STL formula. TeLExreturns a positive value in case the STL formula is verified successfully; the larger the value the better it satisfies thebehavior. If the model is not able to satisfy the STL formula, the algorithm returns a negative value describing how faroff is the model from satisfying the specification; for further details and examples see [8].
One of the crucial steps in the process of parameter estimation is to check the model against behavioral temporal logicspecifications. Statistical Model Checking (SMC) [15] is a popular method among many well-studied model checkingtechniques available in the literature. A detailed survey [16] on various SMC techniques indicates that Statistical ModelChecking based on Wald’s Sequential Probability Ratio Test (SPRT) [17] is widely studied [18, 19, 20] and used inpractice. Another variant of SMC [21, 6] uses Bayesian Sequential Hypothesis Testing and improves performance byincorporating prior knowledge about the model being verified. Mancini et al. [22] propose a parallel SMC algorithm toyield model parameters of biological systems represented as Ordinary Differential Equations (ODEs). Their approachfollows a master-slave architecture where a single master implements the SMC algorithm and assigns the numericalintegration of the given ODE system to slaves. 3atisfiability Modulo Theory (SMT) has also been used to perform model checking of biological models [23]. It worksby first extracting a collection of ODEs from a given model and then formulating these ODEs along with time-seriesdata into a collection of SMT problems. Another approach [5] performs model checking based on an open sourcesymbolic model checker, NuSMV [24], and formulates a given model using Binary Decision Diagrams (BDDs). TheseBDDs provide a compact way of representing boolean functions which in turn represent the different states of the givenmodel.A recent study [12] presents a SMC based parameter estimation framework for rule-based models formulated inBioNetGen. It verifies experimental data as well as qualitative properties of the given model. Wang et al. [25] extendsthe rule-based BioNetGen language to enable the specification of interactions among more than one cell. They employSMC in order to analyze system properties and obtain interesting insights into the development of novel therapeuticstrategies for pancreatic cancer. Techniques based on applying convolutional neural networks [26] in order to learntemporal formulas have also been recently proposed.However, each of these methods is limited to verifying the model with a binary (yes/no) outcome whereas we computea quantitative tightness metric describing how well the given model satisfies a behavioral specification. Further, thesemethods are limited to verifying only a single temporal logic property of the model at one time whereas BioMETA usesa sequential multiple hypothesis testing technique to validate the model against all given specifications simultaneously.
Several attempts have been made to estimate parameters of stochastic systems using a quantitative measure of how wella model satisfies a known temporal logic behavioral specification.Rizk et al. [27] define a continuous degree of satisfaction of a temporal logic property which is then used as a fitnessfunction in order to find kinetic parameters of a biochemical model. However, their continuous degree of satisfactionis limited to providing a quantitative measurement only in case of dissatisfaction and remains zero in case the modelsatisfies a temporal logic property. Whereas we compute a tightness metric that provides a quantitative measureirrespective of whether or not the model satisfies a system property; thus, giving a more meaningful interpretation ofmodel verification.Another technique [28] using Gaussian Process Upper Confidence Bound (GP-UCB) computes a distribution of aquantitative satisfaction function specifying the degree of satisfaction of a property by the model. An average of thisdistribution is later used to guide the parameter search. However, this approach is limited to verifying only one temporallogic property at one time as opposed to BioMETA’s ability to verify all the properties simultaneously.A global exploration of the parameter space of stochastic biochemical systems is performed using probabilistic modelchecking [29, 30, 31]. For each parameter point, they compute approximate upper and lower bounds of a landscapefunction that returns a quantitative value. This value is based on the probability that the model satisfies a givenCSL (continuous stochastic logic) formula. Another software framework uses the Bayesian formalism for parameterestimation and model selection [32]. Euclidean distance (sum of squares) between the observed data and a simulatedtrajectory is computed; a parameter point is accepted if this distance is less than a threshold value.To the best of our knowledge, this is the first time that a quantitative tightness metric is used to check model satisfiabilityagainst multiple probabilistic temporal logic behavioral specifications simultaneously; hence, generating a single set ofparameter values.
We are interested in finding a set of parameter values ρ ∈ R n so that the model M , when instantiated with set ρ ,satisfies each specification φ i with probability greater than or equal to its corresponding required probability r i . Wenow formally define our problem. Definition 2 (Multiple specification parameter estimation problem) . Given a parameterized stochastic rule-basedmodel M ( ρ ) with parameter set ρ ∈ R n , a set of desired specifications φφφ = { φ , ..., φ k } in Signal Temporal Logic, anda corresponding set of required probabilities rrr = { r , ..., r k } where r i ∈ [0,1], find a set of parameter values ρ suchthat the following holds: M ( ρ ) (cid:15) P ≥ r ( φ ) ∧ P ≥ r ( φ ) ∧ · · · ∧ P ≥ r k ( φ k ) Our solution to the problem of estimating parameters from multiple specifications is illustrated in Figure 1. Givenan initial set of parameter values ρ init , we first simulate the given model M ( ρ init ) using BioNetGen to generate atime-stamped simulation trace. The simulation trace along with the given set of k STL specifications φφφ = { φ , ..., φ k } is4hen fed to TeLEx which returns k quantitative tightness metrics describing the distance between the model M ( ρ init )and each of the k given specifications. TeLEx quantitative tightness metrics and the set of required probabilities rrr ={ r , ..., r k } are then passed onto multiple hypothesis testing (MHT) that decides whether the model instantiated with ρ init satisfies the given probabilistic STL specifications or not. The hypothesis test continues on drawing samples, i.e.continues on generating model simulations, until it has made a decision. If the test declares that M ( ρ init ) satisfies allthe given specifications with greater than or equal to their corresponding required probabilities, then ρ init is returnedas the estimated set of parameter values. Otherwise a search algorithm, guided by the mean of quantitative tightnessmetrics, explores the parameter space of the given model in order to find a new set of parameter values and repeats theabove process.Figure 1: Our proposed method: BioMETA. A Multiple specification parameter estimation system using multiplehypothesis testing based statistical model checking.We explain the approach implemented in this study by dividing it into two phases. The first phase employs multiplehypothesis testing (MHT) based statistical model checking to check a given model against multiple STL behavioralspecifications. The second phase of our method implements a simulated annealing based search algorithm to exploremodel’s parameter space and finds a single set of parameter values which satisfies all the given probabilistic behavioralspecifications simultaneously. As the problem (Definition 2) states, given a set φφφ of k STL behavioral specifications we are required to generate asingle set of parameter values such that the model satisfies each specification φ i with probability at least r i . In order toachieve this, the given model is first simulated using BioNetGen and the simulation trace is verified using TeLEx againsteach given specification. Essentially, TeLEx verification function is called k times and each call returns a quantitativetightness metric against the respective specification φ i .Due to the stochastic nature of the given model, model simulations with the same parameter values show varyingbehaviors which results in varying TeLEx tightness metrics. Therefore a multiple hypothesis testing technique, proposedby Bartroff et al. [33], is employed that generates several model simulations before deciding model satisfiability. Theprocess of generating multiple simulations results in forming a separate distribution of TeLEx tightness metrics againsteach specification. Hence, k distributions are generated where each distribution corresponds to the tightness metricscomputed by TeLEx against each of the k STL specifications.We define the mean of each distribution i as θ ( i ) . In order to decide whether the given model satisfies a specification,we aim to test the hypothesis that the mean θ ( i ) of each distribution is less than a threshold value θ (cid:48) ( i ) [17, Section 5.4].If the mean θ ( i ) of distribution i is greater than its corresponding threshold θ (cid:48) ( i ) , the model satisfies φ i since a greatermean of TeLEx tightness metrics indicates better model satisfiability. However if θ ( i ) < θ (cid:48) ( i ) , it is considered that themodel does not satisfy the corresponding specification φ i .5nstead of testing the hypothesis against a single value θ (cid:48) ( i ) , the problem is relaxed by introducing two thresholds θ ( i )0 and θ ( i )1 such that θ ( i )0 < θ (cid:48) ( i ) < θ ( i )1 . We vary the two thresholds using a common factor, δ i.e. θ ( i )0 = θ (cid:48) ( i ) − δ and θ ( i )1 = θ (cid:48) ( i ) + δ where < δ < [34]. The region ( θ ( i )0 , θ ( i )1 ) is defined as the indifference region . As the indifferenceregion becomes smaller, we reduce the risk of making a wrong decision but this comes at the cost of drawing moresamples in order for the hypothesis test to make a decision. Hence for each distribution i , we test the followinghypothesis: H ( i )0 : θ ( i ) = θ ( i )0 H ( i )1 : θ ( i ) = θ ( i )1 (1)with type I (false negative) and type II (false positive) family-wise error rate (FWER) bounds as α and β respectively.Suppose x ( i ) j is the tightness metric returned by TeLEx after verifying j th simulation trace against i th STL specification φ i . Then for the n th model simulation, the hypothesis test calculates the following log-likelihood ratio against eachspecification φ i : Z i = log e − (1 / σ i ) (cid:80) nj =1 ( x ( i ) j − θ ( i )1 ) e − (1 / σ i ) (cid:80) nj =1 ( x ( i ) j − θ ( i )0 ) (2)The test samples sequentially until Z i ≤ A i or Z i ≥ B i where A i and B i are the stopping boundaries and are definedas functions of FWER bounds α and β . For each distribution i , the stopping boundaries are defined as: A i = log (cid:32) β (1 − α i )( k − i + 1) (cid:33) , B i = log (cid:32) (1 − β i )( k − i + 1) α (cid:33) (3)where, α i = ( k − i + 1 − β ) α ( k − i + 1)( k − β ) , β i = ( k − i + 1 − α ) β ( k − i + 1)( k − α ) (4)Note that the hypothesis test continues on taking samples until each Z i crosses one of its corresponding stoppingboundaries A i or B i .Rejection of the null hypothesis H ( i )0 indicates that the mean of the distribution i is greater than threshold θ (cid:48) ( i ) andmodel satisfies the corresponding behavioral specification φ i whereas, acceptance of a null hypothesis H ( i )0 means thatthe model with its current set of parameter values is not able to satisfy φ i . In case all H ( i )0 are rejected, BioMETAfurther checks whether each specification φ i is satisfied with probability at least r i . If yes, BioMETA returns the currentset of parameter values as the solution otherwise, it explores other parameter values using our search algorithm. Wedescribe the behavior of the search algorithm in the following subsection. To explore the parameter space of the given model, we employ an iterative search algorithm known as simulatedannealing. It is a probabilistic technique for approximating the global optimum of a given function. The algorithm(refer to Algorithm 1) starts the search at a high temperature value, t i , provided as input (line 1). It then generates arandom set of parameter values, ρ init , and checks model satisfiability against given specifications using our multiplehypothesis testing (MHT) based statistical model checking approach (described in the previous subsection) (lines 2-3). ρ init is returned as the estimated set of parameter values in case the hypothesis test returns “accept" which indicates thatthe model M ( ρ init ) satisfies all given specifications (line 4). If the hypothesis test rejects ρ init , the algorithm generatesa new neighboring set of parameter values, ρ new , by slightly perturbing the previous parameter set, and again checksmodel satisfiability with ρ new . Every time a parameter set is rejected, the algorithm starts a new iteration by finding aneighboring parameter set and repeating the model checking step (lines 7-21).The objective function that drives the search algorithm is defined as the mean of unsuccessful simulations exhibitingnegative TeLEx tightness metrics. For each new iteration, mean ( µ new ) of the current iteration is compared with mean( µ old ) of the previous iteration and the parameter set that is maximizing the mean function is accepted. However, inorder to avoid local optima the algorithm also sometimes, with a very small probability, accepts bad solutions i.e.6ccepts parameter set with smaller means (lines 14-18). As the current temperature t goes down, based on a coolingfactor c (line 20), the probability of accepting bad solutions also decreases in order for the algorithm to achieve stability.Once it finds a set of parameter values for which the model successfully satisfies all specifications with their respectiverequired probabilities, it stops and returns that parameter set. The algorithm continues until it finds the required set ofparameter values or the given temperature cools down to a predefined value, t f . In the later case, the algorithm returnsthat it was unable to find the required parameter set (line 22). A python based implementation of Algorithm 1 can befound at https://github.com/arfeenkhalid/parameter-synth . Algorithm 1
BioMETA Algorithm: Multiple specification parameter estimation system.
Require: M ( ρ ) Parameterized stochastic model{ φ , φ , ..., φ k } STL specifications{ r , r , ..., r k } required probabilities t i initial temperature t f final temperature c cooling rate Ensure: ρ such that M ( ρ ) (cid:15) P ≥ r ( φ ) ∧ P ≥ r ( φ ) ∧ .... ∧ P ≥ r k ( φ k ) t ← t i ρ init ← rand( ) result, µ init ← MHT( M ( ρ init ), { φ , .., φ k }, { r , .., r k }) if result = “acc" then return ρ init end if ρ new ← ρ init µ new ← µ init while t (cid:62) t f do ρ old ← ρ new µ old ← µ new ρ new ← F IND AN EIGHBOUR ( ρ old ) result, µ new ← MHT( M ( ρ new ), { φ , .., φ k }, { r , .., r k }) if result = “acc" then return ρ new else if µ new (cid:54) µ old then if rand (0, 1) > e ( − ( µ new − µ old ) /t ) then ρ new ← ρ old end if end if end if t ← c ∗ t end while return “No parameter set found!” We study two rule-based receptor models Fc (cid:15)
RI and T-cell for experimental purposes. We show that the presentedmethod BioMETA is successful in estimating parameters of a Fc (cid:15) RI model and parameters of a T-cell modelsatisfying three STL properties each. The next two subsections describe the models, discuss their respective STLspecifications and demonstrate our experimental results. All experiments were performed on an Intel Xeon Platinum8160 48-Core 2.10 GHz processor with 768 GB of RAM operating under Ubuntu 16.04.6 LTS. (cid:15) RI Fc (cid:15) RI is the high affinity receptor for immunoglobulin E (IgE) and is a member of the multichain immune recognitionreceptors (MIRR) family. It is responsible for controlling the activation of two different types of white blood cellsknown as human mast cells and basophils which are a crucial part of the immune and neuroimmune system in livingorganisms. To be more specific, Fc (cid:15)
RI plays an important role in wound healing and immediate allergic reactions. Italso participates in antigen presentation of immediate hypersensitivity reactions involving IgE-mediated release of7istamine and some other mediators. Some of these allergic reactions can result in serious consequences and Fc (cid:15)
RIis of vital importance [35] in providing physiological protection against such reactions and maintaining the allergicresponse by controlling the secretion of allergic mediators and induction of cytokine gene transcription.We perform our experiments on a rule-based model of Fc (cid:15)
RI developed by Faeder et al. [36] which aims to examinethe function of multiple components in the phosphorylation and activation of Spleen tyrosine kinase, Syk, whoseinhibition aids in treating autoimmune diseases. The authors of [36] model Fc (cid:15)
RI in its tetrameric form, which includesan α -chain that binds IgE and three Immunoreceptor Tyrosine-based Activation Motif (ITAM)-containing subunits, a β -chain, and two disulfide-linked γ -chains. The model exhibits four major reactions namely association, dissociation,phosphorylation, and dephosphorylation. All experiments in [36] are performed to stimulate and observe rat basophilicleukemia (RBL) cells using covalently cross-linked IgE dimers.We translate three important model behaviors described in the results section of [36] to the following probabilisticSignal Temporal Logic behavioral properties: Property 1: P ≥ . ( F [0 , (( RecDim/RecT ot > . ∧ G [1500 , ( RecDim/RecT ot (cid:62) . The first property investigates the kinetics of receptor aggregation and tyrosine phosphorylation in RBL cells whenthey are stimulated with covalently linked dimers of IgE. This property monitors one of the features of dimer-inducedreceptor (RecDim) phosphorylation where the percentage of RecDim reaches half of its maximum value at around halfan hour of the simulation and keeps on increasing till one hour. In other words, this property verifies if the percentageof RecDim is observed to persist in the later half of the simulation.
Property 2: P ≥ . ( F [0 , (( LynRecP beta/LynT ot (cid:62) . ∧ G [1500 , ( LynRecP beta/LynT ot (cid:62) . Simulating the Fc (cid:15)
RI model shows a rapid redistribution of Tyrosine-protein kinase Lyn. This redistribution becomeseven more tightly associated with the receptor via binding of Lyn’s SH2 domain to the phosphorylated β ITAM(LynRecPbeta). The second property verifies if a large percentage (say 80%) of the available Lyn is bound through itsSH2 domain to β when the receptor aggregation reaches its maximum. Property 3: P ≥ . ( G [1200 , ( RecP gamma/RecP beta (cid:62) . Another observation made by the authors in [36] is that Syk, which binds to the γ ITAM, is present in much higherconcentration than Lyn, which binds to the β ITAM. This happens because a phosphorylated γ ITAM tyrosine hasa longer lifetime than a β phosphotyrosine. Our third property verifies this behavior by showing that the level of γ phosphorylation (RecPgamma) exceeds that of β phosphorylation (RecPbeta) by ∼ (cid:15) RI model. For comparison purposes, this figure also shows some of theunsuccessful traces obtained during the model checking phase.Table 1 shows the estimated set of parameter values of Fc (cid:15)
RI satisfying all three specifications simultaneously obtainedby BioMETA when α = 0 . , β = 0 . and rrr = { . , . , . } . As mentioned before there are two types ofparameters in rule-based models: initial concentration of species and rate constants. The first four parameters in Table 1represent the molecules present in Fc (cid:15) RI model and the values represent their initial concentration at the start of eachsimulation. Parameters 5-26 are the rate constants which measure the rate at which a certain reaction rule proceeds. Forinstance, parameter 17 (pLb) and 19 (pLg) act as the rate constants for receptor transphosphorylation of β and γ ITAMsrespectively by constitutive Lyn. The result of this transphosphorylation is being observed in Property 3 of Fc (cid:15)
RI model.Each of the rate constants is able to regulate one or more reactions. For example, parameter 25 (dm) serves as a rateconstant for two reactions i.e. receptor dephosphorylation of β as well as γ ITAMs.
Our second benchmark is a rule-based model of T-cell receptor, also known as T-lymphocyte. It is a type of whiteblood cell that is a vital part of the immune system. It detects the presence of toxins or other foreign substances, knownas antigens, with the help of T-cell receptors (TCRs). The antigens trigger a living organism’s immune system toproduce anitbodies which in turn help protect the body against bacteria and viruses. The surface receptors TCRs bind8igure 2: Successful simulation trajectories (in blue) of Fc (cid:15)
RI model satisfying property 1, 2 and 3. Unsuccessfultrajectories against each of the above properties are shown in red.Table 1: Estimated set of parameter values obtained by BioMETA for Fc (cid:15)
RI satisfying three specifications with rrr ={0.95, 0.80, 0.99}, α = 0.005, β = 0.2 Param No. Param Name Param Value1 Lig_tot 3.748e062 Rec_tot 582.793 Lyn_tot 0.59214 Syk_tot 1.772e055 kp1 1.90e-096 km1 1.5077 kp2 4.14748 km2 0.00379 kpL 0.260310 kmL 0.511811 kpLs 0.381512 kmLs 0.0003613 kpS 0.0275714 kmS 0.005515 kpSs 0.527416 kmSs 0.0001317 pLb 61.43218 pLbs 13124.4119 pLg 397.21720 pLgs 0.171621 pLS 0.118122 pLSs 1453.9123 pSS 10578.9224 pSSs 5.306425 dm 0.0635426 dc 73.2459o some specific polypeptide fragments that are displayed, by a protein called the major histocompatibility complex(MHC), on the surface of neighboring cells. T-cell is known to maintain a very intricate balance between exhibitingstrong responses to the presence of extremely small quantities of antigen while not responding to the large quantitiesof host peptide-MHC (pMHC). By detecting antigens in this way, it is able to successfully avoid autoimmunity. Thischaracteristic along with some other attributes affiliated with the model, add an element of uncertainty to its dynamicsresulting in trajectories that may exhibit completely different behavior even though generated from the same initial state[37]. Such a stochastic nature of this model makes it a challenging benchmark for our current study.In order to avoid autoimmunity, T-cell receptor must ignore self peptides and at the same it must recognize foreignpeptides for immune defense. Thus, a correct T-cell model is expected to discriminate between agonist and antagonistpeptides [38]. In order to test this particular behavior, we observe the model’s primary output i.e. the fraction ofdoubly phosphorylated ERK (ppERK). This fraction (ppERK/totERK) is also taken as a measure of T-cell activation.If ppERK/totERK < . , the cell is considered to be inactive and if ppERK/totERK > . , the T-cell isconsidered to be in its active state. All three of our properties closely observe the value of this fraction and determine ifthe cell successfully achieves an active state after exhibiting an inactive state (and vice versa) within a defined period oftime. We estimate the parameter values of the model such that it satisfies the following three properties: Property 1: P ≥ . ( G [0 , ( ppERK/totERK < . As mentioned earlier, cell activity is measured by changing the quantity of ppERK. The first property verifies if thefraction of ppERK always stays below a given threshold value (0.95) during the first 300 time units of the simulation.
Property 2: P ≥ . ( F [0 , (( ppERK/totERK < . ∧ F [300 , ( ppERK/totERK > . Model simulations have shown that a small number of agonist peptides are sufficient for cell activation. Our secondproperty verifies this behavior by determining if T-cell is able to achieve the active state in the later half of its simulationprovided that it was inactive during the first half.
Property 3: P ≥ . ( F [0 , (( ppERK/totERK > . ∧ F [1000 , ( ppERK/totERK < . It is known that cell activation achieved by agonist peptides (Property 2) can be almost completely inhibited byantagonist peptides [38]. The third property monitors the deactivation behavior of T-cells which is a result of pSHPmediated negative feedback. This property verifies if the system can go from an active state during first 300 seconds ofthe simulation to an inactive state during the next 300 seconds. Figure 3 illustrates the successful as well as unsuccessfultraces of T-cell model obtained from BioNetGen against the above three STL properties.Table 2 shows the estimated set of parameter values of the same model satisfying all three probabilistic specificationssimultaneously obtained by BioMETA when α = 0 . , β = 0 . and rrr = { . , . , . } . In Table 2, the first eightparameters represent the molecules present in T-cell model with their respective initial concentrations. Parameters9-29 are the rate constants which quantify the rate of reaction rules. All three properties validated for this modelinvolve the monitoring of doubly phosphorylated ERK (ppERK). The rate constants involved in the phosphorylationand dephosphorylation of ERK are parameter 28 (e1) and 29 (e2) respectively. In this study, we introduce a new approach for estimating the parameters of a rule-based stochastic biochemical modelsuch that the model is able to satisfy multiple probabilistic temporal logic behavioral specifications simultaneously.Our proposed method BioMETA is based on merging a multiple hypothesis testing based statistical model checkingapproach with a simulated annealing based search to find a single set of parameter values for the model so that themodel satisfies all the given probabilistic temporal logic behavioral specifications. Our experimental results demonstratethat BioMETA was successful in estimating and parameters of two rule-based biochemical models Fc (cid:15) RI andT-cell respectively against three probabilistic temporal logic behavioral specifications each.We plan to pursue multiple directions for future work. Our current method requires several sequential model simulationsto be generated in order to check model satisfiability; therefore, we plan to take advantage of the latest parallel10igure 3: Successful simulation trajectories (in blue) of T-cell model satisfying property 1, 2 and 3. Unsuccessfultrajectories against each of the above properties are shown in red.Table 2: Estimated set of parameter values obtained by BioMETA for T-cell model satisfying three specifications with rrr = {0.85, 0.80, 0.70}, α = 0.005, β = 0.2Param No. Param Name Param Value1 pMHC(p ∼ ag) 2380.7442 pMHC(p ∼ en) 1.984e053 TCR 170.824 LCK 9.002e055 ZAP 1.121e066 MEK 7.734e057 ERK 63578.598 SHP 2681.379 b1 0.00010310 b2 0.00026911 d1 0.855412 d2 244.88613 lb 8.76e-0914 ly1 1.90e-0615 ly2 2.690316 ls1 0.002517 ls2 1.79e-0518 tp 1.404719 s0 3.48e-0820 s1 0.0010721 s2 1.22e-0622 s3 0.00011323 z0 0.00012124 z1 0.0016825 z2 1.5257226 m1 0.00050427 m2 0.3800228 e1 1.32e-0529 e2 0.507811omputing frameworks and execute these simulations in parallel by implementing the presented algorithm on GPUs.The curse of dimensionality is one of the biggest challenges faced when designing solutions to such problems. Toaddress this, we intend to investigate dimensionality reduction techniques [39] which would help the search process toperform more efficiently and generate useful results faster.We are also interested in exploring and designing solutions based on deep learning techniques that can improve thesearch process. Another interesting future direction is to construct a neural network that is able to learn the givenbiochemical model from its BioNetGen simulation traces. Later this neural network could be used to generate hundredsof simulation traces in parallel on GPUs eliminating the need to actually simulate the model. This could eventually helpexpedite the parameter estimation process. References [1] James R Faeder, Michael L Blinov, and William S Hlavacek. Rule-based modeling of biochemical systems withBioNetGen.
Systems Biology , pages 113–167, 2009.[2] Michael L Blinov, James R Faeder, Byron Goldstein, and William S Hlavacek. BioNetGen: software forrule-based modeling of signal transduction based on the interactions of molecular domains.
Bioinformatics ,20(17):3289–3291, 2004.[3] Daniel T Gillespie. Exact stochastic simulation of coupled chemical reactions.
The Journal of Physical Chemistry ,81(25):2340–2361, 1977.[4] Dan T Gillespie, Min Roh, and Linda R Petzold. Refining the weighted stochastic simulation algorithm.
TheJournal of Chemical Physics , 130(17):174103, 2009.[5] Laurence Calzone, Nathalie Chabrier-Rivier, François Fages, and Sylvain Soliman. Machine learning biochemicalnetworks from temporal logic properties.
Lecture Notes in Computer Science , 4220:68–94, 2006.[6] Faraz Hussain, Christopher J Langmead, Qi Mi, Joyeeta Dutta-Moscato, Yoram Vodovotz, and Sumit K Jha. Auto-mated parameter estimation for biological models using Bayesian statistical model checking.
BMC Bioinformatics ,16(17):S8, 2015.[7] Alexandre Donzé, Thomas Ferrere, and Oded Maler. Efficient robust monitoring for STL. In
InternationalConference on Computer Aided Verification , pages 264–279. Springer, 2013.[8] Susmit Jha, Ashish Tiwari, Sanjit A Seshia, Tuhin Sahai, and Natarajan Shankar. Telex: Passive stl learning usingonly positive examples. In
International Conference on Runtime Verification , pages 208–224. Springer, 2017.[9] Lily A Chylek, Leonard A Harris, Chang-Shung Tung, James R Faeder, Carlos F Lopez, and William S Hlavacek.Rule-based modeling: a computational approach for studying biomolecular site dynamics in cell signaling systems.
Wiley Interdisciplinary Reviews: Systems Biology and Medicine , 6(1):13–36, 2014.[10] Wen Xu, Adam M Smith, James R Faeder, and G Elisabeta Marai. Rulebender: a visual interface for rule-basedmodeling.
Bioinformatics , 27(12):1721–1722, 2011.[11] Adam M Smith, Wen Xu, Yao Sun, James R Faeder, and G Elisabeta Marai. Rulebender: integrated modeling,simulation and visualization for rule-based intracellular biochemistry.
BMC Bioinformatics , 13(8):S3, 2012.[12] Bing Liu and James R Faeder. Parameter estimation of rule-based models using statistical model checking. In , pages 1453–1459. IEEE, 2016.[13] Michael L Blinov, James R Faeder, and William S Hlavacek. Rule-based modeling of biological systems usingBioNetGen modeling language.[14] Vasumathi Raman, Alexandre Donzé, Dorsa Sadigh, Richard M Murray, and Sanjit A Seshia. Reactive synthesisfrom signal temporal logic specifications. In
Proceedings of the 18th International Conference on Hybrid Systems:Computation and Control , pages 239–248. ACM, 2015.[15] Axel Legay, Benoît Delahaye, and Saddek Bensalem. Statistical model checking: An overview. In
InternationalConference on Runtime Verification , pages 122–135. Springer, 2010.[16] Paolo Zuliani. Statistical model checking for biological applications.
International Journal on Software Tools forTechnology Transfer , 17(4):527–536, 2015.[17] Abraham Wald. Sequential tests of statistical hypotheses.
The Annals of Mathematical Statistics , 16(2):117–186,1945.[18] Sucheendra K Palaniappan, Benjamin M Gyori, Bing Liu, David Hsu, and PS Thiagarajan. Statistical modelchecking based calibration and analysis of bio-pathway models. In
International Conference on ComputationalMethods in Systems Biology , pages 120–134. Springer, 2013.1219] Faraz Hussain, Sumit K Jha, Susmit Jha, and Christopher J Langmead. Parameter discovery in stochastic biologicalmodels using simulated annealing and statistical model checking.
International Journal of Bioinformatics Researchand Applications 2 , 10(4-5):519–539, 2014.[20] R Ramanathan, Yan Zhang, Jun Zhou, Benjamin M Gyori, Weng-Fai Wong, and PS Thiagarajan. Parallelizedparameter estimation of biological pathway models. In
HSB , pages 37–57, 2015.[21] Sumit Kumar Jha, Edmund M Clarke, Christopher James Langmead, Axel Legay, André Platzer, and Paolo Zuliani.A Bayesian approach to model checking biological systems. In
CMSB , volume 5688, pages 218–234. Springer,2009.[22] Toni Mancini, Enrico Tronci, Ivano Salvo, Federico Mari, Annalisa Massini, and Igor Melatti. Computingbiological model parameters by parallel statistical model checking. In
International Conference on Bioinformaticsand Biomedical Engineering , pages 542–554. Springer, 2015.[23] Curtis Madsen, Fedor Shmarov, and Paolo Zuliani. BioPSy: an SMT-based tool for guaranteed parameter setsynthesis of biological models. In
International Conference on Computational Methods in Systems Biology , pages182–194. Springer, 2015.[24] Alessandro Cimatti, Edmund Clarke, Enrico Giunchiglia, Fausto Giunchiglia, Marco Pistore, Marco Roveri,Roberto Sebastiani, and Armando Tacchella. NuSMV 2: An opensource tool for symbolic model checking. In
International Conference on Computer Aided Verification , pages 359–364. Springer, 2002.[25] Qinsi Wang, Natasa Miskov-Zivanov, Bing Liu, James R Faeder, Michael Lotze, and Edmund M Clarke. Formalmodeling and analysis of pancreatic cancer microenvironment. In
International Conference on ComputationalMethods in Systems Biology , pages 289–305. Springer, 2016.[26] Jun Zhou, R Ramanathan, Weng-Fai Wong, and PS Thiagarajan. Automated property synthesis of ODEs basedbio-pathways models. In
International Conference on Computational Methods in Systems Biology , pages 265–282.Springer, 2017.[27] Aurélien Rizk, Grégory Batt, François Fages, and Sylvain Soliman. On a continuous degree of satisfaction oftemporal logic formulae with applications to systems biology. In
International Conference on ComputationalMethods in Systems Biology , pages 251–268. Springer, 2008.[28] Ezio Bartocci, Luca Bortolussi, Laura Nenzi, and Guido Sanguinetti. On the robustness of temporal properties forstochastic models. arXiv preprint arXiv:1309.0866 , 2013.[29] Luboš Brim, Milan ˇCeška, Sven Dražan, and David Šafránek. Exploring parameter space of stochastic biochemicalsystems using quantitative model checking. In
Proceedings of the 25th international conference on ComputerAided Verification , pages 107–123. Springer-Verlag, 2013.[30] Milan ˇCeška, Frits Dannenberg, Nicola Paoletti, Marta Kwiatkowska, and Luboš Brim. Precise parameter synthesisfor stochastic biochemical systems.
Acta Informatica , 54(6):589–623, 2017.[31] Milan ˇCeška, Petr Pilaˇr, Nicola Paoletti, Luboš Brim, and Marta Kwiatkowska. PRISM-PSY: precise GPU-accelerated parameter synthesis for stochastic systems. In
International Conference on Tools and Algorithms forthe Construction and Analysis of Systems , pages 367–384. Springer, 2016.[32] Juliane Liepe, Paul Kirk, Sarah Filippi, Tina Toni, Chris P Barnes, and Michael PH Stumpf. A framework forparameter estimation and model selection from experimental data in systems biology using approximate Bayesiancomputation.
Nature Protocols , 9(2):439–456, 2014.[33] Jay Bartroff and Jinlin Song. Sequential tests of multiple hypotheses controlling type I and II familywise errorrates.
Journal of Statistical Planning and Inference , 153:100–114, 2014.[34] Håkan LS Younes and Reid G Simmons. Statistical probabilistic model checking with a focus on time-boundedproperties.
Information and Computation , 204(9):1368–1409, 2006.[35] Helen Turner and Jean-Pierre Kinet. Signalling through the high-affinity IgE receptor Fc (cid:15)
RI.
Nature ,402(6760supp):24, 1999.[36] James R Faeder, William S Hlavacek, Ilona Reischl, Michael L Blinov, Henry Metzger, Antonio Redondo,Carla Wofsy, and Byron Goldstein. Investigation of early events in Fc (cid:15)
RI-mediated signaling using a detailedmathematical model.
The Journal of Immunology , 170(7):3769–3781, 2003.[37] Edmund M Clarke, James R Faeder, Christopher J Langmead, Leonard A Harris, Sumit Kumar Jha, and AxelLegay. Statistical model checking in BioLab: Applications to the automated analysis of T-cell receptor signalingpathway. In
International Conference on Computational Methods in Systems Biology , pages 231–250. Springer,2008. 1338] Tomasz Lipniacki, Beata Hat, James R Faeder, and William S Hlavacek. Stochastic effects and bistability in T cellreceptor signaling.