[PDF] Leveraging Structured Biological Knowledge for Counterfactual Inference: a Case Study of Viral Pathogenesis

Abstract

Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise, and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This manuscript proposes a general approach for querying a causal biological knowledge graph, and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.

Full PDF

11 a r X i v : . [ q - b i o . Q M ] J a n Leveraging Structured Biological Knowledge forCounterfactual Inference: a Case Study of ViralPathogenesis

Jeremy Zucker ∗ , Kaushal Paneri ∗ , Sara Mohammad-Taheri ∗ ,Somya Bhargava , Pallavi Kolambkar , Craig Bakker , Jeremy Teuton , Charles Tapley Hoyt , KristieOxford , Robert Ness and Olga Vitek † (cid:70) Abstract —Counterfactual inference is a useful tool for compar-ing outcomes of interventions on complex systems. It requiresus to represent the system in form of a structural causal model,complete with a causal diagram, probabilistic assumptions on ex-ogenous variables, and functional assignments. Specifying suchmodels can be extremely difﬁcult in practice. The process re-quires substantial domain expertise, and does not scale easily tolarge systems, multiple systems, or novel system modiﬁcations.At the same time, many application domains, such as molecularbiology, are rich in structured causal knowledge that is qualitativein nature. This manuscript proposes a general approach forquerying a causal biological knowledge graph, and convertingthe qualitative result into a quantitative structural causal modelthat can learn from data to answer the question. We demonstratethe feasibility, accuracy and versatility of this approach using twocase studies in systems biology. The ﬁrst demonstrates the ap-propriateness of the underlying assumptions and the accuracyof the results. The second demonstrates the versatility of theapproach by querying a knowledge base for the molecular de-terminants of a severe acute respiratory syndrome coronavirus2 (SARS-CoV-2 or SARS-CoV-2)-induced cytokine storm, andperforming counterfactual inference to estimate the causal effectof medical countermeasures for severely ill patients.

Keywords—

Biological expression language, structural causalmodel, counterfactual inference, causal biological knowledge graph,systems biology, SARS-CoV-2

NTRODUCTION

Each time a cell senses changes in its environment, it marshalsa complex choreography of molecular interactions to initiate anappropriate response. When a virus infects the cell, this delicate balance ∗ Equal contribution Paciﬁc Northwest National Laboratory, Richland, WA Microsoft, Redmond, WA Northeastern University, Boston, MA Enveda Biosciences, Bonn, Germany Altdeep, Boston, MA † Corresponding author: [email protected] is disrupted and can result in a cascade of systemic failures leading todisease. In particular, severe acute respiratory syndrome coronavirus2 (SARS-CoV-2), the novel pathogen responsible for the COVID-19pandemic, has a complex etiology that differs in subtle and substantialways from previously studied viruses. To make informed decisionsabout the risk that a new pathogen presents, it is imperative to rapidlypredict the determinants of pathogenesis and identify potential targetsfor medical countermeasures. Current solutions for this task includesystems biology data-driven models, which correlate biomolecularexpression to pathogenicity, but cannot go beyond associations inthe data to reason about causes of the disease [1], [2]. Alternatively,hypothesis-driven mathematical models capture causal relations, but arehampered by limited parameter identiﬁability and predictive power [3],[4]. We argue that counterfactual inference [5] helps bridge the gapbetween data-driven and hypothesis-driven approaches. It enablesquestions of the form: “Had we known the eventual outcome ofa patient, what would we have done differently?” At the heart ofcounterfactual inference is a formalism known as a structural causalmodel (SCM) [5], [6]. It represents prior domain knowledge in termsof causal diagrams, assumes a probability distribution on exogenousvariables, and assigns a deterministic function to endogenous variables.SCM are particularly attractive in systems biology, where structureddomain knowledge is extracted from the biomedical literature and isreadily available through advances in natural language processing [7],[8], [9], large-scale automated assembly systems [10], and semi-automated curation workﬂows [11]. This knowledge is curated bymultiple organizations [12], [13], [14], [15], [16] and stored instructured knowledge bases [17], [18], [19], [20]. It can be brought tobear for answering causal questions regarding SARS-CoV-2.This manuscript contributes a three-part algorithm that leveragesexisting structured biological knowledge to answer counterfactualquestions about viral pathogenesis. Algorithm 1 formalizes biologi-cally relevant questions as queries to an existing causal knowledgegraph. Algorithm 2 converts the query result into a structural causalmodel. Algorithm 3 operationalizes the counterfactual inference byinterrogating the model with the observed data to estimate a causaleffect.We illustrate the beneﬁts of this approach using two case studies.Case study 1 illustrates the increased precision of counterfactualestimates, as compared to the ODE- and SDE-based forward simulation,in a situation with known ground truth mechanisms of data generation.Case study 2 demonstrates the automated construction of an SCM andthe value of counterfactual reasoning in novel situations with limitedtreatment options (as is the case for SARS-CoV-2). It shows thatcounterfactual inference enables more precise predictions regardingwho would be likely to survive without receiving treatment, who would be likely to die even if they did receive treatment, and who wouldlikely survive only if they received treatment.

ACKGROUND

Biological signaling pathways

Signaling pathways are composed ofentities that engage in activities [21]. Examples of entities are proteinsand metabolites, but also higher level biological processes such as animmune response. Activities are the producers of change. Examplesinclude catalytic activity, kinase activity, or transcriptional activity.The basic unit of causality in signaling pathways is a directedmolecular interaction, where the activity of an upstream moleculeincreases or decreases the activity of a downstream molecule. Forexample, the mitogen-activated protein kinase (MAPK) intracellularsignaling pathway is a causal chain of directed molecular interactionsshown in Eq. (1) a ( S ) → kin ( p ( Raf )) → kin ( p ( Mek )) → kin ( p ( Erk )) (1)The interactions transmit information about a stimulus at the cellsurface to the nucleus, where proteins called transcription factorsactivate an appropriate biological process [22]. A causal diagram ofMAPK consists of a signaling molecule S and three proteins Raf , Mek , and

Erk , each of which engage in kinase activity. We representsignaling molecule abundance with a () , protein abundance with p () and the kinase activity of a protein with kin () . In the case of MAPK,the abundance or activity of an upstream entity causes the abundanceor activity of a downstream entity to increase, and is representedwith a sharp edge → . The diagram is a abstraction showing that theabundance of the signaling molecule S increases the kinase activity of Raf , which increases the kinase activity of

Mek , which increases thekinase activity of

Erk . In other cases, if the abundance or activity ofan upstream entity causes the abundance or activity of a downstreamentity to decrease, we represent this with a blunt edge (cid:173) . Viral dysregulation

Viral disruptions of a signaling pathway takeform of overactivation or repression of its activities. For example,by amplifying the release of intercellular signaling molecules thatoverstimulate the immune response, known as Cytokine ReleaseSyndrome (cytokine storm or CytokineStorm), a virus can cause severesystem-level cellular damage.

Quantitative modeling of biological processes with ODE/SDE

Tem-poral dynamics of biological processes can be expressed quantitativelyusing ordinary (or stochastic) differential equations. A small number ofhigh quality, validated models have been published in the literature andstored in a computable form in repositories such as Biomodels [23],[24]. For example, the MAPK signaling pathway in Eq. (1) is wellcharacterized. We denote R ( t ) , M ( t ) , and E ( t ) as the respectiveamounts of active Raf , Mek , and

Erk at time t ; We denote T R , T M , and T E as their total amounts, which we assume do not changeduring the considered timeframe; v act R , v inh R , v act M , v inh M , v act E , and v inh E areexperimentally derived activation or inhibition kinetic rate constants;and S is the amount of the input signal. The system of ordinarydifferential equations (ODEs) is speciﬁed as follows [25], [26]: d R d t = v act R S ( T R − R ( t )) − v inh R R ( t ) (2) d M d t = ( v act M ) v inh M R ( t ) ( T M − M ( t )) − v act M R ( t ) M ( t ) − v inh M M ( t )d E d t = ( v act E ) v inh E M ( t ) ( T E − E ( t )) − v act E M ( t ) E ( t ) − v inh E E ( t ) Given initial conditions, forward simulations from the ODEs canbe used to generate the temporal trajectories of the amounts of activatedproteins , such as R ( t ) , M ( t ) , and E ( t ) in the MAPK example. Inthis manuscript we refer to such simulated data as observational data .We deﬁne an ideal intervention as an event that ﬁxes the amount ofan activated protein. For example, if we ﬁx the kinase acivity of Mek at M ( t ) = m , the second equality d M d t in Eq. (2) becomes zero. We Raf Mek Erkf

Erk f Mek N Raf N Erk N Mek

Raf m (cid:48)

Erkf

Erk ˆ N Raf ˆ N Erk (a) (b)Fig. 1:

Causal modeling of MAPK signaling pathway

Circles arevariables, double circles are variables intervened upon, squares aredeterministic functional assignments, gray nodes are observed variables,and white nodes are hidden variables. (a) Structural causal model. N Raf , N Mek and N Erk are statistically independent noise variables.Root node

Raf is only dependent on noise variable N Raf . Non-root nodes

Mek and

Erk are dependent on their parent and on theassociated noise variable. (b) Counterfactual model. The interventionﬁxes the count of phosphorylated

Mek to m (cid:48) , such that Mek is nolonger dependent on

Raf and N Mek . Given an observed data point,counterfactual inference infers the noise variables ˆ N Raf , and ˆ N Erk .can simulate data from Eq. (2) with d M d t = 0 , and refer to these as interventional data . Contrasting observational and interventional datahelps evaluate the outcome of the intervention [27].The deterministic ODE ignore the fact that at low concentration,stochasticity becomes a signiﬁcant factor in determining the reaction[28]. As the collisions between molecules participating in biochemicalprocess become stochastic, a stochastic model is required. In contrast toODE, a stochastic differential equation model or stochastic differentialequation (SDE) speciﬁes biological process as a random process. Forexample, in the case of MAPK, the random process of the reaction Mek → Erk is speciﬁed with dP E ( t ) dt = g E ( t, v actE , v inhE , M ( t )) , E (0) = e (3)where P E ( t ) is marginal probability density of E ( t ) , function g E determines the probability of a state change between E ( t ) and E ( s ) , s > t , e is initial condition, and M ( t ) is the value of its parentMek at t . Once stochastic differential equation are fully speciﬁed,one can use, e.g. Gillespie’s stochastic simulation algorithm [29]to simulate observational and interventional data, and evaluate theoutcomes of interventions.Unfortunately, even simple ODEs such as the one in the MAPKexample are difﬁcult to build de novo . This is nearly impossible fornovel and poorly studied systems that lack the existence or ﬁndabilityof experimental information describing the structure or boundariesof the process, kinetic equations governing their dynamics [30], rateconstants for these equations, or rules governing each agents’ statesand functions. Equilibrium enzyme kinetics

Simpler and more general quantitativemodels can be speciﬁed when a reaction reaches the state of chemicalequilibrium [31]. One commonly used such model is

Hill function inthe form of X = β PA nX K n + PA nX (4)where X is the abundance of a protein in a causal diagram (such as Erk in Eq. (1)), PA X is the set of its parents, n is a parameter interpretedas the number of ligand binding sites of the protein, and β is the totalnumber of molecules of the protein. A special and frequently used caseof the Hill function, called Michaelis-Menten function, occurs when n = 1 . Although simple to use, these models are deterministic, and donot describe the stochasticity that is a distinctive property of biologicalsystems at low concentrations. Modeling biological processes with structural causal models

Thestochastic nature of biological processes at steady-state can be represented by an SCM such as in Fig. (1) (a) [27], [32]. SCMsrepresent the dependencies between a child node X and its parents PA X in terms of a deterministic function X = f X ( PA X , N X ) called structural assignment , and a noise variable N X . In Fig. (1) (a), f Mek and f Erk are linear or non-linear structural assignments, and N Raf , N Mek , and N Erk are statistically independent noise variables withdeﬁned probability distributions

Raf = N Raf ; Mek = f Mek ( Raf ; N Mek ) (5) Erk = f Erk ( Mek, N

Erk ) An ideal intervention in an SCM is performed on a functional assign-ment. For example, an ideal intervention on

Mek sets

Mek = m (cid:48) ,deﬁning a new SCM Raf = N Raf ; Mek = m (cid:48) ; Erk = f Erk ( Mek, N

Erk ) (6)An ideal intervention can also be thought of as a process of mutilatingthe causal graph. For example, intervening on Mek eliminates itsdependence upon

Raf , and therefore the edge from

Raf to Mek isremoved as shown in Fig. (1)(b).

Counterfactual inference with SCM

Beyond direct model-basedpredictions, SCMs enable counterfactual inference , i.e., the process ofinferring the unseen outcomes of a hypothetical intervention given dataobserved in absence of the intervention [5]. In the context of SCM,counterfactuals are deﬁned as operations Y do ( T = t (cid:48) ) ( u ) (cid:44) Y M do ( T = t (cid:48) ) ( u ) (7)In other words, the outcome Y that individual u would have had shereceived treatment t (cid:48) is deﬁned as the value that Y would have ina structural causal model M mutilated to replace T = f T ( · ) with T = t (cid:48) .For example, in the MAPK signaling pathway, we may beinterested in the counterfactual question: Having observed the kinaseactivities of

Raf = r , Mek = m , Erk = e , what would be the kinaseactivity of Erk in a hypothetical experiment where the kinase activityof

Mek was ﬁxed to m (cid:48) ? This counterfactual query is more formallytranslated into P ( Erk do ( Mek = m (cid:48) ) | Raf = r, Mek = m, Erk = e ) (8)The probability distribution in Eq. (8) is estimated with the followingsteps:1) Abduction:

Given observational data, estimate the posteriordistribution of the noise variables. In the MAPK example, weestimate the posterior distribution of the noise variables: ˆ N Raf = { N Raf | Raf = r, Mek = m, Erk = r } ˆ N Erk = { N Erk | Raf = r, Mek = m, Erk = r } Several inference algorithms are available for this task, e.g.Markov Chain Monte Carlo [33], Gibbs sampling [34], orno-u-turn Hamiltoninan Monte Carlo (HMC) [35]. In recentyears, gradient-based inference algorithms such as stochasticvariational inference [36] have become popular, because theycan scale to larger models by converting an inference probleminto an optimization problem.2)

Intervention:

Apply the intervention to the SCM to generatea mutilated SCM as in Fig. (1)(b). In the MAPK SCM,

Mek = f Mek ( Raf, N

Mek ) is replaced with Mek = m (cid:48) asshown in Fig. (1)(b).3) Prediction:

Generate samples from the mutilated SCMusing the estimated posterior distribution over the exogenousvariables ˆ N Raf and ˆ N Erk to obtain the counterfactualdistribution, as shown in Fig. (1)(b).

Causal effects

We distinguish between two causal effects. The ﬁrst isthe average treatment effect (ATE or ATE), deﬁned as the differencebetween the outcome of a hypothetical intervention and the observed outcome in the entire population. In the MAPK example, the ATE of

Erk upon an intervention ﬁxing

Raf = r (cid:48) is: (cid:8) Erk do ( Raf = r (cid:48) ) − Erk (cid:9) (9)This requires no observational data, and therefore the average treatmenteffect (ATE or ATE) can be inferred with forward simulation.On the other hand, the individual treatment effect (ITE or ITE)is deﬁned as the difference between the outcome of a hypotheticalintervention and the observed outcome for a speciﬁc individual orcontext. In the MAPK example, the individual treatment effect of

Erk upon an intervention ﬁxing

Raf = r (cid:48) in a context where Raf = r , Mek = m , Erk = e is: (cid:8) Erk do ( Raf = r (cid:48) ) − Erk (cid:9) | Raf = r, Mek = m, Erk = e (10)The ITE shares stochastic components of the noise variables betweenobservational and interventional data, and is therefore often moreprecise than a comparison based on a direct simulation [27].In cases where domain knowledge is available to describe thesystems dynamics in the form of an SDE, the system at equilibriumcan be translated into an SCM to enable counterfactual reasoning andestimation of the individual treatment effect [27], [37]. Unfortunately,this process is challenging in novel and poorly studied systems, due toour limited ability to establish the structure of the causal graph. Structured knowledge graphs

Although there exist a multitudeof biological knowledge bases that are manually curated from theliterature [12], [13], [14], [15], [16], the systems biology communityhas coalesced around a small number of structured knowledgerepresentations that differ mainly in their intended purpose. Forexample, the Biological Pathway Exchange Language (BioPAX) [17]was designed for pathway database integration [17], and the SystemsBiology Graphical Notation (SBGN) [19] was designed for graphicallayout [19].In contrast, the Biological Expression Language (BEL) [20] wasspeciﬁcally designed for manual extraction and automated integrationof author statements about causal relationships among biological enti-ties, biological processes, and cellular-level observable phenomena [11].The syntax of a BEL statement is comprised of a triple in the formof { subject , predicate , object }. Each subject and object represents anactivity or abundance whose entities are grounded using terms fromstandard namespaces. If the subject directly increases the abundanceor the activity of the object, we represent this with => , and for directlydecreasing relationships, we use =| . BEL statements can be chainedtogether from the object of the ﬁrst statement to the subject of the nextstatement, as shown in Fig. (2) for the case of the MAPK pathway.BEL provides a number of valuable features for causal modeling.First, the restriction of BEL edges to causal relations implies thetopology of the BEL graph can be reﬂected in the topology of thecausal model. Second, the language is expressive enough for humansto manually curate a wide range of biological concepts, but formalenough to serve as a training corpus for natural language processingof biomedical literature (BioNLP) competitions [38]. Third, the BELecosystem is sufﬁciently mature that causal knowledge represented inother languages can be readily converted to BEL [39], [40]. ETHODS

Let X = { X i } be a set of variables, such as molecular activities in asignaling pathway. Let P = { P j } be a set of causal predicates thatlink these variables, such as increases, or regulates. Using this notation,we deﬁne a knowledge graph K as a set of k triples K = { X i , P j , X i (cid:48) | X i ∈ X , P j ∈ P , X i (cid:48) ∈ { X \ X i }} kj =1 (11)We deﬁne a causal query Q as a set { X c , X e , X z } of variables thatare potential causes, effects and covariates of interest for the biologicalinvestigation, where X c ⊂ X , X e ⊂ X \ X c , and X z ⊂ X \ X c \ X e kin(p(fplx:RAF)) => kin(p(fplx:MEK))kin(p(fplx:MEK)) => kin(p(fplx:ERK)) Fig. 2:

Example BEL statement

The statement details the processes in the MAPK signaling pathway in Eq. (1). The ﬁrst line states that thekinase activity of RAF directly increases the kinase activity of MEK. The second line states that kinase activity of MEK directly increases thekinase activity of ERK.

Algorithm 1

Causal query to Biological Expression Language(

QUERY BEL or https://github.com/bel2scm) algorithm

Inputs: knowledge graph K causal query Q = { X c , X e , X z } Outputs: B procedure QUERY BEL ( X c , X e , X z , K ) (cid:73) Get all pathways from cause to effect for each cause X c i ∈ X c and for each effect X e j ∈ X e do ﬁnd all pathways (cid:8) P (cid:0) X c i , X e j (cid:1)(cid:9) (cid:73) Get all pathways from covariates to causes for each covariate X z i ∈ X z and for each cause X c j ∈ X c do ﬁnd all pathways (cid:8) P (cid:0) X z i , X c j (cid:1)(cid:9) (cid:73) Get all pathways from covariates to effects for each covariate X z i ∈ X z and for each effect X e j ∈ X e do ﬁnd all pathways (cid:8) P (cid:0) X z i , X e j (cid:1)(cid:9) B = (cid:8) P (cid:0) X c i , X e j (cid:1)(cid:9) ∪ (cid:8) P (cid:0) X z i , X c j (cid:1)(cid:9) ∪ (cid:8) P (cid:0) X z i , X e j (cid:1)(cid:9) return B A pathway P ( X , X k (cid:48) +1 ) , k ≤ k (cid:48) is a sequence of a subset of triplesfrom K , where the object of the previous triple is subject of the nexttriple { ( X , P , X ) , ( X , P , X ) , . . . , ( X k (cid:48) , P k (cid:48) , X k (cid:48) +1 ) } (12)Our goal is to query the knowlege graph to generate a qualitativecausal model B that links the causes, the effects and the covariatesof interest. Importantly, the query result B induces a directed acyclicgraph G with p variables from X as nodes, and causal relations from P as edges.We assume that every variable in B is continuous. We denote D = { X j , X j , ..., X pj } mj =1 the observational data of m samplesfrom the joint distribution P ( X ; θ ) . The distribution is speciﬁed interms of parameters θ . We denote R ⊂ X a set of nodes in G withoutparents. Given a biological knowledge graph K and a causal query of interest Q , our ﬁrst objective is to generate a qualitative causal model B capable of answering the query. To this end, we need to explore allpotential directed acyclic paths in K from the cause to the effect in Q , and then consider all covariates that may act as confounders of thecausal question. This is done with the steps in Alg. 1. The algorithmcan be implemented on any knowledge graph that represents causalrelationships as directed edges, such as BEL or the Systems BiologyGraphical Notation Activity Flow [19] (SBGN-AF) language [41].In the case of MAPK, the qualitative causal model that is capableof answering the counterfactual question in Eq. (8) correspondsto the result of this query: Q = { X c = kin ( p ( MEK )) , X e = kin ( p ( ERK )) , X z = kin ( p ( RAF )) } .We execute Alg. 1 step 2 to obtain all pathways from the cause tothe effect: kin ( p ( MEK )) → kin ( p ( ERK )) We execute Alg. 1 step 5 to obtain all pathways from the covariate tothe cause: kin ( p ( RAF )) → kin ( p ( MEK )) Algorithm 2

Biological Expression Language to Structural CausalModels (

BEL SCM or https://github.com/bel2scm) algorithm

Inputs:

BEL statements BD ∼ P ( X , ..., X p ) Outputs:

SCM M = { f i ( PA i , N i ) } pi =1 procedure BEL SCM ( B , D ) M = {} Get network structure G from B . for each R ∈ R in G do (cid:73) Use D to estimate parameters θ of P ( R ; θ ) θ = arg max θ P ( R ; θ | D ) (cid:73) Reparameterize P ( R ; θ ) in terms of f R and N R N R ∼ N (0 , f R ( N R ) = F − P ( R ; θ ) ( N R ) M .Add( f R ( N R ) ) for each X ∈ { X \ R } in G do (cid:73) Estimate parameters w and b of sigmoid function log( Xβ X − X ) = w (cid:48) PA X + b (cid:73) Deﬁne distribution of N X from model residuals. residual = X − β X − w (cid:48) PA X − b ) N X ∼ N (0 , MSE ( residual ) ) (cid:73) Get f X ( PA X , N X ) with additive N X . f X ( PA X , N X ) = β X exp ( − w (cid:48) X PA X − b X ) + N X M .Add( f X ( PA X , N X ) ). return M We execute Alg. 1 step 8, but since there are no new pathways fromthe covariate kin ( p ( RAF )) to the effect kin ( p ( ERK )) , we obtain theempty set. The ﬁnal returned model is: kin ( p ( RAF )) → kin ( p ( MEK )) → kin ( p ( ERK )) Our second objective is to express the qualitative causal structure in B into a quantitative SCM, and estimate the parameters of the SCM fromexperimental data. These steps are described in Algorithm 2. Input

The algorithm takes as input a BEL causal query result B andobserved measurements on its variables D . Get network structure G from B (Alg. 2 line 3) Since a set of BELstatements identiﬁes parents and children, it induces a causal networkstructure. We determine this structure by traversing BEL statementswith the breadth ﬁrst search approach, starting with root variables (suchas

Raf in Figure 2). For all the non-root variables, the algorithm waitsuntil all the parents are traversed.

For each root node R , use D to estimate parameters θ of P ( R ; θ ) (Alg. 2 line 5) In order to specify the SCM, we need todeﬁne the type and parameters of the marginal probability distributionsof the root variables P ( R ; θ ) . The BEL statements provide priorknowledge about the distribution in a parametric form. Therefore, thisstep involves techniques such as maximum likelihood to estimate theparameters of this distribution.For example, in a stochastic MAPK system at equilibrium theroot variable the number of active Raf in a cell follows a Binomialdistribution. When the maximum number of active or inactive particlesin the system is large, the Binomial distribution can be approximated X Hill,n=1, k=30Hill,n=2, k=30Hill,n=3, k=30sigmoid,w=.025,b=0.4sigmoid,w=.1,b=0.3sigmoid,w=.05,b=0.4

Fig. 3:

Examples of Hill function and sigmoid function for twovariables X is a single node that has a single parent PA X . We use theHill function ( X = β PA nX K n + PA nX ) and sigmoid function as in Eq. (15)to predict the value of X given its parent value. In the Hill function, K is the activation rate, n deﬁnes the steepness of function and β is ﬁxed at 100. Blue lines correspond to Hill equation with K = 30 and n ∈ { , , } . Brown lines correspond to sigmoid function where b ∈ { . , . , . } and w ∈ { . , . , . } with a Normal distribution with θ Raf = ( µ Raf , σ Raf ) . We thenestimate θ Raf using maximum likelihood from the observed

Raf in D . For each root node R , reparameterize P ( R ; θ ) in terms of f R and N R (Alg. 2 line 7) The speciﬁcation of an SCM requires us toseparate the deterministic and the stochastic components of variationof each variable as shown in Fig. (1). We accomplish this using areparameterization technique popularized by variational autoencoders[42], which was shown to make counterfactual inference consistentwith core biological assumptions [43]. In the case of root nodes, wereparameterize P ( R ; θ ) with Uniform(0,1), and then pass it to theinverse CDF of P ( R ; θ ) , as follows Original : R ∼ P ( R ; θ ) (13) Reparametrized : N R ∼ Uniform (0 , f R ( N R ) = F − P ( R ; θ ) ( N R ) where F − P ( R ; θ ) ( N R ) is the inverse cumulative distribution functionof P ( R ; θ ) . In the case of MAPK, since Raf follows a Normaldistribution with parameters θ Raf , the reparameterization simpliﬁeseven further to

Original :

Raf ∼ N ( µ Raf , σ Raf ) (14) Reparametrized : N Raf ∼ N (0 , f Raf ( N Raf ) = σ Raf N Raf + µ Raf

Add R to M (Alg. 2 line 10) For each root node, we add thecorresponding function f R ( N R ) and its noise variable N R to M . Forexample, since MAPK has only one root node Raf , the Algorithmadds f Raf ( N Raf ) to M . For each X ∈ { X \ R } , estimate parameters w and b of sigmoidfunction (Alg. 2 line 12) In order to specify the SCM for non-rootnodes, we need to deﬁne the form (polynomial, linear, non-linear,sigmoid, etc.) of functional assignments linking the measurementson the parent nodes to the measurements on the child. We chose thefunctional assignment in the form of a sigmoid function log (cid:18) Xβ X − X (cid:19) = w (cid:48) PA X + b (15)where β X is the maximum number of activated protein molecules. Fora node X with q parents, PA X is a q × vector of measurements on the parent nodes, w is a × q vector of weights, w (cid:48) is the transpose of w , and b is a scalar bias. Parameters w and b of the sigmoid functionare estimated from the data, e.g. using smooth L loss function.In the example of the MAPK pathway, f Mek has only one parent.Therefore f Mek has the form f Mek ( Raf, N

Mek ) = β Mek exp ( − w Mek

Raf − b ) + N Mek (16)We use the sigmoid function in Eq. (15) as a special case of the Hillequation. The full parametric description of the Hill equation has anuanced precise biochemical interpretation. For example, the parameter n represents the number of times a protein must be phosphorylatedbefore it becomes active and can therefore be obtained from domainknowledge. However, it is difﬁcult to estimate this parameter from data.The sigmoid function maintains the Hill equation’s functions, but witha reduced set of parameters that are easier to estimate. Fig. (3) showsthat the approximation is reasonable for a range of parameter values. Deﬁne distribution of N X from model residuals (Alg. 2 line 14) Similarly to the root variables, for non-root variables we assumethat the noise variables follow Normal distribution with 0 mean. Thevariance of this distribution is estimated from the residuals of themodel ﬁt in the previous step. For example, in the MAPK pathway, f Mek has only one parent

Raf . Therefore, the residuals of the sigmoidcurve ﬁt for

Mek are deﬁned as residual

Mek = Mek − β Mek − w Mek

Raf − b ) (17)and the distribution of the noise variable is deﬁned as N Mek ∼N (0 , MSE ( residual Mek )) Get f X ( PA X , N X ) with additive N X (Alg. 2 line 17) The stepcombines the sigmoid functional assignment and the independent noisevariable. In the example of

Mek in the MAPK pathway, the stepoutputs f Mek ( Raf, N

Mek ) = β Mek exp ( − w Mek

Raf − b ) + N Mek (18)

Add f X ( PA X , N X ) to SCM (Alg. 2 line 19) The step iterativelyadds ( f X , N X ) for all X ∈ X . Output (Alg. 2 line 20)

The algorithm returns a generativestructural causal model M = { f i ( PA i , N i ) } pi =1 where PA i ⊂ X . For example, in the case of the MAPK model, it returns [ N Raf , N

Mek , N

Erk , f

Raf ( N Raf ) , f Mek ( Raf, N

Mek ) ,f Erk ( Mek, N

Erk )] . The generated SCM enables counterfactual inference using a standardprocedure [5]. Given a new observation D new ,1) Abduction:

Update the probability P ( N X ) to obtain P ( N X | D new ) .2) Action:

Replace the equations determining the variables inset X c by X c = x c (cid:48) .3) Prediction:

Sample from the modiﬁed model to generate thetarget distribution X e do ( X c = x c (cid:48) ) .After generating the target distribution of the intervention model, weestimate causal effects. Alg. 3 describes the detailed steps of bothcounterfactual inference (with D new ) and forward simulation (if D new is empty) Causal query to Biological Expression Language (

QUERY BEL or https://github.com/bel2scm) was implemented manually using apublicly available instance of BioDati Studio [44], then validatedusing Integrated Dynamical Reasoner and Assembler (INDRA) [10]’sinteractive dialogue system Bob with BioAgents [10]. Parameterestimation in Biological Expression Language to Structural Causal

Algorithm 3

Estimate causal effect on X E upon intervening on X C Inputs:

New data point D new effect node X E observational data for effect node D E ∈ D new intervention value c node to intervene upon X C number of iteration I network structure G SCM M Outputs:

Causal Effect CE procedure GET C AUSAL E FFECT ( D new , E, D E , X C , c, I, G, M ) ˆ N = {} (cid:73) Interventional data for effect node X E ID E = {} for I do for each X ∈ { X \ X C } in G do (cid:73) Abduction:

Apply stochastic variational inference ˆ N X = SV I ( D new ) ˆ N .Add( ˆ N X ) (cid:73) Action:

Apply intervention on X C CM = pyro.do ( M , X C = c ) (cid:73) Get posterior of CM with importance sampling CMP = pyro.infer.Importance ( CM, ˆ N ) (cid:73) Prediction:

Get EmpiricalMarginal (EM) for X E CMM = pyro.infer.EM ( CMP, X E ) ID E .Add( CMM ) CE = ID E − D E return CE Models (

BEL SCM or https://github.com/bel2scm) was implementedin PyTorch. Let C be the number of nodes in causal graph G withparents. Let k be the number of iterations for gradient descent, let N be the number of samples in data, and let d be the maximum number ofparents in graph G . Computational complexity of parameter estimationstep is given by O ( CkNd ) .SCM-based counterfactual inference was performed with Pyro [45],due to its ability to perform interventions on probabilistic mod-els and scalability to larger models, as described in Alg. 3.Speciﬁcally, the implementation relies on the following func-tionalities in Pyro. The pyro.do method is an implemen-tation of Pearl’s do-operator used for causal inference. The pyro.infer.SVI method performs abduction using stochastic vari-ational inference with ELBO loss. The pyro.infer.Importance method performs posterior inference by importance sampling. The pyro.infer.EmpiricalMarginal method performs empiricalmarginal distribution from the trace posterior’s model.Experiments in this manuscript took between 13 to 82 secondsdepending on the graph size on a system with Intel Core i7 8th GenCPU, 16 GB RAM and Ubuntu 18.04 Operating System. The code isavailable at https://github.com/bel2scm. ASE S TUDIES

Below we introduce two biological case studies investigated using theapproach proposed in this manuscript. The ﬁrst case study allows us toevaluate the accuracy of the results based on known ground truth. Thesecond uses counterfactual reasoning to pinpoint the mechanism bywhich SARS-CoV-2 infection can lead to a cytokine storm in severelyill coronavirus disease 2019 (COVID-19) patients. The details of thecase studies, parameter values of the simulations, and of the results areat https://github.com/bel2scm.

The system

The insulin-like growth factor (IGF) signaling pathway(Figure 4) regulates growth and energy metabolism of a cell. The IGFsystem has been extensively investigated, and its dynamics are wellcharacterized in form of ODE and SDE models [25]. Activated by

EGF IGFnucleus

RafPP

Raf Akt

PP2A

MekErkRasSOS PI3Kp90

RasGap cellular membrane

Fig. 4:

Case Study 1: the IGF signaling system

The insulin-likegrowth factor (IGF) and epidermal growth factor (EGF) are receptorsof external stimuli, triggering downstream signaling pathways thatinclude the MAPK pathway. All the relationships between abundancesof activated proteins in this network are of the type increase , except forthe relationship between

Akt and

Raf which is of the type decrease .external stimuli, insulin-like growth factor (IGF) or epidermal growthfactor (EGF) triggers a signaling event, which includes the MAPKsignaling pathway in Eq. (1). Similarly to Eq. (1), nodes in the systemare kinase activities, and edges represent whether the kinase activity ofthe upstream protein directly increases or decreases the kinase activityof the downstream protein. However, the system is larger and morecomplex. It includes two different paths from

Ras to Erk , one directand the other through

P I K and Akt . This challenges estimates ofoutcomes of interventions. In this case study, we assume that the IGFsystem has no unobserved confounders.

Intervention

We considered two interventions. The ﬁrst ﬁxes thekinase activity of

Mek to 40. The second ﬁxes the kinase activity of

Ras to 30.

Causal effects of interest

We are interested in two causal questions.First, what would have been the kinase activity of

Erk had weintervened to ﬁx the kinase activity of

Mek to 40?

The second queryis as above, but with the intervention ﬁxing the kinase activity of

Ras to 30. More formally, we are interested in the average treatment effect (cid:8)

Erk do ( Mek =40) − Erk (cid:9) (19) (cid:8)

Erk do ( Ras =30) − Erk (cid:9) (20)Next, we introduce a new piece of information about a speciﬁc datapoint generated from the ODE-based simulation. We wish to estimatethe causal effect of intervention for this speciﬁc data point. Moreformally, we are interested in the individual treatment effect (cid:8)

Erk do ( Mek =40) − Erk (cid:9) | D new (21) (cid:8) Erk do ( Ras =30) − Erk (cid:9) | D new (22)where D new is a new data point. We note that this counterfactualinference can only be performed with an SCM. We wish to comparethese estimates of causal effects, in order to characterize the abilityof counterfactual inference via D new to improve the precision of theestimates. Evaluation

The kinetic equations described by the ODE and SDErepresent the true underlying dynamics of the IGF signaling pathway.Since the ODE and the SDE can estimate the causal effects by forwardsimulation, we view the estimates as the ground truth. We then wishto compare the estimates from the SCM against the ground-truthestimates from the ODE and the SDE. Since an SCM represents causal relationships at steady state, we train the parameters of the SCM usingdata generated from the ground-truth SDE after it has reached steadystate.We consider two types of evaluations. First, we compare theestimates of the forward simulation of the ODE and SDE with theforward simulation of the SCM. This allows us to characterize theimpact of SCM speciﬁcation and estimates of weights on the accuracyof causal effects. We do not expect to see a substantial differencebetween these two approaches for a correctly speciﬁed SCM. Wethen compare the SCM-based counterfactual inference of causaleffects with the estimates based on forward simulation. We expectthat the counterfactual inference will provide more precise estimates,illustrating the statistical efﬁciency of counterfactual inference ascompared to the forward simulation.

The system

Retrospective studies have indicated that high levels ofpro-inﬂammatory cytokine Interleukin 6 (IL6 or IL6) are stronglyassociated with severely ill COVID-19 patients [46]. One recentlyproposed explanation for this is the viral induction of a positivefeedback loop, known as Interleukin 6 Ampliﬁer (IL6-AMP orIL6-AMP) [47]. IL6-AMP is stimulated by simultaneous activation ofnuclear factor kappa-light-chain-enhancer of activated B cell (NF- κ Bor NF- κ B) and Signal Transducer and Activator of Transcription 3(STAT3 or STAT3) [48]. This in turn induces various pro-inﬂammatorycytokines and chemokines, including Interleukin 6, which recruitactivated T cells and macrophages. This strengthens the Interleukin 6Ampliﬁer into a positive feedback loop leading to a cytokine storm [49],which is believed to be responsible for the tissue damage observed inpatients with acute respiratory distress syndrome (ARDS) [47].

Intervention

Originally developed to treat autoimmune disorderssuch as rheumatoid arthritis [50], Tocilizumab (Toci or Toci) is animmunosuppressive drug consisting of a recombinant monoclonalantibody that targets the soluble Interleukin 6 receptor and caneffectively block the IL6 signal transduction pathway [51]. Tocilizumabhas emerged as a promising drug repurposing candidate to reducemortality in severely ill COVID-19 patients [52], [53].

Causal effect of interest

We deﬁne a severely ill COVID-19 patient assomeone with CytokineStorm > . We are interested in the individualtreatment effect (ITE) (cid:110) CytokineStorm do ( Toci =0) − CytokineStorm (cid:111) | D new (23)where D new is an observed patient who received Tocilizumab treatmentand became severely ill. We wish to characterize the severity ofcytokine storm which would have occurred had she not received thetreatment. We further wish to compare the ITE with the averagetreatment effect (ATE or ATE) (cid:110) CytokineStorm do ( Toci =0) − CytokineStorm (cid:111) (24)

Evaluation

Tocilizumab is known to have a strong inhibitory effect onsoluble Interleukin 6 receptor. We therefore expect that the severity ofthe cytokine storm would have been worse had the patient not receivedtreatment. Unfortunately, at the time of writing, there were no ODE orSDE-based models of the pathway, nor were there publicly availableCOVID-19 datasets quantifying the kinase activity of the Interleukin 6Ampliﬁer pathway at the single-cell level. Therefore, we simulated datafrom a “ground-truth” sigmoidal structural causal model, where thetopology reﬂects the causal structure of the pathway, and the numericvalues of the parameters were ﬁxed to reﬂect our prior qualitativeknowledge of the IL6-AMP pathway.We evaluate the ITE the proposed approach in two ways. First, wetrain the parameters of the SCM using the simulated data, and comparethe counterfactual inference of the ITE obtained from the “trained”SCM to the counterfactual inference of the ITE from the “ground-truth”SCM. This comparison allows us to characterize the impact of weight

30 40 50 60

Raf M e k Samples from SDESamples from SCMFitted sigmoid line

Fig. 5:

Case study 1: IGF model

Scatter plot of

Mek versus

Raf .Blue points are the data points generated by SDE. Yellow points are theestimates from SCM. The red line is the ﬁtted sigmoid curve in Alg. 2line 12.

25 30 35 40 45 50 55 60

SOS C o un t

50 60 70 80

PI3K C o un t (a) (b)Fig. 6: Case study 1: probability distributions of the root nodesof IGF model (a) Histogram of

SOS generated from SDE simulation(b) As in (a), for

P I K .estimation on the accuracy of causal effects. We expect that the need toestimate the weights will inﬂate the variance of the estimates. Second,we compare the estimates of ITE to the estimates of the ATE using thetrained SCM. This comparison allows us to characterize the statisticalefﬁciency of counterfactual inference when estimating causal effects.We expect that the ITE will provide much more precise estimates. ESULTS

Generating BEL causal model

The BEL representation of the IGFsystem was manually curated using BioDati studio [44], to matchthe existing ODE and SDE. The BEL representation of the IGFsystem speciﬁed all the node types as in category abundance . Allthe relationships between parents and children nodes were of type increase , except for the parent node

Akt , where the relationship wasof type decrease . Observational data

We mimicked the process of collecting observa-tional data by simulating kinase activity from the corresponding ODEand SDE. The initial number of particles for the receptor was 37 for

EGF and 5 for

IGF . The deterministic simulation numerically solvedthe ODE using the deSolve [54] R package. The stochastic simulationused the Gillespie algorithm [29] from the smfsb [55] R package.

Appropriateness of model assumptions

SCM-based estimates offunctional assignments with sigmoid approximations were well withinthe range of the SDE-based data (as shown for

Raf and

Mek inFig. (5)). Similar results were obtained for estimates of

Ras , P I K , AKT , Raf , and

Erk . The ﬁtted functional assignment had littlecurvature. This indicates that a more complicated function with moreparameters, such as Hill equation, was unnecessary in this case.To further evaluate the plausibility of the assumptions, Fig. (6)shows the histograms of the SDE-generated abundances of root nodes,

25 20 15 10 5 0 5 10

Causal effect on Erk upon intervention on Mek P r o b a b ili t y d e n s i t y ODESDESCM with estimated weights (ITE)

25 20 15 10 5 0 5 10

Causal effect on Erk upon intervention on Ras P r o b a b ili t y d e n s i t y ODESDESCM with estimated weights (ITE) (a) (b)

25 20 15 10 5 0 5 10

Causal effect on Erk upon intervention on Mek P r o b a b ili t y d e n s i t y ODESDEForward simulation from SCM with estimated weights (ATE)

25 20 15 10 5 0 5 10

Causal effect on Erk upon intervention on Ras P r o b a b ili t y d e n s i t y ODESDEForward simulation from SCM with estimated weights (ATE) (c) (d)Fig. 7:

Case study 1: estimated causal effects of the IGF signalingpathway using Alg. 3.

The ODE and SDE represent the trueunderlying dynamics of the IGF signaling pathway. The ODE andSDE-based forward simulation can only estimate the average treatmenteffect. These estimates are viewed as ground truth. In contrast, an SCMcan estimate both the average treatment effect (ATE) and the individualtreatment effect (ITE). (a) Comparison of ITE vs ATE for

Erk when

Mek is ﬁxed. (b) Comparison of ITE vs ATE for

Erk when

Ras isﬁxed. (c) Comparison of SCM, SDE and ODE estimates of the ATEfor

Erk when

Mek is ﬁxed. (d) Comparison of SCM, SDE and ODEestimates of the ATE on

Erk when

Ras is ﬁxed.which were not affected by functional assignments in SCM. The shapeof the histograms indicate that the assumption of Normal distributionwas plausible.

Accuracy of causal effects

Fig. (7)(c) and (d) show that the averagetreatment effects (ATEs or ATEs) on

Erk of ﬁxing

Mek and

Raf ,based on forward simulation of ODE, SDE and SCM, were consistent.Fig. (7)(a) and (b) show that the individual treatment effects (ITEs orITEs) based on counterfactual inference has a smaller variance thanthe ATE. Since counterfactual inference reduces nuisance variationby sharing stochastic components in contexts with and withoutintervention, it increases the statistical efﬁciency of the estimation.The individual treatment effect on

Erk by ﬁxing

Mek was muchstronger than the ITE on

Erk by ﬁxing

Ras for the following reason.While

Mek directly inﬂuences

Erk (i.e., there is a single path from

Mek to Erk ), Ras has two pathways to

Erk . The path through

AKT has an inhibiting (deactivation) effect on

Raf , and estimated negativeweights in the sigmoid function in Eq. (15). The alternative path, acascade from

Ras to Erk , has the opposite (activating) effect on

Erk .The two paths mitigate the overall causal effect of

Ras on Erk . Generating BEL causal model

The steps of the proposed Alg. 1produced the qualitative causal model in Fig. (8), and the correspondingBEL causal model B , as follows. In accordance with the inputsto Alg. 1, we deﬁned the knowledge base K as the Covid-19knowledge network automatically assembled from the COVID-19Open Research Dataset (CORD-19) [56] document corpus usingthe INDRA workﬂow. We deﬁned the cause X c as sIL6R α , the Fig. 8:

Case study 2: host response to viral infection

Pointed edgesrepresent relationships of type increase ; ﬂat-headed edges representrelationships of type decrease . Nodes SARS-COV2 and Toci areexternal stimuli.effect X e as cytokine storm, and the covariates X z as SARS-CoV-2 and Toci. Therefore the causal query of interest was deﬁned as Q = { sIL6R α, CytokineStorm , { SARS-CoV-2 , Toci }} .Alg. 1 line 2 generated all pathways from soluble Inter-leukin 6 receptor to Cytokine Release Syndrome, resulting in kin ( p ( sIL6R α )) → kin ( p ( IL6-STAT3 )) → bp ( IL6-AMP ) → bp ( CytokineStorm ) , where bp () is a biological process. Next, line5 generated all pathways from Tocilizumab to soluble Interleukin6 receptor: a ( Toci ) (cid:173) kin ( p ( sIL6R α )) , where a () is the dosagelevel of Tocilizumab. We then generated all pathways from severeacute respiratory syndrome coronavirus 2 to soluble Interleukin 6receptor: pop ( SARS-CoV-2 ) (cid:173) cat ( ACE2 ) (cid:173) a ( Angiotensin II ) → kin ( p ( AGTR1 )) → kin ( p ( ADAM17 )) → kin ( p ( sIL6R α )) , where pop () is the viral load of SARS-CoV-2 and cat () is the normal catalyticactivity of Angiotensin Converting Enzyme 2.Line 8 found no new branches from Tocilizumab to Cy-tokine Release Syndrome. Finally, we generated all pathwaysfrom severe acute respiratory syndrome coronavirus 2 to Cy-tokine Release Syndrome, which resulted in three new branches pop ( SARS-CoV-2 ) → kin ( p ( PRR )) → kin ( p ( NF- κ B )) → bp ( IL6-AMP )) , kin ( p ADAM17 )) → p ( EGF ) → kin ( p ( EGFR )) → kin ( p ( NF- κ B )) , and kin ( p ( ADAM17 )) → kin ( p ( TNF α )) → kin ( p ( NF- κ B )) . Observational data

We simulated observational data from a “ground-truth” sigmoidal structural causal model, where the topology reﬂectsthe causal structure in Fig. (8), and the parameters reﬂect ourprior qualitative knowledge of the IL6-AMP pathway. The rootnodes SARS-CoV-2 and Tocilizumab were sampled from a Normaldistribution with mean of and standard deviation of . The non-root nodes were sampled from a sigmoid function as in Eq. (15). Sincewe have prior qualitative knowledge that IL6-AMP is only activateddue to simultaneous activation of NF- κ B and IL6-STAT3, we set thethreshold for activation above what could be achieved by NF- κ B orIL6-STAT3 alone. Since we also know that Toci is a strong inhibitorof sIL6R α , we set the inhibition coefﬁcient to a large negative number.The parameters of the sigmoid function were chosen to ensure that thevariables were in the desired range of – . Finally, we randomlygenerated two new individuals D new with Cytokine Release Syndrome > to represent severely ill patients. The ﬁrst patient had a higherviral load of SARS-CoV-2 and received a lower dose of Toci. Thesecond patient had a lower viral load of and received a higher dose ofToci. Estimation of individual-level treatment effect

Fig. (9) evaluates theSCM-based estimates of the individual treatment effect of withholding Causal effect of setting Toci to 0 on cytokine storm P r o a b ili t y d e n s i t y SCM with known weights (ITE)SCM with estimated weights (ITE)

Causal effect of setting Toci to 0 on cytokine storm P r o a b ili t y d e n s i t y SCM with known weights (ITE)SCM with estimated weights (ITE) (a) (b)Fig. 9:

Case study 2: SCM-based estimates of the individualtreatment effect (ITE or ITE) using Alg. 3 . Blue histogram: the ITEestimated from the ground-truth SCM using Alg. 3. Yellow histogram:the ITE estimated from the Alg. 2-trained SCM using Alg. 3. (a)Patient has a high viral load and received a low dose of Tocilizumab.(b) Patient has a low viral load and received a high dose of Tocilizumab.Both patients were severely ill.

Causal effect of setting Toci to 0 on cytokine storm P r o a b ili t y d e n s i t y Forward simulation from SCM with estimated weights (ATE)SCM with estimated weights (ITE)

Causal effect of setting Toci to 0 on cytokine storm P r o a b ili t y d e n s i t y Forward simulation from SCM with estimated weights (ATE)SCM with estimated weights (ITE) (a) (b)Fig. 10:

Case study 2: SCM-based estimates of the averagetreatment effect (ATE or ATE) and of the ITE using Alg. 3.

Yellowhistogram: the ITE estimated using counterfactual inference. Brownhistogram: the ATE estimated using forward simulation. (a) Patient hasa high viral load and received a low dose of Tocilizumab. (b) Patienthas a low viral load and a received a high dose of Tocilizumab. Bothpatients were severely ill.treatment from two COVID-19 patients who were severely ill. Thedistribution of the individual treatment effect obtained with the SCMtrained using Alg. 2 was consistent with, but had a slightly largervariance then, the distribution of ITE obtained with the “ground truth"SCM with known weights. Even though both patients had the sameseverity of illness prior to the intervention, patient B was estimated tohave a more severe cytokine storm after Toci was withheld.Fig. (10) further compared the individual treatment effect obtainedwith the SCM trained using Alg. 2 with the average treatmenteffect estimated from the same model using forward simulation. Thedistribution of the individual treatment effect was patient-speciﬁcand had smaller variance, thus illustrating the statistical efﬁciency ofcounterfactual inference.

ISCUSSION

We proposed a general approach that leverages structured qualitativeprior knowledge, automatically generates a quantitative SCM, andenables answers to counterfactual research questions. In both casestudies, the use of the Biological Expression Language allowed usto leverage large repositories of structured biological knowledgeto specify an SCM and perform counterfactual inference in anautomated manner, which would otherwise require a substantial manual effort. The application to the IGF signaling system demonstrated theappropriateness of the underlying assumptions, and the accuracy of theresults when compared to ODE- and SDE-based forward simulation.The application to a study of host response to SARS-CoV-2 infectiondemonstrated the feasibility, versatility and usefulness of this approachas applied to an urgent public health issue. In particular, the approachcan help determine the amount of Tocilizumab (Toci or Toci) requiredto reduce the severity of each individual’s cytokine storm. Furthermore,in situations where treatment options are limited (as is the case SARS-CoV-2), counterfactual estimates enable a more precise conclusionregarding who would likely live without receiving the treatment, whowould likely die even if they did receive the treatment, and who wouldlikely live only after receiving the treatment.The approach opens multiple directions for future research. Inparticular, future work can extend the conﬁgurability of the

BEL SCM algorithm by incorporating the rich type information in BEL, mappingparent-child type signatures to functional forms such as post-nonlinearmodels, neural networks, mass action kinetics and Hill equations, andincorporating additional data types such as binary variables, categoricalvariables, and continuous variables with constraints on their domains.In some cases, the variables in the model may not be directly observ-able, but may nonetheless be characterized by means of detectablemolecular signatures. For example, even if interferon signaling maynot be directly observable using transcriptomics measurements, it maystill be possible to infer the activity of interferon signaling by anupregulation of interferon stimulated genes (ISG). Future work willfocus on leveraging molecular signature databases to infer the activityof variables in the model, and on learning and/or evaluating the modelsusing experimental data [57].We also note that experimentalists typically formulate biologicalprocesses as linear pathways (e.g., from S to Erk in the MAPKexample) that can be effectively perturbed and measured in a laboratorysetting. Yet such boundaries of biological processes are quite arbitrary,and are therefore highly susceptible to confounders. One way to addressthis issue is to search the knowledge graph for all common causes ofvariables in the causal model, use an identiﬁcation algorithm [58] toﬁnd the minimal valid adjustment set of the augmented model, andthen prune all common causes that do not contribute to that set. Thisapproach will require us to tackle the issues of parameter and causalidentiﬁability in the presence of confounders.In addition to unobserved confounders, the validity of causalinferences can be threatened by feedback loops, model misspeciﬁcation,missing data, and out-of-sample distributions. To address the possibilityof feedback loops, we must consider the time scale at which thesefeedbacks reach steady-state: fast timescale feedback loops can beaddressed with the chain graph interpretation of SCMs [59] [60];intermediate timescale feedbacks can be addressed with non-recursivestructural causal models [5]; slow timescale feedback loops can behandled by unrolling the structure of the SCM as is done with dynamicBayesian networks [61], or simply by representing the entire feedbackloop as a biological process, as we did with IL6-AMP. In the case ofmodel misspeciﬁcation, we will investigate the ability of counterfactualinference to improve the estimation [43]. For missing data, we canleverage causal inference recoverability algorithms that have beenpublished recently [62], and for handling out-of-sample distributions,we can leverage recent results applying causal inference to the problemof external validity [63]. Future work will focus on addressing thesethreats to validity when applied to real biological data. A CKNOWLEDGMENTS

This work was supported by funds from the PNNL Mathematicsand Artiﬁcial Reasoning Systems Laboratory Directed Researchand Development Initiative. Knowledge curation environments wereprovided by BioDati.com and Causaly.com. We would also like toacknowledge Jessica Stothers and Rose Glavin at CoronaWhy.organd Marek Ostaszewski at the COVID-19 Disease Map Initiative forproviding valuable feedback about the IL6-AMP model. R EFERENCES [1] A. Pezeshki, I. G. Ovsyannikova, B. A. McKinney, G. A. Poland,and R. B. Kennedy, “The role of systems biology approaches indetermining molecular signatures for the development of moreeffective vaccines.”

Expert Review of Vaccines , vol. 18, p. 253,2019.[2] M. Pedragosa, G. Riera, V. Casella, A. Esteve-Codina, Y. Steuer-man, C. Seth, G. Bocharov, S. Heath, I. Gat-Viks, J. Argilaguet,and A. Meyerhans, “Linking cell dynamics with gene coex-pression networks to characterize key events in chronic virusinfections,”

Frontiers in Immunology , vol. 10, p. 1002, 2019.[3] V. K. Nguyen, F. Klawonn, R. Mikolajczyk, and E. A.Hernandez-Vargas, “Analysis of practical identiﬁability of a viralinfection model,”

Plos One , vol. 11, p. e0167568, 2016.[4] A. Arazi, W. F. Pendergraft, R. M. Ribeiro, A. S. Perelson, andN. Hacohen, “Human systems immunology: hypothesis-basedmodeling and unbiased data-driven approaches,”

Seminars inImmunology , vol. 25, p. 193, 2013.[5] J. Pearl,

Causality: Models, Reasoning and Inference . Cam-bridge, MA, USA„ 2013.[6] J. Peters, D. Janzing, and B. Schölkopf,

Elements of CausalInference: Foundations and Learning Algorithms . MIT press,2017.[7] J. F. Allen, M. Swift, and W. De Beaumont, “Deep semantic anal-ysis of text,”

Proceedings of the 2008 Conference on Semanticsin Text Processing , vol. 1, p. 343, 2008.[8] D. D. McDonald, “Issues in the Representation of Real Texts:The Design of Krisp,”

Natural Language Processing and Knowl-edge Representation , p. 77, 2000.[9] M. A. Valenzuela-Escárcega, O. Babur, G. Hahn-Powell, D. Bell,T. Hicks, E. Noriega-Atala, X. Wang, M. Surdeanu, E. Demir,and C. T. Morrison, “Large-scale automated machine readingdiscovers new cancer-driving mechanisms,”

Database , vol. 2018,p. 1, 2018.[10] B. M. Gyori, J. A. Bachman, K. Subramanian, J. L. Muhlich,L. Galescu, and P. K. Sorger, “From word models to executablemodels of signaling networks using automated assembly,”

Molec-ular Systems Biology , vol. 13, 2017.[11] C. T. Hoyt, D. Domingo-Fernández, R. Aldisi, L. Xu, K. Kolpeja,S. Spalek, E. Wollert, J. Bachman, B. M. Gyori, P. Greene,and M. Hofmann-Apitius, “Re-curation and rational enrich-ment of knowledge graphs in Biological Expression Language,”

Database , vol. 2019, 2019.[12] E. G. Cerami, B. E. Gross, E. Demir, I. Rodchenkov, O. Babur,N. Anwar, N. Schultz, G. D. Bader, and C. Sander, “PathwayCommons, a web resource for biological pathway data,”

NucleicAcids Research , vol. 39, p. 685, 2011.[13] A. Fabregat, S. Jupe, L. Matthews, K. Sidiropoulos, M. Gillespie,P. Garapati et al. , “The Reactome pathway knowledgebase,”

Nucleic Acids Research , vol. 46, p. D649, 2018.[14] M. Kanehisa, M. Furumichi, M. Tanabe, Y. Sato, and K. Mor-ishima, “KEGG: New perspectives on genomes, pathways, dis-eases and drugs,”

Nucleic Acids Research , vol. 45, p. D353,2017.[15] L. Perfetto, L. Briganti, A. Calderone, A. C. Perpetuini, M. Ian-nuccelli, F. Langone, L. Licata, M. Marinkovic, A. Mattioni,T. Pavlidou, D. Peluso, L. L. Petrilli, S. Pirró, D. Posca, E. San-tonico, A. Silvestri, F. Spada, L. Castagnoli, and G. Cesareni,“SIGNOR: A database of causal relationships between biologicalentities,”

Nucleic Acids Research , vol. 44, p. D548, 2016. [16] D. N. Slenter, M. Kutmon, K. Hanspers, A. Riutta, J. Wind-sor, N. Nunes et al. , “WikiPathways: a multifaceted pathwaydatabase bridging metabolomics to other omics research.”

Nu-cleic Acids Research , vol. 46, p. D661, 2018.[17] E. Demir, M. P. Cary, S. Paley, K. Fukuda, C. Lemer, I. Vastrik et al. , “The BioPAX community standard for pathway datasharing,”

Nature Biotechnology , vol. 28, p. 1308, 2010.[18] M. Hucka, F. T. Bergmann, A. Dräger, S. Hoops, S. M. Keating,N. Le Novère, C. J. Myers, B. G. Olivier, S. Sahle et al. ,“The Systems Biology Markup Language (SBML): languagespeciﬁcation for level 3 version 2 core,”

Journal of IntegrativeBioinformatics , vol. 15, 2018.[19] N. Le Novere, M. Hucka, H. Mi, S. Moodie, F. Schreiber,A. Sorokin, E. Demir, K. Wegner, M. I. Aladjem, S. M.Wimalaratne et al. , “The systems biology graphical notation,”

Nature Biotechnology , vol. 27, p. 735, 2009.[20] T. Slater, “Recent advances in modeling languages for pathwaymaps and computable biological networks,”

Drug DiscoveryToday , vol. 19, p. 193, 2014.[21] P. Machamer, L. Darden, and C. F. Craver, “Thinking aboutmechanisms,”

Philosophy of Science , vol. 67, p. 1, 2000.[22] Y. Li, J. Roberts, Z. AkhavanAghdam, and N. Hao, “Mitogen-activated protein kinase (MAPK) dynamics determine cell fate inthe yeast mating response,”

The Journal of Biological Chemistry ,vol. 292, p. 20354, 2017.[23] L. Chen, R. Wang, C. Li, and K. Aihara,

Modeling BiomolecularNetworks in Cells: Structures and Dynamics . Springer Science& Business Media, 2010.[24] D. Gratie, B. Iancu, and I. Petre, “ODE analysis of biologicalsystems,” in

International School on Formal Methods for theDesign of Computer, Communication and Software Systems ,2013, p. 29.[25] F. Bianconi, E. Baldelli, V. Ludovini, L. Crino, A. Flacco, andP. Valigi, “Computational model of EGFR and IGF1R pathwaysin lung cancer: a systems biology approach for translationaloncology,”

Biotechnology Advances , vol. 30, p. 142, 2012.[26] E. K. Kim and E.-J. Choi, “Pathological roles of MAPK signal-ing pathways in human diseases,”

Biochimica et Biophysica Acta- Molecular Basis of Disease , vol. 1802, p. 396, 2010.[27] R. Ness, K. Paneri, and O. Vitek, “Integrating Markov processeswith structural causal modeling enables counterfactual inferencein complex systems,” in

Advances in Neural Information Pro-cessing Systems , 2019, p. 14211.[28] K. Paneri, “Integrating markov process and structural causalmodels enables counterfactual inference in complex systems,”2019.[29] D. T. Gillespie, “Exact stochastic simulation of coupled chemicalreactions,”

The Journal of Physical Chemistry , vol. 81, p. 2340,1977.[30] S. K. Jha and C. J. Langmead, “Exploring behaviors of stochasticdifferential equation models of biological systems using changeof measures,”

BMC Bioinformatics , vol. 13, p. S8, 2012.[31] U. Alon,

An Introduction to Systems Biology: Design Principlesof Biological Circuits . CRC press, 2019.[32] S. Bongers and J. M. Mooij, “From random differential equationsto structural causal models: the stochastic case,” in

Proceedingsof Uncertainty in Artiﬁcial Intelligence , 2019.[33] M. Jerrum, A. Sinclair, and D. S. Hochbaum, “The Markov chainMonte Carlo method,”

Approximation Algorithms for NP-hardProblems , 1997. [34] A. E. Gelfand, “Gibbs sampling,” Journal of the Americanstatistical Association , vol. 95, p. 1300, 2000.[35] M. D. Hoffman and A. Gelman, “The No-U-Turn sampler:adaptively setting path lengths in Hamiltonian Monte Carlo,”

Journal of Machine Learning Research , vol. 15, p. 1593, 2014.[36] M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley, “Stochasticvariational inference,”

The Journal of Machine Learning Re-search , vol. 14, p. 1303, 2013.[37] T. Blom, S. Bongers, and J. M. Mooij, “Beyond structural causalmodels: Causal constraints models,” in

Proceedings of the 35thConference on Uncertainty in Artiﬁcial Intelligence , 2019.[38] S. Madan, J. Szostak, R. Komandur Elayavilli, R. T.-H. Tsai,M. Ali, L. Qian, M. Rastegar-Mojarad, J. Hoeng, and J. Fluck,“The extraction of complex relationships and their conversion tobiological expression language (BEL) overview of the BioCre-ative VI (2017) BEL track.”

Database: the Journal of BiologicalDatabases and Curation , vol. 2019, 2019.[39] C. T. Hoyt, D. Domingo-Fernández, S. Mubeen, J. M. Llaó,A. Konotopez, C. Ebeling, C. Birkenbihl, O. Muslu, B. English,S. Müller, M. P. de Lacerda, M. Ali, S. Colby, D. Türei,N. Palacio-Escat, and M. Hofmann-Apitius, “Integration ofstructured biological data sources using biological expressionlanguage,”

BioRxiv , 2019.[40] C. T. Hoyt, A. Konotopez, C. Ebeling, and J. Wren, “PyBEL:a computational framework for biological expression language.”

Bioinformatics , vol. 34, p. 703, 2018.[41] H. Mi, F. Schreiber, S. Moodie, T. Czauderna, E. Demir, R. Haw,A. Luna, N. Le Novère, A. Sorokin, and A. Villéger, “SystemsBiology Graphical Notation: Activity Flow language Level 1Version 1.2.”

Journal of Integrative Bioinformatics , vol. 12, p.265, 2015.[42] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochasticbackpropagation and approximate inference in deep generativemodels,” arXiv:1401.4082 , 2014.[43] R. Ness, K. Paneri, and O. Vitek, “Integrating Markov processeswith structural causal modeling enables counterfactual inferencein complex systems,” in

Advances in Neural Information Pro-cessing Systems , 2019, p. 14234.[44] “BioDati, inc.” [Online]. Available: https://studio.covid19.biodati.com/[45] E. Bingham, J. P. Chen, M. Jankowiak, F. Obermeyer, N. Prad-han, T. Karaletsos, R. Singh, P. Szerlip, P. Horsfall, and N. D.Goodman, “Pyro: Deep Universal Probabilistic Programming,”

Journal of Machine Learning Research , 2018.[46] Z. S. Ulhaq and G. V. Soraya, “Interleukin-6 as a potentialbiomarker of COVID-19 progression.”

Medecine et MaladiesInfectieuses , vol. 50, p. 382, 2020.[47] T. Hirano and M. Murakami, “COVID-19: A new virus, buta familiar receptor and cytokine release syndrome.”

Immunity ,vol. 52, p. 731, 2020.[48] M. Murakami and T. Hirano, “The pathological and physiolog-ical roles of IL-6 ampliﬁer activation.”

International Journal ofBiological Sciences , vol. 8, p. 1267, 2012.[49] H. Ogura, M. Murakami, Y. Okuyama, M. Tsuruoka,C. Kitabayashi, M. Kanamoto, M. Nishihara, Y. Iwakura, andT. Hirano, “Interleukin-17 promotes autoimmunity by triggeringa positive-feedback loop via interleukin-6 induction,”

Immunity ,vol. 29, p. 628, 2008.[50] V. Oldﬁeld, S. Dhillon, and G. L. Plosker, “Tocilizumab: a reviewof its use in the management of rheumatoid arthritis.”

Drugs ,vol. 69, p. 609, 2009. [51] C. Zhang, Z. Wu, J.-W. Li, H. Zhao, and G.-Q. Wang, “Cytokinerelease syndrome in severe COVID-19: Interleukin-6 receptorantagonist Tocilizumab may be the key to reduce mortality,”

In-ternational Journal of Antimicrobial Agents , vol. 55, p. 105954,2020.[52] E. A. Coomes and H. Haghbayan, “Interleukin-6 in COVID-19:A systematic review and meta-analysis,” medRxiv , 2020.[53] X. Xu, M. Han, T. Li, W. Sun, D. Wang, B. Fu, Y. Zhou,X. Zheng, X. L. Y. Yang, X. Zhang, A. Pan, and H. Wei, “Effec-tive Treatment of Severe COVID - 19 Patients with Tocilizumab,”

PNAS , vol. 117, p. 10970, 2020.[54] K. E. R. Soetaert, T. Petzoldt, and R. W. Setzer, “Solving dif-ferential equations in R: package deSolve,”

Journal of StatisticalSoftware , vol. 33, 2010.[55] D. Wilkinson, “Package smfsb,” 2018.[56] L. Lu Wang, K. Lo, Y. Chandrasekhar, R. Reas, J. Yang, D. Eide,K. Funk, R. Kinney, Z. Liu, W. Merrill, P. Mooney, D. Murdick,D. Rishi, J. Sheehan, Z. Shen, B. Stilson, A. D. Wade, K. Wang,C. Wilhelm, B. Xie, D. Raymond, D. S. Weld, O. Etzioni, andS. Kohlmeier, “CORD-19: The covid-19 open research dataset.” arXiv , 2020.[57] A. Liu, P. Trairatphisan, E. Gjerga, A. Didangelos, J. Barratt,and J. Saez-Rodriguez, “From expression footprints to causalpathways: contextualizing large signaling networks with CAR-NIVAL,”

Systems Biology and Applications , vol. 5, p. 1, 2019.[58] S. Tikka and J. Karvanen, “Identifying causal effects with theRpackagecausaleffect,”

Journal of Statistical Software , vol. 76,p. 1, 2017.[59] S. L. Lauritzen and T. S. Richardson, “Chain graph modelsand their causal interpretations,”

Journal of the Royal StatisticalSociety: Series B , vol. 64, p. 321, 2002.[60] E. Sherman and I. Shpitser, “Identiﬁcation and estimation ofcausal effects from dependent data.”

Advances in Neural Infor-mation Processing Systems , vol. 2018, p. 9446, 2018.[61] D. Koller and N. Friedman,

Probabilistic Graphical Models:Principles and Techniques - Adaptive Computation and MachineLearning . The MIT Press, 2009.[62] R. Nabi, R. Bhattacharya, and I. Shpitser, “Full law identiﬁcationin graphical models of missing data: Completeness results,” arXiv , 2020.[63] E. Bareinboim and J. Pearl, “Causal inference and the data-fusion problem.”