[PDF] A framework for modelling Molecular Interaction Maps

Abstract

Metabolic networks, formed by a series of metabolic pathways, are made of intracellular and extracellular reactions that determine the biochemical properties of a cell, and by a set of interactions that guide and regulate the activity of these reactions. Most of these pathways are formed by an intricate and complex network of chain reactions, and can be represented in a human readable form using graphs which describe the cell cycle checkpoint pathways. This paper proposes a method to represent Molecular Interaction Maps (graphical representations of complex metabolic networks) in Linear Temporal Logic. The logical representation of such networks allows one to reason about them, in order to check, for instance, whether a graph satisfies a given property ϕ , as well as to find out which initial conditons would guarantee ϕ , or else how can the the graph be updated in order to satisfy ϕ . Both the translation and resolution methods have been implemented in a tool capable of addressing such questions thanks to a reduction to propositional logic which allows exploiting classical SAT solvers.

Full PDF

AA framework for modelling Molecular InteractionMaps

Jean-Marc Alliot ∗ [email protected] Marta Cialdea Mayer † [email protected] Robert Demolombe ∗ [email protected] Mart´ın Di´eguez ∗ [email protected] Luis Fari˜nas del Cerro ∗ [email protected] Abstract

Metabolic networks, formed by a series of metabolic pathways, are made ofintracellular and extracellular reactions that determine the biochemical propertiesof a cell, and by a set of interactions that guide and regulate the activity of thesereactions. Most of these pathways are formed by an intricate and complex networkof chain reactions, and can be represented in a human readable form using graphswhich describe the cell cycle checkpoint pathways.This paper proposes a method to represent Molecular Interaction Maps (graph-ical representations of complex metabolic networks) in Linear Temporal Logic.The logical representation of such networks allows one to reason about them, inorder to check, for instance, whether a graph satisﬁes a given property φ , as wellas to ﬁnd out which initial conditons would guarantee φ , or else how can the thegraph be updated in order to satisfy φ .Both the translation and resolution methods have been implemented in a toolcapable of addressing such questions thanks to a reduction to propositional logicwhich allows exploiting classical SAT solvers. ∗ Institut de Recherche en Informatique de Toulouse. Toulouse University, Toulouse, France † Dipartimento di Ingegneria, Universit`a degli Studi Roma Tre, Rome, Italy a r X i v : . [ c s . L O ] A ug Introduction

Metabolic networks, formed by a series of metabolic pathways, are made of intracel-lular and extracellular reactions that determine the biochemical properties of a cell byconsuming and producing proteins, and by a set of interactions that guide and regulatethe activity of such reactions. Cancer, for example, can sometimes appear in a cell asa result of some pathology in a metabolic pathway. These reactions are at the center ofa cell’s existence, and are regulated by other proteins, which can either activate thesereactions or inhibit them. These pathways form an intricate and complex network ofchain reactions, and can be represented in a human readable form using graphs, calledMolecular Interaction Maps (MIMs) [26, 33] which describe the cell cycle checkpointpathways (see for instance Figure 1).Although capital for Knowledge Representation (KR) in biology, MIMs are difﬁ-cult to use due to the very large number of elements they may involve and the intrinsicexpertise needed to understand them. Moreover, the lack of a formal semantics forMIMs makes it difﬁcult to support reasoning tasks commonly carried out by experts,such as checking properties on MIMs, determining how a MIM can explain a givenproperty or how a MIM can be updated in order to describe empirically obtained evi-dences.This contribution carries on the research undertaken by the authors aiming at pro-viding a formal background to study MIMs. A ﬁrst set of works proposed a formali-sation of MIMs based on a decidable fragment of ﬁrst-order logic [14, 15, 16]. In anattempt to ﬁnd a simpler representation, without resorting to the expressivity of ﬁrst-order logic, other works [1, 2, 3] proposed an ad-hoc deﬁned non-monotonic logic,called Molecular Interaction Logic (MIL), allowing one to formalize the notions of production and consumption of reactives. In order to formalise the “temporal evo-lution” of a biological system, MIL formulae are then mapped into Linear TemporalLogic (LTL) [32].This paper embraces the idea, proposed by the above mentioned works, that LTLis a suitable framework for modelling biological systems due to its ability to describethe interaction between components (represented by propositional variables) and theirpresence/absence in different time instants. Beyond giving a formal deﬁnition of graphsrepresenting MIMs, the paper shows how they can be modeled as an LTL theory, bymeans of a direct “encoding”, without resorting to intermediate (and cumbersome) ad-hoc logics. The logical encoding allows one to formally address reasoning tasks, suchas, for instance, checking whether a graph satisﬁes some given property φ , as well asﬁnding out which initial conditions would guarantee φ , or else how can the the graphbe updated in order to satisfy φ . A ﬁrst prototypal system has been implemented onthe basis of the theoretical work, allowing one to automatically accomplish reasoningtasks on MIMs.It is worth pointing out that the adequacy of LTL to model MIMs is due to thefact that the latter are qualitative representations of biological processes. In otherterms, they model the interactions among the different components of a biological sys-tem without resorting, for instance, to differential equations like the Systems BiologyMarkup Language (SBML) [22] does.The rest of this paper is organized as follows. Section 2 gives a brief overview of2igure 1: atm-chk2/atr-chk1 molecular interaction map.3odelling approaches for networks of biological entities. Section 3 presents the lacoperon that will be used as a leading example to introduce all the concepts dealt withby our approach. Section 4 describes the fundamental elements and concepts of themodelling approach. Section 5 presents Molecular Interaction Graphs (MIGs), whichformalize Molecular Interaction Maps capable of describing and reasoning about gen-eral pathways. Section 6 explains how MIGs can be represented by use of LinearTemporal Logic. Section 8 describes the current state of the operational implemen-tation of the software tool and section 9 presents some examples on larger problems.Finally, Section 10 concludes this paper and discusses possible future work. The typical objects to be modelled in the framework of systems biology are networks ofinteracting elements that evolve in time. According to the features of the network andits properties, various approaches can be followed, which can describe the dynamics ofthe system taking the following elements into consideration: • Components : they are represented by variables, which can be either discrete or continuous depending on the requirements of the model. • Interactions : they are represented by rules that specify the dynamical changes inthe variables values. These interactions can in their turn be classiﬁed accordingto the adopted representation of time ( discrete or continuous ). Finally, the execu-tion of an action can be either stochastic or not, if a certain degree of uncertaintyis considered, reﬂecting the assumption of a noisy environment.According to the different possible semantics, the various modelling approachesmay be classiﬁed as follows [20]: • Models that involve component quantities and deterministic interactions: suchmodels are mathematical, inherently quantitative and usually based on ordinarydifferential equations. Tools like Timed Automata representations or Continuous-Time Markov Chains are used in the construction of models of this category. • Discrete-value models: they are characterised by the use of discrete time. Ap-proaches like executable models based on Finite State Machines representationsor stochastic models such as Discrete-Time Markov Chains belong to this cate-gory.Other hybrid models such as Hybrid Automata or Process Algebraic Techniques, mixdiscrete and continuous representation for both variables and time dynamics. Biolog-ical properties can be distinguished between qualitative and quantitative : in the for-mer case, time has an implicit consideration while the latter involves reasoning on thedynamics of the system along time. To give an example, reachability and temporalordering of events are considered qualitative properties while equilibrium states and matabolite dynamics are quantitative properties.4ene Regulatory Networks (GRNs) have been very well studied in the temporalcontext because the interaction between components may be easily represented bytheir presence/absence, i.e components are represented by boolean variables and in-teractions are represented by logical rules on their values. Following this approachChabrier et al. [10] successfully modeled a very large network, involving more than500 genes. They resorted to Concurrent Transition Systems (CTS), allowing one tomodel modular systems, and can be then translated into the NuSMV language. Theychecked reachability, stability and temporal ordering properties by the use of CTL. Asimilar study on a much smaller (although real) biological system has been performedin [6]. Here the LTL speciﬁcation syntax and the Spin model checker are used to verifystability properties.When a quantitative approach is chosen, the model dimensions drop drastically.This is essentially due to lack of knowledge on the parameter values for all the interac-tions, and to the increased computational complexity deriving from a large model. Inthis kind of settings, logical approaches have been used to verify temporal propertieson the representations. For instance, [4, 18, 17] use CTL to verify, among other proper-ties, reachability and stability on different types of biological networks, and in [7] suchproperties are checked by using LTL. All these approaches are supported not only fortheoretical results but also for tools and frameworks that allow biologists to describe abiological network and then verify whether such representations satisfy some desiredproperties. Among others, the systems BIOCHAM [8], Bio-PEPA [12, 30] and AN-IMO [36] are very popular in the community. We refer the reader to [35, 19] for anoverview on this topic.Some considerations can be made from the study of the aforementioned contribu-tions: • the size of the modelled systems is generally very small, and a great degree ofabstraction and suitable tools are needed to deal with large models; • qualitative approaches are generally enough to analyse a large variety of inter-esting biological properties; • temporal logic plays an important role in the representation and veriﬁcation ofbiological systems.Contrarily to approaches incorporating quantitative information into the temporalformalisation [11], our contribution belongs to the category of qualitative approaches,since quantitative information in biological relations, such as the quantity of reactivesand their speed of consumption in a reaction, are not formalised. MIMs in fact representthe interaction among the different components of the system and how they evolve intime according to the different reactions. To the best of our knowledge, there is not anycontribution where MIMs are used to model quantitative biological information.5igure 2: The lac operon This section describes a simple example, which represents the regulation of the lac operon (lactose operon), already used in [1, 3]. The lac operon is an operon requiredfor the transport and metabolism of lactose in many bacteria. Although glucose isthe preferred carbon source for most bacteria, the lac operon allows for the effectivedigestion of lactose when glucose is not available. The lac operon is a sequence ofthree genes (lacZ, lacY and lacA) which encodes 3 enzymes which in turn carry thetransformation of lactose into glucose. We will concentrate here on lacZ which encodes β -galactosidase which cleaves lactose into glucose and galactose.The lac operon uses a two-part control mechanism to ensure that the cell expendsenergy producing the enzymes encoded by the lac operon only when necessary. First,in the absence of lactose, the lac repressor halts production of the enzymes encodedby the lac operon. Second, in the presence of glucose, the catabolite activator protein(CAP), required for production of the enzymes, remains inactive.Figure 2 describes this regulatory mechanism. The expression of lacZ gene isonly possible when RNA polymerase (pink) can bind to a promotor site (marked P, The Nobel prize was awarded to Monod, Jacob and Lwoff in 1965 partly for the discovery of the lacoperon by Monod and Jacob [24], which was the ﬁrst genetic regulatory mechanism to be understood clearly,and is now a “standard” introductory example in molecular biology classes. See also [37] negative regulation , or inhibition , asit blocks the production of the proteins. When lactose is present, one of its isomer,allolactose, binds with repressor protein Lacl which is no longer able to bind to thepromotor site, thus enabling RNA polymerase to bind to the promotor site and to startexpressing the lacZ gene if CAMP is bound to CAP.The CAMP molecule is on the opposite a positive regulation molecule, or an ac-tivation molecule, as its presence is necessary to express the lacZ gene. However,the concentration of CAMP is itself regulated negatively by glucose: when glucose ispresent, the concentration of CAMP becomes low, and thus CAMP does not bind tothe CAP site, blocking the expression of lacZ. Thus glucose prevents the activation byCAMP of the expression of galactosidase from lacZ.

The mechanism described in the previous section is represented in Figure 3, which isan example of MIM. This example contains all the relations and all the categories ofentities (i.e. the nodes of the graph) that we use in our modelling. They are presentedbelow. Technically, the generation of CAMP from Adenosine Tri Phosphate (ATP) is blocked by the presenceof glucose, but we have simpliﬁed the graph by simply writing that the presence of glucose prevents theactivation by CAMP of the expression of galactosidase from lacZ. .1 Relations The relations among the entities (represented by links in the graphs) represent reactionsand can be of different types:

Productions can take two different forms, depending on whether the reactants areconsumed by the reactions or not:1) The graphical notation used when a reaction consumes completely the re-actant(s) is a , . . . , a n ß b , meaning that the production of b completelyconsumes a , . . . , a n .For instance, in Figure 3, lactose, when activated by galactosidase producesglucose, and is consumed while doing so, which is thus noted by lactose ß glucose .2) If the reactants are not completely consumed by the reaction, the used no-tation is a , . . . , a n (cid:254) b . Here b is produced but a , . . . , a n are still presentafter the production of b .For example, the expression of the lacZ gene to produce galactosidase (orof the lacl gene to produce the Lacl repressor protein) does not consumethe gene, and we thus have lacZ (cid:254) galactosidase . Regulations are also of two types: every reaction can be either inhibited or activated by other proteins or conditions.1. The notation of the type a , . . . , a n (cid:254) . . . means that the simultaneouspresence of a , . . . , a n activates a production or another regulation.In the example of Figure 3 the production of galactosidase from the ex-pression of the lacZ gene is activated by CAMP ( CAM P (cid:254) ( lacZ (cid:254) Galactosidase ) expresses activation).2. The notation a , . . . , a n (cid:173) . . . represents the fact that simultaneous pres-ence of a , . . . , a n inhibits a production or another regulation.In Figure 3, Repressor (cid:173) ( lacZ (cid:254) Galactosidase ) represents the factthat production of galactosidase is blocked (or inhibited) by the Lacl re-pressor protein.Figure 4.(a) shows the basic inhibitions/activations on a reaction: the production of b from a , . . . , a n is activated by the simultaneous presence of both c , . . . , c n and thesimultaneous presence of d , . . . , d n , and inhibited by either the simultaneous presenceof e , . . . , e n or the simultaneous presence of f , . . . , f n .These regulations are often “stacked”, on many levels, like shown in Figure 4.(b).For example in Figure 3, the inhibition by the Lacl repressor protein of the productionof galactosidase can itself be inhibited by the presence of lactose, while the activationof the same production by CAMP is inhibited by the presence of glucose.8a) Activations/Inhibitions (b) StackingFigure 4: Activations/Inhibitions and Stacking Entities occurring in node labels can be of two different types:

Exogenous: the value of an exogenous variable is set once and for all by the envi-ronment or by the experimenter at the start of the simulation and never changesthrough time; if the entity is set as present and used in a reaction, the environmentwill always provide “enough” of it and it will remain present.

Endogenous: an endogenous entity can either be present or absent at the beginning ofthe process, as set by the experimenter, and its value after the start of the processis set only by the dynamics of the graph.These distinctions are fundamental, because the dynamics of entities are differentand they must be formalized differently. In practice, the type of an entity is somethingwhich is set by the biologist, according to his professional understanding of the biolog-ical process described by the map. For instance, in Figure 3, the type of the differententities could be set as follows in order to describe the real behaviour of the lac operon:lacl, lacZ, CAMP and lactose are initial external conditions of the model and they donot evolve in time, and are thus exogenous. Note, in particular, that lactose can beset as an exogenous entity, even if the graph “says” that it is consumed when produc-ing glucose. Conversely, galactosidase, the repressor protein and glucose can only beproduced inside the graph, and are thus endogenous.It is important to notice that glucose could be set as an exogenous variable if theexperimenter is interested in testing an environment where glucose is provided exter-nally. Reciprocally, in a more accurate representation of the lac operon, CAMP wouldbe an endogenous variable, produced by ATP and regulated by glucose. These graphsare only a representation and an approximation of the real process, designed to ﬁt theparticular level of description that the experimenter wants to model.Although MIMs may contain also other kinds of entities or links, the two kind ofentitities and four kinds of interactions presented above are all that is needed to buildthe Molecular Interactions Maps we are using in this paper.9 .3 Temporal evolution

A MIM can be considered as an automaton which produces sequences of states ofits entities and Linear Temporal Logic formulas can well describe such sequences ofstates. Time is supposed to be discrete, and all relations (productions/consumptions)that can be executed are executed simultaneously at each time step. An entity canhave two states (or values): absent (0) or present (1). When an entity is consumed,it becomes absent and when it is produced it becomes present. In other terms, sincequantities are not taken into account, due also to the lack of reliable data thereon,reactions do not contend to get use of given resources: if an entity is present, its quantityis assumed to be enough to be used by all reactions needing it.This behaviour might look simplistic, as it does not take into account the kinetic ofreactions, but it reﬂects a choice underlying MIMs representation framework and, as amatter of fact, it is nevertheless adequate to handle many problems.The software tool that will be described in Section 8 provides default values bothfor the variables and for their classiﬁcation as exogenous or endogenous. However, theuser can modify such default settings through the graphical interface.

This section is devoted to deﬁne

Molecular Interaction Graphs (MIGs), the graphstructures which are the formal representations of MIMs. The concept of trace willalso be deﬁned, with the aim of characterising the dynamic behaviour of a MIM.A MIG is essentially a graph whose vertices are identiﬁed with ﬁnite sets of atoms,each of which represents a molecule. Productions are represented by links connectingvertices, while regulations are links whose origin is a vertex and whose target is anotherlink.

Deﬁnition 1 (Molecular Interaction Graph) . A Molecular Interaction Graph (MIG) G is a tuple (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) where • At is a ﬁnite set of atoms, partitioned into the sets Ex (the set of exogenous atoms) and Ed (the set of endogenous ones), • B is a set of literals from atoms in At (the initial conditions ), • P and C are sets of productions : P ⊆ { ( P (cid:254) Q ) | P, Q ⊆ At } C ⊆ { ( P ß Q ) | P, Q ⊆ At }• A and I are sets of regulations , such that for some n ∈ N : A = n (cid:91) i =0 A i I = n (cid:91) i =0 I i where A i and I i are inductively deﬁned as follows: A ⊆ { ( P (cid:254) X ) | X ∈ P ∪ C} I ⊆ { ( P (cid:173) X ) | X ∈ P ∪ C}A i +1 ⊆ { ( P (cid:254) X ) | X ∈ A i ∪ I i } I i +1 ⊆ { ( P (cid:173) X ) | X ∈ A i ∪ I i } link is either a production (i.e. an element of P ∪ C ) or a regulation (an element of

A ∪ I ). The depth of a regulation X is the integer k such that X ∈ A k ∪ I k . The “stratiﬁed” deﬁnition of regulations rules out the possibility of circular chainsof activations and inhibitions. Furthermore, it is worth pointing out that, since At is aﬁnite set of atoms, then also the sets P and C are ﬁnite. Consequently, A and I areﬁnite sets too, since the depth of their elements is bounded by a given ﬁxed n ∈ N .Note that Deﬁnition 1 is a generalization w.r.t. the presentation of productions givenin Section 4, in that it allows for multiple entities on the right-hand side of a produc-tion. This extension can be considered as an abbreviation: a , . . . , a n (cid:40) b , . . . b k (where (cid:40) is either (cid:254) or ß ) stands for the set of productions a , . . . , a n (cid:40) b , . . . , a , . . . , a n (cid:40) b k . Example 1.

The MIG representing the MIM shown in Figure 3, ignoring the initialconditions and the exogenous and endogenous atoms, is constituted by At = { Lactose, Galactosidase, Glucose, CAM P, lacZ, Galactosidase,Lactose, Repressor, lacl }P = { ( lacl (cid:254) Repressor ) , ( lacZ (cid:254) Galactosidase ) }C = { ( Lactose ß Glucose ) }A = { ( CAM P (cid:254) ( lacZ (cid:254) Galactosidase )) , ( Galactosidase (cid:254) (( Lactose ß Glucose )) }I = { ( Repressor (cid:173) ( lacZ (cid:254) Galactosidase )) , ( Lactose (cid:173) ( Repressor (cid:173) ( lacZ (cid:254) Galactosidase ))) , ( Glucose (cid:173) ( CAM P (cid:254) ( lacZ (cid:254) Galactosidase ))) } Having deﬁned the structure of MIGs, we now need to provide some machinery thatallows one to determine the set of substances that trigger an activation (resp. inhibition)in a MIG. The next deﬁnition introduces functions whose values are the regulationsdirectly activating/inhibiting a link X in a MIG. Deﬁnition 2 (Direct regulations of a link) . For every link X ∈ P ∪ C ∪ A ∪ I : γ a ( X ) = { Y ∈ A | Y has has the form ( P (cid:254) X ) } γ i ( X ) = { Y ∈ I | Y has the form ( P (cid:173) X ) } Similarly to transition systems, a MIG constitutes a compact representation of a setof inﬁnite sequences of states, where every state is determined by the set of proteins,genes, enzymes, metabolites, etc that are present in the cell at a given time. A sequenceof such states thus represents the temporal evolution of the cell, and will be called a trace . Differently from transition systems, however, the evolution of a MIG is deter-ministic: each possible initial conﬁguration determines a single trace. The reason forthis is that the representation abstract from quantities, hence entities are not consideredas resources over which reactions may compete (see the remark at the end of Section4). Before formally deﬁning the concept of trace, we introduce some preliminary con-cepts such as the notion of active and inhibited links. These two concepts, that arerelative to a given situation (i.e. a given set of atoms assumed to be true), will providethe temporal conditions under which a production can be triggered.11 eﬁnition 3 (Active and inhibited links) . Let G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) be aMIG. Given D ⊆ At and (cid:40) ∈ { (cid:254) , ß , (cid:173) } , a link X = ( P (cid:40) Y ) ∈ P ∪ C ∪ A ∪ I (where Y is either a set of atoms or a link) is said to be active in D if the followingconditions hold:1. P ⊆ D ;2. every Z ∈ γ a ( X ) is active in D – i.e. every regulation of the form Q (cid:254) X ∈ A is active in D ;3. for all Z ∈ γ i ( X ) , Z is not active in D – i.e. there are no regulations of the form ( Q (cid:173) X ) ∈ I that are active in D .A link X ∈ P ∪ C ∪ A ∪ I is inhibited in D iff X is not active in D . Before formalising the concepts of production and consumption of substances in-side a cell, it is worth pointing out that:1) a substance is produced in a cell as a result of a reaction, which is triggeredwhenever the reactants are present and the regulation conditions allow its execu-tion.2) A substance is consumed in a cell if it acts as a reactive in a reaction which hasbeen triggered.3) We do not consider quantitative information like concentrations or reaction times:if a substance is involved in several reactions at a time, its concentration does notmatter, all reactions will be triggered. Conversely, if a substance belongs to theconsumed reactants of a triggered reaction, it will be completely consumed.4) I might be the case that a substance is consumed in a reaction while produced bya different one, at the same time. This possibility, that will be further commentedbelow, will however raise no inconsistency in the deﬁnition of traces.

Deﬁnition 4 (Produced and consumed atoms) . Let G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) be a MIG and D ⊆ At . An atom p ∈ At is produced in D iff p ∈ Ed and there exists ( P (cid:40) Q ) ∈ P ∪ C , for (cid:40) ∈ { (cid:254) , ß } , such that:(i) p ∈ Q and(ii) ( P (cid:40) Q ) is active in D .An atom p is consumed in D iff p ∈ Ed and there exists ( P ß Q ) ∈ C such that(i) p ∈ P and(ii) ( P ß Q ) is active in D . Remark 1.

It may happen that an atom p is both produced and consumed in a given D ⊆ At . Consider, for instance, a MIG with At = Ed = { p, q, r } , P = { ( p (cid:254) q ) } , C = { ( q ß r ) } , I = A = ∅ . If D = { p, q } , the atom q is produced by ( p (cid:254) q ) andconsumed by ( q ß r ) in D , since { p } ⊆ D , q ∈ { q } ⊆ D and there are no regulationsgoverning these two productions. An even simpler example is given by the (unrealistic)MIG with At = Ed = { p } , C = { ( p ß p ) } , P = I = A = ∅ and D = { p } . trace , takinginto account activations, inhibitions, productions and consumptions. Deﬁnition 5 (Trace) . A trace T on a set At of atoms is an inﬁnite sequence of subsetsof At , T , T , · · · , called states . If G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) is a MIG, a tracefor G is a trace T such that:1. p ∈ T for every p ∈ B and p (cid:54)∈ T for every ¬ p ∈ B ;2. for all k ≥ and every atom p ∈ At : • if p ∈ Ex , then p ∈ T k +1 iff p ∈ T k ; • if p ∈ Ed , then p ∈ T k +1 if and only if either p is produced in T k or p ∈ T k and p is not consumed in T k . It is worth pointing out that the condition on traces for a given MIG G ensures thatevery change in a state of the trace affecting endogenous atoms has a justiﬁcation in G .Consequently, given the initial state T of a trace for G , all the others are deterministi-cally determined by the productions and regulations of G .As a ﬁnal observation we remark that, when an atom p is both produced and con-sumed in a given T k , production prevails over consumption. For instance, in a tracefor the MIG of Remark 1 with T = { p, q } , where q is both produced and consumed, T k = { p, q, r } for all k ≥ . This section considers the connection between traces and LTL and describes how torepresent a MIG G by means of an LTL theory whose models are exactly the traces for G . LTL formulae with only unary future time operators are built from the grammar ϕ ::= ⊥ | p | ¬ ϕ | ϕ ∨ ϕ | (cid:13) ϕ | (cid:3) ϕ where p is an atom (the other propositional connectives and the “eventually” operatorcan be deﬁned as usual).An LTL interpretation T is a trace, i.e. an inﬁnite sequence T , T , . . . of states ,where a state is a set of atoms. The satisfaction relation T k | = ϕ , where T k is a stateand ϕ a formula built from a set of atoms At , is deﬁned as follows:1. T k | = p iff p ∈ V k , for any p ∈ At ;2. T k (cid:54)| = ⊥ ;3. T k | = ¬ ϕ iff T k (cid:54)| = ϕ ;4. T k | = ϕ ∨ ψ ; iff T k | = ϕ or T k | = ψ ;5. T k | = (cid:13) ϕ iff T k +1 | = ϕ ;6. T k | = (cid:3) ϕ iff for all j ≥ k , T j | = ϕ ;13 formula ϕ is true in an interpretation T if and only if T | = ϕ .A MIG G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) is represented by means of a set of LTLformulae on the set of atoms At . First of all, classical formulae representing the factthat a given link is active (or inhibited) are deﬁned. Below, (cid:40) stands for any of (cid:254) , ß or (cid:173) Deﬁnition 6.

Let G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) be a MIG. If X = ( P (cid:40) Y ) ∈P ∪ C ∪ A ∪ I (where Y is either a set of atoms or a link), then: A ( P (cid:40) Y ) def = (cid:94) p ∈ P p ∧ (cid:94) ρ ∈ γ a ( X ) A ( ρ ) ∧ (cid:94) ρ ∈ γ i ( X ) I ( ρ ) where I ( X ) is an abbreviation for the negation normal form of ¬ A ( X ) . It is worth pointing out that both A ( X ) and I ( X ) are classical propositional for-mulae. Example 2.

Let us consider, for instance, the links of the MIG G of Example 1:

1) ( lacl (cid:254)

Repressor )2) ( lacZ (cid:254)

Galactosidase )3) (

Lactose ß Glucose )4) (

CAM P (cid:254) ( lacZ (cid:254) Galactosidase ))5) (

Galactosidase (cid:254) ( Lactose ß Glucose )6) (

Repressor (cid:173) ( lacZ (cid:254) Galactosidase ))7) (

Lactose (cid:173) ( Repressor (cid:173) ( lacZ (cid:254) Galactosidase )))8) (

Glucose (cid:173) ( CAM P (cid:254) ( lacZ (cid:254) Galactosidase )))

For each of them, A ( X ) and I ( X ) can be computed as follows: A (1) = A ( lacl (cid:254) Repressor ) = lacl ; A (7) = A ( Lactose (cid:173) ( Repressor (cid:173) ( lacZ (cid:254) Galactosidase ))) =

Lactose ; A (8) = A ( Glucose (cid:173) ( CAM P (cid:254) ( lacZ (cid:254) Galactosidase ))) =

Glucose ; A (4) = A ( CAM P (cid:254) ( lacZ (cid:254) Galactosidase )) =

CAM P ∧ I (8) = CAM P ∧ ¬

Glucose ; A (5) = A ( Galactosidase (cid:254) ( Lactose ß Glucose )) =

Galactosidase ; A (3) = A ( Lactose ß Glucose ) =

Lactose ∧ A (5) = Lactose ∧ Galactosidase ; A (6) = A ( Repressor (cid:173) ( lacZ (cid:254) Galactosidase )) =

Repressor ∧ I (7) = Repressor ∧ ¬

Lactose ; A (2) = A ( lacZ (cid:254) Galactosidase ) = lacZ ∧ A (4) ∧ I (6) = lacZ ∧ CAM P ∧ ¬

Glucose ∧ ( ¬ Repressor ∨ Lactose ) ; The next result establishes that A ( X ) is an adequate representation of the propertyof being active for the link X . Lemma 1.

Let G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) be a MIG and D ⊆ At . For everylink X ∈ P ∪ C ∪ A ∪ I : D | = A ( X ) if and only if X is active in D . roof. Let the size of a link X , size ( X ) , be deﬁned as the number of arrows (cid:40) ∈ { (cid:254) , ß , (cid:173) } occurring in X , and let M be the maximal size of a link in G . If X = P (cid:40) Y is any link in G , the proof is by induction on k = M − size ( X ) . • If k = 0 , then G does not have any link of size greater than size ( X ) , hence γ a ( X ) = γ i ( X ) = ∅ , A ( P (cid:40) Y ) = (cid:86) p ∈ P P , and X is active in D iff P ⊆ D .Clearly, D | = (cid:86) p ∈ P P iff P ⊆ D . • If k > , then, for every Z ∈ γ a ( X ) ∪ γ i ( X ) , size ( Z ) = size ( X ) + 1 , hence M − ( k + 1) < k . By the induction hypothesis, D | = A ( Z ) iff Z is active in D .Then the thesis follows from the facts that: (i) D | = (cid:86) p ∈ P p iff P ⊆ D ; (ii) for all Z ∈ γ a ( X ) , D | = A ( Z ) iff Z is active in D (by the induction hypothesis), and(iii) for all Z ∈ γ i ( X ) , D | = ¬ A ( Z ) iff Z is not active in D (by the inductionhypothesis).In order to give a more compact presentation of the LTL theory representing aMIG, we deﬁne, for each atom p ∈ At , classical formulae representing the fact that p is produced or consumed. Deﬁnition 7.

Let Let G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) be a MIG, P rod = P ∪ C , and p ∈ At . Then: Pr ( p ) def = (cid:40) ⊥ if p ∈ Ex (cid:87) ( P (cid:40) Q ) ∈P rod, p ∈ Q A ( P (cid:40) Q ) if p ∈ Ed Cn ( p ) def = (cid:40) ⊥ if p ∈ Ex (cid:87) ( P ß Q ) ∈C , p ∈ P A ( P ß Q ) if p ∈ Ed Example 3.

Let us consider the simple MIG G of Example 1, where atoms are parti-tioned into Ex = { lacl, lacZ, CAM P } and Ed = { Repressor, Lactose, Galactosidase,Glu cose } . The abbreviations Pr ( p ) and Cn ( p ) for the endogenous atoms are the In this example we assume that lactose is endogenous, because it is the only consumed entity in thesimple MIM of ﬁgure 3. ollowing: Pr ( Repressor ) def = A ( lacl (cid:254) Repressor ) def = lacl Pr ( Lactose ) def = ⊥ Pr ( Galactosidase ) def = A ( lacZ (cid:254) Galactosidase ) def = lacZ ∧ CAM P ∧ ¬

Glucose ∧ ( ¬ Repressor ∨ Lactose ) Pr ( Glucose ) def = A ( Lactose ß Glucose ) def = Lactose ∧ Galactosidase Cn ( Lactose ) def = A (( Lactose ß Glucose ) def = Lactose ∧ Galactosidase Cn ( p ) def = ⊥ for p ∈ { Repressor, Galactosidase, Glucose } Finally, the set of LTL formulae ruling the overall behaviour of a MIG can bedeﬁned.

Deﬁnition 8. If G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) is a MIG, the

LTL encoding of G isthe set of formulae containing all the literals in B and, for every p ∈ At , the formula (cid:3) ( (cid:13) p ↔ Pr ( p ) ∨ ( p ∧ ¬ Cn ( p ))) It is worth pointing out that, if p ∈ Ex , then the formula encoding its behaviour isequivalent to (cid:3) ( (cid:13) p ↔ p ) . For endogenous atoms, the encoding captures the (nega-tive and positive) effects produced by a reaction on the environment at any time. Thisencoding has some similarities with the successor state axioms of the Situation Calcu-lus [34].

Example 4. If G is the MIG of Example 1, the LTL encoding of G contains (formulaeequivalent to) (cid:3) ( (cid:13) lacl ↔ lacl ) , and similar ones for lacZ and CAM P .Furthermore, it contains the following formulae, ruling the behaviour of endoge-nous atoms: (cid:3) ( (cid:13) Repressor ↔ Pr ( Repressor ) ∨ ( Repressor ∧ ¬ Cn ( Repressor ))) ≡ (cid:3) ( (cid:13) Repressor ↔ lacl ∨ Repressor ) (cid:3) ( (cid:13) Lactose ↔ Pr ( Lactose ) ∨ ( Lactose ∧ ¬ Cn ( Lactose ))) ≡ (cid:3) ( (cid:13) Lactose ↔ Lactose ∧ ¬ ( Lactose ∧ Galactosidase )) (cid:3) ( (cid:13) Galactosidase ↔ Pr ( Galactosidase ) ∨ ( Galactosidase ∧ ¬ Cn ( Galactosidase ))) ≡ (cid:3) ( (cid:13) Galactosidase ↔ ( lacZ ∧ CAM P ∧ ¬

Glucose ∧ ( ¬ Repressor ∨ Lactose ) ∨ Galactosidase )) (cid:3) ( (cid:13) Glucose ↔ Pr ( Glucose ) ∨ ( Glucose ∧ ¬ Cn ( Glucose ))) ≡ (cid:3) ( (cid:13) Glucose ↔ ( Lactose ∧ Galactosidase ) ∨ Glucose ) The rest of this section is devoted to show that the LTL encoding of a MIG correctlyand completely represents its behaviour. First of all, we prove that the truth of Pr ( p ) and Cn ( p ) in a state coincide with the atom p being produced/consumed at that state.16 emma 2. If T is a model of the LTL encoding of a MIG, then for every k and everyatom p ∈ At , p is produced in T k iff T k | = Pr ( p ) and p is consumed in T k iff T k | = Cn ( p ) .Proof. Let T be a model of G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) P rod = P ∪ C , k ∈ N and p ∈ At .1. If T k | = Pr ( p ) then Pr ( p ) (cid:54) = ⊥ and there exists some ( P (cid:40) Q ) ∈ P rod suchthat p ∈ Q and T k | = A ( P (cid:40) Q ) . By Lemma 1, ( P (cid:40) Q ) is active in T k .Moreover, since Pr ( p ) (cid:54) = ⊥ , p ∈ Ed . Therefore, from Deﬁnition 4 it followsthat p is produced in T k .2. If p is produced in T k , then p ∈ Ed and there exists some ( P (cid:40) Q ) ∈ P rod ,such that p ∈ Q and ( P (cid:40) Q ) is active in T k . By Lemma 1, T k | = A ( P (cid:40) Q ) ,hence T k | = Pr ( p ) by Deﬁnition 7, since p ∈ Ed .3. If T k | = Cn ( p ) then Cn ( p ) (cid:54) = ⊥ and there exists some ( P ß Q ) ∈ C such that p ∈ P and T k | = A ( P ß Q ) , By Lemma 1, P ß Q is active in T k . Moreover,since Cn ( p ) (cid:54) = ⊥ , p ∈ Ed . Therefore, from Deﬁnition 4 it follows that p isconsumed in T k .4. If p is consumed in T k , then p ∈ Ed and there exists some ( P ß Q ) ∈ C suchthat p ∈ P and ( P ß Q ) is active in T k . By Lemma 1, T k | = A ( P ß Q ) ,therefore T k | = Cn ( p ) by Deﬁnition 7, since p ∈ Ed .The adequacy of the LTL encoding of a MIG can ﬁnally be proved. Theorem 1 (Main result) . If G is a MIG, then:1. every trace for G is a model of the LTL encoding of G ;2. every model of the LTL encoding of G is a trace for G .Proof. Let us assume that T is a trace for G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) . Clearly,for every literal (cid:96) ∈ B , T | = (cid:96) , since (cid:96) belongs to the encoding of G . Moreover, for all k ≥ and every atom p ∈ At : • if p ∈ Ex , then p ∈ T k +1 if and only if p ∈ T k . Hence, T k | = (cid:13) p ↔ p , i.e. T k | = (cid:13) p ↔ Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) . • If p ∈ Ed , then p ∈ T k +1 if and only if either p is produced in T k or p ∈ T k and p is not consumed in T k . By Lemma 2, this amounts to saying that p ∈ T k +1 if and only if either T k | = Pr ( p ) or p ∈ T k and T k (cid:54)| = Cn ( p ) . Consequently, T k | = (cid:13) p ↔ Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) .Since these properties hold for all k , it follows that for all p ∈ At , T | = (cid:3) ( (cid:13) p ↔ Pr ( p ) ∨ ( p ∧ ¬ Cn ( p ))) .For the other direction, let us assume that T is a model of the LTL encoding of G .Then, in particular, T | = B , hence p ∈ T for every p ∈ B , and p (cid:54)∈ T for every ¬ p ∈ B . Moreover, for all k ≥ and every atom p ∈ At :17 if p ∈ Ex , then T k | = (cid:13) p ↔ p , hence p ∈ T k +1 if and only if p ∈ T k . • If p ∈ Ed , then T k | = (cid:13) p ↔ Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) , hence p ∈ T k +1 ifand only if either T k | = Pr ( p ) or p ∈ T k and T k (cid:54)| = Cn ( p ) . By lemma 2, thisamounts to saying that p ∈ T k +1 if and only if either p is produced in T k or p ∈ T k and p is not consumed in T k .Consequently, T is a trace for G . The use of an LTL formalization allows us to consider solutions with inﬁnite lengthwhen performing reasoning tasks such as abduction or satisﬁability checks. However,LTL tools for abduction are not as developed as in the case of propositional logic, sincethe abductive task is in general very complex. In order to take advantage of the highlyefﬁcient tools for propositional reasoning such as SAT-solvers, abduction algorithms,etc, the solver that will be presented in Section 8 reduces the problem to propositionallogic by assuming bounded time. In essence, the reduction simulates the truth value ofan LTL propositional variable p along time by a ﬁnite set of n fresh atoms, one per timeinstant. Moreover, the behaviour of the “always” temporal operator is approximated byuse of ﬁnite conjunctions. Exogenous variables are not grounded, since it is uselessand expensive to consider different variables in this case.In detail, the grounding to a given time k ∈ N of a propositional formula ϕ builtfrom a set of atoms partitioned into exogenous and endogenous is ﬁrst of all deﬁned. Deﬁnition 9 (Grounding of propositional formulae) . Let ϕ be a propositional formulabuilt from the set of atoms At = Ex ˙ ∪ Ed . The grounding of ϕ to time k , (cid:104) ϕ (cid:105) k , isdeﬁned as follows: • if p ∈ Ex , then (cid:104) p (cid:105) k def = p ; • if p ∈ Ed , then (cid:104) p (cid:105) k def = p k , where p k is a new propositional variable; • (cid:104)¬ ϕ (cid:105) k def = ¬(cid:104) ϕ (cid:105) k ; • (cid:104) ϕ ∨ ψ (cid:105) k def = (cid:104) ϕ (cid:105) k ∨ (cid:104) ψ (cid:105) k .If S is a set of proposional formulae, then (cid:104) S (cid:105) k = {(cid:104) ϕ (cid:105) k | ϕ ∈ S } . Next, the grounding of the encoding of a MIG is deﬁned.

Deﬁnition 10 (Grounding of the encoding of a MIG) . Let G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) be a MIG, S its LTL encoding and k ∈ N .For all p ∈ Ed , if SSA p is the formula (cid:3) ( (cid:13) p ↔ Pr ( p ) ∨ ( p ∧ ¬ Cn ( p ))) belonging to S , we deﬁne (cid:104) SSA p (cid:105) k = p k +1 ↔ (cid:104) Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) (cid:105) k A method to perform abduction for a fragment of LTL sufﬁcient to represent problems on MIMs hasbeen proposed in [9], but it has not been implemented. he grounding (cid:104) S (cid:105) k of S up to time k is deﬁned as follows: (cid:104) S (cid:105) k = {(cid:104) (cid:96) (cid:105) | (cid:96) ∈ B} ∪ {(cid:104) SSA p (cid:105) i | p ∈ Ed and ≤ i < k } The grounding (cid:104)

SSA p (cid:105) k is well deﬁned, since Pr ( p ) ∨ ( p ∧¬ Cn ( p )) is a classicalformula. Note that “successor state axioms” SSA p in the LTL encoding of G aregrounded only for endogenous variables and only as far as the “ (cid:13) p ” refers to a statethat “exists” in the bounded timed model.The next deﬁnition formalizes the notion of a temporal interpretation T and a clas-sical one M being models of the same initial state. Deﬁnition 11.

Let At = Ex ˙ ∪ Ed be a set of atoms, T = T , T , . . . an LTL in-terpretation of the language At and k ∈ N . A classical interpretation M is said tocorrespond to T up to time limit k if M is an interpretation of the language Ex ∪ { p i | p ∈ Ed and ≤ i ≤ k } and for all p ∈ At , M | = (cid:104) p (cid:105) iff T | = p . The next result establishes a kind of “model correspondence” property.

Theorem 2 (Model correspondence) . Let G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) be a MIG, S its LTL encoding, and (cid:104) S (cid:105) n the grounding of S up to time n . If T = T , T , . . . isany model of S and M a model of (cid:104) S (cid:105) n corresponding to T , then for every classicalpropositional formula ϕ and every k = 0 , . . . , n : M | = (cid:104) ϕ (cid:105) k iff T k | = ϕ .Proof. By double induction on k and ϕ .1. If k = 0 , the thesis is proved by induction on ϕ .(a) If ϕ is an atom, then the thesis follows immediately from the fact that M corresponds to T .(b) If ϕ = ¬ ϕ or ϕ = ϕ ∨ ϕ , the thesis follows from the induction hy-pothesis, the deﬁnition of (cid:104) ϕ (cid:105) k (Deﬁnition 9) and the deﬁnition of | = forclassical logic.2. < k ≤ n : By the induction hypothesis T k − | = ϕ iff M | = (cid:104) ϕ (cid:105) k − for everypropositional formula ϕ . The thesis is proved by induction on ϕ :(a) If ϕ is an atom, we consider two cases:i. p ∈ Ex : since T k − | = (cid:13) p ↔ p , then T k | = p iff T k − | = p . Bythe induction hypothesis, T k − | = p iff M | = (cid:104) p (cid:105) k − . Since (cid:104) p (cid:105) k − = p = (cid:104) p (cid:105) k , T k | = p iff M | = (cid:104) p (cid:105) k .ii. p ∈ Ed : since T k − | = (cid:13) p ↔ Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) , T k | = p iff T k − | = Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) . By the induction hypothesis, thelatter assertion holds iff M | = (cid:104) Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) (cid:105) k − . By Def-inition 10, (cid:104) S (cid:105) n contains p k ↔ (cid:104) Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) (cid:105) k − , and,since M | = (cid:104) S (cid:105) n , M | = (cid:104) Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) (cid:105) k − iff M | = p k .Therefore, T k | = p iff M | = p k .(b) If ϕ = ¬ ϕ or ϕ = ϕ ∨ ϕ , the thesis follows from the induction hypoth-esis, Deﬁnition 9 and the deﬁnition of | = for classical logic, like in the basecase. 19he rest of this section is devoted to establish the complexity of grounding for theencoding of a MIG. Let the size of a formula be measured in terms of the number ofits logical operators: if ϕ is a formula, || ϕ || is the number of logical operators in ϕ . If S is a set of formulae, then || S || = (cid:88) ϕ ∈ S || ϕ || . Theorem 3 (Complexity of the encoding) . Let G be a MIG, S its LTL encoding and (cid:104) S (cid:105) n the grounding of S up to time n . Then ||(cid:104) S (cid:105) n || ≤ n × || S || .Proof. First of all we note that if ϕ is a classical formula, then || ϕ || = ||(cid:104) ϕ (cid:105) k || for any k . Consequently, || p k ↔ (cid:104) Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) (cid:105) k − || = || (cid:13) p ↔ Pr ( p ) ∨ ( p ∧ ¬ Cn ( p )) || − and ||(cid:104) SSA p (cid:105) k || = || SSA p || − .Let S be the LTL encoding of a MIG G = (cid:104) At, Ex, Ed, P , C , A , I , B(cid:105) and (cid:104) S (cid:105) n its grounding up to time n .1. For each (cid:104) (cid:96) (cid:105) ∈ (cid:104) S (cid:105) n such that (cid:96) ∈ B , ||(cid:104) (cid:96) (cid:105) || = || (cid:96) || . Therefore ||(cid:104)B(cid:105) || = ||B|| .2. Beyond the literals in (cid:104)B(cid:105) , (cid:104) S (cid:105) n contains (cid:104) SSA p (cid:105) k for all p ∈ Ed and ≤ k

The system takes as input ﬁles representing MIMs as created by PathVisio , a freeopen-source biological pathway analysis software that allows one to draw biologicalpathways. The graph is displayed to the user, using colors and typefaces to distin-guish the types and initial values of atoms, which are given a default value by thesoftware tool based on “commonsense” rules. Figure 6 shows how the software hasset the variable types: lacl, lacZ, CAMP and Lactose are in bold typeface, as they areset as exogenous variables, glucose, galactosidase and repressor boxes are in normal typeface, as they are endogenous .Variables initial values are shown by use of different colors: by default, the initialvalues of all variables are unset and their names will be shown in black. Henceforth,atoms whose initial value is not set will be called free . The user is allowed to changeboth types and initial values of atoms. Figure 7 shows the graph when the user hasmodiﬁed the values of some variables: lacl, lacZ and CAMP are green , to indicatethat they are present at the start of the process (they will remain present since they areexogenous atoms). Repressor is green , as the repressor protein is supposed to be inthe cell at the start of the process. Lactose remains black since it is a free atom, aboutwhich the user is going to query the system. Initially absent variables (Galactosidaseand Glucose) are shown in red .Other parameters, such as the number of time steps, the number of modiﬁcations tomake for graph updating, queries etc. are set via the command line. The resolution engine is able to perform the following reasoning tasks.

Graph validation.

This task consists in checking whether the graph G is consistent.The temporal encoding of G is grounded to the speciﬁed time and the SAT solverPicosat is used in a straightforward way in order to check the consistence of thegrounded theory. https://github.com/PathVisio/pathvisio Graph querying.

This task consists in ﬁnding which initial values of the free atomsmake G satisfy some temporal property ϕ . It is an abductive reasoning task [23],that could be solved by use of classical algorithms for computing prime im-plicants. But we have checked that, for instance, the Kean and Tsiknis algo-rithm [25] results to be very slow even when the total number of atoms is small.However, biologists are usually only interested by the values of the free atoms.Since their number is often quite small, it is usually faster to use Picosat to solveiteratively all possible models. In other terms, all the possible combinations ofinitial values for free atoms are generated (by the formula enumerator of ﬁg-ure 5) and the SAT solver is run on each of the so-obtained initial conditions.The system, tested on graphs with up to 22 nodes and 41 relations, showed to beeffective up to roughly 16 to 20 free atoms depending on the complexity of themap.In performing this task, exogenous and endogenous atoms can be treated differ-ently: the user can either ask which values of all the free variables imply thegiven property, or else to ﬁnd out which values of the free exognenous atomsguarantee that for all values of the free endogenous ones the query holds at thegiven time. Graph updating.

Given a graph G for which a given property ϕ does not hold, thistask consists in turning G into a new graph G (cid:48) satisfying ϕ . This is the mostcomplex task, since there is a very large number of possible graphs solving theproblem. Currently, the system computes all graphs G (cid:48) that can be obtained from G by adding, removing or modifying a single relation (this step is called the graph enumerator in ﬁgure 5). Then for each G (cid:48) , graph querying on G (cid:48) and ϕ isperformed, in order to ﬁlter out those which do not satisfy ϕ . . The software tool has been tested on graphs with up to 20 atoms, 22 nodes and 41 links.In this section we show some examples of the two most complex tasks: graph queryingand graph updating. 23 .1 Graph querying

A more complex example will be considered here, i.e., a meaningful part of the mappresented in Gigure 1, the atm-chk2 metabolic pathway, which leads to cellular apop-tosis when the DNA double strand breaks. DNA double strand break ( dsb ) is a ma-jor cause of cancers, and medical and pharmaceutical research [26, 21] have shownthat dsb can occur in a cell as the result of a pathology in a metabolic pathway. Thiskind of map is used to ﬁnd the molecular determinants of tumoral response to can-cers. Molecular parameters included the metabolic pathways for repairing DNA, themetabolic pathways for apoptosis, and the metabolic pathways of cellular cycle con-trol [33, 26, 21, 28, 31]. When DNA is damaged, cellular cycle control points areactivated and can quickly kill the cell by apoptosis, or stop the cellular cycle to enableDNA repair before reproduction of cellular division. Two of these control points arethe metabolic pathways atm-chk2 and atr-chk2 [33].The graph of Figure 8 (built from the map in Figure 1) represents the metabolicpathway atm-chk2 which can lead to apoptosis in three different ways. This map in-volves 20 variables, six of which ( atm, dsb, chk2, mdm2, pml and p53 ) are exogenousand the rest endogenous. Some of these variables are proteins, others, such as dsb or apoptose , representing cell death, are conditions or states.The time required for solving graph querying problems depends on the number offree variables and time steps. The P3M solver has been called on this graph to ﬁnd outwhat would cause the cell apoptosis. It has been tested with different grounding values g , ranging from 1 to 50, and queries to ﬁnd out the initial conditions that make theatom apoptose ( g ) derivable, i.e., the conditions causing cell apoptosis at time g . Thesystem has been tested with a number of free variables ranging from 6 (only exogenousvariables are free) to 20 (all variables are set to free, thus asking the system to ﬁnd alsotheir initial values).The 3D diagram in Figure 9 plots the grounding values and the number of freevariables against the time taken by the system to solve the problem, by calling Picosat(the time taken to encode the graph into propositional logic is negligible). From thediagram, it is clear that the number of free variables is the bottleneck, as it was actuallyexpected since the time required to solve the problem is exponential in the number offree variables. Moreover, 50 time steps are overkill, most systems reaching a stablestate in less than 10 time steps.The questions asked to the system can be reﬁned, in order to ﬁnd out, for instance,how much time is required to reach apoptosis on each of the three possible ways, andwhich are the initial conditions which lead to each of them. The questions to ask are apoptose1(i) , apoptose2(i) and apoptose3(i) , for different values of i , where a query ofthe form p ( i ) means that one looks for an explanation of p being true at time step i .The answers given by the system show that: • apoptose1 can be obtained is the in the fastest way: apoptose1(2) ( apoptose1 holding at the second time step) is true if atm , dsb and p53 are present, and mdm2 is absent (the values of pml and chk2 do not matter). For i ≥ , the answer to apoptose1(i) is the same, but mdm2 does not matter any longer ( p53 mdm2 isdissociated at step 2). 24igure 8: The Molecular Interaction Map atm-chk2 • obtaining apoptose2 requires 5 time steps; atm , chk2 , dsb , p53 have to be present,and mdm2 and pml do not matter. • apoptose3 requires the same number of steps as apoptose2 but the initial condi-tions are different: atm , chk2 , dsb , and pml have to be present, while mdm2 and p53 do not matter. Figure 10 shows the map of the lac operon where the inhibition of lactose on the nega-tive regulation of the repressor to the production of galactosidase has been suppressed.So here, glucose is not produced anymore when lactose is present. The user can askthe system what modiﬁcations could be done in order to produce glucose when lactoseis present. The “correct” solution is found immediately (Figure 11), along with others.Some of these other generated solutions have no interest, such as the direct productionof glucose by genes lacZ or lacl. But the system also proposes reasonable solutions,such as that shown in Figure 12, where glucose is used to provide the inhibiting actionfor the repressor protein. When glucose is present, the production of galactosidase isstopped, while it is done when glucose is absent. However nature has chosen the moreeconomical solution, because here galactosidase would be produced as soon as glucoseis absent, which is useless if there is no lactose.

10 Conclusion

This paper presents a method to translate MIMs, representing biological systems, intoLinear Temporal Logic, and a software tool able to solve complex questions on thesegraphs. The system, though still a prototype, is able to solve quite realistic examplesof a large size.The proposed approach can be improved in different directions. On the theoret-ical side, it is worth remarking that the speed of reactions is not taken into account.27igure 11: Correct solutionFigure 12: Another interesting solutionThis limitation could be overcome by using the dual of speed (duration) and by using alogic that represents the duration of reactions. Moreover, the system relies on the “allor nothing” hypothesis: we do not represent quantities other than “absent” or “present”.As a consequence, all productions that are enabled at a given time are ﬁred simultane-ously, since they do not compete on the use of resources. Even if we have been able toefﬁciently model complex graphs with this constraint, an important step forward to beplanned is modelling a more realistic evolution of networks by taking quantities intoaccount.On the practical point of view, the possibility should be explored to avoid groundingand replacing the formula enumerator procedure of P3M by implementing a directabduction algorithm for (a suitable fragment of) LTL, as proposed in [9], or else bydirectly using temporal model checkers [13], or tools like RECAR (Recursive Exploreand Check Abstraction Reﬁnement ) [27] which allows one to solve modal satisﬁabilityproblems .Moreover, the software tool can be improved in several respects like, for instance,improving the graphical interface by enriching the number of parameters the user canchoose and making it more user friendly. 28 eferences [1] Jean-Marc Alliot, Robert Demolombe, Mart´ın Di´eguez, Luis Fari˜nas del Cerro,Gilles Favre, Jean-Charles Faye, Naji Obeid, and Olivier Sordet. Temporal logicmodeling of biological systems. In

Towards Paraconsistent Engineering , pages205–226. Springer International Publishing, 2016.[2] Jean-Marc Alliot, Robert Demolombe, Luis Fari˜nas del Cerro, Mart´ın Di´eguez,and Naji Obeid. Abductive reasoning on molecular interaction maps. In

In-teractions Between Computational Intelligence and Mathematics , pages 43–56.Springer International Publishing, 2018.[3] Jean-Marc Alliot, Mart´ın Di´eguez, and Luis Fari˜nas del Cerro. Metabolic path-ways as temporal logic programs. In Loizos Michael and Antonis Kakas, editors,

Logics in Artiﬁcial Intelligence , pages 3–17. Springer International Publishing,2016.[4] Gr´egory Batt, Delphine Ropers, Hidde de Jong, Johannes Geiselmann, Radu Ma-teescu, Michel Page, and Dominique Schneider. Validation of qualitative modelsof genetic regulatory networks by model checking: analysis of the nutritionalstress response in

Escherichia coli . In

Proceedings Thirteenth International Con-ference on Intelligent Systems for Molecular Biology 2005, Detroit, MI, USA,25-29 June 2005 , pages 19–28, 2005.[5] Armin Biere. Picosat essentials.

Journal on Satisﬁability, Boolean Modeling andComputation (JSAT) , 4:75–97, 2008.[6] D. Boˇsnaˇcki, P.A.J. Hilbers, R.S. Mans, and E.P. de Vink. Chapter 39: Modelingand analysis of biological networks with model checking. In M. Elloumi and A.Y.Zomaya, editors,

Algorithms in Computational Molecular Biology: Techniques,Approaches and Applications , volume 1 of

Wiley Series in Bioinformatics , pages915–940. Wiley, 2011.[7] Luboˇs Brim, Milan ˇCeˇska, and David ˇSafr´anek. Model Checking of BiologicalSystems. In

Formal Methods for Dynamical Systems: 13th International Schoolon Formal Methods for the Design of Computer, Communication, and SoftwareSystems, SFM 2013, Bertinoro, Italy, June 17-22, 2013. Advanced Lectures , pages63–112. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.[8] Laurence Calzone, Franc¸ois Fages, and Sylvain Soliman. BIOCHAM: an envi-ronment for modeling biological systems and formalizing experimental knowl-edge.

Bioinformatics , 22(14):1805–1807, 2006.[9] Serenella Cerrito, Marta Cialdea Mayer, and Robert Demolombe. Temporal ab-ductive reasoning about biochemical reactions.

Journal of Applied Non-ClassicalLogics , 27(3-4):269–291, 2017.[10] Nathalie Chabrier-Rivier, Marc Chiaverini, Vincent Danos, Franc¸ois Fages, andVincent Sch¨achter. Modeling and querying biomolecular interaction networks.

Theoretical Computer Science , 325(1):25–44, 2004.2911] Nathalie Chabrier-Rivier, Francois Fages, and Sylvain Soliman. The BiochemicalAbstract Machine BIOCHAM. In Vincent Danos and Vincent Sch¨achter, editors,

CMSB’04: Proceedings of the second Workshop on Computational Methods inSystems Biology , volume 3082, pages 172–191, Paris, 2004. Springer-Verlag.[12] Federica Ciocchetta and Jane Hillston. Bio-pepa: An extension of the processalgebra pepa for biochemical networks.

Electron. Notes Theor. Comput. Sci. ,194(3):103–117, 2008.[13] Edmund Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith.Counterexample-guided abstraction reﬁnement for symbolic model checking.

Journal of the ACM , 50(5):752–794, 2003.[14] R. Demolombe, L. Fari˜nas del Cerro, and N. Obeid. Automated reasoning inmetabolic networks with inhibition. In th International Conference of the Ital-ian Association for Artiﬁcial Intelligence, (AI*IA’13) , pages 37–47, Turin, Italy,2013.[15] R. Demolombe, L. Fari˜nas del Cerro, and N. Obeid. Translation of ﬁrst orderformulas into ground formulas via a completion theory.

Journal of Applied Logic ,15:130–149, 2016.[16] Robert Demolombe, Luis Fari˜nas del Cerro, and Naji Obeid. A logical model formolecular interaction maps. In Fari˜nas and Inoue [19], chapter 3, pages 93–123.[17] Francois Fages and Sylvain Soliman. Formal Cell Biology in Biocham. In

For-mal Methods for Computational Systems Biology: 8th International School onFormal Methods for the Design of Computer, Communication, and Software Sys-tems, SFM 2008 Bertinoro, Italy, June 2-7, 2008 Advanced Lectures , pages 54–80. Springer Verlag, Berlin, Heidelberg, 2008.[18] Francois Fages, Sylvain Soliman, and Nathalie Chabrier-rivier. Modelling andquerying interaction networks in the biochemical abstract machine biocham.

Journal of Biological Physics and Chemistry , 4:64–73, 2004.[19] L. Fari˜nas and K. Inoue, editors.

Logical Modeling of Biological Systems . JohnWiley & Sons, 2014.[20] Fisher Jasmin and Henzinger Thomas A. Executable cell biology.

Nat Biotech ,25(11):1239–1249, nov 2007.[21] V. Glorian, G. Maillot, S. Poles, J. S. Iacovoni, G. Favre, and S. Vagner. HuR-dependent loading of miRNA RISC to the mRNA encoding the Ras-related smallGTPase RhoB controls its translation during UV-induced apoptosis.

CCell Deathand Differentiation , 18(11):1692–1701, 2011.[22] M. Hucka, H. Bolouri, A. Finney, H. M. Sauro, J. C. Doyle, and H. Kitano. Thesystems biology markup language (SBML): A medium for representation andexchange of biochemical network models.

Bioinformatics , 19:524–531, 2003.3023] Katsumi Inoue. Linear resolution for consequence ﬁnding.

Artiﬁcial Intelligence ,56(2):301 – 353, 1992.[24] F. Jacob and J. Monod. Genetic regulatory mechanisms in the synthesis of pro-teins.

Journal of Molecular Biology , 3:318–356, 1961.[25] A. Kean and G. Tsiknis. An incremental method for generating prime impli-cants/implicates.

Journal of Symbolic Computing , 9:185–206, 1990.[26] K. W. Kohn and Y. Pommier. Molecular interaction map of the p53 and Mdm2logic elements, which control the off-on swith of p53 response to DNA damage.

Biochemical and Biophysical Research Communications , 331(3):816–827, 2005.[27] Jean-Marie Lagniez, Daniel Le Berre, Tiago de Lima, and Valentin Montmirail.A recursive shortcut for CEGAR: Application to the modal logic K satisﬁabilityproblem. In

Proceedings of the Twenty-Sixth International Joint Conference onArtiﬁcial Intelligence (IJCAI-17) , pages 674–680, 2017.[28] W. J. Lee, D. U. Kim, M. Y. Lee, and K. Y. Choi. Identiﬁcation of proteins inter-acting with the catalytic subunit of PP2A by proteomics.

Proteomics , 7(2):206–214, 2007.[29] Xavier Leroy, Damien Doligez, Alain Frisch, Jacques Garrigue, Didier R´emy, andJ´erˆome Vouillon.

The OCaml System, Documentation and user’s manual . InstitutNational de Recherche en Informatique et en Automatique, 2017.[30] Alida Palmisano and Corrado Priami. Bio-PEPA. In

Encyclopedia of SystemsBiology , pages 145–146. Springer New York, New York, NY, 2013.[31] H. Pei, L. Zhang, K. Luo, Y Qin, M. Chesi, F Fei, P. L. Bergsagel, Wang L.,Z. You, and Z. Lou. MMSET regulates histone H4K20 methylation and 53BP1accumulation at DNA damage sites.

Nature , 470(7332):124–128, 2011.[32] A. Pnueli. The temporal logic of programs. In

Proc. of the 18 th Annual Symposiumon Foundations of Computer Science , pages 46–57, Providence, Rhode Island,USA, 1977.[33] Y. Pommier, O. Sordet, V. A. Rao, H. Zhang, and K.W. Kohn. Targeting chk2kinase: molecular interaction maps and therapeutic rationale.

Current Pharma-ceutical Design , 11(22):2855–2872, 2005.[34] Raymond Reiter.

Knowledge in Action: Logical Foundations for Specifying andImplementing Dynamical Systems . MIT Press, 2001.[35] Mara Sangiovanni.

Model Checking of Metabolic Networks: Application toMetabolic Diseases . PhD thesis, Federico II University of Naples., 2014.[36] Jetse Scholma, Stefano Schivo, Ricardo A. Urquidi Camacho, Jaco van de Pol,Marcel Karperien, and Janine N. Post. Biological networks 101: Computationalmodeling for molecular biologists.