Computational Logic for Biomedicine and Neurosciences
Elisabetta de Maria, Joelle Despeyroux, Amy Felty, Pietro Liò, Carlos Olarte, Abdorrahim Bahrami
I3S Laboratory, Sophia-Antipolis, France; INRIA and CNRS, I3S Laboratory, Sophia-Antipolis, France; School of Electrical Engineering and Computer Science, University of Ottawa, Canada; Department of Computer Science and Technology, University of Cambridge, UK; ECT – Universidade Federal do Rio Grande do Norte, Brazil
Abstract.
We advocate here the use of computational logic for systems biology, as a unified and safe framework well suited for modeling the dynamic behaviour of biological systems, expressing properties of them, and verifying these properties. The potential candidate logics should have a traditional proof-theoretic pedigree (including either induction, or a sequent calculus presentation enjoying cut-elimination and focusing), and should come with certified proof tools. Beyond providing a reliable framework, this allows the correct encodings of our biological systems. For systems biology in general and biomedicine in particular, we have so far, for the modeling part, three candidate logics, all based on linear logic. The studied properties and their proofs are formalized in a very expressive (non-linear) inductive logic: the Calculus of Inductive Constructions (CIC). The examples we have considered so far are relatively simple ones; however, they all come with formal semi-automatic proofs in the Coq system, which implements CIC. In neuroscience, we are directly using CIC and Coq to model neurons and some simple neuronal circuits and to prove some of their dynamic properties. In biomedicine, the study of multi-omic pathway interactions, together with clinical and electronic health record data, should help in drug discovery and disease diagnosis. Future work includes using more automatic provers. This should enable us to specify and study more realistic examples, and in the long term to provide a system for disease diagnosis and therapy prognosis.
To formally model biological systems, several approaches have been proposed in the literature. Discrete and hybrid Petri nets [16, 42] are directed bipartite graphs composed of places and transitions. In π-calculus [67] and its stochastic variant [64], processes communicate on complementary channels identified by specific names. Bio-ambients [66] are based on bounded places where processes are contained and where communications take place. Hybrid automata [1] combine finite state automata with continuously evolving variables. Piecewise linear equations [46] capture the switch-like interactions between components. Rule-based modeling languages such as Biocham [33, 34], Kappa [13, 24], and BioNetGen [10] describe how (sets of) reactants can be transformed into (sets of) products, and associate corresponding rate laws. They offer discrete, stochastic, or continuous semantics. To express the dynamics of the different components, ordinary and stochastic differential equations are widely used too. Answer Set Programming (ASP) is an unusual constraint logic programming approach enabling the modeling and study of large-scale biological systems. Molecular Logic [75] uses boolean logic gates to define regulations in networks. Finally, one of the most successful approaches to model and analyse signal transduction networks inside cells is Pathway Logic [72], a system based on rewriting rules.

One of the most common approaches to the formal verification of biological systems is symbolic model checking [18], which exhaustively enumerates all the states reachable by the system, possibly implicitly by means of boolean formulas. In order to apply such a technique, the biological system should be encoded as a finite transition system and relevant system properties should be specified using temporal logic. Other approaches simply use simulators.
In addition to its simulator, Kappa offers powerful static and causal analysis of the models [12, 20–22].

In contrast to the aforementioned approaches, our approach is a unified one, in which logic is used to model biological systems, to express their temporal properties, and to prove these properties. Depending on the case, the logics chosen may be different. However, they are always highly expressive computational logics that can be used as logical frameworks.

A computational logic is a logic that enables proofs on the computer, i.e. mechanized reasoning: one of the foundations of Artificial Intelligence (AI). Formal logic emerged from philosophy in the 19th century, in an attempt to understand and formalize mathematical reasoning. The field was then further developed by mathematicians and more recently by computer scientists. Various logics have been proposed. From propositional logic to linear or inductive logics, the more expressive the logic, the less automated proofs can be. Modern proof assistants, however, enable partially automated proofs.

Logical frameworks are logics designed to formally study a variety of systems, such as transition systems, semantics of programming languages, mathematical theories, or even logics. Logical frameworks allow both the formal modeling of these systems and the proof of properties of the systems at hand. In the case where the encoded systems are themselves logics, the logical framework enables the proofs of both meta-theoretical theorems (about the logic being formalised) and object-level theorems (about the systems being encoded in the formalised logic). We shall follow this approach (sometimes called "the two-level approach") in the first part of this chapter (Sec. 2).

We advocate the use of computational logic as a unified and safe approach to both specifying and analysing biological systems. Furthermore, as we shall see in Sec.
2, our approach is general and can be used to model biological networks both inside and outside the cell.

This chapter presents two application areas of our approach: biomedicine (Sec. 2) and neuroscience (Sec. 3). The logics used for modeling the systems will be different in these two areas. On the other hand, the properties will be written and the proofs realized in the same logic.

In the first part of this chapter (Sec. 2), devoted to biomedicine, biological systems will be modeled in linear logic (see Sec. 2.2): a logic well suited for the study of discrete state transition systems.

In the second part of the chapter (Sec. 3), devoted to an application to neuroscience, neurons (and sets of neurons) will be modeled directly in the Calculus of Inductive Constructions (CIC): a general, typed and inductive logic. More precisely, neurons will be described in the Coq system, used here as a high-level typed functional language which proves to be well suited to the description of neurons, their potential function and their combinations. Coq includes a programming language (Gallina). Coq is also, above all, a proof assistant [9], which implements CIC.

In both cases, the properties will be written and the proofs realized in the same logic: CIC. However, the properties will be written using CIC in a different way. In the first case (biomedicine, Sec. 2), CIC and LL will be used in a two-level approach, as suggested above and described later (see Sec. 2.3). In the second case (neuroscience, Sec. 3), CIC will be used directly. We shall use Coq to prove the properties of our systems in the two application areas presented in this chapter.

The approach presented in this chapter is new, having been proposed only in four recent works: on the one hand [29, 52, 62] in biomedicine and on the other hand [7] in neuroscience. Here we choose discrete modeling.
We believe that discrete modeling is crucial in systems biology since it allows taking into account some events that have a very low chance of happening (and could thus be neglected by differential approaches), but which may have a strong impact on system behaviour.

The present chapter is built around two different models: metastatic breast cancer and neural archetypes. Together, these exemplify the modeling of different aspects of biological complexity: cancer is about tissue cells becoming independent, whereas a neuron is a cell that builds circuits with other neurons. Fig. 1 may help the reader understand the structure of the chapter.
In this section, we present the use of logical frameworks, and more precisely linear logic, to model and analyse biological systems in the biomedicine area. We focus here on metastatic breast cancer [29], which is representative of our approach.

We shall see that, in contrast to usual mathematical approaches, which typically use differential equations to define an average number of cells, in linear logic it is more natural (but not required) to define single-cell models.
Fig. 1. Contents of the chapter. [Diagram: two parallel tracks, metastatic breast cancer (Sec. 2) and neuronal archetypes (Sec. 3), each going from the description of the model to the verification of its properties and a summary of key points, followed by the conclusions.]

2.1 Introduction
Cancer is a complex evolutionary phenomenon, characterised by multiple levels of heterogeneity (interpatient, intrapatient and intratumour), multiscale events (i.e. changes at intracellular, intercellular and tissue levels), and multiomics variability (i.e. changes at chromatin, epigenetic and transcriptomic levels) that affect all aspects of clinical decisions and practice. The most remarkable phenomenon is the occurrence of numerous somatic mutations, of which only a subset contributes to cancer progression. The dynamic genetic diversity, coupled with epigenetic plasticity, within each individual cancer induces new genetic architectures and clonal evolutionary trajectories.

A striking feature of cancer is the subclonal genetic diversity, i.e., the presence of clonal succession and spatial segregation of subclones in primary sites and metastases. At the root of the subclonal expansion there are developmentally regulated, potentially self-renewing cells. After initiation, multiple subclones often coexist, signalling parallel evolution with no selective sweep or clear fitness advantage that becomes evident with therapies. Fitness calculation for each subclone is difficult, as subclones can form ecosystems and can cooperate through paracrine loops or interact through stromal, endothelial and inflammatory cells.

Tumour stages describe the progress of the tumour cells. One widely adopted approach is the American Joint Committee on Cancer (AJCC) tumour node metastasis (TNM) staging system. It classifies tumours with a combined stage between I and IV using three values: T gives the size of the primary tumour and the extent of invasion, N describes whether the tumour has spread to regional lymph nodes, and M is indicative of distant metastasis. Stage I patients have the best prognosis, with 5-year survival rates of 80-95%. The survival rates progressively worsen with each stage. Even with advances in targeted therapies, Stage IV patients have survival times of just over two years.
The integration of blood tests, biopsies and medical imaging with genomic data has allowed the classification of many subtypes of cancers with striking differences in driver mutations and survival patterns. Therefore, the current data stream goals for a personalised breast cancer program should include the generation of tumour whole genome sequencing (DNA and RNA). Another data stream goal could focus on liquid biopsies. This will consist of data obtained from circulating tumour DNA (ctDNA) and single cell analysis. However, the full power of these datasets will not be realised until we leverage advanced statistical, mathematical and computational approaches to devise the needed procedures to conduct analyses that transect these streams. This condition clearly depends on the availability of a large amount of good quality data.

There is a very rich body of biomedical statistics, machine learning and epidemiological literature for cancer data analysis, which includes methods ranging from survival analysis (i.e., the effect of a risk factor or treatment with respect to cancer progression), analyses of co-alteration and mutual exclusivity patterns for genetic alterations, and gene expression analyses, to network science algorithms (see e.g., [5, 8, 14, 32, 37, 48, 70]). For example, in survival modeling, the data is referred to as time-to-event data and the objective is to analyse the time that passes before an event occurs due to one or more covariates [43, 44]. We believe that, together with machine learning and biostatistics, there is a role for a logical approach in guiding optimal treatment decisions and in developing a risk stratification and monitoring tool to manage cancer.

Algorithms and benchmarks developed in cancer medicine need to go through stages of standardisation and validation to become actionable software and be used in clinical settings worldwide. This is a long multi-phase trajectory.
Logic has the required interpretability, explainability, compositionality and effectiveness to integrate data and protocols at different biomedical scales (from single cell to human lifestyle) in a monitorable and rigorous way.

In this work, we focus on the use of a formal logical framework to provide a reliable hypothesis-driven decision making system based on molecular data (single cell level). This first step is important in the investigation of the cancer to justify treatment.

The rest of this section is organised as follows. Section 2.2 introduces Linear Logic. Section 2.3 presents the "two-level" modeling approach we advocate. Section 2.4 first describes some relevant properties related to cancer mutations which we believe are key factors driving the model dynamics. Then we present our model of breast cancer progression. The formal proofs of some properties of our model are presented in Sec. 2.5. Finally, we conclude with a discussion on challenges and opportunities of logical frameworks in cancer studies, and in biomedicine in general. The reader may find the proofs of the results presented here in [29]. Moreover, all the proofs of the properties of our model were certified in Coq and are available at http://subsell.logic.at/bio-CTC/ .

Among the many frameworks that have been proposed in the literature, linear logic [38] (LL) is one of the most successful ones. This is mainly because LL is resource conscious and, at the same time, it can internalise classical and intuitionistic behaviours (see, for example, [15, 56]).
Classical logics, in which the law of excluded middle (which states that for any proposition, either that proposition is true or its negation is true) is valid, are natural to mathematicians, while computer scientists (or mathematicians) doing mechanized proofs on the computer generally prefer intuitionistic logics. In intuitionistic logic, the law of excluded middle is not valid. Moreover, any proof of the existence of a term $x$ satisfying some property $P(x)$ must produce a witness for it (i.e. a term $t$ such that $P(t)$ is true). Hence, LL allows for a declarative and straightforward specification of transition systems, as we shall see shortly. In fact, LL has been successfully used to model such diverse systems as Petri nets, process calculi, security protocols, multiset rewriting, graph traversal algorithms, and games.

LL is general enough for specifying and verifying properties of a large number of systems. However, in some cases, the object-level system (the one we are modeling) exhibits some characteristics that may pose difficulties in the modeling task. For instance, in a transition system, we may be interested in specifying constraints on the timing for a transition to happen, or it could also be the case that transitions are constrained according to the spatial location of the objects. Although those characteristics can indeed be modelled in LL (see [17, 30]), we may ask whether there are suitable extensions of the framework offering a more natural representation for those constraints/modalities.

Extensions of LL, or of its intuitionistic version ILL [38], have been proposed in order to fill this gap, hence providing stronger logical frameworks that preserve the elegant properties of linear logic as the underlying logic. Two such extensions are HyLL (Hybrid Linear Logic) [28], an extension of ILL, and SELL (Subexponential Linear Logic) [23, 59], an extension of ILL and LL.
These logics have been extensively used for specifying systems that exhibit modalities such as temporal or spatial ones. The difference between HyLL and SELL lies in the way those modalities are handled.

In the following, we shall introduce LL and its extensions HyLL and SELL in an intuitive way and with the level of detail needed to understand the applications in Sec. 2.4. The reader may find in [28, 38, 59, 73] a more detailed account of the proof theory of those systems. For didactic purposes, we shall consider a set of states $S$ and transitions of the form $s \longrightarrow s'$ where $s, s' \in S$. Incrementally, we will add more structure to the states, e.g., we can consider that $s$ and $s'$ above are multisets. Moreover, we will impose (temporal and spatial) constraints on the transitions in order to introduce the extensions of LL.

Linear logic
Modern logics, especially computational logics, are defined by the set of their formulas, the form of their sequents ("judgements" stating that a formula is true under a set of hypotheses), and the inference rules that explain how to build proofs in the logic at hand.

In Intuitionistic Linear Logic (ILL), formulas are defined by the following grammar (LL's formulas include those of ILL, as well as other formulas that we will not use in this chapter):

– Terms: $t, \ldots ::= c \mid x \mid f(\vec{t})$
– Propositions: $A, B, \ldots ::= p(\vec{t}) \mid A \multimap B \mid A \otimes B \mid \mathbf{1} \mid A \mathbin{\&} B \mid \top \mid A \oplus B \mid \mathbf{0} \mid {!A} \mid \forall x. A \mid \exists x. A$

A term $t$ is a constant $c$, a variable $x$, or a function of one or more terms $f(\vec{t})$. The simplest form of a formula/proposition is $p(\vec{t})$. For example, proteins P53, phosphorylated MAPK, and the complex (TGFβ, LTBP1) may be represented, respectively, by the terms $\mathrm{P53}$, $ph(\mathrm{MAPK})$ and $complex(\mathrm{TGF}\beta, \mathrm{LTBP1})$. Concentration, or the presence or absence of an element, will be represented not by terms but by propositions, which may be true or false. For example: $C(\mathrm{P53}, .)$, $pres(x)$, $abs(y)$.

One of the main features of LL is the distinction between a proposition that is always true, a stable truth (example: "Socrates is a man"), and a proposition considered as a resource ("I have one dollar"). Any proposition $A$ is a resource; $!A$ represents a resource that can be used/consumed an arbitrary number of times, i.e. a stable truth. Therefore, "$A \multimap B$" means "give me one $A$ and I will give you one $B$", while "$(!A) \multimap B$" represents the usual implication "$B$ is true whenever $A$ is". This awareness of resources leads to the existence of two versions of some constructors. Thus LL has two forms of "and", denoted $\otimes$ and $\&$. In $A \otimes B$ and $A \mathbin{\&} B$, both resources/actions ($A$ and $B$) are available/possible. However, in $A \otimes B$, both actions will be carried out, while in $A \mathbin{\&} B$, only one of the two actions will be performed. In our applications, a set of two elements/molecules $A$ and $B$ in a given state will be represented by $A \otimes B$, while a system containing two biological rules $r_1$ and $r_2$ will be represented, in an asynchronous semantics (where only one rule can be fired at a time), by $r_1 \mathbin{\&} r_2$. Each of these conjunctions has its neutral element: $\mathbf{1}$ for $\otimes$ and $\top$ for $\&$. The connective $\oplus$ represents an "or". Its neutral element is $\mathbf{0}$. ILL has only one version of the "or", while LL (a perfectly symmetric logic) has two. Finally, LL is a first-order logic, whose formulas therefore include universal and existential quantifiers on variables: $\forall x. A$ and $\exists x. A$.

The simplest form of a sequent is $\Gamma \vdash G$, where $G$ is the formula (goal) to be proved (examples in Sec. 2.5) and $\Gamma$ is a set of hypotheses (also formulas). LL is a substructural logic where there is an explicit control over the number of times a formula can be used in a proof.
So-called "structural rules", such as contraction (which ignores the number of occurrences of hypotheses in $\Gamma$) and weakening (which allows a hypothesis to be ignored), usual in (non-linear) logic, are not available here. Formulas in LL can be split into two sets: classical (those that can be used as many times as needed) or linear (those that are consumed after being used). Using a dyadic system for LL, sequents take the form $\Gamma; \Delta \vdash G$, where $G$ is the formula (goal) to be proved, $\Gamma$ is the set of classical formulas/hypotheses and $\Delta$ is the multiset of linear formulas/hypotheses. A proof of a judgment $\Gamma; \Delta \vdash G$ may or may not use the classical assumptions (in $\Gamma$). It must, on the other hand, use/consume all the resources in the linear context ($\Delta$). We will use a focused system [3] here. By building what might be called a normal form of the proofs, such a system reduces the non-determinism during automatic proof search. For that, we will decorate some of the formulas with arrows (see $\Downarrow A$ below) to unequivocally determine the next connective/formula that needs to be considered.

In the linear context, classical formulas are marked with the exponential modality $!$, whose left introduction rule (reading the inference rule from top to bottom, i.e. from the premises to the conclusion) is as follows:

\[
\infer[!_L]{\Gamma; \Delta, {!F} \vdash G}{\Gamma, F; \Delta \vdash G}
\]

This rule (reading from the conclusion to the premises, as usually done in proof-search procedures) simply stores the formula $F$ in the classical context $\Gamma$.

In our applications, we shall store in $\Gamma$ the formulas representing the rules of the system and in $\Delta$ the atomic predicates (that can be produced and consumed) representing (a sub-part of) the state of the system.

For the moment, let us assume that $S$ is a finite set of states, and each element $s \in S$ will be represented as an atomic proposition in LL, also denoted $s$.
A transition rule of the form $s \longrightarrow s'$ can be naturally specified as the linear implication $s \multimap s'$, where $s$ is consumed to later produce $s'$. The rules for this connective are:

\[
\infer[\multimap_L]{\Gamma; \Delta_1, \Delta_2, \Downarrow F_1 \multimap F_2 \vdash G}{\Gamma; \Delta_1 \vdash \Downarrow F_1 & \Gamma; \Delta_2, \Downarrow F_2 \vdash G}
\qquad
\infer[\multimap_R]{\Gamma; \Delta \vdash F_1 \multimap F_2}{\Gamma; \Delta, F_1 \vdash F_2}
\]

A left introduction rule, such as $\multimap_L$ above, explains how to use a formula (here $\Downarrow F_1 \multimap F_2$) in a proof (here simply a derivation tree), which is built bottom up. A right introduction rule, such as $\multimap_R$, describes how to prove a formula. Such a set of rules defines the top operator (here $\multimap$) of the given formula.

In $\multimap_R$, the proof of $F_1 \multimap F_2$ requires (in addition to the resources in $\Delta$ and possibly assumptions in $\Gamma$) the use of the resource $F_1$ to conclude $F_2$. This rule is invertible (i.e., the premise is provable if and only if the conclusion is provable). Hence, this rule belongs to the so-called negative phase of the construction of a focused proof, where, without losing provability, we can apply all the invertible rules in any order.

The rule $\multimap_L$ shows the resource awareness of the logic: part of the context ($\Delta_1$) is used to prove $F_1$ and the remaining resources ($\Delta_2$) must be used (in addition to $F_2$) to prove $G$. The classical context $\Gamma$ is not divided but copied in the premises. The rule $\multimap_L$ is non-invertible and thus belongs to the so-called positive phase. This means that there may be several ways to prove the conclusion: for example by "decomposing" the top operator of the goal $G$, instead of decomposing the implication ($\multimap$) in the resources/assumptions of the conclusion sequent. Therefore, if we decide to work/focus on that formula, it may be the case that we have to backtrack if the proof cannot be completed. The notation $\Downarrow F_1 \multimap F_2$ (read $\Downarrow (F_1 \multimap F_2)$) means that we decided to focus on that formula and then we have to keep working on the sub-formulae $F_1$ and $F_2$ (notation $\Downarrow F_1$ and $\Downarrow F_2$).
During the negative phase as during the positive phase, the context (linear or classical) can have zero, one or more focused formulas. In any case, one can only focus on one formula.

As an example, in $\Gamma$ above, we can store formulas of the shape $s \multimap s'$ specifying the transition rules of the system (that can be used as many times as needed). Assuming that all the negative connectives have already been introduced, the following derivation shows how to focus on (select) a formula $s \multimap s'$ stored in $\Gamma$. This is the purpose of the decision rule $D_C$:

\[
\infer[D_C]{\Gamma, s \multimap s'; \Delta \vdash G}{\Gamma, s \multimap s'; \Delta, \Downarrow s \multimap s' \vdash G}
\]

Note that the rule $D_C$ creates a copy of the formula $s \multimap s'$ and places it in the linear context.

If the current (partial) state $\Delta$ contains the formula $s$, the proof can proceed by applying rule $\multimap_L$, meaning applying the biological rule $s \multimap s'$:

\[
\infer[D_C]{\Gamma, s \multimap s'; s, \Delta \vdash G}{
  \infer[\multimap_L]{\Gamma, s \multimap s'; s, \Delta, \Downarrow s \multimap s' \vdash G}{
    \infer[I]{\Gamma, s \multimap s'; s \vdash \Downarrow s}{}
    &
    \Gamma, s \multimap s'; \Delta, \Downarrow s' \vdash G
  }
}
\]

Here $I$ is the initial rule, which allows us to prove an atomic proposition if that atomic proposition is the unique formula in the linear context (alternatively, the atomic proposition can be in the classical context $\Gamma$ and the linear context must be empty). When looking at this derivation bottom up, we observe that the multiset $\{s, \Delta\}$ was transformed into $\{\Delta, \Downarrow s'\}$, thus reflecting (almost completely) the transition ($s \multimap s'$) in the biological system (the "focus" on $s'$ remains to be removed). We will see this in more detail, considering a more general form of biological rule.

Usually, in biochemical systems, the state of the system is composed of a (multi)set of components. Rules describe how one or more reactants are consumed in order to produce some other components.
For instance, a typical rule may be "cdk46 + cycD $\longrightarrow$ cdk46-cycD", representing cdk binding to a cyclin in the cell cycle.

In Linear Logic, the transition above will be represented by the formula "cdk46 $\otimes$ cycD $\multimap$ cdk46-cycD". This representation requires the use of the conjunction $\otimes$ and the following logical rules defining it:

\[
\infer[\otimes_L]{\Gamma; \Delta, F_1 \otimes F_2 \vdash G}{\Gamma; \Delta, F_1, F_2 \vdash G}
\qquad
\infer[\otimes_R]{\Gamma; \Delta_1, \Delta_2 \vdash \Downarrow F_1 \otimes F_2}{\Gamma; \Delta_1 \vdash \Downarrow F_1 & \Gamma; \Delta_2 \vdash \Downarrow F_2}
\]

The rule $\otimes_R$ belongs to the positive phase and it says that the proof of $F_1 \otimes F_2$ requires the linear context to be split in order to prove both $F_1$ and $F_2$. The left rule ($\otimes_L$) belongs to the negative phase and the resource $F_1 \otimes F_2$ is simply transformed into two resources ($F_1$ and $F_2$). For instance, assuming that $A, B, C, D$ are propositional variables and letting $F = (A \otimes B) \multimap (C \otimes D)$ with $F \in \Gamma$, we obtain the following derivation:

\[
\infer[\multimap_R]{\Gamma; \Delta \vdash (A \otimes B) \multimap G}{
  \infer[\otimes_L]{\Gamma; \Delta, A \otimes B \vdash G}{
    \infer[D_C]{\Gamma; \Delta, A, B \vdash G}{
      \infer[\multimap_L]{\Gamma; \Delta, A, B, \Downarrow F \vdash G}{
        \infer[\otimes_R]{\Gamma; A, B \vdash \Downarrow (A \otimes B)}{
          \infer[I]{\Gamma; A \vdash \Downarrow A}{}
          &
          \infer[I]{\Gamma; B \vdash \Downarrow B}{}
        }
        &
        \infer[R]{\Gamma; \Delta, \Downarrow (C \otimes D) \vdash G}{
          \infer[\otimes_L]{\Gamma; \Delta, C \otimes D \vdash G}{
            \Gamma; \Delta, C, D \vdash G
          }
        }
      }
    }
  }
}
\]

Here we have introduced a new rule. The rule $R$, for release, means that the focused phase is finished, since the formula $C \otimes D$ (on the left) belongs to the negative phase. When looking at this derivation bottom up, we observe that in a switch of the polarity of the proof (a complete positive phase followed by a negative one) the multiset of components $\{\Delta, A, B\}$ was transformed into $\{\Delta, C, D\}$, thus reflecting the transition ($F$) in the biological system. This derivation is a partial proof, in which it remains to prove the goal $G$ with the new resources $\{\Delta, C, D\}$ (and the assumptions in $\Gamma$ if needed).

The attentive reader will have noticed that we have not given the definition of all the constructors of ILL.
Some inference rules, not necessary for understanding the applications presented in this chapter, have been deliberately omitted in order to simplify the presentation of the logic.

A few words on syntax versus semantics. We said that the constructors of a logic are defined by their introduction/elimination rules. This is correct in a syntactic approach to (or definition of) logic. The syntactic approach is the usual approach in computational logic: computers only understand syntax. Nevertheless, even in computational logic, it can be interesting to define the semantics of a logic, to better understand it. This only makes sense for a very expressive logic. Otherwise, semantics are trivial, like, typically, truth tables for propositional logic: "$A \wedge B$ is true if $A$ is true and $B$ is true" seems like a truism ("Lapalissade"). LL thus has a semantics that represents proofs by hypergraphs: proof nets. Proof nets make it possible, notably, to identify two derivations that are syntactically different although "morally"/semantically identical (e.g. identical up to rule permutations).

Adding modalities
One may want to consider adding the time needed for a transition to happen. Let us then consider a transition of the form $s \xrightarrow{d} s'$, which means that the state $s$ is transformed into $s'$ in $d$ time-units. For the sake of simplicity, we consider here that $d$ is a constant; however, it can also be a function (examples in Sec. 2.4). We may add to the model a unary predicate $t(\cdot)$ denoting the current time-unit; then the transition above can be modelled by the following formula:

\[
\forall n.\ (t(n) \otimes s) \multimap (t(n + d) \otimes s') \tag{1}
\]

The rules for the universal quantifier are the following:

\[
\infer[\forall_L]{\Gamma; \Delta, \Downarrow \forall x. F \vdash G}{\Gamma; \Delta, \Downarrow F[t/x] \vdash G}
\qquad
\infer[\forall_R]{\Gamma; \Delta \vdash \forall x. G}{\Gamma; \Delta \vdash G[x_e/x]}
\]

where $x_e$ is a fresh variable (not occurring elsewhere). Note that $\forall_L$ belongs to the positive phase, where a term $t$ must be provided to continue the derivation. On the other hand, $\forall_R$ belongs to the negative phase and $x$ is simply replaced by a fresh variable. Let $F$ be the formula (1); it is easy to complete the missing steps in the derivation below:

\[
\infer[D_C]{\Gamma; \Delta, s, t(x) \vdash G}{
  \infer={\Gamma; \Delta, s, t(x), \Downarrow F \vdash G}{
    \Gamma; \Delta, s', t(x + d) \vdash G
  }
}
\]

The double bar in the above derivation represents the whole positive phase (decomposing the connectives with the rules $\forall_L$, $\multimap_L$ and $\otimes_R$) followed by the negative phase (releasing the focus and applying the rule $\otimes_L$). Again, note that a focused step of the proof reflects exactly a transition in the system.

Adequacy
In the following results, we shall use $[\![s]\!]_x$ to denote the formula $s \otimes t(x)$ (i.e., the state $s$ at time-unit $x$). We shall also use $s \xrightarrow{(r,d)} s'$ to denote that the system may evolve from state $s$ to state $s'$ by applying the rule $r$ that takes $d$ time-units. Hence, $S_s = \{(s', r, d) \mid s \xrightarrow{(r,d)} s'\}$ represents the set of possible transitions starting from $s$. Moreover, with $\mathit{system}$ we denote the encoding of the transition rules of the system. We can show that each transition in $S_s$ matches exactly one focused derivation of the encoded system. More precisely,

Theorem 1 (Adequacy).
Let $s$ be a state and $S_s = \{(s', r, d) \mid s \xrightarrow{(r,d)} s'\}$. Then, $(s', r, d) \in S_s$ iff focusing on the encoding of $r$ leads to the following derivation:

$$\frac{\mathit{system}; [\![s']\!]_{t+d} \vdash G}{\mathit{system}; [\![s]\!]_t \vdash G}$$

Moreover, let $s, s'$ be two states. Then $s \xrightarrow{(r,d)} s'$ iff the sequent below is provable:

$$\mathit{system}; \cdot \vdash [\![s]\!]_t \multimap [\![s']\!]_{t+d}$$

The above results allow us to use the whole positive-negative phase as a macro rule in the logical system. Formally, we can show that the corresponding logical rule is admissible in the system, i.e., if the premise is provable then the conclusion is also provable.
Corollary 1 (Macro rules).
Assume that $s \xrightarrow{(r,d)} s'$. Then, the following macro rule is admissible:

$$\frac{\mathit{system}; \Delta, [\![s']\!]_{t+d} \vdash G}{\mathit{system}; \Delta, [\![s]\!]_t \vdash G}\ r$$
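To make the reading of the macro rule concrete, the following is a minimal Python sketch (not part of the Coq development) in which a state is a multiset of atoms paired with the current time-unit, and firing a rule consumes its left-hand side, produces its right-hand side and advances the clock by the delay. The function name `apply_rule` and the bystander atom `"x"` are hypothetical, introduced only for illustration; the cdk/cyclin atoms are those of the example used in this chapter.

```python
from collections import Counter

def apply_rule(state, time, lhs, rhs, delay):
    """Fire one timed rule on a multiset state.

    Consumes the atoms in `lhs`, adds the atoms in `rhs`, and advances the
    clock by `delay`. Returns (new_state, new_time), or None when `lhs`
    is not contained in `state` (the rule is not fireable).
    """
    state, lhs, rhs = Counter(state), Counter(lhs), Counter(rhs)
    if any(state[atom] < count for atom, count in lhs.items()):
        return None
    return state - lhs + rhs, time + delay

# cdk46 + cycD --> cdk46-cycD in 2 time-units; "x" is an unrelated atom
# (hypothetical) that stays untouched in the linear context.
result = apply_rule(["cdk46", "cycD", "x"], 3, ["cdk46", "cycD"], ["cdk46-cycD"], 2)
print(result)
```

Atoms that are not mentioned by the rule (the context $\Delta$ of the macro rule) are carried over unchanged, which mirrors the fact that only $[\![s]\!]_t$ is rewritten into $[\![s']\!]_{t+d}$.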
Modalities as worlds
This section and the next one may be skipped at a first reading. We present here two extensions of Linear Logic that allow interesting modelings of modalities. Indeed, these logics have been used to model and analyze small biological systems in two initial experiments [52, 62]. However, these modal extensions are not necessary for understanding the rest of the chapter.

Hybrid Linear Logic (HyLL) is a conservative extension of ILL where the truth judgements are labelled by worlds representing constraints on states and state transitions. Judgements of HyLL are of the form “$A$ is true at world $w$”, abbreviated as $A @ w$. Particular choices of worlds produce particular instances of HyLL, e.g., $A @ t$ can be interpreted as “$A$ is true at time $t$”. HyLL was first proposed in [28] and it has been used as a logical framework for specifying modalities as well as biological systems [52].

Worlds are given meaning via a constraint domain $\mathcal{W}$, which is a monoid structure $\langle W, ., \iota \rangle$. The reachability relation on worlds, $\preceq\ \subseteq W \times W$, is defined as $u \preceq w$ iff there exists $v \in W$ such that $u.v = w$. The identity world $\iota$ is $\preceq$-initial and is intended to represent the lack of any constraints. Thus, ordinary first-order ILL can be embedded into any instance of HyLL by setting all world labels to the identity. A typical example of a constraint domain is $\mathcal{T} = \langle \mathbb{N}, +, 0 \rangle$, representing instants of time.

Formulas in HyLL are constructed from atomic propositions, connectives of ILL and the following two hybrid connectives: satisfaction (at), which states that a proposition is true at a given world, and localisation ($\downarrow$), which binds a name for the current world where the proposition is true.
The rules of these connectives are:

$$\frac{\Gamma; \Delta, A @ u \vdash C @ w}{\Gamma; \Delta, (A\ \mathrm{at}\ u) @ v \vdash C @ w}\ [\mathrm{at}\,L] \qquad\qquad \frac{\Gamma; \Delta \vdash A @ u}{\Gamma; \Delta \vdash (A\ \mathrm{at}\ u) @ w}\ [\mathrm{at}\,R]$$

$$\frac{\Gamma; \Delta, A[v/u] @ v \vdash C @ w}{\Gamma; \Delta, \downarrow u.A\ @ v \vdash C @ w}\ [\downarrow L] \qquad\qquad \frac{\Gamma; \Delta \vdash A[w/u] @ w}{\Gamma; \Delta \vdash\ \downarrow u.A\ @ w}\ [\downarrow R]$$

The simple example above, representing cdk binding to a cyclin in the cell cycle, can be written in HyLL as follows, where we specify the delay $d$ for the reaction: “cdk46 + cycD at $t$ $\multimap$ cdk46-cycD at $(t + d)$.”

Modalities as subexponentials
Linear logic with subexponentials (SELL) shares with LL all its connectives except the exponentials: instead of having a single pair of exponentials $!$ and $?$ (we will not use $?$ in this chapter), SELL may contain as many subexponentials [23, 59], written $!^a$ and $?^a$, as one needs.

A SELL system is specified by a subexponential signature $\Sigma = \langle I, \preceq, U \rangle$, where $I$ is a set of labels, $U \subseteq I$ specifies which subexponentials are unbounded and $\preceq$ is a pre-order among the elements of $I$. Intuitively, $!^a F$ means that $F$ is marked with a given modality $a$. The preorder $\preceq$ on the indices $I$ determines the provability relation. For instance, $!^a F$ proves $!^b F$ whenever $b \preceq a$ (intuitively, $!^a$ is a stronger modality that can be used in a proof of the weaker modality $!^b$). Moreover, $!^b F$ cannot prove $!^a F$ if $b \prec a$ (weaker formulas cannot be used during a proof of stronger ones). As shown in [60, 61], the formula $!^a F$ can be interpreted in several ways: for instance, it may represent the fact that $F$ holds in the space location $a$ or that $F$ holds in the time-unit $a$. Moreover, if $a$ is unbounded (or classical), then $F$ can be used as many times as needed. The rules for $!^a$ are the following:

$$\frac{\Gamma, a : F; \Delta \vdash G}{\Gamma; \Delta, !^a F \vdash G}\ !_L \qquad\qquad \frac{\Gamma_{\preceq a}; \cdot \vdash G}{\Gamma; \cdot \vdash \Downarrow\, !^a G}\ !_R$$

In $!_L$, we simply “store” the formula $F$ in the context $a$. This rule belongs to the negative phase. Rule $!_R$ belongs to the positive phase. Note that the linear context must be empty. Moreover, the resulting context $\Gamma_{\preceq a}$ only contains formulas of the form $!^b F$ such that $a \preceq b$.

Now consider a constrained transition of the form

$$[A]_a + [B]_b \longrightarrow [C]_c$$

meaning that the component $A$ must be located in the space domain $a$ (similarly for $B$) in order to produce $C$ in the space domain $c$. The component $[A]_a$ can be adequately represented as the formula $!^a A$ and the reaction above becomes $(!^a A \otimes\, !^b B) \multimap\ !^c C$.

It is also possible to enhance the expressiveness of SELL with the subexponential quantifiers $\Cap$ (“for all locations”) and $\Cup$ (“there exists a location”) [61], given by the rules

$$\frac{\Gamma; \Delta, \Downarrow F[l/l_x] \vdash G}{\Gamma; \Delta, \Downarrow \Cap l_x : a.F \vdash G}\ \Cap_L \qquad\qquad \frac{\Gamma; \Delta \vdash G[l_e/l_x]}{\Gamma; \Delta \vdash \Cap l_x : a.G}\ \Cap_R$$

where $l_e$ is fresh. Intuitively, subexponential variables play a role similar to eigenvariables. The generic variable $l_x : a$ represents any subexponential, constant or variable, in the ideal of $a$. Hence $l_x$ can be substituted by any subexponential $l$ of type $b$ (i.e., $l : b$) if $b \preceq a$. We call the resulting system SELL$^\Cap$.

By ordering the locations in the subexponential signature, we can create hierarchies of spaces. Then, we may stipulate that a certain reaction occurs in all the spaces related to a given location $x$ as follows:

$$\Cap l : x.\ \left( !^l A \otimes\, !^l B \multimap\ !^l C \right)$$

In this case, we observe the transition $A + B \longrightarrow C$ in all the space domains subordinate to $x$.

As shown in [62], it is also possible to provide a suitable subexponential signature to combine both spatial and temporal modalities in the same framework. The reader may find in [17, 30] a formal comparison of HyLL and SELL, and an adequate encoding of Temporal Logic in LL extended with fixpoints. HyLL may be used to describe stochastic processes (based on variables with exponential distributions). This approach has been used to encode the S$\pi$-calculus into HyLL [28]. However, only symbolic operations on rates were needed there. Using this approach to model and analyse biological systems, for example, is still future work. In our later experiments, such as the one described in Sec. 2.4, we chose to use pure LL, and define a predicate to encode time.
This approach might be less elegant; however, it avoids the extra complication of copying the unused (not consumed) information to the next time-unit (HyLL world or SELL subexponential), thus benefiting from the usual compositional nature of the logic.

As we have seen in the introduction of the present chapter (Sec. 1), we formalize both biological systems and their properties in logic, and prove these properties in logic as well. We shall use here Linear Logic (LL) to model biological systems (here the evolution of cancer cells) and the Calculus of Inductive Constructions (CIC) to write their properties and prove them. More precisely, in a “two-level approach”, we shall use LL as the intermediate logic, formalised in CIC, which is a type theory implemented in the Coq Proof Assistant [9]. We note that the Coq system has been (partially) proven correct in itself, that extensive meta-theoretical studies of LL are available in the Coq system (see e.g., [76]), and that our encoding of biological systems is adequate (Section 2.2). This means that we prove that the formal model of the system correctly encodes (the transition system modeling) the intended biological system. The approach is thus a unified approach, fully based on logic, and a safe approach, as each step is proved correct, as far as it can be (Gödel's theorem).

We leverage our formalisation of LL in CIC to give a natural and direct characterisation of the state transformations of Circulating Tumour Cells (CTCs). For instance, a rule describing the evolution of a cell $n$, in a region $r$, from a healthy cell (with no mutations) to a cell that has acquired a mutation TGF$\beta$ can be modelled by the linear implication $C(n, r, []) \multimap C(n, r, [TGF\beta])$. This formula describes the fact that a state where a cell $C(n, r, [])$ is present can evolve into a state where $C(n, r, [TGF\beta])$ holds. If this transition takes a delay $d$, we formalise it by adding a predicate $T$, as proposed in Sec. 2.2: $T(t) \otimes C(n, r, []) \multimap T(t + d) \otimes C(n, r, [TGF\beta])$. More interestingly, the LL specification can be used to prove some desired properties of the system. For instance, it is possible to prove reachability properties, i.e., whether the system can reach a given state, or even more abstract (meta-level) properties, such as checking all the possible evolution paths the system can take under certain conditions (Sect. 2.5). Finally, we attain a certain degree of automation in our proofs, which opens the possibility of testing hypotheses recently proposed in the literature.

It is worth noting that knowledge of logic, and in particular of LL, is undoubtedly useful, but not necessary for the biologist interested in formalizing and studying a system following the approach proposed here. LL has been implemented in Coq once and for all, with some of its rule systems (focused or not), as well as tactics for developing proofs, by different experts (see, for example, [35, 51, 76]). The biologist potentially interested in using our approach should be able to easily model a system in LL by following one of the applications given as an example in this chapter or in the preliminary studies cited [52, 62]. Even better, in the near future we plan not to develop our own interface to a “biology-dedicated LL”, but rather to propose translations/compilations of models written in rule-based languages such as Biocham and Kappa (presented in other chapters of this book) into corresponding models written in LL. The writing of properties of interest, in the form of sequents, could probably, in simple cases, be done by the biologist. However, in the current state of science, this step, like the previous one (modeling), often requires a long dialogue between the biologist and the computer scientist, if only to estimate the possible contribution of the proposed approach to the problem and the most appropriate degree of abstraction.
The final step, the proofs, will probably have to be performed by experts, here as mechanised proofs on the computer, although the development of powerful tactics (by these same experts) should eventually make this step accessible to non-experts as well.

2.4 Modeling Breast Cancer Progression

We first describe here (Sec. 2.4) some relevant properties related to cancer mutations and CTCs which we believe are key factors driving the model dynamics. Then, in Section 2.4, we specify in LL the dynamics of CTCs. Reachability, existence of cycles, and meta-level properties of the system will be proven in Section 2.5.
Tumour Cells in Metastatic Breast Cancer
In this section we first describe the mutations involved in cancer in general and then we focus on the evolution of circulating tumour cells described in this work.
Cancer Mutations.
Cancer mutations can be divided into drivers and non-drivers (or passengers). Non-driver mutations may change the metabolic network and affect important cell-wide processes such as phosphorylation and methylation. They could initiate a cascade of changes that generate effective clonal heterogeneity. The accumulation of evidence of clonal heterogeneity and the observation of the emergence of drug resistance in clonal sub-populations suggest that mutations usually classified as non-drivers may have an important role in the fitness of the cancer cell and in the evolution and physiopathology of cancer. Similarly, mutations that alter the metabolism and the epigenetics may modify the fitness of the cancer cells. A meaningful way to identify driver and passenger mutations is to use a statistical estimator of the impact of mutations such as FATHMM-MKL and a very large mutation database such as Cosmic (http://cancer.sanger.ac.uk/) [71].

The mutation process (causing tumour evolution) generates intra-tumour heterogeneity. The subsequent selection and Darwinian evolution (including immune escape) on intra-tumour heterogeneity is the key challenge in cancer medicine. Those clones that have progressed more than the others will have a larger influence on patient survival and determine the cancer subtype stratification of the patient. The amount of heterogeneity within primary tumours or between primary and metastatic tumours can be huge. The heterogeneity could be investigated through molecular biology techniques such as single cell sequencing and in situ Polymerase Chain Reaction (PCR), and could also be phenotypically classified using microscope image analysis. In case of large heterogeneity we could assume that the survival of the patient strongly depends on the mutations of the most aggressive clones/cells. Therefore single cell sequencing at primary and metastatic sites could be highly informative of the level of heterogeneity, probably more than bulk tissue gene expression analysis. Although still used in only a few clinical protocols, cancer single cell analysis may spread quickly as it can be easily integrated with other tissue, organ, blood and patient level (imaging, life style) information in a clinical decision system. This observation motivates the choice of single cell models in this methodological study.
Circulating Tumour Cells (CTCs).
We follow the study of the evolution of Circulating Tumour Cells in metastatic breast cancer in [5], where the authors use differential equations. This reference has an extensive discussion on the modeling choices, in particular concerning the driver mutations.

In [5] the probability for a cell in a duct in the breast to metastasise in the bone depends on the following mutational events:

1. A mutation in the TGF$\beta$ pathways frees the cell from the surrounding cells.
2. A mutation in the EPCAM gene makes the cell rounded and free to divide. Then the cell enters the blood stream and becomes a circulating cancer cell.
3. In order to survive, this cell needs to overexpress the gene CD47, which prevents attacks from the immune system.
4. Finally, there are two mutations that allow the circulating cancer cell to attach to the bone tissue and start the deadly cancer there: CD44 and MET.

Hence, a cancer cell has four possible futures: (a) acquiring a driver mutation; (b) acquiring a passenger mutation which does not cause too much of a viability problem: it simply increases a sort of “counter to apoptosis”; (c) the new (i.e., last) mutation brings the cell to apoptosis; and (d) moving to the next compartment, or seeding in the bone.

The behaviour of the cells depends on the compartments the cells live in (here the breast, the blood and the bone), the other cells (i.e., the environment: the availability of food/oxygen or the pressure by the other cells), and the behaviour of the surface proteins (the mutations). In this work, we shall formalise the compartments and the mutations, and leave the formalisation of the environment to future work. Note that this environment plays a role only in the breast.

The phenotype of a cell is characterised by both the number of its mutations and its fitness. In biology the fitness is the capability of the cell to survive and produce offspring. The cell's viability is particularly dependent on metabolic health and energy level. Most of the cell's metabolic health depends on the accumulation of mutations that affect the production of enzymes involved in catalysing energy-intensive reactions and cell homeostasis. The fitness is particularly altered by the occurrence of driver mutations: each driver mutation provides the cell with additional fitness. Non-driver mutations, on the other hand, may accumulate in large numbers, and may affect cell stress response due to the altered metabolism and the competition with other neighboring cells [40]. Wet-lab tests for cell fitness and stress responsiveness have been recently developed, see for instance [4, 69, 74]. In our formalisation, the fitness will be a parameter of the cells.
Physicians see the appearance of the cell (round, free, etc.), while biologists see the mutations; our model can take both into account.

We extend the model in [5] with a few rules modeling DNA repair of passenger mutations. These rules, only available for cells with TGF$\beta$ or EPCAM mutations (i.e., before the CD47 mutation), represent DNA repair by increasing the fitness by one. Note that this addition introduces cycles in our model (i.e., it is possible for a cell to go back to a previous state).
Modeling CTCs in Linear Logic
In this section we formalise in LL the behaviours of the Circulating Tumour Cells. The correctness of this formalisation (with respect to the corresponding traditional modeling as a transition system) was proven in a generic way in Sec. 2.2.

We have seen in Sec. 2.2 that LL formulas can be split into two sets: classical (those that can be used as many times as needed) or linear (those that are consumed after being used). Recall that, in a dyadic system for LL, sequents take the form $\Gamma; \Delta \vdash G$ where $G$ is the formula (goal) to be proved (examples in Section 2.5), $\Gamma$ is the set of classical formulas and $\Delta$ is the multiset of linear formulas. We store in $\Gamma$ the formulas representing the rules of the system and in $\Delta$ the atomic predicates representing the state of the system, namely:

- $C(n, c, f, lm)$, denoting a cell $n$ (a natural number used as an id), in a compartment $c$ (breast, blood, or bone), with a phenotype given by a fitness degree $f \in ..12$ and a list of driver mutations $lm$. The list of driver mutations $lm$ is built up from the mutations TGF$\beta$, EPCAM, CD47, CD44, and/or MET, to which we add seeded, for the cells seeded in the bone. As TGF$\beta$ is required before any further mutations, a list of mutations $[EPCAM, \ldots]$ will by convention mean $[TGF\beta, EPCAM, \ldots]$;
- $A(n)$, representing the fact that the cell $n$ has gone to apoptosis;
- and $T(t)$, stating that the current time-unit is $t$.

Each rule of the biological system is associated with a delay (see the terms of the form $d_i$ in Fig. 2), which depends on the fitness parameter. Fitness parameters decrease marginally with passenger mutations and increase drastically with driver mutations. A typical rule in our model is then as follows:

$$rl(br_{e.}) \stackrel{def}{=} \forall t, n.\ T(t) \otimes C(n, breast, 1, [EPCAM]) \multimap T(t + d(1)) \otimes C(n, breast, 0, [EPCAM])$$

This rule describes a cell acquiring passenger mutations. Its fitness is decreased by one in a time-delay $d(1)$ and its driver mutations remain unchanged.

Most of the rules in our model are parametric on the fitness degree. Hence, a rule of the form

$$rl(br_{t}) \stackrel{def}{=} \forall t, n.\ T(t) \otimes C(n, breast, f, [TGF\beta]) \multimap T(t + d(f)) \otimes A(n), \quad f \in ..$$

represents, in fact, three rules (one for each value of $f \in ..$): a cell in the breast with mutation $[TGF\beta]$ may go to apoptosis, and the time needed for such a transition is $d(f)$. Note that $d(\cdot)$ is a function that depends on $f$. If such $d(\cdot)$ does not depend on $f$, we shall simply write $d$ instead of $d()$.

A typical rule describing a cell acquiring a driver mutation is:

$$rl(br_{t.}) \stackrel{def}{=} \forall t, n.\ T(t) \otimes C(n, breast, 1, [TGF\beta]) \multimap T(t + d) \otimes C(n, breast, 2, [EPCAM])$$

This rule says that a cell in the breast with a fitness degree $f = 1$ may acquire a new mutation (EPCAM), which increases its fitness by 1.

Another kind of rule describes a cell moving from one compartment to the next. The following rule describes an intravasating CTC:

$$rl(br_{e}) \stackrel{def}{=} \forall t, n.\ T(t) \otimes C(n, breast, f, [EPCAM]) \multimap T(t + d(f)) \otimes C(n, blood, f + 1, [EPCAM]), \quad f \in ..$$

Finally, a last kind of rule describes a DNA repair of passenger mutations:

$$rl(br_{er}) \stackrel{def}{=} \forall t, n.\ T(t) \otimes C(n, breast, f, [EPCAM]) \multimap T(t + d_r(f)) \otimes C(n, breast, f + 1, [EPCAM]), \quad f \in ..$$
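The rule schemas above can be read as partial functions on cells. The following minimal Python sketch (not the Coq encoding) illustrates this reading for a passenger mutation, a DNA repair and an intravasation. All names (`Cell`, `passenger`, `repair`, `intravasate`, the delay labels such as `d_pass`) are hypothetical, and the fitness bounds `F_MIN` and `F_MAX` are an assumption used only as guards; delays are kept as uninterpreted symbolic labels.

```python
from typing import NamedTuple, Optional, Tuple

class Cell(NamedTuple):
    ident: int                  # n, the cell identifier
    compartment: str            # "breast", "blood" or "bone"
    fitness: int                # f
    mutations: Tuple[str, ...]  # lm, the driver mutations

F_MIN, F_MAX = 0, 12  # ASSUMPTION: fitness bounds, used only as guards

# A step is (successor cell, symbolic delay label), or None if not fireable.
Step = Optional[Tuple[Cell, str]]

def passenger(c: Cell) -> Step:
    """Passenger mutation: fitness decreases by one, drivers unchanged."""
    if c.fitness <= F_MIN:
        return None
    return c._replace(fitness=c.fitness - 1), f"d_pass({c.fitness})"

def repair(c: Cell) -> Step:
    """DNA repair: fitness increases by one; only before the CD47 mutation."""
    if not c.mutations or "CD47" in c.mutations or c.fitness >= F_MAX:
        return None
    return c._replace(fitness=c.fitness + 1), f"d_r({c.fitness})"

def intravasate(c: Cell) -> Step:
    """Intravasation: a breast cell with [EPCAM] enters the blood, fitness +1."""
    if c.compartment != "breast" or c.mutations != ("EPCAM",) or c.fitness >= F_MAX:
        return None
    return c._replace(compartment="blood", fitness=c.fitness + 1), f"d_intra({c.fitness})"

cell = Cell(7, "breast", 1, ("EPCAM",))
repaired, _ = repair(cell)
back, _ = passenger(repaired)
print(repaired.fitness, back == cell)
```

In this sketch a repair followed by a passenger mutation brings the cell back to its initial description (only the clock has advanced), which is the kind of cycle that the DNA repair rules introduce.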
The complete set of rules is in Fig. 2. For the sake of readability, we omit the universal quantification on $t$ and $n$ in the formulas. We shall use $\mathit{system}$ to denote the set of rules and then, sequents take the form:

$$\mathit{system}; T(t), C(\cdot), \cdots, C(\cdot) \vdash G$$

where $G$ is a property to be proved (Sect. 2.5). Observe that a cell that has gone to apoptosis cannot evolve any further (as no rule has an $A(n)$ on the left-hand side).

In the breast
$rl(br) \stackrel{def}{=} T(t) \otimes C(n, breast, , []) \multimap T(t + d) \otimes C(n, breast, , [])$
$rl(br) \stackrel{def}{=} T(t) \otimes C(n, breast, f, []) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(br) \stackrel{def}{=} T(t) \otimes C(n, breast, , []) \multimap T(t + d) \otimes C(n, breast, , [TGF\beta])$
$rl(br_{t}) \stackrel{def}{=} T(t) \otimes C(n, breast, , [TGF\beta]) \multimap T(t + d) \otimes C(n, breast, , [TGF\beta])$
$rl(br_{tr}) \stackrel{def}{=} T(t) \otimes C(n, breast, , [TGF\beta]) \multimap T(t + d_r) \otimes C(n, breast, , [TGF\beta])$
$rl(br_{t}) \stackrel{def}{=} T(t) \otimes C(n, breast, f, [TGF\beta]) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(br_{t}) \stackrel{def}{=} T(t) \otimes C(n, breast, f, [TGF\beta]) \multimap T(t + d) \otimes C(n, breast, f + 1, [EPCAM])$, $f \in ..$
$rl(br_{e}) \stackrel{def}{=} T(t) \otimes C(n, breast, f, [EPCAM]) \multimap T(t + d(f)) \otimes C(n, breast, f - 1, [EPCAM])$, $f \in ..$
$rl(br_{er}) \stackrel{def}{=} T(t) \otimes C(n, breast, f, [EPCAM]) \multimap T(t + d_r(f)) \otimes C(n, breast, f + 1, [EPCAM])$, $f \in ..$
$rl(br_{e}) \stackrel{def}{=} T(t) \otimes C(n, breast, f, [EPCAM]) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(br_{e}) \stackrel{def}{=} T(t) \otimes C(n, breast, f, [EPCAM]) \multimap T(t + d(f)) \otimes C(n, blood, f + 1, [EPCAM])$, $f \in ..$

In the blood
$rl(bl_{e}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCAM]) \multimap T(t + d(f)) \otimes C(n, blood, f - 1, [EPCAM])$, $f \in ..$
$rl(bl_{er}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCAM]) \multimap T(t + d_r(f)) \otimes C(n, blood, f + 1, [EPCAM])$, $f \in ..$
$rl(bl_{e}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCAM]) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(bl_{e}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCAM]) \multimap T(t + d(f)) \otimes C(n, blood, f + 2, [EPCAM, CD47])$, $f \in ..$
$rl(bl_{ec}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCAM, CD47]) \multimap T(t + d(f)) \otimes C(n, blood, f - 1, [EPCAM, CD47])$, $f \in ..$
$rl(bl_{ec}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCAM, CD47]) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(bl_{ec}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCAM, CD47]) \multimap T(t + d(f)) \otimes C(n, blood, f + 2, [EPCDCD])$, $f \in ..$
$rl(bl_{ec}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCAM, CD47]) \multimap T(t + d(f)) \otimes C(n, blood, f + 2, [EPCDME])$, $f \in ..$
$rl(bl_{ecc}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDCD]) \multimap T(t + d(f)) \otimes C(n, blood, f - 1, [EPCDCD])$, $f \in ..$
$rl(bl_{ecc}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDCD]) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(bl_{ecc}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDCD]) \multimap T(t + d(f)) \otimes C(n, blood, f + 2, [EPCDCDME])$, $f \in ..$
$rl(bl_{ecm}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDME]) \multimap T(t + d(f)) \otimes C(n, blood, f - 1, [EPCDME])$, $f \in ..$
$rl(bl_{ecm}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDME]) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(bl_{ecm}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDME]) \multimap T(t + d(f)) \otimes C(n, blood, f + 2, [EPCDCDME])$, $f \in ..$
$rl(bl_{eccm}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDCDME]) \multimap T(t + d(f)) \otimes C(n, blood, f - 1, [EPCDCDME])$, $f \in ..$
$rl(bl_{eccm}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDCDME]) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(bl_{eccm}) \stackrel{def}{=} T(t) \otimes C(n, blood, f, [EPCDCDME]) \multimap T(t + d) \otimes C(n, bone, f + 1, [EPCDCDME])$, $f \in ..$

In the bone
$rl(bo) \stackrel{def}{=} T(t) \otimes C(n, bone, f, [EPCDCDME]) \multimap T(t + d(f)) \otimes C(n, bone, f - 1, [EPCDCDME])$, $f \in ..$
$rl(bo) \stackrel{def}{=} T(t) \otimes C(n, bone, f, [EPCDCDME]) \multimap T(t + d(f)) \otimes A(n)$, $f \in ..$
$rl(bo) \stackrel{def}{=} T(t) \otimes C(n, bone, f, [EPCDCDME]) \multimap T(t + d(f)) \otimes C(n, bone, f + 1, [EPCDCDME, seeded])$, $f \in ..$

Fig. 2. Complete set of rules. Variables $t$ and $n$ are universally quantified. EPCDCDME, EPCDCD and EPCDME are shorthand, respectively, for the lists of mutations [EPCAM, CD47, CD44, MET], [EPCAM, CD47, CD44] and [EPCAM, CD47, MET].

We note that our rules are asynchronous: only one rule can be fired at a time. As in Biocham, we choose an asynchronous semantics, which enables a more refined description of the evolution of biological systems than the synchronous approach. Finally, we note that the delays depend on the fitness and more accurate values can be found using data. DNA mutational processes have been successfully modeled as compound Poisson processes (see for instance [31]). Delays could be seen as event waiting times, which are well-known measures for Poisson processes (see for instance [47]). In our model, delays are (uninterpreted) logical constants that can be later tuned when experimental results are available. The proofs presented here remain the same regardless of such values.

We present here some properties of our model, aiming at testing our rules, but also at testing some hypotheses of our model, as these are recent proposals in the literature. The reader may find more properties formalised in [29]. We shall detail some of the proofs here. The others can be found in the proof scripts and the documentation of our Coq formalisation (http://subsell.logic.at/bio-CTC/).

The properties we have proven are of three types:

1. Reachability properties. The goal here is to prove the existence of a “trace” of execution in the model.
2. The existence of cycles. This type of property can be considered a property of reachability, although the proofs are often more difficult in other approaches; this is not the case here.
3. “Must” properties. Unlike the previous two cases, this is about proving, not that a certain behavior can take place, but that it must: may vs must properties. “Must” properties require inductive reasoning at the meta level.
A Reachability Property: an extravasating CTC

A first property of interest describes an extravasating CTC: in our case a CTC that has reached the bone. Let us consider the following property: “is it possible for a CTC, with mutations [EPCAM, CD47] and fitness 3, to become an extravasating CTC with fitness 8? What is the time delay for such a transition?” Recall that a CTC in the blood is a cell $C(n, blood, f, m)$ while an extravasating CTC, in the bone, is a cell of the shape $C(n, bone, f, [EPCAM, CD47, CD44, MET])$. Our property is formalised as follows:
Proposition 1 (Reachability).
The following sequent is provable:

$$\mathit{system}; \cdot \vdash \forall n, t.\ T(t) \otimes C(n, blood, 3, [EPCAM, CD47]) \multimap \exists d.\ T(t + d) \otimes C(n, bone, 8, [EPCAM, CD47, CD44, MET])$$

In our Coq formalisation, we have implemented several tactics (e.g., solveF and applyRule used below) to automate the process of proving properties and make the resulting scripts compact and clear. This should ease the testing/proving of new hypotheses in our model. For instance, the proof of the previous property is as follows (F below is the formula in Property 1):

    Lemma Property1 : forall n t , |− System ; F .
    Proof with solveF . (* solves the "trivial" goals in a focused proof *)
    intros . (* introducing the quantified variables n and t *)
    applyRule ( blec2 (* application of macro rules -- corollary 2-- *)
    applyRule ( blecc2
    applyRule ( bleccm2
    eapply tri_dec1 ... (* decision rule, focusing on the goal *)
    eapply tri_tensor ... (* tensor *)
    eapply tri_ex with ( t := ( d72 s + ( d52 s + ( d42 s + ( Cte t )) ... (* existential quantifier *)
    eapply Init1 ... (* initial rule *)
    eapply Init1 ... (* initial rule *)
    Qed .

As briefly said in the introduction, our object systems are biological systems, our specification/modeling logic is linear logic, while our meta-logic, used to write proofs, is CCind, on which the Coq proof assistant is based.

The reader may compare the steps in the script above with the proof (by hand) of Property 1 in [29] (Appendix).
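For intuition only, the existence of such a trace can also be explored by a plain search over cell descriptions. The Python sketch below is an illustration under stated assumptions, not the Coq proof: it encodes the blood rules for cells carrying CD47 as read from Fig. 2 (passenger mutation, driver mutation, move to the bone), omits apoptosis because it cannot lead to the goal, and assumes fitness bounds `F_MIN` and `F_MAX`. The names `successors` and `find_trace` and the rule labels are hypothetical.

```python
from collections import deque

ALL4 = frozenset({"EPCAM", "CD47", "CD44", "MET"})
F_MIN, F_MAX = 0, 12  # ASSUMPTION: fitness bounds, used only as guards

def successors(state):
    """Yield (rule label, next state) for a blood cell carrying CD47."""
    comp, f, muts = state
    if comp != "blood" or "CD47" not in muts:
        return
    if f > F_MIN:                                   # passenger mutation
        yield "passenger", ("blood", f - 1, muts)
    for gene in ("CD44", "MET"):                    # driver mutation
        if gene not in muts and f + 2 <= F_MAX:
            yield f"driver:{gene}", ("blood", f + 2, muts | {gene})
    if muts == ALL4 and f + 1 <= F_MAX:             # extravasation
        yield "to_bone", ("bone", f + 1, muts)

def find_trace(start, goal):
    """Breadth-first search; returns a shortest list of rule labels or None."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, trace = queue.popleft()
        if state == goal:
            return trace
        for rule, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [rule]))
    return None

start = ("blood", 3, frozenset({"EPCAM", "CD47"}))
goal = ("bone", 8, ALL4)
print(find_trace(start, goal))
```

A shortest trace fires two driver-mutation rules and then the rule moving the cell to the bone (3 + 2 + 2 + 1 = 8), which matches the three applyRule steps of the script; the witness for the existential delay is then the sum of the three corresponding symbolic delays.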
Existence of Cycles
Rules for passenger mutations decrease the fitness of the cell by one, while rules for DNA repair increase the fitness. Hence, we may observe loops and oscillations in our model. This can be exemplified in the following property: “a cell in the breast, with mutation [EPCAM], might have its fitness oscillating from 1 to 2 and back.”
Proposition 2 (Cycle).
The following sequents are provable:

$$\mathit{system}; \cdot \vdash \forall n, t.\ T(t) \otimes C(n, breast, 1, [EPCAM]) \multimap \exists d.\ T(t + d) \otimes C(n, breast, 2, [EPCAM])$$

and

$$\mathit{system}; \cdot \vdash \forall n, t.\ T(t) \otimes C(n, breast, 2, [EPCAM]) \multimap \exists d.\ T(t + d) \otimes C(n, breast, 1, [EPCAM])$$

A “Must” Property
In a first experiment on using LL for biology on the computer [52], we defined the set of biological rules as an inductive type in CIC, and proved some of their properties by induction on the set of fireable rules. Here, we choose a different approach. We have defined the biological rules by formulas in LL, and we use focusing, along with adequacy, to look for the fireable rules at a given state. Properties of type “must”, whose proofs need meta-reasoning, will be formalised at the level of derivations, as illustrated here.

The following property states one of the key properties of our model: “any cell in the blood, with mutations including CD47, has four possible evolutions:

1. acquiring passenger mutations: its fitness decreases by one and the driver mutations remain unchanged;
2. going to apoptosis;
3. acquiring a driver mutation: its fitness increases by two;
4. moving to the bone: its fitness increases by one and the driver mutations ([EPCAM, CD47, CD44, MET]) remain unchanged.”
Proposition 3 (Must). Let $\Delta$ be a multiset of atoms of the form $C(\cdot)$. Then, in any derivation of the form

$$\frac{\mathit{system}; \Delta, T(t + t_d), St \vdash G}{\mathit{system}; \Delta, T(t), C(n, blood, f, m) \vdash G}\ rl(\cdot)$$

with $m$ containing CD47, it must be the case that

1. either $St = C(n, blood, f - 1, m)$,
2. or $St = A(n)$,
3. or $St = C(n, blood, f + 2, m')$ with $m'$ being $m$ plus an additional mutation,
4. or $St = C(n, bone, f + 1, m)$ with $m = [EPCAM, CD47, CD44, MET]$.

In our Coq formalisation, the above property can be discharged with a few lines of code:
    Lemma Property4 : forall n t c lm , F .
    Proof with solveF .
    intros H . (* the first sequent is assumed to be provable *)
    apply FocusOnlyTheory in H ; auto . (* The proof H must start by focusing on one of the rules *)
    destruct H as [ R ] ; destruct H .
    repeat ( first [ CaseRule | DecomposeRule ; FindUnification | SolveGoal ]) .
    Qed .

The FocusOnlyTheory lemma says that the proof of the sequent must start by focusing on one of the formulas in System. The destruct tactic simplifies the hypotheses after the use of lemma FocusOnlyTheory. The interesting part is the last line of the script. The CaseRule tactic tests each of the rules of the system. Then, DecomposeRule; FindUnification decomposes (positive-negative phase) the application of the rule. Finally, SolveGoal proves the desired goal after the application of the rule. This is a very general scheme, where we do case analysis on all possible rules. Some of them cannot be fired in the current state and then the proof follows by contradiction. In the rest of the cases, the SolveGoal tactic is able to conclude the goal.
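The case analysis performed by this script can be mimicked, outside of Coq and without any claim of proof, by enumerating a rule table and classifying every successor. The Python sketch below is a rough analogue under assumptions: the table `RULES` is a hypothetical, simplified rendering of some rules of Fig. 2 (one breast rule is included only to show a rule that is skipped), fitness bounds are ignored, and the fitness range passed to `check_must` is an assumed default.

```python
from itertools import product

BASE = frozenset({"EPCAM", "CD47"})
ALL4 = BASE | {"CD44", "MET"}

# (name, source compartment, guard on mutations, effect on (fitness, mutations))
RULES = [
    ("breast_passenger", "breast", lambda m: True, lambda f, m: ("breast", f - 1, m)),
    ("blood_passenger", "blood", lambda m: True, lambda f, m: ("blood", f - 1, m)),
    ("blood_apoptosis", "blood", lambda m: True, lambda f, m: ("apoptosis", None, None)),
    ("blood_CD47", "blood", lambda m: "CD47" not in m, lambda f, m: ("blood", f + 2, m | {"CD47"})),
    ("blood_CD44", "blood", lambda m: "CD47" in m and "CD44" not in m,
     lambda f, m: ("blood", f + 2, m | {"CD44"})),
    ("blood_MET", "blood", lambda m: "CD47" in m and "MET" not in m,
     lambda f, m: ("blood", f + 2, m | {"MET"})),
    ("blood_to_bone", "blood", lambda m: m == ALL4, lambda f, m: ("bone", f + 1, m)),
]

def classify(f, m, nxt):
    """Return which of the four cases of Proposition 3 `nxt` falls in, or None."""
    comp, f2, m2 = nxt
    if comp == "apoptosis":
        return 2
    if comp == "blood" and f2 == f - 1 and m2 == m:
        return 1
    if comp == "blood" and f2 == f + 2 and m < m2 and len(m2) == len(m) + 1:
        return 3
    if comp == "bone" and f2 == f + 1 and m2 == m == ALL4:
        return 4
    return None

def check_must(fitness_values=range(0, 13)):
    """Try every rule on every blood cell with CD47; return the cases observed."""
    seen = set()
    blood_sets = [BASE, BASE | {"CD44"}, BASE | {"MET"}, ALL4]
    for f, m in product(fitness_values, blood_sets):
        for name, comp, guard, effect in RULES:
            if comp != "blood" or not guard(m):
                continue  # the rule cannot be fired in this state
            case = classify(f, m, effect(f, m))
            assert case is not None, (name, f, m)
            seen.add(case)
    return seen

print(check_must())
```

Rules whose guard fails play the role of the cases closed by contradiction, while the classification of each successor corresponds to the goals closed by SolveGoal. Unlike the Coq proof, this check only covers the finitely many fitness values that are enumerated.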
Here we have proposed first steps in translating logic into biomedicine. In ourfirst experiment, our goal was to study cancer progression, aiming at a bet-ter understanding of it, and, in the long term, help in finding, and testing,new targeted drugs, a priori much more efficient than most of the drugs usedso far. This chapter describes the use of Linear Logic in modeling the multi-compartment role of driver mutations in breast cancer. This work is innovativebut also proof-of-principle. It can clearly be generalised to other cancer typeswhere driver mutations are known. It also makes evident the capability of thislogical approach to integrate different types of data as basic as mutations in sin-gle cell genetics and contribute towards a diagnosis with higher interpretabilitythan many currently fashionable machine learning methods such as deep learn-ing. Note however that building a decision support system for cancer/diseasediagnosis and therapy prognosis would require both automatic proof search andvarious additional information such as the size of the tumour that we do notaddress here.Also note that, although all the properties considered so far only deal withthe evolution of one single cell, our approach allows us to consider a state withany cells. We believe that the chapter and the rich sections in the online supple-mentary material (http://subsell.logic.at/bio-CTC/) will become an importantresource for other similar studies. For example we believe work on mathematicalmodels such as [5], which include survival data and quantitative results, is com-plementary to our work. Logic allows the modeling of the evolution of cells acrossscales and compartments, while ODEs require parameter estimation (qualitativevs quantitative).While temporal logics have been very successful in practice with efficient,fully automatic, model checking tools, these logics do not enjoy standard prooftheory. 
In contrast, Linear Logic has a very traditional proof theoretic pedigree: it is naturally presented as a sequent calculus that enjoys cut-elimination and focusing. A further advantage of our approach with respect to model checking is that it provides a unified framework to encode both transition rules and (both statements and proofs of) temporal properties. Observe also that we do not need to build the set of states of the transition system (neither completely nor "on-the-fly"). Last but not least, as we shall see in the next section devoted to neuroscience, the computational logic approach enables proofs of more general properties, compared to model checking, thanks to the expressiveness of the logic language (here ILL and/or CIC) and to the powerful associated proof tools (such as the Coq system chosen here). We view model checking as a useful first step before proofs: testing the model before trying to prove properties of it. The interested reader can find a detailed comparison of the approaches in [52]. See also [30] for an adequate encoding of temporal logic in Linear Logic.

We believe that different modeling approaches should be linked, where relevant. For our part, we plan, in the near future, not to develop, a priori, our own interface to a "biology dedicated LL", but to propose translations/compilations from models written in rule-based languages such as Biocham and Kappa (presented in other chapters of the present book) to corresponding models written in LL. This compilation seems feasible, at least without taking into account reaction rates, or by translating them into delays where appropriate. This would provide the biologist with a richer environment in which to model the systems under study and carry out initial static, and possibly causal, analyses of the models in Biocham or Kappa, before proving these (or other) properties of interest in logic.
In recent years, the exploration of neuronal micro-circuits has become an emerging question in Neuroscience, notably with respect to their integration with neurocomputing approaches [53].
Archetypes [25] are specific graphs of a few neurons with biologically relevant structure and behavior. These archetypes correspond to elementary and fundamental elements of neuronal information processing. Several archetypes can be coupled to constitute the elementary building blocks of bigger neuronal circuits in charge of specific functions. For instance, locomotive motion and other rhythmic behaviors are controlled by well-known specific neuronal circuits called Central Pattern Generators (CPG) [54]. These CPG circuits have an oscillating output when the neurons composing them have some specific parameters and when these neurons are connected in a specific way.

To study these circuits, it is relevant to investigate the dynamic behavior of all the possible archetypes of 2, 3 or more neurons, up to and including the consideration of archetypes of archetypes. One of the open questions is: are the properties of these archetypes of archetypes simply the conjunction of the individual constituent archetypes' properties or something more or less? In other words, does the resulting network satisfy exactly the properties that were already satisfied by the constituent archetypes or are there new properties (or fewer properties) that are satisfied by this combination? Another crucial question is: can we understand the computational properties of large groups of neurons simply as the coupling of the properties of individual archetypes, as it is for the alphabet and words, or is there something more again?

The first attempts in the literature to apply formal methods from computer science to model and verify temporal properties of fundamental neuronal archetypes in terms of neuronal information processing can be found in [25, 27]. In this work, the authors take advantage of the synchronous programming language Lustre to implement six neuronal archetypes and their coupling and to formalize their expected properties.
Then, they exploit some associated model checkers to automatically validate these behaviors. However, model checkers prove properties for some given parameter intervals, and do not handle inputs of arbitrary length. In the work described in this section, we use Coq to prove four important properties of neurons and archetypes. This work extends [7], primarily by considering archetypes in fuller detail. One of the main advantages of using Coq for the work in this section is the generality of its proofs. Using such a system, we can prove properties about arbitrary values of parameters, such as any length of time, any input sequence, or any number of neurons (parametric verification). We use Coq's general facilities for structural induction and case analysis, as well as Coq's standard libraries that help in reasoning about rational numbers and functions on them.

As far as neuronal networks are concerned, in the literature their modeling is classified into three generations [50, 63]. First generation models, represented by the McCulloch-Pitts model [55], handle discrete inputs and outputs and their computational units consist of a set of logic gates with a threshold activation function. Second generation models, whose most representative example is the multi-layer perceptron [19], exploit real valued activation functions. These networks, whose real-valued outputs represent neuron firing rates, are widely used in the domain of artificial intelligence. Third generation networks, also called spiking neural networks [63], are characterized by the relevance of time aspects. Precise spike firing times are taken into account. Furthermore, they consider not only current input spikes but also past ones (temporal summation). In [45], spiking neural networks are classified with respect to their biophysical plausibility, that is, the number of behaviors (i.e., typical responses to an input pattern) they can reproduce.
Among these models, the Hodgkin-Huxley model [41] is the one able to reproduce most behaviors. However, its simulation process is very expensive even for a few neurons and for a small amount of time. In this work, we choose to use the leaky integrate and fire (LI&F) model [49], a computationally efficient approximation of a single-compartment model, which proves to be amenable to formal verification. Notice that discrete modeling is well suited for this task because neuronal activity, as with any other recorded physical event, is only recorded at discrete intervals (the recording sampling rate is usually set at a significantly higher resolution than the rate of the system being recorded, so that there is no loss of information). We describe neural networks as weighted directed graphs whose nodes represent neurons and whose edges stand for synaptic connections. At each time unit, all the neurons compute their membrane potential accounting not only for the current input signals, but also for the signals received during a given temporal window. Each neuron can emit a spike when its membrane potential exceeds a given threshold.

In addition to the papers already cited in this section ([7, 25, 27]), in the literature there are a few attempts at giving formal models for spiking neural networks. In [2], a mapping of spiking neural P systems into timed automata is proposed. In that work, the dynamics of neurons are expressed in terms of evolution rules and durations are given in terms of the number of rules applied. Timed automata are also exploited in [26] to model LI&F networks. This modeling is substantially different from the one proposed in [2] because an explicit notion of duration of activities is given. Such a model is formally validated against some crucial properties defined as temporal logic formulas and is then exploited to find an assignment for the synaptic weights of neural networks so that they can reproduce a given behavior.

The rest of this section is organized as follows.
In Section 3.2 we present a discrete version of the LI&F model. Section 3.3 is devoted to the description of the neuronal archetypes we consider. In Section 3.4, we introduce features of Coq important for this work. In Section 3.5, we present the Coq model for neural networks, which includes definitions of neurons, operations on them, and the combination of neurons into archetypes. In Section 3.6, we present and discuss four important properties, starting with properties of single neurons and the relation between the input and output, and moving toward more complex properties that express their interactions and behaviors as a system. Proofs can be found in [7]. Finally, in Section 3.7 we conclude and discuss future work. The accompanying Coq code can be found at: .

In this section, we introduce a discrete (Boolean) version of LI&F modeling. We first present the basic biological knowledge associated with the phenomena we model, and then we provide details of the model.

When a neuron receives a signal at one of its synaptic connections, it produces an excitatory or an inhibitory post-synaptic potential (PSP) caused by the opening of selective ion channels according to the nature of the post-synaptic receptor. An activation leads to an inflow of cations in the cell; an inhibition leads to an inflow of anions in the cell. This local flow of ions influences the electrochemical potential difference on both sides of the plasma membrane and locally depolarizes (excitation) or hyper-polarizes (inhibition) the neuron membrane. Such polarization is transmitted, step by step, to the rest of the membrane and thus influences the potential difference on both sides of the membrane at the whole cell level. The potential difference is called membrane potential.
In general, a PSP alone is not enough for the membrane potential of the receiving neuron to exceed its depolarization threshold, and thus to emit an action potential at its axon to transmit the signal to other neurons.

However, two phenomena allow the cell to exceed its depolarization threshold: the spatial summation and the temporal summation [65]. Spatial summation is the sum of all the different PSPs produced at a given time at different areas of the membrane. Temporal summation is the sum of all the different PSPs produced between two instants that are considered "close enough." These summations are possible thanks to a property of the membrane, which allows it to behave like a capacitor and locally store some electrical charge (capacitive property).

The neuron membrane, due to the presence of leakage channels, is not a perfect conductor and can be compared to a resistor inside an electrical circuit. Thus, the range of the PSPs decreases with time and space (resistivity of the membrane).

A LI&F neuron network is represented with a weighted directed graph where each node stands for a neuron soma and each edge stands for a synaptic connection between two neurons. The weight associated with each edge indicates the strength and nature of the connection to the receiving neuron: a positive (resp. negative) weight is an activation (resp. inhibition).

The depolarization threshold of each neuron is modeled via the firing threshold τ, which is a numerical value; it is the value that the neuron membrane potential p must exceed at a given time t in order to emit an action potential, or spike, at the time t + 1.

The membrane resistivity is modeled as a numerical coefficient called the leak factor r, which allows the range of a PSP to decrease over time.

Spatial summation is implicitly taken into account. In our model, a neuron u is connected to another neuron v via a single synaptic connection of weight w_uv. This connection represents the entirety of the shared connections between u and v.
Spatial summation is also more explicitly taken into account with the fact that, for each instant, the neuron sums each signal received from each input neuron. To take the temporal summation into account, we add both the past and present PSPs when computing the membrane potential. As the value of a PSP gets older over time, it has less impact on the calculation of the current membrane potential. This decrease is taken into account by repeatedly decreasing the value by r. As a consequence, old PSPs can be neglected (their value is very small) and the effect of past PSPs is restricted to a given time interval. In particular, we introduce a sliding integration window of length σ for each neuron. This allows us to obtain finite sets of states, and thus to easily apply formal techniques.

More formally, the following definition can be given:

Definition 1 (Boolean Spiking Integrate and Fire Neural Network).
A spiking Boolean integrate and fire neural network is a tuple $(V, E, w)$, where:
– $V$ is a set of Boolean spiking integrate and fire neurons,
– $E \subseteq V \times V$ are synapses,
– $w : E \to \mathbb{Q} \cap [-1, 1]$ is the synapse weight function associating to each synapse $(u, v)$ a weight $w_{uv}$.

A spiking Boolean integrate and fire neuron is a tuple $(\tau, r, p, y)$, where:
– $\tau \in \mathbb{N}$ is the firing threshold,
– $r \in \mathbb{Q} \cap [0, 1]$ is the leak factor,
– $p : \mathbb{N} \to \mathbb{Q}^{+}_{0}$ is the [membrane] potential function defined as
$$p(t) = \begin{cases} \sum_{i=1}^{m} w_i \cdot x_i(t), & \text{if } p(t-1) \geq \tau \\ \sum_{i=1}^{m} w_i \cdot x_i(t) + r \cdot p(t-1), & \text{otherwise} \end{cases}$$
where $p(0) = 0$, $m$ is the number of inputs of the neuron, $w_i$ is the weight of the synapse connecting the $i$th input neuron to the current neuron, and $x_i(t) \in \{0, 1\}$ is the signal received at the time $t$ by the neuron through its $i$th input synapse (observe that, after the potential exceeds its threshold, it is reset to 0),
– $y : \mathbb{N} \to \{0, 1\}$ is the neuron output function, defined as
$$y(t) = \begin{cases} 1, & \text{if } p(t-1) \geq \tau \\ 0, & \text{otherwise.} \end{cases}$$

The development of the recursive equation for the membrane potential function and the introduction of a sliding integration window of length $\sigma$ leads to the following equation for $p$ (when $p(t-1) < \tau$):
$$p(t) = \sum_{e=0}^{\sigma} r^{e} \sum_{i=1}^{m} w_i \cdot x_i(t-e),$$
where $e$ represents the time elapsed up until the current time.

The six basic archetypes we study are as follows (see Fig. 3). These archetypes can be coupled to constitute the elementary building blocks of bigger neuronal circuits.
– Simple series is a sequence of neurons where each element of the chain receives as input the output of the preceding one.
– Series with multiple outputs is a series where, at each time unit, we are interested in knowing the outputs of all the neurons (i.e., the output of all the neurons together constitutes the output of the archetype as a whole).
– Parallel composition is a set of neurons receiving as input the output of a given neuron.
– Negative loop is a loop consisting of two neurons: the first neuron activates the second one while the latter inhibits the former.
– Inhibition of a behavior consists of two neurons, the first one inhibiting the second one.
– Contralateral inhibition consists of two or more neurons, each one inhibiting the other ones.

Fig. 3. The basic neuronal archetypes: (a) simple series, (b) series with multiple outputs, (c) parallel composition, (d) negative loop, (e) inhibition of a behavior, (f) contralateral inhibition.
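As a worked instance of the potential function in Definition 1, consider a single-input neuron receiving the constant input $x_1(t) = 1$. The parameter values $w_1 = 3/4$, $r = 1/2$ and $\tau = 1$ are our own choice for illustration and do not come from the original text.

```latex
\begin{align*}
p(0) &= 0 \\
p(1) &= w_1 x_1(1) + r\,p(0) = \tfrac{3}{4} + \tfrac{1}{2}\cdot 0 = \tfrac{3}{4} < \tau \\
p(2) &= w_1 x_1(2) + r\,p(1) = \tfrac{3}{4} + \tfrac{1}{2}\cdot\tfrac{3}{4} = \tfrac{9}{8} \geq \tau \\
p(3) &= w_1 x_1(3) = \tfrac{3}{4}
        \quad \text{(no leak term, since } p(2) \geq \tau \text{)}
\end{align*}
```

With these values the potential reaches the threshold at every second step and is reset in between, which is the kind of behavior that the leak factor and the reset rule are meant to capture.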
In this section, we present the basic elements of Coq that we use to represent our model. We encode the model and properties directly in the logic implemented in Coq, which is the Calculus of Inductive Constructions. We use several basic types, data structures, and properties from Coq's libraries, and we do not add any axioms. Expressions in Coq include a functional programming language. It is a typed language, which means that every Coq expression has a type. For instance, X : nat expresses that variable X is in the domain of natural numbers. The types used in our model include nat, Q, and list, which denote natural numbers, rational numbers, and lists of elements, respectively. These types are found in Coq's standard libraries. All elements in a list have the same type. For instance, L : list nat means that L is a list of natural numbers. A list can be empty, which is written [] or nil in Coq. Functions are a basic element of any functional programming language. The general form of a function in Coq is shown below.

    Definition / Fixpoint Function Name (Input 1 : Type of Input 1) . . . (Input n : Type of Input n) : Output Type :=
      Body of the function.

Definition and Fixpoint are Coq keywords for defining non-recursive and recursive functions, respectively. After either one of these keywords comes the name that a programmer gives to the function. Following the function name are the input arguments and their types. If two or more inputs have the same type, they can be grouped. For example, (X Y Z : Q) means all variables X, Y, and Z are rational numbers. Following the inputs is a colon, followed by the output type of the function. Finally, the body of the function is a Coq expression representing a program, followed by a dot.

Pattern matching is a useful feature in Coq, used for case analysis. This feature is used, for instance, for distinguishing between base cases and recursive cases in recursive functions. For example, it can distinguish between empty and nonempty lists. The pattern for a non-empty list shows the first element of the list, which is called the head, followed by a double colon, followed by the rest of the list, which is called the tail. The tail of a list itself is a list of elements of the same type as the type of the head. For example, let L be the list (6::3::8::nil) containing three natural numbers. An alternate notation for Coq lists allows L to be written as [6;3;8], where the head is 6 and the tail is [3;8]. Thus, the general pattern for non-empty lists often used in Coq recursive functions has the form (h::t). Another example of a Coq data type is the natural numbers. A natural number is either 0 or the successor of another natural number, written (S n), where n is a natural number. For example, 1 is represented as (S 0), 2 as (S (S 0)), etc. In the code below, some patterns for lists and natural numbers are shown using Coq's match . . . with . . . end pattern matching construct.
    match X with
    | 0 ⇒ calculate something when X = 0
    | S n ⇒ calculate something when X is the successor of n
    end

    match L with
    | [] ⇒ calculate something when L is an empty list
    | h :: t ⇒ calculate something when L has head h followed by tail t
    end

In addition to the data types that are defined in Coq's libraries, new data types can be defined. One way to do so is using records. Records can have different fields with different types. For example, we can define a record that has 3 fields Fieldnat, FieldQ, and ListField, which have types natural number, rational number, and list of natural numbers, respectively. The code below shows the Coq syntax for the definition of this record with one additional field called CR.

    Record Sample_Record := MakeSample {
      Fieldnat : nat;
      FieldQ : Q;
      ListField : list nat;
      CR : Fieldnat > 7 }.

    S : Sample_Record

Fields in Coq can represent conditions on other fields. For example, field CR in the above code is a condition on the Fieldnat field stating that it must be greater than 7. After defining a record type, it is a type like any other type, and so, for example, we can have variables with the new record type. Variable S, shown with type Sample_Record, is an example.
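The constructs introduced in this section can be tried out with the following small sketch. The function sum_list, the lemma eight_gt_seven and the value S_example are names we introduce for illustration only; Sample_Record is the record from the text, with an explicit %nat annotation so that the constraint is read over natural numbers.

```coq
Require Import QArith List.
Import ListNotations.

(* Recursive function using pattern matching on a list: sums its elements. *)
Fixpoint sum_list (L : list nat) : nat :=
  match L with
  | [] => 0%nat                      (* base case: empty list *)
  | h :: t => (h + sum_list t)%nat   (* recursive case: head h, tail t *)
  end.

(* The list 6::3::8::nil from the text, in bracket notation. *)
Example sum_list_ex : sum_list [6; 3; 8]%nat = 17%nat.
Proof. reflexivity. Qed.

(* The record from the text; the field CR is a constraint on Fieldnat. *)
Record Sample_Record := MakeSample {
  Fieldnat : nat;
  FieldQ : Q;
  ListField : list nat;
  CR : (Fieldnat > 7)%nat
}.

(* A proof of the constraint for the value 8 (illustrative helper). *)
Lemma eight_gt_seven : (8 > 7)%nat.
Proof. unfold gt, lt. apply le_n. Qed.

(* Building an inhabitant requires values for the data fields and a proof
   for the constraint field. *)
Definition S_example : Sample_Record :=
  MakeSample 8 (1 # 2) [6; 3; 8]%nat eight_gt_seven.

Example fieldnat_ex : Fieldnat S_example = 8%nat.
Proof. reflexivity. Qed.
```

Note that a record value cannot be built without supplying the proof for CR, which is what makes such fields usable as invariants.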
We illustrate our encoding of neural networks in Coq by beginning with the code below.

    Record Neuron := MakeNeuron {
      Output : list nat;
      Weights : list Q;
      Leak_Factor : Q;
      Tau : Q;
      Current : Q;
      Output_Bin : Bin_List Output;
      LeakRange : Qle_bool 0 Leak_Factor = true ∧ Qle_bool Leak_Factor 1 = true;
      PosTau : Qlt_bool 0 Tau = true;
      WRange : WeightInRange Weights = true }.

    Fixpoint potential (Weights : list Q) (Inputs : list nat) : Q :=
      match Weights, Inputs with
      | nil, _ ⇒ 0
      | _, nil ⇒ 0
      | h1::t1, h2::t2 ⇒ if (beq_nat h2 0%nat)
                          then (potential t1 t2)
                          else (potential t1 t2) + h1
      end.

We use Coq's record structure to define a neuron. This record includes five fields with their types, and four fields which represent constraints that the first five fields must satisfy according to the LI&F model mentioned in Section 3.2. The types include natural numbers, rational numbers, and lists. In particular, a neuron's output (Output) is represented as a list of natural numbers, with one entry for each time step. The weights attached to the inputs of the neuron (Weights) are stored in a list of rational numbers, one for each input in some designated order. The leak factor (Leak_Factor), the firing threshold (Tau), and the most recent neuron membrane potential (Current) are rational numbers. With respect to the four conditions, for example, consider PosTau, which expresses that Tau must be positive.
Qle_bool and other arithmetic operators come from Coq's rational number library. The other three state, respectively, that Output contains only 0s and 1s (it is a binary list), Leak_Factor is between 0 and 1 inclusive, and each input weight is in the range of [-1, 1]. We omit the definitions of Bin_List and WeightInRange used in these statements.

Given a neuron N, we write (Output N) to denote its first field, and similarly for the others. To create a new neuron with values O, W, L, T, and C of the appropriate types, and proofs P1, . . . , P4 of the four constraints, we write (MakeNeuron O W L T C P1 P2 P3 P4).

The next definition in the above code implements the weighted sum of the inputs of a neuron, which is an important part of the calculation of the potential function (see Definition 1). In this recursive function, there are two arguments: Weights representing w_1, . . . , w_m and Inputs representing x_1, . . . , x_m. The function returns an element of type Q. Its definition uses pattern matching on both inputs simultaneously. The body of the definition uses Booleans, the if statement, and the equality operator on natural numbers (beq_nat), all from Coq's standard library. Natural numbers, such as 0%nat above, are marked with their type to distinguish them from rational numbers, whose types are omitted. Although we always call the potential function with two lists of equal length, Coq requires functions to be total; when two lists do not have equal length, we return a "default" value of 0. Also, when we call this function, Inputs, which is the second argument of the function, will always be a binary list (it contains only the natural numbers 0 and 1). Thus, when the head of the list h2 is 0, we do not need to add anything to the final sum because anything multiplied by 0 is 0. In this case, we just call the function recursively on the remaining weights and inputs t1 and t2, respectively. On the other hand, when h2 is 1, we need to add h1, the head of Weights, to the final sum, which again is the recursive call on t1 and t2.

The following code shows the NextPotential function, which implements p(t) from Definition 1.

    Definition NextPotential (N : Neuron) (Inputs : list nat) : Q :=
      if (Qle_bool (Tau N) (Current N))
      then (potential (Weights N) Inputs)
      else (potential (Weights N) Inputs) + (Leak_Factor N) ∗ (Current N).

Recall that (Current N) is the most recent potential value of the neuron, which is p(t − 1) in Definition 1. (Qle_bool (Tau N) (Current N)) represents the condition τ ≤ p(t − 1), and (Leak_Factor N) ∗ (Current N) represents r · p(t − 1).

    Definition NextOutput (N : Neuron) (Inputs : list nat) : nat :=
      if (Qle_bool (Tau N) (NextPotential N Inputs))
      then 1%nat
      else 0%nat.

    Definition NextNeuron (N : Neuron) (Inputs : list nat) : Neuron :=
      MakeNeuron
        ((NextOutput N Inputs)::(Output N))
        (Weights N)
        (Leak_Factor N)
        (Tau N)
        (NextPotential N Inputs)
        (NextOutput_Bin_List N Inputs (Output_Bin N))
        (LeakRange N)
        (PosTau N)
        (WRange N).

The first definition computes the next output of the neuron, which is y(t) in Definition 1. Recall that (NextPotential N Inputs) computes p(t). Thus, the expression (Qle_bool (Tau N) (NextPotential N Inputs)) expresses the condition τ ≤ p(t).

In our model, the state of a neuron is represented by the Output and Current fields. The Output field of a neuron in the initial state is [0%nat], which denotes a list of length 1 containing only 0. The Current field represents the initial potential, which is set to 0. A neuron changes state by processing input. After processing a list of n inputs, the Output field will be a list of length n + 1 containing 0's and 1's, and the Current field will be set to the value of the potential after processing these n inputs. State change occurs by applying the NextNeuron function of the above code to a neuron and a list of inputs. As is typical in functional programming, we represent a neuron at its later state by creating a new record with the new values for Output and Current and other values directly copied over. We store the values in the Output field in reverse order, which simplifies proofs by induction over lists, which we use regularly in our Coq proofs. Thus, the most recent output of the neuron is at the head of the list. We can see this in the above code, where the new value of the output is ((NextOutput N Inputs)::(Output N)). The next output of the neuron is at the head, followed by the previous outputs. (NextPotential N Inputs) is the new value for (Current N). Recall that (Current N) is the most recent value of the potential of the neuron, or p(t − 1). To calculate p(t), the NextPotential function is called.

Following the new values for each field of the neuron, we have proofs of the four constraints. The first requires a lemma NextOutput_Bin_List (statement omitted) which allows us to prove that the new longer list is still a binary list. Proofs of the other three constraints are carried over exactly from the original neuron, since they are about components of the neuron that do not change.

To reinitialize a neuron to the initial state, the ResetNeuron function is used.
    Definition ResetNeuron (N : Neuron) : Neuron :=
      MakeNeuron
        ([0%nat])
        (Weights N)
        (Leak_Factor N)
        (Tau N)
        (0)
        (Reset_Output)
        (LeakRange N)
        (PosTau N)
        (WRange N).

This function takes any Neuron as input, and returns a new one, with the Output, Current, and Output_Bin fields reset, while keeping the others. The Reset_Output property is a simple lemma stating that [0%nat] satisfies the Bin_List property.

So far, we have discussed the encoding and processing of single neurons in isolation, which take in inputs and produce outputs. We next consider archetypes. In general, our approach is to encode the particular structure of each archetype as a Coq record. Using a record for each archetype facilitates stating and proving properties about them. Recall that archetypes are functional structures of neural networks. Defining them in this abstract way helps us to present their basic functions. To illustrate this approach, we now introduce the encoding of one archetype, the simple series in Figure 3(a). As shown in Figure 3(a), a simple series consists of a list of single input neurons. The first neuron receives the input of the archetype and sends its output to the second neuron. Starting from the second neuron, each neuron receives its input from the previous neuron and sends its output as the input of the next neuron in the series. The last neuron produces the output of the series. The NeuronSeries record type defined below represents this structure in Coq.
    Record NeuronSeries {Input : list nat} := MakeNeuronSeries {
      NeuronList : list Neuron;
      NSOutput : list nat;
      AllSingle : forall (N : Neuron),
        In N NeuronList → (beq_nat (length (Weights N)) 1%nat) = true;
      SeriesOutput : NSOutput = (SeriesNetworkOutput Input NeuronList); }.

Records can have input parameters, similar to functions in Coq, and here the list of inputs to the simple series is Input : list nat. Thus NeuronSeries is actually a function from a list of natural numbers to a record. Curly brackets around input arguments is Coq notation for implicit arguments, which are arguments that can be omitted from expressions as long as Coq can figure out the missing information. Its use here allows us to write more readable Coq code. NeuronList is a field in the record representing the list of neurons in the simple series. The first element in this list is the first neuron in the series, etc. NSOutput represents the list of outputs of the series. In other words, it is the output list of the last neuron in the series, which is also the last neuron of NeuronList. There are also two constraints for this archetype. AllSingle expresses that all neurons in the series are single input neurons. The functions In and length are defined in Coq's list library and define list membership and size of a list, respectively. SeriesOutput expresses that the output of the series is equal to the output of the function SeriesNetworkOutput. This function takes the input of the series and the list of neurons in the series and produces the output of the series. We leave out its definition and just note here that it expresses the details of the input/output connections between the elements of NeuronList, and in the degenerate case when NeuronList is empty, NSOutput is set to the input.
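To make the state-transition scheme for single neurons concrete, here is a minimal executable sketch. It is a simplification under stated assumptions, not the accompanying development: the four proof-carrying fields of Neuron are dropped so that no auxiliary lemmas are needed, Nat.eqb is used in place of beq_nat, and all names (SNeuron, spotential, snext, sreset, srun, n0) as well as the parameter values are ours.

```coq
Require Import QArith List.
Import ListNotations.
Open Scope Q_scope.

(* Simplified neuron: data fields only, no constraint proofs. *)
Record SNeuron := MakeSNeuron {
  SOutput  : list nat;  (* outputs, most recent first *)
  SWeights : list Q;
  SLeak    : Q;
  STau     : Q;
  SCurrent : Q          (* most recent potential, p(t-1) *)
}.

(* Weighted sum of binary inputs; returns 0 on length mismatch. *)
Fixpoint spotential (ws : list Q) (xs : list nat) : Q :=
  match ws, xs with
  | w :: ws', x :: xs' =>
      if Nat.eqb x 0%nat then spotential ws' xs' else spotential ws' xs' + w
  | _, _ => 0
  end.

(* p(t): the leak term is dropped when the previous potential reached tau. *)
Definition snext_potential (N : SNeuron) (xs : list nat) : Q :=
  if Qle_bool (STau N) (SCurrent N)
  then spotential (SWeights N) xs
  else spotential (SWeights N) xs + SLeak N * SCurrent N.

Definition snext_output (N : SNeuron) (xs : list nat) : nat :=
  if Qle_bool (STau N) (snext_potential N xs) then 1%nat else 0%nat.

(* One state change: new record, unchanged parameters copied over. *)
Definition snext (N : SNeuron) (xs : list nat) : SNeuron :=
  MakeSNeuron (snext_output N xs :: SOutput N)
              (SWeights N) (SLeak N) (STau N) (snext_potential N xs).

Definition sreset (N : SNeuron) : SNeuron :=
  MakeSNeuron [0%nat] (SWeights N) (SLeak N) (STau N) 0.

(* Process a sequence of input vectors, one vector per time step. *)
Fixpoint srun (N : SNeuron) (inputs : list (list nat)) : SNeuron :=
  match inputs with
  | [] => N
  | xs :: rest => srun (snext N xs) rest
  end.

(* Two-input neuron: weights 1/2 and 1/2, leak 1/2, threshold 1. *)
Definition n0 : SNeuron :=
  MakeSNeuron [0%nat] [1 # 2; 1 # 2] (1 # 2) 1 0.

(* Potentials are 1/2, 1/4, 9/8, 0, so only the third step fires. *)
Example srun_ex :
  rev (SOutput (srun (sreset n0) [ [1; 0]; [0; 0]; [1; 1]; [0; 0] ]%nat))
  = [0; 0; 0; 1; 0]%nat.
Proof. reflexivity. Qed.
```

The output is reversed with rev before comparison because, as in the model above, the most recent output is stored at the head of the list.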
As mentioned earlier, we state four basic properties of the LI&F model of neuronsin this section. All of them have been fully verified in Coq. We start in the nextsubsection with a property about a simple neuron, which has only one input.We refer to this neuron as a single-input neuron.In all of the statements of the properties, we omit the assumption that theinput sequence of the neuron is a binary list and contains only 0s and 1s. It is, ofcourse, included in the Coq code. We use several other conventions to enhancereadability when stating the property. For example, we state our property us-ing pretty-printed Coq syntax, with some abbreviations for our own definitions.For instance, we use mathematical fonts and conventions for Coq text, e.g., ( Output N ) is written Output ( N ), ( Tau N ) is written τ ( N ), ( Weights N ) is written w ( N ), ( Leak_Factor N ) is written r ( N ), and ( Current N ) is written p ( N ). In ad-dition, if w ( N ) is a list of the form [ w ; . . . ; w n ] for some n ≥
0, for i = 1 , . . . , n ,we often write w i ( N ) to denote w i . Also, we use notation and operators fromthe Coq standard library for lists. For instance, length and + are list operators;the former is for finding the number of elements in the list and the latter isthe notation we will use here for list concatenation. In addition, although for aneuron N , the list Output ( N ) is encoded in reverse order, in our Coq model,when presenting properties, we use forward order. The Delayer Effect for a Single-Input Neuron
The first property is calledthe delayer effect property. It concerns a single neuron, which has only one input.Recall that a neuron is in an inactive state initially, which means the output ofa neuron at time 0 is 0. When a neuron has only one input, and the weight ofthat input is greater than or equal to its activation threshold, then the neurontransfers the input sequence to the output without any change (except for a”delay” of length 1). For instance, if a single input neuron receives 0100110101as its input sequence, it will produce 00100110101 as output. Neurons that havethis property are mainly just transferring signals. Humans have some of thistype of neuron in their auditory system. This property is expressed as Theorem2.
Theorem 2. [Delayer effect for a single-input neuron] ∀ ( N : neuron )( input : list nat ) ,length ( w ( N )) = 1 ∧ w ( N ) ≥ τ ( N ) → Output ( N (cid:48) ) = [0] + input n the above statement, N (cid:48) denotes the neuron obtained by initializing N andthen processing the input (using ResetN euron and repeated applications of
N extN euron ). We use this convention in stating all of our properties. Note thatin Definition 1, p is a function of time. Time in our Coq model is encoded asthe position in the output list. If Output ( N ) has length t , then p ( N ) stores p ( t −
1) from Definition 1. If we then apply
N extN euron to N and the nextinput obtaining N (cid:48) , then Output ( N (cid:48) ) has length t + 1 and p ( N (cid:48) ) stores the value p ( t ) from Definition 1. The theorem is proved by induction on the length of theinput sequence. The Filter Effect for a Single Neuron
The next property we consider is also about single-input neurons. It is defined with respect to a given integer $n$, where $n$ is less than or equal to $\sigma$, the length of the integration window introduced in Section 3.2. When a neuron has only one input, and the weight of that input is less than its activation threshold, the neuron passes on the value 1 once as output for every $n$ consecutive occurrences of 1s in the input. For each such run of $n$ 1s in the input, all 1s are replaced by 0 except for the last one; the other 1s are "filtered out." The neuron thus "passes" only one signal out of $n$: it behaves as a $1/n$ filter. As a consequence, there are never two consecutive 1s in the output sequence. This consequence is called the filter effect. Most neurons in the human body have the filter effect because their input weight is less than their activation threshold. Normally, more than one input is needed to activate a human neuron. In biology, this property is often called the integrator effect.

Theorem 3. [Filter effect for a single-input neuron]
$$\forall (N : \mathit{neuron})\,(\mathit{input} : \mathit{list}\ \mathit{nat}),\quad \mathit{length}(w(N)) = 1 \,\wedge\, w_1(N) < \tau(N) \;\rightarrow\; 11 \notin \mathit{Output}(N')$$
Note that in the statement above, $11 \notin \mathit{Output}(N')$ means that there are no two consecutive 1s in the list $\mathit{Output}(N')$. This theorem is also proved by induction on the structure of the input list.

The Inhibitor Effect
The next property is an important one because it has the potential to help us detect inactive zones of the brain. Normally, a human neuron does not have negative weights for all of its inputs, but this property can occur when one or more positive-weight inputs are out of order because of some kind of disability. It is called the inhibitor effect because it is important for proving properties of archetype 3(e). We consider here the single-neuron case. When a neuron has only one input and the weight of that input is less than 0, the neuron is inactive, which means that for any input, the neuron cannot emit 1 as output; i.e., if a signal reaches this neuron, it will not pass through. As with the other properties, the input sequence has an arbitrary finite length. This property is expressed as follows.

Theorem 4. [Inhibitor effect]
$$\forall (N : \mathit{neuron})\,(\mathit{input} : \mathit{list}\ \mathit{nat}),\quad \mathit{length}(w(N)) = 1 \,\wedge\, w_1(N) < 0 \;\rightarrow\; 1 \notin \mathit{Output}(N')$$
In the statement above, $1 \notin \mathit{Output}(N')$ means that there is no 1 in the list $\mathit{Output}(N')$. This property is also proved by induction on the structure of the input. The inhibitor effect expressed in Theorem 4 has a more general version, which we plan to prove as future work. For a neuron with multiple inputs, when all input weights are less than or equal to 0, the neuron is inactive and cannot pass any signal. Thus, in addition to proving this property for arbitrary input length, we intend to generalize it to an arbitrary number of neurons.
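The three single-neuron properties above (Theorems 2 to 4) can be exercised on concrete inputs with a small simulation. The sketch below is written in Python rather than Coq, and it only tests particular input sequences instead of proving the statements for all inputs. It assumes a discrete leaky integrate-and-fire update in which the potential is the weighted input plus a leak factor times the previous potential, the previous potential is discarded right after a spike, and the neuron fires when the potential reaches the threshold. This update rule, the leak factor, and the names run_neuron and has_consecutive_ones are illustrative assumptions; they stand in for Definition 1 and the Coq functions, which are not reproduced here.

```python
from fractions import Fraction


def run_neuron(weight, threshold, leak, inputs):
    """Simulate a single-input neuron and return its output in forward order.

    Assumed update rule (a stand-in for Definition 1):
      p(t) = weight * x(t) + leak * p(t-1)   if the neuron did not fire at t-1
      p(t) = weight * x(t)                   if it fired at t-1 (reset)
    and the neuron fires at time t when p(t) >= threshold.
    """
    potential = Fraction(0)
    outputs = [0]  # the neuron is inactive at time 0
    for x in inputs:
        fired_last_step = outputs[-1] == 1
        carried = Fraction(0) if fired_last_step else leak * potential
        potential = carried + weight * x
        outputs.append(1 if potential >= threshold else 0)
    return outputs


def has_consecutive_ones(bits):
    """Return True if the list contains two adjacent 1s."""
    return any(a == 1 and b == 1 for a, b in zip(bits, bits[1:]))


signal = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
leak = Fraction(1, 2)  # hypothetical leak factor

# Delayer effect: weight >= threshold.
delayed = run_neuron(Fraction(1), Fraction(1), leak, signal)
# Filter effect: weight < threshold (here also positive).
filtered = run_neuron(Fraction(3, 5), Fraction(1), leak, [1] * 8)
# Inhibitor effect: weight < 0.
inhibited = run_neuron(Fraction(-1), Fraction(1), leak, signal)

print(delayed == [0] + signal)         # expected: True
print(has_consecutive_ones(filtered))  # expected: False
print(1 in inhibited)                  # expected: False
```

Exact rational arithmetic (Fraction) is used so that the comparison with the threshold is not affected by floating-point rounding. Passing checks of this kind are only a sanity test of the statements on sample data; the guarantee for every input sequence comes from the inductive proofs.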
The Delayer Effect in a Simple Series
The next property is about the archetype shown in Figure 3(a). In this structure, each neuron output is the input of the next neuron. If we have a series of $n$ single-input neurons and all of them have the delayer effect, then the output of the whole structure is the input plus $n$ leading zeros. In other words, this structure transfers the input sequence exactly, with a delay marked by the $n$ leading zeros, denoted as $\mathit{zeros}(n)$ in the statement of the theorem below. This theorem is expressed as follows.

Theorem 5. [Delayer effect in a simple series]
$$\forall (\mathit{Series} : \mathit{list}\ \mathit{neuron})\,(\mathit{input} : \mathit{list}\ \mathit{nat})\,(i : \mathit{nat}),\quad \mathit{length}(\mathit{Series}) = n \,\wedge\, 0 \leq i < n \,\wedge\, \mathit{length}(w(\mathit{Series}[i])) = 1 \,\wedge\, w_1(\mathit{Series}[i]) > \tau(\mathit{Series}[i]) \;\rightarrow\; \mathit{Output} = \mathit{zeros}(n) + \mathit{input}$$
This time, the proof proceeds by induction on the length of $\mathit{Series}$.

In this section, we have proposed a formal approach to modeling and validating leaky integrate and fire neurons and some basic circuits. In the literature, this is not the first attempt to formally investigate neural networks. In [25, 27], the synchronous paradigm is exploited to model neurons and some small neuronal circuits with relevant topological structure and behavior, and to prove some properties concerning their dynamics. Our approach uses the Coq proof assistant and has turned out to be much more general. As a matter of fact, we guarantee that the properties we prove are true in the general case, that is, for any input values, any length of input, and any amount of time. As an example, let us consider the simple series. In [27], the authors were able to write a function (more precisely, a Lustre node) which encodes the expected behavior of the circuit. Then, they could call a model checker to test whether the property at issue is valid for some input series with a fixed length. Here we can prove that the desired behavior is true for any length of series and any parameters.

We plan several future works. As a first next step, we intend to formally study the missing archetypes of Figure 3 (series with multiple outputs, parallel composition, negative loop, and contralateral inhibition) and other new archetypes made of two, three or more neurons. We have already started to investigate the two-neuron positive loop, where the first neuron activates the second one, which in turn activates the first one. Our progress so far includes defining a Coq record that expresses the structure of the positive loop, along with an inductive predicate that relates the two neurons and their corresponding two lists of values obtained by applying the potential function over time. This predicate is true whenever the output has a particular pattern that is important for proving one of the more advanced properties we are studying. Defining general relations that can be specialized to specific patterns will likely also be very useful for the kinds of properties that are important for more complex networks.

As a second next step, we plan to focus on the composition of the studied archetypes. There are two main ways to couple two circuits: either to connect the output of the first one to the input of the second one, or to nest the first one inside the second one. We are interested in detecting the compositions which lead to circuits with a meaningful biological behavior. Archetypes can be considered as the syllables of a given alphabet. When two or more syllables are combined, it is possible to obtain either a real word or a word which does not exist. In the same way, the archetype composition may or may not lead to meaningful networks. As a long-term aim, with the help of the neurophysiologist Franck Grammont, we would like to be able to prove that any neural network can be expressed as a combination of the small mini-circuits we have identified, similar to how all the words can be expressed as a combination of the syllables of a given alphabet. We believe that the power of our theorem proving approach will allow us to advance rapidly in the study of the fundamental structural and functional properties of the elementary building blocks of the brain and cognition.
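The first coupling mode mentioned above, connecting the output of one circuit to the input of the next, is the structure behind the simple series of Theorem 5. The Python sketch below chains several single-input neurons and checks, on one example only, that the result is the input preceded by as many zeros as there are neurons. As before, the update rule inside run_neuron, the leak factor, the choice to feed each neuron the full output list of its predecessor, and the names run_series and zeros are assumptions made for illustration; they are not taken from the Coq development.

```python
from fractions import Fraction


def run_neuron(weight, threshold, leak, inputs):
    """Single-input neuron under an assumed leaky integrate-and-fire rule.

    The potential is weight * x(t) plus leak * p(t-1), except right after a
    spike, where the previous potential is dropped. Output is in forward order.
    """
    potential = Fraction(0)
    outputs = [0]  # inactive at time 0
    for x in inputs:
        fired_last_step = outputs[-1] == 1
        carried = Fraction(0) if fired_last_step else leak * potential
        potential = carried + weight * x
        outputs.append(1 if potential >= threshold else 0)
    return outputs


def run_series(neurons, inputs):
    """Feed each neuron's output sequence to the next neuron in the list.

    Each neuron is a (weight, threshold, leak) triple.
    """
    signal = list(inputs)
    for weight, threshold, leak in neurons:
        signal = run_neuron(weight, threshold, leak, signal)
    return signal


def zeros(n):
    """Return a list of n zeros, mirroring zeros(n) in Theorem 5."""
    return [0] * n


signal = [1, 0, 1, 1, 0]
# Three delayer neurons: each weight is strictly greater than its threshold.
series = [(Fraction(2), Fraction(1), Fraction(1, 2))] * 3
result = run_series(series, signal)

print(result == zeros(len(series)) + signal)  # expected: True
```

Representing the series as a plain list of parameter triples keeps the composition a simple fold over the list, which loosely mirrors the induction on the length of the series used in the proof of Theorem 5.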
In the field of biomedicine and biology in general.
The ongoing revolution in AI is accelerating the development of software that enables computers to perform "intelligent" clinical and medical tasks. Machine learning algorithms find hidden patterns in data, and classify and associate similar patients/diseases/drugs based on common features (e.g., the IBM Watson system, which is used to analyse genomic and cancer data). Future challenges in medicine include understanding bias in data collection (and also in doctors' experience) and fostering the ability to integrate evidence from heterogeneous datasets, from different omics and clinical data, and from several lines of independent data. We believe that machine learning could satisfy these needs well and that there is also a need to develop methods that offer a hypothesis-driven approach, so that doctors do not feel that they are going to be replaced. Such methods could provide them with a personalised and easily interpretable clinical support decision-making tool that could perform a synthesis of qualitative and quantitative multi-modal evidence. Examples of decision trees used in current practice for breast cancer diagnosis can be found at pages 598–603 of [58]. Our logical approach, although focused on driver mutations, goes in such a direction and could be used with continuous and discrete mixed variables. This information could be obtainable through single cell experiments on cancer biopsies (although with large variance), a technique which is now at the stage of passing from basic science to clinical protocols. Machine learning could analyse cancer mutation patterns and feed our logic approach with this information, which could be integrated with other rules such as changes on the metabolic networks or on epigenetics. Other rules could be derived from other levels of cancer clinical investigation, such as image data (changes in fMRI, CT-scans and microscopy samples), blood analyses (identification and counts of circulating cancer cells) and other types of medical observations. The long term plan is to build a portable resource that facilitates diagnostic and therapeutic decision making and promotes a cost-effective personalised patient workup. This would represent a new paradigm in personalised and precision cancer treatment which integrates multi-modality analyses and clinical characteristics in a near-real time manner, improving clinical management of cancer. Finally, we believe that logical approaches could improve the harmonisation and standardisation of the reporting and interpretation of clinically relevant data.

The logic approach could have far reaching applications; for example, in hematopoiesis each cell branching brings a large number of questions, particularly about regulative circuits, experimental settings, and departure from homeostasis during diseases or alterations. Disease conditions could be monitored by generating a score according to a CHESS ("Changes in Health, End-stage disease, and Signs and Symptoms") scale. CHESS is a summary measure based on a count of comorbidity progression as well as symptoms and clinician ratings of a prognosis of less than six months or so. Other applications can be more ambitious. We highlight three of them:
– Explore the dependencies of all human cell types during embryogenesis. This approach could make use of cell atlas [68] and tissue atlas, see for example [36].
– Explore the dependencies of models of cell dependencies. This would help the curation of model databases, see for example [39]; logic could therefore be used for automatic consistency checking and to help in model annotation.
– Deriving ontologies. Biological information has a complex structure and requires an organization that includes controlled vocabularies and formats for the exchange of structured data. Bio-ontologies are consensus-based, controlled vocabularies for biological terms and interaction between humans and computers [6, 11, 57]. Addressing the cell status will require ontologies that take into account the profound wiring of biological processes in the body. Here we envision using logic to automatically generate ontologies from a set of rules.

In summary, we believe that, together with current machine learning, logic could find a central position in modeling biomedicine.
In the neuroscience area.
Although the proofs we have completed require some sophisticated reasoning, a significant amount of that reasoning is common to them. As we continue, we expect to encounter more complex inductions as we consider more complex properties. Thus, it will become important to automate as much of the proofs as possible, most likely by writing tactics tailored to the kind of induction, case analysis, and mathematical reasoning that is needed here. Furthermore, defining general relations that can be specialized to specific patterns will likely also be very useful for the kinds of properties that are important for more complex networks.

So far, logic approaches have turned out to be particularly suited to investigating dynamic properties of some canonical neuronal circuits, and we believe they will be crucial to formally study the behavior of bigger circuits obtained by archetype composition.
More generally, in biology.
We believe that logic will allow both the integration of scalability (i.e., from neurons to the brain's higher abilities) and a better use of unstructured data, such as users' and doctors' insights in the form of experience and personal judgement. Moreover, explainability and interpretability will allow a team-in-the-loop approach to medical cases, i.e., analysis of patients that could involve, at different levels, nurses, single doctors or teams of specialists and consultants. This interaction will allow a logic clinical decision system to re-interpret findings in the light of additional medical expertise and event outcomes handling. This aspect will be a keystone in the Intensive Care Unit (ICU), where physicians are asked to act quickly and in an explainable manner.

We believe that, because of the above mentioned properties, logic could be used in clinical trials and medical protocols. An important component that would be needed to help team (human-in-the-loop) scalability in addressing diseases is a visualisation of real-time hypotheses and logic solutions, allowing the results to be interpreted according to the personalisable features selected.
References
1. Alur, R., Belta, C., Ivanicic, F., Kumar, V., Mintz, M., Pappas, G.J., Rubin, H., Schug, J.: Hybrid modeling and simulation of biomolecular networks. In: Proceedings of the 4th Intl. Workshop on Hybrid Systems: Computation and Control (HSCC). Lecture Notes in Computer Science, vol. 2034, pp. 19–32. Springer (2001)
2. Aman, B., Ciobanu, G.: Modelling and verification of weighted spiking neural systems. Theoretical Computer Science, 92–102 (2016). https://doi.org/10.1016/j.tcs.2015.11.005
3. Andreoli, J.: Logic programming with focusing proofs in linear logic. J. Log. Comput. (3), 297–347 (1992)
4. Antczak, C., Mahida, J., Singh, C., Calder, P., Djaballah, H.: A high content assay to assess cellular fitness. Combinatorial Chemistry & High Throughput Screening (1), 12–24 (jan 2014). https://doi.org/10.2174/13862073113169990056
5. Ascolani, G., Occhipinti, A., Liò, P.: Modelling circulating tumour cells for personalised survival prediction in metastatic breast cancer. PLoS Computational Biology (5) (2015). https://doi.org/10.1371/journal.pcbi.1004199
6. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G.: Gene ontology: tool for the unification of biology. Nature Genetics (1), 25–29 (May 2000). https://doi.org/10.1038/75556
7. Bahrami, A., De Maria, E., Felty, A.: Modelling and verifying dynamic properties of biological neural networks in coq. In: Proceedings of the 9th International Conference on Computational Systems-Biology and Bioinformatics. pp. 12:1–12:11. CSBio 2018, ACM, New York, NY, USA (2018). https://doi.org/10.1145/3291757.3291771, http://doi.acm.org/10.1145/3291757.3291771
8. Bellomo, N., Preziosi, L.: Modelling and mathematical problems related to tumor evolution and its interaction with the immune system. Mathematical and Computer Modelling (3), 413–452 (2000)
9. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development. Coq'Art: The Calculus of Inductive Constructions. Springer (2004)
10. Blinov, M.L., Faeder, J.R., Goldstein, B., Hlavacek, W.S.: BioNetGen: Software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics (17), 3289–3291 (2004)
11. Bodenreider, O.: Bio-ontologies: current trends and future directions. Briefings in Bioinformatics (3), 256–274 (May 2006). https://doi.org/10.1093/bib/bbl027
12. Boutillier, P., Camporesi, F., Coquet, J., Feret, J., Lý, K.Q., Théret, N., Vignet, P.: Kasa: A static analyzer for kappa. In: Ceska, M., Safránek, D. (eds.) Computational Methods in Systems Biology - 16th International Conference, CMSB 2018, Brno, Czech Republic, September 12-14, 2018, Proceedings. Lecture Notes in Computer Science, vol. 11095, pp. 285–291. Springer (2018). https://doi.org/10.1007/978-3-319-99429-1
13. Boutillier, P., Maasha, M., Li, X., Medina-Abarca, H.F., Krivine, J., Feret, J., Cristescu, I., Forbes, A.G., Fontana, W.: The kappa platform for rule-based modeling. Bioinformatics (13), i583–i592 (2018)
14. Caravagna, G., Giarratano, Y., Ramazzotti, D., Tomlinson, I., Graham, T.A., Sanguinetti, G., Sottoriva, A.: Detecting repeated cancer evolution from multi-region tumor sequencing data. Nature Methods (9), 707–714 (aug 2018). https://doi.org/10.1038/s41592-018-0108-x
15. Cervesato, I., Pfenning, F.: A Linear Logical Framework. Inf. & Comp. (1), 19–75 (2002)
16. Chaouiya, C., Naldi, A., Remy, E., Thieffry, D.: Petri net representation of multi-valued logical regulatory graphs. Natural Computing (2), 727–750 (2011). https://doi.org/10.1007/s11047-010-9178-0, http://dx.doi.org/10.1007/s11047-010-9178-0
17. Chaudhuri, K., Despeyroux, J., Olarte, C., Pimentel, E.: Hybrid linear logics, revisited. Mathematical Structures in Computer Science pp. 1–26 (2019). https://doi.org/10.1017/S0960129518000439
18. Clarke, E.M., Henzinger, T.A., Veith, H., Bloem, R. (eds.): Handbook of Model Checking. Springer (2018). https://doi.org/10.1007/978-3-319-10575-8
19. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems (4), 303–314 (1989). https://doi.org/10.1007/BF02551274
20. Danos, V., Feret, J., Fontana, W., Harmer, R., Hayman, J., Krivine, J., Thompson-Walsh, C.D., Winskel, G.: Graphs, rewriting and pathway reconstruction for rule-based models. In: D'Souza, D., Kavitha, T., Radhakrishnan, J. (eds.) IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2012, December 15-17, 2012, Hyderabad, India. LIPIcs, vol. 18, pp. 276–288. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2012). https://doi.org/10.4230/LIPIcs.FSTTCS.2012.276
21. Danos, V., Feret, J., Fontana, W., Harmer, R., Krivine, J.: Rule-based modelling of cellular signalling. In: Caires, L., Vasconcelos, V.T. (eds.) Proceedings of the 18th Intl. Conference on Concurrency Theory (CONCUR). Lecture Notes in Computer Science, vol. 4703, pp. 17–41. Springer (2007). https://doi.org/10.1007/978-3-540-74407-8_3, http://dx.doi.org/10.1007/978-3-540-74407-8_3
22. Danos, V., Feret, J., Fontana, W., Krivine, J.: Abstract interpretation of cellular signalling networks. In: Logozzo, F., Peled, D.A., Zuck, L.D. (eds.) Proceedings of the Ninth International Conference on Verification, Model Checking and Abstract Interpretation, VMCAI '2008. Lecture Notes in Computer Science, vol. 4905, pp. 83–97. Springer, Berlin, Germany, San Francisco, USA (7–9 January 2008)
23. Danos, V., Joinet, J.B., Schellinx, H.: The structure of exponentials: uncovering the dynamics of linear logic proofs. In: Gottlob, G., Leitsch, A., Mundici, D. (eds.) Proceedings of the 3rd Kurt Gödel colloquium. Lecture Notes in Computer Science, vol. 713, pp. 159–171. Springer Verlag (1993)
24. Danos, V., Laneve, C.: Formal molecular biology. Theoretical Computer Science (1), 69–110 (Sep 2004), http://dx.doi.org/10.1016/j.tcs.2004.03.065
25. De Maria, E., Muzy, A., Gaffé, D., Ressouche, A., Grammont, F.: Verification of temporal properties of neuronal archetypes modeled as synchronous reactive systems. In: Cinquemani, E., Donzé, A. (eds.) Hybrid Systems Biology - 5th International Workshop, HSB 2016, Grenoble, France, October 20-21, 2016, Proceedings. pp. 97–112 (2016). https://doi.org/10.1007/978-3-319-47151-8_7
26. De Maria, E., Di Giusto, C.: Parameter learning for spiking neural networks modelled as timed automata. In: Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 3: BIOINFORMATICS, Funchal, Madeira, Portugal, January 19-21, 2018. pp. 17–28 (2018). https://doi.org/10.5220/0006530300170028
27. De Maria, E., L'Yvonnet, T., Gaffé, D., Ressouche, A., Grammont, F.: Modelling and formal verification of neuronal archetypes coupling. In: Proceedings of the 8th International Conference on Computational Systems-Biology and Bioinformatics, Nha Trang City, Viet Nam, December 7-8, 2017. pp. 3–10 (2017). https://doi.org/10.1145/3156346.3156348
28. Despeyroux, J., Chaudhuri, K.: A hybrid linear logic for constrained transition systems. In: Post-Proceedings of the 9th Intl. Conference on Types for Proofs and Programs (TYPES 2013). Leibniz Intl. Proceedings in Informatics, vol. 26, pp. 150–168. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik (2014), http://dx.doi.org/10.4230/LIPIcs.TYPES.2013.150
29. Despeyroux, J., Felty, A., Lio, P., Olarte, C.: A logical framework for modelling breast cancer progression. In: Proceedings of the 1st Intl. Symposium on Molecular Logic and Computational Synthetic Biology (MLCSB'2018). Lecture Notes in Computer Science, vol. 11415. Springer (2019)
30. Despeyroux, J., Olarte, C., Pimentel, E.: Hybrid and subexponential linear logics. Electronic Notes in Theoretical Computer Science, 95–111 (June 2017). https://doi.org/10.1016/j.entcs.2017.04.007, a preliminary version is available as an HAL hal-01358057 and ArXiv report
31. Ding, J., Trippa, L., Zhong, X., Parmigiani, G.: Hierarchical bayesian analysis of somatic mutation data in cancer. Ann. Appl. Stat. (2), 883–903 (06 2013). https://doi.org/10.1214/12-AOAS604
32. Enderling, H., Chaplain, M.A., Anderson, A.R., Vaidya, J.S.: A mathematical model of breast cancer development, local treatment and recurrence. Journal of theoretical biology (2), 245–259 (2007)
33. Fages, F., Martinez, T., Rosenblueth, D.A., Soliman, S.: Influence networks compared with reaction networks: Semantics, expressivity and attractors. IEEE/ACM Transactions on Computational Biology and Bioinformatics (4), 1138–1151 (July 2018). https://doi.org/10.1109/TCBB.2018.2805686
34. Fages, F., Soliman, S., Chabrier-Rivier, N.: Modelling and querying interaction networks in the biochemical abstract machine BIOCHAM. Journal of Biological Physics and Chemistry (2), 64–73 (2004)
35. Felty, A.P., Momigliano, A.: Hybrid: A definitional two-level approach to reasoning with higher-order abstract syntax. Journal of Automated Reasoning (1), 43–105 (2012)
36. Gamazon, E.R., Segre, A.V., van de Bunt, M., Wen, X., Xi, H.S., Hormozdiari, F., Ongen, H., Konkashbaev, A., Derks, E.M., Aguet, F., Quan, J., Nicolae, D.L., Eskin, E., Kellis, M., Getz, G., McCarthy, M.I., Dermitzakis, E.T., Cox, N.J., Ardlie, K.G.: Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nature Genetics (7), 956–967 (Jun 2018). https://doi.org/10.1038/s41588-018-0154-4
37. Gavaghan, D., Brady, J.M., Behrenbruch, C., Highnam, R., Maini, P.: Breast cancer: Modelling and detection. Computational and Mathematical Methods in Medicine (1), 3–20 (2002)
38. Girard, J.Y.: Linear logic. Theoretical Computer Science, 1–102 (1987)
39. Glont, M., Nguyen, T.V.N., Graesslin, M., Hälke, R., Ali, R., Schramm, J., Wimalaratne, S.M., Kothamachu, V.B., Rodriguez, N., Swat, M.J., Eils, J., Eils, R., Laibe, C., Malik-Sheriff, R.S., Chelliah, V., Novère, N.L., Hermjakob, H.: BioModels: expanding horizons to include more modelling approaches and formats. Nucleic Acids Research (D1), D1248–D1253 (Nov 2017). https://doi.org/10.1093/nar/gkx1023
40. Gregorio, A.D., Bowling, S., Rodriguez, T.A.: Cell competition and its role in the regulation of cell fitness from development to cancer. Developmental Cell (6), 621–634 (sep 2016). https://doi.org/10.1016/j.devcel.2016.08.012
41. Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology (4), 500–544 (1952)
42. Hofestädt, R., Thelen, S.: Quantitative modeling of biochemical networks. In: In Silico Biology, vol. 1, pp. 39–53. IOS Press (1998)
43. Iuliano, A., Occhipinti, A., Angelini, C., Feis, I.D., Liò, P.: Cancer markers selection using network-based cox regression: A methodological and computational practice. Frontiers in Physiology (jun 2016). https://doi.org/10.3389/fphys.2016.00208
44. Iuliano, A., Occhipinti, A., Angelini, C., Feis, I.D., Liò, P.: Combining pathway identification and breast cancer survival prediction via screening-network methods. Frontiers in Genetics (jun 2018). https://doi.org/10.3389/fgene.2018.00206
45. Izhikevich, E.M.: Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks (5), 1063–1070 (Sept 2004). https://doi.org/10.1109/TNN.2004.832719
46. de Jong, H., Gouzé, J.L., Hernandez, C., Page, M., Sari, T., Geiselmann, J.: Qualitative simulation of genetic regulatory networks using piecewise-linear models. Bulletin of Mathematical Biology (2), 301–340 (2004), http://dx.doi.org/10.1016/j.bulm.2003.08.010
47. Kingman, J.F.C.: Poisson processes, Oxford Studies in Probability, vol. 3. The Clarendon Press Oxford University Press, New York (1993), Oxford Science Publications
48. Knutsdottir, H., Palsson, E., Edelstein-Keshet, L.: Mathematical model of macrophage-facilitated breast cancer cells invasion. Journal of theoretical biology (2014)
49. Lapicque, L.: Recherches quantitatives sur l'excitation electrique des nerfs traitee comme une polarization. J Physiol Pathol Gen, 620–635 (1907)
50. Maas, W.: Networks of spiking neurons: The third generation of neural network models. Trans. Soc. Comput. Simul. Int. (4), 1659–1671 (Dec 1997)
51. Mahmoud, M.Y., Felty, A.P.: Formal meta-level analysis framework for quantum programming languages. In: Proceedings of the 12th Workshop on Logical and Semantic Frameworks with Applications (LSFA 2017). Electronic Notes in Theoretical Computer Science, vol. 338, pp. 185–201. Elsevier (2018)
52. de Maria, E., Despeyroux, J., Felty, A.: A logical framework for systems biology. In: Springer (ed.) Proc. of the 1st International Conference on Formal Methods in Macro-Biology (FMMB). LNCS, vol. 8738, pp. 136–155 (September 2014)
53. Markram, H.: Biology - the blue brain project. In: Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, November 11-17, 2006, Tampa, FL, USA. p. 53. ACM Press (2006). https://doi.org/10.1145/1188455.1188511, http://doi.acm.org/10.1145/1188455.1188511
54. Matsuoka, K.: Mechanisms of frequency and pattern control in the neural rhythm generators. Biological cybernetics (5-6), 345–353 (1987)
55. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics (4), 115–133 (1943). https://doi.org/10.1007/BF02478259, http://dx.doi.org/10.1007/BF02478259
56. Miller, D., Pimentel, E.: A formal framework for specifying sequent calculus proof systems. Theor. Comput. Sci., 98–116 (2013). https://doi.org/10.1016/j.tcs.2012.12.008, http://dx.doi.org/10.1016/j.tcs.2012.12.008
57. Musen, M.A., Noy, N.F., Shah, N.H., Whetzel, P.L., Chute, C.G., Story, M.A., and, B.S.: The national center for biomedical ontology. Journal of the American Medical Informatics Association (2), 190–195 (Mar 2012). https://doi.org/10.1136/amiajnl-2011-000523
58. Mushlin, S.B., Greene, H.L.: Decision making in medicine: An algorithmic approach, 3e (clinical decision making series) 3rd edition
59. Nigam, V., Miller, D.: Algorithmic specifications in linear logic with subexponentials. In: Porto, A., López-Fraguas, F.J. (eds.) PPDP. pp. 129–140. ACM (2009)
60. Nigam, V., Olarte, C., Pimentel, E.: A general proof system for modalities in concurrent constraint programing. In: D'Argenio, P.R., Melgratti, H.C. (eds.) Proceedings of the 24th intl. conference on Concurrency theory (CONCUR). Lecture Notes in Computer Science, vol. 8052, pp. 410–424. Springer Verlag (2013)
61. Nigam, V., Olarte, C., Pimentel, E.: On subexponentials, focusing and modalities in concurrent systems. Theoretical Computer Science, 35–58 (2017). https://doi.org/10.1016/j.tcs.2017.06.009
62. Olarte, C., Chiarugi, D., Falaschi, M., Hermith, D.: A proof theoretic view of spatial and temporal dependencies in biochemical systems. Theor. Comput. Sci., 25–42 (2016)
63. Paugam-Moisy, H., Bohte, S.M.: Computing with spiking neuron networks. In: Handbook of Natural Computing, pp. 335–376. Springer, Berlin, Heidelberg (2012). https://doi.org/10.1007/978-3-540-92910-9_10
64. Phillips, A., Cardelli, L.: A correct abstract machine for the stochastic pi-calculus. In: BioConcur: Workshop on Concurrent Models in Molecular Biology (2004)
65. Purves, D., Augustine, G.J., Fitzpatrick, D., Hall, W.C., LaMantia, A., McNamara, J.O., Williams, S.M. (eds.): Neuroscience. Sinauer Associates, Inc., 3rd edn. (2006)
66. Regev, A., Panina, E.M., Silverman, W., Cardelli, L., Shapiro, E.: Bioambients: An abstraction for biological compartments. Theoretical Computer Science (1), 141–167 (Sep 2004)
67. Regev, A., Silverman, W., Shapiro, E.Y.: Representation and simulation of biochemical processes using the π-calculus process algebra. In: Proceedings of the 6th Pacific Symposium on Biocomputing. pp. 459–470 (2001)
68. Regev, A., Teichmann, S.A., Lander, E.S., Amit, I., Benoist, C., Birney, E., Bodenmiller, B., Campbell, P., Carninci, P., Clatworthy, M., Clevers, H., Deplancke, B., Dunham, I., Eberwine, J., Eils, R., Enard, W., Farmer, A., Fugger, L., Gottgens, B., Hacohen, N., Haniffa, M., Hemberg, M., Kim, S., Klenerman, P., Kriegstein, A., Lein, E., Linnarsson, S., Lundberg, E., Lundeberg, J., Majumder, P., Marioni, J.C., Merad, M., Mhlanga, M., Nawijn, M., Netea, M., Nolan, G., Peer, D., Phillipakis, A., Ponting, C.P., Quake, S., Reik, W., Rozenblatt-Rosen, O., Sanes, J., Satija, R., Schumacher, T.N., Shalek, A., Shapiro, E., Sharma, P., Shin, J.W., Stegle, O., Stratton, M., Stubbington, M.J.T., Theis, F.J., Uhlen, M., van Oudenaarden, A., Wagner, A., Watt, F., Weissman, J., Wold, B., Xavier, R., and, N.Y.: The human cell atlas. eLife (Dec 2017). https://doi.org/10.7554/elife.27041
69. Rogers, Z.N., McFarland, C.D., Winters, I.P., Naranjo, S., Chuang, C.H., Petrov, D., Winslow, M.M.: A quantitative and multiplexed approach to uncover the fitness landscape of tumor suppression in vivo. Nature Methods (7), 737–742 (may 2017). https://doi.org/10.1038/nmeth.4297
70. Savage, N.: Computing cancer software models of complex tissues and disease are yielding a better understanding of cancer and suggesting potential treatments. Nature, s62–s63 (2012)
71. Shihab, H.A., Rogers, M.F., Gough, J., Mort, M.E., Cooper, D.N., Day, I.N.M., Gaunt, T.R., Campbell, C.: An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics (10), 1536–1543 (2015)
72. Talcott, C., Dill, D.: Multiple representations of biological processes. Transactions on Computable Systems Biology pp. 221–245 (2006)
73. Troelstra, A.S.: Lectures on Linear Logic. CSLI Lecture Notes 29, Center for the Study of Language and Information, Stanford, California (1992)
74. Venkataram, S., Dunn, B., Li, Y., Agarwala, A., Chang, J., Ebel, E.R., Geiler-Samerotte, K., Hérissant, L., Blundell, J.R., Levy, S.F., Fisher, D.S., Sherlock, G., Petrov, D.A.: Development of a comprehensive genotype-to-fitness map of adaptation-driving mutations in yeast. Cell (6), 1585–1596.e22 (sep 2016). https://doi.org/10.1016/j.cell.2016.08.002
75. Wynn, M.L., Consul, N., Merajver, S.D., Schnell, S.: Logic-based models in systems biology: a predictive and parameter-free network analysis method. Integrative Biology (11), 1323 (2012). https://doi.org/10.1039/c2ib20193c
76. Xavier, B., Olarte, C., Reis, G., Nigam, V.: Mechanizing focused linear logic in coq. Electr. Notes Theor. Comput. Sci., 219–236 (2018). https://doi.org/10.1016/j.entcs.2018.10.014