Annotations for Rule-Based Models

Abstract

The chapter reviews the syntax to store machine-readable annotations and describes the mapping between rule-based modelling entities (e.g., agents and rules) and these annotations. In particular, we review an annotation framework and the associated guidelines for annotating rule-based models of molecular interactions, encoded in the commonly used Kappa and BioNetGen languages, and present prototypes that can be used to extract and query the annotations. An ontology is used to annotate models and facilitate their description.


arXiv [q-bio.MN]

Matteo Cavaliere, Vincent Danos, Ricardo Honorato-Zimmer and William Waites

All authors contributed equally.

Corresponding author: William Waites, Laboratory for Foundations of Computer Science, School of Informatics, University of Edinburgh, Edinburgh, EH8 9LE, UK. Email: [email protected]

This manuscript has been prepared for inclusion in Modeling Biomolecular Site Dynamics: Methods and Protocols (Ed. William S. Hlavacek), part of the Methods in Molecular Biology series.

Summary

The chapter reviews the syntax to store machine-readable annotations and describes the mapping between rule-based modelling entities (e.g., agents and rules) and these annotations. In particular, we review an annotation framework and the associated guidelines for annotating rule-based models, encoded in the commonly used Kappa and BioNetGen languages, and present prototypes that can be used to extract and query the annotations. An ontology is used to annotate models and facilitate their description.

Key words: Rule-Based Modelling, Kappa, BNGL, KaSim, BioNetGen, RDF, Turtle, MIRIAM, SPARQL, Rule-Based Model Ontology (rbmo)

Introduction

The last decade has seen a rapid growth in the number of model repositories (1 – ). It is also well understood that the creation of models and of repositories requires expert knowledge and integration of different types of biological data from multiple sources (6). These data are used to derive the structure of, and parameters for, models. However, which data are used and how the model is derived from those data is not part of the model unless we explicitly annotate it in a well-defined way.

In general, annotations decorate a model with metadata linking to biologically relevant information (7). Annotations can facilitate the automated exchange, reuse and composition of complex models from simpler ones. Annotations can also be used to aid in the computational conversion of models into a variety of other data formats. For example, PDF documents (1) or visual graphs (8) can be automatically generated from annotated models to aid human understanding.

On the computational and modelling side, rule-based languages such as Kappa (9, ) and the BioNetGen language (BNGL) (11) have emerged as helpful tools for modelling biological systems (12). One of the key benefits of these languages is that they can be used to concisely represent the combinatorially complex state space inherent in biological systems. Rule-based modelling languages have facilities to add comments that are intended for unstructured documentation and usually directed at the modeller or programmer. These comments are in general human-readable and not machine-readable. This can be a problem because the biological semantics of the model entities are not computationally accessible and cannot be used to influence the processing of models.

Previous works have addressed the issue of annotations in rule-based models. In particular, Chylek et al. (13) suggested extending rule-based models to include metadata, focusing on documenting models with biological information using comments to aid the understanding of models for humans.
More recently, Klement et al. (14) have presented a way to add data in the form of property/value pairs using a specific syntax. On the other hand, machine-readable annotations have been applied to rule-based models using PySB, a programming framework for writing rules using Python (15). However, this approach is restricted as annotations cannot be applied to sites or states.

In this chapter we first discuss the general idea of annotation and its relation with the concept of abstraction, and then review an annotation framework for rule-based models that has recently been introduced and defined by Misirli et al. (16).

Before entering into the technicalities of the annotation framework of interest, we would like to discuss in an informal and intuitive manner the differences between models created using reactions and those obtained using rules, discussing the advantages of considering annotations and how they are strictly linked to the much more general notion of abstraction.

Rules as they are to be understood in the present context are a sort of generalisation of reactions of the type familiar from chemistry. The reason this generalisation is useful can be easily seen. Consider the following toy example,

[diagram: monomer + monomer → dimer]

which can be understood as a step in the creation of a polymer from two monomers. Multiple applications of this rule result in a progressively longer chain of molecules,

[diagram: monomer + monomer → dimer + monomer → ···]

Writing this down in the notation of reactions, we would need to explicitly generate the entire unbounded sequence of reactions with an unbounded number of chemical species,

$A + A \to A_2, \quad A_2 + A \to A_3, \quad A_3 + A \to A_4, \quad A_4 + A \to A_5, \quad \cdots$

Clearly this is unworkable with finite resources. The solution is to allow a species to have sites at which connections can be made. In the above example, the species could be described as A(u,d), that is, substance A with an upstream and a downstream site. The interaction can then be written as,

A(d), A(u) → A(d!1), A(u!1)

where the notation d means that the downstream site is unbound, and d!1 means that it is bound with a particular edge. Note that this says nothing about the state of the upstream site in the first instance of A nor the downstream site in the second, so there can be an arbitrarily long chain of molecules attached at those sites. It is easy to see that this compact notation captures both the infinite sequence of reactions and the infinite set of species that would be required to express the same interaction as a set of chemical reactions.

Informally, the word “annotation” has a meaning similar to “documentation” but with a difference in specificity. Whereas documentation connotes a rather large text describing something (e.g., an object), annotation is expected to be much shorter. It also evokes proximity: it should be in some sense “near” or “on” the thing being annotated. In both cases there seems to be a sharp distinction between the text and its object. The object should exist in its own right, be operational or functional in the appropriate sense without need to refer to exogenous information. Annotation might help to understand the object but the object exists and functions on its own.

This folk theory of annotation breaks down almost immediately under inspection. A typical example is data about a book such as might be found in a library catalogue. This is a canonical example used to explain what is meant by metadata or data about data. The first observation is that if we look at a book and peruse the first few pages it is almost certain that we will find information about who wrote it and where and when it was published. This information is not the book, it is metadata about the book, but it is contained within the covers of the book itself.

Perhaps this is not so serious a problem.
It is possible in principle to imagine that a book, say with the cover and first few pages torn out, is still a book that can be read and enjoyed. Perhaps somehow the metadata is separable and that is the important idea. The book-object can exist on its own and serve its purpose independently of any annotation or metadata. While the metadata might usually be found attached to the book, it can easily be removed without affecting the fundamental nature of the book itself.

But what of other things that we might want to do with a book? A favourite activity of academics is citing documents such as books and journal articles. This means including enough information in one work to unambiguously refer to another. There is an urban legend that Robarts Library at the University of Toronto is sinking because the engineers charged with building it did not account for the weight of the books within. Supposing that this were true, these poor apocryphal engineers could have used metadata within the university’s catalogue to sum up the number of pages of all the books and estimate their weight to prevent this tragedy. This summing is a computation that operates purely on the metadata and not on the books themselves.

More mundanely, categorising and counting books in order to plan for the use of shelf space in a growing collection, or even locating a book in a vast library, seem to be plausible things to do with metadata that do not involve any actual books. Manipulation and productive use of annotation is possible in the absence of the objects and well-defined even if the objects no longer exist. One imagines the despondent librarians and archivists of Alexandria making such lists to document and take stock of their losses after the great fire.

Now suppose that this list created by the librarians of Alexandria itself ended up in a collection in some other library or museum. It is given a catalogue number, the year it was acquired is marked.
What was metadata has now itself become the object of annotation! Here we arrive at the important insight: what is to be considered annotation and what is to be considered object depends on the purpose one has in mind. If the interest is the collection of books in Alexandria, the list is metadata, a collection of annotations, about them. If the interest is in the documents held by a contemporary museum, among which the list is to be found, the list is an object. The distinction is not intrinsic to the objects themselves.

Turning to the subject at hand, the objects to be annotated are rules. According to the folk theory of annotation, there should be a sharp distinction between rules and their annotation. When it comes to executing a simulation, the software that does this need not be aware of the annotations. Indeed the syntax for annotating rules described here is specifically designed for backwards compatibility such that the presence of annotations should not require any disruption or changes to existing simulation software.

So long as the purpose of the annotations is as an aid to understanding the rules, the location of the distinction between rule and annotation is fixed in this way. The obvious question is, are there other uses to which the annotations can be put?

In the report of Misirli et al. (16), where the annotation mechanism of interest was first described, one of the motivating examples was to create a contact map, a type of diagram that shows which agents or species interact with each other and labels these interactions with the rule(s) implementing them (an example of a contact map is provided later in this chapter).

Use of a contact map is illustrative of how movable the separation between object and annotation is (17).
The entities of interest, rules and agents, are on the one hand decorated with what seems to be purely metadata: labels, or friendly human-readable names that are suitable for placing on a diagram, preferable to the arbitrary machine-readable tokens that are used by the simulator (arbitrary because they are subject to renaming as required). On the other hand, the interactions between the substances, what we wish to make a diagram of, are written down in a completely different language with an incompatible syntax.

A minor change of perspective neatly solves this problem. It is simply to rephrase the rule, saying “A and B are related, and the way they are related is that they combine to form C”. This has the character of annotation: the rule itself is a statement about the substances involved. More particularly it describes a relation between the substances. On close inspection, giving a token used in a rule a human-readable name is also articulating a relation, that is the relation called “naming” between the substance and a string of characters suitable for human consumption.

With this change of perspective, all of the information required to make the diagram is now of the same kind. The only construct that must be manipulated is sets of relations between entities (and strings of text, which are themselves a kind of entity). Fortunately there exist tools and query languages for operating on data stored in just this form. Having worked out the correct query to extract precisely what is needed to produce the diagram, actually generating it is trivial.

The preceding section on annotation describes what can be thought of as a “movable line”. “Above” this line are annotations and “below” it are the objects.
The sketch of a procedure for producing a diagram to help humans understand something about a system of rules as a whole illustrated that it can be convenient to place this line somewhere other than might be obvious at first glance, and this example will be considered in more detail below to demonstrate how this happens in practice. However, the idea of such a line and how it might be moved and what exactly that means is still rather vague. Let us now make this notion more precise.

Formally, a relation between two sets, $X$ and $Y$, is a subset of their Cartesian product, $X \times Y$. In other words it is a set of pairs, $\{ (x, y) \mid x \in X, y \in Y \}$, and it is usually the case that it is a proper subset in that not all possible pairs are present in the relation. In order to compute with relations, the sets must be sets of symbols, $X, Y \subseteq S$, ultimately realised as sequences of bits because a computer or Turing machine is defined to operate on such sequences and not on everyday objects such as books, pieces of fruit, molecules or subatomic particles, or indeed concepts and ideas.

This last point is important. It is not possible to compute with objects in the world, be they concrete or abstract; it is only possible to compute with symbols representing these objects. Another kind of relation is required for this, $R \subseteq S \times W$, where $W$ is the set of objects in the world. It is not possible to write down such relations between symbols and real-world objects any more than it is possible to write down an apple. So we have two kinds of relations to work with: annotations, which are relations among symbols in $S \times S$, and representations, which map between symbols and the world, $S \times W$.

Some observations are in order. First, the representation relation has an inverse, $W \times S$. This is trivial and is simply “has the representation” as opposed to “represents”. Second, of course, symbols are themselves objects in the world, so $S \subset W$.
Finally, relations among symbols (annotations) are likewise objects in the world, so $S \times S \subset W$ also. This is useful because it means that it is possible to represent annotations with symbols and from there articulate relationships among them using more annotations, constructing a hierarchy of annotation as formalised by Buneman et al. (17). We run into trouble though if we try to say that representations are in the world because $S \times W$ is larger than $W$, and this is why they cannot be written down. Symbols represent, annotations are relations among symbols, and the character of representation is fundamentally different from that of annotation.

We have enough background to explain the intuition behind the folk theory of annotation, that there is a difference of kind between the annotation and its object. This difference is just the same as considering a notional pair $(x \in S, -)$ qua annotation or qua representation, that is, deciding the set from which the second element of the tuple should be drawn. A similar choice is available, mutatis mutandis, for the inverse, $(-, x \in S)$. If the unspecified element is in $W \setminus S$ (i.e., those objects in the world that are not symbols), there is only one choice: the relation can only be treated as representation. If it is in $W \cap S$ then either interpretation is possible, and one or the other might be more appropriate depending on the purpose or question at hand.

The ability to make this choice is no more than the ability to select an appropriate abstraction. Selecting an abstraction means deciding to interpret a relation as representation and not annotation. This is best illustrated with an example. Here is a (representation of an) agent or substance:

A(u, d, b)

Perhaps it is a fragment of DNA which can be connected upstream and downstream to other such fragments, and it has a binding site where RNA polymerase can attach as part of the transcription process.
Some annotations involving A might be,

$(A, \text{“Promoter”}) \in L$
$(A, \text{TTGATCCCTCTT}) \in M$

where the first is from the set of labellings, $L$, and the second is from the set of correspondences with symbols representing nucleotide sequences, which we will call $M$. A more conventional way of writing these correspondences, closer to Semantic Web practice, is,

A label "Promoter"
A has sequence TTGATCCCTCTT
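
The two annotations above can be modelled directly as sets of pairs. The following is only a minimal sketch in Python, assuming that symbols are plain strings; the names `S`, `L`, `M` follow the text, while the predicate names `label` and `has_sequence` and the helper `objects` are illustrative and are not taken from any vocabulary.

```python
from itertools import product

# S: the set of symbols we compute with (plain strings in this sketch).
S = {"A", "Promoter", "TTGATCCCTCTT"}

# Annotations are relations among symbols, i.e. subsets of S x S.
L = {("A", "Promoter")}        # labellings
M = {("A", "TTGATCCCTCTT")}    # sequence correspondences

cartesian = set(product(S, S))
is_relation = L <= cartesian and M <= cartesian  # both are subsets of S x S
is_proper = L < cartesian                        # not every pair is present

# The same facts as (subject, predicate, object) triples.
# The predicate names are hypothetical, chosen only for illustration.
triples = ({(s, "label", o) for s, o in L}
           | {(s, "has_sequence", o) for s, o in M})


def objects(triples, subject, predicate):
    """Return every object related to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


print(objects(triples, "A", "label"))
```

Keeping everything as sets of tuples means that labellings, sequence correspondences and any further relations can be queried with the same small function.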

The labelling annotation is easy to understand. It simply provides a friendly string for humans.

The second annotation is more challenging. It says that the DNA fragment represented by A corresponds to a certain sequence of nucleotides. On the one hand the symbol for that sequence could simply be taken as-is, if it does not play an explicit role in the computer simulation of whatever interactions A is involved in. That corresponds to treating the symbol TTGATCCCTCTT as a representation. It is the end of the chain; there only remains the relation from that symbol to something in the world, which is not something that we can write down or compute with.

On the other hand, it is equally possible to write down an annotation on the sequence symbol that specifies the list of (symbols representing) the nucleotides that it consists of,

TTGATCCCTCTT consists [T,T,G,A,T,C,C,C,T,C,T,T] .

Such a verbose formulation might be useful if one had, for example, a machine for synthesizing DNA molecules directly to implement an experiment in vitro for a genetic circuit that had already been developed and tested by simulation in silico, or a computer simulation that worked at a very detailed level. In this case the symbols A, C, T and G play the role of representing real-world objects and the symbol TTGATCCCTCTT is merely a reference that can be used to find the (list-structured) relations among them. By making this choice, the selected abstraction has become more granular.

Another example is a rule involving this agent. It is pertinent because, while we do not yet have machines for arbitrarily assembling DNA molecules from individual nucleotides, we do have tools for drawing contact map diagrams. This agent has a binding site which may be occupied by an RNA-polymerase molecule at a certain rate. This could be expressed as,

where now we have introduced a little bit more of the syntax that will be more fully elaborated later for annotating rules written in a file using the Kappa language. Here a rule is simply given a useful human-readable label, the canonical example of annotating something. On its own, it is useful. Imagine a summary of the contents of a set of such rules using labels like this. For that purpose the symbol r1 can be considered just to represent the rule without looking any deeper.

[Figure: contact map in which site b of agent A and site s of agent RNAp are joined by rule r1]

For a contact map diagram, more information is needed. At right is the diagram that corresponds to the example rule. It shows that A and RNAp interact, that this happens through the action of the rule r1 and that it involves the sites b and s in particular. Perhaps including which sites are involved in the interaction is too granular and it might be desirable in some circumstances to have a similar diagram involving just the agents and the rules. Or perhaps more information is desired to be presented in the diagram such as whether the rule involves creation or annihilation of a bond, say using arrows or a broken edge. No matter the level of granularity required, it is clear that the necessary information is contained within the rule itself, so simply considering the symbol r1 to opaquely represent the rule as an object is not enough. Such a level of abstraction would be too coarse; it must be elaborated further. Instead it should be considered to represent annotations that themselves represent the structure of the rule.

This discussion illustrates the idea of a contact map and how it can be generated from annotations, but to elaborate the rule sufficiently to support the production of such a diagram in practice involves a much greater amount of annotation structure than we have seen so far. A rule has a left and a right side. Each of those has zero or more agent patterns. A rule does not involve agents as such, rather it involves patterns that can match configurations of agents, so patterns then relate, inter alia, to agents and sites, and finally bonds between sites that are either to be matched (on the left-hand side) or created or annihilated (on the right-hand side). It involves some work to represent a rule as annotation in sufficient detail, but it is straightforward to do within the framework that we have given.

We focus our attention on annotating models written using either the Kappa or BioNetGen language.
Software tools compatible with these modeling languages are available at the following URLs:

1. https://kappalanguage.org
2. https://github.com/RuleWorld

Following our general discussion above about annotations and rule-based models, here we move to the more technical aspects (focusing on two languages, Kappa and BNGL) and follow the terminology and the definitions provided in Ref. (16).

Biological entities are represented by agents in Kappa and molecule types in BNGL (we use ‘agent’ to generically refer to both types). Agents may include any number of sites that represent the points of interaction between agents. For example, the DNA binding domain of a transcription factor (TF) agent can be connected to a TF binding site of a DNA agent. Moreover, sites can have states. For instance, a TF may have a site for phosphorylation and DNA binding may be constrained to occur only when the state of this site is phosphorylated.

For an agent with two sites, of which one has two internal states and the other three, the number of possible combinations is six (Figure 1A, B). A pattern is a (possibly incomplete) expression of an agent in terms of its internal states and binding states. Rules specifying biological interactions consist of patterns on the left-hand side which, when matched, produce the result on the right-hand side (Figure 1C). Specific patterns of interest can be declared as an observable of a model (i.e., a simulation output).

It is important to highlight that while the syntactic definition of an agent identifies sites and states in rule-based models, the semantics of sites and states is usually clear only to the modeller. Clearly, if one wishes to have machine access, then this information must be exposed in a structured way. The key idea of the approach presented in Ref. (16), and that we review in what follows, is to extend the syntax of rule-based models to incorporate annotations.

A: An agent definition
A(site1~u~v, site2~x~y~z)

B: Possible combinations of internal states
A(site1~u,site2~x)
A(site1~u,site2~y)
A(site1~u,site2~z)
A(site1~v,site2~x)
A(site1~v,site2~y)
A(site1~v,site2~z)

C: An example binding rule
A(site1~v,site2~z),A(site1~v,site2~y) -> A(site1~v!1,site2~z),A(site1~v!1,site2~y) @kf

Figure 1: A. An agent with two sites. site1 has two possible internal states while site2 has three. B. This agent can be used in six different ways depending on the internal states of its sites. C. A rule that specifies how agent A forms a dimer when the state of site1 is v and the states of site2 are z and y, respectively. The symbol !n means that the sites where it appears are bound (connected) together. The constant kf denotes the kinetic rate associated with the rule.

Existing metadata resources include machine-readable controlled vocabularies and ontologies, Web services providing standard access to external identifiers, and guidelines for the use of these resources. For example, the Minimum Information Requested in the Annotation of Models (MIRIAM) standard (18) specifies the minimal information required for the annotation of models.

Following Ref. (16) we suggest that entities in models should be linked to external information through the use of unique and unambiguous Uniform Resource Identifiers (URIs), which are embedded within models. The uniqueness and global scope of these URIs are then crucial for disambiguation of model agents, variables and rules.

We also choose to represent annotations using the Resource Description Framework (RDF) data model (19, ) as statements or binary predicates. A statement can link a modelling entity to a value using a standard qualifier term (predicate), which represents the relationship between the entity and the value.
These qualifiers often come from controlled vocabularies or ontologies in order to unambiguously identify the meaning of modelling entities. URIs are used as values to link these entities to external resources, and hence to a large amount of biological information, while keeping the number of annotations minimal. The links themselves are typed, again with URIs. The qualifiers and resources to which they refer are drawn from ontologies that encode the Description Logic (21) for a particular domain.

Semantics can be unified by means of metadata with controlled vocabularies. There are several metadata standard initiatives that provide controlled vocabularies from which standard terms can be taken. For instance, metadata terms provided by the Dublin Core Metadata Initiative (DCMI) (22) or BioModels qualifiers can be used to describe modelling and biological concepts (1, ). On the other hand, ontologies such as the Relation Ontology provide formal definitions of relationships that can be used to describe modelling entities (24). There are also several other ontologies and resources that are widely used to classify biological entities represented in models with standard values (25): the Systems Biology Ontology (SBO) (26) to describe types of rate parameters; the Gene Ontology (GO) (27) and the Enzyme Commission (EC) numbers (28) to describe biochemical reactions; the Sequence Ontology (SO) (29) to annotate genomic features and unify the semantics of sequence annotation; the BioPAX ontology (30) to specify types of biological molecules; and the Chemical Entities of Biological Interest (ChEBI) (31) terms to classify chemicals. URIs of entries from biological databases, such as UniProt (32) for proteins and KEGG (33) for reactions, can also be used to uniquely identify modelling entities.

Access to data should be unified and this can be done by accessing external resources through URIs using MIRIAM or Identifiers.org URIs (34).
It should be noted that MIRIAM identifiers are not resolvable directly over the Internet and require out-of-band knowledge to retrieve additional information, though they are unique and unambiguous. These URIs consist of collections and their terms, which may represent external resources and their entries respectively. For example, the MIRIAM URI urn:miriam:uniprot:P69905 (see Note 1) and the Identifiers.org URI http://identifiers.org/uniprot/P69905 can be used to link entities to the P69905 entry from UniProt. The relationships between modelling entities, annotation qualifiers and values can be represented using RDF graphs.

We recommend using RDF syntax that represents knowledge as (subject, predicate, object) triples, in which the subject can be an anonymous reference or a URI, the predicate is a URI and the object can be a literal value, an anonymous reference or a URI. Subjects and objects may refer to an ontology term, an external resource or an entity within a model. RDF graphs can then be serialized in different formats such as XML or the more human-readable Turtle format (35). Modelling languages such as the Systems Biology Markup Language (SBML) (36), CellML (37, ) and Virtual Cell Markup Language (5) are all XML-based and provide facilities to embed RDF/XML annotations (6). Moreover, there are also other exchange languages, such as BioPAX and the Synthetic Biology Open Language (SBOL) (39, ), that can be serialised directly as RDF/XML, allowing custom annotations to be embedded.

Following the suggestion of Misirli et al. (16) one can extend the use of RDF and MIRIAM annotations to describe a syntax to store machine-readable annotations and an ontology to facilitate the mapping between rule-based model entities and their annotations. We illustrate annotations using terms from this ontology and propose some examples.

Here, we review the syntax originally defined by Misirli et al. (16) for storing annotations. We start by noticing that a common approach, when trying to add additional structured information to a language where it is undesirable to change the language itself, is to define a special way of using comments. This practice is established for structured documentation or “docstrings” in programming languages (41, ). The idea is to use this same approach so that models written using the conventions that we describe here do not require modification of modelling software, such as KaSim (43) or RuleBender (44).

For this reason, we use the language’s comment delimiter followed by the ‘^’ character to denote annotations in the textual representation of rule-based languages. Kappa and BNGL both use the ‘#’ symbol to identify comment lines, so in the case of these languages, comments containing annotations are signalled by a line beginning with ‘#^’. This distinguishes between comments containing machine-readable annotations and comments intended for direct human consumption. Annotation data for a single modelling entity or a model itself can be declared over several lines and each line is prefixed with the ‘#^’ symbol.

Annotations are then serialised in the RDF/Turtle format. We claim that this leads to a good balance between the need for a machine-readable syntax and a human-readable textual representation. Rule-based modelling languages are themselves structured text formats designed for this same balance, so RDF/Turtle is more suitable than the XML-based representations of RDF.

Annotations for a single rule-based model entity are a list of statements. It is important to stress that annotations may refer to other annotations within the same model. When all the lines corresponding to a rule-based model and the annotation delimiter symbols are removed, the remaining RDF lines can represent a single RDF document.
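
The stripping procedure just described can be sketched in a few lines of Python. This is only an illustration, assuming that the annotation delimiter is `#^` (the `#` comment character of Kappa and BNGL followed by `^`); the function name `extract_annotations`, the example model and the `http://example.org/model#` namespace are hypothetical.

```python
def extract_annotations(model_text, delimiter="#^"):
    """Return the Turtle text embedded in the annotation comments of a model."""
    turtle_lines = []
    for line in model_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(delimiter):
            # Drop the delimiter and at most one space that follows it.
            body = stripped[len(delimiter):]
            turtle_lines.append(body[1:] if body.startswith(" ") else body)
    return "\n".join(turtle_lines)


# A hypothetical annotated Kappa fragment, used only for illustration.
example_model = """\
#^ @prefix : <http://example.org/model#> .
#^ @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# an ordinary comment, ignored by the extractor
#^ :A rdfs:label "Promoter" .
%agent: A(u,d,b)
"""

print(extract_annotations(example_model))
```

The result is plain Turtle text that could be handed to any RDF parser; ordinary comments and the model lines themselves are left out.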
Annotations can therefore be extracted quickly and easily without special tools (see Note 2).

In textual rule-based models, it is difficult to store annotations within a modelling entity since Kappa and BNGL represent modelling entities such as agents and rules as single lines of text. As a result, there is no straightforward location to attach annotations to an entity. Following Ref. (16) we achieve the mapping between a modelling entity and its annotations by defining an algorithm to construct a URI from the symbol used in the modelling language. The algorithm generates unique and unambiguous prefixed names that are intended to be interpreted as part of a Turtle document. The algorithm simply constructs the local part of a prefixed name by joining symbolic names in the modelling language with the ‘:’ character, and prepending the empty prefix, ‘:’. This means that one must satisfy the condition that the empty prefix is defined for this use. Using this algorithm, we can derive a globally unique reference for the y internal state of site site2 of agent A from A(site1~u~v,site2~x~y~z) as :A:site2:y.

In Kappa, rules do not have symbolic names but each rule can be preceded by free text surrounded by single quotes. We require this free text to be consistent with the local name syntax in the Turtle and SPARQL (45) languages. If this requirement is satisfied, identifiers for subrules are created by just adding their one-based position index to the identifier for a rule (see Figure 4B). A similar restriction is placed on other tokens used in the models; agent and site names, variable and observable names must all conform to the local name syntax.

Controlled vocabularies such as BioModels.net qualifiers are formed of model and biology qualifiers. The former offers terms to describe models. BioModels.net qualifiers are also appropriate to annotate rule-based models, but additional qualifiers are needed to fully describe rule-based models.
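
Returning to the naming algorithm described above, its core step is small enough to write out. The following Python sketch is an illustration under stated assumptions, not the implementation of Ref. (16): the function name `prefixed_name` is hypothetical, and the sketch does not check that each symbol conforms to the Turtle local name syntax.

```python
def prefixed_name(*symbols):
    """Join model symbols with ':' and prepend the empty prefix ':'."""
    if not symbols:
        raise ValueError("at least one symbolic name is required")
    return ":" + ":".join(symbols)


# Internal state y of site site2 of agent A, as in the example in the text.
name = prefixed_name("A", "site2", "y")
print(name)
```

For the agent, site and state used in the text this yields `:A:site2:y`, which is only meaningful inside a Turtle document that defines the empty prefix.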
These additional qualifiers are specific to the annotation of rule-based models and are provided by a distinct ontology, the Rule-Based Model Ontology, in the namespace http://purl.org/rbm/rbmo, conventionally abbreviated as rbmo (we omit the prefix if there is no risk of ambiguity). Each qualifier is constructed by combining this namespace with an annotation term. A subset of significant terms is listed in Table 1, while the full ontology is available online at the namespace URI.

In the rbmo vocabulary, the Model classes such as Kappa and BioNetGen specify the type of the model being annotated. The term Agent is used to declare physical molecules. Hence, the Agent class can represent agents and tokens in Kappa, or molecule types in BioNetGen. Site and State represent sites and states in these declarations respectively. Rules are identified using Rule. The predicates hasSite and hasState and their inverses are used to annotate the links between agent, site and internal state declarations. Table 1 reviews the terms related to the declaration of the basic entities from which models are constructed. We assume that the terms that start with an uppercase letter are types (in the sense of rdf:type, and also in this instance owl:Class) for the entities in the model which the modeller could be expected to explicitly annotate. The predicates begin with a lowercase letter and are used to link entities to their annotations.

Table 2 includes terms to facilitate representation of rules in RDF. This change of representation (materialization), from Kappa or BNGL to RDF, is something that can easily be automated and a tool is already available (for models written in Kappa).

This representation in RDF is helpful for analysis of models because it merges the model itself with the metadata in a uniform way that is easy to query. Annotations that cannot be derived from the model (as well as the model itself) are written explicitly in RDF/Turtle using the terms from Table 1 embedded in comments using a special delimiter. Extra statements can then be derived by parsing and analyzing the model using terms from Table 2 and the same naming convention from the algorithm previously described. These statements are then merged with the externally supplied annotations to obtain a complete and uniform representation of all the information about the model.

The open-ended nature of the RDF data model means that it is possible to freely incorporate terms from other ontologies and vocabularies, including application-specific ones. In this respect, two terms are crucial.
The dct:isPartOf predicate from DCMI Metadata Terms is used to denote that a rule or agent declaration is part of a particular model (or similarly with its inverse, dct:hasPart).

The bqbiol:is predicate from the BioModels.net Biology Qualifiers is used to link internal states of sites to indicate their biological meaning. This term is chosen because it denotes a kind of identification that is much weaker than the logical replacement semantics of owl:sameAs. Using the latter would imply that everything that can be said about the site qua biological entity can also be said about the site qua modelling entity. Clearly, these are not the same and identifying them in a strong sense would risk incorrect results when computing with the annotations.

Table 3 enumerates useful ontologies and vocabularies with their conventional prefixes to annotate rule-based models. This list is not exhaustive and can be extended.

Adding annotations to model-definition files

Here, we demonstrate how the suggested annotations can be added to rule-based models. Again we follow the methodology originally presented in Ref. (16).

Figure 2: An example model annotation (as in (16)), with details about its name, description, creators and online repository location. The prefix definitions required to annotate the model are defined first, and the empty prefix is defined for the model namespace itself.
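To make the shape of a model-level annotation concrete, the following Python sketch generates a small annotation header: prefix definitions first, including the empty prefix for the model namespace, followed by a title and description using Dublin Core terms. The subject name ':model', the trailing '#' on the rbmo prefix, the function names and the example values are all assumptions chosen for illustration and are not copied from Figure 2.

```python
def turtle_literal(text):
    """Quote a Python string as a short Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\")   # escape backslashes first
            .replace('"', '\\"')     # then double quotes
            .replace("\n", "\\n")    # keep the literal on one line
    )
    return f'"{escaped}"'


def model_annotation(namespace, title, description, model_class="rbmo:Kappa"):
    """Build '#^'-prefixed Turtle lines describing a model as a whole."""
    turtle = [
        f"@prefix : <{namespace}> .",                   # empty prefix = model namespace
        "@prefix rbmo: <http://purl.org/rbm/rbmo#> .",  # trailing '#' is assumed
        "@prefix dct: <http://purl.org/dc/terms/> .",
        f":model a {model_class} ;",                    # ':model' is a hypothetical name
        f"    dct:title {turtle_literal(title)} ;",
        f"    dct:description {turtle_literal(description)} .",
    ]
    # Every annotation line carries the '#^' delimiter so that the model
    # remains valid for software that treats '#' lines as comments.
    return "\n".join("#^ " + line for line in turtle)


if __name__ == "__main__":
    print(model_annotation(
        "http://example.org/tcs#",
        "Two-component system",
        'A small "toy" description',
    ))
```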

Annotations are added by first giving a list of prefix definitions representing annotation resources that provide relevant terms for the annotation of all model entities (such as agents and rules). These definitions are followed by statements about the title and description of the model, using the title and description terms from Dublin Core. Annotations can be expanded to include model type, creator, creation time, and its link to an entry in a model database (Figure 2).

Table 4 shows how distinct entities in a model can be annotated using terms from rbmo and from other vocabularies. Figure 3 shows examples of Agent annotations. In Figure 3A the ATP token is annotated as a small molecule with the identifier 15422 from ChEBI. Agents without sites can also be annotated in a similar way. In Figure 3B, the agent is specified to be a protein using the biopax:Protein value for the biopax:physicalEntity term. This protein agent is annotated as P16497 from UniProt, which is a protein kinase (i.e., an enzyme that phosphorylates proteins) involved in the process of sporulation. It has a site with the phosphorylated and unmodified states, which are annotated with corresponding terms from the Protein Modification Ontology (46). The ro:hasFunction term associates the agent with the GO histidine kinase molecular function term GO:0000155. In Figure 3C, a promoter agent with a TF binding site is represented. Both the promoter and the operator agents are of "DnaRegion" type, and are identified with the SO:0000167 and SO:0000057 terms. Although the nucleotide information can be linked to existing repositories using the bqbiol:is term, for synthetic sequences agents can directly be annotated using SBOL terms. The term sbol:nucleotides is used to store the nucleotide sequences for these agents. A parent-child relationship between the promoter and the operator agents can be represented using an sbol:SequenceAnnotation RDF resource, which allows the location of an operator subpart to be specified.

Figure 3: Examples of agent annotations for A. An ATP token agent. B. A kinase agent with phosphorylated and unphosphorylated site. C. A promoter agent with a TF binding site. D. An agent and an associated observable for the phosphorylated Spo0A protein, which can act as a TF.

This approach can be used to annotate a pattern with a specific entry from a database (patterns can also be stated as observables of the model). For instance, Figure 3D shows an example of such an observable. Spo0A_p represents the phosphorylated protein, which acts as a TF and is defined as an observable.

Figure 4 demonstrates annotation of rules. The first rule (Figure 4A) describes the binding of the LacI TF to a promoter. This biological activity is described using the GO:0008134 (transcription factor binding) term. In the second example (Figure 4B), a phosphorylation rule is annotated. The rule contains a subrule representing ATP to ADP conversion. This subrule is linked to the parent rule with the hasSubrule qualifier. Moreover, the annotation of the rate for this rule is presented in Figure 4C. The annotated Kappa and BNGL models for a two-component system (TCS), controlling a simple promoter architecture, can be found online (see Note 3).

Finally, in Figure 5 we present the fragment of a specific rule (taken from the TCS Kappa model) materialised using the krdf tool. The tool generates a version of the rules themselves in RDF together with the annotations (in this way the entire model is presented in a more uniform way).

Figure 4: Annotating rules and variables. A. TF DNA binding rule. B. Phosphorylation rule with a subrule for the ATP to ADP conversion. C. Annotation of a phosphorylation rate variable.

The framework we have described can be coupled to the development of tools that allow one to extract and analyze the annotations embedded in a model. Several tools are currently under development. We demonstrate here the krdf tool that can be used for checking duplication of rules and inconsistencies between different parts of a model, basic problems encountered when composing and creating biological models (47, 48). Another application is to draw an annotated contact map visualising the entities involved, the interactions and the biological information stored in the annotations; this merges the classical notion of contact map used to illustrate Kappa and BNGL models (9, 49) with biological semantics.

    :As1As2Spo0A_to_As2Spo0A a rbmo:Rule ;
        dct:title "Cooperative unbinding" ;
        rbmo:lhs [
            a rbmo:Pattern ;
            rbmo:agent :Spo0A ;
            rbmo:status [
                rbmo:isBoundBy :As1As2Spo0A_to_As2Spo0A:left:1 ;
                rbmo:isStatusOf :Spo0A:DNAb ;
                a rbmo:BoundState ;
            ], [
                rbmo:internalState :Spo0A:RR:p ;
                rbmo:isStatusOf :Spo0A:RR ;
                a rbmo:UnboundState ;
            ] ;
        ].

Figure 5: Fragment of the RDF representation of a materialised rule obtained by merging the metadata supplied by the model author with an RDF representation of the rule. The left hand side of the rule contains a pattern involving :Spo0A with two pieces of state information. The first refers to the :Spo0A:DNAb site, which is bound to something (that can only be recovered using the rest of the model, not presented here). The second refers to the :Spo0A:RR site, which has a particular internal state and is unbound.

The krdf tool operates on Kappa models and has several modes of operation that provide progressively more information about a model. The first, selected with the -a option, extracts the modeller's annotations. The second mode, selected with the -m option, materialises the information in the rules themselves into the RDF representation (as illustrated in Figure 5). Finally, the -n option normalises the patterns present in the rules according to their declarations.

Once a complete uniform representation of the model in RDF has been generated, one can query it using SPARQL with a tool such as roqet (50). For example, a SPARQL query can deduce a contact map: pairings of sites in agents that undergo binding and unbinding according to the rules in a model. These pairings form a graph that can be visualised using tools such as GraphViz (51). With an appropriate query (see Note 4), roqet can output the result in a GraphViz-compatible format.
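As a rough illustration of the last step, the sketch below turns a list of site pairings into text in the GraphViz DOT language. It assumes the pairings have already been obtained (for example from a SPARQL query) as (agent, site, agent, site) tuples; the function name and the example pairing are hypothetical, and this is not the output format produced by roqet itself.

```python
def contact_map_dot(pairings):
    """Render (agent, site, agent, site) pairings as an undirected DOT graph."""
    edges = set()
    for agent_a, site_a, agent_b, site_b in pairings:
        # An undirected edge is the same whichever end is listed first,
        # so the two end points are stored in sorted order.
        ends = tuple(sorted([f"{agent_a}:{site_a}", f"{agent_b}:{site_b}"]))
        edges.add(ends)

    lines = ["graph contact_map {"]
    for left, right in sorted(edges):
        lines.append(f'  "{left}" -- "{right}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative pairing only; the same bond reported from both sides
    # collapses into a single edge.
    print(contact_map_dot([
        ("Spo0A", "RR", "KinA", "H405"),
        ("KinA", "H405", "Spo0A", "RR"),
    ]))
```

The resulting text can be saved to a file and rendered with the GraphViz command-line tools.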
A more sophisticated manipulation (see Note 5) can extract annotations from the RDF representation of the TCS example model and easily create a richly annotated contact map diagram (Figure 6). In this way, biological information extracted from the annotations can be added to the agents, sites and interactions (using GraphViz for rendering) (see Note 6).

Figure 6: Contact map generated by a SPARQL query on the RDF materialisation of the TCS example in Kappa. Biological information concerning the agents, rules and sites, types of the molecules, DNA sequences and typology of the interaction is extracted automatically from the model annotations. The agents shown are Promoter (DnaRegion), Spo0A (Protein) and KinA (Protein); the site labels are TTCGACA, AGTCGAA, DNAb, RR and H405. The edge labels are:
b0: Spo0A binding to Operator 1
b1: Spo0A binding to Operator 2
b2: Spo0A-KinA binding
u0: Cooperative unbinding: Spo0A unbinds from Operator 1
u1: Cooperative unbinding: Spo0A unbinds from Operator 2
u2: Spo0A unbinding from Operator 1
u3: Spo0A unbinding from Operator 2
u4: Spo0A(phosp)-KinA unbinding
u5: Spo0A(unphos)-KinA unbinding
This figure is a reproduction of Fig. 6 in Ref. (16); no changes have been made. The figure is used under the terms of the CC-BY license (https://opendefinition.org/licenses/cc-by/).

Moreover, one can easily create a query that implements a join operation on the bqbiol:is property, enforcing a stronger form of identity semantics than this predicate is usually given. A filter clause is necessary to prevent a comparison of a rule with itself (see the SPARQL query in Figure 7). In this way, the discussed annotations could also be used to detect duplication of rules (e.g., obtained when combining different biological models).

    SELECT DISTINCT ?modelA ?ruleA ?modelB ?ruleB
    WHERE {
        ?ruleA a rbmo:Rule ;
            dct:isPartOf ?modelA ;
            bqbiol:is ?ident .
        ?ruleB a rbmo:Rule ;
            dct:isPartOf ?modelB ;
            bqbiol:is ?ident .
        FILTER (?ruleA != ?ruleB)
    }

Figure 7: Detection of duplicate rules.
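The logic of the query in Figure 7 can also be traced in plain Python, which may help when reasoning about what the join returns. The sketch below is only an illustration under simplifying assumptions: triples are given as tuples of prefixed-name strings, rdf:type stands for the 'a' keyword, and the function name and test data are hypothetical. It is not a replacement for running the SPARQL query against the materialised RDF.

```python
from collections import defaultdict


def find_duplicate_rules(triples):
    """Pair up distinct rules that share a bqbiol:is identifier."""
    rules = set()
    models = defaultdict(set)   # rule -> models it is part of
    idents = defaultdict(set)   # rule -> bqbiol:is identifiers
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj == "rbmo:Rule":
            rules.add(subject)
        elif predicate == "dct:isPartOf":
            models[subject].add(obj)
        elif predicate == "bqbiol:is":
            idents[subject].add(obj)

    # Group rules by shared identifier: this is the join on ?ident.
    by_ident = defaultdict(set)
    for rule in rules:
        for ident in idents.get(rule, ()):
            by_ident[ident].add(rule)

    matches = set()             # a set gives the effect of SELECT DISTINCT
    for same in by_ident.values():
        for rule_a in same:
            for rule_b in same:
                if rule_a == rule_b:    # FILTER (?ruleA != ?ruleB)
                    continue
                for model_a in models.get(rule_a, ()):
                    for model_b in models.get(rule_b, ()):
                        matches.add((model_a, rule_a, model_b, rule_b))
    return sorted(matches)
```

As with the SPARQL query, each duplicated pair is reported twice, once in each order, because the filter only excludes comparing a rule with itself.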

Another possible application of the presented annotation schema is the checking of inconsistencies in a rule-based model. This can be done in several different ways. A simple way is to use the replacement semantics of owl:sameAs. A statement of the form a owl:sameAs b means that every statement about a is also true if a is replaced by b. In particular, if we have statements about the types of a and b, and these types are disjoint, the collection of statements is unsatisfiable (hence, the model has been found to be inconsistent). Then, an OWL reasoner such as HermiT (52) or Pellet (53) can derive that a and b have type owl:Nothing.

This can be implemented with the following workflow (here only sketched): (i) generate the fully materialised RDF version of a model using krdf. For each use of bqbiol:is, add a new statement using owl:sameAs; (ii) retrieve all ontologies that are used from the Web. For each external vocabulary term with bqbiol:is or bqbiol:isVersionOf, retrieve a description and any ontology that it uses (recursively). Merge all of these into a single graph. This graph contains the complete model and annotations, with entities linked using a strong form of equality to external vocabulary terms, and descriptions of the meaning of these vocabulary terms; (iii) the reasoner can be used to derive terms that are equivalent to owl:Nothing and if any of these terms is found then an inconsistency has been identified. Using the proof generation facilities of OWL reasoners, the sequence of statements required to arrive at foo rdf:type owl:Nothing can be reproduced (in this way, the initial source of the inconsistency can also be identified).

In this chapter we have reviewed the recent proposal to incorporate annotations into rule-based models, following the approach recently presented in Ref. (16). We have also discussed in a more general way the role of annotations and how they are strongly related to the notion of abstraction. In general, for consistency, we have followed the terms originally defined in Ref. (16). However, the suggested standardized terms can be used in a complementary manner with existing metadata resources such as MIRIAM annotations and URIs, and existing controlled vocabularies and ontologies. Although the approach has only been described for Kappa- and BNGL-formatted model-definition files, it can be easily applied to other formats for rule-based models.

In particular, PySB (15) already includes a list of MIRIAM annotations at the model level, and can be extended to include the type of annotations described here. SBML's multi package (see Note 7) (54) is intended to standardise the exchange of rule-based models. The entities in this format inherit the annotation property from standard SBML and can therefore include RDF annotations. These SBML models could thus be imported or exported by tools such as KaSim or BioNetGen/RuleBender, avoiding the loss of any biological information.

It is important to remark that annotations are also useful for automated conversions between different formats. Conversion between rules and reaction networks is already an ongoing research subject (47), and the availability of annotations can play an important role for reliable conversion and fine-tuning of models (55, 56). It is straightforward to use the framework presented and automatically map agents and rules to glyphs (13) or to convert models into other visual formats such as SBGN or genetic circuit diagrams (57).

More generally, annotations are designed for machine readability and can be produced computationally (e.g., by model repositories). This can be done by developing APIs and tools to access a set of biological parts (4, 58) that will incorporate rule-based descriptions and will be annotated with the proposed schema. This will open the possibility of composing (stitching together) rule-based models extracted from distinct repositories. Tools such as Saint (48) and SyBIL (7) could be extended to automate the annotation of rule-based models. In this way, the extensive information available in biological databases and the literature can be integrated and made available via rule-based models, taking advantage of the syntax and the framework presented here and elsewhere.

One of the ultimate goals is to use annotations as a facilitator of automatic composition of rule-based models. As recently suggested by Misirli et al. (59), the proposed schema can be used to automate the design of biological systems using a rule-based model with a workflow that combines the definition of modular templates to instantiate rules for basic biological parts. The templates, defining rule-based models for basic biological parts (see Note 8), can be associated with quantitative parameters to create particular parts models, which can then be merged into executable models. Such models may be annotated using the reviewed schema, leading to a feasible protocol to automate their composition for the scalable modelling of synthetic systems (59).

The described annotation ontology for rule-based models can be found at http://purl.org/rbm/rbmo while the tool and all the presented examples can be found at http://purl.org/rbm/rbmo/krdf.

1. A dereferenceable URI using the MIRIAM Web service is

2. For example, on a UNIX system, the following pipeline could be used: grep '^#\^' | sed 's/^#\^//'

3. The files tcs.kappa and tcs.bngl are available in the http://purl.org/rbm/rbmo/examples directory.

4. See the binding.sparql file in the krdf directory.

5. See the contact.py script in the krdf directory.

6. The tool assumes that only single instances of an agent are involved in a rule. It can be generalized.

7. See http://sbml.org/Documents/Specifications/SBML_Level_3/Packages/multi for details.

8. These are available at http://github.com/rbm/composition.

Acknowledgement

The Engineering and Physical Sciences Research Council grant EP/J02175X/1 (to V.D. and M.C.), the European Union's Seventh Framework Programme for research, technological development and demonstration grant 320823 RULE (to W.W., R.H-Z, V.D.).

References

(1) Li C, Donizelli M, Rodriguez N, et al (2010) BioModels Database: an enhanced, curated and annotated resource for published quantitative kinetic models. BMC Syst Biol 4:92
(2) Yu T, Lloyd CM, Nickerson DP, et al (2011) The Physiome Model Repository 2. Bioinformatics 27:743–744
(3) Snoep JL, Olivier BG (2003) JWS online cellular systems modelling and microbiology. Microbiology 149:3045–3047
(4) Misirli G, Hallinan JS, Wipat A (2014) Composable modular models for synthetic biology. ACM J Emerging Technol Comput Syst 11:22
(5) Moraru II, Schaff JC, Slepchenko BM, et al (2008) Virtual Cell modelling and simulation software environment. IET Syst Biol 2:352–362
(6) Endler L, Rodriguez N, Juty N, et al (2009) Designing and encoding models for synthetic biology. J R Soc Interface 6:S405–S417
(7) Blinov ML, Ruebenacker O, Schaff JC, Moraru II (2010) Modeling without borders: creating and annotating VCell models using the Web. Lect Notes Comput Sci 6053:3–17
(8) Funahashi A, Jouraku A, Matsuoka Y, Kitano H (2007) Integration of CellDesigner and SABIO-RK. In Silico Biol 7:81–90
(9) Danos V, Laneve C (2004) Formal molecular biology. Theor Comput Sci 325:69–110
(10) Danos V, Feret J, Fontana W, Krivine J (2007) Scalable simulation of cellular signaling networks. Lect Notes Comput Sci 4807:139–157
(11) Faeder JR, Blinov ML, Hlavacek WS (2009) Rule-based modeling of biochemical systems with BioNetGen. Methods Mol Biol 500:113–167
(12) Köhler A, Krivine J, Vidmar J (2014) A rule-based model of base excision repair. Lect Notes Comput Sci 8859:173–195
(13) Chylek LA, Hu B, Blinov ML, et al (2011) Guidelines for visualizing and annotating rule-based models. Mol BioSyst 7:2779–2795
(14) Klement M, Děd T, Šafránek D, et al (2014) Biochemical Space: a framework for systemic annotation of biological models. Electron Notes Theor Comput Sci 306:31–44
(15) Lopez CF, Muhlich JL, Bachman JA, Sorger PK (2013) Programming biological models in Python using PySB. Mol Syst Biol 9:646
(16) Misirli G, Cavaliere M, Waites W, et al (2016) Annotation of rule-based models with formal semantics to enable creation, analysis, reuse and visualisation. Bioinformatics 32:908–917
(17) Buneman P, Kostylev EV, Vansummeren S (2013) Annotations are relative. In: Proceedings of the 16th International Conference on Database Theory, ACM, New York, pp 177–188
(18) Le Novère N, Finney A, Hucka M, et al (2005) Minimum information requested in the annotation of biochemical models (MIRIAM). Nat Biotechnol 23:1509–1515
(19) Cyganiak R, Wood D, Lanthaler M (2014) RDF 1.1 concepts and abstract syntax. Accessed 17 Aug 2016
(20) Gandon F, Schreiber G (2014) RDF 1.1 XML syntax. Accessed 17 Aug 2016
(21) McGuinness DL, van Harmelen F (2004) OWL Web Ontology Language. Accessed 17 Aug 2016
(22) DCMI Usage Board (2012) DCMI metadata terms. Accessed 17 Aug 2016
(23) Le Novère N, Finney A (2005) A simple scheme for annotating SBML with references to controlled vocabularies and database entries. Accessed 17 Aug 2016
(24) Smith B, Ceusters W, Klagges B, et al (2005) Relations in biomedical ontologies. Genome Biol 6:R46
(25) Swainston N, Mendes P (2009) libAnnotationSBML: a library for exploiting SBML annotations. Bioinformatics 25:2292–2293
(26) Courtot M, Juty N, Knüpfer C, et al (2011) Controlled vocabularies and semantics in systems biology. Mol Syst Biol 7:543
(27) The Gene Ontology Consortium (2001) Creating the Gene Ontology Resource: design and implementation. Genome Res 11:1425–1433
(28) Bairoch A (2000) The ENZYME database in 2000. Nucleic Acids Res 28:304–305
(29) Eilbeck K, Lewis S, Mungall C, et al (2005) The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol 6:R44
(30) Demir E, Cary MP, Paley S, et al (2010) The BioPAX community standard for pathway data sharing. Nat Biotechnol 28:935–942
(31) Degtyarenko K, de Matos P, Ennis M, et al (2008) ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res 36:D344–D350
(32) Magrane M, UniProt Consortium (2011) UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011:bar009
(33) Kanehisa M, Araki M, Goto S, et al (2008) KEGG for linking genomes to life and the environment. Nucleic Acids Res 36:D480–D484
(34) Juty N, Le Novère N, Laibe C (2012) Identifiers.org and MIRIAM Registry: community resources to provide persistent identification. Nucleic Acids Res 40:D580–D586
(35) Prud'hommeaux E, Carothers G (2014) RDF 1.1 Turtle. Accessed 17 Aug 2016
(36) Hucka M, Finney A, Sauro HM, et al (2003) The Systems Biology Markup Language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19:524–531
(37) Cuellar AA, Lloyd CM, Nielsen PF, et al (2003) An overview of CellML 1.1, a biological model description language. SIMULATION 79:740–747
(38) Hedley WJ, Nelson MR, Bullivant DP, Nielsen PF (2001) A short introduction to CellML. Philos Trans A Math Phys Eng Sci 359:1073–1089
(39) Galdzicki M, Wilson ML, Rodriguez CA, et al (2012) Synthetic Biology Open Language (SBOL) version 1.1.0. URL http://hdl.handle.net/1721.1/73909, Accessed 17 Aug 2016
(40) Galdzicki M, Clancy KP, Oberortner E, et al (2014) The Synthetic Biology Open Language (SBOL) provides a community standard for communicating designs in synthetic biology. Nat Biotechnol 32:545–550
(41) Acuff R (1988) KSL Lisp environment requirements. URL https://profiles.nlm.nih.gov/BB/G/H/S/D/_/bbghsd.pdf, Accessed 14 Aug 2018
(42) Stallman R, other GNU Project volunteers (1992) GNU coding standards. Accessed 17 Aug 2016
(43) Krivine J (2014) KaSim. URL https://github.com/Kappa-Dev/KaSim, Accessed 17 Aug 2016
(44) Xu W, Smith AM, Faeder JR, Marai GE (2011) RuleBender: a visual interface for rule-based modeling. Bioinformatics 27:1721–1722
(45) Prud'hommeaux E, Seaborne A (2013) SPARQL query language for RDF. Accessed 17 Aug 2016
(46) Montecchi-Palazzi L, Beavis R, Binz PA, et al (2008) The PSI-MOD community standard for representation of protein modification data. Nat Biotechnol 26:864–866
(47) Blinov ML, Ruebenacker O, Moraru II (2008) Complexity and modularity of intracellular networks: a systematic approach for modelling and simulation. IET Syst Biol 2:363–368
(48) Lister AL, Pocock M, Taschuk M, Wipat A (2009) Saint: a lightweight integration environment for model annotation. Bioinformatics 25:3026–3027
(49) Danos V, Feret J, Fontana W, Harmer R, Krivine J (2009) Rule-based modelling and model perturbation. Lect Notes Comput Sci 5750:116–137
(50) Beckett D (2015) Redland RDF libraries. URL http://librdf.org, Accessed 17 Aug 2016
(51) Ellson J, Gansner E, Koutsofios L, North SC, Woodhull G (2001) Graphviz – open source graph drawing tools. Lect Notes Comput Sci 2265:483–484
(52) Shearer R, Motik B, Horrocks I (2008) HermiT: a highly-efficient OWL reasoner. In: Proceedings of the 5th International Workshop on OWL: Experiences and Directions (OWLED)
(53) Sirin E, Parsia B, Cuenca Grau B, Kalyanpur A, Katz Y (2007) Pellet: a practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web 5:51–53
(54) Zhang F, Meier-Schellersheim M (2018) SBML Level 3 package: multistate, multicomponent and multicompartment species, version 1, release 1. J Integr Bioinform 15:20170077
(55) Tapia JJ, Faeder JR (2013) The Atomizer: extracting implicit molecular structure from reaction network models. In: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM, New York
(56) Harris LA, Hogg JS, Tapia JJ, et al (2016) BioNetGen 2.2: advances in rule-based modeling. Bioinformatics 32:3366–3368
(57) Misirli G, Hallinan JS, Yu T, et al (2011) Model annotation for synthetic biology: automating model to nucleotide sequence conversion. Bioinformatics 27:973–979
(58) Cooling MT, Rouilly V, Misirli G, et al (2010) Standard virtual biological parts: a repository of modular modeling components for synthetic biology. Bioinformatics 26:925–931
(59) Misirli G, Waites W, Cavaliere M, et al (2016) Modular composition of synthetic biology designs using rule-based models. In: Proceedings of 8th International Workshop on Bio-Design Automation (IWBDA 2016)
(60) Natale DA, Arighi CN, Barker WC, et al (2011) The Protein Ontology: a structured representation of protein forms and complexes. Nucleic Acids Res 39:D539–D545
(61) Mulder NJ, Apweiler R (2008) The InterPro database and tools for protein domain analysis. Curr Protoc Bioinformatics 21:2.7.1–2.7.18

Tables

Table 1

Term | Description
Kappa, BioNetGen | Model types.
Agent | Type for declarations of biological entities.
Site | Type for sites of Agents.
State | Type for internal states of Sites.
hasSite, hasState, siteOf, stateOf | Predicates for linking Agents, Sites and States.
Rule | Type for interactions between agents.
hasSubrule, subruleOf | Specifies that a rule has a subrule (i.e., KaSim subrules).
Observable | Type for agent patterns counted by a simulation.

Table 2

Term | Description
Pattern | Type of a pattern as it appears in a Rule or Observable.
lhs, rhs | Predicates for linking a Rule to its left and right hand side Patterns.
pattern | Predicate for linking an Observable to the patterns that it matches.
agent | Predicate for linking a Pattern and a site within it to the corresponding Agent.
status | Specifies a status of a particular Site (and State) in a Pattern.
isStatusOf, internalState | Predicates for linking a status in a Pattern to the corresponding Site and State declarations.
isBoundBy | Specifies the bond that a Site is bound to in a particular Pattern. Bonds are identified via URIs.
BoundState, UnboundState | Terms denoting that a Site in a Pattern is bound or unbound.

Table 3

Prefix | Description
rbmo | Rule-based modelling ontology (presented in this paper)
dct | Dublin Core Metadata Initiative Terms
bqbiol | BioModels.net Biology Qualifiers (1)
go | Gene Ontology (27)
psimod | Protein Modification Ontology (46)
so | Sequence Ontology (29)
sbo | Systems Biology Ontology (26)
chebi | Chemical Entities of Biological Interest Ontology (31)
uniprot | UniProt Protein Database (32)
pr | Protein Ontology (60)
ro | OBO Relation Ontology (24)
owl | Web Ontology Language
sbol | The Synthetic Biology Open Language (39, 40)
foaf | Friend of a Friend Vocabulary (http://xmlns.com/foaf/spec)
ipr | InterPro (61)
biopax | Biological Pathway Exchange Ontology (30)

Table 4

Term | Annotation values

Agent declarations:
rdf:type | Agent
dct:isPartOf | Identifier for the Model.
hasSite | Identifier of a Site.
biopax:physicalEntity | A biopax:PhysicalEntity term, e.g. DnaRegion or SmallMolecule.
bqbiol:is | A term representing an individual type of an Agent entity, e.g. a protein entry from UniProt.
bqbiol:isVersionOf | A term representing the class type of an Agent entity, e.g. a SO term for a DNA-based agent.

Site declarations:
rdf:type | Site
hasState | Identifier for an internal state.
bqbiol:isVersionOf | A term representing the type of the site, e.g. a SO term for a nucleic acid-based site or an InterPro term for an amino acid-based site.

Internal state declarations:
rdf:type | State
bqbiol:is | A term representing the state assignment, e.g. a term from the PSIMOD or the PO.

Rules:
rdf:type | Rule
dct:isPartOf | Identifier for the Model.
bqbiol:is | A term representing an individual type of a rule, e.g. a KEGG entry.
bqbiol:isVersionOf | A term representing a class type of a rule, e.g. an EC number, a SO term or a GO term.
subrule | Identifier for a Rule entity.
lhs†, rhs† | References to the patterns forming the left and right hand side of the rule.

Observables:
rdf:type | Observable
dct:isPartOf | Identifier for the Model.
pattern† | References the constituent patterns.

Patterns:
rdf:type | Pattern
ro:hasFunction | A GO term specifying a biological function.
agent† | Reference to the corresponding Agent declaration.
internalState† | Reference to a representation of a site's state.
isStatusOf† | Reference from a site's state to the corresponding site.

Variables:
rdf:type | sbo:SBO:0000002 (quantitative systems description parameter)
dct:isPartOf | Identifier for the Model.
bqbiol:isVersionOf | A term representing a variable type. If one exists, the term should be a subterm of SBO:0000002.