Attenuation Regulation as a Term Rewriting System

Abstract

The classical attenuation regulation of gene expression in bacteria is considered. We propose to represent the secondary RNA structure in the leader region of a gene or an operon by a term, and we give a probabilistic term rewriting system modeling the whole process of such a regulation.


arXiv [q-bio.QM], June

Eugene Asarin, Thierry Cachat, Alexander Seliverstov, Tayssir Touili, and Vassily Lyubetsky⋆
LIAFA, CNRS and University Paris Diderot, asarin,txc,[email protected]
IITP, Russian Academy of Science, slvstv,[email protected]


Modeling the mechanisms of regulation of gene expression in a way that allows prediction of quantitative characteristics of this expression (such as the level of expression and the concentration of the substrate) is an important research challenge. In previous work [LRSP06,LPRS07], a model of one particular kind of regulation, the classical attenuation regulation, was proposed. In that model, the evolution of the secondary RNA structure in the leader region of a gene, and the progress of the ribosome and the polymerase along the RNA/DNA strands, are represented by a very special, carefully elaborated Markov chain. In this chain the transition probability corresponding to the progress of the ribosome depends on a "control variable": the concentration of charged tRNA molecules in the cell. All the other probabilities do not depend on the control variable; they can be determined from energy-based considerations. Termination and antitermination (of gene expression) correspond to particular random events in the Markov chain. In [LRSP06], a Monte-Carlo simulation of this Markov chain led to a biologically realistic dependence of the termination probability on the control variable. Due to the large size and complex structure of the Markov chain, its simulation is a heavy computational task, but the task was solved successfully: the software tool RNAmodel simulates one trajectory in a fraction of a second [LRSP06,RNA]. However, the approach based on the direct description of the Markov chain and its simulation has some limitations, especially for theoretical analysis. From a biological point of view, it would be useful to have a more structured and compact representation of the Markov chain and of its instantaneous probability distributions over all states, at every instant or only for sufficiently large time, or at least of the probabilities of the two biologically important events: termination and antitermination.

⋆ The support of CNRS-RAS cooperation agreement 19122 Evolver is gratefully acknowledged.

Note that the problem of modeling the classical attenuation regulation, as stated in [LRSP06] and in the current article, concerns the transient behavior of the secondary structure on a sliding window of the RNA strand between the ribosome and the polymerase (see below for details). This differs from the kinetics of the secondary RNA structure on a fixed nucleotide sequence over unlimited time, i.e. an unlimited number of steps, which has been investigated in many papers. The structure that appears after a long time is called the equilibrium secondary RNA structure; it corresponds to a minimum of energy, see e.g. [Zuk03,FFHS00]. The tool RNAmodel can also determine this equilibrium structure and its energy, as a special part of the full model in [LRSP06]. However, the real structures that appear on the RNA strand during the regulation process are far from equilibrium, and their energies are far from minimal.

In this article we reveal a regular internal structure of the Markov chain describing the classical attenuation regulation. We show that it can be represented as a probabilistic term rewriting system for a particular type of terms. The set of rewriting rules can be large, but all of them are generated by a small set of five metarules. We give the full description of the metarules and explain how to generate all the rules for the case of classical attenuation regulation. Potential benefits of such a representation are multiple:
– easier and more precise modeling of regulation mechanisms that depend on the dynamics of the secondary structure;
– compact description of such mechanisms, perhaps in dedicated languages, and hence a better biological understanding of regulation processes;
– convenient representation of secondary structures by terms;
– specific analysis and simulation methods for rewriting systems.

This article is structured as follows. In section 2 we briefly describe the biological phenomenon that we want to model: the mechanism of classical attenuation regulation (CAR). In section 3 we introduce a class of terms and probabilistic term rewriting systems. In section 4 we represent a qualitative metamodel of the biological mechanism of CAR by a term rewriting system. In section 5 we refine the previous system and decorate its transitions with rates, thus obtaining a representation of the Markov chain by a probabilistic term rewriting system. In section 6 we show some simulation results. In section 7 we discuss some related work on term rewriting and its applications.
In section 8 we conclude with a discussion of the perspectives of the rewriting approach to modeling mechanisms involving RNA secondary structures, especially regulation.

To begin with, we recall some well-known facts about the biological phenomenon that plays the central role in this article. The expression of a group of structural genes (that is, the synthesis of the corresponding proteins, which are enzymes for a chemical reaction) can be regulated by a sequence of nucleotides located upstream of these genes on the DNA, inside the so-called leader region of the genes [SB91]. This subsequence of the leader region is called the regulatory region. In this article we deal with one particular type of regulation, the classical attenuation regulation (CAR) in bacteria. This regulation mechanism concerns structural genes (groups of genes, i.e. operons) that produce proteins catalyzing the synthesis of amino acids. Classical attenuation makes it possible to activate such an operon when the cell contains a low concentration of the amino acid, to deactivate the operon whenever this concentration increases, and to do so quickly. The mechanism of CAR involves several actors: the regulatory region on the DNA, its copy on the RNA, the ribosome, and an enzyme called RNA polymerase (see Fig. 1).

[Figure: the DNA and RNA strands with the ribosome Rib, the RNA polymerase Pol and the structural genes (str. genes); the segments $Q$, $Q'$ and $Q''$ are marked.]
Fig. 1. Classical attenuation regulation. The RNA polymerase Pol transcribes the regulatory region $Q$, and the ribosome Rib translates the leader peptide gene $Q'$. The movement of Rib on the regulatory codons $Q''$ is controlled by the concentration of charged tRNA. The secondary RNA structure $\omega$ between Rib and Pol brakes Pol and pushes it off the chain. If Pol reaches the structural genes, then they are expressed, i.e. transcribed and then translated. Note that in both the DNA and the RNA, we use $Q$, $Q'$ and $Q''$ to denote the regulatory region, the leader peptide gene, and the regulatory codons, respectively.

For the structural genes to be expressed, two concurrent processes must succeed. The regulatory region $Q$ must be transcribed by the RNA polymerase, creating an RNA. At the same time, the ribosome must bind to the very beginning of the freshly created segment $Q'$ (called the leader peptide gene) of the regulatory region $Q$ on the RNA, and start translating this leader peptide gene into an auxiliary protein. The essential part of the regulation process takes place while the ribosome moves along $Q'$ on the RNA and the polymerase moves somewhere downstream of the ribosome along $Q$ on the DNA.

The ribosome moves "rightwards" (formally speaking, in the direction from the 5′ end to the 3′ end) on the segment $Q'$ of the sequence $Q$. Its speed is constant except on a subsequence $Q''$ (the regulatory codons), where it depends directly on the concentration of the amino acid (via the concentration of charged tRNA). To the right of the ribosome, and independently of it, the polymerase moves rightwards along $Q$. Between the ribosome and the polymerase a secondary structure $\omega$ forms on the RNA. This structure consists of pairings of some nucleotides, and it changes very fast. An important effect of the secondary structure $\omega$ is that it slows down the movement of the polymerase. There are two possible scenarios:
– When $\omega$ is strong enough, its "braking" action on the polymerase increases, and moreover the polymerase can slip off the DNA (this can only happen on a so-called T-rich sequence, where the connection between the polymerase and the DNA weakens). Such an event is called termination, and in this case the structural genes are not expressed: the transcription of the regulatory region is aborted, and the structural genes are neither transcribed nor translated.
– Another possibility is that the ribosome moves fast enough to weaken or partly destroy most of the structure $\omega$.
In this case the polymerase safely traverses the T-rich sequence and arrives at the end of the leader region $Q$. Next, the polymerase enters the structural genes, and their transcription, followed by translation, is unavoidable. This event is called antitermination, and in this case the structural genes are expressed.

In the rest of this article we build a qualitative and a quantitative model of the regulation process described above.

Let $\Sigma$ be a finite set of function symbols and $X$ an enumerable set of variables (standing for sets of terms). The set $T_\Sigma[X]$ of terms over $\Sigma$ and $X$ is the smallest set that satisfies:
– $\Sigma \subseteq T_\Sigma[X]$,
– $\{ f(x) \mid f \in \Sigma \wedge x \in X \} \subseteq T_\Sigma[X]$,
– if $f \in \Sigma$ and $s \subseteq T_\Sigma[X]$ is a set of terms, then $f(s) \in T_\Sigma[X]$.
By definition we also put $f(\emptyset) = f$ for $f \in \Sigma$. For convenience we write $f(g, h(e))$ instead of $f(\{g, h(\{e\})\})$. However, one should remember that the comma-separated terms are unordered.

Example 1.

Let $\Sigma = \{e, f, g, h\}$ and $X = \{x, y, z, \ldots\}$; then the following are terms in $T_\Sigma[X]$: $f(g, h(e))$, $f(f(x))$ and $e(g, f)$.

Note that we consider function symbols of variable arity. $T_\Sigma$ stands for $T_\Sigma[\emptyset]$. Terms in $T_\Sigma$ are called ground terms. Variables are used only to define substitutions and rewriting rules; the "real" terms are ground terms. A substitution $\sigma$ is a mapping from $X$ to $2^{T_\Sigma[X]}$, written as $\sigma = \{ x_1 \to T_1, \ldots, x_n \to T_n \}$, where $T_i$, $1 \le i \le n$, is a finite set of terms that substitutes the variable $x_i$. The term obtained by applying the substitution $\sigma$ to a term $t$ is written $t\sigma$; we call it an instance of $t$.

Let $R$ be a rule of the form $l \to r$, where $l$ and $r$ are terms in $T_\Sigma[X]$. For ground terms $t, t'$ we write $t \to_R t'$ if there exists a substitution $\sigma$ such that $t'$ can be obtained from $t$ by replacing an occurrence of the subterm $l\sigma$ by $r\sigma$. Thus $\to_R$ defines a relation between ground terms. Let $\to_R^*$ be the reflexive transitive closure of $\to_R$.

Example 2.

Let $R = l \to r$ with $l = f(x, e)$, $r = f(g(x), e)$ and $t = e(f(h, e))$; then $t \to_R t'$ where $t' = e(f(g(h), e))$.

A term rewriting system (TRS) is a finite set of rules of the form $l \to r$. Given a TRS $R$ and a set of terms $I \subset T_\Sigma$, the language $R^*(I)$ is defined as the set of all ground terms that can be obtained from the terms in $I$ by applying the rules from $R$ a finite number of times, i.e.,
$$R^*(I) = \{ t \in T_\Sigma \mid \exists t' \in I,\ t' \to_R^* t \}.$$

Example 3.

Let $R = \{ f(x) \to g(f(x)) \}$ and $I = \{ f(e, h) \}$; then $R^*(I) = \{ g^n(f(e, h)) \mid n \in \mathbb{N} \}$.

A Continuous Time Markov Chain is a pair $(S, \rho)$, where $S$ is a finite or enumerable set of states and $\rho : S \times S \to [0, \infty)$ is the rate matrix. For $s, s' \in S$, $\rho(s, s') > 0$ means that there is a transition between $s$ and $s'$, and that the probability of moving from $s$ to $s'$ within $t$ time units is equal to $1 - e^{-\rho(s, s') \cdot t}$. If a state $s$ has more than one outgoing transition (i.e., if there exists more than one state $s'$ with $\rho(s, s') > 0$), then there is a race between these transitions, and the probability of moving from $s$ to $s'$ within $t$ time units is equal to
$$\frac{\rho(s, s')}{E(s)} \left( 1 - e^{-E(s) \cdot t} \right), \qquad \text{where } E(s) = \sum_{s' \in S} \rho(s, s').$$

A (continuous time) probabilistic term rewriting system (PTRS) over $\Sigma \cup X$ is a finite set of rules of the form $l \xrightarrow{\Lambda} r$, where $l$ and $r$ are terms in $T_\Sigma[X]$ and $\Lambda \in (0, \infty)$ is a rate. A PTRS $\mathcal{R}$ over $\Sigma \cup X$ defines a continuous time Markov chain on ground terms $M = (T_\Sigma, \rho)$, where $\rho(t, t') = \Lambda$ iff there exists a rule $l \xrightarrow{\Lambda} r \in \mathcal{R}$ such that $t \to_R t'$, where $R$ is the "non-probabilistic" rule $l \to r$.

Remark 1.

If there are several rules (or several instances of the same rule) that lead from $t$ to $t'$, then $\rho(t, t') = \sum \Lambda$, where the sum is taken over all such rules or instances.

We want to model the phenomenon of the classical attenuation regulation described in section 2. We suppose that a regulatory region $Q$ (see Fig. 1) is given and fixed in the sequel; it is a sequence (word) $Q \in \{\mathtt{A}, \mathtt{C}, \mathtt{G}, \mathtt{T}\}^*$, and the letters of this alphabet are called nucleotides. We denote by $|x|$ the length of any word $x$ and by $x_i$ the $i$th letter of $x$, so $x = x_1 x_2 \ldots x_{|x|}$. The sequence $Q$ can be folded so that some nucleotides of $Q$ are paired: A with T and C with G. The complement of a nucleotide is written using a bar: $\overline{\mathtt{A}} = \mathtt{T}$, $\overline{\mathtt{T}} = \mathtt{A}$, $\overline{\mathtt{C}} = \mathtt{G}$, $\overline{\mathtt{G}} = \mathtt{C}$. We look in $Q$ for subwords ("stems") of the form $Q_A Q_{A+1} \ldots Q_B$ and $Q_C Q_{C+1} \ldots Q_D$ such that
$$B - A = D - C, \quad A + 3 \le B, \quad B + 3 \le C, \qquad Q_A = \overline{Q_D},\ Q_{A+1} = \overline{Q_{D-1}},\ \ldots,\ Q_B = \overline{Q_C}. \tag{1}$$
Any pair of such stems forms a hypohelix (see Figure 2, where the labels $A$, $B$, $C$ and $D$ are positions in the word $Q$).

Fig. 2. One hypohelix $f$.

We describe a hypohelix $f$ by the tuple of its stems' extremities $f = (A, B, C, D)$, and we introduce the following notation:
$$\mathrm{stem}(f) = [A, B] \cup [C, D], \qquad \mathrm{loop}(f) = [B + 1, C - 1], \qquad \mathrm{supp}(f) = [A, D].$$

There is a ribosome at some position on $Q'$ and an RNA polymerase somewhere to the right of it. Both move to the right: in one step the ribosome moves by three successive nucleotides and the polymerase by one nucleotide. The window $w = (R, P)$ represents the segment of RNA from the first position $R$ after the end of the ribosome to the last position $P$ before the beginning of the polymerase. In fact, the RNA sequence $Q$ can fold only on its "active" part, the current window, i.e. between positions $R$ and $P$. When the ribosome advances to the right, it can destroy the leftmost hypohelix of the current configuration, because it consumes the first three letters of the window. On the other hand, any polymerase move adds one new letter to the window.

Formally, a window has the form $w = (R, P)$ with $R, P \in \mathbb{N}$, and the following constraints should be satisfied:
$$13 \le R \le P \le |Q|. \tag{2}$$
Thus, the window moves and changes its length. Let $W = \{ w = (R, P) \mid \text{conditions (2) are satisfied} \}$ be the alphabet of all windows. We define $\mathrm{stem}(w) = \emptyset$, $\mathrm{loop}(w) = [R, P]$, $\mathrm{supp}(w) = [R, P]$.

We will write terms over the alphabet $\Sigma$ of all hypohelices and all windows: $\Sigma = H \cup W$, where $H = \{ f = (A, B, C, D) \mid \text{conditions (1) are satisfied} \}$. We consider only terms of the form $w(\ldots)$ for some $w \in W$ (rooted by some window $w$). According to the conditions defined next, a symbol $f = (A, B, C, D)$ can appear in a term $w(\ldots)$, where $w = (R, P)$, only if $R \le A$ and $D \le P$.

We say that a hypohelix $f$ is embedded in $g$ (which can be a hypohelix or a window), written $f \prec g$, if $\mathrm{supp}(f) \subseteq \mathrm{loop}(g)$. Two hypohelices $f$ and $g$ are disjoint, written $f \bowtie g$, if $\mathrm{supp}(f) \cap \mathrm{supp}(g) = \emptyset$. We call $f$ and $g$ unknotted if either one of them is embedded in the other or they are disjoint. We say that $g = (A_2, B_2, C_2, D_2)$ is an extension of $f = (A_1, B_1, C_1, D_1)$, denoted $f \sqsubseteq g$, if $[A_1, B_1] \subseteq [A_2, B_2]$ and $B_2 - B_1 = C_1 - C_2$, hence $[C_1, D_1] \subseteq [C_2, D_2]$, and the pairing in $g$ extends that in $f$. See Figure 3.

We call a term $t$ over $\Sigma$ well-formed if it satisfies the following conditions:
– (compatibility) any $f$ and $g$ appearing in $t$ are unknotted; in particular, any $f$ can appear at most once;
– (ordering) if $f$ and $g$ occur in $t$, then $f \prec g$ iff $f$ is in the scope of $g$.

The combination of two hypohelices in Figure 4 is biologically feasible, but according to our rules these hypohelices are incompatible. We believe that this restriction (crucial for the representation by terms) does not significantly undermine the accuracy of the model.

Notice that a well-formed term of the form $w(\ldots)$ (rooted by some window $w$) contains only hypohelices from $\Sigma_w = \{ f \in H \mid f \prec w \}$.
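The interval notation and the relations $\prec$, $\bowtie$ and $\sqsubseteq$ can be prototyped in a few lines. The following Python sketch (3.9 or later) is a minimal illustration under stated assumptions, not the RNAmodel implementation: a hypohelix is a plain tuple (A, B, C, D) of 1-based positions in $Q$, all helper names (`Helix`, `stems_ok`, `hypohelices`, `sigma_w`, and so on) are hypothetical, the enumeration of $H$ is brute force and only meant for short sequences, and constraint (2) on windows is not enforced.

```python
from typing import Iterator, NamedTuple

# Watson-Crick complements on the DNA alphabet used for Q in the text.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class Helix(NamedTuple):
    """Hypohelix f = (A, B, C, D): stems Q[A..B] and Q[C..D], 1-based, inclusive."""
    A: int
    B: int
    C: int
    D: int


def stems_ok(q: str, f: Helix) -> bool:
    """Conditions (1): equal stem lengths, A + 3 <= B, B + 3 <= C, complementarity."""
    A, B, C, D = f
    if not (1 <= A and D <= len(q) and B - A == D - C and A + 3 <= B and B + 3 <= C):
        return False
    # Q_{A+k} must be the complement of Q_{D-k}; q is a 0-based Python string.
    return all(q[A + k - 1] == COMPLEMENT[q[D - k - 1]] for k in range(B - A + 1))


def loop(f: Helix) -> tuple[int, int]:
    """loop(f) = [B + 1, C - 1]."""
    return (f.B + 1, f.C - 1)


def embedded(f: Helix, outer: tuple[int, int]) -> bool:
    """f ≺ g: supp(f) = [A, D] lies inside `outer`, i.e. loop(g) or a window (R, P)."""
    lo, hi = outer
    return lo <= f.A and f.D <= hi


def disjoint(f: Helix, g: Helix) -> bool:
    """f ⋈ g: the supports [A, D] do not intersect."""
    return f.D < g.A or g.D < f.A


def unknotted(f: Helix, g: Helix) -> bool:
    """One hypohelix is embedded in the other, or they are disjoint."""
    return embedded(f, loop(g)) or embedded(g, loop(f)) or disjoint(f, g)


def extends(f: Helix, g: Helix) -> bool:
    """f ⊑ g: [A1, B1] ⊆ [A2, B2] and B2 - B1 == C1 - C2.

    For hypohelices that both satisfy (1) this also forces A1 - A2 == D2 - D1,
    so g keeps every base pair of f.
    """
    return g.A <= f.A and f.B <= g.B and g.B - f.B == f.C - g.C


def hypohelices(q: str) -> Iterator[Helix]:
    """Brute-force enumeration of H, i.e. of every (A, B, C, D) satisfying (1)."""
    n = len(q)
    for A in range(1, n + 1):
        for D in range(A, n + 1):
            for length in range(4, (D - A + 1) // 2 + 1):
                f = Helix(A, A + length - 1, D - length + 1, D)
                if stems_ok(q, f):
                    yield f


def sigma_w(q: str, R: int, P: int) -> list[Helix]:
    """Sigma_w = {f in H | f ≺ w} for the window w = (R, P), whose loop is [R, P]."""
    return [f for f in hypohelices(q) if embedded(f, (R, P))]
```

For instance, in the toy sequence `GGGGAAAACCCC` the only hypohelix is (1, 4, 9, 12), and `sigma_w` drops it as soon as the window starts after position 1. Since `sigma_w` filters $H$ by $f \prec w$, it returns exactly the symbols that a well-formed term rooted by $w$ can contain.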
This simple observation greatly simplifies the simulation process.

In [LRSP06] an additional maximality condition is imposed. In the terminology of this article, it requires that no hypohelix $f$ in $t$ can be replaced by a proper extension of it without creating an overlap. Here we do not impose this restriction.

Each well-formed term represents a possible secondary RNA structure in a window of $Q$: the set of hypohelices that are present in this window. It would be possible to allow knotted hypohelices, and hypohelices of length less than 3, but here we do not consider them.

Fig. 3. Relative positions of two hypohelices $f$ and $g$: $f \prec g$ and $f \bowtie g$. Here $f = (A_1, B_1, C_1, D_1)$ and $g = (A_2, B_2, C_2, D_2)$. On the left $B_2 < A_1$ and $D_1 < C_2$; on the right $D_1 < A_2$.

Fig. 4. Pseudo-knot: $A_1 < B_1 < A_2 < B_2 < C_1 < D_1 < C_2 < D_2$. Such configurations are not allowed in our model.

We extend the definitions of $\bowtie$ and $\prec$: let $f$ be a term and $c$ a set of terms; then
$$c \bowtie f \iff \forall g \in c\ (g \bowtie f), \qquad c \prec f \iff \forall g \in c\ (g \prec f).$$
In the former case we say that $f$ and $c$ are disjoint, in the latter that $c$ is embedded into $f$.

We start from a sequence $Q$ without any pairing of nucleotides; this structure is described by the term $w()$, "an empty window", where $w = (13, \ldots)$. Our rewriting system will generate only well-formed terms. Altogether, there are five rewriting metarules:
– Binding and decomposition of a hypohelix $f$:
$$\big(\omega = g(c, d)\big) \longleftrightarrow \big(\omega' = g(c, f(d))\big) \quad \text{with } c \bowtie f,\ d \prec f,\ f \prec g, \tag{3}$$
where $c$ and $d$ are sequences of terms. The concrete rewriting rules, and their rates, depend on $c$ and $d$, as explained below.
– Extension and reduction of a hypohelix:
$$\big(\omega = f\big) \longleftrightarrow \big(\omega' = g\big) \quad \text{with } f \sqsubseteq g. \tag{4}$$
– The window movement, described by the following rules, where $w = (R, P)$:
$$(R, P)(\omega) \longrightarrow (R + 3, P)(\omega'), \tag{5}$$
$$(R, P)(\omega) \longrightarrow (R, P + 1)(\omega), \tag{6}$$
$$w(\omega) \longrightarrow \bot. \tag{7}$$

In the last rule, $\bot$ is a special symbol denoting termination. Rule (5) describes the movement of the ribosome. In this rule, $\omega'$ is obtained from $\omega$ by removing the only symbol, if any, that is incompatible with the new window $(R + 3, P)$, or by replacing it with a "shorter" hypohelix. Indeed, if the leftmost hypohelix in $\omega$ starts at a position between $R$ and $R + 3$, then the movement of the ribosome by three positions to the right destroys this hypohelix. More formally, if $\omega \prec (R + 3, P)$, then $\omega' = \omega$. Otherwise the ribosome destroys the leftmost hypohelix: in this case there is a single symbol $f$ in $\omega$ such that $f \not\prec (R + 3, P)$. Suppose the subterm rooted by $f$ is $f(c)$. Then $\omega'$ is obtained by replacing $f(c)$ in $\omega$ by either $f'(c)$ or $c$, depending on the size of $f$, where $f' \sqsubseteq f$.

Rule (6) describes the movement of the polymerase. Note that if the polymerase reaches a position $P + 1$ inside the structural genes, then antitermination is reached and the genes are expressed.

Quantitative model

Now we introduce the rates of the five rewriting rules. Let $h(f_1(\ast), \ldots, f_n(\ast))$ be a term. Then the free loop length of the hypohelix $h$ in this term is
$$l_h = |\mathrm{loop}(h)| - \sum_{i=1}^{n} |\mathrm{supp}(f_i)|.$$
This numeric characteristic corresponds to the number of nucleotides in the loop of the hypohelix $h$ that do not participate in inner hypohelices.

In order to define the rates, we have to consider the concrete rules corresponding to metarule (3). For any $f$, $g$, $c = c_1(x_1), \ldots, c_m(x_m)$ and $d = d_1(y_1), \ldots, d_n(y_n)$ such that $c \bowtie f$, $d \prec f$, $f \prec g$, there is a concrete rule
$$\big(\omega = g(c_1(x_1), \ldots, c_m(x_m), d_1(y_1), \ldots, d_n(y_n))\big) \longleftrightarrow \big(\omega' = g(c_1(x_1), \ldots, c_m(x_m), f(d_1(y_1), \ldots, d_n(y_n)))\big). \tag{8}$$
Recall that the subterms are unordered. Similarly, the concrete rule corresponding to (4) is
$$\big(\omega = a(c_1(x_1), \ldots, c_m(x_m), f(d_1(y_1), \ldots, d_n(y_n)))\big) \longleftrightarrow \big(\omega' = a(c_1(x_1), \ldots, c_m(x_m), g(d_1(y_1), \ldots, d_n(y_n)))\big). \tag{9}$$
Note that this transformation can change the free loop length of the hypohelix $a$. The rate of rules (8)–(9) is denoted $K(\omega \to \omega')$ and is given by
$$K(\omega \to \omega') = \kappa \cdot \exp\left( \tfrac{1}{2} \big( E(\omega) - E(\omega') \big) \right), \tag{10}$$
where the energy is $E(\omega) = G_{hel}(\omega) + G_{loop}(\omega)$, $\kappa$ is a parameter (usually $\kappa = 10$), and
$$G_{hel}(\omega) = \frac{1}{RT} \sum_h E_h, \qquad G_{loop}(\omega) = \sum_h \big( \ldots \cdot \ln(l_h + 1) + B \big), \tag{11}$$
where $h$ ranges over all hypohelices of $\omega$. $E_h$ represents the total stacking energy along the hypohelix $h$: it is the sum of the stacking bond energies of the adjacent base pairs of $h$. $B$ can take three different values depending on the three possible types of the loop of the hypohelix $h$: terminal loop, single-strand bulge and double-strand bulge.

A codon is a triple of successive nucleotides. For a sequence $Q'$, each codon is fixed to be either regulatory or non-regulatory. Analogously, each nucleotide in $Q$ is fixed to be either T-rich or non-T-rich [LRSP06]. Let $s_1$ be the "radius" of a ribosome, i.e. the distance from the P-site to the end of the ribosome (usually $s_1 = 12$), and let $s_2$ be the "radius" of a polymerase, i.e. the distance from the 5′ end of the polymerase to its transcription center (usually $s_2 = 9$). The rate of rule (5) is denoted $\lambda_{rib}$; it is constant when $R - s_1$ is the position of a non-regulatory codon, and otherwise it depends on an external parameter $c$, the concentration of charged tRNA [SB91]. The rate of rule (6) is denoted $\nu$ and depends on the secondary structure $\omega$ in the window. Rule (7) applies only when $P + s_2$ is the position of a T-rich nucleotide, and its rate is denoted $\mu$.

In [LRSP06] the rate of rule (5) was denoted $\lambda_{rib}$ and
$$\lambda_{rib}(c) = 45\, c \,\ldots\, c. \tag{12}$$
The rate of rule (6) was denoted $\nu$ and
$$\nu = 40 - F(\omega). \tag{13}$$
The rate of rule (7) was denoted $\mu$ and
$$\mu = 14\, F(\omega). \tag{14}$$
The function $F(\omega)$ in (13)–(14), for $\omega = f_1(\ast), \ldots, f_n(\ast)$, depends only on the function symbols (hypohelices) $f_1, \ldots, f_n$, and not on the structure of their arguments, denoted by $\ast$.
More precisely, $F(\omega) = \max_i F(f_i)$, where
$$F(f) = \delta \cdot \exp\left( -\frac{r(f)}{r_0} \right) \ldots (L) \cdot (p(f) - p_0) + 1, \tag{15}$$
with $p(f) \approx \pi |\mathrm{supp}(f)|$, and $r(f)$ the "free distance" from $f$ to the end $P$ of the window: for $f = (A, B, C, D)$ and $w = (R, P)$, we have
$$r(f) = P - D - \sum_i |\mathrm{supp}(f_i)|. \tag{16}$$
The other symbols in equation (15) denote constants: $r_0 = 1$, $\delta = 30$, $L = 27.\ldots$, $p_0 = 0.18$, see [LRSP06].

Note that the rates of the rules depend only on the local configuration, as explained above, and not on the outside context. In particular, they do not depend on the instantiations of $x_1, \ldots, x_m, y_1, \ldots, y_n$.

atgaaagcaattttcgtactgaaaggttggtggcgcacttcctgaaacgggcagtgtattcaccatgcgtaaagcaatcagatacccagcccgcctaatgagcgggcttttttttg
Fig. 5. A regulatory region for trpE genes in E. coli.

We have adapted the simulator described in [LRSP06] and available at [RNA] to obtain sequences of terms. As an example, in Figure 6 we give one (slightly shortened and simplified) terminating trajectory of the regulation process for the trpE genes (responsible for the synthesis of tryptophan) in E. coli. The regulatory region itself is presented in Figure 5.
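The quantitative model can be simulated with the standard stochastic simulation (Gillespie) approach for continuous time Markov chains. Below is a minimal Python sketch (3.9 or later), not the simulator of [LRSP06,RNA]: from the current state, the holding time is drawn from an exponential distribution with the exit rate $E(s)$, and the next state $s'$ is chosen with probability $\rho(s, s')/E(s)$, which reproduces the race formula given with the definition of continuous time Markov chains. The caller is assumed to enumerate the enabled rule instances and their rates; the helper names and the energy function passed to `binding_rate` are hypothetical, and only $\kappa = 10$ and the form of (10) come from the text.

```python
import math
import random
from collections.abc import Callable, Hashable, Sequence

State = Hashable  # any hashable encoding of a ground term w(...), or a marker for ⊥


def binding_rate(energy: Callable[[State], float], omega: State, omega2: State,
                 kappa: float = 10.0) -> float:
    """Rate (10) of rules (8)-(9): K(ω → ω') = κ · exp((E(ω) - E(ω')) / 2)."""
    return kappa * math.exp(0.5 * (energy(omega) - energy(omega2)))


def gillespie_step(transitions: Sequence[tuple[State, float]],
                   rng: random.Random) -> tuple[State, float]:
    """One step of the CTMC from the current state.

    `transitions` lists the enabled (target, rate) pairs. Several rule instances
    leading to the same target may appear as separate entries: by Remark 1 their
    rates add up, and drawing the entries one by one gives the same distribution.
    Returns (next_state, holding_time).
    """
    total = sum(rate for _, rate in transitions)  # exit rate E(s)
    if total <= 0:
        raise ValueError("no enabled transition")
    dt = rng.expovariate(total)  # holding time ~ Exp(E(s))
    u = rng.random() * total     # target s' chosen with probability ρ(s, s') / E(s)
    acc = 0.0
    for target, rate in transitions:
        acc += rate
        if u < acc:
            return target, dt
    return transitions[-1][0], dt  # only reached through floating-point round-off


def race_frequencies(rates: dict, trials: int = 20000, seed: int = 0) -> dict:
    """Monte-Carlo frequencies of the winner of a single race out of one state."""
    rng = random.Random(seed)
    transitions = list(rates.items())
    counts = {target: 0 for target in rates}
    for _ in range(trials):
        target, _ = gillespie_step(transitions, rng)
        counts[target] += 1
    return {target: n / trials for target, n in counts.items()}
```

For example, with hypothetical rates, `race_frequencies({"antitermination": 3.0, "termination": 1.0})` returns frequencies close to 0.75 and 0.25, matching $\rho(s, s')/E(s)$. The factor $\tfrac{1}{2}$ in (10) splits the energy difference symmetrically between a rule and its reverse, so $K(\omega \to \omega')/K(\omega' \to \omega) = \exp(E(\omega) - E(\omega'))$: considered in isolation, the binding/decomposition and extension/reduction pairs satisfy detailed balance with respect to $\exp(-E)$.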

Related Work

References to the literature on RNA regulation mechanisms can be found in [LRSP06,LPRS07].

Term rewriting systems have been used in the so-called Regular Model Checking framework [KMM+01, …], and in [BIK06,BCC+03] to model chemical reactions. Compared to our work, the rewriting systems considered in [BIK06,BCC+03] are not probabilistic. Moreover, these works consider the modeling of chemical reactions, whereas we consider the modeling of RNA secondary structure.

Finally, probabilistic term rewriting systems have also been considered in [BH03,BK02,KSMA03]. However, in these works the symbols have fixed arities and the terms are ordered, whereas in our framework the symbols have arbitrary arities and the terms are unordered. Moreover, as far as we know, this is the first time that probabilistic term rewriting systems are used to model attenuation regulation.

We have established that the framework of probabilistic term rewriting systems provides a compact and structured description of detailed models of RNA regulation.

We intend to continue exploring this framework. The most important task consists in the development of adequate data structures and algorithms, as well as approximation and abstraction methods for the analysis of this kind of models. The next step would be massive computational experimentation, the biological interpretation of the results, and their validation against real biological data.

Acknowledgments

The authors are thankful to Sergey Pirogov, Konstantin Gorbunov and Lev Rubanov for valuable discussions. Lev Rubanov has also provided assistance in the use of the RNAmodel tool. Oleg Zverkov has helped us in preparing computer graphics for this article.

References

[AJMd02] Parosh Aziz Abdulla, Bengt Jonsson, Pritha Mahata, and Julien d'Orso. Regular tree model checking. In CAV'02, volume 2404 of Lecture Notes in Computer Science, pages 555–568, 2002.
[ALdR05] Parosh Aziz Abdulla, Axel Legay, Julien d'Orso, and Ahmed Rezine. Simulation-based iteration of tree transducers. In TACAS'05, volume 3440 of Lecture Notes in Computer Science, pages 30–44, 2005.
[BCC+03] Olivier Bournez, Guy-Marie Côme, Valérie Conraud, Hélène Kirchner, and Liliana Ibanescu. A rule-based approach for automated generation of kinetic chemical mechanisms. In RTA'03, volume 2706 of Lecture Notes in Computer Science, pages 30–45. Springer, June 2003.
[BH03] Olivier Bournez and Mathieu Hoyrup. Rewriting logic and probabilities. In RTA'03, volume 2706 of Lecture Notes in Computer Science, pages 61–75. Springer, June 2003.
[BIK06] Olivier Bournez, Liliana Ibanescu, and Hélène Kirchner. From chemical rules to term rewriting. Volume 147(1) of ENTCS, pages 113–134, 2006.
[BK02] Olivier Bournez and Claude Kirchner. Probabilistic rewrite strategies: Applications to ELAN. In RTA'02, volume 2378 of Lecture Notes in Computer Science, pages 252–266. Springer-Verlag, July 2002.
[BT02] Ahmed Bouajjani and Tayssir Touili. Extrapolating tree transformations. In CAV'02, volume 2404 of Lecture Notes in Computer Science, pages 539–554, 2002.
[BT03] Ahmed Bouajjani and Tayssir Touili. Reachability analysis of process rewrite systems. In FSTTCS'03, Lecture Notes in Computer Science, pages 73–87, 2003.
[FFHS00] Christoph Flamm, Walter Fontana, Ivo L. Hofacker, and Peter Schuster. RNA folding at elementary step resolution. RNA, 6(3):325–338, 2000.
[KMM+01] Yonit Kesten, Oded Maler, Monica Marcus, Amir Pnueli, and Elad Shahar. Symbolic model checking with rich assertional languages. Theoretical Computer Science, 256:93–112, 2001.
[KSMA03] Nirman Kumar, Koushik Sen, José Meseguer, and Gul Agha. A rewriting based model for probabilistic distributed object systems. In FMOODS'03, volume 2884 of Lecture Notes in Computer Science, pages 32–46, 2003.
[LPRS07] Vassily Lyubetsky, Sergey Pirogov, Lev Rubanov, and Alexander Seliverstov. Modeling classic attenuation regulation of gene expression in bacteria. Journal of Bioinformatics and Computational Biology, 5(1), 2007. In print.
[LRSP06] Vassily Lyubetsky, Lev Rubanov, Alexander Seliverstov, and Sergey Pirogov. Model of gene expression regulation in bacteria via formation of RNA secondary structures. Molecular Biology, 40(3):440–453, 2006.
[RNA] RNAmodel. Model of RNA-related regulation in bacteria. http://lab6.iitp.ru/rnamodel/rnamodee.html.
[SB91] Maxine Singer and Paul Berg. Genes & genomes. University Science Books, Mill Valley, Calif., 1991.
[Tou05] Tayssir Touili. Dealing with communication for dynamic multithreaded recursive programs. IOS Press, 2005.
[Zuk03] Michael Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13):3406–3415, 2003.

[Fig. 6: a (slightly shortened and simplified) terminating trajectory of the trpE regulation process in E. coli, given as a sequence of terms.]
( w ( v ( o ))) →· ( w ( v ( s ))) →· ( w ( v ( p ))) →h , i ( v ( p )) →· ( w ( p )) →· ( w ( v )) →· ( w ( v ( p ))) →h , i ( v ( p )) →· ( w ( p )) →· ( w ( v )) →· ( p ) →· ( w ) →· ( w ( h,p )) →· ( v ) →· ( w ( v ( k ))) →· ( w ( v ( o ))) →· ( w ( v ( s ))) →· ( w ( v ( p ))) →h , i ( v ( p )) →· ( w ( p )) →· ( w ( v )) →· ( w ( v ( p ))) →h , i ( v ( p )) →· ( w ( p )) →· ( w ( v )) →· ( w ( v ( p ))) →h , i ( v ( p )) →· ( w ( p )) →· ( w ( v )) →· ( w ( v ( p ))) →h , i ( v ( p )) →· ( w ( p )) →· ( w ( v )) →· ( w ( v ( p )) ,x ) →· ( v ( p ) ,x ) →· ( w ( p ) ,x ) →· ( w ( v ) , x ) →· ( v ) →· ( w ) →· ( w ( v ( k ))) →· ( w ( v ( o ))) →· ( w ( v ( s ))) →· ( p ) →· ( w ( h,p )) →· ( w ( v ( p ))) →h , i ( v ( p )) →· ( w ( p )) →· ( w ( v )) →· ( w ( v ( p )) , x ) →· ( p ) →· ( w ) →· ( w ( p ) ,x ) →· ( w ( h, p )) →· ( h, p ) →· ( w ( h )) →· ( w ( h,p ) , x ) →· ( v ) →· ( w ( v ) ,x ) →· ( w ( v ( k ))) →· ( w ( v ( o ))) →· ( w ( v ( s ))) →· ( v ( p ) ,x ) →· ( p,x ) →· ( v, x ) →· ( c, v ( p ) , x ) →· ( b, v ( p ) , x ) →· ( d, v ( p ) , x ) →· ( t ( v ( p )) ,x ) →· ( b,p, x ) →· ( b, v, x ) →· ( b, v ( p )) →⊥ Fig. 6. A simulation result: one typical terminating trajectory for classical attenua-tion regulation of trpE genes in

E. coli . Notations: → means one rewriting; ∗ → meansseveral similar rewritings; repeated window positions (e.g. repetitions of h , i )arereplaced by a · symbol; ⊥ means termination. There are 24 helices, denoted by lettersfrom a to xx
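The caption's notation can be read as a simple rule for printing a trajectory. The following is a hypothetical Python sketch, not the authors' software. It assumes that each step is a pair (window position, resulting term) stored as strings. It also reads "·" as "same window position as the immediately preceding step", which is one plausible interpretation of the caption. The grouping of several similar rewritings into ∗→ is not modeled, because the caption does not say what counts as "similar".

```python
from typing import List, Optional, Tuple

# Hypothetical step representation (not from the paper):
# (window position such as "h,i", resulting term such as "r(m(h))").
Step = Tuple[str, str]


def render_trajectory(steps: List[Step], terminated: bool = True) -> str:
    """Print a trajectory in the style of Fig. 6.

    Each rewriting is shown as an arrow labelled by its window position.
    A position equal to the previous step's position is shown as '·'
    (assumed reading of the caption). Termination is shown as '→⊥'.
    """
    parts: List[str] = []
    previous: Optional[str] = None
    for window, term in steps:
        # Label only when the window position changes; otherwise use '·'.
        label = "·" if window == previous else f"⟨{window}⟩"
        parts.append(f"→{label} {term}")
        previous = window
    if terminated:
        parts.append("→⊥")
    return " ".join(parts)


if __name__ == "__main__":
    demo = [("h,i", "(h)"), ("h,i", "(l)"), ("c,d", "(b, l)")]
    print(render_trajectory(demo))
```

For the three hypothetical steps in `demo`, the sketch prints `→⟨h,i⟩ (h) →· (l) →⟨c,d⟩ (b, l) →⊥`. Keeping the rendering separate from the step data makes it easy to switch to a different reading of "·" if the paper's convention turns out to differ.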