Synchrony in a Boolean network model of the L-arabinose operon in Escherichia coli
SSYNCHRONY IN A BOOLEAN NETWORK MODEL OF THE L -ARABINOSE OPERON INESCHERICHIA COLI ANDY JENKINS AND MATTHEW MACAULEYA
BSTRACT . The lactose operon in
Escherichia coli was the first known gene regulatory network, and it isfrequently used as a prototype for new modeling paradigms. Historically, many of these modeling frameworksuse differential equations. More recently, Stigler and Veliz-Cuba proposed a Boolean network model thatcaptures the bistability of the system and all of the biological steady states. In this paper, we model the well-known arabinose operon in E. coli with a Boolean network. This has several complex features not found inthe lac operon, such as a protein that is both an activator and repressor, a DNA looping mechanism for generepression, and the lack of inducer exclusion by glucose. For out of choices of initial conditions, weuse computational algebra and Sage to verify that the state space contains a single fixed point that correctlymatches the biology. The final initial condition, medium levels of arabinose and no glucose, successfullypredicts the system’s bistability. Finally, we compare the state space under synchronous and asynchronousupdate, and see that the former has several artificial cycles that go away under a general asynchronous update.
1. I
NTRODUCTION
The regulation of gene expression is essential for the maintenance of homeostasis within an organism.As such, the ability to predict which genes are expressed and which are silenced based on the cellularenvironment is of utmost importance to molecular biologists. Mathematical models of gene regulatorynetworks have classically been framed as systems of differential equations that are derived from mass-action kinetics. The result is a complex system of equations that cannot be solved explicitly, and involvesmany experimentally determined rate constants. In many cases, estimates of these constants in the literaturemay vary by orders of magnitude. The end result is a model which is quantitative by nature, but canonly really be analyzed qualitatively, even if it can be solved numerically. These continuous models arestill useful for understanding the mechanisms of regulation, but like any model, they have their strengthsand weaknesses. An alternative approach is to use discrete models, such as logical models or Booleannetworks [Td90]. These too have their drawbacks, but they have been shown to be useful tools when one isinterested in analyzing the global-level dynamics of the system. The interested reader can consult [DJ02]for a nice survey in modeling gene regulatory networks, including both continuous and discrete models.The lactose operon in
Escherichia coli was the first gene regulatory network discovered, and FrancoisJacob and Jacques Monod won a Nobel Prize in 1965 for their work on it [JPSM60]. It is an example ofan inducible operon under negative control, and a Boolean network model was proposed by Veliz-Cubaand Stigler in [VCS11]. In this paper, we propose a Boolean model for the L -arabinose operon. While thisoperon is also used by E. coli to regulate sugar metabolism, it contains several unique biological featuressuch as a protein that can act both as a positive inducible control mechanism, and as a negative controlmechanism by forming a DNA loop to block transcription. As far as we know, our model is the firstBoolean network model of a system that uses DNA looping to regulate transcription.We use computational algebra and Sage to analyze our model, which has variables and parameters.This leads to possible initial conditions, and our analysis shows that for all of these, our model accuratelypredicts the biological behavior of the operon and also provides insight into interactions within the network.Additionally, when there are medium levels of arabinose, the model predicts the observed bistability of thesystem. However, our model under synchronous update contains a few artificial cycles, but these go away Mathematics Subject Classification.
Primary: 92C42; Secondary: 13P25, 13P10, 70K05.
Key words and phrases. arabinose, ara operon, bistability, Boolean network model, DNA looping, fixed points, gene regulatorynetwork, inducer exclusion, Gr¨obner basis, synchrony.Partially supported by a National Science Foundation grant (DMS-1211691) and a Simons Foundation collaboration grant formathematicians (Award a r X i v : . [ q - b i o . M N ] N ov A. JENKINS AND M. MACAULEY when we look at a general asynchronous update scheme. This should serve as a cautionary tale as to howcertain artifacts can arise from a synchronous update in a molecular network model.This paper is organized as follows: in Section 2, we begin with a review of the biology of the ara operon, and we follow that with a review of Boolean and logical models. Finally, we propose our modeland justify the choice of functions. In Section 3, we convert our model into polynomials in F [ x , . . . , x n ] and we show how to use Sage to compute Gro¨obner bases to find the fixed points. We also use softwaresuch as ADAM [HBG + HE L - ARABINOSE OPERON IN E. COLI
Biological background.
A core theme throughout molecular biology is the so-called “central dogma”,which explains how information flows from genotype (the genetic code) to the phenotype (the observablephysical characteristics of an organism) [Cea70]. Genetic information encoded in the nucleotides of theDNA strand undergoes transcription, in which RNA polymerase enzymes “unzip” the DNA and read it,producing a messenger RNA strand. This mRNA then undergoes translation by ribosomes into an aminoacid sequence known as a polypeptide, or protein. The functions of these proteins then lead to the pheno-type. Clearly this process requires extensive regulation, as the desire for presence or absence of a protein,as well as its concentration, are dependent on the current state of the organism and its environmental con-ditions. Regulation of gene expression can occur at any of the steps during the flow of information, butthe regulation of transcription in prokaryotic organisms has been particularly well-studied, and most ofthe early understanding of genetic regulatory systems came from the study of operons. An operon is acollection of contiguous genes with related functions that are all transcribed together onto a single mRNAstrand, along with two control sequences upstream of the genes.The genes in an operon whose protein products perform some coordinated function are known as struc-tural genes. The two control sequences are a promoter , a region of DNA to which RNA polymerase bindsto initiate transcription, and an operator , a sequence of nucleotides located between the promoter andstructural genes whose status determines whether transcription will occur. The state of the operator and itsmethod of controlling gene expression can be categorized into one of four types. Operons can be inducible ,which means that their “default state” is off, except when needed. The opposite is a repressible operon,which means that the genes are transcribed unless there is a reason or need to turn them off. Additionally,the method of regulation can be positive (an activator protein binds to the operator to initiate transcription)or negative (a repressor protein binds to the operator to prevent transcription).One of the most well-studied prokaryotic organisms is
E. coli , a rod-shaped bacterium commonly foundin the gut of mammals and birds. It has a short genome which has been fully sequenced, and its physiologyis well-understood. Since
E. coli lives in the intestines, its nutrition depends on the diet of its host. Forexample, if the host consumes milk, then
E. coli is exposed to lactose, or milk sugar. Lactose can be usedas an energy source, but gene products such as proteins and enzymes are needed to carry out tasks liketransporting the molecules into the cell and then breaking them down. These gene products are not neededand therefore will not be produced if lactose is not available. The genes that regulate this make up the lac operon, and it is an example of an inducible operon with negative control. It was the first genetic regulatorymechanism to be completely understood, and it is the prototypical example of a gene regulatory network.As such, it is standard material in introductory molecular biology classes, and is used as test cases for newmodeling paradigms.Lactose is just one of many sugar molecules that
E. coli can use as an energy source. Another is thefive-carbon sugar L -arabinose (the L - prefix stands for levorotation , which describes its chirality), and itsmetabolism into carbon and energy is also controlled by the gene regulatory network called the ara operon.Both lactose and arabinose are complex sugar molecules which need to be broken down before they can beused for energy. Glucose, a more simple molecule, is a preferred energy source because it does not haveto be modified to enter the respiratory pathway. Thus, the lac and ara operons should be off if glucose ispresent. YNCHRONY IN A BOOLEAN NETWORK MODEL OF THE L-ARABINOSE OPERON IN ESCHERICHIA COLI 3 F IGURE
1. Without arabinose (left), the AraC protein forms a dimer which bonds to theDNA strand in two places, causing the DNA to loop and block transcription. At right isthe case of when arabinose is present. Images taken from Wikipedia.The ara operon contains three structural genes, called araBAD (that is, the adjacent ara genes B, A, andD). Transcription of these genes is controlled by the promoter p BAD . These genes code for thee enzymeswhich catalyze the following reactions needed for arabinose metabolism: the isomerase AraA converts L -arabinose to L -ribulose, the kinase AraB phosphorylates L -ribulose to L -ribulose-phosphate, and finallythe epimerase AraD converts L -ribulose-phosphate to D -xylulose-phosphate, which then enters the pentosephosphate pathway. In addition to the enzymes needed to metabolize arabinose, proteins are needed totransport it through the cell membrane. This is controlled by two different transport systems, both locatedupstream from the ara operon. The araE gene produces AraE, a membrane-bound protein that functions asa transporter in a low affinity transport system, and the araFGH genes which produce three correspondingproteins that together form a high affinity transport system known as an ATP-binding cassette.Like the lac operon, the ara operon is an inducible operon, but it differs in that it has both positive andnegative regulation. Specifically, the expression of the structural genes ( araBAD ) is regulated by the proteinAraC, which can function as either a repressor or an activator depending on the intracellular concentrationsof arabinose. The cell maintains a small amount (approx. 20 molecules) of AraC at all times, which bindsto two sites I , O on the DNA strand that causes a DNA loop structure, as shown in the image on theleft in Figure 1. This loop acts as a repressor of transcription of both the araBAD genes and the araC gene by physically blocking RNA polymerase from attaching to either promoter sequence. This method ofrepression is quite different than the method used in the lac operon, where a repressor protein binds directlyonto the DNA strand to block transcription. However, the end result is the same. If extracellular arabinose ispresent, some molecules can be transported into the cell via passive transport. Once intracellular arabinoseis available, these molecules will bind to the AraC protein and cause it to undergo a conformational change,which leads to it dissociating from the O operator site and subsequently binding to the I site. This isshown in the image on the right of Figure 1. In this arabinose-bound form, AraC now functions as anactivator and induces binding of RNA polymerase to the p BAD , p E , p F GH , and p C promoter regions.Transcription can now begin. In fact, the AraC protein was the first known positive regulator in an operon,which was verified in the early 1970s [GS71]. Later experiments [DBHH72], [OHS +
80] uncovered thefull structure of the operon; see [Sch00] for a gentle survey article.Since glucose is the preferred energy source to arabinose, it must play a role in regulating the transcrip-tion of the araBAD structural genes, and it does this via a method called catabolite repression . In additionto the arabinose-bound AraC protein, a second activator known as the cyclic AMP catabolite activator pro-tein complex (or cAMP-CAP complex) must bind to the promoter region to cause the DNA to undergofurther conformational changes to allow the binding of RNA polymerase. One can think of this complexas a “key” which is required for transcription to begin. The presence of glucose inhibits the production ofcAMP, which reduces the concentration levels of the cAMP-CAP complex, thereby preventing transcrip-tion of the ara operon. The lac operon not only uses catabolite repression, but also a form or repressioncalled inducer exclusion , where glucose binds to the transporter protein, causing a conformal change inpreventing lactose from entering the cell. This mechanism is not used in the ara operon [SR76], whichactually makes modeling it more delicate.
A. JENKINS AND M. MACAULEY
Boolean modeling background.
Molecular networks have been classically modeled using continu-ous methods, such as systems of differential equations. Examples of this for the lac operon can be foundin [BM85], [YM03], and [YSHM04]. The first Boolean network model of the lac operon was published in2011 [VCS11].Boolean network models were first proposed in 1969 by theoretical biologist S. Kauffman [Kau69] asmodels of gene regulatory networks. In 1973, biologist Ren´e Thomas published a discrete formalizationof biological regulatory networks that are often called logical networks [Tho73], and these continue to bestudied and used in modeling today. The framework starts with a finite set [ n ] := { , . . . , n } of nodes, eachone having a state x i in a finite set S i = { , . . . , r i − } . The case of each r i = 2 is a Boolean network .These are the most common, and henceforth, we will assume that S i = F = { , } . Each node also hasan update function f i : F n → F that updates its state based on the states of the other nodes. The updatefunctions are then put together to generate the global system dynamics which can be encoded by a directedgraph called the phase space . There are several ways to do this, but in all of them, the vertex set is the setof global states F n , and the edges represent transitions between states.The two basic ways to generate the dynamics are by applying the update functions synchronously orasynchronously. In a synchronous update, the dynamical system map f : F n → F n is defined by f =( f , . . . , f n ) . This defines a discrete-time dynamical system, and so the phase space consists of the n edges of the form ( x, f ( x )) , where x ∈ F n . Every node has one outgoing edge, and the periodic pointsare those on simple cycles; those of length are called fixed points . One advantage of a synchronousupdate is the local functions and dynamical map are multivariate polynomials over F , and so this opensthe door to using the rich toolbox of computational algebra to study biological systems. This algebraicapproach has been used extensively by R. Laubenbacher and his collaborators; see [LS09], [VCJL10] foran overview. For example, when reverse-engineering a model from data, the model space consists of thesum of the vanishing ideal and a particular solution. Approaches to the model selection problem includecomputing an ideal and its primary composition [LS04], or computing a Gr¨obner fan, selecting the conewith maximum weight, and then computing the Gr¨obner normal form of the corresponding Gr¨obner basis[DJLS07].One major drawback to this approach is that molecular networks do not have a “central clock” thatupdates every component synchronously – different gene products may have vastly different timescales,and there is always a degree of randomness involved. Therefore, many logical network models assumea general asynchronous update . In this case, the phase space consists of the n · n edges of the form ( x, f i ( x )) , though self-loops are frequently omitted. The time-evolution of the global system state can bedescribed as a walk on the phase space. With probability , a “random walk” will end up in a maximalstrongly connected component that has no out-going edges. These could be fixed points or cycles (called cyclic attractors ), but they can also be more complicated subgraphs called complex attractors [Td90].Regardless of whether the update functions are composed synchronously, asynchronously, or in someother method, the wiring diagram is defined to be the graph with vertex set [ n ] = { , . . . , n } and a directededge from i to j (or x i to x j ) if the function f j depends on x i . Such an edge is positive if f j ( x , . . . , x i − , , x i +1 , . . . , x n ) ≤ f j ( x , . . . , x i − , , x i +1 , . . . , x n ) and negative if the inequality is reversed. Positive edges are typically denoted as x i x j , and negativeedges as x i x j or x i x j . Of course, edges can be neither positive nor negative, though mostinteractions in molecular networks are one or the other – either activation or inhibition [Rae02].Early work on Boolean networks focused on models of well-known biological networks, as well aswhich network motifs in the wiring diagram can lead to common biological phenomenon. In 1981,Thomas conjectured [Tho81] that positive feedback loops are necessary for the dynamics to exhibit multi-stationarity (having multiple fixed points) and negative feedback loops are necessary for cyclic attractors.In this context, a feedback loop is a directed cycle in the wiring diagram, and it is positive if it consists ofan even number of negative edges, and odd otherwise. Biologically, a fixed point describes a biologicalsteady state such as phenotype, and a cyclic attractor can describe a biological phenomenon like a cell cycleor homeostasis. Thomas’ conjectures have been proven for both continuous systems such as differentialequations (e.g., [Sno98]) as well as discrete system such as logical or Boolean networks (e.g., [RRT08]). Itis easy to show that fixed points do not depend on the update order. Additionally, one can make a case thatdue to the robustness of biological systems, issues such as timing should not affect the long-term dynamics. YNCHRONY IN A BOOLEAN NETWORK MODEL OF THE L-ARABINOSE OPERON IN ESCHERICHIA COLI 5 f ( x , x ) = x f ( x , x ) = x F IGURE
2. A simple Boolean network and its 2-node wiring diagram. To the right is thephase space under a synchronous update, which contains two fixed points and a 2-cycle.On the far right is the phase space under general asynchronous update. This suggests thatif the nodes are updated randomly, then the system will end up in one of the fixed pointswith probability , and so the -cycle is an artifact of synchrony.However, a synchronous update can introduce artifacts into the phase space. A simple example of this isshown in Figure 2. Alternatively, non-determininstic versions of discrete models have been developed andstudied. Examples include probabilistic Boolean networks [MVCA +
12] and stochastic discrete dynamicalsystems [SD10].Since the lac operon was the first operon discovered and remains the most widely known, it is nosurprise that it was the first operon to be modeled in a Boolean or logical network framework. This wasdone in 2011 by Veliz-Cuba and Stigler [VCS11]. We will begin by summarizing their work here in a fewsentences, and then carry out a similar undertaking for the ara operon. In [VCS11], the authors proposed amodel of the lac operon with 10 Boolean variables that represent relevant gene products (mRNA, enzymesand proteins) and intracellular lactose. For most of these, one can think of as denoting “absent” or “low(basal) concentration”, and as denoting “present” or “high concentration”. A system state is thus a vectorin F , and they use a synchronous update f = ( f , . . . , f ) to get a dynamical system map f : F → F .Additionally, their model has three parameters: G e , L e , and L em , which are treated as constants and appearin some of the update functions. Extracellular glucose is either present ( G e = 1 ) or absent ( G e = 0 ).Concentration levels of extracellular lactose can be one of three levels, which are described using twoBoolean variables. The variable L e means “high levels of extracellular lactose” and L em means (at least)medium levels. Together, they describe high levels by ( L e , L em ) = (1 , , medium by ( L e , L em ) = (0 , ,and low (basal) levels by ( L e , L em ) = (0 , . The fourth possibility, ( L e , L em ) = (1 , , is ignoredbecause it is meaningless. This gives six possible values for the parameter vector ( L e , L em , G e ) ∈ F ,each representing an initial condition of the model.For five of the six initial conditions, the authors verify that the phase space consists of exactly one basinof attraction (connected component) with one fixed point. Each of these fixed points either correspondsto the lac operon being either OFF or ON, and each bit of the vector matches exactly what one shouldexpect biologically. The sixth initial condition, when ( L e , L em , G e ) = (0 , , represents medium levelsof extracellular lactose and no glucose. In this case, there are two basins of attractions and two fixed points,one OFF and one ON. This represents the important biological phenomenon of bistability. We will describethis in more details later, because it also arises in our model of the ara operon.2.3. Proposed Boolean model.
Our model of the ara operon consists of nine Boolean variables and fourparameters (or constants), which differ from the variables primarily in the time-scales over which theychange. For example, the presence of extracellular glucose or arabinose can certainly change, but it mightdo so on the scale of hours. In contrast, the concentrations of enzymes, proteins, or mRNA may changeon a scale of seconds or minutes. Thus, we consider extracellular glucose and arabinose to be parametersof the model, and the concentration of the gene products as variables. We also define a parameter forthe unbound AraC protein, since the gene that codes for it is not part of the operon. Our model uses thefollowing Boolean parameters: A e = extracellular arabinose (high levels) A em = extracellular arabinose (at least medium levels) Ara − = AraC protein (unbound to arabinose) The trp operon has been modeled in a discrete framework, but using Petri nets [SRTC05].
A. JENKINS AND M. MACAULEY G e = extracellular glucoseThe subscript m denotes medium concentration, and allows us to distinguish between three levels ofarabinose concentration. Low or “basal” levels occur when ( A e , A em ) = (0 , , medium levels when ( A e , A em ) = (0 , , and high levels when ( A e , A em ) = (1 , . The fourth case, when ( A e , A em ) = (1 , can be ignored, as it is meaningless. Next, we define the following Boolean variables: M S = mRNA of the structural genes ( ara BAD ) M T = mRNA of the transport genes ( ara EF GH ) E = enzymes AraA, AraB, and AraD, coded for by the structural genes T = transport proteins, coded for by the transport genes A = intracellular arabinose (high levels) A m = intracellular arabinose (medium levels) C = cAMP-CAP protein complex L = DNA loop
Ara + = arabinose-bound AraC proteinThe variable L is if the DNA is looped, and if it is not looped. All of the other variables representconcentration levels of the corresponding gene product: for “high”, and for “low”. Note that wedistinguish between the AraC protein that is unbound to arabinose (which acts as a repressor) and thearabinose-bound protein (which acts as a promoter), with the parameter Ara − and the variable Ara + ,respectively.For each of these variables, we propose a Boolean function that represents how its state is influencedby the other variables and parameters. We justify these functions below, using the standard Boolean logicsymbols ∧ for “AND”, ∨ for “OR”, and X to denote “NOT X ”.For the structural genes’ mRNA to be transcribed, we need the presence of the cAMP-CAP proteincomplex and the arabinose-bound AraC protein, as well as the DNA to be unlooped. The Booleanfunction is f M S = C ∧ Ara + ∧ L .For the transport genes’ mRNA to be transcribed, the cAMP-CAP protein complex and the arabinose-bound AraC protein are needed. The Boolean function is f M T = C ∧ Ara + .For the enzymes involved in the metabolism of arabinose (AraA, AraB, and AraD) to be present,the structural genes’ mRNA is needed. The Boolean function is f E = M S .For the transport proteins to be translated, the transport genes’ mRNA is needed. The Booleanfunction is f T = M T .For intracellular arabinose to be at a high concentration, high levels of extracellular arabinose andthe transport protein must be present. The Boolean function is f A = A e ∧ T .For intracellular arabinose to be at medium concentration (or higher), we require the presence ofeither extracellular arabinose at a high concentration, or a medium concentration of extracellulararabinose and the transport protein. The Boolean function is f A m = ( A em ∧ T ) ∨ A e .For the cAMP-CAP protein complex to be present we require the absence of external glucose. TheBoolean function is f C = G e .For the DNA loop to be formed, the AraC protein must be present, but it cannot be bound toarabinose. The Boolean function is f L = Ara − ∧ Ara + .For the arabinose-bound form of the AraC protein to be formed, AraC protein must be present,as well as either low or high concentrations of intracellular arabinose. The Boolean function is f Ara + = Ara − ∧ ( A m ∨ A ) .Those readers familiar with the Boolean model of the lac operon in [VCS11] may notice that we donot include a ∧ G e clause in f A or f A m like Veliz-Cuba and Stigler do in the functions f L and f L m forintracellular lactose. The lactose and arabinose transporter proteins are different, and glucose does notrepress arabinose via inducer exclusion the way it does lactose.3. A NALYSIS AND MODEL VALIDATION
There are = 16 possibilities for the parameter vector x = ( A e , A em , G e , Ara − ) ∈ F , but wedisregard the four where A e = 1 and A em = 0 because they are meaningless. This leaves possibleinitial conditions of our model, and we need to consider all of these in our analysis of the system dynamics. YNCHRONY IN A BOOLEAN NETWORK MODEL OF THE L-ARABINOSE OPERON IN ESCHERICHIA COLI 7 M S C Ara − TEL M T AAra + G e A e f A = A e ∧ Tf A m = ( A em ∧ T ) ∨ A e f Ara + = ( A m ∨ A ) ∧ Ara − f C = G e f E = M S f L = Ara + ∧ Ara − f M S = Ara + ∧ C ∧ Lf M T = Ara + ∧ Cf T = M T F IGURE
3. The wiring diagram (left) and logical functions (right) of our proposedBoolean model of the ara operon. Circles represent variables and rectangles representparameters. The two nodes for intracellular arabinose ( A and A m ) are collapsed intoone, as are the two nodes for extracellular arabinose.For each one, we look at the periodic orbit structure of the phase space. Naturally, this depends on theupdate scheme. However, in any Boolean network, the fixed points are the same under synchronous andasynchronous update. In this section, we use computational algebra to compute the fixed points for all initial conditions. For all cases of a single fixed point, we argue why that steady state makes sensebiologically. After that, we examine the case with two fixed points that predicts bistability. Finally, weneed to verify that there are no larger periodic orbit structures, and for this we turn to the GINsim (GeneInteraction Network simulation) software. It turns out that longer limit cycles appear under a synchronousupdate order, but these go away for general asynchronous update.3.1. Fixed points and Gr¨obner bases.
The fixed points of our Boolean network model can be found bysolving the system { f x i = x i | i = 1 , . . . , } where we rename our Boolean variables as follows: ( A, A m , Ara + , C, E, L, M S , M T , T ) = ( x , x , x , x , x , x , x , x , x ) . The resulting system is shown below, in Boolean logical form on the left, and in polynomial form on theright. The conversion from logical to polynomial form is done by replacing x ∧ y with xy , x ∨ y with x + y + xy , and x with x . f A = A e ∧ T = Af A m = ( A em ∧ T ) ∨ A e = A m f Ara + = ( A m ∨ A ) ∧ Ara − = Ara + f C = G e = Cf E = M S = Ef L = Ara + ∧ Ara − = Lf M S = Ara + ∧ C ∧ L = M S f M T = Ara + ∧ C = M T f T = M T = T ⇐⇒ x + x A e = 0 x + A em x + A e + A em A e x = 0 x + ( x + x + x x ) Ara − = 0 x + G e + 1 = 0 x + x = 0 x + ( x + 1) Ara − = 0 x + x x ( x + 1) = 0 x + x x = 0 x + x = 0 Systems such as these can be solved rather easily by computing a Gr¨obner basis with computational algebrasoftware such as Macaulay2 or Sage. As an example, when the parameter vector is ( A e , A em , Ara − , G e ) =(0 , , , , we type the following commands into Sage: P.
A. JENKINS AND M. MACAULEY
Initial condition(s) Fixed point(s) Operon x = ( A e , A em , Ara − , G e ) ( A, A m , Ara + , C, E, L, M S , M T , T ) state (0 , , ,
0) (0 , , , , , , , , OFF (0 , , , , , ,
0) (0 , , , , , , , , OFF (0 , , ,
1) (0 , , , , , , , , OFF (0 , , , , , ,
1) (0 , , , , , , , , OFF (0 , , ,
0) (0 , , , , , , , , OFF (0 , , ,
1) (0 , , , , , , , , OFF (0 , , , , , ,
1) (0 , , , , , , , , OFF (1 , , ,
0) (1 , , , , , , , , ON (0 , , ,
0) (0 , , , , , , , , OFF (0 , , , , , , , , ONT
ABLE
1. The fixed points of our Boolean network model for each choice of parameters,excluding the nonsensical ones where A e = 1 and A em = 0 . I = ideal(x1+x9*Ae, x2+(Aem*x9+Ae+Aem*Ae*x9), x3+(x1+x2+x1*x2)*ara,x4+Ge+1, x5+x7, x6+(x3+1)*ara, x7+x3*x4*(x6+1), x8+x3*x4, x9+x8);B = I.groebner.basis();
The output is the following: [x1, x2 + x9, x3 + x9, x4 + 1, x5 + x9ˆ2, x6 + x9 + 1, x7 + x9ˆ2, x8 + x9].
This says that the original nonlinear system of equations has the same set of solutions over F as themuch simpler system { x = 0 , x + x = 0 , x + x = 0 , x + 1 = 0 , x + x = 0 , x + x + 1 = 0 , x + x = 0 , x + x = 0 } . It is easy to verify by hand (recall that x i = x i ) that there are two solutions: x OF F = (0 , , , , , , , , , and x ON = (0 , , , , , , , , . We will now analyze and interpret these fixed points. The parameter vector ( A e , A em , Ara − , G e ) =(0 , , , means that there are medium levels of extracellular arabinose but no glucose, and that the un-bound AraC protein is present. The solution x OF F corresponds to the steady state when the cAMP-CAPcomplex is present ( C = 1) , the DNA is looped ( L = 1) , and all other variables are . This makes sense bi-ologically, since the DNA loop blocks transcription, thereby preventing the other gene products from beingpresent: arabinose, the arabinose-bound AraC protein, mRNA, and the enzymes and transporter proteins.In this case, the operon is OFF. The other steady state x ON corresponds to the case where the arabinose-bound AraC protein and all of the gene products are present, but the DNA loop is not ( L = 0 ), and internalarabinose levels are medium by not high ( A m = 1 , A = 0 ). This describes the ara operon being ON.The fact that there are two fixed points for the initial condition described above shows how the operonpredicts bistability, and this will be discussed in more detail in Section 3.2. Before that, there are otherinitial conditions that need to be analyzed in a similar manner. In all of these cases, there is only onefixed point. These are summarized in Table 1, grouped by the fixed point. We will justify the biologicalinterpretation of these fixed points now, going down the table from top-to-bottom.We start with the two initial conditions for which A e = Ara − = G e = 0 , which represent the casesof at most medium concentration levels of extracellular arabinose, no glucose, and no AraC protein (e.g.,due to a mutation in the araC gene). As shown in the first row of Table 1, the fixed point of this systemhas only C = 1 . This makes biological sense due to the following arguments: arabinose does not enter thecell (so A = A m = 0 ). Thus, there is no AraC protein bound to arabinose (so Ara + = 0 ), which meansthat the DNA is not looped ( L = 0) . Despite the absence of the DNA loop which blocks transcription,the absence of the AraC protein prohibits the initiation of transcription, so all of the gene products will be YNCHRONY IN A BOOLEAN NETWORK MODEL OF THE L-ARABINOSE OPERON IN ESCHERICHIA COLI 9 absent ( E = M S = M T = T = 0 ). Finally, the cAMP-CAP protein complex will be present ( C = 1 )because glucose is needed to reduce the concentration of CAP.A similar case holds for the initial condition ( A e , A em , Ara − , G e ) = (1 , , , . Here, the only differ-ence is that the high level of extracellular arabinose results in a few arabinose molecules entering the cellvia passive transport, so A m = 1 . The rest of the argument is the same.Another case similar to the first two occurs for the three initial conditions for which Ara − = 0 and G e = 1 . Once again, the first argument above carries through, except that the presence of glucose reducesCAP concentration levels. This results in C = 0 and the zero vector as the steady state, unless there arehigh levels of extracellular arabinose, in which case A m = 1 due to passive transport.The remaining cases all describe initial conditions where the AraC protein is present, i.e., Ara − = 1 .For the first, there is no arabinose or glucose, and so the DNA will be looped and cAMP-CAP will be high.The remaining variables should be zero because transcription cannot initiate with Ara + = 0 . In the nexttwo cases, glucose is present, so the cAMP-CAP complex will not be, and transcription cannot begin. Thismeans that there will be no transporter proteins, and since the extracellular arabinose levels are not high,the intracellular arabinose levels will remain low. All other variables will be .Finally, consider the two initial conditions with high levels of extracellular arabinose and Ara − = 1 .Some arabinose will enter the cell via passive transport and bind to the AraC protein, so Ara + = 1 andthe DNA will be unlooped. If glucose is present, then CAP levels are low and so transcription cannotbegin, and the fixed point that is reached is OFF, with only A m = Ara + = 1 . If glucose is absent, thentranscription initiates, and the ON fixed point is reached, with all states except L = 0 .3.2. Bistability.
Let us return to the initial condition describing medium levels of extracellular arabinoseand no glucose. Biologically, this captures the phenomenon of bistability , which says that the operon canreside in either the OFF or ON steady state. In classical differential equation models, bistability occurswhen there are two steady states which are separated by an unstable steady state. To understand this inthe ara operon, let [ A ] denote concentration of intracellular arabinose, and fix a gene product X such asmRNA, an enzyme, or protein, with concentration [ X ] . If [ A ] ≈ , then there is one steady state solution [ X ] = X ∗ ≈ . One wishes to understand how the steady state X ∗ depends on [ A ] , and this can beplotted in the ([ A ] , X ∗ ) -plane. An example of this that could arise from a differential equation model isshown on the left in Figure 4. If [ A ] ≈ , then X ∗ ≈ as well. As [ A ] increases continuously, then sodoes X ∗ , until [ A ] exceeds some threshold τ ↑ , at which point X ∗ undergoes a discontinuous “jump” fromthe operon being OFF to ON. However, once the operon is in the ON state, [ A ] must be reduced to somelower threshold τ ↓ < τ ↑ before X ∗ can “jump” down to the OFF state. This means that if the arabinoseconcentration [ A ] is in the range ( τ ↓ , τ ↑ ) , then the operon can be observed as ON in some cells and OFFin others. Transcription of the araBAD genes will likely not be initiated in cells that were raised in anarabinose-starved environment when [ A ] gets increased to this middle range. In contrast, transcription willlikely be observed in cells raised with arabinose, even as [ A ] is decreased into this range. The S-shape ofthe curve in Figure 4 is due to a third unstable steady state in the bistable region, which is denoted by thedashed line. The Boolean analogue of this bistability is shown on the right in Figure 4.To see how our Boolean network captures bistability, consider the initial condition whose parametervector is x = ( A e , A em , Ara − , G e ) = (0 , , , and look at Table 1. Some cells, such as those raisedin an arabinose-starved environment, end up in the OFF fixed point (0 , , , , , , , , . Now, con-sider increasing extracellular arabinose levels. When the concentration exceeds the “up-threshold” τ ↑ ,then A e flips to in the parameter vector. This results in the system settling in a new ON fixed point (1 , , , , , , , , . On the other hand, cells raised in an arabinose-rich environment will be initiallyobserved in the ON fixed point (0 , , , , , , , , . But if arabinose levels get too low, then A em = 0 in the parameter vector, and the system settles in a new OFF fixed point, (0 , , , , , , , , .3.3. Longer limit cycles and general asynchronous update.
At this point, we have verified that the fixedpoints of our Boolean network model make sense biologically for all 12 choices of initial conditions. Itremains to check that there are no other longer periodic cycles in the phase space. It would be nice to havealgebraic methods to do this or a theorem with sufficient conditions for having only one basin of attraction.As far as we know, neither of these exist, and so we need to resort to computer simulations. Using the freely F IGURE
4. On the left is a curve the describes the steady states of a gene product X inan ODE model, as a function of concentration of extracellular arabinose. On the right is aBoolean analogue of this. The dashed curve represents unstable steady states. Bistabilityoccurs when [ A ] ∈ ( τ ↓ , τ ↑ ) because there are two stable steady states.available online software Analysis of Dynamic Algebraic Models (ADAM) [HBG + initial conditions with only fixed points, there is only one basin of attraction consistingof all nodes. However, for the case where bistability is observed is more complicated. The phase spaceactually consists of six attractor basins: two with nodes each that lead into the fixed points, one with nodes that leads into a -cycle, (0 , , , , , , , , −→ (0 , , , , , , , , , and three with nodes each, that lead into the following -cycles: (0 , , , , , , , , −→ (0 , , , , , , , , −→ (0 , , , , , , , , −→ (0 , , , , , , , , , (0 , , , , , , , , −→ (0 , , , , , , , , −→ (0 , , , , , , , , −→ (0 , , , , , , , , , (0 , , , , , , , , −→ (0 , , , , , , , , −→ (0 , , , , , , , , −→ (0 , , , , , , , , , This presents a problem because these longer periodic cycles do not make sense biologically. One expla-nation could be that that the model is faulty, in that either it does not represent the biology well, or perhapsthe ara operon is simply ill-suited for a Boolean network framework. Another possibility could be thatthese periodic cycles are an artifact of using a synchronous update order. To test this, we used the
GeneInteraction Network simulation (GINsim) software [CNT12] to analyze the phase space under a generalasynchronous update, which is more representative of an actual molecular network. Under a general asyn-chronous update, the periodic cycles merge and the only attractors in the phase space are the original twofixed points. This provides evidence that the longer periodic cycles are indeed artifacts of a synchronousupdate order and not a flaw in the model design.4. D
ISCUSSION AND FUTURE WORK
In this paper, we proposed a Boolean network model of the ara operon in E. coli on variables and parameters. Unlike the lac operon which was modeled with a Boolean network in [VCS11], the ara operonuses DNA looping to block transcription, it has a protein that acts as both a repressor and an activator, andglucose does not repress the transport of arabinose via inducer exclusion. We showed that under 11 of the12 possible initial conditions, the system reaches a fixed point that agrees with the biology. In the last case,there were two fixed points – one for the operon being ON and the other OFF. This captures the notionof bistability under medium levels of arabinose and no glucose. Though there were other periodic cyclesunder an “artificial” synchronous update, these went away under a general asynchronous update order. Thechance that this would happen in all of these cases by pure luck is astronomically small, which providesstrong evidence for the strength and robustness of our model and the Boolean network framework.Our analysis used both computational algebra and software tools. We found the fixed points by writingthe Boolean functions as polynomials and then computing a Gr¨obner basis using Sage. Of course, this was The ADAM software is being overhauled and replaced with a crowd-sourced version called TURING, currently in Beta [HL16].
YNCHRONY IN A BOOLEAN NETWORK MODEL OF THE L-ARABINOSE OPERON IN ESCHERICHIA COLI 11 not necessary because this network is small enough that we could have used Boolean network software suchas ADAM and/or GINsim to complete it. However, these methods are necessary for larger networks wherethe phase space is too big for Boolean network software to handle. That said, one of the shortcomings isthat not all of the model analysis can be done using computational algebra, and it would be desirable tofurther develop this theory. For example, is there a way to deduce that that a model has only one attractorbasin, or only fixed points, just from the Boolean functions? There are some known results and theoremsalong these lines, but we were not able to find any the directly applied to our model. Examples of suchresults include the following, mostly for general asynchronous update. In [Rob80], Robert showed thatif the wiring diagram has no directed cycle, then f has a unique fixed point. However, this does noteliminate the possibility of other periodic cycles or complex attractors. Robert’s result was generalizedby Shih and Dong by restricting the directed cycles to certain subgraphs of the wiring diagram [SD05].In [Ric15], Richard gave a sufficient condition by proving a “forbidden subnetworks theorem.” By atheorem of Richard [Ric10], negative cycles are necessary for the existence of a cyclic attractor. Sincethe ara operon has no negative cycles, the phase space cannot contain periodic cycles. However, thatdoes not necessarily forbid the existence of long complex attractors. Another body of work involves theanalysis and control of Boolean networks using semi-tensor products [CQL11], but it is not clear whetherthis work is directly applicable. Future work on mathematical side includes either finding or developingtheorems that would apply for this type of system. Current and future work on the biological side consistsof modeling not only other operons in the Boolean network framework, but more complicated systemssuch as regulons and modulons. In some cases, if the biology is not completely understood, then algebraicreverse engineering techniques (e.g., [LS04] or [DGPH + EFERENCES [BM85] S. Busenberg and J. Mahaffy. Interaction of spatial diffusion and delays in models of geneticcontrol by repression.
J. Math. Biol. , 22(3):313–333, 1985.[Cea70] F. Crick and et al. Central dogma of molecular biology.
Nature , 227(5258):561–563, 1970.[CNT12] C. Chaouiya, A. Naldi, and D. Thieffry. Logical modelling of gene regulatory networks withGINsim.
Bacterial Molecular Networks: Methods and Protocols , pages 463–479, 2012.[CQL11] D. Cheng, H. Qi, and Z. Li.
Analysis and control of Boolean networks: a semi-tensor productapproach . Springer Science & Business Media, 2011.[DBHH72] M.E. Doyle, C. Brown, R.W. Hogg, and R.B. Helling. Induction of the ara operon of Es-cherichia coli B/r.
J. Bacteriol. , 110(1):56–65, 1972.[DGPH +
11] E. Dimitrova, L.D. Garc´ıa-Puente, F. Hinkelmann, A.S. Jarrah, R. Laubenbacher, B. Stigler,M. Stillman, and P. Vera-Licona. Parameter estimation for boolean models of biologicalnetworks.
Theor. Comput. Sci. , 412(26):2816–2826, 2011.[DJ02] H. De Jong. Modeling and simulation of genetic regulatory systems: a literature review.
J.Comp. Biol. , 9(1):67–103, 2002.[DJLS07] E.S. Dimitrova, A.S. Jarrah, R. Laubenbacher, and B. Stigler. A gr¨obner fan method forbiochemical network modeling. In
Proc. Internat. Symposium Symb. Algebraic Comput. ,pages 122–126. ACM, 2007.[GS71] J. Greenblatt and R. Schleif. Arabinose C protein: regulation of the arabinose operon invitro.
Nat. New. Biol. , 233(40):166–170, 1971.[HBG +
11] F. Hinkelmann, M. Brandon, B. Guang, R. McNeill, G. Blekherman, A. Veliz-Cuba, andR. Laubenbacher. ADAM: analysis of discrete models of biological systems using computeralgebra.
BMC Bioinformatics , 12(1):295, 2011.[HL16] A. Honsy and R. Laubenbacher. TURING: Algorithms for computation with finite dynam-ical systems. Published electronically at ,2016.[JPSM60] F. Jacob, D. Perrin, C. S´anchez, and J. Monod. L’op´eron: groupe de g`enes `a expressioncoordonn´ee par un op´erateur.
C.R. Acad. Sci. , 250:1727–1729, 1960.[Kau69] S.A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets.
J.Theor. Biol. , 22(3):437–467, 1969. [LS04] R. Laubenbacher and B. Stigler. A computational algebra approach to the reverse engineer-ing of gene regulatory networks.
J. Theor. Biol. , 229(4):523–537, 2004.[LS09] R. Laubenbacher and B. Sturmfels. Computer algebra in systems biology.
Amer. Math.Monthly , pages 882–891, 2009.[MVCA +
12] D. Murrugarra, A. Veliz-Cuba, B. Aguilar, S. Arat, and R. Laubenbacher. Modeling stochas-ticity and variability in gene regulatory networks.
EURASIP J. Bioinformatics Sys. Biol. ,2012(1):1–11, 2012.[OHS +
80] S. Ogden, D. Haggerty, C.M. Stoner, D. Kolodrubetz, and R. Schleif. The Escherichia coliL-arabinose operon: binding sites of the regulatory proteins and a mechanism of positiveand negative regulation.
Proc. Natl. Acad. Sci. , 77(6):3346–3350, 1980.[Rae02] L. Raeymaekers. Dynamics of Boolean networks controlled by biologically meaningfulfunctions.
J. Theor. Biol. , 218(3):331–341, 2002.[Ric10] A. Richard. Negative circuits and sustained oscillations in asynchronous automata networks.
Adv. Appl. Math. , 44(4):378–392, 2010.[Ric15] A. Richard. Fixed point theorems for Boolean networks expressed in terms of forbiddensubnetworks.
Theor. Comput. Sci. , 583:1–26, 2015.[Rob80] R. Robert. Iterations sur des ensembles finis et automates cellulaires contractants.
LinearAlgebra Appl. , 29:393–412, 1980.[RRT08] ´E. Remy, P. Ruet, and D. Thieffry. Graphic requirements for multistability and attractivecycles in a Boolean dynamical framework.
Adv. Appl. Math. , 41(3):335–350, 2008.[Sch00] R. Schleif. Regulation of the L-arabinose operon of Escherichia coli.
Trends Genet. ,16(12):559–565, 2000.[SD05] M.-H. Shih and J.-L. Dong. A combinatorial analogue of the Jacobian problem in automatanetworks.
Adv. Appl. Math. , 34(1):30–46, 2005.[SD10] I. Shmulevich and E.R. Dougherty.
Probabilistic Boolean Networks: the modeling andcontrol of gene regulatory networks . SIAM, 2010.[Sno98] E.H. Snoussi. Necessary conditions for multistationarity and stable periodicity.
J. Biol. Syst. ,6(01):3–9, 1998.[SR76] M.H. Saier and S. Roseman. Sugar transport. Inducer exclusion and regulation of the meli-biose, maltose, glycerol, and lactose transport systems by the phosphoenolpyruvate: sugarphosphotransferase system.
J. Biol. Chem. , 251(21):6606–6615, 1976.[SRTC05] E. Simao, E. Remy, D. Thieffry, and C. Chaouiya. Qualitative modelling of regulatedmetabolic pathways: application to the tryptophan biosynthesis in e. coli.
Bioinformatics ,21(suppl 2):ii190–ii196, 2005.[Td90] R. Thomas and R. d’Ari.
Biological feedback . CRC press, 1990.[Tho73] R. Thomas. Boolean formalization of genetic control circuits.
J. Theor. Biol. , 42(3):563–585, 1973.[Tho81] R. Thomas. On the relation between the logical structure of systems and their ability togenerate multiple steady states or sustained oscillations. In
Numerical methods in the studyof critical phenomena , pages 180–193. Springer, 1981.[VCJL10] A. Veliz-Cuba, A.S. Jarrah, and R. Laubenbacher. Polynomial algebra of discrete models insystems biology.
Bioinformatics , 26(13):1637–1643, 2010.[VCS11] A. Veliz-Cuba and B. Stigler. Boolean models can explain bistability in the lac operon.
Journal of Computational Biology , 18(6):783–794, 2011.[YM03] N. Yildirim and M.C. Mackey. Feedback regulation in the lactose operon: a mathematicalmodeling study and comparison with experimental data.
Biophysical J. , 84(5):2841–2851,2003.[YSHM04] N. Yildirim, M. Santillan, D. Horike, and M.C. Mackey. Dynamics and bistability in areduced model of the lac operon.
Chaos , 14(2):279–292, 2004. D EPARTMENT OF M ATHEMATICS , U
NIVERSITY OF G EORGIA , A
THENS , GA 30602, USA
E-mail address : [email protected] D EPARTMENT OF M ATHEMATICAL S CIENCES , C
LEMSON U NIVERSITY , C
LEMSON , SC 29634-0975, USA
E-mail address ::