A (possible) mathematical model to describe biological "context-dependence" : case study with protein structure
arXiv [q-bio.OT]
Anirban Banerji
Bioinformatics Centre, University of Pune, Pune-411007, Maharashtra, India.
Email address : [email protected]
Abstract
The context-dependent nature of biological phenomena is well documented in every branch of biology. While there have been a few previous attempts to (implicitly) model various facets of biological context-dependence, a formal and general mathematical construct to model the wide spectrum of context-dependence eludes the students of biology. An objective and rigorous model, from both 'bottom-up' and 'top-down' perspectives, is proposed here to serve as a template for describing the various kinds of context-dependence that we encounter in different branches of biology. Interactions between biological contexts were found to be transitive but non-commutative. It is found that a hierarchical nature of dependence amongst the biological contexts models the emergent biological properties efficiently. Reasons for these findings are provided with a general model to describe biological reality. A scheme to algorithmically implement the hierarchic structure of organization of biological contexts was achieved with a construct named 'Context tree'. A 'Context tree' based analysis of context interactions among biophysical factors influencing protein structure was performed.
Keywords : Biological contexts; mathematical model; hierarchical organization; emergence; thread-mesh model; context tree.

Introduction :

'Context-dependence' is omnipresent in Biology. From the realm of substitution of nucleotides (Siepel et al. 2004, Zhang et al. 2007) to the paradigm of protein structure-function (Main et al. 1998, Nobeli et al. 2009), from the sphere of cellular dynamics (Hagan and Sharrocks 2002) to virulence studies in host-parasite systems (Brown et al. 2003) and evolutionary dynamics (Jablonski et al. 2006), one encounters events and processes that are "context-dependent". While various attempts have been made from differing perspectives to quantify the context-sensitivity of particular biological events (Andrianantoandro et al. 2006, Torney et al. 2009, Banerji and Ghosh 2011), a general mathematical framework that captures and describes the ubiquitous 'context-dependence' eludes the students of Biology. In the present work, a mathematical structure is proposed that attempts to model biological 'context-dependence' from bottom-up as well as top-down perspectives. Although the need to engineer an exact scheme to describe biological context-dependence has been felt by many in the recent past (Loewe 2009, Haseltine and Arnold 2007, Marguet et al. 2007, Platzer and Meinzer 2002), the present approach takes these concerns to a tangible outcome by proposing a general and robust theoretical construct to model biological organisation from both top-down and bottom-up perspectives with algorithmically implementable constructs. Unlike some previous attempts, the constructs proposed here do not touch upon context-dependence modelling tangentially (Standish 2001, Edmonds 1999, Yartseva et al. 2007), but concentrate solely on it; nor do they restrict themselves to (successful yet) particular scopes (Doboli et al. 2000, Hoare et al. 2004).
On the other hand, the present work does not attempt to construct a computational structure that helps in the retrieval of biological data from some repository in a context-dependent manner (Yu et al. 2009, Boeckmann et al. 2005), nor does it propose some (effective) visualization tool to observe context-dependent interactions between biological properties (Gopalacharyulu et al. 2006). A reliable and general mathematical model to describe biological context-dependence is of utmost necessity for contemporary Biology. This paper suggests a possible construct to achieve the same. (A recent work that attempted to construct a mathematical model to unambiguously describe the concept of 'evolvability' (Valiant 2009) underlines the necessity of the present genre of works.)

We suggest the triad of the form
Example-1) :

When λ phage (a virus that infects the bacterium Escherichia coli) encounters a bacterium, it attaches itself only to certain particular receptors with specific structural features on the bacterial membrane. That is, the relevant biological contexts ensure that binding of λ phage to various other candidate receptor sites with slightly varying structural aspects is not allowed. Subsequently, when the virus genome enters the bacterium, only two alternative pathways (out of a theoretically infinite number of pathways), namely the 'lytic pathway' and the 'lysogenic pathway', are allowed biologically (Yartseva et al. 2007), although a theoretical thermodynamic study of the situation can suggest many possible pathways with (almost) similar efficiencies. This entire process, in the context-space description, can be modeled with non-zero entries for the aforementioned two pathways, while the rest of the entries in $A_U$ are assigned zero, to represent the fact that $U$ specifies the contexts that ultimately ensure certain biological goals.

Example-2) :
Out of the entire spectrum of possible mRNAs that can be generated from a single gene, only one or a few are created at a time; the nature of $A_U$ ensures that the other possibilities do not come into being, although these theoretically possible elements of the context-set $A$ would have operated upon the same structural parameters that constitute $P$. This act of ensuring the interplay between the set of allowable contexts to achieve any particular goal forms the so-called 'regulatory mechanism', which includes an extremely sensitive balance between the concentration magnitudes of the pertinent entities, the destination, sequence variety, structural diversity and finally the functional options of the resulting protein, related to the type of tissue or the stage of development, etc. (Boeckmann et al. 2005).

Henceforth, we remove the rules in $U$ that correspond to biological impossibilities, thereby making sure that these compositions do not participate in the construction of the tree $CT$. Removing these entries from the set $A$, we obtain a non-redundant set of basic rules of context composition. From here onwards, we will denote this non-redundant set as set $A$.

Model :

Section-1) : Modeling biological context-dependence from bottom-up perspective
We can now attempt to describe the biological process of transportation, as an example. We assume three elements $\{a_1, a_2, a_3\}$ from the set of basic contexts $A$. We denote $a_1$ as the context for loading, $a_2$ for delivery and $a_3$ for unloading. To provide an example, we consider the case in which the axon of a neuron transmits action potentials from the cell body to the synapse. The complicated mechanism of action potential propagation and their active transportation from their site of synthesis in the cell body, through the axoplasm, to intracellular target sites in the axon and synapse provides an ideal case to demonstrate the various context-specific activities. Here the loading and delivery contexts ($a_1$ and $a_2$ respectively) involve a series of synchronized couplings between an allowed set of specific membranous organelles, synaptic vesicle precursors, signaling molecules, growth factors, protein complexes, cytoskeletal components, sodium and potassium channels and many other biological components. On the other hand, the unloading context ($a_3$) describes the neurotrophic signals that are transported back from the synapse to the cell body, keeping an account of the efficiency and reliability of the loading and delivery operations (Duncan and Goldstein 2006). Here we introduce the set of compositional rules $U$. $U$ may then operate upon these three basic elements to ensure a sequential execution of elementary processes, so that the final goal ($F$) is achieved. In this case the matrix $A_U$ can be written as :

$$A_U = \begin{array}{c|ccc} & a_1 & a_2 & a_3 \\ \hline a_1 & 0 & 1 & 0 \\ a_2 & 0 & 0 & 1 \\ a_3 & 0 & 0 & 0 \end{array} \qquad (2)$$

Hence the elementary biological context, $\alpha$, is defined for only two cases from the entire spectrum of possible contextual interactions; they are $a_1 \alpha a_2$ (implying the existence of a pipeline where the context for delivery comes into action after the context for loading is ensured) and $a_2 \alpha a_3$ (implying the existence of the context for unloading after the operation of the context for delivery is performed).
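As an illustrative sketch only (it is not part of the original formulation), the composition matrix of the transport example can be encoded as a small lookup table. The sketch assumes that defined compositions are marked by 1 and undefined ones by 0; the names `CONTEXTS`, `A_U`, `is_allowed` and `pipeline_is_valid` are hypothetical.

```python
from typing import Sequence

# Hypothetical encoding of the composition matrix A_U for the transport example.
# Rows are the preceding context, columns the following one;
# 1 marks a defined composition (an assumption), 0 an undefined one.
CONTEXTS = ("loading", "delivery", "unloading")  # a_1, a_2, a_3

A_U = [
    [0, 1, 0],  # loading   -> delivery is defined
    [0, 0, 1],  # delivery  -> unloading is defined
    [0, 0, 0],  # unloading -> no context follows
]


def is_allowed(first: str, second: str) -> bool:
    """Return True if `second` may directly follow `first` according to A_U."""
    i, j = CONTEXTS.index(first), CONTEXTS.index(second)
    return A_U[i][j] != 0


def pipeline_is_valid(steps: Sequence[str]) -> bool:
    """A pipeline is valid when every consecutive pair is a defined composition."""
    return all(is_allowed(a, b) for a, b in zip(steps, steps[1:]))


print(pipeline_is_valid(["loading", "delivery", "unloading"]))
print(pipeline_is_valid(["loading", "unloading"]))
print(is_allowed("delivery", "loading"))
```

In this encoding the order of the arguments matters: `is_allowed("loading", "delivery")` holds while the reversed call does not, because the matrix is not symmetric.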
It is interesting to note that a pipeline with $a_1 \alpha a_3$ is not biologically relevant, since it implies the existence of the context for unloading after the context for loading without any delivery action. Similarly, contextual relations like $a_1 \alpha a_1$ (loading certain contexts repeatedly without a purpose), relations that would invoke the context for delivery when the initiation of the process is not ensured, relations that would invoke the context for unloading when the delivery process is incomplete, and the other remaining spurious context-relations are assigned a magnitude of zero because they are biologically impertinent. On a global scale, the composition $a_1 \alpha a_2 \alpha a_3$ captures the purposeful nature of biological goals.

The findings of the last paragraph imply that interactions between biological contexts are transitive but not commutative (since $a_1 \alpha a_2$ and $a_2 \alpha a_3$ are defined, $a_1 \alpha a_2 \alpha a_3$ can be defined; but merely because a composition $a_i \alpha a_j$ exists doesn't imply that $a_j \alpha a_i$ exists too). The non-commutative nature of biological contexts can be understood better from the general treatment of the problem elaborated later.

The (bottom-up) paradigm of description of the interplay of biological contexts can be generalized by describing the entire biological universe with the Thread-Mesh (TM) model (Banerji 2009). The TM model segments the biological space-time into a series of different biological organizations, viz. the nucleotides; amino acids; macromolecules (proteins, sugar polymers, glycoproteins); biochemical pathways; network of pathways; biological cell; tissue; organs; organisms; society and ecosystem; these organizational schemes are called threshold levels. Emergence of a single biological property (compositional and/or structural and/or functional) creates a new biological threshold level in the TM model. Thus, if any arbitrarily chosen $i$-th biological threshold level is denoted as $TH_i$, the succeeding one, viz. $TH_{i+1}$, will contain at least one biological property that $TH_i$ did not possess.
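The defining condition on successive threshold levels, namely that $TH_{i+1}$ carries at least one property absent from $TH_i$, can be illustrated with a toy sketch. The property names below are invented placeholders, and the helpers `emergent_properties` and `is_valid_hierarchy` are hypothetical; the TM model itself does not prescribe this representation.

```python
# Toy illustration: each threshold level is represented by the set of
# properties it carries. All property names are invented placeholders.
TH = [
    {"base_composition"},                                     # TH_i
    {"base_composition", "side_chain_chemistry"},             # TH_{i+1}
    {"base_composition", "side_chain_chemistry", "3d_fold"},  # TH_{i+2}
]


def emergent_properties(lower, upper):
    """Properties present at the upper level that the lower level lacks."""
    return upper - lower


def is_valid_hierarchy(levels):
    """Every succeeding level must contribute at least one new property."""
    return all(emergent_properties(lo, hi) for lo, hi in zip(levels, levels[1:]))


print(emergent_properties(TH[0], TH[1]))
print(is_valid_hierarchy(TH))
```

The check only requires the set difference to be non-empty; that lower-level properties are retained at higher levels is an assumption of this toy data, not a claim of the model.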
Schemes with a similar philosophy to identify biological threshold levels were proposed previously (Testa and Kier 2000, Dhar 2007), but representation of the emergence of a biological property, and subsequent classification of biological organization with respect to this emergent behavior, was not done in either of these models. The basic principles for the subsequent discourse are general and can be applied to any threshold level. Every possible property that a threshold level is endowed with is represented by a 'thread' in the TM model. Thus an environmental property capable of influencing biological action will be called an 'environmental thread' in the present parlance. Threads can be compositional, structural or functional. For example, for the biological threshold level corresponding to the enzymes (the threshold level representing the macromolecules), one of the compositional threads is the amino acid sequence; the radius of gyration, the resultant backbone dipole moment and each of the bond lengths, bond angles and torsion angles are some examples of structural threads; and the values of $K_m$, $V_{max}$ and $K_{cat}$ are some examples of its functional threads. It is advantageous to work with the TM model because it can attempt to describe context-dependence and emergence from the framework of an invariant template.

Section-2) : Modeling biological context-dependence from top-down perspective
While the framework of eqns. (1)-(2), along with examples 1 and 2, describes the nature of the multilevel organization of $CT$, such a description is 'bottom-up' in nature. Hence, while it is helpful to describe the context-mapping between any two particular adjacent biological threshold levels '$l$' and '$l+1$' (say, between the threshold levels representing nucleotides and amino acids, amino acids and proteins, or proteins and biochemical pathways, etc.), the general mode of dependency within $CT$, with a bird's-eye ('top-down') view of its organization, can hardly be guessed from such a bottom-up approach.

We start the construction of the top-down scheme of description of dependencies between biological contexts by listing the assumptions involved therein. Hence :

Assumption-1) :
In the absence of random external disturbances, and without a failure of any component belonging to the physical structure of the system ($P$), all the rules of multilevel interactions between the contexts representing any biological threshold level can be constructed and described in a deterministic manner. (The success of recent attempts at deterministic modeling of various biological phenomena from diverse backgrounds (Janda and Gegina 2008, Kim and Maly 2009, Ferreira and Azevedo 2007) suggests that such an assumption is not ill-founded, particularly in the absence of possible perturbations.)

Assumption-2) :
The necessary and sufficient condition for these deterministic rules of inter-level context interactions to hold true is that they account for the accomplishment of certain biological goals ($F$). (Previous studies (Yartseva et al. 2007, Troyanskaya et al. 2003, Camon et al. 2004) vindicate such an assumption.)

Assumption-3) :
Although biological systems will be exposed to randomly varying magnitudes of external parameters, the essence of the deterministic criteria of context interactions needed to accomplish any set of required biological functions will not be perturbed by a significant margin. This assumption implies that the deterministic manner of context interactions will not undergo significant change when the magnitudes of the components of the underlying physical structures $\{p_i\}$ ($p_i \in P$), comprised of relevant biological parameters, are altered within some allowable range. We describe this allowable range of the assumed magnitude of some arbitrarily chosen parameter $\pi$ by an interval $[\pi_1, \pi_2]$. For example, the $K_{cat}$ values of the protein kinases and phosphatases have been found to range from 0.01 to 1 s$^{-1}$ (Kholodenko 2000). Similarly, for proteins, the mass fractal dimension and the hydrophobicity fractal dimension, representing the compactness of the mass and hydrophobicity distributions, have been found to lie in the ranges 2.18 to 2.37 and 2.22 to 2.43 respectively (Banerji and Ghosh 2009).

Based on these assumptions, we propose that the functional defining the probability of attaining the biological goal ($F$) under consideration assumes the form :

$$F = \int_{x_i \in S} \varphi(x_1, x_2, \ldots, x_n)\, dx_i \qquad (1 \le i \le n) \qquad (3)$$

where $\varphi$ is the probability density of attaining the objective (biological goal) and $S$ is the feasibility domain of the contexts $x_i$.

Since many (say, $m$) successive stages of context interactions are required to achieve every biological goal, we can express eqn. (3) at a higher resolution as :

$$\varphi(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{m} \varphi_{j|j-1}(x_1, x_2, \ldots, x_n) \qquad (4)$$

where $\varphi_{j|j-1}$ represents the conditional probability associated with the interactions of the contexts $\{x_i\}$ while attempting to achieve a particular biological goal.

However, we note that the individual physical parameters $p_i$ ($p_i \in P$) upon which the contexts are working may not always be strongly correlated and, although related to each other, can be considered independent when viewed individually with respect to their functional contribution to the system. For example, the time-dependent and context-dependent fluctuations in individual bond lengths, bond angles and torsion angles in the protein interior, although they might be related in some intricate way to the resultant dipole moment of the protein, can be considered, for all practical purposes, in terms of their individual (and not linked) contributions in ensuring the protein's stability and functionality. Hence we attempt to partition the relevant contexts into a sum of disjoint domains, such that $x_i \in X_i$ and $\sum_i X_i = S$.

Considering this partition, we can re-write eqn. (3) as :

$$F = \int_{x_1 \in X_1} \int_{x_2 \in X_2} \cdots \int_{x_n \in X_n} \varphi_1(x_1, x_2, \ldots, x_n) \times \varphi_{2|1}(x_1, x_2, \ldots, x_n) \cdots \varphi_{m|m-1}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \qquad (5)$$

In other words, purely in terms of achievement of biological goals :

$$F = F_1\, F_{2|1} \cdots F_{m|m-1} \qquad (6)$$

where

$$F_{j|j-1} = \int_{x_1 \in X_1} \int_{x_2 \in X_2} \cdots \int_{x_n \in X_n} \varphi_{j|j-1}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \qquad (7)$$

are the conditional probabilities of the context-interactions of the system realizing the successive stages of the task. It is necessary to mention here that, to achieve any biological function, the domain of integration for every $x_i$ in the above integrals must be within its respective permissible range, say $[x_{\pi_1}, x_{\pi_2}]$.

While it is difficult to assume that every context-interaction necessary to realize a certain biological goal will always operate in a deterministic manner with perfect efficiency, experience teaches us that biological goals are seldom compromised.
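As a numerical illustration of eqn. (6), the sketch below multiplies stage-wise conditional probabilities to obtain the overall probability of attaining a goal. The stage values and the function name `goal_probability` are hypothetical; the text does not supply numerical values for any $F_{j|j-1}$.

```python
import math


def goal_probability(stage_probabilities):
    """Eqn. (6): F = F_1 * F_{2|1} * ... * F_{m|m-1}.

    `stage_probabilities` holds the conditional probability of completing
    each successive stage, given that the previous stage was completed.
    """
    for p in stage_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each stage probability must lie in [0, 1]")
    return math.prod(stage_probabilities)


# Invented values for a three-stage process.
stages = [0.9, 0.8, 0.5]
F = goal_probability(stages)
print(F)
```

Since every factor is at most 1, appending a further stage can never increase `F` in this sketch.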
Hence we assume that the reliability of any arbitrarily chosen context interaction at any arbitrarily chosen $j$-th state in the realization of a certain biological function is statistically independent of the probability of the realization of that particular biological function. In that case, the integrand in eqn. (7) can be expressed as a product $\varphi_{j|j-1}(x_1, x_2, \ldots, x_n)\, r_j(x_1, x_2, \ldots, x_n)$, where $r_j$ ($r_j \in R$) describes the probability of reliability of any arbitrarily chosen context-interaction at the $j$-th state in the realization of a certain biological function.

Hence eqn. (4) can be expressed more realistically as :

$$\varphi(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{m} \varphi_{j|j-1}(x_1, x_2, \ldots, x_n)\, r_j(x_1, x_2, \ldots, x_n) \qquad (8)$$

Thus, when the reliability of context-interactions is taken into account, and assuming that the preceding factorizations remain valid, the expression for $F$ can be re-written as :

$$F = \prod_{j=1}^{m} F_{j|j-1}\, R_j \qquad (9)$$

where

$$R_j = \int_{x_1 \in X_1} \int_{x_2 \in X_2} \cdots \int_{x_n \in X_n} r_j(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \qquad (10)$$

In other words, eqn. (6) can be re-written as :

$$F = F_1\, F_{2|1} \cdots F_{m|m-1}\, R_1 R_2 \cdots R_m \qquad (11)$$

Result :

(Bottom-Up) Modeling hierarchical organization with 'context tree' : Case-study with protein structure :
To describe the 'Context-Tree' ($CT$) in such a hierarchic paradigm under a generalized scheme, we introduce a construct $C$, which is a family of embedded partitions of contexts $C = \langle C^1, C^2, \ldots, C^r \rangle$ that operate upon any relevant subset of structural threads ($J$) representing the physical structure ($P$) of any arbitrarily chosen threshold level $S$. Here $J \subset P$ and $J = \{1, 2, \ldots, m\}$. For example, it has been found (Main et al. 1998) that at the threshold level of proteins ($S$ : $TH_{Proteins}$) in a urea-induced medium (the 'environmental thread' influencing $J$), the extent of stability of mutant proteins is highly dependent on the contexts ($C$) which operate upon the various structural parameters ($J$) that form a subset of ($P$) describing ($S$).

We can describe the situation as :

$$C^S = \langle C^S_1, C^S_2, \ldots, C^S_l \rangle, \qquad \bigcup_{j=1}^{l} C^S_j = J, \qquad C^S_i \cap C^S_j = \emptyset \ (i \ne j), \qquad S = 1, \ldots, r \qquad (12)$$

The embedding refers to any element of the partition of the $S$-th biological threshold level; i.e., the set $C^S_j$ represents the union of several sets $C^{S-1}_{i_1}, C^{S-1}_{i_2}, \ldots, C^{S-1}_{i_z}$ of the $(S-1)$-th biological threshold level. Such a description of ($CT$) conforms to a previous study on a similar topic (Andrianantoandro et al. 2006). Findings from a recent study (Haseltine and Arnold 2007) vindicate $C^S_i \cap C^S_j = \emptyset$. To elaborate the hierarchic structure, we can write $C^S \longleftrightarrow \langle C^{S-1}_1, C^{S-1}_2, \ldots, C^{S-1}_l \rangle$ if $C^S = \bigcup_{i=1}^{l} C^{S-1}_i$. Since the entire set of interactions between various contexts is ultimately geared to satisfy biological goals, and since the nature of organization of biological goals is hierarchic, we attempt to describe it by defining $l(C^S) = l$ and $C^{root} = \langle \{1, 2, \ldots, m\} \rangle$; i.e., the partition at the highest (root) level consists of one set, namely $J$.

We can associate each element $C^S_j$ ($S = 2, \ldots, r$) of the partition with the context-interaction function $f^S_j(\alpha_1, \alpha_2, \ldots, \alpha_{l(C^S_j)})$, where $\alpha \in \{0, -1, +1\}$, conforming to the previously defined $U$ ($U = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$). We associate each element $C^1_j$ of the first level with a binary relation $R^1_j$ (the previously defined elementary contexts are related by this, say $aRb$, where $A = \{a, b, \ldots, z\}$) on the biological sub-space $E_{C^1_j}$.

These concepts can formally be described as follows. Let $C^2_j \longleftrightarrow \langle C^1_1, \ldots, C^1_l \rangle$ and define a relation $R^2_j$ on $E_{C^2_j}$ using the formula (for $l > 1$) :

$$a_{C^2_j}\, R^2_j\, b_{C^2_j} \iff f^2_j(\alpha_1, \alpha_2, \ldots, \alpha_l) = 1 \qquad (13)$$

$$\alpha_i = \begin{cases} +1 & \text{if } a_{C^1_i}\, R^1_i\, b_{C^1_i} \\ -1 & \text{if } a_{C^1_i}\, \overline{R^1_i}\, b_{C^1_i} \\ 0 & \text{if } a_{C^1_i} = b_{C^1_i} \end{cases}$$

In the case of $l = 1$, $R^2_j = R^1_j$. If all the relations $R^{S-1}$ of the $(S-1)$-th biological threshold level are defined, then the relations $R^S_j$ of the $S$-th threshold level (with $l > 1$) can be defined by the following construct. If $C^S_j \longleftrightarrow \langle C^{S-1}_1, C^{S-1}_2, \ldots, C^{S-1}_l \rangle$, then

$$a_{C^S_j}\, R^S_j\, b_{C^S_j} \iff f^S_j(\alpha_1, \alpha_2, \ldots, \alpha_l) = 1 \qquad (14)$$

$$\alpha_i = \begin{cases} +1 & \text{if } a_{C^{S-1}_i}\, R^{S-1}_i\, b_{C^{S-1}_i} \\ -1 & \text{if } a_{C^{S-1}_i}\, \overline{R^{S-1}_i}\, b_{C^{S-1}_i} \\ 0 & \text{if } a_{C^{S-1}_i} = b_{C^{S-1}_i} \end{cases}$$

If $C^S = C^{S-1}$, then $R^S = R^{S-1}$; in other words, the construction requires the relation $R$ to coincide with $R^r$, which is a single relation at the $r$-th (upper) level.

To describe the entire bottom-up paradigm of description of the interaction scheme between biological contexts, we consider an example in which we describe the contextual constraints on the active site of an enzyme in simplistic terms. For this case, without any loss of generality, we consider the threshold level representing proteins to be the root level. The goal of the system ($F$) for the present purpose is to make the enzyme functional.
We assume the elementary contexts that can influence the functionality of the enzyme active site to be represented by 3 basic partitions: first, the contextual differences originating from internal coordinates; second, contextual differences arising out of the interaction profile of the active site atoms with water; and third, contextual differences arising out of the capability of the active site to undergo a shape change. Hence we describe the family of partitions $C$ as $C^1 = \langle \{1, 2, 3\}, \{4, 5\}, \{6\} \rangle$ and $C^2 = \langle J \rangle$, where element 1 denotes the (possible) contextual difference arising out of the fluctuation of bond lengths, element 2 denotes the (possible) contextual difference arising out of the fluctuation of bond angles, and element 3 denotes the (possible) contextual difference arising out of the fluctuation of torsion angles. Similarly, element 4 stands for the (possible) contextual difference arising out of the hydrophobicity of the active site patch, and element 5 denotes the (possible) contextual difference arising out of the local electrostatic profile of the active site patch. Element 6 denotes the extent of the (possible) contextual difference arising out of the change in the local shape of the active site patch. Denoting the set $\{1, 2, 3\}$ as $D_1$, $\{4, 5\}$ as $D_2$ and $\{6\}$ as $D_3$, the hierarchic nature of these contextual dependencies can easily be described as :

$$D_2 R D_1 \iff [D_2 \ge D_1]; \quad \text{similarly,} \quad D_3 R D_2 \iff [D_3 \ge D_2] \quad \text{and} \quad D_3 R D_1 \iff [D_3 \ge D_1]$$

This implies that a (possible) contextual difference arising out of the hydrophobicity of the active site patch, or a (possible) contextual difference due to the local electrostatic profile of the active site patch, will surely account for some change in the distribution profile of bond lengths, bond angles and torsion angles. But the inverse of this case, viz. a (secondary) change in the local electrostatic profile and local hydrophobic profile due to a (primary) change in bond length, bond angle or torsion angle, might or might not be observed in reality.
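The partition of the six elementary contexts and the one-way dependencies between its blocks can be sketched as below. This is only an illustration under stated assumptions: the labels `D1`, `D2`, `D3`, the set `DIRECT` of directly stated dependencies and the helpers `is_partition` and `transitive_closure` are hypothetical, and the pair (`D3`, `D1`) is derived here by transitivity rather than entered by hand.

```python
# Hypothetical encoding of the active-site example: six elementary contexts
# grouped into three blocks, plus one-way dependencies between the blocks.
J = {1, 2, 3, 4, 5, 6}
D = {
    "D1": {1, 2, 3},  # bond lengths, bond angles, torsion angles
    "D2": {4, 5},     # hydrophobicity, local electrostatic profile
    "D3": {6},        # local shape of the active site patch
}


def is_partition(blocks, universe):
    """True when the blocks are pairwise disjoint and their union is the universe."""
    blocks = list(blocks)
    union = set().union(*blocks)
    total = sum(len(b) for b in blocks)
    # Equal sizes of the union and the summed blocks rule out any overlap.
    return union == universe and total == len(universe)


# (x, y) means: a change in block x surely entails a change in block y.
DIRECT = {("D3", "D2"), ("D2", "D1")}


def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


R = transitive_closure(DIRECT)
print(is_partition(D.values(), J))
print(sorted(R))
```

The closure contains (`D3`, `D1`) but none of the reversed pairs, which mirrors a relation that chains forward without being symmetric.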
Similarly, a possible change in local shape will surely be accompanied by changes in the local electrostatic profile, the local hydrophobicity profile and the local distributions of bond lengths, bond angles and torsion angles; but the other way round might or might not be observed. This vindicates and generalizes our previous finding that interactions between biological contexts are transitive but not commutative (hence, if $D_3 R D_2$ and $D_2 R D_1$ are defined, $D_3 R D_2 R D_1$ can be defined; but merely because $D_i R D_j$ exists doesn't imply that $D_j R D_i$ exists too).

Conclusion :
While the bottom-up equations constructed the scheme for describing the interactions and dependencies between biological contexts from the bottom up, the framework of eqns. (3)-(11) describes the top-down view of the same. Together, this set of equations presents a comprehensive way to quantitatively model the omnipresent "context-dependence" in biology. Evidence for the reliability of such a mathematical treatment can be obtained from the various experimentally established results that are cited to emphasize the reasonable nature of these formulations. Since contemporary biology, as never before, is attempting to be objective in its philosophy, the necessity of a mathematical model to describe its "context-dependent" nature can hardly be ignored. The model proposed here, therefore, assumes immense importance.

Acknowledgment :
This work was supported by a COE-DBT (Department of Biotechnology, Government of India) scholarship.