A Model on Genome Evolution
LiaoFu LUO
Faculty of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China email: [email protected]
Abstract A model of genome evolution is proposed. Based on three assumptions, the evolutionary theory of a genome is formulated, and the general law on the direction of genome evolution is given. Both the deterministic classical equation and the stochastic quantum equation are proposed. It is proved that the classical equation can be put in the form of the least action principle, which can then be used to obtain the quantum generalization of the evolutionary law. The wave equation and the uncertainty relation for quantum evolution are deduced logically. It is shown that the classical trajectory is a limiting case of the general quantum evolution depicted in coarse-grained time. The observed smooth/sudden evolution is interpreted by the alternating occurrence of classical and quantum phases. The speciation event is explained by a quantum transition in the quantum phase. Two fundamental constants of time dimension, the quantization constant and the evolutionary inertia, are introduced to characterize genome evolution. The size of the minimum genome is deduced from the lower bound of the quantum uncertainty. The present work shows that the quantum law may be more general than previously thought, since it plays key roles not only in atomic physics but also in genome evolution.
Key words genome evolution, quantum evolution, wave equation, uncertainty relation, least action principle, evolutionary rate, evolutionary direction.
The genome is a well-defined system for studying the evolution of species. There have been many publications on genome evolution. In particular, the problem of genome size evolution has been widely discussed for a long time, and the C-value enigma is still puzzling [1][2]. On the other hand, two theoretical viewpoints, phyletic gradualism and punctuated equilibrium, have been proposed to explain macroevolution. It seems that both patterns are real facts observed in fossil evolution [3]. A deeper research question is what conditions lead to more gradual evolution and what conditions lead to punctuated evolution, and how to unify the two patterns in a logically consistent theory. We propose a mathematical model of genome evolution in this article. Based on a measure of diversity for DNA sequences, we suggest a second-order differential equation to describe genome evolution. The directionality of the evolution is easily defined by use of the equation. Then, by putting the differential equation in the form of the least action principle, we demonstrate that in general the classical evolutionary trajectory should be replaced by transitions among trajectories, and that the concept of quantum evolution should be introduced. Thus both the classical phase (gradual and continuous) and the quantum phase (abrupt and stochastic) that have been observed in evolution can be explained in a natural way. At the same time, the quantum theory gives a fully new view of the genome evolution problem.
Basic assumptions of the model for genome evolution
Ansatz 1: For any genome there exists a potential that characterizes the evolutionary direction [4],
$$V(x_1,\ldots,x_m,t) = D(x_1,\ldots,x_m) + W_{env}(x_1,\ldots,x_m) \tag{1}$$
where $x_i$ means the frequency of the $i$-th nucleotide (or nucleotide pair) in DNA, $W_{env}$ is a selective potential dependent on the environment, and $V$ depends on $t$ through the change of environmental variables. $D$ means the diversity-promoting potential
$$D(x_1,\ldots,x_m) = N\log N - \sum_i x_i\log x_i, \qquad N = \sum_i x_i \tag{2}$$
Notice that the potential defined by (2) equals the Shannon information quantity multiplied by $N$. Set $f_i = x_i/N$. We have
$$D = -N\sum_i f_i\log f_i \tag{3}$$
In the literature $D$ is called the diversity measure, which was first introduced by Laxton [5]. In those studies the geographical distribution of species (the absolute frequencies of the species in different locations) was used as a source of diversity. Recently the method was developed and applied successfully to various bioinformatics problems, for example, intron splice site recognition [6], promoter and transcriptional start recognition [7], protein structural classification [8], and nucleosome positioning prediction [9]. Now we shall use it to study the evolutionary problem.
Ansatz 2: The genome evolution equation reads
$$\frac{d}{dt}\left(c^2\frac{dx_i}{dt}\right) = \frac{\partial V}{\partial x_i} - f\frac{dx_i}{dt} \tag{4}$$
where $f > 0$ is a dissipation coefficient representing the effect of the fluctuation force. The parameter $c^2$ is introduced with the dimension of (time)$^2$; $c$ represents the evolutionary inertia of the genome. The evolutionary law can be reformulated based on Feynman's action integral [10]. Introduce a functional (called the information action)
$$S[x_1(t),\ldots,x_m(t)] = \int\left(\frac{c^2(t)}{2}\sum_i\left(\frac{dx_i}{dt}\right)^2 + V(x_1,\ldots,x_m,t)\right)dt \tag{5}$$
Then the solution of (4) can be expressed as
$$\delta S = \sum_i\frac{\partial S}{\partial x_i}\delta x_i = 0 \tag{6}$$
when the dissipation is weak ($f$ can be neglected). Therefore, the classical evolutionary trajectory $x_i(t)$ satisfies the principle of least action. By use of path integral quantization, the evolutionary trajectory theory can be generalized to a more general quantum formalism.
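The diversity measure of Eqs (2) and (3) can be checked numerically. The following is a minimal sketch, not part of the original model code: the function names, the choice of base-2 logarithms, and the example sequence are illustrative assumptions. It computes $D$ from nucleotide counts, confirms that it equals $N$ times the Shannon entropy, and evaluates the gradient $\partial D/\partial x_i = \log(N/x_i)$ that enters Eq (4).

```python
import math

def nucleotide_counts(seq):
    """Absolute frequencies x_i of A, C, G, T in a DNA string (hypothetical helper)."""
    s = seq.upper()
    return [s.count(b) for b in "ACGT"]

def diversity(counts, base=2.0):
    """D = N log N - sum_i x_i log x_i (Eq. 2); zero counts contribute nothing."""
    n = sum(counts)
    if n == 0:
        return 0.0
    d = n * math.log(n, base)
    for x in counts:
        if x > 0:
            d -= x * math.log(x, base)
    return d

def shannon_entropy(counts, base=2.0):
    """H = -sum_i f_i log f_i with f_i = x_i / N."""
    n = sum(counts)
    return -sum((x / n) * math.log(x / n, base) for x in counts if x > 0)

def diversity_gradient(counts, base=2.0):
    """dD/dx_i = log(N / x_i), the diversity-promoting force; requires x_i > 0."""
    n = sum(counts)
    return [math.log(n / x, base) for x in counts]

counts = nucleotide_counts("ACGTACGTAACC")   # illustrative sequence only
print(diversity(counts), sum(counts) * shannon_entropy(counts))
```

For an equal distribution of the four nucleotides the gradient is $\log_2 4 = 2$ for every $i$, which is the value used later for $m = 4$.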
Ansatz 3: The genome evolution obeys a general statistical law, described by the information propagator
$$U(\mathbf{x}',t';\mathbf{x},t) = A\sum\exp\left(\frac{iS[\mathbf{x}]}{L}\right) \tag{7}$$
($\mathbf{x} = (x_1,\ldots,x_m)$). The summation is taken over all ideal paths that take the value $\mathbf{x}$ at $t$ and $\mathbf{x}'$ at $t'$. Eq (7) is essentially a functional integral in mathematical terms. Here $L$ is a quantization constant of time dimension. The path integral $U(\mathbf{x}',t';\mathbf{x},t)$ describes the evolution of the genomic statistical state from $t$ to $t'$. When $t' - t > L$, the virtual variations $\delta x_i(t)$ may lead to $\delta S \gg L$, and all terms in the summation (7) cancel each other through phase interference, apart from those in the vicinity of the classical trajectory, where $S$ takes a stationary value ($\delta S = 0$). That is, for large $t' - t$ ($> L$) the classical trajectory holds. Therefore, the classical trajectory is a limiting case of the general quantum theory. However, instead of definite trajectories, the quantum picture of transitions among trajectories will be important during speciation, if $L$ is defined by the time of new species formation.
Results of the model and discussions
1 Direction of evolution
The selective potential $W_{env}$ depends on the environment. However, in a stable environment both $W_{env}$ and $c^2$ are independent of $t$, and from Eq (4) one deduces
$$V(t) - V(t_0) - \left(K(t) - K(t_0)\right) = f\int_{t_0}^{t}\sum_i\left(\frac{dx_i}{dt}\right)^2 dt \ge 0 \qquad (t > t_0) \tag{8}$$
where $V(t) = V(x_1(t),\ldots,x_m(t),t)$ and
$$K(t) = \frac{c^2}{2}\sum_i\left(\frac{dx_i}{dt}\right)^2 \tag{9}$$
$K(t)$ is a measure of the changing rate of the nucleotide frequency. For an appropriately chosen $c$ it generally takes a smaller value than the change of the potential $V(t)$. Thus Eq (8) means
$$D(t) - D(t_0) + W_{env}(t) - W_{env}(t_0) \ge 0 \qquad (t > t_0) \tag{10}$$
Eq (10) gives the direction of genome evolution in a stable environment. The selective potential $W_{env}$ increases with $N$ for positive selection and decreases with $N$ for negative selection. For positive selection both terms in Eq (10) are positive, namely $W_{env}(t) - W_{env}(t_0) \ge 0$ and $D(t) - D(t_0) \ge 0$. For negative selection one has $W_{env}(t) - W_{env}(t_0) \ge 0$ accompanied by a decrease of $N$, and simultaneously $D(t) - D(t_0) < 0$. In the former case the DNA sequence length increases. In the latter case, although the genome size decreases, the eliminated segment is often useless or even deleterious because of negative selection, while the function-coding DNA is generally not deleted. In fact, DNA loss has frequently been observed in genome evolution. Recent research indicated that deletional bias is a major force shaping bacterial genomes [11,12]. However, through experiments on E. coli [13] we demonstrated that the pervasive bias towards segmental deletion is connected with the deletion of pseudogenes and other nonfunctional insertion sequences. Based on these observations it was concluded that, owing to natural selection and despite the frequently occurring deletion events, the function-coding information quantity of a genome still grows in the course of evolution [4,13]. The evolutionary direction described by Eq (10) is essentially consistent with the law that function-coding information quantity grows.
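The balance relation behind Eq (8) can be illustrated numerically. The sketch below integrates Eq (4) with a fourth-order Runge-Kutta step under loudly stated assumptions that are not taken from the source: $W_{env} = 0$ (so $V = D$), natural logarithms, constant $c^2$, zero initial velocity, and arbitrary illustrative parameter values. It tracks the dissipation integral $f\int\sum_i(dx_i/dt)^2dt$ as an extra state variable and compares it with $\Delta V - \Delta K$.

```python
import math

def diversity(x):
    """D = N ln N - sum_i x_i ln x_i (natural log is an assumption of this sketch)."""
    n = sum(x)
    return n * math.log(n) - sum(xi * math.log(xi) for xi in x)

def derivs(state, m, c2, f):
    """Right-hand side of Eq. (4) with W_env = 0, plus the dissipation integrand."""
    x, v = state[:m], state[m:2 * m]
    n = sum(x)
    acc = [(math.log(n / xi) - f * vi) / c2 for xi, vi in zip(x, v)]
    dq = f * sum(vi * vi for vi in v)
    return v + acc + [dq]

def rk4_step(state, dt, m, c2, f):
    """One classical Runge-Kutta step for the augmented state (x, v, q)."""
    k1 = derivs(state, m, c2, f)
    k2 = derivs([s + 0.5 * dt * k for s, k in zip(state, k1)], m, c2, f)
    k3 = derivs([s + 0.5 * dt * k for s, k in zip(state, k2)], m, c2, f)
    k4 = derivs([s + dt * k for s, k in zip(state, k3)], m, c2, f)
    return [s + dt * (a + 2 * b + 2 * g + d) / 6.0
            for s, a, b, g, d in zip(state, k1, k2, k3, k4)]

def energy_balance(x0, c2=1.0, f=0.5, dt=1e-3, steps=2000):
    """Return (delta V, delta K, dissipation integral) over the integration window."""
    m = len(x0)
    state = list(x0) + [0.0] * m + [0.0]

    def kinetic(s):
        return 0.5 * c2 * sum(vi * vi for vi in s[m:2 * m])

    v_start, k_start = diversity(state[:m]), kinetic(state)
    for _ in range(steps):
        state = rk4_step(state, dt, m, c2, f)
    return diversity(state[:m]) - v_start, kinetic(state) - k_start, state[-1]

dv, dk, q = energy_balance([10.0, 20.0, 30.0, 40.0])  # hypothetical starting counts
print(dv - dk, q)
```

In this toy setting the two printed numbers agree to integration accuracy and are positive, which is the content of Eq (8); it says nothing about realistic parameter values.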
2 Alternating occurrence of classical and quantum phases
The classical phase means smooth evolution obeying the classical deterministic law, while the quantum phase means sudden evolution obeying the quantum stochastic law. The present model of genome evolution predicts the alternating occurrence of both phases. Many different estimates of the rate of evolution have been made from the fossil record. As compiled by Gingerich [14][15], four hundred and nine such estimates were reported, and they vary between 0 and 39 darwins in fossil lineages. Paleobiological studies indicated that species usually change more rapidly during, rather than between, speciation events. Smooth evolution always occurs between speciation events, and sudden evolution preferably occurs during speciation. The former can be interpreted as the classical phase and the latter as the quantum phase in our model. Paleobiological studies also indicated that structurally more complex forms evolve faster than simpler forms and that some taxonomic groups evolve more rapidly than others. All these observations can be interpreted by the alternating occurrence of classical and quantum phases and discussed by means of the evolutionary equations (4) and (7). Phyletic gradualism states that evolution has a fairly constant rate and that new species arise by the gradual transformation of ancestral species. Punctuated equilibrium, by contrast, argues that the fossil record does not show smooth evolutionary transitions. A common pattern is for a species to appear suddenly, to persist for a period, and then to go extinct. Punctuated equilibrium states that evolution is fast at times of splitting (speciation) and comes to a halt (stasis) between splits. The theory predicts that evolution will not occur except at times of speciation. It seems that phyletic gradualism and punctuated equilibrium are contrasting and contradictory theories [16]. However, in our model of genome evolution both smooth and sudden phases occur within a unified theory.
By using the evolutionary equation (4) it is easy to deduce that evolution has a range of rates, from sudden to smooth. Moreover, from the general formalism given by Eq (7), the evolutionary trajectories during speciation should be replaced by quantum transitions among them. Thus, in the present theory punctuated equilibrium and phyletic gradualism are only approximate descriptions of two phases of one and the same process.
3 Laws in classical phase
From Eqs (4), (1) and (2) we deduce
$$\frac{d}{dt}\left(c^2\frac{dx_i}{dt}\right) = -\log\frac{x_i}{N} + \frac{\partial W_{env}}{\partial x_i} - f\frac{dx_i}{dt} \tag{11}$$
Eq (11) can be written in the form
$$\sum_i x_i\frac{d}{dt}\left(c^2\frac{dx_i}{dt}\right) = D + \sum_i x_i\frac{\partial W_{env}}{\partial x_i} - \frac{f}{2}\frac{d}{dt}\sum_i x_i^2 \tag{12}$$
Eqs (11) and (12) mean that the Shannon information $-\log(x_i/N)$, or the diversity $D$, plays the role of an evolution-promoting force. Many examples show that the genome always becomes as diverse as possible and expands its own dimensionality continuously over the long term of evolution [4]. These observations are in agreement with the evolution-promoting force introduced in Eq (11). In his book “Investigations” Kauffman wrote: “Biospheres, as a secular trend, that is, over the long term, become as diverse as possible, literally expanding the diversity of what can happen next. In other words, biospheres expand their own dimensionality as rapidly, on average, as they can.” and called it “the fourth law of thermodynamics for self-constructing systems of autonomous agents” [17]. Our proposal of an evolution-promoting force is consistent with Kauffman's suggestion about the force for “expanding the diversity” in autonomous agents.
Segment duplication, including global duplication and regional multiplication, is a major force for genome evolution. The duplication can easily be deduced from Eq (11) if one assumes the selective force $\partial W_{env}/\partial x_i = a$ ($a$ a positive constant) and $c^2$ remaining constant over a certain short term. In fact, with the constant driving force $\log(N/x_i) + a \approx 2 + a$ and the dissipation neglected, the evolutionary trajectory obeys
$$x_i \cong x_i^0 + \frac{2+a}{2c^2}(t - t_0)^2 .$$
The time needed for $x_i$ to attain $2x_i^0$ is $c\sqrt{2x_i^0/(2+a)}$. Assuming $a = x_i$ and $c = \ldots\tau$ or $\ldots\tau$ ($\tau$ being the time of one generation), one immediately obtains a duplication time of about 1% to 10% of the generation.
The driving force of genome evolution ($-\log(x_i/N)$ or $D$) and the selective force $\partial W_{env}/\partial x_i$ are in some way in equilibrium with the friction $f$. The selective force comes mainly from environmental pressure and species competition; the superfluous fluctuation of the selection is represented by the friction. To be definite, consider $m = 4$. The force $\log(N/x_i)$ makes the four kinds of nucleotides tend to be equally distributed in the DNA sequence ($\log(N/x_i) = 2$). In a stable environment an equilibrium will be established among the three forces,
$$\left(\frac{dx_i}{dt}\right)_{eq} = \frac{2 + \left(\partial W_{env}/\partial x_i\right)_\infty}{f} \tag{13}$$
($(\partial W_{env}/\partial x_i)_\infty$ means the stable value of the selective force). The friction coefficient $f$ is species-dependent. The larger $f$ is, the smaller the equilibrium evolutionary rate will be. For negative selection, when $\partial W_{env}/\partial x_i$ cancels the evolution-promoting force, it leads to $(dx_i/dt)_{eq} = 0$. This explains the evolutionary stasis of some species. For example, the study of the “living fossil” lungfish showed that around 300 million years ago the lungfishes were evolving rapidly, but since about 250 to 200 million years ago their evolution has slowed right down [3]. However, for a genome in a suddenly changing environment, the selective force $\partial W_{env}/\partial x_i$ changes rapidly. If the negative selection is strong enough, the genome cannot adapt to the sudden change of environment (for example, food deficiency), and the species would go extinct; otherwise new species would emerge, the genes of which can adapt to the functional needs under the new environment.
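The two closed-form results of this section, the doubling time under a constant driving force and the equilibrium rate of Eq (13), are simple enough to tabulate. The sketch below is illustrative only: the function names and every numerical value are assumptions chosen for the example, not estimates from the source, and the dissipation is neglected in the doubling-time formula as in the text.

```python
import math

def trajectory(x0, a, c, t):
    """x_i(t) = x0 + (2 + a) t^2 / (2 c^2): constant driving force, start at rest."""
    return x0 + (2.0 + a) * t * t / (2.0 * c * c)

def doubling_time(x0, a, c):
    """Time for x_i to grow from x0 to 2*x0: T = c * sqrt(2 x0 / (2 + a))."""
    return c * math.sqrt(2.0 * x0 / (2.0 + a))

def equilibrium_rate(selective_force, friction):
    """Eq. (13) for m = 4: (dx_i/dt)_eq = (2 + dW_env/dx_i) / f."""
    return (2.0 + selective_force) / friction

# Hypothetical values: x0 = 1000 copies, a = 3, c = 0.1 (in units of tau)
t_double = doubling_time(1000.0, 3.0, 0.1)
print(t_double, trajectory(1000.0, 3.0, 0.1, t_double))
print(equilibrium_rate(-2.0, 0.7))   # selective force cancels the driving force: stasis
```

A selective force of $-2$ gives a vanishing equilibrium rate, which is the stasis case discussed above, and a larger friction coefficient lowers the rate for any fixed force.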
4 Basic characteristics in quantum phase
Suppose the statistical state of DNA evolution is represented by a wave function $\psi(\mathbf{x},t)$ that describes the probability amplitude at time $t$. The propagation of the wave is determined by $U$ [10],
$$\psi(\mathbf{x}',t') = \int_{-\infty}^{\infty}U(\mathbf{x}',t';\mathbf{x},t)\,\psi(\mathbf{x},t)\,d\mathbf{x} \tag{14}$$
We can prove that $\psi(\mathbf{x},t)$ satisfies the Schrodinger equation (see Appendix)
$$iL\frac{\partial}{\partial t}\psi(\mathbf{x},t) = H\psi(\mathbf{x},t), \qquad H = -\frac{L^2}{2c^2(t)}\sum_i\frac{\partial^2}{\partial x_i^2} - V(\mathbf{x},t) \tag{15}$$
$H$ is called the Hamiltonian of the genome. Eq (15) is the stochastic generalization of the deterministic equation (4) or (11). Define
$$p_i = -iL\frac{\partial}{\partial x_i} \tag{16}$$
On account of Eq (15) one deduces
$$\int\psi^*(\mathbf{x},t)\,p_i\,\psi(\mathbf{x},t)\,d^mx = c^2\frac{d}{dt}\int\psi^*(\mathbf{x},t)\,x_i\,\psi(\mathbf{x},t)\,d^mx \quad (d^mx = dx_1dx_2\cdots dx_m), \qquad\text{or}\qquad p_i^{av} = c^2v_i^{av} \tag{17}$$
where $v_i$ means the changing rate of the nucleotide frequency. Moreover, by defining the square deviations
$$(\Delta x_i)^2 = \int\psi^*(\mathbf{x},t)\,x_i^2\,\psi(\mathbf{x},t)\,d^mx - \left(\int\psi^*(\mathbf{x},t)\,x_i\,\psi(\mathbf{x},t)\,d^mx\right)^2$$
$$(\Delta p_i)^2 = \int\psi^*(\mathbf{x},t)\,p_i^2\,\psi(\mathbf{x},t)\,d^mx - \left(\int\psi^*(\mathbf{x},t)\,p_i\,\psi(\mathbf{x},t)\,d^mx\right)^2 \tag{18}$$
one easily obtains
$$\Delta x_i\,\Delta p_i \ge \frac{L}{2} \tag{19}$$
or
$$\Delta x_i\,\Delta v_i \ge \frac{L}{2c^2} \tag{20}$$
for any given time. Here $\Delta x_i$ means the uncertainty of the nucleotide frequency and $\Delta v_i$ the uncertainty of the changing rate of the nucleotide frequency. Eqs (19) and (20) express the uncertainty relation between the nucleotide frequency and its changing rate in a DNA sequence. Since $x$ and $v$ cannot be determined simultaneously to sufficient accuracy, the evolutionary trajectory cannot be accurately defined in principle.
The above discussion shows that there exists a good correspondence between the classical evolutionary equation, Eq (4), and the quantum equation, Eq (15). The quantum evolutionary equation is the logical generalization of the classical equation. The generalization is valid not only for evolution in a stable environment but also for evolution in a varying environment, where the evolutionary potential $V$ and the inertia $c$ are time-dependent.
As an application of the quantum evolutionary theory we discuss the speciation event from the viewpoint of quantum transition. Based on the Schrodinger equation the speciation rate can be calculated. Suppose the initial wave function of the “old” species is denoted by $\psi_I(\mathbf{x})$ and the final wave function of the “new” species by $\psi_F(\mathbf{x})$. The transition from “old” to “new” is caused by an interaction Hamiltonian $H_{int}$ in the framework of quantum mechanics. One may assume that $H_{int}$ comes from the change of the evolutionary inertia, namely $H_{int} = \frac{\partial H}{\partial c}\Delta c$. Thus the transition probability amplitude is expressed by
$$T_{fi} = \int\psi_F^*(\mathbf{x})\,H_{int}\,\psi_I(\mathbf{x})\,dx_1dx_2dx_3dx_4 = \int\psi_F^*(\mathbf{x})\,\frac{\partial H}{\partial c}\Delta c\,\psi_I(\mathbf{x})\,dx_1dx_2dx_3dx_4 = \frac{\partial E_I}{\partial c}\Delta c\int\psi_F^*(\mathbf{x})\,\psi_I(\mathbf{x})\,dx_1dx_2dx_3dx_4 \tag{21}$$
($E_I$ is the eigenvalue of $H$). Suppose
$$\psi_I(\mathbf{x}) = \frac{8}{\pi a^2}\exp\left(-\frac{4}{a^2}\sum_i(x_i - x_{Ai})^2\right), \qquad \psi_F(\mathbf{x}) = \frac{8}{\pi a^2}\exp\left(-\frac{4}{a^2}\sum_i(x_i - x_{Bi})^2\right) \tag{22}$$
($a$ is the frequency distribution width, and $x_{Ai}$ and $x_{Bi}$ are the frequency distribution centers for the two genomes respectively). Inserting (22) into (21) one obtains
$$\int\psi_F(\mathbf{x})\,\psi_I(\mathbf{x})\,dx_1dx_2dx_3dx_4 = \exp\left(-\frac{2R^2}{a^2}\right), \qquad R^2 = \sum_i(x_{Ai} - x_{Bi})^2 .$$
Therefore, the transition probability is large only for small distance $R$, since it rapidly tends to zero with increasing $R/a$. During speciation, corresponding to one old genome there are many candidates for the posterity, with different probabilities. The most probable one is the new genome having a size equal or near to the old. This explains why the observed genome-size evolution is always continuous.
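The dependence of the overlap integral on the distance $R$ between the two genomes can be checked by direct numerical integration. This sketch assumes that each species is described by a normalized four-dimensional Gaussian amplitude $\psi(\mathbf{x}) = \frac{8}{\pi a^2}\exp\left(-\frac{4}{a^2}\sum_i(x_i - x_{0i})^2\right)$; that specific normalization, the helper names, and the sample centers are assumptions of the example. Under this assumption the overlap is $\exp(-2R^2/a^2)$ with $R^2 = \sum_i(x_{Ai} - x_{Bi})^2$.

```python
import math

def gaussian_1d(x, center, a):
    """One-dimensional factor of the assumed amplitude: (8/(pi a^2))^(1/4) exp(-4 (x - center)^2 / a^2)."""
    return (8.0 / (math.pi * a * a)) ** 0.25 * math.exp(-4.0 * (x - center) ** 2 / (a * a))

def overlap_1d(center_a, center_b, a, n=20001):
    """Trapezoid-rule overlap of two 1-D factors over a window wide enough to hold both."""
    lo = min(center_a, center_b) - 6.0 * a
    hi = max(center_a, center_b) + 6.0 * a
    h = (hi - lo) / (n - 1)
    total = 0.0
    for j in range(n):
        x = lo + j * h
        weight = 0.5 if j in (0, n - 1) else 1.0
        total += weight * gaussian_1d(x, center_a, a) * gaussian_1d(x, center_b, a)
    return total * h

def overlap(centers_a, centers_b, a):
    """The m-dimensional overlap factorizes into a product of 1-D overlaps."""
    result = 1.0
    for ca, cb in zip(centers_a, centers_b):
        result *= overlap_1d(ca, cb, a)
    return result

def overlap_closed_form(centers_a, centers_b, a):
    """exp(-2 R^2 / a^2) with R^2 the squared distance between the centers."""
    r2 = sum((ca - cb) ** 2 for ca, cb in zip(centers_a, centers_b))
    return math.exp(-2.0 * r2 / (a * a))

old = [100.0, 200.0, 300.0, 400.0]   # hypothetical nucleotide frequencies
new = [101.0, 199.0, 300.0, 402.0]
print(overlap(old, new, 2.0), overlap_closed_form(old, new, 2.0))
```

The numerical and closed-form values agree, and the overlap of a genome with itself is 1, so the transition amplitude in this sketch is controlled entirely by $R/a$.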
5 Differentiation between classical and quantum phases and the observation of quantum evolution
How can classical and quantum phases be differentiated in the present theory? From Eq (5), the contribution of a trajectory variation $\delta\mathbf{x}(t)$ to $\int\frac{c^2}{2}\sum_i\left(\frac{dx_i}{dt}\right)^2dt$ is proportional to $c^2$. So the variation of the phase $\delta S/L$ in Eq (7) is strongly influenced by $c^2$. A large $c^2$ makes the contributions of different trajectories to the summation in (7), namely to $U(\mathbf{x}',t';\mathbf{x},t)$, easily cancel each other, while a small $c^2$ does not. We assume that the inertia $c^2$ is a constant (denoted $c_0^2$) for a genome moving on a classical trajectory under a stable environment. However, it depends on environmental selection. During speciation the relaxed selection pressure makes the evolutionary inertia of some new species jump to a lower value (denoted $c'^2$), since in this period all evolutionary events happen rapidly. If $c$, as a parameter of time dimension, decreases to 1/10 of the original value or less, then the picture of the classical trajectory may cease to be correct. More generally, one may assume that $c^2$ lowers from $c_0^2$ to $c'^2$ ($c'^2 \le 0.01\,c_0^2$) during speciation, or lowers from $c_0^2$ to some intermediate value between $c_0^2$ and $c'^2$ during sub-speciation. Under this assumption the evolutionary picture switches from classical to quantum, or from classical to semi-quantum. The point can be clarified further by looking at the uncertainty relation (20). Equation (20) holds in both the quantum phase and the classical phase. However, a small $c$ requires the simultaneous occurrence of a larger frequency deviation and a larger deviation of the frequency changing rate. As a result the nucleotide frequency and its changing rate are no longer simultaneously measurable to sufficient accuracy. Therefore, the picture of the classical trajectory should be replaced by a large number of rapid transitions among trajectories. This means that the quantum phase occurs, which gives a condition for new species production.
How can quantum evolution be observed in the period of new species formation? The time evolution of the genome is described by the propagation function $U(\mathbf{x}',t';\mathbf{x},t) = A\sum\exp\left(\frac{iS[\mathbf{x}]}{L}\right)$ (Eq (7)), where the summation is taken over all paths and $S$ is an integral between times $t$ and $t'$. The time interval $t' - t$ can be regarded as the width of a window for observing the evolution. If $t' - t > L$, only the classical trajectory $\mathbf{x}(t)$ contributes to $U(\mathbf{x}',t';\mathbf{x},t)$, owing to the interference and cancellation of terms in the summation of $U$. However, if the window width $t' - t$ is small enough to be comparable with $L$, then the change of phase $\delta S[\mathbf{x}]/L$ will only be of the order of one radian or less, and all ideal trajectories will contribute to the propagation function. That is, the quantum transition should be observed through a small window whose width is near to or smaller than $L$. In short, coarse-grained evolution is observed on a classical trajectory, but the quantum laws should be discovered by fine-grained observations. Moreover, even for evolution in the classical phase the quantum stochasticity can still be observed through a small enough window.
6 Fundamental constants and minimum genome
Two constants, the quantization constant $L$ and the evolutionary inertia $c$, both of time dimension, play important roles in the quantum evolutionary theory. The former is related to the realization of the quantum picture of evolution. The variation of the latter makes the switch from the classical phase to the quantum phase or vice versa. From the estimate of the frequently observed short-term duplication, the parameter $c$ should be a small quantity. Let $\tau$ be the average lifetime of one generation of the species. One assumes tentatively $c = \ldots\tau$ in the classical phase and $c = \ldots\tau$ in the quantum phase. The speciation time $L$ also depends on $\tau$. One assumes tentatively $L = 3\times\ldots\tau$ from the primary estimate of human or bacterial speciation duration. The uncertainty relation (20) can be written as
$$\Delta x_i\,\Delta v_i \ge \frac{F}{\tau}, \qquad F = \frac{L\tau}{2c^2} .$$
$F$ is the dimensionless lower bound of the uncertainty relation, and it is independent of $\tau$. If the uncertainty $\Delta v_i$ is estimated by $\Delta x_i/\tau$, then the relation can be rewritten as
$$(\Delta x_i)^2 \ge F \tag{23}$$
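The dimensionless bound follows from the uncertainty relation (20) by writing $L/(2c^2) = F/\tau$, so that $F = L\tau/(2c^2)$, and the estimate $\Delta v_i \approx \Delta x_i/\tau$ then gives $(\Delta x_i)^2 \ge F$. The short sketch below encodes only this arithmetic; the function names and all numbers are hypothetical and are not the estimates of the source.

```python
def uncertainty_bound(quantization_const, inertia, generation_time):
    """F = L * tau / (2 c^2), the dimensionless lower bound in Delta x * Delta v >= F / tau."""
    return quantization_const * generation_time / (2.0 * inertia ** 2)

def satisfies_bound(delta_x, bound):
    """Check (Delta x)^2 >= F, assuming Delta v is estimated by Delta x / tau."""
    return delta_x ** 2 >= bound

# Hypothetical values in units where tau = 1: L = 100 tau, c = 0.5 tau
f_bound = uncertainty_bound(100.0, 0.5, 1.0)
print(f_bound, satisfies_bound(20.0, f_bound))
```

Because $L$ and $c$ are both taken proportional to $\tau$, rescaling $\tau$ leaves $F$ unchanged, which is why the bound does not depend on the generation time.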
For human, the uncertainty $\Delta x_i$ is estimated from the single nucleotide polymorphism, $\Delta x_i = \ldots\times\ldots$. Taking the above-estimated values of $L$ and $c$ into account, we find that Eq (23) is fulfilled for human. A similar check can be done for other species. The lower bound $F \neq 0$ means the existence of a minimum genome. The present theory predicts the size of the minimum genome from the lower bound $F$. Taking the estimated values $L = 3\times\ldots\tau$ and $c = \ldots\tau$, we obtain $\ldots$ nucleotides, in accordance with the size of a typical bacteriophage genome. This is consistent with the conventional understanding of the phage as the species of the simplest life.
7 Two quantum theories and two classical limits
It is interesting to compare the present quantum model for the genome with the conventional quantum theory for electrons, atoms and molecules (even for some degrees of freedom of macromolecules [18]). Both systems have a wave function satisfying the Schrodinger equation. Both theories have their classical limit. The classical trajectory of an atom or an electron is given by the position coordinate of the particle as a function of $t$, while the classical trajectory of a genome is given by the nucleotide frequency of DNA as a function of $t$. The former is constrained by the particle's energy, while the latter is constrained by the genome's information. Energy is conserved with time, but information always grows in evolution. The atomic quantum theory has an elementary constant, Planck's constant, while the genomic quantum theory has a corresponding constant $L$. Planck's constant has the dimension of (energy × time), but the genome constant $L$ has the dimension of time. The classical limit of each theory is deduced by letting Planck's constant $h$ or the genome constant $L$, respectively, approach zero. However, Planck's constant is universal, whereas the constant $L$ is species-dependent because generation lifetimes vary greatly from species to species.
Appendix
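Deduction of wave equation in quantum phase
Suppose the statistical state of $\mathbf{x} = (x_1,\ldots,x_m)$ is represented by a wave function $\psi(\mathbf{x},t)$. Then
$$\psi(\mathbf{x},\varepsilon) = \int U(\mathbf{x},\varepsilon;\mathbf{x}_0,0)\,\psi(\mathbf{x}_0,0)\,d\mathbf{x}_0 \tag{A1}$$
($\varepsilon$ is a small quantity), where $U$ is a functional integral, expressed as
$$U(\mathbf{x}',t';\mathbf{x},t) = A\int\exp\left(\frac{iS[\mathbf{x}]}{L}\right)\mathcal{D}\mathbf{x} = \lim_{n\to\infty}A_n\int d\mathbf{x}_1\cdots d\mathbf{x}_{n-1}\exp\left(\frac{iS}{L}\right) \tag{A2}$$
and $t' - t = \varepsilon$ has been assumed. Inserting the information action functional $S[\mathbf{x}]$ (Eq (5)),
$$S[\mathbf{x}(t)] = \int\left(\frac{c^2(t)}{2}\left(\frac{d\mathbf{x}}{dt}\right)^2 + V(\mathbf{x},t)\right)dt \tag{A3}$$
into (A2), after integrating over $(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1})$ and taking the limit $n\to\infty$ we obtain [10]
$$U(\mathbf{x},\varepsilon;\mathbf{x}_0,0) = \left(\frac{c^2(0)}{2\pi iL\varepsilon}\right)^{m/2}\exp\left\{\frac{ic^2(0)(\mathbf{x}-\mathbf{x}_0)^2}{2L\varepsilon} + \frac{i\varepsilon}{L}V\left(\frac{\mathbf{x}+\mathbf{x}_0}{2},0\right)\right\} \tag{A4}$$
Eq (A4) has been normalized so that $U\to\delta(\mathbf{x}-\mathbf{x}_0)$ in the limit $\varepsilon\to0$ for $V = 0$. In the deduction of (A4) the integral formula
$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}dy_1\cdots dy_{n-1}\exp\left\{i\sum_{j=0}^{n-1}(y_{j+1}-y_j)^2\right\} = \left(\frac{(i\pi)^{n-1}}{n}\right)^{1/2}\exp\left\{\frac{i}{n}(y_n-y_0)^2\right\} \tag{A5}$$
has been used. By using (A4), Eq (A1) can be written as
$$\psi(\mathbf{x},\varepsilon) = \left(\frac{c^2(0)}{2\pi iL\varepsilon}\right)^{m/2}\int\exp\left\{\frac{ic^2(0)(\mathbf{x}-\mathbf{x}_0)^2}{2L\varepsilon}\right\}\exp\left\{\frac{i\varepsilon}{L}V\left(\frac{\mathbf{x}+\mathbf{x}_0}{2},0\right)\right\}\psi(\mathbf{x}_0,0)\,d\mathbf{x}_0 \tag{A6}$$
Here $\exp\left\{\frac{ic^2(0)(\mathbf{x}-\mathbf{x}_0)^2}{2L\varepsilon}\right\}$ is a rapidly oscillating factor, important only near $\mathbf{x}-\mathbf{x}_0 = 0$. By setting $\mathbf{x}_0 - \mathbf{x} = \boldsymbol{\eta}$, Eq (A6) is rewritten as
$$\psi(\mathbf{x},\varepsilon) = \left(\frac{c^2(0)}{2\pi iL\varepsilon}\right)^{m/2}\int\exp\left\{\frac{ic^2(0)\boldsymbol{\eta}^2}{2L\varepsilon}\right\}\exp\left\{\frac{i\varepsilon}{L}V\left(\mathbf{x}+\frac{\boldsymbol{\eta}}{2},0\right)\right\}\psi(\mathbf{x}+\boldsymbol{\eta},0)\,d\boldsymbol{\eta} \tag{A7}$$
After expanding $\psi(\mathbf{x}+\boldsymbol{\eta},0)$ and $\exp\left\{\frac{i\varepsilon}{L}V\left(\mathbf{x}+\frac{\boldsymbol{\eta}}{2},0\right)\right\}$ with respect to $\boldsymbol{\eta}$ and $\varepsilon$, we finally obtain
$$\psi(\mathbf{x},\varepsilon) = \psi(\mathbf{x},0) - \frac{i\varepsilon}{L}\left\{-\frac{L^2}{2c^2(0)}\sum_i\frac{\partial^2}{\partial x_i^2} - V(\mathbf{x},0)\right\}\psi(\mathbf{x},0) \tag{A8}$$
Therefore, $\psi(\mathbf{x},t)$ satisfies the Schrodinger equation,
$$iL\frac{\partial}{\partial t}\psi(\mathbf{x},t) = \left\{-\frac{L^2}{2c^2(t)}\sum_i\frac{\partial^2}{\partial x_i^2} - V(\mathbf{x},t)\right\}\psi(\mathbf{x},t) \tag{A9}$$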
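The expansion step leading to the short-time form of the wave equation can be spelled out as follows. This is a sketch of the standard Gaussian-moment argument for a short-time propagator of the form $\left(\frac{c^2}{2\pi iL\varepsilon}\right)^{m/2}\exp\left\{\frac{ic^2\boldsymbol{\eta}^2}{2L\varepsilon}\right\}$, written with $c^2 \equiv c^2(0)$; the bracket notation $\langle\cdot\rangle$ for the normalized Gaussian average is introduced here for convenience and is not notation from the source.

```latex
% Normalized Gaussian moments of the oscillating kernel
\langle g \rangle \equiv \left(\frac{c^2}{2\pi i L\varepsilon}\right)^{m/2}
   \int \exp\!\left\{\frac{i c^2 \boldsymbol{\eta}^2}{2L\varepsilon}\right\} g(\boldsymbol{\eta})\, d\boldsymbol{\eta},
\qquad
\langle 1 \rangle = 1,\quad
\langle \eta_i \rangle = 0,\quad
\langle \eta_i \eta_j \rangle = \frac{iL\varepsilon}{c^2}\,\delta_{ij}.

% Expansions to first order in \varepsilon (note \eta^2 = O(\varepsilon))
\psi(\mathbf{x}+\boldsymbol{\eta},0) \approx \psi
   + \sum_i \eta_i \frac{\partial\psi}{\partial x_i}
   + \frac{1}{2}\sum_{i,j}\eta_i\eta_j \frac{\partial^2\psi}{\partial x_i\partial x_j},
\qquad
\exp\!\left\{\frac{i\varepsilon}{L}V\!\left(\mathbf{x}+\tfrac{\boldsymbol{\eta}}{2},0\right)\right\}
   \approx 1 + \frac{i\varepsilon}{L}V(\mathbf{x},0).

% Collecting terms
\psi(\mathbf{x},\varepsilon) \approx \psi(\mathbf{x},0)
   + \frac{iL\varepsilon}{2c^2}\sum_i \frac{\partial^2\psi}{\partial x_i^2}
   + \frac{i\varepsilon}{L}V(\mathbf{x},0)\,\psi(\mathbf{x},0)
 = \psi(\mathbf{x},0) - \frac{i\varepsilon}{L}
   \left\{-\frac{L^2}{2c^2}\sum_i\frac{\partial^2}{\partial x_i^2} - V(\mathbf{x},0)\right\}\psi(\mathbf{x},0).
```

Dividing by $\varepsilon$ and letting $\varepsilon\to0$ turns the last line into $iL\,\partial\psi/\partial t = H\psi$ with $H = -\frac{L^2}{2c^2}\sum_i\partial^2/\partial x_i^2 - V$, which is the same structure as the ordinary Schrodinger equation with mass replaced by $c^2$, Planck's constant by $L$, and potential energy by $-V$.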
Acknowledgement
The author is indebted to Dr Qi Wu for numerous helpful discussions.
References
[1] Gregory TR. Genome size evolution in animals. In: Evolution of the Genome (ed. Gregory TR). Elsevier Inc., 2005.
[2] Gregory TR et al. Eukaryotic genome size databases. Nucleic Acids Research, Database issue, D332-D338.
[3] Ridley M. Evolution (3rd edition). Blackwell Publishing, 2004.
[4] Luo LF. Law of genome evolution direction: Coding information quantity grows. Front. Phys. 2009, (2): 241 – …
[5] … J Theor Biol. (1): 51-67.
[6] Zhang LR, Luo LF. Splice site prediction with quadratic discriminant analysis using diversity measure. Nucleic Acids Res. (21): 6214-6220.
[7] Lu J, Luo LF. Predicting human transcription starts by use of diversity measure with quadratic discriminant. Computation in Modern Science and Engineering: Proceedings of the International Conference on Computational Methods in Science and Engineering (ICCMSE 2007), AIP Conf Proc. 2007: 1273-1277.
[8] Feng YE, Luo LF. Use of tetrapeptide signals for protein secondary-structure prediction. Amino Acids. 2008, (3): 607 – …
[9] Chen W, Luo LF, Zhang LR. The organization of nucleosomes around splice sites. Nucleic Acids Research: 2788-2798.
[10] Feynman RP, Hibbs A. Quantum Mechanics and Path Integrals. McGraw Hill, NY, 1965.
[11] Mira A, Ochman H, Moran NA. Deletional bias and the evolution of bacterial genomes. Trends Genet.: 589 – …
[12] … Genome Biol Evol. 2009: 145 – …
[13] …, Luo LF. Escherichia coli mutants induced by multi-ion irradiation. Journal of Radiation Research: 854 – …
[14] … Science: 159-161.
[15] Hendry AP, Kinnison MT. Microevolution: rates, pattern, process. Kluwer Academic, Dordrecht, Netherlands, 2001.
[16] Eldredge N, Gould SJ. Punctuated equilibria: an alternative to phyletic gradualism (1972). In: Schopf TJM (ed) Models in Paleobiology. Freeman Cooper & Co, San Francisco.
[17] Kauffman S. Investigations. Oxford University Press, Inc, 2000.
[18] Luo LF. Quantum theory on protein folding. Sci China – Phys Mech Astron 57