A construction of the genetic material and of proteins
arXiv [q-bio.OT], April
J.-L. Sikorav, A. Braslau and A. Goldar
DSM, Institut de Physique Théorique, IPhT, CNRS, MPPU, URA2306, CEA/Saclay, F-91191 Gif-sur-Yvette, France.
DSM, Service de Physique de l'État Condensé, SPEC, CNRS URA2464, CEA/Saclay, F-91191 Gif-sur-Yvette, France.
DSV, iBiTec-S, Service de Biologie Intégrative et de Génétique Moléculaire, CEA/Saclay, F-91191 Gif-sur-Yvette, France.
Abstract
A theoretical construction of the genetic material establishes the unique and ideal character of DNA. A similar conclusion is reached for amino acids and proteins.

Keywords: information, orientation, chirality, polarity, helix, globule, motor, catalyst, cyclicity
A central concept of modern biology is that of the genetic material, the carrier of hereditary information, and important issues in the foundations of biology can be explored through a specific investigation of this concept. It is known today that hereditary information has a material basis made of deoxyribonucleic acid, or DNA. Why is DNA as it is and not otherwise? We examine this question through a theoretical construction of the genetic material, using the language and the methods introduced in a previous article [1] in order to build the foundations of biology. A similar construction is used to better understand the structure of amino acids and proteins.
It is common to introduce the genetic material following the work of Watson and Crick, describing first the structure of the DNA double helix and then its replication process. [2, 3] Here we present an approach that is constructive rather than descriptive, being both deductive and inductive, and that proceeds from the replication process to the structure. We deduce from universal biological phenomena the necessary asymmetries that the genetic material should possess. This genetic material is not elemental but compound. The construction incorporates as many compatible symmetry elements as possible and, therefore, ends with an ideal, Platonic final structure, which is compared with that of DNA. From this emerges a better understanding of the necessary asymmetries and of the symmetries compatible with them. We conclude that DNA appears to be both unique and ideal.

The first steps of the construction apply equally well to polymeric nucleic acids and to proteins. The underlying reason is that both types of informational biopolymers, though different, share the same fundamental asymmetries. We can construct proteins, starting with their monomers, the amino acids, using a similar approach, again leading to a better intuitive understanding of these polymers.
Our goal is to build a device that contains genetic information. The properties that the genetic material should possess can be extracted from the four fundamental theories of life [1]:

• Both the theory of natural selection and the informational theory of life require a transmission of hereditary information, a certain set of instructions. The minimal number of information bits contained in this set can be inferred from the large number of hereditary traits (or genes) always present in a living organism (estimated at about one thousand or more) and from the large number of bits contained in each of them (much larger than unity, about one hundred or more). This indicates that the set should contain at least $10^5$ bits of information.
• The universal existence of cells and of cell replication indicates that the transmission of genetic information is related to the physical process of cell division. This implies not only the existence, but also the replication and the segregation of a material entity through an appropriate transport process. The segregation of the genetic material during cell division could possibly rely on Brownian motion alone. However, this process is far more efficient when assisted by molecular motors. We therefore conclude that there must exist such motors, translocating the genetic material during cell division. It is, in turn, necessary that the structure of the genetic material allow it to interact most efficiently with these molecular motors.
• Hereditary information is stably conserved in cryptobiosis, during which metabolism is fully suspended. The permanence of the genetic material and the maintenance of information cannot be based solely on a continuous dissipation of energy and must be understood through the stability of an isolated system. Since the inclusion of symmetry elements increases the stability of conservative systems, this provides a strong argument to incorporate as many symmetry elements as possible in the structure to be built.
• The genetic material must be parsimonious in terms of the amount of matter used and the space occupied. This means that considerations of miniaturization and dense packing will constantly be present in the construction.
• Lastly, the presence of chiral compounds in living organisms raises an additional question as to the chirality of this material element. The genetic material could be chiral or else achiral, the chirality of living matter being transmitted in an epigenetic manner. The construction will clarify this issue, showing that molecular chirality is a necessary feature of the genetic material.
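The information estimate in the first item is simply the product of two lower bounds. The sketch below only reproduces that arithmetic; the function name and the two constants are illustrative, with values taken from the order-of-magnitude figures quoted above.

```python
import math

def min_information_bits(n_traits: int, bits_per_trait: int) -> int:
    """Lower bound on hereditary information: number of traits times bits per trait."""
    return n_traits * bits_per_trait

# Order-of-magnitude lower bounds quoted in the text (illustrative constants).
N_TRAITS = 1_000        # hereditary traits (genes): "about one thousand or more"
BITS_PER_TRAIT = 100    # bits per trait: "about one hundred or more"

bits = min_information_bits(N_TRAITS, BITS_PER_TRAIT)
exponent = round(math.log10(bits))
print(f"at least {bits} bits, i.e. 10^{exponent}")  # at least 100000 bits, i.e. 10^5
```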
We now present, step by step, a theoretical construction of the genetic material.
Matter is made of atoms, and this leads to the building of a discrete structure made of a finite number of components. In order to increase the stability of a structure built of atoms, we must use the strongest chemical bonds, which are covalent (as also suggested by Schrödinger [4]). The genetic material is thus, at first, a molecule. A small molecule contains little information; therefore, to carry genetic information, the desired structure must be a macromolecule. For simplicity, we shall consider a single, linear, unbranched polymer containing all the hereditary information, corresponding to the case of a unique linkage group.

A polymer made of identical monomers (a homopolymer) does not contain information. We thus need a heteropolymer made of at least two types of monomers. A simple binary code results from the use of two types of monomers, a higher-order code if the number of distinct monomers is greater. The sequence of the monomers encodes the genetic information, which is of digital nature. This discrete structure of the monomers reflects the principle of atomicity and is compatible with the necessity of a random, quantum mechanical process involving particulate changes in the sequence as the basis of mutations. The polymeric character of the genetic material may be viewed as an expression of the principle of continuity, through the covalent bonds connecting the monomers, a molecular transposition of the concept of a linkage group. Yet, the universal process of genetic recombination, in which the order of genes in a linkage group can be altered, implies that the continuity of the chain can be transiently interrupted. This points to the necessary existence of cut-and-paste devices (catalysts and motors) operating on the heteropolymer to move a portion of the chain from one place to another.

Assuming that each monomer contains one bit of information leads us to envision a polymeric chain with a large number of monomers ($\sim 10^5$).
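The information argument can be made quantitative: with k distinct monomer types, each position carries at most log2 k bits, so a homopolymer (k = 1) carries none, and the chain length follows from the information budget. A minimal sketch; the monomer size of 0.1 nm (one atomic length) and the cell size of 1 µm are illustrative assumptions, and the helper names are hypothetical. It also reproduces the contour length of about 10 micrometers used to argue that the chain must be flexible.

```python
import math

def capacity_bits(k: int) -> float:
    """Maximum information per monomer, in bits, for an alphabet of k monomer types."""
    return math.log2(k)

def monomers_needed(total_bits: float, k: int) -> int:
    """Shortest chain able to carry total_bits when k monomer types are available."""
    if k < 2:
        # log2(1) == 0: a homopolymer carries no information at any length
        raise ValueError("a homopolymer (k = 1) carries no information")
    return math.ceil(total_bits / capacity_bits(k))

# Illustrative assumptions (not values given in the article):
MONOMER_SIZE_NM = 0.1   # one atomic length, the smallest conceivable monomer
CELL_SIZE_UM = 1.0      # order of magnitude of a small prokaryotic cell

n_binary = monomers_needed(1e5, k=2)             # binary code: 1 bit per monomer
n_quaternary = monomers_needed(1e5, k=4)         # four monomer types: 2 bits per monomer
contour_um = n_binary * MONOMER_SIZE_NM / 1_000  # nm -> micrometers
print(n_binary, n_quaternary, contour_um, contour_um / CELL_SIZE_UM)
```

A fully extended binary chain would thus be about ten times longer than the cell that must contain it.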
Furthermore, since each monomer is at least of atomic size, the contour length of this polymer is at least 10 micrometers. An extended conformation of this chain is incompatible with the size of a small cell or single-cell prokaryotic organism (less than one micrometer), which implies that the polymer must be flexible. Flexibility is also associated with the possibility of filling space in a parsimonious manner, thus adopting a dense, globular conformation (further described below).

The polymeric nature of the genetic material implies that heredity must also be understood in the language of polymer science, using the concepts of polymer physics: static conformations as well as chain dynamics. The very large degree of polymerization of the genetic material, in particular, has consequences that can be approached through scaling concepts reflecting the symmetry of scale invariance. [5]

The heteropolymer carrying the genetic information must be transported efficiently to the two daughter cells during cell division. Brownian motion alone is not sufficient for that goal, as it can be stalled by the presence of obstacles. The requirement of transport leads us to assert the existence of molecular motors, called translocases, able to translocate along the chain (or to translocate the chain if the motor remains immobile).

Any arbitrary displacement in space can be decomposed into a combination of a rotation around an axis and a translation. The displacement of the motor relative to the heteropolymer will be most efficient if it occurs in a regular manner, through a repetition of identical steps of translation and rotation. The polymer, as seen by the motor, must be able to adopt, at least transiently, a regular structure. As a general rule, the construction of an object permitting such regular steps leads naturally to a circular helix. [6] A regular, chiral helix is thus the most general structure (an achiral, degenerate helix being a singular case).

The requirement of interaction of a motor with a regular structure implies that each monomer must contain an identical chemical group used for the interaction with the motor. We call this group a vertebra. To the vertebra is attached one of two possible side chains, specific for the two monomers, separated from the vertebra by a chemical group used as a spacer so as not to interfere with the action of the translocase. The existence of this spacer is required to offer a regular, periodic landscape to the translocases. The vertebra is used to connect the monomers to one another to form the polymeric chain; this chemical group is therefore trifunctional.

The chain of vertebrae is called the backbone of the heteropolymer. The motor must also be able to move in a constant direction along the backbone, thus to dissipate energy anisotropically when it interacts with a vertebra, which must therefore have a polar (directional) structure. To the polarity of the vertebrae is associated a polarity of the backbone and, thus, a polarity of the genetic information encoded by the sequence of the monomers. This polarity appears as an expression of the principle of fine division: the mechanical efficiency of the translocase is based on a fine, one-dimensional orientation of the heteropolymer.

Each monomer thus contains a trifunctional, chiral vertebra: two of the functions are present in a polar, oriented backbone and the third connects this backbone to specific achiral side chains or residues, held outside of the reach of the motor.

The trifunctional vertebra need not be chiral but only prochiral. However, the simplest trifunctional vertebra contains a carbon atom with a double bond, and the resulting planar chemical structure is less flexible than a carbon atom having four sigma bonds.
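The statement that a repetition of identical rotation-plus-translation steps generates a circular helix can be checked numerically. The sketch below applies one fixed screw step (a rotation by θ about an axis followed by a translation d along it) to a point again and again; the step values and function names are illustrative assumptions. Every image stays at the same distance from the axis and rises by the same amount, so the points lie on a circular helix whose handedness is set by the signs of θ and d; the degenerate, achiral cases appear only for θ = 0 or π, or for d = 0.

```python
import math

def screw_step(p, theta, d):
    """Rotate p by theta about the z axis, then translate it by d along z."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z + d)

def repeat_step(p0, theta, d, n):
    """Positions reached after applying the same screw step n - 1 times."""
    points = [p0]
    for _ in range(n - 1):
        points.append(screw_step(points[-1], theta, d))
    return points

def handedness(theta, d, eps=1e-12):
    """+1 for a right-handed helix, -1 for left-handed, 0 for degenerate achiral cases."""
    twist = math.sin(theta) * d
    if abs(twist) < eps:
        return 0   # no rise (a circle), or theta = 0 or pi (a line or a planar zigzag)
    return 1 if twist > 0 else -1

# Hypothetical step: 36 degrees of rotation and 0.34 length units of rise per monomer.
pts = repeat_step((1.0, 0.0, 0.0), math.radians(36.0), 0.34, 50)
radii = [math.hypot(x, y) for x, y, _ in pts]
print(min(radii), max(radii), pts[10][2], handedness(math.radians(36.0), 0.34))
```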
As we require the chain to be flexible, we are thus led to choose as a minimal vertebra a compound with the general formula $\mathrm{HCX_1X_2X_3}$, where the three $\mathrm{X}_i$ groups differ from one another and also from the hydrogen atom. One group contains the spacer connecting to the specific side chains and the two remaining groups are used for the polymerization of the backbone. Each vertebra is, therefore, chiral. The monomers are homochiral compounds and the polymers are isotactic chains. We can again understand this result as a consequence of the principle of fine division: a complete, three-dimensional orientation of the vertebra contributes to the mechanical efficiency of the motors through a narrow channeling of the dissipation of the chemical energy.

We can further assume, for simplicity, that the side groups are achiral in the absence of other specific requirements. Additional arguments in favor of the chiral character of the heteropolymer come from considerations of dense packing (not detailed in the present work).

The replication process and its consequences: semi-conservative duplication and double helical structure

The process of duplication of the genetic material must minimize space and maximize efficiency and economy. It must then be a local process, relying on the concept of molecular complementary recognition, following the ideas presented by Friedrich-Freksa [7] and by Pauling and Delbrück. [8, 9] This concept can be seen as yet another illustration of the principle of fine division (as shown for biological catalysis by Fischer, Pauling and others).

As a general rule, a mechanical copy of an object based on a molding process does not lead to a copy of the object itself, but to an object with a complementary structure. The only exception to this rule occurs if the object to be copied contains its own complement. [10] In this case, the replication process is semi-conservative.
Starting from a parental structure to be replicated made of two complementary parts, one obtains two copies, each of which contains one of the two parental parts. Each complementary part of the parental structure has been used as a template for the production of the other part. The parental complementary parts end up separated at the completion of the process. Thus, replication is a duplication process, leading to an efficient exponential amplification of the initial structure as a function of the number of replication rounds. We thus conclude that, in order to be copied efficiently, the genetic material must contain its own complement and be replicated through a semi-conservative process. The helical structure built so far is only half the desired structure. The complete structure is to be made of two complementary helical molecules (heteropolymers) with identical backbones.
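The two consequences of semi-conservative duplication used in the text, exponential amplification and the survival of each parental part, intact, in one of the copies, can be illustrated by bookkeeping alone. In this sketch (a hypothetical labelling scheme, not taken from the article) each strand is identified only by the round in which it was synthesized, and sequences are ignored.

```python
def replicate(duplexes, round_no):
    """One round of semi-conservative duplication.

    A duplex is a pair of strand labels (template strand first), each label being the
    round in which that strand was synthesized. Every strand is kept intact and paired
    with a new complementary strand synthesized in round_no."""
    return [(strand, round_no) for duplex in duplexes for strand in duplex]

duplexes = [(0, 0)]                 # parental duplex: both strands labelled 0
for r in range(1, 6):               # five rounds of replication
    duplexes = replicate(duplexes, r)

n_total = len(duplexes)
n_with_parental = sum(1 for d in duplexes if 0 in d)
print(n_total, n_with_parental)     # 32 2
```

After n rounds there are 2^n duplexes, exactly two of which still carry an original parental strand, and in every duplex one strand is older than the other, which is the temporal asymmetry between the C and W strands discussed below.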
During replication, each of the two chains acts as a template for the copy of the complementary strand. The complementariness between the two helices results from a complementariness of the specific side groups of the monomers. Assuming the existence of only two side groups in one helix forming a sequence, and of two corresponding complementary side groups, leads to the use of a total of four different side groups, except in the particular situation where the two side groups of one helix are complementary to each other, in which case only two types of side groups would be sufficient. As this particular case is the simplest choice, we shall retain it in the following with no loss of generality.

Due to the complementariness rule, the two complementary monomers are found in equal amounts in the double helical structure. In this double helix, the information is doubly present because of the complementariness rules. We have seen that this redundancy can be understood not only in terms of a greater efficiency of the replication process (exponential amplification), but also as a requirement due to the necessary imperfection of the replication process, in order to fulfill the conditions of Shannon's theorem, as explained previously. Redundancy contributes to increasing the stability of the genetic information in replication and in conservation.

The interactions between the complementary parts must be strongly attractive, yet the individual bonds are broken during the duplication process, in contrast with the covalent bonds that connect the monomers and ensure the fidelity of each strand. This points to a complementariness between the side groups which relies exclusively on weak, non-covalent bonds. The two complementary strands are thus specifically held together through multiple weak (non-covalent) bonds. The general role played by multiple weak attachments in the establishment of specificity had been foreseen by Pauling [10] and, more explicitly, by Crane. [11] The resulting double helix is a supramolecular entity. The semi-conservative duplication process, with its need for a physical separation of the complementary parts (each of them ending up in a different daughter cell), also points to an obvious role for one of the translocases postulated to interact with the structure: to act as a strand helicase, assisting the mechanical disruption of the double helix by cleaving the weak bonds between the complementary parts.
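With the simplest choice retained above, two side groups that are complementary to each other, the consequences stated in the text follow directly: either strand determines the other (redundancy), and the two monomers occur in equal amounts in the double helix. A minimal sketch, using the hypothetical letters x and y for the two side groups and ignoring strand orientation.

```python
PAIR = {"x": "y", "y": "x"}   # hypothetical two-letter code in which x pairs with y

def complement(strand: str) -> str:
    """Position-by-position complementary strand (strand orientation is ignored here)."""
    return "".join(PAIR[m] for m in strand)

def is_balanced(strand: str) -> bool:
    """True if the duplex formed by strand and its complement has equal numbers of x and y."""
    duplex = strand + complement(strand)
    return duplex.count("x") == duplex.count("y")

s = "xxyxxxyy"
print(complement(s), complement(complement(s)) == s, is_balanced(s))  # yyxyyyxx True True
```

The balance holds for every sequence, because each x on one strand faces a y on the other; it says nothing about the composition of a single strand.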
The semi-conservative duplication process creates a temporal asymmetry between the two strands of the double helix: one older strand, which we call the C strand, coming from the parental molecule, and one newly synthesized complementary strand, the W strand. This asymmetry makes possible the labeling of either the parental or the daughter strand through certain chemical modifications (such as methylation). In addition, the synthesis of a complementary strand is a templated, out-of-equilibrium polymerization. This creates a temporal order of the monomers within each strand. Both temporal orderings, along each strand and between the two strands, make the structure fully oriented in time, as expected from the principle of fine division.

Because of the presence of complementary groups connecting the two backbones, the two polymer chains cannot be far apart from each other. Furthermore, the search for an economy of space also requires that the two polymers be held in close vicinity. The simplest such structure that can be envisioned is a planar ladder, where the complementary side groups provide the rungs, but this degenerate helix is incompatible with the chiral nature of the monomers and can be rejected as singular.

A simple thought experiment of the duplication of a chiral helical mathematical curve consists in copying the curve and moving the copy by a small translation along the helical axis or a small rotation about this axis (small meaning that the displacement is small with respect to the pitch of the helical curve). This is equivalent to the winding of two parallel lines on a cylinder and creates a pair of helices. A double helix made of two such identical chiral helices can be one of two types: a pair of helices side by side, called paranemic (παρά meaning side by side, and νῆμα meaning thread), or plectonemic (πλεκτός meaning intertwined).
In the case of a plectonemic structure, it is possible to embed the two helices within a single cylinder in a regular manner, making their axes of symmetry coincide, without deformation. This is impossible to do with a paranemic structure, which is, therefore, irregular and is to be ruled out. We are left with a plectonemic double helix, also expected to be more stable and more compatible with efficient interactions with molecular motors.

Given the polarity of the backbone, the two strands of the double helix can have either a parallel (↑↑) or an antiparallel (↑↓) orientation. We can increase the symmetry of the structure by relating the two backbones through an appropriate geometrical transformation, namely an isometry. A reflection is ruled out by the presence of chiral elements in the backbone, which means that a parallel orientation of the two backbones does not add any new symmetry elements. The symmetry can be increased through the introduction of two-fold ($C_2$) rotation axes if the two strands are coupled with opposite polarities. We therefore retain an antiparallel structure.

The double helical structure could be linear, but the presence of ends of the double helix would destroy many of the symmetries introduced above (helical symmetry and two-fold rotation axes). Indeed, there are no exact symmetries in a finite, linear helical structure. In a linear double helix made of a finite number of monomers, there is, in fact, only one nearly exact $C_2$ rotation axis in the structure (this central rotation axis is only exact if we ignore the temporal asymmetry between old and new strands).

We can increase the symmetry of this structure by requiring it to be cyclic, eliminating the asymmetries related to the ends. In doing so, we have to bend the helix and, therefore, formally destroy the exact symmetries associated with the helical axis. However, as the polymer is very long and flexible, both the deformation and its energetic cost should be very small.
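The antiparallel choice can be checked on an idealized geometric model: two right-handed helices of equal radius and pitch, wound on the same cylinder with an angular offset. The sketch below (radius, rise, offset and function names are all illustrative assumptions) shows that a two-fold rotation about an axis perpendicular to the helix axis exchanges the two strands while reversing the direction in which each is traversed. A $C_2$ axis is a proper rotation, so it preserves handedness, unlike a reflection; and because it reverses direction, it can relate two polar strands only if their polarities are opposite.

```python
import math

R, RISE = 1.0, 0.5   # hypothetical helix radius and rise per radian of rotation
PHI = 2.0            # hypothetical angular offset of the second strand (radians)

def strand(t, phase):
    """Point with parameter t on a right-handed helix of radius R and given phase."""
    return (R * math.cos(t + phase), R * math.sin(t + phase), RISE * t)

def c2(p, alpha):
    """Two-fold rotation (by pi) about the axis through the origin in the xy-plane at angle alpha."""
    x, y, z = p
    c, s = math.cos(2 * alpha), math.sin(2 * alpha)
    return (c * x + s * y, s * x - c * y, -z)

def same_point(p, q, tol=1e-9):
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(p, q))

ts = [0.1 * k for k in range(-20, 21)]
# The C2 axis at angle PHI/2 sends strand 1 at parameter t onto strand 2 at parameter -t,
# and strand 2 back onto strand 1: the strands are exchanged with reversed directions.
one_to_two = all(same_point(c2(strand(t, 0.0), PHI / 2), strand(-t, PHI)) for t in ts)
two_to_one = all(same_point(c2(strand(t, PHI), PHI / 2), strand(-t, 0.0)) for t in ts)
print(one_to_two, two_to_one)
```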
Since bending a very long, flexible double helix costs little, we choose to circularize the structure, resulting in a covalently closed, cyclic double helix. The helical symmetry and the two-fold axes are then not exact, but approximate: they are plesiosymmetries.

An objection can be raised against the cyclic structure. Indeed, the two complementary strands forming the double helical structure are separated to become fully segregated in the daughter cells. The plectonemic structure of the cyclic double helix creates a topological obstacle to this separation, as the two complementary strands are initially catenated. This problem can be solved by appropriate catalysts performing strand-passage reactions, which we call dianemases, operating through cycles of controlled breakage and reunion. The existence of similar cut-and-paste catalysts is, in fact, already required in order to explain the process of genetic recombination. The simplest dianemase catalyzes the passage of one strand through the other by the transient breakage of the backbone of one strand, followed by its resealing after the passage of the other strand through the transient breach. In the presence of this strand-passing activity, an insoluble topological problem is replaced by a soluble rheological one: the two closed complementary strands can now flow through one another on a finite time scale, in a process which can be assisted by molecular motors such as the helicases described above. The most efficient manner for this catalyst to operate is to become transiently covalently bound to the broken strand.
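The size of the topological problem solved by dianemases can be estimated. The linking number Lk of two closed strands cannot change as long as neither strand is broken; for a relaxed circle with negligible writhe, the Călugăreanu relation Lk = Tw + Wr reduces Lk to the number of helical turns, and each passage of one strand through a transient break in the other changes Lk by one unit. In the sketch below, the length of $10^5$ pairs and the helical repeat of 10.5 pairs per turn (the value for B-DNA) are illustrative inputs, not results of the construction, and the function names are hypothetical.

```python
def relaxed_linking_number(n_pairs: int, pairs_per_turn: float) -> int:
    """Linking number of the two strands of a relaxed, planar, covalently closed duplex.

    With negligible writhe, Lk = Tw + Wr reduces to the number of helical turns."""
    return round(n_pairs / pairs_per_turn)

def passages_to_unlink(lk: int, change_per_passage: int = 1) -> int:
    """Minimum number of strand-passage events needed to bring Lk to zero."""
    return -(-abs(lk) // change_per_passage)   # ceiling division

# Illustrative values: 10^5 pairs and a helical repeat of 10.5 pairs per turn.
lk = relaxed_linking_number(100_000, 10.5)
print(lk, passages_to_unlink(lk))   # 9524 9524
```

Even a modest chromosome therefore requires thousands of cut-and-paste events before its two strands can be segregated.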
We have now obtained a heteropolymer having a double helical structure. To a first approximation, the complementary paired chains can be viewed as a homopolymer, locally difficult to bend because of its double helical structure (and thus semi-flexible), but globally flexible in view of its large degree of polymerization. Modern polymer physics has taught that the static and dynamic properties of long flexible homopolymers have a universal character, independent of the molecular details, that can be expressed in terms of scaling laws reflecting a symmetry of scale invariance. [5]

Given the large degree of polymerization of the double helical structure, it is reasonable to attempt a scaling analysis of its packing law. A flexible polymer can exist in one of two extreme conformations: either fully stretched or densely packed. Other, intermediate conformations are possible, such as the swollen coil and the ideal (Gaussian or Brownian) conformation. These four conformations can be described through scaling laws, universal relations between the degree of polymerization $N$ and the volume $V$ occupied by the polymer. The characteristic size of the polymer is given by $R(N) \propto N^{\nu}$, where $\nu$ is called the swelling exponent (its reciprocal is called the fractal or Hausdorff dimension of the chain [12]). The value of the scaling exponent is $\nu = 1$, $3/5$, $1/2$ and $1/3$ for the stretched, swollen, ideal and globular conformations, respectively. Parsimony in the occupation of space selects the globular conformation, for which the volume $V(N) \propto R^3$ must scale linearly with the chain length. The scaling law applies to the chain taken as a whole; other regimes could apply at lower scales. This overall globular state is expected to be permanent in a cell (as opposed to the more transient double helical conformation, which only persists between two rounds of replication). There can exist, however, a great variety of such globular states, and their density can vary throughout a cell gemination cycle or with different physiological conditions.
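The choice of the globular state follows from the exponents alone: the volume pervaded by the chain scales as $R^3 \propto N^{3\nu}$, so only $\nu = 1/3$ keeps the density constant as $N$ grows, which is also the only case in which the fractal dimension $1/\nu$ equals the dimension of space. A short sketch using exact fractions; it only restates the textbook exponents quoted above, and the dictionary and function names are illustrative.

```python
from fractions import Fraction

# Swelling exponents nu in R ~ N**nu for the four conformations discussed in the text.
NU = {
    "stretched": Fraction(1),
    "swollen coil": Fraction(3, 5),
    "ideal (Gaussian)": Fraction(1, 2),
    "globule": Fraction(1, 3),
}

def volume_exponent(nu: Fraction) -> Fraction:
    """Exponent of N in the pervaded volume, V ~ R**3 ~ N**(3 * nu)."""
    return 3 * nu

for name, nu in NU.items():
    print(f"{name:17s} nu = {nu}, fractal dimension = {1 / nu}, V ~ N^{volume_exponent(nu)}")

# Only an exponent giving V proportional to N fills space at constant density.
dense = [name for name, nu in NU.items() if volume_exponent(nu) == 1]
print(dense)  # ['globule']
```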
We expect, in particular, the density of the polymeric globule to be higher in a dehydrated, dormant cell than in a similar cell that is metabolically active. At high enough densities, the structure of this globular semi-flexible chain is expected to be locally anisotropic due to excluded volume effects between chain segments [13, 14] and may exhibit a liquid crystalline (nematic) ordering. The chiral structure of the double helix will favor a twisted nematic (cholesteric) ordering if the packing density is not too high.

The final, transient structure obtained is a semi-flexible, compact (globular), cyclic, two-stranded structure. The two cyclic strands are topologically linked. The separation of the two strands occurring during the replication process is assisted by motors (called helicases), disrupting the non-covalent bonds between them, and by enzymes performing strand-passage reactions through cut-and-paste operations, which help to decrease their topological linking number to zero. We call these enzymes dianemases, following the recommended nomenclature for naming biological catalysts (those acting on DNA have been called topoisomerases).

The genetic material is an information-carrying device fully oriented in time and in space (as shown by the construction and as expected from the principle of fine division). The structure is characterized by necessary asymmetries: it contains information, and this information is encoded both spatially and temporally. The encoding is found in the sequence of monomers and in the fact that each strand is a fully oriented polymer, both spatially (through polarity and chirality) and temporally, as there exists a temporal order along each strand, albeit imperfect, the monomers being assembled by a directed, out-of-equilibrium polymerization.
The temporal ordering is also found at the level of the double helix, where one strand, which we call the C strand, used as a template in the previous round of duplication, is older than the complementary, younger W strand. This last asymmetry is a consequence of the necessary semi-conservative nature of the replication process. The structure obtained is ideal, having been systematically saturated with compatible symmetries: helical symmetry, homochirality, isotacticity, plectonemicity, two-fold rotation axes, circularity, globularity, complementarity, parity and redundancy, achiral side groups. None of these symmetries are exact, but only nearly so. The efficiency of the interaction with motors results both from asymmetries (polarity and chirality) and from symmetries (helical: homochirality, isotacticity; two-fold rotation axes).

Temporal asymmetries signal the historicity of the double helix. The asymmetry between the old and the new strand makes possible their labeling through certain chemical modifications, called epigenetic, used for example in error-correction processes. In a general manner, epigenetic phenomena result from necessary asymmetries, present not only in the genetic material but also in its environment (for instance in cell membranes, where the existence of new poles following cell division can be exploited similarly).

We have attempted to provide the simplest constructive approach based on our current understanding of biology, making use of ideas that were unknown in 1953. This is the case for the concept of molecular motors (such as RNA polymerase [15]) acting on the genetic material, which only emerged in the mid-1990s and is crucial to our approach. Similarly, the reasoning by which one can reject a paranemic (side-by-side) double helical structure as being irregular [16] requires, to be fully rigorous, the Călugăreanu formula relating the linking number of closed curves to twist and writhe. [17–19]

The theoretical construction allows one to better understand the structure of nucleic acids; we focus here on the primary structure of DNA and RNA, shown in Figures 1 and 2. DNA and RNA are heteropolymers made of four monomers (rather than the predicted minimal number of two). The monomers, called nucleotides, consist of a trifunctional handle to which is attached one out of five planar, achiral nucleobases. The molecular vertebra contains a trifunctional deoxyribose or ribose, to which are attached the 5′ and 3′ phosphates: these two links account for the polarities of the vertebra and of the heteropolymeric chains. The vertebra is both polar and chiral, as expected, but its structure is not minimal (that is, consisting of a single stereogenic carbon): instead, the three stereogenic carbons located in the pentose ring are each endowed with one of the three specific functions of the vertebra, being associated with the 3′ end (for C3′), with the 5′ end (for C4′), or with the lateral base (C1′), as described by Natta. [20] The planar side groups are always connected in the same manner, forming an isotactic chain. RNA contains a fourth chiral carbon, C2′, lowering its symmetry and thus decreasing its stability. This additional asymmetry is associated with novel phenomena: the attached hydroxyl group is a catalytic component of several ribozymes. The phosphorus atom as well as all non-chiral carbon atoms are prochiral (with the exception of the methyl group of the thymine base of DNA). All atoms attached to tetravalent phosphorus and carbon atoms are thus discernible. Similarly, the two faces defined by (trigonal) trivalent carbons and their attached groups are distinct, illustrating again the principle of fine division. Lastly, both in DNA and RNA, the common vertebra extends, in fact, into the planar nucleobases.
Indeed, four atoms are common to all bases: not only the nitrogen linked to the deoxyribose or ribose, but also two carbon atoms connected to this nitrogen and an additional hydrogen atom. (To emphasize this fact, we have drawn the structures in Figures 1 and 2 using different colors for the atoms belonging to the backbone, common to all monomers, shown in black, and for the atoms of the bases specific to each residue, colored in blue.) The geometry of the common atoms associated with the backbone differs only minimally, through the angles of the six- or five-atom heterocyclic rings of the pyrimidines and purines (60° versus 72°). The differences of chemical structure between the four nucleobases arise from specific substitutions of reactive groups in the pyrimidines cytosine, thymine (and uracil) as well as in the purines guanine and adenine.

The minimal number of monomers required to synthesize informational polymers is two. The existence of four bases, thus of two distinct types of complementary base pairs having different bond strengths (involving two hydrogen bonds for adenine with thymine versus three for guanine with cytosine), can be explained as follows. In the helix-coil denaturation of DNA, the secondary structure of the double helix shows a sequence-dependent stability, a phenomenon of central importance in replication, transcription and recombination. [21] This would be impossible to observe in a double helical polymer made of only two types of monomers, thus of a single type of pair. Indeed, the stability of such a double helix would be that of a homopolymer. The additional asymmetry makes possible the orientation of the double helical structure of DNA, as well as novel phenomena in the tertiary structure of chromosomes, for example, or in the interaction with catalysts.

An unreplicated chromosomal DNA molecule consists of a single double-helical polymer (this is also known as the mononeme hypothesis).
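The argument for two types of base pairs can be illustrated by counting hydrogen bonds along a sequence. Using this count as a stability measure is a crude proxy assumed for illustration only, since real helix-coil stability also depends on base stacking between neighboring pairs; the two-letter x/y code and the function name are hypothetical. With a single pair type, every window of the double helix has the same count whatever the sequence, as for a homopolymer; with two pair types of different strength, the count, and hence the local stability, varies along the chain.

```python
# Hydrogen bonds per base pair, indexed by the base on the reading strand:
# A-T pairs have two, G-C pairs three (as stated in the text).
HB4 = {"A": 2, "T": 2, "G": 3, "C": 3}
# Hypothetical two-letter code with a single pair type x-y of two hydrogen bonds.
HB2 = {"x": 2, "y": 2}

def hbond_profile(strand: str, hb: dict, window: int = 4) -> list:
    """Hydrogen bonds in each sliding window of base pairs: a crude proxy for local stability."""
    return [sum(hb[b] for b in strand[i:i + window]) for i in range(len(strand) - window + 1)]

print(hbond_profile("ATATGCGC", HB4))   # [8, 9, 10, 11, 12]
print(hbond_profile("xxyxyyxy", HB2))   # [8, 8, 8, 8, 8]
```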
The overall conformation of this extremely long chromosomal polymer is indeed globular within cells. This general statement refers to a broad and complex field of investigation, that of DNA condensation, to be discussed elsewhere. In contrast with globularity, circularity seems at first not to be universally observed. Circular chromosomal DNA appears widespread in prokaryotes and is probably universal. Eukaryotic chromosomes are usually linear (circularized versions of these chromosomes can also be observed in mutants, usually associated with genetic deficiencies). Furthermore, eukaryotic cells always contain cyclic DNA molecules as episomes (such as mitochondrial DNA). We formulate the hypothesis that all cells contain a cyclic DNA molecule, either as a chromosomal chain or as an episome. Circular DNA and the process of DNA cyclization appear to be universal among living organisms.

Proteins are heteropolymers that are assembled by a molecular motor called the ribosome, which must translocate efficiently during the polymerization steps. This implies that one can apply the analysis leading to the conclusions reached above concerning the genetic material to understand the structure of amino acids and proteins:
1. Proteins must be assembled from monomers containing trifunctional vertebrae that are polar, chiral and to which side groups are attached;
2. The side groups should be achiral, spatially separated by a spacer from the molecular vertebra (to avoid direct contact with the motor);
3. Polymers assembled from these monomers, polypeptides and proteins, must also be able to adopt, at least transiently, a helical conformation;
4. The tertiary structure of these polymers should be globular to minimize space occupation.

The structure of the twenty proteinogenic amino acids [22] obeys these basic rules. The standard representation of these amino acids and of the polypeptidic chain is due to Fischer.
It is based on the concept of the side group, denoted R, specific to each amino acid and attached to the asymmetric C*α atom, as illustrated for the monomer in Figure 3 (top, left) and written below for a dipeptide: [23]
$$\mathrm{NH_2\cdot C^{*}HR\cdot CO\cdot NH\cdot C^{*}HR\cdot COOH}$$
We propose a refined representation of these amino acids, shown in Figure 3 (top, right), which emphasizes the existence of a CβH spacer group. The structure minimizes the amount of matter required to build a vertebra: the three-point chiral handle consists of a single chiral carbon to which a hydrogen atom is attached, and the spacer consists of a single methyne group. These structures cannot be reduced further. One can describe this simplicity in terms of atom economy or biological perfection. This economy results from the high energetic cost of protein synthesis (the cost of nucleic acid synthesis is, in comparison, much lower, and appears compatible with their more opulent sugar-phosphate-aromatic vertebra).
The molecular vertebra consists of a single trifunctional asymmetric C*α atom to which hydrogen, carboxyl and amino reactive groups are also attached. The description of the molecular vertebra now includes a methyne spacer, which can be removed from the conventional specific residues or side groups. The Cβ atom of this spacer is predicted and observed to be achiral (isoleucine and threonine are exceptions to this prediction; the absolute configuration of Cβ is S for Ile and R for Thr in the Cahn-Ingold-Prelog notation). Table 1, listing the twenty proteinogenic amino acids sorted by decreasing molecular weight, details the simpler residues, each consisting of two groups R′ and R″. The second residue R″ is a single hydrogen atom except for isoleucine, threonine and valine (where R″ = CH3). Glycine is special, as it is achiral and lacks a methyne spacer group.
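The prediction that Cβ is achiral, and its two exceptions, follow from a simple counting rule applied to Table 1: Cβ carries Cα, one hydrogen atom, R″ and R′, and it is a stereocentre only if these four substituents are pairwise different. Valine has R″ ≠ H but R″ = R′ = CH3, so it stays achiral. A minimal sketch, assuming the (R″, R′) pairs transcribed from Table 1 and treating formula strings as group identities (a simplification that happens to hold for these residues); `SPACER_SUBSTITUENTS` and `cbeta_is_stereocentre` are hypothetical names.

```python
# (R'', R') attached to C-beta for each residue, transcribed from Table 1.
# Glycine is omitted because it has no spacer. Formula strings stand in for
# group identity, which is sufficient for these twenty residues.
SPACER_SUBSTITUENTS = {
    "Trp": ("H", "C8H6N"), "Tyr": ("H", "C6H4OH"),
    "Arg": ("H", "(CH2)2NHC(NH)NH2"), "Phe": ("H", "C6H5"),
    "His": ("H", "C3H3N2"), "Met": ("H", "CH2SCH3"),
    "Glu": ("H", "CH2COOH"), "Lys": ("H", "(CH2)3NH2"),
    "Gln": ("H", "CH2CONH2"), "Asp": ("H", "COOH"),
    "Asn": ("H", "CONH2"), "Leu": ("H", "CH(CH3)2"),
    "Ile": ("CH3", "CH2CH3"), "Cys": ("H", "SH"),
    "Thr": ("CH3", "OH"), "Val": ("CH3", "CH3"),
    "Pro": ("H", "(CH2)2"), "Ser": ("H", "OH"), "Ala": ("H", "H"),
}


def cbeta_is_stereocentre(r_second: str, r_first: str) -> bool:
    """C-beta bears C-alpha, one H, R'' and R'; it is a stereocentre only if
    all four substituents differ. This tests constitution only and says
    nothing about which configuration (R or S) is realised."""
    return len({"C-alpha", "H", r_second, r_first}) == 4


non_h_second = sorted(aa for aa, (r2, _) in SPACER_SUBSTITUENTS.items() if r2 != "H")
chiral_cbeta = sorted(aa for aa, (r2, r1) in SPACER_SUBSTITUENTS.items()
                      if cbeta_is_stereocentre(r2, r1))
print(non_h_second)  # ['Ile', 'Thr', 'Val']
print(chiral_cbeta)  # ['Ile', 'Thr']
```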
Although glycine lacks a spacer, its side group, a simple H atom, causes no steric clash with a molecular motor, and glycine often plays the role of a flexible junction in protein chains. Note that the Cα atom of glycine is prochiral and that the two hydrogen atoms attached to it are distinguishable.
Table 1 also lists the molecules that would be obtained by adding a hydrogen atom to the separated active residue R′ in place of the spacer bond (in the case of proline, where the side group is cyclically attached to the backbone, a second terminating hydrogen atom is included). This makes it possible to identify sixteen trimmed specific active side groups. The functional properties of the trimmed side groups are more readily grasped in this new representation than when the CβH spacer is included in the residue: the specific side group H−R′ of tryptophan is thus indole (instead of skatole), that of tyrosine is phenol, that of phenylalanine is benzene, and that of histidine is imidazole. The structure of these amino acids appears minimal if one wants to employ the corresponding functional groups. In contrast, this is not the case for other amino acids, for instance lysine with 1-propylamine or arginine with N-ethylguanidine, where the functional group (amine or guanidine) could be attached using a shorter spacer. Altogether, eleven out of the twenty amino acids appear to possess minimal side groups (Trp, Tyr, Phe, His, Met, Asp, Asn, Cys, Ala and Gly).
The novel representation that we present here does not aid our understanding in the case of short side chains (alanine, cysteine and serine), as the removal of the methylene group leaves the residue without a carbon atom.
For these short side chains, it is better to associate the spacer with the functional side group, leading to the molecules methane, methyl mercaptan and methanol (instead of dihydrogen, hydrogen sulfide and water).
The major role of proteins as catalysts implies, in contrast to the relative simplicity of the genetic material, that proteins should contain the main functional groups of organic chemistry, such as acid and base, alcohol and thiol, or aromatic groups (benzene, indole and phenol). Proteins, furthermore, must be produced in greater quantities than the genetic material itself and therefore require an economical use of matter (atoms). The conclusion reached above, that eleven out of the twenty proteinogenic amino acids possess minimal side groups, implies that the necessary number of functional groups is greater than the number of strictly minimal structures.
The analysis developed here allows us to understand the polar (in the sense of a temporal and spatial, one-dimensional orientation) and chiral structure of the backbone of proteins, and their ability to adopt, at least transiently (during the elongation of the polypeptide chain by the ribosome machinery), a helical conformation. In particular, the observation that the conformations of amino acids in polypeptides and proteins are very similar in Ramachandran plots [24, 25] for all amino acids (with the two exceptions of glycine and proline, which are associated with a disruption of helical order) can be rationalized in terms of the requirement of an efficient, side-chain-independent interaction with a molecular motor.
Many proteins, in particular enzymes, are folded into dense, globular structures in their native, functional state. As for DNA, densely packed structures are to be expected as a consequence of the constant search for miniaturization by natural selection. Also as for DNA, this overall dense conformation can be described by a universal scaling law relating the volume of a globular protein to its degree of polymerization.
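In the usual notation of polymer physics (assumed here), the radius R of a globule of N monomers scales as R ∝ N^ν, so that its volume scales as V ∝ R³ ∝ N^(3ν) and, equivalently, N ∝ R^(1/ν); the exponent 1/ν then acts as a mass fractal dimension, equal to 3 for a fully compact globule. Such an exponent is estimated as the slope of log N against log R. A minimal sketch of that fit; the function name, the exponent used to generate the data and the radii are all hypothetical, and the data are synthetic, not measurements.

```python
import math


def fit_exponent(radii, counts):
    """Least-squares slope of log N against log R, i.e. the mass fractal exponent 1/nu."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free data generated from a hypothetical exponent
# (illustrative numbers, not measurements on real proteins).
ASSUMED_EXPONENT = 2.5
radii = [10.0, 15.0, 20.0, 30.0, 40.0]
counts = [r ** ASSUMED_EXPONENT for r in radii]
print(round(fit_exponent(radii, counts), 6))                    # 2.5
# A fully compact globule (N proportional to R**3) gives 1/nu = 3.
print(round(fit_exponent(radii, [r ** 3 for r in radii]), 6))  # 3.0
```

With real structures the points scatter and the fitted slope carries an uncertainty; the synthetic data only show that the procedure recovers the exponent it was given.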
The fractal exponent determined experimentally for proteins (1/ν = 2.6) is, in fact, slightly smaller than that of a completely compact, three-dimensional collapsed polymer (1/ν = 3), [26] a result that can be viewed as a consequence of the small number of monomers that make up enzymes, in comparison with the much larger number of monomers present in chromosomal DNA.
Lastly, we observe that proteins are usually linear. However, as for DNA, it is tempting to speculate that cyclic proteins exist universally among living organisms. In support of this hypothesis, a growing number of cyclic proteins have recently been described. [27]
These investigations suggest that it will eventually be possible to explain why the proteinogenic amino acids, their common vertebra and their specific side groups are such as they are and not otherwise.
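The Ramachandran plots invoked above [24, 25] locate each residue by two backbone dihedral angles: φ, the torsion about the N–Cα bond defined by the atoms C(i−1), N(i), Cα(i), C(i), and ψ, the torsion about the Cα–C bond defined by N(i), Cα(i), C(i), N(i+1). A minimal sketch of the standard signed dihedral calculation (atan2 form, positive for clockwise rotation of the front bond when viewed along the central bond); the helper names and the toy coordinates are hypothetical and do not come from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 about the p1-p2 bond, in degrees [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from five consecutive atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy coordinates: bond p1-p2 along z, front bond along +x, back bond along +y.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
```

Applying `phi_psi` to every residue of a structure and plotting the pairs gives a Ramachandran plot; the text's point is that, apart from glycine and proline, these pairs fall in similar regions regardless of the side group.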
Comparison of nucleic acids and proteins
Both nucleic acid chains (DNA and RNA) and proteins are heteropolymers assembled out of equilibrium by molecular motors. They share common invariants: common asymmetries (presence of information, spatial orientation through polarity and chirality, as well as temporal orientation), common symmetries (helical symmetry, homochirality, isotacticity, globularity) and achiral side groups (except for two amino acids out of twenty). These common invariants possess the antiquity of life itself. However, they fulfill mostly different functions: nucleic acids forming the genetic material provide a memory having long-term stability and reliability, whereas proteins hold structural roles or serve as biochemical catalysts.
The central tools used in the theoretical constructions, not only catalysts but also biological motors, must already have been present at a very early stage, raising the general question of the prebiotic evolution of such motors.
No more than four monomers are required in the constitution of nucleic acids, whereas at least eleven monomers are necessary in proteins. These functional as well as structural differences appear to reflect a certain division of labor between nucleic acids and proteins, the genetic material and catalysts. This suggests a plausible explanation for the necessary existence of the two types of biopolymers. It also points to the likely uniqueness of the logical solution offered by von Neumann for self-reproduction.
The results of the constructions described above constrain the structures of biopolymers and their constituent monomers. This can contribute to the design of non-natural nucleotides or amino acids to be used in synthetic biology, to a better understanding of prebiotic chemistry, and to the search for extraterrestrial forms of life.
Acknowledgements
The present work summarizes an ongoing research program initiated a few years ago. Previous versions of it have been presented to various audiences: Cargèse summer school on DNA and Chromosomes in 2004, 2006 and 2009; Gent Fantom Research School on Symmetries and symmetry violation in 2004 and 2007; Lectures on the Foundations of Molecular Biology, Evry, in 2004; Lectures on Elements of Biology, École Polytechnique Fédérale de Lausanne, in 2008; Lectures on Foundations of Biology, University Pierre and Marie Curie, Paris, 2008-2013; 74th Cold Spring Harbor Symposium "Evolution: The Molecular Landscape" in 2009; and Meeting "From Base Pair to Body Plan", Cold Spring Harbor Laboratory, in 2013. We have greatly benefited from the comments of their participants. We wish to thank Roger Balian, Sydney Brenner, George Church, Gregory Chaitin, Albert Libchaber, Marie-Claude Marsolier-Kergoat, Theo Odijk, Monica Olvera de la Cruz and Edouard Yéramian for discussions and/or comments on the manuscript.
References
[1] Sikorav JL, Braslau A, Goldar A. Foundations of Biology; 2014. Submitted to the Journal of the Royal Society Interface.
[2] Watson JD, Crick FHC. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature (London, U K). 1953;171(4356):737–738. doi:10.1038/171737a0.
[3] Watson JD, Crick FHC. Genetical implications of the structure of deoxyribonucleic acid. Nature (London, U K). 1953;171(4361):964–967. doi:10.1038/171964b0.
[4] Schrödinger E. What is Life? The Physical Aspect of the Living Cell. Cambridge: Cambridge Univ. Press; 1944.
[5] de Gennes PG. Scaling concepts in polymer physics. Ithaca: Cornell Univ. Press; 1979.
[6] Coxeter HSM. Introduction to Geometry. New York: Wiley; 1961.
[7] Friedrich-Freksa H. Bei der chromosomenkonjugation wirksame kräfte und ihre bedeutung für identische verdopplung von nucleoproteinen. Naturwissenschaften. 1940;28(24):376–379. doi:10.1007/bf01480270.
[8] Pauling L, Delbrück M. The nature of the intermolecular forces operative in biological processes. Science (Washington, DC, U S). 1940;92(2378):77–79. doi:10.1126/science.92.2378.77.
[9] Muller HJ. Pilgrim Trust Lecture: The gene. Proceedings of the Royal Society of London Series B-Biological Sciences. 1947;134(874):1–37. doi:10.1098/rspb.1947.0001.
[10] Pauling L. Molecular architecture and the processes of life. 21st Sir Jesse Boot Foundation Lecture. Nottingham, UK: The University of Nottingham; 1948.
[11] Crane HR. Principles and problems of biological growth. Scientific Monthly. 1950;70:376–389. doi:10.2307/20180.
[12] Grosberg AY, Khokhlov AR. Giant Molecules. Here, There, and Everywhere. San Diego: Academic Press; 1997.
[13] Onsager L. The effects of shape on the interaction of colloidal particles. Annals of the New York Academy of Sciences. 1949;51(4):627–659. doi:10.1111/j.1749-6632.1949.tb27296.x.
[14] Khokhlov AR, Semenov AN. Liquid-crystalline ordering in the solution of long persistent chains. Physica A (Amsterdam, Neth). 1981;108(2–3):546–556. doi:10.1016/0378-4371(81)90148-5.
[15] Yin H, Wang MD, Svoboda K, Landick R, Block SM, Gelles J. Transcription Against an Applied Force. Science. 1995;270(5242):1653–1657. doi:10.1126/science.270.5242.1653.
[16] Watson JD, Crick FHC. The structure of DNA. Cold Spring Harbor Symposia on Quantitative Biology. 1953;18:123–131. doi:10.1101/sqb.1953.018.01.020.
[17] Călugăreanu G. Sur les classes d'isotopie des nœuds tridimensionnels et leurs invariants. Czechoslovak Mathematical Journal. 1961;11(4):588–625.
[18] White JH. Self-linking and the Gauss integral in higher dimensions. American Journal of Mathematics. 1969;91:693–728. doi:10.2307/2373348.
[19] Fuller FB. The writhing number of a space curve. Proceedings of the National Academy of Sciences of the United States of America. 1971;68:815–819. doi:10.1073/pnas.68.4.815.
[20] Natta G, Farina M. Stereochimica: molecole in 3D. Milano: Mondadori; 1968. Stereochemistry. London: Longman; 1972. Translated by A. Dempster.
[21] Yeramian E. Genes and the physics of the DNA double-helix. Gene. 2000;255(2):139–150. doi:10.1016/S0378-1119(00)00301-2.
[22] Crick FHC. On Protein Synthesis. Symposia of the Society for Experimental Biology. 1958;12:138–163.
[23] Fischer E. Synthese von Polypeptiden. II. Berichte der deutschen chemischen Gesellschaft. 1904;37:2486. doi:10.1002/cber.190403702197.
[24] Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. Journal of Molecular Biology. 1963;7(1):95–99. doi:10.1016/S0022-2836(63)80023-6.
[25] Hovmöller S, Zhou T, Ohlson T. Conformations of amino acids in proteins. Acta Crystallographica Section D. 2002;58(5):768–776. doi:10.1107/S0907444902003359.
[26] Enright MB, Leitner DM. Mass fractal dimension and the compactness of proteins. Phys Rev E. 2005;71(1):011912. doi:10.1103/PhysRevE.71.011912.
[27] Trabi M, Craik DJ. Circular proteins: no end in sight. Trends in Biochemical Sciences. 2002;27(3):132–138. doi:10.1016/S0968-0004(02)02057-1.
Figure 1: Structure of nucleotides and primary structure of polynucleotides. I. Deoxyribonucleic acid. Two consecutive monomers are shown. The planar nucleobases (C: cytosine, T: thymine, G: guanine, and A: adenine) include atoms drawn in black belonging to the backbone, being common to all of the bases; the atoms that differ are drawn in blue and constitute the specific residues. The magenta arrow indicates the polarity of the backbone. The three chiral carbon atoms are indicated by red asterisks.
Figure 2: Structure of nucleotides and primary structure of polynucleotides. II. Ribonucleic acid. The same convention is used as in Figure 1, with U designating uracil. The backbone contains a fourth chiral carbon atom and a reactive hydroxyl group.
Figure 3: Chemical structure of proteinogenic amino acids.
Top, left: conventional generic representation of the amino acids, showing a residue R (in blue) attached to the asymmetric C*α atom. Top, right: refined generic representation, emphasizing the presence of a prochiral spacer group CβH (in red) to which two residues R′ and R″ (in blue) are attached. See also Table 1. Bottom, left: glycine is achiral and lacks a methyne spacer group. Bottom, right: proline is also something of an anomaly, since its side group cycles back to the amino group of the vertebra.

Table 1: The proteinogenic amino acids and their side groups. The table is sorted by decreasing molecular weight of the amino acids.
No. | Amino acid | Mw (Da) | −R (conventional) | H−R | −R″ | −R′ | H−R′
1 | W Trp Tryptophan | 204.225 | −CH2C8H6N | skatole (3-methylindole) | −H | −C8H6N | indole
2 | Y Tyr Tyrosine | 181.188 | −CH2C6H4OH | 4-methylphenol (4-cresol) | −H | −C6H4OH | phenol
3 | R Arg Arginine | 174.201 | −(CH2)3NHC(NH)NH2 | N-propylguanidine | −H | −(CH2)2NHC(NH)NH2 | N-ethylguanidine
4 | F Phe Phenylalanine | 165.189 | −CH2C6H5 | toluene | −H | −C6H5 | benzene
5 | H His Histidine | 155.155 | −CH2C3H3N2 | | −H | −C3H3N2 | imidazole
6 | M Met Methionine | 149.211 | −(CH2)2SCH3 | ethyl methyl sulfide | −H | −CH2SCH3 | dimethyl sulfide
7 | E Glu Glutamate | 147.129 | −(CH2)2COOH | propionic acid | −H | −CH2COOH | acetic acid
8 | K Lys Lysine | 146.188 | −(CH2)4NH2 | | −H | −(CH2)3NH2 |
9 | Q Gln Glutamine | | −(CH2)2CONH2 | propionamide | −H | −CH2CONH2 | acetamide
10 | D Asp Aspartate | 133.103 | −CH2COOH | acetic acid | −H | −COOH | formic acid
11 | N Asn Asparagine | 132.118 | −CH2CONH2 | acetamide | −H | −CONH2 | formamide
12 | L Leu Leucine | 131.173 | −CH2CH(CH3)2 | isobutane | −H | −CH(CH3)2 | propane
13 | I Ile Isoleucine | 131.173 | −CH(CH3)CH2CH3 | butane | −CH3 | −CH2CH3 | ethane
14 | C Cys Cysteine | 121.158 | −CH2SH | methanethiol (methyl mercaptan) | −H | −SH | hydrogen sulfide
15 | T Thr Threonine | 119.119 | −CH(OH)CH3 | ethanol | −CH3 | −OH | water
16 | V Val Valine | 117.146 | −CH(CH3)2 | propane | −CH3 | −CH3 | methane
17 | P Pro Proline | 115.130 | −(CH2)3− | propane (H−R−H) | −H | −(CH2)2− | ethane (H−R′−H)
18 | S Ser Serine | 105.093 | −CH2OH | methanol | −H | −OH | water
19 | A Ala Alanine | 89.093 | −CH3 | methane | −H | −H | dihydrogen
20 | G Gly Glycine | 75.067 | −H | dihydrogen | ∅ | ∅ | ∅
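The molecular weights in Table 1 can be recomputed from the elemental formulas of the free amino acids. A minimal sketch, assuming the older standard atomic weights listed in the code (these reproduce the tabulated three-decimal values for Met, Glu, Lys, Ala and Gly); `molecular_weight` is a hypothetical helper that only handles flat formulas without parentheses.

```python
import re

# Standard atomic weights in g/mol (older IUPAC values, assumed here because
# they reproduce the three-decimal masses listed in Table 1).
ATOMIC_WEIGHT = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}


def molecular_weight(formula: str) -> float:
    """Molar mass of a flat formula such as 'C11H12N2O2' (no parentheses)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return total


# Elemental formulas of a few free amino acids.
FORMULAS = {
    "Trp": "C11H12N2O2", "Met": "C5H11NO2S", "Glu": "C5H9NO4",
    "Lys": "C6H14N2O2", "Ala": "C3H7NO2", "Gly": "C2H5NO2",
}
masses = {aa: round(molecular_weight(f), 3) for aa, f in FORMULAS.items()}
print(masses)
# Sorting by decreasing mass gives the same relative order as Table 1.
print(sorted(masses, key=masses.get, reverse=True))
```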