Evolution of default genetic control mechanisms
William Bains*, Enrico Borriello, Dirk Schulze-Makuch
Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
ASU-SFI Center for Biosocial Complex Systems, Arizona State University, Tempe, AZ 85281-2701, USA
Zentrum für Astronomie und Astrophysik, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
German Research Centre for Geosciences (GFZ), Section Geomicrobiology, Potsdam, Germany
Department of Experimental Limnology, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Stechlin, Germany
* Correspondence to W. Bains; [email protected]
Abstract
We present a model of the evolution of control systems in a genome under environmental constraints. The model conceptually follows the Jacob and Monod model of gene control. Genes contain control elements which respond to the internal state of the cell as well as the environment to control expression of a coding region. Control and coding regions evolve to maximize a fitness function between expressed coding sequences and the environment. 118 runs of the model, run to an average of 1.[…] × 10^[…] ‘generations’ each with a range of starting parameters, probed the conditions under which genomes evolved a ‘default style’ of control. Unexpectedly, the control logic that evolved was not significantly correlated with the complexity of the environment. Genetic logic was strongly correlated with genome complexity and with the fraction of genes active in the cell at any one time. More complex genomes correlated with the evolution of genetic controls in which genes were active (‘default on’), and a low fraction of genes being expressed correlated with a genetic logic in which genes were biased to being inactive unless positively activated (‘default off’ logic). We discuss how this might relate to the evolution of the complex eukaryotic genome, which operates in a ‘default off’ mode.

arXiv:[…] [q-bio.GN]

Obligate multicellularity is uniquely a eukaryotic trait (1-3), and with it the morphological complexity that comes from combining many distinct cell types into one organism. Multicellularity requires complex genetic controls, both to generate different patterns of genetic activity in different cell types and to provide the ‘programme’ to construct the adult organism. In addition, the more complex internal architecture of the eukaryotic cell also requires specific controls. In some single-celled eukaryotes such internal complexity resembles that of equivalently sized multicellular organisms.
Reflecting this, genome sizes in eukaryotes can exceed those of the largest bacterial or archaeal (“prokaryotic”) genomes by 4 orders of magnitude (Figure 1).

Figure 1: Sizes of completed genome sequences, showing distinct sizes for prokaryotes (bacteria and archaea), eukaryotes and viruses. X axis: genome size in megabases. Y axis: relative number of genomes. Series: Virus, Archaea, Bacteria, Eukarya.

There is substantial overlap in coding capacity between the larger prokary[…] Ktedonobacter racemifer (∼[…]'500 CDS (5)), Sorangium cellulosum (∼[…] Magnetobacterium bavaricum (∼[…] − […]'000 (e.g. (8, 9)) and autotrophic protists (10'[…] − […]'000 CDS (9)) and approaches that of Drosophila melanogaster (∼[…]'000 CDS (10)). The size difference between prokaryotic and eukaryotic genomes is primarily due to non-coding DNA that is related in part to gene control. Thus the E. coli genome has little non-coding DNA, and ∼285 proteins are involved in gene control (11), ∼7% of the genome. By contrast, over 90% of the human genome is non-coding, and conservative estimates are that 10 times as many non-coding bases as coding bases are evolutionarily conserved (i.e. are presumed to have selectable function) (12, 13).

What enabled this increase in genetic complexity? The key differences between prokaryotic and eukaryotic cells have been suggested to be chemistry, intracellular structure, energetics and genetics. In general, any small molecule structure made by a eukaryotic cell will be made by a prokaryote as well. Many ‘eukaryotic’ cellular structures are actually found in a few prokaryotes as well. Linear chromosomes are found in bacteria (14 − […] Achromatium oxaliferum contains complex internal membranes containing calcium carbonate (whose function is obscure) (26),
Entotheonella detoxifies arsenic and barium by sequestering them in internal vesicles (27), and cyanobacteria have stacked internal photosynthetic membranes (28). The intracellular membrane system of eukaryotes is integrated into a dynamic network of vesicle trafficking and control which is rare in prokaryotes (reviewed in (29)); however, some of the core proteins and structural elements of a cytoskeleton are also found in prokaryotes (30 − […] Epulopiscium fishelsoni has an internal tubule system so similar to that of eukaryotes that it was initially mistaken for a protozoan (36, 37). These examples all suggest that complex structure per se follows from large size, rather than large size following from internal structure.

It is widely accepted that the modern eukaryotic cell evolved by a series of endosymbiotic events (38, 39). Recent insights gained from molecular biology show examples of endosymbiotic bacteria that live inside other bacteria (40 − […] − 47) suggest that the endosymbiotic event, and the consequent development of internal membrane-bound energy-generating organelles, enabled […] the ability to generate energy from intracellular membranes acquired through endosymbiosis is key, as more genes imply more proteins and proteins require energy to make. We find this theory lacking for three reasons. Firstly, the majority of genes in the larger eukaryotic genomes do not code for protein – complexity comes from non-coding RNA genes and regulatory elements, as discussed above. Secondly, most of the coding genes in any one cell are not transcribed; indeed the whole reason to maintain a complex genetic apparatus is so that different subsets of genes can be expressed at different times. Genomes containing more coding sequences do not make more proteins at any one time. Lastly, protein synthesis is the major use of cellular energy only in autotrophic bacteria grown under conditions of unlimited nutrition. Under more normal conditions of growth, protein synthesis is rarely observed to consume more than 20% of the cell’s energy, and of course in non-growing cells (which is most cells in the biosphere most of the time) protein synthesis is only needed for maintenance and turnover, a minor part of the overall ‘maintenance energy’ (48 − […]

We attempt to model the evolution of the control logic of genes under selective pressure. As a balance between the need for computability on one hand and the need for biological ‘realism’ on the other, we chose the ‘classical’ operon as the basis for the model structure. A series of sequences upstream of the coding sequence can bind proteins which allow, promote or catalyse transcription (positive elements) or which can bind proteins that retard or prevent transcription (negative elements). A similar process applies to eukaryotic genes in that positive and negative regulatory elements influence the transcription of the gene, although in eukaryotes those regulatory elements may be distant from the gene.
Whether those regulatory elements are active will depend on the proteins in the cell, so that there is feedback between the phenotype and the transcription of the genotype that encodes it. The fitness of an organism depends on the ‘fit’ between its phenotype and its environment, but that environment can change, so the expression of genes must also be influenced by the environment. The model must also be able to be queried for some surrogate of ‘default off’ or ‘default on’ genetics independent of how many genes in an organism are actually transcribed at any one time (which will depend on the demands of the environment).

The properties of the model are summarised in Figure 2A.

Figure 2: Summary of model structure. A) Overall design philosophy, showing feedbacks between genotype, its encoded phenotype, and the environment that it must fit. B) Summary of model components. See text for details. Panel A labels: Genotype, Phenotype, Environment, Transcription, Sensing / Control, Fitness, Selection. Panel B labels: Population, Organism, Gene (positive regulatory element, negative regulatory element, 'coding' segment), Transcript, Environment (+ve and -ve input, environmental map).
To capture the requirements above, the model was constructed as follows.

For simplicity, everything in the model is a string of one type. Thus the phenotype is a set of strings of the same sort as the genotype. The strings are made up of different characters; there can be any number of types of characters (if the strings were to mimic DNA or RNA, the number of character types would be 4; the model was run with the number of character types ranging from 2 to 16). There is no equivalent of protein translation in the system. The model consists of a number of organisms; in this initial implementation there were only 5 organisms, for computational reasons. The organisms exist in an environment. Each organism contains a number of genes which together comprise its genotype; in the runs reported here, organisms contained 25, 50 or 100 genes. Each gene is composed of up to ten positive regulatory elements, up to ten negative regulatory elements, and a coding sequence. The coding regions of the genes that are active at any one time together comprise the organism’s phenotype. The organism’s fitness is the match between its phenotype and its environment, as follows. The environment comprises positive elements, negative elements, and signalling elements. Fitness is the number of positive environmental elements that match the current phenotype minus the number of negative environmental elements that match the current phenotype. (This reflects the fact that sometimes having a function in a cell can be detrimental to the cell; were this not so, in our model all cells would express all genes all the time for maximum fitness.)

Gene expression is controlled as follows. A regulatory element is active when either it matches an environmental signalling element or it matches the phenotype. This represents the transduction of an environmental signal into gene activity, and the transduction of internal gene activity into gene activity.
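The matching and fitness rules described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `Gene`, `matches`, `is_active`, `is_transcribed`, `next_phenotype` and `fitness` are hypothetical, and two points are assumptions rather than statements from the text, namely that a "match" means the element occurs as a substring of a target string, and that each cycle's phenotype is computed from the previous cycle's phenotype. The sketch also uses the transcription rule stated in the next paragraph of the text (a gene is transcribed when its active positive elements outnumber its active negative ones).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    positive: List[str]  # positive regulatory elements (up to ten in the paper)
    negative: List[str]  # negative regulatory elements (up to ten in the paper)
    coding: str          # coding sequence


def matches(element: str, targets: List[str]) -> bool:
    # ASSUMPTION: "match" = the element occurs as a substring of some target.
    # Empty (deleted) elements never match.
    return bool(element) and any(element in target for target in targets)


def is_active(element: str, phenotype: List[str], signals: List[str]) -> bool:
    # Active if it matches an environmental signalling element or the phenotype.
    return matches(element, signals) or matches(element, phenotype)


def is_transcribed(gene: Gene, phenotype: List[str], signals: List[str]) -> bool:
    # Transcribed when active positive elements outnumber active negative ones.
    n_pos = sum(is_active(e, phenotype, signals) for e in gene.positive)
    n_neg = sum(is_active(e, phenotype, signals) for e in gene.negative)
    return n_pos > n_neg


def next_phenotype(genes: List[Gene], phenotype: List[str],
                   signals: List[str]) -> List[str]:
    # ASSUMPTION: the new phenotype is derived from the previous cycle's phenotype.
    return [g.coding for g in genes if is_transcribed(g, phenotype, signals)]


def fitness(phenotype: List[str], env_positive: List[str],
            env_negative: List[str]) -> int:
    # Matching positive environmental elements minus matching negative ones.
    gain = sum(matches(e, phenotype) for e in env_positive)
    cost = sum(matches(e, phenotype) for e in env_negative)
    return gain - cost


# Toy example with a two-character alphabet plus one signalling string.
genes = [
    Gene(positive=["AB"], negative=[], coding="ABBA"),
    Gene(positive=["BB"], negative=["A"], coding="BABA"),
]
signals = ["CAB"]
phenotype = next_phenotype(genes, [], signals)
score = fitness(phenotype, env_positive=["BB", "AB"], env_negative=["AAA"])
```

In this toy case only the first gene is transcribed: the second gene's single negative element is switched on by the signal, so its positive elements can never outnumber its negative ones.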
If the number of active positive regulatory elements exceeds the number of active negative regulatory elements then the gene is transcribed and its coding sequence is added to the phenotype.

The model is seeded with random strings. At each cycle a new phenotype and a new fitness are computed for each organism, and the most fit organism randomly replaces one of the other organisms (which can include self-replacement). The organisms are then mutated by making small, random changes (character changes, insertions or deletions, with a bias of 6:4 deletion over insertion) to a fraction (typically between 10^−[…] and 5 × 10^−[…]) of the strings in the genotype, or completely deleting one of them (typically with a probability between 10^−[…] and 5 × 10^−[…]).

The model components are summarised in Figure 2B.

We begin by showing that the model produces results that are consistent with adaptation, i.e. with changing from an initial random state to a state where the average fitness of the organisms is greater than it was at the start. We emphasise that changes made to the components of the model are entirely random; there is no directionality in the model except a slight bias towards gene shrinkage noted below. Both the initial genome and the environmental factors that the genome has to adapt to are randomly generated as well. Adaptation is therefore the result of selection for better ‘fitness’. We can measure the ‘degree of perfection’ $P$ of an organism as the evolved fitness $F$ expressed as a fraction of the maximum possible fitness, which is the number of environmental factors $E_f$, i.e. $P = F / E_f$. Some example fitness curves are shown in Figure 3. Figure 3A shows a typical curve that reaches a plateau of fitness and then does not achieve any greater fitness in the run. Figure 3B shows a curve that is similar up to ∼500'000 generations, after which a new increase in fitness is observed. Figure 3C shows the decomposition of the fitnesses to each of three environments in a model, together with the average across all environments. In this run, the organism is tested against one of three unrelated environments; the environment that the organism has to match changes every two generations. Note that fitness to each environment does not increase in parallel – sometimes selection has resulted in better fitness for one environment, sometimes for another. Fitness for an environment can actually decline if overall fitness does not decrease substantially. Figure 3D shows the separate fitness trajectories of five organisms as they evolve in a single environment. Again, individual organisms can lose fitness, but the population trend is usually to increasing fitness. Lastly, Figure 3E shows a model that has not evolved significantly. Most of the change in fitness in Figure 3E appears to be noise, and fitness wanders around a low average ($P \sim 0.04$ in this case, as $E_f = 100$).

Figure 3: Examples of fitness plots for different runs of the model. A) Average fitness of the population converges smoothly on a maximum. B) Average fitness shows a jump in adaptation at 500'000 generations. C) Convergence of average fitness across three environments (environments 1, 2 and 3, and their average), showing divergent adaptation to each of the environments. D) Fitness of each of the five organisms making up the population plotted separately in a run that converges on a solution. E) Plot of fitness in a run that fails to converge on an optimum fitness. X axis: time (millions of generations). Y axis: average fitness (organism (individual) fitness in D).

Failure to adapt
Models did not converge onto a fit state ∼[…]/[…] […] $C_p$ as follows: we define a time $P$ as the time at which the population reaches a plateau of adaptation, i.e. does not appear by inspection to be able to increase its adaptation. $C_p$ distinguishes between populations that smoothly approach such a fitness plateau, such as shown in Figure 3A, and populations whose fitness fluctuates, such as in Figure 3E. Thus, if the fitness at times 0.[…]$P$, 0.[…]$P$, 0.[…]$P$ and $P$ are $A$, $B$, $C$, $D$ respectively, and $s(x)$ is the sign of $x$ (such that $x > 0 \Rightarrow s(x) = 1$; $x < 0 \Rightarrow s(x) = -1$; $x = 0 \Rightarrow s(x) = 0$), then

$$C_p = s(B - A) + s(C - A) + s(D - A) + s(C - B) + s(D - B) + s(D - C).$$

If $A < B < C < D$ (i.e. fitness is increasing throughout the run), then $C_p = 6$.

For this analysis, runs of the model were only used if the curve parameter $C_p$ was greater than zero. 101 out of 118 runs of the model met this criterion. Omitting curves for which $C_p \le$ […] × […] generations. (For comparison, the long-term evolution experiments performed by the Lenski lab have been running for more than 20'000 generations, and show a range of adaptations in gene control structure without changing the underlying mechanisms or logic of the gene control architecture (68 − […]

Figure 4: Selection is inefficient for runs with a combination of high environmental complexity and low genome complexity. X axis: genome complexity (number of genes in each organism times the number of types of bases of which those genes are made up, times the average length of the genes at the end of the selection process). Y axis: environmental complexity (number of environmental factors that must match the genotype, multiplied by the number of different environments to be fitted, the number of base types in each sequence, and the length of the environmental strings to be matched). The ‘perfection index’ (fitness at selection plateau divided by maximum possible fitness) is proportional both to the circle areas and, for enhanced readability, to the colour scale (vertical bar).

There is a slight bias in the mutation mechanism towards gene shrinkage, included because a) this is seen in real mutation rates and b) it protects the model against indefinite expansion of genes through ‘drift’. Despite this, the average length of coding regions tends to increase with model progression (Figure 6). This is explicable as follows. Gene activation depends on matching part of an expressed coding sequence to a regulatory sequence. Thus larger genes mean a greater chance of productive interaction with a […]

Figure 5: Average fitness of the model runs. Curves are normalized to maximum fitness = 1. The X axis shows the fraction of time until the fitness reaches a stable plateau. Y axis: average fitness +/- STD. The average fitness across all 118 model runs was calculated for time ∈ [0, 0.[…]], time ∈ [0.[…], 0.5], time ∈ [0.[…], 0.[…]], time ∈ [0.[…], 1] and time > 1. Error bars are standard deviations. Blue curve: all 118 model runs. Orange curve: 101 model runs for which the curve parameter $C_p > 0$.

The computational effort to exhaustively analyse the entire control network of up to 100 genes, each interacting with up to 20 control elements in each gene, in each of 5 organisms in 100 runs of up to 50'000 time steps each is unrealistic, and so we summarise the overall style of control as follows.
Average control element length. A short regulatory element is more likely to match a sequence in the phenotype than a long regulatory element, because a short string is more likely to match a random target by chance than is a long string. (Consider the chance that the strings “A” and “ALPHABET” will match the text in this paper.) Thus if negative regulatory elements are on average shorter than positive regulatory elements in a gene, it is likely that the gene is not active. Thus the ratio

$$R_a = \text{ratio (average regulatory element length)} = \frac{\sum \text{neg. elem. len.} \,/\, \sum \text{neg. elem. count}}{\sum \text{pos. elem. len.} \,/\, \sum \text{pos. elem. count}}$$

is a measure of the bias towards inactive genes, i.e. of ‘default off’ regulatory style. (Zero-length elements, i.e. ones which have been deleted, are not counted in the average.)

Figure 6: Length of the coding sequences averaged across all genes in a run (Y axis: average coding sequence length at fitness plateau) as a function of the starting length of those sequences (X axis). Circle sizes and color scale show the number of genes in the run, and show no distinct pattern.

Average minimum control element length. The problem with average regulatory element length as a measure of control logic is that one short regulatory element (likely to be active) can dominate the control of a gene over a number of long regulatory elements (which are in effect ‘junk DNA’, never being active). So the shortest regulatory element is the one most likely to be ‘active’ in a gene. If the shortest positive regulatory element is shorter than the shortest negative regulatory element, then there is a greater chance that the gene will be active; if the shortest negative regulatory element is shorter than the shortest positive element, then the gene is more likely to be inactive. We therefore adopted a measure looking for the shortest regulatory element in a gene. For each gene, the shortest non-zero control sequence is recorded for positive and negative control elements. The average of the length of the shortest regulatory element for all genes in the genotype is reported. Thus the ratio

$$R_m = \text{ratio (minimum regulatory element length)} = \frac{\left(\sum \text{min. neg. elem. len./gene}\right) / \sum \text{genes}}{\left(\sum \text{min. pos. elem. len./gene}\right) / \sum \text{genes}}$$

is a measure of the bias towards inactive genes, i.e. of ‘default off’ regulatory style. (Again, zero-length elements, i.e. ones which have been deleted, are not counted in the average.)

Correlations of these two measures to both the inputs and the outputs of model runs are provided in Table 1. We emphasise that this is an initial modelling study, and much more extensive modelling with more efficiently coded models and better hardware will be needed to confirm, expand and dissect these findings.
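The two ratios defined above can be computed from a genotype as in the following minimal Python sketch. The data layout (each gene as a dict with hypothetical keys `"pos"` and `"neg"` holding its regulatory element strings) and the function names `r_a` and `r_m` are illustrative assumptions, not the authors' code. One further assumption is flagged in the code: a gene with no surviving elements of a given sign is left out of that sign's minimum-length average, a case the text does not specify. Following the reasoning in the text, values below 1 (negative elements shorter than positive ones) indicate a bias towards inactive genes.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical layout: {"pos": [...], "neg": [...]} lists of element strings.
Gene = Dict[str, List[str]]


def nonzero_lengths(elements: List[str]) -> List[int]:
    # Zero-length (deleted) elements are not counted, as stated in the text.
    return [len(e) for e in elements if e]


def r_a(genes: List[Gene]) -> float:
    """Ratio of average negative to average positive element length."""
    neg = [n for g in genes for n in nonzero_lengths(g["neg"])]
    pos = [n for g in genes for n in nonzero_lengths(g["pos"])]
    return mean(neg) / mean(pos)


def r_m(genes: List[Gene]) -> float:
    """Ratio of the per-gene shortest negative to shortest positive element
    length, each averaged over genes."""
    # ASSUMPTION: genes with no non-zero elements of one sign are skipped
    # when averaging that sign's minimum length.
    min_neg = [min(ls) for ls in (nonzero_lengths(g["neg"]) for g in genes) if ls]
    min_pos = [min(ls) for ls in (nonzero_lengths(g["pos"]) for g in genes) if ls]
    return mean(min_neg) / mean(min_pos)


# Toy genotype of two genes; the empty string stands for a deleted element.
genotype = [
    {"pos": ["ABCD", "AB"], "neg": ["A", "ABCDEF"]},
    {"pos": ["ABC"], "neg": ["AB", ""]},
]
ratio_avg = r_a(genotype)   # (9 / 3) / (9 / 3)
ratio_min = r_m(genotype)   # ((1 + 2) / 2) / ((2 + 3) / 2)
```

The toy genotype illustrates why the text prefers the minimum-based measure: its average lengths are balanced, yet each gene's shortest negative element is shorter than its shortest positive one, which only the second ratio picks up.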
However, three patterns are clear from the results here.

Table 1: Correlations between two measures of genetic ‘style’ and some inputs and outputs from models.
(a) Environmental complexity is the [number of characters] × [number of environments] × [number of factors per environment];
(b) Genome complexity = [number of characters] × [number of genes] × [average length of coding regions];
(c) Regulatory complexity = [number of characters] × [number of genes] × [average length of regulatory elements].
“Min –ve” = average length of the shortest negative regulatory element in each gene, averaged over all genes. “Min +ve” = average length of the shortest positive regulatory element in each gene, averaged over all genes. “Avg –ve” = average length of all non-zero-length negative regulatory elements in the genome. “Avg +ve” = average length of all non-zero-length positive regulatory elements in the genome. Significance of the correlations (i.e. chance that the observed correlation is seen in 101 model runs if the measures of control logic are not correlated with the input parameters): * = p < […], ** = p < […], *** = p < […], **** = p < […].

Model parameters                 Measures of genetic control logic
                                 min −ve / min +ve (< […])    […] (< […])
Starting parameters
  Number of environments         −[…]                         −[…]
  […]
Parameters at fitness plateau
  Adaptation at plateau          0.326 **                     0.167
  Fraction of perfection         0.117                        −[…]
  […]

No stochastic model will give the same results in different runs, so it is important to show that the variability in output is not so extreme as to render results uninterpretable. The purpose of this modelling was to test the model concept and provide an initial exploration of parameter space; as a result, only a few sets of runs of the model were replicate runs with the same parameters. We chose three runs that gave different control logic outputs in an initial run and re-ran the same parameters with different starting genomes and environments. The results are summarised in Figure 8. This shows that, while results are variable, the genetic control outputs, and specifically the $R_m$ parameter, are consistent within replicates: replicates of a model run that gave $R_m = 1$ consistently gave $R_m \sim 1$, replicates of a run that gave $R_m > 1$ consistently gave $R_m > 1$, and replicates of a run that gave $R_m < 1$ consistently gave $R_m < 1$.

Figure 7: Relationship between the ratio of minimum negative elements to minimum positive elements (X axis: min −ve / min +ve; < […]) and the fraction of genes expressed (Y axis). Both circle area and color scale are proportional to genome size (25, 50 or 100 genes). “Min –ve” = average length of the shortest negative regulatory element in each gene, averaged over all genes. “Min +ve” = average length of the shortest positive regulatory element in each gene, averaged over all genes.

We have presented a model of the evolution of genome control logic, and an initial analysis of its performance on a small number of test cases. The model performs in a comprehensible way, and evolves fitter organisms. Preliminary statistics suggest that the model performance is stable, i.e. a given set of starting conditions will give a set of outputs more closely related to each other than random, despite the model being a stochastic one.

We emphasise that this is a preliminary exploration of this model only, and much more needs to be done. However, with that caveat, the results show three things of potential interest to the hypothesis that stimulated its creation:

i) The genetic logic a population of organisms evolves is not related to the complexity of the environment it finds itself in. This was unexpected.
ii) The evolved genetic logic is strongly related to the starting and the final, evolved genome complexity. More complex genomes have ‘default on’ logic. This is not predicted by the model, but as the model’s predictions on evolution of genetic logic refer primarily to the acquisition of new genes in the genome, an aspect of evolution not captured here, this does not test the hypothesis.
iii) The strongest correlations with genetic logic are with the fraction of the genome that is expressed.

Figure 8: Reproducibility across runs. A) Example outcomes from all runs with diverse starting conditions, and B) from replicate sets of runs (sets 1, 2 and 3) started from the same set of parameters. X axis: ‘Perfection index’ (fitness at the fitness plateau as a fraction of the maximum possible fitness with those parameters). Y axis: ratio of the average minimum negative regulatory element length to the average minimum positive regulatory element length. Both circle area and color scale are proportional to the number of steps taken to reach a fitness plateau.

Point (iii) above fits with (although is a weak test of) our original hypothesis. It also fills in a significant gap in the hypothesis about why a ‘default off’ logic should be selected. Clearly, an organism cannot evolve ‘default off’ in anticipation of acquiring new genes. However, if a specific combination of environmental and genetic features encouraged the development of ‘default off’ genetics, then such an organism would be pre-adapted for genome complexification by gene duplication and divergence. As noted in the introduction, the majority of genes in eukaryotic genomes are not expressed at any one time. Most of them are ‘off’. Our model appears to be evolving a similar expression pattern in some cases, and in those cases the ‘default off’ logic is selected.

If our results represent the more complex world of real genetics, then we might speculate that organisms living in an environment that occasionally called on a diverse set of genes but most of the time did not require them would feel short-term selective pressure to evolve a ‘default off’ logic. Such an environment could be one in which a heterotroph lived in a community made up of a changing composition of autotrophs, each of which provided a small number of substrates to the heterotroph.
If such a scenario were valid, then we would expect more comprehensive modelling to reveal influences of environmental change on both expressed gene numbers and default logic. Such work is being actively pursued.

References
1. Knoll AH. The Multiple Origins of Complex Multicellularity. Annual Review of Earth and Planetary Sciences. 2011;39(1):217-39.
2. Parfrey LW, Lahr DJG. Multicellularity arose several times in the evolution of eukaryotes (Response to DOI 10.1002/bies.201100187). BioEssays. 2013;35(4):339-47.
3. Rokas A. The Origins of Multicellularity and the Early History of the Genetic Toolkit For Animal Development. Annual Review of Genetics. 2008;42(1):235-51.
4. Dagan T, Roettger M, Stucken K, Landan G, Koch R, Major P, et al. Genomes of Stigonematalean Cyanobacteria (Subsection V) and the Evolution of Oxygenic Photosynthesis from Prokaryotes to Plastids. Genome biology and evolution. 2013;5(1):31-44.
5. Chang Y-j, Land M, Hauser L, Chertkov O, Glavina Del Rio T, Nolan M, et al. Non-contiguous finished genome sequence and contextual data of the filamentous soil bacterium Ktedonobacter racemifer type strain (SOSP1-21T). Standards in Genomic Sciences. 2011;5(1):97-111.
6. Schneiker S, Perlova O, Kaiser O, Gerth K, Alici A, Altmeyer MO, et al. Complete genome sequence of the myxobacterium Sorangium cellulosum. Nat Biotech. 2007;25(11):1281-9.
7. Kolinko S, Richter M, Glöckner F-O, Brachmann A, Schüler D. Single-cell genomics of uncultivated deep-branching magnetotactic bacteria reveals a conserved set of magnetosome genes. Environmental Microbiology. 2016;18(1):21-37.
8. Xu J, Saunders CW, Hu P, Grant RA, Boekhout T, Kuramae EE, et al. Dandruff-associated Malassezia genomes reveal convergent and divergent virulence traits shared with plant and human fungal pathogens. Proceedings of the National Academy of Sciences. 2007;104(47):18730-5.
9. Anantharaman V, Iyer LM, Aravind L. Comparative Genomics of Protists: New Insights into the Evolution of Eukaryotic Signal Transduction and Gene Regulation. Annual review of microbiology. 2007;61(1):453-75.
10. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, et al. The Genome Sequence of Drosophila melanogaster. Science. 2000;287(5461):2185-95.
11. Borukhov S, Lee J, Laptenko O. Bacterial transcription elongation factors: new insights into molecular mechanism of action. Molecular Microbiology. 2005;55(5):1315-24.
12. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860.
13. Ponting CP, Hardison RC. What fraction of the human genome is functional? Genome Research. 2011;21(11):1769-76.
14. Ferdows MS, Barbour AG. Megabase-sized linear DNA in the bacterium Borrelia burgdorferi, the Lyme disease agent. Proceedings of the National Academy of Sciences. 1989;86(15):5969-73.
15. Hinnebusch J, Tilly K. Linear plasmids and chromosomes in bacteria. Molecular Microbiology. 1993;10(5):917-22.
16. Sherman L, Min H, Toepel J, Pakrasi H. Better Living Through Cyanothece – Unicellular Diazotrophic Cyanobacteria with Highly Versatile Metabolic Systems. In: Hallenbeck PC, editor. Recent Advances in Phototrophic Prokaryotes. Advances in Experimental Medicine and Biology. 675: Springer New York; 2010. p. 275-90.
17. Kube M, Schneider B, Kuhl H, Dandekar T, Heitmann K, Migdoll A, et al. The linear chromosome of the plant-pathogenic mycoplasma 'Candidatus Phytoplasma mali'. BMC Genomics. 2008;9(1):306.
18. Kulp A, Kuehn MJ. Biological Functions and Biogenesis of Secreted Bacterial Outer Membrane Vesicles. Annual review of microbiology. 2010;64:163-84.
19. Stolz JF. Bacterial Intracellular Membranes. eLS: John Wiley & Sons, Ltd; 2001.
20. Hanson RS, Hanson TE. Methanotrophic bacteria. Microbiological Reviews. 1996;60(2):439-71.
21. Prust C, Hoffmeister M, Liesegang H, Wiezer A, Fricke WF, Ehrenreich A, et al. Complete genome sequence of the acetic acid bacterium Gluconobacter oxydans. Nat Biotech. 2005;23(2):195-200.
22. Fuerst JA. Intracellular compartmentalization in Planctomycetes. Ann Rev Microbiology. 2005;59:299-328.
23. van Niftrik LA, Fuerst JA, Damsté JSS, Kuenen JG, Jetten MSM, Strous M. The anammoxosome: an intracytoplasmic compartment in anammox bacteria. FEMS Microbiology Letters. 2004;233(1):7-13.
24. van Niftrik L, Geerts WJC, van Donselaar EG, Humbel BM, Yakushevska A, Verkleij AJ, et al. Combined structural and chemical analysis of the anammoxosome: A membrane-bounded intracytoplasmic compartment in anammox bacteria. Journal of Structural Biology. 2008;161(3):401-10.
25. Fuerst JA, Webb RI, Garson MJ, Hardy L, Reiswig HM. Membrane-bounded nucleoids in microbial symbionts of marine sponges. FEMS Microbiology Letters. 1998;166(1):29-34.
26. Schorn S, Salman-Carvalho V, Littmann S, Ionescu D, Grossart H-P, Cypionka H. Cell Architecture of the Giant Sulfur Bacterium Achromatium oxaliferum: Extra-cytoplasmic Localization of Calcium Carbonate Bodies. FEMS Microbiology Ecology. 2019;96(2).
27. Keren R, Mayzel B, Lavy A, Polishchuk I, Levy D, Fakra SC, et al. Sponge-associated bacteria mineralize arsenic and barium on intracellular vesicles. Nature Communications. 2017;8(1):14393.
28. Kunkel DD. Thylakoid centers: Structures associated with the cyanobacterial photosynthetic membrane system. Archives of Microbiology. 1982;133(2):97-9.
29. Jékely G. Origin of eukaryotic endomembranes. In: Jékely G, editor. Eukaryotic Membranes and Cytoskeleton: Origins and Evolution. Berlin: Springer; 2007. p. 38-51.
30. Roeben A, Kofler C, Nagy I, Nickell S, Ulrich Hartl F, Bracher A. Crystal Structure of an Archaeal Actin Homolog. Journal of Molecular Biology. 2006;358(1):145-56.
31. Vollmer W. The prokaryotic cytoskeleton: a putative target for inhibitors and antibiotics? Applied Microbiology and Biotechnology. 2006;73(1):37-47.
32. van den Ent F, Amos LA, Lowe J. Prokaryotic origin of the actin cytoskeleton. Nature. 2001;413(6851):39-44.
33. Erickson HP. FtsZ, a tubulin homologue in prokaryote cell division. Trends in Cell Biology. 1997;7(9):362-7.
34. Ausmees N, Kuhn JR, Jacobs-Wagner C. The Bacterial Cytoskeleton: An Intermediate Filament-Like Function in Cell Shape. Cell. 2003;115(6):705-13.
35. Williams TA, Foster PG, Cox CJ, Embley TM. An archaeal origin of eukaryotes supports only two primary domains of life. Nature. 2014;504:231-6.
36. Montgomery WL, Pollak PE. Epulopiscium fishelsoni N. G., N. Sp., a Protist of Uncertain Taxonomic Affinities from the Gut of an Herbivorous Reef Fish. Eukaryotic microbiology. 1988;35(4):565-9.
37. Angert ER, Clements KD, Pace NR. The largest bacterium. Nature. 1993;362(6417):239-41.
38. de Duve C. The origin of eukaryotes: a reappraisal. Nature Reviews Genetics. 2007;8:395-403.
39. Pittis AA, Gabaldón T. Late acquisition of mitochondria by a host with chimaeric prokaryotic ancestry. Nature. 2016;531(7592):101-4.
40. Thao ML, Gullan PJ, Baumann P. Secondary (γ-Proteobacteria) Endosymbionts Infect the Primary ( ββ