A framework for parsing heritable information
Antony M Jose, Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
Living systems transmit heritable information using the replicating gene sequences and the cycling regulators assembled around gene sequences. Here I develop a framework for heredity and development that includes the cycling regulators parsed in terms of what an organism can sense about itself and its environment by defining entities, their sensors, and the sensed properties. Entities include small molecules (ATP, ions, metabolites, etc.), macromolecules (individual proteins, RNAs, polysaccharides, etc.), and assemblies of molecules. While concentration may be the only relevant property measured by sensors for small molecules, multiple properties that include concentration, sequence, conformation, and modification may all be measured for macromolecules and assemblies. Each configuration of these entities and sensors that is recreated in successive generations in a given environment thus specifies a potentially vast amount of information driving complex development in each generation. This Entity-Sensor-Property framework explains how sensors limit the number of distinguishable states, how distinct molecular configurations can be functionally equivalent, and how regulation of sensors prevents detection of some perturbations. Overall, this framework is a useful guide for understanding how life evolves and how the storage of information has itself evolved with complexity since before the origin of life.
Keywords: systems biology, information theory, homeostasis, transgenerational inheritance, synthetic biology
Introduction
Analyses of living systems from molecular to population scales have revealed information storage and processing across multiple scales as key attributes of life [1]. The need to understand the behavior of a basic unit of life - a single cell - in terms of an integrated framework for information handling has been previously articulated [2–5], but such a framework is yet to be developed. A single cell is often the bottleneck stage that separates successive generations, making it the minimal space for storing all heritable information (see supplemental text for variations on the single-cell bottleneck). Such information in molecules is part of the ‘nature’ of organisms and does not include information transmitted when parents train progeny, which can be considered as ‘nurture’. Cells and more complex living systems can change their information content by learning through interactions with their environment. However, their ability to transmit any such learned information from one generation to the next is limited by the available storage in the bottleneck stage and potentially other system constraints (e.g. inability of learned information to cross generational boundaries) [6]. To appreciate these limits, we need to consider the total amount of information that could be encoded using all molecules in the bottleneck stage. Such joint consideration of all heritable information that is transmissible using molecules will inform how complexity grows over evolutionary time, what constitutes nature versus nurture, and how to synthesize new living systems.
To facilitate discussion of all heritable information, I begin by defining key terms introduced in an earlier article [6]: stores of information, stored information, and cell code.
Stores (n.) of information refer to molecules or arrangements of molecules that hold information. This information can be transferred to other molecules or arrangements and the original store can be degraded or modified after such transfer. Therefore, molecules and arrangements of molecules can have stored (v.) information.
Cell code (n.) refers to the heritable information that encodes the development of organisms from a bottleneck stage, which is minimally a single cell. Similar development in successive generations in a given environment presumably relies on similar cell codes assembled during bottleneck stages (see supplemental text for more on assembly of cell codes).
The information in a cell code can be conceptually separated into two distinct forms [6]. One is the genome sequence, where information is stored in a linear sequence of bases, and the other is the recurring arrangement, where information is stored in the concentrations, configurations, and interactions of molecules in bottleneck stages (see supplemental text on cell code assembly). While the information content in this arrangement and the extent to which it is recreated are currently not easily quantified, it is clear that heredity relies on information that is held in multiple stores and transmitted across generations. This communication of heritable information through the development of an organism from one generation to the next has been likened to the transmission of messages through a communication channel from sender to receiver (e.g. refs. [6, 7]). Just as ‘the fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point’ [8], the fundamental problem of heredity is that of reproducing at one bottleneck stage either exactly or approximately a cell code selected at the preceding bottleneck stage.
Correspondence: email - [email protected], mail - Rm 2136, Bioscience Research Building, 4066 Campus Drive, University of Maryland, College Park, MD - 20742, USA. arXiv [q-bio.OT].
Each choice of molecules and their arrangement in bottleneck stages collectively stores heritable information and forms a message transmitted across generations.
The average information content in a message chosen from among N possible messages is given by the Shannon entropy, $H = \sum_{i=1}^{N} -p_i \cdot \log_2(p_i)$, where $p_i$ is the probability of the $i$th message [8, 9]. If the probability of selecting each message is equal, this expression simplifies to give the maximal information a message can carry, $H = \sum_{i=1}^{N} -(1/N) \cdot \log_2(1/N) = \log_2 N$ bits. Therefore, to determine the maximal heritable information in a living system, we need to enumerate all distinguishable states of its bottleneck stage (i.e., N). This exercise will provide a starting point for the joint analysis of all heritable information that needs to be transmitted across generational boundaries for the reproduction of living systems.

Static and dynamic storage of information
The logical requirements for self-replication have been explored in two-dimensional universes called cellular automata using abstract ‘machines’ [10]. Of particular relevance are self-replicating machines that use the same store of information in two distinct ways: (1) as instructions whose interpretation leads to the construction of an identical copy of the machine, and (2) as data to be copied without interpretation and placed in the copied machine.

Figure 1. Self-replicating ‘machines’ with instructions held in a static tape or in a dynamic tape have been implemented in cellular automata. (A) Implementation of John von Neumann’s design of universal constructor [11]. Top, The universal constructor in the starting configuration. Inset, Schematic of broad regions within the universal constructor. Bottom, The 32 states of the parts that make up the machine (see [11] for the meaning of each color). (B) Implementation of the Langton loop [12]. Left, The loop in the starting configuration. Middle, A replication intermediate showing use of all states. Right, Loops near the end of one round of replication. Bottom, The 8 states used for replicating the loop (see [12] for the meaning of each color). Red bar indicates scale for comparing (A) and (B). See methods for additional details.
This scheme for making self-replicating machines [...]
Individual units in these machines are called ‘cells’, but are referred to as ‘parts’ in this article to avoid confusion with biological cells.
[...] ∼24 hr period in cyanobacteria [13] despite a ∼12 hr cell cycle [14]), in the localizations of molecules (e.g., a ∼40 s period in cell lines that have a ∼24 hr cell cycle [15]), in the collective morphology of embryonic cells (e.g., a ∼ [...] ∼24 hr circadian rhythms in non-cycling neurons [17]).
From these considerations, the following realizations emerge about living systems:
(1) The transmission of form and function across generations relies on many stores of information that cycle with different periods that could each in principle range from less than the duration of one cell division cycle to more than that of one generation.
(2) The relative phases of the many cycles within the continuous lineage of cells between generations create distinct states over time such that the cell code for the development of an organism is approximated at the start of each generation.
Thus, the integrated process of self-replication cannot be artificially parsed into the static genome that holds all the instructions to be interpreted by the dynamic molecular machines in the cell.
Information in self-replicating machines
Consideration of the total information stored in a self-replicating machine can clarify the different stores of information required for replication and sharpen the corresponding unknowns in living systems. For example, consider the self-replicating universal constructor (Figure 1A), which has a ‘machine’ that has 6,329 parts with 32 states per part and uses an instruction tape that has 145,315 parts with 2 states per part. The maximal information stored in this machine could be enumerated by separately considering three different stores that each have analogs in living systems: (1) the configuration or shape of the machine, (2) the instruction tape, and (3) the parts of the machine.
The information stored in the shape of the machine is incalculably large because we have to consider the universe of shapes from which the particular assembly of parts that make the machine was selected (see supplemental text and Supplemental Figure 1 for a proof). This information is akin to the information required for getting together the particular collection of molecules that constitutes each current living system and has accrued since before the origin of life along the lineage of every living system. Because the unknown information in all historical environments (i.e. past available complements of molecules) needs to be taken into account to determine what life accrued bit by bit [18], the magnitude of this information is incalculable.
The maximal information that can be stored in the instruction tape that has N = 145,315 parts with two states each is given by $H = \log_2 2^N = N$ = 145,315 bits. This store is analogous to the linear genome where the information is stored in the sequence of the four bases in DNA (A, T, G, C). For such a genome of length L, $H = \log_2 4^L = 2L$ bits.
The maximal information that can be stored in the machine that has N = 6,329 parts with 32 states each is given by $H = \log_2 32^N = 5N$ = 31,645 bits.

Heritable information in living systems
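Before turning to living systems, the capacity arithmetic used above for the instruction tape, the machine, and a four-base genome can be checked numerically. The following is a minimal sketch rather than part of the original analysis; the function names `shannon_entropy_bits` and `max_information_bits` are illustrative assumptions, and the eight-message uniform example is hypothetical.

```python
import math


def shannon_entropy_bits(probabilities):
    """Average information (in bits) of a message drawn with the given probabilities."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def max_information_bits(num_parts, states_per_part):
    """log2(states ** parts), computed as parts * log2(states) to avoid huge integers."""
    return num_parts * math.log2(states_per_part)


# Hypothetical example: 8 equally likely messages carry log2(8) bits.
uniform_entropy = shannon_entropy_bits([1 / 8] * 8)

# Part counts and state counts are those quoted in the text for Figure 1A.
tape_bits = max_information_bits(145_315, 2)
machine_bits = max_information_bits(6_329, 32)

# One position of a four-base genome, i.e. the 2 in H = 2L bits.
genome_bits_per_base = max_information_bits(1, 4)

print(uniform_entropy, tape_bits, machine_bits, genome_bits_per_base)
```

Working with `parts * log2(states)` instead of forming `states ** parts` keeps the calculation cheap even for a tape with more than 10^5 parts.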
The spatial arrangement of the genome and everything else within the bottleneck stage could change over the course of development such that similar arrangements are reached with a period of one life cycle. As a result, molecules that are part of the recurring cell code could play different roles throughout development and defy permanent classification based on their roles. For example, an abundant maternal RNA that is simply used as a source of nucleotides in the developing embryo could at a later stage become a message that is translated into a protein. Nevertheless, a temporary classification during the bottleneck stage is necessary to enumerate the bits of information stored in cell codes. To facilitate this enumeration in units that are relevant for each living system and its environment, I propose considering entities, their sensors, and the sensed properties.
Entities.
An entity is a molecule or association of molecules within a living system or in the environment that interacts with the living system. A cell code can include entities that are measured through interaction with other entities sometime during the life cycle and also entities that are never measured, which can be considered as byproducts made by the processes of life. Such effectively inert and unmeasured entities could nevertheless non-specifically contribute to molecular crowding at the bottleneck stage and thereby affect interactions among other entities. While the number of all molecules in a cell is large but countable, the combinatorial associations of molecules could make the total number of effective entities (N) larger still. Cellular components that are entities or parts of entities include small molecules such as ATP, water, ions, metabolites, etc., for which perhaps only concentrations are discerned by sensors, and macromolecules such as individual proteins, RNAs, polysaccharides, etc., for which concentrations, sequences, and conformations may all be discerned by sensors.
Sensors.
A sensor is an entity or an association of entities within the living system that responds to changes in other entities with changes in its properties such that these changes can result in subsequent changes in the rest of the living system or its environment. A sensor could sense entities within the system (N total) or in the outside environment (O total) that interacts with the system (e.g., salts, nutrients, etc.). An entity that binds or collides with another entity without any specific downstream consequences is not considered a sensor (e.g. one water molecule bumping into a membrane). An entity could be a part of multiple sensors. For example, a protein complex formed by the association of A, B, and C proteins could be detecting and responding to the concentration of ATP while another protein complex made of A and C could be detecting GTP. Conversely, multiple sensors could be measuring the same entity. For example, the many kinases in the cytosol are all potentially sensitive to the levels of a common pool of ATP. By these definitions, ATP itself can be a sensor because its levels change in response to production by a synthase and this change is communicated to the kinases that respond to changes in ATP levels. All sensors are entities, but not all entities are sensors.
Properties.
A property is an attribute of an entity that is relevant for a living system because a sensor exists that can respond to changes in the values of that attribute. The number of different values for a property of an entity depends on the sensor and on the regulatory constraints of the system. Consider two sensors that can detect changes in the number of molecules of a particular RNA: a protein Lo responds when the numbers increase by 10 and a protein Hi responds when the numbers increase by 100. These two proteins would thus each ‘see’ different numbers of measurable units for the same property (number of molecules) of the same entity. However, not all detectable values for a relevant property of an entity may be attainable because of the regulatory constraints of the system. For example, if the RNA accumulated in steps of 50 molecules at a time, then many of the values measurable by Lo are never available in the living system because the system changes in steps that are larger than the measuring step of the Lo sensor.
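The Lo and Hi example can be made concrete with a short sketch. The steps of 10, 100, and 50 molecules come from the text; everything else is an assumption made for illustration: the ceiling of 1,000 RNA copies is hypothetical, and each sensor is modeled as reporting only which multiple of its step has been reached (integer division), which is one simple reading of "responds when the numbers increase by".

```python
def distinguishable_values(counts, sensor_step):
    """Distinct readings from a sensor that only resolves changes of `sensor_step` molecules."""
    return {count // sensor_step for count in counts}


MAX_COPIES = 1000            # hypothetical ceiling on RNA copy number
LO_STEP, HI_STEP = 10, 100   # resolution of the Lo and Hi sensors (from the text)
SYSTEM_STEP = 50             # the RNA accumulates in steps of 50 (from the text)

all_counts = range(0, MAX_COPIES + 1)
attainable_counts = range(0, MAX_COPIES + 1, SYSTEM_STEP)

# Values each sensor could report if every copy number occurred.
lo_measurable = distinguishable_values(all_counts, LO_STEP)
hi_measurable = distinguishable_values(all_counts, HI_STEP)

# Values each sensor actually reports given how the system changes.
lo_attained = distinguishable_values(attainable_counts, LO_STEP)
hi_attained = distinguishable_values(attainable_counts, HI_STEP)

print(len(lo_measurable), len(lo_attained))  # Lo: measurable vs. attained
print(len(hi_measurable), len(hi_attained))  # Hi: measurable vs. attained
```

Under these assumptions Lo could report 101 values but only 21 are ever attained, whereas all 11 values that Hi can report are attained. The number of values P for a property is therefore set jointly by the sensor and by the regulatory constraints of the system.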
Environment.
Organisms develop as open systems interacting with the environment. Therefore, some entities in the environment are measured and reacted to by the living system throughout development. Even for a constant environment, some entities [...]
For simplicity, the term molecule is used to refer to everything found in a living system that is chemically isolatable such as ions, atoms, and chemically bonded collections of atoms, and is extensible to all factors that remain to be discovered.
Forces.
Living systems can generate and be exposed to many kinds of forces, which are not being explicitly considered in the framework for heritable information developed here. Rather, they are implicitly accounted for in the properties of entities. For example, an entity experiences/exerts gravity because of its mass, electrostatic attraction because of its charge, tension because of its elasticity, etc. Thus, heritable information captured using entities, sensors, and properties can account for relevant forces in the living system or in the environment.
Configurations.
The number of ways in which molecules can be arranged in the bottleneck stage such that they can be distinguished by the system provides an upper bound for the information that can be stored in a cell code, which is the subset of configurations that are nearly reproduced during the bottleneck stage of successive generations. The maximal number of such distinguishable configurations of a living system for a given number of interacting entities in the environment is given by the product of the number of possible genome sequences and the number of possible systems and their coupled environments that can support each genome sequence. Assuming that each system-environment combination generates one characteristic set of unmeasured entities that contribute to crowding effects, the number of distinguishable configurations for a living system and its environment during the bottleneck stage ($C_{tot}$) is given by:

$C_{tot} = \text{genomes} \times \text{system-environments}$

$$C_{tot} = X^{L} \left( \sum_{i=1}^{B} E_i \left( \sum_{j=1}^{S_i} S_j \left( \sum_{k=1}^{P_j} P_k \right) \right) \right) \quad (1)$$

X = number of types of bases in the genome.
L = length of the genome in base pairs.
E = measured entity (total B in the bottleneck stage: $N_b$ in system, $O_b$ in environment).
S = measuring sensor (total = $S_i$ for $i$th entity) = $f(Y)$, where each $Y \subseteq \{E_1, E_2, ..., E_N\}$, i.e. a configuration of entities, N per life cycle.
P = attainable and measurable values of property (total = $P_j$ for $j$th sensor of $i$th entity).

This Entity-Sensor-Property framework enumerates all distinguishable configurations as a product of four terms that encapsulate the maximal numbers of distinct states in two stores of information: $X^L$ enumerates all possible genome sequences, which are replicating stores of heritable information, and $\sum_i E_i \sum_j S_j \sum_k P_k$ enumerates all potentially recurring arrangements of interacting molecules, which are cycling stores of heritable information.
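Because $C_{tot}$ is a product of a genome term and an arrangement term, the corresponding maximal information in bits is a sum of two logarithms. The sketch below illustrates only this bookkeeping. It treats the number of distinguishable arrangements as a single supplied number and does not attempt to evaluate the nested sums of equation (1). The function name and all input values are hypothetical.

```python
import math


def max_configuration_bits(num_base_types, genome_length, num_arrangements):
    """log2(C_tot) for C_tot = X**L * (number of distinguishable arrangements)."""
    genome_bits = genome_length * math.log2(num_base_types)  # log2(X**L) = L * log2(X)
    arrangement_bits = math.log2(num_arrangements)
    return genome_bits + arrangement_bits


# Hypothetical inputs: X = 4 base types, L = 1,000 bases,
# and 1,024 distinguishable arrangements of the remaining molecules.
total_bits = max_configuration_bits(4, 1_000, 1_024)
print(total_bits)  # 2,000 bits from the genome term plus 10 bits from the arrangement term
```

In this form the two stores contribute additively, which makes it easy to compare how much of the upper bound comes from the replicating store versus the cycling store for any assumed values.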
Such enumeration without considering rearrangements of chemical bonds within any molecule can be thought of as biological entropy and is less than the chemical entropy of an organism, which was initially estimated allowing for rearrangements of chemical bonds to be ≈ [...] bits for E. coli [19].
It is clear that the replicating store cannot uniquely predict the cycling store as evidenced by most distinguishable cell types of the human body all having the same genome sequence. However, interdependence of the two stores and compatibility with the perpetuation of life reduces this maximal number of distinguishable states of the bottleneck stage. In other words, fewer configurations can act as heritable cell codes ($\text{Cell Code}_{tot} < C_{tot}$) because of mutual constraints between the arrangement of molecules and the genome sequence in living systems. First, some genome sequences may not be sufficiently complex to support any living system (e.g. a genome of all As, all Gs, all Cs, or all Ts). Second, each genome sequence constrains but does not dictate the number and kinds of entities that could be part of any cell that contains the genome (e.g. DNA sequence constrains RNA sequence, which constrains protein sequence). Third, the genome sequence may also constrain the total number of possible arrangements of molecules within any cell - i.e., the number of cell states and cell types - in a given environment. Fourth, the lineage of cells that connects two generations may be incapable of supporting some cell types because of the need to return to the cell code at the start of each generation within the context of a living system (i.e., some differentiated cell types may be irreversible within the context of the living system although many can be transformed into pluripotent stem cells in vitro [20]). The number of all possible cell codes, however, is likely greater than that seen in evolved organisms because the historical process of evolution is not expected to allow exploration of every cell code (i.e., $C_{tot} > \text{Cell Code}_{tot} > \text{Cell Code}_{evol}$).
Cell codes of varying complexity have evolved over time to specify the development of each organism that has ever existed [6]. Cell codes could in principle differ in the relative amounts of information stored in the genome sequence versus in the arrangement of molecules. The interdependence of these two stores of information invites exploration of the relationship between their storage capacities over evolutionary time. Consider the consequences of adding into a pre-existing cell code a newly evolved gene sequence that codes for a protein.
The number of possible genome sequences of a given length that can support this cell code decreases because fewer distinct genomes can include the gene sequence for the new protein (i.e., total sequences becomes less than $X^L$, Supplemental Figure 2). However, the number of distinguishable arrangements of all molecules can either increase or decrease. Increase can occur because addition of the new DNA sequence, the transcribed RNA, and the translated protein to the contents of the cell could all lead to new interactions with pre-existing molecules (i.e. E, S, and P could all increase, resulting in a larger value for $\sum_i E_i \sum_j S_j \sum_k P_k$). Decrease can occur because these new molecules could constrain the arrangement through regulatory interactions (see section titled Entity-Sensor-Property: insights). Furthermore, the magnitude of changes in $\sum_i E_i \sum_j S_j \sum_k P_k$ depends on the nature of the new gene product (e.g., expression or repression of many gene sequences by a transcriptional activator or repressor, respectively, could lead to large changes). Studies on the origin and evolution of information storage could illuminate trends in the partition of heritable information between different molecular stores and lead to general principles (for related views emphasizing arrangement see [21, 22], genome sequence see [23, 24], anatomy see [25], and energy see [26]). For example, the complexity of cell codes, and thus organisms, may have increased through restriction of the genome sequence along with expansion of the arrangement of molecules as sources of neutral or adaptive variation.

Entity-Sensor-Property: extensions
Several processes in living systems could limit or expand the number of arrangements in the bottleneck stage ($\sum_i E_i \sum_j S_j \sum_k P_k$). Processes that can change the information content of cell codes by decreasing (e.g., self-organization and self-assembly), increasing (e.g., chemical modification) or variably changing (e.g., compartmentalization) entities, sensors, and/or properties are being actively analyzed. Living systems could manipulate heritable information through the regulation of all such processes.
Impact of self-assembly and self-organization.
Order can arise through the spontaneous association of molecules in living systems. Two forms of such spontaneous order have been recognized: (1) self-assembly, which refers to the formation of static structures that are relatively stable (e.g., viruses, flagella) [27]; and (2) self-organization, which refers to the formation of dynamic structures that appear stable (e.g., cytoskeleton, endocytic compartments) [28]. Both forms of order, however, depend on the immediate molecular environment. Therefore, changing the surroundings of a ‘self-assembled’ or ‘self-organized’ structure can result in alternative configurations that may be distinguishable by evolved sensors. For example, cells can use an adaptor protein to modulate the size of vesicles that form through self-assembly [29] and cells can respond to pressure by reversibly disassembling the mitotic spindle that is maintained through self-organization [30, 31]. In this way, living systems can store and retrieve information from self-assembled and self-organized collections of molecules.
Impact of chemical modifications.
Modifications of nucleic acids (mC, hmC, mA, etc.) or proteins (phosphorylation, methylation, glycosylation, etc.) result in new entities with properties that could potentially be measured by sensors. Modified bases on the genome could increase the number of possible spatial arrangements of the genome and its binding partners (i.e., E, S, and P in equation (1)), and could also increase sequence information (i.e., X in equation (1)) if the modification alters base-pairing. Modifications on RNA or proteins on the other hand could either increase or decrease E, S, and P, but always reduce the maximal number of genomes of a given length that could support such a modification because each possible genome would be constrained to include the gene sequence for the enzyme that catalyzes the modification (i.e. total sequences become less than $X^L$). Similar considerations hold for modifications of all other molecules in the bottleneck stage.
Impact of compartmentalization.
Living systems dynamically manipulate which entities come together into organized units and which outputs from these units are subsequently measured. When different subcellular compartments form, the same entity or sensor could be present in two or more different compartments. If two such pools of the same entity are sensed separately during the life cycle of an organism, the total number of possible configurations is effectively increased. Alternatively, many different entities could be encapsulated into one compartment. If only a few aggregate properties of the compartment are sensed during the life cycle of an organism (e.g. droplet sizes of phase-separated aggregates such as RNA granules [32] or numbers of organelles such as mitochondria), the number of distinguishable configurations is effectively reduced.
These different ways of changing entities, sensors, and properties highlight the multiscale nature of living systems and suggest the utility of different Entity-Sensor-Property frameworks at different scales and across scales.
Entity-Sensor-Property: insights
To appreciate some implications of the framework, consider a toy model where the genome sequence and the environment are held constant (Figure 2).
Let the remaining contents of a ‘cell’ include three entities ($E_1$, $E_2$, $E_3$ - three English letters) that can be at four different states (two fonts with upper and lower cases) and be sensed by two sensors ($S_1$ measuring lines and $S_2$ measuring curves). Each state is analogous to different experimentally measurable values for a property of molecules in a cell (e.g., concentration, localization, shape, charge, etc.). Consider the entity $E_1$ in state ‘A’ made of three straight lines. A sensor that measures lines could measure one of numerous possible properties: thickness of lines, color of lines, length of lines, etc. For simplicity, let number be the only property P sensed by both $S_1$ and $S_2$. For example, the value of the property sensed by $S_1$ of $E_1$ in state ‘A’ is 3 and that sensed by $S_2$ of $E_3$ in state ‘c’ is 1 (see Figure 2 for all values).
Sensors can limit information storage.
To calculate the relevant information stored in a system, we need to enumerate the number of different states of the entities sensed by the system (Figure 2). While each sensor can sense one property of each entity, different states of an entity may not always be distinguishable by a sensor (e.g. $S_1$ will measure the states ‘C’ and ‘c’ of $E_3$ as 0 and $S_2$ will measure both as 1). Such indistinguishability is evident in living systems as the requirement for threshold levels of a signal for a detectable response. Thus, ‘thresholding’ by the sensor results in a reduction in the total number of states of the system and thus the storable information (Figure 2). For example, this system of E, S, and P can only distinguish 288 states (≈ [...]

Figure 2. Distinguishable states in a toy model of possible cell codes with a given ‘genome’ and ‘environment’. Three entities ($E_1$, $E_2$, $E_3$), two sensors ($S_1$, $S_2$), and one sensed property (P) are considered. The measurable property values of each entity by each sensor are enumerated ($E_1S_1P$, $E_1S_2P$, ...). Each distinguishable set of property values for all entities defines a distinguishable state. Therefore, the number of distinct elements in a set of the measured values (i.e., $|E_1S_1P|$, $|E_1S_2P|$, ...) can be used to calculate the total number of distinguishable states in the system (4 × [...]

Distinct states may be equivalent.
Selection can impose external constraints on the form and function of a living system. For example, the environment of the system in the toy model might require ‘cells’ with consistent fonts and case for survival. This would result in the survival of cells with ‘ABC’, ‘abc’, etc. as the values for each of the three entities (Supplemental Figure 3). However, because there are two possible cases (upper vs. lower), there would be two distinguishable states that are effectively equivalent for survival in this environment. Such situations could result in unregulated redundancy such that similar functions are performed by different molecules in random sets of cells [33]. Over evolutionary time scales, this type of unregulated redundancy could result in organisms with similar form and function but different underlying molecular mechanisms [34]. These considerations also hold when such equivalency is imposed by sensors that fail to distinguish different entities. For example, a channel protein responding to changes in membrane potential would measure changes in different ions as equivalent as long as the end result was a similar change in potential [35].
Figure 3. Regulation reduces the number of states that can be sensed by the system. (A) Two states for each entity - high (upper case) or low (lower case) - were considered for exploring the impact of regulation in the toy model. (B) Consequences of introducing regulatory constraints. Inhibition (bar) or activation (arrow) between all pairs of entities were considered. Matrices of distinguishable values (product of all = number of states) for each cell with regulatory interactions between each of the three pairs of entities (top, middle, and bottom) are shown. Different regulatory constraints result in differential reduction in the number of states of the system.

Regulation reduces sensed states.
The different states of each entity could be classified as high (upper case) or low (lower case) to simplify the analysis of regulation in the system (Figure 3A). This simplification is similar to Boolean networks that have been used to explore the impact of regulation [36]. Every addition of either activation or inhibition as a regulatory interaction between two entities reduces the number of distinguishable states in the system (Figure 3B). This reduction occurs because any regulatory interaction between two entities couples changes in those entities. As a result, two entities that were previously free to vary independently become either directly or inversely correlated, leading to an overall reduction in the number of possible states. Different regulatory architectures can lead to different states with equivalent capacity for information storage. Specifically, 12 different single regulatory interactions in the toy model lead to only 3 different storage capacities - 96, 216, or 256 states (Figure 3B). Adding two regulatory interactions results in all 36 different regulatory architectures having only 96 distinguishable states (Supplemental Figure 4). These results suggest a preliminary conclusion: regulation reduces the ability of systems to store information in the arrangement of molecules.
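The coupling argument can be illustrated with a deliberately simplified Boolean sketch. It is not the toy model of Figure 3 and does not reproduce the 96, 216, or 256 states reported there, which depend on the values measured by the sensors. Here each of three entities is only high or low, and a regulatory interaction is assumed to couple two entities perfectly; the helper names `count_states`, `activates`, and `inhibits` are hypothetical.

```python
from itertools import product

ENTITIES = ("E1", "E2", "E3")


def count_states(constraints=()):
    """Count high/low assignments of the entities that satisfy every regulatory constraint."""
    count = 0
    for levels in product((False, True), repeat=len(ENTITIES)):
        state = dict(zip(ENTITIES, levels))
        if all(rule(state) for rule in constraints):
            count += 1
    return count


def activates(source, target):
    """Assumed perfect activation: the target is high exactly when the source is high."""
    return lambda state: state[target] == state[source]


def inhibits(source, target):
    """Assumed perfect inhibition: the target is high exactly when the source is low."""
    return lambda state: state[target] != state[source]


unregulated = count_states()
one_interaction = count_states([activates("E1", "E2")])
two_interactions = count_states([activates("E1", "E2"), inhibits("E2", "E3")])

print(unregulated, one_interaction, two_interactions)
```

With these assumptions the three independent entities have 8 states, one interaction leaves 4, and two interactions leave 2. Activation and inhibition remove the same number of states and differ only in whether the coupled entities are directly or inversely correlated.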
Reducing states may promote robustness.
Robustness is the ability of living systems to remain similar despite some variation introduced by environmental or internal conditions [37]. In other words, some changes either do not alter anything about a robust system or can alter some entities but nevertheless do not substantially affect the system. The differences in the number of states in cells with different regulatory architectures (Figure 3B) suggest a relationship between regulation and robustness of cell types. Unlike in the toy model, in living systems, all sensors are made from entities (equation (1)). Therefore, cell types with fewer states could be more robust because they are only capable of sensing, and thus responding to, fewer perturbations. Conversely, cell types with many states could be less robust because they are capable of sensing and responding to many perturbations. Changes in regulatory architectures could therefore be used to generate cell types that are differently responsive to external signals, which may have implications for the observed robustness of development [38]. To achieve such robust development, entities need to be assembled into cell codes such that the same sequence of events unfolds despite some perturbations. Storing entities as perturbation-resistant assemblies or combining entities that fail under some conditions with entities that fail under other conditions (redundancy) are possible ways to ensure robust cell codes and subsequent development. An additional possibility suggested by these observations is reducing the number of sensors through increased regulation such that some perturbations are simply not sensed.

Two-base genomes could be part of efficient living systems
Our current ability to exquisitely edit genomes and transcriptomes [39, 40] is a limited manipulation of living systems in that the outcome of the edit is entirely determined by how the living system interprets the change. In other words, we can make changes to a sequence and read out what the living system does with the changed sequence but we cannot yet make changes that instruct a living system to perform arbitrary tasks. Such expanded manipulation could require ways of increasing the complexity of the stored heritable information. As suggested by equation (1), this increase could be achieved by either increasing storage in the genome sequence or by increasing storage in the arrangement of molecules. Increases in the storage capacity of a genome by increasing the number of different bases will require concomitant increases in the complexity of the machinery for accurate reading and writing of the genome. For example, a 16-base genome of length $L$ has four times the capacity of a 2-base genome ($4L$ bits vs $L$ bits). However, such a genome would require machinery for discerning eight times as many kinds of bases. Furthermore, the range of availabilities of bases that can support the enhanced information storage decreases with an increase in the number of different bases (Figure 4).
Perhaps the simplest route to the synthesis of living systems with arbitrary capacity to store heritable information would be to use a longer 2-base genome that is equivalent to the natural 4-base genome (e.g. needs 5-base codons for encoding at least 20 amino acids, $2^5 > 20 > 2^4$), but can be supported by simpler machinery in the cell to read and write the genome. Testing the practicability of this speculation requires systematically changing the chemistry of the genome and the cell while preserving overall storage capacity.
Figure 4. Increases in the types of base pairs in a genome increase the maximal information that can be stored per base but decrease the mean variation in base availability that can support the high information storage. Plot showing how maximal information ($H$) of a base pair varies with the standard deviation ($\sigma$) of base probability. For each n-base system (n = 2, 4, 6, 8, 10), the results for one million base probabilities drawn from a uniform distribution are plotted.
Discussion
By jointly considering all information transmitted from one generation to the next using molecules, I have developed an expanded view of heredity (see supplemental text for other applications). Heritable information stored outside the genome sequence is limited by mutual constraints with the sequence, by regulatory architectures, and by what a living system can sense about itself and its environment.
Strengths and limitations of framework.
Entities and sensors in a cell were parsed based on their roles at a particular time in development - the bottleneck stage. However, the roles of entities and sensors are potentially interconvertible over time. A sensor could become an unresponsive entity for a while and an entity could become a responsive sensor when it encounters another appropriate entity. Such changes in roles are likely part of the changes during development that lead to the assembly of cell codes at the start of each generation. Given this time-bound nature of entities and sensors, what is the duration of a bottleneck stage? This question is currently very difficult to answer and poses a practical problem for unambiguously defining the cell code of an organism. Nevertheless, the stability of cell types suggests that functionally important states are preserved for significant periods through homeostasis.
The framework presented here does not account for the stochastic and noisy nature of all interactions within a cell. For example, there are fundamental limits to control that result from information loss [41] and the physical limits of biochemical signaling [42, 43]. Unlike in man-made communication systems, the presence of numerous simultaneous signaling pathways in living systems - including as yet unknown pathways - makes it unclear whether any observed variation in one signaling pathway should be characterized a priori as interference from another signaling pathway or as noise. Nevertheless, developing an understanding of heredity in terms of genome sequence, entities, sensors and properties is a first step towards future extensions of the framework that could address these issues.
Some past frameworks for analyzing living systems provide conceptual structures for explaining their evolution and behavior but do not inform their construction or origin. Models that analyze evolutionary outcomes regardless of the material basis of genotype and phenotype (e.g., ref. [44]) are useful guides for the analysis of organisms at the population level but not for the construction of organisms from molecules sought here. Phylogeny, architecture, and adaptation have been combined to understand trends in the evolution of form [45], but such models are currently not fine-grained enough to enable construction. The productive analysis of complex systems by partitioning a system into abstract nodes and edges to view particular aspects of living systems as networks [46] has generated intuitions and approaches that could be extended to the framework presented here. Such extension beyond abstract networks is necessary to enable the construction of living systems because typical abstractions do not incorporate all relevant properties of cellular contents. The explicit consideration of relevant properties for all entities that are measured by sensors in the framework presented here could help in accruing knowledge in a form that is useful for the construction of living systems and for the realization of a practical systems biology [47, 48].
Synthesis of living systems.
Building something using its constituent parts is a good way to discover the flaws in our understanding of how it is put together. For example, it is currently unclear if perfect self-replication ever occurs in living systems. Perhaps the perpetuation of life is always associated with having entities that are not recreated with a period of one generation but rather with longer or shorter periods. For example, when the noisy and variable behavior of a synthetic oscillating circuit in E. coli [49] was improved to obtain synchronous long-term oscillations [50], the period of oscillation increased to 14 generations. Such possibilities can be explored by allowing different generation times for the precise recreation of some entities and arrangements in the cell code. The similarity in form and function of parent and progeny, however, suggests that the cell codes recreated with a period of one generation are at least nearly equivalent for specifying development in each generation.
Evolved cell codes are unlikely to be efficient stores of heritable information because of the historical measures and counter-measures through which evolution proceeds [51, 52]. Efficient storage of the mutual information between two variables can be achieved using a compressed bottleneck variable [53, 54]. If there was selection for effectively packing maximal information into the bottleneck stage in living systems, the entities and arrangements of evolved cell codes could similarly be efficient stores of the mutual information between the past and the future. All such efficient cell codes might have characteristics similar to those observed in cellular automata in which the capacity to support computation emerges (captured in the λ parameter in [55]). Despite the possibility of such overall optimization, it is unclear if living systems can evolve to maximally optimize information storage and/or transmission for a particular trait. In fact, it might be difficult to define what the ‘optimum’ is for a process because the presence of many homeostatic mechanisms in cells, including transgenerational homeostasis [6], requires opposing processes that could limit optimality. Experimental approaches that attempt to generate minimal bacterial cells [56] need to be extended to different organisms to discover how the complexity of organisms scales with their cell codes.
Making efficient living systems of arbitrary complexity requires a holistic approach to information handling. The joint consideration of all heritable information presented in this article suggests that a genome with two different kinds of bases might function as an efficient replicating store when combined with the simplest possible cycling stores (Figure 4). Thus far, experimental approaches to fundamentally change heritable information have focused on increasing the storage capacity of the genome. A 50% increase in the storage capacity of DNA sequence (from 2 to 3 bits per base) can be achieved by doubling the number of different bases in DNA [57]. Furthermore, an organism that uses a 4-base genome can be modified with two additional DNA bases to successfully store [58] and retrieve [59] information. In contrast, we cannot yet engineer such increases in the information stored by the arrangement of molecules because our knowledge of this store of heritable information is in its infancy. The theoretical and practical limits of varying all heritable information deserve exploration to understand the evolution of natural, modified, and synthetic living systems.
Acknowledgements
I thank Tom Kocher, Karen Carleton, Charles Delwiche, David Wolpert, Chris Kempes, Michael Lachmann, Artemy Kolchinsky, James Yorke, Daniel Damineli, Pierre-Emanuel Jabin, Katerina Ragkousi, David Jordan, KP Mohanan, LS Sashidhara, Sudha Rajamani, Jyotsna Dhavan, Michael Levin, Alejandro Sánchez Alvarado, Ajay Chitnis, and Victor Ambros for discussions and encouragement; and Tom Kocher, Karen Carleton, Ken Helland, members of the Jose lab, and an anonymous reviewer for comments on the manuscript. Research in my lab is supported by the NIH (R01GM111457 and R01GM124356).
References
1. Tkačik, G. & Bialek, W. Information Processing in Living Systems.
Annual Review of Condensed Matter Physics Journal of Theoretical Biology
Nature
Systems and Synthetic Biology Philosophical Transactions of the Royal Society B
BioEssays
IEEE Transactions on Information Theory
The Bell System Technical Journal
Elements of information theory (Wiley-Interscience, Hoboken, New Jersey, 2016).
10. Sipper, M. Fifty years of research on self-replication: an overview.
Artificial Life Artificial Life Physica D
Science
Proceedings of the National Academy of Sciences, USA
Journal of Cell Biology
Current Biology
Nature Reviews Neuroscience
PLoS Biology e1001323 (2012).
19. Morowitz, H. J. Some order-disorder considerations in living systems.
Bulletin of Mathematical Biophysics
Cell Stem Cell
Journal of the Royal Society Interface
Wiley Interdisciplinary Reviews: Systems Biology and Medicine, c1410 (2018).
23. Lynch, M. The frailty of the adaptive hypothesis for the origins of organismal complexity.
Proceedings of the National Academy of Sciences, USA
Nature
BioEssays
Journal of the Royal Society Interface
27. Kushner, D. J. Self-assembly of biological structures.
Bacteriological reviews,
Self-organization in nonequilibrium system. (John Wiley & Sons, Inc., New York, New York, U.S.A., 1977).
29. Zhang, B. et al.
Synaptic vesicle size and number are regulated by a clathrin adaptor protein required for endocytosis.
Neuron
Journal of Cell Biology
Journal of Cell Biology
Science eaaf4382 (2017).
33. Ravikumar, S., Devanapally, S. & Jose, A. Gene silencing by double-stranded RNA from C. elegans neurons reveals functional mosaicism of RNA interference.
Nucleic Acids Research
Evolution and Development BioEssays
Journal of Theoretical Biology
Cell
Nature
Science
Nature Reviews Genetics
Nature
Biophysical Journal
Proceedings of the National Academy of Sciences, USA
Proceedings of the National Academy of Sciences, USA
E1940–E1949 (2014).
45. Briggs, D. E. G.
Evolving form and function: fossils and development. Proceedings of a symposium honoring Adolf Seilacher for his contributions to paleontology, in celebration of his 80th birthday (A Special Publication of the Peabody Museum of Natural History, Yale University, New Haven, Connecticut, U.S.A., 2005).
46. Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization.
Nature Reviews Genetics Science
Nature
Nature
Nature
Science
Proceedings of the Royal Society B arXiv. doi: arXiv:physics/0004057 [physics.data-an] (2000).
54. Kolchinsky, A. & Wolpert, D. H. Semantic information, autonomous agency and non-equilibrium statistical physics.
Interface Focus Physica D et al.
Design and synthesis of a minimal bacterial genome.
Science aad6253 (2016).
57. Hoshika, S. et al.
Hachimoji DNA and RNA: A genetic system with eight building blocks.
Science
58. Zhang, Y. et al.
A semisynthetic organism engineered for the stable expansion of the genetic alphabet.
Proceedings of the National Academy of Sciences, USA et al.
A semi-synthetic organism that stores and retrieves increased genetic information.
Nature
A framework for parsing heritable information
Antony M Jose
Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
Supplemental Material
Text, 4 Figures and Figure Legends, Methods, and References.
Supplemental Text
Variations on the single-cell bottleneck
The life cycles of many multicellular organisms include a single-cell stage that limits the transmission of heritable information. However, the amount of information transmitted through this bottleneck stage could vary based on the ecology and developmental strategy of particular organisms.
Growth in a predictable environment that is stable for many generations may reduce the information that needs to be transmitted through the bottleneck stage. The reliable association of microbiomes and other symbionts in each generation could similarly facilitate the reduction of information transmitted through the bottleneck stage. At an extreme, viruses and parasites effectively form joint systems of heredity and development with the organisms they infect.
Developmental strategies can impact the temporal and spatial reach of the information that is transmitted across generations. For example, in a female human fetus, the germ cell precursors that will generate the oocytes have already differentiated, which could facilitate communication from the pregnant mother to the unborn grandchild through shared circulation. Such expansions of the bottleneck stage may increase the complexity of heritable form and function.
While the framework for all heritable information developed in this study is presented for the common case of a single-cell bottleneck between generations, it applies to all alternative configurations of the bottleneck stage.
Cell codes are assembled during development
The form and function of organisms are mostly preserved from one generation to the next. This preservation requires that development begin in successive generations with similar genome sequences and similar arrangements of regulatory molecules. While the genome is copied during every cell division, the arrangement of molecules presumably changes during development but returns to a similar configuration after one life cycle. These recurring arrangements therefore are cycling stores of heritable information that along with the genome sequence form cell codes for the development of organisms in each generation [1].
All cells accumulate molecules using building blocks from the environment in ways that depend on pre-existing molecules within cells. This dependence on prior state means that the current ‘phenotype’ of a cell is determined by the ‘genotype’ and the pre-existing phenotype of the cell. The observed similarity between organisms of successive generations implies that the bottleneck stage needs to have molecules arranged in such a way that similar temporal sequences and spatial patterns ensue during development in successive generations. In other words, cell codes encode both temporal and spatial order in spatial arrangement during the bottleneck stage.
Different collections and configurations of entities within a cell can arise through differences in physical and chemical processes that include: (1) the temporal sequence of binding or chemical reactions; (2) the relative rates of different reactions; (3) the confinement of reactions; (4) the amplification of biases that arise from intrinsic noise; (5) the addition of external entities; and (6) the destruction or secretion of entities. The arrangement of entities in a cell at any moment is an integrated consequence of the historical values of such differences leading up to that moment. An organism can therefore assemble the information for making a similar organism in the next generation by controlling such processes throughout development such that its bottleneck stage contains a well-configured cell code - a spatial representation of the past ready to shape the future.
Information content of a shape
To attempt calculating the information in a particular shape, we need to make assumptions about the universe of shapes from which that shape is drawn. For example, in a two-dimensional cellular automata environment, we could assume that the sets of parts from which the shape is formed are contiguous (i.e., each part shares at least one side with another part) and that rotations of shapes by multiples of 90 degrees are allowed. Given these assumptions, let $S_1, .., S_n$ be sets of parts that can each form one and only one target shape within it. The total number of objects of all sizes and shapes that could be formed using one such set of $S_i$ parts is given by $2^{S_i}$. Let $O_i$ be each such set of $2^{S_i}$ objects (i.e. $|O_i| = 2^{S_i}$). The number of all uniquely shaped objects in all such sets combined is given by $|O_1 \cup O_2 ... O_{n-1} \cup O_n| = U$ - the maximal number of unique objects aggregated from universes that could each contain an object with the target shape once and only once. The maximal amount of information in the target shape is therefore given by $H = \log_2 U$ bits. Three simple cases illustrate how this number scales with the complexity and size of the shape in the cellular automata environment (Supplemental Figure 1).
For a target shape made of 1 part, $U = 2$ and $H = \log_2(2) = 1$ bit. For a target shape made of 2 parts that are next to each other, $U = 4$ and $H = \log_2(4) = 2$ bits. However, for a target shape made of 3 parts that are next to each other in a row (or column), $U = \infty$ because an infinite number of parts that zig zag such that they only share a side with one row part and one column part could belong to the universe from which the shapes are drawn. Similarly, for all machines made of $n > 2$ parts, universes of more than $n$ parts can be constructed to make $U = \infty$. Thus, for the universal constructor made of 6,329 parts that are arranged into a complex shape (Figure 1A), the information content is incalculably large.
Configurations of a gene sequence and its regulators
Basic principles underlying the assembly of entire cell codes may be discoverable through reductionist studies on a few units of heredity. These studies would aim to discover how a unit of heredity is configured in one bottleneck stage, how that configuration changes during development, and how the starting configuration is recovered by the next bottleneck stage. More than a century of experimental analyses supports the usefulness of analyzing units of heredity [1], which were initially called cell elements [2] and can be thought of as having two parts: (1) a gene sequence transmitted between generations as part of the genome sequence; and (2) gene regulators transmitted between generations as part of an arrangement of molecules. The precise limits of a gene sequence have proven to be difficult to establish [3, 4] and the precise configuration of all regulators for a given gene is likely to be similarly difficult to establish. Nevertheless, formulating a unit of heredity that is associated with a gene sequence as a cell element is useful as a practical framework for reductionist studies. Such studies do not need to analyze all entities and their sensors - indeed this is impractical. Instead, subprocesses like transcription, translation, protein localization etc. could be partitioned and then the rest of the entities, their sensors, and sensed properties under study could be analyzed to determine if they are sufficient to account for the observed heritable phenomena. To facilitate such reductionist analyses, a gene sequence and its regulators could be parsed into a provisional entity-sensor-property system and other distantly interacting entities could be considered as part of the ‘environment’. After such simplification, the number of configurations that this provisional entity-sensor-property system can distinguish is given by:
$$c_{tot} = x^{l} \left( \sum_{i=1}^{b} e_i \left( \sum_{j=1}^{s_i} s_j \left( \sum_{k=1}^{p_j} p_k \right) \right) \right)$$
$x$ = number of different types of bases in the gene.
$l$ = length of the gene sequence.
$e$ = measured entity (total $b$ in the bottleneck stage: $n_b$ in system, $o_b$ in environment).
$s$ = measuring sensor impacting regulation of the gene sequence (total = $s_i$ for $i$th entity) $= f(y)$, where each $y \subseteq \{e_1, e_2, ... e_n\}$, i.e., a configuration of entities, $n$ per life cycle.
$p$ = attainable and measurable values of property (total = $p_j$ for $j$th sensor of $i$th entity).
Cell elements that encode the expression patterns of gene sequences would therefore be subsets of $c_{tot}$ that are recreated in successive generations. Progressive application of this framework by considering larger systems successively could provide a principled approach for combining cell elements into cell codes.
Other applications of the Entity-Sensor-Property framework
The three parameters - entities, sensors, and properties - introduced here for measuring cycling stores of information may be applicable to broader non-biological classes of heritable information. As an extreme example, the persistence or evolution of ideas among groups of people could potentially be analyzed similarly. For ideas (or memes [5]) to be transmitted through a book (entity, E), the book needs to be read by a person (sensor, S) and its meaning (property, P) understood. A reader who writes with or without changing the ideas in the original book is effectively transmitting information across one ‘generation’. (Perhaps, the many unread books are akin to the unmeasured molecules that crowd. Such crowding could narrow the focus of the reader on a few books and potentially change the nature of the books that are written.) Collective analysis of many such transmissions through books and other media may provide insights into the origins of a culture or zeitgeist.
The homeostatic preservation of cell codes in successive generations that living systems achieve using their entities, sensors, and properties could inform the design, analysis, and construction of other complex adaptive systems. To apply the Entity-Sensor-Property framework, the information content in a complex system needs to be parsed in terms of these three parameters. Organizations, economies, social networks, ecosystems etc., may all be amenable to such parsing. Indeed, similar sensor-based detection and control has been proposed even for the analysis of human behavior [6].
In cases where the constituent parts of a system are not known or are unknowable [7], simulations exploring a variety of possible entities, sensors, and properties could help constrain hypotheses.
In summary, the framework developed here for heritable information in living systems is applicable across many scales and therefore may be a generally useful lens for viewing other persistent adapting systems.
Supplemental Figures and Figure Legends
Supplemental Figure 1. The maximal information content in non-trivial shapes is incalculably large because the number of universes that can be constructed to contain a target shape once and only once is infinite (see supplemental text for details).
Supplemental Figure 2. Conserved bases among a set of sequences reduce the capacity for storing new information in any one sequence. (A) Sequence bias measures the reduction in the capacity to store information. The bias at any position in a 4-base genome is given by $B_i = H_{max} - H_i = 2 - H_i$, where $H_{max}$ and $H_i$ are the maximal and observed storable information at a position $i$. For a gene of length $l$, the total bias is given by $B = \sum_i B_i = l \cdot \log_2 4 - \sum_i H_i = 2l - H$. (B) Sequence bias reduces available space for storing new information. Left, A set of aligned sequences made of 4 bases with varying degrees of conservation at individual positions. Right, Bits of bias at each position ($B_i$) in the set of sequences depicted as a sequence logo [8] using weblogo [9] without small sample correction. Thus, the greater the conservation among a set of genomes from different organisms, the fewer the bases available for storing new information in the genome that distinguishes each organism.
Supplemental Figure 3. Distinct internal configurations of the toy model may be seen as equivalent by selection. With the two sensors (left) of the toy model, selection for uniform font & upper case (middle) or selection for uniform font & lower case (right) would result in distinct internal states of the cell becoming equivalent. Numbers in matrices are the property values of the contents of the cell in different states (ABC, abc, etc.) as measured by the two sensors.
Supplemental Figure 4. Impact of two regulatory constraints on the toy model. Different inhibition (bar) or activation (arrow) interactions between all three entities were considered. Four different relationships between the three entities can arise when two interactions are added (9 regulatory architectures each). In all 36 architectures, there are only two sets of values that all entities can take (e.g., when $E_1 \to E_2 \to E_3$, then either all entities have the uppercase values or all entities have the lowercase values). This results in the detection of fewer states by each sensor such that every architecture results in the same two matrices of sensed values. Calculating the sum of the products of the elements in each matrix gives a total of 96 states.
Methods
Calculations of states for the toy model of an Entity-Sensor-Property system were performed by hand. Distributions of H and σ were calculated for one million samples of varying probabilities of base composition using the code below and plotted in R [10].
    S = 0; H = 0; data = list()
    for (j in seq(2, 10, 2)) {        # j = number of different bases
      for (i in 1:1000000) {          # one million samples per j
        vals = matrix(runif(j, 0, 1), nrow = 1, ncol = j)
        probs = sapply(vals, FUN = function(x) x/sum(vals))   # base probabilities
        S[i] = sd(probs)              # standard deviation of base probability
        H[i] = prod(-sum(sapply(probs, FUN = function(x) prod(x, log2(x)))))   # information in bits
      }
      data[[j]] = data.frame(S, H)
    }
For Figure 1, images were captured from simulations created in the Golly application [11]. Figure 1A depicts an implementation of von Neumann’s self-reproducing universal constructor at the starting stage. This machine is a modification of the original design by von Neumann, but is regarded as the first implementation of his vision of a self-replicating universal constructor. Minor correction of the published design and a modification that reduces the tape length by ≈ 13% (script by Tim Hutton) were used in this implementation authored by Renato Nobili and Umberto Pesavento. Figure 1B depicts an implementation of the Langton loop. Instructions are stored in a set of dynamic core parts (light blue) that are surrounded by static sheath parts (grey). Each signal packet consists of the signal (yellow, dark blue, orange, magenta, green) followed by a blank (white). This implementation was done by Eli Bachmutsky.
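The per-base information plotted in Figure 4 and the codon-length comparison in the main text can be checked with a short R sketch. It assumes equiprobable bases (the maximal-information case) and uses two hypothetical helper functions, bits_per_base and min_codon_length, that are introduced here for illustration and are not part of the analysis code in the Methods.

```r
# Shannon information (bits) per position for base probabilities p.
# bits_per_base is a hypothetical helper, not from the original Methods.
bits_per_base <- function(p) {
  p <- p[p > 0]            # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

# Maximal information for equiprobable n-base alphabets (assumption:
# all bases are equally available).
n_bases <- c(2, 4, 8, 16)
h_max <- sapply(n_bases, function(n) bits_per_base(rep(1 / n, n)))

# Shortest codon length (in bases) that can distinguish at least
# n_symbols symbols, found by direct search. Also a hypothetical helper.
min_codon_length <- function(n, n_symbols = 20) {
  k <- 1
  while (n^k < n_symbols) k <- k + 1
  k
}

print(h_max)
print(c(two_base = min_codon_length(2), four_base = min_codon_length(4)))
```

Under these assumptions, 2-, 4-, 8-, and 16-base alphabets store 1, 2, 3, and 4 bits per base, a 2-base genome needs 5-base codons to distinguish 20 amino acids, and a 4-base genome needs 3-base codons.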
Supplemental References
1. Jose, A. M. Replicating and cycling stores of information perpetuate life.
BioEssays
Proceedings of the Naturalist Association, Brno
IV, et al.
What is a gene, post-ENCODE? History and updated definition.
Genome Research
Genetics
The selfish gene. p. 192 (Oxford University Press, Oxford, U.K., 1989).
6. Powers, W. T., Clark, R. K. & McFarland, R. L. A general feedback theory of human behavior.
Perceptual and Motor Skills
Essays on life itself (Columbia University Press, New York, New York, U.S.A., 2000).
8. Schneider, T. D. & Stephens, M. Sequence logos: a new way to display consensus sequence.
Nucleic Acids Research
Genome Research
R: A Language and Environment for Statistical Computing
R Foundation for Statistical Computing (Vienna, Austria, 2019).
11. Golly version 3.2. doi: http://golly.sourceforge.net (2005-2018).