Gene network robustness as a multivariate character
GGene network robustness as a multivariate character
Arnaud Le Rouzic
Abstract
Robustness to genetic or environmental disturbances is often considered as a keyproperty of living systems. Yet, in spite of being discussed since the 1950s, howrobustness emerges from the complexity of genetic architectures and how it evolvesstill remains unclear. In particular, whether or not robustness to various sources ofperturbations is independent conditions the range of adaptive scenarios that can beconsidered. For instance, selection for robustness to heritable mutations is likely tobe modest and indirect, and its evolution might result from indirect selection on apleiotropically-related character (e.g., homeostasis) rather than adaptation. Here, Ipropose to treat various robustness measurements as quantitative characters, andstudy theoretically, by individual-based simulations, their propensity to evolveindependently. Based on a simple evolutionary model of a gene regulatory network,I showed that different ways to measure the robustness of gene expression to geneticor non-genetic disturbances were substantially correlated. Yet, robustness wasvariable in several dimensions, and robustness components could evolvedifferentially under direct selection pressure. Therefore, the fact that the sensitivityof gene expression to e.g. mutations and environmental factors rely on the samegene networks does not preclude that robustness components may have distinctevolutionary histories.
Robustness is the capacity of living organisms to buffer internal or environmentaldisturbances. Robustness encompasses, for instance, the ability to maintainphysiological equilibria (homeostasis), to ensure developmental stability, or to repairand mitigate DNA damage in both soma and germline. Although robustness isvirtually intermingled with the definition of life itself, its underlying mechanismsand its evolutionary origins remain far from being clearly understood (Stearns,2002; Masel and Siegal, 2009; Wagner, 2013; Hallgrimsson et al., 2019).Robustness can evolve as a consequence of non-linearities in the developmentalprocess, i.e. changes in the magnitude of the effect of some genetic or environmentalfactor on the phenotype of interest (Nijhout, 2002). The study of the evolutionarymechanisms leading to robustness roots into the conceptual and empirical work byC.H. Waddington and the concept of canalization (Waddington, 1942;Schmalhausen, 1949; Waddington, 1959; Loison, 2019). Canalization is a propertyof complex developmental systems that buffers environmental and genetic variation,and maintains actively the organism in an optimal developmental path. Although1/22 a r X i v : . [ q - b i o . P E ] J a n he scope and the definition of canalization varies substantially among authors,canalization is generally expected to evolve as an adaptation to ”canalizing”selection for an optimal phenotype (Eshel and Matessi, 1998; Debat and David,2001; Flatt, 2005; Klingenberg, 2019). However, formal population genetic modelshave questioned the unicity of the canalization process. In particular, robustness toenvironmental factors appears more likely to evolve as an adaptation thanrobustness to genetic (mutational) disturbances, on which selection seems to berather weak and indirect even in optimal theoretical conditions (Wagner et al., 1997;Hermisson et al., 2003; Le Rouzic et al., 2013).In this context, the evolution of robustness as a general property of theorganisms heavily depends on the genetic and physiological integration of thedifferent robustness dimensions (Fares, 2015). If the robustness to environmentalfactors and to genetic mutations share the same physiological bases, the adaptiveevolution of environmental canalization could generate a correlated response ofgenetic canalization; this hypotheses has been refered to as ”congruent evolution”(Visser et al., 2003). In contrast, if genetic and environmental robustness hadindependent biological bases, they would be featured by independent evolutionarymechanisms, and possibly independent evolutionary histories.Although this issue would benefit from a better theoretical framework, modellingthe evolution of robustness is not straightforward. The simplest approach relies onfocusing on modifiers, i.e. genes that can influence the robustness of the organismwithout affecting the phenotype. However, in the case of genetic robustness,modifier-based models either rely on tricky rescaling or do not dissociate thephenotype and the robustness to the phenotype (Wagner et al., 1997; Kawecki, 2000;Rajon and Masel, 2013). In addition, in models where the genotype-phenotypeassociation is arbitrary, any correlation between environmental and geneticrobustness is a modelling choice, and not an output of the model. More promisingto address the congruent evolution issue are models in which the phenotype is aresult of an integrated process mimicking some developmental or physiologicalmechanism. In such dynamic models, robustness to various disturbances appear asan emergent property of the model complexity, caused by regulatory feedbacks, thatcannot be easily deduced from the model parameters. Although the potentialpalette of relevant dynamic models is large and could include morphologicaldevelopment models (Milocco and Salazar-Ciudad, 2020) or metabolic models(Nijhout et al., 2019), evolutionary biologists have often considered gene regulatorynetwork models as a good compromise between complexity and numericaltractability for studying the evolution of canalization and robustness (Kauffman,1969; Wagner, 1994; Smolen et al., 2000; Le Cunff and Pakdaman, 2012).Such theoretical gene networks have been shown to display enoughnon-linearities (epistasis and pleiotropy) to evolve enhanced or reduced sensitivityto environmental (Masel, 2004; Espinosa-Soto et al., 2011; Espinoza-Soto et al.,2011) and genetic (Wagner, 1996; Bergman and Siegal, 2003; Draghi and Wagner,2009; Azevedo et al., 2006; R¨unneburger and Le Rouzic, 2016) perturbations.Interesting observations suggest that environmental or genetic canalization could becorrelated to other robustness properties in such models. In particular, it has beenshown that network stability, the propensity of the network to maintain stable(non-cyclic) gene expressions, was correlated to robustness, as selection on stabilityalone could drive an indirect response of genetic (Siegal and Bergman, 2002) andenvironmental (Masel, 2004) canalization. In contrast, Odorico et al. (2018) showedthat networks selected to maintain (but not converge to) an equilibrium couldbecome both environmentally sensitive and genetically canalized, suggesting thatenvironmental and genetic robustness could be theoretically decoupled. However,2/22o systematic quantitative description of the pleiotropic pattern underlyingdifferent robustness components has never been attempted.Here, I aim at extending the study of canalization in theoretical gene networksto address the multidimensional nature of robustness, by estimating theevolutionary independence of various robustness components. Fiverobustness-related measurements were considered, two of them corresponding toenvironmental robustness (early vs. late disturbances), two corresponding to geneticrobustness (early — inherited — or late — acquired — mutations), the last one(gene expression unstability) being related to the intrinsic stability of the expressionphenotype. The first part of this study focuses on the multidimensional patterns ofrobustness in small and random networks, and the second part on the evolutionaryconsequences of the pleiotropic nature of robustness, through individual-basedsimulations. The network model is derived from Wagner (1994) and Wagner (1996) and belongsto the family of gene regulatory network models sometimes refered to as ”Wagnermodel” (see Fierst and Phillips, 2015 for a historical record). The main differencewith Wagner (1996) and further work is that the network output (gene expressions)is quantitative and not qualitative, in the same way as in Siegal and Bergman(2002).More specifically, the structure of a n -gene network is encoded as a n × n matrix W , while the state of the network is stored into a vector of size n , P . In this setting, W ij encodes the influence of gene j on the expression of gene i , W ij < W ij > W ij = 0 denotes the absence of regulatory interaction. P i is the expression of gene i ,ranging between 0 (no expression) and 1 (maximum expression).The properties of these gene networks are explored in a discrete dynamic system: P t +1 = F ( WP t ) , (1)where the function F is a vectorized version of a sigmoid scaling function: F ( x , x , . . . , x n ) = [ f ( x ) , f ( x ) , . . . , f ( x n )]; f ( x ) = 11 + λ a e − µ a x , (2)with λ a = (1 − a ) /a and µ a = 1 /a (1 − a ). The function f is scaled such that f (0) = a and df /dx | x =0 = 1; the parameter a thus stands for the constitutive geneexpression (the expression of a gene in absence of regulators), and this functiondefines the scale of the matrix W : W ij = δ ( δ (cid:28)
1) means that the expression ofgene i at the next time step will tend to P i,t +1 = a + δ if i is regulated by a single,fully expressed gene j ( P j,t = 1).Gene networks dynamics started from an initial expression P , and geneexpression was updated for T timesteps. By default, P = ( a, a, ..., a ), since thisstep immediately follows a virtual initial state with no expression. The expressionphenotype corresponding to a gene network was determined by averaging geneexpressions during the last τ timesteps for each gene i : P ∗ i = (1 /τ ) (cid:80) Tt = T − τ P it .3/22 .2 Robustness indicators Five robustness indicators were calculated, corresponding to five different aspects ofgenetic or environmental robustness in a gene network: network stability ρ S ,robustess to early ( ρ E ) and late ( ρ e ) environmental disturbance, and robustness toearly ( ρ M ) and late ( ρ m ) genetic disturbance. All indicators were expressed on ascale homogeneous to log variances in gene expressions; the mode of calculation issummarized in Table 1.Dynamic systems based on the Wagner model often tend to generate limit cyclesand never converge to a stable equilibrium. Network stability ρ S quantifies thecapacity for a specific network to lead to stable gene expressions. For consistencywith other indicators, this unstability was measured as the log squared differencebetween the average expression during the last τ timesteps, and an extra timestep.The robustness to early environmental disturbance ρ E measures the capacity ofa network to reach a consistent final state starting from different initial geneexpressions. In practice, R replicates of the network dynamics were run, in whichthe initial gene expressions ( P ) were drawn into Gaussian ( µ = a, σ = σ E )distributions (expression values < > i was measured as the log variance inthe final gene expression across these replicates.The robustness to late environmental disturbance ρ e measures the capacity of anetwork to recover its equilibrium state after having being disturbed. Geneexpressions after T timesteps were disturbed by adding a random Gaussian noise ofstandard deviation σ e to each gene of the network, and ρ ei was computed for eachgene i as the log variance in gene expression at time step T + 1 over R replicates.The robustness to early mutations ρ M measures the system robustness toinherited genetic mutations (modifications of the W matrix). A random non-zeroelement of the W matrix was shifted by a random Gaussian number of standarddeviation σ M , and its consequences on the mean expression of all network genes wasrecorded. The procedure was replicated R times, and the robustness score ρ M i foreach gene i was calculated as the log variance of gene expression across R replicates.Finally, the robustess to late mutations ρ m measured the effect of mutations inthe gene network W after having reached the final state. In practice, the W matrixwas mutated in the same way as for ρ M with a standard deviation σ m , but itsconsequences on gene expression were calculated for only one timestep, startingfrom the last state of the network. The robustness score was calculated as for otherindicators (log variance over R replicates).All these scores were calculated for every gene i of a given network, and thenaveraged over all genes in order to get a series of summary network descriptors. Themagnitude of the score itself is arbitrary, as it depends on the size of the disturbance.However, indicators happen to increase approximately linearly with the size of thedisturbance (Appendix 1), suggesting the the results would be largely unaffected bya change in the variance of mutational effects and environmental noise. Random networks were generated as n × n matrices filled with independentidentically-distributed random numbers drawn into a Gaussian ( µ = 0 , σ = 1)distribution. A density parameter 1 /n ≤ d ≤ W matrix. Zeros were placed randomly, withthe constraint that all genes should be regulated by at least another one. 4/22ndicator Robustness component Computation Disturbance std. dev. ρ S Expression stability ρ Si = log( P ∗ i − P T +1 ) ρ E Early noise in geneexpression ρ Ei = log[ R − (cid:80) Rr =1 ( P ∗ i,r − P ∗ i ) ] σ E = 0 . ρ e Late noise in geneexpression ρ ei = log[ R − (cid:80) Rr =1 ( P i,T +1 ,r − P i,T +1 ) ] σ e = 0 . ρ M Early (inherited)mutations ρ M i = log[ R − (cid:80) Rr =1 ( P ∗ i,r − P ∗ i ) ] σ M = 0 . ρ m Late (aquired)mutations ρ mi = log[ R − (cid:80) Rr =1 ( P i,T +1 ,r − P i,T +1 ) ] σ m = 0 . i stands for the gene(1 ≤ i ≤ n ), and r for the replicate (1 ≤ r ≤ R ), since all indicators except ρ S were estimated by aresampling procedure. P ∗ i stand for the average gene expression of the network (mean expressionfrom the last τ timesteps), and P i = (1 /R ) (cid:80) Rr =1 P ∗ i,r represents the mean over replicates. Noise ingene expression was simulated by adding a random Gaussian deviation to the initial state of thenetwork (for ρ E ) or to the last state of the network (for ρ e ). Mutations were simulated by adding arandom deviation to a random interaction in the network, either before starting the network dynamics( ρ M ) or after the last time step ( ρ m ). All robustness indicators are homogeneous to a log variancein gene expression; robustness increases when the indicator gets smaller, and sensitivity increaseswhen the indicator increases. The last column indicates the standard deviation of the correspondingGaussian disturbance. The main interest of gene-network models is the complexity and the richness of theunderlying genotype-phenotype relationship. As a side effect, such models are ingeneral difficult to handle mathematically (Carneiro et al., 2011; Le Cunff andPakdaman, 2012). Excluding the one-gene self-regulating case (which already hasnon-trivial mathematical properties, Guyeux et al., 2018), the simplest network(2-by-2 matrix) has four genetic parameters, which makes the exploration of theparameter set tedious. Here, the number of dimensions was restricted byconsidering the set of networks that lead to a predefined arbitrary equilibrium, P θ ∞ = ( P θ , P θ ). As F ( WP θ ∞ ) = P θ ∞ , the W matrix can be reduced to twoindependent parameters, W and W : W = F (cid:20)(cid:18) W AW B (cid:19) (cid:18) P θ P θ (cid:19)(cid:21) = (cid:18) P θ P θ (cid:19) , (3)with A = 1 P θ [ f − ( P θ ) − W P θ ] ,B = 1 P θ [ f − ( P θ ) − W P θ ] , (4) f − ( x ) = − µ a log (cid:16) − xλ a x (cid:17) being the inverse of f ( x ) (equation 2). The W matrixachieving the desired P θ ∗∞ equilibrium from a specific pair W , W always exists(and is unique), but the stability of the equilibrium is not guaranteed. Networkswhich final gene expression P ∗ = ( P ∗ , P ∗ ) differed substantially from the target (inpractice, when | P ∗ − P θ | + | P ∗ − P θ | > .
15) were excluded from the analysis.Such discrepancies correspond to either unstable equilibria (in which case geneexpressions were driven away from the equilibrium) or extreme oscillatory behaviors5/22large oscillations may hit extreme expression limits (0 or 1), which drives theaverage expression away from the target equilibrium).
The evolution of gene networks under various evolutionary constraints was studiedby individual-based simulations. Each individual was featured by its genotype (a n × n W matrix, by default n = 6), its expression phenotype P ∗ , and the fiverobustness scores ρ S , ρ E , ρ e , ρ M , and ρ m . Individuals were haploid and reproducedclonally. Mutations consisted in adding a random Gaussian deviate of variance σ ν to an element of the W matrix, with a rate ν per individual and per generation.Generations were non-overlapping, and population size N was constant. Ageneration consisted in sampling N new individuals among the N parents, with aprobability propotional to the individual fitness. Fitness was computed assumingstabilizing selection around a target (optimal) expression level for n (cid:48) ≤ n genes ofthe network (by default n (cid:48) = 3), as w = exp( − (cid:80) n (cid:48) i =1 s i ( P ∗ i − θ i ) ), where s i was thestrength of stabiizing selection on gene i ( s i = 0 standing for no selection), and θ i was the optimal expression phenotype. The θ i were drawn in a uniform (0,1)distribution at the beginning of each replicated simulation, and the initial genenetwork was empty ( W ij = 0) except for one random element per line, which wasinitialized to match the optimal expression using equation (4), which can betrivially extended for any number of genes (with only one unknown per line).In addition, direct, directional selection on robustness indicators was performedin some simulations, consisting in multiplying individual fitness byexp( (cid:80) x ∈ ( S,E,e,M,m ) β x ρ x ), where β x was the strength of directional (positive ornegative) selection on robustness index x (in practice, β x = ± . https://github.com/lerouzic/robustness ). Random interaction matrices are regularly used in the literature to study thegeneral properties of gene networks (e.g. Carneiro et al., 2011; Pinho et al., 2012).As such, random networks are not expected to reflect the properties ofbiologically-realistic genetic architectures, as biological networks are far fromrandom. However, such an approach helps developing a general intuition about theproperties of the underlying model, especially about the amount of selection it takesto bring a network into its non-random state.Correlations were calculated between all five robustness components over 10,000random networks (Appendix 2). All robustness components were positivelycorrelated. However, correlations ranged from about 0.63 (late genetic vs. earlyenvironmental) to above 0.97 (late environmental vs. late genetic). A PrincipalComponent Analysis (Figure 1) confirms that robustness components were not fullycorrelated. The first PC (82% of the total variance) corresponds to the generalrobustness of the network, and involves all robustness indexes. The remainingvariance is explained by orthogonal vectors separating all other robustness 6/22 r E r e r M r m r S r E r e r M r m r S r E r e r M r m r S r E r e r M r m r S r E r e r M r m r S P C P C P C P C P C % variance0 20 40 60 80 Figure 1: Summary of the principal component analysis on the five robustnesscomponents over 10,000 random 6-gene networks ( µ = 0 , σ = 1). Left: position ofthe five robustness components on all five (normalized) Principal Components (PC); ρ S : Stability, ρ E : Early environmental, ρ e : Late environmental, ρ M : Early genetic, ρ m : Late genetic. Right: relative contribution of the five PCs to the total variance.components. At least 4 out of 5 PCs, explaining 10% to 2% of the total variance,did not vanish when increasing the sample size (Appendix 3). In the following, I considered the particular case of P ∞ = (0.3, 0.6) with twointermediate, distinct gene expressions. Equivalent results could be achieved with adifferent target. Figure 2 illustrates how the robustness components varied in thisconstrained 2-gene network model (red stands for maximum robustness, i.e.minimum scores for ρ S , ρ E , ρ e , ρ M , and ρ m ). In the white regions, the equilibriumwas not achieved in numerical simulations for at least three different reasons(Appendix 4): (i) flutuations around the equilibrium are large enough to hit theedges of the (0,1) interval, shifting the mean expression; (ii) the expressiondynamics was slow and the network was unable to get close to the equilibrium after16 time steps; (iii) the equilibrium was not reachable from the default starting point.It immediately appears that the different robustness components werecorrelated, but did not overlap perfecty. The simulated behavior of networks A to Eare illustrated in Appendix 5, and the corresponding W matrices are provided inAppendix 6. The network denoted as B was robust to most sources of disturbance,while network E was sensitive to all components except stability. Network C wasunstable, but remained relatively buffered. Networks A and D illustrateintermediate loss-of-robustness behaviors, through different mechanisms (unstabilityfor network D, and weak buffering for network A).This 2-gene network analysis thus confirms the results obtained for larger genenetworks. Robustness is thus not a feature of large and intricate geneticarchitectures, as it is already present (and multidimensional) in the simplest genenetworks. The different robustness components are partially independent, even in7/22 − Early Environmental ( r E ) W W − − − − − −15 − − − − − ABCD E −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 − Late Environmental ( r e ) W W − − − . − . − − − . − . − − − . − . − − − . − . − − − . ABCD E −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 − Early Genetic ( r M ) W W − . − − − . − . − − . − − . − − . ABCD E−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 − Late Genetic ( r m ) W W − . − . − . − . − . −6.2 − . ABCD E −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 − Stability ( r S ) W W − − − − − − − − − − − − − ABCD E
Figure 2: Robustness indicators ( ρ S , ρ E , ρ e , ρ M , ρ m ) estimated for an exhaustive continuum oftwo-gene networks with an arbitrary expression equilibrium at P ∞ = (0 . , . The evolution of robustness was studied through individual-based simulations, inwhich all individuals were characterized by their genotype (a 6-gene network) and aset of phenotypes (gene expressions and network robustness). Gene expressions for3 out of 6 genes were under stabilizing selection. In addition to stabilizing selectionon gene expression, robustness indicators were directly selected towards more or lesssensitivity. This settings allows for a slight, indirect selection pressure towards morerobustness to mutations in all simulations, as a result of the selection for geneticcanalization due to stabilizing selection. (lower scores for ρ M , and for the correlatedtrait ρ m , Figure 3, black circles).Direct selection on all robustess components lead to a response, showing thatrobustness is evolvable. Yet, the evolutionary potential of the robustness indicatorsdiffered substantially, as indicated by the differences in the Y-scales. Robustness8/22 − − − − − − Early Environmental ( r E ) Generations 0 2000 4000 6000 8000 10000 − − − − Late Environmental ( r e ) Generations 0 2000 4000 6000 8000 10000 − . − . − . − . Early Genetic ( r M ) Generations0 2000 4000 6000 8000 10000 − . − . − . − . Late Genetic ( r m ) Generations 0 2000 4000 6000 8000 10000 − − − − − − Stability ( r S ) Generations
Figure 3: The evolution of all five robustness indicators (five figure panels) was recorded for10,000 generations in individual-based simulations. The figures show the average robustness over 20replicated simulations. The control simulations (black circles) correspond to stabilizing selection only.Colored symbols correspond to simulations in which various robusntess indicators were selected upor down (upward or downward triangles). Bold-faced symbols indicate the result of direct selection(the robustness indicator displayed in the panel was directly selected), while thin symbols stand forindirect selection (the response is the result of selection on another indicator, which can be identifiedbased on its color).indicators being all homogeneous to a variance of gene expression, they could becompared directly. The most evolvable robustness components are Stability ( ρ S )and early environmental disturbances ( ρ E ), which can differ by up to 25 log units(11 orders of magnitude). In contrast, robustness to late environmental noise ρ e andgenetic changes ( ρ M and ρ m ) only differed by 3 to 4 log units (i.e. a factor 10 to100) after 10,000 generations of selection, although the selection response was stillongoing.Selection on all robustness components also lead to an indirect response of allother components, which confirms a general genetic correlation. Direct selection onenvironmental canalization or gene expression stability can thus induce theevolution of genetic canalization, and vice versa . Indirect responses were generallysmaller or similar to direct repsonses, except when selecting for more sensitivity forearly mutations ρ M . This could a be consequence from the canalizing selection ongene expression, which opposes the direct selection towards decanalized networks.Differences in evolvability can be attributed to differences in the strength ofgenetic constraints, i.e. the propensity to disconnect robustness components.Simulations were run to test the long-term effect of antagonistic selection on allpairs of robustness indicators (Fig. 4). Possible outcome included free evolution9/22 ( r E ) Late Environmental ( r e ) Early Genetic ( r M ) Late Genetic ( r m ) Stability ( r S ) target − & corr − target − & corr + target + & corr − target + & corr + Relative response 0 2000 6000 10000 − − − S t ab ili t y ( r S ) − − − E a r l y G ene t i c ( r M ) − − − − E a r l y E n v i r on m en t a l ( r E ) Generations
Figure 4: Bivariate response to selection (average over 20 simulation replicates) for all combinationsof robustness indicators. Left: relative response (normalized by the up - down difference at generation10,000) to univariare selection (thick horizontal lines) and bivariate selection (symbols). Zero standsfor the control at generation 10,000. Four bivariate selection treatments were simulated: negativeselection of both traits, negative for the target trait and positive for the correlated trait, positivefor the target trait and negative for the correlated trait, and positive selection on both traits.Symbol colors stand for the correlated trait. Right panels: three characteristic dynamics representingdifferent outcome for bivariate response (top: the target trait responded freely to selection in spite ofantagonistic selection on the correlated trait; middle: the target traits responded, but at a lower paceas when selected alone; bottom: the target trait failed to respond to selection when the correlatedtrait was selected in the opposite direction.(the response to antagonistic selection was identical to selection on the target traitonly), constrained evolution (the target trait responded slowly), or no evolution(the constraint was absolute and no response to antagonistic selection was observed).Each outcome was possible in practice (Fig. 4), but selection was able to trigger apartial or full response in the direction orthogonal to the main correlation in mostcases. Most robustness components thus appeared to be at least partially evolvableindependently from each other.
Whether or not various robustness components of genetic architectures areindependent is a key issue to understand why organisms are robust or sensitive togenetic or environmental disturbances. In particular, independent genetic bases ofrobustness components would call for independent evolutionary histories, while apleiotropic genetic architecture could explain the evolution of nonadaptiverobustness components as a result of indirect selection. The analysis of the genetic10/22orrelations between five robustness components, based on a simple gene networkmodel, results in a balanced answer: robustness components appear to be generallycorrelated, but the pleiotropy is not an absolute constraint, and most pairs ofrobustness components could be selected in divergent directions. Such aquantitative answer to the so-called ’congruence’ hypothesis (Visser et al., 2003)would explain both how unselected robustness components could be partly drivenby indirect selection and why various robustness-related features seem to have theirown evolutionary history.
Gene regulation networks are popular candidates when attempting to modelcomplex biological processes: they are at least partly built on solid and realisticprinciples (transcription factors can enhance or repress the expression of othergenes), gene regulation plays a crucial role in most biological, physiological, anddevelopmental mchanisms, and even modest-sized regulation networks display awide diversity of possible outcomes, including homeostasis (stable equilibrium ofgene expressions) (Stern, 1999), cyclic dynamics (Leloup and Goldbeter, 2003;Akman et al., 2010), or amplification of a weak signal (Hornung and Barkai, 2008).Conveniently, the phenotypic level considered as the output of a gene network (theexpression level of all network genes) can be assimilated to a partial transcriptome,which opens the possibility for confrontation with empirical data.The framework proposed by Wagner (1994) and Wagner (1996) is particularlypopular in evolutionary biology to model gene network evolution due to itscomputational simplicity and efficiency, combined with a direct biologicalinterpretation (each line of the regulation matrix is the set of transcription factorfixation sites in the promoter of a gene). In practice, multiple variants based on thisoriginal model have been derived, either to address specific questions, or to correctfor unrealistic features. Here, I used a quantitative version of the model, in whichgene expressions are scaled between 0 (no expression) and 1 (maximum expression),which was first proposed in Wagner (1994), although later work have oftenpreferred binary networks (in which genes can be on/off, e.g. Wagner, 1996;Ciliberti et al., 2007), and a gene expression scaling between -1 and 1. Unlike inWagner (1996) and Siegal and Bergman (2002), mutational effects were correlatedin my simulations (the value of the mutant allele was drawn in a Gaussian centeredaround the value of the parental allele), which allows for cumulative evolution.Finally, the sigmoid response function was made asymetrical by introducing aconstitutive expression parameter (as in e.g. R¨unneburger and Le Rouzic, 2016) inorder to avoid the unrealistically high expression of unregulated genes (half themaximum expression) from the default Wagner model.Discrete time and simple matrix algebra were necessary to run evolutionaryindividual-based computer simulations, in which the network output needs to becalculated for thousands of individuals and thousands of generations. Using morerealistic models based on continuous time and differential equations, non-linearregulation effects, and independent degradation and transcription rates would makethe simulations impractical, with little benefit in terms of explanatory power.Computational constraints also limit the network size to a few dozen genes max,which is not enough to generate realistic levels of sparcity – simulated gene networkswere too dense to be realistic. The simulated phenotypic target (maintaining aconstant set of gene expressions) were also extremely simple compared to what genenetworks are theoretically able to do (e.g. converging to different equilibria indifferent cell types, or controling a complex dynamic of gene expression during thedevelopment). However, the general conclusions from this study are (at least11/22ualitatively) robust to most simulation parameters (Appendix 7), suggesting thatthey reflect general properties of the underlying genetic architecture.
There are potentially many ways to measure the robustness of a phenotypic trait.Here, five indicators were measured, in order to catch various (and potentiallyindependent) aspects of what is generally defined as robustness. The sensitivity toinherited mutations ( ρ M ) is probably the most popular one, as it is central to thediscussion around the evolution of canalization (Waddington, 1959; Wagner, 1996;Fares, 2015). The sensitivity to environmental perturbations is also unavoidable,although its implementation in a gene network model is less straightforward. Here,it was calculated as both the sensitivity of the network to disturbance in the initialexpression state ( ρ E ), which measures the size of the basin of attraction of theoptimal expression pattern, and as the strength of the stability of the equilibriumwhen disturbed ( ρ e ). These two measurements could be interpreted asdevelopmental robustness and physiological homeostasis, respectively, as theyquantify the response of the network to disturbances in the expression levels atdifferent time scales. The robustness to mutations occuring after the networkconvergence ( ρ m ) was considered because it sets up an alternative to the genetic vs.environmental congruence hypothesis: in long-lived organisms, non-heritable(somatic) mutations participate to the ageing process (Kennedy et al., 2012), andageing could be under direct selection. Thus, the robustness to somatic mutationscould also drive indirectly the evolution of genetic canalization. Finally, the genenetwork stability ( ρ S , amplitude of the fluctuations of gene expressions) wasconsidered as a robustness component because it has been proven to drive anindirect response of genetic canalization, based on very similar model (Siegal andBergman, 2002).These indicators were chosen based on the possibility to measure them innumerical simulations. Although the empirical assessment of the correlationbetween robustness components would be way more convincing than a theoreticalstudy, defining similar measurements from experimental datasets could bechallenging. For instance, ρ M and ρ E could, at least in theory, be estimated as thevariance in gene expression across genetic backgrounds or across environmentalconditions, respectively. Measuring ρ m environmentally could be more complicated,as it would likely be confounded with other ageing mechanisms. In contrast, theempirical distinction between e.g. ρ e and ρ S relies on discriminating internal vs.external sources of noise, and might be in practice impossible. In all cases, geneexpression data are generally quite noisy and their analysis necessitate heavycorrections to prevent multiple testing issues. Studying empirically the robustnessand evolvability of molecular and morphological traits has long been considered as achallenging task, but methodological and technological progress has recentlybrought new concrete perspectives (Payne and Wagner, 2019).Some popular measurements of developmental robustness were not consideredhere for technical reasons. For instance, fluctuating asymetry (the variance betweenthe same phenotypic trait measured in the right and the left body parts ofsymmetric organisms) is a convenient measurement of microenvironmental effectson the development (Debat and David, 2001; Leamy and Klingenberg, 2005), but ithas no equivalent at the level of gene expression in a regulation network. Thedeterministic sensitivity to a directional environmental gradient could also be usedto measure phenotypic plasticity, which is central to the question of phenotypicrobustness. Yet, there are several ways to model phenotypic plasticity in a genenetwork (Masel, 2004; Odorico et al., 2018), and it requires a specific selection setup12/22different expression optimums as a function of the environment). Because of thisadditional complexity, phenotypic plasticity was excluded from the focus of thiswork, although the evolution of plasticity of gene expression remains an intriguingand fundamental question. In particular, phenotypic plasticity (i.e. an adaptive lackof robustness to some environmental signal) may itself be canalized to genetic orother environmental disturbances (Stearns and Kawecki, 1994); considering reactionnorms (a measurement of plasticity) as quantitative traits thus opens challengingquestions about the adaptive evolution of the canalization of robustness traits. Acknowledgements
Many thanks to Laurent Loison for insightful discussions. Simulations were partlyperformed on the Core Cluster of the Institut Fran¸cais de Bioinformatique (IFB)(ANR-11-INBS-0013) .
Conflict of interest disclosure
The author of this article declares that he has no financial conflict of interest withthe content of this article.
References
Akman, O. E., D. A. Rand, P. E. Brown, and A. J. Millar (2010). Robustness fromflexibility in the fungal circadian clock.
BMC systems biology
Nature
Nature
BMC evolutionary biology
Proceedings of the National Academy ofSciences
Trends in Ecology & Evolution
Journal of evolutionary biology
5, e3188v1. issn : 2167-9843.Eshel, I. and C. Matessi (1998). Canalization, genetic assimilation andpreadaptation: a quantitative genetic model.
Genetics
BMC EvolutionaryBiology
Journal of evolutionary biology
Trends in Genetics
Journal of Experimental Zoology Part B:Molecular and Developmental Evolution
The Quarterly review ofbiology
Mathematics
Seminars in Cell & DevelopmentalBiology . Vol. 88. Elsevier, pp. 67–79.Hermisson, J., T. F. Hansen, and G. P. Wagner (2003). Epistasis in polygenic traitsand the evolution of genetic architecture under stabilizing selection.
TheAmerican Naturalist
PLoS Comput Biol
Nature
Evolution
Mechanisms of ageing and development
Frontiers in Ecology andEvolution
7, p. 56.Le Cunff, Y. and K. Pakdaman (2012). Phenotype-genotype relation in Wagner’scanalization model.
Journal of Theoretical Biology
Evolutionary Biology
Annu. Rev. Ecol. Evol. Syst.
36, pp. 1–21.Leloup, J.-C. and A. Goldbeter (2003). Toward a detailed computational model forthe mammalian circadian clock.
Proceedings of the National Academy ofSciences
Seminarsin cell & developmental biology . Vol. 88. Elsevier, pp. 4–13.Masel, J (2004). Genetic assimilation can occur in the absence of selection for theassimilating phenotype, suggesting a role for the canalization heuristic.
Journalof evolutionary biology
Trends in genetics
Evolution
Bioessays
Wiley Interdisciplinary Reviews: Systems Biologyand Medicine
Journal of evolutionary biology
Nature Reviews Genetics
PloS one
R: A Language and Environment for Statistical Computing . RFoundation for Statistical Computing. Vienna, Austria.Rajon, E. and J. Masel (2013). Compensatory evolution and the origins ofinnovations.
Genetics
BMC evolutionary biology
Proceedings of the National Academy ofSciences
Bulletin ofmathematical biology
Proceedings of the NationalAcademy of Sciences
Evolution
Proceedings of the National Academy of Sciences
Evolution
Nature
Nature
Proceedings ofthe National Academy of Sciences
Evolution
Robustness and evolvability in living systems . Vol. 24. Princetonuniversity press.Wagner, G. P., G. Booth, and H. Bagheri-Chaichian (1997). A population genetictheory of canalization.
Evolution ppendix 1
Sensitivity of the robustness measurements to the magnitudeof the disturbance − − − − s E E a r l y E n v i r on m en t a l ( r E ) Random networks − − − − s e La t e E n v i r on m en t a l ( r e ) − − − − s M E a r l y G ene t i c ( r M ) − − − − s m La t e G ene t i c ( r m ) − − − − s E E a r l y E n v i r on m en t a l ( r E ) Evolved networks − − − s e La t e E n v i r on m en t a l ( r e ) − − − s M E a r l y G ene t i c ( r M ) − − − s m La t e G ene t i c ( r m ) Four out of five robustness indicators ( ρ E , ρ e , ρ M , ρ m ) depend on the magnitude ofthe disturbance. The figure displays the influence of the size of the disturbance onthe robustness measurement (left: random networks, right: evolved networks).Robustness scores appear very reliable for evolved networks (the rank of differentgenotypes in terms of robustness does not depend on the size of the disturbance),while the picture is more messy for random networks, which can be differentiallyrobust to large or small disturbances. 16/22 ppendix 2 Correlations among robustness indexes − − − − La t e E n v i r on m en t a l ( r e ) r=0.66 − − − − E a r l y G ene t i c ( r M ) r=0.76 − − − − La t e G ene t i c ( r m ) r=0.63 − − − − S t ab ili t y ( r S ) −40 −30 −20 −10 Early Environmental ( r E ) r=0.70 r=0.88r=0.97 −40 −30 −20 −10 Late Environmental ( r e ) r=0.74 r=0.88 −40 −30 −20 −10 Early Genetic ( r M ) r=0.74 −40 −30 −20 −10 Late Genetic ( r m ) r=0.71 Correlations between all five robustness components among 10,000 random 6-gene networks( µ = 0 , σ = 1). 17/22 ppendix 3 Sampling effects on Principal Components
Number of simulated networks P r opo r t i on v a r i an c e e x p l a i ned PC1PC2PC3PC4PC5 . % % % %
100 1000 10000 Number of robustness tests P r opo r t i on v a r i an c e e x p l a i ned PC1PC2PC3PC4PC5 . % % % %
10 1000 100000
Influence of the sampling effect (number of networks and number of replicates R to estimate robustness) on the relative weight of the principal components. All PCsexcept the last one are robust to sampling. 18/22 ppendix 4 Reasons for not reaching the desired equilibrium −1.5 −0.5 0.5 1.0 1.5 2.0 − W W On target Still changing Alternative eq. Large osc.
Although equation 4 guarantees that an equilibrium exists at the targetphenotypic expression, the equilibrium might not reachable in practice whensimulating the gene network dynamics. The colored area in the figure correspondsto networks that failed to produced the target phenotype, each color representing adistinct reason; Yellow: network dynamics was slow and the final gene expressionhas not been reached yet after 16 time steps; Gray: an alternative equilibrium wasreached (most of the time implying that one or both genes are either completelysilenced to fully expressed). Red: The network steady state featured oscillationsthat were so large that they hit the maximum or minimum expression, shifting theaverage expression away from the target expression. 19/22 ppendix 5
Illustration of the robustness scores . . . . . . E x p r e ss i on Early Genetic A Late Genetic Early Environmental Late Environmental . . . . . . E x p r e ss i on B . . . . . . E x p r e ss i on C . . . . . . E x p r e ss i on D . . . . . . E x p r e ss i on Time steps E Time steps
Time steps
Time steps
The figure displays a subset of the replicated tests for four robustness indexes. Columns A to Ecorrespond to the five networks described in Appendix 6. In each panel, the default (undisturbed)network kinetics is displayed as plain lines (black for gene 1, red for gene 2). By construction, allnetworks have an equilibrium at (0.3, 0.6). The network stability can be assessed from the amplitudeof the cycles in the undisturbed kinetics. Network homeostasis (first line) was estimated bydisturbing the expression state of the genes (for readability, the network response was expanded overseveral time steps). Environmental canalization (second line) was estimated by disturbing the initialstate (semi-transparent lines). Genetic canalization (third line) was measured as the variance in thefinal state when mutating the network at the beginning of the development. Finally, the robustnessto late mutations (last line) consisted in mutating the network after having reached the equilibrium.20/22 ppendix 6
Two-gene example networks W W W W A 0.70 0.20 -0.21 0.38B -0.30 0.30 0.29 0.33C -0.40 0.80 0.34 0.08D -1.00 -0.80 0.64 0.88E 1.50 3.50 -0.61 -1.27The five two-gene networks detailed in figures 2 and 5. 21/22 ppendix 7
Exploration of the parameter set F i t ne ss . . . . E a r l y E n v i r on m en t a l ( r E ) − − − − − La t e E n v i r on m en t a l ( r e ) − − − E a r l y G ene t i c ( r M ) − − − − − La t e G ene t i c ( r m ) − − − − − S t ab ili t y ( r S ) − − − − − Mutation rate ( m ) Population size (N)
10 50 500 5000
Number of genes (n)
Selected genes (n')
Selection strength (s)
Influence of various simulation parameters (mutation rate µ , population size N ,total number of genes n , number of selected genes n (cid:48) , and strenght of selection s )on fitness and robustness indexes at equilibrium. The figure reports the mean ±±