Effective connectivity determines the critical dynamics of biochemical networks
EEffective connectivity determines the critical dynamics ofbiochemical networks
Santosh Manicka , , + Manuel Marques-Pita , , , + Luis M. Rocha , Center for Social and Biomedical Complexity, Luddy School of Informatics, Computing& Engineering, Indiana University, Bloomington IN, USA Instituto Gulbenkian de Ciˆencia, 2780-156, Oeiras, Portugal Universidade Lus´ofona, COPELABS. 1700-097, Lisbon, Portugal [email protected]
Abstract
Living systems operate in a critical dynamical regime—between order and chaos—wherethey are both resilient to perturbation, and flexible enough to evolve. To characterize suchcritical dynamics, the established structural theory of criticality uses automata networkconnectivity and node bias (to be on or off) as tuning parameters. This parsimony in thenumber of parameters needed sometimes leads to uncertain predictions about the dynamicalregime of both random and systems biology models of biochemical regulation. We derivea more accurate theory of criticality by accounting for canalization , the existence of redun-dancy that buffers automata response to inputs. The new canalization theory of criticalityis based on a measure of effective connectivity.
It contributes to resolving the problemof finding precise ways to design or control network models of biochemical regulation fordesired dynamical behavior. Our analyses reveal that effective connectivity significantlyimproves the prediction of critical behavior in random automata network ensembles. Wealso show that the average effective connectivity of a large battery of systems biology modelsis much lower than the connectivity of their original interaction structure. This suggeststhat canalization has been selected to dynamically reduce and homogenize the seeminglyheterogeneous connectivity of biochemical networks.
Introduction
In the study of biological, social, and technological systems, network models have becomeimportant tools. [10] Network model structure is defined by a graph
G ≡ ( X, E ), where actors(e.g., biochemical species and environmental factors) are represented as a set of nodes , X , andinteractions between pairs of nodes as a set of edges , E . Several dynamical properties of networkscan be inferred from their structure alone. [11] The structural properties of networks yieldinsights into the organization of living systems and societies. [9, 10] Yet, the rules of interactionbetween nodes must be considered in order to study network dynamics. For biochemical systems,understanding the precise inference of interaction rules is a difficult task because vast amounts ofdata are required to estimate the kinetic parameters governing molecular concentration rates. Inresponse, a growing number of successful modelers have overcome the need for precise parameterestimation by relying on coarse-grained qualitative approaches to modeling interactions betweennodes. [1, 2, 32, 50, 59]In 1969 Kauffman introduced the simplest qualitative model of biochemical regulation andsignaling, the Boolean network (BN). [37] Nodes in a Kauffman network are defined as simple ∗ + These authors contributed equally to this work. a r X i v : . [ q - b i o . M N ] J a n oolean automata, and consequently, their interactions are described as logical state transitionrules. The state transition rule of a node, x i , incorporates k i inputs—typically the states of othernodes, or external signals. Network dynamics ensue as the state of every node x i ∈ X is updatedsynchronously in discrete time steps. As the dynamics unfold from an initial configuration, thenetwork eventually settles into an attractor configuration. An attractor can be a stable fixed-point—a network configuration that leads to itself in the next time step—or a sequence ofconfigurations that repeat periodically. Kauffman observed that hard-to-predict behaviors ofreal-world biochemical systems are also exhibited by these canonical networks, and they havesince been widely used to model genetic regulation and signaling. [2, 3, 12, 32, 39, 57, 61]State transition rules in these models are derived from molecular data and used to capture thecharacteristic combinatorial regulation pervasive in biochemical networks. [3, 5, 17, 21, 28, 44]Attractors then correspond to stable states of real systems, such as those that determine cellfate, as evidenced in a large number of models. [13, 21, 31, 39, 44]Important discoveries in biology have been made using automata models even though theyare built from coarse qualitative representations of biochemical entities and interactions and theyuse state-transition rules that often ignore the precise specification of interaction timings. [3,12, 44, 63] For instance, the BN model of the yeast cell cycle reproduced the complete dynamicaltrajectories originating from known initial conditions to known attractor configurations. [22] Foranother instance, a BN model of intra-cellular signal transduction in breast cancer reproducedknown drug resistance mechanisms and uncovered new and effective drug interventions. [64] Athird, striking instance is a prescription obtained from a BN model for how to reprogram alreadydifferentiated cells. [16] While most existing models of biochemical regulation are Boolean, anincreasing number of models have considered automata with more than two states. [3, 65] Inaddition, automata networks have been extended to include various sources of stochasticity.[12, 46] BNs have become important conceptual frameworks to study a number of generalprinciples in theoretical biology, two of which, canalization [46, 62] and criticality , [37, 39, 41, 42]are at the core of the research presented here. Critical dynamics in complex networks
The notion of criticality emerged from the observation that some dynamical systems can bein a state of thermodynamic equilibrium that depends on some critical parameter. Tuningthis parameter makes the system undergo phase transitions. In an ordered phase, the systembecomes insensitive to perturbations and changes in initial conditions. Conversely, in a chaotic phase, dynamic trajectories within the system vary vastly as a result of small perturbationsor minute differences in initial conditions. In the critical phase—the one between order andchaos—the system is robust to most small perturbations, yet sensitive to some, making it flexibleenough to respond differentially to a range of input signals. In this phase, small changes in initialconditions do not lead to completely different dynamic trajectories. Though other notions ofcriticality exist, this is the focus of the research presented here. In theory, complex networks inthe critical phase can perform collective information processing, which may be a key aspect incomplex life processes, such as genetic regulation. [19, 38, 41, 49]
Criticality in living systems
Kauffman [37] not only introduced random BNs (RBNs) to model genetic regulation, but alsopresented one of the first intuitions about criticality in living systems. He suggested thatbiologically plausible regulatory networks that exhibit the kinds of stable dynamics seen inbiology, must have, on average, low connectivity. Later, Kauffman elaborated on this intuitionby proposing the hypothesis that biological systems operate in a critical dynamical regime ,between order and chaos, and that a critical tuning parameter is the network connectivity.Kauffman hypothesized that each biochemical species in a given regulatory network should2ave two regulators on average—that is, a regular network connectivity where every statetransition rule incorporates k ≈ attractor hypothesis —network attractors correspond to cell types in genetic networks—hadbecome an accepted fact. Kauffman and Bornholdt examined the research on the biochemicalcriticality hypothesis and highlighted the following supporting evidence: (a) the distributionof genes damaged by the spreading effects of deleting selected genes in yeast mutant has apower law distribution, which indicates criticality; [51, 55] (b) similar, biologically-plausibleinitial configurations in global gene expression data obtained from macrophages follow somewhatparallel trajectories to attractors. These trajectories are neither identical, which would indicateorder, nor divergent, which would indicate chaos; [48] (c) a large battery of sixty-seven Booleanmodels of real biochemical networks operate in the critical regime based on the analysis oftheir predicted structural and dynamical properties. [20] Indeed, it is now widely accepted thatbiochemical complex networks are critical. [8, 30, 33, 40, 57] See Roli et al.[53] and Mu˜noz[47]for recent reviews that explore further evidence of criticality in living systems. The parameters and quantification of criticality
Several methods have been proposed for quantifying criticality in complex networks. Earlydevelopments were grounded in physics and dynamical systems theory. For example, Bak [6]showed that regular, spatially-extended, dissipative dynamical systems can evolve to a self-organized critical state with spatial and temporal power-law scaling behavior. Around the sameperiod, Langton studied network dynamics using computer simulations of cellular automata(CA)—a canonical discrete idealization of spatially extended systems. [41, 42] Langton madea very important connection between the local behavior of CA state transition rules and thecollective phenomenon of criticality. He introduced a (local) parameter, λ , to measure theproportion of state transitions in a given CA rule that do not go to a basal (quiescent) state.CA with different values of λ have collective behaviors that closely match distinct classes. InLangton’s account, the transition between order and chaos takes place at λ ≈ . structural theory (ST) of criticality for BNs. [23] According to this theory, if a BN hashomogeneous in-degree, k, and fixed bias, p , then the critical boundary between ordered andchaotic network dynamics is given by, 2 kp (1 − p ) = 1 . (1)The ST, as defined originally, holds for fixed connectivity and fixed bias. It has since been shownthat the same theory holds when connectivity is not fixed, but normally distributed around acharacteristic mean value. The same applies to the bias. [4] While Eq. 1 is theoretically well-founded, it is not an accurate predictor of dynamical regime, particularly if the BN dynamics arenear the critical edge. This is the case even for BNs that abide the most strict, fixed in-degreeand bias assumptions. We elaborate on this in the following sections.While Langton determined dynamical regime using properties such as transient lengths,number of attractors, and attractor sizes in CA, Derrida and colleagues formalized a quantitativecollective measure known as the Derrida parameter , ζ . [24] This parameter is derived fromthe Derrida plot: a curve that shows the degree to which small perturbations in pairs of3therwise identical initial configurations diverge in their dynamical trajectories. This divergenceis measured as the average number of different node-states (Hamming distance) that separatethe two initial trajectories after a number of time-steps. The ζ parameter is the slope of theDerrida plot at the origin. If ζ <
1, the BN is classified in the ordered regime. Conversely, if ζ >
1, it is classified as chaotic. Thus a value ζ ≈ ζ to determinethe dynamical regime of BN ensembles in this research. (See § Methods for details.)
Canalization
Canalization [62] is used to characterize the buffering of genetic and epigenetic perturbationsthat lead to the stability of phenotypic traits. [58, 60] Gene regulatory networks, for exam-ple, have the remarkable feature that they tend to be made of highly canalizing regulatoryinteractions. [8, 20, 30, 39] Canalization has been studied by characterizing redundancy in thestate transition rules of automata. [36, 38, 39, 46, 52] It is observed when an automaton’sstate transition can be determined from the known state of a subset of its inputs, which meansthe remaining inputs are contextually irrelevant or redundant. [52] Canalization influencesthe dynamical behavior of automata networks, contributes to their stability, [8, 35, 36, 43, 56]increases the likelihood of modular attractors, and makes canalizing networks more controllableby tuning external signals. [27, 46] Yet, canalization has not previously been consideredas a tuning parameter to define the structure and dynamics of critical networks or to findthe most biologically plausible models of biochemical systems by targeting their stability andcriticality.[20]Previous studies of the effects of canalization on network stability and criticality have focusedon the so-called strictly canalizing state-transition rules. [52] Such rules always have one inputthat, in at least one of its possible states, is sufficient to determine the automaton’s statetransition. The idea behind these studies was to build BNs with strictly canalizing state-transition rules, measure the average sensitivity of their nodes, and quantify the propensity ofan automaton to change its state as the result of perturbation to one of its inputs. [56] Thisnode measure was then extended to quantify sensitivity at the network level. Notably, theaverage network sensitivity is equivalent to the ST defined in Eq. 1 for predicting criticality.[20] However, canalization is a much more frequent phenomenon when we consider, not onlystrict canalization, but also collective canalization. [52]In Boolean automata, collective canalization is observed when a subset of inputs, in somestate combination, jointly determines an automaton’s state transition. An automaton’s effec-tive connectivity , k e , introduced by authors Marques-Pita and Rocha,[46] is a measure of theexpected minimal number of inputs that are necessary to determine its state transitions. Itaccounts for the existence of both strictly and collectively canalizing inputs. If we consider howthe original connectivity structure of a BN is affected by canalization, it becomes clear thatit is not a useful representation of how control signals propagate in network dynamics becausecanalizing rules make some edges in the original structure contextually redundant. The roles ofsome edges in transmitting control signals vary, however, in that some edges become completelyredundant, or conversely, essential, in different dynamical trajectories and attractors. Thereare many possible effective structures with very distinct dynamical behaviors for any given in-degree network structure. [27] This must be considered in order to understand how structureand dynamics account for control and criticality together. [20, 27] Effective connectivity caneasily be extended to a network measure by computing the mean in-degree of the effectivestructure of a given automata network. The effective network connectivity characterizes boththe interaction structure and the canalization in one parameter.We have addressed the limitations of the ST of criticality in automata networks [20] in Eq. 1by including the effective network connectivity as a tuning parameter to account for canalization.To test our main hypothesis that this parameter is a better predictor of criticality than the in-degree connectivity, k , we frame the prediction of dynamical stability as a binary classification4igure 1: Dynamical regimes in the ( k, (cid:104) k e (cid:105) ) parameter space. Dynamical regime (orderedor chaotic) was determined using the Derrida parameter ζ computed for the large RBN ensemblewe generated (see main text). Pie charts depict the dynamical regimes of RBN aggregates forthe possible ( k, (cid:104) k e (cid:105) ) pairs in our RBN ensembles. Blue and red areas indicate the proportionsof networks with stable and chaotic dynamics, respectively. The black dashed line correspondsto the critical (cid:104) k e (cid:105) , as described in the main text. The critical boundary equation, derivedusing binary symbolic regression in a linear model with an interaction term, is 0 . k + 0 . (cid:104) k e (cid:105) − . k (cid:104) k e (cid:105) = 1. Out of the 266,400 BNs in our RBN ensembles, 224,083 (approx. 84%) areclassified as chaotic.problem. We have produced a large dataset of random Boolean-network ensembles with nearly300K distinct networks generated under the same assumptions made by ST concerning networkconnectivity, k, and bias, p . To include effective network connectivity as a tuning parameter, wehave produced a catalog of state-transition rules for each k e value across the values of k and p .Furthermore, we have analyzed a large set of 63 models of biochemical regulation and signalingobtained from the Cell Collective repository. [31, 32] For potential application areas, notethat this repository includes automata models on, for example, lac operon interaction, T-cellreceptor signaling, yeast cell cycle and apoptosis, cholesterol regulation, Influenza A replication,Drosophila body segmentation, lymphocyte differentiation, and cortical area development. Results
We define the effective connectivity of a given homogeneous BN as the mean effective connec-tivity of its nodes, all of which have been sampled from a small interval of size ∆ k e = 0 . (cid:104) k e (cid:105) . In addition, based theprinciple of bias symmetry in logical rules, the compound term p (1 − p ) is set as an independentvariable that represents the bias parameter. The dynamical regime of every BN in our ensembledata is characterized by a binary transformation of its Derrida parameter, ζ , whereby ζ > ζ ≤ § Methods for further details.5 he canalization theory of criticality
We search and optimize binary classifiers to predict the dynamical regime of the RBNs in ourensemble dataset using six specific model classes of increasing complexity. For each class, thereis a model instance that considers the original connectivity, k , and another that considers (cid:104) k e (cid:105) instead. All other elements of a given class are kept identical in both instances; see § Methodsfor details. Figure 1 depicts the proportions of chaotic and stable BNs in our ensemble datasetfor the values of k and (cid:104) k e (cid:105) , as well as the best criticality decision boundary we obtainedwhen considering k and (cid:104) k e (cid:105) as tuning parameters. The majority of BNs in our ensemblesare classified as chaotic (84%), based on the Derrida parameter. Therefore, cross-validationprediction performance is best captured by measures tailored for unbalanced classificationscenarios such as the Matthews Correlation Coefficient (MCC). [7] We also show results forMcFadden’s R since we are performing logistic regression, as well as the Area Under the Curve (AUC) for ranking performance; see § Methods for details.Model class (1) has the lowest complexity, and serves to compare the predictive power ofthe original network connectivity, k , with that of the effective connectivity, (cid:104) k e (cid:105) , disregardingthe bias parameter. It yields the following decision boundaries: − . k = 1 , and 0 . (cid:104) k e (cid:105) = 1.The corresponding critical values for the tuning parameters are k = − .
11, and (cid:104) k e (cid:105) = 1 . (cid:104) k e (cid:105) over k is clear in this model class, since the model instancebased on, k, classifies every BN as chaotic, whereas the instance based on (cid:104) k e (cid:105) partitions thedata into two reasonably correct dynamical regimes. Indeed, as shown in the first column onthe left in Figure 2, while MCC( k ) ≈
0, MCC( (cid:104) k e (cid:105) ) ≈ .
49, with similar behavior for R .Moreover, AUC( k ) ≈ .
5, while AUC( (cid:104) k e (cid:105) ) ≈ .
88. Thus, the best classifier based solely onin-degree k is equivalent to a random coin toss, while the best classifier based solely on effectiveconnectivity k e yields reasonably good performance a . The study of model class (1), as well asthe lack of synergy between k and k e , demonstrates that the original network connectivity onits own carries no information about criticality, but effective connectivity, on its own, yields areasonable prediction of criticality. This result strongly suggests that dynamical canalizationalone is an important factor in criticality.Model class (2) is defined by the interaction between the term for the bias parameter p (1 − p ) and the term for either k or (cid:104) k e (cid:105) (see § Methods). The optimal decision boundariesobtained are: 1 . kp (1 − p ) = 1, and 3 . (cid:104) k e (cid:105) p (1 − p ) = 1. The corresponding performancemetrics are shown in the second column of Figure 2. Remarkably, all classification performancemeasures for model instance (cid:104) k e (cid:105) in class (2) are very high, with near-perfect MCC and R scores, and perfect ranking performance measured by AUC, as detailed below. In contrast, theclassification performance for the model instance based on k in the same class is substantially,and significantly, lower (see also Figure 5). To better understand the performance differencefor models in class (2) consider Figure 3. First, a very crisp boundary exists between stableand chaotic dynamics in the space ( (cid:104) k e (cid:105) , p ); the two regimes are more neatly organized withalmost no misclassifications beyond the critical boundary. This is in sharp contrast with theless distinct boundary observed in the ( k, p ) space around the critical boundaries, predicted byboth the ST and the optimized the instance of the model based on k . Indeed, in these casessubstantial misclassifcations occur, whereby stable networks are observed well into the predictedchaotic regime, and vice versa, especially for the ST boundary b . Notice, for instance, that BNs a To further ascertain whether k and k e interact synergistically to predict criticality, we perform binary logisticregression considering the linear effects of k and (cid:104) k e (cid:105) , as well as their interaction. The critical decision boundaryof the best such classifier is 0 . k + 0 . (cid:104) k e (cid:105) − . k (cid:104) k e (cid:105) = 1. However, the MCC ≈ .
49 is essentially the sameas (cid:104) k e (cid:105) instance of model 1 , which demonstrates that adding k does not increase the classification performanceof using k e alone. This is also clear from the best interaction model where the coefficient of (cid:104) k e (cid:105) is seven timeslarger than the others. b Notice that the ST is defined by a slightly different critical boundary (eq. 1) than what we obtained byoptimizing model 2 ( k ) against random ensemble data. This is likely because the ST was derived theoreticallywhile model 2 was derived from empirical data circumscribed to a finite k range. In any case, the ST is not Performance scores for the regression models used to find the optimalcritical boundary.
Each model belongs to one of six model classes—labeled in increasingorder of complexity. For each model class, orange illustrates k as a tuning parameter, andred, (cid:104) k e (cid:105) instead. In every class, (cid:104) k e (cid:105) is a better tuning parameter for criticality than k . See § Methods for further details about classes and performance measures.with stable dynamics are observed for most values of k when p = 0 . (cid:104) k e (cid:105) , p ) space, these networks neatly cluster at (cid:104) k e (cid:105) = 1, right on thecritical boundary. Similar behavior occurs for all bias values.The classification performance together with the observation of the arrangement of dy-namical regimes around the critical boundaries in Figure 3 demonstrate that using effectiveconnectivity ( (cid:104) k e (cid:105) ) instead of the original connectivity ( k ) in RBNs leads to a much moreaccurate, near-perfect prediction of the critical boundary that separates stable and chaotic dy-namics, as well as a more organized characterization of both regimes. In other words, accountingfor canalization and interaction bias at the node- or micro-level, leads to optimal prediction ofmacro-level dynamics. Indeed, model 2 ( (cid:104) k e (cid:105) ) shows the most accurate decision boundary for thecritical boundary, with more complex models yielding no increase in classification performance.We refer to this model as the canalization theory (CT) of criticality in BNs. The Canalization Theory optimizes complexity and classification performance
We use a Pareto front method to optimize for decision boundaries that best balance the trade-offbetween model complexity and classification performance. This method relies on the graphicalrepresentation shown in Figure 4, which depicts the performance of decision boundaries obtainedfrom the different model classes considered. A given boundary is marked with an arrow ifand only if its performance is greater than that of all models of lower complexity. In short,performance increases substantially when passing from model class (1) to (2), but not by usingmore complex model classes (3) and beyond. Indeed, in model 2 ( (cid:104) k e (cid:105) ), the CT, achievesnear-perfect classification performance with MCC = 0 .
96 and R = 0 .
94, and perfect rankingAUC ≈
1. Therefore, more complex model classes could not improve much at all over suchperformance. Interestingly, models based on k do not show much improvement in performancebeyond model complexity class (2), even though the performance of model 2 ( k ) is much smallerthan that of 2 ( (cid:104) k e (cid:105) ) with MCC = 0 .
73 and R = 0 . optimal on this range and leads to slightly worse classification performance, MCC= 0 .
73 and R = 0 .
28, thanmodel 2 ( k ). Dynamical regimes in the ( k, p ) and ( (cid:104) k e (cid:105) , p ) parameter spaces . Dynamicalregime (ordered, or chaotic) was determined using the Derrida parameter ζ computed for thelarge RBN ensemble we generated (see main text). Blue areas indicate proportions of networkswith ordered dynamics, and the red areas indicate the proportions that were found to be chaotic.Panel (A) depicts the ( k, p ) space, while panel (B), the ( (cid:104) k e (cid:105) , p ) space. The black dashed curvesrepresent criticality models in model class (2) described in the main text. The dashed bluecurve shown in (A) corresponds to the current criticality model per the ST, shown for reference.smaller than the respective values for the model 2 instance based on (cid:104) k e (cid:105) . In other words, eventhough there is much room to improve, increasing the complexity of the models based on theoriginal connectivity, k, does not yield performance gains. This implies that unless canalizationis factored in, as in the model instances based on (cid:104) k e (cid:105) , no increase in performance is gained overthe ST. We thus conclude that model class (2) is optimal in terms of simplicity and performancefor both instances, but the instance that uses (cid:104) k e (cid:105) is considerably (and significantly as shownbelow) better at predicting the dynamical regime of BNs. The classification performance of the CT is significantly better than that ofthe ST
To estimate the statistical significance of the increased performance of the CT, as well as toensure that it does not derive from over-fitting the data, we compare both instances of modelsin every class under cross validation (details in § Methods). The statistical significance resultsfor class (2) are shown in Figure 5. All performance measures for the CT are significantlybetter than for the model instance using k , based on paired-sample t-tests ( P < . The CT via unconstrained symbolic regression
We performed an unconstrained search using Symbolic Regression [54] alongside the constrainedsearch reported previously in this section. We used the symmetric effect of biases p and 1 − p on the Derrida parameter ζ to justify using only 0 < p ≤ . p (1 − p ). This type of unconstrained search works in a much larger space of modelclasses, so finding an optimal model that also guarantees minimal class complexity can be hard.8igure 4: Pareto front analysis of model complexity vs. performance for the sixmodel classes fit to RBN ensembles.
Models are in increasing order of complexity fromclass 1 to class 6. A model class is labeled on the axis only if its performance is greater than theperformances of all models of lower complexity. For each model class, orange illustrates k as atuning parameter, and red, (cid:104) k e (cid:105) instead. Arrows mark the performance of the optimal modelclass, characterized by a substantial rise followed by very little gain afterward. Notice that forall performance measures, model class 2 with (cid:104) k e (cid:105) has the best Pareto front performance.Figure 5: Classification performance of models in class 2 under nested 4-fold cross-validation.
Significant differences (
P < . k e ) is greater than that of the best instance of model class 2 with ( k )—a class thatincludes the ST. 9nterpreting the fitting functions and coefficients can also be difficult, as stochastic searchessometimes introduce artifacts in the classifier. Despite these potentially limiting aspects inherentto stochastic search algorithms, we obtained a high-performance classifier that belongs to thesame model class as the ST and the CT. The decision boundary for this classifier is the function3 . (cid:104) k e (cid:105) p = 1. The performance of this classifier is R = 0 .
88, and MCC = 0 .
93, valuesvery similar to those of the CT. Additional information about the top classifiers produced bysymbolic regression is shown in appendix S2.
The dynamics of systems biology models is very canalized and better charac-terized by the CT
We analyze 63 Boolean models of biochemical networks that have been experimentally validated.These models are from the Cell Collective repository. [31] We refer to the set 2 ,
979 automatain these models that are neither tautological nor contradictory as C . Approximately 48% of theautomata in C have one input ( k = 1) so the connectivity of these cannot be further reduced bycomputing k e . Automata with 2 ≤ k ≤ C , and the remaining 2%have 10 ≤ k ≤
15 inputs, see Figure 6(A). We excluded automata with k = 1, since they cannotbe reduced, and the very few automata with k ≥
10, were merged into a set C ∗ . The in-degreedistribution ( k ) of the automata in both sets C and C ∗ is highly right-skewed with skewness ≈ ≥
5. We thus report themedian and interquartile range as measures of central tendency and dispersion: Med( C k ) = 2,Med( C ∗ k ) = 3 and IQR( C k ) = IQR( C ∗ k ) = 3 −
1. The k e distributions for automata in C and C ∗ are also heavily right-skewed with approximately the same skewness ≈
2, and leptokurtic,with normalized kurtosis ≥ k e are: Med( C k e ) = 1 . C ∗ k e ) = 1 .
25, IQR( C k e ) = 1 . −
1, and IQR( C ∗ k e ) = 1 . − . k e box plots of the automata in C ∗ k e , one for each value of k . Being so heavilyleptokurtic, most of the automata in C ∗ have both in-degree k , and effective connectivity k e ,very close to the respective central tendency, namely k = 3, and k e ≈ .
25. However, the widerdispersion for k suggests that effective connectivity flattened the original in-degree distributionsof the BN models considered and shows that canalization is both very high and pervasive acrossdifferent systems biology models. See appendix S4 for additional details.The dynamical regime of the Cell Collective models can be inferred from their DerridaParameter, ζ , ( § Methods), which varies very little: IQR( ζ ) = 0 . − . ζ ∈ [0 . , . ζ >
1. The other 52 models have ζ valuesslightly below ζ = 1. The low dispersion, ζ ≈
1, is a strong indication that the Cell Collectivemodels are in, or very close to, the critical regime, validating what is known about them[20]. InFigure 7, we show that the near-critical status of these models is not clear in the ( (cid:104) k (cid:105) , (cid:104) p (cid:105) ) spaceof the ST, but is quite clearly revealed in the ( (cid:104) k e (cid:105) , (cid:104) p (cid:105) ) space of the CT. The critical boundarycurves are derived by fitting class-2 models representing the ST and the CT to maximize theMCC score. While the networks are dispersed mostly far from the boundary curve in the STspace, they cluster very near the boundary in the CT space. Thus, the latter better characterizesthe known dynamics of these models, which are mostly near critical. Indeed, looking at the AUCranking measure, we have AUC (ST) = 0 .
54, which is only marginally better than a randomtoss, while AUC (CT) = 0 .
81. In other words, The ranking (by distance to boundary) is farsuperior for the CT. The classification performance is also superior for the CT, even thoughthe many near-critical (and few chaotic) models make classification performance less relevant:MCC (ST) = 0 .
44, MCC (CT) = 0 . k and p , but the Cell Collective networks areheterogeneous. Therefore, we use the mean values of these quantities in our analysis, as shownin Figure 7. While the CT can be properly developed for heterogeneous networks in the future(see § Discussion), here we derive new critical boundary curves by re-fitting both variants of10igure 6:
Characterization of k and k e for the automata in the 63 Cell CollectiveBN models analyzed. Depicted is the subset of automata that have two or more inputs(52% of thetotal ), denoted in the main text by C ∗ . The median value for k is Med( C ∗ k ) = 3,while for k e , Med( C ∗ k e ) = 1 .
25. The low median values (and low dispersion for k e ; see main text)indicate, not only that there is a pervasive canalization in validated BN models of biochemicalsystems, but also that effective connectivity ‘flattens’ the original degree distributions. Onaverage, knowing the state of 1.25 inputs is sufficient to determine the state transitions of theseautomata.model class 2 to the heterogeneous Cell Collective data. Still, c coefficients of the new curvesare not very different from the optima found for the homogeneous case (Figure 3): in the( (cid:104) k (cid:105) , (cid:104) p (cid:105) ) space, the new c = 1 .
03 (was c = 1 .
49 for homogeneous case), and in the ( (cid:104) k e (cid:105) , (cid:104) p (cid:105) )space the new c = 3 . c = 3 . c results inshifting the boundary curves slightly to the right in the case of the heterogeneous networks of theCell Collective, thus increasing the area of the stable regime. This is an expected result, sincewe know that heterogeneous connectivity leads to more stable BN dynamics.[4] In summary, itis clear that including canalizing dynamics in a model of criticality yields a substantially bettercharacterization (cf. AUC score) and prediction (cf. MCC score) of the dynamics of systemsbiology automata network models. Discussion
The CT is more accurate in predicting criticality than the ST and belongs tothe same model class
Previous studies of criticality in automata networks have relied on the ST, which characterizesnetworks and their critical boundary in the ( k, p ) space. The CT introduced here includes theeffects of node-level canalization and characterizes networks and their critical boundary in the( (cid:104) k e (cid:105) , p ) space instead. In this new space, the criticality boundary leads to much more accuratepredictions (Figs. 2, 4, & 5), and also reveals a much more organized dynamical regime spacein both random ensembles (Fig. 3) and systems biology models (Fig. 7). Notably, the CTbelongs to the same model class as the ST c . The Pareto-optimal model class is of the form cκp (1 − p ), where the network connectivity term κ is the original in-degree ( k ) in the ST or the c We pursued both class-constrained and unconstrained regression analysis, leading to almost identical criticalboundaries in the same model class
Predicted dynamic regimes of Cell Collective BNs by ST (panel A) andCT (panel B).
Blue dots denote stable models ( ζ < ζ > k e ) in our new CT (See § Methods). The bias of state transition rules inthe network is denoted p , and coefficient c defines where the curve is positioned in the relevantparameter space (the smallest value of κ when p = 1 / § Results) shows that effective connectivity alone yields a reasonable predictionperformance, but in-degree alone does not ( § Results and Figs. 1, 2, & 4).
Effective connectivity captures characteristic properties of dynamical regime
In the space of 2 k possible logical rules for a given k , there are only (2 k ) − p when tautologies and contradictions are ignored, and this number is halved when taking intoaccount the principle of bias symmetry in Boolean functions. The ST implicitly assumes thatall functions of same k and p contribute in the same way to dynamical regime. We demonstrate,however, that the finer characterization of the canalized logic of individual automata is necessaryto accurately predict the dynamical regime of automata networks. In Figure 3(A), homogeneousnetworks of the same size whose nodes are automata with the exact same k and p are shownto have opposite dynamical regimes, even far from the critical boundary of the ST. In contrast,when we transform the critical phase transition space to the finer characterization enabled by k e ,as in Figure 3(B), networks with the same p and (cid:104) k e (cid:105) almost always display the same dynamicalregime—except very near the CT critical boundary—as demonstrated by a near-perfect MCCscore ( § Results). Notice further that in this latter case, networks are not homogeneous in k e and are thus grouped by (cid:104) k e (cid:105) . Therefore, some variation in dynamical regime for the same p and (cid:104) k e (cid:105) is expected. Even so, such variation is only observed near the critical boundary, which12emonstrates that k e (and its mean value in the BN) is very characteristic of the dynamicalregime. Finally, note that k e includes the contribution of collective canalization, while othermeasures of canalization such as sensitivity do not ( § Methods). This means that the nonlineareffects of collective canalization are included and contribute to the finer characterization ofcriticality that the CT provides.
Effective structure is more homogeneous than original structure
While we are aware that the ST has been extended to consider heterogeneous BNs—with, forexample, power-law distributions [4, 25]—we have not yet considered such an extension forthe CT. One reason is that the BN models of biochemical regulation and random ensemblesused here are not large enough to properly distinguish heterogeneous degree distributions. [14]Another important reason, as this study reveals, is that the original interaction structure of theBN is replaced by canalized dynamics that instantiates a more homogeneous effective structurewith low-degree distributions. Indeed, a consistent observation in our results is that (cid:104) k e (cid:105) (cid:28) k formost automata both in the random BN ensembles and in the 63 heterogeneous Boolean models ofbiochemical regulation and signaling that we analyzed— k e = k only for the two parity functionsfor each k .[46] Furthermore, k e is significantly smaller in Cell Collective automata than forsame size and bias random automata.[26] Therefore, the ubiquitous canalization (redundancy)present in automata nodes can dramatically alter the original interaction structure of a network,revealing a truer effective structure that takes canalizing dynamics into account.It is known that such effective structure affects the dynamics and controlability of BNs.[27,46] While effective structure can be easily computed [18] and used to uncover control pathwaysin biochemical regulation and signaling, [26, 46] we do not yet know how its topology is organizedacross random and real-world networks. The evidence presented here for the systems biologymodels in the Cell Collective indicates that the effective structure is much more homogeneousthan the original interaction structures, as demonstrated by the small dispersion of k e values incomparison to the dispersion of k ( § Results). This suggests that very heterogeneous biologicalregulation and signaling networks (lognormal or asymptotic power-law degree distributions)may effectively function dynamically with more homogeneous and low-degree distributions.An exhaustive study of the topology of effective structure is still needed to investigate thishypothesis. The present research, however, offers much evidence that the canalizing dynamicsthat defines an underlying effective structure is an important factor in determining criticaldynamics in random and biochemical networks.
Beyond criticality: harnessing canalization in complex systems
The theoretical development and experimental results we present provide a new theory ofcriticality that accounts for canalization, the CT. Based on the same class of functions, the newtheory does not increase the complexity of the current theory, but increases substantially andsignificantly the ability to accurately predict the dynamical regime of automata networks. Giventhat automata networks are canonical examples of complex multivariate dynamical systems,the high classification accuracy of the new theory strongly suggests that canalization is a primemechanism for tuning the dynamical regime of complex systems. Indeed, our results withsystems biology models suggest that canalization plays a fundamental role in the dynamics ofbiochemical regulation and signaling, which is missed by studying the structure of biochemicalinteractions alone. Therefore, beyond the study of criticality, a precise characterization ofcanalization is likely to enable the tailoring of interventions in complex systems towards desirabledynamical behavior.[27, 46]The concept of effective connectivity underlying the CT integrates information about thestructure and dynamics of multivariate interactions—in-degree connectivity and input redun-dancy in state transitions, respectively. It implies that the behavior and function of complex13ystems is dictated by an effective structure that is revealed only after removal of causalredundancy in the logic of how variables integrate input signals. This truer structure ofinteractions is a more accurate portrait of causal multivariate dynamics, which is more canalizedthan the original structure of interactions implies. This is why we find stable (or critical)dynamics in networks whose structure would be predicted by the current ST to be chaotic, andvice versa (see Figures 3 & 7). In this sense, canalization is a network-level mechanism that canbe tailored by evolution. Going forward, the methodology can provide powerful analytical toolsto uncover the causal pathways that determine control and resilience to interventions in variouscomplex systems,[18] such as genetic regulation in biological development,[46] and treatmentstrategies in cancer and other diseases.[26]
Methods
Boolean automata definitions and notation A Boolean automaton is a binary variable, x ∈ { , } , where state 0 is interpreted as false ( off or unexpressed ), and state 1 as true ( on or expressed ). The states of x are updated in discretetime-steps, t , according to a Boolean state transition rule of k inputs: x t +1 = f (cid:0) i t , ..., i tk (cid:1) .Therefore f : { , } k → { , } . Such a rule can be defined by a Boolean logic formula or bya look-up (truth) table (LUT) with 2 k entries. Each LUT entry of an automaton x , f α , isdefined by (1) a specific condition , which is a conjunction of k inputs represented as a unique k -tuple of input-variable (Boolean) states, and (2) the automaton’s next state (transition) x t +1 ,given the condition. We denote the entire state transition rule of an automaton x in its LUTrepresentation as F ≡ { f α : α = 1 , ..., k } . Boolean networks A Boolean Network (BN) is a graph
B ≡ ( X, E ), where X is a set of n Boolean automata nodes x i ∈ X, i = 1 , ..., n , and E is a set of directed edges e ji ∈ E : x i , x j ∈ X . If e ji ∈ E , thenautomaton x j is an input to automaton x i , as computed by F i . X i = { x j ∈ X : e ji ∈ E } whichdenotes the set of input automata of x i . Its cardinality, k i = | X i | , is the in-degree of node x i ,which determines the size of its LUT, | F i | = 2 k i . We refer to each entry of F i as f i : α , α = 1 ... k i .At any given time t , B is in a specific configuration of node states, x t = [ x , x , ..., x n ]. We usethe terms state for individual automata ( x ) and configuration ( x ) for the collection of states ofthe set of automata of B , i.e., the collective network state. Starting from an initial configuration, x , the nodes of a BN are updated with a synchronous or asynchronous policy. The dynamics of B is thus defined by the temporal sequence of the 2 n possible configurations that ensue. Thetransitions between configurations can be represented as a state transition graph , STG, whereeach vertex is a configuration, and each directed edge denotes a transition from x t to x t +1 .The STG of B thus encodes the network’s entire dynamical landscape . Under the synchronousupdating scheme (used in the studies reported in this paper) configurations that repeat, suchthat x t + µ = x t , are known as attractors ; fixed point when µ = 1, and limit cycle , with period µ ,when µ >
1. The disconnected subgraphs of a STG that lead to an attractor are known as basinsof attraction . A BN B has a finite number of attractors, b , each denoted by A i : i = 1 , ..., b . Effective Connectivity
The effective connectivity ( k e ) tallies the expected number of inputs of an automaton x i thatare minimally sufficient to determine an its state transitions. When a subset of such minimalinputs is in a certain state combination, the remaining inputs are effectively redundant—theycan be in any state with no effect on the transition of x i . These effective inputs, or enputs for14hort, can be identified using the schema redescription methodology introduced by Marques-Pita and Rocha, [46] which we illustrate next. The formula for the logic rule OR with twoinputs can be written as x = i ∨ i . The Truth Table for this expression can be redescribedas wildcard schemata as follows: F (cid:48) = { (1 , , ( , } and F (cid:48) = { (0 , } , where F (cid:48) denotes theset of wildcard schemata that prescribe transitions to 1 (ON), and conversely, F (cid:48) denotes thewildcard schemata prescribing transitions to 0 (OFF), a set that contains only one schema inthis case. The wildcard symbol ‘ , i = 1, then the transition x t +1 = 1 is guaranteed,regardless of the state of i . A closer look at F (cid:48) reveals that only one input is necessary tosettle transitions to 1 (ON) in this example, and this is the case for the OR rule with any numberof inputs. The entire set of schemata for a given automaton can be used to determine its effectiveconnectivity. This requires the computation of the average minimal number of enputs necessaryto determine its state transition. Effective connectivity is computed from the upper bound on input redundancy , [46] yielding a sum of the minimal number of enputs required to settle eachof the possible 2 k state transitions specified in the automaton’s LUT. This value is then dividedby 2 k to obtain k e . For this computation we iterate over the entire LUT of the automaton; foreach LUT entry we accumulate the number of enputs of the wildcard schema matched, withthe largest number of wildcard symbols; once all LUT entries have been processed, the finalaccumulated sum is divided the the LUT size. In our example k e = 1 .
25. This is the case since three of the four look-up entries in the LUT have one of the inputs in the on state, which issufficient to settle the transition, while one of the entries requires two ( i = 0 , i = 0), so in thiscase k e = [(3 ×
1) + (1 × /
4, see [46] for details. Note that k e ≤ k and that the higher thedifference between k e and k , the more canalization there is in the automaton rule, and also, thelower the effective connectivity the automaton will have as a node in a BN.Other measures of canalization in Boolean automata exist and have been linked to criticality,such as average sensitivity , [56] and the more general c-sensitivity .[34] Effective connectivitypresents several advantages over these measures. First and foremost, it is designed to cap-ture collective canalization,[46] a very common non-linear phenomenon in automata wherebya subset of inputs jointly determine the state of an automaton, while rendering redundantthe complement subset of inputs.[52] In contrast, sensitivity independently aggregates theinfluence ( activity ) of each individual input to an automaton. It is thus a linear measure ofcanalization. This means that effective connectivity provides a more nuanced and realisticmeasurement of canalization that includes non-linear effects.[26, 45] For instance, even forautomata of k = 2, sensitivity does not discriminate between such common Boolean functionsas conjunction/disjunction and proposition/negation: s ( x ∧ x ) = s ( x ∨ x ) = s ( x ) = s ( ¬ x ) = 1. Effective connectivity, on the other hand, correctly accounts for the additionalcollective canalization that is present in the conjunction/disjunction (and other) functions: k e ( x ∧ x ) = k e ( x ∨ x ) = 5 / .
25, while k e ( x ) = k e ( ¬ x ) = 1. Since non-linear, collectivecanalization increases with k ,[26, 52] the finer characterization of the phenomenon provided byeffective connectivity becomes more relevant as well. Interestingly, both sensitivity and effectiveconnectivity can be easily computed from our schema description methodology,[26] which isavailable in the CANA Python package. [18] Finally, ‘ c -sensitivity’ [34] extends sensitivity tosubsets of c inputs, but it results in a vector of k values for each c , which is much less amenableto the regression analysis of criticality boundaries we pursue in this study than is the scalarvalue measured by k e . Generation of RBN ensembles
Each of the ensembles of RBNs that we produced for this study is characterized by a set oftuning parameters, namely ( k, k e , p ). The network connectivity k is a fixed (homogeneous)variable. This means that in our ensembles every node x i is connected to k nodes. The effectiveconnectivity is the mean value in a small interval (bin), and the bias is also fixed (homogeneous).15ote that the values of these parameters are always homogeneously distributed, in alignmentwith the assumptions made by the ST in Eq. 1. For a given value combination of ( k, k e , p ) asingle random BN is generated by choosing: (1) for each constituent node, a random set of k input nodes; and (2) a random Boolean automaton with k inputs, output-bias p , and effectiveconnectivity in a small range k e ± (cid:15) from an existing catalog. The reason for binning k e is thatthe possible values for this parameter vary significantly for each combination of k and p , whichleads to a sparse matrix of viable ensembles ( k, k e , p ), where viability is determined by theexistence of Boolean state transition rules that satisfy specific combinations of the parametervalues (see appendix S3 for further details). Thus, without loss of information, we bin k e usinga small bin size (cid:15) = 0 .
25 leading to k e being homogeneously distributed in regular intervals ofsize ∆ k e = 0 .
5, and to a more dense matrix of viable ensembles. Because the values of k e arebinned, we refer to the k e tuning parameter as (cid:104) k e (cid:105) . Producing a random Boolean automatonwith a given ( k, p ) is simple: (1) generate an all-zeroes vector of length 2 k ; (2) assign the stateone ( on ) to (2 k ) p LUT random entries in the resulting vector; and (3) assume the updatedvector represents the state transitions of the automaton in the lexicographic order of inputcombinations. To control for k e , we generate a catalog of Boolean automata with a largenumber of ( k, k e , p ) value combinations, from which automata with the appropriate parametervalues are picked during the generation of the RBN ensembles. The catalogs for Boolean rulesof k = 2 , , k , automata are first obtained by random generationfor a given k and p , with their k e subsequently computed. The number of possible automatafor a given k and p is (cid:0) k p (2 k ) (cid:1) . Thus, for k >
4, the catalogs contain a random sample of 10 Boolean rules for each ( k, p ) if the total number possible is greater than 10 , and all the Booleanstate transition rules otherwise. Additionally, to obtain automata with k e in ranges essentiallyinaccessible to random generation via k and p alone, we use a genetic algorithm. We refer theinterested reader to appendix S3 for details. We have considered the following ranges for ourtuning parameters: the number of nodes per network N = 100, k ∈ { , , , , } , p = [0 . , . p = 1 / k , and (cid:104) k e (cid:105) = [1 , k ] with ∆ k e = 0 .
5. By sweeping the space of values for ourensemble parameters we have generated a total of 266.4K RBNs.
Computation of the Derrida parameter
For a given BN, we compute the ζ parameter [23, 24, 39] by first generating I = 250 randominitial configurations, and producing an almost identical copy for each, where the copy differsonly in the state of a small number m of states that have been perturbed (flipped). We set thisvalue to be a random integer m ∈ [1 , .., N/ t time steps; we set t = 1. Third, computingthe Hamming distance between the two resulting configurations. Fourth, for each value of m ,averaging the Hamming distances obtained in the previous step and and plotting them against m to produce the Derrida plot. Finally, fifth, calculating ζ as the slope of the Derrida plot atthe origin. A value of ζ = 1 indicates criticality. A value above (below) this is interpreted asmeaning the BN is in the chaotic (stable) dynamical regime. Constrained search for decision boundaries
The dataset we produce contains individual RBNs, each characterized by the independentvariables k, p , and k e , and with one dependent variable with value one (1) if ζ > R=step(logistic(model)) , where the output of the logistic function isthe probability that the dependent variable has value one (chaotic regime). The output of thestep function is the predicted binary value of the dependent variable given a threshold τ = 0 . τ then16he classifier predicts that BN to be in the chaotic regime, and critical/stable otherwise. Eachmodel tested belongs to one of the following model classes, where κ is the in-degree k in the STor the mean effective connectivity (cid:104) k e (cid:105) in the CT, listed in increasing order of model complexity.Model complexity is defined by the number of terms and the number of predictors in each term(in that order): c κ ; c κp (1 − p ); c κ + c p (1 − p ); c κ + c κp (1 − p ); c κp (1 − p ) + c p (1 − p ); c κ + c κp (1 − p ) + c p (1 − p ); In our binary logistic regression we use the p (1 − p ) as a single independent variableaccounting for the bias, rather than just p due to the principle of duality in Boolean logic.The coefficients derived for each criticality model are used to construct a decision surface. Forthis, the resulting equations have been manipulated so that the independent variables and theircoefficients are on the left-hand side and the value (1) on the right-hand side, thus facilitatingcomparisons with the ST. Performance measures
Mc-Fadden’s R is a standard goodness-of-fit measure used for logistic regression models. It iscomputed as one minus the ratio of the log-likelihood of the model to that of the intercept-onlymodel. [15] The maximum value of this pseudo R is 1. The MCC is ideal for computingclassification performance in unbalanced scenarios, [7] such as the one studied here, wherebythere are many more instances of chaotic automata networks in the random ensembles thaninstances of stable network dynamics. Computed for the classifier using model predictions andtest data, it is defined as a function of the number of true positives (TP), false positives (FP),true negatives (TN) and false negatives (FN): M CC = T P × T N − F P × F N √ ( T P + F P )( T P + F N )( T N + F P )( T N + F N ) . [7]The MCC ranges between -1 and 1, where -1 indicates perfect opposite classification, 1 indicatesperfect classification, and 0 indicates random classification. Here, the positive label is associatedwith the chaotic dynamical regime R = 1, and the negative label with the stable (stable/critical)regime R = 0. The AUC is defined as a function of the true positive rate (TPR), the proportionof true positives in the total number of positive instances, false positive rate (FPR), and theproportion of false positives in the total number of negative instances, as follows: AU C = (cid:82) T P R ( T ) F P R (cid:48) ( T ) dT . The AUC ranges between 0 and 1, for perfectly incorrect and correctclassification at the endpoints, respectively. A random classifier yields a value of 0 .
5. It isinterpreted as the probability with which the classifier ranks positive instances (label 1) higherthan negative instances (label 0). [29]
Cross-validation
The full dataset was randomly split into 4 non-overlapping equally sized partitions (75% − outer foldings . A similarprocedure was followed on each of the training splits, yielding a total of 16 training-testing pairs(see appendix S1 for further details). Measures of classification and regression performance (aswith the full dataset) on the testing splits were collected. The 16 sets of performance scoreswere averaged to produce an estimate of generalization performance score for each measure.Between-model comparisons were made using pair-sample t-tests because the two models wereevaluated on the same set of sixteen test folds. The paired t-tests were one-sided with thealternative hypothesis that the mean score of model 2 ( (cid:104) k e (cid:105) ) is greater than that of model 2 ( k ).17 ymbolic regression A supplemental study was performed using a different curve fitting method to find the criticaldecision surface. We used symbolic regression (a type of unconstrained search), which is, inessence, a genetic programming algorithm. [54] The symmetric effect of the biases p and 1 − p on the Derrida parameter was used to prune the search space by considering 0 < p ≤ . k e and p with a coefficient that varied slightly in different runs. Theensembles were defined in the same way as in the main methods with the only difference thatwe used networks of size N = 48 instead of N = 100. The best classifier found was the function3 . (cid:104) k e (cid:105) p = 1, with performance values very close to those of the CT. See appendix S2 forfurther details. References [1] R´eka Albert. Boolean modeling of genetic regulatory networks. In
Complex networks , pages459–481. Springer, 2004.[2] R´eka Albert and Hans G Othmer. The topology of the regulatory interactions predicts theexpression pattern of the segment polarity genes in drosophila melanogaster.
Journal oftheoretical biology , 223(1):1–18, 2003.[3] Reka Albert and Juilee Thakar. Boolean modeling: a logic-based dynamic approach forunderstanding signaling and regulatory networks and for making useful predictions.
WileyInterdisciplinary Reviews: Systems Biology and Medicine , 6(5):353–369, 2014.[4] Maximino Aldana. Boolean dynamics of networks with scale-free topology.
Physica D:Nonlinear Phenomena , 185(1):45–66, 2003.[5] Uri Alon.
An introduction to systems biology: design principles of biological circuits . CRCpress, 2019.[6] Per Bak, Chao Tang, and Kurt Wiesenfeld. Self-organized criticality.
Physical Review A , 38(1):364–374, 1988. ISSN 10502947. doi:10.1103/PhysRevA.38.364 . arXiv:2003.08130 .[7] Pierre Baldi, Søren Brunak, Yves Chauvin, Claus AF Andersen, and Henrik Nielsen. As-sessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics ,16(5):412–424, 2000.[8] Enrique Balleza, Elena R Alvarez-Buylla, Alvaro Chaos, Stuart Kauffman, Ilya Shmulevich,and Maximino Aldana. Critical dynamics in genetic regulatory networks: examples fromfour kingdoms.
PLoS One , 3(6):e2456, 2008.[9] Albert-L´aszl´o Barab´asi. The network takeover.
Nature Physics , 8(1):14, 2011.1810] Albert-L´aszl´o Barab´asi et al.
Network science . Cambridge university press, 2016.[11] Alain Barrat, Marc Barthelemy, and Alessandro Vespignani.
Dynamical processes oncomplex networks . Cambridge university press, 2008.[12] Stefan Bornholdt. Boolean network models of cellular regulation: prospects and limitations.
Journal of the Royal Society Interface , 5(Suppl 1):S85–S94, 2008.[13] Stefan Bornholdt and Stuart Kauffman. Ensembles, dynamics, and cell types: Revisitingthe statistical mechanics perspective on cellular regulation.
Journal of Theoretical Biology ,467:15–22, 2019. ISSN 10958541. doi:10.1016/j.jtbi.2019.01.036 . arXiv:1902.00483 .URL https://doi.org/10.1016/j.jtbi.2019.01.036 .[14] Anna D Broido and Aaron Clauset. Scale-free networks are rare. arXiv preprintarXiv:1801.03400 , 2018.[15] J. Bruin. Faq: What are pseudo r-squareds? , OCT 2011.[16] Rui Chang, Robert Shoemaker, and Wei Wang. Systematic search for recipes to generateinduced pluripotent stem cells. PLoS computational biology , 7(12):e1002300, 2011.[17] Madalena Chaves, Reka Albert, and Eduardo D Sontag. Robustness and fragility of booleanmodels for genetic regulatory networks.
Journal of theoretical biology , 235(3):431–449, 2005.[18] Rion B. Correia, Alexander J. Gates, Xuan Wang, and Luis M. Rocha. Cana: Apython package for quantifying control and canalization in boolean networks.
Frontiersin Physiology , 9, 2018.[19] James P Crutchfield and Karl Young. Computation at the onset of chaos. In
The SantaFe Institute, Westview . Citeseer, 1988.[20] Bryan C Daniels, Hyunju Kim, Douglas Moore, Siyu Zhou, Harrison Smith, Bradley Karas,Stuart A Kauffman, and Sara I Walker. Logic and connectivity jointly determine criticalityin biological gene regulatory networks, 2018. arXiv:1805.01447.[21] Maria Davidich and Stefan Bornholdt. The transition from differential equations to booleannetworks: a case study in simplifying a regulatory network model.
Journal of TheoreticalBiology , 255(3):269–277, 2008.[22] Maria I Davidich and Stefan Bornholdt. Boolean network model predicts cell cycle sequenceof fission yeast.
PloS one , 3(2):e1672, 2008.[23] Bernard Derrida and Yves Pomeau. Random networks of automata: a simple annealedapproximation.
EPL (Europhysics Letters) , 1(2):45, 1986.[24] Bernard Derrida and Dietrich Stauffer. Phase transitions in two-dimensional kauffmancellular automata.
EPL (Europhysics Letters) , 2(10):739, 1986.[25] Jeffrey J Fox and Colin C Hill. From topology to dynamics in biochemical networks.
Chaos:An Interdisciplinary Journal of Nonlinear Science , 11(4):809–815, 2001.[26] A. J. Gates, R. Brattig Correia, X. Wang, and L. M. Rocha. The effective graph revealsredundancy and collective canalization in biochemical regulation. Submitted, 2020.[27] Alexander J Gates and Luis M Rocha. Control of complex networks requires both structureand dynamics.
Scientific reports , 6, 2016.1928] C˘alin C Guet, Michael B Elowitz, Weihong Hsing, and Stanislas Leibler. Combinatorialsynthesis of genetic networks.
Science , 296(5572):1466–1470, 2002.[29] David J Hand. Measuring classifier performance: a coherent alternative to the area underthe roc curve.
Machine learning , 77(1):103–123, 2009.[30] Stephen E Harris, Bruce K Sawhill, Andrew Wuensche, and Stuart Kauffman. A modelof transcriptional regulatory networks based on biases in the observed regulation rules.
Complexity , 7(4):23–40, 2002.[31] Tomas Helikar. Interactive modeling of biological networks. https://cellcollective.org , 2016.[32] Tom´aˇs Helikar, Bryan Kowal, Sean McClenathan, Mitchell Bruckner, Thaine Rowley, AlexMadrahimov, Ben Wicks, Manish Shrestha, Kahani Limbu, and Jim A Rogers. The cellcollective: toward an open and collaborative approach to systems biology.
BMC systemsbiology , 6(1):96, 2012.[33] Jorge Hidalgo, Jacopo Grilli, Samir Suweis, Miguel A. Mu˜noz, Jayanth R. Banavar, andAmos Maritan. Information-based fitness and the emergence of criticality in living systems.
Proceedings of the National Academy of Sciences of the United States of America , 111(28):10095–10100, 2014. ISSN 10916490. doi:10.1073/pnas.1319166111 . arXiv:1307.4325 .[34] Claus Kadelka, Jack Kuipers, and Reinhard Laubenbacher. The influence of canalizationon the robustness of boolean networks. arXiv preprint arXiv:1607.04474 , 2016.[35] Fredrik Karlsson and Michael H¨ornquist. Order or chaos in boolean gene networks dependson the mean fraction of canalizing functions. Physica A: Statistical Mechanics and itsApplications , 384(2):747–757, 2007.[36] Stuart Kauffman, Carsten Peterson, Bj¨orn Samuelsson, and Carl Troein. Genetic networkswith canalyzing boolean rules are always stable.
Proceedings of the National Academy ofSciences of the United States of America , 101(49):17102–17107, 2004.[37] Stuart A Kauffman. Metabolic stability and epigenesis in randomly constructed geneticnets.
Journal of theoretical biology , 22(3):437–467, 1969.[38] Stuart A Kauffman. Emergent properties in random complex automata.
Physica D:Nonlinear Phenomena , 10(1-2):145–156, 1984.[39] Stuart A Kauffman.
The origins of order: Self organization and selection in evolution .Oxford University Press, USA, 1993.[40] Dmitry Krotov, Julien O. Dubuis, Thomas Gregora, and William Bialek. Morphogenesisat criticality.
Proceedings of the National Academy of Sciences of the United States ofAmerica , 111(10):3683–3688, 2014. ISSN 10916490. doi:10.1073/pnas.1324186111 .[41] Christopher Langton. Computation at the edge of chaos: Phase transition and emergentcomputation. Technical report, Los Alamos National Lab., NM (USA), 1990.[42] Christopher G Langton. Studying artificial life with cellular automata.
Physica D:Nonlinear Phenomena , 22(1-3):120–149, 1986.[43] Lori Layne, Elena Dimitrova, and Matthew Macauley. Nested canalyzing depth andnetwork stability.
Bulletin of mathematical biology , 74(2):422–433, 2012.2044] Javier Mac´ıa, Stefanie Widder, and Ricard Sol´e. Why are cellular switches boolean? generalconditions for multistable genetic circuits.
Journal of theoretical biology , 261(1):126–135,2009.[45] S. Manicka.
The role of canalization in the spreading of perturbations in Boolean networks .Doctoral dissertation, Indiana University, Informatics and Computing, May 2017.[46] Manuel Marques-Pita and Luis M Rocha. Canalization and control in automata networks:body segmentation in drosophila melanogaster.
PloS one , 8(3):e55946, 2013.[47] Miguel A Munoz. Colloquium: Criticality and dynamical scaling in living systems.
Reviewsof Modern Physics , 90(3):031001, 2018.[48] Matti Nykter, Nathan D. Price, Maximino Aldana, Stephen A. Ramsey, Stuart A.Kauffman, Leroy E. Hood, Olli Yli-Harja, and Ilya Shmulevich. Gene expression dynamicsin the macrophage exhibit criticality.
Proceedings of the National Academy of Sciencesof the United States of America , 105(6):1897–1900, 2008. ISSN 00278424. doi:10.1073/pnas.0711525105 .[49] Norman H Packard. Adaptation toward the edge of chaos.
Dynamic patterns in complexsystems , 212:293, 1988.[50] Isabelle S Peter, Emmanuel Faure, and Eric H Davidson. Predictive computation ofgenomic logic processing functions in embryonic development.
Proceedings of the NationalAcademy of Sciences , 109(41):16434–16442, 2012.[51] P. R¨am¨o, J. Kesseli, and O. Yli-Harja. Perturbation avalanches and criticality in generegulatory networks.
Journal of Theoretical Biology , 242(1):164–170, 2006. ISSN 00225193. doi:10.1016/j.jtbi.2006.02.011 .[52] CJ Olson Reichhardt and Kevin E Bassler. Canalization and symmetry in boolean modelsfor genetic regulatory networks.
Journal of Physics A: Mathematical and Theoretical , 40(16):4339, 2007.[53] Andrea Roli, Marco Villani, Alessandro Filisetti, and Roberto Serra. Dynamical criticality:overview and open questions.
Journal of Systems Science and Complexity , 31(3):647–663,2018.[54] Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data.
Science , 324(5923):81–5, Apr 2009. doi:10.1126/science.1165893 .[55] R. Serra, M. Villani, A. Graudenzi, and S. A. Kauffman. Why a simple model ofgenetic regulatory networks describes the distribution of avalanches in gene expressiondata.
Journal of Theoretical Biology , 246(3):449–460, 2007. ISSN 00225193. doi:10.1016/j.jtbi.2007.01.012 .[56] C Seshadhri, Yevgeniy Vorobeychik, Jackson R Mayo, Robert C Armstrong, and Joseph RRuthruff. Influence and dynamic behavior in random boolean networks.
Physical reviewletters , 107(10):108701, 2011.[57] Ilya Shmulevich, Stuart A Kauffman, and Maximino Aldana. Eukaryotic cells aredynamically ordered or critical but not chaotic.
Proceedings of the National Academyof Sciences of the United States of America , 102(38):13439–13444, 2005.[58] Mark L Siegal and Aviv Bergman. Waddington’s canalization revisited: developmentalstability and evolution.
Proceedings of the National Academy of Sciences , 99(16):10528–10532, 2002. 2159] Steven H Strogatz.
Nonlinear dynamics and chaos: with applications to physics, biology,chemistry, and engineering . CRC Press, 2018.[60] Kirsten HWJ ten Tusscher and Paulien Hogeweg. The role of genome and gene regulatorynetwork canalization in the evolution of multi-trait polymorphisms and sympatricspeciation.
BMC Evolutionary Biology , 9(1):159, 2009.[61] Ren´e Thomas.
Kinetic Logic: A Boolean Approach to the Analysis of Complex RegulatorySystems: Proceedings of the EMBO Course “Formal Analysis of Genetic Regulation”, Heldin Brussels, September 6–16, 1977 , volume 29. Springer Science & Business Media, 2013.[62] Conrad H Waddington. Canalization of development and the inheritance of acquiredcharacters.
Nature , 150(3811):563, 1942.[63] Gang Yang, Colin Campbell, and R´eka Albert. Compensatory interactions to stabilizemultiple steady states or mitigate the effects of multiple deregulations in biologicalnetworks.
Physical Review E , 94(6):062316, 2016.[64] Jorge G´omez Tejeda Za˜nudo, Maurizio Scaltriti, and R´eka Albert. A network modelingapproach to elucidate drug resistance mechanisms and predict combinatorial drugtreatments in breast cancer.
Cancer Convergence , 1(1):5, 2017.[65] Jorge GT Za˜nudo, Steven N Steinway, and R´eka Albert. Discrete dynamic networkmodeling of oncogenic signaling: mechanistic insights for personalized treatment of cancer.
Current Opinion in Systems Biology , 9:1–10, 2018.
Acknowledgements
MMP Acknowledges input and discussions about the original ideas with Prof. Christof Teuscherand Prof. Melanie Mitchell (Portland State University, USA), as well as funding provided byFunda¸c˜ao para a Ciˆencia e a Tecnologia (Portugal) grant PTDC/EIA-CCO/114108/2009. Theauthors thank Deborah Rocha for thorough line editing. LMR was partially funded by theNational Institutes of Health, National Library of Medicine Program, grant 01LM011945-01, bya Fulbright Commission fellowship, by NSF-NRT grant 1735095 “Interdisciplinary Training inComplex Networks and Systems,” and by Funda¸c˜ao para a Ciˆencia e a Tecnologia (Portugal)grant PTDC/EIA-CCO/114108/2009
Author contributions statement
MMP and LMR conceived hypothesis and research rationale. SM, MMP and LMR designedand executed the experiments, analyzed the data, and wrote the paper
Additional information